This essay aims to discuss the development of the word2vec and GloVe algorithms as it pertains to a secondary purpose for which these algorithms have been applied: the analysis of concepts contained within text corpora. First, the word2vec algorithm is discussed in light of its historical context. Then, the analogy-completion task that highlighted the potential of the semantic arithmetic possible with word2vec embeddings is described. Finally, the development of the GloVe algorithm is contrasted with that of the word2vec algorithm.
The word2vec algorithm (Mikolov et al., 2013a) combines two major technical insights: (1) continuous vectors can be used to represent semantic information, and (2) the internal representations learned by neural networks are conceptually meaningful. When the algorithm was introduced in 2013, however, neither the continuous representation of semantic information nor the conceptual value of internal representations was a new idea. More specifically, in the information retrieval space, latent semantic analysis (LSA; Deerwester et al., 1990) and latent Dirichlet allocation (Blei et al., 2003) were proposed as statistical methods that leverage the semantic information latent in texts to improve upon methods that treated words as indexical features (that exist…
