• Mention doc2vec in the package description.
  • Add perplexity() to assess the goodness-of-fit of models.
  • Save quanteda’s internal docvars in the textmodel_doc2vec objects.
  • Add group to as.matrix() to average sentence or paragraph vectors from the same documents.
  • Upgrade textmodel_doc2vec() to train both the distributed memory (DM) and distributed bag-of-words (DBOW) models.
  • Add as.textmodel_doc2vec() to create document vectors as a weighted average of word vectors.
  • Add layer to as.matrix() to choose between word or document vectors.
  • normalize is now defunct in textmodel_word2vec().
  • Add the tolower argument, set to TRUE by default, to lower-case tokens.
  • Allow x to be quanteda’s tokens_xptr object to enhance efficiency.
  • Save docvars in the textmodel_doc2vec objects.
  • Set the vectors of empty documents to zero in the textmodel_doc2vec objects.
  • Add probability() to compute the probability of words.
  • Rename word2vec(), doc2vec() and lsa() to textmodel_word2vec(), textmodel_doc2vec() and textmodel_lsa() respectively.
  • Simplify the C++ code to make maintenance easier.
  • Add normalize to word2vec() to enable or disable word vector normalization.
  • Add weights() to extract back-propagation weights.
  • Make analogy() convert a formula to a named character vector.
  • Improve the stability of word2vec() when verbose = TRUE.
  • Fork https://github.com/bnosac/word2vec and change the package name to wordvector.
  • Replace a list of character vectors with quanteda’s tokens object as the input.
  • Recreate word2vec() with new argument names and object structures.
  • Create lsa() to train word vectors using Latent Semantic Analysis.
  • Add similarity() and analogy() functions using proxyC.
  • Add data_corpus_news2014, which contains 20,000 news summaries, as package data.
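
A typical workflow with the functions listed above might look like the following minimal sketch. It assumes the quanteda and wordvector packages are installed; the dim argument and the specific query words are illustrative, not taken from the package documentation.

```r
library(quanteda)
library(wordvector)

# Tokenize the bundled corpus; tokens objects are the expected input
toks <- tokens(data_corpus_news2014) |>
  tokens_tolower()

# Train word vectors (formerly word2vec()); dim is an assumed argument name
wdv <- textmodel_word2vec(toks, dim = 50)

# Query the trained model with proxyC-based similarity and a formula-based analogy
similarity(wdv, c("economy", "market"))
analogy(wdv, ~ king - man + woman)
```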