• Mention doc2vec in the package description.
  • Add perplexity() to assess the goodness-of-fit of models.
  • Save quanteda’s internal docvars in the textmodel_doc2vec objects.
  • Add group to as.matrix() to average sentence or paragraph vectors from the same documents.
  • Upgrade textmodel_doc2vec() to train both the distributed memory (DM) and distributed bag-of-words (DBOW) models.
  • Add as.textmodel_doc2vec() to create document vectors as a weighted average of word vectors.
  • Add layer to as.matrix() to choose between word or document vectors.
  • normalize is now defunct in textmodel_word2vec().
  • Add the tolower argument, set to TRUE by default, to lower-case tokens.
  • Allow x to be quanteda’s tokens_xptr object to enhance efficiency.
  • Save docvars in the textmodel_doc2vec objects.
  • Set the vectors of empty documents to zero in the textmodel_doc2vec objects.
  • Add probability() to compute the probability of words.
  • Rename word2vec(), doc2vec() and lsa() to textmodel_word2vec(), textmodel_doc2vec() and textmodel_lsa() respectively.
  • Simplify the C++ code to make maintenance easier.
  • Add normalize to word2vec() to enable or disable word vector normalization.
  • Add weights() to extract back-propagation weights.
  • Make analogy() convert a formula to a named character vector.
  • Improve the stability of word2vec() when verbose = TRUE.
  • Fork https://github.com/bnosac/word2vec and change the package name to wordvector.
  • Replace a list of character vectors with quanteda’s tokens object as the input.
  • Recreate word2vec() with new argument names and object structures.
  • Create lsa() to train word vectors using Latent Semantic Analysis.
  • Add similarity() and analogy() functions using proxyC.
  • Add data_corpus_news2014, which contains 20,000 news summaries, as package data.
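
A typical workflow with the functions listed above might look like the following minimal sketch. It assumes the quanteda and wordvector packages are installed; the dim argument and the specific query words are illustrative, not taken from the package documentation.

```r
library(quanteda)
library(wordvector)

# Tokenize the bundled corpus; tokens objects are the expected input
toks <- tokens(data_corpus_news2014) |>
  tokens_tolower()

# Train word vectors (formerly word2vec()); dim is an assumed argument name
wdv <- textmodel_word2vec(toks, dim = 50)

# Query the trained model with proxyC-based similarity and a formula-based analogy
similarity(wdv, c("economy", "market"))
analogy(wdv, ~ king - man + woman)
```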