This vignette explains how proxyC computes the
similarity and distance measures.
Notation
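The formulas in this vignette compare two numeric vectors of equal length. The symbols $\vec{x}$, $\vec{y}$ and the index range $1, \dots, n$ used here are assumed labels, chosen to match common practice:

$$
\vec{x} = [x_1, x_2, \dots, x_n], \qquad \vec{y} = [y_1, y_2, \dots, y_n]
$$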
The length of the vector is $n = \lVert\vec{x}\rVert$,
while $|\vec{x}|$ denotes the absolute values of its elements.
Operations on vectors are element-wise:
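As an illustration, assuming $\vec{x}$ and $\vec{y}$ are two vectors of length $n$, a product or a power of vectors is read element by element and yields a vector of the same length:

$$
\vec{x}\vec{y} = [x_1 y_1, \dots, x_n y_n], \qquad \vec{x}^2 = [x_1^2, \dots, x_n^2]
$$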
Summation of the elements of vectors is written using sigma without
specifying the range:
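Written out for a vector $\vec{x}$ of length $n$ (the index $i$ is an assumed name), the shorthand stands for:

$$
\sum \vec{x} = \sum_{i=1}^{n} x_i
$$

For example, $\vec{x} = [1, 2, 0]$ gives $\sum \vec{x} = 3$ and $\sum \vec{x}^2 = 5$.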
When the elements of a vector are compared with a value inside a pair of
square brackets, the summation counts the number of elements that
are equal (or unequal) to that value:
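This matches the Iverson-bracket convention, in which $[\cdot]$ is 1 when the condition holds and 0 otherwise. A sketch for a vector $\vec{x}$ of length $n$ (the index $i$ is an assumed name):

$$
\sum [\vec{x} = 1] = \sum_{i=1}^{n} [x_i = 1]
$$

For example, $\vec{x} = [1, 0, 1, 1]$ gives $\sum [\vec{x} = 1] = 3$ and $\sum [\vec{x} \ne 1] = 1$.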
 
Similarity Measures
Similarity measures are available in
proxyC::simil().
Cosine similarity (“cosine”)
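The textbook definition for two vectors $\vec{x}$ and $\vec{y}$ of equal length is given here as a sketch; the name $\mathrm{simil}$ is an assumed label for the result:

$$
\mathrm{simil} = \frac{\sum \vec{x}\vec{y}}{\sqrt{\sum \vec{x}^2}\,\sqrt{\sum \vec{y}^2}}
$$

For example, $\vec{x} = [1, 2, 0]$ and $\vec{y} = [2, 0, 1]$ give $\sum \vec{x}\vec{y} = 2$ and $\sum \vec{x}^2 = \sum \vec{y}^2 = 5$, so $\mathrm{simil} = 2 / 5 = 0.4$.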
 
Pearson correlation coefficient (“correlation”)
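In its standard form the coefficient is the covariance of the two vectors divided by the product of their standard deviations. The expanded version below assumes $\bar{x} = \sum \vec{x} / n$ and $\bar{y} = \sum \vec{y} / n$ denote the means, which are symbols introduced only for this sketch:

$$
\mathrm{simil} = \frac{\mathrm{Cov}(\vec{x}, \vec{y})}{\sigma(\vec{x})\,\sigma(\vec{y})}
= \frac{\sum (\vec{x} - \bar{x})(\vec{y} - \bar{y})}{\sqrt{\sum (\vec{x} - \bar{x})^2}\,\sqrt{\sum (\vec{y} - \bar{y})^2}}
$$

Put differently, it is the cosine similarity of the mean-centred vectors.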
 
Jaccard similarity (“jaccard” and “ejaccard”)
The values of $\vec{x}$ and $\vec{y}$ are Boolean for “jaccard”.
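For Boolean vectors $\vec{x}$ and $\vec{y}$, the classical Jaccard index divides the number of positions where both are 1 by the number where at least one is 1. The helper symbol $e$ is an assumption made for this sketch:

$$
e = \sum \vec{x}\vec{y}, \qquad \mathrm{simil} = \frac{e}{\sum \vec{x} + \sum \vec{y} - e}
$$

A common real-valued extension (often called Tanimoto) replaces the plain sums with sums of squares. Whether “ejaccard” uses exactly this form should be checked against the package documentation:

$$
\mathrm{simil} = \frac{\sum \vec{x}\vec{y}}{\sum \vec{x}^2 + \sum \vec{y}^2 - \sum \vec{x}\vec{y}}
$$

For example, $\vec{x} = [1, 1, 0, 1]$ and $\vec{y} = [1, 0, 0, 1]$ give $e = 2$ and a Boolean Jaccard similarity of $2 / (3 + 2 - 2) = 2/3$.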
 
Fuzzy Jaccard similarity (“fjaccard”)
The values must be $0 \le \vec{x} \le 1$ and $0 \le \vec{y} \le 1$.
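The usual fuzzy-set form (see Miyamoto, 1990) treats each element as a membership degree and replaces intersection and union with element-wise minimum and maximum. It is sketched here for two vectors $\vec{x}$ and $\vec{y}$ under the assumption that the package follows this definition:

$$
\mathrm{simil} = \frac{\sum \min(\vec{x}, \vec{y})}{\sum \max(\vec{x}, \vec{y})}
$$

For example, $\vec{x} = [0.2, 0.8, 0.5]$ and $\vec{y} = [0.6, 0.4, 0.5]$ give $1.1 / 1.9 \approx 0.579$. With Boolean input the expression reduces to the ordinary Jaccard index.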
 
Dice similarity (“dice” and “edice”)
The values of $\vec{x}$ and $\vec{y}$ are Boolean for “dice”.
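The classical Dice coefficient for Boolean vectors $\vec{x}$ and $\vec{y}$ doubles the number of shared 1s and divides by the total number of 1s in both vectors. The helper symbol $e$ is assumed for this sketch:

$$
e = \sum \vec{x}\vec{y}, \qquad \mathrm{simil} = \frac{2e}{\sum \vec{x} + \sum \vec{y}}
$$

A frequently used real-valued extension, which “edice” presumably resembles, uses sums of squares in the denominator:

$$
\mathrm{simil} = \frac{2 \sum \vec{x}\vec{y}}{\sum \vec{x}^2 + \sum \vec{y}^2}
$$

For example, $\vec{x} = [1, 1, 0, 1]$ and $\vec{y} = [1, 0, 0, 1]$ give a Boolean Dice similarity of $2 \cdot 2 / (3 + 2) = 0.8$.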
 
Hamann similarity (“hamann”)
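Following the definition listed in Choi et al. (2010), the Hamann measure for Boolean vectors $\vec{x}$ and $\vec{y}$ of length $n$ is the number of matching positions minus the number of mismatching positions, divided by $n$. The helper symbols $e$ and $u$ are assumptions of this sketch:

$$
e = \sum [\vec{x} = \vec{y}], \qquad u = \sum [\vec{x} \ne \vec{y}], \qquad \mathrm{simil} = \frac{e - u}{n}
$$

For example, $\vec{x} = [1, 1, 0, 1]$ and $\vec{y} = [1, 0, 0, 1]$ match in three positions and differ in one, so $\mathrm{simil} = (3 - 1) / 4 = 0.5$. The value ranges from $-1$ to $1$.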
 
Faith similarity (“faith”)
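In the form given by Choi et al. (2010), the Faith measure for Boolean vectors $\vec{x}$ and $\vec{y}$ of length $n$ counts joint presences fully and joint absences at half weight. The helper symbols $t$ and $f$ are assumed names for this sketch:

$$
t = \sum [\vec{x} = 1][\vec{y} = 1], \qquad f = \sum [\vec{x} = 0][\vec{y} = 0], \qquad \mathrm{simil} = \frac{t + 0.5 f}{n}
$$

For example, $\vec{x} = [1, 1, 0, 1]$ and $\vec{y} = [1, 0, 0, 1]$ give $t = 2$ and $f = 1$, so $\mathrm{simil} = 2.5 / 4 = 0.625$.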
 
Simple matching (“matching”)
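The simple matching coefficient is conventionally the share of positions in which two vectors $\vec{x}$ and $\vec{y}$ of length $n$ agree. It is sketched here on the assumption that the package uses this standard form:

$$
\mathrm{simil} = \frac{\sum [\vec{x} = \vec{y}]}{n}
$$

For example, $\vec{x} = [1, 1, 0, 1]$ and $\vec{y} = [1, 0, 0, 1]$ agree in three of four positions, so $\mathrm{simil} = 0.75$.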
 
 
Distance Measures
Distance measures are available in proxyC::dist().
Smoothing of the vectors can be performed when method is
“chisquared”, “kullback”, “jeffreys” or “jensen”: the value of
smooth is added to each element of $\vec{x}$ and $\vec{y}$.
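In symbols, writing $s$ for the value of smooth (an assumed symbol) and $\vec{x}$, $\vec{y}$ for the two vectors, the smoothed vectors would be:

$$
\vec{x}' = \vec{x} + s, \qquad \vec{y}' = \vec{y} + s
$$

A positive $s$ removes zero elements, which would otherwise lead to division by zero or to the logarithm of zero in these four measures.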
Manhattan distance (“manhattan”)
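The standard definition for two vectors $\vec{x}$ and $\vec{y}$ of equal length sums the absolute element-wise differences. The name $\mathrm{dist}$ is an assumed label for the result:

$$
\mathrm{dist} = \sum |\vec{x} - \vec{y}|
$$

For example, $\vec{x} = [1, 2, 0]$ and $\vec{y} = [2, 0, 1]$ give $1 + 2 + 1 = 4$.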
 
Canberra distance (“canberra”)
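The usual form for two vectors $\vec{x}$ and $\vec{y}$ scales each absolute difference by the sum of the absolute values. How terms with a zero denominator are treated is implementation-specific and is not covered by this sketch:

$$
\mathrm{dist} = \sum \frac{|\vec{x} - \vec{y}|}{|\vec{x}| + |\vec{y}|}
$$

For example, $\vec{x} = [1, 2, 0]$ and $\vec{y} = [2, 0, 1]$ give $1/3 + 2/2 + 1/1 \approx 2.333$.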
 
Euclidean distance (“euclidean”)
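The standard definition for two vectors $\vec{x}$ and $\vec{y}$ of equal length is the square root of the summed squared differences:

$$
\mathrm{dist} = \sqrt{\sum (\vec{x} - \vec{y})^2}
$$

For example, $\vec{x} = [1, 2, 0]$ and $\vec{y} = [2, 0, 1]$ give $\sqrt{1 + 4 + 1} = \sqrt{6} \approx 2.449$.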
 
Minkowski distance (“minkowski”)
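The Minkowski distance generalises the Manhattan and Euclidean distances through an exponent, written $p$ in this sketch. For two vectors $\vec{x}$ and $\vec{y}$ with $p \ge 1$:

$$
\mathrm{dist} = \left( \sum |\vec{x} - \vec{y}|^p \right)^{1/p}
$$

Setting $p = 1$ gives the Manhattan distance and $p = 2$ the Euclidean distance. For $\vec{x} = [1, 2, 0]$, $\vec{y} = [2, 0, 1]$ and $p = 3$, the result is $(1 + 8 + 1)^{1/3} = 10^{1/3} \approx 2.154$.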
 
Hamming distance (“hamming”)
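In its standard form the Hamming distance counts the positions at which two vectors $\vec{x}$ and $\vec{y}$ differ. Whether the package reports the raw count, as sketched here, or a proportion should be checked in its documentation:

$$
\mathrm{dist} = \sum [\vec{x} \ne \vec{y}]
$$

For example, $\vec{x} = [1, 1, 0, 1]$ and $\vec{y} = [1, 0, 0, 1]$ differ in one position, so $\mathrm{dist} = 1$.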
 
The largest difference between values (“maximum”)
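This is commonly known as the Chebyshev distance. A sketch for two vectors $\vec{x}$ and $\vec{y}$, assuming the largest absolute element-wise difference is meant:

$$
\mathrm{dist} = \max |\vec{x} - \vec{y}|
$$

For example, $\vec{x} = [1, 2, 0]$ and $\vec{y} = [2, 0, 1]$ give $\max(1, 2, 1) = 2$.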
 
Chi-squared divergence (“chisquared”)
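Several definitions go by this name. One common reading, given here purely as an assumption about what the package computes, is the Pearson chi-squared statistic of the $2 \times n$ table whose rows are the (optionally smoothed) vectors $\vec{x}$ and $\vec{y}$. The symbols $O$, $E$ and $N$ are introduced only for this sketch:

$$
\begin{aligned}
O &= \begin{bmatrix} x_1 & \dots & x_n \\ y_1 & \dots & y_n \end{bmatrix}, \qquad N = \sum \vec{x} + \sum \vec{y} \\
E_{kj} &= \frac{\left( \sum_{j'} O_{kj'} \right) \left( \sum_{k'} O_{k'j} \right)}{N} \\
\mathrm{dist} &= \sum_{k=1}^{2} \sum_{j=1}^{n} \frac{(O_{kj} - E_{kj})^2}{E_{kj}}
\end{aligned}
$$

The value is 0 when the two vectors are proportional to each other.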
 
Kullback–Leibler divergence (“kullback”)
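The divergence is defined on probability distributions, so this sketch assumes the vectors $\vec{x}$ and $\vec{y}$ are first normalised to proportions $\vec{p}$ and $\vec{q}$ (assumed symbols). The direction of the divergence and the base of the logarithm are also assumptions and may differ in the package:

$$
\vec{p} = \frac{\vec{x}}{\sum \vec{x}}, \qquad \vec{q} = \frac{\vec{y}}{\sum \vec{y}}, \qquad
\mathrm{dist} = \sum \vec{p} \log \frac{\vec{p}}{\vec{q}}
$$

The measure is not symmetric, and it is undefined where an element of $\vec{q}$ is zero while the matching element of $\vec{p}$ is not, which is one reason to smooth the vectors.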
 
Jeffreys divergence (“jeffreys”)
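The Jeffreys divergence is conventionally the symmetrised Kullback–Leibler divergence, that is, the sum of the divergences taken in both directions. The sketch assumes $\vec{p} = \vec{x} / \sum \vec{x}$ and $\vec{q} = \vec{y} / \sum \vec{y}$ are the vectors normalised to proportions; the log base and any scaling constant are assumptions:

$$
\mathrm{dist} = \sum \vec{p} \log \frac{\vec{p}}{\vec{q}} + \sum \vec{q} \log \frac{\vec{q}}{\vec{p}}
= \sum (\vec{p} - \vec{q}) \log \frac{\vec{p}}{\vec{q}}
$$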
 
Jensen-Shannon divergence (“jensen”)
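In its standard form (see Nielsen, 2019) the Jensen-Shannon divergence averages the Kullback–Leibler divergences of two distributions from their midpoint. The sketch assumes $\vec{p} = \vec{x} / \sum \vec{x}$ and $\vec{q} = \vec{y} / \sum \vec{y}$ are the vectors normalised to proportions, and $\vec{m}$ is an assumed symbol for the midpoint; the log base is not specified here:

$$
\vec{m} = \frac{\vec{p} + \vec{q}}{2}, \qquad
\mathrm{dist} = \frac{1}{2} \sum \vec{p} \log \frac{\vec{p}}{\vec{m}} + \frac{1}{2} \sum \vec{q} \log \frac{\vec{q}}{\vec{m}}
$$

Unlike the Kullback–Leibler divergence, this measure is symmetric and remains finite when one vector has zeros where the other does not.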
 
 
References
- Choi, S., Cha, S., & Tappert, C. C. (2010). A survey of binary
similarity and distance measures. Journal of Systemics, Cybernetics
and Informatics, 8(1), 43–48.
 
- Nielsen, F. (2019). On the Jensen–Shannon Symmetrization of
Distances Relying on Abstract Means. Entropy, 21(5), 485. https://doi.org/10.3390/e21050485
 
- Jain, G., Mahara, T., & Tripathi, K. N. (2020). A Survey of
Similarity Measures for Collaborative Filtering-Based Recommender
System. In M. Pant, T. K. Sharma, O. P. Verma, R. Singla, & A.
Sikander (Eds.), Soft Computing: Theories and Applications
(pp. 343–352). Springer. https://doi.org/10.1007/978-981-15-0751-9_32
 
- Miyamoto, S. (1990). Hierarchical Cluster Analysis and Fuzzy Sets.
In S. Miyamoto (Ed.), Fuzzy Sets in Information Retrieval and Cluster
Analysis (pp. 125–188). Springer Netherlands. https://doi.org/10.1007/978-94-015-7887-5_6