Posted to commits@lucene.apache.org by ju...@apache.org on 2021/10/20 16:50:34 UTC
[lucene] branch main updated: LUCENE-10146: Add note that dot product is preferred over cosine (#400)
This is an automated email from the ASF dual-hosted git repository.
julietibs pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/lucene.git
The following commit(s) were added to refs/heads/main by this push:
new 6bb2bbc LUCENE-10146: Add note that dot product is preferred over cosine (#400)
6bb2bbc is described below
commit 6bb2bbcd6ab2e07a646c17351437ea5210b08004
Author: Julie Tibshirani <ju...@elastic.co>
AuthorDate: Wed Oct 20 09:50:25 2021 -0700
LUCENE-10146: Add note that dot product is preferred over cosine (#400)
While VectorSimilarityFunction#COSINE is helpful when you need to preserve the
original vectors, it is significantly slower than DOT_PRODUCT. This commit adds
javadocs to COSINE explaining that dot product is the fastest option.
---
.../src/java/org/apache/lucene/index/VectorSimilarityFunction.java | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/lucene/core/src/java/org/apache/lucene/index/VectorSimilarityFunction.java b/lucene/core/src/java/org/apache/lucene/index/VectorSimilarityFunction.java
index a133bf2..a237e3d 100644
--- a/lucene/core/src/java/org/apache/lucene/index/VectorSimilarityFunction.java
+++ b/lucene/core/src/java/org/apache/lucene/index/VectorSimilarityFunction.java
@@ -56,7 +56,12 @@ public enum VectorSimilarityFunction {
}
},
- /** Cosine similarity */
+ /**
+ * Cosine similarity. NOTE: the preferred way to perform cosine similarity is to normalize all
+ * vectors to unit length, and instead use {@link VectorSimilarityFunction#DOT_PRODUCT}. You
+ * should only use this function if you need to preserve the original vectors and cannot normalize
+ * them in advance.
+ */
COSINE {
@Override
public float compare(float[] v1, float[] v2) {
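
The javadoc above recommends normalizing vectors to unit length and then using DOT_PRODUCT. The following is a minimal standalone sketch of why that gives the same score as cosine similarity on the original vectors. It does not use Lucene and is not the code from this commit; the class name NormalizedDotProductSketch and its helper methods are hypothetical and exist only for illustration.

```java
// Hypothetical illustration only: not part of Lucene and not the code in this commit.
class NormalizedDotProductSketch {

  /** Returns a unit-length copy of v. Assumes v is not the zero vector. */
  static float[] normalize(float[] v) {
    double sumSquares = 0;
    for (float x : v) {
      sumSquares += (double) x * x;
    }
    if (sumSquares == 0) {
      throw new IllegalArgumentException("cannot normalize a zero vector");
    }
    float invNorm = (float) (1.0 / Math.sqrt(sumSquares));
    float[] out = new float[v.length];
    for (int i = 0; i < v.length; i++) {
      out[i] = v[i] * invNorm;
    }
    return out;
  }

  /** Plain dot product: one multiply-add per dimension. */
  static float dotProduct(float[] a, float[] b) {
    float sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Cosine similarity: also accumulates both squared norms on every call. */
  static float cosine(float[] a, float[] b) {
    float dot = 0;
    float normA = 0;
    float normB = 0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    return (float) (dot / Math.sqrt((double) normA * normB));
  }

  public static void main(String[] args) {
    float[] doc = {3f, 4f};
    float[] query = {4f, 3f};
    // Normalize once up front (for example, before indexing), then compare with dot product.
    float viaDot = dotProduct(normalize(doc), normalize(query));
    float viaCosine = cosine(doc, query);
    System.out.println("dot product of normalized vectors: " + viaDot);
    System.out.println("cosine of original vectors:        " + viaCosine);
  }
}
```

In this sketch the normalization cost is paid once per vector, while cosine recomputes both norms on every comparison, which is consistent with the commit message's point that DOT_PRODUCT is the faster choice when vectors can be normalized in advance.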