Posted to issues@lucene.apache.org by "Julie Tibshirani (Jira)" <ji...@apache.org> on 2021/10/04 22:07:00 UTC

[jira] [Created] (LUCENE-10147) KnnVectorQuery can produce negative scores

Julie Tibshirani created LUCENE-10147:
-----------------------------------------

             Summary: KnnVectorQuery can produce negative scores
                 Key: LUCENE-10147
                 URL: https://issues.apache.org/jira/browse/LUCENE-10147
             Project: Lucene - Core
          Issue Type: Bug
            Reporter: Julie Tibshirani


The cosine similarity of two vectors falls in the range [-1, 1], so with cosine similarity, {{KnnVectorQuery}} can currently produce negative scores. Maybe we should just adjust the scores in this case by adding 1, shifting them to the range [0, 2].
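
To illustrate the proposed adjustment, here is a minimal standalone sketch in plain Java. It does not use any Lucene classes; the class name {{CosineScoreSketch}} and the methods {{cosine}} and {{shiftedScore}} are hypothetical names chosen for this example, not the actual implementation.

```java
public final class CosineScoreSketch {

    /**
     * Plain cosine similarity. For non-zero vectors the result lies in [-1, 1].
     * A zero vector yields NaN here; this sketch does not handle that case.
     */
    public static float cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector dimensions differ");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (float) (dot / Math.sqrt(normA * normB));
    }

    /** Hypothetical adjusted score: add 1 so the result lies in [0, 2]. */
    public static float shiftedScore(float[] a, float[] b) {
        return 1f + cosine(a, b);
    }

    public static void main(String[] args) {
        float[] query = {1f, 0f};
        float[] opposite = {-1f, 0f};
        System.out.println(cosine(query, opposite));       // prints -1.0
        System.out.println(shiftedScore(query, opposite)); // prints 0.0
    }
}
```

Adding a constant keeps the ranking of documents unchanged, since every score moves by the same amount.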

As a side note, this made me notice that {{VectorSimilarityFunction.DOT_PRODUCT}} is really quite "expert"! Users need to know to normalize all document and query vectors to unit length when using this similarity. Otherwise the output is unbounded and difficult to handle in scoring. Also, dot product is not a true metric (for example, it doesn't obey the triangle inequality), so many ANN algorithms have trouble supporting it. As part of this issue, we could improve the documentation on {{VectorSimilarityFunction.DOT_PRODUCT}} to clarify that normalization is required.
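
As a rough illustration of why normalization matters, the following plain-Java sketch compares the dot product of raw vectors with the dot product of the same vectors scaled to unit length. The class {{UnitNormSketch}} and its methods are hypothetical helpers written for this example and are not Lucene APIs.

```java
public final class UnitNormSketch {

    /** Returns a copy of v scaled to unit (L2) length. Rejects the zero vector. */
    public static float[] normalize(float[] v) {
        double sumSquares = 0;
        for (float x : v) {
            sumSquares += x * x;
        }
        if (sumSquares == 0) {
            throw new IllegalArgumentException("cannot normalize a zero vector");
        }
        float inverseNorm = (float) (1.0 / Math.sqrt(sumSquares));
        float[] out = new float[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] * inverseNorm;
        }
        return out;
    }

    /** Plain dot product; unbounded unless the inputs are normalized. */
    public static float dot(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector dimensions differ");
        }
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return (float) sum;
    }

    public static void main(String[] args) {
        float[] doc = {3f, 4f};
        float[] query = {6f, 8f};
        // Raw dot product grows with vector magnitude.
        System.out.println(dot(doc, query)); // prints 50.0
        // After normalization the dot product equals the cosine, here close to 1.
        System.out.println(dot(normalize(doc), normalize(query)));
    }
}
```

With unit-length inputs the dot product equals the cosine similarity, so it is bounded to [-1, 1] up to floating-point rounding.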


