Posted to issues@lucene.apache.org by "Julie Tibshirani (Jira)" <ji...@apache.org> on 2021/10/04 22:07:00 UTC
[jira] [Created] (LUCENE-10147) KnnVectorQuery can produce negative scores
Julie Tibshirani created LUCENE-10147:
-----------------------------------------
Summary: KnnVectorQuery can produce negative scores
Key: LUCENE-10147
URL: https://issues.apache.org/jira/browse/LUCENE-10147
Project: Lucene - Core
Issue Type: Bug
Reporter: Julie Tibshirani
The cosine similarity of two vectors falls in the range [-1, 1], so when {{KnnVectorQuery}} uses cosine similarity it can currently produce negative scores. Maybe we should just adjust the scores in this case by adding 1, shifting them into the range [0, 2].
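A minimal Java sketch of the proposed adjustment, working on plain float[] arrays rather than Lucene's internals. The class CosineScoreShift and its methods are hypothetical names used only for illustration; they are not part of Lucene's API.

```java
/**
 * Hypothetical sketch (not Lucene code): shift cosine similarity from
 * [-1, 1] into [0, 2] by adding 1, so that no score is negative.
 */
final class CosineScoreShift {

    /** Cosine similarity dot(a, b) / (|a| * |b|), which lies in [-1, 1]. */
    static float cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector dimensions differ");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        // Cosine is undefined for a zero-length vector, so reject it.
        if (normA == 0 || normB == 0) {
            throw new IllegalArgumentException("cosine is undefined for a zero vector");
        }
        return (float) (dot / (Math.sqrt(normA) * Math.sqrt(normB)));
    }

    /** Hypothetical scoring helper: cosine + 1, which lies in [0, 2]. */
    static float shiftedCosineScore(float[] query, float[] doc) {
        return 1f + cosine(query, doc);
    }

    public static void main(String[] args) {
        float[] query = {1f, 0f};
        float[] opposite = {-1f, 0f};   // opposite direction: cosine = -1
        float[] orthogonal = {0f, 1f};  // right angle: cosine = 0
        float[] same = {2f, 0f};        // same direction, different length: cosine = 1

        System.out.println(cosine(query, opposite));               // prints -1.0 (negative raw score)
        System.out.println(shiftedCosineScore(query, opposite));   // prints 0.0
        System.out.println(shiftedCosineScore(query, orthogonal)); // prints 1.0
        System.out.println(shiftedCosineScore(query, same));       // prints 2.0
    }
}
```

Floating-point rounding can push a computed cosine very slightly outside [-1, 1], so a real implementation might also clamp the shifted value at 0.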
As a side note, this made me notice that {{VectorSimilarityFunction.DOT_PRODUCT}} is really quite "expert"! Users need to know to normalize all document and query vectors to unit length when using this similarity; otherwise its output is unbounded and difficult to turn into a score. Dot product is also not a true metric (for example, it doesn't obey the triangle inequality), so many ANN algorithms have trouble supporting it. As part of this issue, we could improve the documentation on {{VectorSimilarityFunction.DOT_PRODUCT}} to clarify that normalization is required.
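To illustrate the normalization requirement, here is a minimal Java sketch on plain float[] arrays. The class UnitNormalization and its methods are hypothetical, not Lucene API. Scaling each vector to unit L2 length makes the dot product equal to the cosine similarity, which keeps it within [-1, 1]; without normalization the dot product grows with vector magnitude.

```java
/**
 * Hypothetical sketch (not Lucene code): normalize vectors to unit length
 * before comparing them with a plain dot product.
 */
final class UnitNormalization {

    /** Plain dot product; unbounded unless both inputs have unit length. */
    static float dot(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector dimensions differ");
        }
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return (float) sum;
    }

    /** Returns a copy of v scaled to unit L2 length; rejects the zero vector. */
    static float[] normalize(float[] v) {
        double sumSquares = 0;
        for (float x : v) {
            sumSquares += (double) x * x;
        }
        if (sumSquares == 0) {
            throw new IllegalArgumentException("cannot normalize a zero vector");
        }
        double norm = Math.sqrt(sumSquares);
        float[] out = new float[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = (float) (v[i] / norm);
        }
        return out;
    }

    public static void main(String[] args) {
        float[] query = {3f, 4f};
        float[] doc = {300f, 400f}; // same direction as query, 100x the length

        // Raw dot product scales with vector length: 3*300 + 4*400 = 2500.
        System.out.println(dot(query, doc)); // prints 2500.0
        // After normalization both vectors are about (0.6, 0.8),
        // so the dot product is approximately 1, the same as their cosine.
        System.out.println(dot(normalize(query), normalize(doc)));
    }
}
```

Because both the stored document vectors and the query vector are normalized, ranking by this dot product gives the same order as ranking by cosine similarity.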
--
This message was sent by Atlassian Jira
(v8.3.4#803005)