Posted to issues@lucene.apache.org by "Alessandro Benedetti (Jira)" <ji...@apache.org> on 2022/04/20 14:36:00 UTC
[jira] [Updated] (LUCENE-10146) Add VectorSimilarityFunction.COSINE
[ https://issues.apache.org/jira/browse/LUCENE-10146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alessandro Benedetti updated LUCENE-10146:
------------------------------------------
Labels: vector-based-search (was: )
> Add VectorSimilarityFunction.COSINE
> -----------------------------------
>
> Key: LUCENE-10146
> URL: https://issues.apache.org/jira/browse/LUCENE-10146
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Julie Tibshirani
> Priority: Major
> Labels: vector-based-search
> Fix For: 9.0
>
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> To perform ANN search with cosine similarity, users are expected to normalize the document and query vectors to unit length, then use {{VectorSimilarityFunction.DOT_PRODUCT}}. I think it would be good to also support cosine similarity directly through {{VectorSimilarityFunction.COSINE}}. This would let users run ANN search based on cosine similarity while retaining access to the original, unnormalized vectors through {{VectorValues}}. That way they can use the original vectors in a reranking step or return them to the application for further processing.
> It looks like nmslib and hnswlib support cosine similarity. On the other hand, FAISS only supports dot product and suggests that users normalize their vectors to get cosine similarity (https://github.com/facebookresearch/faiss/issues/95). To me, adding this one additional similarity is worth it in terms of what it lets users accomplish.
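For readers unfamiliar with the workaround described above, here is a minimal sketch of why it works: the cosine similarity of two vectors equals the dot product of their unit-length copies. It is plain Java and does not use any Lucene classes; the class name CosineSketch and its helper methods are hypothetical and exist only for illustration. It also shows the drawback the issue points at: normalizing discards each vector's original magnitude, so a caller who only stores normalized vectors cannot recover the originals.

```java
// Illustrative sketch only: plain Java, no Lucene classes involved.
// The class and method names are hypothetical.
final class CosineSketch {

    /** Dot product of two equal-length vectors, accumulated in double. */
    static float dot(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return (float) sum;
    }

    /** Euclidean length of a vector. */
    static float norm(float[] v) {
        return (float) Math.sqrt(dot(v, v));
    }

    /** Returns a unit-length copy; the input array is left unchanged. */
    static float[] normalize(float[] v) {
        float n = norm(v);
        if (n == 0f) {
            throw new IllegalArgumentException("cannot normalize a zero vector");
        }
        float[] out = new float[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] / n;
        }
        return out;
    }

    /** Cosine similarity computed directly from the original vectors. */
    static float cosine(float[] a, float[] b) {
        float denom = norm(a) * norm(b);
        if (denom == 0f) {
            throw new IllegalArgumentException("cosine is undefined for a zero vector");
        }
        return dot(a, b) / denom;
    }

    public static void main(String[] args) {
        float[] doc = {3f, 4f};
        float[] query = {4f, 3f};

        // Direct cosine on the original vectors.
        System.out.println("cosine(doc, query)        = " + cosine(doc, query));

        // The workaround: normalize both vectors, then take the dot product.
        float[] docUnit = normalize(doc);
        float[] queryUnit = normalize(query);
        System.out.println("dot(normalized vectors)   = " + dot(docUnit, queryUnit));

        // The normalized copy no longer carries the original magnitude.
        System.out.println("norm(doc) vs norm(docUnit) = " + norm(doc) + " vs " + norm(docUnit));
    }
}
```

With a dedicated cosine function, the index can keep the vectors as supplied and apply the division by the norms at scoring time, which is the behaviour the issue asks for.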
--
This message was sent by Atlassian Jira
(v8.20.7#820007)