Posted to issues@lucene.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/10/11 15:53:00 UTC

[jira] [Commented] (LUCENE-10146) Add VectorSimilarityFunction.COSINE

    [ https://issues.apache.org/jira/browse/LUCENE-10146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17427216#comment-17427216 ] 

ASF subversion and git services commented on LUCENE-10146:
----------------------------------------------------------

Commit f4861159c3cc3decd50c8e6b37f24992f03a8d18 in lucene's branch refs/heads/main from Julie Tibshirani
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=f486115 ]

LUCENE-10146: Add VectorSimilarityFunction.COSINE (#366)

This PR adds support for using cosine similarity with kNN vector fields.

It takes a simple approach and doesn't attempt optimizations like normalizing
the query vector in advance, or performing loop unrolling. The thinking is that
users who prioritize efficiency can normalize all vectors in advance and use
`VectorSimilarityFunction.DOT_PRODUCT`.
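
The trade-off above can be illustrated with a rough plain-Java sketch. It is not Lucene's implementation, and the names `CosineSketch`, `cosine`, `normalize` and `dotProduct` are hypothetical. `cosine` follows the "simple approach" and recomputes both norms on every call. `normalize` followed by `dotProduct` shows the cheaper path: vectors are normalized ahead of time, after which a plain dot product gives the same value.

```java
// Hypothetical sketch; not Lucene's source code.
public class CosineSketch {

    // Cosine similarity computed from the raw vectors on every call: one pass
    // accumulates the dot product and both squared norms. No query
    // pre-normalization and no loop unrolling.
    // Assumes neither vector is all zeros (that would produce NaN).
    static float cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                "vector dimensions differ: " + a.length + " != " + b.length);
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return (float) (dot / Math.sqrt(normA * normB));
    }

    // Efficiency-minded alternative, step 1: scale a vector to unit length once
    // (e.g. before indexing). Assumes v is not all zeros.
    static float[] normalize(float[] v) {
        double sumSq = 0;
        for (float x : v) {
            sumSq += (double) x * x;
        }
        double norm = Math.sqrt(sumSq);
        float[] unit = new float[v.length];
        for (int i = 0; i < v.length; i++) {
            unit[i] = (float) (v[i] / norm);
        }
        return unit;
    }

    // Step 2: for unit-length vectors the dot product equals the cosine.
    static float dotProduct(float[] a, float[] b) {
        double dot = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
        }
        return (float) dot;
    }

    public static void main(String[] args) {
        float[] doc = {3f, 4f, 0f};   // norm 5
        float[] query = {1f, 2f, 2f}; // norm 3
        System.out.println("cosine:         " + cosine(doc, query));
        System.out.println("normalized dot: " + dotProduct(normalize(doc), normalize(query)));
    }
}
```

Both printed values should agree up to float rounding (mathematically 11/15, about 0.733, for these inputs). The second path only pays the normalization cost once per vector, which is why it suits users who prioritize efficiency.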

> Add VectorSimilarityFunction.COSINE
> -----------------------------------
>
>                 Key: LUCENE-10146
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10146
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Julie Tibshirani
>            Priority: Major
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> To perform ANN search with cosine similarity, users are expected to normalize the document and query vectors to unit length, then use {{VectorSimilarityFunction.DOT_PRODUCT}}. I think it would be good to also support cosine similarity directly through {{VectorSimilarityFunction.COSINE}}. This would allow users to perform ANN based on cosine similarity, while retaining access to the original vectors through {{VectorValues}}. That way they can use the original vectors in a reranking step or return them to the application for further processing.
> It looks like nmslib and hnswlib support cosine similarity. On the other hand, FAISS only supports dot product and suggests users normalize the vectors to perform cosine similarity (https://github.com/facebookresearch/faiss/issues/95). To me, adding this one additional similarity is worth it for what it lets users accomplish.
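
The trade-off in the issue description can be sketched in plain Java. The names (`CosineVsNormalizedDot`, `rank`, `unit`) are hypothetical, no Lucene classes are used, and brute-force scoring stands in for an ANN index. Ranking the original vectors by cosine and ranking unit-length copies by dot product should produce the same order. Only the first option, though, keeps the original vectors (including their magnitudes) available for reranking or for returning to the application.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch; brute-force scoring, not an ANN index, and not Lucene code.
public class CosineVsNormalizedDot {

    static double dot(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    static double norm(float[] v) {
        return Math.sqrt(dot(v, v));
    }

    // Unit-length copy of v; assumes v is not all zeros.
    static float[] unit(float[] v) {
        double n = norm(v);
        float[] out = new float[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = (float) (v[i] / n);
        }
        return out;
    }

    static double cosine(float[] a, float[] b) {
        return dot(a, b) / (norm(a) * norm(b));
    }

    // Brute-force stand-in for kNN: every document index, best score first.
    static Integer[] rank(float[][] docs, ToDoubleFunction<float[]> score) {
        Integer[] ids = new Integer[docs.length];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i;
        }
        Arrays.sort(ids, Comparator.comparingDouble((Integer i) -> score.applyAsDouble(docs[i])).reversed());
        return ids;
    }

    public static void main(String[] args) {
        float[][] originals = {{3f, 4f}, {10f, 1f}, {-1f, 2f}};
        float[] query = {1f, 1f};

        // COSINE-style: score the original vectors directly; they stay intact.
        Integer[] byCosine = rank(originals, d -> cosine(d, query));

        // Normalize + DOT_PRODUCT-style: score unit-length copies instead.
        float[][] normalized = new float[originals.length][];
        for (int i = 0; i < originals.length; i++) {
            normalized[i] = unit(originals[i]);
        }
        float[] unitQuery = unit(query);
        Integer[] byDot = rank(normalized, d -> dot(d, unitQuery));

        System.out.println("by cosine:         " + Arrays.toString(byCosine));
        System.out.println("by normalized dot: " + Arrays.toString(byDot));
        // The first document's magnitude (5 for {3, 4}) survives only in the original.
        System.out.println("norms: " + norm(originals[0]) + " vs " + norm(normalized[0]));
    }
}
```

Both rankings should come out identical. The difference lies in what remains available afterward: with normalized copies alone, the original magnitudes cannot be recovered unless the raw vectors are stored separately.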



--
This message was sent by Atlassian Jira
(v8.3.4#803005)