Posted to issues@lucene.apache.org by "Julie Tibshirani (Jira)" <ji...@apache.org> on 2021/10/19 19:33:00 UTC

[jira] [Created] (LUCENE-10191) Optimize vector functions by precomputing magnitudes

Julie Tibshirani created LUCENE-10191:
-----------------------------------------

             Summary: Optimize vector functions by precomputing magnitudes
                 Key: LUCENE-10191
                 URL: https://issues.apache.org/jira/browse/LUCENE-10191
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Julie Tibshirani


Both Euclidean distance (L2 distance) and cosine similarity can be expressed in terms of dot product and vector magnitudes:
 * l2_distance(a, b) = ||a - b|| = sqrt(||a||^2 - 2(a . b) + ||b||^2)
 * cosine(a, b) = (a . b) / (||a|| ||b||)

We could compute and store each vector's magnitude upfront while indexing, and compute the query vector's magnitude once per query. Then we'd calculate the distance using our (very optimized) dot product method, plus the precomputed values.
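
To make the idea concrete, here is a minimal sketch in plain Java. It is not Lucene code: the class and method names are hypothetical, the scalar dotProduct stands in for the optimized dot product method, and it assumes we store the squared magnitude ||v||^2 per vector rather than the magnitude itself.

```java
// Hypothetical helper for illustration only; not part of Lucene's API.
final class PrecomputedMagnitudes {

  // Plain scalar dot product, standing in for an optimized implementation.
  static float dotProduct(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  // Squared magnitude ||v||^2. Computed once per document vector at index
  // time and once per query vector at search time.
  static float squaredMagnitude(float[] v) {
    return dotProduct(v, v);
  }

  // ||a - b|| = sqrt(||a||^2 - 2(a . b) + ||b||^2)
  static float l2Distance(float[] a, float aSqMag, float[] b, float bSqMag) {
    float squared = aSqMag - 2f * dotProduct(a, b) + bSqMag;
    // Rounding can push the result slightly below zero for near-identical
    // vectors, so clamp before taking the square root.
    return (float) Math.sqrt(Math.max(0f, squared));
  }

  // cosine(a, b) = (a . b) / (||a|| ||b||); undefined for zero vectors.
  static float cosine(float[] a, float aSqMag, float[] b, float bSqMag) {
    return (float) (dotProduct(a, b) / Math.sqrt((double) aSqMag * bSqMag));
  }
}
```

With this shape, each comparison costs one dot product plus a few scalar operations. One caveat with the expanded L2 form is that it can lose precision when the two vectors are very close to each other, which is something the testing would need to check.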

This is an exploratory issue: I haven't tested this out yet, so I'm not sure how much it would help. I would at least expect it to help with cosine similarity – several months ago we tried out similar ideas in Elasticsearch and were able to get a nice boost in cosine performance.


