Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/08/10 13:29:19 UTC

[GitHub] [lucene] jtibshirani edited a comment on pull request #235: LUCENE-9614: add KnnVectorQuery implementation

jtibshirani edited a comment on pull request #235:
URL: https://github.com/apache/lucene/pull/235#issuecomment-896029536


   > I noticed that scores for the default similarity (Euclidean) had very low precision as they got large... The way we were handling this was to apply an `exp(-distance)` to convert distances to scores.
   
   I wonder if we could just swap in `f(x) = 1 / (1 + x)`, which decays a lot more slowly than `exp(-x)`. This maintains the nice property of producing scores within [0, 1].
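
As a rough sketch of the difference between the two mappings (plain Java, not Lucene code; the class and method names are made up for illustration, and it assumes scores are held as 32-bit floats):

```java
final class ScoreDecay {
    /** Exponential mapping from the quoted comment: score = exp(-distance). */
    static float exponential(float distance) {
        return (float) Math.exp(-distance);
    }

    /** Reciprocal mapping suggested above: score = 1 / (1 + distance). */
    static float reciprocal(float distance) {
        return 1f / (1f + distance);
    }

    public static void main(String[] args) {
        // Illustrative distances only; not taken from any benchmark.
        float[] distances = {0f, 1f, 10f, 200f, 300f};
        for (float d : distances) {
            System.out.printf("d=%.1f exp(-d)=%e 1/(1+d)=%e%n",
                    d, exponential(d), reciprocal(d));
        }
    }
}
```

Both mappings give 1 at distance 0. At distances of 200 and 300, `exp(-x)` underflows to 0 in a float, so the two hits become indistinguishable, while `1 / (1 + x)` still ranks the closer one higher.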
   
   > There's a clever implementation (hack?!) to deal with trying to minimize over-collection across multiple segments. Basically the idea is to optimistically collect the expected proportion of top K based on the segment size (plus a margin)...
   
   This is a nice idea! The binomial estimate assumes that the nearest vectors are distributed randomly across the index. But since segment membership depends on when a document was indexed, I wonder if it'll be common for most nearest neighbors to be found in one segment. For example, maybe we are indexing (and embedding) news articles as they're written, and our query is a news event. Would it make sense to start with a simple approach where we just collect `k` from each segment, and then explore optimizations in a follow-up with benchmarks?
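
To make the simple approach concrete, here is a minimal sketch in plain Java. It does not use any Lucene classes; `Hit` and `mergeTopK` are hypothetical names, and it assumes each segment has already returned up to `k` hits of its own:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class PerSegmentTopK {
    /** Hypothetical hit: a document id plus its similarity score. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    private static final Comparator<Hit> BY_SCORE =
            Comparator.comparingDouble(h -> h.score);

    /** Merges per-segment results, keeping the k highest-scoring hits overall. */
    static List<Hit> mergeTopK(List<List<Hit>> perSegment, int k) {
        // Min-heap on score: the weakest retained hit sits at the head.
        PriorityQueue<Hit> heap = new PriorityQueue<>(BY_SCORE);
        for (List<Hit> segmentHits : perSegment) {
            for (Hit hit : segmentHits) {
                heap.offer(hit);
                if (heap.size() > k) {
                    heap.poll(); // evict the current weakest hit
                }
            }
        }
        List<Hit> merged = new ArrayList<>(heap);
        merged.sort(BY_SCORE.reversed()); // best score first
        return merged;
    }
}
```

Because every segment contributes up to `k` candidates, the merged result is still correct when all of the best hits happen to come from a single segment, at the cost of collecting more candidates than a proportional split would.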


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


