Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/11/17 16:53:23 UTC

[GitHub] [lucene] agorlenko commented on pull request #11946: add similarity threshold for hnsw

agorlenko commented on PR #11946:
URL: https://github.com/apache/lucene/pull/11946#issuecomment-1318924833

   > how common is this use-case? This change is fairly invasive... adding method signatures to e.g. LeafReader.
   
   It is difficult for me to judge in general, but I face such tasks quite often. Here is the start of the discussion about this functionality: https://lists.apache.org/list?dev@lucene.apache.org:lte=1M:HNSW%20search%20with%20threshold.
   
   The typical case: suppose we have a recommendation system with a huge collection of items, and we want to recommend items that suit a given user. Ranking models that deliver high quality can be complex and resource-intensive, so we build several layers of models. The most complex ranking model is the last layer; each earlier layer is simpler than the one after it and selects candidates for it. If we have good item embeddings, we can build the first layer as follows: compute the similarity between the user's embedding and each item's embedding, and compare it with a threshold. If the similarity exceeds the threshold, the item becomes a candidate for the next layer. This approach can be very effective in practice. The problem is cost: we have to compute the cosine similarity between the user's embedding and the embeddings of all items.
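   To make the cost concrete, here is a minimal sketch of the brute-force first layer described above. It assumes embeddings are plain `float[]` arrays held in memory; the class and method names (`ThresholdCandidates`, `selectCandidates`) are hypothetical and this is not a Lucene API. Every item has to be scored, so each query costs O(n·d) for n items of dimension d. That linear scan is what a threshold-bounded HNSW search would aim to avoid.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   /** Hypothetical brute-force first-stage candidate selection (not a Lucene API). */
   public final class ThresholdCandidates {

       /** Cosine similarity of two equal-length vectors; returns 0 if either has zero norm. */
       public static float cosine(float[] a, float[] b) {
           if (a.length != b.length) {
               throw new IllegalArgumentException("dimension mismatch");
           }
           double dot = 0, normA = 0, normB = 0;
           for (int i = 0; i < a.length; i++) {
               dot += (double) a[i] * b[i];
               normA += (double) a[i] * a[i];
               normB += (double) b[i] * b[i];
           }
           if (normA == 0 || normB == 0) {
               return 0f;
           }
           return (float) (dot / Math.sqrt(normA * normB));
       }

       /** Returns indices of items whose cosine similarity to the user strictly exceeds the threshold. */
       public static List<Integer> selectCandidates(float[] user, float[][] items, float threshold) {
           List<Integer> candidates = new ArrayList<>();
           // Every item is scored: this full scan is the cost the comment refers to.
           for (int i = 0; i < items.length; i++) {
               if (cosine(user, items[i]) > threshold) {
                   candidates.add(i);
               }
           }
           return candidates;
       }

       public static void main(String[] args) {
           float[] user = {1f, 0f};
           float[][] items = {{1f, 0f}, {0f, 1f}, {0.6f, 0.8f}};
           // Similarities are 1.0, 0.0 and 0.6, so items 0 and 2 pass a 0.5 threshold.
           System.out.println(selectCandidates(user, items, 0.5f)); // prints [0, 2]
       }
   }
   ```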
   
   I think the proposed functionality would help with this kind of task.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

