Posted to issues@lucene.apache.org by "joshdevins (via GitHub)" <gi...@apache.org> on 2023/05/30 10:26:40 UTC

[GitHub] [lucene] joshdevins commented on pull request #12314: Multi-value support for KnnVectorField

joshdevins commented on PR #12314:
URL: https://github.com/apache/lucene/pull/12314#issuecomment-1568187706

   > SUM = the similarity score between the query and each vector is computed, all scores are summed to get the final score
   > SUM = every time we find a nearest neighbor vector to be added to the topK, if the document is already there, its score is updated summing the old and new score
   
   Just a note on the aggregation functions `max` and `sum`. Most commonly, `max` is used because it is length independent. With `sum`, the longer the original text of a document field, and thus the more passages it has, the higher the `sum` over all matching passages will be, since every passage will "match" to some degree. I'm not sure whether this will matter in the end, but my suggestion would be: if `sum` is used, optionally apply a radius/similarity threshold to limit the advantage of longer texts, and/or restrict the `sum` to just the top-k passages of a document.
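   To make the length-bias concrete, here is a minimal sketch of the three aggregation strategies mentioned above. This is purely illustrative: the class and method names are hypothetical and do not come from the Lucene PR.

   ```java
   import java.util.Arrays;

   // Hypothetical sketch of passage-score aggregation strategies; names are
   // illustrative only and not part of the Lucene PR being discussed.
   public class PassageAggregation {

       // MAX: length independent, the single best-matching passage decides.
       static double maxScore(double[] passageScores) {
           return Arrays.stream(passageScores).max().orElse(0.0);
       }

       // Plain SUM: longer documents (more passages) accumulate more score.
       static double sumScore(double[] passageScores) {
           return Arrays.stream(passageScores).sum();
       }

       // SUM limited by a similarity threshold and a top-k passage cap,
       // as suggested above, to reduce the advantage of long texts.
       static double limitedSumScore(double[] passageScores, double threshold, int topK) {
           return Arrays.stream(passageScores)
                   .filter(s -> s >= threshold)            // radius/similarity threshold
                   .boxed()
                   .sorted((a, b) -> Double.compare(b, a)) // best passages first
                   .limit(topK)                            // keep only top-k passages
                   .mapToDouble(Double::doubleValue)
                   .sum();
       }

       public static void main(String[] args) {
           double[] longDoc  = {0.9, 0.4, 0.35, 0.3, 0.3}; // many weakly matching passages
           double[] shortDoc = {0.85, 0.8};                 // few strongly matching passages

           // Plain sum favors the long document (2.25 vs 1.65) even though its
           // extra passages are weak matches; the thresholded top-k sum does not.
           System.out.printf("max:  long=%.2f short=%.2f%n", maxScore(longDoc), maxScore(shortDoc));
           System.out.printf("sum:  long=%.2f short=%.2f%n", sumScore(longDoc), sumScore(shortDoc));
           System.out.printf("lsum: long=%.2f short=%.2f%n",
                   limitedSumScore(longDoc, 0.5, 2), limitedSumScore(shortDoc, 0.5, 2));
       }
   }
   ```

   With a threshold of 0.5, the long document's weak passages are filtered out and the short document with two strong passages outranks it, which is the behavior the suggestion is after.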
   
   @alessandrobenedetti Do you have any good references/papers on approaches to re-aggregating passages into documents for SERPs? It seems the art was abandoned a couple of years ago, with most approaches settling on `max` passage.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


For additional commands, e-mail: issues-help@lucene.apache.org