Posted to issues@lucene.apache.org by "Michael Sokolov (Jira)" <ji...@apache.org> on 2022/05/04 21:24:00 UTC

[jira] [Created] (LUCENE-10559) Add preFilter/postFilter options to KnnGraphTester

Michael Sokolov created LUCENE-10559:
----------------------------------------

             Summary: Add preFilter/postFilter options to KnnGraphTester
                 Key: LUCENE-10559
                 URL: https://issues.apache.org/jira/browse/LUCENE-10559
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Michael Sokolov


We want to be able to test the efficacy of pre-filtering in KnnVectorQuery: if you (say) want the top K nearest neighbors subject to a constraint Q, are you better off over-selecting (say 2K) top hits and *then* filtering (post-filtering), or incorporating the filtering into the query (pre-filtering)? How does the answer depend on the selectivity of the filter?
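
To make the two strategies concrete, here is a minimal sketch that uses exact brute-force scoring over an array of per-document scores instead of an HNSW graph search. The class and method names (FilterStrategySketch, topK, postFilter, preFilter) are hypothetical and are not part of KnnGraphTester or KnnVectorQuery. It only illustrates that post-filtering an over-selected result set can return fewer than K hits when the filter is selective.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class FilterStrategySketch {

  /** Exact top-k doc ids by descending score among docs accepted by {@code candidates}. */
  public static List<Integer> topK(double[] scores, IntPredicate candidates, int k) {
    return IntStream.range(0, scores.length)
        .filter(candidates)
        .boxed()
        .sorted(Comparator.<Integer>comparingDouble(doc -> scores[doc]).reversed())
        .limit(k)
        .collect(Collectors.toList());
  }

  /** Post-filtering: over-select the best hits without the filter, then drop rejected docs. */
  public static List<Integer> postFilter(
      double[] scores, IntPredicate accept, int k, int overSelect) {
    return topK(scores, doc -> true, overSelect).stream()
        .filter(doc -> accept.test(doc))
        .limit(k)
        .collect(Collectors.toList());
  }

  /** Pre-filtering: only accepted docs are eligible for the top k. */
  public static List<Integer> preFilter(double[] scores, IntPredicate accept, int k) {
    return topK(scores, accept, k);
  }

  public static void main(String[] args) {
    // Hypothetical similarity scores for docs 0..5 (higher is closer to the query).
    double[] scores = {0.9, 0.8, 0.7, 0.6, 0.5, 0.4};
    // Hypothetical constraint Q that accepts docs 1, 4 and 5 only.
    IntPredicate accept = doc -> doc == 1 || doc == 4 || doc == 5;
    int k = 2;

    System.out.println("post-filter (2K): " + postFilter(scores, accept, k, 2 * k));
    System.out.println("pre-filter:       " + preFilter(scores, accept, k));
  }
}
```

In this toy data the over-selected set of 2K = 4 hits contains only one accepted document, so post-filtering comes up short, while pre-filtering still fills all K slots. A real comparison would also need to account for the cost and recall of the approximate graph search, which this sketch does not model.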

I think we can get a reasonable testbed by generating a uniform random filter with a chosen selectivity, in a way that is consistent and repeatable. Possibly we'd also want to try filters that are correlated with index order, but such filters seem unlikely to be correlated with vector values in a way that the graph structure would notice, so random is a pretty good starting point for this.
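
One way such a filter could be generated is sketched below. It assumes a fixed seed is an acceptable way to get repeatability, and it uses java.util.BitSet to stay self-contained; the class name RandomFilterSketch, the method randomFilter and its parameters are hypothetical and not taken from KnnGraphTester.

```java
import java.util.BitSet;
import java.util.Random;

public class RandomFilterSketch {

  /**
   * Builds a filter over doc ids [0, numDocs) in which each doc is accepted independently with
   * probability {@code selectivity}. The same seed always yields the same filter.
   */
  public static BitSet randomFilter(int numDocs, double selectivity, long seed) {
    if (selectivity < 0.0 || selectivity > 1.0) {
      throw new IllegalArgumentException("selectivity must be in [0, 1], got " + selectivity);
    }
    Random random = new Random(seed); // fixed seed => repeatable filter
    BitSet accepted = new BitSet(numDocs);
    for (int doc = 0; doc < numDocs; doc++) {
      if (random.nextDouble() < selectivity) {
        accepted.set(doc);
      }
    }
    return accepted;
  }

  public static void main(String[] args) {
    int numDocs = 100_000;
    BitSet filter = randomFilter(numDocs, 0.1, 42L);
    System.out.printf("accepted %d of %d docs%n", filter.cardinality(), numDocs);
  }
}
```

Because each document is sampled independently, the actual fraction of accepted documents is only approximately equal to the requested selectivity. If an exact count matters for the benchmark, shuffling the doc ids and taking a fixed-size prefix would be an alternative.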



--
This message was sent by Atlassian Jira
(v8.20.7#820007)