You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/02/08 20:43:20 UTC
[GitHub] [lucene] jtibshirani edited a comment on pull request #656: LUCENE-10382: Support filtering in KnnVectorQuery
jtibshirani edited a comment on pull request #656:
URL: https://github.com/apache/lucene/pull/656#issuecomment-1032109021
I tried out the around stopping the HNSW search early if it visits too many docs. To test, I modified `KnnGraphTester` to create `acceptDocs` uniformly at random with a certain selectivity, then measured recall and QPS. Here are the results on glove-100-angular (~1.2 million docs) with a filter selectivity 0.01:
**Baseline**
```
k Recall VisitedDocs QPS
10 0.774 15957 232.083
50 0.930 63429 58.994
80 0.958 95175 42.470
100 0.967 118891 35.203
500 0.997 1176237 8.136
800 0.999 1183514 5.571
```
**PR**
```
k Recall VisitedDocs QPS
10 1.000 22908 190.286
50 1.000 23607 152.224
80 1.000 23608 148.036
100 1.000 23608 145.381
500 1.000 23608 138.903
800 1.000 23608 137.882
```
Since the filter is so selective, HNSW always visits more than 1% of the docs. The adaptive logic in the PR decides to stop the search and switch to an exact search, which bounds the visited docs at 2%. For `k=10` this makes the QPS a little worse, but overall prevents QPS from degrading (with the side benefit of perfect recall). I also tested with less restrictive filters, and in these cases the fallback just doesn't kick in, so the QPS remains the same as before.
Overall I like this approach because it doesn't require us to fiddle with thresholds or expose new parameters. It could also help make HNSW more robust in "pathological" cases where even when the filter is not that selective, all the nearest vectors to a query happen to be filtered away.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org