Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/09/21 06:04:37 UTC

[GitHub] [lucene] LuXugang commented on issue #11773: Could `PointRangeQuery`'s boundary values used for `NumericComparator` to calculate `estimatedNumberOfMatches`

LuXugang commented on issue #11773:
URL: https://github.com/apache/lucene/issues/11773#issuecomment-1253245181

   > The estimatedNumberOfMatches should still be very close to the actual number
   
   Actually, `estimatedNumberOfMatches` may be far from the actual number.
   
   I wrote a [test](https://github.com/LuXugang/Lucene-7.5.0/blob/master/LuceneDemo9.2.0/src/main/java/NumericDocValuesTopNOptimization2.java) showing that documents outside the query's boundaries still participate in the calculation of `estimatedNumberOfMatches`, which is not what we would expect.
   
   In that [test](https://github.com/LuXugang/Lucene-7.5.0/blob/master/LuceneDemo9.2.0/src/main/java/NumericDocValuesTopNOptimization2.java), `80003` indexed documents match the `PointRangeQuery`, and `TopFieldCollector` collects a different number of docs depending on how many documents fall outside the query's boundaries.
   
   
   
   
   number of documents outside the query boundary | number of hits in Collector
   -- | --
   1 | 1001
   1000 | 1001
   10000 | 1001
   20000 | 80003
   100000 | 80003
   10000+ | 80003
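
To make the effect concrete, here is a minimal toy model in plain Java. It is not Lucene code. It assumes that the comparator only starts skipping when `estimatedNumberOfMatches` is below one eighth of the iterator cost (my reading of `NumericComparator` in 9.x, worth checking against the source), that the iterator cost is the total number of indexed documents, and that every out-of-boundary document has a value inside the competitive range. The class and method names are hypothetical.

```java
// Toy model only, NOT Lucene code. The 1/8 threshold and all names are assumptions.
final class SkipEstimateSketch {

    /** Assumed rule: skipping is enabled only when the estimate is below iteratorCost / 8. */
    static boolean wouldSkip(long estimatedNumberOfMatches, long iteratorCost) {
        long threshold = iteratorCost >>> 3;
        return estimatedNumberOfMatches < threshold;
    }

    /**
     * Number of hits the collector sees in this model: only the competitive
     * docs if skipping is enabled, otherwise every doc matching the query.
     */
    static long collectedHits(long matching, long outOfRange, long competitiveInRange) {
        // Assumption: iterator cost is the total number of indexed documents.
        long iteratorCost = matching + outOfRange;
        // Assumption: the estimate ignores the query boundary, so every
        // out-of-range doc with a competitive value is counted as well.
        long estimate = competitiveInRange + outOfRange;
        return wouldSkip(estimate, iteratorCost) ? competitiveInRange : matching;
    }

    public static void main(String[] args) {
        long matching = 80_003;    // docs matching the PointRangeQuery (from the test)
        long competitive = 1_001;  // hits observed when skipping works (from the table)
        for (long outOfRange : new long[] {1, 1_000, 10_000, 20_000, 100_000}) {
            System.out.println(outOfRange + " -> " + collectedHits(matching, outOfRange, competitive));
        }
    }
}
```

Under these assumptions the model yields 1001 hits for 1, 1000 and 10000 out-of-boundary documents and 80003 hits for 20000 and 100000, the same pattern as in the table. If the estimate were limited to the query's boundary values, the out-of-boundary documents would drop out of `estimate` and skipping would stay enabled in every row.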
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

