Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/11/16 06:57:52 UTC

[GitHub] [lucene] maosuhan opened a new pull request, #11939: fix bug of incorrect cost after upgradeToBitSet in DocIdSetBuilder class

maosuhan opened a new pull request, #11939:
URL: https://github.com/apache/lucene/pull/11939

   ### Description
   
   When Lucene executes a TermRangeQuery or TermInSetQuery, it uses DocIdSetBuilder to collect the matching doc IDs. Once the doc ID list grows large, upgradeToBitSet converts it from an array to a bit set. Doc IDs added after that point do not update the `counter` variable of DocIdSetBuilder, so the cost computed in DocIdSetBuilder.build is too low.
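To illustrate the kind of bookkeeping error described above, here is a minimal toy model. It is not Lucene's `DocIdSetBuilder`: the class `ToyDocIdSetBuilder`, its `threshold` parameter, and the `trackAfterUpgrade` flag are all hypothetical names invented for this sketch. It only assumes what the description states, namely that a counter stops being updated once the builder has switched to a bit set.

```java
import java.util.BitSet;

/** Toy model only; NOT Lucene's DocIdSetBuilder. All names here are hypothetical. */
final class ToyDocIdSetBuilder {
  private final int threshold;             // max buffered ids before switching to a bit set
  private final boolean trackAfterUpgrade; // false reproduces the described symptom
  private int[] buffer;
  private int buffered;
  private BitSet bitSet;                   // stays null until the upgrade happens
  private long counter;                    // number of ids added, reported as the cost

  ToyDocIdSetBuilder(int threshold, boolean trackAfterUpgrade) {
    this.threshold = threshold;
    this.trackAfterUpgrade = trackAfterUpgrade;
    this.buffer = new int[threshold];
  }

  void add(int docId) {
    if (bitSet == null) {
      if (buffered < threshold) {
        buffer[buffered++] = docId;
        return;
      }
      upgradeToBitSet();
    }
    bitSet.set(docId);
    if (trackAfterUpgrade) {
      counter++; // without this increment, the cost freezes at the upgrade point
    }
  }

  private void upgradeToBitSet() {
    bitSet = new BitSet();
    for (int i = 0; i < buffered; i++) {
      bitSet.set(buffer[i]);
    }
    counter = buffered; // the counter starts from what the buffer held
    buffer = null;
  }

  long cost() {
    return bitSet == null ? buffered : counter;
  }

  public static void main(String[] args) {
    ToyDocIdSetBuilder buggy = new ToyDocIdSetBuilder(8, false);
    ToyDocIdSetBuilder fixed = new ToyDocIdSetBuilder(8, true);
    for (int i = 0; i < 100; i++) {
      buggy.add(i);
      fixed.add(i);
    }
    System.out.println("cost without tracking: " + buggy.cost()); // prints 8
    System.out.println("cost with tracking: " + fixed.cost());    // prints 100
  }
}
```

In this toy, the untracked variant reports only the ids seen before the upgrade, which mirrors the shape of the symptom reported in this pull request (a cost far below the number of matching documents).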
   
   How to reproduce:
   
           // Index 200,000 documents whose single field holds the values "100000".."299999".
           Directory dir = FSDirectory.open(Files.createTempDirectory(null, new FileAttribute[0]));
           IndexWriter w = new IndexWriter(dir, new IndexWriterConfig());
           for (int i = 100000; i < 300000; ++i) {
               Document doc = new Document();
               doc.add(new StringField("f1", Integer.toString(i), Field.Store.NO));
               w.addDocument(doc);
           }
           // Merge down to one segment so that the reader has a single leaf.
           w.forceMerge(1);
           IndexReader reader = DirectoryReader.open(w);
           IndexSearcher searcher = new IndexSearcher(reader);
           // Disable the query cache so the cost comes from the query itself.
           searcher.setQueryCache(null);
   
           // The range ["200000", "300000"] matches the 100,000 documents "200000".."299999".
           Query query = new TermRangeQuery("f1", new BytesRef("200000"), new BytesRef("300000"), true, true);
           Weight weight = searcher.createWeight(searcher.rewrite(query), ScoreMode.COMPLETE, 1);
           ScorerSupplier scorerSupplier = weight.scorerSupplier(searcher.getIndexReader().leaves().get(0));
           System.out.println(scorerSupplier.cost());
   
   The reported cost is 1026, which is wrong; the actual cost should be 100000. An underestimated cost can cause unexpected performance problems, for example in lead selection for a boolean query.
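To make the lead-selection concern concrete, here is a small sketch of cost-based lead selection in a conjunction: the clause with the lowest estimated cost is iterated first. This is a simplified illustration, not Lucene's implementation; `LeadSelectionSketch`, `Clause`, `pickLead`, and the second clause with cost 5000 are hypothetical and chosen only for the example. The costs 1026 and 100000 are the ones from the description above.

```java
import java.util.Arrays;
import java.util.List;

/** Simplified illustration of cost-based lead selection; names are hypothetical. */
final class LeadSelectionSketch {
  static final class Clause {
    final String name;
    final long cost; // estimated number of matching documents

    Clause(String name, long cost) {
      this.name = name;
      this.cost = cost;
    }
  }

  /** Returns the clause with the lowest estimated cost, which would lead the iteration. */
  static Clause pickLead(List<Clause> clauses) {
    Clause lead = clauses.get(0);
    for (Clause c : clauses) {
      if (c.cost < lead.cost) {
        lead = c;
      }
    }
    return lead;
  }

  public static void main(String[] args) {
    Clause other = new Clause("other clause (hypothetical)", 5_000);

    // With the underestimated cost, the range clause looks cheapest and leads.
    Clause reported = new Clause("range clause, reported cost", 1_026);
    System.out.println(pickLead(Arrays.asList(reported, other)).name);

    // With the actual cost, the other clause leads instead.
    Clause actual = new Clause("range clause, actual cost", 100_000);
    System.out.println(pickLead(Arrays.asList(actual, other)).name);
  }
}
```

Under these assumed numbers, the wrong estimate makes the clause that really matches 100000 documents drive the iteration, so far more documents are visited than necessary.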
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on pull request #11939: fix bug of incorrect cost after upgradeToBitSet in DocIdSetBuilder class

Posted by GitBox <gi...@apache.org>.
jpountz commented on PR #11939:
URL: https://github.com/apache/lucene/pull/11939#issuecomment-1317027842

   This looks good to me. Can you add a CHANGES entry under `9.4.2`?




[GitHub] [lucene] maosuhan commented on pull request #11939: fix bug of incorrect cost after upgradeToBitSet in DocIdSetBuilder class

Posted by GitBox <gi...@apache.org>.
maosuhan commented on PR #11939:
URL: https://github.com/apache/lucene/pull/11939#issuecomment-1316675899

   > Great catch, can you add a test?
   
   @jpountz I have added the test code.




[GitHub] [lucene] jpountz commented on pull request #11939: fix bug of incorrect cost after upgradeToBitSet in DocIdSetBuilder class

Posted by GitBox <gi...@apache.org>.
jpountz commented on PR #11939:
URL: https://github.com/apache/lucene/pull/11939#issuecomment-1316641902

   Great catch, can you add a test?




[GitHub] [lucene] maosuhan commented on pull request #11939: fix bug of incorrect cost after upgradeToBitSet in DocIdSetBuilder class

Posted by GitBox <gi...@apache.org>.
maosuhan commented on PR #11939:
URL: https://github.com/apache/lucene/pull/11939#issuecomment-1317154566

   @jpountz changed




[GitHub] [lucene] jpountz merged pull request #11939: fix bug of incorrect cost after upgradeToBitSet in DocIdSetBuilder class

Posted by GitBox <gi...@apache.org>.
jpountz merged PR #11939:
URL: https://github.com/apache/lucene/pull/11939

