Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/02/19 17:23:28 UTC

[GitHub] [lucene] rmuir commented on pull request #692: LUCENE-10311: Different implementations of DocIdSetBuilder for points and terms

rmuir commented on pull request #692:
URL: https://github.com/apache/lucene/pull/692#issuecomment-1046065536


   I'm still a bit confused about why we need to `grow(long)` on a bitset that can only hold `Integer.MAX_VALUE` elements. I've re-read the description of the JIRA several times this morning, but honestly I was confused about this before, too.
   
   It seems the only purpose of the `long` is the cardinality estimate: we're doing a lot of elaborate work just to estimate the cardinality that we'll ultimately pass down to the bitset. I don't see any use of the `long` value other than this `counter`, and that's all we are doing with it. But surely using a `long` isn't helpful to this estimation; maybe we should just estimate it differently?
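
To make the concern concrete, here is a minimal sketch of why a summed `grow()` counter can exceed anything the bitset can actually hold. It uses `java.util.BitSet` as a stand-in for Lucene's bitset, and the class `GrowCounterSketch` and its methods are hypothetical names for illustration only, not the `DocIdSetBuilder` code under review.

```java
import java.util.BitSet;

// Hypothetical stand-in for illustration; this is NOT Lucene's DocIdSetBuilder.
final class GrowCounterSketch {
    private final BitSet bits; // stand-in for a bitset sized to maxDoc
    private long counter;      // running sum of the hints passed to grow()

    GrowCounterSketch(int maxDoc) {
        bits = new BitSet(maxDoc);
    }

    // Records the caller's hint; the sum can grow past maxDoc.
    void grow(long numDocs) {
        counter += numDocs;
    }

    // Setting an already-set bit does not change the cardinality.
    void add(int doc) {
        bits.set(doc);
    }

    long summedHints() {
        return counter;
    }

    int cardinality() {
        return bits.cardinality();
    }

    public static void main(String[] args) {
        GrowCounterSketch builder = new GrowCounterSketch(8);
        // Three passes each announce and add the same 8 documents.
        for (int round = 0; round < 3; round++) {
            builder.grow(8);
            for (int doc = 0; doc < 8; doc++) {
                builder.add(doc);
            }
        }
        System.out.println(builder.summedHints()); // prints 24
        System.out.println(builder.cardinality()); // prints 8
    }
}
```

The summed hints count every announced document, duplicates included, while the set itself can never contain more than `maxDoc` distinct entries, so the extra range of a `long` only matters for the over-counted part.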
   
   Sorry if my comment isn't very helpful, but I really want to understand the problem and why we need to bring 64 bits into this; currently it is very confusing. Perhaps we should remove this `counter` completely (temporarily) and use the other `BitDocIdSet` constructor. How much simpler does the code get then?
   
   That turns the problem around into: how can we estimate the cost better? I don't think we should have to majorly reorganize the code just for this `counter`; it doesn't seem right at all. There are other ways we could estimate the cost rather than summing up the calls to `grow()`.
   
   For example, in the sparse/buffer case, wouldn't a much simpler estimate be the `length` of the int array? I'm also confused why we have this sorted array buffer case instead of using `SparseFixedBitSet` (which already has `approximateCardinality()` and needs no such special grow-tracking).
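
As a rough illustration of the buffer-length idea, the sketch below treats the number of buffered doc IDs as the cost estimate and compares it with the exact count after sorting and deduplicating. The class `BufferCostSketch` and its methods are hypothetical helpers written for this example, assuming a plain `int[]` buffer with a fill `length`; they are not part of Lucene.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not Lucene code.
final class BufferCostSketch {

    // Cheap estimate: how many doc IDs were buffered, duplicates included.
    // This is an upper bound on the number of distinct documents.
    static int lengthEstimate(int[] buffer, int length) {
        return length;
    }

    // Exact distinct count, obtained by sorting a copy and skipping repeats.
    static int distinctCount(int[] buffer, int length) {
        if (length == 0) {
            return 0;
        }
        int[] copy = Arrays.copyOf(buffer, length);
        Arrays.sort(copy);
        int unique = 1;
        for (int i = 1; i < length; i++) {
            if (copy[i] != copy[i - 1]) {
                unique++;
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        // Only the first 5 slots are in use; the rest is spare capacity.
        int[] buffer = {5, 3, 5, 9, 3, 0, 0, 0};
        int length = 5;
        System.out.println(lengthEstimate(buffer, length)); // prints 5
        System.out.println(distinctCount(buffer, length));  // prints 3
    }
}
```

Both numbers are bounded by the buffer size, which is an `int`, so neither estimate needs 64 bits.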


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


