Posted to dev@lucene.apache.org by Arihant Samar <ar...@gmail.com> on 2021/06/12 15:47:11 UTC

Boolean Scorer

Hi,

I am new here. I would like to know what exact optimisation is carried
out in BooleanScorer.java that led to a separate class for scoring
Boolean queries over documents in bulk. I could not find any material
on this in the documentation either, hence I decided to ask here.


Thanking you in advance,

Arihant.

Re: Boolean Scorer

Posted by Atri Sharma <at...@apache.org>.
TBH, the proposal sounds like overkill to me: IndexSearcher's
concurrency should be good enough (unless you are searching a single large
segment).

On Mon, 21 Jun 2021, 19:04 Adrien Grand, <jp...@gmail.com> wrote:

Re: Boolean Scorer

Posted by Arihant Samar <ar...@gmail.com>.
I managed to correct some mistakes, and the tests that check scores now
pass. As expected, the tests that check that docs are generated and
collected in the same thread still fail, but just out of interest I removed
those asserts. Are there any tests or benchmarks I can use to compare how
these changes perform?

Thanking you in advance,
Arihant.

On Tue, 22 Jun 2021 at 11:37, Arihant Samar <ar...@gmail.com> wrote:

Re: Boolean Scorer

Posted by Arihant Samar <ar...@gmail.com>.
There was a Jira issue about GPU acceleration which mentioned that
BooleanScorer could potentially make use of a GPU. So I first wanted to
try multithreading in plain Java, and I thought this function might be
amenable to parallelization, hence the attempt.
Wouldn't this be useful for very long Boolean queries with a lot of
SHOULD clauses? I have no idea whether that is a common situation, though.

I need one more small bit of help. Some of the tests fail with the error
Adrien mentioned (docs should be collected in the same thread in which they
were generated), but some tests also produce wrong scores. Do you see
anything wrong with the synchronization I have done?
Basically, I created an array of ReentrantLocks of size matching.length
and ran the loop over the scorers in scoreWindowIntoBitSetAndReplay on
numScorers threads instead of sequentially:
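// in BooleanScorer.java -> OrCollector -> collect(int doc)
final int i = doc & MASK;  // slot of doc within the 2048-doc window
final int idx = i >>> 6;   // word of `matching` that holds this slot
locks[idx].lock();         // locks: one ReentrantLock per word of `matching`
try {
  matching[idx] |= 1L << i;
  final Bucket bucket = buckets[i];
  bucket.freq++;
  bucket.score += scorer.score();
} finally {
  locks[idx].unlock();     // always release, even if score() throws
}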
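To make the locking scheme above concrete outside of Lucene, here is a minimal stdlib-only sketch. All names (`ParallelWindowSketch`, `collect(int, double)`, `scoreWindow`) are hypothetical and are not Lucene APIs. It keeps one ReentrantLock per 64-bit word of `matching` and runs one task per clause. Two assumptions differ from BooleanScorer: postings are plain arrays, and each task passes its score in explicitly instead of reading a shared `scorer` field (with a single shared collector, concurrent `setScorer` calls could overwrite such a field).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantLock;

public class ParallelWindowSketch {
  static final int SIZE = 2048;  // window size, matching the 2048-doc buckets
  static final int MASK = SIZE - 1;

  final long[] matching = new long[SIZE / 64];  // one bit per doc in the window
  final int[] freqs = new int[SIZE];
  final double[] scores = new double[SIZE];
  final ReentrantLock[] locks = new ReentrantLock[matching.length];  // one lock per word

  ParallelWindowSketch() {
    for (int j = 0; j < locks.length; j++) {
      locks[j] = new ReentrantLock();
    }
  }

  // The score is passed in by each worker rather than read from a shared,
  // mutable "current scorer" field that concurrent workers could overwrite.
  void collect(int doc, double score) {
    final int i = doc & MASK;
    final int idx = i >>> 6;
    locks[idx].lock();
    try {
      matching[idx] |= 1L << i;  // Java masks the shift distance, so this sets bit i % 64
      freqs[i]++;
      scores[i] += score;
    } finally {
      locks[idx].unlock();
    }
  }

  // One task per clause; each clause is a list of doc IDs inside the window
  // plus a parallel array of per-doc scores.
  void scoreWindow(int[][] docs, double[][] clauseScores) {
    ExecutorService pool = Executors.newFixedThreadPool(docs.length);
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (int c = 0; c < docs.length; c++) {
        final int clause = c;
        futures.add(pool.submit(() -> {
          for (int k = 0; k < docs[clause].length; k++) {
            collect(docs[clause][k], clauseScores[clause][k]);
          }
        }));
      }
      for (Future<?> f : futures) {
        f.get();  // waits for the task and makes its writes visible to this thread
      }
    } catch (ExecutionException e) {
      throw new RuntimeException(e.getCause());
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new RuntimeException(e);
    } finally {
      pool.shutdown();
    }
  }
}
```

Even with correct locking, clauses add into `scores[i]` in a nondeterministic order, and floating-point addition is not associative, so a sum can differ in its last bits from the one produced by the sequential loop.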



On Mon, 21 Jun 2021 at 19:04, Adrien Grand <jp...@gmail.com> wrote:

Re: Boolean Scorer

Posted by Adrien Grand <jp...@gmail.com>.
It should be possible to make something like this work. The main issue is
that Lucene expects a (Bulk)Scorer to be consumed in the thread where it
was pulled, so I believe this would require substantial changes to how
BooleanScorer currently operates.

I'd be curious to know why you are looking into this rather than passing an
Executor to IndexSearcher so that it can search segments concurrently. Is
it not providing enough concurrency for you?
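The segment-level concurrency suggested here can be modelled with the JDK alone. In Lucene it is enabled by passing an Executor when constructing the searcher (e.g. `new IndexSearcher(reader, executor)`); the sketch below is not that code, only an illustration under simplifying assumptions: a "segment" is a hypothetical array of per-doc scores (0 meaning no match), each segment is searched sequentially in its own task, and the per-segment top-k lists are merged in the calling thread. Because work is split per segment, a single large segment still runs in one thread, which is the caveat Atri raises.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PerSegmentSearchSketch {
  // A hit: global doc ID plus score.
  static final class Hit {
    final int doc;
    final double score;

    Hit(int doc, double score) {
      this.doc = doc;
      this.score = score;
    }
  }

  // Searches one "segment" sequentially in a single thread, keeping the top k
  // hits in a min-heap; docBase maps segment-local doc IDs to global ones.
  static List<Hit> searchSegment(double[] segmentScores, int docBase, int k) {
    PriorityQueue<Hit> pq = new PriorityQueue<>((a, b) -> Double.compare(a.score, b.score));
    for (int doc = 0; doc < segmentScores.length; doc++) {
      if (segmentScores[doc] <= 0) {
        continue;  // a score of 0 means "no match" in this sketch
      }
      pq.add(new Hit(docBase + doc, segmentScores[doc]));
      if (pq.size() > k) {
        pq.poll();  // evict the lowest-scoring hit
      }
    }
    return new ArrayList<>(pq);
  }

  // One task per segment, so parallelism is bounded by the number of segments.
  static List<Hit> search(double[][] segments, int k, ExecutorService pool) {
    List<Future<List<Hit>>> futures = new ArrayList<>();
    int docBase = 0;
    for (double[] seg : segments) {
      final int base = docBase;
      futures.add(pool.submit(() -> searchSegment(seg, base, k)));
      docBase += seg.length;
    }
    // Merge the per-segment top-k lists in the calling thread.
    List<Hit> all = new ArrayList<>();
    try {
      for (Future<List<Hit>> f : futures) {
        all.addAll(f.get());
      }
    } catch (ExecutionException e) {
      throw new RuntimeException(e.getCause());
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new RuntimeException(e);
    }
    all.sort((a, b) -> Double.compare(b.score, a.score));  // highest score first
    return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
  }

  // Convenience overload that creates and shuts down its own pool.
  static List<Hit> search(double[][] segments, int k) {
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, segments.length));
    try {
      return search(segments, k, pool);
    } finally {
      pool.shutdown();
    }
  }
}
```

Unlike the per-clause threading discussed in this thread, each task here owns its data end to end, so no locks are needed and every score is computed in the thread that started it.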

On Mon, Jun 21, 2021 at 9:46 AM Arihant Samar <ar...@gmail.com> wrote:

-- 
Adrien

Re: Boolean Scorer

Posted by Arihant Samar <ar...@gmail.com>.
Hi,
There is a method, scoreWindowIntoBitSetAndReplay, in BooleanScorer.java
that loops over all the scorers.
I was wondering whether we could multi-thread it, using numScorers threads.
Since it already uses a special OrCollector that updates the matching
array and the scores in the buckets of a 2048-doc window, we could use a
ReentrantLock for synchronization in the collector.

I would like a review of this idea: I tried it and some tests did not
pass. If you could tell me what is wrong with this approach, I would
appreciate it.

Thanking you in advance,
Arihant.

On Tue, 15 Jun 2021, 19:05 Adrien Grand, <jp...@gmail.com> wrote:

Re: Boolean Scorer

Posted by Adrien Grand <jp...@gmail.com>.
Glad it helped. :)

On Tue, Jun 15, 2021 at 3:28 PM Greg Miller <gs...@gmail.com> wrote:

-- 
Adrien

Re: Boolean Scorer

Posted by Greg Miller <gs...@gmail.com>.
Thanks for this explanation Adrien! I'd been wondering about this a bit
myself since seeing that DrillSideways also implements a TAAT approach (in
addition to a doc-at-a-time approach). This really helps clear that up.
Appreciate you taking the time to explain!

Cheers,
-Greg

On Mon, Jun 14, 2021 at 2:35 AM Adrien Grand <jp...@gmail.com> wrote:

Re: Boolean Scorer

Posted by Adrien Grand <jp...@gmail.com>.
Hello Arihant,

The Scorer for disjunctions uses a heap data structure that needs to be
reordered upon every hit. While reordering heaps is efficient as it runs in
logarithmic time, the fact that it needs to run on every document might add
non-negligible overhead. BooleanScorer tries to work around this overhead
by scoring large windows of documents in a more TAAT (term-at-a-time)
fashion so that Lucene only needs to reorder the heap every 2048 doc IDs
(the hardcoded window size).
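The two strategies described above can be sketched with plain Java collections. This is an illustration under stated assumptions, not Lucene's disjunction scorer or BooleanScorer: each clause is a sorted array of doc IDs with a parallel array of scores, and a doc's score is the sum over its matching clauses. `daat` pops and re-adds a clause on a `PriorityQueue` for every hit; `taatWindowed` drains every clause over a 2048-doc window into a bitset plus score buckets, then replays the matches in doc-ID order. Where the real BooleanScorer uses a heap to find the next window, this simplified sketch just visits every window in order.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

public class DisjunctionSketch {
  static final int WINDOW = 2048;  // window size quoted in the explanation above
  static final int MASK = WINDOW - 1;

  // Doc-at-a-time: clauses sit in a heap ordered by their current doc ID,
  // and every hit costs an O(log n) heap reorder (poll + re-add here).
  static double[] daat(int[][] docs, double[][] scores, int maxDoc) {
    double[] out = new double[maxDoc];
    int[] pos = new int[docs.length];
    PriorityQueue<Integer> heap =
        new PriorityQueue<>((a, b) -> Integer.compare(docs[a][pos[a]], docs[b][pos[b]]));
    for (int c = 0; c < docs.length; c++) {
      if (docs[c].length > 0) {
        heap.add(c);
      }
    }
    while (!heap.isEmpty()) {
      int c = heap.poll();  // clause positioned on the smallest doc ID
      out[docs[c][pos[c]]] += scores[c][pos[c]];
      if (++pos[c] < docs[c].length) {
        heap.add(c);  // re-insert with its new current doc ID
      }
    }
    return out;
  }

  // Windowed term-at-a-time: drain every clause over one window into a bitset
  // plus score buckets, then replay the matching docs in doc-ID order.
  static double[] taatWindowed(int[][] docs, double[][] scores, int maxDoc) {
    double[] out = new double[maxDoc];
    int[] pos = new int[docs.length];
    long[] matching = new long[WINDOW / 64];
    double[] buckets = new double[WINDOW];
    for (int base = 0; base < maxDoc; base += WINDOW) {
      final int max = base + WINDOW;
      for (int c = 0; c < docs.length; c++) {
        while (pos[c] < docs[c].length && docs[c][pos[c]] < max) {
          final int i = docs[c][pos[c]] & MASK;  // slot within the window
          matching[i >>> 6] |= 1L << i;  // shift distance is taken mod 64
          buckets[i] += scores[c][pos[c]];
          pos[c]++;
        }
      }
      for (int idx = 0; idx < matching.length; idx++) {
        long bits = matching[idx];
        while (bits != 0) {
          final int i = (idx << 6) | Long.numberOfTrailingZeros(bits);
          out[base + i] = buckets[i];
          buckets[i] = 0;  // reset the bucket for the next window
          bits &= bits - 1;  // clear the lowest set bit
        }
      }
      Arrays.fill(matching, 0L);
    }
    return out;
  }
}
```

Both methods return the same per-doc sums; the windowed one trades the per-hit heap reorder for a constant-time bucket update plus one pass over the 32-word bitset per window.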

This paper gives a bit more context:
http://www.savar.se/media/1181/space_optimizations_for_total_ranking.pdf,
see section 4 in particular.

On Sat, Jun 12, 2021 at 5:47 PM Arihant Samar <ar...@gmail.com> wrote:


-- 
Adrien