You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by David Smiley <da...@gmail.com> on 2018/06/07 17:21:26 UTC

ScorerSupplier and cost() avoidance

I'm working with some folks who did some profiling and noticed that
ScorerSupplier.cost() can be expensive (as the javadocs say). cost() says
only to call it if necessary. Unfortunately, a BooleanQuery is going to
call cost() (via BooleanWeight.scorer() even if ultimately no Query in the
tree cares what the cost is.  I'm not sure if that's a perf bug or not;
it's hard to tell.

The expensive part of cost() for Boolean2ScorerSupplier is over in
MinShouldMatchSumScorer.cost which creates a PriorityQueue every time, even
if trivially numScorers == 1.  That's a weird case... why do we even need a
Boolean2ScorerSupplier around one clause; couldn't that clause be returned
from the outer weight, BooleanWeight.scorerSupplier() close to the end as
an optimization?  I could file an issue.

~ David
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com

Re: ScorerSupplier and cost() avoidance

Posted by Adrien Grand <jp...@gmail.com>.
Got it. I thought you were talking about the class method, not the static
function.

Le jeu. 7 juin 2018 à 22:34, David Smiley <da...@gmail.com> a
écrit :

> I'll have to look further with the team on what these queries look like in
> terms of relative cheapness and throughput.
>
> RE MinShouldMatchSumScorer.cost() (static method called by
> Boolean2ScorerSupplier.computeCost) --  it's pretty easy to see that this
> is called with minShouldMatch==0 (or 1).  Set a conditional breakpoint in
> your IDE and run TestBooleanMinShouldMatch.testRandomQueries
>
> On Thu, Jun 7, 2018 at 3:43 PM Adrien Grand <jp...@gmail.com> wrote:
>
>> I suspect this could only show up as a bottleneck if they run very cheap
>> queries (low cost) at a very high throughput? Is it the case? I've seen a
>> couple workloads like that in the past and profilers suggested that things
>> that usually do not matter were bottleneck like creating scorers or
>> deciding whether a query should be cached. But trying to fix it didn't
>> really help as there are lots of things that we need to do to decide how to
>> run a query that run in O(num_segments * num_clauses)
>>
>> I'm confused why MinShouldMatchSumScorer would be used when
>> minShouldMatch is 0 or 1. DisjunctionSumScorer should be used instead for
>> such values of minShouldMatch?
>>
>> Le jeu. 7 juin 2018 à 19:38, Michael McCandless <
>> lucene@mikemccandless.com> a écrit :
>>
>>> Doesn't BQ rewrite itself if it has only one clause?
>>>
>>> Or maybe if there were more than one clause, and then all but one of
>>> them had null scorers (on SHOULD clauses) you could wind up in that state?
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>> On Thu, Jun 7, 2018 at 1:21 PM, David Smiley <da...@gmail.com>
>>> wrote:
>>>
>>>> I'm working with some folks who did some profiling and noticed that
>>>> ScorerSupplier.cost() can be expensive (as the javadocs say). cost() says
>>>> only to call it if necessary. Unfortunately, a BooleanQuery is going to
>>>> call cost() (via BooleanWeight.scorer() even if ultimately no Query in the
>>>> tree cares what the cost is.  I'm not sure if that's a perf bug or not;
>>>> it's hard to tell.
>>>>
>>>> The expensive part of cost() for Boolean2ScorerSupplier is over in
>>>> MinShouldMatchSumScorer.cost which creates a PriorityQueue every time, even
>>>> if trivially numScorers == 1.  That's a weird case... why do we even need a
>>>> Boolean2ScorerSupplier around one clause; couldn't that clause be returned
>>>> from the outer weight, BooleanWeight.scorerSupplier() close to the end as
>>>> an optimization?  I could file an issue.
>>>>
>>>> ~ David
>>>> --
>>>> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
>>>> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
>>>> http://www.solrenterprisesearchserver.com
>>>>
>>>
>>> --
> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> http://www.solrenterprisesearchserver.com
>

Re: ScorerSupplier and cost() avoidance

Posted by David Smiley <da...@gmail.com>.
I'll have to look further with the team on what these queries look like in
terms of relative cheapness and throughput.

RE MinShouldMatchSumScorer.cost() (static method called by
Boolean2ScorerSupplier.computeCost) --  it's pretty easy to see that this
is called with minShouldMatch==0 (or 1).  Set a conditional breakpoint in
your IDE and run TestBooleanMinShouldMatch.testRandomQueries

On Thu, Jun 7, 2018 at 3:43 PM Adrien Grand <jp...@gmail.com> wrote:

> I suspect this could only show up as a bottleneck if they run very cheap
> queries (low cost) at a very high throughput? Is it the case? I've seen a
> couple workloads like that in the past and profilers suggested that things
> that usually do not matter were bottleneck like creating scorers or
> deciding whether a query should be cached. But trying to fix it didn't
> really help as there are lots of things that we need to do to decide how to
> run a query that run in O(num_segments * num_clauses)
>
> I'm confused why MinShouldMatchSumScorer would be used when minShouldMatch
> is 0 or 1. DisjunctionSumScorer should be used instead for such values of
> minShouldMatch?
>
> Le jeu. 7 juin 2018 à 19:38, Michael McCandless <lu...@mikemccandless.com>
> a écrit :
>
>> Doesn't BQ rewrite itself if it has only one clause?
>>
>> Or maybe if there were more than one clause, and then all but one of them
>> had null scorers (on SHOULD clauses) you could wind up in that state?
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Thu, Jun 7, 2018 at 1:21 PM, David Smiley <da...@gmail.com>
>> wrote:
>>
>>> I'm working with some folks who did some profiling and noticed that
>>> ScorerSupplier.cost() can be expensive (as the javadocs say). cost() says
>>> only to call it if necessary. Unfortunately, a BooleanQuery is going to
>>> call cost() (via BooleanWeight.scorer() even if ultimately no Query in the
>>> tree cares what the cost is.  I'm not sure if that's a perf bug or not;
>>> it's hard to tell.
>>>
>>> The expensive part of cost() for Boolean2ScorerSupplier is over in
>>> MinShouldMatchSumScorer.cost which creates a PriorityQueue every time, even
>>> if trivially numScorers == 1.  That's a weird case... why do we even need a
>>> Boolean2ScorerSupplier around one clause; couldn't that clause be returned
>>> from the outer weight, BooleanWeight.scorerSupplier() close to the end as
>>> an optimization?  I could file an issue.
>>>
>>> ~ David
>>> --
>>> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
>>> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
>>> http://www.solrenterprisesearchserver.com
>>>
>>
>> --
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com

Re: ScorerSupplier and cost() avoidance

Posted by Adrien Grand <jp...@gmail.com>.
I suspect this could only show up as a bottleneck if they run very cheap
queries (low cost) at a very high throughput? Is it the case? I've seen a
couple workloads like that in the past and profilers suggested that things
that usually do not matter were bottleneck like creating scorers or
deciding whether a query should be cached. But trying to fix it didn't
really help as there are lots of things that we need to do to decide how to
run a query that run in O(num_segments * num_clauses)

I'm confused why MinShouldMatchSumScorer would be used when minShouldMatch
is 0 or 1. DisjunctionSumScorer should be used instead for such values of
minShouldMatch?

Le jeu. 7 juin 2018 à 19:38, Michael McCandless <lu...@mikemccandless.com>
a écrit :

> Doesn't BQ rewrite itself if it has only one clause?
>
> Or maybe if there were more than one clause, and then all but one of them
> had null scorers (on SHOULD clauses) you could wind up in that state?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Thu, Jun 7, 2018 at 1:21 PM, David Smiley <da...@gmail.com>
> wrote:
>
>> I'm working with some folks who did some profiling and noticed that
>> ScorerSupplier.cost() can be expensive (as the javadocs say). cost() says
>> only to call it if necessary. Unfortunately, a BooleanQuery is going to
>> call cost() (via BooleanWeight.scorer() even if ultimately no Query in the
>> tree cares what the cost is.  I'm not sure if that's a perf bug or not;
>> it's hard to tell.
>>
>> The expensive part of cost() for Boolean2ScorerSupplier is over in
>> MinShouldMatchSumScorer.cost which creates a PriorityQueue every time, even
>> if trivially numScorers == 1.  That's a weird case... why do we even need a
>> Boolean2ScorerSupplier around one clause; couldn't that clause be returned
>> from the outer weight, BooleanWeight.scorerSupplier() close to the end as
>> an optimization?  I could file an issue.
>>
>> ~ David
>> --
>> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
>> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
>> http://www.solrenterprisesearchserver.com
>>
>
>

Re: ScorerSupplier and cost() avoidance

Posted by David Smiley <da...@gmail.com>.
I think that's right Michael -- some null scorers.  Started with 3 in one
case and wound up with 1.

Also, in MinShouldMatchSumScorer.cost, sometimes the size of the
PriorityQueue isn't needed at all, which is when minShouldMatch is 0 or 1.
In that case, simply sum the costs.

On Thu, Jun 7, 2018 at 1:38 PM Michael McCandless <lu...@mikemccandless.com>
wrote:

> Doesn't BQ rewrite itself if it has only one clause?
>
> Or maybe if there were more than one clause, and then all but one of them
> had null scorers (on SHOULD clauses) you could wind up in that state?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Thu, Jun 7, 2018 at 1:21 PM, David Smiley <da...@gmail.com>
> wrote:
>
>> I'm working with some folks who did some profiling and noticed that
>> ScorerSupplier.cost() can be expensive (as the javadocs say). cost() says
>> only to call it if necessary. Unfortunately, a BooleanQuery is going to
>> call cost() (via BooleanWeight.scorer() even if ultimately no Query in the
>> tree cares what the cost is.  I'm not sure if that's a perf bug or not;
>> it's hard to tell.
>>
>> The expensive part of cost() for Boolean2ScorerSupplier is over in
>> MinShouldMatchSumScorer.cost which creates a PriorityQueue every time, even
>> if trivially numScorers == 1.  That's a weird case... why do we even need a
>> Boolean2ScorerSupplier around one clause; couldn't that clause be returned
>> from the outer weight, BooleanWeight.scorerSupplier() close to the end as
>> an optimization?  I could file an issue.
>>
>> ~ David
>> --
>> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
>> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
>> http://www.solrenterprisesearchserver.com
>>
>
> --
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com

Re: ScorerSupplier and cost() avoidance

Posted by Michael McCandless <lu...@mikemccandless.com>.
Doesn't BQ rewrite itself if it has only one clause?

Or maybe if there were more than one clause, and then all but one of them
had null scorers (on SHOULD clauses) you could wind up in that state?

Mike McCandless

http://blog.mikemccandless.com

On Thu, Jun 7, 2018 at 1:21 PM, David Smiley <da...@gmail.com>
wrote:

> I'm working with some folks who did some profiling and noticed that
> ScorerSupplier.cost() can be expensive (as the javadocs say). cost() says
> only to call it if necessary. Unfortunately, a BooleanQuery is going to
> call cost() (via BooleanWeight.scorer() even if ultimately no Query in the
> tree cares what the cost is.  I'm not sure if that's a perf bug or not;
> it's hard to tell.
>
> The expensive part of cost() for Boolean2ScorerSupplier is over in
> MinShouldMatchSumScorer.cost which creates a PriorityQueue every time, even
> if trivially numScorers == 1.  That's a weird case... why do we even need a
> Boolean2ScorerSupplier around one clause; couldn't that clause be returned
> from the outer weight, BooleanWeight.scorerSupplier() close to the end as
> an optimization?  I could file an issue.
>
> ~ David
> --
> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> LinkedIn: http://linkedin.com/in/davidwsmiley | Book: http://www.
> solrenterprisesearchserver.com
>