You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@lucene.apache.org by Daniel Jeliński <dj...@gmail.com> on 2015/11/01 23:39:23 UTC

TopDocs.merge PriorityQueue usage

Hello all,
The function TopDocs.merge uses PriorityQueue in a pattern: pop, update
value (ref.hitIndex++), add. JavaDocs for PriorityQueue.updateTop
<http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-core/5.2.0/org/apache/lucene/util/PriorityQueue.java#204>
say that using this function instead should be at least twice as fast.
Would a patch like the one attached be acceptable? Should I create a JIRA
issue for it?
I tried comparing the time taken to run ant test before and after the patch
was applied, but apparently it was affected by random factors more than it
was affected by the patch, so I don't have any performance numbers to show
if / how much it changed. Is there any standard way of benchmarking?
Regards,
Daniel

Re: TopDocs.merge PriorityQueue usage

Posted by Adrien Grand <jp...@gmail.com>.

Hi Daniel,

I saw the JIRA issue, thanks! Unfortunately Mike's benchmarks won't help as
they try to track performance of indexing/searching a single index, while
the code that you modified is about merging TopDocs from several indices.
When I said micro benchmarks, I don't think we need anything fancy like
JMH, I would be fine with just some java code that creates some random
TopDocs instances and measures how long was spent running TopDocs.merge on
them with System.nanotime. If we want to avoid the pitfalls of benchmarks,
something else that would be interesting to know is how many times the
PriorityQueue.lessThan method is called on trunk vs. your patch, I suspect
it will be less with your patch.

Le mar. 3 nov. 2015 à 00:25, Daniel Jeliński <dj...@gmail.com> a
écrit :

> Hi Adrien,
> LUCENE-6878 created.
> This method is called by some of IndexSearcher's search overrides. I'm
> going to try out Mike's benchmark
> <https://github.com/mikemccand/luceneutil> first, and learn to write
> micro benchmarks in the meantime.
> Regards,
> Daniel
>
> 2015-11-02 0:08 GMT+01:00 Adrien Grand <jp...@gmail.com>:
>
>> Hi Daniel,
>>
>> Your patch could indeed make things more efficient when merging top hits
>> from many shards, and the code is still easy to read, so +1 to create a
>> JIRA issue. I'm not surprised that ant test did not get faster as we rarely
>> call this method when running tests, maybe you can try to write a simple
>> micro benchmark from randomly generated TopDocs instances?
>>
>>
>> Le dim. 1 nov. 2015 à 23:39, Daniel Jeliński <dj...@gmail.com> a
>> écrit :
>>
>>> Hello all,
>>> The function TopDocs.merge uses PriorityQueue in a pattern: pop, update
>>> value (ref.hitIndex++), add. JavaDocs for PriorityQueue.updateTop
>>> <http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-core/5.2.0/org/apache/lucene/util/PriorityQueue.java#204>
>>> say that using this function instead should be at least twice as fast.
>>> Would a patch like the one attached be acceptable? Should I create a JIRA
>>> issue for it?
>>> I tried comparing the time taken to run ant test before and after the
>>> patch was applied, but apparently it was affected by random factors more
>>> than it was affected by the patch, so I don't have any performance numbers
>>> to show if / how much it changed. Is there any standard way of benchmarking?
>>> Regards,
>>> Daniel
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>>
>

Re: TopDocs.merge PriorityQueue usage

Posted by Daniel Jeliński <dj...@gmail.com>.

Hi Adrien,
LUCENE-6878 created.
This method is called by some of IndexSearcher's search overrides. I'm
going to try out Mike's benchmark <https://github.com/mikemccand/luceneutil>
first, and learn to write micro benchmarks in the meantime.
Regards,
Daniel

2015-11-02 0:08 GMT+01:00 Adrien Grand <jp...@gmail.com>:

> Hi Daniel,
>
> Your patch could indeed make things more efficient when merging top hits
> from many shards, and the code is still easy to read, so +1 to create a
> JIRA issue. I'm not surprised that ant test did not get faster as we rarely
> call this method when running tests, maybe you can try to write a simple
> micro benchmark from randomly generated TopDocs instances?
>
>
> Le dim. 1 nov. 2015 à 23:39, Daniel Jeliński <dj...@gmail.com> a
> écrit :
>
>> Hello all,
>> The function TopDocs.merge uses PriorityQueue in a pattern: pop, update
>> value (ref.hitIndex++), add. JavaDocs for PriorityQueue.updateTop
>> <http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-core/5.2.0/org/apache/lucene/util/PriorityQueue.java#204>
>> say that using this function instead should be at least twice as fast.
>> Would a patch like the one attached be acceptable? Should I create a JIRA
>> issue for it?
>> I tried comparing the time taken to run ant test before and after the
>> patch was applied, but apparently it was affected by random factors more
>> than it was affected by the patch, so I don't have any performance numbers
>> to show if / how much it changed. Is there any standard way of benchmarking?
>> Regards,
>> Daniel
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

Re: TopDocs.merge PriorityQueue usage

Posted by Adrien Grand <jp...@gmail.com>.

Hi Daniel,

Your patch could indeed make things more efficient when merging top hits
from many shards, and the code is still easy to read, so +1 to create a
JIRA issue. I'm not surprised that ant test did not get faster as we rarely
call this method when running tests, maybe you can try to write a simple
micro benchmark from randomly generated TopDocs instances?


Le dim. 1 nov. 2015 à 23:39, Daniel Jeliński <dj...@gmail.com> a
écrit :

> Hello all,
> The function TopDocs.merge uses PriorityQueue in a pattern: pop, update
> value (ref.hitIndex++), add. JavaDocs for PriorityQueue.updateTop
> <http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-core/5.2.0/org/apache/lucene/util/PriorityQueue.java#204>
> say that using this function instead should be at least twice as fast.
> Would a patch like the one attached be acceptable? Should I create a JIRA
> issue for it?
> I tried comparing the time taken to run ant test before and after the
> patch was applied, but apparently it was affected by random factors more
> than it was affected by the patch, so I don't have any performance numbers
> to show if / how much it changed. Is there any standard way of benchmarking?
> Regards,
> Daniel
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org