Posted to dev@lucene.apache.org by Shai Erera <se...@gmail.com> on 2007/12/10 08:11:50 UTC

Performance Improvement for Search using PriorityQueue

Hi

Lucene's PQ implements two methods: put (assumes the PQ has room for the
object) and insert (checks whether the object can be inserted etc.). The
implementation of insert() requires the application that uses it to allocate
a new object every time it calls insert. Specifically, it cannot reuse the
objects that were removed from the PQ.
I've done some measurements of the performance search would gain from such
a method, through a change to TopDocCollector.

PriorityQueue change (added insertWithOverflow method)
-----------------------------------------------------------------------------------
    /**
     * insertWithOverflow() is similar to insert(), except its return value:
     * it returns the object (if any) that was dropped off the heap because
     * it was full. This can be the given parameter (in case it is smaller
     * than the full heap's minimum, and couldn't be added) or another
     * object that was previously the smallest value in the heap and now
     * has been replaced by a larger one.
     */
    public Object insertWithOverflow(Object element) {
        if (size < maxSize) {
            put(element);
            return null;
        } else if (size > 0 && !lessThan(element, top())) {
            Object ret = heap[1];
            heap[1] = element;
            adjustTop();
            return ret;
        } else {
            return element;
        }
    }
[Very similar to insert(), only it returns the object that was kicked out of
the Queue, or null]

TopDocCollector's current implementation of collect()
----------------------------------------------------------------------------
  public void collect(int doc, float score) {
    if (score > 0.0f) {
      totalHits++;
      if (hq.size() < numHits || score >= minScore) {
        hq.insert(new ScoreDoc(doc, score));
        minScore = ((ScoreDoc)hq.top()).score; // maintain minScore
      }
    }
  }
[See how it allocates a new ScoreDoc every time this method is called]

TopDocCollector's new implementation of collect()
-----------------------------------------------------------------------------
  public void collect(int doc, float score) {
      if (score == 0.0f) {
          return;
      }
      totalHits++;
      if (hq.size() < numHits || score >= minScore) {
          if (sd == null) {
              sd = new ScoreDoc(doc, score);
          } else {
              sd.doc = doc;
              sd.score = score;
          }
          sd = (ScoreDoc) hq.insertWithOverflow(sd);
          minScore = ((ScoreDoc)hq.top()).score; // maintain minScore
      }
  }
[sd is a class member of the collector, of type ScoreDoc]
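The reuse pattern above can be seen end-to-end in a minimal, self-contained sketch. MiniQueue, ScoreDoc and runDemo here are illustrative stand-ins rather than Lucene's actual classes (java.util.PriorityQueue replaces Lucene's array-based heap), but insertWithOverflow follows the contract described above:

```java
// One ScoreDoc instance is recycled through insertWithOverflow instead of
// allocating per hit. Illustrative sketch, not Lucene's actual classes.
import java.util.PriorityQueue;

class ScoreDoc {
    int doc;
    float score;
    ScoreDoc(int doc, float score) { this.doc = doc; this.score = score; }
}

class MiniQueue {
    // Min-heap ordered by score, so the head is the smallest (weakest) hit.
    private final PriorityQueue<ScoreDoc> heap =
        new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
    private final int maxSize;

    MiniQueue(int maxSize) { this.maxSize = maxSize; }

    // Returns null while the queue is still filling; afterwards returns the
    // evicted (or rejected) object so the caller can reuse it.
    ScoreDoc insertWithOverflow(ScoreDoc element) {
        if (heap.size() < maxSize) {
            heap.add(element);
            return null;
        } else if (element.score >= heap.peek().score) {
            ScoreDoc ret = heap.poll();
            heap.add(element);
            return ret;
        } else {
            return element; // weaker than the current minimum: rejected
        }
    }
}

public class ReuseDemo {
    // Collects six hits into a top-3 queue, reusing one ScoreDoc instance;
    // returns how many ScoreDocs were actually allocated.
    static int runDemo() {
        MiniQueue hq = new MiniQueue(3);
        ScoreDoc sd = null;
        int allocations = 0;
        float[] scores = {0.5f, 0.9f, 0.1f, 0.7f, 0.3f, 0.8f};
        for (int doc = 0; doc < scores.length; doc++) {
            if (sd == null) {
                sd = new ScoreDoc(doc, scores[doc]);
                allocations++;
            } else {                 // recycle the object the queue handed back
                sd.doc = doc;
                sd.score = scores[doc];
            }
            sd = hq.insertWithOverflow(sd);
        }
        return allocations;          // maxSize + 1 = 4, not one per hit
    }

    public static void main(String[] args) {
        System.out.println("allocations = " + runDemo());
    }
}
```

Note that the collector still allocates once per eviction-free insert while the queue is filling, so the steady-state cost is maxSize + 1 objects total.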

Now for the performance tests. I've done two tests:
1. Calling TopDocCollector.collect 1,000,000, 10,000,000 and 100,000,000
times using both implementations. Here are the results:
                        1M        10M        100M
Current Collector       218 ms    1,907 ms   20,000 ms
Modified Collector      141 ms    1,297 ms   12,672 ms
As you can see, the new implementation is 30-40% faster.

2. Wrote some tasks in the benchmark package that make use of the new PQ
while executing a real search over an index with 1M small documents (10
characters each). Here are the results:
                          Current TopDocCollector
---------------------------------------------------------------------------
------------> Report Sum By (any) Name (5 about 6 out of 6)
Operation        round  runCnt  recsPerRun      rec/s  elapsedSec  avgUsedMem  avgTotalMem
CreateIndex          0       1           1       10.6        0.09   2,219,112    4,389,376
AddDocs_1000000      0       1     1000000   52,075.2       19.20  34,497,176   52,689,408
CloseIndex           0       2           1        0.1       16.62  26,178,972   51,378,688
OpenIndex            0       1           1    1,000.0        0.00  48,771,304   50,067,968
--> MySearch         0       1           1        5.3        0.19  48,771,304   50,067,968


                          Modified TopDocCollector
---------------------------------------------------------------------------
------------> Report Sum By (any) Name (5 about 6 out of 6)
Operation        round  runCnt  recsPerRun      rec/s  elapsedSec  avgUsedMem  avgTotalMem
CreateIndex          0       1           1       10.6        0.09   2,207,008    4,389,376
AddDocs_1000000      0       1     1000000   50,955.4       19.62  32,531,992   44,825,088
CloseIndex           0       2           1        0.1       16.84  57,853,148   61,929,984
OpenIndex            0       1           1    1,000.0        0.00  57,792,136   61,929,984
--> MySearch         0       1           1        7.1        0.14  57,939,856   61,929,984
Notice the time difference in search (0.14 modified vs. 0.19 current). Here
too, a 30% improvement.

One thing that I wasn't able to show, but I think is pretty clear: the new
implementation causes far fewer object allocations. Consider the typical
search for 10 results over an index with 1M documents. The current
implementation will allocate up to 1M (!) ScoreDoc instances, while the new
one will allocate only 11 (10 held by the PQ and 1 being reused). On heavily
loaded systems, this results in far less work for the GC.
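That allocation arithmetic can be checked with a small simulation (a hypothetical stand-in using java.util.PriorityQueue, not Lucene's classes). Scores increase monotonically, which is the worst case for the current collect(): every hit passes the minScore check, so every hit allocates.

```java
// Count allocations for a top-10 search over N hits with ascending scores,
// under the current strategy (new object per competing hit) and the reuse
// strategy (recycle the object insertWithOverflow hands back).
// Illustrative sketch; float[]{score, doc} stands in for ScoreDoc.
import java.util.PriorityQueue;

public class AllocationCount {
    static final int N = 1_000_000, NUM_HITS = 10;

    static long currentStrategy() {
        PriorityQueue<float[]> hq =
            new PriorityQueue<>((a, b) -> Float.compare(a[0], b[0]));
        long allocations = 0;
        float minScore = 0f;
        for (int doc = 0; doc < N; doc++) {
            float score = doc + 1;               // ascending: every hit competes
            if (hq.size() < NUM_HITS || score >= minScore) {
                allocations++;                   // current collect(): new object per hit
                hq.add(new float[]{score, doc});
                if (hq.size() > NUM_HITS) hq.poll();
                minScore = hq.peek()[0];
            }
        }
        return allocations;
    }

    static long reuseStrategy() {
        PriorityQueue<float[]> hq =
            new PriorityQueue<>((a, b) -> Float.compare(a[0], b[0]));
        long allocations = 0;
        float minScore = 0f;
        float[] sd = null;
        for (int doc = 0; doc < N; doc++) {
            float score = doc + 1;
            if (hq.size() < NUM_HITS || score >= minScore) {
                if (sd == null) { sd = new float[2]; allocations++; }
                sd[0] = score; sd[1] = doc;
                // insertWithOverflow: the evicted element comes back for reuse
                if (hq.size() < NUM_HITS) {
                    hq.add(sd); sd = null;
                } else if (score >= hq.peek()[0]) {
                    float[] out = hq.poll(); hq.add(sd); sd = out;
                }
                minScore = hq.peek()[0];
            }
        }
        return allocations;
    }

    public static void main(String[] args) {
        System.out.println("current: " + currentStrategy()
            + ", reuse: " + reuseStrategy());
    }
}
```

In this worst case the current strategy allocates N objects and the reuse strategy allocates exactly NUM_HITS + 1.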

I would like to suggest reflecting this new implementation in PriorityQueue
and also modifying TopDocCollector to make use of the new method. Several
ways to do it:
1. Add insertWithOverflow to PQ (I hate the name and am willing to accept
better ones), deprecate insert() (let's say in 2.3) and in the release after
that (2.4?) rename insertWithOverflow to insert(). A complex process, but it
won't break existing applications' code.
2. Change insert's signature and state that applications that move to 2.3
(for example) need to change their code as well (actually it won't compile).
3. Any other alternatives?

And .. of course, review the classes in the Lucene code that use PQ and
modify them as necessary.

A very long email for a very simple (but important) performance improvement.

Shai

Re: Performance Improvement for Search using PriorityQueue

Posted by Michael McCandless <lu...@mikemccandless.com>.
I agree that even though we don't see gains on the queries tested,  
there are in theory cases where there could be a great many  
allocations that would be saved.

I think we should do Shai's suggested option 1 (add the method and  
change TDC to call it), change heap to be protected not private, plus  
the 2 tiny performance gains Nadav suggests below?  Shai can you open  
a Jira issue & attach a patch for these changes?  Thanks!

Mike

Nadav Har'El wrote:

> On Mon, Dec 10, 2007, Shai Erera wrote about "Performance  
> Improvement for Search using PriorityQueue":
>> Hi
>>
>> Lucene's PQ implements two methods: put (assumes the PQ has room  
>> for the
>> object) and insert (checks whether the object can be inserted  
>> etc.). The
>> implementation of insert() requires the application that uses it  
>> to allocate
>> a new object every time it calls insert. Specifically, it cannot  
>> reuse the
>> objects that were removed from the PQ.
>
> I've read this entire thread, and would like to add my comments  
> about three
> independent issues, which I think that can and perhaps should be  
> considered
> separately:
>
> 1. When Shai wanted to add the insertWithOverflow() method to  
> PriorityQueue
>    he couldn't just subclass PriorityQueue in his application, but  
> rather
>    was forced to modify PriorityQueue itself. Why? just because one  
> field
>    of that class - "heap" - was defined "private" instead of  
> "protected".
>
>    Is there a special reason for that? If not, can we make the  
> trivial change
>    to make PriorityQueue's fields protected, to allow Shai and  
> others (see the
>    next point) to add functionality on top of PriorityQueue?
>
> 2. PriorityQueue, in addition to being used in about a dozen places  
> inside
>    Lucene, is also a public class that advanced users often find  
> useful when
>    implementing features like new collectors, new queries, and so on.
>    Unfortunately, my experience exactly matches Shai's: In the two  
> occasions
>    where I used a PriorityQueue, I found that I needed such an
>    insertWithOverflow() method. If this feature is so useful (I  
> can't believe
> that Shai and I are the only ones who found it useful), I think  
> it would
>    be nice to add it to Lucene's PriorityQueue, even if it isn't  
> (yet) used
>    inside Lucene.
>
>    Just to make it sound more interesting, let me give you an  
> example where
>    I needed (and implemented) an "insertWithOverflow()": I was  
> implementing a
>    faceted search capability over Lucene. It calculated a count for  
> each
>    facet value, and then I used a PriorityQueue to find the 10 best  
> values.
>    The problem is that I also needed an "other" aggregator, which  
> was supposed
>    to aggregate (in various ways) all the facets except the 10 best  
> ones. For
>    that, I needed to know which facets dropped off the priorityqueue.
>
> 3. Finally, Shai asked for this new PriorityQueue.insertWithOverflow()
>    to be used in TopDocCollector. I have to admit I don't know how  
> much
>    of a benefit this will be in the "typical" case. But I do know that
>    there's no such thing as a "typical" case...
>    I can easily think of a quite typical "worst case" though:  
> Consider a
>    collection indexed in order of document age (a pretty typical  
> scenario
>    for a long-running index), and then you do a sorting query
>    (TopFieldDocCollector), asking it to bring the 10 newest documents.
>    In that case, each and every document will have a new ScoreDoc  
> created -
>    which is the worst-case that Shai feared.
>    It would be nice if Shai or someone else could provide a  
> measurement in
>    that case.
>
> P.S. When looking now at PriorityQueue's code, I found two tiny
> performance improvements that could be easily made to it - I wonder if
> there's any reason not to do them:
>
>  1. Insert can use heap[1] directly instead of calling top(). After  
> all,
>     this is done in an if() that already ensures that size>0.
>
>  2. Regardless, top() could return heap[1] always, without any if 
> (). After
>     all, the heap array is initialized to all nulls, so when  
> size==0, heap[1]
>     is null anyway.
>
>
>
>
> -- 
> Nadav Har'El                        | Tuesday, Dec 11 2007, 3 Tevet 5768
> IBM Haifa Research Lab              |-----------------------------------------
>                                     |A professor is one who talks in someone
> http://nadav.harel.org.il           |else's sleep.
>
>
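Nadav's faceted-search use case is a nice illustration of why the overflow return value matters: whatever insertWithOverflow hands back is exactly what the "other" bucket needs to aggregate. A hypothetical sketch (OtherAggregator and topKWithOther are invented names, and java.util.PriorityQueue stands in for Lucene's queue):

```java
// Keep the K largest facet counts in a bounded min-heap and fold everything
// that overflows the heap into an "other" bucket. Illustrative sketch.
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.PriorityQueue;

public class OtherAggregator {
    // Returns {sum of the top-k counts, sum of all remaining counts}.
    public static long[] topKWithOther(Map<String, Long> counts, int k) {
        PriorityQueue<Long> heap = new PriorityQueue<>(); // natural order = min-heap
        long other = 0;
        for (long c : counts.values()) {
            Long overflow;
            if (heap.size() < k) {
                heap.add(c);
                overflow = null;
            } else if (c >= heap.peek()) {       // insertWithOverflow behavior:
                overflow = heap.poll();          // evicted value comes back ...
                heap.add(c);
            } else {
                overflow = c;                    // ... or the rejected value does
            }
            if (overflow != null) other += overflow; // dropped facet feeds "other"
        }
        long top = 0;
        for (long c : heap) top += c;
        return new long[]{top, other};
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("a", 5L); counts.put("b", 1L); counts.put("c", 9L);
        counts.put("d", 3L); counts.put("e", 7L);
        long[] r = topKWithOther(counts, 2);
        System.out.println("top=" + r[0] + " other=" + r[1]); // top two: 9+7, other: 5+1+3
    }
}
```

With a plain insert() that discards the evicted element, the "other" sum would have to be recomputed from scratch; the overflow return value makes it a single pass.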


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Performance Improvement for Search using PriorityQueue

Posted by Michael McCandless <lu...@mikemccandless.com>.
Shai Erera wrote:

> Hi,
> I will open an issue and create the patch. One thing I'm not sure  
> of is the
> wouldBeInserted method you mentioned - in what context should it be  
> used?
> And ... lessThan shouldn't be public, it can stay protected.

Sorry, this is a method Peter suggested (see below) in order to add  
de-duping logic on top of the PQ.

We should do it separately (it's unrelated).

Peter can you open an issue for that one?  Thanks!

Mike

>
> On 12/11/07, Michael McCandless <lu...@mikemccandless.com> wrote:
>>
>>
>> I think we can't make lessThan public since that would cause
>> subclasses to fail to compile (ie this breaks backwards  
>> compatibility)?
>>
>> Adding "wouldBeInserted()" seems OK?
>>
>> Mike
>>
>> Peter Keegan wrote:
>>
>>> See my similar request from last March:
>>> http://www.nabble.com/FieldSortedHitQueue-enhancement-
>>> to9733550.html#a9733550
>>>
>>> Peter
>>>
>
>
> -- 
> Regards,
>
> Shai Erera




Re: Performance Improvement for Search using PriorityQueue

Posted by Shai Erera <se...@gmail.com>.
Hi,
I will open an issue and create the patch. One thing I'm not sure of is the
wouldBeInserted method you mentioned - in what context should it be used?
And ... lessThan shouldn't be public, it can stay protected.


On 12/11/07, Michael McCandless <lu...@mikemccandless.com> wrote:
>
>
> I think we can't make lessThan public since that would cause
> subclasses to fail to compile (ie this breaks backwards compatibility)?
>
> Adding "wouldBeInserted()" seems OK?
>
> Mike
>
> Peter Keegan wrote:
>
> > See my similar request from last March:
> > http://www.nabble.com/FieldSortedHitQueue-enhancement-
> > to9733550.html#a9733550
> >
> > Peter
> >


-- 
Regards,

Shai Erera

Re: Performance Improvement for Search using PriorityQueue

Posted by Michael McCandless <lu...@mikemccandless.com>.
I think we can't make lessThan public since that would cause  
subclasses to fail to compile (ie this breaks backwards compatibility)?

Adding "wouldBeInserted()" seems OK?

Mike

Peter Keegan wrote:

> See my similar request from last March:
> http://www.nabble.com/FieldSortedHitQueue-enhancement- 
> to9733550.html#a9733550
>
> Peter
>
> On Dec 11, 2007 11:54 AM, Nadav Har'El <ny...@math.technion.ac.il>  
> wrote:
>
>> On Mon, Dec 10, 2007, Shai Erera wrote about "Performance  
>> Improvement for
>> Search using PriorityQueue":
>>> Hi
>>>
>>> Lucene's PQ implements two methods: put (assumes the PQ has room  
>>> for the
>>> object) and insert (checks whether the object can be inserted  
>>> etc.). The
>>> implementation of insert() requires the application that uses it to
>> allocate
>>> a new object every time it calls insert. Specifically, it cannot  
>>> reuse
>> the
>>> objects that were removed from the PQ.
>>
>> I've read this entire thread, and would like to add my comments about
>> three
>> independent issues, which I think that can and perhaps should be
>> considered
>> separately:
>>
>> 1. When Shai wanted to add the insertWithOverflow() method to
>> PriorityQueue
>>   he couldn't just subclass PriorityQueue in his application, but  
>> rather
>>   was forced to modify PriorityQueue itself. Why? just because one  
>> field
>>   of that classi - "heap" - was defined "private" instead of  
>> "protected".
>>
>>   Is there a special reason for that? If not, can we make the trivial
>> change
>>   to make PriorityQueue's fields protected, to allow Shai and  
>> others (see
>> the
>>   next point) to add functionality on top of PriorityQueue?
>>
>> 2. PriorityQueue, in addition to being used in about a dozen  
>> places inside
>>   Lucene, is also a public class that advanced users often find  
>> useful
>> when
>>   implementing features like new collectors, new queries, and so on.
>>   Unfortunately, my experience exactly matches Shai's: In the two
>> occasions
>>   where I used a PriorityQueue, I found that I needed such a
>>   insertWithOverflow() method. If this feature is so useful (I can't
>> believe
>>   that Shai and me are the only ones who found it useful), I think it
>> would
>>   be nice to add it to Lucene's PriorityQueue, even if it isn't  
>> (yet) used
>>   inside Lucene.
>>
>>   Just to make it sound more interesting, let me give you an  
>> example where
>>   I needed (and implemented) an "insertWithOverflow()": I was  
>> implementing
>> a
>>   faceted search capability over Lucene. It calculated a count for  
>> each
>>   facet value, and then I used a PriorityQueue to find the 10 best  
>> values.
>>   The problem is that I also needed an "other" aggregator, which was
>> supposed
>>   to aggregate (in various ways) all the facets except the 10 best  
>> ones.
>> For
>>   that, I needed to know which facets dropped off the priorityqueue.
>>
>> 3. Finally, Shai asked for this new  
>> PriorityQueue.insertWithOverflow()
>>   to be used in TopDocCollector. I have to admit I don't know how  
>> much
>>   of a benefit this will be in the "typical" case. But I do know that
>>   there's no such thing as a "typical" case...
>>   I can easily think of a quite typical "worst case" though:  
>> Consider a
>>   collection indexed in order of document age (a pretty typical  
>> scenario
>>   for a long-running index), and then you do a sorting query
>>   (TopFieldDocCollector), asking it to bring the 10 newest documents.
>>   In that case, each and every document will have a new DocScore  
>> created -
>>   which is the worst-case that Shai feared.
>>   It would be nice if Shai or someone else could provide a  
>> measurement in
>>   that case.
>>
>> P.S. When looking now at PriorityQueue's code, I found two tiny
>> performance improvements that could be easily made to it - I  
>> wonder if
>> there's any reason not to do them:
>>
>>  1. Insert can use heap[1] directly instead of calling top().  
>> After all,
>>    this is done in an if() that already ensures that size>0.
>>
>>  2. Regardless, top() could return heap[1] always, without any if 
>> (). After
>>    all, the heap array is initialized to all nulls, so when size==0,
>> heap[1]
>>    is null anyway.
>>
>>
>>
>>> PriorityQueue change (added insertWithOverflow method)
>>>
>> --------------------------------------------------------------------- 
>> --------------
>>>     /**
>>>      * insertWithOverflow() is similar to insert(), except its  
>>> return
>> value:
>>> it
>>>      * returns the object (if any) that was dropped off the heap  
>>> because
>> it
>>> was
>>>      * full. This can be the given parameter (in case it is  
>>> smaller than
>> the
>>>      * full heap's minimum, and couldn't be added) or another object
>> that
>>> was
>>>      * previously the smallest value in the heap and now has been
>> replaced
>>> by a
>>>      * larger one.
>>>      */
>>>     public Object insertWithOverflow(Object element) {
>>>         if (size < maxSize) {
>>>             put(element);
>>>             return null;
>>>         } else if (size > 0 && !lessThan(element, top())) {
>>>             Object ret = heap[1];
>>>             heap[1] = element;
>>>             adjustTop();
>>>             return ret;
>>>         } else {
>>>             return element;
>>>         }
>>>     }
>>> [Very similar to insert(), only it returns the object that was  
>>> kicked
>> out of
>>> the Queue, or null]
>>
>> --
>> Nadav Har'El                        |       Tuesday, Dec 11 2007, 3 Tevet 5768
>> IBM Haifa Research Lab              |-----------------------------------------
>>                                     |A professor is one who talks in someone
>> http://nadav.harel.org.il           |else's sleep.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>>




Re: Performance Improvement for Search using PriorityQueue

Posted by Peter Keegan <pe...@gmail.com>.
See my similar request from last March:
http://www.nabble.com/FieldSortedHitQueue-enhancement-to9733550.html#a9733550

Peter

On Dec 11, 2007 11:54 AM, Nadav Har'El <ny...@math.technion.ac.il> wrote:

> On Mon, Dec 10, 2007, Shai Erera wrote about "Performance Improvement for
> Search using PriorityQueue":
> > Hi
> >
> > Lucene's PQ implements two methods: put (assumes the PQ has room for the
> > object) and insert (checks whether the object can be inserted etc.). The
> > implementation of insert() requires the application that uses it to
> allocate
> > a new object every time it calls insert. Specifically, it cannot reuse
> the
> > objects that were removed from the PQ.
>
> I've read this entire thread, and would like to add my comments about
> three
> independent issues, which I think can and perhaps should be
> considered
> separately:
>
> 1. When Shai wanted to add the insertWithOverflow() method to
> PriorityQueue
>   he couldn't just subclass PriorityQueue in his application, but rather
>   was forced to modify PriorityQueue itself. Why? Just because one field
>   of that class - "heap" - was defined "private" instead of "protected".
>
>   Is there a special reason for that? If not, can we make the trivial
> change
>   to make PriorityQueue's fields protected, to allow Shai and others (see
> the
>   next point) to add functionality on top of PriorityQueue?
>
> 2. PriorityQueue, in addition to being used in about a dozen places inside
>   Lucene, is also a public class that advanced users often find useful
> when
>   implementing features like new collectors, new queries, and so on.
>   Unfortunately, my experience exactly matches Shai's: On the two
> occasions
>   where I used a PriorityQueue, I found that I needed such an
>   insertWithOverflow() method. If this feature is so useful (I can't
> believe
>   that Shai and I are the only ones who found it useful), I think it
> would
>   be nice to add it to Lucene's PriorityQueue, even if it isn't (yet) used
>   inside Lucene.
>
>   Just to make it sound more interesting, let me give you an example where
>   I needed (and implemented) an "insertWithOverflow()": I was implementing
> a
>   faceted search capability over Lucene. It calculated a count for each
>   facet value, and then I used a PriorityQueue to find the 10 best values.
>   The problem is that I also needed an "other" aggregator, which was
> supposed
>   to aggregate (in various ways) all the facets except the 10 best ones.
> For
>   that, I needed to know which facets dropped off the priority queue.
>
> 3. Finally, Shai asked for this new PriorityQueue.insertWithOverflow()
>   to be used in TopDocCollector. I have to admit I don't know how much
>   of a benefit this will be in the "typical" case. But I do know that
>   there's no such thing as a "typical" case...
>   I can easily think of a quite typical "worst case" though: Consider a
>   collection indexed in order of document age (a pretty typical scenario
>   for a long-running index), and then you do a sorting query
>   (TopFieldDocCollector), asking it to bring the 10 newest documents.
>   In that case, each and every document will have a new DocScore created -
>   which is the worst-case that Shai feared.
>   It would be nice if Shai or someone else could provide a measurement in
>   that case.
>
> P.S. When looking now at PriorityQueue's code, I found two tiny
> performance improvements that could be easily made to it - I wonder if
> there's any reason not to do them:
>
>  1. Insert can use heap[1] directly instead of calling top(). After all,
>    this is done in an if() that already ensures that size>0.
>
>  2. Regardless, top() could return heap[1] always, without any if(). After
>    all, the heap array is initialized to all nulls, so when size==0,
> heap[1]
>    is null anyway.
>
>
>
> > PriorityQueue change (added insertWithOverflow method)
> >
> -----------------------------------------------------------------------------------
> >     /**
> >      * insertWithOverflow() is similar to insert(), except its return
> value:
> > it
> >      * returns the object (if any) that was dropped off the heap because
> it
> > was
> >      * full. This can be the given parameter (in case it is smaller than
> the
> >      * full heap's minimum, and couldn't be added) or another object
> that
> > was
> >      * previously the smallest value in the heap and now has been
> replaced
> > by a
> >      * larger one.
> >      */
> >     public Object insertWithOverflow(Object element) {
> >         if (size < maxSize) {
> >             put(element);
> >             return null;
> >         } else if (size > 0 && !lessThan(element, top())) {
> >             Object ret = heap[1];
> >             heap[1] = element;
> >             adjustTop();
> >             return ret;
> >         } else {
> >             return element;
> >         }
> >     }
> > [Very similar to insert(), only it returns the object that was kicked
> out of
> > the Queue, or null]
>
> --
> Nadav Har'El                        |       Tuesday, Dec 11 2007, 3 Tevet 5768
> IBM Haifa Research Lab              |-----------------------------------------
>                                     |A professor is one who talks in someone
> http://nadav.harel.org.il           |else's sleep.
>
>
>

Re: Performance Improvement for Search using PriorityQueue

Posted by Nadav Har'El <ny...@math.technion.ac.il>.
On Mon, Dec 10, 2007, Shai Erera wrote about "Performance Improvement for Search using PriorityQueue":
> Hi
> 
> Lucene's PQ implements two methods: put (assumes the PQ has room for the
> object) and insert (checks whether the object can be inserted etc.). The
> implementation of insert() requires the application that uses it to allocate
> a new object every time it calls insert. Specifically, it cannot reuse the
> objects that were removed from the PQ.

I've read this entire thread, and would like to add my comments about three
independent issues, which I think can and perhaps should be considered
separately:

1. When Shai wanted to add the insertWithOverflow() method to PriorityQueue
   he couldn't just subclass PriorityQueue in his application, but rather
   was forced to modify PriorityQueue itself. Why? Just because one field
   of that class - "heap" - was defined "private" instead of "protected".

   Is there a special reason for that? If not, can we make the trivial change
   to make PriorityQueue's fields protected, to allow Shai and others (see the
   next point) to add functionality on top of PriorityQueue?

2. PriorityQueue, in addition to being used in about a dozen places inside
   Lucene, is also a public class that advanced users often find useful when
   implementing features like new collectors, new queries, and so on.
   Unfortunately, my experience exactly matches Shai's: On the two occasions
   where I used a PriorityQueue, I found that I needed such an
   insertWithOverflow() method. If this feature is so useful (I can't believe
   that Shai and I are the only ones who found it useful), I think it would
   be nice to add it to Lucene's PriorityQueue, even if it isn't (yet) used
   inside Lucene.

   Just to make it sound more interesting, let me give you an example where
   I needed (and implemented) an "insertWithOverflow()": I was implementing a
   faceted search capability over Lucene. It calculated a count for each
   facet value, and then I used a PriorityQueue to find the 10 best values.
   The problem is that I also needed an "other" aggregator, which was supposed
   to aggregate (in various ways) all the facets except the 10 best ones. For
   that, I needed to know which facets dropped off the priority queue.

3. Finally, Shai asked for this new PriorityQueue.insertWithOverflow()
   to be used in TopDocCollector. I have to admit I don't know how much
   of a benefit this will be in the "typical" case. But I do know that
   there's no such thing as a "typical" case...
   I can easily think of a quite typical "worst case" though: Consider a
   collection indexed in order of document age (a pretty typical scenario
   for a long-running index), and then you do a sorting query
   (TopFieldDocCollector), asking it to bring the 10 newest documents.
   In that case, each and every document will have a new DocScore created -
   which is the worst-case that Shai feared.
   It would be nice if Shai or someone else could provide a measurement in
   that case.

P.S. When looking now at PriorityQueue's code, I found two tiny
performance improvements that could be easily made to it - I wonder if
there's any reason not to do them:

 1. Insert can use heap[1] directly instead of calling top(). After all,
    this is done in an if() that already ensures that size>0.

 2. Regardless, top() could return heap[1] always, without any if(). After
    all, the heap array is initialized to all nulls, so when size==0, heap[1]
    is null anyway.
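Taken together, these points can be sketched in a few lines. The class below is a hypothetical stand-in, not Lucene's actual PriorityQueue (which is an abstract class with a lessThan() hook): an Integer min-heap with the proposed insertWithOverflow(), with both micro-optimizations applied.

```java
// Illustrative stand-in for Lucene's PriorityQueue: a 1-based Integer
// min-heap with insertWithOverflow(), plus the two suggested tweaks:
// top() returns heap[1] unconditionally (null while empty), and the
// full-queue branch reads heap[1] directly instead of calling top().
class IntMinHeap {
    private final Integer[] heap; // 1-based; heap[0] and unused slots stay null
    private int size;
    private final int maxSize;

    IntMinHeap(int maxSize) {
        this.maxSize = maxSize;
        this.heap = new Integer[maxSize + 1];
    }

    /** Tweak 2: no size check needed - heap[1] is simply null while empty. */
    Integer top() { return heap[1]; }

    /** Caller guarantees there is room (mirrors Lucene's put()). */
    void put(Integer element) {
        heap[++size] = element;
        int i = size;
        while (i > 1 && heap[i] < heap[i >> 1]) { // sift the new element up
            Integer t = heap[i]; heap[i] = heap[i >> 1]; heap[i >> 1] = t;
            i >>= 1;
        }
    }

    /** Like insert(), but returns whatever dropped out of a full queue (or null). */
    Integer insertWithOverflow(Integer element) {
        if (size < maxSize) {
            put(element);
            return null;
        } else if (size > 0 && element >= heap[1]) { // tweak 1: heap[1] directly
            Integer ret = heap[1];
            heap[1] = element;
            downHeap(); // mirrors adjustTop()
            return ret;
        } else {
            return element; // smaller than the full queue's minimum
        }
    }

    private void downHeap() { // restore heap order from the root
        int i = 1;
        while (true) {
            int smallest = i, l = 2 * i, r = l + 1;
            if (l <= size && heap[l] < heap[smallest]) smallest = l;
            if (r <= size && heap[r] < heap[smallest]) smallest = r;
            if (smallest == i) break;
            Integer t = heap[i]; heap[i] = heap[smallest]; heap[smallest] = t;
            i = smallest;
        }
    }

    public static void main(String[] args) {
        IntMinHeap h = new IntMinHeap(3);
        h.insertWithOverflow(5);
        h.insertWithOverflow(1);
        h.insertWithOverflow(7);
        System.out.println(h.insertWithOverflow(3)); // 1 falls off the heap
        System.out.println(h.insertWithOverflow(0)); // 0 never makes it in
        System.out.println(h.top());                 // the minimum is now 3
    }
}
```

Because the kicked-out element is handed back to the caller, a collector can recycle it for its next insertion attempt instead of allocating a new object.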



> PriorityQueue change (added insertWithOverflow method)
> -----------------------------------------------------------------------------------
>     /**
>      * insertWithOverflow() is similar to insert(), except its return value:
> it
>      * returns the object (if any) that was dropped off the heap because it
> was
>      * full. This can be the given parameter (in case it is smaller than the
>      * full heap's minimum, and couldn't be added) or another object that
> was
>      * previously the smallest value in the heap and now has been replaced
> by a
>      * larger one.
>      */
>     public Object insertWithOverflow(Object element) {
>         if (size < maxSize) {
>             put(element);
>             return null;
>         } else if (size > 0 && !lessThan(element, top())) {
>             Object ret = heap[1];
>             heap[1] = element;
>             adjustTop();
>             return ret;
>         } else {
>             return element;
>         }
>     }
> [Very similar to insert(), only it returns the object that was kicked out of
> the Queue, or null]

-- 
Nadav Har'El                        |       Tuesday, Dec 11 2007, 3 Tevet 5768
IBM Haifa Research Lab              |-----------------------------------------
                                    |A professor is one who talks in someone
http://nadav.harel.org.il           |else's sleep.



Re: Performance Improvement for Search using PriorityQueue

Posted by Shai Erera <se...@gmail.com>.
Done (PriorityQueue-2.patch)

Shai

On Dec 12, 2007 1:46 PM, Michael McCandless <lu...@mikemccandless.com>
wrote:

>
> I think it's fine to include those changes with this issue.
>
> Mike
>
> Shai Erera wrote:
>
> > Hi
> >
> > I created https://issues.apache.org/jira/browse/LUCENE-1089 and
> > added a
> > patch.
> > I noticed that we can replace the calls to insert() with
> > insertWithOverflow() in several other places, like
> > QualityQueriesFinder,
> > FuzzyQuery and TopFieldDocCollector. I wasn't sure if that should
> > be handled
> > as part of this issue, or a different one.
> >
> > On Dec 11, 2007 8:32 PM, Yonik Seeley <yo...@apache.org> wrote:
> >
> >> On Dec 11, 2007 1:21 PM, Timo Nentwig <lu...@nitwit.de> wrote:
> >>> On Tuesday 11 December 2007 14:32:12 Shai Erera wrote:
> >>>> For (1) - I can't explain it but I've run into documents with
> >>>> 0.0f scores.
> >>>> For (2) - this is a simple logic - if the lowest score in the
> >>>> queue is
> >> 'x'
> >>>> and you want to top docs only, then there's no point in
> >>>> attempting to
> >>>> insert a document with score lower than 'x' (it will not be added).
> >>>
> >>> Sure. I didn't notice that score is passed as parameter and was
> >> surprised that
> >>> subsequent calls to collect() are supposed to be guaranteed to
> >>> have a
> >> lower
> >>> score.
> >>
> >> One is not guaranteed this... collect() generally goes in docid
> >> order,
> >> and scores are unordered.
> >>
> >> If you are only gathering the top 10 docs by score, you can compare
> >> the current score to the lowest of the top 10 you currently have to
> >> determine if you should bother inserting into the queue.
> >>
> >> -Yonik
> >>
> >>
> >>
> >
> >
> > --
> > Regards,
> >
> > Shai Erera
>
>
>
>


-- 
Regards,

Shai Erera

Re: Performance Improvement for Search using PriorityQueue

Posted by Michael McCandless <lu...@mikemccandless.com>.
I think it's fine to include those changes with this issue.

Mike

Shai Erera wrote:

> Hi
>
> I created https://issues.apache.org/jira/browse/LUCENE-1089 and  
> added a
> patch.
> I noticed that we can replace the calls to insert() with
> insertWithOverflow() in several other places, like  
> QualityQueriesFinder,
> FuzzyQuery and TopFieldDocCollector. I wasn't sure if that should  
> be handled
> as part of this issue, or a different one.
>
> On Dec 11, 2007 8:32 PM, Yonik Seeley <yo...@apache.org> wrote:
>
>> On Dec 11, 2007 1:21 PM, Timo Nentwig <lu...@nitwit.de> wrote:
>>> On Tuesday 11 December 2007 14:32:12 Shai Erera wrote:
>>>> For (1) - I can't explain it but I've run into documents with  
>>>> 0.0f scores.
>>>> For (2) - this is a simple logic - if the lowest score in the  
>>>> queue is
>> 'x'
>>>> and you want to top docs only, then there's no point in  
>>>> attempting to
>>>> insert a document with score lower than 'x' (it will not be added).
>>>
>>> Sure. I didn't notice that score is passed as parameter and was
>> surprised that
>>> subsequent calls to collect() are supposed to be guaranteed to  
>>> have a
>> lower
>>> score.
>>
>> One is not guaranteed this... collect() generally goes in docid  
>> order,
>> and scores are unordered.
>>
>> If you are only gathering the top 10 docs by score, you can compare
>> the current score to the lowest of the top 10 you currently have to
>> determine if you should bother inserting into the queue.
>>
>> -Yonik
>>
>>
>>
>
>
> -- 
> Regards,
>
> Shai Erera




Re: Performance Improvement for Search using PriorityQueue

Posted by Shai Erera <se...@gmail.com>.
Hi

I created https://issues.apache.org/jira/browse/LUCENE-1089 and added a
patch.
I noticed that we can replace the calls to insert() with
insertWithOverflow() in several other places, like QualityQueriesFinder,
FuzzyQuery and TopFieldDocCollector. I wasn't sure if that should be handled
as part of this issue, or a different one.

On Dec 11, 2007 8:32 PM, Yonik Seeley <yo...@apache.org> wrote:

> On Dec 11, 2007 1:21 PM, Timo Nentwig <lu...@nitwit.de> wrote:
> > On Tuesday 11 December 2007 14:32:12 Shai Erera wrote:
> > > For (1) - I can't explain it but I've run into documents with 0.0f scores.
> > > For (2) - this is a simple logic - if the lowest score in the queue is
> 'x'
> > > and you want to top docs only, then there's no point in attempting to
> > > insert a document with score lower than 'x' (it will not be added).
> >
> > Sure. I didn't notice that score is passed as parameter and was
> surprised that
> > subsequent calls to collect() are supposed to be guaranteed to have a
> lower
> > score.
>
> One is not guaranteed this... collect() generally goes in docid order,
> and scores are unordered.
>
> If you are only gathering the top 10 docs by score, you can compare
> the current score to the lowest of the top 10 you currently have to
> determine if you should bother inserting into the queue.
>
> -Yonik
>
>
>


-- 
Regards,

Shai Erera

Re: Performance Improvement for Search using PriorityQueue

Posted by Yonik Seeley <yo...@apache.org>.
On Dec 11, 2007 1:21 PM, Timo Nentwig <lu...@nitwit.de> wrote:
> On Tuesday 11 December 2007 14:32:12 Shai Erera wrote:
> > For (1) - I can't explain it but I've run into documents with 0.0f scores.
> > For (2) - this is a simple logic - if the lowest score in the queue is 'x'
> > and you want to top docs only, then there's no point in attempting to
> > insert a document with score lower than 'x' (it will not be added).
>
> Sure. I didn't notice that score is passed as parameter and was surprised that
> subsequent calls to collect() are supposed to be guaranteed to have a lower
> score.

One is not guaranteed this... collect() generally goes in docid order,
and scores are unordered.

If you are only gathering the top 10 docs by score, you can compare
the current score to the lowest of the top 10 you currently have to
determine if you should bother inserting into the queue.

-Yonik



Re: Performance Improvement for Search using PriorityQueue

Posted by Timo Nentwig <lu...@nitwit.de>.
On Tuesday 11 December 2007 14:32:12 Shai Erera wrote:
> For (1) - I can't explain it but I've run into documents with 0.0f scores.
> For (2) - this is a simple logic - if the lowest score in the queue is 'x'
> and you want to top docs only, then there's no point in attempting to
> insert a document with score lower than 'x' (it will not be added).

Sure. I didn't notice that score is passed as a parameter and was surprised that 
subsequent calls to collect() are supposed to be guaranteed to have a lower 
score.

Ok, stupid question :)

> Maybe I didn't understand your question correctly though ...
>
> On Dec 11, 2007 2:25 PM, Timo Nentwig <lu...@nitwit.de> wrote:
> > On Monday 10 December 2007 09:15:12 Paul Elschot wrote:
> > > The current TopDocCollector only allocates a ScoreDoc when the given
> > > score causes a new ScoreDoc to be added into the queue, but it does
> >
> > I actually wrote my own HitCollector and now wonder about
> > TopDocCollector:
> >
> >  public void collect(int doc, float score) {
> >    if (score > 0.0f) {
> >      totalHits++;
> >      if (hq.size() < numHits || score >= minScore) {
> >        hq.insert(new ScoreDoc(doc, score));
> >        minScore = ((ScoreDoc)hq.top()).score; // maintain minScore
> >      }
> >    }
> >  }
> >
> > 1) How can there be hits with score=0.0?
> > 2) I don't understand minScore: inserts only document having a higher
> > score
> > than the lowest score already in queue?
> >





Re: Performance Improvement for Search using PriorityQueue

Posted by Shai Erera <se...@gmail.com>.
For (1) - I can't explain it, but I've run into documents with 0.0f scores.
For (2) - this is simple logic - if the lowest score in the queue is 'x'
and you want the top docs only, then there's no point in attempting to insert
a document with a score lower than 'x' (it will not be added).
Maybe I didn't understand your question correctly though ...
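The minScore logic can be sketched end-to-end (topK and TopKSketch are hypothetical names; java.util.PriorityQueue stands in for Lucene's HitQueue, and scores are assumed non-negative, as in Lucene):

```java
import java.util.PriorityQueue;

class TopKSketch {
    /**
     * Returns the k highest scores in descending order, visiting hits in
     * docid order with unordered scores. The minScore check only skips hits
     * that provably cannot enter a full queue - exactly the 'x' logic above.
     */
    static float[] topK(float[] scores, int k) {
        PriorityQueue<Float> hq = new PriorityQueue<>(); // min-heap: peek() is lowest
        float minScore = 0.0f;                           // assumes non-negative scores
        for (float score : scores) {
            if (hq.size() < k || score >= minScore) {
                hq.offer(score);
                if (hq.size() > k) hq.poll(); // drop the queue's minimum
                minScore = hq.peek();         // maintain minScore
            }
        }
        float[] out = new float[hq.size()];
        for (int i = out.length - 1; i >= 0; i--) out[i] = hq.poll(); // descending
        return out;
    }

    public static void main(String[] args) {
        float[] top = topK(new float[] {3f, 9f, 1f, 7f, 5f, 8f, 2f}, 3);
        System.out.println(java.util.Arrays.toString(top)); // [9.0, 8.0, 7.0]
    }
}
```

Note that the skipped hits are still counted by totalHits in the real collector; only the queue insertion is avoided.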

On Dec 11, 2007 2:25 PM, Timo Nentwig <lu...@nitwit.de> wrote:

> On Monday 10 December 2007 09:15:12 Paul Elschot wrote:
> > The current TopDocCollector only allocates a ScoreDoc when the given
> > score causes a new ScoreDoc to be added into the queue, but it does
>
> I actually wrote my own HitCollector and now wonder about TopDocCollector:
>
>  public void collect(int doc, float score) {
>    if (score > 0.0f) {
>      totalHits++;
>      if (hq.size() < numHits || score >= minScore) {
>        hq.insert(new ScoreDoc(doc, score));
>        minScore = ((ScoreDoc)hq.top()).score; // maintain minScore
>      }
>    }
>  }
>
> 1) How can there be hits with score=0.0?
> 2) I don't understand minScore: inserts only document having a higher
> score
> than the lowest score already in queue?
>
>
>


-- 
Regards,

Shai Erera

Re: Performance Improvement for Search using PriorityQueue

Posted by Timo Nentwig <lu...@nitwit.de>.
On Monday 10 December 2007 09:15:12 Paul Elschot wrote:
> The current TopDocCollector only allocates a ScoreDoc when the given
> score causes a new ScoreDoc to be added into the queue, but it does

I actually wrote my own HitCollector and now wonder about TopDocCollector:

  public void collect(int doc, float score) {
    if (score > 0.0f) {
      totalHits++;
      if (hq.size() < numHits || score >= minScore) {
        hq.insert(new ScoreDoc(doc, score));
        minScore = ((ScoreDoc)hq.top()).score; // maintain minScore
      }
    }
  }

1) How can there be hits with score=0.0?
2) I don't understand minScore: does it insert only documents having a higher score
than the lowest score already in the queue?



Re: Performance Improvement for Search using PriorityQueue

Posted by robert engels <re...@ix.netcom.com>.
As has been pointed out on many threads, a modern JVM really doesn't  
need you to recycle objects, especially for small, short-lived objects.

It is actually less efficient in many cases (since the variables need  
to be reinitialized).

Using object pools (except when pooling external resources) is a  
waste of time.
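For what it's worth, the trade-off being debated is easy to measure directly. Below is a hypothetical sketch (not Lucene's actual TopDocCollector; java.util.PriorityQueue emulates insertWithOverflow() via offer-then-poll): the overflow from a full queue becomes the spare for the next hit, so at most numHits + 1 ScoreDoc objects are ever allocated.

```java
import java.util.PriorityQueue;

class ReusingCollector {
    static final class ScoreDoc {
        int doc; float score;
        ScoreDoc(int doc, float score) { this.doc = doc; this.score = score; }
    }

    final int numHits;
    final PriorityQueue<ScoreDoc> hq; // min-heap by score, like Lucene's HitQueue
    float minScore = 0.0f;
    ScoreDoc spare;                   // recycled overflow, as insertWithOverflow() returns
    int totalHits, allocations;

    ReusingCollector(int numHits) {
        this.numHits = numHits;
        this.hq = new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
    }

    void collect(int doc, float score) {
        if (score == 0.0f) return;
        totalHits++;
        if (hq.size() < numHits || score >= minScore) {
            if (spare == null) {
                spare = new ScoreDoc(doc, score); // the only path that allocates
                allocations++;
            } else {
                spare.doc = doc;                  // reuse the kicked-out object
                spare.score = score;
            }
            hq.offer(spare);                      // emulate insertWithOverflow():
            spare = (hq.size() > numHits) ? hq.poll() : null; // keep the overflow
            minScore = hq.peek().score;           // maintain minScore
        }
    }

    public static void main(String[] args) {
        ReusingCollector c = new ReusingCollector(10);
        for (int i = 1; i <= 1_000_000; i++) c.collect(i, i); // ascending: worst case
        System.out.println(c.allocations + " allocations for " + c.totalHits + " hits");
        // prints "11 allocations for 1000000 hits"
    }
}
```

Whether avoiding those allocations matters in practice is exactly what the benchmark numbers below speak to.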

On Dec 10, 2007, at 1:31 PM, Shai Erera wrote:

> Hi
>
> Well, I have results from a 1M index for now (the index contains 100K
> documents duplicated 10 times, so it's not the final test I'll run,  
> but it
> still shows something). I ran 2000 short queries (2.4 keywords on  
> average)
> on a 1M docs index, after 50 queries warm-up. Following are the  
> results:
>
> Current TopDocCollector:
> ------------------------------------
> num queries: 2000
> numDocs=1000000
> total time: 15910 ms
> avg time: 7.955 ms
> avg. allocations: 79.7445
> total allocation time: 0
> avg. num results: 54841.26
>
> Modified TopDocCollector:
> -------------------------------------
> num queries: 2000
> numDocs=1000000
> total time: 15909 ms
> avg time: 7.9545 ms
> avg. allocations: 9.8565
> total allocation time: 0
> avg. num results: 54841.26
>
> As you can see, the actual allocation time is really negligible and  
> there
> isn't much difference in the avg. running times of the queries.  
> However, the
> *current* runs performed a lot worse at the beginning, before the  
> OS cache
> warmed up.
> The only significant difference is the number of allocations - the  
> modified
> TDC and PQ allocate ~90% (!) fewer objects. This is significant,  
> especially
> in heavily loaded systems.
>
> Tomorrow I will run the same test on a 10M (unique) documents  
> index, but I
> think the picture is getting clear ...
>
>
> On Dec 10, 2007 6:15 PM, Paul Elschot <pa...@xs4all.nl> wrote:
>
>> On Monday 10 December 2007 09:19:43 Shai Erera wrote:
>>> I agree w.r.t the current implementation, however in the worse  
>>> case (as
>> we
>>> tend to consider when talking about computer algorithms), it will
>> allocate a
>>> ScoreDoc per result. With the overflow reuse, it will not  
>>> allocate those
>>> objects, no matter what's the input it gets.
>>> Also, notice that there is a performance hit with the current
>> implementation
>>> of TopDocCollector: it first checks that the document should be  
>>> inserted
>> and
>>> if so, PQ does the same check again.
>>> So, if you change the current implementation to always attempt an
>> insert,
>>> you'd gain some performance improvements as well.
>>
>> One could consider a specialized priority queue, somewhat like
>> ScorerDocQueue, with an insert operation using the arguments
>> as in the collect() call instead a ScoreDoc object.
>> I don't know where HitQueue is currently used, but that could be
>> the place to inline PriorityQueue instead of inheriting from it.
>> Otherwise TopDocCollector could be used for inlining PriorityQueue.
>>
>> Although practical tests are always good, an extra performance test
>> with the worst case of slowly increasing score values should be quite
>> helpful for comparing with the current implementation and to get  
>> the last
>> bits of performance out of this.
>>
>> Regards,
>> Paul Elschot
>>
>>
>>>
>>> On Dec 10, 2007 10:15 AM, Paul Elschot <pa...@xs4all.nl>  
>>> wrote:
>>>
>>>> The current TopDocCollector only allocates a ScoreDoc when the  
>>>> given
>>>> score causes a new ScoreDoc to be added into the queue, but it does
>>>> not reuse anything that overflows out of the queue.
>>>> So, reusing the overflow out of the queue should reduce object
>>>> allocations. especially for indexes that tend to have better  
>>>> scoring
>>>> docs at the end. I wouldn't expect a 30% improvement out of that,
>>>> but it would help, if only to reduce occasional performance
>>>> deteriorations.
>>>>
>>>> Regards,
>>>> Paul Elschot
>>>>
>>>>
>>>> On Monday 10 December 2007 08:11:50 Shai Erera wrote:
>>>>> Hi
>>>>>
>>>>> Lucene's PQ implements two methods: put (assumes the PQ has  
>>>>> room for
>> the
>>>>> object) and insert (checks whether the object can be inserted  
>>>>> etc.).
>> The
>>>>> implementation of insert() requires the application that uses  
>>>>> it to
>>>> allocate
>>>>> a new object every time it calls insert. Specifically, it cannot
>> reuse
>>>> the
>>>>> objects that were removed from the PQ.
>>>>> I've done some measurements on the performance that search would
>> gain by
>>>>> using that method, through the change of TopDocCollector.
>>>>>
>>>>> PriorityQueue change (added insertWithOverflow method)
>>>>>
>>>>
>> --------------------------------------------------------------------- 
>> --------------
>>>>>     /**
>>>>>      * insertWithOverflow() is similar to insert(), except its
>> return
>>>> value:
>>>>> it
>>>>>      * returns the object (if any) that was dropped off the heap
>> because
>>>> it
>>>>> was
>>>>>      * full. This can be the given parameter (in case it is  
>>>>> smaller
>> than
>>>> the
>>>>>      * full heap's minimum, and couldn't be added) or another  
>>>>> object
>>>> that
>>>>> was
>>>>>      * previously the smallest value in the heap and now has been
>>>> replaced
>>>>> by a
>>>>>      * larger one.
>>>>>      */
>>>>>     public Object insertWithOverflow(Object element) {
>>>>>         if (size < maxSize) {
>>>>>             put(element);
>>>>>             return null;
>>>>>         } else if (size > 0 && !lessThan(element, top())) {
>>>>>             Object ret = heap[1];
>>>>>             heap[1] = element;
>>>>>             adjustTop();
>>>>>             return ret;
>>>>>         } else {
>>>>>             return element;
>>>>>         }
>>>>>     }
>>>>> [Very similar to insert(), only it returns the object that was
>> kicked
>>>> out of
>>>>> the Queue, or null]
>>>>>
>>>>> TopDocCollector's current implementation of collect()
>>>>>
>>>>
>> --------------------------------------------------------------------- 
>> -------
>>>>>   public void collect(int doc, float score) {
>>>>>     if (score > 0.0f) {
>>>>>       totalHits++;
>>>>>       if (hq.size() < numHits || score >= minScore) {
>>>>>         hq.insert(new ScoreDoc(doc, score));
>>>>>         minScore = ((ScoreDoc)hq.top()).score; // maintain  
>>>>> minScore
>>>>>       }
>>>>>     }
>>>>>   }
>>>>> [See how it allocates a new ScoreDoc every time this method is
>> called]
>>>>>
>>>>> TopDocCollector's new implementation of collect()
>>>>>
>>>>
>> --------------------------------------------------------------------- 
>> --------
>>>>>   public void collect(int doc, float score) {
>>>>>       if (score == 0.0f) {
>>>>>           return;
>>>>>       }
>>>>>       totalHits++;
>>>>>       if (hq.size() < numHits || score >= minScore) {
>>>>>           if (sd == null) {
>>>>>               sd = new ScoreDoc(doc, score);
>>>>>           } else {
>>>>>               sd.doc = doc;
>>>>>               sd.score = score;
>>>>>           }
>>>>>           sd = (ScoreDoc) hq.insertWithOverflow(sd);
>>>>>           minScore = ((ScoreDoc)hq.top()).score; // maintain
>> minScore
>>>>>       }
>>>>>   }
>>>>> [sd is a class memeber of the collector, of type ScoreDoc]
>>>>>
>>>>> Now for the performance tests. I've done two tests:
>>>>> 1. Calling TopDocCollector.collect 1,000,000, 10,000,000 and
>> 100,000,000
>>>>> times using both implementations. Here are the results:
>>>>>                              1M          10M          100M
>>>>> Current Collector     218 ms      1,907 ms     20,000 ms
>>>>> Modified Collector    141 ms      1,297 ms     12,672 ms
>>>>> As you can see, the new implementation is 30-40% faster.
>>>>>
>>>>> 2. Wrote some tasks in the benchmark package that makes use of the
>> new
>>>> PQ
>>>>> while executing a real search over an index with 1M documents, of
>> small
>>>> size
>>>>> (10 characters). Here are the results:
>>>>>                           Current TopDocCollector
>>>>> ---------------------------------------------------------------------------
>>>>> ------------> Report Sum By (any) Name (5 about 6 out of 6)
>>>>> Operation         round  runCnt  recsPerRun      rec/s  elapsedSec  avgUsedMem  avgTotalMem
>>>>> CreateIndex           0       1           1       10.6        0.09   2,219,112    4,389,376
>>>>> AddDocs_1000000       0       1     1000000   52,075.2       19.20  34,497,176   52,689,408
>>>>> CloseIndex            0       2           1        0.1       16.62  26,178,972   51,378,688
>>>>> OpenIndex             0       1           1    1,000.0        0.00  48,771,304   50,067,968
>>>>> --> MySearch          0       1           1        5.3        0.19  48,771,304   50,067,968
>>>>>
>>>>>
>>>>>                           Modified TopDocCollector
>>>>> ---------------------------------------------------------------------------
>>>>> ------------> Report Sum By (any) Name (5 about 6 out of 6)
>>>>> Operation         round  runCnt  recsPerRun      rec/s  elapsedSec  avgUsedMem  avgTotalMem
>>>>> CreateIndex           0       1           1       10.6        0.09   2,207,008    4,389,376
>>>>> AddDocs_1000000       0       1     1000000   50,955.4       19.62  32,531,992   44,825,088
>>>>> CloseIndex            0       2           1        0.1       16.84  57,853,148   61,929,984
>>>>> OpenIndex             0       1           1    1,000.0        0.00  57,792,136   61,929,984
>>>>> --> MySearch          0       1           1        7.1        0.14  57,939,856   61,929,984
>>>>> Notice the time difference in search (0.14s modified vs. 0.19s current).
>>>>> Here too, a 30% improvement.
>>>>>
>>>>> One thing that I wasn't able to show, but I think it's pretty much
>>>>> clear --> the new implementation causes far fewer object allocations.
>>>>> Consider the typical search for 10 results over an index with 1M
>>>>> documents. The current implementation will allocate 1M (!) ScoreDoc
>>>>> instances, while the new one will allocate 11 (10 for the PQ and 1 for
>>>>> reusing). On heavily loaded systems, this will result in far less work
>>>>> for the GC.
>>>>>
>>>>> I would like to suggest to reflect this new implementation in
>>>>> PriorityQueue and also modify TopDocCollector to make use of this new
>>>>> method. Several ways to do it:
>>>>> 1. Add insertWithOverflow to PQ (I hate the name and I'm willing to
>>>>> accept better ones), deprecate insert() (let's say in 2.3), and in the
>>>>> release after that (2.4?) rename insertWithOverflow to insert(). A
>>>>> complex process, but it won't break existing applications' code.
>>>>> 2. Change insert's signature and state that applications that move to
>>>>> 2.3 (for example) need to change their code as well (actually it won't
>>>>> compile).
>>>>> 3. Any other alternatives?
>>>>>
>>>>> And, of course, review the classes in the Lucene code that use PQ and
>>>>> modify them as necessary.
>>>>>
>>>>> A very long email for a very simple (but important) performance
>>>>> improvement.
>>>>>
>>>>> Shai
>>>>>
>>>>
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>
>>>>
>>>
>>>
>>
>>
>>
>>
>>
>
>
> -- 
> Regards,
>
> Shai Erera




Re: Performance Improvement for Search using PriorityQueue

Posted by Shai Erera <se...@gmail.com>.
Well .. I suspect this behavior is due to the nature of the index - 100K docs
duplicated 10 times. Therefore at some point it hits the same documents (and
scores). Like I said, tomorrow I'll re-run the test on a 10M unique docs
index.
I agree that 80 allocations are not much, but that's per query. There were
160,000 allocations overall, which does create some work for the GC.
Why not save those allocations?
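To make the allocation claim concrete, here is a minimal, self-contained
simulation (a sketch only: java.util.PriorityQueue stands in for Lucene's PQ,
scores increase monotonically so every hit is competitive, which is the worst
case for the current collector, and the class/method names are made up for the
example):

```java
import java.util.PriorityQueue;

public class AllocationDemo {
    static final class ScoreDoc {
        int doc; float score;
        ScoreDoc(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Current strategy: a new ScoreDoc is allocated on every insert attempt.
    static long collectAllocating(int numDocs, int numHits) {
        PriorityQueue<ScoreDoc> hq =
            new PriorityQueue<>(numHits, (a, b) -> Float.compare(a.score, b.score));
        long allocations = 0;
        float minScore = Float.NEGATIVE_INFINITY;
        for (int doc = 0; doc < numDocs; doc++) {
            float score = doc + 1;  // strictly increasing => always competitive
            if (hq.size() < numHits || score >= minScore) {
                allocations++;
                hq.offer(new ScoreDoc(doc, score));
                if (hq.size() > numHits) hq.poll();  // drop the smallest
                minScore = hq.peek().score;
            }
        }
        return allocations;
    }

    // Proposed strategy: the overflow object is kept and refilled next call.
    static long collectReusing(int numDocs, int numHits) {
        PriorityQueue<ScoreDoc> hq =
            new PriorityQueue<>(numHits, (a, b) -> Float.compare(a.score, b.score));
        long allocations = 0;
        float minScore = Float.NEGATIVE_INFINITY;
        ScoreDoc sd = null;
        for (int doc = 0; doc < numDocs; doc++) {
            float score = doc + 1;
            if (hq.size() < numHits || score >= minScore) {
                if (sd == null) { allocations++; sd = new ScoreDoc(doc, score); }
                else { sd.doc = doc; sd.score = score; }
                hq.offer(sd);
                // insertWithOverflow in two steps: take back the evicted
                // object (or null while the queue still has room).
                sd = (hq.size() > numHits) ? hq.poll() : null;
                minScore = hq.peek().score;
            }
        }
        return allocations;
    }

    public static void main(String[] args) {
        System.out.println("current: " + collectAllocating(1_000_000, 10)); // prints current: 1000000
        System.out.println("reusing: " + collectReusing(1_000_000, 10));    // prints reusing: 11
    }
}
```

The exact counts (1M vs. numHits + 1) only hold in this always-competitive
case; with random scores the gap is smaller, as the numbers above show.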

On Dec 10, 2007 10:09 PM, Michael McCandless <lu...@mikemccandless.com>
wrote:

>
> Shai Erera wrote:
>
> > Hi
> >
> > Well, I have results from a 1M index for now (the index contains 100K
> > documents duplicated 10 times, so it's not the final test I'll run,
> > but it
> > still shows something). I ran 2000 short queries (2.4 keywords on
> > average)
> > on a 1M docs index, after 50 queries warm-up. Following are the
> > results:
> >
> > Current TopDocCollector:
> > ------------------------------------
> > num queries: 2000
> > numDocs=1000000
> > total time: 15910 ms
> > avg time: 7.955 ms
> > avg. allocations: 79.7445
> > total allocation time: 0
> > avg. num results: 54841.26
>
> Avg number of allocations per query is ~80?  Ie, there were only ~80
> inserts in to the PQ, meaning it had very quickly accumulated the top
> scoring docs and then doesn't change much after that?  This seems
> surprisingly low?  Am I mis-reading this or something?
>
> > Modified TopDocCollector:
> > -------------------------------------
> > num queries: 2000
> > numDocs=1000000
> > total time: 15909 ms
> > avg time: 7.9545 ms
> > avg. allocations: 9.8565
> > total allocation time: 0
> > avg. num results: 54841.26
> >
> > As you can see, the actual allocation time is really negligible and
> > there
> > isn't much difference in the avg. running times of the queries.
> > However, the
> > *current* runs performed a lot worse at the beginning, before the
> > OS cache
> > warmed up.
>
> I'm also baffled by that difference, especially if we are only
> talking about 80 allocations per query to begin with.  Did you flush
> OS cache before running "Modified" to make sure it wasn't just using
> the cache?
>
> > The only significant difference is the number of allocations - the
> > modified
> > TDC and PQ allocate ~90% (!) less objects. This is significant,
> > especially
> > in heavy loaded systems.
>
> But, 80 allocations is really not very many to begin with?
>
> Mike
>
>
>


-- 
Regards,

Shai Erera

Re: Performance Improvement for Search using PriorityQueue

Posted by Michael McCandless <lu...@mikemccandless.com>.
Shai Erera wrote:

> Hi
>
> Well, I have results from a 1M index for now (the index contains 100K
> documents duplicated 10 times, so it's not the final test I'll run,  
> but it
> still shows something). I ran 2000 short queries (2.4 keywords on  
> average)
> on a 1M docs index, after 50 queries warm-up. Following are the  
> results:
>
> Current TopDocCollector:
> ------------------------------------
> num queries: 2000
> numDocs=1000000
> total time: 15910 ms
> avg time: 7.955 ms
> avg. allocations: 79.7445
> total allocation time: 0
> avg. num results: 54841.26

Avg number of allocations per query is ~80?  Ie, there were only ~80  
inserts in to the PQ, meaning it had very quickly accumulated the top  
scoring docs and then doesn't change much after that?  This seems  
surprisingly low?  Am I mis-reading this or something?

> Modified TopDocCollector:
> -------------------------------------
> num queries: 2000
> numDocs=1000000
> total time: 15909 ms
> avg time: 7.9545 ms
> avg. allocations: 9.8565
> total allocation time: 0
> avg. num results: 54841.26
>
> As you can see, the actual allocation time is really negligible and  
> there
> isn't much difference in the avg. running times of the queries.  
> However, the
> *current* runs performed a lot worse at the beginning, before the  
> OS cache
> warmed up.

I'm also baffled by that difference, especially if we are only  
talking about 80 allocations per query to begin with.  Did you flush  
OS cache before running "Modified" to make sure it wasn't just using  
the cache?

> The only significant difference is the number of allocations - the  
> modified
> TDC and PQ allocate ~90% (!) less objects. This is significant,  
> especially
> in heavy loaded systems.

But, 80 allocations is really not very many to begin with?

Mike



Re: Performance Improvement for Search using PriorityQueue

Posted by Daniel Naber <li...@danielnaber.de>.
On Montag, 10. Dezember 2007, Michael Busch wrote:

> Reboot your machine ;-) That's what I usually do - if there's another
> way I'd like to know as well!

On Linux (kernel 2.6.16 and later), call:

sync ; echo 3 > /proc/sys/vm/drop_caches

Regards
 Daniel

-- 
http://www.danielnaber.de



Re: Performance Improvement for Search using PriorityQueue

Posted by Shai Erera <se...@gmail.com>.
Hi

Back from the experiments lab with more results. I've used two indexes (1
and 10 million documents) and ran over the two 2000 queries. Each run was
executed 4 times and I paste here the average of the latest 3 (to eliminate
any caching that is done by the OS and to mimic systems that are already
working and therefore have some data in the OS cache). Following are the
results:

Current TopDocCollector + PQ
--------------------------------------------
Index Size         1M              10M
Avg. Time           8.519ms     289.232ms
Avg. Allocations  77.38          97.35
Avg. # results      51,113        461,019

Modified TopDocCollector + PQ
----------------------------------------------
Index Size         1M              10M
Avg. Time           9.619ms     298.197ms
Avg. Allocations  9.92           10.12
Avg. # results      51,113        461,019

Basically the results haven't changed from yesterday. There isn't any
significant difference in the execution time of both versions. The only
difference is the number of allocations.
Although the number of allocations is very small (100 for 461,000 results),
I think it should not be neglected. On systems that rely solely on memory
(such as powerful systems that are able to keep entire indexes in-memory),
the number of object allocations may be significant.

The way I see it we can do either of the following:
1. Add the method to PQ and change TDC implementation to reuse ScoreDocs. We
gain only in the number of allocations. Basically, we don't lose anything by
doing that, we only gain.
2. Add the method to PQ for applications that require it and not change
TDC's implementation. For example, applications that want to show the 10
most recent documents from a very large collection need to run a
MatchAllDocsQuery with some sorting. They may create a lot more instances of
ScoreDoc.
3. Do nothing.
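For reference, option 1 is a small change. Here is a self-contained sketch of
insertWithOverflow() on a minimal 1-based binary min-heap (it mirrors the
contract of the method I posted earlier, but this is not Lucene's actual
PriorityQueue class; the DemoQueue/ScoreDoc names are made up):

```java
public class DemoQueue {
    public static final class ScoreDoc {
        public int doc; public float score;
        public ScoreDoc(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final ScoreDoc[] heap;  // 1-based; heap[1] is the smallest element
    private int size;
    private final int maxSize;

    public DemoQueue(int maxSize) {
        this.maxSize = maxSize;
        this.heap = new ScoreDoc[maxSize + 1];
    }

    private boolean lessThan(ScoreDoc a, ScoreDoc b) { return a.score < b.score; }

    public ScoreDoc top() { return heap[1]; }
    public int size() { return size; }

    /** Unconditional add; caller guarantees there is room (Lucene's put()). */
    public void put(ScoreDoc element) {
        heap[++size] = element;
        upHeap();
    }

    /** The proposed method: add, returning whatever was displaced (or null). */
    public ScoreDoc insertWithOverflow(ScoreDoc element) {
        if (size < maxSize) {
            put(element);
            return null;
        } else if (size > 0 && !lessThan(element, top())) {
            ScoreDoc ret = heap[1];   // previous smallest falls out ...
            heap[1] = element;
            downHeap();
            return ret;               // ... and is handed back for reuse
        } else {
            return element;           // not competitive: hand it straight back
        }
    }

    private void upHeap() {
        int i = size;
        ScoreDoc node = heap[i];
        while (i > 1 && lessThan(node, heap[i >> 1])) {
            heap[i] = heap[i >> 1];
            i >>= 1;
        }
        heap[i] = node;
    }

    private void downHeap() {
        int i = 1;
        ScoreDoc node = heap[i];
        while (2 * i <= size) {
            int child = 2 * i;
            if (child < size && lessThan(heap[child + 1], heap[child])) child++;
            if (!lessThan(heap[child], node)) break;
            heap[i] = heap[child];
            i = child;
        }
        heap[i] = node;
    }

    public static void main(String[] args) {
        DemoQueue pq = new DemoQueue(3);
        for (int d = 0; d < 3; d++) {
            // While there is room, nothing is displaced.
            System.out.println(pq.insertWithOverflow(new ScoreDoc(d, d)) == null); // prints true
        }
        // Full queue, competitive score: the old minimum (score 0) comes back.
        ScoreDoc evicted = pq.insertWithOverflow(new ScoreDoc(9, 5f));
        System.out.println(evicted.score);  // prints 0.0
        // Full queue, losing score: the argument itself comes back for reuse.
        ScoreDoc rejected = pq.insertWithOverflow(new ScoreDoc(8, 0.5f));
        System.out.println(rejected.doc);   // prints 8
    }
}
```

The null return while the queue has room, plus the handed-back object once it
is full, are exactly what lets collect() keep a single reusable ScoreDoc.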

If you think I should run more tests, please let me know - I already have
the two indexes and any further tests can be performed quite immediately.

Thanks,

Shai

On Dec 10, 2007 11:46 PM, Mike Klaas <mi...@gmail.com> wrote:

> On 10-Dec-07, at 1:20 PM, Shai Erera wrote:
>
> > Thanks for the info. Too bad I use Windows ...
>
> Just allocate a bunch of memory and free it.  This is Linux, but
> something similar should work on Windows:
>
> $ vmstat -S M
> procs -----------memory----------
> r  b   swpd   free   buff  cache
> 0  0      0     45    372    786
>
> $ python -c '"a"*2000000000'
>
> $ vmstat -S M
> procs -----------memory----------
> r  b   swpd   free   buff  cache
> 0  0    463   1761      0      6
>
> -Mike
>
>
>
>


-- 
Regards,

Shai Erera

Re: Performance Improvement for Search using PriorityQueue

Posted by Mike Klaas <mi...@gmail.com>.
On 10-Dec-07, at 1:20 PM, Shai Erera wrote:

> Thanks for the info. Too bad I use Windows ...

Just allocate a bunch of memory and free it.  This is Linux, but
something similar should work on Windows:

$ vmstat -S M
procs -----------memory----------
r  b   swpd   free   buff  cache
0  0      0     45    372    786

$ python -c '"a"*2000000000'

$ vmstat -S M
procs -----------memory----------
r  b   swpd   free   buff  cache
0  0    463   1761      0      6

-Mike




Re: Performance Improvement for Search using PriorityQueue

Posted by Shai Erera <se...@gmail.com>.
Thanks for the info. Too bad I use Windows ...

On Dec 10, 2007 11:16 PM, robert engels <re...@ix.netcom.com> wrote:

> On linux, use "drop_caches", see http://linux-mm.org/Drop_Caches
> On "OS X", use "purge", which is part of the CHUD tools.
> On Windows, I think you're hosed.
>
> On Dec 10, 2007, at 3:06 PM, Michael Busch wrote:
>
> > Shai Erera wrote:
> >> No I haven't done that (to be honest, I don't know how to do
> >> that ... :-) ).
> >> That's the reason I ran both tests multiple times and reported the
> >> last run.
> >>
> >
> > Reboot your machine ;-) That's what I usually do - if there's another
> > way I'd like to know as well!
> >
> > -Michael
> >
> >
>
>
>
>


-- 
Regards,

Shai Erera

Re: Performance Improvement for Search using PriorityQueue

Posted by robert engels <re...@ix.netcom.com>.
On linux, use "drop_caches", see http://linux-mm.org/Drop_Caches
On "OS X", use "purge", which is part of the CHUD tools.
On Windows, I think you're hosed.

On Dec 10, 2007, at 3:06 PM, Michael Busch wrote:

> Shai Erera wrote:
>> No I haven't done that (to be honest, I don't know how to do  
>> that ... :-) ).
>> That's the reason I ran both tests multiple times and reported the  
>> last run.
>>
>
> Reboot your machine ;-) That's what I usually do - if there's another
> way I'd like to know as well!
>
> -Michael
>
>




Re: Performance Improvement for Search using PriorityQueue

Posted by Shai Erera <se...@gmail.com>.
I see :-). It's just that "clear up the disk cache" sounds so professional,
I assumed there's a way to do it that I don't know of ... :-)
Thanks, I'll report again with a larger index measurement.

On Dec 10, 2007 11:06 PM, Michael Busch <bu...@gmail.com> wrote:

> Shai Erera wrote:
> > No I haven't done that (to be honest, I don't know how to do that ...
> :-) ).
> > That's the reason I ran both tests multiple times and reported the last
> run.
> >
>
> Reboot your machine ;-) That's what I usually do - if there's another
> way I'd like to know as well!
>
> -Michael
>
>
>


-- 
Regards,

Shai Erera

Re: Performance Improvement for Search using PriorityQueue

Posted by Michael Busch <bu...@gmail.com>.
Shai Erera wrote:
> No I haven't done that (to be honest, I don't know how to do that ... :-) ).
> That's the reason I ran both tests multiple times and reported the last run.
> 

Reboot your machine ;-) That's what I usually do - if there's another
way I'd like to know as well!

-Michael



Re: Performance Improvement for Search using PriorityQueue

Posted by Shai Erera <se...@gmail.com>.
No I haven't done that (to be honest, I don't know how to do that ... :-) ).
That's the reason I ran both tests multiple times and reported the last run.

On Dec 10, 2007 10:24 PM, Mike Klaas <mi...@gmail.com> wrote:

> On 10-Dec-07, at 12:11 PM, Shai Erera wrote:
>
> > Actually, queries on large indexes are not necessarily I/O bound.
> > It depends
> > on how much of the posting list is being read into memory at once.
> > I'm not
> > that familiar with the inner-most of Lucene, but let's assume a
> > posting
> > element takes 4 bytes for docId and 2 more bytes per position in a
> > document
> > (that's without compression, I'm sure Lucene does some compression
> > on the
> > doc Ids). So, I think I won't miss by much by guessing that at most a
> > posting element takes 10 bytes. Which means that 1M posting
> > elements take
> > 10MB (this is considered a very long posting list).
> > Therefore if you read it into memory in chunks (16, 32, 64 KB),
> > most of the
> > time the query spends in the CPU, computing the scores, PQ etc. The
> > real IO
> > operations only involve reading fragments of the posting into
> > memory. In
> > todays hardware, reading 10MB into memory is pretty fast.
> > So I wouldn't be surprised here (unless I misunderstood you).
>
> My experience is that queries against indices which haven't been
> warmed into the OS disk cache tend to be many times slower (this is
> especially true if the prox file is used at all).
>
> I initially assumed that you had cleared the os disk cache between
> the runs of the two algorithms, and were seeing a difference in
> uncached query performance.  I assume though from your comments that
> this isn't the case at all.
>
> -Mike
>
>
>


-- 
Regards,

Shai Erera

Re: Performance Improvement for Search using PriorityQueue

Posted by Mike Klaas <mi...@gmail.com>.
On 10-Dec-07, at 12:11 PM, Shai Erera wrote:

> Actually, queries on large indexes are not necessarily I/O bound.  
> It depends
> on how much of the posting list is being read into memory at once.  
> I'm not
> that familiar with the inner-most of Lucene, but let's assume a  
> posting
> element takes 4 bytes for docId and 2 more bytes per position in a  
> document
> (that's without compression, I'm sure Lucene does some compression  
> on the
> doc Ids). So, I think I won't miss by much by guessing that at most a
> posting element takes 10 bytes. Which means that 1M posting  
> elements take
> 10MB (this is considered a very long posting list).
> Therefore if you read it into memory in chunks (16, 32, 64 KB),  
> most of the
> time the query spends in the CPU, computing the scores, PQ etc. The  
> real IO
> operations only involve reading fragments of the posting into  
> memory. In
> todays hardware, reading 10MB into memory is pretty fast.
> So I wouldn't be surprised here (unless I misunderstood you).

My experience is that queries against indices which haven't been
warmed into the OS disk cache tend to be many times slower (this is
especially true if the prox file is used at all).

I initially assumed that you had cleared the os disk cache between  
the runs of the two algorithms, and were seeing a difference in  
uncached query performance.  I assume though from your comments that  
this isn't the case at all.

-Mike



Re: Performance Improvement for Search using PriorityQueue

Posted by robert engels <re...@ix.netcom.com>.
I think you might be underestimating the IO cost a bit.

A modern drive does 40-70 mb per second, but that is sequential  
reads. The seek time (9 ms is good) can kill you.

Because of disk fragmentation, there is no guarantee that the posting  
is sequential on the disk.  Obviously the OS and drive controller  
read-ahead trying to improve things, but I think you would be  
surprised how much IO affects things.

A lot of this is going to depend on the size of the index. If it all  
fits in the disk cache, repeated reads (even by multiple threads) is  
going to be very fast.


On Dec 10, 2007, at 2:11 PM, Shai Erera wrote:

> Actually, queries on large indexes are not necessarily I/O bound.  
> It depends
> on how much of the posting list is being read into memory at once.  
> I'm not
> that familiar with the inner-most of Lucene, but let's assume a  
> posting
> element takes 4 bytes for docId and 2 more bytes per position in a  
> document
> (that's without compression, I'm sure Lucene does some compression  
> on the
> doc Ids). So, I think I won't miss by much by guessing that at most a
> posting element takes 10 bytes. Which means that 1M posting  
> elements take
> 10MB (this is considered a very long posting list).
> Therefore if you read it into memory in chunks (16, 32, 64 KB),  
> most of the
> time the query spends in the CPU, computing the scores, PQ etc. The  
> real IO
> operations only involve reading fragments of the posting into  
> memory. In
> todays hardware, reading 10MB into memory is pretty fast.
> So I wouldn't be surprised here (unless I misunderstood you).
>
>> I agree; this is desirable.
> I will run the test with 10M documents tomorrow and then if the  
> results are
> the same will open an issue. Is that agreed?
>
> On Dec 10, 2007 9:59 PM, Mike Klaas <mi...@gmail.com> wrote:
>
>> On 10-Dec-07, at 11:31 AM, Shai Erera wrote:
>>
>>> As you can see, the actual allocation time is really negligible and
>>> there
>>> isn't much difference in the avg. running times of the queries.
>>> However, the
>>> *current* runs performed a lot worse at the beginning, before the
>>> OS cache
>>> warmed up.
>>
>> This surprises me.  I would have thought that the query execution at
>> this point would be IO-bound, hence _less_ susceptible to a little
>> extra gc.
>>
>>> The only significant difference is the number of allocations - the
>>> modified
>>> TDC and PQ allocate ~90% (!) less objects. This is significant,
>>> especially
>>> in heavy loaded systems.
>>
>> I agree; this is desirable.
>>
>> nice work,
>> -Mike
>>
>>
>>
>>
>>
>
>
> -- 
> Regards,
>
> Shai Erera




Re: Performance Improvement for Search using PriorityQueue

Posted by Shai Erera <se...@gmail.com>.
Actually, queries on large indexes are not necessarily I/O bound. It depends
on how much of the posting list is being read into memory at once. I'm not
that familiar with the innermost details of Lucene, but let's assume a posting
element takes 4 bytes for docId and 2 more bytes per position in a document
(that's without compression, I'm sure Lucene does some compression on the
doc Ids). So, I think I won't miss by much by guessing that at most a
posting element takes 10 bytes. Which means that 1M posting elements take
10MB (this is considered a very long posting list).
Therefore if you read it into memory in chunks (16, 32, 64 KB), most of the
time the query spends in the CPU, computing the scores, PQ etc. The real IO
operations only involve reading fragments of the posting into memory. In
todays hardware, reading 10MB into memory is pretty fast.
So I wouldn't be surprised here (unless I misunderstood you).
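As a back-of-the-envelope check of those numbers (illustrative arithmetic
only: real postings are compressed, and the 50 MB/s figure is just an assumed
sequential throughput picked from the 40-70 MB/s range mentioned elsewhere in
the thread):

```java
public class PostingListEstimate {
    /** Total bytes for `postings` posting elements at `bytesPerPosting` each. */
    static long postingBytes(long postings, long bytesPerPosting) {
        return postings * bytesPerPosting;
    }

    /** Milliseconds to read `bytes` sequentially at `mbPerSec` (decimal MB). */
    static double readMillis(long bytes, double mbPerSec) {
        return bytes / (mbPerSec * 1_000_000) * 1000.0;
    }

    public static void main(String[] args) {
        // 10 bytes per element: 4 for the docId plus ~6 for positions (uncompressed guess).
        long totalBytes = postingBytes(1_000_000L, 10L);
        System.out.println(totalBytes / 1_000_000 + " MB");  // prints 10 MB
        System.out.printf("~%.0f ms of sequential I/O%n", readMillis(totalBytes, 50.0));
    }
}
```

So a worst-case posting list costs on the order of a couple hundred
milliseconds of sequential I/O, and far less once the OS cache is warm, which
is why the query is mostly CPU-bound after warm-up. Seek time on a fragmented
index can of course push this up considerably.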

> I agree; this is desirable.
I will run the test with 10M documents tomorrow and then if the results are
the same will open an issue. Is that agreed?

On Dec 10, 2007 9:59 PM, Mike Klaas <mi...@gmail.com> wrote:

> On 10-Dec-07, at 11:31 AM, Shai Erera wrote:
>
> > As you can see, the actual allocation time is really negligible and
> > there
> > isn't much difference in the avg. running times of the queries.
> > However, the
> > *current* runs performed a lot worse at the beginning, before the
> > OS cache
> > warmed up.
>
> This surprises me.  I would have thought that the query execution at
> this point would be IO-bound, hence _less_ susceptible to a little
> extra gc.
>
> > The only significant difference is the number of allocations - the
> > modified
> > TDC and PQ allocate ~90% (!) less objects. This is significant,
> > especially
> > in heavy loaded systems.
>
> I agree; this is desirable.
>
> nice work,
> -Mike
>
>
>
>
>


-- 
Regards,

Shai Erera

Re: Performance Improvement for Search using PriorityQueue

Posted by Mike Klaas <mi...@gmail.com>.
On 10-Dec-07, at 11:31 AM, Shai Erera wrote:

> As you can see, the actual allocation time is really negligible and  
> there
> isn't much difference in the avg. running times of the queries.  
> However, the
> *current* runs performed a lot worse at the beginning, before the  
> OS cache
> warmed up.

This surprises me.  I would have thought that the query execution at  
this point would be IO-bound, hence _less_ susceptible to a little
extra gc.

> The only significant difference is the number of allocations - the  
> modified
> TDC and PQ allocate ~90% (!) less objects. This is significant,  
> especially
> in heavy loaded systems.

I agree; this is desirable.

nice work,
-Mike





Re: Performance Improvement for Search using PriorityQueue

Posted by Shai Erera <se...@gmail.com>.
Hi

Well, I have results from a 1M index for now (the index contains 100K
documents duplicated 10 times, so it's not the final test I'll run, but it
still shows something). I ran 2000 short queries (2.4 keywords on average)
on a 1M docs index, after 50 queries warm-up. Following are the results:

Current TopDocCollector:
------------------------------------
num queries: 2000
numDocs=1000000
total time: 15910 ms
avg time: 7.955 ms
avg. allocations: 79.7445
total allocation time: 0
avg. num results: 54841.26

Modified TopDocCollector:
-------------------------------------
num queries: 2000
numDocs=1000000
total time: 15909 ms
avg time: 7.9545 ms
avg. allocations: 9.8565
total allocation time: 0
avg. num results: 54841.26

As you can see, the actual allocation time is really negligible and there
isn't much difference in the avg. running times of the queries. However, the
*current* runs performed a lot worse at the beginning, before the OS cache
warmed up.
The only significant difference is the number of allocations - the modified
TDC and PQ allocate ~90% (!) fewer objects. This is significant, especially
in heavily loaded systems.

Tomorrow I will run the same test on a 10M (unique) documents index, but I
think the picture is getting clear ...


On Dec 10, 2007 6:15 PM, Paul Elschot <pa...@xs4all.nl> wrote:

> On Monday 10 December 2007 09:19:43 Shai Erera wrote:
> > I agree w.r.t the current implementation, however in the worst case (as
> > we tend to consider when talking about computer algorithms), it will
> > allocate a ScoreDoc per result. With the overflow reuse, it will not
> > allocate those objects, no matter what input it gets.
> > Also, notice that there is a performance hit with the current
> > implementation of TopDocCollector: it first checks that the document
> > should be inserted and if so, PQ does the same check again.
> > So, if you change the current implementation to always attempt an
> > insert, you'd gain some performance improvements as well.
>
> One could consider a specialized priority queue, somewhat like
> ScorerDocQueue, with an insert operation using the arguments
> as in the collect() call instead of a ScoreDoc object.
> I don't know where HitQueue is currently used, but that could be
> the place to inline PriorityQueue instead of inheriting from it.
> Otherwise TopDocCollector could be used for inlining PriorityQueue.
>
> Although practical tests are always good, an extra performance test
> with the worst case of slowly increasing score values should be quite
> helpful for comparing with the current implementation and to get the last
> bits of performance out of this.
>
> Regards,
> Paul Elschot
>
>
> >
> > [earlier messages quoted in full; snipped]


-- 
Regards,

Shai Erera

Re: Performance Improvement for Search using PriorityQueue

Posted by Paul Elschot <pa...@xs4all.nl>.
On Monday 10 December 2007 09:19:43 Shai Erera wrote:
> I agree w.r.t the current implementation, however in the worse case (as we
> tend to consider when talking about computer algorithms), it will allocate a
> ScoreDoc per result. With the overflow reuse, it will not allocate those
> objects, no matter what's the input it gets.
> Also, notice that there is a performance hit with the current implementation
> of TopDocCollector: it first checks that the document should be inserted and
> if so, PQ does the same check again.
> So, if you change the current implementation to always attempt an insert,
> you'd gain some performance improvements as well.

One could consider a specialized priority queue, somewhat like
ScorerDocQueue, with an insert operation using the arguments
as in the collect() call instead of a ScoreDoc object.
I don't know where HitQueue is currently used, but that could be
the place to inline PriorityQueue instead of inheriting from it.
Otherwise TopDocCollector could be used for inlining PriorityQueue.
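As a minimal sketch of what such a specialized queue could look like: parallel primitive arrays and an insert() that takes the collect() arguments directly, so no per-hit object is ever created. The class and method names below are illustrative only, not Lucene's actual ScorerDocQueue or HitQueue API:

```java
// Illustrative only: a queue specialized to (doc, score) pairs held in
// parallel primitive arrays, so collect() can call insert(doc, score)
// without ever allocating a ScoreDoc. The heap is 1-based, as in Lucene's
// PriorityQueue; the smallest score sits at slot 1.
class PrimitiveHitQueue {
    private final int[] docs;
    private final float[] scores;
    private final int maxSize;
    private int size;

    PrimitiveHitQueue(int maxSize) {
        this.maxSize = maxSize;
        docs = new int[maxSize + 1];     // slot 0 unused
        scores = new float[maxSize + 1];
    }

    int size() { return size; }
    int topDoc() { return docs[1]; }
    float topScore() { return scores[1]; }

    /** Accepts the hit if there is room or it beats the current minimum. */
    boolean insert(int doc, float score) {
        if (size < maxSize) {
            size++;
            docs[size] = doc;
            scores[size] = score;
            upHeap();
            return true;
        } else if (size > 0 && score > scores[1]) {
            docs[1] = doc;               // overwrite the minimum in place
            scores[1] = score;
            downHeap();
            return true;
        }
        return false;                    // hit did not make the cut
    }

    private void upHeap() {
        int i = size, d = docs[i];
        float s = scores[i];
        for (int j = i >>> 1; j > 0 && s < scores[j]; i = j, j = j >>> 1) {
            docs[i] = docs[j];           // shift parent down
            scores[i] = scores[j];
        }
        docs[i] = d;
        scores[i] = s;
    }

    private void downHeap() {
        int i = 1, d = docs[i];
        float s = scores[i];
        int j = 2;
        if (j + 1 <= size && scores[j + 1] < scores[j]) j++;
        while (j <= size && scores[j] < s) {
            docs[i] = docs[j];           // shift smaller child up
            scores[i] = scores[j];
            i = j;
            j = i << 1;
            if (j + 1 <= size && scores[j + 1] < scores[j]) j++;
        }
        docs[i] = d;
        scores[i] = s;
    }
}
```

With something like this, collect(doc, score) reduces to a single q.insert(doc, score) call with zero per-hit allocation.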

Although practical tests are always good, an extra performance test
with the worst case of slowly increasing score values should be quite
helpful for comparing with the current implementation and to get the last
bits of performance out of this.
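That worst case can be illustrated directly: with ascending scores every hit passes the minScore check, so the current collect() allocates one ScoreDoc per hit while the reuse variant stays at numHits + 1. The classes below are small stand-ins for Lucene's, written just to count allocations:

```java
// Illustrative stand-ins, not Lucene's classes: count ScoreDoc allocations
// in the worst case where scores arrive in ascending order, so every hit
// beats the queue minimum and triggers an insertion.
import java.util.PriorityQueue;

class AscendingScoreWorstCase {
    static class ScoreDoc {
        int doc;
        float score;
        static int allocated;            // global allocation counter
        ScoreDoc(int doc, float score) {
            this.doc = doc;
            this.score = score;
            allocated++;
        }
    }

    /** Simulates collect() over totalDocs hits; returns the allocation count. */
    static int countAllocations(int numHits, int totalDocs, boolean reuse) {
        ScoreDoc.allocated = 0;
        PriorityQueue<ScoreDoc> hq = new PriorityQueue<ScoreDoc>(
                numHits, (a, b) -> Float.compare(a.score, b.score));
        ScoreDoc spare = null;           // the object that overflowed out of the queue
        float minScore = 0f;
        for (int doc = 0; doc < totalDocs; doc++) {
            float score = doc + 1f;      // ascending scores: the worst case
            if (hq.size() < numHits || score >= minScore) {
                ScoreDoc sd;
                if (reuse && spare != null) {
                    sd = spare;          // recycle the overflow, as insertWithOverflow allows
                    sd.doc = doc;
                    sd.score = score;
                } else {
                    sd = new ScoreDoc(doc, score);
                }
                hq.offer(sd);
                // JDK queue is unbounded, so emulate the bounded heap by
                // dropping the smallest element once we exceed numHits.
                spare = hq.size() > numHits ? hq.poll() : null;
                minScore = hq.peek().score;
            }
        }
        return ScoreDoc.allocated;
    }
}
```

For 1,000 hits and numHits = 10 this gives 1,000 allocations without reuse versus 11 with it, matching the numRequestedHits + 1 bound.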

Regards,
Paul Elschot


> 
> On Dec 10, 2007 10:15 AM, Paul Elschot <pa...@xs4all.nl> wrote:
> 
> > The current TopDocCollector only allocates a ScoreDoc when the given
> > score causes a new ScoreDoc to be added into the queue, but it does
> > not reuse anything that overflows out of the queue.
> > So, reusing the overflow out of the queue should reduce object
> > allocations, especially for indexes that tend to have better scoring
> > docs at the end. I wouldn't expect a 30% improvement out of that,
> > but it would help, if only to reduce occasional performance
> > deteriorations.
> >
> > Regards,
> > Paul Elschot
> >
> >
> > On Monday 10 December 2007 08:11:50 Shai Erera wrote:
> > > Hi
> > >
> > > Lucene's PQ implements two methods: put (assumes the PQ has room for the
> > > object) and insert (checks whether the object can be inserted etc.). The
> > > implementation of insert() requires the application that uses it to
> > > allocate a new object every time it calls insert. Specifically, it
> > > cannot reuse the objects that were removed from the PQ.
> > > I've done some measurements on the performance that search would gain by
> > > using that method, through the change of TopDocCollector.
> > >
> > > PriorityQueue change (added insertWithOverflow method)
> > >
> > > -----------------------------------------------------------------------------------
> > >     /**
> > >      * insertWithOverflow() is similar to insert(), except its return value:
> > >      * it returns the object (if any) that was dropped off the heap because it
> > >      * was full. This can be the given parameter (in case it is smaller than the
> > >      * full heap's minimum, and couldn't be added) or another object that was
> > >      * previously the smallest value in the heap and now has been replaced by a
> > >      * larger one.
> > >      */
> > >     public Object insertWithOverflow(Object element) {
> > >         if (size < maxSize) {
> > >             put(element);
> > >             return null;
> > >         } else if (size > 0 && !lessThan(element, top())) {
> > >             Object ret = heap[1];
> > >             heap[1] = element;
> > >             adjustTop();
> > >             return ret;
> > >         } else {
> > >             return element;
> > >         }
> > >     }
> > > [Very similar to insert(), only it returns the object that was kicked out of
> > > the Queue, or null]
> > >
> > > TopDocCollector's current implementation of collect()
> > >
> > ----------------------------------------------------------------------------
> > >   public void collect(int doc, float score) {
> > >     if (score > 0.0f) {
> > >       totalHits++;
> > >       if (hq.size() < numHits || score >= minScore) {
> > >         hq.insert(new ScoreDoc(doc, score));
> > >         minScore = ((ScoreDoc)hq.top()).score; // maintain minScore
> > >       }
> > >     }
> > >   }
> > > [See how it allocates a new ScoreDoc every time this method is called]
> > >
> > > TopDocCollector's new implementation of collect()
> > >
> > -----------------------------------------------------------------------------
> > >   public void collect(int doc, float score) {
> > >       if (score == 0.0f) {
> > >           return;
> > >       }
> > >       totalHits++;
> > >       if (hq.size() < numHits || score >= minScore) {
> > >           if (sd == null) {
> > >               sd = new ScoreDoc(doc, score);
> > >           } else {
> > >               sd.doc = doc;
> > >               sd.score = score;
> > >           }
> > >           sd = (ScoreDoc) hq.insertWithOverflow(sd);
> > >           minScore = ((ScoreDoc)hq.top()).score; // maintain minScore
> > >       }
> > >   }
> > > [sd is a class member of the collector, of type ScoreDoc]
> > >
> > > Now for the performance tests. I've done two tests:
> > > 1. Calling TopDocCollector.collect 1,000,000, 10,000,000 and 100,000,000
> > > times using both implementations. Here are the results:
> > >                               1M         10M          100M
> > > Current Collector      218 ms   1,907ms    20,000ms
> > > Modified Collector    141 ms    1,297ms   12,672ms
> > > As you can see, the new implementation is 30-40% faster.
> > >
> > > 2. Wrote some tasks in the benchmark package that make use of the new
> > > PQ while executing a real search over an index with 1M documents, of
> > > small size (10 characters). Here are the results:
> > >                           Current TopDocCollector
> > >
> > -----------------------------------------------------------------------------------------------------------------------------------------------
> > > ------------> Report Sum By (any) Name (5 about 6 out of 6)
> > > Operation       round   runCnt   recsPerRun        rec/s  elapsedSec
> > > avgUsedMem    avgTotalMem
> > > CreateIndex         0        1            1         10.6        0.09
> > > 2,219,112      4,389,376
> > > AddDocs_1000000 -   0 -  -   1 -  - 1000000 -   52,075.2 -  -  19.20 -
> > > 34,497,176 -   52,689,408
> > > CloseIndex          0        2            1          0.1       16.62
> > > 26,178,972     51,378,688
> > > OpenIndex -  -  -   0 -  -   1 -  -  -  - 1 -  - 1,000.0 -  -   0.00 -
> > > 48,771,304 -   50,067,968
> > > --> MySearch            0        1            1          5.3        0.19
> > > 48,771,304     50,067,968
> > >
> > >
> > >                           Modified TopDocCollector
> > >
> > -----------------------------------------------------------------------------------------------------------------------------------------------
> > > ------------> Report Sum By (any) Name (5 about 6 out of 6)
> > > Operation       round   runCnt   recsPerRun        rec/s  elapsedSec
> > > avgUsedMem    avgTotalMem
> > > CreateIndex         0        1            1         10.6        0.09
> > > 2,207,008      4,389,376
> > > AddDocs_1000000 -   0 -  -   1 -  - 1000000 -   50,955.4 -  -  19.62 -
> > > 32,531,992 -   44,825,088
> > > CloseIndex          0        2            1          0.1       16.84
> > > 57,853,148     61,929,984
> > > OpenIndex -  -  -   0 -  -   1 -  -  -  - 1 -  - 1,000.0 -  -   0.00 -
> > > 57,792,136 -   61,929,984
> > > --> MySearch            0        1            1          7.1        0.14
> > > 57,939,856     61,929,984
> > > Notice the time difference in search (0.14 modified vs. 0.19 current).
> > > Here too, a 30% improvement.
> > >
> > > One thing that I wasn't able to show, but I think it's pretty much
> > > clear --> the new implementation causes a lot fewer object allocations.
> > > Consider the typical search for 10 results over an index with 1M
> > > documents. The current implementation will allocate 1M (!) ScoreDoc
> > > instances, while the new one will allocate 11 (10 for the PQ and 1 for
> > > reusing). On heavily loaded systems, this will result in far less work
> > > for the GC.
> > >
> > > I would like to suggest to reflect this new implementation in
> > > PriorityQueue and also modify TopDocCollector to make use of this new
> > > method. Several ways to do it:
> > > 1. Add insertWithOverflow to PQ (I hate the name and I'm willing to
> > > accept better ones), deprecate insert() (let's say in 2.3) and, in the
> > > release after that (2.4?), rename insertWithOverflow to insert(). A
> > > complex process, but it won't break existing applications' code.
> > > 2. Change insert's signature and state that applications that move to
> > > 2.3 (for example) need to change their code as well (actually it won't
> > > compile).
> > > 3. Any other alternatives?
> > >
> > > And .. of course, review the classes in the Lucene code that use PQ and
> > > modify them as necessary.
> > >
> > > A very long email for a very simple (but important) performance
> > > improvement.
> > >
> > > Shai
> > >
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-dev-help@lucene.apache.org
> >
> >
> 
> 



---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Performance Improvement for Search using PriorityQueue

Posted by Michael McCandless <lu...@mikemccandless.com>.
OK, sounds like a plan, thanks!

Yes, contrib/benchmark has EnwikiDocMaker to generate docs off the  
Wikipedia XML export file.

Mike

On Dec 10, 2007, at 7:03 AM, Shai Erera wrote:

> [full message quoted; snipped]




Re: Performance Improvement for Search using PriorityQueue

Posted by Shai Erera <se...@gmail.com>.
I have access to TREC. I can try this.
W.r.t. the large indexes - I don't have access to the data, just scenarios
our customers ran into in the past.
Does the benchmark package include code to crawl Wikipedia? If not, do you
have such code? I don't want to write it from scratch for this specific
task.

On Dec 10, 2007 1:50 PM, Michael McCandless <lu...@mikemccandless.com>
wrote:

> [full message quoted; snipped]


-- 
Regards,

Shai Erera

Re: Performance Improvement for Search using PriorityQueue

Posted by Doron Cohen <DO...@il.ibm.com>.
I have a TREC index of 25M docs that I can try this
with. Shai, if you can create an issue and upload a
patch that I (and others) can experiment with, I
will try a few queries on this index with and
without your patch...

Doron

Michael McCandless <lu...@mikemccandless.com> wrote on 10/12/2007
13:50:53:

> [full message quoted; snipped]




Re: Performance Improvement for Search using PriorityQueue

Posted by Michael McCandless <lu...@mikemccandless.com>.
I don't offhand.  Working on the indexing side is so much easier :)

You mentioned your experience with large indices & large result sets  
-- is that something you could draw on?

There have also been discussions lately about finding real search  
logs we could use for exactly this reason, though I don't think  
that's come to a good solution yet.

As a simple test you could break Wikipedia into smallish docs (~4K  
each = ~2.1 million docs), build the index, and make up a set of  
queries, or randomly pick terms for queries?  Obviously the queries  
aren't "real", but it's at least a step closer.... progress not  
perfection.

Or, if you have access to TREC...

Mike

Shai Erera wrote:

> Do you have a dataset and queries I can test on?
> [earlier messages quoted; snipped]




Re: Performance Improvement for Search using PriorityQueue

Posted by Shai Erera <se...@gmail.com>.
Do you have a dataset and queries I can test on?

On Dec 10, 2007 1:16 PM, Michael McCandless <lu...@mikemccandless.com>
wrote:

> Shai Erera wrote:
>
> > No - I didn't try to populate an index with real data and run real
> > queries
> > (what is "real" after all?). I know from my experience of indexes with
> > several millions of documents where there are queries with several
> > hundred
> > thousands results (one query even hit 2.5 M documents). This is
> > typical in
> > search: users type on average 2.3 terms in a query. The chances
> > you'd hit a
> > query with huge result set are not that small in such cases (I'm
> > not saying
> > this is the most common case though, I agree that most of the
> > searches don't
> > process that many documents).
>
> Agreed: many queries do hit a great many results.  But I agree with
> Paul:
> it's not clear how this "typically" translates into how many ScoreDocs
> get created?
>
> > However, this change will improve performance from an algorithmic
> > point of view - you allocate at most numRequestedHits+1 ScoreDocs no
> > matter how many documents your query processes.
>
> It's definitely a good step forward: not creating extra garbage in hot
> spots is worthwhile, so I think we should make this change.  Still I'm
> wondering how much this helps in practice.
>
> I think benchmarking on "real" use cases (vs synthetic tests) is
> worthwhile: it keeps you focused on what really counts, in the end.
>
> In this particular case there are at least 2 things it could show us:
>
>   * How many ScoreDocs really get created, or, what percentage of hits
>     actually result in an insertion into the PQ?
>
>   * How much is this saving as a percentage of the overall time spent
>     searching?
>
> Mike
>
>
>


-- 
Regards,

Shai Erera

Re: Performance Improvement for Search using PriorityQueue

Posted by Michael McCandless <lu...@mikemccandless.com>.
Shai Erera wrote:

> No - I didn't try to populate an index with real data and run real
> queries (what is "real" after all?). I know from my experience of
> indexes with several million documents where there are queries with
> several hundred thousand results (one query even hit 2.5M documents).
> This is typical in search: users type on average 2.3 terms per query.
> The chances you'd hit a query with a huge result set are not that small
> in such cases (I'm not saying this is the most common case though; I
> agree that most searches don't process that many documents).

Agreed: many queries do hit a great many results.  But I agree with  
Paul:
it's not clear how this "typically" translates into how many ScoreDocs
get created?

> However, this change will improve performance from an algorithmic point
> of view - you allocate at most numRequestedHits+1 ScoreDocs no matter
> how many documents your query processes.

It's definitely a good step forward: not creating extra garbage in hot
spots is worthwhile, so I think we should make this change.  Still I'm
wondering how much this helps in practice.

I think benchmarking on "real" use cases (vs synthetic tests) is
worthwhile: it keeps you focused on what really counts, in the end.

In this particular case there are at least 2 things it could show us:

   * How many ScoreDocs really get created, or, what percentage of hits
     actually result in an insertion into the PQ?

   * How much is this saving as a percentage of the overall time spent
     searching?

Mike



Re: Performance Improvement for Search using PriorityQueue

Posted by Shai Erera <se...@gmail.com>.
No - I didn't try to populate an index with real data and run real queries
(what is "real" after all?). I know from my experience of indexes with
several million documents where there are queries with several hundred
thousand results (one query even hit 2.5M documents). This is typical in
search: users type on average 2.3 terms per query. The chances you'd hit a
query with a huge result set are not that small in such cases (I'm not
saying this is the most common case though; I agree that most searches
don't process that many documents).
However, this change will improve performance from an algorithmic point of
view - you allocate at most numRequestedHits+1 ScoreDocs no matter how many
documents your query processes.
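To make that allocation bound concrete, here is a self-contained sketch.
The ScoreDoc and queue classes below are minimal stand-ins written for this
illustration (they mirror the names in the patch but are not Lucene's
classes); the queue's insertWithOverflow hands back the evicted or rejected
object so collect-style code can reuse a single instance:

```java
// Minimal stand-ins for Lucene's ScoreDoc and PriorityQueue, written only to
// demonstrate the allocation bound; not Lucene code.
class ScoreDoc {
    int doc; float score;
    static int allocations;                       // instrumentation only
    ScoreDoc(int doc, float score) { this.doc = doc; this.score = score; allocations++; }
}

class HitQueue {
    private final ScoreDoc[] heap;                // 1-based min-heap on score
    private final int maxSize;
    private int size;

    HitQueue(int maxSize) { this.maxSize = maxSize; this.heap = new ScoreDoc[maxSize + 1]; }

    private boolean lessThan(ScoreDoc a, ScoreDoc b) { return a.score < b.score; }

    private void put(ScoreDoc e) {                // assumes there is room
        heap[++size] = e;
        for (int i = size; i > 1 && lessThan(heap[i], heap[i >> 1]); i >>= 1) {
            ScoreDoc t = heap[i]; heap[i] = heap[i >> 1]; heap[i >> 1] = t;
        }
    }

    private void adjustTop() {                    // sift the new root down
        int i = 1;
        while (true) {
            int c = 2 * i;
            if (c > size) break;
            if (c + 1 <= size && lessThan(heap[c + 1], heap[c])) c++;
            if (!lessThan(heap[c], heap[i])) break;
            ScoreDoc t = heap[i]; heap[i] = heap[c]; heap[c] = t;
            i = c;
        }
    }

    // The proposed method: hands back the evicted (or rejected) object for reuse.
    ScoreDoc insertWithOverflow(ScoreDoc element) {
        if (size < maxSize) { put(element); return null; }
        if (size > 0 && !lessThan(element, heap[1])) {
            ScoreDoc ret = heap[1]; heap[1] = element; adjustTop(); return ret;
        }
        return element;
    }
}

public class ReuseDemo {
    // Collects totalDocs pseudo-random hits into a queue of numHits and
    // reports how many ScoreDocs were constructed along the way.
    static int countAllocations(int numHits, int totalDocs) {
        ScoreDoc.allocations = 0;
        HitQueue hq = new HitQueue(numHits);
        ScoreDoc sd = null;                       // the single reusable instance
        java.util.Random rnd = new java.util.Random(42);
        for (int doc = 0; doc < totalDocs; doc++) {
            float score = rnd.nextFloat();
            if (sd == null) sd = new ScoreDoc(doc, score);
            else { sd.doc = doc; sd.score = score; }
            sd = hq.insertWithOverflow(sd);
        }
        return ScoreDoc.allocations;
    }

    public static void main(String[] args) {
        // numHits + 1 allocations for 100,000 collected hits
        System.out.println(countAllocations(10, 100_000));
    }
}
```

The queue fills with numHits fresh objects; after that insertWithOverflow
always returns a non-null object, so exactly one more is ever allocated.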

On Dec 10, 2007 10:59 AM, Michael McCandless <lu...@mikemccandless.com>
wrote:

>
> Have you done any tests on real queries to see what impact this
> improvement has in practice?  Or, to measure how many ScoreDocs are
> "typically" allocated?
>
> Mike
>
> Shai Erera wrote:
>
> > I agree w.r.t. the current implementation; however, in the worst case
> > (as we tend to consider when talking about computer algorithms), it
> > will allocate a ScoreDoc per result. With the overflow reuse, it will
> > not allocate those objects, no matter what input it gets.
> > Also, notice that there is a performance hit with the current
> > implementation of TopDocCollector: it first checks that the document
> > should be inserted, and if so, PQ does the same check again.
> > So, if you change the current implementation to always attempt an
> > insert, you'd gain some performance improvement as well.
> >
> > On Dec 10, 2007 10:15 AM, Paul Elschot <pa...@xs4all.nl> wrote:
> >
> >> The current TopDocCollector only allocates a ScoreDoc when the given
> >> score causes a new ScoreDoc to be added into the queue, but it does
> >> not reuse anything that overflows out of the queue.
> >> So, reusing the overflow out of the queue should reduce object
>> allocations, especially for indexes that tend to have better-scoring
> >> docs at the end. I wouldn't expect a 30% improvement out of that,
> >> but it would help, if only to reduce occasional performance
> >> deteriorations.
> >>
> >> Regards,
> >> Paul Elschot
> >>
> >>
> >> On Monday 10 December 2007 08:11:50 Shai Erera wrote:
> >>> Hi
> >>>
> >>> Lucene's PQ implements two methods: put (assumes the PQ has room
> >>> for the
> >>> object) and insert (checks whether the object can be inserted
> >>> etc.). The
> >>> implementation of insert() requires the application that uses it to
> >> allocate
> >>> a new object every time it calls insert. Specifically, it cannot
> >>> reuse
> >> the
> >>> objects that were removed from the PQ.
> >>> I've done some measurements on the performance that search would
> >>> gain by
> >>> using that method, through the change of TopDocCollector.
> >>>
> >>> PriorityQueue change (added insertWithOverflow method)
> >>>
> >> ---------------------------------------------------------------------
> >> --------------
> >>>     /**
> >>>      * insertWithOverflow() is similar to insert(), except its
> >>> return
> >> value:
> >>> it
> >>>      * returns the object (if any) that was dropped off the heap
> >>> because
> >> it
> >>> was
> >>>      * full. This can be the given parameter (in case it is
> >>> smaller than
> >> the
> >>>      * full heap's minimum, and couldn't be added) or another object
> >> that
> >>> was
> >>>      * previously the smallest value in the heap and now has been
> >> replaced
> >>> by a
> >>>      * larger one.
> >>>      */
> >>>     public Object insertWithOverflow(Object element) {
> >>>         if (size < maxSize) {
> >>>             put(element);
> >>>             return null;
> >>>         } else if (size > 0 && !lessThan(element, top())) {
> >>>             Object ret = heap[1];
> >>>             heap[1] = element;
> >>>             adjustTop();
> >>>             return ret;
> >>>         } else {
> >>>             return element;
> >>>         }
> >>>     }
> >>> [Very similar to insert(), only it returns the object that was
> >>> kicked
> >> out of
> >>> the Queue, or null]
> >>>
> >>> TopDocCollector's current implementation of collect()
> >>>
> >> ---------------------------------------------------------------------
> >> -------
> >>>   public void collect(int doc, float score) {
> >>>     if (score > 0.0f) {
> >>>       totalHits++;
> >>>       if (hq.size() < numHits || score >= minScore) {
> >>>         hq.insert(new ScoreDoc(doc, score));
> >>>         minScore = ((ScoreDoc)hq.top()).score; // maintain minScore
> >>>       }
> >>>     }
> >>>   }
> >>> [See how it allocates a new ScoreDoc every time this method is
> >>> called]
> >>>
> >>> TopDocCollector's new implementation of collect()
> >>>
> >> ---------------------------------------------------------------------
> >> --------
> >>>   public void collect(int doc, float score) {
> >>>       if (score == 0.0f) {
> >>>           return;
> >>>       }
> >>>       totalHits++;
> >>>       if (hq.size() < numHits || score >= minScore) {
> >>>           if (sd == null) {
> >>>               sd = new ScoreDoc(doc, score);
> >>>           } else {
> >>>               sd.doc = doc;
> >>>               sd.score = score;
> >>>           }
> >>>           sd = (ScoreDoc) hq.insertWithOverflow(sd);
> >>>           minScore = ((ScoreDoc)hq.top()).score; // maintain
> >>> minScore
> >>>       }
> >>>   }
> >>> [sd is a class member of the collector, of type ScoreDoc]
> >>>
> >>> Now for the performance tests. I've done two tests:
> >>> 1. Calling TopDocCollector.collect 1,000,000, 10,000,000 and
> >>> 100,000,000
> >>> times using both implementations. Here are the results:
> >>>                               1M         10M          100M
> >>> Current Collector      218 ms   1,907ms    20,000ms
> >>> Modified Collector    141 ms    1,297ms   12,672ms
> >>> As you can see, the new implementation is 30-40% faster.
> >>>
> >>> 2. Wrote some tasks in the benchmark package that make use of
> >>> the new
> >> PQ
> >>> while executing a real search over an index with 1M documents, of
> >>> small
> >> size
> >>> (10 characters). Here are the results:
> >>>                           Current TopDocCollector
> >>>
> >> ---------------------------------------------------------------------
> >> ---------------------------------------------------------------------
> >> -----
> >>> ------------> Report Sum By (any) Name (5 about 6 out of 6)
> >>> Operation       round   runCnt   recsPerRun        rec/s  elapsedSec
> >>> avgUsedMem    avgTotalMem
> >>> CreateIndex         0        1            1         10.6        0.09
> >>> 2,219,112      4,389,376
> >>> AddDocs_1000000 -   0 -  -   1 -  - 1000000 -   52,075.2 -  -
> >>> 19.20 -
> >>> 34,497,176 -   52,689,408
> >>> CloseIndex          0        2            1          0.1       16.62
> >>> 26,178,972     51,378,688
> >>> OpenIndex -  -  -   0 -  -   1 -  -  -  - 1 -  - 1,000.0 -  -
> >>> 0.00 -
> >>> 48,771,304 -   50,067,968
> >>> --> MySearch            0        1            1
> >>> 5.3        0.19
> >>> 48,771,304     50,067,968
> >>>
> >>>
> >>>                           Modified TopDocCollector
> >>>
> >> ---------------------------------------------------------------------
> >> ---------------------------------------------------------------------
> >> -----
> >>> ------------> Report Sum By (any) Name (5 about 6 out of 6)
> >>> Operation       round   runCnt   recsPerRun        rec/s  elapsedSec
> >>> avgUsedMem    avgTotalMem
> >>> CreateIndex         0        1            1         10.6        0.09
> >>> 2,207,008      4,389,376
> >>> AddDocs_1000000 -   0 -  -   1 -  - 1000000 -   50,955.4 -  -
> >>> 19.62 -
> >>> 32,531,992 -   44,825,088
> >>> CloseIndex          0        2            1          0.1       16.84
> >>> 57,853,148     61,929,984
> >>> OpenIndex -  -  -   0 -  -   1 -  -  -  - 1 -  - 1,000.0 -  -
> >>> 0.00 -
> >>> 57,792,136 -   61,929,984
> >>> --> MySearch            0        1            1
> >>> 7.1        0.14
> >>> 57,939,856     61,929,984
> >>> Notice the time difference in search (0.14 modified vs. 0.19
> >>> current).
> >> Here
> >>> too, a 30% improvement.
> >>>
> >>> One thing that I wasn't able to show, but I think it's pretty
> >>> much clear
> >> -->
> >>> the new implementation causes far fewer object allocations.
> >>> Consider
> >> the
> >>> typical search for 10 results over an index with 1M documents. The
> >> current
> >>> implementation will allocate 1M (!) ScoreDoc instances, while the
> >>> new
> >> one
> >>> will allocate 11 (10 for the PQ and 1 for reusing). On heavily
> >>> loaded
> >>> systems, this will result in far less work for the GC.
> >>>
> >>> I would like to suggest to reflect this new implementation in
> >> PriorityQueue
> >>> and also modify TopDocCollector to make use of this new method.
> >>> Several
> >> ways
> >>> to do it:
> >>> 1. Add insertWithOverflow to PQ (I hate the name and I'm willing to
> >> accept
> >>> better ones), make insert() deprecated (let's say in 2.3) and
> >>> release
> >> after
> >>> that (2.4?) rename insertWithOverflow to insert(). A complex
> >>> process,
> >> but it
> >>> won't break existing applications' code.
> >>> 2. Change insert's signature and state that applications that
> >>> move to
> >>> 2.3(for example), need to change their code as well (actually it
> >>> won't
> >>> compile).
> >>> 3. Any other alternatives?
> >>>
> >>> And .. of course, review the classes in the Lucene code that use
> >>> PQ and
> >>> modify them as necessary.
> >>>
> >>> A very long email for a very simple (but important) performance
> >> improvement.
> >>>
> >>> Shai
> >>>
> >>
> >>
> >>
> >>
> >>
> >
> >
> > --
> > Regards,
> >
> > Shai Erera
>
>
>
>


-- 
Regards,

Shai Erera

Re: Performance Improvement for Search using PriorityQueue

Posted by Michael McCandless <lu...@mikemccandless.com>.
Have you done any tests on real queries to see what impact this  
improvement has in practice?  Or, to measure how many ScoreDocs are  
"typically" allocated?

Mike

Shai Erera wrote:

> I agree w.r.t. the current implementation; however, in the worst case
> (as we tend to consider when talking about computer algorithms), it
> will allocate a ScoreDoc per result. With the overflow reuse, it will
> not allocate those objects, no matter what input it gets.
> Also, notice that there is a performance hit with the current
> implementation of TopDocCollector: it first checks that the document
> should be inserted, and if so, PQ does the same check again.
> So, if you change the current implementation to always attempt an
> insert, you'd gain some performance improvement as well.
>
> On Dec 10, 2007 10:15 AM, Paul Elschot <pa...@xs4all.nl> wrote:
>
>> The current TopDocCollector only allocates a ScoreDoc when the given
>> score causes a new ScoreDoc to be added into the queue, but it does
>> not reuse anything that overflows out of the queue.
>> So, reusing the overflow out of the queue should reduce object
>> allocations, especially for indexes that tend to have better-scoring
>> docs at the end. I wouldn't expect a 30% improvement out of that,
>> but it would help, if only to reduce occasional performance
>> deteriorations.
>>
>> Regards,
>> Paul Elschot
>>
>>
>> On Monday 10 December 2007 08:11:50 Shai Erera wrote:
>>> Hi
>>>
>>> Lucene's PQ implements two methods: put (assumes the PQ has room  
>>> for the
>>> object) and insert (checks whether the object can be inserted  
>>> etc.). The
>>> implementation of insert() requires the application that uses it to
>> allocate
>>> a new object every time it calls insert. Specifically, it cannot  
>>> reuse
>> the
>>> objects that were removed from the PQ.
>>> I've done some measurements on the performance that search would  
>>> gain by
>>> using that method, through the change of TopDocCollector.
>>>
>>> PriorityQueue change (added insertWithOverflow method)
>>>
>> --------------------------------------------------------------------- 
>> --------------
>>>     /**
>>>      * insertWithOverflow() is similar to insert(), except its  
>>> return
>> value:
>>> it
>>>      * returns the object (if any) that was dropped off the heap  
>>> because
>> it
>>> was
>>>      * full. This can be the given parameter (in case it is  
>>> smaller than
>> the
>>>      * full heap's minimum, and couldn't be added) or another object
>> that
>>> was
>>>      * previously the smallest value in the heap and now has been
>> replaced
>>> by a
>>>      * larger one.
>>>      */
>>>     public Object insertWithOverflow(Object element) {
>>>         if (size < maxSize) {
>>>             put(element);
>>>             return null;
>>>         } else if (size > 0 && !lessThan(element, top())) {
>>>             Object ret = heap[1];
>>>             heap[1] = element;
>>>             adjustTop();
>>>             return ret;
>>>         } else {
>>>             return element;
>>>         }
>>>     }
>>> [Very similar to insert(), only it returns the object that was  
>>> kicked
>> out of
>>> the Queue, or null]
>>>
>>> TopDocCollector's current implementation of collect()
>>>
>> --------------------------------------------------------------------- 
>> -------
>>>   public void collect(int doc, float score) {
>>>     if (score > 0.0f) {
>>>       totalHits++;
>>>       if (hq.size() < numHits || score >= minScore) {
>>>         hq.insert(new ScoreDoc(doc, score));
>>>         minScore = ((ScoreDoc)hq.top()).score; // maintain minScore
>>>       }
>>>     }
>>>   }
>>> [See how it allocates a new ScoreDoc every time this method is  
>>> called]
>>>
>>> TopDocCollector's new implementation of collect()
>>>
>> --------------------------------------------------------------------- 
>> --------
>>>   public void collect(int doc, float score) {
>>>       if (score == 0.0f) {
>>>           return;
>>>       }
>>>       totalHits++;
>>>       if (hq.size() < numHits || score >= minScore) {
>>>           if (sd == null) {
>>>               sd = new ScoreDoc(doc, score);
>>>           } else {
>>>               sd.doc = doc;
>>>               sd.score = score;
>>>           }
>>>           sd = (ScoreDoc) hq.insertWithOverflow(sd);
>>>           minScore = ((ScoreDoc)hq.top()).score; // maintain  
>>> minScore
>>>       }
>>>   }
>>> [sd is a class member of the collector, of type ScoreDoc]
>>>
>>> Now for the performance tests. I've done two tests:
>>> 1. Calling TopDocCollector.collect 1,000,000, 10,000,000 and  
>>> 100,000,000
>>> times using both implementations. Here are the results:
>>>                               1M         10M          100M
>>> Current Collector      218 ms   1,907ms    20,000ms
>>> Modified Collector    141 ms    1,297ms   12,672ms
>>> As you can see, the new implementation is 30-40% faster.
>>>
>>> 2. Wrote some tasks in the benchmark package that make use of
>>> the new
>> PQ
>>> while executing a real search over an index with 1M documents, of  
>>> small
>> size
>>> (10 characters). Here are the results:
>>>                           Current TopDocCollector
>>>
>> --------------------------------------------------------------------- 
>> --------------------------------------------------------------------- 
>> -----
>>> ------------> Report Sum By (any) Name (5 about 6 out of 6)
>>> Operation       round   runCnt   recsPerRun        rec/s  elapsedSec
>>> avgUsedMem    avgTotalMem
>>> CreateIndex         0        1            1         10.6        0.09
>>> 2,219,112      4,389,376
>>> AddDocs_1000000 -   0 -  -   1 -  - 1000000 -   52,075.2 -  -   
>>> 19.20 -
>>> 34,497,176 -   52,689,408
>>> CloseIndex          0        2            1          0.1       16.62
>>> 26,178,972     51,378,688
>>> OpenIndex -  -  -   0 -  -   1 -  -  -  - 1 -  - 1,000.0 -  -    
>>> 0.00 -
>>> 48,771,304 -   50,067,968
>>> --> MySearch            0        1            1           
>>> 5.3        0.19
>>> 48,771,304     50,067,968
>>>
>>>
>>>                           Modified TopDocCollector
>>>
>> --------------------------------------------------------------------- 
>> --------------------------------------------------------------------- 
>> -----
>>> ------------> Report Sum By (any) Name (5 about 6 out of 6)
>>> Operation       round   runCnt   recsPerRun        rec/s  elapsedSec
>>> avgUsedMem    avgTotalMem
>>> CreateIndex         0        1            1         10.6        0.09
>>> 2,207,008      4,389,376
>>> AddDocs_1000000 -   0 -  -   1 -  - 1000000 -   50,955.4 -  -   
>>> 19.62 -
>>> 32,531,992 -   44,825,088
>>> CloseIndex          0        2            1          0.1       16.84
>>> 57,853,148     61,929,984
>>> OpenIndex -  -  -   0 -  -   1 -  -  -  - 1 -  - 1,000.0 -  -    
>>> 0.00 -
>>> 57,792,136 -   61,929,984
>>> --> MySearch            0        1            1           
>>> 7.1        0.14
>>> 57,939,856     61,929,984
>>> Notice the time difference in search (0.14 modified vs. 0.19  
>>> current).
>> Here
>>> too, a 30% improvement.
>>>
>>> One thing that I wasn't able to show, but I think it's pretty  
>>> much clear
>> -->
>>> the new implementation causes far fewer object allocations.
>>> Consider
>> the
>>> typical search for 10 results over an index with 1M documents. The
>> current
>>> implementation will allocate 1M (!) ScoreDoc instances, while the  
>>> new
>> one
>>> will allocate 11 (10 for the PQ and 1 for reusing). On heavily  
>>> loaded
>>> systems, this will result in far less work for the GC.
>>>
>>> I would like to suggest to reflect this new implementation in
>> PriorityQueue
>>> and also modify TopDocCollector to make use of this new method.  
>>> Several
>> ways
>>> to do it:
>>> 1. Add insertWithOverflow to PQ (I hate the name and I'm willing to
>> accept
>>> better ones), make insert() deprecated (let's say in 2.3) and  
>>> release
>> after
>>> that (2.4?) rename insertWithOverflow to insert(). A complex  
>>> process,
>> but it
>>> won't break existing applications' code.
>>> 2. Change insert's signature and state that applications that  
>>> move to
>>> 2.3(for example), need to change their code as well (actually it  
>>> won't
>>> compile).
>>> 3. Any other alternatives?
>>>
>>> And .. of course, review the classes in the Lucene code that use  
>>> PQ and
>>> modify them as necessary.
>>>
>>> A very long email for a very simple (but important) performance
>> improvement.
>>>
>>> Shai
>>>
>>
>>
>>
>>
>>
>
>
> -- 
> Regards,
>
> Shai Erera




Re: Performance Improvement for Search using PriorityQueue

Posted by Shai Erera <se...@gmail.com>.
I agree w.r.t. the current implementation; however, in the worst case (as we
tend to consider when talking about computer algorithms), it will allocate a
ScoreDoc per result. With the overflow reuse, it will not allocate those
objects, no matter what input it gets.
Also, notice that there is a performance hit with the current implementation
of TopDocCollector: it first checks that the document should be inserted,
and if so, PQ does the same check again.
So, if you change the current implementation to always attempt an insert,
you'd gain some performance improvement as well.
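A sketch of what a single-check collect() could look like. The JDK's
java.util.PriorityQueue stands in for Lucene's queue here (purely for
illustration - this is not the proposed patch, and the float[] entries are
an assumption standing in for ScoreDoc): when the queue is full, one
comparison against the top decides both whether to insert and which entry
to evict, and the evicted entry is recycled instead of allocating a new one.

```java
import java.util.PriorityQueue;

public class SingleCheckCollector {
    private final PriorityQueue<float[]> hq =      // entries are {score, doc}
        new PriorityQueue<>((a, b) -> Float.compare(a[0], b[0]));
    private final int numHits;
    int totalHits;

    SingleCheckCollector(int numHits) { this.numHits = numHits; }

    void collect(int doc, float score) {
        if (score == 0.0f) return;
        totalHits++;
        if (hq.size() < numHits) {
            hq.offer(new float[] { score, doc });  // still filling: must allocate
        } else if (score >= hq.peek()[0]) {        // the one and only bound check
            float[] reuse = hq.poll();             // evict the current minimum...
            reuse[0] = score;
            reuse[1] = doc;
            hq.offer(reuse);                       // ...and recycle its entry
        }                                          // else: hit dropped, nothing allocated
    }

    float minScore() { return hq.peek()[0]; }

    public static void main(String[] args) {
        SingleCheckCollector c = new SingleCheckCollector(10);
        for (int d = 1; d <= 100; d++) c.collect(d, d);  // ascending scores
        System.out.println(c.totalHits + " hits, minScore=" + c.minScore());
    }
}
```

With Lucene's queue, the poll/offer pair would collapse into a single
insertWithOverflow call, avoiding the second heap traversal as well.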

On Dec 10, 2007 10:15 AM, Paul Elschot <pa...@xs4all.nl> wrote:

> The current TopDocCollector only allocates a ScoreDoc when the given
> score causes a new ScoreDoc to be added into the queue, but it does
> not reuse anything that overflows out of the queue.
> So, reusing the overflow out of the queue should reduce object
> allocations, especially for indexes that tend to have better-scoring
> docs at the end. I wouldn't expect a 30% improvement out of that,
> but it would help, if only to reduce occasional performance
> deteriorations.
>
> Regards,
> Paul Elschot
>
>
> On Monday 10 December 2007 08:11:50 Shai Erera wrote:
> > Hi
> >
> > Lucene's PQ implements two methods: put (assumes the PQ has room for the
> > object) and insert (checks whether the object can be inserted etc.). The
> > implementation of insert() requires the application that uses it to
> allocate
> > a new object every time it calls insert. Specifically, it cannot reuse
> the
> > objects that were removed from the PQ.
> > I've done some measurements on the performance that search would gain by
> > using that method, through the change of TopDocCollector.
> >
> > PriorityQueue change (added insertWithOverflow method)
> >
> -----------------------------------------------------------------------------------
> >     /**
> >      * insertWithOverflow() is similar to insert(), except its return
> value:
> > it
> >      * returns the object (if any) that was dropped off the heap because
> it
> > was
> >      * full. This can be the given parameter (in case it is smaller than
> the
> >      * full heap's minimum, and couldn't be added) or another object
> that
> > was
> >      * previously the smallest value in the heap and now has been
> replaced
> > by a
> >      * larger one.
> >      */
> >     public Object insertWithOverflow(Object element) {
> >         if (size < maxSize) {
> >             put(element);
> >             return null;
> >         } else if (size > 0 && !lessThan(element, top())) {
> >             Object ret = heap[1];
> >             heap[1] = element;
> >             adjustTop();
> >             return ret;
> >         } else {
> >             return element;
> >         }
> >     }
> > [Very similar to insert(), only it returns the object that was kicked
> out of
> > the Queue, or null]
> >
> > TopDocCollector's current implementation of collect()
> >
> ----------------------------------------------------------------------------
> >   public void collect(int doc, float score) {
> >     if (score > 0.0f) {
> >       totalHits++;
> >       if (hq.size() < numHits || score >= minScore) {
> >         hq.insert(new ScoreDoc(doc, score));
> >         minScore = ((ScoreDoc)hq.top()).score; // maintain minScore
> >       }
> >     }
> >   }
> > [See how it allocates a new ScoreDoc every time this method is called]
> >
> > TopDocCollector's new implementation of collect()
> >
> -----------------------------------------------------------------------------
> >   public void collect(int doc, float score) {
> >       if (score == 0.0f) {
> >           return;
> >       }
> >       totalHits++;
> >       if (hq.size() < numHits || score >= minScore) {
> >           if (sd == null) {
> >               sd = new ScoreDoc(doc, score);
> >           } else {
> >               sd.doc = doc;
> >               sd.score = score;
> >           }
> >           sd = (ScoreDoc) hq.insertWithOverflow(sd);
> >           minScore = ((ScoreDoc)hq.top()).score; // maintain minScore
> >       }
> >   }
> > [sd is a class member of the collector, of type ScoreDoc]
> >
> > Now for the performance tests. I've done two tests:
> > 1. Calling TopDocCollector.collect 1,000,000, 10,000,000 and 100,000,000
> > times using both implementations. Here are the results:
> >                               1M         10M          100M
> > Current Collector      218 ms   1,907ms    20,000ms
> > Modified Collector    141 ms    1,297ms   12,672ms
> > As you can see, the new implementation is 30-40% faster.
> >
> > 2. Wrote some tasks in the benchmark package that make use of the new
> PQ
> > while executing a real search over an index with 1M documents, of small
> size
> > (10 characters). Here are the results:
> >                           Current TopDocCollector
> >
> -----------------------------------------------------------------------------------------------------------------------------------------------
> > ------------> Report Sum By (any) Name (5 about 6 out of 6)
> > Operation       round   runCnt   recsPerRun        rec/s  elapsedSec
> > avgUsedMem    avgTotalMem
> > CreateIndex         0        1            1         10.6        0.09
> > 2,219,112      4,389,376
> > AddDocs_1000000 -   0 -  -   1 -  - 1000000 -   52,075.2 -  -  19.20 -
> > 34,497,176 -   52,689,408
> > CloseIndex          0        2            1          0.1       16.62
> > 26,178,972     51,378,688
> > OpenIndex -  -  -   0 -  -   1 -  -  -  - 1 -  - 1,000.0 -  -   0.00 -
> > 48,771,304 -   50,067,968
> > --> MySearch            0        1            1          5.3        0.19
> > 48,771,304     50,067,968
> >
> >
> >                           Modified TopDocCollector
> >
> -----------------------------------------------------------------------------------------------------------------------------------------------
> > ------------> Report Sum By (any) Name (5 about 6 out of 6)
> > Operation       round   runCnt   recsPerRun        rec/s  elapsedSec
> > avgUsedMem    avgTotalMem
> > CreateIndex         0        1            1         10.6        0.09
> > 2,207,008      4,389,376
> > AddDocs_1000000 -   0 -  -   1 -  - 1000000 -   50,955.4 -  -  19.62 -
> > 32,531,992 -   44,825,088
> > CloseIndex          0        2            1          0.1       16.84
> > 57,853,148     61,929,984
> > OpenIndex -  -  -   0 -  -   1 -  -  -  - 1 -  - 1,000.0 -  -   0.00 -
> > 57,792,136 -   61,929,984
> > --> MySearch            0        1            1          7.1        0.14
> > 57,939,856     61,929,984
> > Notice the time difference in search (0.14 modified vs. 0.19 current).
> Here
> > too, a 30% improvement.
> >
> > One thing that I wasn't able to show, but I think it's pretty much clear
> -->
> > the new implementation causes far fewer object allocations. Consider
> the
> > typical search for 10 results over an index with 1M documents. The
> current
> > implementation will allocate 1M (!) ScoreDoc instances, while the new
> one
> > will allocate 11 (10 for the PQ and 1 for reusing). On heavily loaded
> > systems, this will result in far less work for the GC.
> >
> > I would like to suggest to reflect this new implementation in
> PriorityQueue
> > and also modify TopDocCollector to make use of this new method. Several
> ways
> > to do it:
> > 1. Add insertWithOverflow to PQ (I hate the name and I'm willing to
> accept
> > better ones), make insert() deprecated (let's say in 2.3) and release
> after
> > that (2.4?) rename insertWithOverflow to insert(). A complex process,
> but it
> > won't break existing applications' code.
> > 2. Change insert's signature and state that applications that move to
> > 2.3(for example), need to change their code as well (actually it won't
> > compile).
> > 3. Any other alternatives?
> >
> > And .. of course, review the classes in the Lucene code that use PQ and
> > modify them as necessary.
> >
> > A very long email for a very simple (but important) performance
> improvement.
> >
> > Shai
> >
>
>
>
>
>


-- 
Regards,

Shai Erera

Re: Performance Improvement for Search using PriorityQueue

Posted by Paul Elschot <pa...@xs4all.nl>.
The current TopDocCollector only allocates a ScoreDoc when the given
score causes a new ScoreDoc to be added into the queue, but it does
not reuse anything that overflows out of the queue.
So, reusing the overflow out of the queue should reduce object
allocations, especially for indexes that tend to have better-scoring
docs at the end. I wouldn't expect a 30% improvement out of that,
but it would help, if only to reduce occasional performance
deteriorations.
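The ascending-score worst case can be made concrete with a small count:
when scores arrive in increasing order, every hit passes the current
collector's (hq.size() < numHits || score >= minScore) test, so it allocates
one ScoreDoc per hit. A bare PriorityQueue<Float> of scores stands in for
the real queue below; this is an illustrative model, not Lucene code.

```java
import java.util.PriorityQueue;

public class WorstCaseDemo {
    // Counts how many ScoreDocs the current (non-reusing) collector would
    // allocate for totalDocs hits with monotonically increasing scores.
    static int allocationsWithoutReuse(int numHits, int totalDocs) {
        PriorityQueue<Float> hq = new PriorityQueue<>();  // min-heap of scores
        int allocations = 0;
        for (int doc = 0; doc < totalDocs; doc++) {
            float score = doc + 1;                        // ascending: worst case
            if (hq.size() < numHits || score >= hq.peek()) {
                allocations++;                            // models 'new ScoreDoc(doc, score)'
                if (hq.size() == numHits) hq.poll();      // queue full: drop the minimum
                hq.offer(score);
            }
        }
        return allocations;
    }

    public static void main(String[] args) {
        // one allocation per hit: a 10-hit request over 1M docs allocates 1M ScoreDocs
        System.out.println(allocationsWithoutReuse(10, 1_000_000));
    }
}
```

With the overflow-reuse change the same input would allocate only
numHits + 1 objects, which is where the occasional deteriorations go away.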

Regards,
Paul Elschot


On Monday 10 December 2007 08:11:50 Shai Erera wrote:
> Hi
> 
> Lucene's PQ implements two methods: put (assumes the PQ has room for the
> object) and insert (checks whether the object can be inserted etc.). The
> implementation of insert() requires the application that uses it to allocate
> a new object every time it calls insert. Specifically, it cannot reuse the
> objects that were removed from the PQ.
> I've done some measurements on the performance that search would gain by
> using that method, through the change of TopDocCollector.
> 
> PriorityQueue change (added insertWithOverflow method)
> -----------------------------------------------------------------------------------
>     /**
>      * insertWithOverflow() is similar to insert(), except its return value: it
>      * returns the object (if any) that was dropped off the heap because it was
>      * full. This can be the given parameter (in case it is smaller than the
>      * full heap's minimum, and couldn't be added) or another object that was
>      * previously the smallest value in the heap and now has been replaced by a
>      * larger one.
>      */
>     public Object insertWithOverflow(Object element) {
>         if (size < maxSize) {
>             put(element);
>             return null;
>         } else if (size > 0 && !lessThan(element, top())) {
>             Object ret = heap[1];
>             heap[1] = element;
>             adjustTop();
>             return ret;
>         } else {
>             return element;
>         }
>     }
> [Very similar to insert(), only it returns the object that was kicked out of
> the Queue, or null]
> 
> TopDocCollector's current implementation of collect()
> ----------------------------------------------------------------------------
>   public void collect(int doc, float score) {
>     if (score > 0.0f) {
>       totalHits++;
>       if (hq.size() < numHits || score >= minScore) {
>         hq.insert(new ScoreDoc(doc, score));
>         minScore = ((ScoreDoc)hq.top()).score; // maintain minScore
>       }
>     }
>   }
> [See how it allocates a new ScoreDoc every time this method is called]
> 
> TopDocCollector's new implementation of collect()
> -----------------------------------------------------------------------------
>   public void collect(int doc, float score) {
>       if (score == 0.0f) {
>           return;
>       }
>       totalHits++;
>       if (hq.size() < numHits || score >= minScore) {
>           if (sd == null) {
>               sd = new ScoreDoc(doc, score);
>           } else {
>               sd.doc = doc;
>               sd.score = score;
>           }
>           sd = (ScoreDoc) hq.insertWithOverflow(sd);
>           minScore = ((ScoreDoc)hq.top()).score; // maintain minScore
>       }
>   }
> [sd is a class member of the collector, of type ScoreDoc]
> 
> Now for the performance tests. I've done two tests:
> 1. Calling TopDocCollector.collect 1,000,000, 10,000,000 and 100,000,000
> times using both implementations. Here are the results:
>                               1M         10M          100M
> Current Collector      218 ms   1,907ms    20,000ms
> Modified Collector    141 ms    1,297ms   12,672ms
> As you can see, the new implementation is 30-40% faster.
> 
> 2. Wrote some tasks in the benchmark package that makes use of the new PQ
> while executing a real search over an index with 1M documents, of small size
> (10 characters). Here are the results:
>                           Current TopDocCollector
> -----------------------------------------------------------------------------------------------------------------------------------------------
> ------------> Report Sum By (any) Name (5 about 6 out of 6)
> Operation       round   runCnt   recsPerRun        rec/s  elapsedSec   avgUsedMem    avgTotalMem
> CreateIndex         0        1            1         10.6        0.09    2,219,112      4,389,376
> AddDocs_1000000 -   0 -  -   1 -  - 1000000 -   52,075.2 -  -  19.20 - 34,497,176 -   52,689,408
> CloseIndex          0        2            1          0.1       16.62   26,178,972     51,378,688
> OpenIndex -  -  -   0 -  -   1 -  -  -  - 1 -  - 1,000.0 -  -   0.00 - 48,771,304 -   50,067,968
> --> MySearch        0        1            1          5.3        0.19   48,771,304     50,067,968
> 
> 
>                           Modified TopDocCollector
> -----------------------------------------------------------------------------------------------------------------------------------------------
> ------------> Report Sum By (any) Name (5 about 6 out of 6)
> Operation       round   runCnt   recsPerRun        rec/s  elapsedSec   avgUsedMem    avgTotalMem
> CreateIndex         0        1            1         10.6        0.09    2,207,008      4,389,376
> AddDocs_1000000 -   0 -  -   1 -  - 1000000 -   50,955.4 -  -  19.62 - 32,531,992 -   44,825,088
> CloseIndex          0        2            1          0.1       16.84   57,853,148     61,929,984
> OpenIndex -  -  -   0 -  -   1 -  -  -  - 1 -  - 1,000.0 -  -   0.00 - 57,792,136 -   61,929,984
> --> MySearch        0        1            1          7.1        0.14   57,939,856     61,929,984
> Notice the time difference in search (0.14 modified vs. 0.19 current). Here
> too, a 30% improvement.
> 
> One thing that I wasn't able to show, but I think it's pretty much clear -->
> the new implementation causes a lot less object allocations. Consider the
> typical search for 10 results over an index with 1M documents. The current
> implementation will allocate 1M (!) ScoreDoc instances, while the new one
> will allocate 11 (10 for the PQ and 1 for reusing). On heavily loaded
> systems, this will result in far less work for the GC.
> 
> I would like to suggest to reflect this new implementation in PriorityQueue
> and also modify TopDocCollector to make use of this new method. Several ways
> to do it:
> 1. Add insertWithOverflow to PQ (I hate the name and I'm willing to accept
> better ones), make insert() deprecated (let's say in 2.3) and release after
> that (2.4?) rename insertWithOverflow to insert(). A complex process, but it
> won't break existing applications' code.
> 2. Change insert's signature and state that applications that move to
> 2.3 (for example) need to change their code as well (actually it won't
> compile).
> 3. Any other alternatives?
> 
> And .. of course, review the classes in the Lucene code that use PQ and
> modify them as necessary.
> 
> A very long email for a very simple (but important) performance improvement.
> 
> Shai
> 
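Migration option 1 in the quoted proposal could be sketched roughly as below. This is a simplified integer queue, not Lucene's actual PriorityQueue class, and the names are illustrative; the idea is that the old insert() becomes a deprecated wrapper over the new method, so existing callers keep compiling while new callers can recycle the returned overflow:

```java
import java.util.PriorityQueue;

// Hypothetical sketch of deprecating insert() in favor of
// insertWithOverflow() (simplified; not Lucene's actual class).
public class IntQueueSketch {
    private final int maxSize;
    private final PriorityQueue<Integer> heap = new PriorityQueue<>();

    public IntQueueSketch(int maxSize) { this.maxSize = maxSize; }

    /** Returns null if the element was added with room to spare, the evicted
     *  minimum if the element displaced it, or the element itself if rejected. */
    public Integer insertWithOverflow(Integer element) {
        if (heap.size() < maxSize) {
            heap.add(element);
            return null;
        } else if (element > heap.peek()) {
            Integer evicted = heap.poll();
            heap.add(element);
            return evicted;
        } else {
            return element;  // smaller than the current minimum: rejected
        }
    }

    /** @deprecated use {@link #insertWithOverflow(Integer)} instead */
    @Deprecated
    public boolean insert(Integer element) {
        // Old contract: true iff the element made it into the queue. A
        // rejected element comes back as the same reference, so identity
        // comparison is safe here (an evicted element is strictly smaller).
        return insertWithOverflow(element) != element;
    }

    public Integer top() { return heap.peek(); }

    public static void main(String[] args) {
        IntQueueSketch q = new IntQueueSketch(3);
        for (int v : new int[]{5, 2, 8, 1, 9}) q.insert(v);
        System.out.println(q.top()); // prints 5: 1 was rejected, 9 evicted 2
    }
}
```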


