You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Uwe Schindler (JIRA)" <ji...@apache.org> on 2009/10/23 21:54:00 UTC

[jira] Created: (LUCENE-2006) Optimization for FieldDocSortedHitQueue

Optimization for FieldDocSortedHitQueue
---------------------------------------

                 Key: LUCENE-2006
                 URL: https://issues.apache.org/jira/browse/LUCENE-2006
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Search
    Affects Versions: 3.0
            Reporter: Uwe Schindler
            Assignee: Uwe Schindler
             Fix For: 3.0
         Attachments: LUCENE-2006.patch

When updating core for generics,  I found the following as a optimization of FieldDocSortedHitQueue:

All FieldDoc values are Compareables (also the score or docid, if they
appear as SortField in a MultiSearcher or ParallelMultiSearcher). The code
of lessThan seems very ineffective, as it has a big switch statement on the
SortField type, then casts the value to the underlying numeric type Object,
calls Number.xxxValue() & co for it and then compares manually. As
j.l.Number is itself Comparable, I see no reason to do this. Just call
compareTo on the Comparable interface and we are happy. The big deal is that
it prevents casting and the two method calls xxxValue(), as Number.compareTo
works more efficient internally.

The only special cases are String sort, where the Locale may be used and the
score sorting which is backwards. But these are two if statements instead of
the whole switch.

I had not tested it now for performance, but in my opinion it should be
faster for MultiSearchers. All tests still pass (because they should).


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2006) Optimization for FieldDocSortedHitQueue

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769515#action_12769515 ] 

Uwe Schindler commented on LUCENE-2006:
---------------------------------------

I think it is because you only merge the top 1000 docs into the HitQueue. The merging of the HQs at the end of search is simple, because it only merges the top n docs of each queue. You would only see a difference if you sort all hits.

I think we can commit this, too.

> Optimization for FieldDocSortedHitQueue
> ---------------------------------------
>
>                 Key: LUCENE-2006
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2006
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 3.0
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.0
>
>         Attachments: LUCENE-2006.patch
>
>
> When updating core for generics,  I found the following as a optimization of FieldDocSortedHitQueue:
> All FieldDoc values are Compareables (also the score or docid, if they
> appear as SortField in a MultiSearcher or ParallelMultiSearcher). The code
> of lessThan seems very ineffective, as it has a big switch statement on the
> SortField type, then casts the value to the underlying numeric type Object,
> calls Number.xxxValue() & co for it and then compares manually. As
> j.l.Number is itself Comparable, I see no reason to do this. Just call
> compareTo on the Comparable interface and we are happy. The big deal is that
> it prevents casting and the two method calls xxxValue(), as Number.compareTo
> works more efficient internally.
> The only special cases are String sort, where the Locale may be used and the
> score sorting which is backwards. But these are two if statements instead of
> the whole switch.
> I had not tested it now for performance, but in my opinion it should be
> faster for MultiSearchers. All tests still pass (because they should).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2006) Optimization for FieldDocSortedHitQueue

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769521#action_12769521 ] 

Mark Miller commented on LUCENE-2006:
-------------------------------------

Right - I don't think we have to worry about things much over top 1000.

And while I don't want to take the time to do top 4*64,000, for kicks I tried top 64,000 over a couple runs.

It actually does show a 2-3% win with the new method once you get up that high ;)

Its somethin' anyway :)

> Optimization for FieldDocSortedHitQueue
> ---------------------------------------
>
>                 Key: LUCENE-2006
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2006
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 3.0
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.0
>
>         Attachments: LUCENE-2006.patch
>
>
> When updating core for generics,  I found the following as a optimization of FieldDocSortedHitQueue:
> All FieldDoc values are Compareables (also the score or docid, if they
> appear as SortField in a MultiSearcher or ParallelMultiSearcher). The code
> of lessThan seems very ineffective, as it has a big switch statement on the
> SortField type, then casts the value to the underlying numeric type Object,
> calls Number.xxxValue() & co for it and then compares manually. As
> j.l.Number is itself Comparable, I see no reason to do this. Just call
> compareTo on the Comparable interface and we are happy. The big deal is that
> it prevents casting and the two method calls xxxValue(), as Number.compareTo
> works more efficient internally.
> The only special cases are String sort, where the Locale may be used and the
> score sorting which is backwards. But these are two if statements instead of
> the whole switch.
> I had not tested it now for performance, but in my opinion it should be
> faster for MultiSearchers. All tests still pass (because they should).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Resolved: (LUCENE-2006) Optimization for FieldDocSortedHitQueue

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler resolved LUCENE-2006.
-----------------------------------

    Resolution: Fixed

Committed revision: 829274

Thanks Mark for perf testing!

> Optimization for FieldDocSortedHitQueue
> ---------------------------------------
>
>                 Key: LUCENE-2006
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2006
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 3.0
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.0
>
>         Attachments: LUCENE-2006.patch
>
>
> When updating core for generics,  I found the following as a optimization of FieldDocSortedHitQueue:
> All FieldDoc values are Compareables (also the score or docid, if they
> appear as SortField in a MultiSearcher or ParallelMultiSearcher). The code
> of lessThan seems very ineffective, as it has a big switch statement on the
> SortField type, then casts the value to the underlying numeric type Object,
> calls Number.xxxValue() & co for it and then compares manually. As
> j.l.Number is itself Comparable, I see no reason to do this. Just call
> compareTo on the Comparable interface and we are happy. The big deal is that
> it prevents casting and the two method calls xxxValue(), as Number.compareTo
> works more efficient internally.
> The only special cases are String sort, where the Locale may be used and the
> score sorting which is backwards. But these are two if statements instead of
> the whole switch.
> I had not tested it now for performance, but in my opinion it should be
> faster for MultiSearchers. All tests still pass (because they should).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2006) Optimization for FieldDocSortedHitQueue

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769398#action_12769398 ] 

Uwe Schindler commented on LUCENE-2006:
---------------------------------------

Is there any MultiSearcher related task/alg in contrib/benchmark or somewhere in JIRA?

> Optimization for FieldDocSortedHitQueue
> ---------------------------------------
>
>                 Key: LUCENE-2006
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2006
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 3.0
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.0
>
>         Attachments: LUCENE-2006.patch
>
>
> When updating core for generics,  I found the following as a optimization of FieldDocSortedHitQueue:
> All FieldDoc values are Compareables (also the score or docid, if they
> appear as SortField in a MultiSearcher or ParallelMultiSearcher). The code
> of lessThan seems very ineffective, as it has a big switch statement on the
> SortField type, then casts the value to the underlying numeric type Object,
> calls Number.xxxValue() & co for it and then compares manually. As
> j.l.Number is itself Comparable, I see no reason to do this. Just call
> compareTo on the Comparable interface and we are happy. The big deal is that
> it prevents casting and the two method calls xxxValue(), as Number.compareTo
> works more efficient internally.
> The only special cases are String sort, where the Locale may be used and the
> score sorting which is backwards. But these are two if statements instead of
> the whole switch.
> I had not tested it now for performance, but in my opinion it should be
> faster for MultiSearchers. All tests still pass (because they should).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2006) Optimization for FieldDocSortedHitQueue

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769389#action_12769389 ] 

Uwe Schindler commented on LUCENE-2006:
---------------------------------------

Mark Miller on java-dev:

{quote}
Nice! I like it. Even if its not much faster (havn't checked either), I
can't see it being much slower and its cleaner code.

I'd be happy to do some quick perf tests when I get a chance, but I'm +1
on it.
{quote}

> Optimization for FieldDocSortedHitQueue
> ---------------------------------------
>
>                 Key: LUCENE-2006
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2006
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 3.0
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.0
>
>         Attachments: LUCENE-2006.patch
>
>
> When updating core for generics,  I found the following as a optimization of FieldDocSortedHitQueue:
> All FieldDoc values are Compareables (also the score or docid, if they
> appear as SortField in a MultiSearcher or ParallelMultiSearcher). The code
> of lessThan seems very ineffective, as it has a big switch statement on the
> SortField type, then casts the value to the underlying numeric type Object,
> calls Number.xxxValue() & co for it and then compares manually. As
> j.l.Number is itself Comparable, I see no reason to do this. Just call
> compareTo on the Comparable interface and we are happy. The big deal is that
> it prevents casting and the two method calls xxxValue(), as Number.compareTo
> works more efficient internally.
> The only special cases are String sort, where the Locale may be used and the
> score sorting which is backwards. But these are two if statements instead of
> the whole switch.
> I had not tested it now for performance, but in my opinion it should be
> faster for MultiSearchers. All tests still pass (because they should).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2006) Optimization for FieldDocSortedHitQueue

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769509#action_12769509 ] 

Mark Miller commented on LUCENE-2006:
-------------------------------------

Okay Uwe -

I took a 2 GB zipped Wiki dump and used a SkipDocTask to create four unique indices of 64,000 docs each.

Then I ran a search matching all docs and sorting on title, taking the average of 1000 runs and recording that overage over a few times for each method.

I tried topn's of 10, 100, and 1000.

I couldn't measure a meaningful difference one way or the other. Lets do it.

> Optimization for FieldDocSortedHitQueue
> ---------------------------------------
>
>                 Key: LUCENE-2006
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2006
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 3.0
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.0
>
>         Attachments: LUCENE-2006.patch
>
>
> When updating core for generics,  I found the following as a optimization of FieldDocSortedHitQueue:
> All FieldDoc values are Compareables (also the score or docid, if they
> appear as SortField in a MultiSearcher or ParallelMultiSearcher). The code
> of lessThan seems very ineffective, as it has a big switch statement on the
> SortField type, then casts the value to the underlying numeric type Object,
> calls Number.xxxValue() & co for it and then compares manually. As
> j.l.Number is itself Comparable, I see no reason to do this. Just call
> compareTo on the Comparable interface and we are happy. The big deal is that
> it prevents casting and the two method calls xxxValue(), as Number.compareTo
> works more efficient internally.
> The only special cases are String sort, where the Locale may be used and the
> score sorting which is backwards. But these are two if statements instead of
> the whole switch.
> I had not tested it now for performance, but in my opinion it should be
> faster for MultiSearchers. All tests still pass (because they should).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2006) Optimization for FieldDocSortedHitQueue

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769404#action_12769404 ] 

Uwe Schindler commented on LUCENE-2006:
---------------------------------------

The reason why this code looked like this is simple (from SVN log): at the beginning the FieldDoc values were just "Object[] fields". So the casts were needed. After adding custom comparators they get "Comparable". So there was no real perf idea behind doing it so complicated and ineffective.

> Optimization for FieldDocSortedHitQueue
> ---------------------------------------
>
>                 Key: LUCENE-2006
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2006
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 3.0
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.0
>
>         Attachments: LUCENE-2006.patch
>
>
> When updating core for generics,  I found the following as a optimization of FieldDocSortedHitQueue:
> All FieldDoc values are Compareables (also the score or docid, if they
> appear as SortField in a MultiSearcher or ParallelMultiSearcher). The code
> of lessThan seems very ineffective, as it has a big switch statement on the
> SortField type, then casts the value to the underlying numeric type Object,
> calls Number.xxxValue() & co for it and then compares manually. As
> j.l.Number is itself Comparable, I see no reason to do this. Just call
> compareTo on the Comparable interface and we are happy. The big deal is that
> it prevents casting and the two method calls xxxValue(), as Number.compareTo
> works more efficient internally.
> The only special cases are String sort, where the Locale may be used and the
> score sorting which is backwards. But these are two if statements instead of
> the whole switch.
> I had not tested it now for performance, but in my opinion it should be
> faster for MultiSearchers. All tests still pass (because they should).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2006) Optimization for FieldDocSortedHitQueue

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769522#action_12769522 ] 

Uwe Schindler commented on LUCENE-2006:
---------------------------------------

OK, I commit soon!

> Optimization for FieldDocSortedHitQueue
> ---------------------------------------
>
>                 Key: LUCENE-2006
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2006
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 3.0
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.0
>
>         Attachments: LUCENE-2006.patch
>
>
> When updating core for generics,  I found the following as a optimization of FieldDocSortedHitQueue:
> All FieldDoc values are Compareables (also the score or docid, if they
> appear as SortField in a MultiSearcher or ParallelMultiSearcher). The code
> of lessThan seems very ineffective, as it has a big switch statement on the
> SortField type, then casts the value to the underlying numeric type Object,
> calls Number.xxxValue() & co for it and then compares manually. As
> j.l.Number is itself Comparable, I see no reason to do this. Just call
> compareTo on the Comparable interface and we are happy. The big deal is that
> it prevents casting and the two method calls xxxValue(), as Number.compareTo
> works more efficient internally.
> The only special cases are String sort, where the Locale may be used and the
> score sorting which is backwards. But these are two if statements instead of
> the whole switch.
> I had not tested it now for performance, but in my opinion it should be
> faster for MultiSearchers. All tests still pass (because they should).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2006) Optimization for FieldDocSortedHitQueue

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-2006:
----------------------------------

    Attachment: LUCENE-2006.patch

Patch.

> Optimization for FieldDocSortedHitQueue
> ---------------------------------------
>
>                 Key: LUCENE-2006
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2006
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 3.0
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.0
>
>         Attachments: LUCENE-2006.patch
>
>
> When updating core for generics,  I found the following as a optimization of FieldDocSortedHitQueue:
> All FieldDoc values are Compareables (also the score or docid, if they
> appear as SortField in a MultiSearcher or ParallelMultiSearcher). The code
> of lessThan seems very ineffective, as it has a big switch statement on the
> SortField type, then casts the value to the underlying numeric type Object,
> calls Number.xxxValue() & co for it and then compares manually. As
> j.l.Number is itself Comparable, I see no reason to do this. Just call
> compareTo on the Comparable interface and we are happy. The big deal is that
> it prevents casting and the two method calls xxxValue(), as Number.compareTo
> works more efficient internally.
> The only special cases are String sort, where the Locale may be used and the
> score sorting which is backwards. But these are two if statements instead of
> the whole switch.
> I had not tested it now for performance, but in my opinion it should be
> faster for MultiSearchers. All tests still pass (because they should).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org