Posted to dev@lucene.apache.org by "Yonik Seeley (JIRA)" <ji...@apache.org> on 2010/09/15 15:01:41 UTC

[jira] Issue Comment Edited: (LUCENE-2504) sorting performance regression

    [ https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909407#action_12909407 ] 

Yonik Seeley edited comment on LUCENE-2504 at 9/15/10 9:00 AM:
---------------------------------------------------------------

bq. The open question is whether this hotspot fickleness is particular to Oracle's java impl, or, is somehow endemic to bytecode VMs (.NET included).

I tried IBM's latest Java6 (SR8 FP1, 20100624).
It seems to have some of the same pitfalls as Oracle's JVM, just in a different form.
Unlike with Oracle, the first run does not differ from the second run in the same JVM, but the first run itself shows much more variation.  The worst case is worse and, just like the Oracle JVM, it gets stuck in its worst case.

Each run (of the complete set of fields) was done in a separate JVM, since two runs in the same JVM didn't really differ the way they did in the Oracle JVM.
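
For readers who want to reproduce the "one run per JVM" setup, here is a minimal sketch of launching each run in a fresh child JVM. It is not the harness used for the numbers in this message: the class name FreshJvmRunner and the main class "SortBenchmark" are placeholders, and the sketch assumes the benchmark class is on the current classpath.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical launcher: starts each benchmark run in its own JVM. */
final class FreshJvmRunner {

    /** Builds the command line for a child JVM that reuses this JVM's java binary and classpath. */
    static List<String> command(String mainClass, String... args) {
        List<String> cmd = new ArrayList<>();
        cmd.add(System.getProperty("java.home") + File.separator + "bin" + File.separator + "java");
        cmd.add("-cp");
        cmd.add(System.getProperty("java.class.path"));
        cmd.add(mainClass);
        for (String arg : args) {
            cmd.add(arg);
        }
        return cmd;
    }

    /** Launches one child JVM, forwards its output to this console, and returns its exit code. */
    static int runOnce(String mainClass, String... args) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(mainClass, args)).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // "SortBenchmark" is a placeholder (assumption), not a class from the attached patches.
        int runs = 7;
        for (int i = 0; i < runs; i++) {
            int exit = runOnce("SortBenchmark");
            System.out.println("run " + i + " exited with " + exit);
        }
    }
}
```

Starting a new process per run means no JIT profile or compiled code carries over from one run to the next, which is the point of the comparison here.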


branch_3x:
|unique terms in field|median sort time of 100 sorts in ms|another run|another run|another run|another run|another run|another run
|100000|129|128|130|109|98|128|135
|10000|128|123|127|127|98|128|135
|1000|129|130|130|128|98|130|136
|100|128|133|133|130|100|132|139
|10|150|153|153|154|122|153|159

trunk:
|unique terms in field|median sort time of 100 sorts in ms|another run|another run|another run|another run|another run|another run
|100000|217|81|383|99|79|78|215
|10000|254|73|346|101|106|108|267
|1000|253|74|347|99|107|108|258
|100|253|107|394|98|107|102|255
|10|251|107|388|99|106|98|257
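
Each cell in the tables above is a "median sort time of 100 sorts". A rough sketch of how such a number could be measured follows. The class MedianSortTimer and the stand-in workload (sorting a random int array) are assumptions for illustration only; the real test sorts Lucene search results on a field.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical timing helper; not the benchmark code attached to the issue. */
final class MedianSortTimer {

    /** Median of the samples; for an even count, the mean of the two middle values. */
    static long median(long[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return (sorted.length % 2 == 1) ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    /** Runs the task the given number of times and returns the median wall-clock time in ms. */
    static long medianMillis(Runnable task, int iterations) {
        long[] elapsed = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            task.run();
            elapsed[i] = (System.nanoTime() - start) / 1_000_000L;
        }
        return median(elapsed);
    }

    public static void main(String[] args) {
        // Stand-in workload (assumption): sort a fresh copy of the same random array each time.
        int[] data = new Random(42).ints(1_000_000).toArray();
        long ms = medianMillis(() -> {
            int[] copy = data.clone();
            Arrays.sort(copy);
        }, 100);
        System.out.println("median of 100 sorts: " + ms + " ms");
    }
}
```

The median is less sensitive than the mean to a few slow iterations (GC pauses, JIT compilation), so the differences between columns reflect what state the JVM settled into rather than one-off outliers.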

The second way of testing is to completely mix fields (no serial correlation between what field is sorted on).  This is the test that is very predictable with the Oracle JVM, but I still see wide variability with the IBM JVM.  Here is the list of different runs for the IBM JVM (ms):

branch_3x:
|128|129|123|120|128|100|95|74|130|91|120

trunk:
|106|89|168|116|155|119|108|118|112|169|165

To my eye, trunk shows more variability than branch_3x, possibly due to its increased use of abstractions?
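
One way to put a number on that impression is to compute the spread of the two mixed-field run lists quoted above. The sketch below uses the range (max minus min) and the population standard deviation; the class name RunSpread and the choice of these two statistics are illustrative assumptions, not part of the original test.

```java
import java.util.Arrays;

/** Hypothetical helper that summarizes run-to-run spread for the mixed-field test. */
final class RunSpread {

    // Times in ms for the IBM JVM, copied from the mixed-field run lists in this message.
    static final int[] BRANCH_3X = {128, 129, 123, 120, 128, 100, 95, 74, 130, 91, 120};
    static final int[] TRUNK = {106, 89, 168, 116, 155, 119, 108, 118, 112, 169, 165};

    /** Difference between the slowest and the fastest run. */
    static int range(int[] ms) {
        return Arrays.stream(ms).max().getAsInt() - Arrays.stream(ms).min().getAsInt();
    }

    /** Arithmetic mean of the runs. */
    static double mean(int[] ms) {
        return Arrays.stream(ms).average().getAsDouble();
    }

    /** Population standard deviation of the runs. */
    static double stdDev(int[] ms) {
        double m = mean(ms);
        double sumSquares = 0;
        for (int v : ms) {
            sumSquares += (v - m) * (v - m);
        }
        return Math.sqrt(sumSquares / ms.length);
    }

    public static void main(String[] args) {
        System.out.printf("branch_3x: range=%d ms, stddev=%.1f ms%n", range(BRANCH_3X), stdDev(BRANCH_3X));
        System.out.printf("trunk:     range=%d ms, stddev=%.1f ms%n", range(TRUNK), stdDev(TRUNK));
    }
}
```

On the numbers in this message the range is 56 ms for branch_3x (74 to 130) and 80 ms for trunk (89 to 169), which is consistent with trunk being the more variable of the two. With only 11 runs each, this is an indication rather than a statistical result.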

edit: corrected the table description - all times in this message are for the IBM JVM.


> sorting performance regression
> ------------------------------
>
>                 Key: LUCENE-2504
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2504
>             Project: Lucene - Java
>          Issue Type: Bug
>    Affects Versions: 4.0
>            Reporter: Yonik Seeley
>             Fix For: 4.0
>
>         Attachments: LUCENE-2504.patch, LUCENE-2504.patch, LUCENE-2504.patch, LUCENE-2504.zip, LUCENE-2504_SortMissingLast.patch
>
>
> sorting can be much slower on trunk than branch_3x
