Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2010/07/11 01:18:50 UTC

[jira] Updated: (LUCENE-2504) sorting performance regression

     [ https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-2504:
---------------------------------------

    Attachment: LUCENE-2504.zip


Digging into this, finally...

To try to make a somewhat more realistic search test, I created a
standalone test (attached zip file), which runs different query types
(all docs, term, phrase, OR of 2 terms, AND of 2 terms, prefix), sorting
by score or by a string field with an increasing number of unique
values: country (~250 values, I think), and then
unique10/100/1K/10K/100K/1M.  I derive the unique fields by taking
the first N unique titles from Wikipedia; the country field comes from the
SortableSingleDocSource in contrib/benchmark.

It runs with 2 threads (the machine has 2 cores), and each thread first
shuffles the queries privately but deterministically, so that matching
threads in the trunk & 3x tests run each query+sort in the same
order.
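
A minimal sketch of that per-thread shuffle, assuming each thread seeds
its own java.util.Random from a fixed base seed plus its thread id.  The
class name, the seed value and the method are hypothetical; the attached
test may do this differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class QueryShuffler {
  // Hypothetical base seed; any fixed value works as long as the
  // trunk run and the 3x run share it.
  static final long BASE_SEED = 17L;

  /** Returns a private, reproducibly shuffled copy of the tasks for one thread. */
  static <T> List<T> shuffledFor(int threadId, List<T> tasks) {
    // Copy first so the shared task list is never mutated.
    List<T> copy = new ArrayList<T>(tasks);
    // Same threadId => same seed => same order on every run.
    Collections.shuffle(copy, new Random(BASE_SEED + threadId));
    return copy;
  }
}
```

Because the seed depends only on the thread id, thread 0 on trunk and
thread 0 on 3x walk the query+sort list in the same order.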

I then built Wikipedia indexes from the first 5M docs, on both trunk
and 3x: one optimized, and one not optimized (13 segments) and with 5%
of docs deleted.

I sweep through all query+sorts 23 times (getting top 10 hits for
each), using 2 threads, measuring wall clock time each time.  I
discard the first 3 results for each query+sort, and then take the
fastest time of the remaining 20.
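
The measurement scheme above can be sketched roughly as follows.  This
is only an illustration: the class and method names are hypothetical,
and the Runnable stands in for running one query+sort.

```java
final class BestOfTimer {
  static final int TOTAL_ITERS = 23;   // sweeps per query+sort
  static final int WARMUP_ITERS = 3;   // leading results to discard

  /** Returns the fastest time among the entries after the warm-up ones. */
  static long bestAfterWarmup(long[] timesNanos, int warmup) {
    if (timesNanos.length <= warmup) {
      throw new IllegalArgumentException("need more samples than warm-up iterations");
    }
    long best = Long.MAX_VALUE;
    for (int i = warmup; i < timesNanos.length; i++) {
      best = Math.min(best, timesNanos[i]);
    }
    return best;
  }

  /** Times the task totalIters times and reports the best post-warm-up run. */
  static long measure(Runnable task, int totalIters, int warmup) {
    long[] times = new long[totalIters];
    for (int i = 0; i < totalIters; i++) {
      long t0 = System.nanoTime();       // wall clock, monotonic
      task.run();
      times[i] = System.nanoTime() - t0;
    }
    return bestAfterWarmup(times, warmup);
  }
}
```

Taking the minimum rather than the mean is meant to reduce the
influence of GC pauses and other one-off slowdowns.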

Java is 1.6.0_17; I run with -server -Xmx1g -Xms1g (machine has 3G
RAM); OS is Linux CentOS 5.5.

*NOTE*: these results include the patch from LUCENE-2504, for both
trunk & 3.x!

Results (pctg change in query time, going from 3x -> trunk; red means
trunk is slower, green means trunk is faster) on optimized index:

||Query||country||unique10||unique100||unique1K||unique10K||unique100K||unique1M||score||
|<all>|{color:red}40.5%{color}|{color:red}40.6%{color}|{color:red}41.0%{color}|{color:red}40.5%{color}|{color:red}40.7%{color}|{color:green}1.6%{color}|{color:green}1.8%{color}|{color:red}2.8%{color}|
|+united +states|{color:red}6.1%{color}|{color:red}6.0%{color}|{color:red}6.0%{color}|{color:red}6.6%{color}|{color:red}6.3%{color}|{color:red}0.4%{color}|{color:red}1.4%{color}|{color:green}1.7%{color}|
|"united states"|{color:green}8.4%{color}|{color:green}8.5%{color}|{color:green}8.2%{color}|{color:green}8.1%{color}|{color:green}8.1%{color}|{color:green}9.2%{color}|{color:green}9.3%{color}|{color:green}8.7%{color}|
|states|{color:red}20.3%{color}|{color:red}20.4%{color}|{color:red}20.9%{color}|{color:red}22.5%{color}|{color:red}22.5%{color}|{color:red}8.0%{color}|{color:red}8.1%{color}|{color:green}0.1%{color}|
|unite*|{color:red}8.1%{color}|{color:red}8.3%{color}|{color:red}8.3%{color}|{color:red}8.6%{color}|{color:red}9.0%{color}|{color:green}2.8%{color}|{color:green}0.8%{color}|{color:green}1.2%{color}|
|united states|{color:red}1.3%{color}|{color:red}1.9%{color}|{color:red}2.5%{color}|{color:red}1.8%{color}|{color:red}2.2%{color}|{color:green}2.3%{color}|{color:green}1.3%{color}|{color:green}2.2%{color}|

Results on unoptimized index (w/ 5% deletions):

||Query||country||unique10||unique100||unique1K||unique10K||unique100K||unique1M||score||
|<all>|{color:red}25.1%{color}|{color:red}25.8%{color}|{color:red}24.9%{color}|{color:red}27.2%{color}|{color:red}26.3%{color}|{color:red}27.4%{color}|{color:red}27.3%{color}|{color:red}1.4%{color}|
|+united +states|{color:red}7.8%{color}|{color:red}7.6%{color}|{color:red}7.5%{color}|{color:red}7.8%{color}|{color:red}7.6%{color}|{color:red}8.6%{color}|{color:red}8.9%{color}|{color:red}6.5%{color}|
|"united states"|{color:green}13.4%{color}|{color:green}13.7%{color}|{color:green}13.6%{color}|{color:green}13.8%{color}|{color:green}13.4%{color}|{color:green}14.1%{color}|{color:green}13.6%{color}|{color:green}14.8%{color}|
|states|{color:red}13.6%{color}|{color:red}14.3%{color}|{color:red}14.2%{color}|{color:red}15.5%{color}|{color:red}15.5%{color}|{color:red}18.6%{color}|{color:red}18.8%{color}|{color:red}1.7%{color}|
|unite*|{color:red}5.8%{color}|{color:red}5.3%{color}|{color:red}5.0%{color}|{color:red}5.7%{color}|{color:red}5.3%{color}|{color:red}6.9%{color}|{color:red}6.9%{color}|{color:green}2.4%{color}|
|united states|{color:red}2.3%{color}|{color:red}2.6%{color}|{color:red}1.4%{color}|{color:red}1.9%{color}|{color:red}2.5%{color}|{color:red}4.9%{color}|{color:red}6.6%{color}|{color:red}0.1%{color}|
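
For reference, a percent change like the ones in these tables can be
computed as below.  This assumes the usual definition, (trunk - 3x) /
3x; the class name is hypothetical and no timings here come from the
actual runs.

```java
final class PctChange {
  /**
   * Signed percent change going from the 3x time to the trunk time.
   * Positive means trunk is slower, negative means trunk is faster.
   */
  static double pctChange(double time3x, double timeTrunk) {
    if (time3x <= 0) {
      throw new IllegalArgumentException("baseline time must be positive");
    }
    return 100.0 * (timeTrunk - time3x) / time3x;
  }
}
```

The tables show only the magnitude and use color for the sign.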

Unfortunately, the tests have highish variance (up to maybe +/- 10%),
I think thanks to HotSpot's unpredictability ("java ghosts").  E.g., if I
change the order in which the queries are run, the results change
quite a bit.  Even if I rerun the exact same test, results change a lot.
This of course makes conclusions nearly impossible... but still, some
rough observations:
 
  * Trunk is definitely slower when sorting by field; sorting by
    score is roughly the same perf.

  * For some reason, the unoptimized index generally takes less of a
    perf hit than the optimized index... odd.

  * Curious that phrase query is faster across the board... not sure
    why.  Maybe my recent optos to PhraseQuery somehow favor flex?

  * Perf loss is in proportion to how "easy" the query is
    (MatchAllDocsQuery is the worst; TermQuery next), which makes sense
    since the slowdown is in collection: the cheaper the matching, the
    larger the share of total time spent collecting.

Even though the results are noisy... I still think we should try to
specialize direct access to the native array for doc->ord lookup.
I'll work on that next...
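
To illustrate what "specialize direct access to the native array" could
mean, here is a rough sketch contrasting a generic per-doc accessor
with loops specialized on the backing array type.  All names here are
hypothetical and this is not Lucene's API; the min-ord loop merely
stands in for the per-hit work a sort comparator does during
collection.

```java
final class OrdLookupSketch {
  /** Generic accessor: every doc->ord lookup goes through an interface call. */
  interface OrdReader {
    int get(int doc);
  }

  /** Smallest ord over docs [0, maxDoc), using the generic accessor. */
  static int minOrdGeneric(OrdReader ords, int maxDoc) {
    int min = Integer.MAX_VALUE;
    for (int doc = 0; doc < maxDoc; doc++) {
      min = Math.min(min, ords.get(doc));
    }
    return min;
  }

  /** Same loop, specialized for ords packed into a byte[] (values 0..255). */
  static int minOrdBytes(byte[] ords) {
    int min = Integer.MAX_VALUE;
    for (int doc = 0; doc < ords.length; doc++) {
      min = Math.min(min, ords[doc] & 0xFF);  // mask to read the byte as unsigned
    }
    return min;
  }

  /** Same loop, specialized for ords stored in an int[]. */
  static int minOrdInts(int[] ords) {
    int min = Integer.MAX_VALUE;
    for (int doc = 0; doc < ords.length; doc++) {
      min = Math.min(min, ords[doc]);
    }
    return min;
  }
}
```

The specialized variants read the array directly in the hot loop, which
is typically easier for the JIT to optimize than a call through an
interface; whether that recovers the regression would need measuring.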


> sorting performance regression
> ------------------------------
>
>                 Key: LUCENE-2504
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2504
>             Project: Lucene - Java
>          Issue Type: Bug
>    Affects Versions: 4.0
>            Reporter: Yonik Seeley
>             Fix For: 4.0
>
>         Attachments: LUCENE-2504.zip
>
>
> sorting can be much slower on trunk than branch_3x
