Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2010/07/11 01:18:50 UTC
[jira] Updated: (LUCENE-2504) sorting performance regression
[ https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-2504:
---------------------------------------
Attachment: LUCENE-2504.zip
Digging into this, finally...
To try to make a somewhat more realistic search test, I created a
standalone test (attached zip file), which runs different query types
(all docs, term, phrase, OR of 2 terms, AND of 2 terms, prefix), sorting
by score or by a string field with increasing numbers of unique
values: country (~250 values, I think), and then
unique10/100/1K/10K/100K/1M. I derive the unique fields by taking the
first N unique titles from Wikipedia; the country field comes from the
SortableSingleDocSource in contrib/benchmark.
It runs with 2 threads (machine has 2 cores), and each thread first
shuffles the queries privately but deterministically, so that
matching threads in the trunk & 3x tests run the query+sorts in the
same order.
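
A minimal sketch of what "privately but deterministically" could look like, assuming each thread seeds its own java.util.Random from a fixed base seed plus its thread index. The class name, the seed value, and the task strings are all hypothetical; the actual test in the attached zip may do this differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class DeterministicShuffle {
    // Hypothetical base seed; any fixed value works as long as both runs share it.
    static final long BASE_SEED = 17L;

    /** Returns a private, reproducibly shuffled copy of the tasks for one thread. */
    static List<String> shuffledFor(int threadIndex, List<String> tasks) {
        List<String> copy = new ArrayList<>(tasks);
        // Same base seed + same thread index => same order on every run,
        // so thread N on trunk sees the same sequence as thread N on 3x.
        Collections.shuffle(copy, new Random(BASE_SEED + threadIndex));
        return copy;
    }

    public static void main(String[] args) {
        // Illustrative query+sort labels only.
        List<String> tasks = Arrays.asList(
            "states / country", "unite* / unique1K", "\"united states\" / score");
        for (int t = 0; t < 2; t++) {
            System.out.println("thread " + t + ": " + shuffledFor(t, tasks));
        }
    }
}
```

Copying the list before shuffling keeps each thread's order private, so the threads never contend on a shared list.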
I then created a Wikipedia index with the first 5M docs, one optimized
and one not optimized (13 segments, with 5% of docs deleted), on trunk
and 3x.
I sweep through all query+sorts 23 times (getting the top 10 hits for
each), using 2 threads, measuring wall clock time each time. I
discard the first 3 results for each query+sort, and then take the
fastest time of the remaining 20.
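
The measurement scheme above can be sketched roughly as follows. This is a single-threaded illustration with hypothetical names (BestOfSweeps, timeSweeps, fastestAfterWarmup) and a placeholder workload; it is not the code from the attached zip.

```java
import java.util.Collections;
import java.util.List;

public class BestOfSweeps {
    /** Times every task once per sweep; result[t][s] is task t's wall-clock nanos in sweep s. */
    static long[][] timeSweeps(List<Runnable> tasks, int sweeps) {
        long[][] nanos = new long[tasks.size()][sweeps];
        for (int s = 0; s < sweeps; s++) {
            for (int t = 0; t < tasks.size(); t++) {
                long start = System.nanoTime();
                tasks.get(t).run();
                nanos[t][s] = System.nanoTime() - start;
            }
        }
        return nanos;
    }

    /** Drops the first `warmup` timings and returns the fastest of the rest. */
    static long fastestAfterWarmup(long[] nanos, int warmup) {
        if (warmup >= nanos.length) {
            throw new IllegalArgumentException("no timings left after warmup");
        }
        long best = Long.MAX_VALUE;
        for (int i = warmup; i < nanos.length; i++) {
            best = Math.min(best, nanos[i]);
        }
        return best;
    }

    public static void main(String[] args) {
        // Placeholder standing in for one query+sort.
        Runnable fakeQuery = () -> Math.sqrt(System.nanoTime());
        long[][] nanos = timeSweeps(Collections.singletonList(fakeQuery), 23);
        // 23 sweeps, first 3 discarded, fastest of the remaining 20 kept.
        System.out.println("best nanos: " + fastestAfterWarmup(nanos[0], 3));
    }
}
```

Taking the minimum rather than the mean is a common way to reduce the influence of GC pauses and JIT compilation, though it does not remove run-to-run variance.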
Java is 1.6.0_17; I run with -server -Xmx1g -Xms1g (machine has 3G
RAM); OS is Linux CentOS 5.5.
*NOTE*: these results include the patch from LUCENE-2504, for both
trunk & 3.x!
Results (percentage change in query time, going from 3x -> trunk) on
optimized index:
||Query||country||unique10||unique100||unique1K||unique10K||unique100K||unique1M||score||
|<all>|{color:red}40.5%{color}|{color:red}40.6%{color}|{color:red}41.0%{color}|{color:red}40.5%{color}|{color:red}40.7%{color}|{color:green}1.6%{color}|{color:green}1.8%{color}|{color:red}2.8%{color}|
|+united +states|{color:red}6.1%{color}|{color:red}6.0%{color}|{color:red}6.0%{color}|{color:red}6.6%{color}|{color:red}6.3%{color}|{color:red}0.4%{color}|{color:red}1.4%{color}|{color:green}1.7%{color}|
|"united states"|{color:green}8.4%{color}|{color:green}8.5%{color}|{color:green}8.2%{color}|{color:green}8.1%{color}|{color:green}8.1%{color}|{color:green}9.2%{color}|{color:green}9.3%{color}|{color:green}8.7%{color}|
|states|{color:red}20.3%{color}|{color:red}20.4%{color}|{color:red}20.9%{color}|{color:red}22.5%{color}|{color:red}22.5%{color}|{color:red}8.0%{color}|{color:red}8.1%{color}|{color:green}0.1%{color}|
|unite*|{color:red}8.1%{color}|{color:red}8.3%{color}|{color:red}8.3%{color}|{color:red}8.6%{color}|{color:red}9.0%{color}|{color:green}2.8%{color}|{color:green}0.8%{color}|{color:green}1.2%{color}|
|united states|{color:red}1.3%{color}|{color:red}1.9%{color}|{color:red}2.5%{color}|{color:red}1.8%{color}|{color:red}2.2%{color}|{color:green}2.3%{color}|{color:green}1.3%{color}|{color:green}2.2%{color}|
Results on unoptimized index (w/ 5% deletions):
||Query||country||unique10||unique100||unique1K||unique10K||unique100K||unique1M||score||
|<all>|{color:red}25.1%{color}|{color:red}25.8%{color}|{color:red}24.9%{color}|{color:red}27.2%{color}|{color:red}26.3%{color}|{color:red}27.4%{color}|{color:red}27.3%{color}|{color:red}1.4%{color}|
|+united +states|{color:red}7.8%{color}|{color:red}7.6%{color}|{color:red}7.5%{color}|{color:red}7.8%{color}|{color:red}7.6%{color}|{color:red}8.6%{color}|{color:red}8.9%{color}|{color:red}6.5%{color}|
|"united states"|{color:green}13.4%{color}|{color:green}13.7%{color}|{color:green}13.6%{color}|{color:green}13.8%{color}|{color:green}13.4%{color}|{color:green}14.1%{color}|{color:green}13.6%{color}|{color:green}14.8%{color}|
|states|{color:red}13.6%{color}|{color:red}14.3%{color}|{color:red}14.2%{color}|{color:red}15.5%{color}|{color:red}15.5%{color}|{color:red}18.6%{color}|{color:red}18.8%{color}|{color:red}1.7%{color}|
|unite*|{color:red}5.8%{color}|{color:red}5.3%{color}|{color:red}5.0%{color}|{color:red}5.7%{color}|{color:red}5.3%{color}|{color:red}6.9%{color}|{color:red}6.9%{color}|{color:green}2.4%{color}|
|united states|{color:red}2.3%{color}|{color:red}2.6%{color}|{color:red}1.4%{color}|{color:red}1.9%{color}|{color:red}2.5%{color}|{color:red}4.9%{color}|{color:red}6.6%{color}|{color:red}0.1%{color}|
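
For reference, here is how a percentage change "going from 3x -> trunk" would conventionally be computed from two best-of-20 timings. The formula is an assumption (the standard relative change against the 3x baseline), and the class and method names are hypothetical; the tables above show only magnitudes, with the direction carried by the color.

```java
public class PercentChange {
    /**
     * Relative change in query time from the 3x baseline to trunk, in percent.
     * Positive means trunk took longer; negative means trunk was faster.
     */
    static double pctChange(double time3x, double timeTrunk) {
        if (time3x <= 0) {
            throw new IllegalArgumentException("baseline time must be positive");
        }
        return 100.0 * (timeTrunk - time3x) / time3x;
    }

    public static void main(String[] args) {
        // Illustrative numbers only, not measurements from this test.
        System.out.printf("%.1f%%%n", pctChange(100.0, 140.5));
    }
}
```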
Unfortunately, the tests have highish variance (up to maybe +/- 10%),
I think thanks to HotSpot's unpredictability ("java ghosts"). E.g., if I
change the order in which the queries are run, the results change
quite a bit. Even if I rerun the exact same test, the results change a
lot. This of course makes conclusions nearly impossible... but still,
some rough observations:
* Trunk is definitely slower when sorting by field; sorting by
score is roughly the same perf.
* For some reason, the unoptimized index generally takes less of a
perf hit than the optimized index... odd.
* Curious that phrase query is faster across the board... not sure
why. Maybe my recent optimizations to PhraseQuery somehow favor flex?
* Perf loss is in proportion to how "easy" the query is
(AllDocsQuery is the worst; TermQuery next), which makes sense
since the slowdown is in collection.
Even though the results are noisy... I still think we should try to
specialize direct access to the native array for doc->ord lookup.
I'll work on that next...
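
To illustrate the general idea of specializing doc->ord lookup (not an actual patch), here is a rough sketch. It assumes the ords sit behind a generic accessor that costs one interface call per document, and that some implementations are backed by a plain int[]. All names (OrdReader, IntArrayOrdReader, minOrd) are hypothetical and are not Lucene APIs; finding the minimum ord stands in for the per-document work a sort comparator does during collection.

```java
public class OrdLookupSketch {
    /** Hypothetical generic doc->ord accessor: one interface call per document. */
    interface OrdReader {
        int get(int docID);
    }

    /** Hypothetical reader backed by a plain int[], which allows specialization. */
    static final class IntArrayOrdReader implements OrdReader {
        final int[] ords;
        IntArrayOrdReader(int[] ords) { this.ords = ords; }
        @Override public int get(int docID) { return ords[docID]; }
    }

    /** Generic path: every lookup goes through the interface. */
    static int minOrdGeneric(OrdReader reader, int maxDoc) {
        int min = Integer.MAX_VALUE;
        for (int doc = 0; doc < maxDoc; doc++) {
            min = Math.min(min, reader.get(doc));
        }
        return min;
    }

    /** Specialized path: index the native array directly when one is available. */
    static int minOrd(OrdReader reader, int maxDoc) {
        if (reader instanceof IntArrayOrdReader) {
            int[] ords = ((IntArrayOrdReader) reader).ords;
            int min = Integer.MAX_VALUE;
            for (int doc = 0; doc < maxDoc; doc++) {
                min = Math.min(min, ords[doc]); // plain array load, no call
            }
            return min;
        }
        return minOrdGeneric(reader, maxDoc); // fall back for other readers
    }

    public static void main(String[] args) {
        OrdReader reader = new IntArrayOrdReader(new int[] {5, 2, 9, 4});
        System.out.println(minOrd(reader, 4));
    }
}
```

The type check happens once per segment rather than once per document, which is why the saving would matter most for "easy" queries where collection dominates. Whether it helps in practice depends on how well HotSpot already inlines the generic call.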
> sorting performance regression
> ------------------------------
>
> Key: LUCENE-2504
> URL: https://issues.apache.org/jira/browse/LUCENE-2504
> Project: Lucene - Java
> Issue Type: Bug
> Affects Versions: 4.0
> Reporter: Yonik Seeley
> Fix For: 4.0
>
> Attachments: LUCENE-2504.zip
>
>
> sorting can be much slower on trunk than branch_3x