
Posted to dev@lucene.apache.org by John Wang <jo...@gmail.com> on 2009/10/12 07:19:33 UTC

new sorting api and some perf numbers

Hi guys:
    The new FieldComparator api looks really scary :)

    But after some perf testing (numbers below), I'd say it is worth it:

HW: Mac Pro with 16G memory
jvm: 1.6.0_13
jvm arg: -Xms1g -Xmx1g -server

setup

index:
1M docs evenly split into 8 segments (to make sure the test is fair across
segment boundaries)
each doc has 3 fields:
1) id - stored
2) val - random number, indexed, not analyzed, no norms, omit tf
3) string - "even" or "odd" of the corresponding id, not analyzed, no norms,
omit tf

built with lucene 2.4.1 to keep the same index across lucene 2.4.1 and
lucene 2.9.0 search tests

Search:
query on the term "even" (a TermQuery, to minimize the overhead of the text
search); it matches 500k docs spread across segment boundaries. Sort by val,
sort type: string. Numhits, i.e. number of slots = 100.
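To illustrate what "sort type: string" with 100 slots means, here is a minimal sketch in plain Java. It is a generic bounded-heap selection, not Lucene's FieldComparator implementation; the class and method names (TopSlots, topN) are hypothetical, and ascending order is an assumption. Note that string ordering is lexicographic, so "10" sorts before "9".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class TopSlots {
    /** Returns the numSlots smallest values in ascending String order. */
    static List<String> topN(Iterable<String> values, int numSlots) {
        // Max-heap: the head is the worst (largest) value currently kept.
        PriorityQueue<String> heap =
            new PriorityQueue<>(Math.max(1, numSlots), Collections.<String>reverseOrder());
        for (String v : values) {
            if (heap.size() < numSlots) {
                heap.add(v);                       // still filling the slots
            } else if (numSlots > 0 && v.compareTo(heap.peek()) < 0) {
                heap.poll();                       // evict the current worst
                heap.add(v);
            }
        }
        List<String> result = new ArrayList<>(heap);
        Collections.sort(result);                  // lexicographic, ascending
        return result;
    }

    public static void main(String[] args) {
        List<String> vals = Arrays.asList("9", "10", "2", "33", "100");
        System.out.println(topN(vals, 3)); // prints [10, 100, 2]
    }
}
```

Only numSlots values are held at any time, so memory stays small even when the query matches 500k docs.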

ran 20 iterations of the same query for each test.
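
A minimal sketch of how such a cold/warm measurement could be structured in plain Java. The names (ColdWarmTimer, time) are made up for illustration, and the workload is a placeholder rather than the actual Lucene search used in the test.

```java
public class ColdWarmTimer {
    /** Returns {firstRunMs, averageOfRemainingRunsMs}. */
    static double[] time(Runnable task, int iterations) {
        if (iterations < 2) {
            throw new IllegalArgumentException("need at least 2 iterations");
        }
        double firstMs = 0;
        double restTotalMs = 0;
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            task.run();
            double elapsedMs = (System.nanoTime() - start) / 1e6;
            if (i == 0) {
                firstMs = elapsedMs;       // first run includes any one-time loading
            } else {
                restTotalMs += elapsedMs;  // later runs are averaged
            }
        }
        return new double[] { firstMs, restTotalMs / (iterations - 1) };
    }

    public static void main(String[] args) {
        // Placeholder workload; the real test would run the sorted search here.
        Runnable placeholder = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += i;
            if (sum < 0) throw new IllegalStateException(); // use the result
        };
        double[] r = time(placeholder, 20);
        System.out.println("first: " + r[0] + " ms, avg of rest: " + r[1] + " ms");
    }
}
```

Reporting the first run separately keeps one-time loading cost out of the steady-state average.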

First query, includes loading

lucene 2.4.1: 4858ms, lucene 2.9.0: 816ms, about 5.95x faster (4858/816)

avg of the remaining 19 queries:

lucene 2.4.1: 32ms, lucene 2.9.0: 17ms, about 1.88x faster (32/17)
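
The speedup figures above are the 2.4.1 time divided by the 2.9.0 time. A small sketch that reproduces the arithmetic from the reported timings (the class and method names are made up for illustration):

```java
import java.util.Locale;

public class SpeedupCheck {
    /** Ratio of old time to new time; 2.0 means the new version takes half as long. */
    static double speedup(double oldMs, double newMs) {
        return oldMs / newMs;
    }

    /** Percentage of the old running time that was saved. */
    static double timeSavedPercent(double oldMs, double newMs) {
        return (1.0 - newMs / oldMs) * 100.0;
    }

    public static void main(String[] args) {
        // Timings as reported in this message.
        System.out.printf(Locale.ROOT, "first query:  %.2fx faster, %.1f%% less time%n",
                speedup(4858, 816), timeSavedPercent(4858, 816));
        System.out.printf(Locale.ROOT, "warm average: %.2fx faster, %.1f%% less time%n",
                speedup(32, 17), timeSavedPercent(32, 17));
    }
}
```

So the first query takes roughly 83% less time and the warm queries roughly 47% less.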

I ran this test about 5 times; the findings were similar.

The performance gain is significant!

Great job!

-John

Re: new sorting api and some perf numbers

Posted by Bradford Stephens <br...@gmail.com>.

Wow! This is awesome. Can't wait to see how it plays with Bobo :)

-- 
http://www.drawntoscaleconsulting.com - Scalability, Hadoop, HBase,
and Distributed Lucene Consulting

http://www.roadtofailure.com -- The Fringes of Scalability, Social
Media, and Computer Science
