Posted to java-user@lucene.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2007/03/06 17:16:16 UTC

Search benchmark: 2.0 vs. 2.2-dev and heap sizing

Hi,

I'm doing some Lucene search benchmarking (got to love massive query logs :)) and have 2 questions:

1) Has anyone compared Lucene 2.0 and 2.2-dev?  My benchmarks found 2.2-dev (freshly baked) to be somewhat slower than 2.0, despite all those performance improvements (see CHANGES.txt)... Has anyone else done this comparison?  My queries are a mixture of queries with 2-3 required keywords (the majority) and phrase queries with 2-3 keywords.
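
For example (made-up terms, but the same two shapes as the real log, in QueryParser syntax):

+ipod +nano +review
"empire state building"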

To give you an idea about how much slower 2.2-dev is for me, here are some counts for queries I considered slow (> 1s latency) during my benchmark with 8 concurrent search threads and then 64 threads:


$ grep -c SLOW 5-shard-log-2.0/8.log 
1183
$ grep -c SLOW 5-shard-log-2.2-dev/8.log 
5479

$ grep -c SLOW 5-shard-log-2.0/64.log 
28657
$ grep -c SLOW 5-shard-log-2.2-dev/64.log 
33459

This is out of a total of 100K queries.

2) My benchmark was against 5 optimized compound Lucene indices, about 9GB each, on a box with 32GB of RAM and several CPUs.  I gave the JVM 22GB via -Xms and -Xmx.  However, I am wondering whether giving it that much is actually smart.  While I'm letting the JVM use more RAM, I'm taking it away from the OS's FS cache.  So I'm now thinking about running the same benchmark, but with a smaller max heap.  But how much should I give it?  I'm thinking about adding up the sizes of all the .tii files, adding some padding for the JVM, GC, etc., and using that.  Is there anything else I should consider here?
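
A minimal sketch of that adding-up (paths and class name are made up; note that with compound indices the .tii data lives inside the .cfs file, so this only works against a non-compound copy of the index):

import java.io.File;

// Rough lower bound for the heap: sum the .tii (term index) files across
// all shard directories, then add padding for the JVM itself, GC headroom, etc.
public class TiiSizer {
    public static void main(String[] args) {
        long total = 0;
        for (String dirName : args) {            // one argument per shard directory
            File[] files = new File(dirName).listFiles();
            if (files == null) continue;
            for (File f : files) {
                if (f.getName().endsWith(".tii")) {
                    total += f.length();
                }
            }
        }
        System.out.println("total .tii bytes: " + total);
    }
}

$ java TiiSizer /index/shard1 /index/shard2 /index/shard3 /index/shard4 /index/shard5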

So I looked inside one of the .cfs files:

_0.f0: 11164467 bytes
... the other per-field norms files (_0.f1, etc.), same size, of course
_0.fdt: 381343723 bytes
_0.fdx: 89315736 bytes
_0.fnm: 78 bytes
_0.frq: 4591955197 bytes
_0.prx: 4242807266 bytes
_0.tii: 11498861 bytes
_0.tis: 829868070 bytes


Here, the .tii file is only about 11 MB.  That looks awfully small!  There is no way 5 x 11 MB + padding will be enough.  Should I be adding the size of some other file(s)?  .tis perhaps?

Thanks,
Otis

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Search benchmark: 2.0 vs. 2.2-dev and heap sizing

Posted by Doron Cohen <DO...@il.ibm.com>.
This is interesting.

Very large heaps can sometimes cause expensive gc cycles ("Can the heap be
too big?" - http://www.javaperformancetuning.com/news/qotm045.shtml), and I
think different memory allocation patterns between 2.0 and 2.2 could play a
part too, so it would be interesting to see the numbers with smaller heap
sizes.
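
For example, re-running with the standard Sun JVM gc logging flags (the heap
size here is just a placeholder to contrast with the 22GB runs) would show
whether full collections line up with the SLOW queries:

$ java -Xms4g -Xmx4g -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps ...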

A few more questions:
- Reader reuse: are all searches in the same thread sharing
searchers/readers?  Are different threads sharing searchers/readers?  (If
not, see the sketch after these questions.)
- What happens with a single thread?
- Is the degradation also visible on individual queries, or are some
queries faster in 2.0 and some in 2.2?
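
If they aren't shared, the usual pattern is a single searcher opened once and
handed to all query threads; a minimal sketch (path, field name and query are
made up):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class SharedSearcherDemo {
    public static void main(String[] args) throws Exception {
        // One searcher shared by all threads: IndexSearcher is thread-safe
        // for searching, and sharing it means the underlying reader (term
        // index, norms) is loaded only once.
        final IndexSearcher searcher = new IndexSearcher("/path/to/index");
        Runnable worker = new Runnable() {
            public void run() {
                try {
                    // QueryParser is not thread-safe, so create one per use
                    Query q = new QueryParser("body", new StandardAnalyzer())
                            .parse("+foo +bar");
                    Hits hits = searcher.search(q);
                    System.out.println(hits.length() + " hits");
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        };
        for (int i = 0; i < 8; i++) {
            new Thread(worker).start();
        }
    }
}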

Thanks,
Doron

