Posted to java-user@lucene.apache.org by Scott Schneider <Sc...@symantec.com> on 2014/01/21 05:22:40 UTC

Performance testing Lucene

Hello,

Would you folks mind giving me a few tips on performance testing Lucene?  I want to test the performance impact of a Directory subclass.

What is a good testing tool to use?  I don't see a great way to get SolrMeter to run the maximum number of updates per minute and measure throughput that way.  When I set the number of updates per minute to a large value, SolrMeter logs NullPointerExceptions.  (I assume these are within SolrMeter, as I don't see errors in Solr.)  Mike McCandless's nightly Lucene performance tests look good, though I've only just started looking at them.

Are there any particularly standard or good test sets?  I'd like to test 3 scenarios:  indexing only, querying only, and indexing plus querying.  McCandless's indexing test uses Wikipedia, which seems great, but he has a slew of tests that are each specific to some querying feature.  I'd like a single, general query test.  It's not hard to come up with a decent set of queries, but I'd really like something representative of real-world queries.  If there is some standard set of commonly used queries, that would be ideal.

Thanks!

Scott


RE: Performance testing Lucene

Posted by Scott Schneider <Sc...@symantec.com>.
Thanks.  I wound up writing my own performance test tool, since, among other things, I want to be sure that the index is big enough to not fit in memory (either in a Lucene cache or the OS disk cache).  I will take your recommendation about nightly tests, though.  It's hard to have too many unit tests!

Scott
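
A home-grown tool of the kind described above can start from something as small as the following. This is only a sketch under assumptions: ThroughputHarness, Result and measure are hypothetical names, not Lucene APIs, and the lambda in main is a stand-in for a real indexing or search call against the Directory being tested. It runs an untimed warm-up so the JIT has a chance to compile the hot path, then times a fixed number of operations and reports operations per second.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.IntConsumer;

// Hypothetical harness, not part of Lucene: times a batch of operations
// after a warm-up phase and reports operations per second.
final class ThroughputHarness {

    /** Result of one measured run. */
    static final class Result {
        final int operations;
        final long elapsedNanos;

        Result(int operations, long elapsedNanos) {
            this.operations = operations;
            this.elapsedNanos = elapsedNanos;
        }

        double opsPerSecond() {
            // Guard against a zero reading on very coarse clocks.
            long nanos = Math.max(1L, elapsedNanos);
            return operations * (double) TimeUnit.SECONDS.toNanos(1) / nanos;
        }
    }

    /**
     * Runs op.accept(i) warmupOps times untimed, then measuredOps times
     * under the clock, and returns the measured count and elapsed time.
     */
    static Result measure(int warmupOps, int measuredOps, IntConsumer op) {
        for (int i = 0; i < warmupOps; i++) {
            op.accept(i);
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredOps; i++) {
            op.accept(i);
        }
        long elapsed = System.nanoTime() - start;
        return new Result(measuredOps, elapsed);
    }

    public static void main(String[] args) {
        // Stand-in workload: replace the lambda body with an indexing call
        // or a search against the Directory under test.
        long[] sink = new long[1];
        Result r = measure(10_000, 100_000,
                i -> sink[0] += Integer.toString(i).hashCode());
        System.out.printf("%d ops in %.1f ms (%.0f ops/s)%n",
                r.operations, r.elapsedNanos / 1e6, r.opsPerSecond());
    }
}
```

The harness says nothing about caching: the index still has to be built larger than the available RAM, as noted above, for the numbers to reflect disk access rather than the OS disk cache.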


> -----Original Message-----
> From: Uwe Schindler [mailto:uwe@thetaphi.de]
> Sent: Friday, January 24, 2014 3:03 AM
> To: java-user@lucene.apache.org
> Subject: RE: Performance testing Lucene
> 
> Hi Scott,
> 
> the unit tests are also a good performance test. But to compare your
> directory with another one, be sure to:
> - use a defined directory instance to compare. The most performant
> Lucene one is: -Dtests.directory=MMapDirectory - so compare your results
> with that one. If you don't define a directory, it uses RAMDirectory
> in most cases.
> - use a defined random seed when comparing results. Lucene tests
> randomize a lot. Randomization can be prevented by explicitly stating
> a given random seed (one example is given on startup). Also run "ant
> test-help" to get more usage help.
> - to do more stress testing - this will create larger indexes:
> -Dtests.nightly=true
> - use a single JVM: -Dtests.jvms=1
> 
> Uwe
> 
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: Performance testing Lucene

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi Scott,

the unit tests are also a good performance test. But to compare your directory with another one, be sure to:
- use a defined directory instance to compare. The most performant Lucene one is: -Dtests.directory=MMapDirectory - so compare your results with that one. If you don't define a directory, it uses RAMDirectory in most cases.
- use a defined random seed when comparing results. Lucene tests randomize a lot. Randomization can be prevented by explicitly stating a given random seed (one example is given on startup). Also run "ant test-help" to get more usage help.
- to do more stress testing - this will create larger indexes: -Dtests.nightly=true
- use a single JVM: -Dtests.jvms=1

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de
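
The fixed-seed advice in the list above carries over to a hand-written benchmark as well. In the Lucene test framework the seed is passed as a system property (-Dtests.seed=<value>); the sketch below shows the same idea in plain Java, assuming a hypothetical SeededWorkload helper and an illustrative six-word vocabulary, neither of which comes from Lucene. Every query is derived from one explicit seed, so a run against the baseline Directory and a run against the new one receive the same queries in the same order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper for a hand-written benchmark, not Lucene's test
// framework: derives the whole query workload from one explicit seed.
final class SeededWorkload {

    // Illustrative vocabulary; a real test would draw terms from the corpus.
    private static final String[] TERMS = {
        "lucene", "index", "query", "directory", "segment", "merge"
    };

    /** Returns count two-term queries determined entirely by seed. */
    static List<String> queries(long seed, int count) {
        Random random = new Random(seed);
        List<String> result = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            String first = TERMS[random.nextInt(TERMS.length)];
            String second = TERMS[random.nextInt(TERMS.length)];
            result.add(first + " " + second);
        }
        return result;
    }

    public static void main(String[] args) {
        long seed = 42L; // arbitrary; record it alongside the results
        List<String> baselineRun = queries(seed, 5);
        List<String> candidateRun = queries(seed, 5);
        // Same seed, same workload: both runs see identical queries
        // in identical order.
        System.out.println(baselineRun.equals(candidateRun));
    }
}
```

Recording the seed with each result makes it possible to rerun exactly the same workload later when two measurements disagree.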


> -----Original Message-----
> From: Scott Schneider [mailto:Scott_Schneider@symantec.com]
> Sent: Friday, January 24, 2014 2:41 AM
> To: java-user@lucene.apache.org
> Subject: RE: Performance testing Lucene
> 
> Thanks!  I ran this Directory subclass through the Lucene unit tests (and
> found 3 race conditions).  Unit tests are wonderful.
> 
> Scott


Re: Performance testing Lucene

Posted by Michael McCandless <lu...@mikemccandless.com>.
Oh that's good to hear.  Lucene's unit tests are quite stressful on a
new Directory impl...

Mike McCandless

http://blog.mikemccandless.com


On Thu, Jan 23, 2014 at 8:40 PM, Scott Schneider
<Sc...@symantec.com> wrote:
> Thanks!  I ran this Directory subclass through the Lucene unit tests (and found 3 race conditions).  Unit tests are wonderful.
>
> Scott


RE: Performance testing Lucene

Posted by Scott Schneider <Sc...@symantec.com>.
Thanks!  I ran this Directory subclass through the Lucene unit tests (and found 3 race conditions).  Unit tests are wonderful.

Scott
 

> -----Original Message-----
> From: Michael McCandless [mailto:lucene@mikemccandless.com]
> Sent: Wednesday, January 22, 2014 7:05 AM
> To: Lucene Users
> Subject: Re: Performance testing Lucene
> 
> All the source code for the nightly Lucene perf tests I run (
> http://people.apache.org/~mikemccand/lucenebench/ ) is here:
> https://code.google.com/a/apache-extras.org/p/luceneutil/
> 
> These are also the scripts I use for A/B performance tests for a new
> patch.
> 
> It's somewhat tricky getting those Python scripts set up to run ...
> but I think it'd be a good way to smoke test your new Directory.
> 
> The queries are "synthetic"; it's a real problem, not having a
> real-world, biggish corpus plus real queries, for better performance
> testing...
> 
> Mike McCandless
> 
> http://blog.mikemccandless.com


Re: Performance testing Lucene

Posted by Michael McCandless <lu...@mikemccandless.com>.
All the source code for the nightly Lucene perf tests I run (
http://people.apache.org/~mikemccand/lucenebench/ ) is here:
https://code.google.com/a/apache-extras.org/p/luceneutil/

These are also the scripts I use for A/B performance tests for a new patch.

It's somewhat tricky getting those Python scripts set up to run ...
but I think it'd be a good way to smoke test your new Directory.

The queries are "synthetic"; it's a real problem, not having a
real-world, biggish corpus plus real queries, for better performance
testing...

Mike McCandless

http://blog.mikemccandless.com


On Mon, Jan 20, 2014 at 11:22 PM, Scott Schneider
<Sc...@symantec.com> wrote:
> Hello,
>
> Would you folks mind giving me a few tips on performance testing Lucene?  I want to test the performance impact of a Directory subclass.
>
> What is a good testing tool to use?  I don't see a great way to get SolrMeter to run the maximum number of updates per minute and measure throughput that way.  When I set the number of updates per minute to a large value, SolrMeter logs NullPointerExceptions.  (I assume these are within SolrMeter, as I don't see errors in Solr.)  Mike McCandless's nightly Lucene performance tests look good, though I've only just started looking at them.
>
> Are there any particularly standard or good test sets?  I'd like to test 3 scenarios:  indexing only, querying only, and indexing plus querying.  McCandless's indexing test uses Wikipedia, which seems great, but he has a slew of tests that are each specific to some querying feature.  I'd like a single, general query test.  It's not hard to come up with a decent set of queries, but I'd really like something representative of real-world queries.  If there is some standard set of commonly used queries, that would be ideal.
>
> Thanks!
>
> Scott