Posted to user@lucenenet.apache.org by Jokin Cuadrado <jo...@gmail.com> on 2009/11/03 12:05:29 UTC

Re: 26m documents, 8 fields, Core2 Duo (E6550 2.33GHz), /tags/Lucene.Net_2_4_0, 40 requests/second...is that fast or slow?

Lucene has a "cold start time" that depends on how big your index is. I
think you should run some searches (50, for example) to warm all the
internal caches.

As you are searching only for an exact match that doesn't return many
results, the throughput should be higher.

However, the test you are running is not very representative, because
you are using Lucene like a giant hashtable. Without knowing what the
usual queries and the average number of hits are going to be, it's
difficult to suggest any optimization.

On Tue, Nov 3, 2009 at 4:58 AM, Ron Grabowski <ro...@yahoo.com> wrote:
> searcher

-- 
Jokin

RE: 26m documents, 8 fields, Core2 Duo (E6550 2.33GHz), /tags/Lucene.Net_2_4_0, 40 requests/second...is that fast or slow?

Posted by Michael Garski <mg...@myspace-inc.com>.
Ron,

Technically those caches are already serialized in the index itself.  I suppose you could serialize the caches yourself as there is no internal mechanism to do so, but that would be cumbersome and require accessing internal fields.  The approach I use is to warm up the caches before actually executing a query so all of the queries execute as fast as possible.

In your warm-up, be sure to access all of the filter and field caches as you would when executing queries, and throw a few queries at it.  With that, all of your queries will be responsive.
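
One way to arrange this, sketched in Python under stated assumptions: nothing here is the Lucene.Net API. SearcherHolder, FakeSearcher, warm, the sort fields ("date", "price"), the filter name ("in_stock") and the sample queries are all hypothetical. The sketch only illustrates the ordering: open a new searcher, touch every sort field and filter the application uses, run a few queries, and only then hand the searcher to live traffic.

```python
import threading

class SearcherHolder:
    """Keeps the live searcher and swaps in a new one only after it is warmed."""

    def __init__(self, open_searcher, warm):
        self._open_searcher = open_searcher  # hypothetical factory
        self._warm = warm                    # hypothetical warm-up routine
        self._lock = threading.Lock()
        self._current = None

    def reopen(self):
        fresh = self._open_searcher()
        self._warm(fresh)           # pay the cold-start cost off the request path
        with self._lock:
            self._current = fresh   # publish only once warm

    def get(self):
        with self._lock:
            return self._current

class FakeSearcher:
    """Stand-in that just records which searches were run against it."""

    def __init__(self):
        self.touched = []

    def search(self, query, sort_field=None, filter_name=None):
        self.touched.append((query, sort_field, filter_name))
        return []

def warm(searcher):
    for field in ("date", "price"):      # hypothetical sort fields
        searcher.search("warm", sort_field=field)
    for name in ("in_stock",):           # hypothetical filter
        searcher.search("warm", filter_name=name)
    for query in ("foo", "bar"):         # a few representative queries
        searcher.search(query)

holder = SearcherHolder(FakeSearcher, warm)
holder.reopen()
print(len(holder.get().touched), "warm-up searches ran before going live")
```

The lock only guards the reference swap, so requests keep using the old searcher while the new one warms.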

If you really want that warm-up period as short as possible, increasing your IO bandwidth will help out quite a bit.  SATA is OK, 15k SAS drives are nice, SSDs are a big jump up, and Fusion IO cards are screamingly fast (provided you have the budget).

Michael
www.myspace.com/michaelgarski

Re: 26m documents, 8 fields, Core2 Duo (E6550 2.33GHz), /tags/Lucene.Net_2_4_0, 40 requests/second...is that fast or slow?

Posted by Ron Grabowski <ro...@yahoo.com>.
Thanks for the help, Artem and Jokin. I let my index "warm up" and found exactly the results I was looking for! The improvement was fairly minor at first:

Found 500 documents in 11 seconds, 46 requests/second
Found 500 documents in 10 seconds, 48 requests/second
Found 500 documents in 10 seconds, 50 requests/second
Found 500 documents in 9 seconds, 53 requests/second
...
Found 500 documents in 9 seconds, 58 requests/second
Found 500 documents in 8 seconds, 62 requests/second

When I increased the number of documents for each round I was quite impressed:

Found 10000 documents in 36 seconds, 281 requests/second
Found 10000 documents in 4 seconds, 2,498 requests/second
Found 10000 documents in 3 seconds, 3,991 requests/second
Found 10000 documents in 2 seconds, 4,878 requests/second
Found 10000 documents in 2 seconds, 6,271 requests/second
Found 10000 documents in 1 seconds, 7,066 requests/second
Found 10000 documents in 1 seconds, 7,759 requests/second
Found 10000 documents in 1 seconds, 7,685 requests/second
Found 10000 documents in 1 seconds, 8,105 requests/second
Found 10000 documents in 1 seconds, 8,289 requests/second
Found 10000 documents in 1 seconds, 8,355 requests/second

That's with just a single thread accessing IndexSearcher. I think this experiment also answers a lot of my other questions to the list over the past few days with regard to QueryParser being "slow"...I just need to let my index warm up a bit longer. Is it possible to serialize the internal caches of an IndexSearcher/IndexReader so that when it's re-opened it doesn't require the warm-up period?
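
The 10000-document rounds print whole seconds, so several of them show "1 seconds" with different rates. A quick back-calculation (a Python sketch; it assumes the reported rate is simply documents divided by elapsed time) recovers the approximate elapsed time and the overall speedup between the first and last round:

```python
def implied_seconds(documents, requests_per_second):
    """Elapsed time implied by a reported throughput."""
    return documents / requests_per_second

# (documents, requests/second) taken from the first, second and last rounds above
rounds = [(10000, 281), (10000, 2498), (10000, 8355)]
for docs, rate in rounds:
    print("%d docs at %d req/s -> %.2f s" % (docs, rate, implied_seconds(docs, rate)))
# prints 35.59 s, 4.00 s and 1.20 s

speedup = 8355 / 281
print("last round is about %.1fx faster than the first" % speedup)  # about 29.7x
```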

Jokin, I agree this test wasn't very representative of a normal search. I wanted to set a baseline. I was concerned that if the fastest possible search was 40 requests/second then adding QueryParsers, geocoding, spellchecking, etc. would require me to look elsewhere for a solution.

As a new Lucene user, I'm finding the Solr Wiki to be quite helpful in explaining the role of tokens, analyzers, filters, etc.

Thanks,
Ron



Re: 26m documents, 8 fields, Core2 Duo (E6550 2.33GHz), /tags/Lucene.Net_2_4_0, 40 requests/second...is that fast or slow?

Posted by Artem Chereisky <a....@gmail.com>.
My index has 2M documents. Any search with a term query and a simple custom
hit collector takes under 20ms. How large is your index? Can you use
RAMDirectory and make the IndexReader read-only? That speeds things up.

