Posted to java-user@lucene.apache.org by Scott Oshima <so...@gmail.com> on 2007/03/28 19:56:48 UTC

index file size threshold affecting search performance?

So I assumed a linear decay of performance as an index got bigger.

For some reason, going from an index size of 1.89 to 1.95 gigs
dramatically increased CPU usage across all of our servers.

I was thinking of splitting the 1.95 gig index into 2 separate indexes and
using a MultiSearcher on those parts?

thanks.

-scott

Re: index file size threshold affecting search performance?

Posted by Mike Klaas <mi...@gmail.com>.
On 3/28/07, Scott Oshima <so...@gmail.com> wrote:
> So I assumed a linear decay of performance as an index got bigger.
>
> For some reason, going from an index size of 1.89 to 1.95 gigs
> dramatically increased CPU usage across all of our servers.
>
> I was thinking of splitting the 1.95 gig index into 2 separate indexes and
> using a MultiSearcher on those parts?

PS, you might find this helpful:
http://www.catb.org/~esr/faqs/smart-questions.html

You should tell us what you have done and what the unexpected
consequences were (and possibly your hypothesis as to why).  Instead,
you've only told us your hypothesis, but not:

 - whether the increase is due to more documents or more data per document
 - whether the increase is in the indexed content or stored field content
 - what format of additional data is being stored and what you are doing with it
 - where the performance degradation is occurring (query or document retrieval)
 - if the former, what type of queries are being used
 - if the latter, how many documents are being retrieved

I could see raw index size having an effect if the active set _just_
fits in the OS buffer cache, and you've pushed it over the edge.  But
in that case, I would expect the performance degradation to manifest
as increased I/O.
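
A rough way to check the buffer-cache theory is to compare the on-disk size of the index with the memory you believe is free for caching. The sketch below is only an illustration, not anything from this thread: the class name IndexFootprint, its methods, and the 2 GiB budget are all assumptions, and it uses only the Java standard library (no Lucene calls).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class IndexFootprint {

    /** Sums the sizes of the regular files directly inside indexDir (not recursive). */
    public static long totalBytes(Path indexDir) throws IOException {
        try (Stream<Path> files = Files.list(indexDir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> p.toFile().length())
                        .sum();
        }
    }

    /** True when the index is strictly larger than the assumed cache budget. */
    public static boolean exceedsCache(long indexBytes, long cacheBudgetBytes) {
        return indexBytes > cacheBudgetBytes;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the index directory is passed as the first argument.
        Path indexDir = Paths.get(args.length > 0 ? args[0] : ".");
        // Assumption: roughly 2 GiB of RAM is free for the OS cache; measure your own.
        long cacheBudget = 2L * 1024 * 1024 * 1024;
        long size = totalBytes(indexDir);
        System.out.printf("index: %d bytes, budget: %d bytes, exceeds: %b%n",
                size, cacheBudget, exceedsCache(size, cacheBudget));
    }
}
```

If the index only just exceeds the budget, watching disk activity while searching (rather than CPU) is the more direct confirmation.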

One guess is that you added a compressed field, which can be CPU-intensive.

Guessing is painful for us and for you.  Provide more details! :)

-Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: index file size threshold affecting search performance?

Posted by Erick Erickson <er...@gmail.com>.
Well, if you're adding data, you must be doing something with
it later. Are you sure the problem is in the index growing and not
how you use the data afterwards?

The reason I ask is that I found almost a 10-fold increase in
my app's performance when I used FieldSelector (Lucene 2.1)
to load only the important fields in my documents. See the
thread labeled

Lucene 2.1, using FieldSelector speeds up my app by a factor of 10+,
numbers attached

A quick test you could do is to time the actual search as
opposed to the overall response time. That is, time just
the call to Searcher.search, and separately time whatever
you do to assemble the response, to see whether the search
itself is killing you or your manipulation after the search.
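
A minimal sketch of that measurement, assuming nothing about your application: PhaseTimer, Timed, and time are hypothetical names, and the two lambdas in main are placeholders standing in for your call to Searcher.search and for your own result assembly. It uses only the Java standard library.

```java
import java.util.function.Supplier;

public class PhaseTimer {

    /** A result paired with the wall-clock time (in nanoseconds) it took to produce. */
    public static final class Timed<T> {
        public final T value;
        public final long nanos;

        Timed(T value, long nanos) {
            this.value = value;
            this.nanos = nanos;
        }
    }

    /** Runs one phase and records how long it took. */
    public static <T> Timed<T> time(Supplier<T> phase) {
        long start = System.nanoTime();
        T value = phase.get();
        return new Timed<>(value, System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Placeholder for the search itself: swap in your Searcher.search call.
        Timed<int[]> search = time(() -> new int[] {3, 7, 11});

        // Placeholder for everything done with the hits afterwards
        // (loading documents, building the response, and so on).
        Timed<String> assemble = time(() -> {
            StringBuilder sb = new StringBuilder();
            for (int id : search.value) {
                sb.append("doc").append(id).append(' ');
            }
            return sb.toString().trim();
        });

        System.out.printf("search: %d ns, assembly: %d ns%n",
                search.nanos, assemble.nanos);
    }
}
```

Collect both numbers over many requests before drawing conclusions; a single measurement is dominated by warm-up and caching effects.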

The take-away is that we're all surprised by the difference you're
seeing, and I'm betting that it's something other than merely
adding 5% to your index size. What that something is, is left as
an exercise for the reader <G>.

Best
Erick



RE: index file size threshold affecting search performance?

Posted by "Oshima, Scott" <so...@business.com>.
Yeah, it might be a hardware issue. With a slightly smaller index with
less stored data, the performance is what we want it to be.  Just adding
5% more stored data (unindexed, of course) pushes us over some sort of
threshold, causing performance to tank.

 



Re: index file size threshold affecting search performance?

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
I've just built a 9.3G index (admittedly tons of stored data in  
there, 3.3M documents) and performance is amazing (through Solr).

	Erik





Re: index file size threshold affecting search performance?

Posted by Erick Erickson <er...@gmail.com>.
This surprises me. I'm currently working with a 4G index, and the
improvement from when it was an 8G index was only 10% or so.
And it's plenty speedy.

Are you hitting hardware limitations and perhaps swapping like
crazy? In which case, unless you split things across several
machines, I doubt it would help to make two smaller indexes.

In sum, I really suspect that you're NOT hitting a Lucene limitation,
but it's something else about your system....

Best
Erick
