Posted to user@lucenenet.apache.org by George Aroush <ge...@aroush.net> on 2006/11/01 05:58:23 UTC

RE: Question about query performance degredation

Hi Andy,

I believe you are on the right track; index fragmentation may be your issue.

How frequently are you updating the index, and how frequently are you
optimizing it?  Are the updates adding new documents or modifying existing
documents?

If after optimizing you still don't get back the original performance, stop
indexing for a bit and see if search gets better.

If fragmentation is your issue, I have some suggestions that may work for
you.

Regards,

-- George

-----Original Message-----
From: Andy Berryman [mailto:topdev1@gmail.com] 
Sent: Tuesday, October 31, 2006 1:25 PM
To: lucene-net-user@incubator.apache.org;
lucene-net-dev@incubator.apache.org
Subject: Question about query performance degredation

I have a scenario where I'm seeing the performance (specifically time) of
searches against my index degrade on a daily basis.  The amount of time it
takes to load the index is staying fairly constant, however.  This is a
fairly large index, with over a million documents in it.

The scenario I have is that I'm maintaining the index from data in the
database ... and I'm doing so on a constant basis.  So essentially as changes
are made in the database I have a background task that updates the index.
So I'm supporting concurrent readers and writers on a constant basis
throughout the day.  I'm NOT using compound files.  During my development
and testing, the use of compound files caused a significant increase in Disk
I/O usage and caused the maintenance of the index to take much longer.  As
such ... I decided against them.

My theory is that the searches are taking longer because the index files are
getting more and more "fragmented" over time, since I'm not using the
compound files.

Thoughts?

Thanks
Andy


RE: Question about query performance degredation

Posted by George Aroush <ge...@aroush.net>.
Hi Andy,

I am glad to see you got this solved.  How long did it take to optimize the
index?  I think you are trying to keep your searcher fresh with the index
within 10 minutes, right?  So if the optimization took longer than 10
minutes, you may have a new problem.  (Let's discuss this in the other email
thread.)

Regards,

-- George Aroush

-----Original Message-----
From: Andy Berryman [mailto:topdev1@gmail.com] 
Sent: Thursday, November 02, 2006 12:55 PM
To: lucene-net-dev@incubator.apache.org
Subject: Re: Question about query performance degredation

I don't really have the ability to just perform the operation in my
"production" environment.  So what I did to test this theory out was copy
one of my indexes from "production" down to my local machine.  I then set up
a test app to do the following:

- run 10 identical searches against the index and output the hit count and
search time for each
- optimize the index
- run the same 10 searches over again

And what I saw was pretty astounding.  Search times improved by almost 60%
and the size of the index shrank by about 50%.

So I'm going to guess that fragmentation is the key factor here.  What I
think I'm going to end up doing is adding a step to my indexing process to
optimize the index once every couple of days.  That should give me some
pretty nice results without adding too much overall load to the system.

Thanks for the guidance
Andy



Re: Question about query performance degredation

Posted by Andy Berryman <to...@gmail.com>.
I don't really have the ability to just perform the operation in my
"production" environment.  So what I did to test this theory out was copy
one of my indexes from "production" down to my local machine.  I then set up
a test app to do the following:

- run 10 identical searches against the index and output the hit count and
search time for each
- optimize the index
- run the same 10 searches over again

And what I saw was pretty astounding.  Search times improved by almost 60%
and the size of the index shrank by about 50%.
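
The before/after test described above could be sketched roughly as follows.
This is a minimal harness in Python, not the actual test app and not
Lucene.NET code.  The names are assumptions for illustration: `search` is a
hypothetical callable that runs one query and returns its hit count,
`optimize` is a hypothetical callable that would call Optimize() on the
index, and the `clock` parameter exists only so the timing can be faked.

```python
import time


def time_searches(search, queries, clock=time.perf_counter):
    """Run each query once and return a list of (hit_count, seconds)."""
    results = []
    for query in queries:
        start = clock()
        hit_count = search(query)
        results.append((hit_count, clock() - start))
    return results


def compare_before_after(search, optimize, queries, clock=time.perf_counter):
    """Time the queries, optimize, then time the same queries again."""
    before = time_searches(search, queries, clock)
    optimize()
    after = time_searches(search, queries, clock)

    total_before = sum(seconds for _, seconds in before)
    total_after = sum(seconds for _, seconds in after)
    # Percentage of search time saved; positive means searches got faster.
    improvement = 0.0
    if total_before > 0:
        improvement = (total_before - total_after) / total_before * 100
    # Optimizing should not change what is found, only how fast.
    same_hits = [hits for hits, _ in before] == [hits for hits, _ in after]
    return before, after, improvement, same_hits


if __name__ == "__main__":
    # Stand-ins for a real index: a fake clock that each search advances.
    now = [0.0]
    cost = {"seconds": 1.0}

    def fake_search(query):
        now[0] += cost["seconds"]
        return 42  # made-up hit count

    def fake_optimize():
        cost["seconds"] = 0.5  # pretend searches get cheaper afterwards

    queries = ["query %d" % i for i in range(10)]
    _, _, improvement, same_hits = compare_before_after(
        fake_search, fake_optimize, queries, clock=lambda: now[0])
    print("improvement: %.0f%%, same hit counts: %s" % (improvement, same_hits))
```

Comparing the hit counts as well as the times is a cheap sanity check that
the optimized index still returns the same results.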

So I'm going to guess that fragmentation is the key factor here.  What I
think I'm going to end up doing is adding a step to my indexing process to
optimize the index once every couple of days.  That should give me some
pretty nice results without adding too much overall load to the system.
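
One way the periodic optimize step might be folded into the background
indexing pass is sketched below.  It is a minimal Python sketch under stated
assumptions, not Lucene.NET code: `apply_changes` and `optimize` are
hypothetical callables standing in for the real indexing and Optimize()
calls, and the two-day interval is an assumed value for "every couple of
days".

```python
from datetime import datetime, timedelta


class OptimizeSchedule:
    """Tracks when the index was last optimized."""

    def __init__(self, interval=timedelta(days=2), last_run=None):
        self.interval = interval  # assumed value for "every couple of days"
        self.last_run = last_run  # None means "never optimized"

    def due(self, now):
        return self.last_run is None or now - self.last_run >= self.interval

    def mark_done(self, now):
        self.last_run = now


def indexing_pass(schedule, apply_changes, optimize, now):
    """One background pass: index pending changes, optimize only if due."""
    apply_changes()
    if schedule.due(now):
        optimize()
        schedule.mark_done(now)
        return True
    return False


if __name__ == "__main__":
    schedule = OptimizeSchedule()
    start = datetime(2006, 11, 2, 12, 0)
    optimized = 0
    # Simulate three days of passes, one every 10 minutes.
    for step in range(3 * 24 * 6):
        now = start + timedelta(minutes=10 * step)
        if indexing_pass(schedule, lambda: None, lambda: None, now):
            optimized += 1
    print("passes: %d, optimize runs: %d" % (3 * 24 * 6, optimized))
```

Keeping the schedule separate from the pass itself makes it easy to change
the interval later, or to persist `last_run` across restarts.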

Thanks for the guidance
Andy


RE: Question about query performance degredation

Posted by George Aroush <ge...@aroush.net>.
Hi Andy,

Yes, please let us know how it goes when you optimize.  If that doesn't
help, then after optimizing, stop indexing for a bit.  Even better, stop the
indexer application and restart the searcher, i.e., restart your
application with the indexer out of your way.

Regards,

-- George Aroush

-----Original Message-----
From: Andy Berryman [mailto:topdev1@gmail.com] 
Sent: Wednesday, November 01, 2006 9:29 AM
To: lucene-net-dev@incubator.apache.org
Subject: Re: Question about query performance degredation

I'm maintaining the index at a pretty constant rate throughout the day.
Right now it's possible that at least 1 document is getting updated every 10
minutes.  (The background process I am using runs every 10 minutes to look
for changes that need to be indexed.)

In my specific case ... For a document that I need to "update" in the index
... I make a call to delete the document first and then I create a new
document (with the updated info from the database) and add it into the
index.

As for optimizing ... Currently I am not making any calls to "Optimize()".

So I guess your first suggestion would be to optimize the index and check
the query performance after that?

Thanks
Andy




Re: Question about query performance degredation

Posted by Andy Berryman <to...@gmail.com>.
I'm maintaining the index at a pretty constant rate throughout the day.
Right now it's possible that at least 1 document is getting updated every 10
minutes.  (The background process I am using runs every 10 minutes to look
for changes that need to be indexed.)

In my specific case ... For a document that I need to "update" in the index
... I make a call to delete the document first and then I create a new
document (with the updated info from the database) and add it into the
index.
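
To illustrate why a delete-then-add update pattern can make an index grow
when it is never optimized, here is a toy model in Python.  It is not the
Lucene.NET API and heavily simplifies how Lucene works: it assumes every add
lands in its own segment and that a delete only flags the old entry, with the
space reclaimed when segments are merged.  The class and method names are
made up for this sketch.

```python
class ToyIndex:
    """Toy stand-in for an index made of segments; not a real Lucene index."""

    def __init__(self):
        # Each segment is a list of [doc_id, text, deleted_flag] entries.
        self.segments = []

    def add(self, doc_id, text):
        # Simplification: every add creates a fresh one-document segment.
        self.segments.append([[doc_id, text, False]])

    def delete(self, doc_id):
        # Deleting only flags the entry; it still occupies space.
        for segment in self.segments:
            for entry in segment:
                if entry[0] == doc_id:
                    entry[2] = True

    def update(self, doc_id, text):
        # The pattern from the message: delete first, then add the new version.
        self.delete(doc_id)
        self.add(doc_id, text)

    def optimize(self):
        # Merge everything into one segment, dropping flagged entries.
        live = [e for seg in self.segments for e in seg if not e[2]]
        self.segments = [live] if live else []

    def stored_entries(self):
        return sum(len(seg) for seg in self.segments)

    def live_docs(self):
        return sum(1 for seg in self.segments for e in seg if not e[2])


if __name__ == "__main__":
    index = ToyIndex()
    index.add("row-1", "v1")
    for version in range(2, 6):
        index.update("row-1", "v%d" % version)

    # Before optimizing: 5 stored entries, 1 live document, 5 segments.
    print(index.stored_entries(), index.live_docs(), len(index.segments))
    index.optimize()
    # After optimizing: 1 stored entry, 1 live document, 1 segment.
    print(index.stored_entries(), index.live_docs(), len(index.segments))
```

In this model the number of live documents never changes, but the stored
entries and segments keep piling up until optimize() is called.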

As for optimizing ... Currently I am not making any calls to "Optimize()".

So I guess your first suggestion would be to optimize the index and check
the query performance after that?

Thanks
Andy

