Posted to java-user@lucene.apache.org by Andreas Guther <An...@markettools.com> on 2007/05/11 18:44:45 UTC

IndexReader.isCurrent very slow in 2.1

I moved today from Lucene 2.0 to 2.1 and I noticed that the
IndexReader.isCurrent() call is very expensive.  What took 20
milliseconds in 2.0 now takes seconds in 2.1.

I have the following scenario:

- 7 index directories of different sizes, ranging from a few MB to 5 GB
- Some indexes have been upgraded to the Lucene 2.1 format; others are still
in the old format, depending on whether an update has happened
- A cached IndexSearcher for each index
- I was using the IndexSearcher's IndexReader to check whether changes
had happened since the searcher was cached

The isCurrent check takes anywhere from under 10 ms up to 1400 ms,
depending on the folder.  The time does not seem to be related to
the size of the index.
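The caching setup above (one cached searcher per index, checked for changes before use) can be modelled in a few lines of Java without Lucene on the classpath. This is a minimal sketch, not Lucene's API: the hypothetical `versionSource` stands in for reading the index's current version, which is the comparison `IndexReader.isCurrent()` makes, and the hypothetical `opener` stands in for opening a new `IndexSearcher`. Each check is timed with `System.nanoTime()`, so slow folders can be spotted per index.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Model of one cached searcher per index, reopened only when the index
 * version changes. Not Lucene code: versionSource (hypothetical) stands in
 * for reading the index's current version, opener (hypothetical) for
 * opening a new IndexSearcher.
 */
final class CachedIndexResource<T> {
    private final Supplier<T> opener;
    private final LongSupplier versionSource;
    private T cached;
    private long cachedVersion;
    private long lastCheckNanos;

    CachedIndexResource(Supplier<T> opener, LongSupplier versionSource) {
        this.opener = opener;
        this.versionSource = versionSource;
        // Read the version before opening, so a concurrent update can only
        // cause an extra reopen later, never a missed one.
        this.cachedVersion = versionSource.getAsLong();
        this.cached = opener.get();
    }

    /** Returns the cached resource, reopening it if the version changed. */
    T get() {
        long start = System.nanoTime();
        long onDisk = versionSource.getAsLong(); // the "isCurrent" step
        lastCheckNanos = System.nanoTime() - start;
        if (onDisk != cachedVersion) {
            cachedVersion = onDisk;
            // A real implementation would also close the previous searcher here.
            cached = opener.get();
        }
        return cached;
    }

    /** Duration of the most recent version check, in milliseconds. */
    double lastCheckMillis() {
        return lastCheckNanos / 1_000_000.0;
    }
}
```

Because the check is timed separately from the reopen, a slow value from `lastCheckMillis()` points at the version check itself rather than at searcher construction.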

However, the isCurrent check is definitely too expensive.  

What can I do to get this information faster?

Andreas 

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: IndexReader.isCurrent very slow in 2.1

Posted by Chris Hostetter <ho...@fucit.org>.
: I am experiencing the same problem with some 40 segments. Chris, do you have

do you have 40 segments, or do you have 40 files matching the glob
segment* .. there is a difference (the "segments" file records the number
of segments; as of 2.1 these files are versioned, so they have names like
"segments_7", "segments_8", etc.)
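The distinction can be checked from a plain directory listing. A sketch, assuming a 2.1-style index whose commit files are named `segments_N`, with a `segments.gen` helper file alongside them; it counts the names the `segment*` glob would match and, separately, the `segments_N` commit files. Neither number is the segment count, which is recorded inside the current segments file.

```java
import java.util.List;

/** Compares "files matching segment*" with "commit files" in a listing. */
final class SegmentsGlob {
    /** Names the shell glob segment* would match. */
    static long globMatches(List<String> names) {
        return names.stream().filter(n -> n.startsWith("segment")).count();
    }

    /** Names of the form segments_N: one per commit generation still on disk. */
    static long commitGenerations(List<String> names) {
        return names.stream().filter(n -> n.startsWith("segments_")).count();
    }
}
```

For the made-up listing `_0.cfs`, `_1.cfs`, `segments_7`, `segments_8`, `segments.gen`, `globMatches` returns 3 and `commitGenerations` returns 2, while only two segment files (`_0.cfs`, `_1.cfs`) are present.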

: any recommendation on the file system to use?

not really .. i just wanted to know which filesystem people noticing this
problem are using ... because it might make a difference (since checking
currentness is essentially just a directory lookup followed by reading one
int from a file)

back in the day i remember being told that NFS volumes would have serious
performance issues doing any directory operations (list, create new file,
delete file) once they had ~4K files, but i have no idea if that's still
true.



-Hoss



Re: IndexReader.isCurrent very slow in 2.1

Posted by jafarim <ja...@gmail.com>.
I am experiencing the same problem with some 40 segments. Chris, do you have
any recommendation on which file system to use?

On 5/11/07, Chris Hostetter <ho...@fucit.org> wrote:
>
>
> : Are there a large number of files in your index directory?
>
> and is there any correlation between the number of files matching segment*
> and the time isCurrent takes?
>
> it would also be handy to know what filesystem you use as well ...
> directory listings may be more expensive on some filesystems than others.
>
>
>
> -Hoss
>
>
>

RE: IndexReader.isCurrent very slow in 2.1

Posted by Andreas Guther <An...@markettools.com>.
We have everything on Windows NTFS.  Our index folders are on a server
and accessed via a shared drive.

I haven't optimized the folders yet, but after optimizing a
test folder I noticed that we have very few files left.  That might
help.

I am going to optimize all folders now and then will come back with more
information.

Andreas


-----Original Message-----
From: Chris Hostetter [mailto:hossman_lucene@fucit.org] 
Sent: Friday, May 11, 2007 11:03 AM
To: java-user@lucene.apache.org
Subject: Re: IndexReader.isCurrent very slow in 2.1


: Are there a large number of files in your index directory?

and is there any correlation between the number of files matching segment*
and the time isCurrent takes?

it would also be handy to know what filesystem you use as well ...
directory listings may be more expensive on some filesystems than
others.



-Hoss


RE: IndexReader.isCurrent very slow in 2.1

Posted by Michael McCandless <lu...@mikemccandless.com>.
"Andreas Guther" <An...@markettools.com> wrote:

> I have optimized our index directories using the compound index format.
> I have also moved the index directories for testing purposes local to
> the search process (before it was over network and shared NTFS file
> system).
> 
> Now the time for getting the isCurrent information is negligible, i.e.
> less than 10 millis per call.

OK, glad to hear that.

Though I'm surprised that the SMB/CIFS mount was so slow.  Is your file
server on the same LAN as your client?  Or is the file server heavily
shared with many clients?  Just want to gather data while we have your
attention :)  Thanks.

Mike


Re: IndexReader.isCurrent very slow in 2.1

Posted by Andreas Guther <an...@gmail.com>.
I am currently converting a copy of our production data and then I am going
to run some tests against a test server.  This is the same server I started
with.  Once I have a better picture and a more complete, production-like
test suite, I will come back with my final conclusions.

Andreas



On 5/12/07, Erick Erickson <er...@gmail.com> wrote:
>
> Also, do you have a sense of which action it actually was that
> made the biggest difference? Or, if you have to move the files
> back out to the network, could you let us know if your times
> go back up?
>
> Or was it just the optimization?
>
> Thanks
> Erick
>

Re: IndexReader.isCurrent very slow in 2.1

Posted by Erick Erickson <er...@gmail.com>.
Also, do you have a sense of which action it actually was that
made the biggest difference? Or, if you have to move the files
back out to the network, could you let us know if your times
go back up?

Or was it just the optimization?

Thanks
Erick

On 5/11/07, Andreas Guther <An...@markettools.com> wrote:
>
> Chris,
>
> I have optimized our index directories using the compound index format.
> I have also moved the index directories for testing purposes local to
> the search process (before it was over network and shared NTFS file
> system).
>
> Now the time for getting the isCurrent information is negligible, i.e.
> less than 10 millis per call.
>
> Thanks for your input.
>
> Andreas
>

RE: IndexReader.isCurrent very slow in 2.1

Posted by Andreas Guther <An...@markettools.com>.
Chris,

I have optimized our index directories using the compound index format.
I have also moved the index directories for testing purposes local to
the search process (before it was over network and shared NTFS file
system).

Now the time for getting the isCurrent information is negligible, i.e.
less than 10 millis per call.

Thanks for your input.

Andreas



-----Original Message-----
From: Chris Hostetter [mailto:hossman_lucene@fucit.org] 
Sent: Friday, May 11, 2007 11:03 AM
To: java-user@lucene.apache.org
Subject: Re: IndexReader.isCurrent very slow in 2.1


: Are there a large number of files in your index directory?

and is there any correlation between the number of files matching segment*
and the time isCurrent takes?

it would also be handy to know what filesystem you use as well ...
directory listings may be more expensive on some filesystems than
others.



-Hoss


Re: IndexReader.isCurrent very slow in 2.1

Posted by Chris Hostetter <ho...@fucit.org>.
: Are there a large number of files in your index directory?

and is there any correlation between the number of files matching segment*
and the time isCurrent takes?

it would also be handy to know what filesystem you use as well ...
directory listings may be more expensive on some filesystems than others.
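A rough way to look for that correlation, independent of Lucene, is to time a raw directory listing of each index folder and count its `segment*` entries. This is a diagnostic sketch that uses only `java.io.File`; it assumes the listing is the part of the check whose cost varies by filesystem, which is what the question above suspects, so running it against a local copy and against the same folder on the network share should show whether the mount or the file count dominates.

```java
import java.io.File;
import java.io.IOException;

/** Times a single directory listing and counts segment* entries. */
final class ListingTimer {
    /** Result of one timed listing. */
    static final class Result {
        final int totalFiles;
        final int segmentFiles;
        final double elapsedMillis;

        Result(int totalFiles, int segmentFiles, double elapsedMillis) {
            this.totalFiles = totalFiles;
            this.segmentFiles = segmentFiles;
            this.elapsedMillis = elapsedMillis;
        }
    }

    /** Lists dir once, timing only the File.list() call itself. */
    static Result timeListing(File dir) throws IOException {
        long start = System.nanoTime();
        String[] names = dir.list();
        long elapsed = System.nanoTime() - start;
        if (names == null) {
            throw new IOException("cannot list " + dir);
        }
        int segments = 0;
        for (String name : names) {
            if (name.startsWith("segment")) { // same test as the segment* glob
                segments++;
            }
        }
        return new Result(names.length, segments, elapsed / 1_000_000.0);
    }
}
```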



-Hoss



Re: IndexReader.isCurrent very slow in 2.1

Posted by Michael McCandless <lu...@mikemccandless.com>.
"Andreas Guther" <An...@markettools.com> wrote:
> I moved today from Lucene 2.0 to 2.1 and I noticed that the
> IndexReader.isCurrent() call is very expensive.  What took 20
> milliseconds in 2.0 now takes seconds in 2.1.
> 
> I have the following scenario:
> 
> - 7 index directories of different sizes, ranging from a few MB to 5 GB
> - Some indexes have been upgraded to the Lucene 2.1 format; others are still
> in the old format, depending on whether an update has happened
> - A cached IndexSearcher for each index
> - I was using the IndexSearcher's IndexReader to check whether changes
> had happened since the searcher was cached
> 
> The isCurrent check takes anywhere from under 10 ms up to 1400 ms,
> depending on the folder.  The time does not seem to be related to
> the size of the index.

Hmmm, that code did change in 2.1 as part of lockless commits
(LUCENE-701).

Previously we obtained the commit lock, opened the file "segments" and
read the version from that.

Now, we list the directory, locate the segments_N file with the
largest "N" (falling back to "segments" if it exists) and open that
one, and read the version.
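The new lookup can be sketched as a pure function over a directory listing. This is not Lucene's code. It assumes that N in `segments_N` is written in base 36 (`Character.MAX_RADIX`), so `segments_a` is newer than `segments_9`; that is believed to match Lucene's naming, but treat it as an assumption. A legacy `segments` file counts as generation 0, and `segments.gen` is skipped because it does not start with `segments_`.

```java
import java.util.List;

/** Picks the newest commit file from a directory listing. */
final class CurrentSegmentsFile {
    private static final String SEGMENTS = "segments";
    private static final String PREFIX = SEGMENTS + "_";

    /**
     * Returns the name of the newest commit file, or null if there is none.
     * Assumes generations are base-36 numbers (Character.MAX_RADIX).
     */
    static String find(List<String> names) {
        long bestGen = -1;
        String best = null;
        for (String name : names) {
            long gen;
            if (name.equals(SEGMENTS)) {
                gen = 0; // legacy pre-2.1 commit file
            } else if (name.startsWith(PREFIX)) {
                try {
                    gen = Long.parseLong(name.substring(PREFIX.length()), Character.MAX_RADIX);
                } catch (NumberFormatException e) {
                    continue; // looks like a commit file but has no valid generation
                }
            } else {
                continue; // segment data files, segments.gen, etc.
            }
            if (gen > bestGen) {
                bestGen = gen;
                best = name;
            }
        }
        return best;
    }
}
```

Since every currentness check starts with a full listing like this, its cost plausibly tracks how expensive directory listings are on the underlying filesystem rather than how large the index is, which would fit the timings reported at the top of the thread.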

Are you always using a 2.1 IndexReader to check isCurrent, and you are
finding that those indexes that have been updated with a 2.1
IndexWriter are the slow ones?

Or, are you comparing 2.0 isCurrent call with the 2.1 isCurrent call?

Are there a large number of files in your index directory?

Mike
