Posted to user@lucenenet.apache.org by Eran Sevi <er...@gmail.com> on 2009/11/12 18:05:40 UTC

IndexWriter is slow when reader is open

Hi,
I'm using Lucene.Net 2.4, and I just noticed that when I index documents
while at least one IndexReader is open on the index (even one that isn't
doing anything), indexing is 3 to 5 times slower. When I close the reader,
indexing speed goes back to normal.
I'm not doing any deletes, only adds.

My index is going to be updated regularly, and there will be a
reader/searcher in use almost all the time, so this might be a big problem
for me.

Does anyone know whether this is normal behavior? Why does it happen, and
how can I avoid such a big loss in performance?


Thanks,
Eran.

Re: IndexWriter is slow when reader is open

Posted by Eran Sevi <er...@gmail.com>.
I've noticed that the slowdown only happens when the reader created is a
"ReadOnlyMultiSegmentReader".
When the index is fully optimized (so the reader created is a
"ReadOnlySegmentReader"), a writer opened afterwards still runs at full
speed.
Since most of the time the index is far from optimized, this is still
a major problem.

I can only guess that it's caused by locking. I'll keep researching it
and post an update if I find anything new.

On Mon, Nov 16, 2009 at 3:07 PM, Eran Sevi <er...@gmail.com> wrote:

> I've tried using it in read-only mode, and it looks like it's even worse
> now.
>
> I must admit that we're abusing the indexing a bit by committing after each
> document addition, but still: when there's no reader open, each document is
> indexed in about 30-50ms, and when there's a read-only reader open, each
> document takes about 150-500ms.
> Why should an open reader affect the commit process so much?
>
> I wonder whether anyone has encountered this phenomenon before.
>
>
> On Sat, Nov 14, 2009 at 8:27 PM, Matt Honeycutt <mb...@gmail.com>wrote:
>
>> 2.4 does indeed support read-only mode. I don't know how much it will
>> help, but I would definitely try it.
>>
>> On 11/14/09, Eran Sevi <er...@gmail.com> wrote:
>> > I'm still using version 2.4, so I think there's no read-only mode yet.
>> > Is there any other way to prevent this slowdown in older versions?
>> >
>> > Eran.
>> >
>> > On Thu, Nov 12, 2009 at 8:16 PM, Michael Garski
>> > <mg...@myspace-inc.com>wrote:
>> >
>> >> Eran,
>> >>
>> >> What version of Lucene are you using?  Are you opening the IndexReader
>> >> in read-only mode?
>> >>
>> >> Michael

RE: IndexWriter is slow when reader is open

Posted by Michael Garski <mg...@myspace-inc.com>.
Eran,

The transactional functionality can roll back changes to an index if
something goes wrong during a commit. Refer to the PrepareCommit and
Rollback methods. You would have to implement your own logic to re-process
any changes that were rolled back.

Michael


Re: IndexWriter is slow when reader is open

Posted by Eran Sevi <er...@gmail.com>.
Thanks, Michael, for the detailed explanation. It's much clearer now.

By "transactional capabilities", do you mean that if something goes wrong in
the middle of a commit, it is guaranteed that either all the data added since
the last commit is in the index or all of it is discarded?

We have a steady stream of documents coming in for indexing (unfortunately
only one at a time, but at a rate of up to 50 per second), and I hoped I
could guarantee that when the add method returns, the document is safely on
disk. We keep a status for each document in our DB and want to discard the
original data.

We'll just have to hang on to the original data until each commit has
finished and, in case of a crash or error, reindex it.
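That plan can be sketched as a small buffer that keeps each source record until the commit covering it has succeeded. This is a minimal illustration under stated assumptions: `IndexSink` and `PendingUntilCommit` are hypothetical names standing in for "add to the index, then commit", not Lucene.Net API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for "add to the index, then commit" (not a Lucene type).
interface IndexSink {
    void add(String doc);
    void commit() throws Exception; // may fail, e.g. on an I/O error
}

// Keeps each original record until the commit that covers it succeeds.
class PendingUntilCommit {
    private final IndexSink sink;
    private final List<String> uncommitted = new ArrayList<>();
    private final List<String> safeToDiscard = new ArrayList<>();

    PendingUntilCommit(IndexSink sink) {
        this.sink = sink;
    }

    void add(String doc) {
        sink.add(doc);
        uncommitted.add(doc); // not durable yet: keep the source record
    }

    // Returns true if the commit succeeded. On failure the records stay in
    // 'uncommitted' so the caller can re-index them after recovery.
    boolean commit() {
        try {
            sink.commit();
        } catch (Exception e) {
            return false;
        }
        safeToDiscard.addAll(uncommitted);
        uncommitted.clear();
        return true;
    }

    List<String> uncommitted() {
        return new ArrayList<>(uncommitted);
    }

    List<String> safeToDiscard() {
        return new ArrayList<>(safeToDiscard);
    }
}
```

Only records in `safeToDiscard()` would have their original data dropped from the DB; anything still in `uncommitted()` after a crash is what gets reindexed.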

Eran.


RE: IndexWriter is slow when reader is open

Posted by Michael Garski <mg...@myspace-inc.com>.
Eran,

Make no mistake: the poor performance you are experiencing is caused by calling commit on every document addition, not by internal 'coding by exception'. Lucene has transactional capabilities that will ensure your documents are added and persisted to disk. Check out the IndexWriter documentation for more information.

The only 'connection' between the reader and the writer is the files on disk. The writer writes each file once and never updates it, and the reader holds a reference to each file it uses so that the file is not deleted out from underneath it, since it still needs to read from it to perform searches.

During a commit, all of your changes are written to disk and any necessary segment merges take place. This leaves the older segments that were merged together as 'orphans': they are no longer referenced by the segments file and are cleaned up in the final stage of the commit, after all of the new segments have been written. The attempt to clean up those older segments will fail while your reader still has them open. It fails gracefully: the file names are kept internally so that deletion can be attempted again later, hopefully after the reader has been reopened and no longer holds a reference to the orphaned files.
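A simplified, hypothetical model of that cleanup step may make the sequence easier to follow: files still referenced by an open reader cannot be deleted, so their names go onto a retry list that is worked through again on the next cleanup. `DeferredDeleter` and the file names used with it are invented for illustration; Lucene's actual bookkeeping is more involved.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Simplified, hypothetical model of "delete what you can, remember the rest".
class DeferredDeleter {
    private final Set<String> onDisk = new HashSet<>();
    private final Set<String> heldByReader = new HashSet<>();
    private final Set<String> retryLater = new LinkedHashSet<>();

    void write(String file) { onDisk.add(file); }
    void readerOpens(String file) { heldByReader.add(file); }
    void readerCloses(String file) { heldByReader.remove(file); }

    // Stand-in for the file-system delete: fails while a reader holds the file.
    // (Per the description above, the real failure is an exception that is caught.)
    private boolean tryDelete(String file) {
        if (heldByReader.contains(file)) {
            return false;
        }
        onDisk.remove(file);
        return true;
    }

    // Called at the end of a commit with the segment files that merging orphaned;
    // names from earlier failed attempts are retried as well.
    void deleteOrphans(Set<String> orphans) {
        retryLater.addAll(orphans);
        retryLater.removeIf(this::tryDelete); // keep only names that still failed
    }

    Set<String> pending() { return new LinkedHashSet<>(retryLater); }
    boolean exists(String file) { return onDisk.contains(file); }
}
```

In this model nothing is lost while the reader stays open; the orphaned file simply lingers until a later cleanup runs after the reader has let go of it.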

I suggest you step through the commit process in a debugger or use a profiler to observe this behavior.

Michael



Re: IndexWriter is slow when reader is open

Posted by Eran Sevi <er...@gmail.com>.
Michael,
Thanks for the answer.

I thought the reader was less tightly coupled to the writer. Basically, what
you're saying is that as long as at least one reader is open, exceptions are
thrown when trying to commit changes (or, more accurately, when trying to
merge segments)?
Can you point me to the place in the source code where that happens?

What happens to the new documents that were added? Are they still saved in
other segments?

It's very important to us to make sure every document is persisted in the
index, so working in batches could be a problem.
But if there's a way to save each added document to disk without merging its
segment with older segments, that could solve our problem. And since the
reader can't see the new segments anyway until it's reopened, I don't see a
problem with continuing to write documents to new segments without performing
a merge. I'll try changing the merge policy/scheduler and see what happens.
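To get a feel for how much merging a one-document-per-commit stream causes, here is a rough simulation of a log-style merge policy. It assumes each commit flushes a single one-document segment and that the newest segments are merged whenever `mergeFactor` of them have the same size. This is a simplification of Lucene's LogMergePolicy, not its actual algorithm, and `MergeSimulation` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical log-style merge model (not Lucene's real LogMergePolicy).
final class MergeSimulation {
    // Returns {merges, documentCopiesRewrittenByMerges, finalSegmentCount}.
    static long[] simulate(int commits, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be >= 2");
        }
        List<Long> segments = new ArrayList<>();
        long merges = 0;
        long docsRewritten = 0;
        for (int i = 0; i < commits; i++) {
            segments.add(1L); // one new 1-doc segment per single-document commit
            // Cascade: merging ten 1-doc segments may complete a run of ten 10-doc ones, etc.
            while (segments.size() >= mergeFactor && tailIsUniform(segments, mergeFactor)) {
                long merged = segments.get(segments.size() - 1) * mergeFactor;
                for (int k = 0; k < mergeFactor; k++) {
                    segments.remove(segments.size() - 1);
                }
                segments.add(merged);
                merges++;
                docsRewritten += merged;
            }
        }
        return new long[] {merges, docsRewritten, segments.size()};
    }

    // True if the last n segments all have the same size.
    private static boolean tailIsUniform(List<Long> s, int n) {
        long last = s.get(s.size() - 1);
        for (int k = s.size() - n; k < s.size(); k++) {
            if (s.get(k) != last) {
                return false;
            }
        }
        return true;
    }
}
```

Under these assumptions, with a merge factor of 10, 100 one-document commits trigger 11 merges that rewrite 200 document copies, and 1,000 commits trigger 111 merges that rewrite 3,000. Raising the merge factor defers merging, but leaves more segments for each reader to search.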

Anyway, coding by exception is quite bad practice. Since we're following the
Java versions, I guess it'll take time before that can change.

Eran.


RE: IndexWriter is slow when reader is open

Posted by Michael Garski <mg...@myspace-inc.com>.
Eran,

The root cause of the issue is calling commit after every document addition while a reader is open. Calls to commit should be batched up; we frequently use batches of 100 or 1000 documents between commits.
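A minimal sketch of that batching, assuming a hypothetical `DocumentWriter` wrapper around the real writer (not Lucene.Net API): commit every `batchSize` documents, or sooner once `maxDelayMillis` has passed since the last commit, so a slow trickle of documents is not left uncommitted for long. The clock is injected only to make the logic testable.

```java
import java.util.function.LongSupplier;

// Hypothetical wrapper around the real index writer.
interface DocumentWriter {
    void addDocument(String doc);
    void commit();
}

// Commits once every batchSize documents, or sooner if maxDelayMillis has
// elapsed since the last commit.
final class BatchingCommitter {
    private final DocumentWriter writer;
    private final int batchSize;
    private final long maxDelayMillis;
    private final LongSupplier clock;
    private int uncommitted = 0;
    private long lastCommit;

    BatchingCommitter(DocumentWriter writer, int batchSize, long maxDelayMillis, LongSupplier clock) {
        this.writer = writer;
        this.batchSize = batchSize;
        this.maxDelayMillis = maxDelayMillis;
        this.clock = clock;
        this.lastCommit = clock.getAsLong();
    }

    void add(String doc) {
        writer.addDocument(doc);
        uncommitted++;
        long now = clock.getAsLong();
        if (uncommitted >= batchSize || now - lastCommit >= maxDelayMillis) {
            flush();
        }
    }

    // Commits any uncommitted documents; also call on shutdown.
    void flush() {
        if (uncommitted == 0) {
            return;
        }
        writer.commit();
        uncommitted = 0;
        lastCommit = clock.getAsLong();
    }
}
```

Note that the time limit is only checked when a document arrives; a long-running service would also call `flush()` from a timer and on shutdown.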

This is by design within Lucene. Adding documents will cause segments to merge, and the writer will then delete the older segments that were merged together to create a new one. However, with an open reader, the writer will not be able to delete the older segments because of a file lock held by the reader. The call to delete the file throws an exception that is swallowed internally, and the name of the file is added to a list of files to be deleted on a later call.

I suggest you refrain from calling commit so often, as that is why you are experiencing performance issues.

Michael



Re: IndexWriter is slow when reader is open

Posted by Eran Sevi <er...@gmail.com>.
I've tried opening the reader in read-only mode, and it looks like it's even
worse now.

I must admit that we're abusing the indexing a bit by committing after each
document addition, but even so, with no reader open each document is
indexed in about 30-50 ms, while with a read-only reader open each
document takes about 150-500 ms.
Why should an open reader affect the commit process so much?

I wonder if no one has encountered this phenomenon before.


On Sat, Nov 14, 2009 at 8:27 PM, Matt Honeycutt <mb...@gmail.com>wrote:

> 2.4 does indeed support read-only mode. I don't know how much it will
> help, but I would definitely try it.

Re: IndexWriter is slow when reader is open

Posted by Matt Honeycutt <mb...@gmail.com>.
2.4 does indeed support read-only mode. I don't know how much it will
help, but I would definitely try it.

On 11/14/09, Eran Sevi <er...@gmail.com> wrote:
> I'm still using version 2.4, so I think there's no read-only mode yet.
> Is there no other way to prevent this slowdown in earlier versions?
>
> Eran.

Re: IndexWriter is slow when reader is open

Posted by Eran Sevi <er...@gmail.com>.
I'm still using version 2.4, so I think there's no read-only mode yet.
Is there no other way to prevent this slowdown in earlier versions?

Eran.

On Thu, Nov 12, 2009 at 8:16 PM, Michael Garski <mg...@myspace-inc.com>wrote:

> Eran,
>
> What version of Lucene are you using?  Are you opening the IndexReader
> in read-only mode?
>
> Michael

RE: IndexWriter is slow when reader is open

Posted by Michael Garski <mg...@myspace-inc.com>.
Eran,

What version of Lucene are you using?  Are you opening the IndexReader
in read-only mode?

Michael

-----Original Message-----
From: Eran Sevi [mailto:eransevi@gmail.com] 
Sent: Thursday, November 12, 2009 9:06 AM
To: lucene-net-user@incubator.apache.org
Subject: IndexWriter is slow when reader is open

Hi,
I'm using Lucene.Net 2.4, and I just noticed that when I index documents
while at least one IndexReader is open on that index (even without
doing anything with it), indexing is slower by a factor of 3 to 5. When I
close the reader, the indexing speed goes back to normal.
I'm not doing any deletes, only adds.

My index is going to be updated regularly, and there's going to be a
reader/searcher in use almost all the time, so this might be a big problem
for me.

Does anyone have a clue whether this is normal behavior? Why does it happen,
and how can I avoid such a big loss in performance?


Thanks,
Eran.