Posted to java-user@lucene.apache.org by Liviu Matei <li...@gmail.com> on 2014/05/14 11:05:33 UTC

Issue with Lucene 3.6.1 and MMapDirectory

Hi,

I am encountering the following issue with Lucene 3.6.1. If you could let me
know whether I am doing something wrong, that would be great.

To improve the performance of the application I am working on, I switched to
reusing the IndexReader and reopening it every 8 hours to pick up the latest
changes (the IndexReader is declared as a global static variable). The search
method is called from multiple threads in parallel, so the index reader is
shared between threads. If I don't close the old index reader, I see virtual
memory grow with every reopen. This should not be an issue on 64-bit Linux,
correct? That is the configuration I am using, and the indexes are on a
shared mount NTFS file system.

Also, from time to time I see JVM crashes with the following stack:
Thread Stack Trace:
    at memcpy+160()@0x381aa7b060
    -- Java stack --
    at java/nio/DirectByteBuffer.get(DirectByteBuffer.java:294)[optimized]
    at org/apache/lucene/store/MMapDirectory$MMapIndexInput.readBytes(MMapDirectory.java:298)[optimized]
    at org/apache/lucene/store/DataInput.readBytes(DataInput.java:72)
    at org/apache/lucene/index/CompoundFileReader$CSIndexInput.readInternal(CompoundFileReader.java:275)[optimized]
    at org/apache/lucene/store/BufferedIndexInput.refill(BufferedIndexInput.java:270)[optimized]
    at org/apache/lucene/store/BufferedIndexInput.readByte(BufferedIndexInput.java:40)[inlined]
    at org/apache/lucene/store/DataInput.readVInt(DataInput.java:107)[inlined]
    at org/apache/lucene/store/BufferedIndexInput.readVInt(BufferedIndexInput.java:217)[optimized]
    at org/apache/lucene/index/FieldsReader.doc(FieldsReader.java:235)
    at org/apache/lucene/index/SegmentReader.document(SegmentReader.java:492)
    at org/apache/lucene/index/DirectoryReader.document(DirectoryReader.java:568)
    at org/apache/lucene/index/MultiReader.document(MultiReader.java:252)
    at org/apache/lucene/index/IndexReader.document(IndexReader.java:1138)
    at org/apache/lucene/search/IndexSearcher.doc(IndexSearcher.java:258)[inlined]


Can you please tell me whether these crashes are caused by the fact that I am
not closing the old IndexReader? And if I do close it, given that it is shared
by multiple threads, will I need to check before every search whether the
IndexReader is still open? For example, if one thread reopens the IndexReader
and another thread afterwards reuses the old one, should I do that check? Or
is there a smarter mechanism in place?

Any help with this would be more than welcome.


Thank you very much and best regards,
Liviu

Re: Issue with Lucene 3.6.1 and MMapDirectory

Posted by Liviu Matei <li...@gmail.com>.
One more thing, which I forgot to add: using lsof I noticed deleted index
files that are still in use by the application. Is this OK, or can it cause
issues, with the IndexReader trying to access an index file that was deleted?
I suspect the deletion happens because of index merges during indexing.
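
A note on the lsof observation, as a minimal sketch rather than a diagnosis: on a local POSIX filesystem, a file that is deleted while it is still open or memory-mapped stays readable until the last handle or mapping goes away, which is why lsof lists such files as "(deleted)". The stdlib-only Java below illustrates this. The class and method names are made up for the example, no Lucene code is involved, and it assumes Linux with a local temp directory. NFS does not give the same guarantee when another machine deletes the file, and on Windows the delete itself would normally fail.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; assumes Linux and a local (non-NFS) temp directory.
class DeletedButMapped {

    // Maps a temp file, deletes it, then reads through the mapping anyway.
    static String readAfterDelete() {
        try {
            Path file = Files.createTempFile("segment", ".bin");
            Files.write(file, "still readable".getBytes(StandardCharsets.UTF_8));
            try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
                MappedByteBuffer mapped =
                        channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
                Files.delete(file); // the directory entry is gone from here on
                byte[] bytes = new byte[mapped.remaining()];
                mapped.get(bytes);  // the mapping still serves the old contents
                return new String(bytes, StandardCharsets.UTF_8)
                        + " / exists=" + Files.exists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readAfterDelete());
    }
}
```

The flip side is that the disk space of such a file is not reclaimed while something still keeps it open or mapped.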

Thanks,
Liviu



Re: Issue with Lucene 3.6.1 and MMapDirectory

Posted by Liviu Matei <li...@gmail.com>.
Thank you all very much for the answers.
Uwe, this is the strange thing: I currently never close the index reader, I
open a new one every 8 hours, and I am seeing that crash in what is indeed a
highly concurrent environment.
The indexes reside on an NFS file system, and the location is shared between
multiple machines, each running multiple JVMs; this is why I mentioned a
shared mount.
Can you please tell me which parameters/settings I should check on the OS
side? I ran ulimit and it returns unlimited. I didn't see
"AlreadyClosedException" in the logs, but I will check again.

2. With the next release I am going to close the IndexReader. I looked at
SearcherManager, but when I run a search I also index the searched content
itself, so that I can run the queries I construct against it and determine
their score within that content. If I am correct, I cannot do that with
SearcherManager, but Clemens's approach with ir.incRef(); should also be OK,
correct?

Thank you very much again,
Liviu





RE: Issue with Lucene 3.6.1 and MMapDirectory

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

> If I don't close the old index reader, I see virtual memory grow with every
> reopen. This should not be an issue on 64-bit Linux, correct? That is the
> configuration I am using, and the indexes are on a shared mount NTFS file
> system.

Not closing the old reader always causes a virtual memory leak, on all platforms (Linux included). In addition, the files of older segments can no longer be fully deleted, so it also consumes disk space.

> Can you please tell me whether these crashes are caused by the fact that I am
> not closing the old IndexReader? And if I do close it, given that it is shared
> by multiple threads, will I need to check before every search whether the
> IndexReader is still open? For example, if one thread reopens the IndexReader
> and another thread afterwards reuses the old one, should I do that check? Or
> is there a smarter mechanism in place?

It is the other way round: if you do not close the IndexReader, it cannot crash (unless your JDK has a bug, or your filesystem [you mentioned shared... what does this mean?] somehow forcefully unmaps the index files); the crash only happens if you close it! The issue is this: if you close the IndexReader while another thread is running a query, the crash above can happen, because MMapDirectory has forcefully unmapped the memory-mapped buffer. Since Lucene 3.6.0, Lucene tries its best to prevent this crash, but under high concurrency this may fail (because the check works without synchronization, which would kill performance): http://issues.apache.org/jira/browse/LUCENE-3588

In your case, alongside those crashes, you should also see "AlreadyClosedException", which is the root of the problem. It is just that the MMapDirectory code sometimes cannot detect that the reader is already closed, and then it crashes.

So forcefully reopening IndexReaders and closing the old ones while queries are running is the wrong way to go. I would suggest using SearcherManager, which keeps track of the IndexReaders correctly.
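
To make the acquire/release idea concrete, here is a stdlib-only sketch of the bookkeeping such a manager has to do. It is not Lucene's SearcherManager and does not use Lucene at all: Snapshot and SnapshotManager are hypothetical stand-ins for a reader and its manager, and "released" marks the point where a real reader would unmap its files. The point it tries to show is that a refresh can publish a new snapshot at any time, while the old one is only released after the last in-flight search has given it back.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo; none of these classes exist in Lucene.
class ManagerDemo {
    static String run() {
        SnapshotManager manager = new SnapshotManager(new Snapshot("gen-1"));
        Snapshot inFlight = manager.acquire();        // a search starts on gen-1
        manager.swap(new Snapshot("gen-2"));          // refresh publishes gen-2
        boolean releasedMidSearch = inFlight.isReleased();
        manager.release(inFlight);                    // the search finishes
        Snapshot next = manager.acquire();            // later searches get gen-2
        try {
            return inFlight.name + " released mid-search=" + releasedMidSearch
                    + ", after search=" + inFlight.isReleased()
                    + ", new searches use " + next.name;
        } finally {
            manager.release(next);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}

// Stand-in for a reader: starts with one reference, owned by the manager.
class Snapshot {
    final String name;
    private final AtomicInteger refCount = new AtomicInteger(1);

    Snapshot(String name) { this.name = name; }

    // Takes a reference only while the snapshot has not been released.
    boolean tryIncRef() {
        int count;
        while ((count = refCount.get()) > 0) {
            if (refCount.compareAndSet(count, count + 1)) return true;
        }
        return false;
    }

    void decRef() {
        if (refCount.decrementAndGet() < 0) {
            throw new IllegalStateException("decRef without matching incRef");
        }
    }

    boolean isReleased() { return refCount.get() == 0; }
}

class SnapshotManager {
    private volatile Snapshot current;

    SnapshotManager(Snapshot initial) { current = initial; }

    // Retries if a concurrent swap() released the snapshot we just read.
    Snapshot acquire() {
        while (true) {
            Snapshot candidate = current;
            if (candidate.tryIncRef()) return candidate;
        }
    }

    void release(Snapshot snapshot) { snapshot.decRef(); }

    // Publishes the new snapshot, then drops the manager's own reference.
    synchronized void swap(Snapshot fresh) {
        Snapshot old = current;
        current = fresh;
        old.decRef();
    }
}
```

Searching code in this scheme never closes anything itself; it only pairs every acquire() with a release() in a finally block.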

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Clemens Wyss DEV [mailto:clemensdev@mysign.ch]
> Sent: Wednesday, May 14, 2014 7:53 PM
> To: java-user@lucene.apache.org
> Subject: AW: Issue with Lucene 3.6.1 and MMapDirectory
> 



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


AW: Issue with Lucene 3.6.1 and MMapDirectory

Posted by Clemens Wyss DEV <cl...@mysign.ch>.
Not closing an IndexReader will most probably (to say the least) result in a memory leak and eventually an OOM.

> And if I do close it, given that it is shared by multiple threads, will I
> need to check before every search whether the IndexReader is still open?
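You can make use of IndexReader#incRef/#decRef, e.g.
ir.incRef(); // take a reference so the reader stays open during the search
try
{
    // <your search code>
}
finally
{
    ir.decRef(); // give the reference back, even if the search throws
}
...
If ir.getRefCount() > 1 when you close the "old" ir, searches are still in flight, and you are safe to close it: close() only drops the reader's own reference, and the underlying files are released once the last search has called decRef().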
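
As a rough illustration of why that works, here is a stdlib-only sketch of the reference-counting idea. RefCountedReader is a hypothetical stand-in, not Lucene's IndexReader, and "released" marks the point where a real reader would free its files. It assumes the reader starts with one reference held by its owner and that close() drops exactly that one.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo; mimics the incRef/decRef pattern without Lucene.
class RefCountDemo {
    static String run() {
        RefCountedReader old = new RefCountedReader();
        StringBuilder log = new StringBuilder();
        if (!old.tryIncRef()) {                 // the search takes its reference
            throw new IllegalStateException("reader already released");
        }
        try {
            old.close();                        // reopen thread closes mid-search
            log.append("released during search: ").append(old.isReleased());
        } finally {
            old.decRef();                       // the search gives its reference back
        }
        log.append(", after search: ").append(old.isReleased());
        log.append(", late tryIncRef: ").append(old.tryIncRef());
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}

class RefCountedReader {
    private final AtomicInteger refCount = new AtomicInteger(1); // owner's reference
    private final AtomicBoolean closed = new AtomicBoolean(false);

    // Takes a reference only while the count is still above zero.
    boolean tryIncRef() {
        int count;
        while ((count = refCount.get()) > 0) {
            if (refCount.compareAndSet(count, count + 1)) return true;
        }
        return false;
    }

    // Dropping the last reference is where a real reader would free its files.
    void decRef() {
        if (refCount.decrementAndGet() < 0) {
            throw new IllegalStateException("decRef without matching incRef");
        }
    }

    // Drops the owner's reference once; repeated close() calls do nothing.
    void close() {
        if (closed.compareAndSet(false, true)) decRef();
    }

    boolean isReleased() { return refCount.get() == 0; }
}
```

In the sketch, a search that arrives after the release gets false from tryIncRef() and has to pick up the current reader instead of touching the old one.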

Maybe SearcherManager (http://blog.mikemccandless.com/2011/09/lucenes-searchermanager-simplifies.html) fits your needs?



AW: Issue with Lucene 3.6.1 and MMapDirectory

Posted by Clemens Wyss DEV <cl...@mysign.ch>.
> And if I do close it, given that it is shared by multiple threads, will I
> need to check before every search whether the IndexReader is still open?
You can make use of IndexReader#incRef/#decRef, e.g.
ir.incRef();
try
{
    // <your search code>
}
finally
{
    ir.decRef();
}

Or maybe SearcherManager (http://blog.mikemccandless.com/2011/09/lucenes-searchermanager-simplifies.html) fits your needs?
