Posted to java-user@lucene.apache.org by Denis Bazhenov <ba...@farpost.com> on 2013/08/07 08:45:27 UTC

WeakIdentityMap high memory usage

We have upgraded from Lucene 3.6 to 4.4. In production we are seeing high minor GC times. A heap dump showed that one of the biggest object types by size is org.apache.lucene.util.WeakIdentityMap$IdentityWeakReference: about 11 million instances taking about 377 megabytes in total (and this is not even the retained size). Here is a screenshot of the JProfiler output: https://dl.dropboxusercontent.com/u/16254496/Screen%20Shot%202013-08-07%20at%205.35.22%20PM.png.
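The figures above can be turned into a rough per-object size. It is not stated whether the profiler's "megabytes" are 2^20 or 10^6 bytes, so both readings are shown; either way each reference object is only about 34 to 36 bytes (shallow), which suggests the memory cost comes from the sheer number of tracked objects rather than from any single large one.

```latex
\frac{377\,\mathrm{MB}}{11 \times 10^{6}\ \text{instances}} \approx
\begin{cases}
\dfrac{377 \times 2^{20}\,\mathrm{B}}{1.1 \times 10^{7}} \approx 36\,\mathrm{B} & (\mathrm{MB} = 2^{20}\,\mathrm{B}) \\[6pt]
\dfrac{377 \times 10^{6}\,\mathrm{B}}{1.1 \times 10^{7}} \approx 34\,\mathrm{B} & (\mathrm{MB} = 10^{6}\,\mathrm{B})
\end{cases}
```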

The keys of the map are MMapIndexInput instances. What is this map for, and how can I reduce its memory usage?
---
Denis Bazhenov <ba...@farpost.com>
FarPost.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: WeakIdentityMap high memory usage

Posted by Denis Bazhenov <do...@gmail.com>.
Yes, definitely. Our typical setup is 16 GB of physical RAM and -Xmx4G per node (the index is about 1-1.5 GB per node). So there is plenty of room for the OS cache, I guess. I'll take a closer look at the number of major page faults, but at the moment iostat says everything looks fine.

On the other hand, it seems we could pack these search nodes more densely in RAM (we have 20 of them), but that's a topic for another day.

On Aug 8, 2013, at 11:18 PM, Michael McCandless <lu...@mikemccandless.com> wrote:

> Note that you should still run a tight ship, ie don't give excess heap
> to Lucene, and instead let the OS take up the slack of any spare RAM
> for IO caching.

---
Denis Bazhenov <do...@gmail.com>






Re: WeakIdentityMap high memory usage

Posted by Robert Muir <rc...@gmail.com>.
On Thu, Aug 8, 2013 at 11:31 AM, Michael McCandless
<lu...@mikemccandless.com> wrote:

> A number of users have complained about the apparent RAM usage of
> WeakIdentityMap, and it adds complexity to ByteBufferIndexInput to do
> this tracking ... I think defaulting the unmap hack to off is best for
> users of MMapDir.
>

For which users? 100/1000QPS users with too-large-heaps who complain
about weak references?

For users with properly configured heap sizes: there is no problem.



Re: WeakIdentityMap high memory usage

Posted by Michael McCandless <lu...@mikemccandless.com>.
I agree "file sitting" is not great, but at worst this causes higher
transient disk usage, which already happens if you have readers open
against those files, during merging, during CFS building, etc.

A number of users have complained about the apparent RAM usage of
WeakIdentityMap, and it adds complexity to ByteBufferIndexInput to do
this tracking ... I think defaulting the unmap hack to off is best for
users of MMapDir.

Mike McCandless

http://blog.mikemccandless.com


On Thu, Aug 8, 2013 at 9:09 AM, Uwe Schindler <uw...@thetaphi.de> wrote:
> Hi Mike,
>
> I don't think disabling by default is a good idea. It is not only wasted 64-bit address space (which is not a problem at all, you are right); the JVM also "sits" on those files:
> - On Windows they cannot be deleted (not even on Java 7 with Lucene trunk, where you can now delete them if they are not mmapped); this may cause major pain!
> - On POSIX the disk space stays locked, so the inode can only be freed once GC has freed the mapping.
>
> Uwe


RE: WeakIdentityMap high memory usage

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi Mike,

I don't think disabling by default is a good idea. It is not only wasted 64-bit address space (which is not a problem at all, you are right); the JVM also "sits" on those files:
- On Windows they cannot be deleted (not even on Java 7 with Lucene trunk, where you can now delete them if they are not mmapped); this may cause major pain!
- On POSIX the disk space stays locked, so the inode can only be freed once GC has freed the mapping.
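The POSIX case can be reproduced with the JDK alone. The sketch below is a hypothetical illustration (the class and method names are made up, and it assumes a POSIX file system such as Linux): it maps a temporary file with `FileChannel.map`, closes the channel, deletes the file, and still reads through the mapping. It shows that the mapping, not the open file handle, keeps the data and its disk space alive until GC collects the buffer.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappingOutlivesFile {

    /**
     * Maps a temporary file, closes its channel, deletes the file, then reads through
     * the mapping. Intended for POSIX systems; on Windows the delete step typically
     * fails because the file is still mapped.
     */
    static byte readAfterDelete() {
        try {
            Path file = Files.createTempFile("mmap-demo", ".bin");
            Files.write(file, new byte[] {42, 43, 44});

            MappedByteBuffer buffer;
            try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
                buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            } // closing the channel does not unmap the buffer

            // POSIX: removes the directory entry; the inode and its disk space
            // remain allocated for as long as the mapping exists.
            Files.delete(file);

            // The mapping is released only when `buffer` is garbage-collected; the JDK
            // has no public method to unmap a MappedByteBuffer explicitly.
            return buffer.get(0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read after delete: " + readAfterDelete());
    }
}
```

This is the same lifetime rule that applies with unmapping switched off: the mapping goes away only when the buffer object is collected, so a larger heap can keep such mappings (and deleted files' disk space) around longer.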

Uwe 

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Michael McCandless [mailto:lucene@mikemccandless.com]
> Sent: Thursday, August 08, 2013 2:18 PM
> To: Lucene Users
> Subject: Re: WeakIdentityMap high memory usage
> 
> Thanks for bringing closure.
> 
> Note that you should still run a tight ship, ie don't give excess heap to
> Lucene, and instead let the OS take up the slack of any spare RAM for IO
> caching.  Especially with unmap disabled, the JVM will now only unmap once
> a map is GC'd, so the larger your heap the longer these unused maps are
> held open.
> 
> Maybe we should disable unmap by default; I don't see what value it brings
> for 64 bit envs.
> 
> Mike McCandless
> 
> http://blog.mikemccandless.com


Re: WeakIdentityMap high memory usage

Posted by Michael McCandless <lu...@mikemccandless.com>.
Thanks for bringing closure.

Note that you should still run a tight ship, ie don't give excess heap
to Lucene, and instead let the OS take up the slack of any spare RAM
for IO caching.  Especially with unmap disabled, the JVM will now only
unmap once a map is GC'd, so the larger your heap the longer these
unused maps are held open.

Maybe we should disable unmap by default; I don't see what value it
brings for 64 bit envs.

Mike McCandless

http://blog.mikemccandless.com


On Wed, Aug 7, 2013 at 9:34 PM, Denis Bazhenov <do...@gmail.com> wrote:
> Uwe, Michael, thank you very much for your help. We have deployed one of the nodes in our system and tomorrow I'll have more information on that, but it seems that the setUseUnmap(false) trick did the job. RT dropped significantly compared to the 3.6.0 version. We have about 100 rps per search node and a commit interval of about 1 minute, so switching off the unmap seems like a good idea.
>
> There is one more question. As far as I understand, this map is like a fuse for the situation where clients continue to use an IndexReader after it has already been closed. So if the code closes IndexReaders correctly (only after all clients have finished using them), there is no need for this sort of weak map tracking. Did I get that right?


Re: WeakIdentityMap high memory usage

Posted by Denis Bazhenov <do...@gmail.com>.
Uwe, Michael, thank you very much for your help. We have deployed one of the nodes in our system and tomorrow I'll have more information on that, but it seems that the setUseUnmap(false) trick did the job. RT dropped significantly compared to the 3.6.0 version. We have about 100 rps per search node and a commit interval of about 1 minute, so switching off the unmap seems like a good idea.
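To put the 11 million instances in perspective, here is a back-of-the-envelope estimate. It assumes that "rps" means searches per second and uses Mike's rough figure of about 3 cloned inputs per search (from his message quoted in this post); it also assumes, purely for the estimate, that no tracked entry is ever removed. Under those assumptions the observed count corresponds to roughly ten hours of traffic.

```latex
100\ \tfrac{\text{searches}}{\text{s}} \times 3\ \tfrac{\text{clones}}{\text{search}} = 300\ \tfrac{\text{clones}}{\text{s}},
\qquad
\frac{11 \times 10^{6}\ \text{clones}}{300\ \text{clones/s}} \approx 3.7 \times 10^{4}\ \text{s} \approx 10\ \text{h}
```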

There is one more question. As far as I understand, this map is like a fuse for the situation where clients continue to use an IndexReader after it has already been closed. So if the code closes IndexReaders correctly (only after all clients have finished using them), there is no need for this sort of weak map tracking. Did I get that right?
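The "fuse" Denis describes, as Mike explains it in the message quoted in this post, can be sketched with the JDK alone. This is a minimal, hypothetical sketch and not Lucene's code: `TrackedInput`, `cloneInput`, `trackedClones` and `readByte` are made-up names, and a plain `IllegalStateException` stands in for Lucene's `AlreadyClosedException`. Every clone is registered in a weak, identity-keyed map (one `WeakReference` subclass per clone, reaped via a `ReferenceQueue`), and closing the original marks all still-reachable clones as closed, so a later read fails cleanly instead of touching unmapped memory.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a memory-mapped input that hands out clones. */
class TrackedInput {

    /** Weak key compared by object identity; one is allocated per tracked clone. */
    static final class IdentityWeakRef extends WeakReference<TrackedInput> {
        private final int hash; // cached so the key stays findable after the referent is cleared

        IdentityWeakRef(TrackedInput referent, ReferenceQueue<TrackedInput> queue) {
            super(referent, queue);
            this.hash = System.identityHashCode(referent);
        }

        @Override public int hashCode() { return hash; }

        @Override public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof IdentityWeakRef)) return false;
            TrackedInput mine = get();
            return mine != null && mine == ((IdentityWeakRef) o).get();
        }
    }

    // Shared by the original input and all of its clones.
    private final Map<IdentityWeakRef, Boolean> clones;
    private final ReferenceQueue<TrackedInput> queue;
    private final boolean isClone;
    private volatile boolean closed;

    TrackedInput() {
        this(new ConcurrentHashMap<>(), new ReferenceQueue<>(), false);
    }

    private TrackedInput(Map<IdentityWeakRef, Boolean> clones,
                         ReferenceQueue<TrackedInput> queue, boolean isClone) {
        this.clones = clones;
        this.queue = queue;
        this.isClone = isClone;
    }

    /** Registers the clone weakly, so tracking never keeps a clone alive. */
    TrackedInput cloneInput() {
        reap();
        TrackedInput clone = new TrackedInput(clones, queue, true);
        clones.put(new IdentityWeakRef(clone, queue), Boolean.TRUE);
        return clone;
    }

    /** Removes entries whose clone has already been garbage-collected. */
    private void reap() {
        Reference<? extends TrackedInput> ref;
        while ((ref = queue.poll()) != null) {
            clones.remove(ref);
        }
    }

    int trackedClones() {
        reap();
        return clones.size();
    }

    byte readByte() {
        // The "fuse": throw instead of reading memory that may already be unmapped.
        if (closed) throw new IllegalStateException("already closed");
        return 0; // a real input would read from its mapped buffer here
    }

    /** Closing the original also marks every clone that is still reachable as closed. */
    void close() {
        closed = true;
        if (isClone) return;
        for (IdentityWeakRef ref : clones.keySet()) {
            TrackedInput clone = ref.get();
            if (clone != null) clone.closed = true;
        }
        clones.clear();
        // a real implementation would unmap the buffers at this point
    }
}
```

If readers are only ever closed after all their users are done, the closed check in `readByte` never fires; what remains is the bookkeeping cost of one weak-reference object plus one map entry per clone until it is reaped, which is the kind of object the profiler was counting.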

On Aug 8, 2013, at 4:31 AM, "Uwe Schindler" <uw...@thetaphi.de> wrote:

> Hi Denis,
> 
> I assume you are using Lucene 3.6.0, because in Lucene 3.6.1 the tracking of buffers using weak references is also done (although you cannot switch it off, unfortunately).
> 
> I can confirm what Mike says: It's all weak references, and the overhead may be large, but it gets freed when memory gets low. In general it is better in most cases not to allocate too much heap space for Lucene, as this makes those maps larger and GC gets stressed. Only use as much heap as needed so that no OOM occurs, and leave all remaining memory to the file system cache (so there is less paging). In that case, GC will clean up the concurrent maps faster.
> 
> In general: if you have a large index that changes seldom, but your query rate is very high (like 200 queries per second), switch unmapping off (this works since Lucene 4.2; see the changelog for LUCENE-4740. Unfortunately the issue itself was closed with fix version 4.4; 4.2 would be correct). In that case there is no need to take care of unmapping, and as the index reopen rate is low, this does not waste resources.
> 
> But if your index changes often, there is no way around unmapping; alternatively, use NIOFSDir with NRTCachingDirectory to optimize near-real-time search on rapidly changing indexes!
> 
> Finally: The only way to fix this would be to make all codec structures like TermsEnum or DocsEnum, but also Scorer/DocIdSet/..., implement Closeable. When you are done with a Scorer you would have to close it, and the underlying cloned IndexInput would be closed, too. In that case, the cloned IndexInput would be refcounted and unmapped when the last clone is closed. This is a larger change and might be an idea for Lucene 5.0 as an "optimization". It would be a backwards break because all codecs and all queries would need to close correctly, but with our test framework and MockDirWrapper (and other MockFooBarWrappers) we could track this so that all resources are closed.
> We had TermEnum.close() up to Lucene 3.x, but it was dropped in 4.0 because it was never working in 3.x (nobody ever called close() on TermEnum or TermDocs instances.... :( ). With our new test framework this could be tracked now... So maybe worth a try?
> 
> Uwe
> 
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
> 
>> -----Original Message-----
>> From: Michael McCandless [mailto:lucene@mikemccandless.com]
>> Sent: Wednesday, August 07, 2013 3:45 PM
>> To: Lucene Users
>> Subject: Re: WeakIdentityMap high memory usage
>> 
>> This map is used to track all cloned open files, which can be a very large
>> number over time (each search will create maybe 3 of them).
>> 
>> This is done as a "best effort" to prevent SEGV (JVM dies) if you accidentally
>> try to use an IndexReader after it was closed, while using MMapDirectory.
>> 
>> However, it's a weak map, which means when HEAP is tight GC should drop
>> it.
>> 
>> So, this should not cause a real problem in "real life", even though it looks
>> scary when you look at its RAM usage under a profiler.
>> 
>> If somehow it's causing "real life" problems, please report back!  But a simple
>> workaround is to call MMapDirectory.setUseUnmap(false) to turn off this
>> tracking; this means you rely on GC to (eventually) unmap.
>> 
>> Mike McCandless
>> 
>> http://blog.mikemccandless.com

---
Denis Bazhenov <do...@gmail.com>


RE: WeakIdentityMap high memory usage

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi Denis,

I assume you were using Lucene 3.6.0 before, because Lucene 3.6.1 also tracks buffers using weak references (although, unfortunately, you cannot switch it off there).

I can confirm what Mike says: it's all weak references, and the overhead may be large, but it gets freed when memory gets low. In general, it's in most cases better not to allocate too much heap space for Lucene, as this makes those maps larger and stresses the GC. Only use as much heap as you need to avoid an OOM, and leave all remaining memory free for the file system cache (so there is less paging). In that case, GC will also clean up the concurrent maps faster.

In general: if you have a large index that changes seldom, but your query rate is very high (like 200 queries per second), switch unmapping off (this works since Lucene 4.2; see the changelog entry for LUCENE-4740, although the issue itself was marked as fixed in 4.4, 4.2 would be correct). In that case you don't need to take care of unmapping, and as the index reopen rate is low, this does not waste resources.

But if your index changes often, there is no way around unmapping. Alternatively, use NIOFSDirectory with NRTCachingDirectory, which is optimized for near-real-time search on rapidly changing indexes!
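The two options above could be wired up roughly as follows. This is a minimal sketch against the Lucene 4.x store API (`MMapDirectory`, `NIOFSDirectory`, `NRTCachingDirectory`); the `DirectoryChooser` class, the `indexChangesOften` flag and the 5 MB / 60 MB cache limits are illustrative assumptions, not settings recommended in this thread.

```java
import java.io.File;
import java.io.IOException;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.MMapDirectory;
import org.apache.lucene.store.NIOFSDirectory;
import org.apache.lucene.store.NRTCachingDirectory;

/** Hypothetical helper: picks a Directory based on how often the index changes. */
final class DirectoryChooser {

    static Directory open(File indexPath, boolean indexChangesOften) throws IOException {
        if (indexChangesOften) {
            // Frequent reopens: NIOFSDirectory does not memory-map files, so nothing
            // needs unmapping; NRTCachingDirectory keeps small freshly flushed
            // segments in RAM. The size limits (in MB) are illustrative only.
            return new NRTCachingDirectory(new NIOFSDirectory(indexPath), 5.0, 60.0);
        }
        // Rarely changing index with a high query rate: turn off eager unmapping
        // (and with it the weak tracking of clones) and let GC release old mappings.
        MMapDirectory mmap = new MMapDirectory(indexPath);
        mmap.setUseUnmap(false);
        return mmap;
    }
}
```

The returned `Directory` would then be passed to `DirectoryReader.open(...)` as usual; whether an index "changes often" is a judgment each deployment has to make.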

Finally: the only way to fix this would be to make all codec structures like TermsEnum or DocsEnum, but also Scorer/DocIdSet/..., implement Closeable. When you are done with a Scorer, you would have to close it, and the underlying cloned IndexInput would be closed, too. The cloned IndexInputs would then be refcounted, and the mapping unmapped when the last clone is closed. This is a larger change and might be an idea for Lucene 5.0 as an "optimization". It would be a backwards break, because all codecs and all queries would need to close correctly, but with our test framework and MockDirectoryWrapper (and other MockFooBarWrappers) we could track this to make sure all resources are closed.
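The refcounting idea described above can be sketched in plain Java. Everything here (`SharedMapping`, `CloneableInput`, `cloneInput()`) is hypothetical and not Lucene API; the sketch only flags the mapping as released at the point where a real implementation would unmap it.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical shared state behind an input and all of its clones. */
final class SharedMapping {
    // The original input holds the first reference.
    private final AtomicInteger refCount = new AtomicInteger(1);
    private volatile boolean released;

    void incRef() {
        int current;
        do {
            current = refCount.get();
            if (current <= 0) {
                throw new IllegalStateException("mapping already released");
            }
        } while (!refCount.compareAndSet(current, current + 1));
    }

    void decRef() {
        int remaining = refCount.decrementAndGet();
        if (remaining == 0) {
            released = true; // a real implementation would unmap the buffers here
        } else if (remaining < 0) {
            throw new IllegalStateException("decRef called too often");
        }
    }

    boolean isReleased() {
        return released;
    }
}

/** Hypothetical closeable input: every clone must be closed explicitly. */
final class CloneableInput implements Closeable {
    private final SharedMapping mapping;
    private boolean closed;

    private CloneableInput(SharedMapping mapping) {
        this.mapping = mapping;
    }

    static CloneableInput open() {
        return new CloneableInput(new SharedMapping());
    }

    CloneableInput cloneInput() {
        mapping.incRef();
        return new CloneableInput(mapping);
    }

    @Override
    public void close() {
        if (!closed) { // closing twice is a no-op
            closed = true;
            mapping.decRef();
        }
    }

    boolean isMappingReleased() {
        return mapping.isReleased();
    }
}
```

With this design no weak map is needed, but, as noted above, every consumer (codecs, queries, scorers) would have to close what it opens.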
We had TermEnum.close() up to Lucene 3.x, but it was dropped in 4.0 because it never worked in 3.x (nobody ever called close() on TermEnum or TermDocs instances :( ). With our new test framework this could now be tracked, so maybe it is worth a try?

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Michael McCandless [mailto:lucene@mikemccandless.com]
> Sent: Wednesday, August 07, 2013 3:45 PM
> To: Lucene Users
> Subject: Re: WeakIdentityMap high memory usage
> 
> This map is used to track all cloned open files, which can be a very large
> number over time (each search will create maybe 3 of them).
> 
> This is done as a "best effort" to prevent SEGV (JVM dies) if you accidentally
> try to use an IndexReader after it was closed, while using MMapDirectory.
> 
> However, it's a weak map, which means when HEAP is tight GC should drop
> it.
> 
> So, this should not cause a real problem in "real life", even though it looks
> scary when you look at its RAM usage under a profiler.
> 
> If somehow it's causing "real life" problems, please report back!  But a simple
> workaround is to call MMapDirectory.setUseUnmap(false) to turn off this
> tracking; this means you rely on GC to (eventually) unmap.
> 
> Mike McCandless
> 
> http://blog.mikemccandless.com
> 
> 
> On Wed, Aug 7, 2013 at 2:45 AM, Denis Bazhenov <ba...@farpost.com>
> wrote:
> > We have upgraded from Lucene 3.6 to 4.4. In production we ran into high
> > minor GC times. A heap dump showed that one of the biggest objects by size is
> > org.apache.lucene.util.WeakIdentityMap$IdentityWeakReference: about 11
> > million instances taking about 377 megabytes in total (and this is not
> > even the retained size). Here is a screenshot of the JProfiler output:
> > https://dl.dropboxusercontent.com/u/16254496/Screen%20Shot%202013-08-07%20at%205.35.22%20PM.png.
> >
> > The keys of the map are MMapIndexInput instances. What is this map for, and how
> > can I reduce its memory usage?
> > ---
> > Denis Bazhenov <ba...@farpost.com>
> > FarPost.
> >


Re: WeakIdentityMap high memory usage

Posted by Michael McCandless <lu...@mikemccandless.com>.
This map is used to track all cloned open files, which can be a very
large number over time (each search will create maybe 3 of them).

This is done as a "best effort" to prevent SEGV (JVM dies) if you
accidentally try to use an IndexReader after it was closed, while
using MMapDirectory.

However, it's a weak map, which means when HEAP is tight GC should drop it.

So, this should not cause a real problem in "real life", even though
it looks scary when you look at its RAM usage under a profiler.
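To illustrate why a profiler can show millions of entries that are nonetheless collectable, here is a simplified, hypothetical tracker in the spirit of `WeakIdentityMap` (it is not Lucene's implementation). Each clone is registered through a `WeakReference` that compares referents by identity, and entries whose referents have been collected are purged via a `ReferenceQueue`. Until the GC clears a reference and the tracker polls its queue, the wrapper objects themselves stay on the heap.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Simplified sketch: tracks clones by identity without keeping them reachable. */
final class WeakCloneTracker<T> {

    /** Weak reference whose equals/hashCode use the referent's identity. */
    private static final class IdentityWeakRef<E> extends WeakReference<E> {
        private final int hash; // cached, so it stays stable after the referent is cleared

        IdentityWeakRef(E referent, ReferenceQueue<E> queue) {
            super(referent, queue);
            this.hash = System.identityHashCode(referent);
        }

        @Override
        public int hashCode() {
            return hash;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof IdentityWeakRef)) return false;
            Object mine = get();
            // Cleared references are only equal to themselves.
            return mine != null && mine == ((IdentityWeakRef<?>) o).get();
        }
    }

    private final Set<IdentityWeakRef<T>> refs = ConcurrentHashMap.newKeySet();
    private final ReferenceQueue<T> queue = new ReferenceQueue<>();

    /** Registers a clone; the tracker holds it only weakly. */
    void track(T clone) {
        purgeCleared();
        refs.add(new IdentityWeakRef<>(clone, queue));
    }

    /** Number of tracked entries after purging those already cleared by the GC. */
    int size() {
        purgeCleared();
        return refs.size();
    }

    /** Visits every clone that has not been collected yet (e.g. to invalidate it on close). */
    void forEachLive(Consumer<? super T> action) {
        for (IdentityWeakRef<T> ref : refs) {
            T clone = ref.get();
            if (clone != null) {
                action.accept(clone);
            }
        }
    }

    /** Removes entries whose referents the GC has already collected. */
    private void purgeCleared() {
        Reference<? extends T> cleared;
        while ((cleared = queue.poll()) != null) {
            refs.remove(cleared);
        }
    }
}
```

Because only weak references are stored, the tracked clones can be collected as soon as searches stop using them; the cost is the wrapper objects and the GC work needed to clear and purge them.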

If somehow it's causing "real life" problems, please report back!  But
a simple workaround is to call MMapDirectory.setUseUnmap(false) to
turn off this tracking; this means you rely on GC to (eventually)
unmap.
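For context on "rely on GC to (eventually) unmap": the JDK offers no public method to unmap a `MappedByteBuffer`, and closing the `FileChannel` that created it does not release the mapping. The stdlib-only sketch below (the `GcUnmapDemo` class is illustrative) shows that the buffer stays readable after its channel is closed; the mapping is only released once the buffer becomes unreachable and is garbage-collected.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Illustrative demo: a mapping outlives the channel that created it. */
final class GcUnmapDemo {

    /** Writes content to a fresh temp file (scheduled for deletion on JVM exit). */
    static Path writeTempFile(String content) throws IOException {
        Path file = Files.createTempFile("mmap-demo", ".bin");
        file.toFile().deleteOnExit();
        Files.write(file, content.getBytes(StandardCharsets.UTF_8));
        return file;
    }

    /** Maps the whole file read-only and returns the buffer after closing the channel. */
    static MappedByteBuffer mapAndClose(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        } // closing the channel does NOT unmap the buffer
    }

    /** Reads the remaining bytes without moving the original buffer's position. */
    static String readAll(MappedByteBuffer buf) {
        byte[] bytes = new byte[buf.remaining()];
        buf.duplicate().get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Usage: `GcUnmapDemo.readAll(GcUnmapDemo.mapAndClose(GcUnmapDemo.writeTempFile("hello")))` still returns the file contents, which is why, without explicit unmapping, old mappings linger until the GC gets to them.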

Mike McCandless

http://blog.mikemccandless.com


On Wed, Aug 7, 2013 at 2:45 AM, Denis Bazhenov <ba...@farpost.com> wrote:
> We have upgraded from Lucene 3.6 to 4.4. In production we ran into high minor GC times. A heap dump showed that one of the biggest objects by size is org.apache.lucene.util.WeakIdentityMap$IdentityWeakReference: about 11 million instances taking about 377 megabytes in total (and this is not even the retained size). Here is a screenshot of the JProfiler output: https://dl.dropboxusercontent.com/u/16254496/Screen%20Shot%202013-08-07%20at%205.35.22%20PM.png.
>
> The keys of the map are MMapIndexInput instances. What is this map for, and how can I reduce its memory usage?
> ---
> Denis Bazhenov <ba...@farpost.com>
> FarPost.