Posted to java-user@lucene.apache.org by Erik Stephens <mr...@gmail.com> on 2017/08/31 01:58:30 UTC

How to regulate native memory?

Our elasticsearch processes have been slowly consuming memory until the kernel's OOM killer kills them.  Details are here:

    https://github.com/elastic/elasticsearch/issues/26269

To summarize:

- Explicit GC is enabled
- MaxDirectMemorySize is set
- Total memory usage for the process is roughly heap (30G) + mmap'd (unbounded) + 1-2G (metaspace, threads, etc.)

The crowd is suggesting "Don't worry. You want to use all that memory."  I understand that sentiment except for:

- The process eventually gets OOM killed.
- I need to support multiple processes on the same machine and need a more predictable footprint.

It seems to be relatively common knowledge that direct byte buffers are only freed when a GC collects them.  However, full GCs are happening but are not reducing resident mmap'd memory.
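
For completeness: the JVM's own native memory tracking accounts for heap, metaspace, threads, and direct buffers, but not for mmap'd files.  A sketch of how to check, assuming you can restart the process with the flag enabled:

    # run the JVM with native memory tracking (adds some overhead)
    java -XX:NativeMemoryTracking=summary ...  # plus your usual flags
    # then ask the live JVM for a per-category breakdown
    jcmd $pid VM.native_memory summary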

Any pointers to source code, settings, or tools are much appreciated.  Thanks!

--
Erik


Re: How to regulate native memory?

Posted by Erik Stephens <mr...@gmail.com>.
Thanks, Robert.  I found this bit from that link enlightening:

"Some parts of the cache can't be dropped, not even to accomodate new
applications. This includes mmap'd pages that have been mlocked by some
application, dirty pages that have not yet been written to storage, and
data stored in tmpfs (including /dev/shm, used for shared memory). The
mmap'd, mlocked pages are stuck in the page cache. Dirty pages will for the
most part swiftly be written out. Data in tmpfs will be swapped out if
possible."

That could have explained why processes are getting OOM killed when there is
so much available from the fs cache, but our elasticsearch is configured
not to lock memory.  Nothing in /proc/$pid/smaps is showing as locked either.
Will explore other avenues.  Thanks again!
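
PS: for anyone checking the same thing, a quick sketch for totaling mlocked pages from smaps (field names per proc(5)):

    # sum the "Locked" field across all mappings, in kB
    awk '/^Locked:/ {sum += $2} END {print sum " kB locked"}' /proc/$pid/smaps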

--
Erik


Re: How to regulate native memory?

Posted by Dominique Bejean <do...@eolya.fr>.
Hi Uwe,

When you say "MMap is NOT direct memory", I understand that the JVM can
use (at least) these three types of memory (see the flag sketch after the list):

   - Heap memory (controlled by Xmx and managed by GC)
   - Off-heap MMap (OS cache), which *is not* direct memory, *is not*
   controlled by MaxDirectMemorySize, and *is not* managed by GC
   - Off-heap direct memory for direct byte buffers, which *is*
   controlled by MaxDirectMemorySize and *is not* managed by GC
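
In JVM-flag terms, a sketch (the mmap'd cache has no JVM-side cap):

    # caps only the GC-managed heap
    -Xmx30g
    # caps only ByteBuffer.allocateDirect() (direct byte buffers)
    -XX:MaxDirectMemorySize=15g
    # mmap'd index files fall under neither flag: they live in the
    # OS page cache and are bounded only by the operating system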


Solr uses heap memory (of course) and off-heap MMap, but not off-heap
direct memory.

Can you confirm?

Regards

Dominique



--
Dominique Béjean
06 08 46 12 43

Re: How to regulate native memory?

Posted by Erik Stephens <mr...@gmail.com>.
I did not know that mmap is not considered direct memory, so thanks for
that.  Now I can stop barking about why -XX:MaxDirectMemorySize isn't
having any effect :)

--
Erik


RE: How to regulate native memory?

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

As a suggestion from my side: first, disable the bootstrap.memory_lock feature: https://www.elastic.co/guide/en/elasticsearch/reference/5.5/important-settings.html#bootstrap.memory_lock

It looks like you are using too much heap space, and some plugin in your ES installation may also be using the maximum direct memory size, so I have the feeling something is using a lot of direct memory and you want to limit that. MMap is NOT direct memory! MMap is also not taken into account by the OOM killer, because it's not owned by the process.

To me it looks like the operating system kills your process because it sits locked on a huge amount of memory. So disable the locking (it is IMHO not really useful and too risky). If you also have no swap disk, you may try to add some swap space and set the system's swappiness to "10" or even lower. In production environments it is better to have a little bit of swap as a last resort, but you should tell the kernel via the vm.swappiness sysctl to use it only as a last resort.
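
For example, on most Linux distributions:

    # take effect immediately (requires root)
    sysctl -w vm.swappiness=10
    # and persist it across reboots
    echo 'vm.swappiness = 10' >> /etc/sysctl.conf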

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de



Re: How to regulate native memory?

Posted by Robert Muir <rc...@gmail.com>.
From the lucene side: it only uses file mappings for reads and doesn't
allocate any anonymous memory.
The way lucene uses the cache for reads won't impact your OOM
(http://www.linuxatemyram.com/play.html).

At the end of the day you are running out of memory on the system
either way, and your process might just look like a large target
for the oom-killer, but that doesn't mean it's necessarily your problem
at all.

I advise sticking with basic operating system tools like /proc and
free -m: reproduce the OOM kill situation, just like in the example
link above, and try to track down the real problem.
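
Concretely, something along these lines (a sketch):

    # watch overall memory while reproducing the problem
    free -m
    # after an OOM kill, check what the kernel logged about the victim
    dmesg | grep -i -B2 -A8 'killed process'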




Re: How to regulate native memory?

Posted by Erik Stephens <mr...@gmail.com>.
Yeah, apologies for that long issue - the netty comments aren't related.  My two comments near the end might be more interesting here:

    https://github.com/elastic/elasticsearch/issues/26269#issuecomment-326060213

To try to summarize, I used `grep indices /proc/$pid/smaps` to quantify what I think is mostly lucene usage (sketch below).  Is that an accurate way to quantify that?  It shows 51G with `-XX:MaxDirectMemorySize=15G`.  The heap is 30G and resident memory is reported as 82.5G.  That makes a bit of sense: 30G + 51G + miscellaneous.
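
(A sketch of how I'm totaling those mappings, assuming the index files have "indices" in their path:)

    # sum the Rss of every mapping whose path matches "indices", in GB
    awk '/indices/ {f=1; next} f && /^Rss:/ {sum += $2; f=0} END {printf "%.1f GB\n", sum/1048576}' /proc/$pid/smaps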

`top` reports roughly 51G as shared, which is suspiciously close to what I'm seeing in /proc/$pid/smaps. Is it correct to think that if a process requests memory and there is not enough "free", the kernel will purge from its cache in order to satisfy the request?  I'm struggling to see how the kernel thinks there isn't enough free memory when so much is in its cache, but that concern is secondary at this point.  My primary concern is regulating the overall footprint (shared with the file system cache or not) so that the OOM killer is not even part of the conversation in the first place.

# grep Vm /proc/$pid/status
VmPeak:	982739416 kB
VmSize:	975784980 kB
VmLck:	       0 kB
VmPin:	       0 kB
VmHWM:	86555044 kB
VmRSS:	86526616 kB
VmData:	42644832 kB
VmStk:	     136 kB
VmExe:	       4 kB
VmLib:	   18028 kB
VmPTE:	  275292 kB
VmPMD:	    3720 kB
VmSwap:	       0 kB

# free -g
              total        used        free      shared  buff/cache   available
Mem:            125          54           1           1          69          69
Swap:             0           0           0

Thanks for the reply!  Apologies if this isn't apropos to this forum - just working my way down the rabbit hole :)

--
Erik




Re: How to regulate native memory?

Posted by Robert Muir <rc...@gmail.com>.
Hello,

From the thread linked there, it's not clear to me that the problem relates
to lucene (vs. being e.g. a bug in netty, too many threads, or
potentially many other problems).

Can you first try to break down your problematic "RSS" from the
operating system's point of view? Maybe that will help determine whether
your issue is with anonymous mappings (ByteBuffer.allocateDirect) or
file mappings (FileChannel.map).

With recent kernels you can break down RSS with /proc/<pid>/status
(RssAnon vs RssFile vs RssShmem):

    http://man7.org/linux/man-pages/man5/proc.5.html

If your kernel is old, you may have to go through more trouble (summing
up values from smaps or similar).
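
For example (the RssAnon/RssFile/RssShmem split appeared around kernel 4.5):

    # anonymous vs file-backed vs shmem resident memory
    grep -E '^Rss(Anon|File|Shmem)' /proc/$pid/status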


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org