Posted to users@kafka.apache.org by Arpit Gogia <ar...@ixigo.com> on 2019/09/27 08:29:07 UTC

Broker regularly facing memory issues

Hello Kafka user group


I am running a Kafka cluster with 3 brokers and have been experiencing
frequent OutOfMemory errors, each time with a similar stack trace:


java.io.IOException: Map failed
    at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:938)
    at kafka.log.AbstractIndex$$anonfun$resize$1.apply$mcZ$sp(AbstractIndex.scala:188)
    at kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:173)
    at kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:173)
    at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
    at kafka.log.AbstractIndex.resize(AbstractIndex.scala:173)
    at kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply$mcZ$sp(AbstractIndex.scala:242)
    at kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply(AbstractIndex.scala:242)
    at kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply(AbstractIndex.scala:242)
    at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
    at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:241)
    at kafka.log.LogSegment.onBecomeInactiveSegment(LogSegment.scala:501)
    at kafka.log.Log$$anonfun$roll$2$$anonfun$apply$32.apply(Log.scala:1635)
    at kafka.log.Log$$anonfun$roll$2$$anonfun$apply$32.apply(Log.scala:1635)
    at scala.Option.foreach(Option.scala:257)
    at kafka.log.Log$$anonfun$roll$2.apply(Log.scala:1635)
    at kafka.log.Log$$anonfun$roll$2.apply(Log.scala:1599)
    at kafka.log.Log.maybeHandleIOException(Log.scala:1996)
    at kafka.log.Log.roll(Log.scala:1599)
    at kafka.log.Log$$anonfun$deleteSegments$1.apply$mcI$sp(Log.scala:1434)
    at kafka.log.Log$$anonfun$deleteSegments$1.apply(Log.scala:1429)
    at kafka.log.Log$$anonfun$deleteSegments$1.apply(Log.scala:1429)
    at kafka.log.Log.maybeHandleIOException(Log.scala:1996)
    at kafka.log.Log.deleteSegments(Log.scala:1429)
    at kafka.log.Log.deleteOldSegments(Log.scala:1424)
    at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1501)
    at kafka.log.Log.deleteOldSegments(Log.scala:1492)
    at kafka.log.LogCleaner$CleanerThread$$anonfun$cleanFilthiestLog$1.apply(LogCleaner.scala:328)
    at kafka.log.LogCleaner$CleanerThread$$anonfun$cleanFilthiestLog$1.apply(LogCleaner.scala:324)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at kafka.log.LogCleaner$CleanerThread.cleanFilthiestLog(LogCleaner.scala:324)
    at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:300)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
Caused by: java.lang.OutOfMemoryError: Map failed
    at sun.nio.ch.FileChannelImpl.map0(Native Method)
    at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:935)
    ... 32 more


Each broker has 16 GB of memory, of which 10 GB is allotted to the JVM heap.
Each broker hosts approximately 2000 partitions, with an average partition
size of 300 MB.


After looking around, I found that increasing the OS-level memory map area
limit `vm.max_map_count` is a viable workaround, since Kafka memory-maps
segment index files when rolling over and the stack trace above indicates a
failure to do exactly that. Since then I have raised the limit every time a
broker goes down with this error. Currently I am at 250,000 on two brokers
and 200,000 on the third, which is very high considering the estimation
formula mentioned at https://kafka.apache.org/documentation/#os. Most
recently I started monitoring the memory map count of the Kafka process on
each broker (using /proc/<pid>/maps); the graph is below.


[image: Screenshot 2019-09-27 at 12.02.38 PM.png]
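

For reference, here is roughly how I am collecting that count: a minimal
Python sketch, assuming Linux and that the broker PID is passed in by hand
(the PID lookup and the sampling schedule are left out here).

    # Sketch: count the memory-mapped regions of the Kafka broker process
    # and compare against the kernel limit. Each line in /proc/<pid>/maps
    # corresponds to one mapped region.
    import sys

    def map_count(pid: int) -> int:
        with open(f"/proc/{pid}/maps") as f:
            return sum(1 for _ in f)

    def max_map_count() -> int:
        with open("/proc/sys/vm/max_map_count") as f:
            return int(f.read())

    if __name__ == "__main__":
        pid = int(sys.argv[1])
        used, limit = map_count(pid), max_map_count()
        print(f"pid={pid} maps={used} limit={limit} headroom={limit - used}")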


My concern is that this count shows an overall increasing trend, with an
average increase of 27.7K across the brokers over the roughly 2 days of
monitoring.


Following are my questions:

   1. Will I have to keep incrementing `vm.max_map_count` until I arrive at
   a stable value? (A rough estimate of where it ought to land is sketched
   after this list.)
   2. Could this by any chance indicate a memory leak, perhaps in the code
   path that rolls over segment files?
   3. Could a lack of page cache memory be a cause as well? The volume of
   cached memory seems to remain consistent over time, so it does not appear
   to be a suspect, but I am not ruling it out for now. As a mitigation I
   will decrease the JVM heap at the next restart so that more memory is
   available for the page cache.
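
To expand on question 1, this is the back-of-the-envelope estimate I have in
mind, only a sketch: it assumes the per-log mmap usage comes mainly from the
offset and time index files that each segment keeps mapped, i.e. roughly two
map entries per segment plus a broadly constant overhead from the JVM and
native libraries. That is my reading of the documentation rather than
something I have verified in the source.

    # Back-of-the-envelope estimate of mmap usage from segment index files.
    # Assumption: ~2 mapped index files (offset + time index) per log segment.
    partitions = 2000
    avg_partition_bytes = 300 * 1024**2   # ~300 MB per partition on average
    segment_bytes = 1024**3               # default log.segment.bytes (1 GiB)

    # Ceiling division: at ~300 MB per partition this comes to 1 segment each.
    segments_per_partition = max(1, -(-avg_partition_bytes // segment_bytes))
    index_maps = partitions * segments_per_partition * 2
    print(f"~{index_maps} maps expected from segment indexes alone")
    # Prints ~4000, nowhere near the 200,000+ limits we keep hitting,
    # which is part of why the steady climb looks suspicious to me.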

-- 

Arpit Gogia | Data Engineer

Re: Broker regularly facing memory issues

Posted by Arpit Gogia <ar...@ixigo.com>.
Thanks for the input.
I do plan on reducing the heap at the next restart. Initially I thought that
too little heap memory could also be the cause, but I've since learned that
cannot be the issue.
I also came across the number 262144 online, but I believe those results were
related to Elasticsearch.
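
Assuming the heap is set the usual way, via KAFKA_HEAP_OPTS (the environment
variable that kafka-server-start.sh and kafka-run-class.sh read), the change
should just be something along these lines; the 5g value is only illustrative
and not final:

    export KAFKA_HEAP_OPTS="-Xms5g -Xmx5g"

That should leave most of the remaining memory for the OS page cache.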


On Fri, Sep 27, 2019 at 2:31 PM Karolis Pocius
<ka...@sentiance.com.invalid> wrote:

> How did you arrive at the 10 GB JVM heap value? I'm running Kafka on 16 GB
> RAM instances with ~4000 partitions each and only assigning 5 GB to the JVM,
> of which Kafka only seems to be using ~2 GB at any given time.
>
> Also, I've set vm.max_map_count to 262144 -- I didn't use any formula to
> estimate that, it must have been some answer I found online, but it's been
> doing the trick -- no issues so far.


-- 

Arpit Gogia | Software Engineer

+91 9650232753

Re: Broker regularly facing memory issues

Posted by Karolis Pocius <ka...@sentiance.com.INVALID>.
How did you arrive at the 10 GB JVM heap value? I'm running Kafka on 16 GB
RAM instances with ~4000 partitions each and only assigning 5 GB to the JVM,
of which Kafka only seems to be using ~2 GB at any given time.

Also, I've set vm.max_map_count to 262144 -- I didn't use any formula to
estimate that, it must have been some answer I found online, but it's been
doing the trick -- no issues so far.
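
If it helps, a quick way to sanity-check how much of the assigned heap a
broker actually uses is to sample it with jstat. A minimal sketch, assuming a
full JDK with jstat on the PATH and that you pass the broker PID in yourself:

    # Sketch: sample JVM heap utilisation of the broker with jstat.
    # -gcutil prints eden/survivor/old-gen utilisation as percentages,
    # here sampled every 5 seconds, 10 times.
    import subprocess, sys

    pid = sys.argv[1]
    subprocess.run(["jstat", "-gcutil", pid, "5000", "10"], check=True)

That should give a rough idea of whether your brokers need anywhere near
10 GB of heap.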
