Posted to users@kafka.apache.org by Cory Kolbeck <ck...@gmail.com> on 2015/10/14 16:28:33 UTC

G1 tuning

Hi folks,

I'm a bit new to the operational side of G1, but pretty familiar with its
basic concept. We recently set up a Kafka cluster to support a new product,
and are seeing some suboptimal GC performance. We're using the parameters
suggested in the docs, except that we switched to Java 1.8.0_40 in order
to get better memory debugging. Even though the cluster is handling only
2-3k messages per second per node, we see periodic 11-18 second
stop-the-world pauses on a roughly hourly cadence. I've turned on
additional GC logging and see no humongous allocations; it all seems to be
buffers making it into the tenured gen. They appear to be collectable, as
the collection triggered by dumping the heap collects them all. Ideas for
additional diagnosis or tuning are very welcome.

--Cory
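
One way to pin down the hourly pauses described above is to scan the GC log
for pause summary lines and keep only the long ones. The sketch below is a
minimal example, assuming the JDK 8 summary-line layout produced by
-XX:+PrintGCDetails and -XX:+PrintGCDateStamps (a "[Full GC ..." or
"[GC pause ..." entry ending in ", N.NNN secs]"). The sample lines are
synthetic illustrations of that layout, not output from this cluster, and
the function name and threshold are hypothetical.

```python
import re

# Assumed JDK 8 layout: a pause summary line ends with ", <seconds> secs]".
PAUSE_RE = re.compile(
    r"\[(?P<kind>Full GC|GC pause)\b.*?, (?P<secs>\d+\.\d+) secs\]\s*$"
)


def long_pauses(log_lines, threshold_secs=1.0):
    """Return (kind, seconds) for each pause at or above threshold_secs."""
    hits = []
    for line in log_lines:
        match = PAUSE_RE.search(line)
        if match and float(match.group("secs")) >= threshold_secs:
            hits.append((match.group("kind"), float(match.group("secs"))))
    return hits


# Synthetic lines for illustration only; real logs carry more detail.
SAMPLE = [
    "2015-10-13T18:34:17.123+0000: 3600.100: "
    "[GC pause (G1 Evacuation Pause) (young), 0.0187654 secs]",
    "2015-10-13T18:35:02.456+0000: 3645.433: "
    "[Full GC (Allocation Failure)  3890M->1012M(4096M), 11.5000000 secs]",
]

for kind, secs in long_pauses(SAMPLE):
    print(f"{kind}: {secs:.1f} s")
```

If the long entries turn out to be "Full GC" lines, that would suggest G1 is
falling back to a full collection rather than finishing its mixed
collections in time.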

Re: G1 tuning

Posted by Cory Kolbeck <ck...@gmail.com>.
My current theory, which I haven't dug into the source to confirm, is that
said buffers are being pre-allocated. Because the Kafka instance is
relatively bored, they end up living long enough to survive a few
collections and be promoted. I could be way off base though.

Command line, broken out for a little better readability:
18918 /opt/java/1.8.0_40/bin/java -cp
:/mnt/services/kafka08/etc::/mnt/services/kafka08/current/lib/java:/mnt/services/kafka08/current/lib/java/zookeeper-3.4.6.jar:/mnt/services/kafka08/current/lib/java/scala-library-2.11.5.jar:/mnt/services/kafka08/current/lib/java/lz4-1.2.0.jar:/mnt/services/kafka08/current/lib/java/scala-parser-combinators_2.11-1.0.2.jar:/mnt/services/kafka08/current/lib/java/metrics-core-2.2.0.jar:/mnt/services/kafka08/current/lib/java/kafka_2.11-0.8.2.0-scaladoc.jar:/mnt/services/kafka08/current/lib/java/kafka_2.11-0.8.2.0-javadoc.jar:/mnt/services/kafka08/current/lib/java/scala-xml_2.11-1.0.2.jar:/mnt/services/kafka08/current/lib/java/kafka_2.11-0.8.2.0.jar:/mnt/services/kafka08/current/lib/java/kafka-clients-0.8.2.0.jar:/mnt/services/kafka08/current/lib/java/snappy-java-1.1.1.6.jar:/mnt/services/kafka08/current/lib/java/kafka_2.11-0.8.2.0-test.jar:/mnt/services/kafka08/current/lib/java/log4j-1.2.16.jar:/mnt/services/kafka08/current/lib/java/kafka_2.11-0.8.2.0-sources.jar:/mnt/services/kafka08/current/lib/java/slf4j-api-1.7.6.jar:/mnt/services/kafka08/current/lib/java/slf4j-log4j12-1.6.1.jar:/mnt/services/kafka08/current/lib/java/zkclient-0.3.jar:/mnt/services/kafka08/current/lib/java/jopt-simple-3.2.jar:
 -Xms4096M -Xmx4096M
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/mnt/services/kafka08/var/dump/2015-10-13-17_34_17.hprof
-XX:+PrintAdaptiveSizePolicy -XX:+UseG1GC -XX:MaxGCPauseMillis=20
-XX:InitiatingHeapOccupancyPercent=35
-Xloggc:/mnt/services/kafka08/var/log/kafka08-gc-2015-10-13-17_34_17.log
-XX:+PrintGCDetails -XX:+PrintGCDateStamps
-Dlog4j.configuration=file:///mnt/services/kafka08/etc/log4j.properties
kafka.Kafka /mnt/services/kafka08/etc/kafka08.properties
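
To test the promotion theory, it may help to check which extra JDK 8
diagnostic flags are absent from the command line above. The sketch below
is a small audit helper. The choice of flags worth adding is an assumption
on the editor's part, not a recommendation from the thread, and the helper
name is hypothetical. All three flags are standard HotSpot options in JDK 8.

```python
# JDK 8 HotSpot flags that could help confirm or refute early promotion.
# Which ones are worth enabling is an assumption, not advice from the thread.
DIAGNOSTIC_FLAGS = {
    "-XX:+PrintTenuringDistribution":
        "object ages in the survivor space at each young collection",
    "-XX:+PrintGCApplicationStoppedTime":
        "total time application threads were stopped",
    "-XX:+PrintReferenceGC":
        "time spent processing soft/weak/final/phantom references",
}

# GC-related subset of the flags from the command line posted above.
POSTED_FLAGS = [
    "-Xms4096M", "-Xmx4096M",
    "-XX:+HeapDumpOnOutOfMemoryError",
    "-XX:+PrintAdaptiveSizePolicy", "-XX:+UseG1GC", "-XX:MaxGCPauseMillis=20",
    "-XX:InitiatingHeapOccupancyPercent=35",
    "-XX:+PrintGCDetails", "-XX:+PrintGCDateStamps",
]


def missing_diagnostics(flags):
    """Return the diagnostic flags that do not appear in the given list."""
    present = set(flags)
    return [flag for flag in DIAGNOSTIC_FLAGS if flag not in present]


for flag in missing_diagnostics(POSTED_FLAGS):
    print(f"{flag}  # {DIAGNOSTIC_FLAGS[flag]}")
```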


On Wed, Oct 14, 2015 at 11:37 AM, Todd Palino <tp...@gmail.com> wrote:

> We've had no problems with G1 in all of our clusters with varying load
> levels. I think we've seen an occasional long GC here and there, but
> nothing recurring at this point.
>
> What's the full command line that you're using with all the options?
>
> -Todd

Re: G1 tuning

Posted by Todd Palino <tp...@gmail.com>.
We've had no problems with G1 in all of our clusters with varying load
levels. I think we've seen an occasional long GC here and there, but
nothing recurring at this point.

What's the full command line that you're using with all the options?

-Todd



Re: G1 tuning

Posted by Scott Clasen <sc...@heroku.com>.
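You can also use -Xmn with that GC (the parallel collector) to size the new
gen such that those buffers don't get tenured.

I don't think that's a good option with G1, though: G1 accepts -Xmn, but
fixing the young gen size overrides its pause-time goal.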
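
For a sense of how much room G1 has for the young generation without -Xmn,
the arithmetic below is a rough sketch. It assumes the JDK 8 defaults for
the experimental flags -XX:G1NewSizePercent (5) and -XX:G1MaxNewSizePercent
(60), applied to the 4096M heap from the posted command line. The function
name is hypothetical, and the real sizing is adaptive within these bounds.

```python
def g1_young_gen_bounds_mb(heap_mb, new_size_percent=5, max_new_size_percent=60):
    """Return (min_mb, max_mb) for the G1 young gen as a share of the heap.

    The percentages default to the assumed JDK 8 values of
    G1NewSizePercent and G1MaxNewSizePercent.
    """
    low = heap_mb * new_size_percent / 100.0
    high = heap_mb * max_new_size_percent / 100.0
    return low, high


low, high = g1_young_gen_bounds_mb(4096)
print(f"young gen may range from about {low:.0f}M to about {high:.0f}M")
```

With a 20 ms pause target, G1 will tend to stay near the low end of that
range, which would give short-lived buffers less time to die young.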

On Wednesday, October 14, 2015, Cory Kolbeck <ck...@gmail.com> wrote:

> I'm not sure that will help here; you'll likely have the same
> medium-lifetime buffers getting into the tenured generation and forcing
> large collections.

Re: G1 tuning

Posted by Cory Kolbeck <ck...@gmail.com>.
I'm not sure that will help here; you'll likely have the same
medium-lifetime buffers getting into the tenured generation and forcing
large collections.

On Wed, Oct 14, 2015 at 10:00 AM, Gerrit Jansen van Vuuren <
gerritjvv@gmail.com> wrote:

> Hi,
>
> I've seen pauses using G1 in other applications and have found that
> -XX:+UseParallelGC
> -XX:+UseParallelOldGC work best if you're having GC issues in general on
> the JVM.
>
>
> Regards,
>  Gerrit

Re: G1 tuning

Posted by Gerrit Jansen van Vuuren <ge...@gmail.com>.
Hi,

I've seen pauses using G1 in other applications and have found that
-XX:+UseParallelGC
-XX:+UseParallelOldGC work best if you're having GC issues in general on
the JVM.


Regards,
 Gerrit
