Posted to users@kafka.apache.org by Cory Kolbeck <ck...@gmail.com> on 2015/11/18 23:51:10 UTC

GC Woes

Hi folks,

I've been chasing an issue for a bit now without much luck. We're seeing
occasional (1-2 times a day) pause times of 10+ seconds in a 0.8.2.0 broker
only handling ~3k messages/s. We're only seeing it on one node at a time in
a three node cluster, though which node is affected can change
occasionally. We've tried G1 and ParNew/CMS with various heap sizes and
configurations without fixing the issue.

In digging into things, I found a somewhat odd thing: YourKit's allocation
tracking shows that ~98% (by both count and size) of objects allocated are
closures around string formatting created in the calls to trace() in the
ReplicaFetcherThread such as
https://github.com/apache/kafka/blob/0.8.2.0/core/src/main/scala/kafka/server/ReplicaFetcherThread.scala#L52

Can anyone replicate this? I see that trace() guards printing internally
with a call to isTraceEnabled, but the closure wrapping the message is
still allocated at the call site before that check runs. Would folks be
amenable to explicitly wrapping the calls there in isTraceEnabled, given
that it's a decently tight loop?
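
To make the proposal concrete, here is a minimal sketch of the pattern. It
is not Kafka's actual code: TraceGuardSketch, traceEnabled, describe and the
formatted counter are hypothetical stand-ins, and the only assumption carried
over from Kafka is that trace() takes its message as a by-name parameter
(msg: => String). A by-name argument is compiled to a small function object
at each call site, so the unguarded form can allocate one per call even when
tracing is off, unless the JIT manages to optimize it away.

```scala
// Hypothetical stand-in for a logging helper; names are illustrative only.
object TraceGuardSketch {
  var traceEnabled = false
  var formatted = 0 // how many times a message string was actually built

  def isTraceEnabled: Boolean = traceEnabled

  // By-name parameter: the argument expression is wrapped in a function
  // object by the caller and only evaluated if tracing is enabled.
  def trace(msg: => String): Unit =
    if (isTraceEnabled) println(msg)

  // Builds the message and records that formatting happened.
  def describe(offset: Long): String = {
    formatted += 1
    "Follower has replica log end offset %d".format(offset)
  }

  // Unguarded: the wrapper capturing `offset` is created on every call,
  // even though the string itself is never formatted when tracing is off.
  def unguarded(offset: Long): Unit =
    trace(describe(offset))

  // Guarded: the wrapper is only created inside the branch, so the
  // disabled path does nothing beyond the boolean check.
  def guarded(offset: Long): Unit =
    if (isTraceEnabled) trace(describe(offset))

  def main(args: Array[String]): Unit = {
    unguarded(42L)
    guarded(42L)
    println("messages formatted with trace disabled: " + formatted)
  }
}
```

Both forms skip the string formatting when tracing is disabled, so the
counter alone cannot tell them apart; the difference is the per-call wrapper
object, which is what an allocation profiler would be counting.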

Also, if anyone is willing to pitch ideas for GC configs or experiments to
try, I'm all ears.

Thanks,
Cory K