You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@kafka.apache.org by Jagbir Hooda <jh...@gmail.com> on 2015/05/01 21:47:00 UTC

High CPU Usage 0.8.1.2 on a kafka client

System: jdk-7u76 (Oracle), on Ubuntu 14.04.1 LTS, Trusty Tahr, Kernel:
3.13.0-44-generic

Kafka Version: 0.8.1.2 Broker Cfg: Three brokers, Zookeeper Cfg: Three
brokers, Replication Factor: 2

Client Machine Configuration::
- Memory: 4G
- CPU: 2 (Xeon @ 2.50GHz)
- Java args:: -server -Xmx2G -Xms2G -XX:+HeapDumpOnOutOfMemoryError
-Djava.awt.headless=true  -verbose:gc -XX:+PrintGCDetails
-XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
-Dorg.jboss.netty.epollBugWorkaround
-Xrunhprof:cpu=samples,depth=30,file=/tmp/java.hprof


Folks,

I'll appreciate your help in trying to understand high CPU usage by
our kafka clients. We are running four simple java consumers within
the same
java process. One of these consumers uses ten streams (off 50
topics/partitions) consumed by ten worker threads and the rest of the
consumers (three with four independent topics/partitions each) draw
three individual streams fed into three independent worker threads
respectively.
The overall messaging traffic is very low (around two messages/sec).

The process runs fine for some time but after few minutes of normal
run it escalates into a high CPU mode. Following is a 10 minutes of
trace showing
the reason for high CPU

-----------------------------8<-----------------------------
CPU SAMPLES BEGIN (total = 1426307) Fri May  1 18:34:37 2015
 rank   self    accum     count     trace     method
   1 50.26%  50.26%  716803 306229
sun.nio.ch.EPollArrayWrapper.epollWait (kafka consumer see TRACE #
below)
   2 22.27%  72.53%  317632 302942
sun.nio.ch.EPollArrayWrapper.epollWait (netty related see TRACE #
below)
   3 21.85%  94.38%  311678 302855
sun.nio.ch.EPollArrayWrapper.epollWait (zookeeper client see TRACE #
below)
   4   2.78%  97.16%    39709 302957 sun.nio.ch.EPollArrayWrapper.epollWait
   5   2.78%  99.95%    39706 304668 sun.nio.ch.EPollArrayWrapper.epollWait

Where::
TRACE 306229:
        sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown
line)
        sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
        sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
        sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
        sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
        sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:221)
        sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
        java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
        kafka.utils.Utils$.read(Utils.scala:375)
        kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
        kafka.network.Receive$class.readCompletely(Transmission.scala:56)
        kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
        kafka.network.BlockingChannel.receive(BlockingChannel.scala:100)
        kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:73)
        kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:71)
        kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:109)
        kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:109)
        kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:109)
        kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
        kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(SimpleConsumer.scala:108)
        kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:108)
        kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:108)
        kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
        kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:107)
        kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:96)
        kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:88)
        kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)
TRACE 302855:
        sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown
line)
        sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
        sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
        sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
        sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
        org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1134)
TRACE 302942:
        sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown
line)
        sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
        sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
        sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
        sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
        org.jboss.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:68)
        org.jboss.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:415)
        org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:212)
        org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
        org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
        org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        java.lang.Thread.run(Thread.java:745)
-------------------------------->8------------------------------------------

We are using the following connector configuration:

-------------------------8<-----------------------------
"socket.timeout.ms" = "60000"
"socket.receive.buffer.bytes" = "65536"
"fetch.message.max.bytes" = "1000000"
"auto.commit.enable" = "true"
"auto.commit.interval.ms" = "60000"
"queued.max.message.chunks" = "10"
"rebalance.max.retries" = "10"
"fetch.min.bytes" = "1"
"fetch.wait.max.ms" = "50000"
"rebalance.backoff.ms" = "2000"
"refresh.leader.backoff.ms" = "200"
"auto.offset.reset" = "smallest"
"consumer.timeout.ms" = "60000")
"zookeeper.session.timeout.ms" = "6000"
"zookeeper.connection.timeout.ms" = "6000"
"zookeeper.sync.time.ms" = "2000"
------------------------->8----------------------------

My guess is that we are missing/misusing some configuration which is
driving up the CPU, but we are not able to pin point exactly.
Will appreciate if you can provide some pointers.

Thanks a lot,
jsh