You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by "Jay Kreps (JIRA)" <ji...@apache.org> on 2012/08/31 18:42:08 UTC

[jira] [Created] (KAFKA-493) High CPU usage on inactive server

Jay Kreps created KAFKA-493:
-------------------------------

             Summary: High CPU usage on inactive server
                 Key: KAFKA-493
                 URL: https://issues.apache.org/jira/browse/KAFKA-493
             Project: Kafka
          Issue Type: Bug
          Components: core
    Affects Versions: 0.8
            Reporter: Jay Kreps
             Fix For: 0.8


> I've been playing with the 0.8 branch of Kafka and noticed that idle CPU usage is fairly high (13% of a 
> core). Is that to be expected? I did look at the stack, but didn't see anything obvious. A background 
> task?

> I wanted to mention how I am getting into this state. I've set up two machines with the latest 0.8 
> code base and am using a replication factor of 2. On starting the brokers there is no idle CPU activity. 
> Then I run a test that essential does 10k publish operations followed by immediate consume operations 
> (I was measuring latency). Once this has run the kafka nodes seem to consistently be consuming CPU 
> essentially forever.

hprof results:

THREAD START (obj=500003ae, id = 200004, name="RMI TCP Accept-0", group="system")
THREAD START (obj=500003ae, id = 200005, name="RMI TCP Accept-9999", group="system")
THREAD START (obj=500003ae, id = 200006, name="RMI TCP Accept-0", group="system")
THREAD START (obj=500003ae, id = 200001, name="main", group="main")
THREAD START (obj=500003ae, id = 200007, name="Thread-2", group="main")
THREAD START (obj=500003ae, id = 200008, name="Thread-3", group="main")
THREAD START (obj=500003ae, id = 200009, name="kafka-processor-9092-0", group="main")
THREAD START (obj=500003ae, id = 200010, name="kafka-processor-9092-1", group="main")
THREAD START (obj=500003ae, id = 200011, name="kafka-acceptor", group="main")
THREAD START (obj=5000074b, id = 200012, name="ZkClient-EventThread-20-localhost:2181", group="main")
THREAD START (obj=5000076e, id = 200014, name="main-SendThread()", group="main")
THREAD START (obj=5000076d, id = 200013, name="main-EventThread", group="main")
THREAD START (obj=500003ae, id = 200015, name="metrics-meter-tick-thread-1", group="main")
THREAD START (obj=500003ae, id = 200016, name="metrics-meter-tick-thread-2", group="main")
THREAD START (obj=500003ae, id = 200017, name="request-expiration-task", group="main")
THREAD START (obj=500003ae, id = 200018, name="request-expiration-task", group="main")
THREAD START (obj=500003ae, id = 200019, name="kafka-request-handler-0", group="main")
THREAD START (obj=500003ae, id = 200020, name="kafka-request-handler-1", group="main")
THREAD START (obj=500003ae, id = 200021, name="Thread-6", group="main")
THREAD START (obj=500003ae, id = 200022, name="Thread-7", group="main")
THREAD START (obj=50000899, id = 200023, name="ReplicaFetcherThread-0-2 on broker 1, ", group="main")
THREAD START (obj=50000899, id = 200024, name="ReplicaFetcherThread-0-3 on broker 1, ", group="main")
THREAD START (obj=50000899, id = 200025, name="ReplicaFetcherThread-0-0 on broker 1, ", group="main")
THREAD START (obj=50000899, id = 200026, name="ReplicaFetcherThread-0-1 on broker 1, ", group="main")
THREAD START (obj=500003ae, id = 200028, name="SIGINT handler", group="system")
THREAD START (obj=500003ae, id = 200029, name="Thread-5", group="main")
THREAD START (obj=5000074b, id = 200030, name="Thread-1", group="main")
THREAD START (obj=5000074b, id = 200031, name="Thread-0", group="main")
THREAD END (id = 200031)
THREAD END (id = 200029)
THREAD END (id = 200020)
THREAD END (id = 200019)
THREAD END (id = 200008)
THREAD END (id = 200021)
THREAD END (id = 200007)
THREAD END (id = 200022)
THREAD END (id = 200018)
THREAD END (id = 200017)
THREAD END (id = 200012)
THREAD END (id = 200013)
THREAD END (id = 200014)
THREAD END (id = 200025)
THREAD END (id = 200023)
THREAD END (id = 200026)
THREAD END (id = 200024)
THREAD END (id = 200011)
THREAD END (id = 200009)
THREAD END (id = 200010)
THREAD END (id = 200030)
THREAD END (id = 200028)
TRACE 301281:
        sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line)
        sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
        sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81)
        sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
        sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
        sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:218)
        sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
        java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
        kafka.utils.Utils$.read(Utils.scala:453)
        kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
        kafka.network.Receive$class.readCompletely(Transmission.scala:56)
        kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
        kafka.network.BlockingChannel.receive(BlockingChannel.scala:92)
        kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:67)
        kafka.consumer.SimpleConsumer.sendRequest(SimpleConsumer.scala:65)
        kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:95)
        kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:71)
        kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:50)
TRACE 300427:
        java.net.PlainSocketImpl.socketAccept(PlainSocketImpl.java:Unknown line)
        java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
        java.net.ServerSocket.implAccept(ServerSocket.java:522)
        java.net.ServerSocket.accept(ServerSocket.java:490)
        sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:387)
        sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:359)
        java.lang.Thread.run(Thread.java:722)
TRACE 300963:
        sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line)
        sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
        sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81)
        sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
        sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
        kafka.network.Processor.run(SocketServer.scala:202)
        java.lang.Thread.run(Thread.java:722)
TRACE 300558:
        java.net.PlainSocketImpl.socketAccept(PlainSocketImpl.java:Unknown line)
        java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
        java.net.ServerSocket.implAccept(ServerSocket.java:522)
        java.net.ServerSocket.accept(ServerSocket.java:490)
        sun.management.jmxremote.LocalRMIServerSocketFactory$1.accept(LocalRMIServerSocketFactory.java:52)
        sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:387)
        sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:359)
        java.lang.Thread.run(Thread.java:722)
TRACE 300978:
        sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line)
        sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
        sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81)
        sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
        sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
        kafka.network.Acceptor.run(SocketServer.scala:142)
        java.lang.Thread.run(Thread.java:722)
TRACE 301024:
        sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line)
        sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
        sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81)
        sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
        sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
        org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1134)
TRACE 301830:
        sun.misc.Unsafe.park(Unsafe.java:Unknown line)
        java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
        java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
        java.util.concurrent.DelayQueue.take(DelayQueue.java:220)
        kafka.server.RequestPurgatory$ExpiredRequestReaper.pollExpired(RequestPurgatory.scala:282)
        kafka.server.RequestPurgatory$ExpiredRequestReaper.run(RequestPurgatory.scala:235)
        java.lang.Thread.run(Thread.java:722)
CPU SAMPLES BEGIN (total = 1878003) Thu Aug 30 22:26:05 2012
rank   self  accum   count trace method
   1 36.22% 36.22%  680139 301281 sun.nio.ch.EPollArrayWrapper.epollWait
   2 18.15% 54.36%  340770 300427 java.net.PlainSocketImpl.socketAccept
   3 18.13% 72.49%  340458 300963 sun.nio.ch.EPollArrayWrapper.epollWait
   4  9.07% 81.56%  170382 300558 java.net.PlainSocketImpl.socketAccept
   5  9.07% 90.63%  170340 300978 sun.nio.ch.EPollArrayWrapper.epollWait
   6  9.06% 99.69%  170139 301024 sun.nio.ch.EPollArrayWrapper.epollWait
   7  0.18% 99.88%    3429 301830 sun.misc.Unsafe.park
CPU SAMPLES END



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (KAFKA-493) High CPU usage on inactive server

Posted by "Sam Meder (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447955#comment-13447955 ] 

Sam Meder commented on KAFKA-493:
---------------------------------

Attached a stacktrace, looks like there definitely are som runnable threads...
                
> High CPU usage on inactive server
> ---------------------------------
>
>                 Key: KAFKA-493
>                 URL: https://issues.apache.org/jira/browse/KAFKA-493
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jay Kreps
>             Fix For: 0.8
>
>         Attachments: stacktrace.txt
>
>
> > I've been playing with the 0.8 branch of Kafka and noticed that idle CPU usage is fairly high (13% of a 
> > core). Is that to be expected? I did look at the stack, but didn't see anything obvious. A background 
> > task?
> > I wanted to mention how I am getting into this state. I've set up two machines with the latest 0.8 
> > code base and am using a replication factor of 2. On starting the brokers there is no idle CPU activity. 
> > Then I run a test that essential does 10k publish operations followed by immediate consume operations 
> > (I was measuring latency). Once this has run the kafka nodes seem to consistently be consuming CPU 
> > essentially forever.
> hprof results:
> THREAD START (obj=500003ae, id = 200004, name="RMI TCP Accept-0", group="system")
> THREAD START (obj=500003ae, id = 200005, name="RMI TCP Accept-9999", group="system")
> THREAD START (obj=500003ae, id = 200006, name="RMI TCP Accept-0", group="system")
> THREAD START (obj=500003ae, id = 200001, name="main", group="main")
> THREAD START (obj=500003ae, id = 200007, name="Thread-2", group="main")
> THREAD START (obj=500003ae, id = 200008, name="Thread-3", group="main")
> THREAD START (obj=500003ae, id = 200009, name="kafka-processor-9092-0", group="main")
> THREAD START (obj=500003ae, id = 200010, name="kafka-processor-9092-1", group="main")
> THREAD START (obj=500003ae, id = 200011, name="kafka-acceptor", group="main")
> THREAD START (obj=5000074b, id = 200012, name="ZkClient-EventThread-20-localhost:2181", group="main")
> THREAD START (obj=5000076e, id = 200014, name="main-SendThread()", group="main")
> THREAD START (obj=5000076d, id = 200013, name="main-EventThread", group="main")
> THREAD START (obj=500003ae, id = 200015, name="metrics-meter-tick-thread-1", group="main")
> THREAD START (obj=500003ae, id = 200016, name="metrics-meter-tick-thread-2", group="main")
> THREAD START (obj=500003ae, id = 200017, name="request-expiration-task", group="main")
> THREAD START (obj=500003ae, id = 200018, name="request-expiration-task", group="main")
> THREAD START (obj=500003ae, id = 200019, name="kafka-request-handler-0", group="main")
> THREAD START (obj=500003ae, id = 200020, name="kafka-request-handler-1", group="main")
> THREAD START (obj=500003ae, id = 200021, name="Thread-6", group="main")
> THREAD START (obj=500003ae, id = 200022, name="Thread-7", group="main")
> THREAD START (obj=50000899, id = 200023, name="ReplicaFetcherThread-0-2 on broker 1, ", group="main")
> THREAD START (obj=50000899, id = 200024, name="ReplicaFetcherThread-0-3 on broker 1, ", group="main")
> THREAD START (obj=50000899, id = 200025, name="ReplicaFetcherThread-0-0 on broker 1, ", group="main")
> THREAD START (obj=50000899, id = 200026, name="ReplicaFetcherThread-0-1 on broker 1, ", group="main")
> THREAD START (obj=500003ae, id = 200028, name="SIGINT handler", group="system")
> THREAD START (obj=500003ae, id = 200029, name="Thread-5", group="main")
> THREAD START (obj=5000074b, id = 200030, name="Thread-1", group="main")
> THREAD START (obj=5000074b, id = 200031, name="Thread-0", group="main")
> THREAD END (id = 200031)
> THREAD END (id = 200029)
> THREAD END (id = 200020)
> THREAD END (id = 200019)
> THREAD END (id = 200008)
> THREAD END (id = 200021)
> THREAD END (id = 200007)
> THREAD END (id = 200022)
> THREAD END (id = 200018)
> THREAD END (id = 200017)
> THREAD END (id = 200012)
> THREAD END (id = 200013)
> THREAD END (id = 200014)
> THREAD END (id = 200025)
> THREAD END (id = 200023)
> THREAD END (id = 200026)
> THREAD END (id = 200024)
> THREAD END (id = 200011)
> THREAD END (id = 200009)
> THREAD END (id = 200010)
> THREAD END (id = 200030)
> THREAD END (id = 200028)
> TRACE 301281:
>         sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line)
>         sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
>         sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81)
>         sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
>         sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
>         sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:218)
>         sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
>         java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
>         kafka.utils.Utils$.read(Utils.scala:453)
>         kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
>         kafka.network.Receive$class.readCompletely(Transmission.scala:56)
>         kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
>         kafka.network.BlockingChannel.receive(BlockingChannel.scala:92)
>         kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:67)
>         kafka.consumer.SimpleConsumer.sendRequest(SimpleConsumer.scala:65)
>         kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:95)
>         kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:71)
>         kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:50)
> TRACE 300427:
>         java.net.PlainSocketImpl.socketAccept(PlainSocketImpl.java:Unknown line)
>         java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
>         java.net.ServerSocket.implAccept(ServerSocket.java:522)
>         java.net.ServerSocket.accept(ServerSocket.java:490)
>         sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:387)
>         sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:359)
>         java.lang.Thread.run(Thread.java:722)
> TRACE 300963:
>         sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line)
>         sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
>         sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81)
>         sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
>         sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
>         kafka.network.Processor.run(SocketServer.scala:202)
>         java.lang.Thread.run(Thread.java:722)
> TRACE 300558:
>         java.net.PlainSocketImpl.socketAccept(PlainSocketImpl.java:Unknown line)
>         java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
>         java.net.ServerSocket.implAccept(ServerSocket.java:522)
>         java.net.ServerSocket.accept(ServerSocket.java:490)
>         sun.management.jmxremote.LocalRMIServerSocketFactory$1.accept(LocalRMIServerSocketFactory.java:52)
>         sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:387)
>         sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:359)
>         java.lang.Thread.run(Thread.java:722)
> TRACE 300978:
>         sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line)
>         sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
>         sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81)
>         sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
>         sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
>         kafka.network.Acceptor.run(SocketServer.scala:142)
>         java.lang.Thread.run(Thread.java:722)
> TRACE 301024:
>         sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line)
>         sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
>         sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81)
>         sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
>         sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
>         org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1134)
> TRACE 301830:
>         sun.misc.Unsafe.park(Unsafe.java:Unknown line)
>         java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>         java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
>         java.util.concurrent.DelayQueue.take(DelayQueue.java:220)
>         kafka.server.RequestPurgatory$ExpiredRequestReaper.pollExpired(RequestPurgatory.scala:282)
>         kafka.server.RequestPurgatory$ExpiredRequestReaper.run(RequestPurgatory.scala:235)
>         java.lang.Thread.run(Thread.java:722)
> CPU SAMPLES BEGIN (total = 1878003) Thu Aug 30 22:26:05 2012
> rank   self  accum   count trace method
>    1 36.22% 36.22%  680139 301281 sun.nio.ch.EPollArrayWrapper.epollWait
>    2 18.15% 54.36%  340770 300427 java.net.PlainSocketImpl.socketAccept
>    3 18.13% 72.49%  340458 300963 sun.nio.ch.EPollArrayWrapper.epollWait
>    4  9.07% 81.56%  170382 300558 java.net.PlainSocketImpl.socketAccept
>    5  9.07% 90.63%  170340 300978 sun.nio.ch.EPollArrayWrapper.epollWait
>    6  9.06% 99.69%  170139 301024 sun.nio.ch.EPollArrayWrapper.epollWait
>    7  0.18% 99.88%    3429 301830 sun.misc.Unsafe.park
> CPU SAMPLES END

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (KAFKA-493) High CPU usage on inactive server

Posted by "Sam Meder (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Meder updated KAFKA-493:
----------------------------

    Attachment: stacktrace.txt
    
> High CPU usage on inactive server
> ---------------------------------
>
>                 Key: KAFKA-493
>                 URL: https://issues.apache.org/jira/browse/KAFKA-493
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jay Kreps
>             Fix For: 0.8
>
>         Attachments: stacktrace.txt
>
>
> > I've been playing with the 0.8 branch of Kafka and noticed that idle CPU usage is fairly high (13% of a 
> > core). Is that to be expected? I did look at the stack, but didn't see anything obvious. A background 
> > task?
> > I wanted to mention how I am getting into this state. I've set up two machines with the latest 0.8 
> > code base and am using a replication factor of 2. On starting the brokers there is no idle CPU activity. 
> > Then I run a test that essential does 10k publish operations followed by immediate consume operations 
> > (I was measuring latency). Once this has run the kafka nodes seem to consistently be consuming CPU 
> > essentially forever.
> hprof results:
> THREAD START (obj=500003ae, id = 200004, name="RMI TCP Accept-0", group="system")
> THREAD START (obj=500003ae, id = 200005, name="RMI TCP Accept-9999", group="system")
> THREAD START (obj=500003ae, id = 200006, name="RMI TCP Accept-0", group="system")
> THREAD START (obj=500003ae, id = 200001, name="main", group="main")
> THREAD START (obj=500003ae, id = 200007, name="Thread-2", group="main")
> THREAD START (obj=500003ae, id = 200008, name="Thread-3", group="main")
> THREAD START (obj=500003ae, id = 200009, name="kafka-processor-9092-0", group="main")
> THREAD START (obj=500003ae, id = 200010, name="kafka-processor-9092-1", group="main")
> THREAD START (obj=500003ae, id = 200011, name="kafka-acceptor", group="main")
> THREAD START (obj=5000074b, id = 200012, name="ZkClient-EventThread-20-localhost:2181", group="main")
> THREAD START (obj=5000076e, id = 200014, name="main-SendThread()", group="main")
> THREAD START (obj=5000076d, id = 200013, name="main-EventThread", group="main")
> THREAD START (obj=500003ae, id = 200015, name="metrics-meter-tick-thread-1", group="main")
> THREAD START (obj=500003ae, id = 200016, name="metrics-meter-tick-thread-2", group="main")
> THREAD START (obj=500003ae, id = 200017, name="request-expiration-task", group="main")
> THREAD START (obj=500003ae, id = 200018, name="request-expiration-task", group="main")
> THREAD START (obj=500003ae, id = 200019, name="kafka-request-handler-0", group="main")
> THREAD START (obj=500003ae, id = 200020, name="kafka-request-handler-1", group="main")
> THREAD START (obj=500003ae, id = 200021, name="Thread-6", group="main")
> THREAD START (obj=500003ae, id = 200022, name="Thread-7", group="main")
> THREAD START (obj=50000899, id = 200023, name="ReplicaFetcherThread-0-2 on broker 1, ", group="main")
> THREAD START (obj=50000899, id = 200024, name="ReplicaFetcherThread-0-3 on broker 1, ", group="main")
> THREAD START (obj=50000899, id = 200025, name="ReplicaFetcherThread-0-0 on broker 1, ", group="main")
> THREAD START (obj=50000899, id = 200026, name="ReplicaFetcherThread-0-1 on broker 1, ", group="main")
> THREAD START (obj=500003ae, id = 200028, name="SIGINT handler", group="system")
> THREAD START (obj=500003ae, id = 200029, name="Thread-5", group="main")
> THREAD START (obj=5000074b, id = 200030, name="Thread-1", group="main")
> THREAD START (obj=5000074b, id = 200031, name="Thread-0", group="main")
> THREAD END (id = 200031)
> THREAD END (id = 200029)
> THREAD END (id = 200020)
> THREAD END (id = 200019)
> THREAD END (id = 200008)
> THREAD END (id = 200021)
> THREAD END (id = 200007)
> THREAD END (id = 200022)
> THREAD END (id = 200018)
> THREAD END (id = 200017)
> THREAD END (id = 200012)
> THREAD END (id = 200013)
> THREAD END (id = 200014)
> THREAD END (id = 200025)
> THREAD END (id = 200023)
> THREAD END (id = 200026)
> THREAD END (id = 200024)
> THREAD END (id = 200011)
> THREAD END (id = 200009)
> THREAD END (id = 200010)
> THREAD END (id = 200030)
> THREAD END (id = 200028)
> TRACE 301281:
>         sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line)
>         sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
>         sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81)
>         sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
>         sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
>         sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:218)
>         sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
>         java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
>         kafka.utils.Utils$.read(Utils.scala:453)
>         kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
>         kafka.network.Receive$class.readCompletely(Transmission.scala:56)
>         kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
>         kafka.network.BlockingChannel.receive(BlockingChannel.scala:92)
>         kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:67)
>         kafka.consumer.SimpleConsumer.sendRequest(SimpleConsumer.scala:65)
>         kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:95)
>         kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:71)
>         kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:50)
> TRACE 300427:
>         java.net.PlainSocketImpl.socketAccept(PlainSocketImpl.java:Unknown line)
>         java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
>         java.net.ServerSocket.implAccept(ServerSocket.java:522)
>         java.net.ServerSocket.accept(ServerSocket.java:490)
>         sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:387)
>         sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:359)
>         java.lang.Thread.run(Thread.java:722)
> TRACE 300963:
>         sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line)
>         sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
>         sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81)
>         sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
>         sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
>         kafka.network.Processor.run(SocketServer.scala:202)
>         java.lang.Thread.run(Thread.java:722)
> TRACE 300558:
>         java.net.PlainSocketImpl.socketAccept(PlainSocketImpl.java:Unknown line)
>         java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
>         java.net.ServerSocket.implAccept(ServerSocket.java:522)
>         java.net.ServerSocket.accept(ServerSocket.java:490)
>         sun.management.jmxremote.LocalRMIServerSocketFactory$1.accept(LocalRMIServerSocketFactory.java:52)
>         sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:387)
>         sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:359)
>         java.lang.Thread.run(Thread.java:722)
> TRACE 300978:
>         sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line)
>         sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
>         sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81)
>         sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
>         sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
>         kafka.network.Acceptor.run(SocketServer.scala:142)
>         java.lang.Thread.run(Thread.java:722)
> TRACE 301024:
>         sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line)
>         sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
>         sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81)
>         sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
>         sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
>         org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1134)
> TRACE 301830:
>         sun.misc.Unsafe.park(Unsafe.java:Unknown line)
>         java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>         java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
>         java.util.concurrent.DelayQueue.take(DelayQueue.java:220)
>         kafka.server.RequestPurgatory$ExpiredRequestReaper.pollExpired(RequestPurgatory.scala:282)
>         kafka.server.RequestPurgatory$ExpiredRequestReaper.run(RequestPurgatory.scala:235)
>         java.lang.Thread.run(Thread.java:722)
> CPU SAMPLES BEGIN (total = 1878003) Thu Aug 30 22:26:05 2012
> rank   self  accum   count trace method
>    1 36.22% 36.22%  680139 301281 sun.nio.ch.EPollArrayWrapper.epollWait
>    2 18.15% 54.36%  340770 300427 java.net.PlainSocketImpl.socketAccept
>    3 18.13% 72.49%  340458 300963 sun.nio.ch.EPollArrayWrapper.epollWait
>    4  9.07% 81.56%  170382 300558 java.net.PlainSocketImpl.socketAccept
>    5  9.07% 90.63%  170340 300978 sun.nio.ch.EPollArrayWrapper.epollWait
>    6  9.06% 99.69%  170139 301024 sun.nio.ch.EPollArrayWrapper.epollWait
>    7  0.18% 99.88%    3429 301830 sun.misc.Unsafe.park
> CPU SAMPLES END

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (KAFKA-493) High CPU usage on inactive server

Posted by "Jay Kreps (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447991#comment-13447991 ] 

Jay Kreps commented on KAFKA-493:
---------------------------------

I think the root cause is this jdk bug:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6403933

Basically the select loop in NIO can end up with a socket registered in some kind of error state so that the select loop always returns immediately, which results in a kind of busy wait for something that should be blocking.

I suspect that all our NIO usage has the possibility of this occurring, but something about the fetcher thread seems to make it happen much more easily.

I think this is the same root cause issue as KAFKA-479.




                
> High CPU usage on inactive server
> ---------------------------------
>
>                 Key: KAFKA-493
>                 URL: https://issues.apache.org/jira/browse/KAFKA-493
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jay Kreps
>             Fix For: 0.8
>
>         Attachments: stacktrace.txt
>
>
> > I've been playing with the 0.8 branch of Kafka and noticed that idle CPU usage is fairly high (13% of a 
> > core). Is that to be expected? I did look at the stack, but didn't see anything obvious. A background 
> > task?
> > I wanted to mention how I am getting into this state. I've set up two machines with the latest 0.8 
> > code base and am using a replication factor of 2. On starting the brokers there is no idle CPU activity. 
> > Then I run a test that essential does 10k publish operations followed by immediate consume operations 
> > (I was measuring latency). Once this has run the kafka nodes seem to consistently be consuming CPU 
> > essentially forever.
> hprof results:
> THREAD START (obj=500003ae, id = 200004, name="RMI TCP Accept-0", group="system")
> THREAD START (obj=500003ae, id = 200005, name="RMI TCP Accept-9999", group="system")
> THREAD START (obj=500003ae, id = 200006, name="RMI TCP Accept-0", group="system")
> THREAD START (obj=500003ae, id = 200001, name="main", group="main")
> THREAD START (obj=500003ae, id = 200007, name="Thread-2", group="main")
> THREAD START (obj=500003ae, id = 200008, name="Thread-3", group="main")
> THREAD START (obj=500003ae, id = 200009, name="kafka-processor-9092-0", group="main")
> THREAD START (obj=500003ae, id = 200010, name="kafka-processor-9092-1", group="main")
> THREAD START (obj=500003ae, id = 200011, name="kafka-acceptor", group="main")
> THREAD START (obj=5000074b, id = 200012, name="ZkClient-EventThread-20-localhost:2181", group="main")
> THREAD START (obj=5000076e, id = 200014, name="main-SendThread()", group="main")
> THREAD START (obj=5000076d, id = 200013, name="main-EventThread", group="main")
> THREAD START (obj=500003ae, id = 200015, name="metrics-meter-tick-thread-1", group="main")
> THREAD START (obj=500003ae, id = 200016, name="metrics-meter-tick-thread-2", group="main")
> THREAD START (obj=500003ae, id = 200017, name="request-expiration-task", group="main")
> THREAD START (obj=500003ae, id = 200018, name="request-expiration-task", group="main")
> THREAD START (obj=500003ae, id = 200019, name="kafka-request-handler-0", group="main")
> THREAD START (obj=500003ae, id = 200020, name="kafka-request-handler-1", group="main")
> THREAD START (obj=500003ae, id = 200021, name="Thread-6", group="main")
> THREAD START (obj=500003ae, id = 200022, name="Thread-7", group="main")
> THREAD START (obj=50000899, id = 200023, name="ReplicaFetcherThread-0-2 on broker 1, ", group="main")
> THREAD START (obj=50000899, id = 200024, name="ReplicaFetcherThread-0-3 on broker 1, ", group="main")
> THREAD START (obj=50000899, id = 200025, name="ReplicaFetcherThread-0-0 on broker 1, ", group="main")
> THREAD START (obj=50000899, id = 200026, name="ReplicaFetcherThread-0-1 on broker 1, ", group="main")
> THREAD START (obj=500003ae, id = 200028, name="SIGINT handler", group="system")
> THREAD START (obj=500003ae, id = 200029, name="Thread-5", group="main")
> THREAD START (obj=5000074b, id = 200030, name="Thread-1", group="main")
> THREAD START (obj=5000074b, id = 200031, name="Thread-0", group="main")
> THREAD END (id = 200031)
> THREAD END (id = 200029)
> THREAD END (id = 200020)
> THREAD END (id = 200019)
> THREAD END (id = 200008)
> THREAD END (id = 200021)
> THREAD END (id = 200007)
> THREAD END (id = 200022)
> THREAD END (id = 200018)
> THREAD END (id = 200017)
> THREAD END (id = 200012)
> THREAD END (id = 200013)
> THREAD END (id = 200014)
> THREAD END (id = 200025)
> THREAD END (id = 200023)
> THREAD END (id = 200026)
> THREAD END (id = 200024)
> THREAD END (id = 200011)
> THREAD END (id = 200009)
> THREAD END (id = 200010)
> THREAD END (id = 200030)
> THREAD END (id = 200028)
> TRACE 301281:
>         sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line)
>         sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
>         sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81)
>         sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
>         sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
>         sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:218)
>         sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
>         java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
>         kafka.utils.Utils$.read(Utils.scala:453)
>         kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
>         kafka.network.Receive$class.readCompletely(Transmission.scala:56)
>         kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
>         kafka.network.BlockingChannel.receive(BlockingChannel.scala:92)
>         kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:67)
>         kafka.consumer.SimpleConsumer.sendRequest(SimpleConsumer.scala:65)
>         kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:95)
>         kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:71)
>         kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:50)
> TRACE 300427:
>         java.net.PlainSocketImpl.socketAccept(PlainSocketImpl.java:Unknown line)
>         java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
>         java.net.ServerSocket.implAccept(ServerSocket.java:522)
>         java.net.ServerSocket.accept(ServerSocket.java:490)
>         sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:387)
>         sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:359)
>         java.lang.Thread.run(Thread.java:722)
> TRACE 300963:
>         sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line)
>         sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
>         sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81)
>         sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
>         sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
>         kafka.network.Processor.run(SocketServer.scala:202)
>         java.lang.Thread.run(Thread.java:722)
> TRACE 300558:
>         java.net.PlainSocketImpl.socketAccept(PlainSocketImpl.java:Unknown line)
>         java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
>         java.net.ServerSocket.implAccept(ServerSocket.java:522)
>         java.net.ServerSocket.accept(ServerSocket.java:490)
>         sun.management.jmxremote.LocalRMIServerSocketFactory$1.accept(LocalRMIServerSocketFactory.java:52)
>         sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:387)
>         sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:359)
>         java.lang.Thread.run(Thread.java:722)
> TRACE 300978:
>         sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line)
>         sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
>         sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81)
>         sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
>         sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
>         kafka.network.Acceptor.run(SocketServer.scala:142)
>         java.lang.Thread.run(Thread.java:722)
> TRACE 301024:
>         sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line)
>         sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
>         sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81)
>         sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
>         sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
>         org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1134)
> TRACE 301830:
>         sun.misc.Unsafe.park(Unsafe.java:Unknown line)
>         java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>         java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
>         java.util.concurrent.DelayQueue.take(DelayQueue.java:220)
>         kafka.server.RequestPurgatory$ExpiredRequestReaper.pollExpired(RequestPurgatory.scala:282)
>         kafka.server.RequestPurgatory$ExpiredRequestReaper.run(RequestPurgatory.scala:235)
>         java.lang.Thread.run(Thread.java:722)
> CPU SAMPLES BEGIN (total = 1878003) Thu Aug 30 22:26:05 2012
> rank   self  accum   count trace method
>    1 36.22% 36.22%  680139 301281 sun.nio.ch.EPollArrayWrapper.epollWait
>    2 18.15% 54.36%  340770 300427 java.net.PlainSocketImpl.socketAccept
>    3 18.13% 72.49%  340458 300963 sun.nio.ch.EPollArrayWrapper.epollWait
>    4  9.07% 81.56%  170382 300558 java.net.PlainSocketImpl.socketAccept
>    5  9.07% 90.63%  170340 300978 sun.nio.ch.EPollArrayWrapper.epollWait
>    6  9.06% 99.69%  170139 301024 sun.nio.ch.EPollArrayWrapper.epollWait
>    7  0.18% 99.88%    3429 301830 sun.misc.Unsafe.park
> CPU SAMPLES END

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira