Posted to dev@kafka.apache.org by "Michael Saffitz (JIRA)" <ji...@apache.org> on 2016/09/04 16:20:20 UTC

[jira] [Commented] (KAFKA-3900) High CPU util on broker

    [ https://issues.apache.org/jira/browse/KAFKA-3900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15463149#comment-15463149 ] 

Michael Saffitz commented on KAFKA-3900:
----------------------------------------

We're seeing a similar issue: we have a 5-node Kafka cluster, and 4 of the 5 nodes sit around 65% CPU, but one is persistently pegged at 100%.  We get the same exception as above and see frequent ISR shrinks/expands.  Also on AWS with Amazon Linux.
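
One way to confirm how much the ISRs are actually churning on each broker is to read the ReplicaManager shrink/expand meters over JMX. Below is a minimal Java sketch, assuming JMX is enabled on the broker and reachable at localhost:9999 (a hypothetical port, not something taken from this report):

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class IsrChurnCheck {
        public static void main(String[] args) throws Exception {
            // Hypothetical JMX endpoint; the broker must be started with JMX_PORT
            // (or KAFKA_JMX_OPTS) set for this to be reachable.
            String url = "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi";
            JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(url));
            try {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                String[] meters = {
                        "kafka.server:type=ReplicaManager,name=IsrShrinksPerSec",
                        "kafka.server:type=ReplicaManager,name=IsrExpandsPerSec"
                };
                for (String name : meters) {
                    ObjectName bean = new ObjectName(name);
                    // These meters expose a cumulative Count plus moving-average rates.
                    Object count = mbs.getAttribute(bean, "Count");
                    Object rate = mbs.getAttribute(bean, "OneMinuteRate");
                    System.out.println(name + " count=" + count + " 1m-rate=" + rate);
                }
            } finally {
                connector.close();
            }
        }
    }

Comparing these rates across the five brokers shows whether the ISR churn lines up with the one node that is pegged at 100% CPU.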

> High CPU util on broker
> -----------------------
>
>                 Key: KAFKA-3900
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3900
>             Project: Kafka
>          Issue Type: Bug
>          Components: network, replication
>    Affects Versions: 0.10.0.0
>         Environment: kafka = 2.11-0.10.0.0
> java version "1.8.0_91"
> amazon linux
>            Reporter: Andrey Konyaev
>
> I run a Kafka cluster in Amazon on m4.xlarge instances (4 CPUs and 16 GB memory, 14 GB allocated to the Kafka heap). The cluster has three nodes.
> The load is not high (6000 messages/sec) and cpu_idle is around 70%, but sometimes (about once a day) I see this message in server.log:
> [2016-06-24 14:52:22,299] WARN [ReplicaFetcherThread-0-2], Error in fetch kafka.server.ReplicaFetcherThread$FetchRequest@6eaa1034 (kafka.server.ReplicaFetcherThread)
> java.io.IOException: Connection to 2 was disconnected before the response was read
>         at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:87)
>         at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:84)
>         at scala.Option.foreach(Option.scala:257)
>         at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:84)
>         at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:80)
>         at kafka.utils.NetworkClientBlockingOps$.recursivePoll$2(NetworkClientBlockingOps.scala:137)
>         at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollContinuously$extension(NetworkClientBlockingOps.scala:143)
>         at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:80)
>         at kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:244)
>         at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:229)
>         at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
>         at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:107)
>         at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:98)
>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> I know this could be a network glitch, but why does Kafka eat all the CPU time?
> My config:
> inter.broker.protocol.version=0.10.0.0
> log.message.format.version=0.10.0.0
> default.replication.factor=3
> num.partitions=3
> replica.lag.time.max.ms=15000
> broker.id=0
> listeners=PLAINTEXT://:9092
> log.dirs=/mnt/kafka/kafka
> log.retention.check.interval.ms=300000
> log.retention.hours=168
> log.segment.bytes=1073741824
> num.io.threads=20
> num.network.threads=10
> num.partitions=1
> num.recovery.threads.per.data.dir=2
> socket.receive.buffer.bytes=102400
> socket.request.max.bytes=104857600
> socket.send.buffer.bytes=102400
> zookeeper.connection.timeout.ms=6000
> delete.topic.enable = true
> broker.max_heap_size=10 GiB 
>   
> Any ideas?
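
As for the original question of why the broker burns all its CPU after this kind of fetch failure, a first step is to see which JVM thread is hot (a ReplicaFetcherThread, a network thread, or something else). A minimal sketch, again over JMX and assuming the same hypothetical localhost:9999 endpoint; it simply dumps per-thread CPU time so the pegged thread can be matched against a jstack dump:

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;
    import javax.management.MBeanServerConnection;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class HotThreads {
        public static void main(String[] args) throws Exception {
            // Hypothetical JMX endpoint on the affected broker.
            String url = "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi";
            JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(url));
            try {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                // Remote proxy for the standard java.lang:type=Threading MXBean.
                ThreadMXBean threads = ManagementFactory.newPlatformMXBeanProxy(
                        mbs, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class);
                for (long id : threads.getAllThreadIds()) {
                    long cpuNanos = threads.getThreadCpuTime(id); // -1 if measurement is unsupported or disabled
                    ThreadInfo info = threads.getThreadInfo(id);
                    if (info != null && cpuNanos > 0) {
                        System.out.printf("%10d ms  %s%n", cpuNanos / 1_000_000, info.getThreadName());
                    }
                }
            } finally {
                connector.close();
            }
        }
    }

If a ReplicaFetcherThread shows runaway CPU time here, a jstack dump of that thread can show whether it is stuck in the blocking send-and-receive path from the stack trace above.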



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)