Posted to users@kafka.apache.org by li jinyu <di...@gmail.com> on 2016/03/18 12:07:27 UTC

Kafka 0.8.2.2 is using high CPU

hi all,

we encountered a problem with 0.8.1.1: a lot of BadVersion errors in the logs.
We believe it was caused by
https://issues.apache.org/jira/browse/KAFKA-1382, so we upgraded Kafka to
0.8.2.2.

But then we found that Kafka is using high CPU. The following is the output of
the 'top' command:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND

16781 root      20   0 9072m 874m  17m S 975.5  5.5 471:27.56 java
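In case it helps others narrowing this down: top shows only the whole JVM process. With `top -H -p 16781` you can see per-thread CPU, and a hot thread's id maps to the `nid=0x...` field in a jstack dump, so you can name the busy thread. A small sketch of the conversion (just the decimal-to-hex mapping, nothing Kafka-specific):

```python
# Map a Linux thread id (the PID column of `top -H`) to the
# nid=0x... value that jstack prints for each JVM thread.
def tid_to_nid(tid: int) -> str:
    """Convert a decimal thread id to the hex 'nid' shown by jstack."""
    return "0x" + format(tid, "x")

print(tid_to_nid(16781))  # → 0x418d
```

Grepping a jstack dump for that nid then tells you which broker thread is burning the CPU.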

I used HPROF to measure the CPU usage and got this result:
CPU SAMPLES BEGIN (total = 139166340) Fri Mar 18 16:30:28 2016
rank   self  accum     count  trace method
   1 37.63% 37.63%  52372145 325245 sun.nio.ch.EPollArrayWrapper.epollWait
   2 33.37% 71.00%  46441166 319805 sun.nio.ch.EPollArrayWrapper.epollWait
   3  8.49% 79.50%  11819885 302339 java.net.PlainSocketImpl.socketAccept
   4  4.25% 83.74%   5909221 302910 java.net.PlainSocketImpl.socketAccept
   5  4.23% 87.98%   5891650 310256 sun.nio.ch.EPollArrayWrapper.epollWait
   6  4.21% 92.19%   5859357 319905 sun.nio.ch.EPollArrayWrapper.epollWait
   7  4.16% 96.34%   5782672 336990 java.net.SocketInputStream.socketRead0
   8  1.99% 98.33%   2762683 320651 sun.misc.Unsafe.park
   9  0.27% 98.60%    375233 319447 sun.misc.Unsafe.park
  10  0.25% 98.85%    351701 320536 sun.misc.Unsafe.park
  11  0.24% 99.09%    329232 329274 sun.misc.Unsafe.park
  12  0.08% 99.17%    109428 325405 sun.nio.ch.EPollArrayWrapper.epollCtl
  13  0.05% 99.22%     69762 329549 sun.nio.ch.EPollArrayWrapper.interrupt
  14  0.03% 99.25%     45197 328439 kafka.network.RequestChannel$Request.updateRequestMetrics
  15  0.03% 99.28%     38496 328333 sun.nio.ch.FileDispatcherImpl.write0
  16  0.02% 99.30%     33307 337909 java.lang.System.identityHashCode
  17  0.02% 99.32%     31783 328763 sun.nio.ch.NativeThread.current
  18  0.02% 99.34%     30917 322537 sun.misc.Unsafe.unpark
  19  0.02% 99.37%     29943 325979 sun.nio.ch.EPollArrayWrapper.epollWait
  20  0.02% 99.39%     27788 328474 sun.nio.ch.FileDispatcherImpl.write0
  21  0.02% 99.40%     26448 328397 sun.nio.ch.FileDispatcherImpl.write0
  22  0.02% 99.42%     25177 329364 scala.collection.mutable.HashTable$class.$init$
  23  0.02% 99.44%     24230 328786 java.lang.Thread.currentThread
  24  0.02% 99.46%     22683 328795 java.lang.Thread.currentThread
  25  0.01% 99.47%     16264 328092 sun.nio.ch.IOUtil.drain

Since epollWait shouldn't be this expensive most of the time, this is probably
an artifact of how HPROF collects its statistics.
I then tried YourKit and found that the heaviest method is
FetchRequestPurgatory.expire(). I uploaded a screenshot from YourKit:
http://imgur.com/2tXjhBI
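For readers unfamiliar with the term: the broker parks delayed fetch/produce requests in a "purgatory" until they are satisfied or time out, and expire() reaps the timed-out ones. This is not Kafka's actual implementation, just a minimal toy sketch of the idea using a min-heap keyed by deadline (all names here are made up for illustration):

```python
import heapq
import time

class Purgatory:
    """Toy model of a request purgatory: delayed requests wait in a
    min-heap ordered by deadline until expired."""

    def __init__(self):
        self._heap = []  # (deadline, request) pairs

    def watch(self, request, timeout_s, now=None):
        # Register a delayed request with a timeout.
        now = time.monotonic() if now is None else now
        heapq.heappush(self._heap, (now + timeout_s, request))

    def expire(self, now=None):
        """Pop and return every request whose deadline has passed."""
        now = time.monotonic() if now is None else now
        expired = []
        while self._heap and self._heap[0][0] <= now:
            expired.append(heapq.heappop(self._heap)[1])
        return expired

p = Purgatory()
p.watch("fetch-1", timeout_s=1.0, now=0.0)
p.watch("fetch-2", timeout_s=5.0, now=0.0)
print(p.expire(now=2.0))  # → ['fetch-1']
```

If expire() runs very often, or each run touches many watched requests, it can show up as hot in a profiler the way it does in the YourKit screenshot.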

Does anyone have the same problem?

-- 
Don't schedule every day, make them disorder.

Re: Kafka 0.8.2.2 is using high CPU

Posted by li jinyu <di...@gmail.com>.
Adding more information:
we found that the Linux (CentOS) host running Kafka has a very high context
switch rate.
At first we thought it was caused by corrupted message data, but I deleted
all the data on one node and restarted Kafka, and the CPU usage is still
high.

The following are outputs from vmstat and sar:
[root@ ~]# vmstat 1

procs -----------memory---------- ---swap-- -----io---- ----system---- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo     in      cs us sy id wa st
17  0      0 3189824 372180 10568836   0    0     0     3      0       0  4  4 92  0  0
17  0      0 3189816 372180 10568836   0    0     0    16  16702 4772818 28 25 47  0  0
16  0      0 3190224 372180 10568836   0    0     0     0  18956 5164116 31 29 40  0  0
16  0      0 3190404 372180 10568836   0    0     0    60  25567 4976697 29 28 43  0  0
15  0      0 3190424 372180 10568836   0    0     0     0  15033 4883211 29 28 43  0  0
16  0      0 3190672 372180 10568836   0    0     0     0  15340 4891077 29 29 42  0  0
14  0      0 3190796 372180 10568836   0    0     0     8  14408 4649112 27 26 47  0  0
16  0      0 3190832 372180 10568836   0    0     0    32  16415 5097257 30 31 39  0  0

[root@ ~]# sar -w 1
02:54:55 PM    proc/s   cswch/s
02:54:56 PM      0.00 6380752.63
02:54:57 PM      0.00 6295443.04
02:54:58 PM      0.00 6107574.36
02:54:59 PM      0.00 6512588.89
02:55:00 PM      0.00 5686784.52
02:55:01 PM      0.00 6081308.64
02:55:02 PM      0.00 6099097.53
02:55:03 PM      0.00 6085111.25
02:55:04 PM      0.00 6206551.90
02:55:05 PM      0.00 6026546.84
Average:         0.00 6145005.26
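To put that cswch/s figure in perspective: at ~6.1 million context switches per second, even an optimistic per-switch cost eats several cores on switching alone. A back-of-the-envelope calculation (the per-switch cost here is an assumption; real costs vary by hardware and workload, often quoted in the 1-5 µs range):

```python
# Rough estimate of CPU burned purely on context switching.
cswch_per_sec = 6_145_005      # average cswch/s from the sar output above
cost_per_switch_us = 1.5       # assumed cost per switch, in microseconds
cores_burned = cswch_per_sec * cost_per_switch_us / 1e6
print(f"~{cores_burned:.1f} CPU cores spent on context switching")
# → ~9.2 CPU cores spent on context switching
```

That magnitude is consistent with the ~975% CPU seen in top, which supports the theory that threads are waking and switching far more often than they should.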


Re: Kafka 0.8.2.2 is using high CPU

Posted by li jinyu <di...@gmail.com>.
Thanks, the new Kafka cluster is stable now; I'd still like to know the root cause.


Re: Kafka 0.8.2.2 is using high CPU

Posted by Stevo Slavić <ss...@gmail.com>.
Consider upgrading to 0.9.0.1 - our cluster has been much more stable since
the upgrade. Clients can be upgraded later.
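For context, the 0.9.0 upgrade documentation describes a rolling broker upgrade: pin the inter-broker protocol to the old version first, upgrade the code on each broker, then raise the protocol version and bounce again. Roughly (verify the exact values against the official upgrade notes for your release):

```properties
# server.properties while rolling the brokers from 0.8.2.x to 0.9.0.1
inter.broker.protocol.version=0.8.2.X

# after ALL brokers are running 0.9.0.1, change it to:
# inter.broker.protocol.version=0.9.0.1
# and do one more rolling restart
```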


Re: Kafka 0.8.2.2 is using high CPU

Posted by li jinyu <di...@gmail.com>.
I want to add more information.

We've already set up a completely new Kafka cluster with 0.8.2.2, and the
old cluster is still there with no consumers or producers; it's really weird
that its CPU usage is still high.

We also changed fetch.purgatory.purge.interval.requests and
producer.purgatory.purge.interval.requests to 100, since I saw the high CPU
usage in FetchRequestPurgatory.expire(), but no luck.
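For anyone wanting to try the same mitigation, those two settings mentioned above are broker settings; lowering them makes the purgatories purge satisfied requests more frequently. Something like this in the broker config (100 is just the value we tried, not a recommendation):

```properties
# server.properties: purge the request purgatories every N requests
fetch.purgatory.purge.interval.requests=100
producer.purgatory.purge.interval.requests=100
```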

