Posted to users@kafka.apache.org by Navneeth Krishnan <re...@gmail.com> on 2020/01/03 00:18:00 UTC

High CPU Usage on Brokers

Hi All,

We have a Kafka cluster with 12 nodes, and we are seeing close to 90% CPU
usage on all of them. All the relevant information is below. We need some
help figuring out what the problem is and how to overcome it.

*Cluster:*
Kafka version: 2.3.0
Number of brokers in cluster: 12
Node type: 4 vCores 32GB mem
Network In: 10Mbps per broker
Network Out: 16Mbps per broker
Topics: 10 (approximately)
Partitions: 20 (max); some topics have fewer partitions
Replication Factor: 3
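For scale, the figures above can be turned into per-broker numbers. This is plain arithmetic on the listed values, assuming "Mbps" means megabits per second and that every topic has the maximum 20 partitions (so the replica count is an upper bound); the constant and function names are illustrative only.

```python
# Back-of-envelope load per broker from the cluster figures listed above.
# Assumptions: "Mbps" means megabits per second, and every topic has the
# maximum of 20 partitions, so the replica count is an upper bound.

BROKERS = 12
TOPICS = 10               # "approximately"
MAX_PARTITIONS = 20       # per topic; some topics have fewer
REPLICATION_FACTOR = 3
NET_IN_MBPS = 10          # per broker
NET_OUT_MBPS = 16         # per broker


def megabits_to_megabytes(mbps: float) -> float:
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbps / 8


def max_replicas_per_broker() -> float:
    """Upper bound on partition replicas hosted by each broker."""
    return TOPICS * MAX_PARTITIONS * REPLICATION_FACTOR / BROKERS


if __name__ == "__main__":
    print(f"in:       {megabits_to_megabytes(NET_IN_MBPS):.2f} MB/s per broker")
    print(f"out:      {megabits_to_megabytes(NET_OUT_MBPS):.2f} MB/s per broker")
    print(f"replicas: at most {max_replicas_per_broker():.0f} per broker")
```

By these numbers each broker moves roughly 1.25 MB/s in and 2 MB/s out and hosts at most about 50 partition replicas, which on its own seems a light load for 4 vCPUs running near 90%.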

*CPU Usage:*
[image: image.png]

*VMStat*

[root]# vmstat 1 10
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff    cache   si   so    bi    bo    in    cs us sy id wa st
 8  0      0 234444  19064 24046980    0    0    17  2026     1     3 38 33 28  0  1
 7  0      0 256444  19036 24023880    0    0   768     0 64027 22708 44 40 16  0  1
 7  0      0 245356  19052 24034560    0    0   256   472 63509 23276 44 39 17  0  1
 7  0      0 235096  19052 24046616    0    0     0     0 62277 22516 46 38 15  0  1
 8  0      0 260548  19036 24020084    0    0   516 49888 62364 22894 43 38 18  0  1
 5  0      0 249232  19036 24030924    0    0   512     0 61022 24589 41 39 20  0  1
 6  0      0 238072  19036 24042512    0    0  1024     0 63358 23063 44 38 17  0  0
 5  0      0 262904  19052 24017972    0    0     0   440 63078 23499 46 37 17  0  1
 7  0      0 250324  19052 24030008    0    0     0     0 64615 22617 48 38 14  0  1
 6  0      0 237920  19052 24042372    0    0  1024 48900 63223 23029 42 40 18  0  1
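The vmstat samples can be summarized the same way. vmstat's first line reports averages since boot, so the sketch below averages only the nine 1-second samples; `CPUS = 4` comes from the iostat header. The parsing helpers are illustrative, not part of any tool.

```python
# Summarize the `vmstat 1 10` samples from the post. The first vmstat line
# reports averages since boot, so only the nine 1-second lines are used.

COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa st".split()

SAMPLES = """\
 7  0      0 256444  19036 24023880    0    0   768     0 64027 22708 44 40 16  0  1
 7  0      0 245356  19052 24034560    0    0   256   472 63509 23276 44 39 17  0  1
 7  0      0 235096  19052 24046616    0    0     0     0 62277 22516 46 38 15  0  1
 8  0      0 260548  19036 24020084    0    0   516 49888 62364 22894 43 38 18  0  1
 5  0      0 249232  19036 24030924    0    0   512     0 61022 24589 41 39 20  0  1
 6  0      0 238072  19036 24042512    0    0  1024     0 63358 23063 44 38 17  0  0
 5  0      0 262904  19052 24017972    0    0     0   440 63078 23499 46 37 17  0  1
 7  0      0 250324  19052 24030008    0    0     0     0 64615 22617 48 38 14  0  1
 6  0      0 237920  19052 24042372    0    0  1024 48900 63223 23029 42 40 18  0  1
"""

CPUS = 4  # from the iostat header: "(4 CPU)"


def parse(text: str) -> list:
    """Turn each vmstat data line into a {column: value} dict."""
    return [dict(zip(COLUMNS, map(int, line.split())))
            for line in text.splitlines() if line.strip()]


def averages(rows: list, columns=("r", "in", "cs", "us", "sy", "id")) -> dict:
    """Mean of each selected column across all samples."""
    return {c: sum(row[c] for row in rows) / len(rows) for c in columns}


if __name__ == "__main__":
    avg = averages(parse(SAMPLES))
    print(f"user {avg['us']:.1f}%  system {avg['sy']:.1f}%  idle {avg['id']:.1f}%")
    print(f"interrupts/s {avg['in']:.0f}  context switches/s {avg['cs']:.0f}")
    # A run queue persistently above the CPU count usually means runnable
    # threads are waiting for a core.
    print(f"runnable threads per CPU: {avg['r'] / CPUS:.2f}")
```

Averaged over the nine samples this gives about 44% user, 39% system and 17% idle, roughly 63,000 interrupts/s and 23,000 context switches/s, and about 1.6 runnable threads per CPU. A system share that high with zero iowait suggests much of the time may be spent in the kernel (for example on socket work) rather than waiting on disk, though vmstat alone cannot say which code is responsible.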


*IO Stat:*

[root]# iostat -m
Linux 4.14.72-73.55.amzn2.x86_64 (loc-kafka11.internal.dnaspaces.io)  01/02/2020  _x86_64_  (4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          38.11    0.00   33.09    0.11    0.61   28.08

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
xvda              2.36         0.01         0.01      26760      43360
nvme0n1           0.00         0.00         0.00          2          0
xvdf             70.95         0.06         7.67     185908   25205338

*Top Kafka broker threads:*
[image: image.png]

*Top 3:*

"data-plane-kafka-network-thread-10-ListenerName(PLAINTEXT)-PLAINTEXT-0" #60 prio=5 os_prio=0 tid=0x00007f8b1ab56000 nid=0x581f runnable [0x00007f8a886ce000]

"data-plane-kafka-network-thread-10-ListenerName(PLAINTEXT)-PLAINTEXT-2" #62 prio=5 os_prio=0 tid=0x00007f8b1ab59000 nid=0x5821 runnable [0x00007f8a6aefd000]

"data-plane-kafka-network-thread-10-ListenerName(PLAINTEXT)-PLAINTEXT-1" #61 prio=5 os_prio=0 tid=0x00007f8b1ab57800 nid=0x5820 runnable [0x00007f8a885cd000]
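All three of the hottest threads are network processor threads for the PLAINTEXT listener. To match jstack output against per-thread CPU from `top -H -p <broker pid>` on Linux, note that jstack prints the native thread id in hex as `nid`, while top lists the same id in decimal. A minimal sketch that extracts and converts the ids from the three headers quoted above (the regex and function name are illustrative):

```python
import re

# Header line of one thread in a jstack dump, for example:
# "name" #60 prio=5 os_prio=0 tid=0x... nid=0x581f runnable [0x...]
THREAD_HEADER = re.compile(
    r'^"(?P<name>[^"]+)"\s+#(?P<num>\d+)\b.*?\bnid=(?P<nid>0x[0-9a-fA-F]+)'
)

DUMP = '''\
"data-plane-kafka-network-thread-10-ListenerName(PLAINTEXT)-PLAINTEXT-0" #60 prio=5 os_prio=0 tid=0x00007f8b1ab56000 nid=0x581f runnable [0x00007f8a886ce000]
"data-plane-kafka-network-thread-10-ListenerName(PLAINTEXT)-PLAINTEXT-2" #62 prio=5 os_prio=0 tid=0x00007f8b1ab59000 nid=0x5821 runnable [0x00007f8a6aefd000]
"data-plane-kafka-network-thread-10-ListenerName(PLAINTEXT)-PLAINTEXT-1" #61 prio=5 os_prio=0 tid=0x00007f8b1ab57800 nid=0x5820 runnable [0x00007f8a885cd000]
'''


def native_thread_ids(dump: str) -> dict:
    """Map each Java thread name to its native thread id in decimal,
    the form in which `top -H` lists thread ids on Linux."""
    ids = {}
    for line in dump.splitlines():
        match = THREAD_HEADER.match(line.strip())
        if match:
            ids[match.group("name")] = int(match.group("nid"), 16)
    return ids


if __name__ == "__main__":
    for name, tid in sorted(native_thread_ids(DUMP).items()):
        print(tid, name)
```

For these headers the ids come out as 22559, 22560 and 22561, which are the values to look for in the PID column of `top -H`.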

It doesn't look like GC or I/O is the problem.

Thanks

Re: High CPU Usage on Brokers

Posted by Ismael Juma <is...@juma.me.uk>.
You can take a profile with Java Flight Recorder if you use Java 11, or with
async-profiler otherwise. See below for the latter:

https://issues.apache.org/jira/browse/KAFKA-9339?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&focusedCommentId=17013400#comment-17013400

It's worth filing a JIRA and discussing it there.

Ismael
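The two profiling routes can be sketched as command builders. This assumes JDK 11+ for the `jcmd <pid> JFR.start` route and the async-profiler 2.x `profiler.sh` script for the other; the PID, output paths and 60-second duration are placeholders, and the script only prints the commands rather than running them.

```python
import shlex
from typing import List

# Hypothetical broker PID; substitute the real one (e.g. from `jps -l`).
BROKER_PID = 12345


def jfr_command(pid: int, seconds: int = 60, out: str = "/tmp/broker.jfr") -> List[str]:
    """jcmd invocation that records a Flight Recorder profile (JDK 11+)."""
    return ["jcmd", str(pid), "JFR.start", "settings=profile",
            f"duration={seconds}s", f"filename={out}"]


def async_profiler_command(pid: int, seconds: int = 60,
                           out: str = "/tmp/broker-cpu.html") -> List[str]:
    """async-profiler CPU sampling; assumes the 2.x profiler.sh script in the
    current directory, which writes an HTML flame graph for a .html target."""
    return ["./profiler.sh", "-e", "cpu", "-d", str(seconds), "-f", out, str(pid)]


if __name__ == "__main__":
    for cmd in (jfr_command(BROKER_PID), async_profiler_command(BROKER_PID)):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

The `.jfr` recording can be opened in JDK Mission Control. In either case, the interesting part is which frames dominate under the `data-plane-kafka-network-thread` threads.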

On Sun, Jan 12, 2020 at 10:28 PM Navneeth Krishnan <re...@gmail.com>
wrote:

> Hi Ismael,
>
> We were previously running on 0.10.2.1 with 8 brokers running around 80%
> CPU. But now we have upgraded to 2.3 with 16 brokers. It's the same message
> rate, topics, producers and consumers but the CPU is still >80%. How can we
> troubleshoot to find where exactly is the problem?
>
> Thanks
>

Re: High CPU Usage on Brokers

Posted by Navneeth Krishnan <re...@gmail.com>.
Hi Ismael,

We were previously running 0.10.2.1 on 8 brokers at around 80% CPU. We have
now upgraded to 2.3 with 16 brokers. The message rate, topics, producers and
consumers are the same, but CPU is still above 80%. How can we troubleshoot
to find where exactly the problem is?

Thanks
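Taken at face value, the numbers in this message imply a CPU regression per message. A rough check, assuming both clusters used 4-vCPU brokers (the old instance type is not stated) and reading ">80%" as 80%, so the new figure is a lower bound; the function name is illustrative:

```python
# Aggregate CPU in use, in vCPUs, from the figures in this message.
# Assumptions: both clusters used 4-vCPU brokers (the old instance type is
# not stated) and ">80%" is taken as exactly 80%, so "after" is a lower bound.


def busy_vcpus(brokers: int, utilization: float, vcpus_per_broker: int = 4) -> float:
    """vCPUs kept busy across the whole cluster."""
    return brokers * utilization * vcpus_per_broker


before = busy_vcpus(8, 0.80)    # 0.10.2.1 cluster
after = busy_vcpus(16, 0.80)    # 2.3 cluster, same message rate

if __name__ == "__main__":
    print(f"before {before:.1f} vCPUs, after {after:.1f} vCPUs, "
          f"ratio {after / before:.1f}x")
```

Under those assumptions the 2.3 cluster keeps about 51 vCPUs busy versus about 26 before, i.e. at least twice the CPU for the same message rate.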

On Wed, Jan 8, 2020 at 10:33 AM Ismael Juma <is...@juma.me.uk> wrote:

> Has the behavior changed after an upgrade or has it been consistent since
> the start?
>
> Ismael
>

Re: High CPU Usage on Brokers

Posted by Ismael Juma <is...@juma.me.uk>.
Has the behavior changed after an upgrade or has it been consistent since
the start?

Ismael

On Thu, Jan 2, 2020 at 4:18 PM Navneeth Krishnan <re...@gmail.com>
wrote:

> Hi All,
>
> We have a kafka cluster with 12 nodes and we are pretty much seeing 90%
> cpu usage on all the nodes. Here is all the information. Need some help on
> figuring out what the problem is and how to overcome this issue.
>

Re: High CPU Usage on Brokers

Posted by Navneeth Krishnan <re...@gmail.com>.
Hi Lisheng,

Here are the answers to your questions.

Do you set sun.security.jgss.native = true?
 - No.

If not, there are some items that need to be checked.

1. GC, but you say GC is not the problem.
 - I have verified GC multiple times and I don't see it being an issue.

2. If you suspect the network threads, how many threads did you set?
 - Currently there are 3 network threads and 8 I/O threads per broker.

3. Did you enable compression?
 - No, compression is not enabled.

4. Did you change the value of batch.size on the producer side?
 - No, there haven't been any recent changes.

5. Could you increase "fetch.min.bytes" on the consumer side and
"replica.fetch.min.bytes" on the broker to test whether CPU usage goes down?
 - We haven't tried it. If this decreases CPU usage, we can give it a
try.

6. You can check some metrics from JMX for analysis, e.g.
"kafka.network:type=RequestMetrics,
name=RequestsPerSec,request={Produce|FetchConsumer|FetchFollower}"; if the
values are high, the CPU will be busy.
 - I don't see the RequestsPerSec metrics in 2.3. I have the
"kafka.network:type=RequestMetrics,name=TotalTimeMs"
metric:
ProducerTotalTimeMs - 1.25 ms
FetchFollowerTotalTimeMs - 2.53 ms
FetchConsumerTotalTimeMs - 12.5 ms
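For reference, 3 network threads and 8 I/O threads are the broker defaults (`num.network.threads`, `num.io.threads`). The fetch-size experiment from question 5 would look something like the sketch below; the 64 KiB minimums are hypothetical test values, not recommendations, and the helper only renders properties text.

```python
# Hypothetical test values, not recommendations. With the defaults
# (fetch.min.bytes=1, replica.fetch.min.bytes=1) a fetch is answered as soon
# as any data is available; raising the minimum lets the broker hold the
# request until that many bytes accumulate or the max wait elapses, which
# should mean fewer, larger fetch requests.

BROKER_OVERRIDES = {                   # server.properties
    "replica.fetch.min.bytes": 65536,
    "replica.fetch.wait.max.ms": 500,  # default 500; keep below replica.lag.time.max.ms
}

CONSUMER_OVERRIDES = {                 # consumer configuration
    "fetch.min.bytes": 65536,
    "fetch.max.wait.ms": 500,          # default 500
}


def to_properties(overrides: dict) -> str:
    """Render overrides as Java-properties lines, sorted by key."""
    return "".join(f"{key}={value}\n" for key, value in sorted(overrides.items()))


if __name__ == "__main__":
    print("# broker\n" + to_properties(BROKER_OVERRIDES))
    print("# consumer\n" + to_properties(CONSUMER_OVERRIDES))
```

The trade-off is latency: on low-traffic partitions a fetch can now wait up to the max-wait setting before returning, so it is worth changing one side at a time and watching both CPU and the TotalTimeMs figures.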

Thanks.

On Wed, Jan 8, 2020 at 1:29 AM Lisheng Wang <wa...@gmail.com> wrote:

> Hi Navneeth
>
> like the bug you said above,  do you set sun.security.jgss.native = true?
>
> if not, there are some items need to be check.
>
> 1. GC, but you say gc is not problem
> 2. if you suspect network thread, how many thread did you set?
> 3. if you enable compression
> 4. did you change the value of batch.size at producer side?
> 5. do you think you can increase "fetch.min.bytes" at consumer side and
> "replica.fetch.min.bytes" at broker to test if cpu usage can be down ?
> 6. you can check some metrics from jmx to analysis, e.g. checking
> "kafka.network:type=RequestMetrics,
> name=RequestsPerSec,request={Produce|FetchConsumer|FetchFollower}", if
> values are high, that means cpu will be busy.
>
> Best,
> Lisheng
>

Re: High CPU Usage on Brokers

Posted by Lisheng Wang <wa...@gmail.com>.
Hi Navneeth,

Regarding the bug you mentioned above: do you set sun.security.jgss.native = true?

If not, there are some items that need to be checked.

1. GC, but you say GC is not the problem.
2. If you suspect the network threads, how many threads did you set?
3. Did you enable compression?
4. Did you change the value of batch.size on the producer side?
5. Could you increase "fetch.min.bytes" on the consumer side and
"replica.fetch.min.bytes" on the broker to test whether CPU usage goes down?
6. You can check some metrics from JMX for analysis, e.g.
"kafka.network:type=RequestMetrics,
name=RequestsPerSec,request={Produce|FetchConsumer|FetchFollower}"; if the
values are high, the CPU will be busy.
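The `{Produce|FetchConsumer|FetchFollower}` notation is shorthand for three separate MBeans. If we recall correctly, since Kafka 2.0 (KIP-272) `RequestsPerSec` also carries a `version` key, so a name written without it will not match on a 2.3 broker; a JMX property-list pattern ending in `,*` avoids that. A sketch that expands the shorthand into such patterns (helper names are hypothetical):

```python
from typing import List

REQUESTS = ("Produce", "FetchConsumer", "FetchFollower")


def request_metric_pattern(metric: str, request: str) -> str:
    """JMX ObjectName pattern for one kafka.network RequestMetrics MBean.
    The trailing ',*' makes it a property-list pattern, so MBeans that carry
    extra keys (such as a per-API-version tag) still match."""
    return f"kafka.network:type=RequestMetrics,name={metric},request={request},*"


def expand(metric: str) -> List[str]:
    """Expand the {Produce|FetchConsumer|FetchFollower} shorthand."""
    return [request_metric_pattern(metric, r) for r in REQUESTS]


if __name__ == "__main__":
    for name in ("RequestsPerSec", "TotalTimeMs"):
        for pattern in expand(name):
            print(pattern)
```

Each pattern can be given to any JMX client that accepts ObjectName patterns; summing `RequestsPerSec` across the matched `version` values gives the rate per request type.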

Best,
Lisheng


On Wed, Jan 8, 2020 at 3:39 PM Navneeth Krishnan <re...@gmail.com> wrote:

> Hi All,
>
> Any suggestions, we are running into this issue in production and any
> help would be greatly appreciated.
>
> Thanks

Re: High CPU Usage on Brokers

Posted by Navneeth Krishnan <re...@gmail.com>.
Hi All,

Any suggestions? We are running into this issue in production, and any
help would be greatly appreciated.

Thanks

On Mon, Jan 6, 2020 at 9:26 PM Navneeth Krishnan <re...@gmail.com>
wrote:

> Hi,
>
> Thanks for the response. We were using version 0.11 previously and all our
> producers/consumers have been upgraded to either 1.0 or to the latest 2.3.
>
> Is it normal for the network thread to consume more cpu? If you look at
> it, the network thread consumes 50% of the overall cpu.
>
> Regards

Re: High CPU Usage on Brokers

Posted by Navneeth Krishnan <re...@gmail.com>.
Hi,

Thanks for the response. We were using version 0.11 previously and all our
producers/consumers have been upgraded to either 1.0 or to the latest 2.3.

Is it normal for the network threads to consume this much CPU? Looking at the
per-thread breakdown, the network threads account for 50% of the overall CPU.
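To confirm which Java threads are burning CPU, the per-thread view from `top -H -p <broker pid>` can be matched against a `jstack` dump. On Linux, the `nid` field that jstack prints is the native thread ID in hexadecimal, while `top -H` shows the same ID in decimal. The sketch below is a hypothetical helper (the names `tid_to_nid` and `find_thread` are illustrative, not part of Kafka or the JDK) that does this lookup on jstack header lines like the network-thread lines in the original post. It assumes each thread header sits on a single line, as jstack emits it.

```python
import re
from typing import Optional

# Header line jstack prints for each thread, e.g.
# "some-thread" #60 prio=5 os_prio=0 tid=0x... nid=0x581f runnable [...]
HEADER_RE = re.compile(r'^"(?P<name>[^"]+)".*\bnid=(?P<nid>0x[0-9a-f]+)')


def tid_to_nid(tid: int) -> str:
    """Convert a decimal Linux thread ID (as shown by `top -H`) to jstack's hex nid."""
    return hex(tid)


def find_thread(jstack_output: str, tid: int) -> Optional[str]:
    """Return the Java thread name whose nid matches the given decimal TID, or None."""
    wanted = tid_to_nid(tid)
    for line in jstack_output.splitlines():
        match = HEADER_RE.match(line)
        if match and match.group("nid") == wanted:
            return match.group("name")
    return None


if __name__ == "__main__":
    sample = (
        '"data-plane-kafka-network-thread-10-ListenerName(PLAINTEXT)-PLAINTEXT-0" '
        '#60 prio=5 os_prio=0 tid=0x00007f8b1ab56000 nid=0x581f runnable '
        '[0x00007f8a886ce000]'
    )
    print(find_thread(sample, 22559))
```

For example, `nid=0x581f` in the original post corresponds to TID 22559 in `top -H` output.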

Regards

On Mon, Jan 6, 2020 at 7:04 PM Thunder Stumpges <th...@gmail.com>
wrote:

> Not sure which version your producers/consumers are on, or whether you upgraded
> from a previous version that used to work, but maybe you're hitting this?
>
>
> https://kafka.apache.org/23/documentation.html#upgrade_10_performance_impact

Re: High CPU Usage on Brokers

Posted by Thunder Stumpges <th...@gmail.com>.
Not sure which version your producers/consumers are on, or whether you upgraded
from a previous version that used to work, but maybe you're hitting this?

https://kafka.apache.org/23/documentation.html#upgrade_10_performance_impact
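The linked section of the upgrade notes deals with message format conversion. When a client's record format differs from the on-disk format set by `log.message.format.version`, the broker has to convert records: it down-converts on fetch for consumers that cannot read the stored format, and it rewrites produced batches whose format differs from the configured one. Conversion costs broker CPU, and on the fetch path it gives up zero-copy transfer. Below is a simplified sketch of that rule. It assumes format v0 for clients before 0.10.0, v1 for 0.10.x and v2 from 0.11.0 on, and it ignores per-batch details such as data written before a format change. The function names are illustrative only.

```python
def message_format(version: str) -> int:
    """Highest record format ("magic") a Kafka version uses, under the stated assumptions:
    v0 before 0.10.0, v1 for 0.10.x, v2 from 0.11.0 onwards."""
    major, minor = (int(part) for part in version.split(".")[:2])
    if (major, minor) < (0, 10):
        return 0
    if (major, minor) < (0, 11):
        return 1
    return 2


def fetch_needs_down_conversion(consumer_version: str, log_format_version: str) -> bool:
    """An older consumer cannot read the on-disk format, so the broker must down-convert."""
    return message_format(consumer_version) < message_format(log_format_version)


def produce_needs_conversion(producer_version: str, log_format_version: str) -> bool:
    """The broker rewrites produced batches whose format differs from the configured one."""
    return message_format(producer_version) != message_format(log_format_version)


if __name__ == "__main__":
    for client in ("0.10.2", "1.0", "2.3"):
        print(client,
              "fetch conversion:", fetch_needs_down_conversion(client, "2.3"),
              "produce conversion:", produce_needs_conversion(client, "2.3"))
```

Under these assumptions, 1.0 and 2.3 clients would not trigger conversion against logs stored in the v2 format, while any client older than 0.11 that is still connected would.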



On Mon, Jan 6, 2020 at 12:48 PM Navneeth Krishnan <re...@gmail.com>
wrote:

> Hi All,
>
> Any ideas on what can be done? I'm not sure whether we are running into the
> bug below.
>
> https://issues.apache.org/jira/browse/KAFKA-7925
>
> Thanks

Re: High CPU Usage on Brokers

Posted by Navneeth Krishnan <re...@gmail.com>.
Hi All,

Any ideas on what can be done? I'm not sure whether we are running into the
bug below.

https://issues.apache.org/jira/browse/KAFKA-7925

Thanks
