Posted to users@kafka.apache.org by Navneeth Krishnan <re...@gmail.com> on 2020/01/03 00:18:00 UTC
High CPU Usage on Brokers
Hi All,
We have a Kafka cluster with 12 nodes, and we are seeing roughly 90% CPU
usage on all of them. The details are below. We need help figuring out
what the problem is and how to resolve it.
*Cluster:*
Kafka version: 2.3.0
Number of brokers in cluster: 12
Node type: 4 vCores 32GB mem
Network In: 10Mbps per broker
Network Out: 16Mbps per broker
Topics: 10 (approximately)
Partitions: 20 (Max), some has only partitions
Replication Factor: 3
*CPU Usage:*
[image: image.png]
*VMStat*
[root]# vmstat 1 10
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b swpd   free  buff    cache si so   bi    bo    in    cs us sy id wa st
 8  0    0 234444 19064 24046980  0  0   17  2026     1     3 38 33 28  0  1
 7  0    0 256444 19036 24023880  0  0  768     0 64027 22708 44 40 16  0  1
 7  0    0 245356 19052 24034560  0  0  256   472 63509 23276 44 39 17  0  1
 7  0    0 235096 19052 24046616  0  0    0     0 62277 22516 46 38 15  0  1
 8  0    0 260548 19036 24020084  0  0  516 49888 62364 22894 43 38 18  0  1
 5  0    0 249232 19036 24030924  0  0  512     0 61022 24589 41 39 20  0  1
 6  0    0 238072 19036 24042512  0  0 1024     0 63358 23063 44 38 17  0  0
 5  0    0 262904 19052 24017972  0  0    0   440 63078 23499 46 37 17  0  1
 7  0    0 250324 19052 24030008  0  0    0     0 64615 22617 48 38 14  0  1
 6  0    0 237920 19052 24042372  0  0 1024 48900 63223 23029 42 40 18  0  1
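The vmstat samples above can be summarised with a few lines of Python. This is only a sketch for reading the numbers already posted: the function names (parse_vmstat, column_mean) are made up for illustration, and the rows are copied from the output above. The first vmstat line reports averages since boot, so the sketch skips it when averaging the live samples.

```python
# Column names in the order vmstat prints them.
VMSTAT_HEADER = "r b swpd free buff cache si so bi bo in cs us sy id wa st"

# Rows copied from the vmstat output in the post.
VMSTAT_ROWS = """
8 0 0 234444 19064 24046980 0 0 17 2026 1 3 38 33 28 0 1
7 0 0 256444 19036 24023880 0 0 768 0 64027 22708 44 40 16 0 1
7 0 0 245356 19052 24034560 0 0 256 472 63509 23276 44 39 17 0 1
7 0 0 235096 19052 24046616 0 0 0 0 62277 22516 46 38 15 0 1
8 0 0 260548 19036 24020084 0 0 516 49888 62364 22894 43 38 18 0 1
5 0 0 249232 19036 24030924 0 0 512 0 61022 24589 41 39 20 0 1
6 0 0 238072 19036 24042512 0 0 1024 0 63358 23063 44 38 17 0 0
5 0 0 262904 19052 24017972 0 0 0 440 63078 23499 46 37 17 0 1
7 0 0 250324 19052 24030008 0 0 0 0 64615 22617 48 38 14 0 1
6 0 0 237920 19052 24042372 0 0 1024 48900 63223 23029 42 40 18 0 1
"""


def parse_vmstat(header, rows):
    """Turn whitespace-separated vmstat rows into a list of dicts keyed by column."""
    names = header.split()
    samples = []
    for line in rows.strip().splitlines():
        values = [int(v) for v in line.split()]
        if len(values) != len(names):
            raise ValueError(f"expected {len(names)} columns, got {len(values)}")
        samples.append(dict(zip(names, values)))
    return samples


def column_mean(samples, name):
    """Arithmetic mean of one column across all samples."""
    return sum(s[name] for s in samples) / len(samples)


samples = parse_vmstat(VMSTAT_HEADER, VMSTAT_ROWS)
live = samples[1:]  # drop the since-boot line, keep the nine one-second samples

for col in ("us", "sy", "id", "cs"):
    print(f"{col}: {column_mean(live, col):.1f}")
```

On these samples, user time averages about 44% and system time about 39%, with roughly 23,000 context switches per second. System time that close to user time suggests a lot of the CPU is being spent in the kernel (for example on network system calls) rather than in broker logic, which is consistent with the network threads topping the thread list.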
*IO Stat:*
[root]# iostat -m
Linux 4.14.72-73.55.amzn2.x86_64 (loc-kafka11.internal.dnaspaces.io) 01/02/2020 _x86_64_ (4 CPU)
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          38.11    0.00   33.09    0.11    0.61   28.08
Device:      tps  MB_read/s  MB_wrtn/s  MB_read   MB_wrtn
xvda        2.36       0.01       0.01    26760     43360
nvme0n1     0.00       0.00       0.00        2         0
xvdf       70.95       0.06       7.67   185908  25205338
*Top Kafka broker threads:*
[image: image.png]
*Top 3:*
"data-plane-kafka-network-thread-10-ListenerName(PLAINTEXT)-PLAINTEXT-0" #60 prio=5 os_prio=0 tid=0x00007f8b1ab56000 nid=0x581f runnable [0x00007f8a886ce000]
"data-plane-kafka-network-thread-10-ListenerName(PLAINTEXT)-PLAINTEXT-2" #62 prio=5 os_prio=0 tid=0x00007f8b1ab59000 nid=0x5821 runnable [0x00007f8a6aefd000]
"data-plane-kafka-network-thread-10-ListenerName(PLAINTEXT)-PLAINTEXT-1" #61 prio=5 os_prio=0 tid=0x00007f8b1ab57800 nid=0x5820 runnable [0x00007f8a885cd000]
It doesn't look like GC or IO is the problem.
Thanks
Re: High CPU Usage on Brokers
Posted by Ismael Juma <is...@juma.me.uk>.
You can take a profile with Java Flight Recorder if you use Java 11, or
with async-profiler otherwise. See below for the latter:
https://issues.apache.org/jira/browse/KAFKA-9339?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&focusedCommentId=17013400#comment-17013400
It's worth filing a JIRA and discussing it there.
Ismael
On Sun, Jan 12, 2020 at 10:28 PM Navneeth Krishnan <re...@gmail.com>
wrote:
> Hi Ismael,
>
> We were previously running on 0.10.2.1 with 8 brokers running around 80%
> CPU. But now we have upgraded to 2.3 with 16 brokers. It's the same message
> rate, topics, producers and consumers but the CPU is still >80%. How can we
> troubleshoot to find where exactly is the problem?
>
> Thanks
Re: High CPU Usage on Brokers
Posted by Navneeth Krishnan <re...@gmail.com>.
Hi Ismael,
We were previously running on 0.10.2.1 with 8 brokers at around 80% CPU.
We have now upgraded to 2.3 with 16 brokers. The message rate, topics,
producers and consumers are the same, but the CPU is still >80%. How can we
troubleshoot to find exactly where the problem is?
Thanks
On Wed, Jan 8, 2020 at 10:33 AM Ismael Juma <is...@juma.me.uk> wrote:
> Has the behavior changed after an upgrade or has it been consistent since
> the start?
>
> Ismael
Re: High CPU Usage on Brokers
Posted by Ismael Juma <is...@juma.me.uk>.
Has the behavior changed after an upgrade or has it been consistent since
the start?
Ismael
Re: High CPU Usage on Brokers
Posted by Navneeth Krishnan <re...@gmail.com>.
Hi Lisheng,
Here are the answers to your questions.
Do you set sun.security.jgss.native = true?
- No.
If not, there are some items that need to be checked.
1. GC, but you say GC is not the problem.
- I have verified GC multiple times and I don't see that to be an issue.
2. If you suspect the network threads, how many threads did you set?
- Currently there are 3 network threads and 8 IO threads per broker.
3. Whether you enable compression.
- No, compression is not enabled.
4. Did you change the value of batch.size on the producer side?
- No, there haven't been any recent changes.
5. Do you think you can increase "fetch.min.bytes" on the consumer side and
"replica.fetch.min.bytes" on the broker to test whether CPU usage goes down?
- Haven't tried it. If this will decrease the CPU usage, we can give it a
try.
6. You can check some metrics from JMX for analysis, e.g.
"kafka.network:type=RequestMetrics,name=RequestsPerSec,request={Produce|FetchConsumer|FetchFollower}".
If the value is high, the CPU will be busy.
- I don't see the RequestsPerSec metrics in 2.3. I have the
"kafka.network:type=RequestMetrics,name=TotalTimeMs"
metric:
ProducerTotalTimeMs - 1.25 ms
FetchFollowerTotalTimeMs - 2.53 ms
FetchConsumerTotalTimeMs - 12.5 ms
Thanks.
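One possible reason RequestsPerSec is hard to find on 2.3 (worth verifying against the upgrade notes, not confirmed anywhere in this thread): since Kafka 2.0 the RequestsPerSec MBean carries an extra version tag, so a lookup by the old name without version={...} matches nothing and the per-version rates have to be summed. A minimal Python sketch of that aggregation follows. The function names and all readings in the sample dict are hypothetical, not measurements from this cluster, and how the values are fetched from JMX is left out.

```python
from collections import defaultdict


def parse_object_name(name):
    """Split 'domain:k1=v1,k2=v2' into (domain, {k1: v1, k2: v2})."""
    domain, _, props = name.partition(":")
    pairs = (p.split("=", 1) for p in props.split(",") if p)
    return domain, {k.strip(): v.strip() for k, v in pairs}


def requests_per_sec_by_type(metrics):
    """Sum RequestsPerSec rates over all API versions, per request type."""
    totals = defaultdict(float)
    for name, rate in metrics.items():
        domain, props = parse_object_name(name)
        if (
            domain == "kafka.network"
            and props.get("type") == "RequestMetrics"
            and props.get("name") == "RequestsPerSec"
        ):
            totals[props["request"]] += rate
    return dict(totals)


# Hypothetical readings keyed by MBean object name (illustration only).
sample = {
    "kafka.network:type=RequestMetrics,name=RequestsPerSec,request=Produce,version=7": 1200.0,
    "kafka.network:type=RequestMetrics,name=RequestsPerSec,request=Produce,version=5": 300.0,
    "kafka.network:type=RequestMetrics,name=RequestsPerSec,request=FetchConsumer,version=11": 4000.0,
    "kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce": 1.25,
}

print(requests_per_sec_by_type(sample))
```

A high per-type request rate combined with low network throughput would point at many small requests, which is the situation item 5 tries to address.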
On Wed, Jan 8, 2020 at 1:29 AM Lisheng Wang <wa...@gmail.com> wrote:
> Hi Navneeth
>
> Like the bug you mentioned above, do you set sun.security.jgss.native = true?
>
> If not, there are some items that need to be checked.
>
> 1. GC, but you say GC is not the problem.
> 2. If you suspect the network threads, how many threads did you set?
> 3. Whether you enable compression.
> 4. Did you change the value of batch.size on the producer side?
> 5. Do you think you can increase "fetch.min.bytes" on the consumer side and
> "replica.fetch.min.bytes" on the broker to test whether CPU usage goes down?
> 6. You can check some metrics from JMX for analysis, e.g.
> "kafka.network:type=RequestMetrics,name=RequestsPerSec,request={Produce|FetchConsumer|FetchFollower}".
> If the value is high, the CPU will be busy.
>
> Best,
> Lisheng
Re: High CPU Usage on Brokers
Posted by Lisheng Wang <wa...@gmail.com>.
Hi Navneeth
Like the bug you mentioned above, do you set sun.security.jgss.native = true?
If not, there are some items that need to be checked.
1. GC, but you say GC is not the problem.
2. If you suspect the network threads, how many threads did you set?
3. Whether you enable compression.
4. Did you change the value of batch.size on the producer side?
5. Do you think you can increase "fetch.min.bytes" on the consumer side and
"replica.fetch.min.bytes" on the broker to test whether CPU usage goes down?
6. You can check some metrics from JMX for analysis, e.g.
"kafka.network:type=RequestMetrics,name=RequestsPerSec,request={Produce|FetchConsumer|FetchFollower}".
If the value is high, the CPU will be busy.
Best,
Lisheng
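The experiment in item 5 could look roughly like the fragment below. The four property names are real Kafka settings; the 65536 values are purely illustrative assumptions, not recommendations from this thread, and 500 ms is the default wait for both settings. Larger minimum fetch sizes let the broker answer fewer, bigger fetch requests, at the cost of added latency up to the wait limit (for replication this can also delay acks=all produce requests).

```
# Consumer configuration (illustrative values)
fetch.min.bytes=65536
fetch.max.wait.ms=500

# Broker configuration, server.properties (illustrative values)
replica.fetch.min.bytes=65536
replica.fetch.wait.max.ms=500
```

Changing one side at a time and comparing broker CPU before and after would show whether consumer fetches or follower fetches contribute more.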
On Wed, Jan 8, 2020 at 3:39 PM Navneeth Krishnan <re...@gmail.com> wrote:
> Hi All,
>
> Any suggestions, we are running into this issue in production and any
> help would be greatly appreciated.
>
> Thanks
Re: High CPU Usage on Brokers
Posted by Navneeth Krishnan <re...@gmail.com>.
Hi All,
Any suggestions? We are running into this issue in production, and any
help would be greatly appreciated.
Thanks
On Mon, Jan 6, 2020 at 9:26 PM Navneeth Krishnan <re...@gmail.com>
wrote:
> Hi,
>
> Thanks for the response. We were using version 0.11 previously and all our
> producers/consumers have been upgraded to either 1.0 or to the latest 2.3.
>
> Is it normal for the network thread to consume more cpu? If you look at
> it, the network thread consumes 50% of the overall cpu.
>
> Regards
Re: High CPU Usage on Brokers
Posted by Navneeth Krishnan <re...@gmail.com>.
Hi,
Thanks for the response. We were using version 0.11 previously, and all our
producers/consumers have been upgraded to either 1.0 or the latest 2.3.
Is it normal for the network threads to consume this much CPU? If you look
at it, the network threads consume 50% of the overall CPU.
Regards
On Mon, Jan 6, 2020 at 7:04 PM Thunder Stumpges <th...@gmail.com>
wrote:
> Not sure what version your producers/consumers are, or if you upgraded from
> a previous version that used to work, or what, but maybe you're hitting
> this?
>
>
> https://kafka.apache.org/23/documentation.html#upgrade_10_performance_impact
Re: High CPU Usage on Brokers
Posted by Thunder Stumpges <th...@gmail.com>.
Not sure what version your producers/consumers are, or if you upgraded from
a previous version that used to work, or what, but maybe you're hitting
this?
https://kafka.apache.org/23/documentation.html#upgrade_10_performance_impact
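For context, that section describes the broker converting stored messages to
an older message format when a client predates the on-disk format set by
log.message.format.version. In that case the broker cannot use zero-copy
transfer, which costs broker CPU. Below is a rough Python sketch of the check,
assuming the broker settings are available as server.properties text and the
client versions are known from elsewhere. The helper names and sample values
are made up for illustration, and the format boundaries at 0.10.0 and 0.11.0
are my reading of the upgrade notes, so treat it as a starting point only.

```python
def read_broker_properties(text):
    """Parse simple key=value lines from Java-style .properties text."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments ('#' or '!').
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def version_tuple(version):
    """Turn '0.10.2' or '2.3-IV1' into a comparable tuple like (0, 10, 2) or (2, 3)."""
    numeric = version.split("-")[0]
    return tuple(int(part) for part in numeric.split("."))


def message_format(version):
    """Map a Kafka version to its newest message format (assumed boundaries):
    0 before 0.10.0, 1 for 0.10.x, 2 from 0.11.0 onward."""
    v = version_tuple(version)
    if v < (0, 10):
        return 0
    if v < (0, 11):
        return 1
    return 2


def needs_down_conversion(log_format_version, client_version):
    """True if the client only understands an older format than the one on disk."""
    return message_format(client_version) < message_format(log_format_version)


if __name__ == "__main__":
    # Hypothetical broker config and client versions, for illustration only.
    sample = "# broker settings\nlog.message.format.version=2.3-IV1\n"
    log_format = read_broker_properties(sample)["log.message.format.version"]
    for client in ("0.9.0.1", "0.10.2", "2.3.0"):
        print(client, needs_down_conversion(log_format, client))
```

If any client comes back True here, the linked section is worth a closer
look before chasing other causes.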
On Mon, Jan 6, 2020 at 12:48 PM Navneeth Krishnan <re...@gmail.com>
wrote:
> Hi All,
>
> Any idea on what can be done? Not sure if we are running into this below
> bug.
>
> https://issues.apache.org/jira/browse/KAFKA-7925
>
> Thanks
>
Re: High CPU Usage on Brokers
Posted by Navneeth Krishnan <re...@gmail.com>.
Hi All,
Any ideas on what can be done? I'm not sure whether we are running into the
bug below.
https://issues.apache.org/jira/browse/KAFKA-7925
Thanks
On Thu, Jan 2, 2020 at 4:18 PM Navneeth Krishnan <re...@gmail.com>
wrote:
> Hi All,
>
> We have a kafka cluster with 12 nodes and we are pretty much seeing 90%
> cpu usage on all the nodes. Here is all the information. Need some help on
> figuring out what the problem is and how to overcome this issue.
>
> *Cluster:*
> Kafka version: 2.3.0
> Number of brokers in cluster: 12
> Node type: 4 vCores 32GB mem
> Network In: 10Mbps per broker
> Network Out: 16Mbps per broker
> Topics: 10 (approximately)
> Partitions: 20 (Max), some has only partitions
> Replication Factor: 3
>
> *CPU Usage:*
> [image: image.png]
>
> *VMStat*
>
> [root]# vmstat 1 10
> procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
>  r  b swpd   free  buff    cache si so   bi    bo    in    cs us sy id wa st
>  8  0    0 234444 19064 24046980  0  0   17  2026     1     3 38 33 28  0  1
>  7  0    0 256444 19036 24023880  0  0  768     0 64027 22708 44 40 16  0  1
>  7  0    0 245356 19052 24034560  0  0  256   472 63509 23276 44 39 17  0  1
>  7  0    0 235096 19052 24046616  0  0    0     0 62277 22516 46 38 15  0  1
>  8  0    0 260548 19036 24020084  0  0  516 49888 62364 22894 43 38 18  0  1
>  5  0    0 249232 19036 24030924  0  0  512     0 61022 24589 41 39 20  0  1
>  6  0    0 238072 19036 24042512  0  0 1024     0 63358 23063 44 38 17  0  0
>  5  0    0 262904 19052 24017972  0  0    0   440 63078 23499 46 37 17  0  1
>  7  0    0 250324 19052 24030008  0  0    0     0 64615 22617 48 38 14  0  1
>  6  0    0 237920 19052 24042372  0  0 1024 48900 63223 23029 42 40 18  0  1
>
>
> *IO Stat:*
>
> [root]# iostat -m
> Linux 4.14.72-73.55.amzn2.x86_64 (loc-kafka11.internal.dnaspaces.io) 01/02/2020 _x86_64_ (4 CPU)
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           38.11    0.00   33.09    0.11    0.61   28.08
>
> Device:    tps MB_read/s MB_wrtn/s  MB_read  MB_wrtn
> xvda      2.36      0.01      0.01    26760    43360
> nvme0n1   0.00      0.00      0.00        2        0
> xvdf     70.95      0.06      7.67   185908 25205338
>
> *Top Kafka broker threads:*
> [image: image.png]
>
> *Top 3:*
>
> "data-plane-kafka-network-thread-10-ListenerName(PLAINTEXT)-PLAINTEXT-0"
> #60 prio=5 os_prio=0 tid=0x00007f8b1ab56000 nid=0x581f runnable
> [0x00007f8a886ce000]
>
> "data-plane-kafka-network-thread-10-ListenerName(PLAINTEXT)-PLAINTEXT-2"
> #62 prio=5 os_prio=0 tid=0x00007f8b1ab59000 nid=0x5821 runnable
> [0x00007f8a6aefd000]
>
> "data-plane-kafka-network-thread-10-ListenerName(PLAINTEXT)-PLAINTEXT-1"
> #61 prio=5 os_prio=0 tid=0x00007f8b1ab57800 nid=0x5820 runnable
> [0x00007f8a885cd000]
>
> It doesn't look like GC or IO is the problem.
>
> Thanks
>