Posted to users@kafka.apache.org by Brandon Barron <br...@live.com> on 2020/01/30 16:10:59 UTC

High CPU in 2.2.0 kafka cluster

Hi,

We had a small cluster (4 brokers) handling very low throughput - a couple hundred messages per minute at the very most. In that cluster we had a little under 3,300 consumers in total (all of them Kafka Streams instances). All broker CPUs were almost constantly maxed out for a few weeks.

We eventually switched traffic to a new cluster. After sitting idle for a few days, the old cluster was at ~40% CPU with the consumers still running. When I took down all the consumers, the idle CPU on the brokers dropped to about 4%.

To test, we mirrored active traffic from our new cluster to the old cluster (which now has no running consumers). The CPU didn't budge; it's still at ~4%, as expected given the low throughput.

One more thing to add: I ran a thread profiler on a couple of brokers while the old cluster was taking active traffic with running consumers and the CPU was maxed out. Each time, I saw the ReplicaFetcherThread eating up around 40% of the CPU time.
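
(If a full profiler isn't available, a rough version of this per-thread CPU breakdown can be pulled over JMX. The sketch below is only an illustration, not the profiler used above: it assumes remote JMX is enabled on the broker at a placeholder broker-host:9999, samples each thread's CPU time twice ten seconds apart, and prints the threads that burned the most CPU in between.)

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class BrokerThreadCpuSample {
    public static void main(String[] args) throws Exception {
        // Placeholder address: assumes the broker JVM was started with remote JMX enabled.
        String url = "service:jmx:rmi:///jndi/rmi://broker-host:9999/jmxrmi";
        try (JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(url))) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            ThreadMXBean threads = ManagementFactory.newPlatformMXBeanProxy(
                    conn, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class);

            // First sample of per-thread CPU time (nanoseconds).
            Map<Long, Long> before = new HashMap<>();
            for (long id : threads.getAllThreadIds()) {
                before.put(id, threads.getThreadCpuTime(id));
            }

            Thread.sleep(10_000);  // measurement window

            // Second sample: report threads that used noticeable CPU during the window.
            for (long id : threads.getAllThreadIds()) {
                Long start = before.get(id);
                long end = threads.getThreadCpuTime(id);
                ThreadInfo info = threads.getThreadInfo(id);
                if (start == null || start < 0 || end < 0 || info == null) {
                    continue;  // thread started/died mid-window, or CPU timing unsupported
                }
                long cpuMs = (end - start) / 1_000_000;
                if (cpuMs > 500) {
                    System.out.printf("%-60s %6d ms CPU over 10 s%n", info.getThreadName(), cpuMs);
                }
            }
        }
    }
}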

Can you give any advice on what might be the root cause of this?

Thanks,
Brandon

Re: High CPU in 2.2.0 kafka cluster

Posted by Ismael Juma <is...@juma.me.uk>.
Yes, there were some improvements in 2.4.0. However, the puzzling aspect of
your description is that the CPU usage basically disappeared when you
removed the consumers even though your profiler claimed that the replica
fetchers were using a large chunk of the CPU.

Ismael

On Mon, Feb 3, 2020 at 9:50 AM Brandon Barron <br...@live.com>
wrote:

> Haven't tried updating to 2.4.0 yet. Were there any related fixes or
> improvements in that version? I skimmed the changelog but I didn't see
> anything.
>
> I found this issue, https://issues.apache.org/jira/browse/KAFKA-9039, which
> I thought could be related to our problems, but it seems the fix is
> projected for 2.5.0.
>
> Thanks,
> Brandon
>

Re: High CPU in 2.2.0 kafka cluster

Posted by Brandon Barron <br...@live.com>.
Haven't tried updating to 2.4.0 yet. Were there any related fixes or improvements in that version? I skimmed the changelog but I didn't see anything.

I found this issue, https://issues.apache.org/jira/browse/KAFKA-9039, which I thought could be related to our problems, but it seems the fix is projected for 2.5.0.

Thanks,
Brandon

________________________________
From: Ismael Juma <is...@juma.me.uk>
Sent: Monday, February 3, 2020 7:31 AM
To: Kafka Users <us...@kafka.apache.org>
Subject: Re: High CPU in 2.2.0 kafka cluster

Hi Brandon,

Are you still seeing this behavior with Apache Kafka 2.4.0?

Ismael

Re: High CPU in 2.2.0 kafka cluster

Posted by Ismael Juma <is...@juma.me.uk>.
Hi Brandon,

Are you still seeing this behavior with Apache Kafka 2.4.0?

Ismael

On Fri, Jan 31, 2020 at 10:51 AM Brandon Barron <br...@live.com>
wrote:

> We were running client version 2.3.0 for a while, then bumped to 2.3.1 for
> a particular kafka streams bug fix. We saw this issue while both versions
> were running.
>
> Brandon
>

Re: High CPU in 2.2.0 kafka cluster

Posted by Brandon Barron <br...@live.com>.
We were running client version 2.3.0 for a while, then bumped to 2.3.1 for a particular kafka streams bug fix. We saw this issue while both versions were running.

Brandon

________________________________
From: Jamie <ja...@aol.co.uk.INVALID>
Sent: Thursday, January 30, 2020 1:03 PM
To: users@kafka.apache.org <us...@kafka.apache.org>
Subject: Re: High CPU in 2.2.0 kafka cluster

Hi Brandon,
Which version of Kafka are the consumers running? My understanding is that if they're running a version lower than the brokers, they could be using an older message format, which means the brokers have to convert each record before sending it to the consumer.
Thanks,
Jamie

Re: High CPU in 2.2.0 kafka cluster

Posted by Jamie <ja...@aol.co.uk.INVALID>.
Hi Brandon,
Which version of Kafka are the consumers running? My understanding is that if they're running a version lower than the brokers, they could be using an older message format, which means the brokers have to convert each record before sending it to the consumer.
Thanks, 
Jamie
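
(One way to check whether this kind of format conversion is actually happening is to look at the broker's message-conversion meters over JMX. The sketch below is an illustration rather than anything from the thread: it assumes remote JMX is enabled at a placeholder broker-host:9999 and reads the broker-wide BrokerTopicMetrics conversion rates as they are named in recent 2.x releases; adjust names and address to your environment.)

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class DownConversionCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder address: assumes remote JMX is enabled on the broker.
        String url = "service:jmx:rmi:///jndi/rmi://broker-host:9999/jmxrmi";
        try (JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(url))) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // Broker-wide meters for fetch/produce message format conversions.
            // A sustained non-zero rate means the broker is converting records for older clients.
            String[] meters = {"FetchMessageConversionsPerSec", "ProduceMessageConversionsPerSec"};
            for (String name : meters) {
                ObjectName mbean = new ObjectName("kafka.server:type=BrokerTopicMetrics,name=" + name);
                Object oneMinuteRate = conn.getAttribute(mbean, "OneMinuteRate");
                System.out.printf("%s one-minute rate: %s%n", name, oneMinuteRate);
            }
        }
    }
}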

