Posted to users@kafka.apache.org by Steven Wu <st...@gmail.com> on 2015/02/19 19:34:51 UTC

big cpu jump on producer in face of broker outage

I think this is an issue caused by KAFKA-1788.

I was trying to test producer resiliency to a broker outage. In this
experiment, I shut down all brokers to see how the producer behaves.

Here are the observations:
1) The Kafka producer can recover from a Kafka outage, i.e. sends resumed
after the brokers came back.
2) The producer instance saw a big CPU jump during the outage: 28% -> 52%
in one test.

Note that I didn't observe the CPU issue when a new producer instance was
started during the broker outage. In this case, no messages accumulated in
the buffer, because the KafkaProducer constructor failed on the DNS lookup
for the Route 53 name. When the brokers came up, my wrapper re-created the
KafkaProducer object and recovered from the outage by sending messages.
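
The wrapper itself is not shown in this thread. As a rough illustration of the re-create-on-failure pattern described above, here is a minimal Python sketch. Every name in it (RecreatingProducer, factory, retry_interval_s, clock, flaky_factory) is hypothetical, and it uses a generic factory callable rather than the real KafkaProducer API, so treat it as a sketch of the idea, not as the actual wrapper.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

P = TypeVar("P")


class RecreatingProducer(Generic[P]):
    """Hypothetical wrapper that lazily (re)creates a producer.

    `factory` stands in for whatever constructs the real producer and is
    assumed to raise if construction fails (e.g. on a DNS lookup error).
    """

    def __init__(self, factory: Callable[[], P],
                 retry_interval_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._factory = factory
        self._retry_interval_s = retry_interval_s
        self._clock = clock
        self._producer: Optional[P] = None
        # Earliest time at which another construction attempt is allowed.
        self._next_attempt = float("-inf")

    def get(self) -> Optional[P]:
        """Return the producer, or None while construction keeps failing."""
        if self._producer is not None:
            return self._producer
        now = self._clock()
        if now < self._next_attempt:
            return None  # still backing off; do not retry in a tight loop
        try:
            self._producer = self._factory()
        except Exception:
            # Construction failed; schedule the next attempt and give up for now.
            self._next_attempt = now + self._retry_interval_s
            return None
        return self._producer


# Demo setup: a factory that fails twice (simulated outage), then succeeds.
attempts = {"n": 0}


def flaky_factory() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise OSError("simulated DNS lookup failure")
    return "producer"


fake_now = [0.0]  # controllable clock so the demo does not sleep
wrapper = RecreatingProducer(flaky_factory, retry_interval_s=5.0,
                             clock=lambda: fake_now[0])
```

Spacing out construction attempts is a design choice of this sketch: it keeps the caller from retrying in a tight loop while the brokers are unreachable. Whether the original wrapper did the same is not stated.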

Here is the CPU graph for a running producer instance where the broker
outage happened in the middle of the test run. It shows the CPU problem.
https://docs.google.com/drawings/d/1FdEg9-Rf_jbDZX0cC3iZ834c4m-5rqgK-41lSS6VudQ/edit?usp=sharing

Here is the CPU graph for a new producer instance where the broker outage
happened before instance startup. CPU is fine here.
https://docs.google.com/drawings/d/1NmOdwp79DKHE7kJeskBm411ln6QczAMfmcWeijvZQRQ/edit?usp=sharing

Note that the producer runs on a 4-core m1.xlarge instance. The x-axis is
time, the y-axis is CPU utilization.

Thanks,
Steven

Re: big cpu jump on producer in face of broker outage

Posted by Steven Wu <st...@gmail.com>.
Jun,

You are right. I tried the 0.8.2.0 producer with my test and confirmed that
it fixed the CPU issue.

Thanks,
Steven



Re: big cpu jump on producer in face of broker outage

Posted by Steven Wu <st...@gmail.com>.
Will try 0.8.2.1 on the producer and report back the result.


Re: big cpu jump on producer in face of broker outage

Posted by Jun Rao <ju...@confluent.io>.
This is probably due to KAFKA-1642, which is fixed in 0.8.2.0. Could you
try that version, or 0.8.2.1, which is being voted on now?

Thanks,

Jun


Re: big cpu jump on producer in face of broker outage

Posted by Steven Wu <st...@gmail.com>.
Forgot to mention, in case it matters:
producer: 0.8.2-beta
broker: 0.8.1.1
