Posted to users@kafka.apache.org by Michael Popov <mp...@microsoft.com> on 2014/02/13 00:22:39 UTC

Sync producers stuck waiting for 2 acks

I am running a test deployment of Kafka 0.8. When I configure sync producers to expect 2 acks for each "write" request, some of the producers get stuck. It looks like the broker's response is never delivered back to them.
This happened both with the original Kafka performance tools and with a test tool built on a custom C# client library, so I assume the issue is not on the client side.
I checked the sources. Even if a replica broker does not catch up with the leader, a producer request should expire on timeout. I don't see a configuration parameter to set this timeout. The closest setting I can find is "producer.purgatory.purge.interval.requests", but it is expressed as a number of requests, not in time units.

I would appreciate any advice on where to look for the problem and how to solve it.

Thank you,
Michael Popov

Re: Sync producers stuck waiting for 2 acks

Posted by Joel Koshy <jj...@gmail.com>.
> I checked the information in Zookeeper and found out that 2 of the brokers are missing. The VMs with these brokers are not quite ... healthy (I cannot find a better word for this situation). I checked the replica distribution and there are 3 replicas for each partition, so that part is OK. I started the tests again and some of the producers got stuck again. Maybe there is something wrong with my cluster of VMs.

Can you clarify what you mean by not healthy? Also, when you say three
replicas - are those replicas in the ISR? You can either use
list-topics (./bin/kafka-topics.sh --describe) or read the
topic/partition/state path from zookeeper directly.
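
As a rough illustration of the ISR check suggested above, here is a minimal Python sketch. It assumes the partition state znode holds a JSON document with an "isr" array of broker ids, and that in the 0.8 layout it lives under /brokers/topics/<topic>/partitions/<partition>/state. The function name and the sample payload are hypothetical, and fetching the znode from Zookeeper is left out.

```python
import json


def isr_shortfall(state_json: str, required_acks: int) -> int:
    """Return how many more in-sync replicas are needed to satisfy
    required_acks (0 means the ISR is already large enough)."""
    state = json.loads(state_json)
    isr = state.get("isr", [])  # list of broker ids currently in sync
    return max(0, required_acks - len(isr))


# Hypothetical payload shaped like a partition state znode (assumed layout).
sample = '{"controller_epoch":1,"leader":1,"version":1,"leader_epoch":0,"isr":[1]}'
print(isr_shortfall(sample, 2))  # prints 1: one more replica must join the ISR
```

If the shortfall is above zero for a partition while the test is running, a produce request asking for 2 acks cannot be satisfied by that partition until another replica rejoins the ISR.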



RE: Sync producers stuck waiting for 2 acks

Posted by Michael Popov <mp...@microsoft.com>.
Thanks Joel!
I found this configuration setting in "Producer Configs". I guess it means each producer sets this parameter as part of its connection settings, like the number of acks.

I checked the information in Zookeeper and found out that 2 of the brokers are missing. The VMs with these brokers are not quite ... healthy (I cannot find a better word for this situation). I checked the replica distribution and there are 3 replicas for each partition, so that part is OK. I started the tests again and some of the producers got stuck again. Maybe there is something wrong with my cluster of VMs.

To reproduce the situation with the original performance test tools:
- start a Zookeeper node on 1 VM
- start 2 Kafka brokers on 2 VMs
- create a topic with multiple partitions and replication factor 2
- run the producer performance script on 4 VMs in sync mode with 2 acks to send 1M messages




Re: Sync producers stuck waiting for 2 acks

Posted by Joel Koshy <jj...@gmail.com>.
The request timeout config is request.timeout.ms - it defaults to 10
seconds - so the request should expire by then and return a response.
Can you run the list-topics command on the topics you are sending to
and make sure there are at least two replicas in ISR while you are
running your producer test?

You mentioned you were able to reproduce this with the original
performance tools - can you provide exact steps to reproduce if the
above information does not help resolve this?

Joel
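
For reference, a sketch of how these settings might look in a properties file for the 0.8 producer. The host names are placeholders and the file itself is hypothetical. The property names are the ones listed under "Producer Configs" in the 0.8 documentation, and the timeout shown is simply the default mentioned above.

```
# Hypothetical producer.properties; broker host names are placeholders.
metadata.broker.list=broker1:9092,broker2:9092
# Send synchronously and require 2 acks per produce request.
producer.type=sync
request.required.acks=2
# How long the broker waits to meet the acks requirement before
# returning an error to the client (10 seconds is the default).
request.timeout.ms=10000
```

With this setup, a request that cannot collect 2 acks would be expected to come back with an error after roughly the timeout rather than hang indefinitely.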
