Posted to users@kafka.apache.org by Jeff Widman <je...@netskope.com> on 2016/11/04 18:52:23 UTC

Re: Mysterious timeout

Mike,
Did you ever figure this out?

We're considering using Kafka on Kubernetes and are very interested in how
it's going for you.

On Thu, Oct 27, 2016 at 8:34 AM, Martin Gainty <mg...@hotmail.com> wrote:

> MG>Can you write a SimpleConsumer to determine when the lead broker times
> out? Then you'll need to tweak the connection settings:
> https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example
>
> MG>To debug the response, determine the leadBroker and the reason for the
> fetch failure, as seen here:
> if (fetchResponse.hasError()) {
>      numErrors++;
>      // Something went wrong!
>      short code = fetchResponse.errorCode(a_topic, a_partition);
>      System.out.println("Error fetching data from the Broker:" + leadBroker + " Reason: " + code);
> }
> ________________________________
> From: Mike Kaplinskiy <mi...@ladderlife.com>
> Sent: Thursday, October 27, 2016 3:11:14 AM
> To: users@kafka.apache.org
> Subject: Mysterious timeout
>
> Hey folks,
>
> We're observing a very peculiar behavior on our Kafka cluster. When one of
> the Kafka broker instances goes down, we're seeing the producer block (at
> .flush) for right about `request.timeout.ms` before returning success (or
> at least not throwing an exception) and moving on.
>
> We're running Kafka on Kubernetes, so this may be related. Kafka is a
> Kubernetes PetSet with a global Service (like a load balancer) for
> consumers/producers to use for the bootstrap list. Our Kafka brokers are
> configured to come up with a predetermined set of broker ids (kafka-0,
> kafka-1 & kafka-2), but each broker's IP likely changes every time it's
> restarted.
>
> Our Kafka settings are as follows:
> Producer:
> "acks" "all"
> "batch.size" "16384"
> "linger.ms" "1"
> "request.timeout.ms" "3000"
> "max.in.flight.requests.per.connection" "1"
> "retries" "2"
> "max.block.ms" "10000"
> "buffer.memory" "33554432"
>
> Broker:
> min.insync.replicas=1
>
> I'm having a bit of a hard time debugging why this happens, mostly because
> I'm not seeing any logs from the producer. Is there a guide somewhere for
> turning up the logging information from the kafka java client? I'm using
> logback if that helps.
>
> Thanks,
> Mike.
>
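
For readers who configure the Java client directly, the producer settings
quoted in Mike's message could be expressed as a java.util.Properties object.
This is only a sketch: the class name ProducerSettings and the method
fromThread are hypothetical, and the original post does not show
bootstrap.servers or the serializers, which a real KafkaProducer would also
need before these properties could be passed to its constructor.

```java
import java.util.Properties;

/** Hypothetical holder for the producer settings quoted in the thread. */
final class ProducerSettings {

    /** Builds the eight producer settings exactly as listed in the original post. */
    static Properties fromThread() {
        Properties props = new Properties();
        props.setProperty("acks", "all");
        props.setProperty("batch.size", "16384");
        props.setProperty("linger.ms", "1");
        props.setProperty("request.timeout.ms", "3000");
        props.setProperty("max.in.flight.requests.per.connection", "1");
        props.setProperty("retries", "2");
        props.setProperty("max.block.ms", "10000");
        props.setProperty("buffer.memory", "33554432");
        // Not shown in the thread, so deliberately omitted here:
        // bootstrap.servers, key.serializer, value.serializer.
        return props;
    }

    public static void main(String[] args) {
        // Print every setting as key=value (iteration order is not guaranteed).
        fromThread().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

With request.timeout.ms set to 3000, a block of roughly three seconds at
.flush matches the behavior described in the post.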

Re: Mysterious timeout

Posted by Becket Qin <be...@gmail.com>.

Hi Mike,

From what you described, it seems the socket on the producer was still
connected even after the broker instance went down. This is possible if the
broker instance went down suddenly without closing the TCP connection
(e.g. it lost power). Otherwise the producer should be able to detect the
disconnect and would not wait for the request timeout.

Another possibility is that the connection between the producer and the
broker goes through some sort of proxy. When the broker went down, the socket
from the producer to the proxy was still alive, so the producer waited until
the request timeout.

Could you check if the cause of the issue you saw matches one of the above?

Thanks,

Jiangjie (Becket) Qin
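
The difference between the two cases described above can be reproduced with
plain TCP sockets. The following is a minimal sketch using only the Java
standard library, not the Kafka client: the class SilentPeerDemo and the
method awaitResponse are hypothetical, and the socket read timeout merely
stands in for request.timeout.ms. A local server either closes the connection
(a broker that shuts down cleanly) or keeps it open without ever replying (a
broker that vanished, or a proxy that still holds the connection).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Hypothetical demo class; it is not part of the Kafka client. */
final class SilentPeerDemo {

    /**
     * Connects to a local server and waits for one byte of "response".
     * If peerClosesSocket is true, the server side closes the connection;
     * otherwise it keeps the socket open but never writes anything.
     * Returns "EOF", "TIMEOUT", or "DATA".
     */
    static String awaitResponse(boolean peerClosesSocket, int timeoutMs) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            if (peerClosesSocket) {
                accepted.close(); // orderly close: the client sees end-of-stream
            }
            client.setSoTimeout(timeoutMs); // stand-in for request.timeout.ms
            InputStream in = client.getInputStream();
            try {
                return in.read() == -1 ? "EOF" : "DATA";
            } catch (SocketTimeoutException e) {
                // The connection still looks alive, so only the timeout ends the wait.
                return "TIMEOUT";
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (boolean closes : new boolean[] {true, false}) {
            long start = System.nanoTime();
            String outcome = awaitResponse(closes, 3000);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("peerClosesSocket=" + closes + " -> " + outcome
                    + " after " + elapsedMs + " ms");
        }
    }
}
```

When the peer closes the socket, the client should see end-of-stream almost
immediately; when the peer stays silent, the client has no signal other than
its own timeout, which is consistent with the producer blocking for about
request.timeout.ms.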
