Posted to users@kafka.apache.org by Paul Mackles <pm...@adobe.com> on 2014/01/07 19:00:29 UTC

0.8 high-level consumer error handling

Hi - I noticed that if a Kafka cluster goes away entirely, the high-level consumer will endlessly try to fetch metadata until the cluster comes back up, never bubbling the error condition up to the application. While I see a setting to control the interval at which it reconnects, I don't see anything that tells it when to just give up. I think it would be useful if there were a way for the application to detect this condition and possibly take some sort of action, either through a max-retries setting and/or some sort of flag that can be tested after a timeout. Is that capability already there? Is there a known workaround for this?

Thanks,
Paul
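
A partial workaround does exist in the 0.8 high-level consumer: setting
consumer.timeout.ms makes the ConsumerIterator throw a
ConsumerTimeoutException when no message arrives within that window, so the
consuming thread is at least no longer blocked forever. A minimal Java
sketch (topic name, group id, and ZooKeeper connect string are placeholders):

    import java.util.Collections;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.ConsumerIterator;
    import kafka.consumer.ConsumerTimeoutException;
    import kafka.consumer.KafkaStream;
    import kafka.javaapi.consumer.ConsumerConnector;

    public class TimeoutAwareConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zookeeper.connect", "localhost:2181"); // placeholder
            props.put("group.id", "example-group");           // placeholder
            // Without this setting the iterator blocks indefinitely; with
            // it, hasNext() throws ConsumerTimeoutException after 30s of
            // silence.
            props.put("consumer.timeout.ms", "30000");

            ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
            Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(
                    Collections.singletonMap("example-topic", 1));
            ConsumerIterator<byte[], byte[]> it =
                streams.get("example-topic").get(0).iterator();
            try {
                while (it.hasNext()) {
                    byte[] message = it.next().message();
                    // process the message here
                }
            } catch (ConsumerTimeoutException e) {
                // 30s of silence: either no traffic or the cluster is gone.
                // The timeout alone cannot tell those two cases apart.
            } finally {
                connector.shutdown();
            }
        }
    }

As the replies below note, the timeout fires for an idle-but-healthy topic
just as it does for an unreachable cluster, so it needs to be paired with a
separate liveness check.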

Re: 0.8 high-level consumer error handling

Posted by Paul Mackles <pm...@adobe.com>.
Hi Joel - The kind of error I am thinking about is a networking issue
where the consumer is completely cut off from the cluster. In that
scenario, the consuming application has no way of knowing whether there
is an actual problem or there are simply no messages to consume. In the
case of a networking issue, the application might want to shut down
and/or send a notification upstream.
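
One way to make that call is to pair the consumer timeout with an
out-of-band check of broker registrations in ZooKeeper: in 0.8 every live
broker holds an ephemeral znode under /brokers/ids, so an empty (or
missing) children list is a strong hint that the cluster is actually down
rather than merely quiet. A rough sketch with the plain ZooKeeper client;
the connect string and session timeout are placeholders, and this only
shows brokers are registered, not that they are reachable from the
consumer's network:

    import java.util.List;

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;

    public class BrokerLivenessCheck {
        // True if at least one broker is currently registered in ZooKeeper.
        static boolean anyBrokerRegistered(String zkConnect) throws Exception {
            ZooKeeper zk = new ZooKeeper(zkConnect, 10000, event -> { });
            try {
                List<String> ids = zk.getChildren("/brokers/ids", false);
                return !ids.isEmpty();
            } catch (KeeperException.NoNodeException e) {
                return false; // path absent: no broker ever registered here
            } finally {
                zk.close();
            }
        }

        public static void main(String[] args) throws Exception {
            if (!anyBrokerRegistered("localhost:2181")) {
                // Treat the silence as an outage: shut down and/or notify
                // upstream, per the scenario described above.
                System.err.println("no Kafka brokers registered in ZooKeeper");
            }
        }
    }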

On 1/7/14 8:13 PM, "Joel Koshy" <jj...@gmail.com> wrote:

>Paul,
>
>I don't think there is currently a way to detect this condition
>apart from alerting off consumer metrics or logs.
>
>However, I'm not sure it can be called a "fatal" condition in that
>the brokers could re-register in ZooKeeper and consumption would then
>resume; unless someone decides to move a Kafka cluster to some
>other ZooKeeper namespace without telling anyone.
>
>What would be a suitable action on the application side if such a
>condition were propagated back to the application as an exception?
>
>Thanks,
>
>Joel
>
>On Tue, Jan 07, 2014 at 06:00:29PM +0000, Paul Mackles wrote:
>> Hi - I noticed that if a Kafka cluster goes away entirely, the
>>high-level consumer will endlessly try to fetch metadata until the
>>cluster comes back up, never bubbling the error condition up to the
>>application. While I see a setting to control the interval at which it
>>reconnects, I don't see anything that tells it when to just give up. I
>>think it would be useful if there were a way for the application to
>>detect this condition and possibly take some sort of action, either
>>through a max-retries setting and/or some sort of flag that can be
>>tested after a timeout. Is that capability already there? Is there a
>>known workaround for this?
>> 
>> Thanks,
>> Paul
>


Re: 0.8 high-level consumer error handling

Posted by Joel Koshy <jj...@gmail.com>.
Paul,

I don't think there is currently a way to detect this condition
apart from alerting off consumer metrics or logs.
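
On the metrics option: the 0.8 consumer registers its internal metrics as
JMX MBeans inside the consumer's JVM, so an alerting system can poll them
over remote JMX. The exact MBean names changed across 0.8.x releases, so
the sketch below simply enumerates whatever is registered under a
kafka.consumer domain pattern (the pattern itself is an assumption to
adjust per version) rather than hard-coding a specific metric name:

    import java.lang.management.ManagementFactory;
    import java.util.Set;

    import javax.management.MBeanServer;
    import javax.management.ObjectName;

    public class ConsumerMetricsProbe {
        public static void main(String[] args) throws Exception {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            // Assumed pattern: consumer metrics under the kafka.consumer
            // domain. Prints nothing if the naming differs in your version.
            Set<ObjectName> names =
                server.queryNames(new ObjectName("kafka.consumer:*"), null);
            for (ObjectName name : names) {
                System.out.println(name);
            }
        }
    }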

However, I'm not sure it can be called a "fatal" condition in that
the brokers could re-register in ZooKeeper and consumption would then
resume; unless someone decides to move a Kafka cluster to some
other ZooKeeper namespace without telling anyone.

What would be a suitable action on the application side if such a
condition were propagated back to the application as an exception?

Thanks,

Joel

On Tue, Jan 07, 2014 at 06:00:29PM +0000, Paul Mackles wrote:
> Hi - I noticed that if a Kafka cluster goes away entirely, the high-level consumer will endlessly try to fetch metadata until the cluster comes back up, never bubbling the error condition up to the application. While I see a setting to control the interval at which it reconnects, I don't see anything that tells it when to just give up. I think it would be useful if there were a way for the application to detect this condition and possibly take some sort of action, either through a max-retries setting and/or some sort of flag that can be tested after a timeout. Is that capability already there? Is there a known workaround for this?
> 
> Thanks,
> Paul