Posted to users@kafka.apache.org by Ali Akhtar <al...@gmail.com> on 2017/04/08 10:54:57 UTC

Leader not available error after kafka node goes down

I have a 3-node Kafka cluster managed by Kubernetes, running in Docker
containers.

Recently, one of the 3 nodes went down and was automatically re-created by
Kubernetes.

However, now whenever I try to consume from one of my Kafka topics through
Kafka Streams, I get the error:

> 6687 [StreamThread-1] WARN  org.apache.kafka.clients.NetworkClient  -
Error while fetching metadata with correlation id 1 :
{my_topic=LEADER_NOT_AVAILABLE}

> org.apache.kafka.streams.errors.StreamsException: Topic not found during
partition assignment: my_topic

When I tried to re-create the topic via 'kafka-topics.sh --create', I
received:

> Error while executing topic command : Topic "my_topic" already exists.

Any ideas what's going on here, and how to have Kafka recover from a node
going down and automatically elect a new leader?

Re: Leader not available error after kafka node goes down

Posted by Ali Akhtar <al...@gmail.com>.
Hey Eno,

So here's the sequence of events:

- 48 hrs ago: Node went down and was brought back up
- Yesterday: I tried to deploy a Kafka Streams app, but it gave that error
for that topic name.

I then changed the topic name, created that topic manually, and redeployed.
This time it worked.

Do you know for sure whether this was caused by retries being 0? Or is it
possible that the node came back with a different broker ID, and the client
couldn't find the leader of the topic because it was still looking for the
previous node ID, which no longer existed?

Thanks.
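As background on the broker-ID theory: Kafka records each broker's id in a meta.properties file inside every data directory, so a pod that comes back without its old persistent volume can plausibly end up registering under a new id while the topic metadata still points at the old one. A minimal, self-contained sketch of reading that file (the class name and sample values here are illustrative, not Kafka's own code):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class MetaPropertiesCheck {

    // Read broker.id from a data directory's meta.properties, the file Kafka
    // uses to remember which broker id a log directory belongs to.
    public static String brokerId(Path logDir) throws IOException {
        Properties meta = new Properties();
        try (InputStream in = Files.newInputStream(logDir.resolve("meta.properties"))) {
            meta.load(in);
        }
        return meta.getProperty("broker.id");
    }

    public static void main(String[] args) throws IOException {
        // Illustrative only: write a sample file in the same format, then read it back.
        Path dir = Files.createTempDirectory("kafka-logs");
        try (Writer w = Files.newBufferedWriter(dir.resolve("meta.properties"))) {
            w.write("version=0\nbroker.id=2\n");
        }
        System.out.println(brokerId(dir)); // prints 2
    }
}
```

Comparing that id against what the topic's metadata expects would confirm or rule out the mismatch.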

On Sat, Apr 8, 2017 at 7:08 PM, Eno Thereska <en...@gmail.com> wrote:

> Hi Ali,
>
> Try changing the default value for the Streams producer retries to
> something large, since the default is 0 (which means that if a broker is
> temporarily down, Streams would give that error), e.g.:
>
> final Properties props = new Properties();
> props.put(StreamsConfig.APPLICATION_ID_CONFIG, ID);
> ...
> props.put(ProducerConfig.RETRIES_CONFIG, 10);
>
>
> Note that the default is now changed in 0.10.2.1 (which is being voted on).
>
> While you're there, another important config we changed in 0.10.2.1 is
> max.poll.interval.ms, so I'd recommend changing that too. This is to
> avoid rebalancing during long state recoveries:
> props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG,
> Integer.toString(Integer.MAX_VALUE));
>
> Thanks
> Eno

Re: Leader not available error after kafka node goes down

Posted by Eno Thereska <en...@gmail.com>.
Hi Ali,

Try changing the default value for the Streams producer retries to something large, since the default is 0 (which means that if a broker is temporarily down, Streams would give that error), e.g.:

final Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, ID);
...
props.put(ProducerConfig.RETRIES_CONFIG, 10);


Note that the default is now changed in 0.10.2.1 (which is being voted on). 

While you're there, another important config we changed in 0.10.2.1 is max.poll.interval.ms, so I'd recommend changing that too. This is to avoid rebalancing during long state recoveries:
props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, Integer.toString(Integer.MAX_VALUE));
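Putting both overrides together, a minimal self-contained sketch (string keys are used in place of the ProducerConfig/ConsumerConfig constants so it compiles without the Kafka jars, and the application id and bootstrap address are placeholders):

```java
import java.util.Properties;

public class ResilientStreamsProps {

    public static Properties build() {
        Properties props = new Properties();
        // Required by Kafka Streams; the id and broker address are placeholders.
        props.put("application.id", "my-streams-app");
        props.put("bootstrap.servers", "kafka:9092");
        // Same key as ProducerConfig.RETRIES_CONFIG: retry instead of failing
        // fast while a broker is briefly down.
        props.put("retries", "10");
        // Same key as ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG: avoid
        // rebalances during long state-store recoveries.
        props.put("max.poll.interval.ms", Integer.toString(Integer.MAX_VALUE));
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```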

Thanks
Eno

