Posted to users@kafka.apache.org by Jan Omar <ja...@wooga.net> on 2016/11/22 09:40:44 UTC

Consumer failover issue when coordinator dies (e.g. broker restart)

Hey guys,

We're running Kafka 0.9.0.1 with Java 7 on FreeBSD. Our consumers get into a state they never recover from, for example when we restart brokers.

The consumers start reporting that the coordinator died, which in general is correct because the coordinator was restarted. The consumer should then fail over to another coordinator, but that never happens. Instead it runs into an infinite loop that looks like this:

[exa3-fetcher] INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Marking the coordinator 2147483641 dead.
[exa3-fetcher] INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Attempt to join group bit-exa3 failed due to obsolete coordinator information, retrying.

It remains like this until we shut down all brokers except one. As soon as only one broker remains, life continues as expected.

[exa3-fetcher] INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Attempt to heart beat failed since the group is rebalancing, try to re-join group.
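
In case it helps with diagnosis: as far as I understand, the id in the "Marking the coordinator 2147483641 dead" line is not a real broker id. The sketch below assumes the consumer derives the coordinator's node id as Integer.MAX_VALUE minus the broker id, so that the coordinator connection is kept separate from the regular connection to the same broker. Under that assumption the dead coordinator would map to broker 6. The class and method names are made up for illustration and are not part of the Kafka client.

```java
public class CoordinatorId {

    // Assumption: coordinatorNodeId = Integer.MAX_VALUE - brokerId,
    // so subtracting it from Integer.MAX_VALUE recovers the broker id.
    static int brokerIdFor(int coordinatorNodeId) {
        return Integer.MAX_VALUE - coordinatorNodeId;
    }

    public static void main(String[] args) {
        // Id taken from the "Marking the coordinator ... dead" log line.
        System.out.println(brokerIdFor(2147483641));
    }
}
```

If that mapping holds, the restarted broker and the coordinator the consumers keep retrying can be compared directly.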

Any idea what's causing this, or whether this is a known bug?

Any help would be highly appreciated as this is a serious issue for us.

Thanks!

Regards

Jan