Posted to dev@kafka.apache.org by "Manikumar (JIRA)" <ji...@apache.org> on 2018/06/18 16:11:00 UTC
[jira] [Resolved] (KAFKA-4061) Apache Kafka failover is not working

     [ https://issues.apache.org/jira/browse/KAFKA-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manikumar resolved KAFKA-4061.
------------------------------
    Resolution: Cannot Reproduce

This is most likely due to the health of the consumer offsets topic. The replication factor of the "__consumer_offsets" topic should be greater than 1, so that its partitions stay available when a broker goes down. Please reopen if you think the issue still exists.
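
To illustrate why the replication factor matters here, the following is a minimal sketch of a toy model, not Kafka's actual election code. It assumes the leader is simply the first replica that is still alive; the real broker elects leaders from the in-sync replica set through the controller. The class name, method name, and broker ids are hypothetical and chosen to match the three-node cluster in the report.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class OffsetsTopicFailover {
    // Toy model (assumption): the leader is the first replica that is still alive.
    // Returns -1 when no replica is alive, i.e. the partition has no leader.
    static int electLeader(List<Integer> replicas, Set<Integer> liveBrokers) {
        for (int broker : replicas) {
            if (liveBrokers.contains(broker)) {
                return broker;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // kafka1 (id 1) has been shut down; kafka2 and kafka3 are still up.
        Set<Integer> live = new HashSet<>(Arrays.asList(2, 3));

        // Replication factor 1: the only replica lived on broker 1.
        System.out.println("RF=1 leader: " + electLeader(Arrays.asList(1), live));

        // Replication factor 3: another replica can take over.
        System.out.println("RF=3 leader: " + electLeader(Arrays.asList(1, 2, 3), live));
    }
}
```

In this model the first call yields -1 (no leader, so no group coordinator for groups on that partition) and the second yields 2. The broker setting that controls this for the offsets topic is `offsets.topic.replication.factor`.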

> Apache Kafka failover is not working
> ------------------------------------
>
>                 Key: KAFKA-4061
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4061
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.10.0.0
>         Environment: Linux
>            Reporter: Sebastian Bruckner
>            Priority: Major
>
> We have a 3-node cluster (kafka1 to kafka3) on 0.10.0.0.
> When I shut down the node kafka1, I can see the following in the debug logs of my consumers:
> {code}
> Sending coordinator request for group f49dc74f-3ccb-4fef-bafc-a7547fe26bc8 to broker kafka3:9092 (id: 3 rack: null)
> Received group coordinator response ClientResponse(receivedTimeMs=1471511333843, disconnected=false, request=ClientRequest(expectResponse=true, callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@3892b449, request=RequestSend(header={api_key=10,api_version=0,correlation_id=118,client_id=f49dc74f-3ccb-4fef-bafc-a7547fe26bc8}, body={group_id=f49dc74f-3ccb-4fef-bafc-a7547fe26bc8}), createdTimeMs=1471511333794, sendTimeMs=1471511333794), responseBody={error_code=0,coordinator={node_id=1,host=kafka1,port=9092}})
> {code}
> So the problem is that kafka3 answers with a response telling the consumer that the coordinator is kafka1 (which is shut down).
> This then happens over and over again.
> When I restart the consumer, I can see the following:
> {code}
> Updated cluster metadata version 1 to Cluster(nodes = [kafka2:9092 (id: -2 rack: null), kafka1:9092 (id: -1 rack: null), kafka3:9092 (id: -3 rack: null)], partitions = [])
> ... responseBody={error_code=15,coordinator={node_id=-1,host=,port=-1}})
> {code}
> The difference now is that it answers with error code 15 (GROUP_COORDINATOR_NOT_AVAILABLE).
> Somehow Kafka doesn't elect a new group coordinator.
> In a local setup with 2 brokers and 1 ZooKeeper it works fine.
> Can you help me debug this?
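
For debugging, it can help to know which "__consumer_offsets" partition a group maps to, since the leader of that partition acts as the group coordinator. The sketch below reproduces the mapping as I understand the broker to compute it (absolute value of the group id's hash code, modulo the partition count). The class and method names are hypothetical, and the partition count of 50 is an assumption based on the default of `offsets.topic.num.partitions`; check the actual value on your cluster.

```java
public class CoordinatorPartition {
    // Assumed mapping: abs(groupId.hashCode()) % number of offsets topic partitions.
    // Integer.MIN_VALUE is special-cased because Math.abs cannot represent its negation.
    static int partitionFor(String groupId, int offsetsTopicPartitions) {
        int hash = groupId.hashCode();
        int abs = (hash == Integer.MIN_VALUE) ? 0 : Math.abs(hash);
        return abs % offsetsTopicPartitions;
    }

    public static void main(String[] args) {
        // Group id taken from the consumer debug log in the report.
        String groupId = "f49dc74f-3ccb-4fef-bafc-a7547fe26bc8";
        int partitions = 50; // assumption: default offsets.topic.num.partitions
        System.out.println("__consumer_offsets-" + partitionFor(groupId, partitions));
    }
}
```

Once the partition is known, describing the "__consumer_offsets" topic shows whether that partition currently has a leader and how many replicas it has.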



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)