Posted to dev@kafka.apache.org by "Navinder Brar (Jira)" <ji...@apache.org> on 2020/08/25 04:16:00 UTC

[jira] [Created] (KAFKA-10429) Group Coordinator is unavailable leads to missing events

Navinder Brar created KAFKA-10429:
-------------------------------------

             Summary: Group Coordinator is unavailable leads to missing events
                 Key: KAFKA-10429
                 URL: https://issues.apache.org/jira/browse/KAFKA-10429
             Project: Kafka
          Issue Type: Bug
          Components: streams
    Affects Versions: 1.1.1
            Reporter: Navinder Brar


We regularly see the following message in our logs:

[2020-08-25 03:24:59,214] INFO [Consumer clientId=appId-StreamThread-1-consumer, groupId=dashavatara] Group coordinator ip:9092 (id: 1452096777 rack: null) is *unavailable* or invalid, will attempt rediscovery (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)

 

After some time, the coordinator is discovered again:

[2020-08-25 03:25:02,218] INFO [Consumer clientId=appId-c3d1d186-e487-4993-ae3d-5fed75887e6b-StreamThread-1-consumer, groupId=appId] Discovered group coordinator ip:9092 (id: 1452096777 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
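
To see how long the coordinator stayed unavailable in a given episode, the timestamps of the two log lines can be compared. Below is a minimal sketch in Java, assuming the log timestamp layout shown in the lines above. The class and method names (CoordinatorGap, timestampOf, gapMillis) are hypothetical helpers for illustration and are not part of Kafka, and the log lines in main are shortened copies of the ones above.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Kafka: measures the time between two log lines.
class CoordinatorGap {
    // Assumed timestamp layout, taken from the log lines above: "2020-08-25 03:24:59,214".
    private static final DateTimeFormatter LOG_TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Extracts the timestamp between the leading '[' and the first ']'.
    static LocalDateTime timestampOf(String logLine) {
        int open = logLine.indexOf('[');
        int close = logLine.indexOf(']');
        return LocalDateTime.parse(logLine.substring(open + 1, close), LOG_TS);
    }

    // Milliseconds between the "unavailable" line and the "Discovered" line.
    static long gapMillis(String unavailableLine, String discoveredLine) {
        return Duration.between(timestampOf(unavailableLine),
                                timestampOf(discoveredLine)).toMillis();
    }

    public static void main(String[] args) {
        // Shortened copies of the two log lines quoted above.
        String lost = "[2020-08-25 03:24:59,214] INFO [Consumer] Group coordinator is unavailable or invalid";
        String found = "[2020-08-25 03:25:02,218] INFO [Consumer] Discovered group coordinator";
        System.out.println(gapMillis(lost, found) + " ms"); // prints "3004 ms"
    }
}
```

In this particular episode the gap is only about three seconds. Comparing such gaps against the retention of the source topics would show which episodes are long enough for events to expire.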

 

My question is why this unavailability doesn't trigger a rebalance in the cluster. Our source Kafka topics have only a few hours of retention, and sometimes the unavailability lasts longer than that. Because it neither triggers a rebalance nor stops processing on the other nodes (which are still connected to the group coordinator), we never find out that something has gone wrong, and in the meantime we lose events from our source topics.

 

Some fixes are suggested on Stack Overflow, but those configs are already set in our Kafka cluster:

default.replication.factor=3

offsets.topic.replication.factor=3

 

It would be great to understand why this issue happens, why it doesn't trigger a rebalance, and whether there is a known solution for it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)