Posted to jira@kafka.apache.org by "Manikumar (JIRA)" <ji...@apache.org> on 2018/02/21 18:25:00 UTC

[jira] [Resolved] (KAFKA-2553) Kafka Consumer Hangs after Network Partition

     [ https://issues.apache.org/jira/browse/KAFKA-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manikumar resolved KAFKA-2553.
------------------------------
    Resolution: Fixed

Similar issues are tracked in KAFKA-2169. Please reopen if the issue still exists.

> Kafka Consumer Hangs after Network Partition
> --------------------------------------------
>
>                 Key: KAFKA-2553
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2553
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 0.8.1.1
>         Environment: Amazon EC2, Ubuntu 12.04.
>            Reporter: Aaditya Ramesh
>            Assignee: Neha Narkhede
>            Priority: Major
>         Attachments: kafka_bug_report
>
>
> We have a Kafka consumer on an EC2 instance in Ireland that fetches data from a Kafka cluster in a data center in the eastern United States. We sporadically encounter network partitions during which no machine in one data center can ping any machine in the other (the pings always time out); this kind of partition is not unusual for inter-data-center connections. However, the Kafka consumer's connection to ZooKeeper never recovers after one of these network hiccups, and a full process restart is required before it can consume from the remote data center again, even after the network has recovered. The relevant code in ZookeeperConsumerConnector.scala catches all Throwables and does nothing with them, so the failure neither alerts the process nor surfaces in any metric that we could use to diagnose the hung state. We therefore patched the client code in our codebase to call System.exit(0) whenever this occurs, since a restart is better than failing silently.
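
The fail-fast idea in the report can be illustrated with a minimal Java sketch. This is not the code in ZookeeperConsumerConnector.scala and not the reporter's patch; the class FailFastRunner, its failure counter, and the injectable handler are all hypothetical names chosen for illustration. It only shows the general pattern of counting and reporting a caught Throwable instead of discarding it.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/**
 * Hypothetical sketch (not Kafka code): runs a task and, if it throws,
 * records the failure and hands it to a handler instead of swallowing it.
 */
public class FailFastRunner {
    // Illustrative counter that a metrics reporter could export.
    private final AtomicInteger failures = new AtomicInteger();
    // What to do with a caught failure; injected so it can be swapped or tested.
    private final Consumer<Throwable> onFatal;

    public FailFastRunner(Consumer<Throwable> onFatal) {
        this.onFatal = onFatal;
    }

    public void run(Runnable task) {
        try {
            task.run();
        } catch (Throwable t) {
            // Make the failure visible: count it, log it, then escalate.
            failures.incrementAndGet();
            System.err.println("Consumer task failed: " + t);
            onFatal.accept(t);
        }
    }

    public int failureCount() {
        return failures.get();
    }
}
```

A caller that wants the restart behavior described in the report could pass a handler that calls System.exit, so that a process supervisor restarts the consumer. Keeping the handler injectable is a design choice of this sketch: it lets the same wrapper log, alert, or exit depending on the deployment.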


