Posted to dev@kafka.apache.org by "Aaditya Ramesh (JIRA)" <ji...@apache.org> on 2015/09/16 22:29:45 UTC

[jira] [Created] (KAFKA-2553) Kafka Consumer Hangs after Network Partition

Aaditya Ramesh created KAFKA-2553:
-------------------------------------

             Summary: Kafka Consumer Hangs after Network Partition
                 Key: KAFKA-2553
                 URL: https://issues.apache.org/jira/browse/KAFKA-2553
             Project: Kafka
          Issue Type: Bug
          Components: consumer
    Affects Versions: 0.8.1.1
         Environment: Amazon EC2, Ubuntu 12.04.
            Reporter: Aaditya Ramesh
            Assignee: Neha Narkhede
         Attachments: kafka_bug_report

We run a Kafka consumer on an EC2 instance in Ireland that fetches data from a Kafka cluster in a data center in the eastern United States. We sporadically see network partitions during which no machine in either data center can ping a machine in the other (every ping times out); outages like this are not unusual for inter-data-center links. However, the Kafka consumer's connection to ZooKeeper never recovers after one of these partitions, and a full process restart is required before the consumer resumes consuming from the remote data center, even after the network has recovered. The relevant code in ZookeeperConsumerConnector.scala catches all Throwables and does nothing with them, so the process is never alerted and no metric is emitted that we could use to detect the hung state. We have therefore patched the client code in our codebase to call System.exit(0) whenever this occurs, since a restart is better than failing silently.
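
The fail-fast idea described above can be sketched roughly as follows. This is only an illustration and not the actual Kafka code or our patch: the class FailFastRunner, its methods, and the onFatal callback are all hypothetical names. Instead of swallowing a Throwable, the wrapper counts it, logs it, and hands it to a caller-supplied handler, which in production could exit the process so that a supervisor restarts it.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/**
 * Hypothetical helper, not part of Kafka: runs a task and makes any
 * failure visible instead of silently swallowing it.
 */
final class FailFastRunner {
    // Failure counter that could be exported as an alerting metric.
    private final AtomicLong failures = new AtomicLong();
    // Caller-supplied reaction to a failure (assumption: e.g. exit the process).
    private final Consumer<Throwable> onFatal;

    FailFastRunner(Consumer<Throwable> onFatal) {
        this.onFatal = onFatal;
    }

    /** Runs the task; on any Throwable, records it, logs it, and calls the handler. */
    void run(Runnable task) {
        try {
            task.run();
        } catch (Throwable t) {
            failures.incrementAndGet();
            System.err.println("Consumer task failed: " + t);
            onFatal.accept(t);
        }
    }

    /** Number of failures seen so far. */
    long failureCount() {
        return failures.get();
    }
}
```

In a real process the handler might be something like `t -> System.exit(1)`; our patch used System.exit(0), but a nonzero status is generally easier for a supervisor to recognize as a failure. Injecting the handler rather than hard-coding the exit keeps the wrapper testable.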


