Posted to dev@kafka.apache.org by "Anukool Rattana (JIRA)" <ji...@apache.org> on 2016/11/02 11:14:59 UTC

[jira] [Created] (KAFKA-4368) Unclean shutdown breaks Kafka cluster

Anukool Rattana created KAFKA-4368:
--------------------------------------

             Summary: Unclean shutdown breaks Kafka cluster
                 Key: KAFKA-4368
                 URL: https://issues.apache.org/jira/browse/KAFKA-4368
             Project: Kafka
          Issue Type: Bug
          Components: producer 
    Affects Versions: 0.10.0.0, 0.9.0.1
            Reporter: Anukool Rattana
            Priority: Critical


My team has observed that if a broker process dies uncleanly, producers are blocked from sending messages to the Kafka topic.

Here is how to reproduce the problem:
1) Create a Kafka 0.10 cluster with three brokers (A, B and C).
2) Create a topic with replication_factor = 2.
3) Configure the producer to send messages with "acks=all", meaning all in-sync replicas must acknowledge a message before the producer can proceed to the next one.
4) Have IEM (IBM Endpoint Manager) push patches to broker A and force the server to reboot once the patches are installed.
Note: min.insync.replicas = 1
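
To make the interaction between "acks=all" and min.insync.replicas in the steps above more concrete, here is a minimal sketch in Python. It is a simplified model and not Kafka code: the function name produce_outcome and its parameters are hypothetical, and it only covers the two error cases relevant to this report.

```python
def produce_outcome(has_leader: bool, isr_size: int,
                    min_insync_replicas: int, acks: str = "all") -> str:
    """Simplified model of what a producer sees for a single partition.

    Hypothetical helper for illustration only; it is not part of any
    Kafka client library.
    """
    if not has_leader:
        # Without an elected leader the client cannot send at all and
        # metadata requests report LEADER_NOT_AVAILABLE.
        return "LEADER_NOT_AVAILABLE"
    if acks == "all" and isr_size < min_insync_replicas:
        # With acks=all the leader rejects writes when the in-sync
        # replica set is smaller than min.insync.replicas.
        return "NOT_ENOUGH_REPLICAS"
    return "OK"


# Settings from this report: replication factor 2, min.insync.replicas = 1.
# One replica offline leaves an ISR of size 1.
print(produce_outcome(has_leader=True, isr_size=1, min_insync_replicas=1))
# Partition with no leader, as the error in this report suggests.
print(produce_outcome(has_leader=False, isr_size=0, min_insync_replicas=1))
```

Under this model, losing one of two replicas should not by itself block an "acks=all" producer when min.insync.replicas = 1; the producer is only blocked if the partition ends up with no leader.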


Result: Producers are unable to send messages to the Kafka topic after the broker has rebooted and rejoined the cluster. They fail with the following error message:

[2016-09-28 09:32:41,823] WARN Error while fetching metadata with correlation id 0 : {logstash=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)

We suspect that a replication_factor of 2 is not sufficient for our Kafka environment, but we would really like an explanation of what happens when a broker goes through an unclean shutdown.
The same issue occurred on a cluster with 2 brokers and replication_factor = 1.

The workaround I used to recover service was to clean up both the Kafka topic log files and the ZooKeeper data (rmr /brokers/topics/XXX and rmr /consumers/XXX).

Note:
Topic listing after broker A came back from the reboot:
Topic:logstash  PartitionCount:3        ReplicationFactor:2     Configs:
        Topic: logstash Partition: 0    Leader: 1       Replicas: 1,3   Isr: 1,3
        Topic: logstash Partition: 1    Leader: 2       Replicas: 2,1   Isr: 2,1
        Topic: logstash Partition: 2    Leader: 3       Replicas: 3,2   Isr: 2,3
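
As a rough way to read the listing above, the following Python sketch parses the describe output and flags partitions that have no leader or whose ISR does not contain every assigned replica. The helper names are hypothetical, and it assumes a leaderless partition is printed as "Leader: -1".

```python
import re

# Describe output copied from the listing above.
DESCRIBE_OUTPUT = """\
Topic:logstash  PartitionCount:3        ReplicationFactor:2     Configs:
        Topic: logstash Partition: 0    Leader: 1       Replicas: 1,3   Isr: 1,3
        Topic: logstash Partition: 1    Leader: 2       Replicas: 2,1   Isr: 2,1
        Topic: logstash Partition: 2    Leader: 3       Replicas: 3,2   Isr: 2,3
"""

# Matches per-partition lines only; the "PartitionCount:" header is skipped.
PARTITION_LINE = re.compile(
    r"Partition:\s*(\d+)\s+Leader:\s*(-?\d+)\s+Replicas:\s*([\d,]+)\s+Isr:\s*([\d,]*)"
)


def parse_partitions(text):
    """Return one dict per partition line found in the describe output."""
    partitions = []
    for match in PARTITION_LINE.finditer(text):
        partition, leader, replicas, isr = match.groups()
        partitions.append({
            "partition": int(partition),
            "leader": int(leader),
            "replicas": [int(r) for r in replicas.split(",")],
            "isr": [int(r) for r in isr.split(",") if r],
        })
    return partitions


def unhealthy(partitions):
    """Partitions with no leader (assumed -1) or replicas missing from the ISR."""
    return [
        p["partition"]
        for p in partitions
        if p["leader"] == -1 or set(p["isr"]) != set(p["replicas"])
    ]


partitions = parse_partitions(DESCRIBE_OUTPUT)
print(unhealthy(partitions))  # prints []
```

For this listing the check reports no unhealthy partitions: every partition has a leader and a full ISR at the time the listing was taken.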



