Posted to dev@kafka.apache.org by "Manikumar (JIRA)" <ji...@apache.org> on 2018/05/28 18:13:00 UTC
[jira] [Resolved] (KAFKA-4368) Unclean shutdown breaks Kafka cluster
[ https://issues.apache.org/jira/browse/KAFKA-4368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Manikumar resolved KAFKA-4368.
------------------------------
Resolution: Auto Closed
Closing inactive issue.
> Unclean shutdown breaks Kafka cluster
> -------------------------------------
>
> Key: KAFKA-4368
> URL: https://issues.apache.org/jira/browse/KAFKA-4368
> Project: Kafka
> Issue Type: Bug
> Components: producer
> Affects Versions: 0.9.0.1, 0.10.0.0
> Reporter: Anukool Rattana
> Priority: Critical
>
> My team has observed that if a broker process dies uncleanly, producers are blocked from sending messages to the Kafka topic.
> Here is how to reproduce the problem:
> 1) Create a Kafka 0.10 cluster with three brokers (A, B and C).
> 2) Create topic with replication_factor = 2
> 3) Set the producer to send messages with "acks=all", meaning all in-sync replicas must acknowledge a message before the producer proceeds to the next one.
> 4) Force IEM (IBM Endpoint Manager) to send patch to broker A and force server to reboot after patches installed.
> Note: min.insync.replicas = 1
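The setup steps above can be sketched with the CLI tools that ship with Kafka 0.10. This is a minimal sketch, not the reporter's exact commands; the ZooKeeper and broker hostnames (`zk:2181`, `brokerA:9092`) and the Kafka install directory are placeholders, while the topic name, partition count, replication factor, and min.insync.replicas come from the report.

```shell
# 2) Create the topic with replication factor 2 and min.insync.replicas=1
#    (topic name "logstash" and partition count 3 are taken from the
#    kafka-topics output quoted later in the report).
bin/kafka-topics.sh --create --zookeeper zk:2181 \
  --topic logstash --partitions 3 --replication-factor 2 \
  --config min.insync.replicas=1

# 3) Produce with acks=all; for the 0.10-era console producer this is
#    expressed as --request-required-acks -1 (-1 is equivalent to "all",
#    i.e. every in-sync replica must acknowledge each message).
bin/kafka-console-producer.sh --broker-list brokerA:9092 \
  --topic logstash --request-required-acks -1
```

With min.insync.replicas=1 and acks=all, a write only needs acknowledgement from the replicas currently in the ISR, so a single surviving replica should be enough to keep producing; that is what makes the reported LEADER_NOT_AVAILABLE errors after the reboot surprising.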
> Result: Producers are not able to send messages to the Kafka topic after the broker reboots and rejoins the cluster, failing with the following error message:
> [2016-09-28 09:32:41,823] WARN Error while fetching metadata with correlation id 0 : {logstash=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
> We suspect that a replication_factor of 2 is not sufficient for our Kafka environment, but we really need an explanation of what happens when a broker undergoes an unclean shutdown.
> The same issue occurred when setting up a cluster with 2 brokers and replication_factor = 1.
> The workaround I used to recover the service was to clean up both the Kafka topic log files and the ZooKeeper data (rmr /brokers/topics/XXX and rmr /consumers/XXX).
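The workaround described above can be sketched as follows. This is destructive: it permanently deletes the topic's data and metadata, so stop all producers and consumers first. The log directory (`/tmp/kafka-logs`, the 0.10 default for log.dirs), the ZooKeeper hostname, and the consumer group name `my-group` are placeholders; the report itself elides the exact paths as `XXX`.

```shell
# On each broker: remove the on-disk log segments for the topic
# (one directory per partition, named <topic>-<partition>).
rm -rf /tmp/kafka-logs/logstash-*

# Drop the topic and consumer metadata from ZooKeeper, matching the
# "rmr /brokers/topics/XXX" and "rmr /consumers/XXX" commands in the
# report (zookeeper-shell.sh accepts a single command after the host).
bin/zookeeper-shell.sh zk:2181 rmr /brokers/topics/logstash
bin/zookeeper-shell.sh zk:2181 rmr /consumers/my-group
```

After this the topic must be recreated, and consumers in the deleted group lose their committed offsets.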
> Note: Topic list after broker A came back from the reboot:
> Topic:logstash PartitionCount:3 ReplicationFactor:2 Configs:
> Topic: logstash Partition: 0 Leader: 1 Replicas: 1,3 Isr: 1,3
> Topic: logstash Partition: 1 Leader: 2 Replicas: 2,1 Isr: 2,1
> Topic: logstash Partition: 2 Leader: 3 Replicas: 3,2 Isr: 2,3
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)