Posted to users@kafka.apache.org by Pradeep Jawahar <pj...@groupon.com> on 2015/09/02 20:19:51 UTC

Recovery skipped after unclean shutdown

One of the brokers in our cluster had an unclean shutdown, and after it was
restarted I found the following log entries.

$ grep "clean shutdown" /var/groupon/kafka/kafka-broker.log
02/Sep/2015 16:19:23   - warn::[Kafka Server 1], Proceeding to do an
unclean shutdown as all the controlled shutdown attempts failed
02/Sep/2015 16:22:06   - info::Found clean shutdown file. Skipping recovery
for all logs in data directory '/data/vol1'
02/Sep/2015 16:22:11   - info::Found clean shutdown file. Skipping recovery
for all logs in data directory '/data/vol2'
02/Sep/2015 16:22:15   - info::Found clean shutdown file. Skipping recovery
for all logs in data directory '/data/vol3'
02/Sep/2015 16:22:18   - info::Found clean shutdown file. Skipping recovery
for all logs in data directory '/data/vol4'
02/Sep/2015 16:22:22   - info::Found clean shutdown file. Skipping recovery
for all logs in data directory '/data/vol5'
02/Sep/2015 16:22:22   - info::Found clean shutdown file. Skipping recovery
for all logs in data directory '/data/vol6'
02/Sep/2015 16:22:26   - info::Found clean shutdown file. Skipping recovery
for all logs in data directory '/data/vol7'
02/Sep/2015 16:22:29   - info::Found clean shutdown file. Skipping recovery
for all logs in data directory '/data/vol8'

So no recovery happened and the partitions managed by this broker are not
catching up with the other replicas. I found that the ReplicaFetcher
threads for each of the partitions died.

Is anyone aware of how to get out of this situation? I was trying to locate
the clean shutdown file (maybe it was left over from a previous run) and
delete it.
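
For reference, here is a minimal sketch of how I am checking each data
directory for a leftover marker. It assumes the marker is a file named
".kafka_cleanshutdown" at the top level of each log directory, which is my
reading of the 0.8.x source and not something I have confirmed on this
broker. The function name is just a hypothetical helper. It only lists
matches and does not delete anything.

```shell
# Hypothetical helper: print the path of every clean-shutdown marker found
# directly inside the data directories passed as arguments.
# ASSUMPTION: the marker file is named ".kafka_cleanshutdown".
find_clean_shutdown_markers() {
    marker=".kafka_cleanshutdown"
    for dir in "$@"; do
        # Only report the marker if it exists as a regular file.
        if [ -f "$dir/$marker" ]; then
            printf '%s\n' "$dir/$marker"
        fi
    done
}
```

I run it as "find_clean_shutdown_markers /data/vol1 /data/vol2" and so on
through /data/vol8. An empty result would mean no marker is present in
those directories at the moment.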


Additional Information
~~~~~~~~~~~~~~~~~~~~~~
Kafka 0.8.1.1
CentOS 5
11 node cluster with replication factor 3
Disks are JBODs

Thanks,
Pradeep