Posted to dev@kafka.apache.org by "Jay Kreps (JIRA)" <ji...@apache.org> on 2015/09/04 23:56:45 UTC

[jira] [Comment Edited] (KAFKA-2510) Prevent broker from re-replicating / losing data due to disk misconfiguration

    [ https://issues.apache.org/jira/browse/KAFKA-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731497#comment-14731497 ] 

Jay Kreps edited comment on KAFKA-2510 at 9/4/15 9:56 PM:
----------------------------------------------------------

I actually think we shouldn't prevent this.

In our replication model the data on disk is basically a cache. If it's there, the broker uses it to make its own recovery faster and pulls only the diff from the replicas. If it's not there, the broker recreates it. You are allowed to lose whatever is on disk at any time.
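To make that concrete, here is a minimal sketch of the recovery decision under that model. All names here are hypothetical; this is not the actual broker code:

{code:java}
// Hypothetical sketch of the "disk as a cache" recovery model above.
// None of these names come from the Kafka codebase.
interface LocalLog {
    boolean exists();              // is there any data on disk?
    long lastCheckpointedOffset(); // last offset known to be cleanly flushed
    long logEndOffset();           // offset after the last local message
    void truncateTo(long offset);  // drop any uncertain suffix
}

class ReplicaRecovery {
    /** Offset from which this replica should start fetching from the leader. */
    long fetchStartOffset(LocalLog log) {
        if (!log.exists()) {
            // Cache miss: nothing local, so re-replicate the whole partition.
            return 0L;
        }
        // Cache hit: keep the local data, truncate anything that may not have
        // been fully flushed, then fetch only the diff from the leader.
        log.truncateTo(log.lastCheckpointedOffset());
        return log.logEndOffset();
    }
}
{code}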

For the Chef case, yes, if you botch the data directory you will re-replicate data, but the same is true of botching the node id or things like the ZooKeeper URL. Re-replicating will be slow but not fatal. In the rolling-restart case you actually won't lose data if you use controlled shutdown. If you don't use controlled shutdown you will lose data no matter what.
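For reference, controlled shutdown is governed by broker settings along these lines (defaults here are from memory, so verify against your broker version's documentation):

{code}
# server.properties -- controlled shutdown
controlled.shutdown.enable=true
controlled.shutdown.max.retries=3
controlled.shutdown.retry.backoff.ms=5000
{code}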

So today, if you want, you can wipe your data and restart, and the broker happily re-replicates. If you have a disk failure you can repair it and restart. If your AWS instance disappears you can move the broker to another instance and it re-replicates. If you need to rebuild your RAID array, or want to add disks and get data onto them, all of these can be accomplished by deleting your data and bouncing the instance.

We actually intended to exploit this for running more elastically in AWS and Mesos when we do automated data balancing. For example, the Mesos folks are going to add a feature to Marathon where tasks will be semi-sticky to nodes: if a Kafka node is restarted or dies, Mesos will prefer to restart it on the node it was on (if that node is still around and has free slots); if not, it will start the task elsewhere (where it will, of course, have no data and must re-replicate).



> Prevent broker from re-replicating / losing data due to disk misconfiguration
> -----------------------------------------------------------------------------
>
>                 Key: KAFKA-2510
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2510
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Gwen Shapira
>
> Currently Kafka assumes that whatever it sees in the data directory is the correct state of the data.
> This means that if an admin mistakenly configures Chef to use the wrong data directory, one of the following can happen:
> 1. The broker will re-replicate a bunch of partitions and saturate the network.
> 2. If you did this to enough brokers, you can lose entire topics and partitions.
> We have information about existing topics, partitions and their ISRs in ZooKeeper.
> We need a mode in which, if a broker starts, is in the ISR for a partition, and doesn't have any data or directory for that partition, the broker will issue a huge ERROR in the log and refuse to do anything for the partition.
> [~fpj] worked on the problem for ZK and had some ideas on what is required here. 
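A minimal sketch of the guard described in the issue, with hypothetical names (the real check would have to consult the ISR state in ZooKeeper and the broker's configured log.dirs):

{code:java}
// Hypothetical sketch of the proposed startup guard; these names do not
// come from the Kafka codebase.
class PartitionStartupGuard {
    /** True if the broker may safely act as a replica for this partition. */
    boolean mayServe(String topicPartition, boolean inIsrPerZk, boolean hasLocalData) {
        if (inIsrPerZk && !hasLocalData) {
            // ZooKeeper says we should have this partition's data, but the
            // disk has none: almost certainly a misconfigured data directory.
            // Refuse to act rather than silently re-replicating it.
            System.err.println("ERROR: broker is in the ISR for " + topicPartition
                    + " but has no local data; refusing to serve it. Check log.dirs.");
            return false;
        }
        return true;
    }
}
{code}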


