Posted to users@kafka.apache.org by "Jason Guo (jguo2)" <jg...@cisco.com> on 2015/04/06 15:12:28 UTC

Is there a complete Kafka 0.8.* replication design document

Hi,

         These days I have been focusing on the Kafka 0.8 replication design, and I found three replication design proposals on the wiki (according to the document, the V3 version is the one used in the Kafka 0.8 release).
         However, the V3 proposal is incomplete and inconsistent with the release.
Is there a complete Kafka 0.8 replication design document?

Here are some of my questions about the Kafka 0.8 replication design.
#1   According to V3, /brokers/topics/[topic]/[partition_id]/leaderAndISR stores the leader and ISR of a partition. However, in the 0.8.2 release there is no such znode; instead, /brokers/topics/[topic]/partitions/[partition_id]/state stores the leader and ISR of a partition.
#2   From /brokers/topics/[topic] we can already get the ISR for all partitions of a topic, so why do we need /brokers/topics/[topic]/partitions/[partition_id]/state?
#3   I didn't find /admin/partitions_add/[topic]/[partition_id] or /admin/partitions_remove/[topic]/[partition_id] while adding and removing partitions with bin/kafka-topics.sh. Are these paths deprecated in the 0.8 release?
#4   I found that two znodes under /admin, /admin/reassign_partitions/ and /admin/preferred_replica_election/, are automatically removed after the corresponding action completes. Why is the /admin/delete_topic/ znode not removed automatically?
#5   What is the LeaderAndISRCommand in Scenario A of V3? Is it the same as LeaderAndISRRequest?
#6   In Scenario D, when a broker becomes the controller, it sends a LeaderAndISRRequest with a special INIT flag to the brokers. In Scenario C, when a broker receives a LeaderAndISRRequest with the INIT flag, it deletes all local partitions not in set_p. Why do we need to delete those local partitions when the controller changes?
#7   In Scenario E (broker startup), the first step is to read the replica assignment. Doesn't the broker need to register its id under /brokers/ids first?
#8   Scenario H covers adding/removing partitions for an existing topic. In my test, I didn't find any znodes for the PartitionAdd path or PartitionRemove path in ZooKeeper. Is this approach to adding/deleting partitions deprecated? In fact, I didn't observe any znode change while adding/deleting partitions, so what is the actual process for adding/deleting Kafka partitions?
#9   Scenario G seems inconsistent with the released implementation.



Regards,
Jason






Re: Is there a complete Kafka 0.8.* replication design document

Posted by Jun Rao <ju...@confluent.io>.
Yes, the wiki is a bit old. You can find out more about replication in the
following links.
http://kafka.apache.org/documentation.html#replication
http://www.slideshare.net/junrao/kafka-replication-apachecon2013

#1, #2, #8. See the ZK layout in
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+data+structures+in+Zookeeper
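
For example, here is a sketch of what those znodes look like when inspected
with bin/zookeeper-shell.sh (the topic name, broker ids and epoch values below
are made up):

  bin/zookeeper-shell.sh localhost:2181
  get /brokers/topics/my-topic
  {"version":1,"partitions":{"0":[1,2],"1":[2,0]}}
  get /brokers/topics/my-topic/partitions/0/state
  {"controller_epoch":1,"leader":1,"version":1,"leader_epoch":0,"isr":[1,2]}

Note that /brokers/topics/[topic] only holds the static replica assignment per
partition; the dynamic leader/ISR state lives in the per-partition state
znode, which is why both paths exist.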

#3. Adding partitions is now done by updating /brokers/topics/[topic]
directly.
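
For example (topic name and partition count are made up), running

  bin/kafka-topics.sh --zookeeper localhost:2181 --alter \
    --topic my-topic --partitions 4

does not create any /admin/partitions_add znode; it rewrites the "partitions"
map inside /brokers/topics/my-topic, and the controller's watcher on that path
picks up the new partitions and creates their partitions/[id]/state znodes.
That is why no separate admin znode shows up while adding partitions.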

#4. For deleting a topic, the ZK path
/admin/delete_topics/[topic_to_be_deleted]
is created and removed after the deletion completes.
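
For example (assuming delete.topic.enable=true on the brokers; the topic name
is made up):

  bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic my-topic

creates /admin/delete_topics/my-topic. The controller removes that znode only
after every replica of the topic has actually been deleted, so if deletion is
disabled or a replica is offline, the znode lingers, which can make it look
like it is never cleaned up.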

#5  LeaderAndISRCommand should be the same as LeaderAndISRRequest.

#6 This is to take care of partitions that have been deleted while the
broker is down. The implementation doesn't rely on the special INIT flag.
Instead, it expects the very first LeaderAndISRRequest to include all valid
partitions. Local partitions not in that list will be deleted.

#7 Only the controller needs to read the replica assignment. The controller
can be started before the broker registers itself. This will be handled
through ZK watchers.
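
For reference, broker registration is an ephemeral znode under /brokers/ids,
roughly like the following (host, port and timestamp are placeholders):

  get /brokers/ids/1
  {"jmx_port":-1,"timestamp":"1428325948000","host":"broker1.example.com","version":1,"port":9092}

Because the znode is ephemeral, the controller's watcher on /brokers/ids fires
when the broker registers (or disappears), and the controller then sends that
broker the LeaderAndISRRequest for the partitions it hosts. So the ordering in
the broker-startup scenario does not depend on the broker reading the replica
assignment itself.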

#9 The high level algorithm described there is still valid. For the
implementation, you can take a look at ReplicaManager.

Thanks,

Jun


On Mon, Apr 6, 2015 at 6:12 AM, Jason Guo (jguo2) <jg...@cisco.com> wrote:

> Hi,
>
>          These days I have been focusing on the Kafka 0.8 replication
> design, and I found three replication design proposals on the wiki
> (according to the document, the V3 version is the one used in the Kafka
> 0.8 release).
>          However, the V3 proposal is incomplete and inconsistent with the
> release.
> Is there a complete Kafka 0.8 replication design document?
>
> Here are some of my questions about the Kafka 0.8 replication design.
> #1   According to V3, /brokers/topics/[topic]/[partition_id]/leaderAndISR
> stores the leader and ISR of a partition. However, in the 0.8.2 release
> there is no such znode; instead,
> /brokers/topics/[topic]/partitions/[partition_id]/state stores the leader
> and ISR of a partition.
> #2   From /brokers/topics/[topic] we can already get the ISR for all
> partitions of a topic, so why do we need
> /brokers/topics/[topic]/partitions/[partition_id]/state?
> #3   I didn't find /admin/partitions_add/[topic]/[partition_id] or
> /admin/partitions_remove/[topic]/[partition_id] while adding and removing
> partitions with bin/kafka-topics.sh. Are these paths deprecated in the
> 0.8 release?
> #4   I found that two znodes under /admin, /admin/reassign_partitions/
> and /admin/preferred_replica_election/, are automatically removed after
> the corresponding action completes. Why is the /admin/delete_topic/ znode
> not removed automatically?
> #5   What is the LeaderAndISRCommand in Scenario A of V3? Is it the same
> as LeaderAndISRRequest?
> #6   In Scenario D, when a broker becomes the controller, it sends a
> LeaderAndISRRequest with a special INIT flag to the brokers. In Scenario
> C, when a broker receives a LeaderAndISRRequest with the INIT flag, it
> deletes all local partitions not in set_p. Why do we need to delete those
> local partitions when the controller changes?
> #7   In Scenario E (broker startup), the first step is to read the
> replica assignment. Doesn't the broker need to register its id under
> /brokers/ids first?
> #8   Scenario H covers adding/removing partitions for an existing topic.
> In my test, I didn't find any znodes for the PartitionAdd path or
> PartitionRemove path in ZooKeeper. Is this approach to adding/deleting
> partitions deprecated? In fact, I didn't observe any znode change while
> adding/deleting partitions, so what is the actual process for
> adding/deleting Kafka partitions?
> #9   Scenario G seems inconsistent with the released implementation.
>
>
>
> Regards,
> Jason
>
>
>
>
>
>
