You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by "Jun Rao (JIRA)" <ji...@apache.org> on 2014/01/24 18:43:38 UTC

[jira] [Commented] (KAFKA-1215) Rack-Aware replica assignment option

    [ https://issues.apache.org/jira/browse/KAFKA-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881193#comment-13881193 ] 

Jun Rao commented on KAFKA-1215:
--------------------------------

Thanks for the patch. Looks good over all. Some comments.

1. KafkaConfig:
1.1 We need a config for default max-rack-replication for auto topic creation.
1.2 rackId: We probably don't want to make this a required property. So, perhaps we can default it to 0?

2. AdminUtils.assignReplicasToBrokers(): 
2.1 Could you add some comments on the rack-aware assignment algorithm?
2.2 It's a bit weird for this method to take zkclient in the input. We probably can pass in a list of Broker objects instead.

3. Unit tests: I suggest that we leave most existing tests intact by keeping the rackId default and add a new test for rack-aware assignment.

4. Compatibility test: It seems that the changes for the broker format in ZK is backward compatible. Could you double check? For example, an old reader (controller, consumer, etc) should be able to parse the broker registered in new format and a new reader should be able to parse the broker registered in the old format. Also, we probably should increase the version in the ZK registration for the broker.

5. Could you rebase to trunk?

> Rack-Aware replica assignment option
> ------------------------------------
>
>                 Key: KAFKA-1215
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1215
>             Project: Kafka
>          Issue Type: Improvement
>          Components: replication
>    Affects Versions: 0.8.0, 0.8.1
>            Reporter: Joris Van Remoortere
>            Assignee: Neha Narkhede
>             Fix For: 0.8.0, 0.8.1
>
>         Attachments: rack_aware_replica_assignment_v1.patch
>
>
> Adding a rack-id to kafka config. This rack-id can be used during replica assignment by using the max-rack-replication argument in the admin scripts (create topic, etc.). By default the original replication assignment algorithm is used because max-rack-replication defaults to -1. max-rack-replication > -1 is not honored if you are doing manual replica assignment (preffered).
> If this looks good I can add some test cases specific to the rack-aware assignment.
> I can also port this to trunk. We are currently running 0.8.0 in production and need this, so i wrote the patch against that.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Re: [jira] [Commented] (KAFKA-1215) Rack-Aware replica assignment option

Posted by Jun Rao <ju...@gmail.com>.
Thanks for the update. This proposal looks reasonable to me.

One issue that needs a bit more thinking is what happens when a broker is
moved from one rack to another. After the move, constraints on
max-rack-replication could be violated. The question is what we should do
when this happens. A less intrusive approach is to let the broker start and
simply log a warning about the violation. Not sure if this is the best
approach though.

Jun


On Sun, Jan 26, 2014 at 10:07 AM, Joris VanRemoortere <
jvanremoortere@tagged.com> wrote:

> Thanks Jun,
>
> As you said, rack-aware assignment needs to be supported during topic
> creation, adding of partitions, and partition re-assignment.
>
> In order to reduce the risk of using a different value of
> max-rack-replication during different calls to those functions, I'd like to
> carry / persist that information as part of the topic config. This way the
> option is mandatory during topic creation, and optional partition
> re-assign.
>
> Here is my proposal:
>
> Topic Create:
>
>    - max-rack-replication mandatory
>    - store max-rack-replication as a topic config
>
> Partition Add:
>
>    - Use the stored value of max-rack-replication to add another partition
>    using the same max-rack-replication value as the rest of the partitions
> in
>    the topic
>
> Partition Re-assign:
>
>    - If max-rack-replication is provided, treat this as an altered config.
>    This forces a re-validation of all replica assignments of all partitions
>    for the topic. If this succeeds, save the updated topic config.
>    - If max-rack-replication is not provided, used the saved value in the
>    topic config.
>
> Alter Topic:
>
>    - If someone over-rides the max-rack-replication value, we re-validate
>    all replica assignments as with partition re-assignment.
>
> I think this strategy will be less error prone, making it impossible to
> have multiple partitions in a topic using different max-rack-replication
> values. Let me know if this sounds ok to you,
>
> Joris
>
>
> On Sun, Jan 26, 2014 at 9:15 AM, Jun Rao <ju...@gmail.com> wrote:
>
> > Replica assignments can change during topic creation, adding partitions,
> > and reassign partitions. So, it would be good to integrate rack-aware
> > assignment in those cases.
> >
> > For compatibility, to do a rolling upgrade of a Kafka cluster means that
> at
> > a given point of time, some brokers could be registered in the old format
> > and some others could be registered in the new format. Both the consumer
> > and the controller (
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Controller+Internals
> > )
> > read the broker registration in ZK. So, we need to make sure that the new
> > code can read the old format and the old code can read the new format
> too.
> > For example, in Broker.createBroker(), if the code is looking for a field
> > "rackid" but couldn't find it, we need to be able to provide a reasonable
> > default.
> >
> > Also, take a look at the format of broker registration in ZK in
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+data+structures+in+Zookeeper
> > .
> > Since we are changing the format, we should probably change the
> > version
> > from 1 to 2. However, we can make this change backward compatible.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Fri, Jan 24, 2014 at 1:57 PM, Joris VanRemoortere <
> > jvanremoortere@tagged.com> wrote:
> >
> > > Working on the above.
> > >
> > > Since trunk currently allows Topic configs, and altering topics, does
> it
> > > make sense to you to make this a topic config so that it is kept in
> sync
> > > between topic alterations and partition adds?
> > >
> > > For compatibility: I can make the zookeeper state parsing backwards /
> > > forwards compatible. I don't think I can make the readFrom / writeTo
> > > ByteBuffer compatible since chained calls of these during api parsing
> are
> > > size dependent? Can you clarify exactly what you expect to be
> compatible?
> > >
> > > I'm also not very familiar with the consequences of changing the broker
> > > version in zookeeper registration. Can you please provide a reference
> to
> > > documentation for this?
> > >
> > > Thanks,
> > >
> > > Joris
> > >
> > >
> > > On Fri, Jan 24, 2014 at 9:43 AM, Jun Rao (JIRA) <ji...@apache.org>
> wrote:
> > >
> > > >
> > > >     [
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/KAFKA-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881193#comment-13881193
> > > ]
> > > >
> > > > Jun Rao commented on KAFKA-1215:
> > > > --------------------------------
> > > >
> > > > Thanks for the patch. Looks good over all. Some comments.
> > > >
> > > > 1. KafkaConfig:
> > > > 1.1 We need a config for default max-rack-replication for auto topic
> > > > creation.
> > > > 1.2 rackId: We probably don't want to make this a required property.
> > So,
> > > > perhaps we can default it to 0?
> > > >
> > > > 2. AdminUtils.assignReplicasToBrokers():
> > > > 2.1 Could you add some comments on the rack-aware assignment
> algorithm?
> > > > 2.2 It's a bit weird for this method to take zkclient in the input.
> We
> > > > probably can pass in a list of Broker objects instead.
> > > >
> > > > 3. Unit tests: I suggest that we leave most existing tests intact by
> > > > keeping the rackId default and add a new test for rack-aware
> > assignment.
> > > >
> > > > 4. Compatibility test: It seems that the changes for the broker
> format
> > in
> > > > ZK is backward compatible. Could you double check? For example, an
> old
> > > > reader (controller, consumer, etc) should be able to parse the broker
> > > > registered in new format and a new reader should be able to parse the
> > > > broker registered in the old format. Also, we probably should
> increase
> > > the
> > > > version in the ZK registration for the broker.
> > > >
> > > > 5. Could you rebase to trunk?
> > > >
> > > > > Rack-Aware replica assignment option
> > > > > ------------------------------------
> > > > >
> > > > >                 Key: KAFKA-1215
> > > > >                 URL:
> > https://issues.apache.org/jira/browse/KAFKA-1215
> > > > >             Project: Kafka
> > > > >          Issue Type: Improvement
> > > > >          Components: replication
> > > > >    Affects Versions: 0.8.0, 0.8.1
> > > > >            Reporter: Joris Van Remoortere
> > > > >            Assignee: Neha Narkhede
> > > > >             Fix For: 0.8.0, 0.8.1
> > > > >
> > > > >         Attachments: rack_aware_replica_assignment_v1.patch
> > > > >
> > > > >
> > > > > Adding a rack-id to kafka config. This rack-id can be used during
> > > > replica assignment by using the max-rack-replication argument in the
> > > admin
> > > > scripts (create topic, etc.). By default the original replication
> > > > assignment algorithm is used because max-rack-replication defaults to
> > -1.
> > > > max-rack-replication > -1 is not honored if you are doing manual
> > replica
> > > > assignment (preffered).
> > > > > If this looks good I can add some test cases specific to the
> > rack-aware
> > > > assignment.
> > > > > I can also port this to trunk. We are currently running 0.8.0 in
> > > > production and need this, so i wrote the patch against that.
> > > >
> > > >
> > > >
> > > > --
> > > > This message was sent by Atlassian JIRA
> > > > (v6.1.5#6160)
> > > >
> > >
> >
>

Re: [jira] [Commented] (KAFKA-1215) Rack-Aware replica assignment option

Posted by Joris VanRemoortere <jv...@tagged.com>.
Thanks Jun,

As you said, rack-aware assignment needs to be supported during topic
creation, adding of partitions, and partition re-assignment.

In order to reduce the risk of using a different value of
max-rack-replication during different calls to those functions, I'd like to
carry / persist that information as part of the topic config. This way the
option is mandatory during topic creation, and optional partition re-assign.

Here is my proposal:

Topic Create:

   - max-rack-replication mandatory
   - store max-rack-replication as a topic config

Partition Add:

   - Use the stored value of max-rack-replication to add another partition
   using the same max-rack-replication value as the rest of the partitions in
   the topic

Partition Re-assign:

   - If max-rack-replication is provided, treat this as an altered config.
   This forces a re-validation of all replica assignments of all partitions
   for the topic. If this succeeds, save the updated topic config.
   - If max-rack-replication is not provided, used the saved value in the
   topic config.

Alter Topic:

   - If someone over-rides the max-rack-replication value, we re-validate
   all replica assignments as with partition re-assignment.

I think this strategy will be less error prone, making it impossible to
have multiple partitions in a topic using different max-rack-replication
values. Let me know if this sounds ok to you,

Joris


On Sun, Jan 26, 2014 at 9:15 AM, Jun Rao <ju...@gmail.com> wrote:

> Replica assignments can change during topic creation, adding partitions,
> and reassign partitions. So, it would be good to integrate rack-aware
> assignment in those cases.
>
> For compatibility, to do a rolling upgrade of a Kafka cluster means that at
> a given point of time, some brokers could be registered in the old format
> and some others could be registered in the new format. Both the consumer
> and the controller (
>
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Controller+Internals
> )
> read the broker registration in ZK. So, we need to make sure that the new
> code can read the old format and the old code can read the new format too.
> For example, in Broker.createBroker(), if the code is looking for a field
> "rackid" but couldn't find it, we need to be able to provide a reasonable
> default.
>
> Also, take a look at the format of broker registration in ZK in
>
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+data+structures+in+Zookeeper
> .
> Since we are changing the format, we should probably change the
> version
> from 1 to 2. However, we can make this change backward compatible.
>
> Thanks,
>
> Jun
>
>
> On Fri, Jan 24, 2014 at 1:57 PM, Joris VanRemoortere <
> jvanremoortere@tagged.com> wrote:
>
> > Working on the above.
> >
> > Since trunk currently allows Topic configs, and altering topics, does it
> > make sense to you to make this a topic config so that it is kept in sync
> > between topic alterations and partition adds?
> >
> > For compatibility: I can make the zookeeper state parsing backwards /
> > forwards compatible. I don't think I can make the readFrom / writeTo
> > ByteBuffer compatible since chained calls of these during api parsing are
> > size dependent? Can you clarify exactly what you expect to be compatible?
> >
> > I'm also not very familiar with the consequences of changing the broker
> > version in zookeeper registration. Can you please provide a reference to
> > documentation for this?
> >
> > Thanks,
> >
> > Joris
> >
> >
> > On Fri, Jan 24, 2014 at 9:43 AM, Jun Rao (JIRA) <ji...@apache.org> wrote:
> >
> > >
> > >     [
> > >
> >
> https://issues.apache.org/jira/browse/KAFKA-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881193#comment-13881193
> > ]
> > >
> > > Jun Rao commented on KAFKA-1215:
> > > --------------------------------
> > >
> > > Thanks for the patch. Looks good over all. Some comments.
> > >
> > > 1. KafkaConfig:
> > > 1.1 We need a config for default max-rack-replication for auto topic
> > > creation.
> > > 1.2 rackId: We probably don't want to make this a required property.
> So,
> > > perhaps we can default it to 0?
> > >
> > > 2. AdminUtils.assignReplicasToBrokers():
> > > 2.1 Could you add some comments on the rack-aware assignment algorithm?
> > > 2.2 It's a bit weird for this method to take zkclient in the input. We
> > > probably can pass in a list of Broker objects instead.
> > >
> > > 3. Unit tests: I suggest that we leave most existing tests intact by
> > > keeping the rackId default and add a new test for rack-aware
> assignment.
> > >
> > > 4. Compatibility test: It seems that the changes for the broker format
> in
> > > ZK is backward compatible. Could you double check? For example, an old
> > > reader (controller, consumer, etc) should be able to parse the broker
> > > registered in new format and a new reader should be able to parse the
> > > broker registered in the old format. Also, we probably should increase
> > the
> > > version in the ZK registration for the broker.
> > >
> > > 5. Could you rebase to trunk?
> > >
> > > > Rack-Aware replica assignment option
> > > > ------------------------------------
> > > >
> > > >                 Key: KAFKA-1215
> > > >                 URL:
> https://issues.apache.org/jira/browse/KAFKA-1215
> > > >             Project: Kafka
> > > >          Issue Type: Improvement
> > > >          Components: replication
> > > >    Affects Versions: 0.8.0, 0.8.1
> > > >            Reporter: Joris Van Remoortere
> > > >            Assignee: Neha Narkhede
> > > >             Fix For: 0.8.0, 0.8.1
> > > >
> > > >         Attachments: rack_aware_replica_assignment_v1.patch
> > > >
> > > >
> > > > Adding a rack-id to kafka config. This rack-id can be used during
> > > replica assignment by using the max-rack-replication argument in the
> > admin
> > > scripts (create topic, etc.). By default the original replication
> > > assignment algorithm is used because max-rack-replication defaults to
> -1.
> > > max-rack-replication > -1 is not honored if you are doing manual
> replica
> > > assignment (preffered).
> > > > If this looks good I can add some test cases specific to the
> rack-aware
> > > assignment.
> > > > I can also port this to trunk. We are currently running 0.8.0 in
> > > production and need this, so i wrote the patch against that.
> > >
> > >
> > >
> > > --
> > > This message was sent by Atlassian JIRA
> > > (v6.1.5#6160)
> > >
> >
>

Re: [jira] [Commented] (KAFKA-1215) Rack-Aware replica assignment option

Posted by Jun Rao <ju...@gmail.com>.
Replica assignments can change during topic creation, adding partitions,
and reassign partitions. So, it would be good to integrate rack-aware
assignment in those cases.

For compatibility, to do a rolling upgrade of a Kafka cluster means that at
a given point of time, some brokers could be registered in the old format
and some others could be registered in the new format. Both the consumer
and the controller (
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Controller+Internals)
read the broker registration in ZK. So, we need to make sure that the new
code can read the old format and the old code can read the new format too.
For example, in Broker.createBroker(), if the code is looking for a field
"rackid" but couldn't find it, we need to be able to provide a reasonable
default.

Also, take a look at the format of broker registration in ZK in
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+data+structures+in+Zookeeper.
Since we are changing the format, we should probably change the
version
from 1 to 2. However, we can make this change backward compatible.

Thanks,

Jun


On Fri, Jan 24, 2014 at 1:57 PM, Joris VanRemoortere <
jvanremoortere@tagged.com> wrote:

> Working on the above.
>
> Since trunk currently allows Topic configs, and altering topics, does it
> make sense to you to make this a topic config so that it is kept in sync
> between topic alterations and partition adds?
>
> For compatibility: I can make the zookeeper state parsing backwards /
> forwards compatible. I don't think I can make the readFrom / writeTo
> ByteBuffer compatible since chained calls of these during api parsing are
> size dependent? Can you clarify exactly what you expect to be compatible?
>
> I'm also not very familiar with the consequences of changing the broker
> version in zookeeper registration. Can you please provide a reference to
> documentation for this?
>
> Thanks,
>
> Joris
>
>
> On Fri, Jan 24, 2014 at 9:43 AM, Jun Rao (JIRA) <ji...@apache.org> wrote:
>
> >
> >     [
> >
> https://issues.apache.org/jira/browse/KAFKA-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881193#comment-13881193
> ]
> >
> > Jun Rao commented on KAFKA-1215:
> > --------------------------------
> >
> > Thanks for the patch. Looks good over all. Some comments.
> >
> > 1. KafkaConfig:
> > 1.1 We need a config for default max-rack-replication for auto topic
> > creation.
> > 1.2 rackId: We probably don't want to make this a required property. So,
> > perhaps we can default it to 0?
> >
> > 2. AdminUtils.assignReplicasToBrokers():
> > 2.1 Could you add some comments on the rack-aware assignment algorithm?
> > 2.2 It's a bit weird for this method to take zkclient in the input. We
> > probably can pass in a list of Broker objects instead.
> >
> > 3. Unit tests: I suggest that we leave most existing tests intact by
> > keeping the rackId default and add a new test for rack-aware assignment.
> >
> > 4. Compatibility test: It seems that the changes for the broker format in
> > ZK is backward compatible. Could you double check? For example, an old
> > reader (controller, consumer, etc) should be able to parse the broker
> > registered in new format and a new reader should be able to parse the
> > broker registered in the old format. Also, we probably should increase
> the
> > version in the ZK registration for the broker.
> >
> > 5. Could you rebase to trunk?
> >
> > > Rack-Aware replica assignment option
> > > ------------------------------------
> > >
> > >                 Key: KAFKA-1215
> > >                 URL: https://issues.apache.org/jira/browse/KAFKA-1215
> > >             Project: Kafka
> > >          Issue Type: Improvement
> > >          Components: replication
> > >    Affects Versions: 0.8.0, 0.8.1
> > >            Reporter: Joris Van Remoortere
> > >            Assignee: Neha Narkhede
> > >             Fix For: 0.8.0, 0.8.1
> > >
> > >         Attachments: rack_aware_replica_assignment_v1.patch
> > >
> > >
> > > Adding a rack-id to kafka config. This rack-id can be used during
> > replica assignment by using the max-rack-replication argument in the
> admin
> > scripts (create topic, etc.). By default the original replication
> > assignment algorithm is used because max-rack-replication defaults to -1.
> > max-rack-replication > -1 is not honored if you are doing manual replica
> > assignment (preffered).
> > > If this looks good I can add some test cases specific to the rack-aware
> > assignment.
> > > I can also port this to trunk. We are currently running 0.8.0 in
> > production and need this, so i wrote the patch against that.
> >
> >
> >
> > --
> > This message was sent by Atlassian JIRA
> > (v6.1.5#6160)
> >
>

Re: [jira] [Commented] (KAFKA-1215) Rack-Aware replica assignment option

Posted by Joris VanRemoortere <jv...@tagged.com>.
Working on the above.

Since trunk currently allows Topic configs, and altering topics, does it
make sense to you to make this a topic config so that it is kept in sync
between topic alterations and partition adds?

For compatibility: I can make the zookeeper state parsing backwards /
forwards compatible. I don't think I can make the readFrom / writeTo
ByteBuffer compatible since chained calls of these during api parsing are
size dependent? Can you clarify exactly what you expect to be compatible?

I'm also not very familiar with the consequences of changing the broker
version in zookeeper registration. Can you please provide a reference to
documentation for this?

Thanks,

Joris


On Fri, Jan 24, 2014 at 9:43 AM, Jun Rao (JIRA) <ji...@apache.org> wrote:

>
>     [
> https://issues.apache.org/jira/browse/KAFKA-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881193#comment-13881193]
>
> Jun Rao commented on KAFKA-1215:
> --------------------------------
>
> Thanks for the patch. Looks good over all. Some comments.
>
> 1. KafkaConfig:
> 1.1 We need a config for default max-rack-replication for auto topic
> creation.
> 1.2 rackId: We probably don't want to make this a required property. So,
> perhaps we can default it to 0?
>
> 2. AdminUtils.assignReplicasToBrokers():
> 2.1 Could you add some comments on the rack-aware assignment algorithm?
> 2.2 It's a bit weird for this method to take zkclient in the input. We
> probably can pass in a list of Broker objects instead.
>
> 3. Unit tests: I suggest that we leave most existing tests intact by
> keeping the rackId default and add a new test for rack-aware assignment.
>
> 4. Compatibility test: It seems that the changes for the broker format in
> ZK is backward compatible. Could you double check? For example, an old
> reader (controller, consumer, etc) should be able to parse the broker
> registered in new format and a new reader should be able to parse the
> broker registered in the old format. Also, we probably should increase the
> version in the ZK registration for the broker.
>
> 5. Could you rebase to trunk?
>
> > Rack-Aware replica assignment option
> > ------------------------------------
> >
> >                 Key: KAFKA-1215
> >                 URL: https://issues.apache.org/jira/browse/KAFKA-1215
> >             Project: Kafka
> >          Issue Type: Improvement
> >          Components: replication
> >    Affects Versions: 0.8.0, 0.8.1
> >            Reporter: Joris Van Remoortere
> >            Assignee: Neha Narkhede
> >             Fix For: 0.8.0, 0.8.1
> >
> >         Attachments: rack_aware_replica_assignment_v1.patch
> >
> >
> > Adding a rack-id to kafka config. This rack-id can be used during
> replica assignment by using the max-rack-replication argument in the admin
> scripts (create topic, etc.). By default the original replication
> assignment algorithm is used because max-rack-replication defaults to -1.
> max-rack-replication > -1 is not honored if you are doing manual replica
> assignment (preffered).
> > If this looks good I can add some test cases specific to the rack-aware
> assignment.
> > I can also port this to trunk. We are currently running 0.8.0 in
> production and need this, so i wrote the patch against that.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.1.5#6160)
>