Posted to dev@kafka.apache.org by Allen Wang <al...@gmail.com> on 2015/09/28 21:15:33 UTC

[DISCUSS] KIP-36 - Rack aware replica assignment

Hello Kafka Developers,

I just created KIP-36 for rack aware replica assignment.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment

The goal is to utilize the isolation provided by the racks in a data center
and distribute replicas across racks to provide fault tolerance.

Comments are welcome.

Thanks,
Allen

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Grant Henke <gh...@cloudera.com>.
NULLABLE_STRING was just committed to trunk:
https://github.com/apache/kafka/pull/866

Also we should pull in this PR before making changes to UpdateMetadata:
https://github.com/apache/kafka/pull/896

Thanks,
Grant

-- 
Grant Henke
Software Engineer | Cloudera
grant@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Joel Koshy <jj...@gmail.com>.
We are adding a NULLABLE_STRING type (KAFKA-3088) but you would then need
to evolve the UpdateMetadata request. Regardless, it seems better to just
go with an empty string.
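
For context, a nullable string on the wire can be encoded as a 16-bit
big-endian length prefix followed by UTF-8 bytes, with a length of -1
standing for null. Below is a minimal Python sketch of that idea. It is not
the Kafka implementation, and the function names are made up for
illustration.

```python
import struct
from typing import Optional, Tuple


def encode_nullable_string(value: Optional[str]) -> bytes:
    """Encode a string as int16 length + UTF-8 bytes; None becomes length -1."""
    if value is None:
        return struct.pack(">h", -1)
    data = value.encode("utf-8")
    if len(data) > 0x7FFF:
        raise ValueError("string too long for a 16-bit length prefix")
    return struct.pack(">h", len(data)) + data


def decode_nullable_string(buf: bytes, offset: int = 0) -> Tuple[Optional[str], int]:
    """Return (value, next_offset); a negative length decodes to None."""
    (length,) = struct.unpack_from(">h", buf, offset)
    offset += 2
    if length < 0:
        return None, offset
    return buf[offset:offset + length].decode("utf-8"), offset + length
```

With this kind of encoding, null and the empty string stay distinguishable
on the wire, which is the property the empty-string approach gives up.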

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Allen Wang <al...@gmail.com>.
In implementing changes to UpdateMetadataRequest, I noticed
that org.apache.kafka.common.protocol.types.STRING does not allow a null
value. This creates a problem for rack, as it is an optional field for a
broker. In Scala, it is declared as Option[String]. I was planning to
transmit the rack as null in the protocol if rack is not configured for the
broker.

There are two options:

- Transmit the rack as an empty string if rack is not configured for the
broker. This implies that an empty string cannot be used as a rack, so we
need to validate this. This is reasonable since an empty string for the rack
is most likely a user error, and I cannot think of a use case where users
would pick an empty string as the rack. It does create some inconsistency
between what gets transmitted on the wire and the actual value in broker
runtime.

- Change STRING to allow null. I think that is also reasonable since
ApiUtils.writeShortString and ApiUtils.readShortString APIs support null.
However, I would like to know if there is any particular reason not to
allow null for STRING.
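
The first option can be sketched roughly as follows, in Python for brevity.
This is only an illustration of the mapping between an optional rack and a
non-nullable wire string. The function names are hypothetical and do not
correspond to anything in the Kafka code base.

```python
from typing import Optional


def validate_rack_config(rack: Optional[str]) -> Optional[str]:
    """Reject an explicitly configured empty rack, since "" is reserved
    on the wire to mean "no rack configured"."""
    if rack == "":
        raise ValueError("rack must not be an empty string")
    return rack


def rack_to_wire(rack: Optional[str]) -> str:
    """Map an optional rack to a non-nullable string for transmission."""
    rack = validate_rack_config(rack)
    return "" if rack is None else rack


def rack_from_wire(value: str) -> Optional[str]:
    """Map the transmitted string back to an optional rack."""
    return value if value != "" else None
```

The validation step is what keeps the round trip lossless: without it, a
broker configured with an empty rack would come back as having no rack.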

Any opinions?

Thanks,
Allen


Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Allen Wang <al...@gmail.com>.
Hi Arun,

This is about making replica assignment rack aware. It is not about making
replica assignment algorithm pluggable. I think plug-ability should be
discussed separately from this KIP.

Thanks,
Allen


Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Arun Mahadevan <ar...@apache.org>.
Nice feature. Is this going to support only rack aware assignments?

It may be nice to make the implementation pluggable (with rack aware being one) so that other kinds of assignment algorithms can be plugged in in the future.

- Arun



Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Allen Wang <al...@gmail.com>.
Thanks Ismael. KIP is updated to use 0.9.0.0 and add link to the JIRA.


Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Ismael Juma <is...@juma.me.uk>.
On Thu, Jan 14, 2016 at 1:24 AM, Allen Wang <al...@gmail.com> wrote:

> Updated KIP regarding how broker JSON version will be handled and new
> procedure of upgrade.


Thanks Allen. In the following text, I think we should replace 0.9.0 with
0.9.0.0:

"Due to a bug introduced in 0.9.0 in ZkUtils.getBrokerInfo(), old clients
will throw an exception when it sees the broker JSON version is not 1 or 2.
Therefore, *a minor release 0.9.0.1 is required* to fix the problem first
so that old clients can parse future version of broker JSON in ZooKeeper.
That means 0.9.0 clients must be upgraded to 0.9.0.1 before 0.9.1 upgrade
can start. In addition, since ZkUtils.getBrokerInfo() is also used by
broker, version specific code has to be used when registering broker with
ZooKeeper"
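
To illustrate the compatibility problem described in the quoted text, here
is a small Python sketch contrasting a parser that rejects unknown versions
with one that simply reads the fields it knows about. The JSON layout is
trimmed down to a few fields, and both function names are hypothetical. It
is not the actual change in the PR.

```python
import json
from typing import Any, Dict


def parse_broker_strict(raw: str) -> Dict[str, Any]:
    """Mimics the problematic behavior: any version other than 1 or 2 fails."""
    info = json.loads(raw)
    if info["version"] not in (1, 2):
        raise ValueError("unknown broker JSON version: %s" % info["version"])
    return {"host": info["host"], "port": info["port"]}


def parse_broker_lenient(raw: str) -> Dict[str, Any]:
    """Reads known fields and ignores the version and any unknown fields,
    so newer JSON that only adds fields still parses."""
    info = json.loads(raw)
    return {"host": info["host"], "port": info["port"], "rack": info.get("rack")}
```

As long as newer versions only add fields, the lenient form keeps working,
while the strict form breaks on the first version bump.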

Also, I posted a PR for supporting version > 2 in 0.9.0.1 and trunk:

https://github.com/apache/kafka/pull/773

Ismael

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Allen Wang <al...@gmail.com>.
Updated KIP regarding how broker JSON version will be handled and new
procedure of upgrade.


On Wed, Jan 13, 2016 at 10:58 AM, Allen Wang <al...@gmail.com> wrote:

> Hi Jun,
>
> I feel it is a bit complicated and unconventional to have a major release
> have dependency on a minor release. It would also double the cost of
> releasing.
>
> On the other hand, if we want to skip the minor release and fix
> ZkUtils.getBrokerInfo() in 0.9.1, it would require all 0.9.0 clients to
> upgrade first to 0.9.1 before upgrading the brokers, which is also a bit
> unconventional. It is also problematic since 0.9.1 client will expect rack
> field (even though it may be encoded as null) when it tries to deserialize
> TopicMetadataResponse.
>
> So it looks like we have to have a minor release first.
>
> Thanks,
> Allen
>
>
>
> On Tue, Jan 12, 2016 at 6:27 PM, Jun Rao <ju...@confluent.io> wrote:
>
>> Allen,
>>
>> It's not ideal to add a new field in json without increasing the version.
>> Also, if we don't fix this issue in 0.9.0, if we ever change the version
>> of
>> json in the future, the consumer in 0.9.0 will break after the broker is
>> upgraded to the new release. So, I suggest that we fix the behavior in
>> ZkUtils.getBrokerInfo()
>> in both trunk and 0.9.0 branch. After we release 0.9.0.1, the upgrade path
>> is for the old consumer to be upgraded to 0.9.0.1 before upgrading the
>> broker to 0.9.1 and beyond. This fix can be done in a separate jira.
>>
>> Thanks,
>>
>> Jun
>>
>> On Tue, Jan 12, 2016 at 5:35 PM, Allen Wang <al...@gmail.com> wrote:
>>
>> > Agreed. So it seems that for 0.9.1, the only option is to keep the JSON
>> > version unchanged. But as part of the PR, I can change the behavior of
>> > ZkUtils.getBrokerInfo()
>> > to make it compatible with future JSON versions.
>> >
>> > Thanks,
>> > Allen
>> >
>> >
>> > On Tue, Jan 12, 2016 at 2:57 PM, Jun Rao <ju...@confluent.io> wrote:
>> >
>> > > Hi, Allen,
>> > >
>> > > That's a good point. In 0.9.0.0, the old consumer reads broker info
>> > > directly from ZK and the code throws an exception if the version in
>> json
>> > is
>> > > not 1 or 2. This old consumer will break when we upgrade the broker
>> json
>> > to
>> > > version 3 in ZK in 0.9.1, which will be an issue. We overlooked this
>> > issue
>> > > in 0.9.0.0. The easiest fix is probably not to check the version in
>> > > ZkUtils.getBrokerInfo().
>> > > This way, as long as we are only adding new fields in broker json, we
>> can
>> > > preserve the compatibility.
>> > >
>> > > Thanks,
>> > >
>> > > Jun
>> > >
>> > > On Tue, Jan 12, 2016 at 1:52 PM, Allen Wang <al...@gmail.com>
>> > wrote:
>> > >
>> > > > Hi Jun,
>> > > >
>> > > > That's a good suggestion. However, it does not solve the problem for
>> > the
>> > > > clients or third-party tools that get broker information directly
>> from
>> > > > ZooKeeper.
>> > > >
>> > > > Thanks,
>> > > > Allen
>> > > >
>> > > >
>> > > > On Tue, Jan 12, 2016 at 1:29 PM, Jun Rao <ju...@confluent.io> wrote:
>> > > >
>> > > > > Allen,
>> > > > >
>> > > > > Another way to do this is the following.
>> > > > >
>> > > > > When inter.broker.protocol.version is set to 0.9.0, the broker
>> will
>> > > write
>> > > > > the broker info in ZK using version 2, ignoring the rack info.
>> > > > >
>> > > > > When inter.broker.protocol.version is set to 0.9.1, the broker
>> will
>> > > write
>> > > > > the broker info in ZK using version 3, including the rack info.
>> > > > >
>> > > > > If one follows the upgrade process, after the 2nd round of rolling
>> > > > bounces,
>> > > > > every broker is capable of parsing version 3 of broker info in ZK.
>> > This
>> > > > is
>> > > > > when the rack-aware feature will be used.
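
The scheme quoted above could look roughly like this in Python. It is a
sketch under simplifying assumptions: the registration is reduced to a few
fields, and the helper names are hypothetical.

```python
from typing import Any, Dict, Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as "0.9.1" into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def broker_registration(host: str, port: int, rack: Optional[str],
                        inter_broker_protocol_version: str) -> Dict[str, Any]:
    """Choose the broker JSON version from the configured protocol version."""
    info: Dict[str, Any] = {"host": host, "port": port}
    if parse_version(inter_broker_protocol_version) >= (0, 9, 1):
        info["version"] = 3
        info["rack"] = rack  # rack info is only written with version 3
    else:
        info["version"] = 2  # old protocol: rack info is left out
    return info
```

During the first round of a rolling bounce every broker would still write
version 2, so brokers that have not yet been upgraded never see version 3.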
>> > > > >
>> > > > >
>> > > > > Thanks,
>> > > > >
>> > > > > Jun
>> > > > >
>> > > > > On Tue, Jan 12, 2016 at 12:19 PM, Allen Wang <
>> allenxwang@gmail.com>
>> > > > wrote:
>> > > > >
>> > > > > > Regarding the JSON version of Broker:
>> > > > > >
>> > > > > > I don't know why ZkUtils.getBrokerInfo() restricts the JSON
>> versions
>> > > it
>> > > > > can
>> > > > > > read. It will throw exception if version is not 1 or 2. Seems
>> to me
>> > > > that
>> > > > > it
>> > > > > > will cause compatibility problem whenever the version needs to
>> be
>> > > > changed
>> > > > > > and make the upgrade path difficult.
>> > > > > >
>> > > > > > One option we have is to make rack also part of version 2 and
>> keep
>> > > the
>> > > > > > version 2 unchanged for this update. This will make the old
>> clients
>> > > > > > compatible. During rolling upgrade, it will also avoid problems
>> if
>> > > the
>> > > > > > controller/broker is still the old version.
>> > > > > >
>> > > > > > However, ZkUtils.getBrokerInfo() will be updated to return the
>> > Broker
>> > > > > with
>> > > > > > rack so the rack information will be available once the
>> > server/client
>> > > > is
>> > > > > > upgraded to the latest version.
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > On Wed, Jan 6, 2016 at 6:28 PM, Allen Wang <
>> allenxwang@gmail.com>
>> > > > wrote:
>> > > > > >
>> > > > > > > Updated KIP according to Jun's comment and included changes to
>> > TMR.
>> > > > > > >
>> > > > > > > On Tue, Jan 5, 2016 at 5:59 PM, Jun Rao <ju...@confluent.io>
>> > wrote:
>> > > > > > >
>> > > > > > >> Hi, Allen,
>> > > > > > >>
>> > > > > > >> A couple of minor comments on the KIP.
>> > > > > > >>
>> > > > > > >> 1. The version of the broker JSON string says 2. It should
>> be 3.
>> > > > > > >>
>> > > > > > >> 2. The new version of UpdateMetadataRequest should be 2,
>> instead
>> > > of
>> > > > 1.
>> > > > > > >> Could you include the full wire protocol of version 2 of
>> > > > > > >> UpdateMetadataRequest and highlight the changed part?
>> > > > > > >>
>> > > > > > >> Thanks,
>> > > > > > >>
>> > > > > > >> Jun
>> > > > > > >>
>> > > > > > >> On Tue, Jan 5, 2016 at 3:11 PM, Allen Wang <
>> > allenxwang@gmail.com>
>> > > > > > wrote:
>> > > > > > >>
>> > > > > > >> > Jun and I had a chance to discuss it in a meeting and it is
>> > > agreed
>> > > > > to
>> > > > > > >> > change the TMR in a different patch.
>> > > > > > >> >
>> > > > > > >> > I can change the KIP to include rack in TMR. The essential
>> > > change
>> > > > is
>> > > > > > to
>> > > > > > >> add
>> > > > > > >> > rack into class BrokerEndPoint and make TMR version aware.
>> > > > > > >> >
>> > > > > > >> >
>> > > > > > >> >
>> > > > > > >> > On Tue, Jan 5, 2016 at 10:21 AM, Aditya Auradkar <
>> > > > > > >> > aauradkar@linkedin.com.invalid> wrote:
>> > > > > > >> >
>> > > > > > >> > > Jun/Allen -
>> > > > > > >> > >
>> > > > > > >> > > Did we ever actually agree on whether we should evolve
>> the
>> > TMR
>> > > > to
>> > > > > > >> include
>> > > > > > >> > > rack info or not?
>> > > > > > >> > > I don't feel strongly about it but if it's the right
>> thing
>> > > to
>> > > > do
>> > > > > > we
>> > > > > > >> > > should probably do it in this KIP (can be a separate
>> > patch)..
>> > > it
>> > > > > > >> isn't a
>> > > > > > >> > > large change.
>> > > > > > >> > >
>> > > > > > >> > > Aditya
>> > > > > > >> > >
>> > > > > > >> > > On Sat, Dec 26, 2015 at 3:01 PM, Allen Wang <
>> > > > allenxwang@gmail.com
>> > > > > >
>> > > > > > >> > wrote:
>> > > > > > >> > >
>> > > > > > >> > > > Added the rolling upgrade instruction in the KIP,
>> similar
>> > to
>> > > > > those
>> > > > > > >> in
>> > > > > > >> > > 0.9.0
>> > > > > > >> > > > release notes.
>> > > > > > >> > > >
>> > > > > > >> > > > On Wed, Dec 16, 2015 at 11:32 AM, Allen Wang <
>> > > > > > allenxwang@gmail.com>
>> > > > > > >> > > wrote:
>> > > > > > >> > > >
>> > > > > > >> > > > > Hi Jun,
>> > > > > > >> > > > >
>> > > > > > >> > > > > The reason that TopicMetadataResponse is not
>> included in
>> > > the
>> > > > > KIP
>> > > > > > >> is
>> > > > > > >> > > that
>> > > > > > >> > > > > it currently is not version aware . So we need to
>> > > introduce
>> > > > > > >> version
>> > > > > > >> > to
>> > > > > > >> > > it
>> > > > > > >> > > > > in order to ensure backward compatibility. It
>> seems
>> > to
>> > > > me a
>> > > > > > big
>> > > > > > >> > > > change.
>> > > > > > >> > > > > Do we want to couple it with this KIP? Do we need to
>> > > further
>> > > > > > >> discuss
>> > > > > > >> > > what
>> > > > > > >> > > > > information to include in the new version besides
>> rack?
>> > > For
>> > > > > > >> example,
>> > > > > > >> > > > should
>> > > > > > >> > > > > we include broker security protocol in
>> > > > TopicMetadataResponse?
>> > > > > > >> > > > >
>> > > > > > >> > > > > The other option is to make it a separate KIP to make
>> > > > > > >> > > > > TopicMetadataResponse version aware and decide what
>> to
>> > > > > include,
>> > > > > > >> and
>> > > > > > >> > > make
>> > > > > > >> > > > > this KIP focus on the rack aware algorithm, admin
>> tools
>> > > and
>> > > > > > >> related
>> > > > > > >> > > > > changes to inter-broker protocol .
>> > > > > > >> > > > >
>> > > > > > >> > > > > Thanks,
>> > > > > > >> > > > > Allen
>> > > > > > >> > > > >
>> > > > > > >> > > > >
>> > > > > > >> > > > >
>> > > > > > >> > > > >
>> > > > > > >> > > > > On Mon, Dec 14, 2015 at 8:30 AM, Jun Rao <
>> > > jun@confluent.io>
>> > > > > > >> wrote:
>> > > > > > >> > > > >
>> > > > > > >> > > > >> Allen,
>> > > > > > >> > > > >>
>> > > > > > >> > > > >> Thanks for the proposal. A few comments.
>> > > > > > >> > > > >>
>> > > > > > >> > > > >> 1. Since this KIP changes the inter broker
>> > communication
>> > > > > > protocol
>> > > > > > >> > > > >> (UpdateMetadataRequest), we will need to document
>> the
>> > > > upgrade
>> > > > > > >> path
>> > > > > > >> > > > >> (similar
>> > > > > > >> > > > >> to what's described in
>> > > > > > >> > > > >>
>> http://kafka.apache.org/090/documentation.html#upgrade
>> > ).
>> > > > > > >> > > > >>
>> > > > > > >> > > > >> 2. It might be useful to include the rack info of
>> the
>> > > > broker
>> > > > > in
>> > > > > > >> > > > >> TopicMetadataResponse. This can be useful for
>> > > > administrative
>> > > > > > >> tasks,
>> > > > > > >> > as
>> > > > > > >> > > > >> well
>> > > > > > >> > > > >> as read affinity in the future.
>> > > > > > >> > > > >>
>> > > > > > >> > > > >> Jun
>> > > > > > >> > > > >>
>> > > > > > >> > > > >>
>> > > > > > >> > > > >>
>> > > > > > >> > > > >> On Thu, Dec 10, 2015 at 9:38 AM, Allen Wang <
>> > > > > > >> allenxwang@gmail.com>
>> > > > > > >> > > > wrote:
>> > > > > > >> > > > >>
>> > > > > > >> > > > >> > If there are no more comments I would like to call
>> > for
>> > > a
>> > > > > > vote.
>> > > > > > >> > > > >> >
>> > > > > > >> > > > >> >
>> > > > > > >> > > > >> > On Sun, Nov 15, 2015 at 10:08 PM, Allen Wang <
>> > > > > > >> > allenxwang@gmail.com>
>> > > > > > >> > > > >> wrote:
>> > > > > > >> > > > >> >
>> > > > > > >> > > > >> > > KIP is updated with more details and how to
>> handle
>> > > the
>> > > > > > >> situation
>> > > > > > >> > > > where
>> > > > > > >> > > > >> > > rack information is incomplete.
>> > > > > > >> > > > >> > >
>> > > > > > >> > > > >> > > In the situation where rack information is
>> > > incomplete,
>> > > > > but
>> > > > > > we
>> > > > > > >> > want
>> > > > > > >> > > > to
>> > > > > > >> > > > >> > > continue with the assignment, I have suggested
>> to
>> > > > ignore
>> > > > > > all
>> > > > > > >> > rack
>> > > > > > >> > > > >> > > information and fallback to original algorithm.
>> The
>> > > > > reason
>> > > > > > is
>> > > > > > >> > > > >> explained
>> > > > > > >> > > > >> > > below:
>> > > > > > >> > > > >> > >
>> > > > > > >> > > > >> > > The other options are to assume that the broker
>> > > without
>> > > > > the
>> > > > > > >> rack
>> > > > > > >> > > > >> belong
>> > > > > > >> > > > >> > to
>> > > > > > >> > > > >> > > its own unique rack, or they belong to one
>> > "default"
>> > > > > rack.
>> > > > > > >> > Either
>> > > > > > >> > > > way
>> > > > > > >> > > > >> we
>> > > > > > >> > > > >> > > choose, it is highly likely to result in uneven
>> > > number
>> > > > of
>> > > > > > >> > brokers
>> > > > > > >> > > in
>> > > > > > >> > > > >> > racks,
>> > > > > > >> > > > >> > > and it is quite possible that the "made up"
>> racks
>> > > will
>> > > > > have
>> > > > > > >> much
>> > > > > > >> > > > fewer
>> > > > > > >> > > > >> > > number of brokers. As I explained in the KIP,
>> > uneven
>> > > > > number
>> > > > > > >> of
>> > > > > > >> > > > >> brokers in
>> > > > > > >> > > > >> > > racks will lead to uneven distribution of
>> replicas
>> > > > among
>> > > > > > >> brokers
>> > > > > > >> > > > (even
>> > > > > > >> > > > >> > > though the leader distribution is still even).
>> The
>> > > > > brokers
>> > > > > > in
>> > > > > > >> > the
>> > > > > > >> > > > rack
>> > > > > > >> > > > >> > that
>> > > > > > >> > > > >> > > has fewer number of brokers will get more
>> replicas
>> > > per
>> > > > > > broker
>> > > > > > >> > than
>> > > > > > >> > > > >> > brokers
>> > > > > > >> > > > >> > > in other racks.
>> > > > > > >> > > > >> > >
>> > > > > > >> > > > >> > > Given this fact and the replica assignment
>> produced
>> > > > will
>> > > > > be
>> > > > > > >> > > > incorrect
>> > > > > > >> > > > >> > > anyway from rack aware point of view, ignoring
>> all
>> > > rack
>> > > > > > >> > > information
>> > > > > > >> > > > >> and
>> > > > > > >> > > > >> > > fallback to the original algorithm is not a bad
>> > > choice
>> > > > > > since
>> > > > > > >> it
>> > > > > > >> > > will
>> > > > > > >> > > > >> at
>> > > > > > >> > > > >> > > least have a better guarantee of replica
>> > > distribution.
>> > > > > > >> > > > >> > >
>> > > > > > >> > > > >> > > Also for command line tools it gives user a
>> choice
>> > if
>> > > > for
>> > > > > > any
>> > > > > > >> > > reason
>> > > > > > >> > > > >> they
>> > > > > > >> > > > >> > > want to ignore rack information and fallback to
>> the
>> > > > > > original
>> > > > > > >> > > > >> algorithm.
>> > > > > > >> > > > >> > >
>> > > > > > >> > > > >> > >
>> > > > > > >> > > > >> > > On Tue, Nov 10, 2015 at 9:04 AM, Allen Wang <
>> > > > > > >> > allenxwang@gmail.com
>> > > > > > >> > > >
>> > > > > > >> > > > >> > wrote:
>> > > > > > >> > > > >> > >
>> > > > > > >> > > > >> > >> I am busy with some time pressing issues for
>> the
>> > > last
>> > > > > few
>> > > > > > >> > days. I
>> > > > > > >> > > > >> will
>> > > > > > >> > > > >> > >> think about how the incomplete rack information
>> > will
>> > > > > > affect
>> > > > > > >> the
>> > > > > > >> > > > >> balance
>> > > > > > >> > > > >> > and
>> > > > > > >> > > > >> > >> update the KIP by early next week.
>> > > > > > >> > > > >> > >>
>> > > > > > >> > > > >> > >> Thanks,
>> > > > > > >> > > > >> > >> Allen
>> > > > > > >> > > > >> > >>
>> > > > > > >> > > > >> > >>
>> > > > > > >> > > > >> > >> On Tue, Nov 3, 2015 at 9:03 AM, Neha Narkhede <neha@confluent.io> wrote:
>> > > > > > >> > > > >> > >>
>> > > > > > >> > > > >> > >>> A few suggestions on improving the KIP:
>> > > > > > >> > > > >> > >>>
>> > > > > > >> > > > >> > >>> *If some brokers have rack, and some do not, the algorithm will throw an
>> > > > > > >> > > > >> > >>> exception. This is to prevent incorrect assignment caused by user error.*
>> > > > > > >> > > > >> > >>>
>> > > > > > >> > > > >> > >>> In the KIP, can you clearly state the user-facing behavior when some
>> > > > > > >> > > > >> > >>> brokers have rack information and some don't? Which actions and requests
>> > > > > > >> > > > >> > >>> will error out, and how?
>> > > > > > >> > > > >> > >>>
>> > > > > > >> > > > >> > >>> *Even distribution of partition leadership among brokers*
>> > > > > > >> > > > >> > >>>
>> > > > > > >> > > > >> > >>> There is some information about arranging the sorted broker list
>> > > > > > >> > > > >> > >>> interlaced with rack ids. Can you describe the changes to the current
>> > > > > > >> > > > >> > >>> algorithm in a little more detail? How does this interlacing work if only
>> > > > > > >> > > > >> > >>> a subset of brokers have the rack id configured? Does this still work if
>> > > > > > >> > > > >> > >>> an uneven number of brokers is assigned to each rack? It might work; I'm
>> > > > > > >> > > > >> > >>> looking for more details on the changes, since they will affect the
>> > > > > > >> > > > >> > >>> behavior seen by the user: imbalance on the leaders, the data, or both.
>> > > > > > >> > > > >> > >>>
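One plausible reading of the "interlaced" broker list asked about above can be sketched as follows. This is only an illustration, not the KIP's actual algorithm: the function name interleave_by_rack and the dict-based input are assumptions made for the example. It also shows what happens when racks hold uneven numbers of brokers.

```python
from collections import defaultdict
from itertools import zip_longest


def interleave_by_rack(broker_racks):
    """Order broker ids so that consecutive entries rotate across racks.

    broker_racks is a hypothetical mapping of broker id -> rack name.
    """
    by_rack = defaultdict(list)
    for broker_id in sorted(broker_racks):
        by_rack[broker_racks[broker_id]].append(broker_id)

    ordered = []
    # Take one broker from each rack per round; shorter racks run out early.
    for group in zip_longest(*(by_rack[rack] for rack in sorted(by_rack))):
        ordered.extend(b for b in group if b is not None)
    return ordered


even = interleave_by_rack({0: "a", 1: "a", 2: "b", 3: "b", 4: "c", 5: "c"})
uneven = interleave_by_rack({0: "a", 1: "a", 2: "a", 3: "b"})
print(even)    # rotation a, b, c, a, b, c
print(uneven)  # the tail of the list falls back to a single rack
```

With uneven racks the tail of the list contains neighbours from the same rack, which is where the leader or data imbalance raised above would come from.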
>> > > > > > >> > > > >> > >>> On Mon, Nov 2, 2015 at 6:39 PM, Aditya Auradkar <aauradkar@linkedin.com> wrote:
>> > > > > > >> > > > >> > >>>
>> > > > > > >> > > > >> > >>> > I think this sounds reasonable. Anyone else have comments?
>> > > > > > >> > > > >> > >>> >
>> > > > > > >> > > > >> > >>> > Aditya
>> > > > > > >> > > > >> > >>> >
>> > > > > > >> > > > >> > >>> > On Tue, Oct 27, 2015 at 5:23 PM, Allen Wang <allenxwang@gmail.com> wrote:
>> > > > > > >> > > > >> > >>> >
>> > > > > > >> > > > >> > >>> > > During the discussion in the hangout, it was mentioned that it would be
>> > > > > > >> > > > >> > >>> > > desirable for consumers to know the rack information of the brokers so
>> > > > > > >> > > > >> > >>> > > that they can consume from a broker in the same rack to reduce latency.
>> > > > > > >> > > > >> > >>> > > As I understand it, this will only be beneficial if a consumer can
>> > > > > > >> > > > >> > >>> > > consume from any broker in the ISR, which is not possible now.
>> > > > > > >> > > > >> > >>> > >
>> > > > > > >> > > > >> > >>> > > I suggest we skip the change to TMR. Once the consumer is changed to be
>> > > > > > >> > > > >> > >>> > > able to consume from any broker in the ISR, the rack information can be
>> > > > > > >> > > > >> > >>> > > added to TMR.
>> > > > > > >> > > > >> > >>> > >
>> > > > > > >> > > > >> > >>> > > Another thing I want to confirm is the command line behavior. I think the
>> > > > > > >> > > > >> > >>> > > desirable default behavior is to fail fast on the command line for an
>> > > > > > >> > > > >> > >>> > > incomplete rack mapping. The error message can include a further
>> > > > > > >> > > > >> > >>> > > instruction that tells the user to add an extra argument (like
>> > > > > > >> > > > >> > >>> > > "--allow-partial-rackinfo") to suppress the error and do an imperfect
>> > > > > > >> > > > >> > >>> > > rack-aware assignment. If the default behavior is to allow an incomplete
>> > > > > > >> > > > >> > >>> > > mapping, the error can still be easily missed.
>> > > > > > >> > > > >> > >>> > >
>> > > > > > >> > > > >> > >>> > > The affected command line tools are TopicCommand and
>> > > > > > >> > > > >> > >>> > > ReassignPartitionsCommand.
>> > > > > > >> > > > >> > >>> > >
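The fail-fast default with an opt-out switch proposed above could look roughly like this. It is a minimal sketch under stated assumptions: the flag name is the one floated in the message ("like --allow-partial-rackinfo"), not an existing Kafka option, and check_rack_mapping and parse_args are hypothetical helpers written only for this example.

```python
import argparse


def parse_args(argv):
    """Parse the hypothetical opt-out switch discussed in the thread."""
    parser = argparse.ArgumentParser(description="rack-aware assignment sketch")
    parser.add_argument(
        "--allow-partial-rackinfo",
        action="store_true",
        help="proceed even if only some brokers have a rack configured",
    )
    return parser.parse_args(argv)


def check_rack_mapping(broker_racks, allow_partial=False):
    """Return broker ids without a rack; fail fast on a partial mapping.

    broker_racks maps broker id -> rack name, or None when no rack is set.
    A mapping with no rack information at all is not treated as an error.
    """
    missing = sorted(b for b, rack in broker_racks.items() if rack is None)
    partial = bool(missing) and len(missing) < len(broker_racks)
    if partial and not allow_partial:
        raise ValueError(
            f"brokers {missing} have no rack configured; "
            "pass --allow-partial-rackinfo to proceed anyway"
        )
    return missing


args = parse_args(["--allow-partial-rackinfo"])
print(check_rack_mapping({0: "r1", 1: None}, args.allow_partial_rackinfo))
```

The error message carries the remediation hint, so a user who hits the failure learns about the switch without having to read the tool's help text.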
>> > > > > > >> > > > >> > >>> > > Thanks,
>> > > > > > >> > > > >> > >>> > > Allen
>> > > > > > >> > > > >> > >>> > >
>> > > > > > >> > > > >> > >>> > >
>> > > > > > >> > > > >> > >>> > >
>> > > > > > >> > > > >> > >>> > >
>> > > > > > >> > > > >> > >>> > >
>> > > > > > >> > > > >> > >>> > > On Mon, Oct 26, 2015 at 12:55 PM, Aditya Auradkar <aauradkar@linkedin.com> wrote:
>> > > > > > >> > > > >> > >>> > >
>> > > > > > >> > > > >> > >>> > > > Hi Allen,
>> > > > > > >> > > > >> > >>> > > >
>> > > > > > >> > > > >> > >>> > > > For TopicMetadataResponse to understand version, you can bump up the
>> > > > > > >> > > > >> > >>> > > > request version itself. Based on the version of the request, the
>> > > > > > >> > > > >> > >>> > > > response can be serialized appropriately. It shouldn't be a huge
>> > > > > > >> > > > >> > >>> > > > change. For example, we went through something similar for
>> > > > > > >> > > > >> > >>> > > > ProduceRequest recently (https://reviews.apache.org/r/33378/).
>> > > > > > >> > > > >> > >>> > > > I guess the reason protocol information is not included in the TMR is
>> > > > > > >> > > > >> > >>> > > > that the topic itself is independent of any particular protocol (SSL vs
>> > > > > > >> > > > >> > >>> > > > Plaintext). Having said that, I'm not sure we even need rack
>> > > > > > >> > > > >> > >>> > > > information in TMR. What use case were you thinking of initially?
>> > > > > > >> > > > >> > >>> > > >
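The idea of serializing the response according to the request version can be illustrated with a small sketch. This is not Kafka's wire format: serialize_broker, the dict-based layout, and the choice of version 1 as the first version carrying rack are all assumptions made for the example.

```python
def serialize_broker(broker, version):
    """Build the per-broker fields for a response at a given request version.

    Hypothetical layout: version 0 clients get the fields they already know,
    and rack is only added for version 1 and later.
    """
    fields = {"id": broker["id"], "host": broker["host"], "port": broker["port"]}
    if version >= 1:
        # Rack is optional, so brokers without one serialize it as None.
        fields["rack"] = broker.get("rack")
    return fields


broker = {"id": 1, "host": "kafka1", "port": 9092, "rack": "r1"}
print(serialize_broker(broker, 0))  # no rack field for old clients
print(serialize_broker(broker, 1))  # rack included for new clients
```

Old clients keep sending the old request version and never see the new field, which is why a version bump avoids breaking them.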
>> > > > > > >> > > > >> > >>> > > > For 1 - I'd be fine with adding an option to the command line tools
>> > > > > > >> > > > >> > >>> > > > that checks rack assignment, e.g. "--strict-assignment" or something
>> > > > > > >> > > > >> > >>> > > > similar.
>> > > > > > >> > > > >> > >>> > > >
>> > > > > > >> > > > >> > >>> > > > Aditya
>> > > > > > >> > > > >> > >>> > > >
>> > > > > > >> > > > >> > >>> > > > On Thu, Oct 22, 2015 at 6:44 PM, Allen Wang <allenxwang@gmail.com> wrote:
>> > > > > > >> > > > >> > >>> > > >
>> > > > > > >> > > > >> > >>> > > > > For 2 and 3, I have updated the KIP. Please take a look. One thing I
>> > > > > > >> > > > >> > >>> > > > > have changed is removing the proposal to add rack to
>> > > > > > >> > > > >> > >>> > > > > TopicMetadataResponse. The reason is that, unlike
>> > > > > > >> > > > >> > >>> > > > > UpdateMetadataRequest, TopicMetadataResponse does not understand
>> > > > > > >> > > > >> > >>> > > > > version. I don't see a way to include rack without breaking old
>> > > > > > >> > > > >> > >>> > > > > versions of clients. That's probably why the secure protocol is not
>> > > > > > >> > > > >> > >>> > > > > included in the TopicMetadataResponse either. I think it will be a
>> > > > > > >> > > > >> > >>> > > > > much bigger change to include rack in TopicMetadataResponse.
>> > > > > > >> > > > >> > >>> > > > >
>> > > > > > >> > > > >> > >>> > > > > For 1, my concern is that doing rack-aware assignment without a
>> > > > > > >> > > > >> > >>> > > > > complete broker-to-rack mapping will result in an assignment that is
>> > > > > > >> > > > >> > >>> > > > > not rack aware and fails to provide fault tolerance in the event of a
>> > > > > > >> > > > >> > >>> > > > > rack outage. This kind of problem will be difficult to surface, and
>> > > > > > >> > > > >> > >>> > > > > its cost is high: you have to do partition reassignment if you are
>> > > > > > >> > > > >> > >>> > > > > lucky enough to spot the problem early on, or face the consequence of
>> > > > > > >> > > > >> > >>> > > > > data loss during a real rack outage.
>> > > > > > >> > > > >> > >>> > > > >
>> > > > > > >> > > > >> > >>> > > > > I do see the concern with fail-fast, as it might also cause data loss
>> > > > > > >> > > > >> > >>> > > > > if a producer is not able to produce messages due to topic creation
>> > > > > > >> > > > >> > >>> > > > > failure. Is it feasible to treat dynamic topic creation and command
>> > > > > > >> > > > >> > >>> > > > > line tools differently? We would allow dynamic topic creation with an
>> > > > > > >> > > > >> > >>> > > > > incomplete broker-rack mapping and fail fast on the command line.
>> > > > > > >> > > > >> > >>> > > > > Another option is to let the user determine the behavior for the
>> > > > > > >> > > > >> > >>> > > > > command line: for example, by default fail fast on the command line,
>> > > > > > >> > > > >> > >>> > > > > but allow an incomplete broker-rack mapping if another switch is
>> > > > > > >> > > > >> > >>> > > > > provided.
>> > > > > > >> > > > >> > >>> > > > >
>> > > > > > >> > > > >> > >>> > > > >
>> > > > > > >> > > > >> > >>> > > > >
>> > > > > > >> > > > >> > >>> > > > >
>> > > > > > >> > > > >> > >>> > > > > On Tue, Oct 20, 2015 at 10:05 AM, Aditya Auradkar <aauradkar@linkedin.com.invalid> wrote:
>> > > > > > >> > > > >> > >>> > > > >
>> > > > > > >> > > > >> > >>> > > > > > Hey Allen,
>> > > > > > >> > > > >> > >>> > > > > >
>> > > > > > >> > > > >> > >>> > > > > > 1. If we choose fail-fast topic creation, we will have topic
>> > > > > > >> > > > >> > >>> > > > > > creation failures while upgrading the cluster. I really doubt we
>> > > > > > >> > > > >> > >>> > > > > > want this behavior. Ideally, this should be invisible to clients of
>> > > > > > >> > > > >> > >>> > > > > > a cluster. Currently, each broker is effectively its own rack. So
>> > > > > > >> > > > >> > >>> > > > > > we can probably use the rack information whenever possible but not
>> > > > > > >> > > > >> > >>> > > > > > make it a hard requirement. To extend Gwen's example, one badly
>> > > > > > >> > > > >> > >>> > > > > > configured broker should not degrade topic creation for the entire
>> > > > > > >> > > > >> > >>> > > > > > cluster.
>> > > > > > >> > > > >> > >>> > > > > >
>> > > > > > >> > > > >> > >>> > > > > > 2. Upgrade scenario - Can you add a section on the upgrade piece to
>> > > > > > >> > > > >> > >>> > > > > > confirm that old clients will not see errors? I believe
>> > > > > > >> > > > >> > >>> > > > > > ZookeeperConsumerConnector reads the Broker objects from ZK. I
>> > > > > > >> > > > >> > >>> > > > > > wanted to confirm that this will not cause any problems.
>> > > > > > >> > > > >> > >>> > > > > >
>> > > > > > >> > > > >> > >>> > > > > > 3. Could you elaborate on your proposed changes to the
>> > > > > > >> > > > >> > >>> > > > > > UpdateMetadataRequest in the "Public Interfaces" section?
>> > > > > > >> > > > >> > >>> > > > > > Personally, I find this format easy to read in terms of wire
>> > > > > > >> > > > >> > >>> > > > > > protocol changes:
>> > > > > > >> > > > >> > >>> > > > > >
>> > > > > > >> > > > >> > >>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicRequest
>> > > > > > >> > > > >> > >>> > > > > >
>> > > > > > >> > > > >> > >>> > > > > > Aditya
>> > > > > > >> > > > >> > >>> > > > > >
>> > > > > > >> > > > >> > >>> > > > > > On Fri, Oct 16, 2015 at 3:45 PM, Allen Wang <allenxwang@gmail.com> wrote:
>> > > > > > >> > > > >> > >>> > > > > >
>> > > > > > >> > > > >> > >>> > > > > > > The KIP is updated to include rack as an optional property for
>> > > > > > >> > > > >> > >>> > > > > > > the broker. Please take a look and let me know if more details
>> > > > > > >> > > > >> > >>> > > > > > > are needed.
>> > > > > > >> > > > >> > >>> > > > > > >
>> > > > > > >> > > > >> > >>> > > > > > > For the case where some brokers have rack and some do not, the
>> > > > > > >> > > > >> > >>> > > > > > > current KIP uses the fail-fast behavior. If there are concerns,
>> > > > > > >> > > > >> > >>> > > > > > > we can discuss this further in the email thread or at the next
>> > > > > > >> > > > >> > >>> > > > > > > hangout.
>> > > > > > >> > > > >> > >>> > > > > > >
>> > > > > > >> > > > >> > >>> > > > > > >
>> > > > > > >> > > > >> > >>> > > > > > >
>> > > > > > >> > > > >> > >>> > > > > > > On Thu, Oct 15, 2015 at 10:42 AM, Allen Wang <allenxwang@gmail.com> wrote:
>> > > > > > >> > > > >> > >>> > > > > > >
>> > > > > > >> > > > >> > >>> > > > > > > > That's a good question. I can think of three actions if the
>> > > > > > >> > > > >> > >>> > > > > > > > rack information is incomplete:
>> > > > > > >> > > > >> > >>> > > > > > > >
>> > > > > > >> > > > >> > >>> > > > > > > > 1. Treat the node without rack as if it is on its own unique rack
>> > > > > > >> > > > >> > >>> > > > > > > > 2. Disregard all rack information and fall back to the current algorithm
>> > > > > > >> > > > >> > >>> > > > > > > > 3. Fail-fast
>> > > > > > >> > > > >> > >>> > > > > > > >
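The three options listed above can be put side by side in a short sketch. This only illustrates the choices under discussion, not the KIP's implementation: resolve_racks, the policy names, and the "broker-<id>" placeholder rack are all hypothetical.

```python
def resolve_racks(broker_racks, policy):
    """Apply one of the three options to a broker id -> rack (or None) mapping."""
    missing = sorted(b for b, rack in broker_racks.items() if rack is None)
    if not missing or len(missing) == len(broker_racks):
        # Rack information is complete or entirely absent: nothing to resolve.
        return dict(broker_racks)
    if policy == "unique-rack":
        # Option 1: each broker without a rack is treated as its own rack.
        # The placeholder name is an assumption and could clash with a real rack.
        return {
            b: (rack if rack is not None else f"broker-{b}")
            for b, rack in broker_racks.items()
        }
    if policy == "ignore-racks":
        # Option 2: drop all rack information and use the original algorithm.
        return {b: None for b in broker_racks}
    if policy == "fail-fast":
        # Option 3: surface the configuration mistake immediately.
        raise ValueError(f"brokers without rack information: {missing}")
    raise ValueError(f"unknown policy: {policy}")


racks = {0: "r1", 1: "r2", 2: None}
print(resolve_racks(racks, "unique-rack"))
print(resolve_racks(racks, "ignore-racks"))
```

Options 1 and 2 both let the assignment proceed silently, which is the visibility concern raised in the discussion; only option 3 makes the missing rack impossible to overlook.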
>> > > > > > >> > > > >> > >>> > > > > > > > Now that I think about it, one and three make more sense. The
>> > > > > > >> > > > >> > >>> > > > > > > > reason for fail-fast is that a user's mistake of not providing
>> > > > > > >> > > > >> > >>> > > > > > > > the rack may never be found if we tolerate it; the assignment
>> > > > > > >> > > > >> > >>> > > > > > > > may then not be rack aware as the user expected, which creates
>> > > > > > >> > > > >> > >>> > > > > > > > debugging problems when things fail.
>> > > > > > >> > > > >> > >>> > > > > > > >
>> > > > > > >> > > > >> > >>> > > > > > > > What do you think? If not fail-fast, is there any way we can
>> > > > > > >> > > > >> > >>> > > > > > > > make the user error stand out?
>> > > > > > >> > > > >> > >>> > > > > > > >
>> > > > > > >> > > > >> > >>> > > > > > > >
>> > > > > > >> > > > >> > >>> > > > > > > > On Thu, Oct 15, 2015 at 10:17 AM, Gwen Shapira <gwen@confluent.io> wrote:
>> > > > > > >> > > > >> > >>> > > > > > > >
>> > > > > > >> > > > >> > >>> > > > > > > >> Thanks! Just to clarify, when some brokers have a rack
>> > > > > > >> > > > >> > >>> > > > > > > >> assignment and some don't, do we act like none of them have
>> > > > > > >> > > > >> > >>> > > > > > > >> it, or like those without an assignment are in their own rack?
>> > > > > > >> > > > >> > >>> > > > > > > >>
>> > > > > > >> > > > >> > >>> > > > > > > >> The first scenario is good when first setting up
>> > > > > > >> > > > >> > >>> > > > > > > >> rack-awareness, but the second makes more sense for ongoing
>> > > > > > >> > > > >> > >>> > > > > > > >> maintenance (I can totally see someone adding a node and
>> > > > > > >> > > > >> > >>> > > > > > > >> forgetting to set the rack property; we don't want this to
>> > > > > > >> > > > >> > >>> > > > > > > >> change behavior for anything except the new node).
>> > > > > > >> > > > >> > >>> > > > > > > >>
>> > > > > > >> > > > >> > >>> > > > > > > >> What do you think?
>> > > > > > >> > > > >> > >>> > > > > > > >>
>> > > > > > >> > > > >> > >>> > > > > > > >> Gwen
>> > > > > > >> > > > >> > >>> > > > > > > >>
>> > > > > > >> > > > >> > >>> > > > > > > >> On Thu, Oct 15, 2015 at 10:13 AM, Allen Wang <allenxwang@gmail.com> wrote:
>> > > > > > >> > > > >> > >>> > > > > > > >>
>> > > > > > >> > > > >> > >>> > > > > > > >> > For scenario 1:
>> > > > > > >> > > > >> > >>> > > > > > > >> >
>> > > > > > >> > > > >> > >>> > > > > > > >> > - Add the rack information to the broker property file, or
>> > > > > > >> > > > >> > >>> > > > > > > >> > dynamically set it in the wrapper code that bootstraps the
>> > > > > > >> > > > >> > >>> > > > > > > >> > Kafka server. You would do that for all brokers and restart
>> > > > > > >> > > > >> > >>> > > > > > > >> > the brokers one by one.
>> > > > > > >> > > > >> > >>> > > > > > > >> >
>> > > > > > >> > > > >> > >>> > > > > > > >> > In this scenario, the complete broker-to-rack mapping may not
>> > > > > > >> > > > >> > >>> > > > > > > >> > be available until every broker is restarted. During that
>> > > > > > >> > > > >> > >>> > > > > > > >> > time we fall back to the default replica assignment
>> > > > > > >> > > > >> > >>> > > > > > > >> > algorithm.
>> > > > > > >> > > > >> > >>> > > > > > > >> >
>> > > > > > >> > > > >> > >>> > > > > > > >> > For scenario 2:
>> > > > > > >> > > > >> > >>> > > > > > > >> >
>> > > > > > >> > > > >> > >>> > > > > > > >> > - Add the rack information to the broker property file, or
>> > > > > > >> > > > >> > >>> > > > > > > >> > dynamically set it in the wrapper code, and start the broker.
>> > > > > > >> > > > >> > >>> > > > > > > >> >
>> > > > > > >> > > > >> > >>> > > > > > > >> >
>> > > > > > >> > > > >> > >>> > > > > > > >> > On Wed, Oct 14, 2015 at 2:36 PM, Gwen Shapira <gwen@confluent.io> wrote:
>> > > > > > >> > > > >> > >>> > > > > > > >> >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > Can you clarify the workflow for the following scenarios:
>> > > > > > >> > > > >> > >>> > > > > > > >> > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > 1. I currently have 6 brokers and want to add rack
>> > > > > > >> > > > >> > >>> > > > > > > >> > > information for each
>> > > > > > >> > > > >> > >>> > > > > > > >> > > 2. I'm adding a new broker and I want to specify which rack
>> > > > > > >> > > > >> > >>> > > > > > > >> > > it belongs on while adding it.
>> > > > > > >> > > > >> > >>> > > > > > > >> > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > Thanks!
>> > > > > > >> > > > >> > >>> > > > > > > >> > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > On Tue, Oct 13, 2015 at 2:21 PM, Allen Wang <allenxwang@gmail.com> wrote:
>> > > > > > >> > > > >> > >>> > > > > > > >> > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > We discussed the KIP in the hangout today. The
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > recommendation is to make rack a broker property in
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > ZooKeeper. Users with existing rack information stored
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > somewhere would need to retrieve the information at broker
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > startup and dynamically set the rack property, which can be
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > implemented as a wrapper to bootstrap the broker. There
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > will be no interface or pluggable implementation to
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > retrieve the rack information.
>> > > > > > >> > > > >> > >>> > > > > > > >> > > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > The assumption is that you always need to restart the
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > broker to make a change to the rack.
>> > > > > > >> > > > >> > >>> > > > > > > >> > > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > Once the rack becomes a broker property, it will be
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > possible to make rack part of the metadata to help the
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > consumer choose which in-sync replica to consume from, as
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > part of the future consumer enhancement.
>> > > > > > >> > > > >> > >>> > > > > > > >> > > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > I will update the KIP.
>> > > > > > >> > > > >> > >>> > > > > > > >> > > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > Thanks,
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > Allen
>> > > > > > >> > > > >> > >>> > > > > > > >> > > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > On Thu, Oct 8, 2015 at
>> 9:23
>> > > AM,
>> > > > > > Allen
>> > > > > > >> > Wang
>> > > > > > >> > > <
>> > > > > > >> > > > >> > >>> > > > > > allenxwang@gmail.com>
>> > > > > > >> > > > >> > >>> > > > > > > >> > wrote:
>> > > > > > >> > > > >> > >>> > > > > > > >> > > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > > I attended Tuesday's
>> KIP
>> > > > hangout
>> > > > > > but
>> > > > > > >> > this
>> > > > > > >> > > > KIP
>> > > > > > >> > > > >> > was
>> > > > > > >> > > > >> > >>> not
>> > > > > > >> > > > >> > >>> > > > > > discussed
>> > > > > > >> > > > >> > >>> > > > > > > >> due
>> > > > > > >> > > > >> > >>> > > > > > > >> > to
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > > time constraint.
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > > However, after hearing
>> > > > > discussion
>> > > > > > of
>> > > > > > >> > > > KIP-35,
>> > > > > > >> > > > >> I
>> > > > > > >> > > > >> > >>> have
>> > > > > > >> > > > >> > >>> > the
>> > > > > > >> > > > >> > >>> > > > > > feeling
>> > > > > > >> > > > >> > >>> > > > > > > >> that
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > > incompatibility
>> (caused by
>> > > new
>> > > > > > >> broker
>> > > > > > >> > > > >> property)
>> > > > > > >> > > > >> > >>> > between
>> > > > > > >> > > > >> > >>> > > > > > brokers
>> > > > > > >> > > > >> > >>> > > > > > > >> with
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > > different versions
>> will
>> > be
>> > > > > solved
>> > > > > > >> > there.
>> > > > > > >> > > > In
>> > > > > > >> > > > >> > >>> addition,
>> > > > > > >> > > > >> > >>> > > > > having
>> > > > > > >> > > > >> > >>> > > > > > > >> rack
>> > > > > > >> > > > >> > >>> > > > > > > >> > in
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > > broker property as meta
>> > data
>> > > > may
>> > > > > > >> also
>> > > > > > >> > > help
>> > > > > > >> > > > >> > >>> consumers
>> > > > > > >> > > > >> > >>> > in
>> > > > > > >> > > > >> > >>> > > > the
>> > > > > > >> > > > >> > >>> > > > > > > >> future.
>> > > > > > >> > > > >> > >>> > > > > > > >> > So
>> > > > > > >> > > > >> > >>> > > > > > > >> > > I
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > am
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > > open to adding rack
>> > > property
>> > > > to
>> > > > > > >> > broker.
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > > Hopefully we can
>> discuss
>> > > this
>> > > > in
>> > > > > > the
>> > > > > > >> > next
>> > > > > > >> > > > KIP
>> > > > > > >> > > > >> > >>> hangout.
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > > On Wed, Sep 30, 2015 at
>> > 2:46
>> > > > PM,
>> > > > > > >> Allen
>> > > > > > >> > > > Wang <
>> > > > > > >> > > > >> > >>> > > > > > > allenxwang@gmail.com
>> > > > > > >> > > > >> > >>> > > > > > > >> >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > wrote:
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >> Can you send me the
>> > > > information
>> > > > > > on
>> > > > > > >> the
>> > > > > > >> > > > next
>> > > > > > >> > > > >> KIP
>> > > > > > >> > > > >> > >>> > > hangout?
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >> Currently the
>> broker-rack
>> > > > > mapping
>> > > > > > >> is
>> > > > > > >> > not
>> > > > > > >> > > > >> > cached.
>> > > > > > >> > > > >> > >>> In
>> > > > > > >> > > > >> > >>> > > > > > KafkaApis,
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>
>> RackLocator.getRackInfo()
>> > > is
>> > > > > > called
>> > > > > > >> > each
>> > > > > > >> > > > >> time
>> > > > > > >> > > > >> > the
>> > > > > > >> > > > >> > >>> > > mapping
>> > > > > > >> > > > >> > >>> > > > > is
>> > > > > > >> > > > >> > >>> > > > > > > >> needed
>> > > > > > >> > > > >> > >>> > > > > > > >> > > for
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >> auto topic creation.
>> This
>> > > > will
>> > > > > > >> ensure
>> > > > > > >> > > > latest
>> > > > > > >> > > > >> > >>> mapping
>> > > > > > >> > > > >> > >>> > is
>> > > > > > >> > > > >> > >>> > > > > used
>> > > > > > >> > > > >> > >>> > > > > > at
>> > > > > > >> > > > >> > >>> > > > > > > >> any
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > time.
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >> The ability to get the
>> > > > complete
>> > > > > > >> > mapping
>> > > > > > >> > > > >> makes
>> > > > > > >> > > > >> > it
>> > > > > > >> > > > >> > >>> > simple
>> > > > > > >> > > > >> > >>> > > > to
>> > > > > > >> > > > >> > >>> > > > > > > reuse
>> > > > > > >> > > > >> > >>> > > > > > > >> the
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > same
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >> interface in command
>> line
>> > > > > tools.
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >> On Wed, Sep 30, 2015
>> at
>> > > 11:01
>> > > > > AM,
>> > > > > > >> > Aditya
>> > > > > > >> > > > >> > >>> Auradkar <
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>
>> > > > aauradkar@linkedin.com.invalid
>> > > > > >
>> > > > > > >> > wrote:
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> Perhaps we discuss
>> this
>> > > > during
>> > > > > > the
>> > > > > > >> > next
>> > > > > > >> > > > KIP
>> > > > > > >> > > > >> > >>> hangout?
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>>
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> I do see that a
>> > pluggable
>> > > > rack
>> > > > > > >> > locator
>> > > > > > >> > > > can
>> > > > > > >> > > > >> be
>> > > > > > >> > > > >> > >>> useful
>> > > > > > >> > > > >> > >>> > > > but I
>> > > > > > >> > > > >> > >>> > > > > > do
>> > > > > > >> > > > >> > >>> > > > > > > >> see a
>> > > > > > >> > > > >> > >>> > > > > > > >> > > few
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> concerns:
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>>
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> - The RackLocator (as
>> > > > > described
>> > > > > > in
>> > > > > > >> > the
>> > > > > > >> > > > >> > >>> document),
>> > > > > > >> > > > >> > >>> > > > implies
>> > > > > > >> > > > >> > >>> > > > > > that
>> > > > > > >> > > > >> > >>> > > > > > > >> it
>> > > > > > >> > > > >> > >>> > > > > > > >> > can
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> discover rack
>> > information
>> > > > for
>> > > > > > any
>> > > > > > >> > node
>> > > > > > >> > > in
>> > > > > > >> > > > >> the
>> > > > > > >> > > > >> > >>> > cluster.
>> > > > > > >> > > > >> > >>> > > > How
>> > > > > > >> > > > >> > >>> > > > > > > does
>> > > > > > >> > > > >> > >>> > > > > > > >> it
>> > > > > > >> > > > >> > >>> > > > > > > >> > > deal
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> with rack location
>> > > changes?
>> > > > > For
>> > > > > > >> > > example,
>> > > > > > >> > > > >> if I
>> > > > > > >> > > > >> > >>> moved
>> > > > > > >> > > > >> > >>> > > > broker
>> > > > > > >> > > > >> > >>> > > > > > id
>> > > > > > >> > > > >> > >>> > > > > > > >> (1)
>> > > > > > >> > > > >> > >>> > > > > > > >> > > from
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> rack
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> X to Y, I only have
>> to
>> > > start
>> > > > > > that
>> > > > > > >> > > broker
>> > > > > > >> > > > >> with
>> > > > > > >> > > > >> > a
>> > > > > > >> > > > >> > >>> > newer
>> > > > > > >> > > > >> > >>> > > > rack
>> > > > > > >> > > > >> > >>> > > > > > > >> config.
>> > > > > > >> > > > >> > >>> > > > > > > >> > If
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> RackLocator discovers
>> > > broker
>> > > > > ->
>> > > > > > >> rack
>> > > > > > >> > > > >> > >>> information at
>> > > > > > >> > > > >> > >>> > > > start
>> > > > > > >> > > > >> > >>> > > > > up
>> > > > > > >> > > > >> > >>> > > > > > > >> time,
>> > > > > > >> > > > >> > >>> > > > > > > >> > > any
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> change to a broker
>> will
>> > > > > require
>> > > > > > >> > > bouncing
>> > > > > > >> > > > >> the
>> > > > > > >> > > > >> > >>> entire
>> > > > > > >> > > > >> > >>> > > > > cluster
>> > > > > > >> > > > >> > >>> > > > > > > >> since
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> createTopic requests
>> can
>> > > be
>> > > > > sent
>> > > > > > >> to
>> > > > > > >> > any
>> > > > > > >> > > > >> node
>> > > > > > >> > > > >> > in
>> > > > > > >> > > > >> > >>> the
>> > > > > > >> > > > >> > >>> > > > > cluster.
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> For this reason it
>> may
>> > be
>> > > > > > simpler
>> > > > > > >> to
>> > > > > > >> > > have
>> > > > > > >> > > > >> each
>> > > > > > >> > > > >> > >>> node
>> > > > > > >> > > > >> > >>> > be
>> > > > > > >> > > > >> > >>> > > > > aware
>> > > > > > >> > > > >> > >>> > > > > > > of
>> > > > > > >> > > > >> > >>> > > > > > > >> its
>> > > > > > >> > > > >> > >>> > > > > > > >> > > own
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> rack and persist it
>> in
>> > ZK
>> > > > > during
>> > > > > > >> > start
>> > > > > > >> > > up
>> > > > > > >> > > > >> > time.
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>>
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> - A pluggable
>> > RackLocator
>> > > > > relies
>> > > > > > >> on
>> > > > > > >> > an
>> > > > > > >> > > > >> > external
>> > > > > > >> > > > >> > >>> > > service
>> > > > > > >> > > > >> > >>> > > > > > being
>> > > > > > >> > > > >> > >>> > > > > > > >> > > available
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> to
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> serve rack
>> information.
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>>
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> Out of curiosity, I
>> > looked
>> > > > up
>> > > > > > how
>> > > > > > >> a
>> > > > > > >> > > > couple
>> > > > > > >> > > > >> of
>> > > > > > >> > > > >> > >>> other
>> > > > > > >> > > > >> > >>> > > > > systems
>> > > > > > >> > > > >> > >>> > > > > > > deal
>> > > > > > >> > > > >> > >>> > > > > > > >> > with
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> zone/rack awareness.
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> For Cassandra some
>> > > > interesting
>> > > > > > >> modes
>> > > > > > >> > > are:
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> (Property File
>> > > > configuration)
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>>
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>>
>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> (Dynamic inference)
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>>
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>>
>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>>
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> Voldemort does a
>> static
>> > > node
>> > > > > ->
>> > > > > > >> zone
>> > > > > > >> > > > >> > assignment
>> > > > > > >> > > > >> > >>> > based
>> > > > > > >> > > > >> > >>> > > on
>> > > > > > >> > > > >> > >>> > > > > > > >> > > configuration.
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>>
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> Aditya
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>>
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> On Wed, Sep 30, 2015
>> at
>> > > > 10:05
>> > > > > > AM,
>> > > > > > >> > Allen
>> > > > > > >> > > > >> Wang <
>> > > > > > >> > > > >> > >>> > > > > > > >> allenxwang@gmail.com
>> > > > > > >> > > > >> > >>> > > > > > > >> > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> wrote:
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>>
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > I would like to
>> see if
>> > > we
>> > > > > can
>> > > > > > do
>> > > > > > >> > > both:
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > - Make RackLocator
>> > > > pluggable
>> > > > > > to
>> > > > > > >> > > > >> facilitate
>> > > > > > >> > > > >> > >>> > migration
>> > > > > > >> > > > >> > >>> > > > > with
>> > > > > > >> > > > >> > >>> > > > > > > >> > existing
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > broker-rack mapping
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > - Make rack an
>> > optional
>> > > > > > property
>> > > > > > >> > for
>> > > > > > >> > > > >> broker.
>> > > > > > >> > > > >> > >>> If
>> > > > > > >> > > > >> > >>> > rack
>> > > > > > >> > > > >> > >>> > > > is
>> > > > > > >> > > > >> > >>> > > > > > > >> available
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > from
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > broker, treat it as
>> > > source
>> > > > > of
>> > > > > > >> > truth.
>> > > > > > >> > > > For
>> > > > > > >> > > > >> > users
>> > > > > > >> > > > >> > >>> > with
>> > > > > > >> > > > >> > >>> > > > > > existing
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> broker-rack
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > mapping somewhere
>> > else,
>> > > > they
>> > > > > > can
>> > > > > > >> > use
>> > > > > > >> > > > the
>> > > > > > >> > > > >> > >>> pluggable
>> > > > > > >> > > > >> > >>> > > way
>> > > > > > >> > > > >> > >>> > > > > or
>> > > > > > >> > > > >> > >>> > > > > > > they
>> > > > > > >> > > > >> > >>> > > > > > > >> > can
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> transfer
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > the mapping to the
>> > > broker
>> > > > > rack
>> > > > > > >> > > > property.
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > One thing I am not
>> > sure
>> > > is
>> > > > > > what
>> > > > > > >> > > happens
>> > > > > > >> > > > >> at
>> > > > > > >> > > > >> > >>> rolling
>> > > > > > >> > > > >> > >>> > > > > upgrade
>> > > > > > >> > > > >> > >>> > > > > > > >> when
>> > > > > > >> > > > >> > >>> > > > > > > >> > we
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > have
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > rack as a broker
>> > > property.
>> > > > > For
>> > > > > > >> > > brokers
>> > > > > > >> > > > >> with
>> > > > > > >> > > > >> > >>> older
>> > > > > > >> > > > >> > >>> > > > > version
>> > > > > > >> > > > >> > >>> > > > > > of
>> > > > > > >> > > > >> > >>> > > > > > > >> > Kafka,
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> will it
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > cause problem for
>> > them?
>> > > If
>> > > > > so,
>> > > > > > >> is
>> > > > > > >> > > there
>> > > > > > >> > > > >> any
>> > > > > > >> > > > >> > >>> > > > workaround?
>> > > > > > >> > > > >> > >>> > > > > I
>> > > > > > >> > > > >> > >>> > > > > > > also
>> > > > > > >> > > > >> > >>> > > > > > > >> > > think
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > it
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > would be better
>> not to
>> > > > have
>> > > > > > >> rack in
>> > > > > > >> > > the
>> > > > > > >> > > > >> > >>> controller
>> > > > > > >> > > > >> > >>> > > > wire
>> > > > > > >> > > > >> > >>> > > > > > > >> protocol
>> > > > > > >> > > > >> > >>> > > > > > > >> > > but
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> not
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > sure if it is
>> > > achievable.
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > Thanks,
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > Allen
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > On Mon, Sep 28,
>> 2015
>> > at
>> > > > 4:55
>> > > > > > PM,
>> > > > > > >> > Todd
>> > > > > > >> > > > >> > Palino <
>> > > > > > >> > > > >> > >>> > > > > > > >> tpalino@gmail.com>
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> wrote:
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > I tend to like
>> the
>> > > idea
>> > > > > of a
>> > > > > > >> > > > pluggable
>> > > > > > >> > > > >> > >>> locator.
>> > > > > > >> > > > >> > >>> > > For
>> > > > > > >> > > > >> > >>> > > > > > > >> example, we
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> already
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > have an interface
>> > for
>> > > > > > >> discovering
>> > > > > > >> > > > >> > >>> information
>> > > > > > >> > > > >> > >>> > > about
>> > > > > > >> > > > >> > >>> > > > > the
>> > > > > > >> > > > >> > >>> > > > > > > >> > physical
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> location
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > of servers. I
>> don't
>> > > > relish
>> > > > > > the
>> > > > > > >> > idea
>> > > > > > >> > > > of
>> > > > > > >> > > > >> > >>> having to
>> > > > > > >> > > > >> > >>> > > > > > maintain
>> > > > > > >> > > > >> > >>> > > > > > > >> data
>> > > > > > >> > > > >> > >>> > > > > > > >> > in
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > multiple
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > places.
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > -Todd
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > On Mon, Sep 28,
>> 2015
>> > > at
>> > > > > 4:48
>> > > > > > >> PM,
>> > > > > > >> > > > Aditya
>> > > > > > >> > > > >> > >>> > Auradkar <
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > >
>> > > > > > aauradkar@linkedin.com.invalid
>> > > > > > >> >
>> > > > > > >> > > > wrote:
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > Thanks for
>> > starting
>> > > > this
>> > > > > > KIP
>> > > > > > >> > > Allen.
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > I agree with
>> Gwen
>> > > that
>> > > > > > >> having a
>> > > > > > >> > > > >> > >>> RackLocator
>> > > > > > >> > > > >> > >>> > > class
>> > > > > > >> > > > >> > >>> > > > > that
>> > > > > > >> > > > >> > >>> > > > > > > is
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > pluggable
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > seems
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > to be too
>> complex.
>> > > The
>> > > > > KIP
>> > > > > > >> > refers
>> > > > > > >> > > > to
>> > > > > > >> > > > >> > >>> > potentially
>> > > > > > >> > > > >> > >>> > > > > > non-ZK
>> > > > > > >> > > > >> > >>> > > > > > > >> > storage
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> for the
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > rack info
>> which I
>> > > > don't
>> > > > > > >> think
>> > > > > > >> > is
>> > > > > > >> > > > >> > >>> necessary.
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > Perhaps we can
>> > > persist
>> > > > > > this
>> > > > > > >> > info
>> > > > > > >> > > in
>> > > > > > >> > > > >> zk
>> > > > > > >> > > > >> > >>> under
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>>
>> /brokers/ids/<broker_id>
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > similar to
>> other
>> > > > broker
>> > > > > > >> > > properties
>> > > > > > >> > > > >> and
>> > > > > > >> > > > >> > >>> add a
>> > > > > > >> > > > >> > >>> > > > config
>> > > > > > >> > > > >> > >>> > > > > in
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > KafkaConfig
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > called
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > "rack".
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > >
>> > > > > > >> > > > >> > >>> > > > > > >
>> > > > > > >> > > > >>
>> > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy,
>> > > > > > >> > > > >> > >>> > > > > > > >> > > "rack":
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > "abc"}
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > Aditya
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > On Mon, Sep 28,
>> > 2015
>> > > > at
>> > > > > > 2:30
>> > > > > > >> > PM,
>> > > > > > >> > > > Gwen
>> > > > > > >> > > > >> > >>> Shapira
>> > > > > > >> > > > >> > >>> > <
>> > > > > > >> > > > >> > >>> > > > > > > >> > > gwen@confluent.io
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > wrote:
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > Hi,
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > First, thanks
>> > for
>> > > > > > putting
>> > > > > > >> > out a
>> > > > > > >> > > > KIP
>> > > > > > >> > > > >> > for
>> > > > > > >> > > > >> > >>> > this.
>> > > > > > >> > > > >> > >>> > > > This
>> > > > > > >> > > > >> > >>> > > > > > is
>> > > > > > >> > > > >> > >>> > > > > > > >> super
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> important
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > for
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > production
>> > > > deployments
>> > > > > > of
>> > > > > > >> > > Kafka.
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > Few
>> questions:
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > 1) Are we
>> sure
>> > we
>> > > > want
>> > > > > > "as
>> > > > > > >> > many
>> > > > > > >> > > > >> racks
>> > > > > > >> > > > >> > as
>> > > > > > >> > > > >> > >>> > > > > possible"?
>> > > > > > >> > > > >> > >>> > > > > > > I'd
>> > > > > > >> > > > >> > >>> > > > > > > >> > want
>> > > > > > >> > > > >> > >>> > > > > > > >> > > to
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > balance
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > between
>> safety
>> > > (more
>> > > > > > >> racks)
>> > > > > > >> > and
>> > > > > > >> > > > >> > network
>> > > > > > >> > > > >> > >>> > > > > utilization
>> > > > > > >> > > > >> > >>> > > > > > > >> > (traffic
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> within a
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > rack
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > uses the
>> > > > > high-bandwidth
>> > > > > > >> TOR
>> > > > > > >> > > > >> switch).
>> > > > > > >> > > > >> > One
>> > > > > > >> > > > >> > >>> > > replica
>> > > > > > >> > > > >> > >>> > > > > on
>> > > > > > >> > > > >> > >>> > > > > > a
>> > > > > > >> > > > >> > >>> > > > > > > >> > > different
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> rack
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > and
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > the rest on
>> same
>> > > > rack
>> > > > > > (if
>> > > > > > >> > > > possible)
>> > > > > > >> > > > >> > >>> sounds
>> > > > > > >> > > > >> > >>> > > > better
>> > > > > > >> > > > >> > >>> > > > > to
>> > > > > > >> > > > >> > >>> > > > > > > me.
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > 2)
>> Rack-locator
>> > > > class
>> > > > > > >> seems
>> > > > > > >> > > > overly
>> > > > > > >> > > > >> > >>> complex
>> > > > > > >> > > > >> > >>> > > > > compared
>> > > > > > >> > > > >> > >>> > > > > > to
>> > > > > > >> > > > >> > >>> > > > > > > >> > > adding a
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > rack.number
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > property to
>> the
>> > > > broker
>> > > > > > >> > > properties
>> > > > > > >> > > > >> > file.
>> > > > > > >> > > > >> > >>> Why
>> > > > > > >> > > > >> > >>> > do
>> > > > > > >> > > > >> > >>> > > > we
>> > > > > > >> > > > >> > >>> > > > > > want
>> > > > > > >> > > > >> > >>> > > > > > > >> > that?
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > Gwen
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > On Mon, Sep
>> 28,
>> > > 2015
>> > > > > at
>> > > > > > >> 12:15
>> > > > > > >> > > PM,
>> > > > > > >> > > > >> > Allen
>> > > > > > >> > > > >> > >>> > Wang <
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> allenxwang@gmail.com
>> >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > wrote:
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > Hello Kafka
>> > > > > > Developers,
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > I just
>> created
>> > > > > KIP-36
>> > > > > > >> for
>> > > > > > >> > > rack
>> > > > > > >> > > > >> aware
>> > > > > > >> > > > >> > >>> > replica
>> > > > > > >> > > > >> > >>> > > > > > > >> assignment.
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > The goal
>> is to
>> > > > > utilize
>> > > > > > >> the
>> > > > > > >> > > > >> isolation
>> > > > > > >> > > > >> > >>> > > provided
>> > > > > > >> > > > >> > >>> > > > by
>> > > > > > >> > > > >> > >>> > > > > > the
>> > > > > > >> > > > >> > >>> > > > > > > >> > racks
>> > > > > > >> > > > >> > >>> > > > > > > >> > > in
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> data
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > center
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > and
>> distribute
>> > > > > > replicas
>> > > > > > >> to
>> > > > > > >> > > > racks
>> > > > > > >> > > > >> to
>> > > > > > >> > > > >> > >>> > provide
>> > > > > > >> > > > >> > >>> > > > > fault
>> > > > > > >> > > > >> > >>> > > > > > > >> > > tolerance.
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > Comments
>> are
>> > > > > welcome.
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > Thanks,
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > Allen
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>>
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>
>> > > > > > >> > > > >> > >>> > > > > > > >> > > > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > > >
>> > > > > > >> > > > >> > >>> > > > > > > >> > >
>> > > > > > >> > > > >> > >>> > > > > > > >> >
>> > > > > > >> > > > >> > >>> > > > > > > >>
>> > > > > > >> > > > >> > >>> > > > > > > >
>> > > > > > >> > > > >> > >>> > > > > > > >
>> > > > > > >> > > > >> > >>> > > > > > >
>> > > > > > >> > > > >> > >>> > > > > >
>> > > > > > >> > > > >> > >>> > > > >
>> > > > > > >> > > > >> > >>> > > >
>> > > > > > >> > > > >> > >>> > >
>> > > > > > >> > > > >> > >>> >
>> > > > > > >> > > > >> > >>>
>> > > > > > >> > > > >> > >>>
>> > > > > > >> > > > >> > >>>
>> > > > > > >> > > > >> > >>> --
>> > > > > > >> > > > >> > >>> Thanks,
>> > > > > > >> > > > >> > >>> Neha

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Allen Wang <al...@gmail.com>.
Hi Jun,

I feel it is a bit complicated and unconventional to have a major release
depend on a minor release. It would also double the cost of releasing.

On the other hand, if we want to skip the minor release and fix
ZkUtils.getBrokerInfo() in 0.9.1, it would require all 0.9.0 clients to
upgrade to 0.9.1 before the brokers are upgraded, which is also a bit
unconventional. It is also problematic because a 0.9.1 client will expect the
rack field (even though it may be encoded as null) when it tries to
deserialize TopicMetadataResponse.

So it looks like we have to have a minor release first.
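
For illustration, here is a minimal sketch of the lenient parsing discussed
for ZkUtils.getBrokerInfo() further down this thread: accept any version and
treat unknown fields as optional. It is written in Python rather than Kafka's
Scala, and the function name and the flat host/port/rack layout are
assumptions made for the example, not the actual broker registration format.

```python
import json


def parse_broker_info(broker_id, raw_json):
    """Parse a broker registration without rejecting unknown versions.

    Hypothetical helper: newer versions are assumed to only ADD fields,
    so anything we do not recognise is simply ignored.
    """
    data = json.loads(raw_json)
    version = data.get("version", 1)
    if version < 1:
        raise ValueError(f"invalid broker JSON version: {version}")
    return {
        "id": broker_id,
        "host": data["host"],
        "port": data["port"],
        # rack is optional; it is absent in older registrations
        "rack": data.get("rack"),
        "version": version,
    }
```

With this shape, a reader that predates the rack field keeps working when a
version 3 registration appears, as long as new versions only add fields.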

Thanks,
Allen



On Tue, Jan 12, 2016 at 6:27 PM, Jun Rao <ju...@confluent.io> wrote:

> Allen,
>
> It's not ideal to add a new field in the JSON without increasing the version.
> Also, if we don't fix this issue in 0.9.0 and we ever change the JSON
> version in the future, the 0.9.0 consumer will break after the broker is
> upgraded to the new release. So, I suggest that we fix the behavior of
> ZkUtils.getBrokerInfo() in both trunk and the 0.9.0 branch. After we release
> 0.9.0.1, the upgrade path is for the old consumer to be upgraded to 0.9.0.1
> before upgrading the broker to 0.9.1 and beyond. This fix can be done in a
> separate JIRA.
>
> Thanks,
>
> Jun
>
> On Tue, Jan 12, 2016 at 5:35 PM, Allen Wang <al...@gmail.com> wrote:
>
> > Agreed. So it seems that for 0.9.1, the only option is to keep the JSON
> > version unchanged. But as part of the PR, I can change the behavior of
> > ZkUtils.getBrokerInfo() to make it compatible with future JSON versions.
> >
> > Thanks,
> > Allen
> >
> >
> > On Tue, Jan 12, 2016 at 2:57 PM, Jun Rao <ju...@confluent.io> wrote:
> >
> > > Hi, Allen,
> > >
> > > That's a good point. In 0.9.0.0, the old consumer reads broker info
> > > directly from ZK and the code throws an exception if the version in json
> > > is not 1 or 2. This old consumer will break when we upgrade the broker
> > > json to version 3 in ZK in 0.9.1, which will be an issue. We overlooked
> > > this issue in 0.9.0.0. The easiest fix is probably not to check the
> > > version in ZkUtils.getBrokerInfo(). This way, as long as we are only
> > > adding new fields in broker json, we can preserve the compatibility.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Tue, Jan 12, 2016 at 1:52 PM, Allen Wang <al...@gmail.com> wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > That's a good suggestion. However, it does not solve the problem for
> > > > clients or third-party tools that get broker information directly from
> > > > ZooKeeper.
> > > >
> > > > Thanks,
> > > > Allen
> > > >
> > > >
> > > > On Tue, Jan 12, 2016 at 1:29 PM, Jun Rao <ju...@confluent.io> wrote:
> > > >
> > > > > Allen,
> > > > >
> > > > > Another way to do this is the following.
> > > > >
> > > > > When inter.broker.protocol.version is set to 0.9.0, the broker will
> > > > > write the broker info in ZK using version 2, ignoring the rack info.
> > > > >
> > > > > When inter.broker.protocol.version is set to 0.9.1, the broker will
> > > > > write the broker info in ZK using version 3, including the rack info.
> > > > >
> > > > > If one follows the upgrade process, after the 2nd round of rolling
> > > > > bounces, every broker is capable of parsing version 3 of broker info in
> > > > > ZK. This is when the rack-aware feature will be used.
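
The version gating described above can be sketched roughly as follows. This
is an illustrative Python sketch, not broker code: the function name and the
dictionary layout are hypothetical, and it assumes the configured protocol
version is a purely numeric dotted string such as "0.9.0" or "0.9.1".

```python
def broker_registration(host, port, rack, inter_broker_protocol_version):
    """Build the broker info to write to ZK, gated on the protocol version.

    Hypothetical sketch: below 0.9.1 the rack is dropped and version 2 is
    written; from 0.9.1 on, version 3 is written and carries the rack.
    """
    def as_tuple(version):
        # "0.9.1" -> (0, 9, 1); assumes numeric components only
        return tuple(int(part) for part in version.split("."))

    info = {"host": host, "port": port}
    if as_tuple(inter_broker_protocol_version) >= (0, 9, 1):
        info["version"] = 3
        if rack is not None:
            info["rack"] = rack
    else:
        info["version"] = 2  # rack info deliberately ignored
    return info
```

During the first rolling bounce every broker would still be configured with
0.9.0 and so keep writing version 2; only after the second bounce, with the
setting raised to 0.9.1, would version 3 registrations appear.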
> > > > >
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > > On Tue, Jan 12, 2016 at 12:19 PM, Allen Wang <allenxwang@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Regarding the JSON version of Broker:
> > > > > >
> > > > > > I don't know why ZkUtils.getBrokerInfo() restricts the JSON versions
> > > > > > it can read. It will throw an exception if the version is not 1 or 2.
> > > > > > It seems to me that this will cause compatibility problems whenever
> > > > > > the version needs to be changed, and will make the upgrade path
> > > > > > difficult.
> > > > > >
> > > > > > One option we have is to make rack part of version 2 as well and keep
> > > > > > version 2 unchanged for this update. This will keep the old clients
> > > > > > compatible. During a rolling upgrade, it will also avoid problems if
> > > > > > the controller/broker is still on the old version.
> > > > > >
> > > > > > However, ZkUtils.getBrokerInfo() will be updated to return the Broker
> > > > > > with rack, so the rack information will be available once the
> > > > > > server/client is upgraded to the latest version.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Wed, Jan 6, 2016 at 6:28 PM, Allen Wang <allenxwang@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Updated KIP according to Jun's comment and included changes to TMR.
> > > > > > >
> > > > > > > On Tue, Jan 5, 2016 at 5:59 PM, Jun Rao <ju...@confluent.io> wrote:
> > > > > > >
> > > > > > >> Hi, Allen,
> > > > > > >>
> > > > > > >> A couple of minor comments on the KIP.
> > > > > > >>
> > > > > > > >> 1. The version of the broker JSON string says 2. It should be 3.
> > > > > > > >>
> > > > > > > >> 2. The new version of UpdateMetadataRequest should be 2, instead of 1.
> > > > > > > >> Could you include the full wire protocol of version 2 of
> > > > > > > >> UpdateMetadataRequest and highlight the changed part?
> > > > > > >>
> > > > > > >> Thanks,
> > > > > > >>
> > > > > > >> Jun
> > > > > > >>
> > > > > > > >> On Tue, Jan 5, 2016 at 3:11 PM, Allen Wang <allenxwang@gmail.com>
> > > > > > > >> wrote:
> > > > > > > >>
> > > > > > > >> > Jun and I had a chance to discuss it in a meeting and it is agreed to
> > > > > > > >> > change the TMR in a different patch.
> > > > > > > >> >
> > > > > > > >> > I can change the KIP to include rack in TMR. The essential change is
> > > > > > > >> > to add rack into class BrokerEndPoint and make TMR version aware.
> > > > > > >> >
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > On Tue, Jan 5, 2016 at 10:21 AM, Aditya Auradkar <
> > > > > > >> > aauradkar@linkedin.com.invalid> wrote:
> > > > > > >> >
> > > > > > >> > > Jun/Allen -
> > > > > > >> > >
> > > > > > > >> > > Did we ever actually agree on whether we should evolve the TMR to
> > > > > > > >> > > include rack info or not?
> > > > > > > >> > > I don't feel strongly about it, but if it's the right thing to do we
> > > > > > > >> > > should probably do it in this KIP (can be a separate patch). It isn't
> > > > > > > >> > > a large change.
> > > > > > >> > >
> > > > > > >> > > Aditya
> > > > > > >> > >
> > > > > > > >> > > On Sat, Dec 26, 2015 at 3:01 PM, Allen Wang <allenxwang@gmail.com>
> > > > > > > >> > > wrote:
> > > > > > > >> > >
> > > > > > > >> > > > Added the rolling upgrade instructions to the KIP, similar to those
> > > > > > > >> > > > in the 0.9.0 release notes.
> > > > > > >> > > >
> > > > > > > >> > > > On Wed, Dec 16, 2015 at 11:32 AM, Allen Wang <allenxwang@gmail.com>
> > > > > > > >> > > > wrote:
> > > > > > > >> > > >
> > > > > > > >> > > > > Hi Jun,
> > > > > > > >> > > > >
> > > > > > > >> > > > > The reason that TopicMetadataResponse is not included in the KIP is
> > > > > > > >> > > > > that it currently is not version aware. So we need to introduce
> > > > > > > >> > > > > versioning to it in order to ensure backward compatibility. That
> > > > > > > >> > > > > seems to me a big change. Do we want to couple it with this KIP? Do
> > > > > > > >> > > > > we need to further discuss what information to include in the new
> > > > > > > >> > > > > version besides rack? For example, should we include the broker
> > > > > > > >> > > > > security protocol in TopicMetadataResponse?
> > > > > > > >> > > > >
> > > > > > > >> > > > > The other option is to make it a separate KIP to make
> > > > > > > >> > > > > TopicMetadataResponse version aware and decide what to include, and
> > > > > > > >> > > > > make this KIP focus on the rack-aware algorithm, admin tools and
> > > > > > > >> > > > > related changes to the inter-broker protocol.
> > > > > > >> > > > >
> > > > > > >> > > > > Thanks,
> > > > > > >> > > > > Allen
> > > > > > >> > > > >
> > > > > > >> > > > >
> > > > > > >> > > > >
> > > > > > >> > > > >
> > > > > > > >> > > > > On Mon, Dec 14, 2015 at 8:30 AM, Jun Rao <jun@confluent.io> wrote:
> > > > > > > >> > > > >
> > > > > > > >> > > > >> Allen,
> > > > > > > >> > > > >>
> > > > > > > >> > > > >> Thanks for the proposal. A few comments.
> > > > > > > >> > > > >>
> > > > > > > >> > > > >> 1. Since this KIP changes the inter broker communication protocol
> > > > > > > >> > > > >> (UpdateMetadataRequest), we will need to document the upgrade path
> > > > > > > >> > > > >> (similar to what's described in
> > > > > > > >> > > > >> http://kafka.apache.org/090/documentation.html#upgrade).
> > > > > > > >> > > > >>
> > > > > > > >> > > > >> 2. It might be useful to include the rack info of the broker in
> > > > > > > >> > > > >> TopicMetadataResponse. This can be useful for administrative tasks,
> > > > > > > >> > > > >> as well as read affinity in the future.
> > > > > > >> > > > >>
> > > > > > >> > > > >> Jun
> > > > > > >> > > > >>
> > > > > > >> > > > >>
> > > > > > >> > > > >>
> > > > > > > >> > > > >> On Thu, Dec 10, 2015 at 9:38 AM, Allen Wang <allenxwang@gmail.com>
> > > > > > > >> > > > >> wrote:
> > > > > > > >> > > > >>
> > > > > > > >> > > > >> > If there are no more comments I would like to call for a vote.
> > > > > > >> > > > >> >
> > > > > > >> > > > >> >
> > > > > > > >> > > > >> > On Sun, Nov 15, 2015 at 10:08 PM, Allen Wang <allenxwang@gmail.com>
> > > > > > > >> > > > >> > wrote:
> > > > > > > >> > > > >> >
> > > > > > > >> > > > >> > > KIP is updated with more details and how to handle the situation
> > > > > > > >> > > > >> > > where rack information is incomplete.
> > > > > > > >> > > > >> > >
> > > > > > > >> > > > >> > > In the situation where rack information is incomplete, but we want
> > > > > > > >> > > > >> > > to continue with the assignment, I have suggested ignoring all rack
> > > > > > > >> > > > >> > > information and falling back to the original algorithm. The reason
> > > > > > > >> > > > >> > > is explained below:
> > > > > > > >> > > > >> > >
> > > > > > > >> > > > >> > > The other options are to assume that each broker without a rack
> > > > > > > >> > > > >> > > belongs to its own unique rack, or that they all belong to one
> > > > > > > >> > > > >> > > "default" rack. Whichever we choose, it is highly likely to result
> > > > > > > >> > > > >> > > in an uneven number of brokers per rack, and it is quite possible
> > > > > > > >> > > > >> > > that the "made up" racks will have far fewer brokers. As I
> > > > > > > >> > > > >> > > explained in the KIP, an uneven number of brokers per rack will
> > > > > > > >> > > > >> > > lead to an uneven distribution of replicas among brokers (even
> > > > > > > >> > > > >> > > though the leader distribution is still even). The brokers in a
> > > > > > > >> > > > >> > > rack that has fewer brokers will get more replicas per broker than
> > > > > > > >> > > > >> > > brokers in other racks.
> > > > > > > >> > > > >> > >
> > > > > > > >> > > > >> > > Given this, and given that the replica assignment produced will be
> > > > > > > >> > > > >> > > incorrect anyway from a rack-aware point of view, ignoring all rack
> > > > > > > >> > > > >> > > information and falling back to the original algorithm is not a bad
> > > > > > > >> > > > >> > > choice, since it will at least give a better guarantee of replica
> > > > > > > >> > > > >> > > distribution.
> > > > > > > >> > > > >> > >
> > > > > > > >> > > > >> > > Also, for command line tools it gives the user a choice if for any
> > > > > > > >> > > > >> > > reason they want to ignore rack information and fall back to the
> > > > > > > >> > > > >> > > original algorithm.
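
As a rough illustration of the behavior proposed above (fail fast on
incomplete rack information unless the caller opts in, and then ignore rack
information entirely), here is a small Python sketch. The function name, the
allow_fallback parameter and the returned labels are hypothetical; the KIP
itself defines the real behavior.

```python
def choose_assignment_mode(broker_racks, allow_fallback=False):
    """Decide which replica assignment algorithm to run.

    broker_racks maps broker id -> rack name, or None when the broker has
    no rack configured. All names here are illustrative assumptions.
    """
    missing = sorted(b for b, rack in broker_racks.items() if rack is None)
    if not missing:
        return "rack-aware"        # every broker has a rack
    if len(missing) == len(broker_racks):
        return "rack-unaware"      # no rack configured anywhere
    if allow_fallback:
        # Partial information: ignore ALL racks rather than invent
        # "made up" racks that would skew the replica distribution.
        return "rack-unaware"
    raise ValueError(f"brokers without rack information: {missing}")
```

For example, choose_assignment_mode({0: "r1", 1: None}) raises, while the
same call with allow_fallback=True selects the original algorithm.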
> > > > > > >> > > > >> > >
> > > > > > >> > > > >> > >
> > > > > > > >> > > > >> > > On Tue, Nov 10, 2015 at 9:04 AM, Allen Wang <allenxwang@gmail.com>
> > > > > > > >> > > > >> > > wrote:
> > > > > > > >> > > > >> > >
> > > > > > > >> > > > >> > >> I have been busy with some time-pressing issues for the last few
> > > > > > > >> > > > >> > >> days. I will think about how the incomplete rack information will
> > > > > > > >> > > > >> > >> affect the balance and update the KIP by early next week.
> > > > > > >> > > > >> > >>
> > > > > > >> > > > >> > >> Thanks,
> > > > > > >> > > > >> > >> Allen
> > > > > > >> > > > >> > >>
> > > > > > >> > > > >> > >>
> > > > > > > >> > > > >> > >> On Tue, Nov 3, 2015 at 9:03 AM, Neha Narkhede <neha@confluent.io>
> > > > > > > >> > > > >> > >> wrote:
> > > > > > > >> > > > >> > >>
> > > > > > > >> > > > >> > >>> A few suggestions on improving the KIP:
> > > > > > > >> > > > >> > >>>
> > > > > > > >> > > > >> > >>> > *If some brokers have rack, and some do not, the algorithm will
> > > > > > > >> > > > >> > >>> > throw an exception. This is to prevent incorrect assignment
> > > > > > > >> > > > >> > >>> > caused by user error.*
> > > > > > > >> > > > >> > >>>
> > > > > > > >> > > > >> > >>> In the KIP, can you clearly state the user-facing behavior when
> > > > > > > >> > > > >> > >>> some brokers have rack information and some don't? Which actions
> > > > > > > >> > > > >> > >>> and requests will error out and how?
> > > > > > > >> > > > >> > >>>
> > > > > > > >> > > > >> > >>> *Even distribution of partition leadership among brokers*
> > > > > > > >> > > > >> > >>>
> > > > > > > >> > > > >> > >>> There is some information about arranging the sorted broker list
> > > > > > > >> > > > >> > >>> interlaced with rack ids. Can you describe the changes to the
> > > > > > > >> > > > >> > >>> current algorithm in a little more detail? How does this
> > > > > > > >> > > > >> > >>> interlacing work if only a subset of brokers have the rack id
> > > > > > > >> > > > >> > >>> configured? Does this still work if an uneven number of brokers is
> > > > > > > >> > > > >> > >>> assigned to each rack? It might work; I'm looking for more details
> > > > > > > >> > > > >> > >>> on the changes, since they will affect the behavior seen by the
> > > > > > > >> > > > >> > >>> user - imbalance on either the leaders or data or both.
> > > > > > >> > > > >> > >>>
> > > > > > > >> > > > >> > >>> On Mon, Nov 2, 2015 at 6:39 PM, Aditya Auradkar
> > > > > > > >> > > > >> > >>> <aauradkar@linkedin.com> wrote:
> > > > > > > >> > > > >> > >>>
> > > > > > > >> > > > >> > >>> > I think this sounds reasonable. Anyone else have comments?
> > > > > > >> > > > >> > >>> >
> > > > > > >> > > > >> > >>> > Aditya
> > > > > > >> > > > >> > >>> >
> > > > > > > >> > > > >> > >>> > On Tue, Oct 27, 2015 at 5:23 PM, Allen Wang <allenxwang@gmail.com>
> > > > > > > >> > > > >> > >>> > wrote:
> > > > > > > >> > > > >> > >>> >
> > > > > > > >> > > > >> > >>> > > During the discussion in the hangout, it was mentioned that it
> > > > > > > >> > > > >> > >>> > > would be desirable for consumers to know the rack information of
> > > > > > > >> > > > >> > >>> > > the brokers so that they can consume from a broker in the same
> > > > > > > >> > > > >> > >>> > > rack to reduce latency. As I understand it, this will only be
> > > > > > > >> > > > >> > >>> > > beneficial if a consumer can consume from any broker in the ISR,
> > > > > > > >> > > > >> > >>> > > which is not possible now.
> > > > > > > >> > > > >> > >>> > >
> > > > > > > >> > > > >> > >>> > > I suggest we skip the change to TMR. Once the consumer is changed
> > > > > > > >> > > > >> > >>> > > to be able to consume from any broker in the ISR, the rack
> > > > > > > >> > > > >> > >>> > > information can be added to TMR.
> > > > > > > >> > > > >> > >>> > >
> > > > > > > >> > > > >> > >>> > > Another thing I want to confirm is the command line behavior. I
> > > > > > > >> > > > >> > >>> > > think the desirable default behavior is to fail fast on the
> > > > > > > >> > > > >> > >>> > > command line for an incomplete rack mapping. The error message
> > > > > > > >> > > > >> > >>> > > can include a further instruction that tells the user to add an
> > > > > > > >> > > > >> > >>> > > extra argument (like "--allow-partial-rackinfo") to suppress the
> > > > > > > >> > > > >> > >>> > > error and do an imperfect rack-aware assignment. If the default
> > > > > > > >> > > > >> > >>> > > behavior is to allow an incomplete mapping, the error can still
> > > > > > > >> > > > >> > >>> > > be easily missed.
> > > > > > > >> > > > >> > >>> > >
> > > > > > > >> > > > >> > >>> > > The affected command line tools are TopicCommand and
> > > > > > > >> > > > >> > >>> > > ReassignPartitionsCommand.
> > > > > > >> > > > >> > >>> > >
> > > > > > >> > > > >> > >>> > > Thanks,
> > > > > > >> > > > >> > >>> > > Allen
> > > > > > >> > > > >> > >>> > >
> > > > > > >> > > > >> > >>> > >
> > > > > > >> > > > >> > >>> > >
> > > > > > >> > > > >> > >>> > >
> > > > > > >> > > > >> > >>> > >
> > > > > > > >> > > > >> > >>> > > On Mon, Oct 26, 2015 at 12:55 PM, Aditya Auradkar
> > > > > > > >> > > > >> > >>> > > <aauradkar@linkedin.com> wrote:
> > > > > > > >> > > > >> > >>> > >
> > > > > > > >> > > > >> > >>> > > > Hi Allen,
> > > > > > > >> > > > >> > >>> > > >
> > > > > > > >> > > > >> > >>> > > > For TopicMetadataResponse to understand version, you can bump up
> > > > > > > >> > > > >> > >>> > > > the request version itself. Based on the version of the request,
> > > > > > > >> > > > >> > >>> > > > the response can be appropriately serialized. It shouldn't be a
> > > > > > > >> > > > >> > >>> > > > huge change. For example, we went through something similar for
> > > > > > > >> > > > >> > >>> > > > ProduceRequest recently (https://reviews.apache.org/r/33378/).
> > > > > > > >> > > > >> > >>> > > > I guess the reason protocol information is not included in the
> > > > > > > >> > > > >> > >>> > > > TMR is that the topic itself is independent of any particular
> > > > > > > >> > > > >> > >>> > > > protocol (SSL vs Plaintext). Having said that, I'm not sure we
> > > > > > > >> > > > >> > >>> > > > even need rack information in TMR. What use case were you
> > > > > > > >> > > > >> > >>> > > > thinking of initially?
> > > > > > > >> > > > >> > >>> > > >
> > > > > > > >> > > > >> > >>> > > > For 1 - I'd be fine with adding an option to the command line
> > > > > > > >> > > > >> > >>> > > > tools that checks rack assignment. For example,
> > > > > > > >> > > > >> > >>> > > > "--strict-assignment" or something similar.
> > > > > > >> > > > >> > >>> > > >
> > > > > > >> > > > >> > >>> > > > Aditya
> > > > > > >> > > > >> > >>> > > >
> > > > > > > >> > > > >> > >>> > > > On Thu, Oct 22, 2015 at 6:44 PM, Allen Wang <allenxwang@gmail.com>
> > > > > > > >> > > > >> > >>> > > > wrote:
> > > > > > > >> > > > >> > >>> > > >
> > > > > > > >> > > > >> > >>> > > > > For 2 and 3, I have updated the KIP. Please take a look. One
> > > > > > > >> > > > >> > >>> > > > > thing I have changed is removing the proposal to add rack to
> > > > > > > >> > > > >> > >>> > > > > TopicMetadataResponse. The reason is that unlike
> > > > > > > >> > > > >> > >>> > > > > UpdateMetadataRequest, TopicMetadataResponse does not understand
> > > > > > > >> > > > >> > >>> > > > > version. I don't see a way to include rack without breaking old
> > > > > > > >> > > > >> > >>> > > > > versions of clients. That's probably why the secure protocol is
> > > > > > > >> > > > >> > >>> > > > > not included in the TopicMetadataResponse either. I think it
> > > > > > > >> > > > >> > >>> > > > > will be a much bigger change to include rack in
> > > > > > > >> > > > >> > >>> > > > > TopicMetadataResponse.
> > > > > > > >> > > > >> > >>> > > > >
> > > > > > > >> > > > >> > >>> > > > > For 1, my concern is that doing rack-aware assignment without a
> > > > > > > >> > > > >> > >>> > > > > complete broker-to-rack mapping will result in an assignment
> > > > > > > >> > > > >> > >>> > > > > that is not rack aware and fails to provide fault tolerance in
> > > > > > > >> > > > >> > >>> > > > > the event of a rack outage. This kind of problem will be
> > > > > > > >> > > > >> > >>> > > > > difficult to surface. And the cost of this problem is high: you
> > > > > > > >> > > > >> > >>> > > > > have to do partition reassignment if you are lucky enough to
> > > > > > > >> > > > >> > >>> > > > > spot the problem early on, or face the consequence of data loss
> > > > > > > >> > > > >> > >>> > > > > during a real rack outage.
> > > > > > > >> > > > >> > >>> > > > >
> > > > > > > >> > > > >> > >>> > > > > I do see the concern with fail-fast, as it might also cause data
> > > > > > > >> > > > >> > >>> > > > > loss if a producer is not able to produce messages due to a
> > > > > > > >> > > > >> > >>> > > > > topic creation failure. Is it feasible to treat dynamic topic
> > > > > > > >> > > > >> > >>> > > > > creation and command line tools differently? We would allow
> > > > > > > >> > > > >> > >>> > > > > dynamic topic creation with an incomplete broker-rack mapping
> > > > > > > >> > > > >> > >>> > > > > and fail fast on the command line. Another option is to let the
> > > > > > > >> > > > >> > >>> > > > > user determine the behavior for the command line. For example,
> > > > > > > >> > > > >> > >>> > > > > by default fail fast on the command line but allow an incomplete
> > > > > > > >> > > > >> > >>> > > > > broker-rack mapping if another switch is provided.
> > > > > > >> > > > >> > >>> > > > >
> > > > > > >> > > > >> > >>> > > > >
> > > > > > >> > > > >> > >>> > > > >
> > > > > > >> > > > >> > >>> > > > >
> > > > > > > >> > > > >> > >>> > > > > On Tue, Oct 20, 2015 at 10:05 AM, Aditya Auradkar <
> > > > > > > >> > > > >> > >>> > > > > aauradkar@linkedin.com.invalid> wrote:
> > > > > > > >> > > > >> > >>> > > > >
> > > > > > > >> > > > >> > >>> > > > > > Hey Allen,
> > > > > > > >> > > > >> > >>> > > > > >
> > > > > > > >> > > > >> > >>> > > > > > 1. If we choose fail-fast topic creation, we will have topic
> > > > > > > >> > > > >> > >>> > > > > > creation failures while upgrading the cluster. I really doubt
> > > > > > > >> > > > >> > >>> > > > > > we want this behavior. Ideally, this should be invisible to
> > > > > > > >> > > > >> > >>> > > > > > clients of a cluster. Currently, each broker is effectively
> > > > > > > >> > > > >> > >>> > > > > > its own rack. So we can probably use the rack information
> > > > > > > >> > > > >> > >>> > > > > > whenever possible but not make it a hard requirement. To
> > > > > > > >> > > > >> > >>> > > > > > extend Gwen's example, one badly configured broker should not
> > > > > > > >> > > > >> > >>> > > > > > degrade topic creation for the entire cluster.
> > > > > > > >> > > > >> > >>> > > > > >
> > > > > > > >> > > > >> > >>> > > > > > 2. Upgrade scenario - Can you add a section on the upgrade
> > > > > > > >> > > > >> > >>> > > > > > piece to confirm that old clients will not see errors? I
> > > > > > > >> > > > >> > >>> > > > > > believe ZookeeperConsumerConnector reads the Broker objects
> > > > > > > >> > > > >> > >>> > > > > > from ZK. I wanted to confirm that this will not cause any
> > > > > > > >> > > > >> > >>> > > > > > problems.
> > > > > > > >> > > > >> > >>> > > > > >
> > > > > > > >> > > > >> > >>> > > > > > 3. Could you elaborate on your proposed changes to the
> > > > > > > >> > > > >> > >>> > > > > > UpdateMetadataRequest in the "Public Interfaces" section?
> > > > > > > >> > > > >> > >>> > > > > > Personally, I find this format easy to read in terms of wire
> > > > > > > >> > > > >> > >>> > > > > > protocol changes:
> > > > > > > >> > > > >> > >>> > > > > >
> > > > > > > >> > > > >> > >>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicRequest
> > > > > > >> > > > >> > >>> > > > > >
> > > > > > >> > > > >> > >>> > > > > > Aditya
> > > > > > >> > > > >> > >>> > > > > >
> > > > > > > >> > > > >> > >>> > > > > > On Fri, Oct 16, 2015 at 3:45 PM, Allen Wang <allenxwang@gmail.com>
> > > > > > > >> > > > >> > >>> > > > > > wrote:
> > > > > > > >> > > > >> > >>> > > > > >
> > > > > > > >> > > > >> > >>> > > > > > > KIP is updated to include rack as an optional property for
> > > > > > > >> > > > >> > >>> > > > > > > broker. Please take a look and let me know if more details
> > > > > > > >> > > > >> > >>> > > > > > > are needed.
> > > > > > > >> > > > >> > >>> > > > > > >
> > > > > > > >> > > > >> > >>> > > > > > > For the case where some brokers have rack and some do not,
> > > > > > > >> > > > >> > >>> > > > > > > the current KIP uses the fail-fast behavior. If there are
> > > > > > > >> > > > >> > >>> > > > > > > concerns, we can further discuss this in the email thread or
> > > > > > > >> > > > >> > >>> > > > > > > next hangout.
> > > > > > >> > > > >> > >>> > > > > > >
> > > > > > >> > > > >> > >>> > > > > > >
> > > > > > >> > > > >> > >>> > > > > > >
> > > > > > >> > > > >> > >>> > > > > > > On Thu, Oct 15, 2015 at 10:42 AM,
> > Allen
> > > > > Wang
> > > > > > <
> > > > > > >> > > > >> > >>> > allenxwang@gmail.com
> > > > > > >> > > > >> > >>> > > >
> > > > > > >> > > > >> > >>> > > > > > wrote:
> > > > > > >> > > > >> > >>> > > > > > >
> > > > > > >> > > > >> > >>> > > > > > > > That's a good question. I can
> think
> > > of
> > > > > > three
> > > > > > >> > > actions
> > > > > > >> > > > >> if
> > > > > > >> > > > >> > the
> > > > > > >> > > > >> > >>> > rack
> > > > > > >> > > > >> > >>> > > > > > > > information is incomplete:
> > > > > > >> > > > >> > >>> > > > > > > >
> > > > > > >> > > > >> > >>> > > > > > > > 1. Treat the node without rack as
> > if
> > > it
> > > > > is
> > > > > > on
> > > > > > >> > its
> > > > > > >> > > > >> unique
> > > > > > >> > > > >> > >>> rack
> > > > > > >> > > > >> > >>> > > > > > > > 2. Disregard all rack information
> > and
> > > > > > >> fallback
> > > > > > >> > to
> > > > > > >> > > > >> current
> > > > > > >> > > > >> > >>> > > algorithm
> > > > > > >> > > > >> > >>> > > > > > > > 3. Fail-fast
> > > > > > >> > > > >> > >>> > > > > > > >
> > > > > > >> > > > >> > >>> > > > > > > > Now that I think about it, one and
> three
> > > > make
> > > > > > more
> > > > > > >> > > sense.
> > > > > > >> > > > >> The
> > > > > > >> > > > >> > >>> reason
> > > > > > >> > > > >> > >>> > > for
> > > > > > >> > > > >> > >>> > > > > > > > fail-fast is that user mistake
> for
> > > not
> > > > > > >> providing
> > > > > > >> > > the
> > > > > > >> > > > >> rack
> > > > > > >> > > > >> > >>> may
> > > > > > >> > > > >> > >>> > > never
> > > > > > >> > > > >> > >>> > > > > be
> > > > > > >> > > > >> > >>> > > > > > > > found if we tolerate that and the
> > > > > > assignment
> > > > > > >> may
> > > > > > >> > > not
> > > > > > >> > > > >> be
> > > > > > >> > > > >> > >>> rack
> > > > > > >> > > > >> > >>> > > aware
> > > > > > >> > > > >> > >>> > > > as
> > > > > > >> > > > >> > >>> > > > > > the
> > > > > > >> > > > >> > >>> > > > > > > > user has expected and this
> creates
> > > > debug
> > > > > > >> > problems
> > > > > > >> > > > when
> > > > > > >> > > > >> > >>> things
> > > > > > >> > > > >> > >>> > > fail.
> > > > > > >> > > > >> > >>> > > > > > > >
> > > > > > >> > > > >> > >>> > > > > > > > What do you think? If not
> > fail-fast,
> > > is
> > > > > > there
> > > > > > >> > > any way
> > > > > > >> > > > >> we
> > > > > > >> > > > >> > can
> > > > > > >> > > > >> > >>> > make
> > > > > > >> > > > >> > >>> > > > the
> > > > > > >> > > > >> > >>> > > > > > user
> > > > > > >> > > > >> > >>> > > > > > > > error stand out?
> > > > > > >> > > > >> > >>> > > > > > > >
> > > > > > >> > > > >> > >>> > > > > > > >
> > > > > > >> > > > >> > >>> > > > > > > > On Thu, Oct 15, 2015 at 10:17 AM,
> > > Gwen
> > > > > > >> Shapira <
> > > > > > >> > > > >> > >>> > > gwen@confluent.io>
> > > > > > >> > > > >> > >>> > > > > > > wrote:
> > > > > > >> > > > >> > >>> > > > > > > >
> > > > > > >> > > > >> > >>> > > > > > > >> Thanks! Just to clarify, when
> some
> > > > > brokers
> > > > > > >> have
> > > > > > >> > > > rack
> > > > > > >> > > > >> > >>> > assignment
> > > > > > >> > > > >> > >>> > > > and
> > > > > > >> > > > >> > >>> > > > > > some
> > > > > > >> > > > >> > >>> > > > > > > >> don't, do we act like none of
> them
> > > > have
> > > > > > it?
> > > > > > >> or
> > > > > > >> > > like
> > > > > > >> > > > >> > those
> > > > > > >> > > > >> > >>> > > without
> > > > > > >> > > > >> > >>> > > > > > > >> assignment are in their own
> rack?
> > > > > > >> > > > >> > >>> > > > > > > >>
> > > > > > >> > > > >> > >>> > > > > > > >> The first scenario is good when
> > > first
> > > > > > >> setting
> > > > > > >> > up
> > > > > > >> > > > >> > >>> > rack-awareness,
> > > > > > >> > > > >> > >>> > > > but
> > > > > > >> > > > >> > >>> > > > > > the
> > > > > > >> > > > >> > >>> > > > > > > >> second makes more sense for
> > on-going
> > > > > > >> > maintenance
> > > > > > >> > > (I
> > > > > > >> > > > >> can
> > > > > > >> > > > >> > >>> > totally
> > > > > > >> > > > >> > >>> > > > see
> > > > > > >> > > > >> > >>> > > > > > > >> someone
> > > > > > >> > > > >> > >>> > > > > > > >> adding a node and forgetting to
> > set
> > > > the
> > > > > > rack
> > > > > > >> > > > >> property,
> > > > > > >> > > > >> > we
> > > > > > >> > > > >> > >>> > don't
> > > > > > >> > > > >> > >>> > > > want
> > > > > > >> > > > >> > >>> > > > > > > this
> > > > > > >> > > > >> > >>> > > > > > > >> to change behavior for anything
> > > except
> > > > > the
> > > > > > >> new
> > > > > > >> > > > node).
> > > > > > >> > > > >> > >>> > > > > > > >>
> > > > > > >> > > > >> > >>> > > > > > > >> What do you think?
> > > > > > >> > > > >> > >>> > > > > > > >>
> > > > > > >> > > > >> > >>> > > > > > > >> Gwen
> > > > > > >> > > > >> > >>> > > > > > > >>
> > > > > > >> > > > >> > >>> > > > > > > >> On Thu, Oct 15, 2015 at 10:13
> AM,
> > > > Allen
> > > > > > >> Wang <
> > > > > > >> > > > >> > >>> > > > allenxwang@gmail.com>
> > > > > > >> > > > >> > >>> > > > > > > >> wrote:
> > > > > > >> > > > >> > >>> > > > > > > >>
> > > > > > >> > > > >> > >>> > > > > > > >> > For scenario 1:
> > > > > > >> > > > >> > >>> > > > > > > >> >
> > > > > > >> > > > >> > >>> > > > > > > >> > - Add the rack information to
> > > broker
> > > > > > >> property
> > > > > > >> > > > file
> > > > > > >> > > > >> or
> > > > > > >> > > > >> > >>> > > > dynamically
> > > > > > >> > > > >> > >>> > > > > > set
> > > > > > >> > > > >> > >>> > > > > > > >> it in
> > > > > > >> > > > >> > >>> > > > > > > >> > the wrapper code to bootstrap
> > > Kafka
> > > > > > >> server.
> > > > > > >> > You
> > > > > > >> > > > >> would
> > > > > > >> > > > >> > do
> > > > > > >> > > > >> > >>> > that
> > > > > > >> > > > >> > >>> > > > for
> > > > > > >> > > > >> > >>> > > > > > all
> > > > > > >> > > > >> > >>> > > > > > > >> > brokers and restart the
> brokers
> > > one
> > > > by
> > > > > > >> one.
> > > > > > >> > > > >> > >>> > > > > > > >> >
> > > > > > >> > > > >> > >>> > > > > > > >> > In this scenario, the complete
> > > > broker
> > > > > to
> > > > > > >> rack
> > > > > > >> > > > >> mapping
> > > > > > >> > > > >> > >>> may
> > > > > > >> > > > >> > >>> > not
> > > > > > >> > > > >> > >>> > > be
> > > > > > >> > > > >> > >>> > > > > > > >> available
> > > > > > >> > > > >> > >>> > > > > > > >> > until every broker is
> restarted.
> > > > > During
> > > > > > >> that
> > > > > > >> > > time
> > > > > > >> > > > >> we
> > > > > > >> > > > >> > >>> fall
> > > > > > >> > > > >> > >>> > back
> > > > > > >> > > > >> > >>> > > > to
> > > > > > >> > > > >> > >>> > > > > > > >> default
> > > > > > >> > > > >> > >>> > > > > > > >> > replica assignment algorithm.
> > > > > > >> > > > >> > >>> > > > > > > >> >
> > > > > > >> > > > >> > >>> > > > > > > >> > For scenario 2:
> > > > > > >> > > > >> > >>> > > > > > > >> >
> > > > > > >> > > > >> > >>> > > > > > > >> > - Add the rack information to
> > > broker
> > > > > > >> property
> > > > > > >> > > > file
> > > > > > >> > > > >> or
> > > > > > >> > > > >> > >>> > > > dynamically
> > > > > > >> > > > >> > >>> > > > > > set
> > > > > > >> > > > >> > >>> > > > > > > >> it in
> > > > > > >> > > > >> > >>> > > > > > > >> > the wrapper code and start the
> > > > broker.
> > > > > > >> > > > >> > >>> > > > > > > >> >
> > > > > > >> > > > >> > >>> > > > > > > >> >
> > > > > > >> > > > >> > >>> > > > > > > >> > On Wed, Oct 14, 2015 at 2:36
> PM,
> > > > Gwen
> > > > > > >> > Shapira <
> > > > > > >> > > > >> > >>> > > > gwen@confluent.io>
> > > > > > >> > > > >> > >>> > > > > > > >> wrote:
> > > > > > >> > > > >> > >>> > > > > > > >> >
> > > > > > >> > > > >> > >>> > > > > > > >> > > Can you clarify the workflow
> > for
> > > > the
> > > > > > >> > > following
> > > > > > >> > > > >> > >>> scenarios:
> > > > > > >> > > > >> > >>> > > > > > > >> > >
> > > > > > >> > > > >> > >>> > > > > > > >> > > 1. I currently have 6
> brokers
> > > and
> > > > > want
> > > > > > >> to
> > > > > > >> > add
> > > > > > >> > > > >> rack
> > > > > > >> > > > >> > >>> > > information
> > > > > > >> > > > >> > >>> > > > > for
> > > > > > >> > > > >> > >>> > > > > > > >> each
> > > > > > >> > > > >> > >>> > > > > > > >> > > 2. I'm adding a new broker
> > and I
> > > > > want
> > > > > > to
> > > > > > >> > > > specify
> > > > > > >> > > > >> > which
> > > > > > >> > > > >> > >>> > rack
> > > > > > >> > > > >> > >>> > > it
> > > > > > >> > > > >> > >>> > > > > > > >> belongs on
> > > > > > >> > > > >> > >>> > > > > > > >> > > while adding it.
> > > > > > >> > > > >> > >>> > > > > > > >> > >
> > > > > > >> > > > >> > >>> > > > > > > >> > > Thanks!
> > > > > > >> > > > >> > >>> > > > > > > >> > >
> > > > > > >> > > > >> > >>> > > > > > > >> > > On Tue, Oct 13, 2015 at 2:21
> > PM,
> > > > > Allen
> > > > > > >> > Wang <
> > > > > > >> > > > >> > >>> > > > > allenxwang@gmail.com
> > > > > > >> > > > >> > >>> > > > > > >
> > > > > > >> > > > >> > >>> > > > > > > >> > wrote:
> > > > > > >> > > > >> > >>> > > > > > > >> > >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > We discussed the KIP in
> the
> > > > > hangout
> > > > > > >> > today.
> > > > > > >> > > > The
> > > > > > >> > > > >> > >>> > > > recommendation
> > > > > > >> > > > >> > >>> > > > > is
> > > > > > >> > > > >> > >>> > > > > > > to
> > > > > > >> > > > >> > >>> > > > > > > >> > make
> > > > > > >> > > > >> > >>> > > > > > > >> > > > rack a broker property
> in
> > > > > > >> ZooKeeper.
> > > > > > >> > For
> > > > > > >> > > > >> users
> > > > > > >> > > > >> > >>> with
> > > > > > >> > > > >> > >>> > > > > existing
> > > > > > >> > > > >> > >>> > > > > > > rack
> > > > > > >> > > > >> > >>> > > > > > > >> > > > information stored
> > somewhere,
> > > > they
> > > > > > >> would
> > > > > > >> > > need
> > > > > > >> > > > >> to
> > > > > > >> > > > >> > >>> > retrieve
> > > > > > >> > > > >> > >>> > > > the
> > > > > > >> > > > >> > >>> > > > > > > >> > information
> > > > > > >> > > > >> > >>> > > > > > > >> > > > at broker start up and
> > > > dynamically
> > > > > > set
> > > > > > >> > the
> > > > > > >> > > > rack
> > > > > > >> > > > >> > >>> > property,
> > > > > > >> > > > >> > >>> > > > > which
> > > > > > >> > > > >> > >>> > > > > > > can
> > > > > > >> > > > >> > >>> > > > > > > >> be
> > > > > > >> > > > >> > >>> > > > > > > >> > > > implemented as a wrapper
> to
> > > > > > bootstrap
> > > > > > >> > > broker.
> > > > > > >> > > > >> > There
> > > > > > >> > > > >> > >>> will
> > > > > > >> > > > >> > >>> > > be
> > > > > > >> > > > >> > >>> > > > no
> > > > > > >> > > > >> > >>> > > > > > > >> > interface
> > > > > > >> > > > >> > >>> > > > > > > >> > > or
> > > > > > >> > > > >> > >>> > > > > > > >> > > > pluggable implementation
> to
> > > > > retrieve
> > > > > > >> the
> > > > > > >> > > rack
> > > > > > >> > > > >> > >>> > information.
> > > > > > >> > > > >> > >>> > > > > > > >> > > >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > The assumption is that you
> > > > always
> > > > > > >> need to
> > > > > > >> > > > >> restart
> > > > > > >> > > > >> > >>> the
> > > > > > >> > > > >> > >>> > > broker
> > > > > > >> > > > >> > >>> > > > > to
> > > > > > >> > > > >> > >>> > > > > > > >> make a
> > > > > > >> > > > >> > >>> > > > > > > >> > > > change to the rack.
> > > > > > >> > > > >> > >>> > > > > > > >> > > >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > Once the rack becomes a
> > broker
> > > > > > >> property,
> > > > > > >> > it
> > > > > > >> > > > >> will
> > > > > > >> > > > >> > be
> > > > > > >> > > > >> > >>> > > possible
> > > > > > >> > > > >> > >>> > > > > to
> > > > > > >> > > > >> > >>> > > > > > > make
> > > > > > >> > > > >> > >>> > > > > > > >> > rack
> > > > > > >> > > > >> > >>> > > > > > > >> > > > part of the metadata to
> > help
> > > > the
> > > > > > >> > consumer
> > > > > > >> > > > >> choose
> > > > > > >> > > > >> > >>> which
> > > > > > >> > > > >> > >>> > in
> > > > > > >> > > > >> > >>> > > > > sync
> > > > > > >> > > > >> > >>> > > > > > > >> replica
> > > > > > >> > > > >> > >>> > > > > > > >> > > to
> > > > > > >> > > > >> > >>> > > > > > > >> > > > consume from as part of
> the
> > > > future
> > > > > > >> > consumer
> > > > > > >> > > > >> > >>> enhancement.
> > > > > > >> > > > >> > >>> > > > > > > >> > > >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > I will update the KIP.
> > > > > > >> > > > >> > >>> > > > > > > >> > > >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > Thanks,
> > > > > > >> > > > >> > >>> > > > > > > >> > > > Allen
> > > > > > >> > > > >> > >>> > > > > > > >> > > >
> > > > > > >> > > > >> > >>> > > > > > > >> > > >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > On Thu, Oct 8, 2015 at
> 9:23
> > > AM,
> > > > > > Allen
> > > > > > >> > Wang
> > > > > > >> > > <
> > > > > > >> > > > >> > >>> > > > > > allenxwang@gmail.com>
> > > > > > >> > > > >> > >>> > > > > > > >> > wrote:
> > > > > > >> > > > >> > >>> > > > > > > >> > > >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > > I attended Tuesday's KIP
> > > > hangout
> > > > > > but
> > > > > > >> > this
> > > > > > >> > > > KIP
> > > > > > >> > > > >> > was
> > > > > > >> > > > >> > >>> not
> > > > > > >> > > > >> > >>> > > > > > discussed
> > > > > > >> > > > >> > >>> > > > > > > >> due
> > > > > > >> > > > >> > >>> > > > > > > >> > to
> > > > > > >> > > > >> > >>> > > > > > > >> > > > > time constraint.
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > > However, after hearing
> > > > > discussion
> > > > > > of
> > > > > > >> > > > KIP-35,
> > > > > > >> > > > >> I
> > > > > > >> > > > >> > >>> have
> > > > > > >> > > > >> > >>> > the
> > > > > > >> > > > >> > >>> > > > > > feeling
> > > > > > >> > > > >> > >>> > > > > > > >> that
> > > > > > >> > > > >> > >>> > > > > > > >> > > > > incompatibility (caused
> by
> > > new
> > > > > > >> broker
> > > > > > >> > > > >> property)
> > > > > > >> > > > >> > >>> > between
> > > > > > >> > > > >> > >>> > > > > > brokers
> > > > > > >> > > > >> > >>> > > > > > > >> with
> > > > > > >> > > > >> > >>> > > > > > > >> > > > > different versions will
> > be
> > > > > solved
> > > > > > >> > there.
> > > > > > >> > > > In
> > > > > > >> > > > >> > >>> addition,
> > > > > > >> > > > >> > >>> > > > > having
> > > > > > >> > > > >> > >>> > > > > > > >> rack
> > > > > > >> > > > >> > >>> > > > > > > >> > in
> > > > > > >> > > > >> > >>> > > > > > > >> > > > > broker property as meta
> > data
> > > > may
> > > > > > >> also
> > > > > > >> > > help
> > > > > > >> > > > >> > >>> consumers
> > > > > > >> > > > >> > >>> > in
> > > > > > >> > > > >> > >>> > > > the
> > > > > > >> > > > >> > >>> > > > > > > >> future.
> > > > > > >> > > > >> > >>> > > > > > > >> > So
> > > > > > >> > > > >> > >>> > > > > > > >> > > I
> > > > > > >> > > > >> > >>> > > > > > > >> > > > am
> > > > > > >> > > > >> > >>> > > > > > > >> > > > > open to adding rack
> > > property
> > > > to
> > > > > > >> > broker.
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > > Hopefully we can discuss
> > > this
> > > > in
> > > > > > the
> > > > > > >> > next
> > > > > > >> > > > KIP
> > > > > > >> > > > >> > >>> hangout.
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > > On Wed, Sep 30, 2015 at
> > 2:46
> > > > PM,
> > > > > > >> Allen
> > > > > > >> > > > Wang <
> > > > > > >> > > > >> > >>> > > > > > > allenxwang@gmail.com
> > > > > > >> > > > >> > >>> > > > > > > >> >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > wrote:
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >> Can you send me the
> > > > information
> > > > > > on
> > > > > > >> the
> > > > > > >> > > > next
> > > > > > >> > > > >> KIP
> > > > > > >> > > > >> > >>> > > hangout?
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >> Currently the
> broker-rack
> > > > > mapping
> > > > > > >> is
> > > > > > >> > not
> > > > > > >> > > > >> > cached.
> > > > > > >> > > > >> > >>> In
> > > > > > >> > > > >> > >>> > > > > > KafkaApis,
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>
> RackLocator.getRackInfo()
> > > is
> > > > > > called
> > > > > > >> > each
> > > > > > >> > > > >> time
> > > > > > >> > > > >> > the
> > > > > > >> > > > >> > >>> > > mapping
> > > > > > >> > > > >> > >>> > > > > is
> > > > > > >> > > > >> > >>> > > > > > > >> needed
> > > > > > >> > > > >> > >>> > > > > > > >> > > for
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >> auto topic creation.
> This
> > > > will
> > > > > > >> ensure
> > > > > > >> > > > latest
> > > > > > >> > > > >> > >>> mapping
> > > > > > >> > > > >> > >>> > is
> > > > > > >> > > > >> > >>> > > > > used
> > > > > > >> > > > >> > >>> > > > > > at
> > > > > > >> > > > >> > >>> > > > > > > >> any
> > > > > > >> > > > >> > >>> > > > > > > >> > > > time.
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >> The ability to get the
> > > > complete
> > > > > > >> > mapping
> > > > > > >> > > > >> makes
> > > > > > >> > > > >> > it
> > > > > > >> > > > >> > >>> > simple
> > > > > > >> > > > >> > >>> > > > to
> > > > > > >> > > > >> > >>> > > > > > > reuse
> > > > > > >> > > > >> > >>> > > > > > > >> the
> > > > > > >> > > > >> > >>> > > > > > > >> > > > same
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >> interface in command
> line
> > > > > tools.
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >> On Wed, Sep 30, 2015 at
> > > 11:01
> > > > > AM,
> > > > > > >> > Aditya
> > > > > > >> > > > >> > >>> Auradkar <
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>
> > > > aauradkar@linkedin.com.invalid
> > > > > >
> > > > > > >> > wrote:
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> Perhaps we discuss
> this
> > > > during
> > > > > > the
> > > > > > >> > next
> > > > > > >> > > > KIP
> > > > > > >> > > > >> > >>> hangout?
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>>
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> I do see that a
> > pluggable
> > > > rack
> > > > > > >> > locator
> > > > > > >> > > > can
> > > > > > >> > > > >> be
> > > > > > >> > > > >> > >>> useful
> > > > > > >> > > > >> > >>> > > > but I
> > > > > > >> > > > >> > >>> > > > > > do
> > > > > > >> > > > >> > >>> > > > > > > >> see a
> > > > > > >> > > > >> > >>> > > > > > > >> > > few
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> concerns:
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>>
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> - The RackLocator (as
> > > > > described
> > > > > > in
> > > > > > >> > the
> > > > > > >> > > > >> > >>> document),
> > > > > > >> > > > >> > >>> > > > implies
> > > > > > >> > > > >> > >>> > > > > > that
> > > > > > >> > > > >> > >>> > > > > > > >> it
> > > > > > >> > > > >> > >>> > > > > > > >> > can
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> discover rack
> > information
> > > > for
> > > > > > any
> > > > > > >> > node
> > > > > > >> > > in
> > > > > > >> > > > >> the
> > > > > > >> > > > >> > >>> > cluster.
> > > > > > >> > > > >> > >>> > > > How
> > > > > > >> > > > >> > >>> > > > > > > does
> > > > > > >> > > > >> > >>> > > > > > > >> it
> > > > > > >> > > > >> > >>> > > > > > > >> > > deal
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> with rack location
> > > changes?
> > > > > For
> > > > > > >> > > example,
> > > > > > >> > > > >> if I
> > > > > > >> > > > >> > >>> moved
> > > > > > >> > > > >> > >>> > > > broker
> > > > > > >> > > > >> > >>> > > > > > id
> > > > > > >> > > > >> > >>> > > > > > > >> (1)
> > > > > > >> > > > >> > >>> > > > > > > >> > > from
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> rack
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> X to Y, I only have to
> > > start
> > > > > > that
> > > > > > >> > > broker
> > > > > > >> > > > >> with
> > > > > > >> > > > >> > a
> > > > > > >> > > > >> > >>> > newer
> > > > > > >> > > > >> > >>> > > > rack
> > > > > > >> > > > >> > >>> > > > > > > >> config.
> > > > > > >> > > > >> > >>> > > > > > > >> > If
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> RackLocator discovers
> > > broker
> > > > > ->
> > > > > > >> rack
> > > > > > >> > > > >> > >>> information at
> > > > > > >> > > > >> > >>> > > > start
> > > > > > >> > > > >> > >>> > > > > up
> > > > > > >> > > > >> > >>> > > > > > > >> time,
> > > > > > >> > > > >> > >>> > > > > > > >> > > any
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> change to a broker
> will
> > > > > require
> > > > > > >> > > bouncing
> > > > > > >> > > > >> the
> > > > > > >> > > > >> > >>> entire
> > > > > > >> > > > >> > >>> > > > > cluster
> > > > > > >> > > > >> > >>> > > > > > > >> since
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> createTopic requests
> can
> > > be
> > > > > sent
> > > > > > >> to
> > > > > > >> > any
> > > > > > >> > > > >> node
> > > > > > >> > > > >> > in
> > > > > > >> > > > >> > >>> the
> > > > > > >> > > > >> > >>> > > > > cluster.
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> For this reason it may
> > be
> > > > > > simpler
> > > > > > >> to
> > > > > > >> > > have
> > > > > > >> > > > >> each
> > > > > > >> > > > >> > >>> node
> > > > > > >> > > > >> > >>> > be
> > > > > > >> > > > >> > >>> > > > > aware
> > > > > > >> > > > >> > >>> > > > > > > of
> > > > > > >> > > > >> > >>> > > > > > > >> its
> > > > > > >> > > > >> > >>> > > > > > > >> > > own
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> rack and persist it in
> > ZK
> > > > > during
> > > > > > >> > start
> > > > > > >> > > up
> > > > > > >> > > > >> > time.
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>>
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> - A pluggable
> > RackLocator
> > > > > relies
> > > > > > >> on
> > > > > > >> > an
> > > > > > >> > > > >> > external
> > > > > > >> > > > >> > >>> > > service
> > > > > > >> > > > >> > >>> > > > > > being
> > > > > > >> > > > >> > >>> > > > > > > >> > > available
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> to
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> serve rack
> information.
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>>
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> Out of curiosity, I
> > looked
> > > > up
> > > > > > how
> > > > > > >> a
> > > > > > >> > > > couple
> > > > > > >> > > > >> of
> > > > > > >> > > > >> > >>> other
> > > > > > >> > > > >> > >>> > > > > systems
> > > > > > >> > > > >> > >>> > > > > > > deal
> > > > > > >> > > > >> > >>> > > > > > > >> > with
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> zone/rack awareness.
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> For Cassandra some
> > > > interesting
> > > > > > >> modes
> > > > > > >> > > are:
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> (Property File
> > > > configuration)
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>>
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> (Dynamic inference)
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>>
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>>
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> Voldemort does a
> static
> > > node
> > > > > ->
> > > > > > >> zone
> > > > > > >> > > > >> > assignment
> > > > > > >> > > > >> > >>> > based
> > > > > > >> > > > >> > >>> > > on
> > > > > > >> > > > >> > >>> > > > > > > >> > > configuration.
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>>
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> Aditya
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>>
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> On Wed, Sep 30, 2015
> at
> > > > 10:05
> > > > > > AM,
> > > > > > >> > Allen
> > > > > > >> > > > >> Wang <
> > > > > > >> > > > >> > >>> > > > > > > >> allenxwang@gmail.com
> > > > > > >> > > > >> > >>> > > > > > > >> > >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> wrote:
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>>
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > I would like to see
> if
> > > we
> > > > > can
> > > > > > do
> > > > > > >> > > both:
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > - Make RackLocator
> > > > pluggable
> > > > > > to
> > > > > > >> > > > >> facilitate
> > > > > > >> > > > >> > >>> > migration
> > > > > > >> > > > >> > >>> > > > > with
> > > > > > >> > > > >> > >>> > > > > > > >> > existing
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > broker-rack mapping
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > - Make rack an
> > optional
> > > > > > property
> > > > > > >> > for
> > > > > > >> > > > >> broker.
> > > > > > >> > > > >> > >>> If
> > > > > > >> > > > >> > >>> > rack
> > > > > > >> > > > >> > >>> > > > is
> > > > > > >> > > > >> > >>> > > > > > > >> available
> > > > > > >> > > > >> > >>> > > > > > > >> > > > from
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > broker, treat it as
> > > source
> > > > > of
> > > > > > >> > truth.
> > > > > > >> > > > For
> > > > > > >> > > > >> > users
> > > > > > >> > > > >> > >>> > with
> > > > > > >> > > > >> > >>> > > > > > existing
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> broker-rack
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > mapping somewhere
> > else,
> > > > they
> > > > > > can
> > > > > > >> > use
> > > > > > >> > > > the
> > > > > > >> > > > >> > >>> pluggable
> > > > > > >> > > > >> > >>> > > way
> > > > > > >> > > > >> > >>> > > > > or
> > > > > > >> > > > >> > >>> > > > > > > they
> > > > > > >> > > > >> > >>> > > > > > > >> > can
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> transfer
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > the mapping to the
> > > broker
> > > > > rack
> > > > > > >> > > > property.
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > One thing I am not
> > sure
> > > is
> > > > > > what
> > > > > > >> > > happens
> > > > > > >> > > > >> at
> > > > > > >> > > > >> > >>> rolling
> > > > > > >> > > > >> > >>> > > > > upgrade
> > > > > > >> > > > >> > >>> > > > > > > >> when
> > > > > > >> > > > >> > >>> > > > > > > >> > we
> > > > > > >> > > > >> > >>> > > > > > > >> > > > have
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > rack as a broker
> > > property.
> > > > > For
> > > > > > >> > > brokers
> > > > > > >> > > > >> with
> > > > > > >> > > > >> > >>> older
> > > > > > >> > > > >> > >>> > > > > version
> > > > > > >> > > > >> > >>> > > > > > of
> > > > > > >> > > > >> > >>> > > > > > > >> > Kafka,
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> will it
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > cause problems for
> > them?
> > > If
> > > > > so,
> > > > > > >> is
> > > > > > >> > > there
> > > > > > >> > > > >> any
> > > > > > >> > > > >> > >>> > > > workaround?
> > > > > > >> > > > >> > >>> > > > > I
> > > > > > >> > > > >> > >>> > > > > > > also
> > > > > > >> > > > >> > >>> > > > > > > >> > > think
> > > > > > >> > > > >> > >>> > > > > > > >> > > > it
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > would be better not
> to
> > > > have
> > > > > > >> rack in
> > > > > > >> > > the
> > > > > > >> > > > >> > >>> controller
> > > > > > >> > > > >> > >>> > > > wire
> > > > > > >> > > > >> > >>> > > > > > > >> protocol
> > > > > > >> > > > >> > >>> > > > > > > >> > > but
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> not
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > sure if it is
> > > achievable.
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > Thanks,
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > Allen
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > On Mon, Sep 28, 2015 at 4:55 PM, Todd Palino <tpalino@gmail.com> wrote:
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > I tend to like the idea of a pluggable locator. For example, we
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > already have an interface for discovering information about the
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > physical location of servers. I don't relish the idea of having to
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > maintain data in multiple places.
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > -Todd
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > On Mon, Sep 28, 2015 at 4:48 PM, Aditya Auradkar <aauradkar@linkedin.com.invalid> wrote:
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > Thanks for starting this KIP Allen.
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > I agree with Gwen that having a RackLocator class that is pluggable
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > seems to be too complex. The KIP refers to potentially non-ZK
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > storage for the rack info which I don't think is necessary.
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > Perhaps we can persist this info in zk under
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > /brokers/ids/<broker_id> similar to other broker properties and add
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > a config in KafkaConfig called "rack".
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy, "rack": "abc"}
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > Aditya
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira <gwen@confluent.io> wrote:
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > Hi,
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > First, thanks for putting out a KIP for this. This is super
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > important for production deployments of Kafka.
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > Few questions:
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > 1) Are we sure we want "as many racks as possible"? I'd want to
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > balance between safety (more racks) and network utilization
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > (traffic within a rack uses the high-bandwidth TOR switch). One
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > replica on a different rack and the rest on same rack (if
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > possible) sounds better to me.
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > 2) Rack-locator class seems overly complex compared to adding a
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > rack.number property to the broker properties file. Why do we
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > want that?
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > Gwen
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang <allenxwang@gmail.com> wrote:
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > Hello Kafka Developers,
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > I just created KIP-36 for rack aware replica assignment.
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > The goal is to utilize the isolation provided by the racks in
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > data center and distribute replicas to racks to provide fault
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > tolerance.
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > Comments are welcome.
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > Thanks,
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > Allen
> > > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > > > > >> > > > >> > >>>
> > > > > > >> > > > >> > >>> --
> > > > > > >> > > > >> > >>> Thanks,
> > > > > > >> > > > >> > >>> Neha

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Jun Rao <ju...@confluent.io>.
Allen,

It's not ideal to add a new field in the JSON without increasing the version.
Also, if we don't fix this issue in 0.9.0 and we ever change the JSON version
in the future, the consumer in 0.9.0 will break after the broker is upgraded
to the new release. So, I suggest that we fix the behavior in
ZkUtils.getBrokerInfo() in both trunk and the 0.9.0 branch. After we release
0.9.0.1, the upgrade path is for the old consumer to be upgraded to 0.9.0.1
before upgrading the broker to 0.9.1 and beyond. This fix can be done in a
separate jira.

Thanks,

Jun
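
To illustrate the fix suggested above, here is a minimal Java sketch of a
version-tolerant reader for the broker registration JSON. It is not the
actual ZkUtils.getBrokerInfo() code: the class name BrokerRegistration, the
method fromParsedJson, and the use of an already-parsed Map are assumptions
made purely for illustration, and the field names follow the broker JSON
discussed in this thread. Rejecting only a missing or non-positive version
is also an assumption; the point is that no upper bound on the version is
enforced and unknown fields are ignored.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch, not Kafka's real ZkUtils code.
final class BrokerRegistration {
    final String host;
    final int port;
    final Optional<String> rack; // empty when the broker has no rack configured

    BrokerRegistration(String host, int port, Optional<String> rack) {
        this.host = host;
        this.port = port;
        this.rack = rack;
    }

    // `json` stands in for the already-parsed value of the broker znode.
    static BrokerRegistration fromParsedJson(Map<String, Object> json) {
        Object version = json.get("version");
        // Assumption: only a missing or non-positive version is rejected.
        // There is deliberately no "version must be 1 or 2" check.
        if (!(version instanceof Number) || ((Number) version).intValue() < 1) {
            throw new IllegalArgumentException("Bad broker JSON version: " + version);
        }
        String host = (String) json.get("host");
        int port = ((Number) json.get("port")).intValue();
        // Fields this reader does not know about are simply never looked at.
        String rack = (String) json.get("rack");
        return new BrokerRegistration(host, port, Optional.ofNullable(rack));
    }

    public static void main(String[] args) {
        // Illustrative values only; the port number is made up.
        Map<String, Object> v3 = new HashMap<>();
        v3.put("version", 3);
        v3.put("host", "xxx");
        v3.put("port", 9092);
        v3.put("rack", "abc");
        BrokerRegistration broker = fromParsedJson(v3);
        System.out.println(broker.host + ":" + broker.port
                + " rack=" + broker.rack.orElse("(none)"));
    }
}
```

With a reader along these lines, a client written before the "rack" field
existed would keep working when the broker JSON moves to a newer version, as
long as new versions only add fields.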

On Tue, Jan 12, 2016 at 5:35 PM, Allen Wang <al...@gmail.com> wrote:

> Agreed. So it seems that for 0.9.1, the only option is to keep the JSON
> version unchanged. But as part of the PR, I can change the behavior of
> ZkUtils.getBrokerInfo() to make it compatible with future JSON versions.
>
> Thanks,
> Allen
>
>
> On Tue, Jan 12, 2016 at 2:57 PM, Jun Rao <ju...@confluent.io> wrote:
>
> > Hi, Allen,
> >
> > That's a good point. In 0.9.0.0, the old consumer reads broker info
> > directly from ZK and the code throws an exception if the version in json
> > is not 1 or 2. This old consumer will break when we upgrade the broker
> > json to version 3 in ZK in 0.9.1, which will be an issue. We overlooked
> > this issue in 0.9.0.0. The easiest fix is probably not to check the
> > version in ZkUtils.getBrokerInfo(). This way, as long as we are only
> > adding new fields in broker json, we can preserve the compatibility.
> >
> > Thanks,
> >
> > Jun
> >
> > On Tue, Jan 12, 2016 at 1:52 PM, Allen Wang <al...@gmail.com> wrote:
> >
> > > Hi Jun,
> > >
> > > That's a good suggestion. However, it does not solve the problem for
> > > the clients or third-party tools that get broker information directly
> > > from ZooKeeper.
> > >
> > > Thanks,
> > > Allen
> > >
> > >
> > > On Tue, Jan 12, 2016 at 1:29 PM, Jun Rao <ju...@confluent.io> wrote:
> > >
> > > > Allen,
> > > >
> > > > Another way to do this is the following.
> > > >
> > > > When inter.broker.protocol.version is set to 0.9.0, the broker will
> > > > write the broker info in ZK using version 2, ignoring the rack info.
> > > >
> > > > When inter.broker.protocol.version is set to 0.9.1, the broker will
> > > > write the broker info in ZK using version 3, including the rack info.
> > > >
> > > > If one follows the upgrade process, after the 2nd round of rolling
> > > > bounces, every broker is capable of parsing version 3 of broker info
> > > > in ZK. This is when the rack-aware feature will be used.
> > > >
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Tue, Jan 12, 2016 at 12:19 PM, Allen Wang <al...@gmail.com> wrote:
> > > >
> > > > > Regarding the JSON version of Broker:
> > > > >
> > > > > I don't know why ZkUtils.getBrokerInfo() restricts the JSON versions
> > > > > it can read. It will throw an exception if the version is not 1 or 2.
> > > > > It seems to me that this will cause compatibility problems whenever
> > > > > the version needs to be changed and make the upgrade path difficult.
> > > > >
> > > > > One option we have is to make rack also part of version 2 and keep
> > > > > version 2 unchanged for this update. This will make the old clients
> > > > > compatible. During rolling upgrade, it will also avoid problems if
> > > > > the controller/broker is still the old version.
> > > > >
> > > > > However, ZkUtils.getBrokerInfo() will be updated to return the
> > > > > Broker with rack so the rack information will be available once the
> > > > > server/client is upgraded to the latest version.
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Jan 6, 2016 at 6:28 PM, Allen Wang <al...@gmail.com> wrote:
> > > > >
> > > > > > Updated KIP according to Jun's comment and included changes to TMR.
> > > > > >
> > > > > > On Tue, Jan 5, 2016 at 5:59 PM, Jun Rao <ju...@confluent.io> wrote:
> > > > > >
> > > > > >> Hi, Allen,
> > > > > >>
> > > > > >> A couple of minor comments on the KIP.
> > > > > >>
> > > > > > >> 1. The version of the broker JSON string says 2. It should be 3.
> > > > > > >>
> > > > > > >> 2. The new version of UpdateMetadataRequest should be 2, instead of
> > > > > > >> 1. Could you include the full wire protocol of version 2 of
> > > > > > >> UpdateMetadataRequest and highlight the changed part?
> > > > > >>
> > > > > >> Thanks,
> > > > > >>
> > > > > >> Jun
> > > > > >>
> > > > > > >> On Tue, Jan 5, 2016 at 3:11 PM, Allen Wang <allenxwang@gmail.com> wrote:
> > > > > > >>
> > > > > > >> > Jun and I had a chance to discuss it in a meeting and it is agreed
> > > > > > >> > to change the TMR in a different patch.
> > > > > > >> >
> > > > > > >> > I can change the KIP to include rack in TMR. The essential change
> > > > > > >> > is to add rack into class BrokerEndPoint and make TMR version aware.
> > > > > >> >
> > > > > >> >
> > > > > >> >
> > > > > >> > On Tue, Jan 5, 2016 at 10:21 AM, Aditya Auradkar <
> > > > > >> > aauradkar@linkedin.com.invalid> wrote:
> > > > > >> >
> > > > > >> > > Jun/Allen -
> > > > > >> > >
> > > > > > >> > > Did we ever actually agree on whether we should evolve the TMR to
> > > > > > >> > > include rack info or not?
> > > > > > >> > > I don't feel strongly about it, but if it's the right thing to do
> > > > > > >> > > we should probably do it in this KIP (can be a separate patch). It
> > > > > > >> > > isn't a large change.
> > > > > >> > >
> > > > > >> > > Aditya
> > > > > >> > >
> > > > > > >> > > On Sat, Dec 26, 2015 at 3:01 PM, Allen Wang <allenxwang@gmail.com> wrote:
> > > > > > >> > >
> > > > > > >> > > > Added the rolling upgrade instruction in the KIP, similar to
> > > > > > >> > > > those in 0.9.0 release notes.
> > > > > >> > > >
> > > > > > >> > > > On Wed, Dec 16, 2015 at 11:32 AM, Allen Wang <allenxwang@gmail.com> wrote:
> > > > > > >> > > >
> > > > > > >> > > > > Hi Jun,
> > > > > > >> > > > >
> > > > > > >> > > > > The reason that TopicMetadataResponse is not included in the KIP
> > > > > > >> > > > > is that it currently is not version aware. So we need to
> > > > > > >> > > > > introduce a version to it in order to ensure backward
> > > > > > >> > > > > compatibility. That seems to me a big change. Do we want to
> > > > > > >> > > > > couple it with this KIP? Do we need to further discuss what
> > > > > > >> > > > > information to include in the new version besides rack? For
> > > > > > >> > > > > example, should we include broker security protocol in
> > > > > > >> > > > > TopicMetadataResponse?
> > > > > > >> > > > >
> > > > > > >> > > > > The other option is to make it a separate KIP to make
> > > > > > >> > > > > TopicMetadataResponse version aware and decide what to include,
> > > > > > >> > > > > and make this KIP focus on the rack aware algorithm, admin tools
> > > > > > >> > > > > and related changes to inter-broker protocol.
> > > > > >> > > > >
> > > > > >> > > > > Thanks,
> > > > > >> > > > > Allen
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > > >> > > > > On Mon, Dec 14, 2015 at 8:30 AM, Jun Rao <jun@confluent.io> wrote:
> > > > > > >> > > > >
> > > > > > >> > > > >> Allen,
> > > > > > >> > > > >>
> > > > > > >> > > > >> Thanks for the proposal. A few comments.
> > > > > > >> > > > >>
> > > > > > >> > > > >> 1. Since this KIP changes the inter broker communication protocol
> > > > > > >> > > > >> (UpdateMetadataRequest), we will need to document the upgrade
> > > > > > >> > > > >> path (similar to what's described in
> > > > > > >> > > > >> http://kafka.apache.org/090/documentation.html#upgrade).
> > > > > > >> > > > >>
> > > > > > >> > > > >> 2. It might be useful to include the rack info of the broker in
> > > > > > >> > > > >> TopicMetadataResponse. This can be useful for administrative
> > > > > > >> > > > >> tasks, as well as read affinity in the future.
> > > > > >> > > > >>
> > > > > >> > > > >> Jun
> > > > > >> > > > >>
> > > > > >> > > > >>
> > > > > >> > > > >>
> > > > > > >> > > > >> On Thu, Dec 10, 2015 at 9:38 AM, Allen Wang <allenxwang@gmail.com> wrote:
> > > > > > >> > > > >>
> > > > > > >> > > > >> > If there are no more comments I would like to call for a vote.
> > > > > >> > > > >> >
> > > > > >> > > > >> >
> > > > > > >> > > > >> > On Sun, Nov 15, 2015 at 10:08 PM, Allen Wang <allenxwang@gmail.com> wrote:
> > > > > > >> > > > >> >
> > > > > > >> > > > >> > > KIP is updated with more details and how to handle the situation
> > > > > > >> > > > >> > > where rack information is incomplete.
> > > > > > >> > > > >> > >
> > > > > > >> > > > >> > > In the situation where rack information is incomplete, but we
> > > > > > >> > > > >> > > want to continue with the assignment, I have suggested ignoring
> > > > > > >> > > > >> > > all rack information and falling back to the original algorithm.
> > > > > > >> > > > >> > > The reason is explained below:
> > > > > > >> > > > >> > >
> > > > > > >> > > > >> > > The other options are to assume that each broker without a rack
> > > > > > >> > > > >> > > belongs to its own unique rack, or that they all belong to one
> > > > > > >> > > > >> > > "default" rack. Either way we choose, it is highly likely to
> > > > > > >> > > > >> > > result in an uneven number of brokers in racks, and it is quite
> > > > > > >> > > > >> > > possible that the "made up" racks will have far fewer brokers.
> > > > > > >> > > > >> > > As I explained in the KIP, an uneven number of brokers in racks
> > > > > > >> > > > >> > > will lead to uneven distribution of replicas among brokers (even
> > > > > > >> > > > >> > > though the leader distribution is still even). The brokers in
> > > > > > >> > > > >> > > the rack that has fewer brokers will get more replicas per
> > > > > > >> > > > >> > > broker than brokers in other racks.
> > > > > > >> > > > >> > >
> > > > > > >> > > > >> > > Given this fact, and given that the replica assignment produced
> > > > > > >> > > > >> > > will be incorrect anyway from a rack aware point of view,
> > > > > > >> > > > >> > > ignoring all rack information and falling back to the original
> > > > > > >> > > > >> > > algorithm is not a bad choice since it will at least have a
> > > > > > >> > > > >> > > better guarantee of replica distribution.
> > > > > > >> > > > >> > >
> > > > > > >> > > > >> > > Also, for command line tools it gives the user a choice if for
> > > > > > >> > > > >> > > any reason they want to ignore rack information and fall back to
> > > > > > >> > > > >> > > the original algorithm.
> > > > > >> > > > >> > >
> > > > > >> > > > >> > >
> > > > > > >> > > > >> > > On Tue, Nov 10, 2015 at 9:04 AM, Allen Wang <allenxwang@gmail.com> wrote:
> > > > > > >> > > > >> > >
> > > > > > >> > > > >> > >> I have been busy with some time-pressing issues for the last few
> > > > > > >> > > > >> > >> days. I will think about how the incomplete rack information
> > > > > > >> > > > >> > >> will affect the balance and update the KIP by early next week.
> > > > > >> > > > >> > >>
> > > > > >> > > > >> > >> Thanks,
> > > > > >> > > > >> > >> Allen
> > > > > >> > > > >> > >>
> > > > > >> > > > >> > >>
> > > > > > >> > > > >> > >> On Tue, Nov 3, 2015 at 9:03 AM, Neha Narkhede <neha@confluent.io> wrote:
> > > > > > >> > > > >> > >>
> > > > > > >> > > > >> > >>> Few suggestions on improving the KIP
> > > > > > >> > > > >> > >>>
> > > > > > >> > > > >> > >>> *If some brokers have rack, and some do not, the algorithm will
> > > > > > >> > > > >> > >>> throw an exception. This is to prevent incorrect assignment
> > > > > > >> > > > >> > >>> caused by user error.*
> > > > > > >> > > > >> > >>>
> > > > > > >> > > > >> > >>> In the KIP, can you clearly state the user-facing behavior when
> > > > > > >> > > > >> > >>> some brokers have rack information and some don't? Which actions
> > > > > > >> > > > >> > >>> and requests will error out and how?
> > > > > > >> > > > >> > >>>
> > > > > > >> > > > >> > >>> *Even distribution of partition leadership among brokers*
> > > > > > >> > > > >> > >>>
> > > > > > >> > > > >> > >>> There is some information about arranging the sorted broker list
> > > > > > >> > > > >> > >>> interlaced with rack ids. Can you describe the changes to the
> > > > > > >> > > > >> > >>> current algorithm in a little more detail? How does this
> > > > > > >> > > > >> > >>> interlacing work if only a subset of brokers have the rack id
> > > > > > >> > > > >> > >>> configured? Does this still work if uneven # of brokers are
> > > > > > >> > > > >> > >>> assigned to each rack? It might work, I'm looking for more
> > > > > > >> > > > >> > >>> details on the changes, since it will affect the behavior seen
> > > > > > >> > > > >> > >>> by the user - imbalance on either the leaders or data or both.
> > > > > > >> > > > >> > >>> On Mon, Nov 2, 2015 at 6:39 PM, Aditya Auradkar <aauradkar@linkedin.com> wrote:
> > > > > > >> > > > >> > >>>
> > > > > > >> > > > >> > >>> > I think this sounds reasonable. Anyone else have comments?
> > > > > >> > > > >> > >>> >
> > > > > >> > > > >> > >>> > Aditya
> > > > > >> > > > >> > >>> >
> > > > > > >> > > > >> > >>> > On Tue, Oct 27, 2015 at 5:23 PM, Allen Wang <allenxwang@gmail.com> wrote:
> > > > > > >> > > > >> > >>> >
> > > > > > >> > > > >> > >>> > > During the discussion in the hangout, it was mentioned that it
> > > > > > >> > > > >> > >>> > > would be desirable that consumers know the rack information of
> > > > > > >> > > > >> > >>> > > the brokers so that they can consume from the broker in the
> > > > > > >> > > > >> > >>> > > same rack to reduce latency. As I understand it, this will
> > > > > > >> > > > >> > >>> > > only be beneficial if a consumer can consume from any broker
> > > > > > >> > > > >> > >>> > > in ISR, which is not possible now.
> > > > > > >> > > > >> > >>> > >
> > > > > > >> > > > >> > >>> > > I suggest we skip the change to TMR. Once the change is made
> > > > > > >> > > > >> > >>> > > to the consumer to be able to consume from any broker in ISR,
> > > > > > >> > > > >> > >>> > > the rack information can be added to TMR.
> > > > > > >> > > > >> > >>> > >
> > > > > > >> > > > >> > >>> > > Another thing I want to confirm is command line behavior. I
> > > > > > >> > > > >> > >>> > > think the desirable default behavior is to fail fast on the
> > > > > > >> > > > >> > >>> > > command line for incomplete rack mapping. The error message
> > > > > > >> > > > >> > >>> > > can include further instruction that tells the user to add an
> > > > > > >> > > > >> > >>> > > extra argument (like "--allow-partial-rackinfo") to suppress
> > > > > > >> > > > >> > >>> > > the error and do an imperfect rack aware assignment. If the
> > > > > > >> > > > >> > >>> > > default behavior is to allow incomplete mapping, the error can
> > > > > > >> > > > >> > >>> > > still be easily missed.
> > > > > > >> > > > >> > >>> > >
> > > > > > >> > > > >> > >>> > > The affected command line tools are TopicCommand and
> > > > > > >> > > > >> > >>> > > ReassignPartitionsCommand.
> > > > > >> > > > >> > >>> > >
> > > > > >> > > > >> > >>> > > Thanks,
> > > > > >> > > > >> > >>> > > Allen
> > > > > >> > > > >> > >>> > >
> > > > > >> > > > >> > >>> > >
> > > > > >> > > > >> > >>> > >
> > > > > >> > > > >> > >>> > >
> > > > > >> > > > >> > >>> > >
> > > > > >> > > > >> > >>> > > On Mon, Oct 26, 2015 at 12:55 PM, Aditya
> > > Auradkar <
> > > > > >> > > > >> > >>> > aauradkar@linkedin.com>
> > > > > >> > > > >> > >>> > > wrote:
> > > > > >> > > > >> > >>> > >
> > > > > >> > > > >> > >>> > > > Hi Allen,
> > > > > >> > > > >> > >>> > > >
> > > > > >> > > > >> > >>> > > > For TopicMetadataResponse to understand
> > > version,
> > > > > you
> > > > > >> can
> > > > > >> > > > bump
> > > > > >> > > > >> up
> > > > > >> > > > >> > >>> the
> > > > > >> > > > >> > >>> > > > request version itself. Based on the
> version
> > of
> > > > the
> > > > > >> > > request,
> > > > > >> > > > >> the
> > > > > >> > > > >> > >>> > response
> > > > > >> > > > >> > >>> > > > can be appropriately serialized. It
> shouldn't
> > > be
> > > > a
> > > > > >> huge
> > > > > >> > > > >> change.
> > > > > >> > > > >> > For
> > > > > >> > > > >> > >>> > > > example: We went through something similar
> > for
> > > > > >> > > > ProduceRequest
> > > > > >> > > > >> > >>> recently
> > > > > >> > > > >> > >>> > (
> > > > > >> > > > >> > >>> > > > https://reviews.apache.org/r/33378/)
> > > > > >> > > > >> > >>> > > > I guess the reason protocol information is
> > not
> > > > > >> included
> > > > > >> > in
> > > > > >> > > > the
> > > > > >> > > > >> > TMR
> > > > > >> > > > >> > >>> is
> > > > > >> > > > >> > >>> > > > because the topic itself is independent of
> > any
> > > > > >> > particular
> > > > > >> > > > >> > protocol
> > > > > >> > > > >> > >>> (SSL
> > > > > >> > > > >> > >>> > > vs
> > > > > >> > > > >> > >>> > > > Plaintext). Having said that, I'm not sure
> we
> > > > even
> > > > > >> need
> > > > > >> > > rack
> > > > > >> > > > >> > >>> > information
> > > > > >> > > > >> > >>> > > in
> > > > > >> > > > >> > >>> > > > TMR. What use case were you thinking of
> > > initially?
> > > > > >> > > > >> > >>> > > >
> > > > > >> > > > >> > >>> > > > For 1 - I'd be fine with adding an option
> to
> > > the
> > > > > >> command
> > > > > >> > > > line
> > > > > >> > > > >> > tools
> > > > > >> > > > >> > >>> > that
> > > > > >> > > > >> > >>> > > > check rack assignment. For e.g.
> > > > > >> "--strict-assignment" or
> > > > > >> > > > >> > something
> > > > > >> > > > >> > >>> > > similar.
> > > > > >> > > > >> > >>> > > >
> > > > > >> > > > >> > >>> > > > Aditya
> > > > > >> > > > >> > >>> > > >
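The version-bump suggestion above can be sketched as follows: the response is serialized according to the version of the request that triggered it, so old clients never see the new field. The field names, the dictionary layout, and the version numbers 0 and 1 are hypothetical and do not reflect the real Kafka wire format.

```python
from typing import Any, Dict, Optional


def serialize_broker(broker_id: int, host: str, port: int,
                     rack: Optional[str], request_version: int) -> Dict[str, Any]:
    """Build the broker entry of a metadata response for a given request version."""
    entry: Dict[str, Any] = {"id": broker_id, "host": host, "port": port}
    if request_version >= 1:
        # Assumption: version 1 is the first version that carries rack.
        entry["rack"] = rack
    return entry
```

A version 0 request would get an entry without any "rack" key, while a version 1 request would get the key even when the broker has no rack configured (value None).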
> > > > > >> > > > >> > >>> > > > On Thu, Oct 22, 2015 at 6:44 PM, Allen
> Wang <
> > > > > >> > > > >> > allenxwang@gmail.com>
> > > > > >> > > > >> > >>> > > wrote:
> > > > > >> > > > >> > >>> > > >
> > > > > >> > > > >> > >>> > > > > For 2 and 3, I have updated the KIP.
> Please
> > > > take
> > > > > a
> > > > > >> > look.
> > > > > >> > > > One
> > > > > >> > > > >> > >>> thing I
> > > > > >> > > > >> > >>> > > have
> > > > > >> > > > >> > >>> > > > > changed is removing the proposal to add
> > rack
> > > to
> > > > > >> > > > >> > >>> > TopicMetadataResponse.
> > > > > >> > > > >> > >>> > > > The
> > > > > >> > > > >> > >>> > > > > reason is that unlike
> > UpdateMetadataRequest,
> > > > > >> > > > >> > >>> TopicMetadataResponse
> > > > > >> > > > >> > >>> > does
> > > > > >> > > > >> > >>> > > > not
> > > > > >> > > > >> > >>> > > > > understand version. I don't see a way to
> > > > include
> > > > > >> rack
> > > > > >> > > > >> without
> > > > > >> > > > >> > >>> > breaking
> > > > > >> > > > >> > >>> > > > old
> > > > > >> > > > >> > >>> > > > > version of clients. That's probably why
> > > secure
> > > > > >> > protocol
> > > > > >> > > is
> > > > > >> > > > >> not
> > > > > >> > > > >> > >>> > included
> > > > > >> > > > >> > >>> > > > in
> > > > > >> > > > >> > >>> > > > > the TopicMetadataResponse either. I think
> > it
> > > > will
> > > > > >> be a
> > > > > >> > > > much
> > > > > >> > > > >> > >>> bigger
> > > > > >> > > > >> > >>> > > change
> > > > > >> > > > >> > >>> > > > > to include rack in TopicMetadataResponse.
> > > > > >> > > > >> > >>> > > > >
> > > > > >> > > > >> > >>> > > > > For 1, my concern is that doing rack
> aware
> > > > > >> assignment
> > > > > >> > > > >> without
> > > > > >> > > > >> > >>> > complete
> > > > > >> > > > >> > >>> > > > > broker to rack mapping will result in
> > > > assignment
> > > > > >> that
> > > > > >> > is
> > > > > >> > > > not
> > > > > >> > > > >> > rack
> > > > > >> > > > >> > >>> > aware
> > > > > >> > > > >> > >>> > > > and
> > > > > >> > > > >> > >>> > > > > fail to provide fault tolerance in the
> > event
> > > of
> > > > > >> rack
> > > > > >> > > > outage.
> > > > > >> > > > >> > This
> > > > > >> > > > >> > >>> > kind
> > > > > >> > > > >> > >>> > > of
> > > > > >> > > > >> > >>> > > > > problem will be difficult to surface. And
> > the
> > > > > cost
> > > > > >> of
> > > > > >> > > this
> > > > > >> > > > >> > >>> problem is
> > > > > >> > > > >> > >>> > > > high:
> > > > > >> > > > >> > >>> > > > > you have to do partition reassignment if
> > you
> > > > are
> > > > > >> lucky
> > > > > >> > > to
> > > > > >> > > > >> spot
> > > > > >> > > > >> > >>> the
> > > > > >> > > > >> > >>> > > > problem
> > > > > >> > > > >> > >>> > > > > early on or face the consequence of data
> > loss
> > > > > >> during
> > > > > >> > > real
> > > > > >> > > > >> rack
> > > > > >> > > > >> > >>> > outage.
> > > > > >> > > > >> > >>> > > > >
> > > > > >> > > > >> > >>> > > > > I do see the concern of fail-fast as it
> > might
> > > > > also
> > > > > >> > cause
> > > > > >> > > > >> data
> > > > > >> > > > >> > >>> loss if
> > > > > >> > > > >> > >>> > > > > producer is not able to produce the message
> > due
> > > to
> > > > > >> topic
> > > > > >> > > > >> creation
> > > > > >> > > > >> > >>> > failure.
> > > > > >> > > > >> > >>> > > > Is
> > > > > >> > > > >> > >>> > > > > it feasible to treat dynamic topic
> creation
> > > and
> > > > > >> > command
> > > > > >> > > > >> tools
> > > > > >> > > > >> > >>> > > > differently?
> > > > > >> > > > >> > >>> > > > > We allow dynamic topic creation with
> > > incomplete
> > > > > >> > > > broker-rack
> > > > > >> > > > >> > >>> mapping
> > > > > >> > > > >> > >>> > and
> > > > > >> > > > >> > >>> > > > > fail fast in command line. Another option
> > is
> > > to
> > > > > let
> > > > > >> > user
> > > > > >> > > > >> > >>> determine
> > > > > >> > > > >> > >>> > the
> > > > > >> > > > >> > >>> > > > > behavior for command line. For example,
> by
> > > > > default
> > > > > >> > fail
> > > > > >> > > > >> fast in
> > > > > >> > > > >> > >>> > command
> > > > > >> > > > >> > >>> > > > > line but allow incomplete broker-rack
> > mapping
> > > > if
> > > > > >> > another
> > > > > >> > > > >> switch
> > > > > >> > > > >> > >>> is
> > > > > >> > > > >> > >>> > > > > provided.
> > > > > >> > > > >> > >>> > > > >
> > > > > >> > > > >> > >>> > > > >
> > > > > >> > > > >> > >>> > > > >
> > > > > >> > > > >> > >>> > > > >
> > > > > >> > > > >> > >>> > > > > On Tue, Oct 20, 2015 at 10:05 AM, Aditya
> > > > > Auradkar <
> > > > > >> > > > >> > >>> > > > > aauradkar@linkedin.com.invalid> wrote:
> > > > > >> > > > >> > >>> > > > >
> > > > > >> > > > >> > >>> > > > > > Hey Allen,
> > > > > >> > > > >> > >>> > > > > >
> > > > > >> > > > >> > >>> > > > > > 1. If we choose fail fast topic
> creation,
> > > we
> > > > > will
> > > > > >> > have
> > > > > >> > > > >> topic
> > > > > >> > > > >> > >>> > creation
> > > > > >> > > > >> > >>> > > > > > failures while upgrading the cluster. I
> > > > really
> > > > > >> doubt
> > > > > >> > > we
> > > > > >> > > > >> want
> > > > > >> > > > >> > >>> this
> > > > > >> > > > >> > >>> > > > > behavior.
> > > > > >> > > > >> > >>> > > > > > Ideally, this should be invisible to
> > > clients
> > > > > of a
> > > > > >> > > > cluster.
> > > > > >> > > > >> > >>> > Currently,
> > > > > >> > > > >> > >>> > > > > each
> > > > > >> > > > >> > >>> > > > > > broker is effectively its own rack. So
> we
> > > > > >> probably
> > > > > >> > can
> > > > > >> > > > use
> > > > > >> > > > >> > the
> > > > > >> > > > >> > >>> rack
> > > > > >> > > > >> > >>> > > > > > information whenever possible but not
> > make
> > > > it a
> > > > > >> hard
> > > > > >> > > > >> > >>> requirement.
> > > > > >> > > > >> > >>> > To
> > > > > >> > > > >> > >>> > > > > extend
> > > > > >> > > > >> > >>> > > > > > Gwen's example, one badly configured
> > broker
> > > > > >> should
> > > > > >> > not
> > > > > >> > > > >> > degrade
> > > > > >> > > > >> > >>> > topic
> > > > > >> > > > >> > >>> > > > > > creation for the entire cluster.
> > > > > >> > > > >> > >>> > > > > >
> > > > > >> > > > >> > >>> > > > > > 2. Upgrade scenario - Can you add a
> > section
> > > > on
> > > > > >> the
> > > > > >> > > > upgrade
> > > > > >> > > > >> > >>> piece to
> > > > > >> > > > >> > >>> > > > > confirm
> > > > > >> > > > >> > >>> > > > > > that old clients will not see errors? I
> > > > believe
> > > > > >> > > > >> > >>> > > > > ZookeeperConsumerConnector
> > > > > >> > > > >> > >>> > > > > > reads the Broker objects from ZK. I
> > wanted
> > > to
> > > > > >> > confirm
> > > > > >> > > > that
> > > > > >> > > > >> > this
> > > > > >> > > > >> > >>> > will
> > > > > >> > > > >> > >>> > > > not
> > > > > >> > > > >> > >>> > > > > > cause any problems.
> > > > > >> > > > >> > >>> > > > > >
> > > > > >> > > > >> > >>> > > > > > 3. Could you elaborate your proposed
> > > changes
> > > > to
> > > > > >> the
> > > > > >> > > > >> > >>> > > > UpdateMetadataRequest
> > > > > >> > > > >> > >>> > > > > > in the "Public Interfaces" section?
> > > > > Personally, I
> > > > > >> > find
> > > > > >> > > > >> this
> > > > > >> > > > >> > >>> format
> > > > > >> > > > >> > >>> > > easy
> > > > > >> > > > >> > >>> > > > > to
> > > > > >> > > > >> > >>> > > > > > read in terms of wire protocol changes:
> > > > > >> > > > >> > >>> > > > > >
> > > > > >> > > > >> > >>> > > > > >
> > > > > >> > > > >> > >>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicRequest
> > > > > >> > > > >> > >>> > > > > >
> > > > > >> > > > >> > >>> > > > > > Aditya
> > > > > >> > > > >> > >>> > > > > >
> > > > > >> > > > >> > >>> > > > > > On Fri, Oct 16, 2015 at 3:45 PM, Allen
> > > Wang <
> > > > > >> > > > >> > >>> allenxwang@gmail.com>
> > > > > >> > > > >> > >>> > > > > wrote:
> > > > > >> > > > >> > >>> > > > > >
> > > > > >> > > > >> > >>> > > > > > > KIP is updated to include rack as an
> > > optional
> > > > > >> > property
> > > > > >> > > > for
> > > > > >> > > > >> > >>> broker.
> > > > > >> > > > >> > >>> > > > Please
> > > > > >> > > > >> > >>> > > > > > take
> > > > > >> > > > >> > >>> > > > > > > a look and let me know if more
> details
> > > are
> > > > > >> needed.
> > > > > >> > > > >> > >>> > > > > > >
> > > > > >> > > > >> > >>> > > > > > > For the case where some brokers have
> > rack
> > > > and
> > > > > >> some
> > > > > >> > > do
> > > > > >> > > > >> not,
> > > > > >> > > > >> > >>> the
> > > > > >> > > > >> > >>> > > > current
> > > > > >> > > > >> > >>> > > > > > KIP
> > > > > >> > > > >> > >>> > > > > > > uses the fail-fast behavior. If there
> > are
> > > > > >> > concerns,
> > > > > >> > > we
> > > > > >> > > > >> can
> > > > > >> > > > >> > >>> > further
> > > > > >> > > > >> > >>> > > > > > discuss
> > > > > >> > > > >> > >>> > > > > > > this in the email thread or next
> > hangout.
> > > > > >> > > > >> > >>> > > > > > >
> > > > > >> > > > >> > >>> > > > > > >
> > > > > >> > > > >> > >>> > > > > > >
> > > > > >> > > > >> > >>> > > > > > > On Thu, Oct 15, 2015 at 10:42 AM,
> Allen
> > > > Wang
> > > > > <
> > > > > >> > > > >> > >>> > allenxwang@gmail.com
> > > > > >> > > > >> > >>> > > >
> > > > > >> > > > >> > >>> > > > > > wrote:
> > > > > >> > > > >> > >>> > > > > > >
> > > > > >> > > > >> > >>> > > > > > > > That's a good question. I can think of three actions if the rack
> > > > > >> > > > >> > >>> > > > > > > > information is incomplete:
> > > > > >> > > > >> > >>> > > > > > > >
> > > > > >> > > > >> > >>> > > > > > > > 1. Treat the node without a rack as if it is on its own unique rack
> > > > > >> > > > >> > >>> > > > > > > > 2. Disregard all rack information and fall back to the current algorithm
> > > > > >> > > > >> > >>> > > > > > > > 3. Fail fast
> > > > > >> > > > >> > >>> > > > > > > >
> > > > > >> > > > >> > >>> > > > > > > > Now that I think about it, one and three make more sense. The reason for
> > > > > >> > > > >> > >>> > > > > > > > fail-fast is that a user's mistake of not providing the rack may never be
> > > > > >> > > > >> > >>> > > > > > > > found if we tolerate it. The assignment may then not be rack aware as the
> > > > > >> > > > >> > >>> > > > > > > > user expected, which makes problems hard to debug when things fail.
> > > > > >> > > > >> > >>> > > > > > > >
> > > > > >> > > > >> > >>> > > > > > > > What do you think? If not fail-fast, is there any way we can make the
> > > > > >> > > > >> > >>> > > > > > > > user error stand out?
> > > > > >> > > > >> > >>> > > > > > > >
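The three options listed above can be sketched as a single policy switch. This is an illustration only: the enum, the function name, and the synthetic rack naming are assumptions, and the case where no broker has a rack at all is treated here as plain rack-unaware assignment, which the message does not spell out.

```python
from enum import Enum
from typing import Dict, Optional


class IncompleteRackPolicy(Enum):
    UNIQUE_RACK = 1  # option 1: a broker without rack sits on its own rack
    IGNORE_ALL = 2   # option 2: drop all rack info, use the current algorithm
    FAIL_FAST = 3    # option 3: refuse to assign


def resolve_racks(broker_racks: Dict[int, Optional[str]],
                  policy: IncompleteRackPolicy) -> Dict[int, str]:
    """Return a broker -> rack map; an empty map means rack-unaware assignment."""
    missing = sorted(b for b, rack in broker_racks.items() if rack is None)
    if len(missing) == len(broker_racks):
        return {}  # assumption: no rack configured anywhere, nothing to resolve
    if not missing:
        return {b: rack for b, rack in broker_racks.items() if rack is not None}
    if policy is IncompleteRackPolicy.FAIL_FAST:
        raise ValueError(f"brokers without rack: {missing}")
    if policy is IncompleteRackPolicy.IGNORE_ALL:
        return {}
    # UNIQUE_RACK: synthesize a rack name that no configured rack is likely to use.
    return {b: rack if rack is not None else f"__broker-{b}"
            for b, rack in broker_racks.items()}
```

Only the fail-fast branch makes the missing configuration visible to the user; the other two succeed silently, which is the concern raised in the message.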
> > > > > >> > > > >> > >>> > > > > > > >
> > > > > >> > > > >> > >>> > > > > > > > On Thu, Oct 15, 2015 at 10:17 AM,
> > Gwen
> > > > > >> Shapira <
> > > > > >> > > > >> > >>> > > gwen@confluent.io>
> > > > > >> > > > >> > >>> > > > > > > wrote:
> > > > > >> > > > >> > >>> > > > > > > >
> > > > > >> > > > >> > >>> > > > > > > >> Thanks! Just to clarify, when some
> > > > brokers
> > > > > >> have
> > > > > >> > > > rack
> > > > > >> > > > >> > >>> > assignment
> > > > > >> > > > >> > >>> > > > and
> > > > > >> > > > >> > >>> > > > > > some
> > > > > >> > > > >> > >>> > > > > > > >> don't, do we act like none of them
> > > have
> > > > > it?
> > > > > >> or
> > > > > >> > > like
> > > > > >> > > > >> > those
> > > > > >> > > > >> > >>> > > without
> > > > > >> > > > >> > >>> > > > > > > >> assignment are in their own rack?
> > > > > >> > > > >> > >>> > > > > > > >>
> > > > > >> > > > >> > >>> > > > > > > >> The first scenario is good when
> > first
> > > > > >> setting
> > > > > >> > up
> > > > > >> > > > >> > >>> > rack-awareness,
> > > > > >> > > > >> > >>> > > > but
> > > > > >> > > > >> > >>> > > > > > the
> > > > > >> > > > >> > >>> > > > > > > >> second makes more sense for
> on-going
> > > > > >> > maintenance
> > > > > >> > > (I
> > > > > >> > > > >> can
> > > > > >> > > > >> > >>> > totally
> > > > > >> > > > >> > >>> > > > see
> > > > > >> > > > >> > >>> > > > > > > >> someone
> > > > > >> > > > >> > >>> > > > > > > >> adding a node and forgetting to
> set
> > > the
> > > > > rack
> > > > > >> > > > >> property,
> > > > > >> > > > >> > we
> > > > > >> > > > >> > >>> > don't
> > > > > >> > > > >> > >>> > > > want
> > > > > >> > > > >> > >>> > > > > > > this
> > > > > >> > > > >> > >>> > > > > > > >> to change behavior for anything
> > except
> > > > the
> > > > > >> new
> > > > > >> > > > node).
> > > > > >> > > > >> > >>> > > > > > > >>
> > > > > >> > > > >> > >>> > > > > > > >> What do you think?
> > > > > >> > > > >> > >>> > > > > > > >>
> > > > > >> > > > >> > >>> > > > > > > >> Gwen
> > > > > >> > > > >> > >>> > > > > > > >>
> > > > > >> > > > >> > >>> > > > > > > >> On Thu, Oct 15, 2015 at 10:13 AM,
> > > Allen
> > > > > >> Wang <
> > > > > >> > > > >> > >>> > > > allenxwang@gmail.com>
> > > > > >> > > > >> > >>> > > > > > > >> wrote:
> > > > > >> > > > >> > >>> > > > > > > >>
> > > > > >> > > > >> > >>> > > > > > > >> > For scenario 1:
> > > > > >> > > > >> > >>> > > > > > > >> >
> > > > > >> > > > >> > >>> > > > > > > >> > - Add the rack information to
> > broker
> > > > > >> property
> > > > > >> > > > file
> > > > > >> > > > >> or
> > > > > >> > > > >> > >>> > > > dynamically
> > > > > >> > > > >> > >>> > > > > > set
> > > > > >> > > > >> > >>> > > > > > > >> it in
> > > > > >> > > > >> > >>> > > > > > > >> > the wrapper code to bootstrap
> > Kafka
> > > > > >> server.
> > > > > >> > You
> > > > > >> > > > >> would
> > > > > >> > > > >> > do
> > > > > >> > > > >> > >>> > that
> > > > > >> > > > >> > >>> > > > for
> > > > > >> > > > >> > >>> > > > > > all
> > > > > >> > > > >> > >>> > > > > > > >> > brokers and restart the brokers
> > one
> > > by
> > > > > >> one.
> > > > > >> > > > >> > >>> > > > > > > >> >
> > > > > >> > > > >> > >>> > > > > > > >> > In this scenario, the complete
> > > broker
> > > > to
> > > > > >> rack
> > > > > >> > > > >> mapping
> > > > > >> > > > >> > >>> may
> > > > > >> > > > >> > >>> > not
> > > > > >> > > > >> > >>> > > be
> > > > > >> > > > >> > >>> > > > > > > >> available
> > > > > >> > > > >> > >>> > > > > > > >> > until every broker is restarted.
> > > > During
> > > > > >> that
> > > > > >> > > time
> > > > > >> > > > >> we
> > > > > >> > > > >> > >>> fall
> > > > > >> > > > >> > >>> > back
> > > > > >> > > > >> > >>> > > > to
> > > > > >> > > > >> > >>> > > > > > > >> default
> > > > > >> > > > >> > >>> > > > > > > >> > replica assignment algorithm.
> > > > > >> > > > >> > >>> > > > > > > >> >
> > > > > >> > > > >> > >>> > > > > > > >> > For scenario 2:
> > > > > >> > > > >> > >>> > > > > > > >> >
> > > > > >> > > > >> > >>> > > > > > > >> > - Add the rack information to
> > broker
> > > > > >> property
> > > > > >> > > > file
> > > > > >> > > > >> or
> > > > > >> > > > >> > >>> > > > dynamically
> > > > > >> > > > >> > >>> > > > > > set
> > > > > >> > > > >> > >>> > > > > > > >> it in
> > > > > >> > > > >> > >>> > > > > > > >> > the wrapper code and start the
> > > broker.
> > > > > >> > > > >> > >>> > > > > > > >> >
> > > > > >> > > > >> > >>> > > > > > > >> >
> > > > > >> > > > >> > >>> > > > > > > >> > On Wed, Oct 14, 2015 at 2:36 PM,
> > > Gwen
> > > > > >> > Shapira <
> > > > > >> > > > >> > >>> > > > gwen@confluent.io>
> > > > > >> > > > >> > >>> > > > > > > >> wrote:
> > > > > >> > > > >> > >>> > > > > > > >> >
> > > > > >> > > > >> > >>> > > > > > > >> > > Can you clarify the workflow
> for
> > > the
> > > > > >> > > following
> > > > > >> > > > >> > >>> scenarios:
> > > > > >> > > > >> > >>> > > > > > > >> > >
> > > > > >> > > > >> > >>> > > > > > > >> > > 1. I currently have 6 brokers
> > and
> > > > want
> > > > > >> to
> > > > > >> > add
> > > > > >> > > > >> rack
> > > > > >> > > > >> > >>> > > information
> > > > > >> > > > >> > >>> > > > > for
> > > > > >> > > > >> > >>> > > > > > > >> each
> > > > > >> > > > >> > >>> > > > > > > >> > > 2. I'm adding a new broker
> and I
> > > > want
> > > > > to
> > > > > >> > > > specify
> > > > > >> > > > >> > which
> > > > > >> > > > >> > >>> > rack
> > > > > >> > > > >> > >>> > > it
> > > > > >> > > > >> > >>> > > > > > > >> belongs on
> > > > > >> > > > >> > >>> > > > > > > >> > > while adding it.
> > > > > >> > > > >> > >>> > > > > > > >> > >
> > > > > >> > > > >> > >>> > > > > > > >> > > Thanks!
> > > > > >> > > > >> > >>> > > > > > > >> > >
> > > > > >> > > > >> > >>> > > > > > > >> > > On Tue, Oct 13, 2015 at 2:21
> PM,
> > > > Allen
> > > > > >> > Wang <
> > > > > >> > > > >> > >>> > > > > allenxwang@gmail.com
> > > > > >> > > > >> > >>> > > > > > >
> > > > > >> > > > >> > >>> > > > > > > >> > wrote:
> > > > > >> > > > >> > >>> > > > > > > >> > >
> > > > > >> > > > >> > >>> > > > > > > >> > > > We discussed the KIP in the
> > > > hangout
> > > > > >> > today.
> > > > > >> > > > The
> > > > > >> > > > >> > >>> > > > recommendation
> > > > > >> > > > >> > >>> > > > > is
> > > > > >> > > > >> > >>> > > > > > > to
> > > > > >> > > > >> > >>> > > > > > > >> > make
> > > > > >> > > > >> > >>> > > > > > > >> > > > rack as a broker property in
> > > > > >> ZooKeeper.
> > > > > >> > For
> > > > > >> > > > >> users
> > > > > >> > > > >> > >>> with
> > > > > >> > > > >> > >>> > > > > existing
> > > > > >> > > > >> > >>> > > > > > > rack
> > > > > >> > > > >> > >>> > > > > > > >> > > > information stored
> somewhere,
> > > they
> > > > > >> would
> > > > > >> > > need
> > > > > >> > > > >> to
> > > > > >> > > > >> > >>> > retrieve
> > > > > >> > > > >> > >>> > > > the
> > > > > >> > > > >> > >>> > > > > > > >> > information
> > > > > >> > > > >> > >>> > > > > > > >> > > > at broker start up and
> > > dynamically
> > > > > set
> > > > > >> > the
> > > > > >> > > > rack
> > > > > >> > > > >> > >>> > property,
> > > > > >> > > > >> > >>> > > > > which
> > > > > >> > > > >> > >>> > > > > > > can
> > > > > >> > > > >> > >>> > > > > > > >> be
> > > > > >> > > > >> > >>> > > > > > > >> > > > implemented as a wrapper to
> > > > > bootstrap
> > > > > >> > > broker.
> > > > > >> > > > >> > There
> > > > > >> > > > >> > >>> will
> > > > > >> > > > >> > >>> > > be
> > > > > >> > > > >> > >>> > > > no
> > > > > >> > > > >> > >>> > > > > > > >> > interface
> > > > > >> > > > >> > >>> > > > > > > >> > > or
> > > > > >> > > > >> > >>> > > > > > > >> > > > pluggable implementation to
> > > > retrieve
> > > > > >> the
> > > > > >> > > rack
> > > > > >> > > > >> > >>> > information.
> > > > > >> > > > >> > >>> > > > > > > >> > > >
> > > > > >> > > > >> > >>> > > > > > > >> > > > The assumption is that you
> > > always
> > > > > >> need to
> > > > > >> > > > >> restart
> > > > > >> > > > >> > >>> the
> > > > > >> > > > >> > >>> > > broker
> > > > > >> > > > >> > >>> > > > > to
> > > > > >> > > > >> > >>> > > > > > > >> make a
> > > > > >> > > > >> > >>> > > > > > > >> > > > change to the rack.
> > > > > >> > > > >> > >>> > > > > > > >> > > >
> > > > > >> > > > >> > >>> > > > > > > >> > > > Once the rack becomes a
> broker
> > > > > >> property,
> > > > > >> > it
> > > > > >> > > > >> will
> > > > > >> > > > >> > be
> > > > > >> > > > >> > >>> > > possible
> > > > > >> > > > >> > >>> > > > > to
> > > > > >> > > > >> > >>> > > > > > > make
> > > > > >> > > > >> > >>> > > > > > > >> > rack
> > > > > >> > > > >> > >>> > > > > > > >> > > > part of the meta data to
> help
> > > the
> > > > > >> > consumer
> > > > > >> > > > >> choose
> > > > > >> > > > >> > >>> which
> > > > > >> > > > >> > >>> > in
> > > > > >> > > > >> > >>> > > > > sync
> > > > > >> > > > >> > >>> > > > > > > >> replica
> > > > > >> > > > >> > >>> > > > > > > >> > > to
> > > > > >> > > > >> > >>> > > > > > > >> > > > consume from as part of the
> > > future
> > > > > >> > consumer
> > > > > >> > > > >> > >>> enhancement.
> > > > > >> > > > >> > >>> > > > > > > >> > > >
> > > > > >> > > > >> > >>> > > > > > > >> > > > I will update the KIP.
> > > > > >> > > > >> > >>> > > > > > > >> > > >
> > > > > >> > > > >> > >>> > > > > > > >> > > > Thanks,
> > > > > >> > > > >> > >>> > > > > > > >> > > > Allen
> > > > > >> > > > >> > >>> > > > > > > >> > > >
> > > > > >> > > > >> > >>> > > > > > > >> > > >
> > > > > >> > > > >> > >>> > > > > > > >> > > > On Thu, Oct 8, 2015 at 9:23
> > AM,
> > > > > Allen
> > > > > >> > Wang
> > > > > >> > > <
> > > > > >> > > > >> > >>> > > > > > allenxwang@gmail.com>
> > > > > >> > > > >> > >>> > > > > > > >> > wrote:
> > > > > >> > > > >> > >>> > > > > > > >> > > >
> > > > > >> > > > >> > >>> > > > > > > >> > > > > I attended Tuesday's KIP hangout, but this KIP was not discussed due to
> > > > > >> > > > >> > >>> > > > > > > >> > > > > time constraints.
> > > > > >> > > > >> > >>> > > > > > > >> > > > >
> > > > > >> > > > >> > >>> > > > > > > >> > > > > However, after hearing the discussion of KIP-35, I have the feeling that
> > > > > >> > > > >> > >>> > > > > > > >> > > > > the incompatibility (caused by a new broker property) between brokers
> > > > > >> > > > >> > >>> > > > > > > >> > > > > with different versions will be solved there. In addition, having rack
> > > > > >> > > > >> > >>> > > > > > > >> > > > > in the broker properties as metadata may also help consumers in the
> > > > > >> > > > >> > >>> > > > > > > >> > > > > future. So I am open to adding a rack property to the broker.
> > > > > >> > > > >> > >>> > > > > > > >> > > > >
> > > > > >> > > > >> > >>> > > > > > > >> > > > > Hopefully we can discuss this in the next KIP hangout.
> > > > > >> > > > >> > >>> > > > > > > >> > > > >
> > > > > >> > > > >> > >>> > > > > > > >> > > > > On Wed, Sep 30, 2015 at
> 2:46
> > > PM,
> > > > > >> Allen
> > > > > >> > > > Wang <
> > > > > >> > > > >> > >>> > > > > > > allenxwang@gmail.com
> > > > > >> > > > >> > >>> > > > > > > >> >
> > > > > >> > > > >> > >>> > > > > > > >> > > > wrote:
> > > > > >> > > > >> > >>> > > > > > > >> > > > >
> > > > > >> > > > >> > >>> > > > > > > >> > > > >> Can you send me the
> > > information
> > > > > on
> > > > > >> the
> > > > > >> > > > next
> > > > > >> > > > >> KIP
> > > > > >> > > > >> > >>> > > hangout?
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>
> > > > > >> > > > >> > >>> > > > > > > >> > > > >> Currently the broker-rack
> > > > mapping
> > > > > >> is
> > > > > >> > not
> > > > > >> > > > >> > cached.
> > > > > >> > > > >> > >>> In
> > > > > >> > > > >> > >>> > > > > > KafkaApis,
> > > > > >> > > > >> > >>> > > > > > > >> > > > >> RackLocator.getRackInfo()
> > is
> > > > > called
> > > > > >> > each
> > > > > >> > > > >> time
> > > > > >> > > > >> > the
> > > > > >> > > > >> > >>> > > mapping
> > > > > >> > > > >> > >>> > > > > is
> > > > > >> > > > >> > >>> > > > > > > >> needed
> > > > > >> > > > >> > >>> > > > > > > >> > > for
> > > > > >> > > > >> > >>> > > > > > > >> > > > >> auto topic creation. This
> > > will
> > > > > >> ensure
> > > > > >> > > > latest
> > > > > >> > > > >> > >>> mapping
> > > > > >> > > > >> > >>> > is
> > > > > >> > > > >> > >>> > > > > used
> > > > > >> > > > >> > >>> > > > > > at
> > > > > >> > > > >> > >>> > > > > > > >> any
> > > > > >> > > > >> > >>> > > > > > > >> > > > time.
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>
> > > > > >> > > > >> > >>> > > > > > > >> > > > >> The ability to get the
> > > complete
> > > > > >> > mapping
> > > > > >> > > > >> makes
> > > > > >> > > > >> > it
> > > > > >> > > > >> > >>> > simple
> > > > > >> > > > >> > >>> > > > to
> > > > > >> > > > >> > >>> > > > > > > reuse
> > > > > >> > > > >> > >>> > > > > > > >> the
> > > > > >> > > > >> > >>> > > > > > > >> > > > same
> > > > > >> > > > >> > >>> > > > > > > >> > > > >> interface in command line
> > > > tools.
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>
> > > > > >> > > > >> > >>> > > > > > > >> > > > >> On Wed, Sep 30, 2015 at
> > 11:01
> > > > AM,
> > > > > >> > Aditya
> > > > > >> > > > >> > >>> Auradkar <
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>
> > > aauradkar@linkedin.com.invalid
> > > > >
> > > > > >> > wrote:
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> Perhaps we discuss this
> > > during
> > > > > the
> > > > > >> > next
> > > > > >> > > > KIP
> > > > > >> > > > >> > >>> hangout?
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>>
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> I do see that a
> pluggable
> > > rack
> > > > > >> > locator
> > > > > >> > > > can
> > > > > >> > > > >> be
> > > > > >> > > > >> > >>> useful
> > > > > >> > > > >> > >>> > > > but I
> > > > > >> > > > >> > >>> > > > > > do
> > > > > >> > > > >> > >>> > > > > > > >> see a
> > > > > >> > > > >> > >>> > > > > > > >> > > few
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> concerns:
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>>
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> - The RackLocator (as
> > > > described
> > > > > in
> > > > > >> > the
> > > > > >> > > > >> > >>> document),
> > > > > >> > > > >> > >>> > > > implies
> > > > > >> > > > >> > >>> > > > > > that
> > > > > >> > > > >> > >>> > > > > > > >> it
> > > > > >> > > > >> > >>> > > > > > > >> > can
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> discover rack
> information
> > > for
> > > > > any
> > > > > >> > node
> > > > > >> > > in
> > > > > >> > > > >> the
> > > > > >> > > > >> > >>> > cluster.
> > > > > >> > > > >> > >>> > > > How
> > > > > >> > > > >> > >>> > > > > > > does
> > > > > >> > > > >> > >>> > > > > > > >> it
> > > > > >> > > > >> > >>> > > > > > > >> > > deal
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> with rack location
> > changes?
> > > > For
> > > > > >> > > example,
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> if I moved broker id (1) from rack X to Y, I only have to start that broker
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> with a newer rack config. If RackLocator discovers broker -> rack information
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> at start up time, any change to a broker will require bouncing the entire
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> cluster since createTopic requests can be sent to any node in the cluster.
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> For this reason it may be simpler to have each node be aware of its own rack
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> and persist it in ZK during start up time.
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>>
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> - A pluggable RackLocator relies on an external service being available to
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> serve rack information.
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>>
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> Out of curiosity, I looked up how a couple of other systems deal with
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> zone/rack awareness.
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> For Cassandra some interesting modes are:
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> (Property File configuration)
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>>
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> (Dynamic inference)
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>>
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>>
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> Voldemort does a static node -> zone assignment based on configuration.
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>>
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> Aditya
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>>
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> On Wed, Sep 30, 2015 at 10:05 AM, Allen Wang <allenxwang@gmail.com> wrote:
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>>
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > I would like to see if we can do both:
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > - Make RackLocator pluggable to facilitate migration with existing
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > broker-rack mapping
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > - Make rack an optional property for broker. If rack is available from
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > broker, treat it as source of truth. For users with existing broker-rack
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > mapping somewhere else, they can use the pluggable way or they can transfer
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > the mapping to the broker rack property.
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > One thing I am not sure is what happens at rolling upgrade when we have
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > rack as a broker property. For brokers with older version of Kafka, will it
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > cause problem for them? If so, is there any workaround? I also think it
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > would be better not to have rack in the controller wire protocol but not
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > sure if it is achievable.
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > Thanks,
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > Allen
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > On Mon, Sep 28, 2015 at 4:55 PM, Todd Palino <tpalino@gmail.com> wrote:
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > I tend to like the idea of a pluggable locator. For example, we already
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > have an interface for discovering information about the physical location
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > of servers. I don't relish the idea of having to maintain data in multiple
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > places.
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > >
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > -Todd
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > >
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > On Mon, Sep 28, 2015 at 4:48 PM, Aditya Auradkar <aauradkar@linkedin.com.invalid> wrote:
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > >
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > Thanks for starting this KIP Allen.
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > I agree with Gwen that having a RackLocator class that is pluggable seems
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > to be too complex. The KIP refers to potentially non-ZK storage for the
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > rack info which I don't think is necessary.
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > Perhaps we can persist this info in zk under /brokers/ids/<broker_id>
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > similar to other broker properties and add a config in KafkaConfig called
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > "rack".
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy, "rack": "abc"}
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > Aditya
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira <gwen@confluent.io> wrote:
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > Hi,
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > First, thanks for putting out a KIP for this. This is super important for
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > production deployments of Kafka.
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > Few questions:
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > 1) Are we sure we want "as many racks as possible"? I'd want to balance
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > between safety (more racks) and network utilization (traffic within a rack
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > uses the high-bandwidth TOR switch). One replica on a different rack and
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > the rest on same rack (if possible) sounds better to me.
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > 2) Rack-locator class seems overly complex compared to adding a rack.number
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > property to the broker properties file. Why do we want that?
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > Gwen
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang <allenxwang@gmail.com> wrote:
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > Hello Kafka Developers,
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > I just created KIP-36 for rack aware replica assignment.
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > The goal is to utilize the isolation provided by the racks in data center
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > and distribute replicas to racks to provide fault tolerance.
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > Comments are welcome.
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > Thanks,
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > Allen
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> > >
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>>
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>
> > > > > >> > > > >> > >>> > > > > > > >> > > > >>
> > > > > >> > > > >> > >>> > > > > > > >> > > > >
> > > > > >> > > > >> > >>> > > > > > > >> > > >
> > > > > >> > > > >> > >>> > > > > > > >> > >
> > > > > >> > > > >> > >>> > > > > > > >> >
> > > > > >> > > > >> > >>> > > > > > > >>
> > > > > >> > > > >> > >>> > > > > > > >
> > > > > >> > > > >> > >>> > > > > > > >
> > > > > >> > > > >> > >>> > > > > > >
> > > > > >> > > > >> > >>> > > > > >
> > > > > >> > > > >> > >>> > > > >
> > > > > >> > > > >> > >>> > > >
> > > > > >> > > > >> > >>> > >
> > > > > >> > > > >> > >>> >
> > > > > >> > > > >> > >>>
> > > > > >> > > > >> > >>>
> > > > > >> > > > >> > >>>
> > > > > >> > > > >> > >>> --
> > > > > >> > > > >> > >>> Thanks,
> > > > > >> > > > >> > >>> Neha
> > > > > >> > > > >> > >>>
> > > > > >> > > > >> > >>
> > > > > >> > > > >> > >>
> > > > > >> > > > >> > >
> > > > > >> > > > >> >
> > > > > >> > > > >>
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Allen Wang <al...@gmail.com>.
Agreed. So it seems that for 0.9.1, the only option is to keep the JSON
version unchanged. But as part of the PR, I can change the behavior of
ZkUtils.getBrokerInfo() to make it compatible with future JSON versions.
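
To illustrate what I mean, here is a rough sketch in Python rather than the
actual Scala code in ZkUtils. The BrokerInfo type and the parse_broker_info
name are made up for this example, and the JSON fields are the ones discussed
earlier in this thread. The idea is that the reader does not reject versions
it has not seen, ignores fields it does not know, and treats rack as optional:

```python
import json
from typing import NamedTuple, Optional


class BrokerInfo(NamedTuple):
    """Hypothetical holder for the fields a reader cares about."""
    broker_id: int
    host: str
    port: int
    rack: Optional[str]  # None when the broker registered without a rack


def parse_broker_info(broker_id: int, registration_json: str) -> BrokerInfo:
    """Parse a broker registration without rejecting newer JSON versions."""
    data = json.loads(registration_json)
    version = data.get("version", 1)
    if version < 1:
        # Only clearly invalid versions are rejected; there is no upper bound.
        raise ValueError(f"Invalid broker registration version: {version}")
    # Unknown fields added by later versions are simply ignored.
    return BrokerInfo(
        broker_id=broker_id,
        host=data["host"],
        port=data["port"],
        rack=data.get("rack"),
    )


# A registration written by an older broker, without a rack.
old_broker = parse_broker_info(1, '{"version": 2, "host": "h1", "port": 9092}')

# A registration with a newer version, a rack and an unknown extra field.
new_broker = parse_broker_info(
    2, '{"version": 3, "host": "h2", "port": 9092, "rack": "abc", "extra": 1}'
)
```

With a reader like this, adding new fields to the broker JSON would not break
it, which is the compatibility property Jun described.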

Thanks,
Allen


On Tue, Jan 12, 2016 at 2:57 PM, Jun Rao <ju...@confluent.io> wrote:

> Hi, Allen,
>
> That's a good point. In 0.9.0.0, the old consumer reads broker info
> directly from ZK and the code throws an exception if the version in json is
> not 1 or 2. This old consumer will break when we upgrade the broker json to
> version 3 in ZK in 0.9.1, which will be an issue. We overlooked this issue
> in 0.9.0.0. The easiest fix is probably not to check the version in
> ZkUtils.getBrokerInfo().
> This way, as long as we are only adding new fields in broker json, we can
> preserve the compatibility.
>
> Thanks,
>
> Jun
>
> On Tue, Jan 12, 2016 at 1:52 PM, Allen Wang <al...@gmail.com> wrote:
>
> > Hi Jun,
> >
> > That's a good suggestion. However, it does not solve the problem for the
> > clients or third-party tools that get broker information directly from
> > ZooKeeper.
> >
> > Thanks,
> > Allen
> >
> >
> > On Tue, Jan 12, 2016 at 1:29 PM, Jun Rao <ju...@confluent.io> wrote:
> >
> > > Allen,
> > >
> > > Another way to do this is the following.
> > >
> > > When inter.broker.protocol.version is set to 0.9.0, the broker will write
> > > the broker info in ZK using version 2, ignoring the rack info.
> > >
> > > When inter.broker.protocol.version is set to 0.9.1, the broker will write
> > > the broker info in ZK using version 3, including the rack info.
> > >
> > > If one follows the upgrade process, after the 2nd round of rolling bounces,
> > > every broker is capable of parsing version 3 of broker info in ZK. This is
> > > when the rack-aware feature will be used.
> > >
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Tue, Jan 12, 2016 at 12:19 PM, Allen Wang <al...@gmail.com> wrote:
> > >
> > > > Regarding the JSON version of Broker:
> > > >
> > > > I don't know why ZkUtils.getBrokerInfo() restricts the JSON versions it
> > > > can read. It will throw an exception if the version is not 1 or 2. It
> > > > seems to me that this will cause compatibility problems whenever the
> > > > version needs to be changed, and make the upgrade path difficult.
> > > >
> > > > One option we have is to make rack also part of version 2 and keep
> > > > version 2 unchanged for this update. This will keep old clients
> > > > compatible. During a rolling upgrade, it will also avoid problems if the
> > > > controller/broker is still on the old version.
> > > >
> > > > However, ZkUtils.getBrokerInfo() will be updated to return the Broker
> > > > with rack, so the rack information will be available once the
> > > > server/client is upgraded to the latest version.
> > > >
> > > >
> > > >
> > > > On Wed, Jan 6, 2016 at 6:28 PM, Allen Wang <al...@gmail.com> wrote:
> > > >
> > > > > Updated KIP according to Jun's comment and included changes to TMR.
> > > > >
> > > > > On Tue, Jan 5, 2016 at 5:59 PM, Jun Rao <ju...@confluent.io> wrote:
> > > > >
> > > > >> Hi, Allen,
> > > > >>
> > > > >> A couple of minor comments on the KIP.
> > > > >>
> > > > >> 1. The version of the broker JSON string says 2. It should be 3.
> > > > >>
> > > > > >> 2. The new version of UpdateMetadataRequest should be 2, instead of 1.
> > > > >> Could you include the full wire protocol of version 2 of
> > > > >> UpdateMetadataRequest and highlight the changed part?
> > > > >>
> > > > >> Thanks,
> > > > >>
> > > > >> Jun
> > > > >>
> > > > > >> On Tue, Jan 5, 2016 at 3:11 PM, Allen Wang <al...@gmail.com> wrote:
> > > > >>
> > > > > >> > Jun and I had a chance to discuss it in a meeting and it is agreed to
> > > > > >> > change the TMR in a different patch.
> > > > > >> >
> > > > > >> > I can change the KIP to include rack in TMR. The essential change is to
> > > > > >> > add rack into class BrokerEndPoint and make TMR version aware.
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > On Tue, Jan 5, 2016 at 10:21 AM, Aditya Auradkar <
> > > > >> > aauradkar@linkedin.com.invalid> wrote:
> > > > >> >
> > > > >> > > Jun/Allen -
> > > > >> > >
> > > > > >> > > Did we ever actually agree on whether we should evolve the TMR to include
> > > > > >> > > rack info or not?
> > > > > >> > > I don't feel strongly about it, but if it's the right thing to do we
> > > > > >> > > should probably do it in this KIP (can be a separate patch). It isn't a
> > > > > >> > > large change.
> > > > >> > >
> > > > >> > > Aditya
> > > > >> > >
> > > > > >> > > On Sat, Dec 26, 2015 at 3:01 PM, Allen Wang <allenxwang@gmail.com> wrote:
> > > > > >> > >
> > > > > >> > > > Added the rolling upgrade instruction in the KIP, similar to those in
> > > > > >> > > > 0.9.0 release notes.
> > > > > >> > > >
> > > > > >> > > > On Wed, Dec 16, 2015 at 11:32 AM, Allen Wang <allenxwang@gmail.com> wrote:
> > > > >> > > >
> > > > >> > > > > Hi Jun,
> > > > >> > > > >
> > > > > >> > > > > The reason that TopicMetadataResponse is not included in the KIP is that
> > > > > >> > > > > it currently is not version aware. So we need to introduce a version to it
> > > > > >> > > > > in order to ensure backward compatibility. That seems to me a big change.
> > > > > >> > > > > Do we want to couple it with this KIP? Do we need to further discuss what
> > > > > >> > > > > information to include in the new version besides rack? For example,
> > > > > >> > > > > should we include broker security protocol in TopicMetadataResponse?
> > > > > >> > > > >
> > > > > >> > > > > The other option is to make it a separate KIP to make
> > > > > >> > > > > TopicMetadataResponse version aware and decide what to include, and make
> > > > > >> > > > > this KIP focus on the rack aware algorithm, admin tools and related
> > > > > >> > > > > changes to the inter-broker protocol.
> > > > >> > > > >
> > > > >> > > > > Thanks,
> > > > >> > > > > Allen
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > >
> > > > > >> > > > > On Mon, Dec 14, 2015 at 8:30 AM, Jun Rao <jun@confluent.io> wrote:
> > > > >> > > > >
> > > > >> > > > >> Allen,
> > > > >> > > > >>
> > > > >> > > > >> Thanks for the proposal. A few comments.
> > > > >> > > > >>
> > > > > >> > > > >> 1. Since this KIP changes the inter broker communication protocol
> > > > > >> > > > >> (UpdateMetadataRequest), we will need to document the upgrade path
> > > > > >> > > > >> (similar to what's described in
> > > > > >> > > > >> http://kafka.apache.org/090/documentation.html#upgrade).
> > > > > >> > > > >>
> > > > > >> > > > >> 2. It might be useful to include the rack info of the broker in
> > > > > >> > > > >> TopicMetadataResponse. This can be useful for administrative tasks, as
> > > > > >> > > > >> well as read affinity in the future.
> > > > >> > > > >>
> > > > >> > > > >> Jun
> > > > >> > > > >>
> > > > >> > > > >>
> > > > >> > > > >>
> > > > > >> > > > >> On Thu, Dec 10, 2015 at 9:38 AM, Allen Wang <allenxwang@gmail.com> wrote:
> > > > > >> > > > >>
> > > > > >> > > > >> > If there are no more comments I would like to call for a vote.
> > > > >> > > > >> >
> > > > >> > > > >> >
> > > > > >> > > > >> > On Sun, Nov 15, 2015 at 10:08 PM, Allen Wang <allenxwang@gmail.com> wrote:
> > > > >> > > > >> >
> > > > > >> > > > >> > > KIP is updated with more details and how to handle the situation where
> > > > > >> > > > >> > > rack information is incomplete.
> > > > > >> > > > >> > >
> > > > > >> > > > >> > > In the situation where rack information is incomplete, but we want to
> > > > > >> > > > >> > > continue with the assignment, I have suggested to ignore all rack
> > > > > >> > > > >> > > information and fallback to original algorithm. The reason is explained
> > > > > >> > > > >> > > below:
> > > > > >> > > > >> > >
> > > > > >> > > > >> > > The other options are to assume that the broker without the rack belong to
> > > > > >> > > > >> > > its own unique rack, or they belong to one "default" rack. Either way we
> > > > > >> > > > >> > > choose, it is highly likely to result in uneven number of brokers in racks,
> > > > > >> > > > >> > > and it is quite possible that the "made up" racks will have much fewer
> > > > > >> > > > >> > > number of brokers. As I explained in the KIP, uneven number of brokers in
> > > > > >> > > > >> > > racks will lead to uneven distribution of replicas among brokers (even
> > > > > >> > > > >> > > though the leader distribution is still even). The brokers in the rack that
> > > > > >> > > > >> > > has fewer number of brokers will get more replicas per broker than brokers
> > > > > >> > > > >> > > in other racks.
> > > > > >> > > > >> > >
> > > > > >> > > > >> > > Given this fact and the replica assignment produced will be incorrect
> > > > > >> > > > >> > > anyway from rack aware point of view, ignoring all rack information and
> > > > > >> > > > >> > > fallback to the original algorithm is not a bad choice since it will at
> > > > > >> > > > >> > > least have a better guarantee of replica distribution.
> > > > > >> > > > >> > >
> > > > > >> > > > >> > > Also for command line tools it gives user a choice if for any reason they
> > > > > >> > > > >> > > want to ignore rack information and fallback to the original algorithm.
> > > > >> > > > >> > >
> > > > >> > > > >> > >
> > > > > >> > > > >> > > On Tue, Nov 10, 2015 at 9:04 AM, Allen Wang <allenxwang@gmail.com> wrote:
> > > > > >> > > > >> > >> I am busy with some time pressing issues for the last few days. I will
> > > > > >> > > > >> > >> think about how the incomplete rack information will affect the balance
> > > > > >> > > > >> > >> and update the KIP by early next week.
> > > > >> > > > >> > >>
> > > > >> > > > >> > >> Thanks,
> > > > >> > > > >> > >> Allen
> > > > >> > > > >> > >>
> > > > >> > > > >> > >>
> > > > >> > > > >> > >> On Tue, Nov 3, 2015 at 9:03 AM, Neha Narkhede <
> > > > >> > neha@confluent.io
> > > > >> > > >
> > > > >> > > > >> > wrote:
> > > > >> > > > >> > >>
> > > > >> > > > >> > >>> Few suggestions on improving the KIP
> > > > >> > > > >> > >>>
> > > > >> > > > >> > >>> *If some brokers have rack, and some do not, the
> > > > algorithm
> > > > >> > will
> > > > >> > > > >> throw
> > > > >> > > > >> > an
> > > > >> > > > >> > >>> > exception. This is to prevent incorrect
> assignment
> > > > >> caused by
> > > > >> > > > user
> > > > >> > > > >> > >>> error.*
> > > > >> > > > >> > >>>
> > > > >> > > > >> > >>>
> > > > >> > > > >> > >>> In the KIP, can you clearly state the user-facing
> > > > behavior
> > > > >> > when
> > > > >> > > > some
> > > > >> > > > >> > >>> brokers have rack information and some don't. Which
> > > > actions
> > > > >> > and
> > > > >> > > > >> > requests
> > > > >> > > > >> > >>> will error out and how?
> > > > >> > > > >> > >>>
> > > > >> > > > >> > >>> *Even distribution of partition leadership among
> > > brokers*
> > > > >> > > > >> > >>>
> > > > >> > > > >> > >>>
> > > > >> > > > >> > >>> There is some information about arranging the
> sorted
> > > > broker
> > > > >> > list
> > > > >> > > > >> > >>> interlaced
> > > > >> > > > >> > >>> with rack ids. Can you describe the changes to the
> > > > current
> > > > >> > > > algorithm
> > > > >> > > > >> > in a
> > > > >> > > > >> > >>> little more detail? How does this interlacing work
> if
> > > > only
> > > > >> a
> > > > >> > > > subset
> > > > >> > > > >> of
> > > > >> > > > >> > >>> brokers have the rack id configured? Does this
> still
> > > work
> > > > >> if
> > > > >> > > > uneven
> > > > >> > > > >> #
> > > > >> > > > >> > of
> > > > >> > > > >> > >>> brokers are assigned to each rack? It might work,
> I'm
> > > > >> looking
> > > > >> > > for
> > > > >> > > > >> more
> > > > >> > > > >> > >>> details on the changes, since it will affect the
> > > behavior
> > > > >> seen
> > > > >> > > by
> > > > >> > > > >> the
> > > > >> > > > >> > >>> user
> > > > >> > > > >> > >>> - imbalance on either the leaders or data or both.
> > > > >> > > > >> > >>>
> > > > >> > > > >> > >>> On Mon, Nov 2, 2015 at 6:39 PM, Aditya Auradkar <
> > > > >> > > > >> > aauradkar@linkedin.com>
> > > > >> > > > >> > >>> wrote:
> > > > >> > > > >> > >>>
> > > > >> > > > >> > >>> > I think this sounds reasonable. Anyone else have
> > > > >> comments?
> > > > >> > > > >> > >>> >
> > > > >> > > > >> > >>> > Aditya
> > > > >> > > > >> > >>> >
> > > > >> > > > >> > >>> > On Tue, Oct 27, 2015 at 5:23 PM, Allen Wang <
> > > > >> > > > allenxwang@gmail.com
> > > > >> > > > >> >
> > > > >> > > > >> > >>> wrote:
> > > > >> > > > >> > >>> >
> > > > >> > > > >> > >>> > > During the discussion in the hangout, it was
> > > > mentioned
> > > > >> > that
> > > > >> > > it
> > > > >> > > > >> > would
> > > > >> > > > >> > >>> be
> > > > >> > > > >> > >>> > > desirable that consumers know the rack
> > information
> > > of
> > > > >> the
> > > > >> > > > >> brokers
> > > > >> > > > >> > so
> > > > >> > > > >> > >>> that
> > > > >> > > > >> > >>> > > they can consume from the broker in the same
> rack
> > > to
> > > > >> > reduce
> > > > >> > > > >> > latency.
> > > > >> > > > >> > >>> As I
> > > > >> > > > >> > >>> > > understand this will only be beneficial if
> > consumer
> > > > can
> > > > >> > > > consume
> > > > >> > > > >> > from
> > > > >> > > > >> > >>> any
> > > > >> > > > >> > >>> > > broker in ISR, which is not possible now.
> > > > >> > > > >> > >>> > >
> > > > >> > > > >> > >>> > > I suggest we skip the change to TMR. Once the
> > > change
> > > > is
> > > > >> > made
> > > > >> > > > to
> > > > >> > > > >> > >>> consumer
> > > > >> > > > >> > >>> > to
> > > > >> > > > >> > >>> > > be able to consume from any broker in ISR, the
> > rack
> > > > >> > > > information
> > > > >> > > > >> can
> > > > >> > > > >> > >>> be
> > > > >> > > > >> > >>> > > added to TMR.
> > > > >> > > > >> > >>> > >
> > > > >> > > > >> > >>> > > Another thing I want to confirm is the command
> line
> > > > >> > behavior. I
> > > > >> > > > >> think
> > > > >> > > > >> > >>> the
> > > > >> > > > >> > >>> > > desirable default behavior is to fail fast on
> > > command
> > > > >> line
> > > > >> > > for
> > > > >> > > > >> > >>> incomplete
> > > > >> > > > >> > >>> > > rack mapping. The error message can include
> > further
> > > > >> > > > instruction
> > > > >> > > > >> > that
> > > > >> > > > >> > >>> > tells
> > > > >> > > > >> > >>> > > the user to add an extra argument (like
> > > > >> > > > >> "--allow-partial-rackinfo")
> > > > >> > > > >> > >>> to
> > > > >> > > > >> > >>> > > suppress the error and do an imperfect rack
> aware
> > > > >> > > assignment.
> > > > >> > > > If
> > > > >> > > > >> > the
> > > > >> > > > >> > >>> > > default behavior is to allow incomplete
> mapping,
> > > the
> > > > >> error
> > > > >> > > can
> > > > >> > > > >> > still
> > > > >> > > > >> > >>> be
> > > > >> > > > >> > >>> > > easily missed.
> > > > >> > > > >> > >>> > >
> > > > >> > > > >> > >>> > > The affected command line tools are
> TopicCommand
> > > and
> > > > >> > > > >> > >>> > > ReassignPartitionsCommand.
> > > > >> > > > >> > >>> > >
> > > > >> > > > >> > >>> > > Thanks,
> > > > >> > > > >> > >>> > > Allen
> > > > >> > > > >> > >>> > >
> > > > >> > > > >> > >>> > >
> > > > >> > > > >> > >>> > >
> > > > >> > > > >> > >>> > >
> > > > >> > > > >> > >>> > >
> > > > >> > > > >> > >>> > > On Mon, Oct 26, 2015 at 12:55 PM, Aditya
> > Auradkar <
> > > > >> > > > >> > >>> > aauradkar@linkedin.com>
> > > > >> > > > >> > >>> > > wrote:
> > > > >> > > > >> > >>> > >
> > > > >> > > > >> > >>> > > > Hi Allen,
> > > > >> > > > >> > >>> > > >
> > > > >> > > > >> > >>> > > > For TopicMetadataResponse to understand
> > version,
> > > > you
> > > > >> can
> > > > >> > > > bump
> > > > >> > > > >> up
> > > > >> > > > >> > >>> the
> > > > >> > > > >> > >>> > > > request version itself. Based on the version
> of
> > > the
> > > > >> > > request,
> > > > >> > > > >> the
> > > > >> > > > >> > >>> > response
> > > > >> > > > >> > >>> > > > can be appropriately serialized. It shouldn't
> > be
> > > a
> > > > >> huge
> > > > >> > > > >> change.
> > > > >> > > > >> > For
> > > > >> > > > >> > >>> > > > example: We went through something similar
> for
> > > > >> > > > ProduceRequest
> > > > >> > > > >> > >>> recently
> > > > >> > > > >> > >>> > (
> > > > >> > > > >> > >>> > > > https://reviews.apache.org/r/33378/)
> > > > >> > > > >> > >>> > > > I guess the reason protocol information is
> not
> > > > >> included
> > > > >> > in
> > > > >> > > > the
> > > > >> > > > >> > TMR
> > > > >> > > > >> > >>> is
> > > > >> > > > >> > >>> > > > because the topic itself is independent of
> any
> > > > >> > particular
> > > > >> > > > >> > protocol
> > > > >> > > > >> > >>> (SSL
> > > > >> > > > >> > >>> > > vs
> > > > >> > > > >> > >>> > > > Plaintext). Having said that, I'm not sure we
> > > even
> > > > >> need
> > > > >> > > rack
> > > > >> > > > >> > >>> > information
> > > > >> > > > >> > >>> > > in
> > > > >> > > > >> > >>> > > > TMR. What usecase were you thinking of
> > initially?
> > > > >> > > > >> > >>> > > >
> > > > >> > > > >> > >>> > > > For 1 - I'd be fine with adding an option to
> > the
> > > > >> command
> > > > >> > > > line
> > > > >> > > > >> > tools
> > > > >> > > > >> > >>> > that
> > > > >> > > > >> > >>> > > > check rack assignment. For e.g.
> > > > >> "--strict-assignment" or
> > > > >> > > > >> > something
> > > > >> > > > >> > >>> > > similar.
> > > > >> > > > >> > >>> > > >
> > > > >> > > > >> > >>> > > > Aditya
> > > > >> > > > >> > >>> > > >
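
The version-bump idea above can be sketched roughly as follows. This is
only an illustration in Python, not Kafka's serialization code: the
function serialize_broker, the dict-based broker record, and the choice of
version 1 as the first version carrying rack are all assumptions made for
the example.

```python
def serialize_broker(broker, version):
    """Return the fields written for one broker at a given response version.

    broker: dict with "id", "host", "port" and optionally "rack"
    (a hypothetical in-memory representation).
    """
    fields = {"id": broker["id"], "host": broker["host"], "port": broker["port"]}
    if version >= 1:
        # Assumption: rack is only written for version 1 and later, so
        # clients that sent a version 0 request never see the new field.
        fields["rack"] = broker.get("rack")
    return fields


broker = {"id": 1, "host": "broker1.example.com", "port": 9092, "rack": "rack-a"}
old_view = serialize_broker(broker, version=0)
new_view = serialize_broker(broker, version=1)
```

The point of keying the layout on the request version is that an old
client keeps receiving exactly the bytes it knows how to parse.
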
> > > > >> > > > >> > >>> > > > On Thu, Oct 22, 2015 at 6:44 PM, Allen Wang <
> > > > >> > > > >> > allenxwang@gmail.com>
> > > > >> > > > >> > >>> > > wrote:
> > > > >> > > > >> > >>> > > >
> > > > >> > > > >> > >>> > > > > For 2 and 3, I have updated the KIP. Please
> > > take
> > > > a
> > > > >> > look.
> > > > >> > > > One
> > > > >> > > > >> > >>> thing I
> > > > >> > > > >> > >>> > > have
> > > > >> > > > >> > >>> > > > > changed is removing the proposal to add
> rack
> > to
> > > > >> > > > >> > >>> > TopicMetadataResponse.
> > > > >> > > > >> > >>> > > > The
> > > > >> > > > >> > >>> > > > > reason is that unlike
> UpdateMetadataRequest,
> > > > >> > > > >> > >>> TopicMetadataResponse
> > > > >> > > > >> > >>> > does
> > > > >> > > > >> > >>> > > > not
> > > > >> > > > >> > >>> > > > > understand version. I don't see a way to
> > > include
> > > > >> rack
> > > > >> > > > >> without
> > > > >> > > > >> > >>> > breaking
> > > > >> > > > >> > >>> > > > old
> > > > >> > > > >> > >>> > > > > version of clients. That's probably why
> > secure
> > > > >> > protocol
> > > > >> > > is
> > > > >> > > > >> not
> > > > >> > > > >> > >>> > included
> > > > >> > > > >> > >>> > > > in
> > > > >> > > > >> > >>> > > > > the TopicMetadataResponse either. I think
> it
> > > will
> > > > >> be a
> > > > >> > > > much
> > > > >> > > > >> > >>> bigger
> > > > >> > > > >> > >>> > > change
> > > > >> > > > >> > >>> > > > > to include rack in TopicMetadataResponse.
> > > > >> > > > >> > >>> > > > >
> > > > >> > > > >> > >>> > > > > For 1, my concern is that doing rack aware
> > > > >> assignment
> > > > >> > > > >> without
> > > > >> > > > >> > >>> > complete
> > > > >> > > > >> > >>> > > > > broker to rack mapping will result in
> > > assignment
> > > > >> that
> > > > >> > is
> > > > >> > > > not
> > > > >> > > > >> > rack
> > > > >> > > > >> > >>> > aware
> > > > >> > > > >> > >>> > > > and
> > > > >> > > > >> > >>> > > > > fail to provide fault tolerance in the
> event
> > of
> > > > >> rack
> > > > >> > > > outage.
> > > > >> > > > >> > This
> > > > >> > > > >> > >>> > kind
> > > > >> > > > >> > >>> > > of
> > > > >> > > > >> > >>> > > > > problem will be difficult to surface. And
> the
> > > > cost
> > > > >> of
> > > > >> > > this
> > > > >> > > > >> > >>> problem is
> > > > >> > > > >> > >>> > > > high:
> > > > >> > > > >> > >>> > > > > you have to do partition reassignment if
> you
> > > are
> > > > >> lucky
> > > > >> > > to
> > > > >> > > > >> spot
> > > > >> > > > >> > >>> the
> > > > >> > > > >> > >>> > > > problem
> > > > >> > > > >> > >>> > > > > early on or face the consequence of data
> loss
> > > > >> during
> > > > >> > > real
> > > > >> > > > >> rack
> > > > >> > > > >> > >>> > outage.
> > > > >> > > > >> > >>> > > > >
> > > > >> > > > >> > >>> > > > > I do see the concern of fail-fast as it
> might
> > > > also
> > > > >> > cause
> > > > >> > > > >> data
> > > > >> > > > >> > >>> loss if
> > > > >> > > > >> > >>> > > > > producer is not able to produce the message
> due
> > to
> > > > >> topic
> > > > >> > > > >> creation
> > > > >> > > > >> > >>> > failure.
> > > > >> > > > >> > >>> > > > Is
> > > > >> > > > >> > >>> > > > > it feasible to treat dynamic topic creation
> > and
> > > > >> > command
> > > > >> > > > >> tools
> > > > >> > > > >> > >>> > > > differently?
> > > > >> > > > >> > >>> > > > > We allow dynamic topic creation with
> > incomplete
> > > > >> > > > broker-rack
> > > > >> > > > >> > >>> mapping
> > > > >> > > > >> > >>> > and
> > > > >> > > > >> > >>> > > > > fail fast in command line. Another option
> is
> > to
> > > > let
> > > > >> > user
> > > > >> > > > >> > >>> determine
> > > > >> > > > >> > >>> > the
> > > > >> > > > >> > >>> > > > > behavior for command line. For example, by
> > > > default
> > > > >> > fail
> > > > >> > > > >> fast in
> > > > >> > > > >> > >>> > command
> > > > >> > > > >> > >>> > > > > line but allow incomplete broker-rack
> mapping
> > > if
> > > > >> > another
> > > > >> > > > >> switch
> > > > >> > > > >> > >>> is
> > > > >> > > > >> > >>> > > > > provided.
> > > > >> > > > >> > >>> > > > >
> > > > >> > > > >> > >>> > > > >
> > > > >> > > > >> > >>> > > > >
> > > > >> > > > >> > >>> > > > >
> > > > >> > > > >> > >>> > > > > On Tue, Oct 20, 2015 at 10:05 AM, Aditya
> > > > Auradkar <
> > > > >> > > > >> > >>> > > > > aauradkar@linkedin.com.invalid> wrote:
> > > > >> > > > >> > >>> > > > >
> > > > >> > > > >> > >>> > > > > > Hey Allen,
> > > > >> > > > >> > >>> > > > > >
> > > > >> > > > >> > >>> > > > > > 1. If we choose fail fast topic creation,
> > we
> > > > will
> > > > >> > have
> > > > >> > > > >> topic
> > > > >> > > > >> > >>> > creation
> > > > >> > > > >> > >>> > > > > > failures while upgrading the cluster. I
> > > really
> > > > >> doubt
> > > > >> > > we
> > > > >> > > > >> want
> > > > >> > > > >> > >>> this
> > > > >> > > > >> > >>> > > > > behavior.
> > > > >> > > > >> > >>> > > > > > Ideally, this should be invisible to
> > clients
> > > > of a
> > > > >> > > > cluster.
> > > > >> > > > >> > >>> > Currently,
> > > > >> > > > >> > >>> > > > > each
> > > > >> > > > >> > >>> > > > > > broker is effectively its own rack. So we
> > > > >> probably
> > > > >> > can
> > > > >> > > > use
> > > > >> > > > >> > the
> > > > >> > > > >> > >>> rack
> > > > >> > > > >> > >>> > > > > > information whenever possible but not
> make
> > > it a
> > > > >> hard
> > > > >> > > > >> > >>> requirement.
> > > > >> > > > >> > >>> > To
> > > > >> > > > >> > >>> > > > > extend
> > > > >> > > > >> > >>> > > > > > Gwen's example, one badly configured
> broker
> > > > >> should
> > > > >> > not
> > > > >> > > > >> > degrade
> > > > >> > > > >> > >>> > topic
> > > > >> > > > >> > >>> > > > > > creation for the entire cluster.
> > > > >> > > > >> > >>> > > > > >
> > > > >> > > > >> > >>> > > > > > 2. Upgrade scenario - Can you add a
> section
> > > on
> > > > >> the
> > > > >> > > > upgrade
> > > > >> > > > >> > >>> piece to
> > > > >> > > > >> > >>> > > > > confirm
> > > > >> > > > >> > >>> > > > > > that old clients will not see errors? I
> > > believe
> > > > >> > > > >> > >>> > > > > ZookeeperConsumerConnector
> > > > >> > > > >> > >>> > > > > > reads the Broker objects from ZK. I
> wanted
> > to
> > > > >> > confirm
> > > > >> > > > that
> > > > >> > > > >> > this
> > > > >> > > > >> > >>> > will
> > > > >> > > > >> > >>> > > > not
> > > > >> > > > >> > >>> > > > > > cause any problems.
> > > > >> > > > >> > >>> > > > > >
> > > > >> > > > >> > >>> > > > > > 3. Could you elaborate your proposed
> > changes
> > > to
> > > > >> the
> > > > >> > > > >> > >>> > > > UpdateMetadataRequest
> > > > >> > > > >> > >>> > > > > > in the "Public Interfaces" section?
> > > > Personally, I
> > > > >> > find
> > > > >> > > > >> this
> > > > >> > > > >> > >>> format
> > > > >> > > > >> > >>> > > easy
> > > > >> > > > >> > >>> > > > > to
> > > > >> > > > >> > >>> > > > > > read in terms of wire protocol changes:
> > > > >> > > > >> > >>> > > > > >
> > > > >> > > > >> > >>> > > > > >
> > > > >> > > > >> > >>> > > > >
> > > > >> > > > >> > >>> > > >
> > > > >> > > > >> > >>> > >
> > > > >> > > > >> > >>> >
> > > > >> > > > >> > >>>
> > > > >> > > > >> >
> > > > >> > > > >>
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicRequest
> > > > >> > > > >> > >>> > > > > >
> > > > >> > > > >> > >>> > > > > > Aditya
> > > > >> > > > >> > >>> > > > > >
> > > > >> > > > >> > >>> > > > > > On Fri, Oct 16, 2015 at 3:45 PM, Allen
> > Wang <
> > > > >> > > > >> > >>> allenxwang@gmail.com>
> > > > >> > > > >> > >>> > > > > wrote:
> > > > >> > > > >> > >>> > > > > >
> > > > >> > > > >> > >>> > > > > > > KIP is updated to include rack as an
> > optional
> > > > >> > property
> > > > >> > > > for
> > > > >> > > > >> > >>> broker.
> > > > >> > > > >> > >>> > > > Please
> > > > >> > > > >> > >>> > > > > > take
> > > > >> > > > >> > >>> > > > > > > a look and let me know if more details
> > are
> > > > >> needed.
> > > > >> > > > >> > >>> > > > > > >
> > > > >> > > > >> > >>> > > > > > > For the case where some brokers have
> rack
> > > and
> > > > >> some
> > > > >> > > do
> > > > >> > > > >> not,
> > > > >> > > > >> > >>> the
> > > > >> > > > >> > >>> > > > current
> > > > >> > > > >> > >>> > > > > > KIP
> > > > >> > > > >> > >>> > > > > > > uses the fail-fast behavior. If there
> are
> > > > >> > concerns,
> > > > >> > > we
> > > > >> > > > >> can
> > > > >> > > > >> > >>> > further
> > > > >> > > > >> > >>> > > > > > discuss
> > > > >> > > > >> > >>> > > > > > > this in the email thread or next
> hangout.
> > > > >> > > > >> > >>> > > > > > >
> > > > >> > > > >> > >>> > > > > > >
> > > > >> > > > >> > >>> > > > > > >
> > > > >> > > > >> > >>> > > > > > > On Thu, Oct 15, 2015 at 10:42 AM, Allen
> > > Wang
> > > > <
> > > > >> > > > >> > >>> > allenxwang@gmail.com
> > > > >> > > > >> > >>> > > >
> > > > >> > > > >> > >>> > > > > > wrote:
> > > > >> > > > >> > >>> > > > > > >
> > > > >> > > > >> > >>> > > > > > > > That's a good question. I can think
> of
> > > > three
> > > > >> > > actions
> > > > >> > > > >> if
> > > > >> > > > >> > the
> > > > >> > > > >> > >>> > rack
> > > > >> > > > >> > >>> > > > > > > > information is incomplete:
> > > > >> > > > >> > >>> > > > > > > >
> > > > >> > > > >> > >>> > > > > > > > 1. Treat the node without rack as if
> it
> > > is
> > > > on
> > > > >> > its
> > > > >> > > > >> unique
> > > > >> > > > >> > >>> rack
> > > > >> > > > >> > >>> > > > > > > > 2. Disregard all rack information and
> > > > >> fallback
> > > > >> > to
> > > > >> > > > >> current
> > > > >> > > > >> > >>> > > algorithm
> > > > >> > > > >> > >>> > > > > > > > 3. Fail-fast
> > > > >> > > > >> > >>> > > > > > > >
> > > > >> > > > >> > >>> > > > > > > > Now I think about it, one and three
> > make
> > > > more
> > > > >> > > sense.
> > > > >> > > > >> The
> > > > >> > > > >> > >>> reason
> > > > >> > > > >> > >>> > > for
> > > > >> > > > >> > >>> > > > > > > > fail-fast is that user mistake for
> not
> > > > >> providing
> > > > >> > > the
> > > > >> > > > >> rack
> > > > >> > > > >> > >>> may
> > > > >> > > > >> > >>> > > never
> > > > >> > > > >> > >>> > > > > be
> > > > >> > > > >> > >>> > > > > > > > found if we tolerate that and the
> > > > assignment
> > > > >> may
> > > > >> > > not
> > > > >> > > > >> be
> > > > >> > > > >> > >>> rack
> > > > >> > > > >> > >>> > > aware
> > > > >> > > > >> > >>> > > > as
> > > > >> > > > >> > >>> > > > > > the
> > > > >> > > > >> > >>> > > > > > > > user has expected and this creates
> > debug
> > > > >> > problems
> > > > >> > > > when
> > > > >> > > > >> > >>> things
> > > > >> > > > >> > >>> > > fail.
> > > > >> > > > >> > >>> > > > > > > >
> > > > >> > > > >> > >>> > > > > > > > What do you think? If not fail-fast,
> is
> > > > there
> > > > >> > > any way
> > > > >> > > > >> we
> > > > >> > > > >> > can
> > > > >> > > > >> > >>> > make
> > > > >> > > > >> > >>> > > > the
> > > > >> > > > >> > >>> > > > > > user
> > > > >> > > > >> > >>> > > > > > > > error stand out?
> > > > >> > > > >> > >>> > > > > > > >
> > > > >> > > > >> > >>> > > > > > > >
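
The three options listed above for incomplete rack information can be
compared side by side with a short Python sketch. Nothing here comes from
the Kafka code base: resolve_racks, the policy names, and the synthetic
per-broker rack label are hypothetical names chosen for the example.

```python
def resolve_racks(broker_racks, policy):
    """Decide what rack mapping the assignment should use.

    broker_racks: dict of broker id -> rack name, or None if not configured.
    policy: "own-rack", "ignore" or "fail-fast" (hypothetical names for
    options 1, 2 and 3 in the message above).
    Returns a complete broker -> rack dict, or None meaning "do not use racks".
    """
    missing = [b for b, rack in broker_racks.items() if rack is None]
    if not missing:
        return dict(broker_racks)        # mapping is complete, use it as is
    if len(missing) == len(broker_racks):
        return None                      # no rack configured anywhere
    if policy == "own-rack":
        # Option 1: a broker without a rack is treated as its own rack.
        return {b: rack if rack is not None else f"__broker-{b}"
                for b, rack in broker_racks.items()}
    if policy == "ignore":
        # Option 2: disregard all rack information.
        return None
    if policy == "fail-fast":
        # Option 3: surface the configuration mistake immediately.
        raise ValueError(f"brokers without rack: {sorted(missing)}")
    raise ValueError(f"unknown policy: {policy}")


partial = {0: "rack-a", 1: "rack-b", 2: None}
own_rack = resolve_racks(partial, "own-rack")
ignored = resolve_racks(partial, "ignore")
```

With the fail-fast policy the same input raises an error that names
broker 2, which is the "make the user error stand out" behavior being
asked about.
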
> > > > >> > > > >> > >>> > > > > > > > On Thu, Oct 15, 2015 at 10:17 AM,
> Gwen
> > > > >> Shapira <
> > > > >> > > > >> > >>> > > gwen@confluent.io>
> > > > >> > > > >> > >>> > > > > > > wrote:
> > > > >> > > > >> > >>> > > > > > > >
> > > > >> > > > >> > >>> > > > > > > >> Thanks! Just to clarify, when some
> > > brokers
> > > > >> have
> > > > >> > > > rack
> > > > >> > > > >> > >>> > assignment
> > > > >> > > > >> > >>> > > > and
> > > > >> > > > >> > >>> > > > > > some
> > > > >> > > > >> > >>> > > > > > > >> don't, do we act like none of them
> > have
> > > > it?
> > > > >> or
> > > > >> > > like
> > > > >> > > > >> > those
> > > > >> > > > >> > >>> > > without
> > > > >> > > > >> > >>> > > > > > > >> assignment are in their own rack?
> > > > >> > > > >> > >>> > > > > > > >>
> > > > >> > > > >> > >>> > > > > > > >> The first scenario is good when
> first
> > > > >> setting
> > > > >> > up
> > > > >> > > > >> > >>> > rack-awareness,
> > > > >> > > > >> > >>> > > > but
> > > > >> > > > >> > >>> > > > > > the
> > > > >> > > > >> > >>> > > > > > > >> second makes more sense for on-going
> > > > >> > maintenance
> > > > >> > > (I
> > > > >> > > > >> can
> > > > >> > > > >> > >>> > totally
> > > > >> > > > >> > >>> > > > see
> > > > >> > > > >> > >>> > > > > > > >> someone
> > > > >> > > > >> > >>> > > > > > > >> adding a node and forgetting to set
> > the
> > > > rack
> > > > >> > > > >> property,
> > > > >> > > > >> > we
> > > > >> > > > >> > >>> > don't
> > > > >> > > > >> > >>> > > > want
> > > > >> > > > >> > >>> > > > > > > this
> > > > >> > > > >> > >>> > > > > > > >> to change behavior for anything
> except
> > > the
> > > > >> new
> > > > >> > > > node).
> > > > >> > > > >> > >>> > > > > > > >>
> > > > >> > > > >> > >>> > > > > > > >> What do you think?
> > > > >> > > > >> > >>> > > > > > > >>
> > > > >> > > > >> > >>> > > > > > > >> Gwen
> > > > >> > > > >> > >>> > > > > > > >>
> > > > >> > > > >> > >>> > > > > > > >> On Thu, Oct 15, 2015 at 10:13 AM,
> > Allen
> > > > >> Wang <
> > > > >> > > > >> > >>> > > > allenxwang@gmail.com>
> > > > >> > > > >> > >>> > > > > > > >> wrote:
> > > > >> > > > >> > >>> > > > > > > >>
> > > > >> > > > >> > >>> > > > > > > >> > For scenario 1:
> > > > >> > > > >> > >>> > > > > > > >> >
> > > > >> > > > >> > >>> > > > > > > >> > - Add the rack information to
> broker
> > > > >> property
> > > > >> > > > file
> > > > >> > > > >> or
> > > > >> > > > >> > >>> > > > dynamically
> > > > >> > > > >> > >>> > > > > > set
> > > > >> > > > >> > >>> > > > > > > >> it in
> > > > >> > > > >> > >>> > > > > > > >> > the wrapper code to bootstrap
> Kafka
> > > > >> server.
> > > > >> > You
> > > > >> > > > >> would
> > > > >> > > > >> > do
> > > > >> > > > >> > >>> > that
> > > > >> > > > >> > >>> > > > for
> > > > >> > > > >> > >>> > > > > > all
> > > > >> > > > >> > >>> > > > > > > >> > brokers and restart the brokers
> one
> > by
> > > > >> one.
> > > > >> > > > >> > >>> > > > > > > >> >
> > > > >> > > > >> > >>> > > > > > > >> > In this scenario, the complete
> > broker
> > > to
> > > > >> rack
> > > > >> > > > >> mapping
> > > > >> > > > >> > >>> may
> > > > >> > > > >> > >>> > not
> > > > >> > > > >> > >>> > > be
> > > > >> > > > >> > >>> > > > > > > >> available
> > > > >> > > > >> > >>> > > > > > > >> > until every broker is restarted.
> > > During
> > > > >> that
> > > > >> > > time
> > > > >> > > > >> we
> > > > >> > > > >> > >>> fall
> > > > >> > > > >> > >>> > back
> > > > >> > > > >> > >>> > > > to
> > > > >> > > > >> > >>> > > > > > > >> default
> > > > >> > > > >> > >>> > > > > > > >> > replica assignment algorithm.
> > > > >> > > > >> > >>> > > > > > > >> >
> > > > >> > > > >> > >>> > > > > > > >> > For scenario 2:
> > > > >> > > > >> > >>> > > > > > > >> >
> > > > >> > > > >> > >>> > > > > > > >> > - Add the rack information to
> broker
> > > > >> property
> > > > >> > > > file
> > > > >> > > > >> or
> > > > >> > > > >> > >>> > > > dynamically
> > > > >> > > > >> > >>> > > > > > set
> > > > >> > > > >> > >>> > > > > > > >> it in
> > > > >> > > > >> > >>> > > > > > > >> > the wrapper code and start the
> > broker.
> > > > >> > > > >> > >>> > > > > > > >> >
> > > > >> > > > >> > >>> > > > > > > >> >
> > > > >> > > > >> > >>> > > > > > > >> > On Wed, Oct 14, 2015 at 2:36 PM,
> > Gwen
> > > > >> > Shapira <
> > > > >> > > > >> > >>> > > > gwen@confluent.io>
> > > > >> > > > >> > >>> > > > > > > >> wrote:
> > > > >> > > > >> > >>> > > > > > > >> >
> > > > >> > > > >> > >>> > > > > > > >> > > Can you clarify the workflow for
> > the
> > > > >> > > following
> > > > >> > > > >> > >>> scenarios:
> > > > >> > > > >> > >>> > > > > > > >> > >
> > > > >> > > > >> > >>> > > > > > > >> > > 1. I currently have 6 brokers
> and
> > > want
> > > > >> to
> > > > >> > add
> > > > >> > > > >> rack
> > > > >> > > > >> > >>> > > information
> > > > >> > > > >> > >>> > > > > for
> > > > >> > > > >> > >>> > > > > > > >> each
> > > > >> > > > >> > >>> > > > > > > >> > > 2. I'm adding a new broker and I
> > > want
> > > > to
> > > > >> > > > specify
> > > > >> > > > >> > which
> > > > >> > > > >> > >>> > rack
> > > > >> > > > >> > >>> > > it
> > > > >> > > > >> > >>> > > > > > > >> belongs on
> > > > >> > > > >> > >>> > > > > > > >> > > while adding it.
> > > > >> > > > >> > >>> > > > > > > >> > >
> > > > >> > > > >> > >>> > > > > > > >> > > Thanks!
> > > > >> > > > >> > >>> > > > > > > >> > >
> > > > >> > > > >> > >>> > > > > > > >> > > On Tue, Oct 13, 2015 at 2:21 PM,
> > > Allen
> > > > >> > Wang <
> > > > >> > > > >> > >>> > > > > allenxwang@gmail.com
> > > > >> > > > >> > >>> > > > > > >
> > > > >> > > > >> > >>> > > > > > > >> > wrote:
> > > > >> > > > >> > >>> > > > > > > >> > >
> > > > >> > > > >> > >>> > > > > > > >> > > > We discussed the KIP in the
> > > hangout
> > > > >> > today.
> > > > >> > > > The
> > > > >> > > > >> > >>> > > > recommendation
> > > > >> > > > >> > >>> > > > > is
> > > > >> > > > >> > >>> > > > > > > to
> > > > >> > > > >> > >>> > > > > > > >> > make
> > > > >> > > > >> > >>> > > > > > > >> > > > rack as a broker property in
> > > > >> ZooKeeper.
> > > > >> > For
> > > > >> > > > >> users
> > > > >> > > > >> > >>> with
> > > > >> > > > >> > >>> > > > > existing
> > > > >> > > > >> > >>> > > > > > > rack
> > > > >> > > > >> > >>> > > > > > > >> > > > information stored somewhere,
> > they
> > > > >> would
> > > > >> > > need
> > > > >> > > > >> to
> > > > >> > > > >> > >>> > retrieve
> > > > >> > > > >> > >>> > > > the
> > > > >> > > > >> > >>> > > > > > > >> > information
> > > > >> > > > >> > >>> > > > > > > >> > > > at broker start up and
> > dynamically
> > > > set
> > > > >> > the
> > > > >> > > > rack
> > > > >> > > > >> > >>> > property,
> > > > >> > > > >> > >>> > > > > which
> > > > >> > > > >> > >>> > > > > > > can
> > > > >> > > > >> > >>> > > > > > > >> be
> > > > >> > > > >> > >>> > > > > > > >> > > > implemented as a wrapper to
> > > > bootstrap
> > > > >> > > broker.
> > > > >> > > > >> > There
> > > > >> > > > >> > >>> will
> > > > >> > > > >> > >>> > > be
> > > > >> > > > >> > >>> > > > no
> > > > >> > > > >> > >>> > > > > > > >> > interface
> > > > >> > > > >> > >>> > > > > > > >> > > or
> > > > >> > > > >> > >>> > > > > > > >> > > > pluggable implementation to
> > > retrieve
> > > > >> the
> > > > >> > > rack
> > > > >> > > > >> > >>> > information.
> > > > >> > > > >> > >>> > > > > > > >> > > >
> > > > >> > > > >> > >>> > > > > > > >> > > > The assumption is that you
> > always
> > > > >> need to
> > > > >> > > > >> restart
> > > > >> > > > >> > >>> the
> > > > >> > > > >> > >>> > > broker
> > > > >> > > > >> > >>> > > > > to
> > > > >> > > > >> > >>> > > > > > > >> make a
> > > > >> > > > >> > >>> > > > > > > >> > > > change to the rack.
> > > > >> > > > >> > >>> > > > > > > >> > > >
> > > > >> > > > >> > >>> > > > > > > >> > > > Once the rack becomes a broker
> > > > >> property,
> > > > >> > it
> > > > >> > > > >> will
> > > > >> > > > >> > be
> > > > >> > > > >> > >>> > > possible
> > > > >> > > > >> > >>> > > > > to
> > > > >> > > > >> > >>> > > > > > > make
> > > > >> > > > >> > >>> > > > > > > >> > rack
> > > > >> > > > >> > >>> > > > > > > >> > > > part of the meta data to help
> > the
> > > > >> > consumer
> > > > >> > > > >> choose
> > > > >> > > > >> > >>> which
> > > > >> > > > >> > >>> > in
> > > > >> > > > >> > >>> > > > > sync
> > > > >> > > > >> > >>> > > > > > > >> replica
> > > > >> > > > >> > >>> > > > > > > >> > > to
> > > > >> > > > >> > >>> > > > > > > >> > > > consume from as part of the
> > future
> > > > >> > consumer
> > > > >> > > > >> > >>> enhancement.
> > > > >> > > > >> > >>> > > > > > > >> > > >
> > > > >> > > > >> > >>> > > > > > > >> > > > I will update the KIP.
> > > > >> > > > >> > >>> > > > > > > >> > > >
> > > > >> > > > >> > >>> > > > > > > >> > > > Thanks,
> > > > >> > > > >> > >>> > > > > > > >> > > > Allen
> > > > >> > > > >> > >>> > > > > > > >> > > >
> > > > >> > > > >> > >>> > > > > > > >> > > >
> > > > >> > > > >> > >>> > > > > > > >> > > > On Thu, Oct 8, 2015 at 9:23
> AM,
> > > > Allen
> > > > >> > Wang
> > > > >> > > <
> > > > >> > > > >> > >>> > > > > > allenxwang@gmail.com>
> > > > >> > > > >> > >>> > > > > > > >> > wrote:
> > > > >> > > > >> > >>> > > > > > > >> > > >
> > > > >> > > > >> > >>> > > > > > > >> > > > > I attended Tuesday's KIP
> > hangout
> > > > but
> > > > >> > this
> > > > >> > > > KIP
> > > > >> > > > >> > was
> > > > >> > > > >> > >>> not
> > > > >> > > > >> > >>> > > > > > discussed
> > > > >> > > > >> > >>> > > > > > > >> due
> > > > >> > > > >> > >>> > > > > > > >> > to
> > > > >> > > > >> > >>> > > > > > > >> > > > > time constraint.
> > > > >> > > > >> > >>> > > > > > > >> > > > >
> > > > >> > > > >> > >>> > > > > > > >> > > > > However, after hearing
> > > discussion
> > > > of
> > > > >> > > > KIP-35,
> > > > >> > > > >> I
> > > > >> > > > >> > >>> have
> > > > >> > > > >> > >>> > the
> > > > >> > > > >> > >>> > > > > > feeling
> > > > >> > > > >> > >>> > > > > > > >> that
> > > > >> > > > >> > >>> > > > > > > >> > > > > incompatibility (caused by
> new
> > > > >> broker
> > > > >> > > > >> property)
> > > > >> > > > >> > >>> > between
> > > > >> > > > >> > >>> > > > > > brokers
> > > > >> > > > >> > >>> > > > > > > >> with
> > > > >> > > > >> > >>> > > > > > > >> > > > > different versions will be
> > > solved
> > > > >> > there.
> > > > >> > > > In
> > > > >> > > > >> > >>> addition,
> > > > >> > > > >> > >>> > > > > having
> > > > >> > > > >> rack
> > > > >> > > > >> > >>> > > > > > > >> > in
> > > > >> > > > >> > >>> > > > > > > >> > > > > broker property as meta data
> > may
> > > > >> also
> > > > >> > > help
> > > > >> > > > >> > >>> consumers
> > > > >> > > > >> > >>> > in
> > > > >> > > > >> > >>> > > > the
> > > > >> > > > >> > >>> > > > > > > >> future.
> > > > >> > > > >> > >>> > > > > > > >> > So
> > > > >> > > > >> > >>> > > > > > > >> > > I
> > > > >> > > > >> > >>> > > > > > > >> > > > am
> > > > >> > > > >> > >>> > > > > > > >> > > > > open to adding rack
> property
> > to
> > > > >> > broker.
> > > > >> > > > >> > >>> > > > > > > >> > > > >
> > > > >> > > > >> > >>> > > > > > > >> > > > > Hopefully we can discuss
> this
> > in
> > > > the
> > > > >> > next
> > > > >> > > > KIP
> > > > >> > > > >> > >>> hangout.
> > > > >> > > > >> > >>> > > > > > > >> > > > >
> > > > >> > > > >> > >>> > > > > > > >> > > > > On Wed, Sep 30, 2015 at 2:46
> > PM,
> > > > >> Allen
> > > > >> > > > Wang <
> > > > >> > > > >> > >>> > > > > > > allenxwang@gmail.com
> > > > >> > > > >> > >>> > > > > > > >> >
> > > > >> > > > >> > >>> > > > > > > >> > > > wrote:
> > > > >> > > > >> > >>> > > > > > > >> > > > >
> > > > >> > > > >> > >>> > > > > > > >> > > > >> Can you send me the information on the next KIP hangout?
> > > > >> > > > >> > >>> > > > > > > >> > > > >>
> > > > >> > > > >> > >>> > > > > > > >> > > > >> Currently the broker-rack mapping is not cached. In KafkaApis,
> > > > >> > > > >> > >>> > > > > > > >> > > > >> RackLocator.getRackInfo() is called each time the mapping is needed
> > > > >> > > > >> > >>> > > > > > > >> > > > >> for auto topic creation. This will ensure latest mapping is used at
> > > > >> > > > >> > >>> > > > > > > >> > > > >> any time.
> > > > >> > > > >> > >>> > > > > > > >> > > > >>
> > > > >> > > > >> > >>> > > > > > > >> > > > >> The ability to get the complete mapping makes it simple to reuse the
> > > > >> > > > >> > >>> > > > > > > >> > > > >> same interface in command line tools.
> > > > >> > > > >> > >>> > > > > > > >> > > > >>
> > > > >> > > > >> > >>> > > > > > > >> > > > >> On Wed, Sep 30, 2015 at 11:01 AM, Aditya Auradkar <aauradkar@linkedin.com.invalid> wrote:
> > > > >> > > > >> > >>> > > > > > > >> > > > >>
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> Perhaps we discuss this during the next KIP hangout?
> > > > >> > > > >> > >>> > > > > > > >> > > > >>>
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> I do see that a pluggable rack locator can be useful but I do see a
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> few concerns:
> > > > >> > > > >> > >>> > > > > > > >> > > > >>>
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> - The RackLocator (as described in the document), implies that it can
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> discover rack information for any node in the cluster. How does it deal
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> with rack location changes? For example, if I moved broker id (1) from
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> rack X to Y, I only have to start that broker with a newer rack config.
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> If RackLocator discovers broker -> rack information at start up time,
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> any change to a broker will require bouncing the entire cluster since
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> createTopic requests can be sent to any node in the cluster.
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> For this reason it may be simpler to have each node be aware of its own
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> rack and persist it in ZK during start up time.
> > > > >> > > > >> > >>> > > > > > > >> > > > >>>
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> - A pluggable RackLocator relies on an external service being available
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> to serve rack information.
> > > > >> > > > >> > >>> > > > > > > >> > > > >>>
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> Out of curiosity, I looked up how a couple of other systems deal with
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> zone/rack awareness.
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> For Cassandra some interesting modes are:
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> (Property File configuration)
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> (Dynamic inference)
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
> > > > >> > > > >> > >>> > > > > > > >> > > > >>>
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> Voldemort does a static node -> zone assignment based on configuration.
> > > > >> > > > >> > >>> > > > > > > >> > > > >>>
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> Aditya
> > > > >> > > > >> > >>> > > > > > > >> > > > >>>
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> On Wed, Sep 30, 2015 at 10:05 AM, Allen Wang <allenxwang@gmail.com> wrote:
> > > > >> > > > >> > >>> > > > > > > >> > > > >>>
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > I would like to see if we can do both:
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > - Make RackLocator pluggable to facilitate migration with existing
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > broker-rack mapping
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > - Make rack an optional property for broker. If rack is available from
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > broker, treat it as source of truth. For users with existing
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > broker-rack mapping somewhere else, they can use the pluggable way or
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > they can transfer the mapping to the broker rack property.
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > One thing I am not sure is what happens at rolling upgrade when we have
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > rack as a broker property. For brokers with older version of Kafka,
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > will it cause problem for them? If so, is there any workaround? I also
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > think it would be better not to have rack in the controller wire
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > protocol but not sure if it is achievable.
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > Thanks,
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > Allen
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > On Mon, Sep 28, 2015 at 4:55 PM, Todd Palino <tpalino@gmail.com> wrote:
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > I tend to like the idea of a pluggable locator. For example, we
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > already have an interface for discovering information about the
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > physical location of servers. I don't relish the idea of having to
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > maintain data in multiple places.
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > >
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > -Todd
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > >
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > On Mon, Sep 28, 2015 at 4:48 PM, Aditya Auradkar <aauradkar@linkedin.com.invalid> wrote:
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > >
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > Thanks for starting this KIP Allen.
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > I agree with Gwen that having a RackLocator class that is pluggable
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > seems to be too complex. The KIP refers to potentially non-ZK storage
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > for the rack info which I don't think is necessary.
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > Perhaps we can persist this info in zk under /brokers/ids/<broker_id>
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > similar to other broker properties and add a config in KafkaConfig
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > called "rack".
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy, "rack": "abc"}
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > Aditya
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira <gwen@confluent.io> wrote:
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > Hi,
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > First, thanks for putting out a KIP for this. This is super important
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > for production deployments of Kafka.
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > Few questions:
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > 1) Are we sure we want "as many racks as possible"? I'd want to
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > balance between safety (more racks) and network utilization (traffic
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > within a rack uses the high-bandwidth TOR switch). One replica on a
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > different rack and the rest on same rack (if possible) sounds better
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > to me.
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > 2) Rack-locator class seems overly complex compared to adding a
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > rack.number property to the broker properties file. Why do we want
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > that?
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > Gwen
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang <allenxwang@gmail.com> wrote:
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > Hello Kafka Developers,
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > I just created KIP-36 for rack aware replica assignment.
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > The goal is to utilize the isolation provided by the racks in data
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > center and distribute replicas to racks to provide fault tolerance.
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > Comments are welcome.
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > Thanks,
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > Allen
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> > >
> > > > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > >> > > > >> > >>> > > > > > > >> > > > >>>
> > > > >> > > > >> > >>> > > > > > > >> > > > >>
> > > > >> > > > >> > >>> > > > > > > >> > > > >>
> > > > >> > > > >> > >>> > > > > > > >> > > > >
> > > > >> > > > >> > >>> > > > > > > >> > > >
> > > > >> > > > >> > >>> > > > > > > >> > >
> > > > >> > > > >> > >>> > > > > > > >> >
> > > > >> > > > >> > >>> > > > > > > >>
> > > > >> > > > >> > >>> > > > > > > >
> > > > >> > > > >> > >>> > > > > > > >
> > > > >> > > > >> > >>> > > > > > >
> > > > >> > > > >> > >>> > > > > >
> > > > >> > > > >> > >>> > > > >
> > > > >> > > > >> > >>> > > >
> > > > >> > > > >> > >>> > >
> > > > >> > > > >> > >>> >
> > > > >> > > > >> > >>>
> > > > >> > > > >> > >>>
> > > > >> > > > >> > >>>
> > > > >> > > > >> > >>> --
> > > > >> > > > >> > >>> Thanks,
> > > > >> > > > >> > >>> Neha
> > > > >> > > > >> > >>>
> > > > >> > > > >> > >>
> > > > >> > > > >> > >>
> > > > >> > > > >> > >
> > > > >> > > > >> >
> > > > >> > > > >>
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Jun Rao <ju...@confluent.io>.
Hi, Allen,

That's a good point. In 0.9.0.0, the old consumer reads broker info
directly from ZK and the code throws an exception if the version in json is
not 1 or 2. This old consumer will break when we upgrade the broker json to
version 3 in ZK in 0.9.1, which will be an issue. We overlooked this issue
in 0.9.0.0. The easiest fix is probably not to check the version in
ZkUtils.getBrokerInfo(). This way, as long as we are only adding new fields
to the broker json, we can preserve compatibility.
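
To make the idea concrete, here is a minimal sketch in Python (not Kafka's
actual Scala code) of a reader that does not reject unknown versions. The
function name parse_broker_info and the "version" key are assumptions for
illustration only; the "host", "port" and "rack" fields follow the broker
json discussed earlier in this thread.

```python
import json


def parse_broker_info(broker_id, raw_json):
    """Parse broker registration json without rejecting newer versions.

    Hypothetical helper for illustration; not part of Kafka.
    """
    info = json.loads(raw_json)
    return {
        "id": broker_id,
        # Assumed key name; kept for reference only, never used to reject input.
        "version": info.get("version"),
        "host": info["host"],
        "port": info["port"],
        # Optional field: absent in older json, so default to None.
        "rack": info.get("rack"),
    }


# A reader built this way keeps working when new fields such as "rack" appear.
v3 = '{"version": 3, "jmx_port": -1, "host": "xxx", "port": 9092, "rack": "abc"}'
print(parse_broker_info(1, v3))
```

The trade-off is that such a reader can only stay compatible while changes
to the json are purely additive; removing or renaming a field would still
break it.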

Thanks,

Jun

On Tue, Jan 12, 2016 at 1:52 PM, Allen Wang <al...@gmail.com> wrote:

> Hi Jun,
>
> That's a good suggestion. However, it does not solve the problem for the
> clients or third-party tools that get broker information directly from
> ZooKeeper.
>
> Thanks,
> Allen
>
>
> On Tue, Jan 12, 2016 at 1:29 PM, Jun Rao <ju...@confluent.io> wrote:
>
> > Allen,
> >
> > Another way to do this is the following.
> >
> > When inter.broker.protocol.version is set to 0.9.0, the broker will write
> > the broker info in ZK using version 2, ignoring the rack info.
> >
> > When inter.broker.protocol.version is set to 0.9.1, the broker will write
> > the broker info in ZK using version 3, including the rack info.
> >
> > If one follows the upgrade process, after the 2nd round of rolling
> > bounces, every broker is capable of parsing version 3 of broker info in
> > ZK. This is when the rack-aware feature will be used.
> >
> >
> > Thanks,
> >
> > Jun
> >
> > On Tue, Jan 12, 2016 at 12:19 PM, Allen Wang <al...@gmail.com>
> wrote:
> >
> > > Regarding the JSON version of Broker:
> > >
> > > I don't know why ZkUtils.getBrokerInfo() restricts the JSON versions it
> > > can read. It will throw an exception if the version is not 1 or 2. It
> > > seems to me that this will cause compatibility problems whenever the
> > > version needs to be changed and make the upgrade path difficult.
> > >
> > > One option we have is to make rack also part of version 2 and keep the
> > > version 2 unchanged for this update. This will make the old clients
> > > compatible. During rolling upgrade, it will also avoid problems if the
> > > controller/broker is still the old version.
> > >
> > > However, ZkUtils.getBrokerInfo() will be updated to return the Broker
> > > with rack so the rack information will be available once the
> > > server/client is upgraded to the latest version.
> > >
> > >
> > >
> > > On Wed, Jan 6, 2016 at 6:28 PM, Allen Wang <al...@gmail.com>
> wrote:
> > >
> > > > Updated KIP according to Jun's comment and included changes to TMR.
> > > >
> > > > On Tue, Jan 5, 2016 at 5:59 PM, Jun Rao <ju...@confluent.io> wrote:
> > > >
> > > >> Hi, Allen,
> > > >>
> > > >> A couple of minor comments on the KIP.
> > > >>
> > > >> 1. The version of the broker JSON string says 2. It should be 3.
> > > >>
> > > > >> 2. The new version of UpdateMetadataRequest should be 2, instead of 1.
> > > > >> Could you include the full wire protocol of version 2 of
> > > > >> UpdateMetadataRequest and highlight the changed part?
> > > >>
> > > >> Thanks,
> > > >>
> > > >> Jun
> > > >>
> > > >> On Tue, Jan 5, 2016 at 3:11 PM, Allen Wang <al...@gmail.com>
> > > wrote:
> > > >>
> > > > >> > Jun and I had a chance to discuss it in a meeting and it is agreed to
> > > > >> > change the TMR in a different patch.
> > > > >> >
> > > > >> > I can change the KIP to include rack in TMR. The essential change is to
> > > > >> > add rack into class BrokerEndPoint and make TMR version aware.
> > > >> >
> > > >> >
> > > >> >
> > > >> > On Tue, Jan 5, 2016 at 10:21 AM, Aditya Auradkar <
> > > >> > aauradkar@linkedin.com.invalid> wrote:
> > > >> >
> > > >> > > Jun/Allen -
> > > >> > >
> > > > >> > > Did we ever actually agree on whether we should evolve the TMR to
> > > > >> > > include rack info or not?
> > > > >> > > I don't feel strongly about it, but if it's the right thing to do we
> > > > >> > > should probably do it in this KIP (can be a separate patch). It isn't
> > > > >> > > a large change.
> > > >> > >
> > > >> > > Aditya
> > > >> > >
> > > >> > > On Sat, Dec 26, 2015 at 3:01 PM, Allen Wang <
> allenxwang@gmail.com
> > >
> > > >> > wrote:
> > > >> > >
> > > > >> > > > Added the rolling upgrade instructions to the KIP, similar to those in
> > > > >> > > > the 0.9.0 release notes.
> > > >> > > >
> > > >> > > > On Wed, Dec 16, 2015 at 11:32 AM, Allen Wang <
> > > allenxwang@gmail.com>
> > > >> > > wrote:
> > > >> > > >
> > > >> > > > > Hi Jun,
> > > >> > > > >
> > > > >> > > > > The reason that TopicMetadataResponse is not included in the KIP is
> > > > >> > > > > that it currently is not version aware. So we need to introduce a
> > > > >> > > > > version to it in order to ensure backward compatibility. That seems
> > > > >> > > > > to me a big change. Do we want to couple it with this KIP? Do we
> > > > >> > > > > need to further discuss what information to include in the new
> > > > >> > > > > version besides rack? For example, should we include broker
> > > > >> > > > > security protocol in TopicMetadataResponse?
> > > > >> > > > >
> > > > >> > > > > The other option is to make it a separate KIP to make
> > > > >> > > > > TopicMetadataResponse version aware and decide what to include, and
> > > > >> > > > > make this KIP focus on the rack aware algorithm, admin tools and
> > > > >> > > > > related changes to the inter-broker protocol.
> > > >> > > > >
> > > >> > > > > Thanks,
> > > >> > > > > Allen
> > > >> > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > On Mon, Dec 14, 2015 at 8:30 AM, Jun Rao <ju...@confluent.io>
> > > >> wrote:
> > > >> > > > >
> > > >> > > > >> Allen,
> > > >> > > > >>
> > > >> > > > >> Thanks for the proposal. A few comments.
> > > >> > > > >>
> > > > >> > > > >> 1. Since this KIP changes the inter broker communication protocol
> > > > >> > > > >> (UpdateMetadataRequest), we will need to document the upgrade path
> > > > >> > > > >> (similar to what's described in
> > > > >> > > > >> http://kafka.apache.org/090/documentation.html#upgrade).
> > > > >> > > > >>
> > > > >> > > > >> 2. It might be useful to include the rack info of the broker in
> > > > >> > > > >> TopicMetadataResponse. This can be useful for administrative tasks,
> > > > >> > > > >> as well as read affinity in the future.
> > > >> > > > >>
> > > >> > > > >> Jun
> > > >> > > > >>
> > > >> > > > >>
> > > >> > > > >>
> > > >> > > > >> On Thu, Dec 10, 2015 at 9:38 AM, Allen Wang <
> > > >> allenxwang@gmail.com>
> > > >> > > > wrote:
> > > >> > > > >>
> > > > >> > > > >> > If there are no more comments I would like to call for a vote.
> > > >> > > > >> >
> > > >> > > > >> >
> > > >> > > > >> > On Sun, Nov 15, 2015 at 10:08 PM, Allen Wang <
> > > >> > allenxwang@gmail.com>
> > > >> > > > >> wrote:
> > > >> > > > >> >
> > > >> > > > >> > > KIP is updated with more details and how to handle the
> > > >> situation
> > > >> > > > where
> > > >> > > > >> > > rack information is incomplete.
> > > >> > > > >> > >
> > > >> > > > >> > > In the situation where rack information is incomplete,
> > but
> > > we
> > > >> > want
> > > >> > > > to
> > > >> > > > >> > > continue with the assignment, I have suggested to
> ignore
> > > all
> > > >> > rack
> > > >> > > > >> > > information and fallback to original algorithm. The
> > reason
> > > is
> > > >> > > > >> explained
> > > >> > > > >> > > below:
> > > >> > > > >> > >
> > > >> > > > >> > > The other options are to assume that the broker without
> > the
> > > >> rack
> > > >> > > > >> belong
> > > >> > > > >> > to
> > > >> > > > >> > > its own unique rack, or they belong to one "default"
> > rack.
> > > >> > Either
> > > >> > > > way
> > > >> > > > >> we
> > > >> > > > >> > > choose, it is highly likely to result in uneven number
> of
> > > >> > brokers
> > > >> > > in
> > > >> > > > >> > racks,
> > > >> > > > >> > > and it is quite possible that the "made up" racks will
> > have
> > > >> much
> > > >> > > > fewer
> > > >> > > > >> > > number of brokers. As I explained in the KIP, uneven
> > number
> > > >> of
> > > >> > > > >> brokers in
> > > >> > > > >> > > racks will lead to uneven distribution of replicas
> among
> > > >> brokers
> > > >> > > > (even
> > > >> > > > >> > > though the leader distribution is still even). The
> > brokers
> > > in
> > > >> > the
> > > >> > > > rack
> > > >> > > > >> > that
> > > >> > > > >> > > has fewer number of brokers will get more replicas per
> > > broker
> > > >> > than
> > > >> > > > >> > brokers
> > > >> > > > >> > > in other racks.
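
A rough illustration of the imbalance described above, under a deliberately simplified model that is an assumption here and not Kafka's actual assignment code: the replication factor equals the number of racks, every partition places exactly one replica in each rack, and brokers within a rack are picked round-robin. The function name replicas_per_broker and the rack names are hypothetical.

```python
from collections import Counter
from itertools import cycle


def replicas_per_broker(racks, num_partitions):
    """Count replicas per broker when each partition puts one replica in every rack.

    racks: dict mapping rack name -> list of broker ids (simplified model).
    """
    counts = Counter()
    # One round-robin cursor per rack, so replicas rotate over that rack's brokers.
    cursors = {rack: cycle(brokers) for rack, brokers in racks.items()}
    for _ in range(num_partitions):
        for rack in racks:
            counts[next(cursors[rack])] += 1
    return dict(counts)


# Four brokers with a configured rack, one broker lumped into a made-up "default" rack.
racks = {"rack-a": [0, 1, 2, 3], "default": [4]}
counts = replicas_per_broker(racks, num_partitions=8)
print(counts)
```

In this model the single broker in the made-up rack carries one replica of every partition, while the four brokers in the real rack share the same number of replicas among them.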
> > > >> > > > >> > >
> > > >> > > > >> > > Given this fact and the replica assignment produced
> will
> > be
> > > >> > > > incorrect
> > > >> > > > >> > > anyway from rack aware point of view, ignoring all rack
> > > >> > > information
> > > >> > > > >> and
> > > >> > > > >> > > falling back to the original algorithm is not a bad choice
> > > since
> > > >> it
> > > >> > > will
> > > >> > > > >> at
> > > >> > > > >> > > least have a better guarantee of replica distribution.
> > > >> > > > >> > >
> > > >> > > > >> > > Also for command line tools it gives user a choice if
> for
> > > any
> > > >> > > reason
> > > >> > > > >> they
> > > >> > > > >> > > want to ignore rack information and fall back to the
> > > original
> > > >> > > > >> algorithm.
> > > >> > > > >> > >
> > > >> > > > >> > >
> > > >> > > > >> > > On Tue, Nov 10, 2015 at 9:04 AM, Allen Wang <
> > > >> > allenxwang@gmail.com
> > > >> > > >
> > > >> > > > >> > wrote:
> > > >> > > > >> > >
> > > >> > > > >> > >> I am busy with some time pressing issues for the last
> > few
> > > >> > days. I
> > > >> > > > >> will
> > > >> > > > >> > >> think about how the incomplete rack information will
> > > affect
> > > >> the
> > > >> > > > >> balance
> > > >> > > > >> > and
> > > >> > > > >> > >> update the KIP by early next week.
> > > >> > > > >> > >>
> > > >> > > > >> > >> Thanks,
> > > >> > > > >> > >> Allen
> > > >> > > > >> > >>
> > > >> > > > >> > >>
> > > >> > > > >> > >> On Tue, Nov 3, 2015 at 9:03 AM, Neha Narkhede <
> > > >> > neha@confluent.io
> > > >> > > >
> > > >> > > > >> > wrote:
> > > >> > > > >> > >>
> > > >> > > > >> > >>> Few suggestions on improving the KIP
> > > >> > > > >> > >>>
> > > >> > > > >> > >>> *If some brokers have rack, and some do not, the
> > > algorithm
> > > >> > will
> > > >> > > > >> throw
> > > >> > > > >> > an
> > > >> > > > >> > >>> > exception. This is to prevent incorrect assignment
> > > >> caused by
> > > >> > > > user
> > > >> > > > >> > >>> error.*
> > > >> > > > >> > >>>
> > > >> > > > >> > >>>
> > > >> > > > >> > >>> In the KIP, can you clearly state the user-facing
> > > behavior
> > > >> > when
> > > >> > > > some
> > > >> > > > >> > >>> brokers have rack information and some don't. Which
> > > actions
> > > >> > and
> > > >> > > > >> > requests
> > > >> > > > >> > >>> will error out and how?
> > > >> > > > >> > >>>
> > > >> > > > >> > >>> *Even distribution of partition leadership among
> > brokers*
> > > >> > > > >> > >>>
> > > >> > > > >> > >>>
> > > >> > > > >> > >>> There is some information about arranging the sorted
> > > broker
> > > >> > list
> > > >> > > > >> > >>> interlaced
> > > >> > > > >> > >>> with rack ids. Can you describe the changes to the
> > > current
> > > >> > > > algorithm
> > > >> > > > >> > in a
> > > >> > > > >> > >>> little more detail? How does this interlacing work if
> > > only
> > > >> a
> > > >> > > > subset
> > > >> > > > >> of
> > > >> > > > >> > >>> brokers have the rack id configured? Does this still
> > work
> > > >> if
> > > >> > > > uneven
> > > >> > > > >> #
> > > >> > > > >> > of
> > > >> > > > >> > >>> brokers are assigned to each rack? It might work, I'm
> > > >> looking
> > > >> > > for
> > > >> > > > >> more
> > > >> > > > >> > >>> details on the changes, since it will affect the
> > behavior
> > > >> seen
> > > >> > > by
> > > >> > > > >> the
> > > >> > > > >> > >>> user
> > > >> > > > >> > >>> - imbalance on either the leaders or data or both.
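
One possible reading of "sorted broker list interlaced with rack ids" is sketched here. It is an assumption about the intended ordering, not the KIP's implementation, and the name interlace_by_rack is hypothetical. Brokers are grouped by rack, then taken one per rack in turn, so neighbouring entries in the list come from different racks whenever possible.

```python
from itertools import zip_longest


def interlace_by_rack(broker_racks):
    """Return broker ids ordered so that consecutive brokers alternate racks.

    broker_racks: dict mapping broker id -> rack id (every broker has a rack).
    """
    by_rack = {}
    for broker in sorted(broker_racks):
        by_rack.setdefault(broker_racks[broker], []).append(broker)
    rows = [by_rack[rack] for rack in sorted(by_rack)]
    # Take the first broker of each rack, then the second of each, and so on.
    # Racks that run out of brokers are skipped (zip_longest pads with None).
    return [b for column in zip_longest(*rows) for b in column if b is not None]


even = interlace_by_rack({0: "r1", 1: "r1", 2: "r2", 3: "r2", 4: "r3", 5: "r3"})
uneven = interlace_by_rack({0: "r1", 1: "r1", 2: "r1", 3: "r2"})
print(even, uneven)
```

With an uneven number of brokers per rack, the tail of the list ends up holding brokers from the larger rack only, which is where the imbalance asked about above would show up.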
> > > >> > > > >> > >>>
> > > >> > > > >> > >>> On Mon, Nov 2, 2015 at 6:39 PM, Aditya Auradkar <
> > > >> > > > >> > aauradkar@linkedin.com>
> > > >> > > > >> > >>> wrote:
> > > >> > > > >> > >>>
> > > >> > > > >> > >>> > I think this sounds reasonable. Anyone else have
> > > >> comments?
> > > >> > > > >> > >>> >
> > > >> > > > >> > >>> > Aditya
> > > >> > > > >> > >>> >
> > > >> > > > >> > >>> > On Tue, Oct 27, 2015 at 5:23 PM, Allen Wang <
> > > >> > > > allenxwang@gmail.com
> > > >> > > > >> >
> > > >> > > > >> > >>> wrote:
> > > >> > > > >> > >>> >
> > > >> > > > >> > >>> > > During the discussion in the hangout, it was
> > > mentioned
> > > >> > that
> > > >> > > it
> > > >> > > > >> > would
> > > >> > > > >> > >>> be
> > > >> > > > >> > >>> > > desirable that consumers know the rack
> information
> > of
> > > >> the
> > > >> > > > >> brokers
> > > >> > > > >> > so
> > > >> > > > >> > >>> that
> > > >> > > > >> > >>> > > they can consume from the broker in the same rack
> > to
> > > >> > reduce
> > > >> > > > >> > latency.
> > > >> > > > >> > >>> As I
> > > >> > > > >> > >>> > > understand this will only be beneficial if
> consumer
> > > can
> > > >> > > > consume
> > > >> > > > >> > from
> > > >> > > > >> > >>> any
> > > >> > > > >> > >>> > > broker in ISR, which is not possible now.
> > > >> > > > >> > >>> > >
> > > >> > > > >> > >>> > > I suggest we skip the change to TMR. Once the
> > change
> > > is
> > > >> > made
> > > >> > > > to
> > > >> > > > >> > >>> consumer
> > > >> > > > >> > >>> > to
> > > >> > > > >> > >>> > > be able to consume from any broker in ISR, the
> rack
> > > >> > > > information
> > > >> > > > >> can
> > > >> > > > >> > >>> be
> > > >> > > > >> > >>> > > added to TMR.
> > > >> > > > >> > >>> > >
> > > >> > > > >> > >>> > > Another thing I want to confirm is command line
> > > >> > behavior. I
> > > >> > > > >> think
> > > >> > > > >> > >>> the
> > > >> > > > >> > >>> > > desirable default behavior is to fail fast on
> > command
> > > >> line
> > > >> > > for
> > > >> > > > >> > >>> incomplete
> > > >> > > > >> > >>> > > rack mapping. The error message can include
> further
> > > >> > > > instruction
> > > >> > > > >> > that
> > > >> > > > >> > >>> > tells
> > > >> > > > >> > >>> > > the user to add an extra argument (like
> > > >> > > > >> "--allow-partial-rackinfo")
> > > >> > > > >> > >>> to
> > > >> > > > >> > >>> > > suppress the error and do an imperfect rack aware
> > > >> > > assignment.
> > > >> > > > If
> > > >> > > > >> > the
> > > >> > > > >> > >>> > > default behavior is to allow incomplete mapping,
> > the
> > > >> error
> > > >> > > can
> > > >> > > > >> > still
> > > >> > > > >> > >>> be
> > > >> > > > >> > >>> > > easily missed.
> > > >> > > > >> > >>> > >
> > > >> > > > >> > >>> > > The affected command line tools are TopicCommand
> > and
> > > >> > > > >> > >>> > > ReassignPartitionsCommand.
> > > >> > > > >> > >>> > >
> > > >> > > > >> > >>> > > Thanks,
> > > >> > > > >> > >>> > > Allen
> > > >> > > > >> > >>> > >
> > > >> > > > >> > >>> > >
> > > >> > > > >> > >>> > >
> > > >> > > > >> > >>> > >
> > > >> > > > >> > >>> > >
> > > >> > > > >> > >>> > > On Mon, Oct 26, 2015 at 12:55 PM, Aditya
> Auradkar <
> > > >> > > > >> > >>> > aauradkar@linkedin.com>
> > > >> > > > >> > >>> > > wrote:
> > > >> > > > >> > >>> > >
> > > >> > > > >> > >>> > > > Hi Allen,
> > > >> > > > >> > >>> > > >
> > > >> > > > >> > >>> > > > For TopicMetadataResponse to understand
> version,
> > > you
> > > >> can
> > > >> > > > bump
> > > >> > > > >> up
> > > >> > > > >> > >>> the
> > > >> > > > >> > >>> > > > request version itself. Based on the version of
> > the
> > > >> > > request,
> > > >> > > > >> the
> > > >> > > > >> > >>> > response
> > > >> > > > >> > >>> > > > can be appropriately serialized. It shouldn't
> be
> > a
> > > >> huge
> > > >> > > > >> change.
> > > >> > > > >> > For
> > > >> > > > >> > >>> > > > example: We went through something similar for
> > > >> > > > ProduceRequest
> > > >> > > > >> > >>> recently
> > > >> > > > >> > >>> > (
> > > >> > > > >> > >>> > > > https://reviews.apache.org/r/33378/)
> > > >> > > > >> > >>> > > > I guess the reason protocol information is not
> > > >> included
> > > >> > in
> > > >> > > > the
> > > >> > > > >> > TMR
> > > >> > > > >> > >>> is
> > > >> > > > >> > >>> > > > because the topic itself is independent of any
> > > >> > particular
> > > >> > > > >> > protocol
> > > >> > > > >> > >>> (SSL
> > > >> > > > >> > >>> > > vs
> > > >> > > > >> > >>> > > > Plaintext). Having said that, I'm not sure we
> > even
> > > >> need
> > > >> > > rack
> > > >> > > > >> > >>> > information
> > > >> > > > >> > >>> > > in
> > > >> > > > >> > >>> > > > TMR. What use case were you thinking of
> initially?
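
A minimal sketch of the version-gated serialization idea mentioned above, in which the response is shaped by the version of the request. The field names, the dict-based encoding, and the choice of version 1 as the first version carrying rack are all assumptions for illustration. This is not Kafka's wire format.

```python
def serialize_broker(broker, request_version):
    """Build the broker entry of a metadata response for a given request version."""
    fields = {"id": broker["id"], "host": broker["host"], "port": broker["port"]}
    # Old clients send version 0 and never see the new field, so they keep working.
    if request_version >= 1:
        fields["rack"] = broker.get("rack")
    return fields


broker = {"id": 1, "host": "kafka-1.example.com", "port": 9092, "rack": "rack-a"}
v0 = serialize_broker(broker, request_version=0)
v1 = serialize_broker(broker, request_version=1)
print(v0, v1)
```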
> > > >> > > > >> > >>> > > >
> > > >> > > > >> > >>> > > > For 1 - I'd be fine with adding an option to
> the
> > > >> command
> > > >> > > > line
> > > >> > > > >> > tools
> > > >> > > > >> > >>> > that
> > > >> > > > >> > >>> > > > check rack assignment. For e.g.
> > > >> "--strict-assignment" or
> > > >> > > > >> > something
> > > >> > > > >> > >>> > > similar.
> > > >> > > > >> > >>> > > >
> > > >> > > > >> > >>> > > > Aditya
> > > >> > > > >> > >>> > > >
> > > >> > > > >> > >>> > > > On Thu, Oct 22, 2015 at 6:44 PM, Allen Wang <
> > > >> > > > >> > allenxwang@gmail.com>
> > > >> > > > >> > >>> > > wrote:
> > > >> > > > >> > >>> > > >
> > > >> > > > >> > >>> > > > > For 2 and 3, I have updated the KIP. Please
> > take
> > > a
> > > >> > look.
> > > >> > > > One
> > > >> > > > >> > >>> thing I
> > > >> > > > >> > >>> > > have
> > > >> > > > >> > >>> > > > > changed is removing the proposal to add rack
> to
> > > >> > > > >> > >>> > TopicMetadataResponse.
> > > >> > > > >> > >>> > > > The
> > > >> > > > >> > >>> > > > > reason is that unlike UpdateMetadataRequest,
> > > >> > > > >> > >>> TopicMetadataResponse
> > > >> > > > >> > >>> > does
> > > >> > > > >> > >>> > > > not
> > > >> > > > >> > >>> > > > > understand version. I don't see a way to
> > include
> > > >> rack
> > > >> > > > >> without
> > > >> > > > >> > >>> > breaking
> > > >> > > > >> > >>> > > > old
> > > >> > > > >> > >>> > > > > version of clients. That's probably why
> secure
> > > >> > protocol
> > > >> > > is
> > > >> > > > >> not
> > > >> > > > >> > >>> > included
> > > >> > > > >> > >>> > > > in
> > > >> > > > >> > >>> > > > > the TopicMetadataResponse either. I think it
> > will
> > > >> be a
> > > >> > > > much
> > > >> > > > >> > >>> bigger
> > > >> > > > >> > >>> > > change
> > > >> > > > >> > >>> > > > > to include rack in TopicMetadataResponse.
> > > >> > > > >> > >>> > > > >
> > > >> > > > >> > >>> > > > > For 1, my concern is that doing rack aware
> > > >> assignment
> > > >> > > > >> without
> > > >> > > > >> > >>> > complete
> > > >> > > > >> > >>> > > > > broker to rack mapping will result in
> > assignment
> > > >> that
> > > >> > is
> > > >> > > > not
> > > >> > > > >> > rack
> > > >> > > > >> > >>> > aware
> > > >> > > > >> > >>> > > > and
> > > >> > > > >> > >>> > > > > fail to provide fault tolerance in the event
> of
> > > >> rack
> > > >> > > > outage.
> > > >> > > > >> > This
> > > >> > > > >> > >>> > kind
> > > >> > > > >> > >>> > > of
> > > >> > > > >> > >>> > > > > problem will be difficult to surface. And the
> > > cost
> > > >> of
> > > >> > > this
> > > >> > > > >> > >>> problem is
> > > >> > > > >> > >>> > > > high:
> > > >> > > > >> > >>> > > > > you have to do partition reassignment if you
> > are
> > > >> lucky
> > > >> > > to
> > > >> > > > >> spot
> > > >> > > > >> > >>> the
> > > >> > > > >> > >>> > > > problem
> > > >> > > > >> > >>> > > > > early on or face the consequence of data loss
> > > >> during
> > > >> > > real
> > > >> > > > >> rack
> > > >> > > > >> > >>> > outage.
> > > >> > > > >> > >>> > > > >
> > > >> > > > >> > >>> > > > > I do see the concern of fail-fast as it might
> > > also
> > > >> > cause
> > > >> > > > >> data
> > > >> > > > >> > >>> loss if
> > > >> > > > >> > >>> > > > > producer is not able to produce the message due
> to
> > > >> topic
> > > >> > > > >> creation
> > > >> > > > >> > >>> > failure.
> > > >> > > > >> > >>> > > > Is
> > > >> > > > >> > >>> > > > > it feasible to treat dynamic topic creation
> and
> > > >> > command
> > > >> > > > >> tools
> > > >> > > > >> > >>> > > > differently?
> > > >> > > > >> > >>> > > > > We allow dynamic topic creation with
> incomplete
> > > >> > > > broker-rack
> > > >> > > > >> > >>> mapping
> > > >> > > > >> > >>> > and
> > > >> > > > >> > >>> > > > > fail fast in command line. Another option is
> to
> > > let
> > > >> > user
> > > >> > > > >> > >>> determine
> > > >> > > > >> > >>> > the
> > > >> > > > >> > >>> > > > > behavior for command line. For example, by
> > > default
> > > >> > fail
> > > >> > > > >> fast in
> > > >> > > > >> > >>> > command
> > > >> > > > >> > >>> > > > > line but allow incomplete broker-rack mapping
> > if
> > > >> > another
> > > >> > > > >> switch
> > > >> > > > >> > >>> is
> > > >> > > > >> > >>> > > > > provided.
> > > >> > > > >> > >>> > > > >
> > > >> > > > >> > >>> > > > >
> > > >> > > > >> > >>> > > > >
> > > >> > > > >> > >>> > > > >
> > > >> > > > >> > >>> > > > > On Tue, Oct 20, 2015 at 10:05 AM, Aditya
> > > Auradkar <
> > > >> > > > >> > >>> > > > > aauradkar@linkedin.com.invalid> wrote:
> > > >> > > > >> > >>> > > > >
> > > >> > > > >> > >>> > > > > > Hey Allen,
> > > >> > > > >> > >>> > > > > >
> > > >> > > > >> > >>> > > > > > 1. If we choose fail fast topic creation,
> we
> > > will
> > > >> > have
> > > >> > > > >> topic
> > > >> > > > >> > >>> > creation
> > > >> > > > >> > >>> > > > > > failures while upgrading the cluster. I
> > really
> > > >> doubt
> > > >> > > we
> > > >> > > > >> want
> > > >> > > > >> > >>> this
> > > >> > > > >> > >>> > > > > behavior.
> > > >> > > > >> > >>> > > > > > Ideally, this should be invisible to
> clients
> > > of a
> > > >> > > > cluster.
> > > >> > > > >> > >>> > Currently,
> > > >> > > > >> > >>> > > > > each
> > > >> > > > >> > >>> > > > > > broker is effectively its own rack. So we
> > > >> probably
> > > >> > can
> > > >> > > > use
> > > >> > > > >> > the
> > > >> > > > >> > >>> rack
> > > >> > > > >> > >>> > > > > > information whenever possible but not make
> > it a
> > > >> hard
> > > >> > > > >> > >>> requirement.
> > > >> > > > >> > >>> > To
> > > >> > > > >> > >>> > > > > extend
> > > >> > > > >> > >>> > > > > > Gwen's example, one badly configured broker
> > > >> should
> > > >> > not
> > > >> > > > >> > degrade
> > > >> > > > >> > >>> > topic
> > > >> > > > >> > >>> > > > > > creation for the entire cluster.
> > > >> > > > >> > >>> > > > > >
> > > >> > > > >> > >>> > > > > > 2. Upgrade scenario - Can you add a section
> > on
> > > >> the
> > > >> > > > upgrade
> > > >> > > > >> > >>> piece to
> > > >> > > > >> > >>> > > > > confirm
> > > >> > > > >> > >>> > > > > > that old clients will not see errors? I
> > believe
> > > >> > > > >> > >>> > > > > ZookeeperConsumerConnector
> > > >> > > > >> > >>> > > > > > reads the Broker objects from ZK. I wanted
> to
> > > >> > confirm
> > > >> > > > that
> > > >> > > > >> > this
> > > >> > > > >> > >>> > will
> > > >> > > > >> > >>> > > > not
> > > >> > > > >> > >>> > > > > > cause any problems.
> > > >> > > > >> > >>> > > > > >
> > > >> > > > >> > >>> > > > > > 3. Could you elaborate your proposed
> changes
> > to
> > > >> the
> > > >> > > > >> > >>> > > > UpdateMetadataRequest
> > > >> > > > >> > >>> > > > > > in the "Public Interfaces" section?
> > > Personally, I
> > > >> > find
> > > >> > > > >> this
> > > >> > > > >> > >>> format
> > > >> > > > >> > >>> > > easy
> > > >> > > > >> > >>> > > > > to
> > > >> > > > >> > >>> > > > > > read in terms of wire protocol changes:
> > > >> > > > >> > >>> > > > > >
> > > >> > > > >> > >>> > > > > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicRequest
> > > >> > > > >> > >>> > > > > >
> > > >> > > > >> > >>> > > > > > Aditya
> > > >> > > > >> > >>> > > > > >
> > > >> > > > >> > >>> > > > > > On Fri, Oct 16, 2015 at 3:45 PM, Allen
> Wang <
> > > >> > > > >> > >>> allenxwang@gmail.com>
> > > >> > > > >> > >>> > > > > wrote:
> > > >> > > > >> > >>> > > > > >
> > > >> > > > >> > >>> > > > > > > KIP is updated to include rack as an
> optional
> > > >> > property
> > > >> > > > for
> > > >> > > > >> > >>> broker.
> > > >> > > > >> > >>> > > > Please
> > > >> > > > >> > >>> > > > > > take
> > > >> > > > >> > >>> > > > > > > a look and let me know if more details
> are
> > > >> needed.
> > > >> > > > >> > >>> > > > > > >
> > > >> > > > >> > >>> > > > > > > For the case where some brokers have rack
> > and
> > > >> some
> > > >> > > do
> > > >> > > > >> not,
> > > >> > > > >> > >>> the
> > > >> > > > >> > >>> > > > current
> > > >> > > > >> > >>> > > > > > KIP
> > > >> > > > >> > >>> > > > > > > uses the fail-fast behavior. If there are
> > > >> > concerns,
> > > >> > > we
> > > >> > > > >> can
> > > >> > > > >> > >>> > further
> > > >> > > > >> > >>> > > > > > discuss
> > > >> > > > >> > >>> > > > > > > this in the email thread or next hangout.
> > > >> > > > >> > >>> > > > > > >
> > > >> > > > >> > >>> > > > > > >
> > > >> > > > >> > >>> > > > > > >
> > > >> > > > >> > >>> > > > > > > On Thu, Oct 15, 2015 at 10:42 AM, Allen
> > Wang
> > > <
> > > >> > > > >> > >>> > allenxwang@gmail.com
> > > >> > > > >> > >>> > > >
> > > >> > > > >> > >>> > > > > > wrote:
> > > >> > > > >> > >>> > > > > > >
> > > >> > > > >> > >>> > > > > > > > That's a good question. I can think of
> > > three
> > > >> > > actions
> > > >> > > > >> if
> > > >> > > > >> > the
> > > >> > > > >> > >>> > rack
> > > >> > > > >> > >>> > > > > > > > information is incomplete:
> > > >> > > > >> > >>> > > > > > > >
> > > >> > > > >> > >>> > > > > > > > 1. Treat the node without rack as if it
> > is
> > > on
> > > >> > its
> > > >> > > > >> unique
> > > >> > > > >> > >>> rack
> > > >> > > > >> > >>> > > > > > > > 2. Disregard all rack information and
> > > >> fall back
> > > >> > to
> > > >> > > > >> current
> > > >> > > > >> > >>> > > algorithm
> > > >> > > > >> > >>> > > > > > > > 3. Fail-fast
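
The three actions listed above can be sketched as a single policy switch. This is an illustrative assumption, not code from the KIP: the function name resolve_racks, the policy strings, and the synthetic rack naming are all hypothetical.

```python
def resolve_racks(broker_racks, policy):
    """Decide what to do when some brokers have no rack configured.

    broker_racks: dict mapping broker id -> rack id, or None if not configured.
    Returns a complete broker -> rack mapping, or None meaning "ignore racks and
    use the rack-unaware algorithm".
    """
    missing = sorted(b for b, rack in broker_racks.items() if rack is None)
    if not missing:
        return dict(broker_racks)
    if policy == "own_rack":
        # Action 1: each broker without a rack is treated as its own unique rack.
        return {b: rack if rack is not None else f"__broker-{b}"
                for b, rack in broker_racks.items()}
    if policy == "ignore_racks":
        # Action 2: disregard all rack information.
        return None
    if policy == "fail_fast":
        # Action 3: surface the misconfiguration immediately.
        raise ValueError(f"brokers without rack information: {missing}")
    raise ValueError(f"unknown policy: {policy}")


partial = {1: "rack-a", 2: None}
print(resolve_racks(partial, "own_rack"))
```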
> > > >> > > > >> > >>> > > > > > > >
> > > >> > > > >> > >>> > > > > > > > Now I think about it, one and three
> make
> > > more
> > > >> > > sense.
> > > >> > > > >> The
> > > >> > > > >> > >>> reason
> > > >> > > > >> > >>> > > for
> > > >> > > > >> > >>> > > > > > > > fail-fast is that user mistake for not
> > > >> providing
> > > >> > > the
> > > >> > > > >> rack
> > > >> > > > >> > >>> may
> > > >> > > > >> > >>> > > never
> > > >> > > > >> > >>> > > > > be
> > > >> > > > >> > >>> > > > > > > > found if we tolerate that and the
> > > assignment
> > > >> may
> > > >> > > not
> > > >> > > > >> be
> > > >> > > > >> > >>> rack
> > > >> > > > >> > >>> > > aware
> > > >> > > > >> > >>> > > > as
> > > >> > > > >> > >>> > > > > > the
> > > >> > > > >> > >>> > > > > > > > user has expected and this creates
> debug
> > > >> > problems
> > > >> > > > when
> > > >> > > > >> > >>> things
> > > >> > > > >> > >>> > > fail.
> > > >> > > > >> > >>> > > > > > > >
> > > >> > > > >> > >>> > > > > > > > What do you think? If not fail-fast, is
> > > there
> > > >> > > any way
> > > >> > > > >> we
> > > >> > > > >> > can
> > > >> > > > >> > >>> > make
> > > >> > > > >> > >>> > > > the
> > > >> > > > >> > >>> > > > > > user
> > > >> > > > >> > >>> > > > > > > > error stand out?
> > > >> > > > >> > >>> > > > > > > >
> > > >> > > > >> > >>> > > > > > > >
> > > >> > > > >> > >>> > > > > > > > On Thu, Oct 15, 2015 at 10:17 AM, Gwen
> > > >> Shapira <
> > > >> > > > >> > >>> > > gwen@confluent.io>
> > > >> > > > >> > >>> > > > > > > wrote:
> > > >> > > > >> > >>> > > > > > > >
> > > >> > > > >> > >>> > > > > > > >> Thanks! Just to clarify, when some
> > brokers
> > > >> have
> > > >> > > > rack
> > > >> > > > >> > >>> > assignment
> > > >> > > > >> > >>> > > > and
> > > >> > > > >> > >>> > > > > > some
> > > >> > > > >> > >>> > > > > > > >> don't, do we act like none of them
> have
> > > it?
> > > >> or
> > > >> > > like
> > > >> > > > >> > those
> > > >> > > > >> > >>> > > without
> > > >> > > > >> > >>> > > > > > > >> assignment are in their own rack?
> > > >> > > > >> > >>> > > > > > > >>
> > > >> > > > >> > >>> > > > > > > >> The first scenario is good when first
> > > >> setting
> > > >> > up
> > > >> > > > >> > >>> > rack-awareness,
> > > >> > > > >> > >>> > > > but
> > > >> > > > >> > >>> > > > > > the
> > > >> > > > >> > >>> > > > > > > >> second makes more sense for on-going
> > > >> > maintenance
> > > >> > > (I
> > > >> > > > >> can
> > > >> > > > >> > >>> > totally
> > > >> > > > >> > >>> > > > see
> > > >> > > > >> > >>> > > > > > > >> someone
> > > >> > > > >> > >>> > > > > > > >> adding a node and forgetting to set
> the
> > > rack
> > > >> > > > >> property,
> > > >> > > > >> > we
> > > >> > > > >> > >>> > don't
> > > >> > > > >> > >>> > > > want
> > > >> > > > >> > >>> > > > > > > this
> > > >> > > > >> > >>> > > > > > > >> to change behavior for anything except
> > the
> > > >> new
> > > >> > > > node).
> > > >> > > > >> > >>> > > > > > > >>
> > > >> > > > >> > >>> > > > > > > >> What do you think?
> > > >> > > > >> > >>> > > > > > > >>
> > > >> > > > >> > >>> > > > > > > >> Gwen
> > > >> > > > >> > >>> > > > > > > >>
> > > >> > > > >> > >>> > > > > > > >> On Thu, Oct 15, 2015 at 10:13 AM,
> Allen
> > > >> Wang <
> > > >> > > > >> > >>> > > > allenxwang@gmail.com>
> > > >> > > > >> > >>> > > > > > > >> wrote:
> > > >> > > > >> > >>> > > > > > > >>
> > > >> > > > >> > >>> > > > > > > >> > For scenario 1:
> > > >> > > > >> > >>> > > > > > > >> >
> > > >> > > > >> > >>> > > > > > > >> > - Add the rack information to broker
> > > >> property
> > > >> > > > file
> > > >> > > > >> or
> > > >> > > > >> > >>> > > > dynamically
> > > >> > > > >> > >>> > > > > > set
> > > >> > > > >> > >>> > > > > > > >> it in
> > > >> > > > >> > >>> > > > > > > >> > the wrapper code to bootstrap Kafka
> > > >> server.
> > > >> > You
> > > >> > > > >> would
> > > >> > > > >> > do
> > > >> > > > >> > >>> > that
> > > >> > > > >> > >>> > > > for
> > > >> > > > >> > >>> > > > > > all
> > > >> > > > >> > >>> > > > > > > >> > brokers and restart the brokers one
> by
> > > >> one.
> > > >> > > > >> > >>> > > > > > > >> >
> > > >> > > > >> > >>> > > > > > > >> > In this scenario, the complete
> broker
> > to
> > > >> rack
> > > >> > > > >> mapping
> > > >> > > > >> > >>> may
> > > >> > > > >> > >>> > not
> > > >> > > > >> > >>> > > be
> > > >> > > > >> > >>> > > > > > > >> available
> > > >> > > > >> > >>> > > > > > > >> > until every broker is restarted.
> > During
> > > >> that
> > > >> > > time
> > > >> > > > >> we
> > > >> > > > >> > >>> fall
> > > >> > > > >> > >>> > back
> > > >> > > > >> > >>> > > > to
> > > >> > > > >> > >>> > > > > > > >> default
> > > >> > > > >> > >>> > > > > > > >> > replica assignment algorithm.
> > > >> > > > >> > >>> > > > > > > >> >
> > > >> > > > >> > >>> > > > > > > >> > For scenario 2:
> > > >> > > > >> > >>> > > > > > > >> >
> > > >> > > > >> > >>> > > > > > > >> > - Add the rack information to broker
> > > >> property
> > > >> > > > file
> > > >> > > > >> or
> > > >> > > > >> > >>> > > > dynamically
> > > >> > > > >> > >>> > > > > > set
> > > >> > > > >> > >>> > > > > > > >> it in
> > > >> > > > >> > >>> > > > > > > >> > the wrapper code and start the
> broker.
> > > >> > > > >> > >>> > > > > > > >> >
> > > >> > > > >> > >>> > > > > > > >> >
> > > >> > > > >> > >>> > > > > > > >> > On Wed, Oct 14, 2015 at 2:36 PM,
> Gwen
> > > >> > Shapira <
> > > >> > > > >> > >>> > > > gwen@confluent.io>
> > > >> > > > >> > >>> > > > > > > >> wrote:
> > > >> > > > >> > >>> > > > > > > >> >
> > > >> > > > >> > >>> > > > > > > >> > > Can you clarify the workflow for
> the
> > > >> > > following
> > > >> > > > >> > >>> scenarios:
> > > >> > > > >> > >>> > > > > > > >> > >
> > > >> > > > >> > >>> > > > > > > >> > > 1. I currently have 6 brokers and
> > want
> > > >> to
> > > >> > add
> > > >> > > > >> rack
> > > >> > > > >> > >>> > > information
> > > >> > > > >> > >>> > > > > for
> > > >> > > > >> > >>> > > > > > > >> each
> > > >> > > > >> > >>> > > > > > > >> > > 2. I'm adding a new broker and I
> > want
> > > to
> > > >> > > > specify
> > > >> > > > >> > which
> > > >> > > > >> > >>> > rack
> > > >> > > > >> > >>> > > it
> > > >> > > > >> > >>> > > > > > > >> belongs on
> > > >> > > > >> > >>> > > > > > > >> > > while adding it.
> > > >> > > > >> > >>> > > > > > > >> > >
> > > >> > > > >> > >>> > > > > > > >> > > Thanks!
> > > >> > > > >> > >>> > > > > > > >> > >
> > > >> > > > >> > >>> > > > > > > >> > > On Tue, Oct 13, 2015 at 2:21 PM,
> > Allen
> > > >> > Wang <
> > > >> > > > >> > >>> > > > > allenxwang@gmail.com
> > > >> > > > >> > >>> > > > > > >
> > > >> > > > >> > >>> > > > > > > >> > wrote:
> > > >> > > > >> > >>> > > > > > > >> > >
> > > >> > > > >> > >>> > > > > > > >> > > > We discussed the KIP in the
> > hangout
> > > >> > today.
> > > >> > > > The
> > > >> > > > >> > >>> > > > recommendation
> > > >> > > > >> > >>> > > > > is
> > > >> > > > >> > >>> > > > > > > to
> > > >> > > > >> > >>> > > > > > > >> > make
> > > >> > > > >> > >>> > > > > > > >> > > > rack as a broker property in
> > > >> ZooKeeper.
> > > >> > For
> > > >> > > > >> users
> > > >> > > > >> > >>> with
> > > >> > > > >> > >>> > > > > existing
> > > >> > > > >> > >>> > > > > > > rack
> > > >> > > > >> > >>> > > > > > > >> > > > information stored somewhere,
> they
> > > >> would
> > > >> > > need
> > > >> > > > >> to
> > > >> > > > >> > >>> > retrieve
> > > >> > > > >> > >>> > > > the
> > > >> > > > >> > >>> > > > > > > >> > information
> > > >> > > > >> > >>> > > > > > > >> > > > at broker start up and
> dynamically
> > > set
> > > >> > the
> > > >> > > > rack
> > > >> > > > >> > >>> > property,
> > > >> > > > >> > >>> > > > > which
> > > >> > > > >> > >>> > > > > > > can
> > > >> > > > >> > >>> > > > > > > >> be
> > > >> > > > >> > >>> > > > > > > >> > > > implemented as a wrapper to
> > > bootstrap
> > > >> > > broker.
> > > >> > > > >> > There
> > > >> > > > >> > >>> will
> > > >> > > > >> > >>> > > be
> > > >> > > > >> > >>> > > > no
> > > >> > > > >> > >>> > > > > > > >> > interface
> > > >> > > > >> > >>> > > > > > > >> > > or
> > > >> > > > >> > >>> > > > > > > >> > > > pluggable implementation to
> > retrieve
> > > >> the
> > > >> > > rack
> > > >> > > > >> > >>> > information.
> > > >> > > > >> > >>> > > > > > > >> > > >
> > > >> > > > >> > >>> > > > > > > >> > > > The assumption is that you
> always
> > > >> need to
> > > >> > > > >> restart
> > > >> > > > >> > >>> the
> > > >> > > > >> > >>> > > broker
> > > >> > > > >> > >>> > > > > to
> > > >> > > > >> > >>> > > > > > > >> make a
> > > >> > > > >> > >>> > > > > > > >> > > > change to the rack.
> > > >> > > > >> > >>> > > > > > > >> > > >
> > > >> > > > >> > >>> > > > > > > >> > > > Once the rack becomes a broker
> > > >> property,
> > > >> > it
> > > >> > > > >> will
> > > >> > > > >> > be
> > > >> > > > >> > >>> > > possible
> > > >> > > > >> > >>> > > > > to
> > > >> > > > >> > >>> > > > > > > make
> > > >> > > > >> > >>> > > > > > > >> > rack
> > > >> > > > >> > >>> > > > > > > >> > > > part of the meta data to help
> the
> > > >> > consumer
> > > >> > > > >> choose
> > > >> > > > >> > >>> which
> > > >> > > > >> > >>> > in
> > > >> > > > >> > >>> > > > > sync
> > > >> > > > >> > >>> > > > > > > >> replica
> > > >> > > > >> > >>> > > > > > > >> > > to
> > > >> > > > >> > >>> > > > > > > >> > > > consume from as part of the
> future
> > > >> > consumer
> > > >> > > > >> > >>> enhancement.
> > > >> > > > >> > >>> > > > > > > >> > > >
> > > >> > > > >> > >>> > > > > > > >> > > > I will update the KIP.
> > > >> > > > >> > >>> > > > > > > >> > > >
> > > >> > > > >> > >>> > > > > > > >> > > > Thanks,
> > > >> > > > >> > >>> > > > > > > >> > > > Allen
> > > >> > > > >> > >>> > > > > > > >> > > >
> > > >> > > > >> > >>> > > > > > > >> > > >
> > > >> > > > >> > >>> > > > > > > >> > > > On Thu, Oct 8, 2015 at 9:23 AM,
> > > Allen
> > > >> > Wang
> > > >> > > <
> > > >> > > > >> > >>> > > > > > allenxwang@gmail.com>
> > > >> > > > >> > >>> > > > > > > >> > wrote:
> > > >> > > > >> > >>> > > > > > > >> > > >
> > > >> > > > >> > >>> > > > > > > >> > > > > I attended Tuesday's KIP
> hangout
> > > but
> > > >> > this
> > > >> > > > KIP
> > > >> > > > >> > was
> > > >> > > > >> > >>> not
> > > >> > > > >> > >>> > > > > > discussed
> > > >> > > > >> > >>> > > > > > > >> due
> > > >> > > > >> > >>> > > > > > > >> > to
> > > >> > > > >> > >>> > > > > > > >> > > > > time constraint.
> > > >> > > > >> > >>> > > > > > > >> > > > >
> > > >> > > > >> > >>> > > > > > > >> > > > > However, after hearing
> > discussion
> > > of
> > > >> > > > KIP-35,
> > > >> > > > >> I
> > > >> > > > >> > >>> have
> > > >> > > > >> > >>> > the
> > > >> > > > >> > >>> > > > > > feeling
> > > >> > > > >> > >>> > > > > > > >> that
> > > >> > > > >> > >>> > > > > > > >> > > > > incompatibility (caused by new
> > > >> broker
> > > >> > > > >> property)
> > > >> > > > >> > >>> > between
> > > >> > > > >> > >>> > > > > > brokers
> > > >> > > > >> > >>> > > > > > > >> with
> > > >> > > > >> > >>> > > > > > > >> > > > > different versions will be
> > solved
> > > >> > there.
> > > >> > > > In
> > > >> > > > >> > >>> addition,
> > > >> > > > >> > >>> > > > > having
> > > >> > > > >> > >>> > > > > > > >> rack
> > > >> > > > >> > >>> > > > > > > >> > in
> > > >> > > > >> > >>> > > > > > > >> > > > > broker property as meta data
> may
> > > >> also
> > > >> > > help
> > > >> > > > >> > >>> consumers
> > > >> > > > >> > >>> > in
> > > >> > > > >> > >>> > > > the
> > > >> > > > >> > >>> > > > > > > >> future.
> > > >> > > > >> > >>> > > > > > > >> > So
> > > >> > > > >> > >>> > > > > > > >> > > I
> > > >> > > > >> > >>> > > > > > > >> > > > am
> > > >> > > > >> > >>> > > > > > > >> > > > > open to adding rack property
> to
> > > >> > broker.
> > > >> > > > >> > >>> > > > > > > >> > > > >
> > > >> > > > >> > >>> > > > > > > >> > > > > Hopefully we can discuss this
> in
> > > the
> > > >> > next
> > > >> > > > KIP
> > > >> > > > >> > >>> hangout.
> > > >> > > > >> > >>> > > > > > > >> > > > >
> > > >> > > > >> > >>> > > > > > > >> > > > > On Wed, Sep 30, 2015 at 2:46
> PM,
> > > >> Allen
> > > >> > > > Wang <
> > > >> > > > >> > >>> > > > > > > allenxwang@gmail.com
> > > >> > > > >> > >>> > > > > > > >> >
> > > >> > > > >> > >>> > > > > > > >> > > > wrote:
> > > >> > > > >> > >>> > > > > > > >> > > > >
> > > >> > > > >> > >>> > > > > > > >> > > > >> Can you send me the information on the next KIP hangout?
> > > >> > > > >> > >>> > > > > > > >> > > > >>
> > > >> > > > >> > >>> > > > > > > >> > > > >> Currently the broker-rack mapping is not cached. In KafkaApis,
> > > >> > > > >> > >>> > > > > > > >> > > > >> RackLocator.getRackInfo() is called each time the mapping is needed for
> > > >> > > > >> > >>> > > > > > > >> > > > >> auto topic creation. This will ensure latest mapping is used at any time.
> > > >> > > > >> > >>> > > > > > > >> > > > >>
> > > >> > > > >> > >>> > > > > > > >> > > > >> The ability to get the complete mapping makes it simple to reuse the same
> > > >> > > > >> > >>> > > > > > > >> > > > >> interface in command line tools.
> > > >> > > > >> > >>> > > > > > > >> > > > >>
> > > >> > > > >> > >>> > > > > > > >> > > > >>
> > > >> > > > >> > >>> > > > > > > >> > > > >> On Wed, Sep 30, 2015 at 11:01 AM, Aditya Auradkar <aauradkar@linkedin.com.invalid> wrote:
> > > >> > > > >> > >>> > > > > > > >> > > > >>
> > > >> > > > >> > >>> > > > > > > >> > > > >>> Perhaps we discuss this during the next KIP hangout?
> > > >> > > > >> > >>> > > > > > > >> > > > >>>
> > > >> > > > >> > >>> > > > > > > >> > > > >>> I do see that a pluggable rack locator can be useful but I do see a few
> > > >> > > > >> > >>> > > > > > > >> > > > >>> concerns:
> > > >> > > > >> > >>> > > > > > > >> > > > >>>
> > > >> > > > >> > >>> > > > > > > >> > > > >>> - The RackLocator (as described in the document), implies that it can
> > > >> > > > >> > >>> > > > > > > >> > > > >>> discover rack information for any node in the cluster. How does it deal
> > > >> > > > >> > >>> > > > > > > >> > > > >>> with rack location changes? For example, if I moved broker id (1) from rack
> > > >> > > > >> > >>> > > > > > > >> > > > >>> X to Y, I only have to start that broker with a newer rack config. If
> > > >> > > > >> > >>> > > > > > > >> > > > >>> RackLocator discovers broker -> rack information at start up time, any
> > > >> > > > >> > >>> > > > > > > >> > > > >>> change to a broker will require bouncing the entire cluster since
> > > >> > > > >> > >>> > > > > > > >> > > > >>> createTopic requests can be sent to any node in the cluster.
> > > >> > > > >> > >>> > > > > > > >> > > > >>> For this reason it may be simpler to have each node be aware of its own
> > > >> > > > >> > >>> > > > > > > >> > > > >>> rack and persist it in ZK during start up time.
> > > >> > > > >> > >>> > > > > > > >> > > > >>>
> > > >> > > > >> > >>> > > > > > > >> > > > >>> - A pluggable RackLocator relies on an external service being available to
> > > >> > > > >> > >>> > > > > > > >> > > > >>> serve rack information.
> > > >> > > > >> > >>> > > > > > > >> > > > >>>
> > > >> > > > >> > >>> > > > > > > >> > > > >>> Out of curiosity, I looked up how a couple of other systems deal with
> > > >> > > > >> > >>> > > > > > > >> > > > >>> zone/rack awareness.
> > > >> > > > >> > >>> > > > > > > >> > > > >>> For Cassandra some interesting modes are:
> > > >> > > > >> > >>> > > > > > > >> > > > >>> (Property File configuration)
> > > >> > > > >> > >>> > > > > > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
> > > >> > > > >> > >>> > > > > > > >> > > > >>> (Dynamic inference)
> > > >> > > > >> > >>> > > > > > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
> > > >> > > > >> > >>> > > > > > > >> > > > >>>
> > > >> > > > >> > >>> > > > > > > >> > > > >>> Voldemort does a static node -> zone assignment based on configuration.
> > > >> > > > >> > >>> > > > > > > >> > > > >>>
> > > >> > > > >> > >>> > > > > > > >> > > > >>> Aditya
> > > >> > > > >> > >>> > > > > > > >> > > > >>>
> > > >> > > > >> > >>> > > > > > > >> > > > >>> On Wed, Sep 30, 2015 at 10:05 AM, Allen Wang <allenxwang@gmail.com> wrote:
> > > >> > > > >> > >>> > > > > > > >> > > > >>>
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > I would like to see if we can do both:
> > > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > - Make RackLocator pluggable to facilitate migration with existing
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > broker-rack mapping
> > > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > - Make rack an optional property for broker. If rack is available from
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > broker, treat it as source of truth. For users with existing broker-rack
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > mapping somewhere else, they can use the pluggable way or they can transfer
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > the mapping to the broker rack property.
> > > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > One thing I am not sure is what happens at rolling upgrade when we have
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > rack as a broker property. For brokers with older version of Kafka, will it
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > cause problem for them? If so, is there any workaround? I also think it
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > would be better not to have rack in the controller wire protocol but not
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > sure if it is achievable.
> > > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > Thanks,
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > Allen
> > > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > On Mon, Sep 28, 2015 at 4:55 PM, Todd Palino <tpalino@gmail.com> wrote:
> > > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > I tend to like the idea of a pluggable locator. For example, we already
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > have an interface for discovering information about the physical location
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > of servers. I don't relish the idea of having to maintain data in multiple
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > places.
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > >
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > -Todd
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > >
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > On Mon, Sep 28, 2015 at 4:48 PM, Aditya Auradkar <aauradkar@linkedin.com.invalid> wrote:
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > >
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > Thanks for starting this KIP Allen.
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > I agree with Gwen that having a RackLocator class that is pluggable seems
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > to be too complex. The KIP refers to potentially non-ZK storage for the
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > rack info which I don't think is necessary.
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > Perhaps we can persist this info in zk under /brokers/ids/<broker_id>
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > similar to other broker properties and add a config in KafkaConfig called
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > "rack".
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy, "rack": "abc"}
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > Aditya
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira <gwen@confluent.io> wrote:
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > Hi,
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > First, thanks for putting out a KIP for this. This is super important for
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > production deployments of Kafka.
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > Few questions:
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > 1) Are we sure we want "as many racks as possible"? I'd want to balance
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > between safety (more racks) and network utilization (traffic within a rack
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > uses the high-bandwidth TOR switch). One replica on a different rack and
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > the rest on same rack (if possible) sounds better to me.
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > 2) Rack-locator class seems overly complex compared to adding a rack.number
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > property to the broker properties file. Why do we want that?
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > Gwen
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang <allenxwang@gmail.com> wrote:
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > Hello Kafka Developers,
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > I just created KIP-36 for rack aware replica assignment.
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > The goal is to utilize the isolation provided by the racks in data center
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > and distribute replicas to racks to provide fault tolerance.
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > Comments are welcome.
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > Thanks,
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > Allen
> > > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > >> > > > >> > >>> --
> > > >> > > > >> > >>> Thanks,
> > > >> > > > >> > >>> Neha
> > > >> > > > >> > >>>
>

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Allen Wang <al...@gmail.com>.
Hi Jun,

That's a good suggestion. However, it does not solve the problem for the
clients or third-party tools that get broker information directly from
ZooKeeper.

Thanks,
Allen


On Tue, Jan 12, 2016 at 1:29 PM, Jun Rao <ju...@confluent.io> wrote:

> Allen,
>
> Another way to do this is the following.
>
> When inter.broker.protocol.version is set to 0.9.0, the broker will write
> the broker info in ZK using version 2, ignoring the rack info.
>
> When inter.broker.protocol.version is set to 0.9.1, the broker will write
> the broker info in ZK using version 3, including the rack info.
>
> If one follows the upgrade process, after the 2nd round of rolling bounces,
> every broker is capable of parsing version 3 of broker info in ZK. This is
> when the rack-aware feature will be used.
>
>
> Thanks,
>
> Jun
>
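The version gating described in Jun's message above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Kafka's actual ZkUtils code: the class and method names (BrokerInfoVersionSketch, registrationVersion, toJson) are hypothetical, the JSON is reduced to a few fields, and version strings are assumed to be purely numeric and dot-separated. Only the 0.9.0 -> version 2 and 0.9.1 -> version 3 mapping comes from the message.

```java
import java.util.Optional;

// Illustrative sketch only; these names are hypothetical and not Kafka APIs.
final class BrokerInfoVersionSketch {

    // Compares dotted numeric versions such as "0.9.0" and "0.9.1".
    static int compareVersions(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        int n = Math.max(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    // Version of the broker registration JSON to write to ZooKeeper:
    // 2 (no rack) below 0.9.1, 3 (with rack) from 0.9.1 onwards.
    static int registrationVersion(String interBrokerProtocolVersion) {
        return compareVersions(interBrokerProtocolVersion, "0.9.1") >= 0 ? 3 : 2;
    }

    // Builds a simplified registration JSON; rack is written only for version 3.
    static String toJson(String interBrokerProtocolVersion, String host, int port,
                         Optional<String> rack) {
        int version = registrationVersion(interBrokerProtocolVersion);
        StringBuilder json = new StringBuilder();
        json.append("{\"version\":").append(version);
        json.append(",\"host\":\"").append(host).append("\"");
        json.append(",\"port\":").append(port);
        if (version >= 3 && rack.isPresent()) {
            json.append(",\"rack\":\"").append(rack.get()).append("\"");
        }
        return json.append("}").toString();
    }
}
```

With this shape, a broker that has a rack configured but still runs with inter.broker.protocol.version 0.9.0 keeps writing version 2 without the rack, so older brokers never see a version they cannot parse. As the reply above notes, it does not help readers outside the brokers that parse the ZooKeeper data themselves.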
> On Tue, Jan 12, 2016 at 12:19 PM, Allen Wang <al...@gmail.com> wrote:
>
> > Regarding the JSON version of Broker:
> >
> > I don't know why ZkUtils.getBrokerInfo() restricts the JSON versions it can
> > read. It will throw an exception if the version is not 1 or 2. It seems to me
> > that it will cause compatibility problems whenever the version needs to be
> > changed and make the upgrade path difficult.
> >
> > One option we have is to make rack also part of version 2 and keep the
> > version 2 unchanged for this update. This will make the old clients
> > compatible. During rolling upgrade, it will also avoid problems if the
> > controller/broker is still the old version.
> >
> > However, ZkUtils.getBrokerInfo() will be updated to return the Broker with
> > rack so the rack information will be available once the server/client is
> > upgraded to the latest version.
> >
> >
> >
> > On Wed, Jan 6, 2016 at 6:28 PM, Allen Wang <al...@gmail.com> wrote:
> >
> > > Updated KIP according to Jun's comment and included changes to TMR.
> > >
> > > On Tue, Jan 5, 2016 at 5:59 PM, Jun Rao <ju...@confluent.io> wrote:
> > >
> > >> Hi, Allen,
> > >>
> > >> A couple of minor comments on the KIP.
> > >>
> > >> 1. The version of the broker JSON string says 2. It should be 3.
> > >>
> > >> 2. The new version of UpdateMetadataRequest should be 2, instead of 1.
> > >> Could you include the full wire protocol of version 2 of
> > >> UpdateMetadataRequest and highlight the changed part?
> > >>
> > >> Thanks,
> > >>
> > >> Jun
> > >>
> > >> On Tue, Jan 5, 2016 at 3:11 PM, Allen Wang <al...@gmail.com> wrote:
> > >>
> > >> > Jun and I had a chance to discuss it in a meeting and it is agreed to
> > >> > change the TMR in a different patch.
> > >> >
> > >> > I can change the KIP to include rack in TMR. The essential change is to
> > >> > add rack into class BrokerEndPoint and make TMR version aware.
> > >> >
> > >> >
> > >> >
> > >> > On Tue, Jan 5, 2016 at 10:21 AM, Aditya Auradkar <
> > >> > aauradkar@linkedin.com.invalid> wrote:
> > >> >
> > >> > > Jun/Allen -
> > >> > >
> > >> > > Did we ever actually agree on whether we should evolve the TMR to include
> > >> > > rack info or not?
> > >> > > I don't feel strongly about it, but if it's the right thing to do we
> > >> > > should probably do it in this KIP (can be a separate patch). It isn't a
> > >> > > large change.
> > >> > >
> > >> > > Aditya
> > >> > >
> > >> > > On Sat, Dec 26, 2015 at 3:01 PM, Allen Wang <allenxwang@gmail.com> wrote:
> > >> > >
> > >> > > > Added the rolling upgrade instruction in the KIP, similar to those in
> > >> > > > 0.9.0 release notes.
> > >> > > >
> > >> > > > On Wed, Dec 16, 2015 at 11:32 AM, Allen Wang <allenxwang@gmail.com> wrote:
> > >> > > >
> > >> > > > > Hi Jun,
> > >> > > > >
> > >> > > > > The reason that TopicMetadataResponse is not included in the KIP is that
> > >> > > > > it currently is not version aware. So we need to introduce a version to it
> > >> > > > > in order to ensure backward compatibility. It seems to me a big change.
> > >> > > > > Do we want to couple it with this KIP? Do we need to further discuss what
> > >> > > > > information to include in the new version besides rack? For example, should
> > >> > > > > we include broker security protocol in TopicMetadataResponse?
> > >> > > > >
> > >> > > > > The other option is to make it a separate KIP to make
> > >> > > > > TopicMetadataResponse version aware and decide what to include, and make
> > >> > > > > this KIP focus on the rack aware algorithm, admin tools and related
> > >> > > > > changes to the inter-broker protocol.
> > >> > > > >
> > >> > > > > Thanks,
> > >> > > > > Allen
> > >> > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > > On Mon, Dec 14, 2015 at 8:30 AM, Jun Rao <ju...@confluent.io> wrote:
> > >> > > > >
> > >> > > > >> Allen,
> > >> > > > >>
> > >> > > > >> Thanks for the proposal. A few comments.
> > >> > > > >>
> > >> > > > >> 1. Since this KIP changes the inter broker communication protocol
> > >> > > > >> (UpdateMetadataRequest), we will need to document the upgrade path
> > >> > > > >> (similar to what's described in
> > >> > > > >> http://kafka.apache.org/090/documentation.html#upgrade).
> > >> > > > >>
> > >> > > > >> 2. It might be useful to include the rack info of the broker in
> > >> > > > >> TopicMetadataResponse. This can be useful for administrative tasks, as
> > >> > > > >> well as read affinity in the future.
> > >> > > > >>
> > >> > > > >> Jun
> > >> > > > >>
> > >> > > > >>
> > >> > > > >>
> > >> > > > >> On Thu, Dec 10, 2015 at 9:38 AM, Allen Wang <allenxwang@gmail.com> wrote:
> > >> > > > >>
> > >> > > > >> > If there are no more comments I would like to call for a vote.
> > >> > > > >> >
> > >> > > > >> >
> > >> > > > >> > On Sun, Nov 15, 2015 at 10:08 PM, Allen Wang <allenxwang@gmail.com> wrote:
> > >> > > > >> >
> > >> > > > >> > > KIP is updated with more details and how to handle the situation where
> > >> > > > >> > > rack information is incomplete.
> > >> > > > >> > >
> > >> > > > >> > > In the situation where rack information is incomplete, but we want to
> > >> > > > >> > > continue with the assignment, I have suggested ignoring all rack
> > >> > > > >> > > information and falling back to the original algorithm. The reason is
> > >> > > > >> > > explained below:
> > >> > > > >> > >
> > >> > > > >> > > The other options are to assume that each broker without a rack belongs
> > >> > > > >> > > to its own unique rack, or that they all belong to one "default" rack.
> > >> > > > >> > > Either way we choose, it is highly likely to result in an uneven number of
> > >> > > > >> > > brokers in racks, and it is quite possible that the "made up" racks will
> > >> > > > >> > > have far fewer brokers. As I explained in the KIP, an uneven number of
> > >> > > > >> > > brokers in racks will lead to uneven distribution of replicas among
> > >> > > > >> > > brokers (even though the leader distribution is still even). The brokers
> > >> > > > >> > > in the rack that has fewer brokers will get more replicas per broker than
> > >> > > > >> > > brokers in other racks.
> > >> > > > >> > >
> > >> > > > >> > > Given this fact, and that the replica assignment produced will be
> > >> > > > >> > > incorrect anyway from a rack aware point of view, ignoring all rack
> > >> > > > >> > > information and falling back to the original algorithm is not a bad
> > >> > > > >> > > choice since it will at least have a better guarantee of replica
> > >> > > > >> > > distribution.
> > >> > > > >> > >
> > >> > > > >> > > Also, for command line tools it gives the user a choice if for any reason
> > >> > > > >> > > they want to ignore rack information and fall back to the original
> > >> > > > >> > > algorithm.
> > >> > > > >> > >
> > >> > > > >> > > On Tue, Nov 10, 2015 at 9:04 AM, Allen Wang <allenxwang@gmail.com> wrote:
> > >> > > > >> > >
> > >> > > > >> > >> I have been busy with some time-pressing issues for the last few days. I
> > >> > > > >> > >> will think about how the incomplete rack information will affect the
> > >> > > > >> > >> balance and update the KIP by early next week.
> > >> > > > >> > >>
> > >> > > > >> > >> Thanks,
> > >> > > > >> > >> Allen
> > >> > > > >> > >>
> > >> > > > >> > >>
> > >> > > > >> > >> On Tue, Nov 3, 2015 at 9:03 AM, Neha Narkhede <neha@confluent.io> wrote:
> > >> > > > >> > >>
> > >> > > > >> > >>> Few suggestions on improving the KIP
> > >> > > > >> > >>>
> > >> > > > >> > >>> *If some brokers have rack, and some do not, the algorithm will throw an
> > >> > > > >> > >>> exception. This is to prevent incorrect assignment caused by user error.*
> > >> > > > >> > >>>
> > >> > > > >> > >>>
> > >> > > > >> > >>> In the KIP, can you clearly state the user-facing behavior when some
> > >> > > > >> > >>> brokers have rack information and some don't? Which actions and requests
> > >> > > > >> > >>> will error out and how?
> > >> > > > >> > >>>
> > >> > > > >> > >>> *Even distribution of partition leadership among brokers*
> > >> > > > >> > >>>
> > >> > > > >> > >>>
> > >> > > > >> > >>> There is some information about arranging the sorted broker list
> > >> > > > >> > >>> interlaced with rack ids. Can you describe the changes to the current
> > >> > > > >> > >>> algorithm in a little more detail? How does this interlacing work if only
> > >> > > > >> > >>> a subset of brokers have the rack id configured? Does this still work if
> > >> > > > >> > >>> an uneven # of brokers are assigned to each rack? It might work, I'm
> > >> > > > >> > >>> looking for more details on the changes, since it will affect the behavior
> > >> > > > >> > >>> seen by the user - imbalance on either the leaders or data or both.
> > >> > > > >> > >>>
> > >> > > > >> > >>> On Mon, Nov 2, 2015 at 6:39 PM, Aditya Auradkar <aauradkar@linkedin.com> wrote:
> > >> > > > >> > >>>
> > >> > > > >> > >>> > I think this sounds reasonable. Anyone else have comments?
> > >> > > > >> > >>> >
> > >> > > > >> > >>> > Aditya
> > >> > > > >> > >>> >
> > >> > > > >> > >>> > On Tue, Oct 27, 2015 at 5:23 PM, Allen Wang <allenxwang@gmail.com> wrote:
> > >> > > > >> > >>> >
> > >> > > > >> > >>> > > During the discussion in the hangout, it was mentioned that it
> > >> > > > >> > >>> > > would be desirable for consumers to know the rack information of
> > >> > > > >> > >>> > > the brokers so that they can consume from a broker in the same
> > >> > > > >> > >>> > > rack to reduce latency. As I understand it, this will only be
> > >> > > > >> > >>> > > beneficial if a consumer can consume from any broker in the ISR,
> > >> > > > >> > >>> > > which is not possible now.
> > >> > > > >> > >>> > >
> > >> > > > >> > >>> > > I suggest we skip the change to TMR. Once the consumer is changed
> > >> > > > >> > >>> > > to be able to consume from any broker in the ISR, the rack
> > >> > > > >> > >>> > > information can be added to TMR.
> > >> > > > >> > >>> > >
> > >> > > > >> > >>> > > Another thing I want to confirm is the command line behavior. I
> > >> > > > >> > >>> > > think the desirable default behavior is to fail fast on the
> > >> > > > >> > >>> > > command line for an incomplete rack mapping. The error message can
> > >> > > > >> > >>> > > include a further instruction that tells the user to add an extra
> > >> > > > >> > >>> > > argument (like "--allow-partial-rackinfo") to suppress the error
> > >> > > > >> > >>> > > and do an imperfect rack-aware assignment. If the default behavior
> > >> > > > >> > >>> > > is to allow an incomplete mapping, the error can still be easily
> > >> > > > >> > >>> > > missed.
> > >> > > > >> > >>> > >
> > >> > > > >> > >>> > > The affected command line tools are TopicCommand and
> > >> > > > >> > >>> > > ReassignPartitionsCommand.
> > >> > > > >> > >>> > >
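The fail-fast default with an opt-out switch proposed above could look roughly like the following sketch. It is not Kafka's TopicCommand; the tool name, the function names, and the error wording are assumptions, and only the "--allow-partial-rackinfo" flag name comes from the message.

```python
import argparse
from typing import Dict, List, Optional


def build_parser() -> argparse.ArgumentParser:
    # "topic-tool" is a hypothetical program name used only for this sketch.
    parser = argparse.ArgumentParser(prog="topic-tool")
    parser.add_argument(
        "--allow-partial-rackinfo",
        action="store_true",
        help="proceed even if only some brokers have a rack configured",
    )
    return parser


def check_rack_mapping(broker_racks: Dict[int, Optional[str]], allow_partial: bool) -> List[int]:
    """Return the brokers without a rack; exit with an error if the mapping
    is incomplete (some brokers have a rack, some do not) and not allowed."""
    missing = sorted(b for b, rack in broker_racks.items() if rack is None)
    incomplete = 0 < len(missing) < len(broker_racks)
    if incomplete and not allow_partial:
        raise SystemExit(
            f"Brokers {missing} have no rack configured; "
            "rerun with --allow-partial-rackinfo to proceed anyway."
        )
    return missing


args = build_parser().parse_args(["--allow-partial-rackinfo"])
print(check_rack_mapping({0: "rack-a", 1: None}, args.allow_partial_rackinfo))
```

Without the flag the same call exits with the error message, so an incomplete mapping cannot slip by unnoticed.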
> > >> > > > >> > >>> > > Thanks,
> > >> > > > >> > >>> > > Allen
> > >> > > > >> > >>> > >
> > >> > > > >> > >>> > >
> > >> > > > >> > >>> > >
> > >> > > > >> > >>> > >
> > >> > > > >> > >>> > >
> > >> > > > >> > >>> > > On Mon, Oct 26, 2015 at 12:55 PM, Aditya Auradkar <aauradkar@linkedin.com> wrote:
> > >> > > > >> > >>> > >
> > >> > > > >> > >>> > > > Hi Allen,
> > >> > > > >> > >>> > > >
> > >> > > > >> > >>> > > > For TopicMetadataResponse to understand version, you can bump up
> > >> > > > >> > >>> > > > the request version itself. Based on the version of the request,
> > >> > > > >> > >>> > > > the response can be appropriately serialized. It shouldn't be a
> > >> > > > >> > >>> > > > huge change. For example, we went through something similar for
> > >> > > > >> > >>> > > > ProduceRequest recently (https://reviews.apache.org/r/33378/).
> > >> > > > >> > >>> > > > I guess the reason protocol information is not included in the
> > >> > > > >> > >>> > > > TMR is that the topic itself is independent of any particular
> > >> > > > >> > >>> > > > protocol (SSL vs Plaintext). Having said that, I'm not sure we
> > >> > > > >> > >>> > > > even need rack information in TMR. What use case were you
> > >> > > > >> > >>> > > > thinking of initially?
> > >> > > > >> > >>> > > >
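The version-bump idea above, where the response is serialized according to the request version, can be sketched as follows. A plain dict stands in for the wire format, and the version numbers (0 without rack, 1 with rack) and field names are assumptions for illustration, not Kafka's actual schema.

```python
from typing import Any, Dict, Optional


def serialize_broker(
    broker_id: int, host: str, port: int, rack: Optional[str], request_version: int
) -> Dict[str, Any]:
    """Build the broker entry of a response for the given request version.

    Hypothetical layout: version 0 is the pre-rack format, version 1 adds "rack".
    """
    entry: Dict[str, Any] = {"id": broker_id, "host": host, "port": port}
    if request_version >= 1:
        # Only clients that sent the newer request version see the new field.
        entry["rack"] = rack
    return entry


old_client_view = serialize_broker(1, "kafka-1", 9092, "rack-a", request_version=0)
new_client_view = serialize_broker(1, "kafka-1", 9092, "rack-a", request_version=1)
print(old_client_view)
print(new_client_view)
```

An old client keeps receiving the layout it already understands, which is why bumping the request version avoids breaking it.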
> > >> > > > >> > >>> > > > For 1 - I'd be fine with adding an option to the command line
> > >> > > > >> > >>> > > > tools that checks rack assignment, e.g. "--strict-assignment" or
> > >> > > > >> > >>> > > > something similar.
> > >> > > > >> > >>> > > >
> > >> > > > >> > >>> > > > Aditya
> > >> > > > >> > >>> > > >
> > >> > > > >> > >>> > > > On Thu, Oct 22, 2015 at 6:44 PM, Allen Wang <allenxwang@gmail.com> wrote:
> > >> > > > >> > >>> > > >
> > >> > > > >> > >>> > > > > For 2 and 3, I have updated the KIP. Please take a look. One
> > >> > > > >> > >>> > > > > thing I have changed is removing the proposal to add rack to
> > >> > > > >> > >>> > > > > TopicMetadataResponse. The reason is that, unlike
> > >> > > > >> > >>> > > > > UpdateMetadataRequest, TopicMetadataResponse does not understand
> > >> > > > >> > >>> > > > > version. I don't see a way to include rack without breaking old
> > >> > > > >> > >>> > > > > versions of clients. That's probably why the secure protocol is
> > >> > > > >> > >>> > > > > not included in the TopicMetadataResponse either. I think it
> > >> > > > >> > >>> > > > > will be a much bigger change to include rack in
> > >> > > > >> > >>> > > > > TopicMetadataResponse.
> > >> > > > >> > >>> > > > >
> > >> > > > >> > >>> > > > > For 1, my concern is that doing rack-aware assignment without a
> > >> > > > >> > >>> > > > > complete broker-to-rack mapping will result in an assignment
> > >> > > > >> > >>> > > > > that is not rack aware and fails to provide fault tolerance in
> > >> > > > >> > >>> > > > > the event of a rack outage. This kind of problem will be
> > >> > > > >> > >>> > > > > difficult to surface. And the cost of this problem is high: you
> > >> > > > >> > >>> > > > > have to do partition reassignment if you are lucky enough to
> > >> > > > >> > >>> > > > > spot the problem early on, or face the consequence of data loss
> > >> > > > >> > >>> > > > > during a real rack outage.
> > >> > > > >> > >>> > > > >
> > >> > > > >> > >>> > > > > I do see the concern with fail-fast, as it might also cause data
> > >> > > > >> > >>> > > > > loss if the producer is not able to produce the message due to
> > >> > > > >> > >>> > > > > topic creation failure. Is it feasible to treat dynamic topic
> > >> > > > >> > >>> > > > > creation and command line tools differently? We would allow
> > >> > > > >> > >>> > > > > dynamic topic creation with an incomplete broker-rack mapping
> > >> > > > >> > >>> > > > > and fail fast on the command line. Another option is to let the
> > >> > > > >> > >>> > > > > user determine the behavior for the command line: for example,
> > >> > > > >> > >>> > > > > by default fail fast on the command line, but allow an
> > >> > > > >> > >>> > > > > incomplete broker-rack mapping if another switch is provided.
> > >> > > > >> > >>> > > > >
> > >> > > > >> > >>> > > > >
> > >> > > > >> > >>> > > > >
> > >> > > > >> > >>> > > > >
> > >> > > > >> > >>> > > > > On Tue, Oct 20, 2015 at 10:05 AM, Aditya Auradkar <aauradkar@linkedin.com.invalid> wrote:
> > >> > > > >> > >>> > > > >
> > >> > > > >> > >>> > > > > > Hey Allen,
> > >> > > > >> > >>> > > > > >
> > >> > > > >> > >>> > > > > > 1. If we choose fail-fast topic creation, we will have topic
> > >> > > > >> > >>> > > > > > creation failures while upgrading the cluster. I really doubt
> > >> > > > >> > >>> > > > > > we want this behavior. Ideally, this should be invisible to
> > >> > > > >> > >>> > > > > > clients of a cluster. Currently, each broker is effectively
> > >> > > > >> > >>> > > > > > its own rack. So we can probably use the rack information
> > >> > > > >> > >>> > > > > > whenever possible but not make it a hard requirement. To
> > >> > > > >> > >>> > > > > > extend Gwen's example, one badly configured broker should not
> > >> > > > >> > >>> > > > > > degrade topic creation for the entire cluster.
> > >> > > > >> > >>> > > > > >
> > >> > > > >> > >>> > > > > > 2. Upgrade scenario - Can you add a section on the upgrade
> > >> > > > >> > >>> > > > > > piece to confirm that old clients will not see errors? I
> > >> > > > >> > >>> > > > > > believe ZookeeperConsumerConnector reads the Broker objects
> > >> > > > >> > >>> > > > > > from ZK. I wanted to confirm that this will not cause any
> > >> > > > >> > >>> > > > > > problems.
> > >> > > > >> > >>> > > > > >
> > >> > > > >> > >>> > > > > > 3. Could you elaborate on your proposed changes to the
> > >> > > > >> > >>> > > > > > UpdateMetadataRequest in the "Public Interfaces" section?
> > >> > > > >> > >>> > > > > > Personally, I find this format easy to read in terms of wire
> > >> > > > >> > >>> > > > > > protocol changes:
> > >> > > > >> > >>> > > > > >
> > >> > > > >> > >>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicRequest
> > >> > > > >> > >>> > > > > >
> > >> > > > >> > >>> > > > > > Aditya
> > >> > > > >> > >>> > > > > >
> > >> > > > >> > >>> > > > > > On Fri, Oct 16, 2015 at 3:45 PM, Allen Wang <allenxwang@gmail.com> wrote:
> > >> > > > >> > >>> > > > > >
> > >> > > > >> > >>> > > > > > > The KIP is updated to include rack as an optional property
> > >> > > > >> > >>> > > > > > > for the broker. Please take a look and let me know if more
> > >> > > > >> > >>> > > > > > > details are needed.
> > >> > > > >> > >>> > > > > > >
> > >> > > > >> > >>> > > > > > > For the case where some brokers have rack and some do not,
> > >> > > > >> > >>> > > > > > > the current KIP uses the fail-fast behavior. If there are
> > >> > > > >> > >>> > > > > > > concerns, we can discuss this further in the email thread or
> > >> > > > >> > >>> > > > > > > the next hangout.
> > >> > > > >> > >>> > > > > > >
> > >> > > > >> > >>> > > > > > >
> > >> > > > >> > >>> > > > > > >
> > >> > > > >> > >>> > > > > > > On Thu, Oct 15, 2015 at 10:42 AM, Allen Wang <allenxwang@gmail.com> wrote:
> > >> > > > >> > >>> > > > > > >
> > >> > > > >> > >>> > > > > > > > That's a good question. I can think of three actions if the
> > >> > > > >> > >>> > > > > > > > rack information is incomplete:
> > >> > > > >> > >>> > > > > > > >
> > >> > > > >> > >>> > > > > > > > 1. Treat the node without rack as if it is on its own unique rack
> > >> > > > >> > >>> > > > > > > > 2. Disregard all rack information and fall back to the current algorithm
> > >> > > > >> > >>> > > > > > > > 3. Fail-fast
> > >> > > > >> > >>> > > > > > > >
> > >> > > > >> > >>> > > > > > > > Now that I think about it, one and three make more sense.
> > >> > > > >> > >>> > > > > > > > The reason for fail-fast is that a user's mistake of not
> > >> > > > >> > >>> > > > > > > > providing the rack may never be found if we tolerate it; the
> > >> > > > >> > >>> > > > > > > > assignment may then not be rack aware as the user expected,
> > >> > > > >> > >>> > > > > > > > and this creates debugging problems when things fail.
> > >> > > > >> > >>> > > > > > > >
> > >> > > > >> > >>> > > > > > > > What do you think? If not fail-fast, is there any way we can
> > >> > > > >> > >>> > > > > > > > make the user error stand out?
> > >> > > > >> > >>> > > > > > > >
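The three options listed in this message for incomplete rack information can be compared side by side with a small sketch. The policy names, the placeholder rack naming, and the use of None to mean "run the rack-unaware algorithm" are all assumptions made for illustration; none of this is Kafka code.

```python
from typing import Dict, Optional


class IncompleteRackInfoError(Exception):
    """Raised by the fail-fast policy when some brokers have no rack."""


def resolve_racks(
    broker_racks: Dict[int, Optional[str]], policy: str
) -> Optional[Dict[int, str]]:
    """Return the broker -> rack map to use, or None for rack-unaware assignment."""
    missing = sorted(b for b, rack in broker_racks.items() if rack is None)
    if not missing:
        return {b: rack for b, rack in broker_racks.items() if rack is not None}
    if policy == "unique-rack":
        # Option 1: each broker without a rack is treated as its own rack.
        return {
            b: rack if rack is not None else f"__broker-{b}"
            for b, rack in broker_racks.items()
        }
    if policy == "ignore-racks":
        # Option 2: disregard all rack information.
        return None
    if policy == "fail-fast":
        # Option 3: surface the configuration mistake immediately.
        raise IncompleteRackInfoError(f"brokers without rack: {missing}")
    raise ValueError(f"unknown policy: {policy}")


racks = {0: "rack-a", 1: "rack-b", 2: None}
print(resolve_racks(racks, "unique-rack"))
print(resolve_racks(racks, "ignore-racks"))
```

With "fail-fast" the same input raises IncompleteRackInfoError instead of silently producing an assignment that is not rack aware.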
> > >> > > > >> > >>> > > > > > > >
> > >> > > > >> > >>> > > > > > > > On Thu, Oct 15, 2015 at 10:17 AM, Gwen Shapira <gwen@confluent.io> wrote:
> > >> > > > >> > >>> > > > > > > >
> > >> > > > >> > >>> > > > > > > >> Thanks! Just to clarify, when some brokers have a rack
> > >> > > > >> > >>> > > > > > > >> assignment and some don't, do we act like none of them have
> > >> > > > >> > >>> > > > > > > >> it, or like those without an assignment are in their own
> > >> > > > >> > >>> > > > > > > >> rack?
> > >> > > > >> > >>> > > > > > > >>
> > >> > > > >> > >>> > > > > > > >> The first scenario is good when first setting up
> > >> > > > >> > >>> > > > > > > >> rack-awareness, but the second makes more sense for ongoing
> > >> > > > >> > >>> > > > > > > >> maintenance (I can totally see someone adding a node and
> > >> > > > >> > >>> > > > > > > >> forgetting to set the rack property; we don't want this to
> > >> > > > >> > >>> > > > > > > >> change behavior for anything except the new node).
> > >> > > > >> > >>> > > > > > > >>
> > >> > > > >> > >>> > > > > > > >> What do you think?
> > >> > > > >> > >>> > > > > > > >>
> > >> > > > >> > >>> > > > > > > >> Gwen
> > >> > > > >> > >>> > > > > > > >>
> > >> > > > >> > >>> > > > > > > >> On Thu, Oct 15, 2015 at 10:13 AM, Allen Wang <allenxwang@gmail.com> wrote:
> > >> > > > >> > >>> > > > > > > >>
> > >> > > > >> > >>> > > > > > > >> > For scenario 1:
> > >> > > > >> > >>> > > > > > > >> >
> > >> > > > >> > >>> > > > > > > >> > - Add the rack information to the broker property file, or
> > >> > > > >> > >>> > > > > > > >> > dynamically set it in the wrapper code that bootstraps the
> > >> > > > >> > >>> > > > > > > >> > Kafka server. You would do that for all brokers and
> > >> > > > >> > >>> > > > > > > >> > restart the brokers one by one.
> > >> > > > >> > >>> > > > > > > >> >
> > >> > > > >> > >>> > > > > > > >> > In this scenario, the complete broker-to-rack mapping may
> > >> > > > >> > >>> > > > > > > >> > not be available until every broker is restarted. During
> > >> > > > >> > >>> > > > > > > >> > that time we fall back to the default replica assignment
> > >> > > > >> > >>> > > > > > > >> > algorithm.
> > >> > > > >> > >>> > > > > > > >> >
> > >> > > > >> > >>> > > > > > > >> > For scenario 2:
> > >> > > > >> > >>> > > > > > > >> >
> > >> > > > >> > >>> > > > > > > >> > - Add the rack information to the broker property file, or
> > >> > > > >> > >>> > > > > > > >> > dynamically set it in the wrapper code, and start the
> > >> > > > >> > >>> > > > > > > >> > broker.
> > >> > > > >> > >>> > > > > > > >> >
> > >> > > > >> > >>> > > > > > > >> >
> > >> > > > >> > >>> > > > > > > >> > On Wed, Oct 14, 2015 at 2:36 PM, Gwen Shapira <gwen@confluent.io> wrote:
> > >> > > > >> > >>> > > > > > > >> >
> > >> > > > >> > >>> > > > > > > >> > > Can you clarify the workflow for the following scenarios:
> > >> > > > >> > >>> > > > > > > >> > >
> > >> > > > >> > >>> > > > > > > >> > > 1. I currently have 6 brokers and want to add rack
> > >> > > > >> > >>> > > > > > > >> > > information for each
> > >> > > > >> > >>> > > > > > > >> > > 2. I'm adding a new broker and I want to specify which
> > >> > > > >> > >>> > > > > > > >> > > rack it belongs on while adding it.
> > >> > > > >> > >>> > > > > > > >> > >
> > >> > > > >> > >>> > > > > > > >> > > Thanks!
> > >> > > > >> > >>> > > > > > > >> > >
> > >> > > > >> > >>> > > > > > > >> > > On Tue, Oct 13, 2015 at 2:21 PM, Allen Wang <allenxwang@gmail.com> wrote:
> > >> > > > >> > >>> > > > > > > >> > >
> > >> > > > >> > >>> > > > > > > >> > > > We discussed the KIP in the hangout today. The
> > >> > > > >> > >>> > > > > > > >> > > > recommendation is to make rack a broker property in
> > >> > > > >> > >>> > > > > > > >> > > > ZooKeeper. Users with existing rack information stored
> > >> > > > >> > >>> > > > > > > >> > > > somewhere would need to retrieve the information at
> > >> > > > >> > >>> > > > > > > >> > > > broker startup and dynamically set the rack property,
> > >> > > > >> > >>> > > > > > > >> > > > which can be implemented as a wrapper that bootstraps
> > >> > > > >> > >>> > > > > > > >> > > > the broker. There will be no interface or pluggable
> > >> > > > >> > >>> > > > > > > >> > > > implementation to retrieve the rack information.
> > >> > > > >> > >>> > > > > > > >> > > >
> > >> > > > >> > >>> > > > > > > >> > > > The assumption is that you always need to restart the
> > >> > > > >> > >>> > > > > > > >> > > > broker to make a change to the rack.
> > >> > > > >> > >>> > > > > > > >> > > >
> > >> > > > >> > >>> > > > > > > >> > > > Once the rack becomes a broker property, it will be
> > >> > > > >> > >>> > > > > > > >> > > > possible to make rack part of the metadata to help the
> > >> > > > >> > >>> > > > > > > >> > > > consumer choose which in-sync replica to consume from
> > >> > > > >> > >>> > > > > > > >> > > > as part of a future consumer enhancement.
> > >> > > > >> > >>> > > > > > > >> > > >
> > >> > > > >> > >>> > > > > > > >> > > > I will update the KIP.
> > >> > > > >> > >>> > > > > > > >> > > >
> > >> > > > >> > >>> > > > > > > >> > > > Thanks,
> > >> > > > >> > >>> > > > > > > >> > > > Allen
> > >> > > > >> > >>> > > > > > > >> > > >
> > >> > > > >> > >>> > > > > > > >> > > >
> > >> > > > >> > >>> > > > > > > >> > > > On Thu, Oct 8, 2015 at 9:23 AM, Allen Wang <allenxwang@gmail.com> wrote:
> > >> > > > >> > >>> > > > > > > >> > > >
> > >> > > > >> > >>> > > > > > > >> > > > > I attended Tuesday's KIP hangout, but this KIP was not
> > >> > > > >> > >>> > > > > > > >> > > > > discussed due to time constraints.
> > >> > > > >> > >>> > > > > > > >> > > > >
> > >> > > > >> > >>> > > > > > > >> > > > > However, after hearing the discussion of KIP-35, I
> > >> > > > >> > >>> > > > > > > >> > > > > have the feeling that incompatibility (caused by a new
> > >> > > > >> > >>> > > > > > > >> > > > > broker property) between brokers with different
> > >> > > > >> > >>> > > > > > > >> > > > > versions will be solved there. In addition, having
> > >> > > > >> > >>> > > > > > > >> > > > > rack in the broker properties as metadata may also
> > >> > > > >> > >>> > > > > > > >> > > > > help consumers in the future. So I am open to adding a
> > >> > > > >> > >>> > > > > > > >> > > > > rack property to the broker.
> > >> > > > >> > >>> > > > > > > >> > > > >
> > >> > > > >> > >>> > > > > > > >> > > > > Hopefully we can discuss this in the next KIP hangout.
> > >> > > > >> > >>> > > > > > > >> > > > >
> > >> > > > >> > >>> > > > > > > >> > > > > On Wed, Sep 30, 2015 at 2:46 PM, Allen Wang <allenxwang@gmail.com> wrote:
> > >> > > > >> > >>> > > > > > > >> > > > >
> > >> > > > >> > >>> > > > > > > >> > > > >> Can you send me the information on the next KIP
> > >> > > > >> > >>> > > > > > > >> > > > >> hangout?
> > >> > > > >> > >>> > > > > > > >> > > > >>
> > >> > > > >> > >>> > > > > > > >> > > > >> Currently the broker-rack mapping is not cached. In
> > >> > > > >> > >>> > > > > > > >> > > > >> KafkaApis, RackLocator.getRackInfo() is called each
> > >> > > > >> > >>> > > > > > > >> > > > >> time the mapping is needed for auto topic creation.
> > >> > > > >> > >>> > > > > > > >> > > > >> This ensures the latest mapping is used at any time.
> > >> > > > >> > >>> > > > > > > >> > > > >>
> > >> > > > >> > >>> > > > > > > >> > > > >> The ability to get the complete mapping makes it
> > >> > > > >> > >>> > > > > > > >> > > > >> simple to reuse the same interface in command line
> > >> > > > >> > >>> > > > > > > >> > > > >> tools.
> > >> > > > >> > >>> > > > > > > >> > > > >>
> > >> > > > >> > >>> > > > > > > >> > > > >>
> > >> > > > >> > >>> > > > > > > >> > > > >> On Wed, Sep 30, 2015 at 11:01
> AM,
> > >> > Aditya
> > >> > > > >> > >>> Auradkar <
> > >> > > > >> > >>> > > > > > > >> > > > >> aauradkar@linkedin.com.invalid
> >
> > >> > wrote:
> > >> > > > >> > >>> > > > > > > >> > > > >>
> > >> > > > >> > >>> > > > > > > >> > > > >>> Perhaps we discuss this during
> > the
> > >> > next
> > >> > > > KIP
> > >> > > > >> > >>> hangout?
> > >> > > > >> > >>> > > > > > > >> > > > >>>
> > >> > > > >> > >>> > > > > > > >> > > > >>> I do see that a pluggable rack
> > >> > locator
> > >> > > > can
> > >> > > > >> be
> > >> > > > >> > >>> useful
> > >> > > > >> > >>> > > > but I
> > >> > > > >> > >>> > > > > > do
> > >> > > > >> > >>> > > > > > > >> see a
> > >> > > > >> > >>> > > > > > > >> > > few
> > >> > > > >> > >>> > > > > > > >> > > > >>> concerns:
> > >> > > > >> > >>> > > > > > > >> > > > >>>
> > >> > > > >> > >>> > > > > > > >> > > > >>> - The RackLocator (as
> described
> > in
> > >> > the
> > >> > > > >> > >>> document),
> > >> > > > >> > >>> > > > implies
> > >> > > > >> > >>> > > > > > that
> > >> > > > >> > >>> > > > > > > >> it
> > >> > > > >> > >>> > > > > > > >> > can
> > >> > > > >> > >>> > > > > > > >> > > > >>> discover rack information for any node in the cluster. How does it deal
> > >> > > > >> > >>> > > > > > > >> > > > >>> with rack location changes? For example, if I moved broker id (1) from
> > >> > > > >> > >>> > > > > > > >> > > > >>> rack X to Y, I only have to start that broker with a newer rack config. If
> > >> > > > >> > >>> > > > > > > >> > > > >>> RackLocator discovers broker -> rack information at start up time, any
> > >> > > > >> > >>> > > > > > > >> > > > >>> change to a broker will require bouncing the entire cluster since
> > >> > > > >> > >>> > > > > > > >> > > > >>> createTopic requests can be sent to any node in the cluster.
> > >> > > > >> > >>> > > > > > > >> > > > >>> For this reason it may be simpler to have each node be aware of its own
> > >> > > > >> > >>> > > > > > > >> > > > >>> rack and persist it in ZK during start up time.
> > >> > > > >> > >>> > > > > > > >> > > > >>>
> > >> > > > >> > >>> > > > > > > >> > > > >>> - A pluggable RackLocator relies on an external service being available
> > >> > > > >> > >>> > > > > > > >> > > > >>> to serve rack information.
> > >> > > > >> > >>> > > > > > > >> > > > >>>
> > >> > > > >> > >>> > > > > > > >> > > > >>> Out of curiosity, I looked up how a couple of other systems deal with
> > >> > > > >> > >>> > > > > > > >> > > > >>> zone/rack awareness.
> > >> > > > >> > >>> > > > > > > >> > > > >>> For Cassandra some interesting modes are:
> > >> > > > >> > >>> > > > > > > >> > > > >>> (Property File configuration)
> > >> > > > >> > >>> > > > > > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
> > >> > > > >> > >>> > > > > > > >> > > > >>> (Dynamic inference)
> > >> > > > >> > >>> > > > > > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
> > >> > > > >> > >>> > > > > > > >> > > > >>>
> > >> > > > >> > >>> > > > > > > >> > > > >>> Voldemort does a static node -> zone assignment based on configuration.
> > >> > > > >> > >>> > > > > > > >> > > > >>>
> > >> > > > >> > >>> > > > > > > >> > > > >>> Aditya
> > >> > > > >> > >>> > > > > > > >> > > > >>>
> > >> > > > >> > >>> > > > > > > >> > > > >>> On Wed, Sep 30, 2015 at 10:05 AM, Allen Wang <allenxwang@gmail.com> wrote:
> > >> > > > >> > >>> > > > > > > >> > > > >>>
> > >> > > > >> > >>> > > > > > > >> > > > >>> > I would like to see if we can do both:
> > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > >> > > > >> > >>> > > > > > > >> > > > >>> > - Make RackLocator pluggable to facilitate migration with existing
> > >> > > > >> > >>> > > > > > > >> > > > >>> > broker-rack mapping
> > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > >> > > > >> > >>> > > > > > > >> > > > >>> > - Make rack an optional property for broker. If rack is available from
> > >> > > > >> > >>> > > > > > > >> > > > >>> > broker, treat it as source of truth. For users with existing broker-rack
> > >> > > > >> > >>> > > > > > > >> > > > >>> > mapping somewhere else, they can use the pluggable way or they can transfer
> > >> > > > >> > >>> > > > > > > >> > > > >>> > the mapping to the broker rack property.
> > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > >> > > > >> > >>> > > > > > > >> > > > >>> > One thing I am not sure is what happens at rolling upgrade when we have
> > >> > > > >> > >>> > > > > > > >> > > > >>> > rack as a broker property. For brokers with older version of Kafka, will it
> > >> > > > >> > >>> > > > > > > >> > > > >>> > cause problem for them? If so, is there any workaround? I also think it
> > >> > > > >> > >>> > > > > > > >> > > > >>> > would be better not to have rack in the controller wire protocol but not
> > >> > > > >> > >>> > > > > > > >> > > > >>> > sure if it is achievable.
> > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > >> > > > >> > >>> > > > > > > >> > > > >>> > Thanks,
> > >> > > > >> > >>> > > > > > > >> > > > >>> > Allen
> > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > >> > > > >> > >>> > > > > > > >> > > > >>> > On Mon, Sep 28, 2015 at 4:55 PM, Todd Palino <tpalino@gmail.com> wrote:
> > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > I tend to like the idea of a pluggable locator. For example, we already
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > have an interface for discovering information about the physical location
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > of servers. I don't relish the idea of having to maintain data in multiple
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > places.
> > >> > > > >> > >>> > > > > > > >> > > > >>> > >
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > -Todd
> > >> > > > >> > >>> > > > > > > >> > > > >>> > >
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > On Mon, Sep 28, 2015 at 4:48 PM, Aditya Auradkar <aauradkar@linkedin.com.invalid> wrote:
> > >> > > > >> > >>> > > > > > > >> > > > >>> > >
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > Thanks for starting this KIP Allen.
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > I agree with Gwen that having a RackLocator class that is pluggable seems
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > to be too complex. The KIP refers to potentially non-ZK storage for the
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > rack info which I don't think is necessary.
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > Perhaps we can persist this info in zk under /brokers/ids/<broker_id>
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > similar to other broker properties and add a config in KafkaConfig called
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > "rack".
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy, "rack": "abc"}
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > Aditya
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira <gwen@confluent.io> wrote:
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > Hi,
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > First, thanks for putting out a KIP for this. This is super important for
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > production deployments of Kafka.
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > Few questions:
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > 1) Are we sure we want "as many racks as possible"? I'd want to balance
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > between safety (more racks) and network utilization (traffic within a rack
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > uses the high-bandwidth TOR switch). One replica on a different rack and
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > the rest on same rack (if possible) sounds better to me.
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > 2) Rack-locator class seems overly complex compared to adding a
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > rack.number property to the broker properties file. Why do we want that?
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > Gwen
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang <allenxwang@gmail.com> wrote:
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > Hello Kafka Developers,
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > I just created KIP-36 for rack aware replica assignment.
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > The goal is to utilize the isolation provided by the racks in data center
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > and distribute replicas to racks to provide fault tolerance.
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > Comments are welcome.
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > Thanks,
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > Allen
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > >> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > >> > > > >> > >>> > > > > > > >> > > > >>> > >
> > >> > > > >> > >>> > > > > > > >> > > > >>> >
> > >> > > > >> > >>> > > > > > > >> > > > >>>
> > >> > > > >> > >>> > > > > > > >> > > > >>
> > >> > > > >> > >>> > > > > > > >> > > > >>
> > >> > > > >> > >>> > > > > > > >> > > > >
> > >> > > > >> > >>> > > > > > > >> > > >
> > >> > > > >> > >>> > > > > > > >> > >
> > >> > > > >> > >>> > > > > > > >> >
> > >> > > > >> > >>> > > > > > > >>
> > >> > > > >> > >>> > > > > > > >
> > >> > > > >> > >>> > > > > > > >
> > >> > > > >> > >>> > > > > > >
> > >> > > > >> > >>> > > > > >
> > >> > > > >> > >>> > > > >
> > >> > > > >> > >>> > > >
> > >> > > > >> > >>> > >
> > >> > > > >> > >>> >
> > >> > > > >> > >>>
> > >> > > > >> > >>>
> > >> > > > >> > >>>
> > >> > > > >> > >>> --
> > >> > > > >> > >>> Thanks,
> > >> > > > >> > >>> Neha
> > >> > > > >> > >>>
> > >> > > > >> > >>
> > >> > > > >> > >>
> > >> > > > >> > >
> > >> > > > >> >
> > >> > > > >>
> > >> > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Jun Rao <ju...@confluent.io>.
Allen,

Another way to do this is the following.

When inter.broker.protocol.version is set to 0.9.0, the broker will write
the broker info in ZK using version 2, ignoring the rack info.

When inter.broker.protocol.version is set to 0.9.1, the broker will write
the broker info in ZK using version 3, including the rack info.

If one follows the upgrade process, after the 2nd round of rolling bounces,
every broker is capable of parsing version 3 of broker info in ZK. This is
when the rack-aware feature will be used.
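
To make the version gating above concrete, here is a minimal sketch in
Python. It is not Kafka code: the helper name broker_registration_json is
hypothetical, the field set is only the subset mentioned earlier in this
thread (jmx_port, host, port, rack), and it assumes that
inter.broker.protocol.version is a plain dotted numeric string and that an
unconfigured rack is simply left out of the version 3 JSON.

```python
import json


def broker_registration_json(protocol_version, host, port, rack=None):
    """Build the broker info string a broker would write to ZK.

    Hypothetical helper for illustration only. Version 2 is written when
    inter.broker.protocol.version is below 0.9.1 (rack is ignored);
    version 3 is written otherwise (rack is included if configured).
    """
    # Assumption: the version is dotted and numeric, e.g. "0.9.0" or "0.9.1".
    parsed = tuple(int(part) for part in protocol_version.split("."))

    info = {"jmx_port": -1, "host": host, "port": port}
    if parsed >= (0, 9, 1):
        info["version"] = 3
        if rack is not None:
            info["rack"] = rack
    else:
        # Older protocol version: keep writing version 2 and drop the rack
        # so that brokers that only parse versions 1 and 2 are not broken.
        info["version"] = 2
    return json.dumps(info, sort_keys=True)


# First round of rolling bounces: new code, old protocol version.
print(broker_registration_json("0.9.0", "xxx", 9092, rack="abc"))
# Second round: protocol version bumped, rack is now written.
print(broker_registration_json("0.9.1", "xxx", 9092, rack="abc"))
```

The point of the sketch is that the rack only shows up in ZK once the
protocol version is bumped, so brokers still running the old code never see
a version they cannot parse.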


Thanks,

Jun

On Tue, Jan 12, 2016 at 12:19 PM, Allen Wang <al...@gmail.com> wrote:

> Regarding the JSON version of Broker:
>
> I don't know why ZkUtils.getBrokerInfo() restricts the JSON versions it can
> read. It throws an exception if the version is not 1 or 2. It seems to me
> that this will cause compatibility problems whenever the version needs to
> change, and will make the upgrade path difficult.
>
> One option we have is to make rack also part of version 2 and keep the
> version 2 unchanged for this update. This will make the old clients
> compatible. During rolling upgrade, it will also avoid problems if the
> controller/broker is still the old version.
>
> However, ZkUtils.getBrokerInfo() will be updated to return the Broker with
> rack so the rack information will be available once the server/client is
> upgraded to the latest version.
>
>
>
> On Wed, Jan 6, 2016 at 6:28 PM, Allen Wang <al...@gmail.com> wrote:
>
> > Updated KIP according to Jun's comment and included changes to TMR.
> >
> > On Tue, Jan 5, 2016 at 5:59 PM, Jun Rao <ju...@confluent.io> wrote:
> >
> >> Hi, Allen,
> >>
> >> A couple of minor comments on the KIP.
> >>
> >> 1. The version of the broker JSON string says 2. It should be 3.
> >>
> >> 2. The new version of UpdateMetadataRequest should be 2, instead of 1.
> >> Could you include the full wire protocol of version 2 of
> >> UpdateMetadataRequest and highlight the changed part?
> >>
> >> Thanks,
> >>
> >> Jun
> >>
> >> On Tue, Jan 5, 2016 at 3:11 PM, Allen Wang <al...@gmail.com> wrote:
> >>
> >> > Jun and I had a chance to discuss it in a meeting and it is agreed to
> >> > change the TMR in a different patch.
> >> >
> >> > I can change the KIP to include rack in TMR. The essential change is to
> >> > add rack into class BrokerEndPoint and make TMR version aware.
> >> >
> >> >
> >> >
> >> > On Tue, Jan 5, 2016 at 10:21 AM, Aditya Auradkar <
> >> > aauradkar@linkedin.com.invalid> wrote:
> >> >
> >> > > Jun/Allen -
> >> > >
> >> > > Did we ever actually agree on whether we should evolve the TMR to include
> >> > > rack info or not?
> >> > > I don't feel strongly about it, but if it's the right thing to do we
> >> > > should probably do it in this KIP (can be a separate patch). It isn't a
> >> > > large change.
> >> > >
> >> > > Aditya
> >> > >
> >> > > On Sat, Dec 26, 2015 at 3:01 PM, Allen Wang <al...@gmail.com> wrote:
> >> > >
> >> > > > Added the rolling upgrade instructions in the KIP, similar to those in
> >> > > > the 0.9.0 release notes.
> >> > > >
> >> > > > On Wed, Dec 16, 2015 at 11:32 AM, Allen Wang <allenxwang@gmail.com> wrote:
> >> > > >
> >> > > > > Hi Jun,
> >> > > > >
> >> > > > > The reason that TopicMetadataResponse is not included in the KIP is that
> >> > > > > it currently is not version aware. So we need to introduce a version to it
> >> > > > > in order to ensure backward compatibility. It seems to me a big change.
> >> > > > > Do we want to couple it with this KIP? Do we need to further discuss what
> >> > > > > information to include in the new version besides rack? For example, should
> >> > > > > we include broker security protocol in TopicMetadataResponse?
> >> > > > >
> >> > > > > The other option is to make it a separate KIP to make
> >> > > > > TopicMetadataResponse version aware and decide what to include, and make
> >> > > > > this KIP focus on the rack aware algorithm, admin tools and related
> >> > > > > changes to inter-broker protocol.
> >> > > > >
> >> > > > > Thanks,
> >> > > > > Allen
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > On Mon, Dec 14, 2015 at 8:30 AM, Jun Rao <ju...@confluent.io> wrote:
> >> > > > >
> >> > > > >> Allen,
> >> > > > >>
> >> > > > >> Thanks for the proposal. A few comments.
> >> > > > >>
> >> > > > >> 1. Since this KIP changes the inter broker communication protocol
> >> > > > >> (UpdateMetadataRequest), we will need to document the upgrade path
> >> > > > >> (similar to what's described in
> >> > > > >> http://kafka.apache.org/090/documentation.html#upgrade).
> >> > > > >>
> >> > > > >> 2. It might be useful to include the rack info of the broker in
> >> > > > >> TopicMetadataResponse. This can be useful for administrative tasks, as
> >> > > > >> well as read affinity in the future.
> >> > > > >>
> >> > > > >> Jun
> >> > > > >>
> >> > > > >>
> >> > > > >>
> >> > > > >> On Thu, Dec 10, 2015 at 9:38 AM, Allen Wang <allenxwang@gmail.com> wrote:
> >> > > > >>
> >> > > > >> > If there are no more comments I would like to call for a vote.
> >> > > > >> >
> >> > > > >> >
> >> > > > >> > On Sun, Nov 15, 2015 at 10:08 PM, Allen Wang <allenxwang@gmail.com> wrote:
> >> > > > >> >
> >> > > > >> > > KIP is updated with more details and how to handle the situation where
> >> > > > >> > > rack information is incomplete.
> >> > > > >> > >
> >> > > > >> > > In the situation where rack information is incomplete, but we want to
> >> > > > >> > > continue with the assignment, I have suggested ignoring all rack
> >> > > > >> > > information and falling back to the original algorithm. The reason is
> >> > > > >> > > explained below:
> >> > > > >> > >
> >> > > > >> > > The other options are to assume that each broker without a rack belongs to
> >> > > > >> > > its own unique rack, or that they all belong to one "default" rack. Either
> >> > > > >> > > way we choose, it is highly likely to result in an uneven number of brokers
> >> > > > >> > > in racks, and it is quite possible that the "made up" racks will have far
> >> > > > >> > > fewer brokers. As I explained in the KIP, an uneven number of brokers in
> >> > > > >> > > racks will lead to uneven distribution of replicas among brokers (even
> >> > > > >> > > though the leader distribution is still even). The brokers in the rack that
> >> > > > >> > > has fewer brokers will get more replicas per broker than brokers in other
> >> > > > >> > > racks.
> >> > > > >> > >
> >> > > > >> > > Given this fact, and that the replica assignment produced will be incorrect
> >> > > > >> > > anyway from a rack aware point of view, ignoring all rack information and
> >> > > > >> > > falling back to the original algorithm is not a bad choice since it will at
> >> > > > >> > > least have a better guarantee of replica distribution.
> >> > > > >> > >
> >> > > > >> > > Also for command line tools it gives the user a choice if for any reason they
> >> > > > >> > > want to ignore rack information and fall back to the original algorithm.
> >> > > > >> > >
> >> > > > >> > >
> >> > > > >> > > On Tue, Nov 10, 2015 at 9:04 AM, Allen Wang <allenxwang@gmail.com> wrote:
> >> > > > >> > >
> >> > > > >> > >> I have been busy with some time-pressing issues for the last few days. I
> >> > > > >> > >> will think about how the incomplete rack information will affect the
> >> > > > >> > >> balance and update the KIP by early next week.
> >> > > > >> > >>
> >> > > > >> > >> Thanks,
> >> > > > >> > >> Allen
> >> > > > >> > >>
> >> > > > >> > >>
> >> > > > >> > >> On Tue, Nov 3, 2015 at 9:03 AM, Neha Narkhede <neha@confluent.io> wrote:
> >> > > > >> > >>
> >> > > > >> > >>> Few suggestions on improving the KIP
> >> > > > >> > >>>
> >> > > > >> > >>> *If some brokers have rack, and some do not, the algorithm will throw an
> >> > > > >> > >>> exception. This is to prevent incorrect assignment caused by user error.*
> >> > > > >> > >>>
> >> > > > >> > >>>
> >> > > > >> > >>> In the KIP, can you clearly state the user-facing behavior when some
> >> > > > >> > >>> brokers have rack information and some don't? Which actions and requests
> >> > > > >> > >>> will error out and how?
> >> > > > >> > >>>
> >> > > > >> > >>> *Even distribution of partition leadership among brokers*
> >> > > > >> > >>>
> >> > > > >> > >>>
> >> > > > >> > >>> There is some information about arranging the sorted broker list interlaced
> >> > > > >> > >>> with rack ids. Can you describe the changes to the current algorithm in a
> >> > > > >> > >>> little more detail? How does this interlacing work if only a subset of
> >> > > > >> > >>> brokers have the rack id configured? Does this still work if uneven # of
> >> > > > >> > >>> brokers are assigned to each rack? It might work, I'm looking for more
> >> > > > >> > >>> details on the changes, since it will affect the behavior seen by the user
> >> > > > >> > >>> - imbalance on either the leaders or data or both.
> >> > > > >> > >>>
> >> > > > >> > >>> On Mon, Nov 2, 2015 at 6:39 PM, Aditya Auradkar <aauradkar@linkedin.com>
> >> > > > >> > >>> wrote:
> >> > > > >> > >>>
> >> > > > >> > >>> > I think this sounds reasonable. Anyone else have comments?
> >> > > > >> > >>> >
> >> > > > >> > >>> > Aditya
> >> > > > >> > >>> >
> >> > > > >> > >>> > On Tue, Oct 27, 2015 at 5:23 PM, Allen Wang <allenxwang@gmail.com> wrote:
> >> > > > >> > >>> >
> >> > > > >> > >>> > > During the discussion in the hangout, it was mentioned that it would be
> >> > > > >> > >>> > > desirable for consumers to know the rack information of the brokers so
> >> > > > >> > >>> > > that they can consume from the broker in the same rack to reduce latency.
> >> > > > >> > >>> > > As I understand it, this will only be beneficial if a consumer can consume
> >> > > > >> > >>> > > from any broker in ISR, which is not possible now.
> >> > > > >> > >>> > >
> >> > > > >> > >>> > > I suggest we skip the change to TMR. Once the change is made to the
> >> > > > >> > >>> > > consumer to be able to consume from any broker in ISR, the rack
> >> > > > >> > >>> > > information can be added to TMR.
> >> > > > >> > >>> > >
> >> > > > >> > >>> > > Another thing I want to confirm is command line behavior. I think the
> >> > > > >> > >>> > > desirable default behavior is to fail fast on command line for incomplete
> >> > > > >> > >>> > > rack mapping. The error message can include further instruction that
> >> > > > >> > >>> > > tells the user to add an extra argument (like "--allow-partial-rackinfo")
> >> > > > >> > >>> > > to suppress the error and do an imperfect rack aware assignment. If the
> >> > > > >> > >>> > > default behavior is to allow incomplete mapping, the error can still be
> >> > > > >> > >>> > > easily missed.
> >> > > > >> > >>> > >
> >> > > > >> > >>> > > The affected command line tools are TopicCommand and
> >> > > > >> > >>> > > ReassignPartitionsCommand.
> >> > > > >> > >>> > >
> >> > > > >> > >>> > > Thanks,
> >> > > > >> > >>> > > Allen
> >> > > > >> > >>> > >
> >> > > > >> > >>> > >
> >> > > > >> > >>> > >
> >> > > > >> > >>> > >
> >> > > > >> > >>> > >
> >> > > > >> > >>> > > On Mon, Oct 26, 2015 at 12:55 PM, Aditya Auradkar <aauradkar@linkedin.com>
> >> > > > >> > >>> > > wrote:
> >> > > > >> > >>> > >
> >> > > > >> > >>> > > > Hi Allen,
> >> > > > >> > >>> > > >
> >> > > > >> > >>> > > > For TopicMetadataResponse to understand version, you can bump up the
> >> > > > >> > >>> > > > request version itself. Based on the version of the request, the response
> >> > > > >> > >>> > > > can be appropriately serialized. It shouldn't be a huge change. For
> >> > > > >> > >>> > > > example: We went through something similar for ProduceRequest recently
> >> > > > >> > >>> > > > (https://reviews.apache.org/r/33378/)
> >> > > > >> > >>> > > > I guess the reason protocol information is not included in the TMR is
> >> > > > >> > >>> > > > because the topic itself is independent of any particular protocol (SSL
> >> > > > >> > >>> > > > vs Plaintext). Having said that, I'm not sure we even need rack
> >> > > > >> > >>> > > > information in TMR. What use case were you thinking of initially?
> >> > > > >> > >>> > > >
> >> > > > >> > >>> > > > For 1 - I'd be fine with adding an option to the command line tools that
> >> > > > >> > >>> > > > checks rack assignment, e.g. "--strict-assignment" or something
> >> > > > >> > >>> > > > similar.
> >> > > > >> > >>> > > >
> >> > > > >> > >>> > > > Aditya
> >> > > > >> > >>> > > >
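To make the suggestion above concrete (bump the request version and serialize the response according to it), here is a minimal sketch. The function name, the field names, and the rule that the rack appears from version 1 onward are all assumptions for illustration; this is not Kafka's actual wire format.

```python
from typing import Optional


def serialize_broker(broker_id: int, host: str, port: int,
                     rack: Optional[str], version: int) -> dict:
    """Build one broker entry of a metadata response for a request version.

    Hypothetical layout: version 0 clients never see a rack field, while
    version 1 and later clients receive it (None when not configured).
    """
    entry = {"id": broker_id, "host": host, "port": port}
    if version >= 1:
        # Only clients that sent the newer request version know this field.
        entry["rack"] = rack
    return entry
```

Under these assumptions an old client that sends version 0 gets exactly the entry it got before, so adding the field would not break it.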
> >> > > > >> > >>> > > > On Thu, Oct 22, 2015 at 6:44 PM, Allen Wang <
> >> > > > >> > allenxwang@gmail.com>
> >> > > > >> > >>> > > wrote:
> >> > > > >> > >>> > > >
> >> > > > >> > >>> > > > > For 2 and 3, I have updated the KIP. Please take
> a
> >> > look.
> >> > > > One
> >> > > > >> > >>> thing I
> >> > > > >> > >>> > > have
> >> > > > >> > >>> > > > > changed is removing the proposal to add rack to
> >> > > > >> > >>> > TopicMetadataResponse.
> >> > > > >> > >>> > > > The
> >> > > > >> > >>> > > > > reason is that unlike UpdateMetadataRequest,
> >> > > > >> > >>> TopicMetadataResponse
> >> > > > >> > >>> > does
> >> > > > >> > >>> > > > not
> >> > > > >> > >>> > > > > understand version. I don't see a way to include
> >> rack
> >> > > > >> without
> >> > > > >> > >>> > breaking
> >> > > > >> > >>> > > > old
> >> > > > >> > >>> > > > > version of clients. That's probably why secure
> >> > protocol
> >> > > is
> >> > > > >> not
> >> > > > >> > >>> > included
> >> > > > >> > >>> > > > in
> >> > > > >> > >>> > > > > the TopicMetadataResponse either. I think it will
> >> be a
> >> > > > much
> >> > > > >> > >>> bigger
> >> > > > >> > >>> > > change
> >> > > > >> > >>> > > > > to include rack in TopicMetadataResponse.
> >> > > > >> > >>> > > > >
> >> > > > >> > >>> > > > > For 1, my concern is that doing rack aware
> >> assignment
> >> > > > >> without
> >> > > > >> > >>> > complete
> >> > > > >> > >>> > > > > broker to rack mapping will result in assignment
> >> that
> >> > is
> >> > > > not
> >> > > > >> > rack
> >> > > > >> > >>> > aware
> >> > > > >> > >>> > > > and
> >> > > > >> > >>> > > > > fail to provide fault tolerance in the event of
> >> rack
> >> > > > outage.
> >> > > > >> > This
> >> > > > >> > >>> > kind
> >> > > > >> > >>> > > of
> >> > > > >> > >>> > > > > problem will be difficult to surface. And the
> cost
> >> of
> >> > > this
> >> > > > >> > >>> problem is
> >> > > > >> > >>> > > > high:
> >> > > > >> > >>> > > > > you have to do partition reassignment if you are
> >> lucky
> >> > > to
> >> > > > >> spot
> >> > > > >> > >>> the
> >> > > > >> > >>> > > > problem
> >> > > > >> > >>> > > > > early on or face the consequence of data loss
> >> during
> >> > > real
> >> > > > >> rack
> >> > > > >> > >>> > outage.
> >> > > > >> > >>> > > > >
> >> > > > >> > >>> > > > > I do see the concern of fail-fast as it might
> also
> >> > cause
> >> > > > >> data
> >> > > > >> > >>> loss if
> >> > > > >> > >>> > > > > producer is not able to produce the message due to
> >> topic
> >> > > > >> creation
> >> > > > >> > >>> > failure.
> >> > > > >> > >>> > > > Is
> >> > > > >> > >>> > > > > it feasible to treat dynamic topic creation and
> >> > command
> >> > > > >> tools
> >> > > > >> > >>> > > > differently?
> >> > > > >> > >>> > > > > We allow dynamic topic creation with incomplete
> >> > > > broker-rack
> >> > > > >> > >>> mapping
> >> > > > >> > >>> > and
> >> > > > >> > >>> > > > > fail fast in command line. Another option is to
> let
> >> > user
> >> > > > >> > >>> determine
> >> > > > >> > >>> > the
> >> > > > >> > >>> > > > > behavior for command line. For example, by
> default
> >> > fail
> >> > > > >> fast in
> >> > > > >> > >>> > command
> >> > > > >> > >>> > > > > line but allow incomplete broker-rack mapping if
> >> > another
> >> > > > >> switch
> >> > > > >> > >>> is
> >> > > > >> > >>> > > > > provided.
> >> > > > >> > >>> > > > >
> >> > > > >> > >>> > > > >
> >> > > > >> > >>> > > > >
> >> > > > >> > >>> > > > >
> >> > > > >> > >>> > > > > On Tue, Oct 20, 2015 at 10:05 AM, Aditya
> Auradkar <
> >> > > > >> > >>> > > > > aauradkar@linkedin.com.invalid> wrote:
> >> > > > >> > >>> > > > >
> >> > > > >> > >>> > > > > > Hey Allen,
> >> > > > >> > >>> > > > > >
> >> > > > >> > >>> > > > > > 1. If we choose fail fast topic creation, we
> will
> >> > have
> >> > > > >> topic
> >> > > > >> > >>> > creation
> >> > > > >> > >>> > > > > > failures while upgrading the cluster. I really
> >> doubt
> >> > > we
> >> > > > >> want
> >> > > > >> > >>> this
> >> > > > >> > >>> > > > > behavior.
> >> > > > >> > >>> > > > > > Ideally, this should be invisible to clients
> of a
> >> > > > cluster.
> >> > > > >> > >>> > Currently,
> >> > > > >> > >>> > > > > each
> >> > > > >> > >>> > > > > > broker is effectively its own rack. So we
> >> probably
> >> > can
> >> > > > use
> >> > > > >> > the
> >> > > > >> > >>> rack
> >> > > > >> > >>> > > > > > information whenever possible but not make it a
> >> hard
> >> > > > >> > >>> requirement.
> >> > > > >> > >>> > To
> >> > > > >> > >>> > > > > extend
> >> > > > >> > >>> > > > > > Gwen's example, one badly configured broker
> >> should
> >> > not
> >> > > > >> > degrade
> >> > > > >> > >>> > topic
> >> > > > >> > >>> > > > > > creation for the entire cluster.
> >> > > > >> > >>> > > > > >
> >> > > > >> > >>> > > > > > 2. Upgrade scenario - Can you add a section on
> >> the
> >> > > > upgrade
> >> > > > >> > >>> piece to
> >> > > > >> > >>> > > > > confirm
> >> > > > >> > >>> > > > > > that old clients will not see errors? I believe
> >> > > > >> > >>> > > > > ZookeeperConsumerConnector
> >> > > > >> > >>> > > > > > reads the Broker objects from ZK. I wanted to
> >> > confirm
> >> > > > that
> >> > > > >> > this
> >> > > > >> > >>> > will
> >> > > > >> > >>> > > > not
> >> > > > >> > >>> > > > > > cause any problems.
> >> > > > >> > >>> > > > > >
> >> > > > >> > >>> > > > > > 3. Could you elaborate your proposed changes to
> >> the
> >> > > > >> > >>> > > > UpdateMetadataRequest
> >> > > > >> > >>> > > > > > in the "Public Interfaces" section?
> Personally, I
> >> > find
> >> > > > >> this
> >> > > > >> > >>> format
> >> > > > >> > >>> > > easy
> >> > > > >> > >>> > > > > to
> >> > > > >> > >>> > > > > > read in terms of wire protocol changes:
> >> > > > >> > >>> > > > > >
> >> > > > >> > >>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicRequest
> >> > > > >> > >>> > > > > >
> >> > > > >> > >>> > > > > > Aditya
> >> > > > >> > >>> > > > > >
> >> > > > >> > >>> > > > > > On Fri, Oct 16, 2015 at 3:45 PM, Allen Wang <
> >> > > > >> > >>> allenxwang@gmail.com>
> >> > > > >> > >>> > > > > wrote:
> >> > > > >> > >>> > > > > >
> >> > > > >> > >>> > > > > > > KIP is updated to include rack as an optional
> >> > property
> >> > > > for
> >> > > > >> > >>> broker.
> >> > > > >> > >>> > > > Please
> >> > > > >> > >>> > > > > > take
> >> > > > >> > >>> > > > > > > a look and let me know if more details are
> >> needed.
> >> > > > >> > >>> > > > > > >
> >> > > > >> > >>> > > > > > > For the case where some brokers have rack and
> >> some
> >> > > do
> >> > > > >> not,
> >> > > > >> > >>> the
> >> > > > >> > >>> > > > current
> >> > > > >> > >>> > > > > > KIP
> >> > > > >> > >>> > > > > > > uses the fail-fast behavior. If there are
> >> > concerns,
> >> > > we
> >> > > > >> can
> >> > > > >> > >>> > further
> >> > > > >> > >>> > > > > > discuss
> >> > > > >> > >>> > > > > > > this in the email thread or next hangout.
> >> > > > >> > >>> > > > > > >
> >> > > > >> > >>> > > > > > >
> >> > > > >> > >>> > > > > > >
> >> > > > >> > >>> > > > > > > On Thu, Oct 15, 2015 at 10:42 AM, Allen Wang
> <
> >> > > > >> > >>> > allenxwang@gmail.com
> >> > > > >> > >>> > > >
> >> > > > >> > >>> > > > > > wrote:
> >> > > > >> > >>> > > > > > >
> >> > > > >> > >>> > > > > > > > That's a good question. I can think of three actions if the rack
> >> > > > >> > >>> > > > > > > > information is incomplete:
> >> > > > >> > >>> > > > > > > >
> >> > > > >> > >>> > > > > > > > 1. Treat the node without rack as if it is on its own unique rack
> >> > > > >> > >>> > > > > > > > 2. Disregard all rack information and fall back to the current algorithm
> >> > > > >> > >>> > > > > > > > 3. Fail fast
> >> > > > >> > >>> > > > > > > >
> >> > > > >> > >>> > > > > > > > Now that I think about it, one and three make more sense. The reason for
> >> > > > >> > >>> > > > > > > > fail-fast is that a user's mistake of not providing the rack may never be
> >> > > > >> > >>> > > > > > > > found if we tolerate it. The assignment may then not be rack aware as the
> >> > > > >> > >>> > > > > > > > user expected, which creates debugging problems when things fail.
> >> > > > >> > >>> > > > > > > >
> >> > > > >> > >>> > > > > > > > What do you think? If not fail-fast, is there any way we can make the user
> >> > > > >> > >>> > > > > > > > error stand out?
> >> > > > >> > >>> > > > > > > >
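The three actions listed above can be sketched as a single function that takes a broker-to-rack mapping in which some racks may be missing. This is only an illustration: the policy names, the synthetic rack label "__broker-<id>", and the exception type are hypothetical and do not come from the KIP.

```python
from typing import Dict, Optional


class IncompleteRackInfoError(Exception):
    """Raised by the fail-fast policy when some broker has no rack."""


def resolve_racks(broker_racks: Dict[int, Optional[str]],
                  policy: str) -> Optional[Dict[int, str]]:
    """Return the rack mapping to use, or None for "not rack aware"."""
    missing = sorted(b for b, rack in broker_racks.items() if rack is None)
    if not missing:
        # Complete mapping: every policy behaves the same.
        return dict(broker_racks)
    if policy == "unique_rack":
        # Action 1: a broker without a rack is placed on its own rack.
        return {b: (rack if rack is not None else f"__broker-{b}")
                for b, rack in broker_racks.items()}
    if policy == "ignore":
        # Action 2: disregard all rack info and use the current algorithm.
        return None
    if policy == "fail_fast":
        # Action 3: surface the configuration mistake immediately.
        raise IncompleteRackInfoError(f"brokers without rack: {missing}")
    raise ValueError(f"unknown policy: {policy}")
```

With {1: "a", 2: None}, "unique_rack" still spreads replicas across two racks, "ignore" silently loses rack awareness, and "fail_fast" raises an error that names broker 2.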
> >> > > > >> > >>> > > > > > > >
> >> > > > >> > >>> > > > > > > > On Thu, Oct 15, 2015 at 10:17 AM, Gwen
> >> Shapira <
> >> > > > >> > >>> > > gwen@confluent.io>
> >> > > > >> > >>> > > > > > > wrote:
> >> > > > >> > >>> > > > > > > >
> >> > > > >> > >>> > > > > > > >> Thanks! Just to clarify, when some brokers
> >> have
> >> > > > rack
> >> > > > >> > >>> > assignment
> >> > > > >> > >>> > > > and
> >> > > > >> > >>> > > > > > some
> >> > > > >> > >>> > > > > > > >> don't, do we act like none of them have
> it?
> >> or
> >> > > like
> >> > > > >> > those
> >> > > > >> > >>> > > without
> >> > > > >> > >>> > > > > > > >> assignment are in their own rack?
> >> > > > >> > >>> > > > > > > >>
> >> > > > >> > >>> > > > > > > >> The first scenario is good when first
> >> setting
> >> > up
> >> > > > >> > >>> > rack-awareness,
> >> > > > >> > >>> > > > but
> >> > > > >> > >>> > > > > > the
> >> > > > >> > >>> > > > > > > >> second makes more sense for on-going
> >> > maintenance
> >> > > (I
> >> > > > >> can
> >> > > > >> > >>> > totally
> >> > > > >> > >>> > > > see
> >> > > > >> > >>> > > > > > > >> someone
> >> > > > >> > >>> > > > > > > >> adding a node and forgetting to set the
> rack
> >> > > > >> property,
> >> > > > >> > we
> >> > > > >> > >>> > don't
> >> > > > >> > >>> > > > want
> >> > > > >> > >>> > > > > > > this
> >> > > > >> > >>> > > > > > > >> to change behavior for anything except the
> >> new
> >> > > > node).
> >> > > > >> > >>> > > > > > > >>
> >> > > > >> > >>> > > > > > > >> What do you think?
> >> > > > >> > >>> > > > > > > >>
> >> > > > >> > >>> > > > > > > >> Gwen
> >> > > > >> > >>> > > > > > > >>
> >> > > > >> > >>> > > > > > > >> On Thu, Oct 15, 2015 at 10:13 AM, Allen
> >> Wang <
> >> > > > >> > >>> > > > allenxwang@gmail.com>
> >> > > > >> > >>> > > > > > > >> wrote:
> >> > > > >> > >>> > > > > > > >>
> >> > > > >> > >>> > > > > > > >> > For scenario 1:
> >> > > > >> > >>> > > > > > > >> >
> >> > > > >> > >>> > > > > > > >> > - Add the rack information to broker
> >> property
> >> > > > file
> >> > > > >> or
> >> > > > >> > >>> > > > dynamically
> >> > > > >> > >>> > > > > > set
> >> > > > >> > >>> > > > > > > >> it in
> >> > > > >> > >>> > > > > > > >> > the wrapper code to bootstrap Kafka
> >> server.
> >> > You
> >> > > > >> would
> >> > > > >> > do
> >> > > > >> > >>> > that
> >> > > > >> > >>> > > > for
> >> > > > >> > >>> > > > > > all
> >> > > > >> > >>> > > > > > > >> > brokers and restart the brokers one by
> >> one.
> >> > > > >> > >>> > > > > > > >> >
> >> > > > >> > >>> > > > > > > >> > In this scenario, the complete broker to
> >> rack
> >> > > > >> mapping
> >> > > > >> > >>> may
> >> > > > >> > >>> > not
> >> > > > >> > >>> > > be
> >> > > > >> > >>> > > > > > > >> available
> >> > > > >> > >>> > > > > > > >> > until every broker is restarted. During
> >> that
> >> > > time
> >> > > > >> we
> >> > > > >> > >>> fall
> >> > > > >> > >>> > back
> >> > > > >> > >>> > > > to
> >> > > > >> > >>> > > > > > > >> default
> >> > > > >> > >>> > > > > > > >> > replica assignment algorithm.
> >> > > > >> > >>> > > > > > > >> >
> >> > > > >> > >>> > > > > > > >> > For scenario 2:
> >> > > > >> > >>> > > > > > > >> >
> >> > > > >> > >>> > > > > > > >> > - Add the rack information to broker
> >> property
> >> > > > file
> >> > > > >> or
> >> > > > >> > >>> > > > dynamically
> >> > > > >> > >>> > > > > > set
> >> > > > >> > >>> > > > > > > >> it in
> >> > > > >> > >>> > > > > > > >> > the wrapper code and start the broker.
> >> > > > >> > >>> > > > > > > >> >
> >> > > > >> > >>> > > > > > > >> >
> >> > > > >> > >>> > > > > > > >> > On Wed, Oct 14, 2015 at 2:36 PM, Gwen
> >> > Shapira <
> >> > > > >> > >>> > > > gwen@confluent.io>
> >> > > > >> > >>> > > > > > > >> wrote:
> >> > > > >> > >>> > > > > > > >> >
> >> > > > >> > >>> > > > > > > >> > > Can you clarify the workflow for the
> >> > > following
> >> > > > >> > >>> scenarios:
> >> > > > >> > >>> > > > > > > >> > >
> >> > > > >> > >>> > > > > > > >> > > 1. I currently have 6 brokers and want
> >> to
> >> > add
> >> > > > >> rack
> >> > > > >> > >>> > > information
> >> > > > >> > >>> > > > > for
> >> > > > >> > >>> > > > > > > >> each
> >> > > > >> > >>> > > > > > > >> > > 2. I'm adding a new broker and I want
> to
> >> > > > specify
> >> > > > >> > which
> >> > > > >> > >>> > rack
> >> > > > >> > >>> > > it
> >> > > > >> > >>> > > > > > > >> belongs on
> >> > > > >> > >>> > > > > > > >> > > while adding it.
> >> > > > >> > >>> > > > > > > >> > >
> >> > > > >> > >>> > > > > > > >> > > Thanks!
> >> > > > >> > >>> > > > > > > >> > >
> >> > > > >> > >>> > > > > > > >> > > On Tue, Oct 13, 2015 at 2:21 PM, Allen
> >> > Wang <
> >> > > > >> > >>> > > > > allenxwang@gmail.com
> >> > > > >> > >>> > > > > > >
> >> > > > >> > >>> > > > > > > >> > wrote:
> >> > > > >> > >>> > > > > > > >> > >
> >> > > > >> > >>> > > > > > > >> > > > We discussed the KIP in the hangout
> >> > today.
> >> > > > The
> >> > > > >> > >>> > > > recommendation
> >> > > > >> > >>> > > > > is
> >> > > > >> > >>> > > > > > > to
> >> > > > >> > >>> > > > > > > >> > make
> >> > > > >> > >>> > > > > > > >> > > > rack as a broker property in
> >> ZooKeeper.
> >> > For
> >> > > > >> users
> >> > > > >> > >>> with
> >> > > > >> > >>> > > > > existing
> >> > > > >> > >>> > > > > > > rack
> >> > > > >> > >>> > > > > > > >> > > > information stored somewhere, they
> >> would
> >> > > need
> >> > > > >> to
> >> > > > >> > >>> > retrieve
> >> > > > >> > >>> > > > the
> >> > > > >> > >>> > > > > > > >> > information
> >> > > > >> > >>> > > > > > > >> > > > at broker start up and dynamically
> set
> >> > the
> >> > > > rack
> >> > > > >> > >>> > property,
> >> > > > >> > >>> > > > > which
> >> > > > >> > >>> > > > > > > can
> >> > > > >> > >>> > > > > > > >> be
> >> > > > >> > >>> > > > > > > >> > > > implemented as a wrapper to
> bootstrap
> >> > > broker.
> >> > > > >> > There
> >> > > > >> > >>> will
> >> > > > >> > >>> > > be
> >> > > > >> > >>> > > > no
> >> > > > >> > >>> > > > > > > >> > interface
> >> > > > >> > >>> > > > > > > >> > > or
> >> > > > >> > >>> > > > > > > >> > > > pluggable implementation to retrieve
> >> the
> >> > > rack
> >> > > > >> > >>> > information.
> >> > > > >> > >>> > > > > > > >> > > >
> >> > > > >> > >>> > > > > > > >> > > > The assumption is that you always
> >> need to
> >> > > > >> restart
> >> > > > >> > >>> the
> >> > > > >> > >>> > > broker
> >> > > > >> > >>> > > > > to
> >> > > > >> > >>> > > > > > > >> make a
> >> > > > >> > >>> > > > > > > >> > > > change to the rack.
> >> > > > >> > >>> > > > > > > >> > > >
> >> > > > >> > >>> > > > > > > >> > > > Once the rack becomes a broker
> >> property,
> >> > it
> >> > > > >> will
> >> > > > >> > be
> >> > > > >> > >>> > > possible
> >> > > > >> > >>> > > > > to
> >> > > > >> > >>> > > > > > > make
> >> > > > >> > >>> > > > > > > >> > rack
> >> > > > >> > >>> > > > > > > >> > > > part of the meta data to help the
> >> > consumer
> >> > > > >> choose
> >> > > > >> > >>> which
> >> > > > >> > >>> > in
> >> > > > >> > >>> > > > > sync
> >> > > > >> > >>> > > > > > > >> replica
> >> > > > >> > >>> > > > > > > >> > > to
> >> > > > >> > >>> > > > > > > >> > > > consume from as part of the future
> >> > consumer
> >> > > > >> > >>> enhancement.
> >> > > > >> > >>> > > > > > > >> > > >
> >> > > > >> > >>> > > > > > > >> > > > I will update the KIP.
> >> > > > >> > >>> > > > > > > >> > > >
> >> > > > >> > >>> > > > > > > >> > > > Thanks,
> >> > > > >> > >>> > > > > > > >> > > > Allen
> >> > > > >> > >>> > > > > > > >> > > >
> >> > > > >> > >>> > > > > > > >> > > >
> >> > > > >> > >>> > > > > > > >> > > > On Thu, Oct 8, 2015 at 9:23 AM,
> Allen
> >> > Wang
> >> > > <
> >> > > > >> > >>> > > > > > allenxwang@gmail.com>
> >> > > > >> > >>> > > > > > > >> > wrote:
> >> > > > >> > >>> > > > > > > >> > > >
> >> > > > >> > >>> > > > > > > >> > > > > I attended Tuesday's KIP hangout
> but
> >> > this
> >> > > > KIP
> >> > > > >> > was
> >> > > > >> > >>> not
> >> > > > >> > >>> > > > > > discussed
> >> > > > >> > >>> > > > > > > >> due
> >> > > > >> > >>> > > > > > > >> > to
> >> > > > >> > >>> > > > > > > >> > > > > time constraint.
> >> > > > >> > >>> > > > > > > >> > > > >
> >> > > > >> > >>> > > > > > > >> > > > > However, after hearing discussion
> of
> >> > > > KIP-35,
> >> > > > >> I
> >> > > > >> > >>> have
> >> > > > >> > >>> > the
> >> > > > >> > >>> > > > > > feeling
> >> > > > >> > >>> > > > > > > >> that
> >> > > > >> > >>> > > > > > > >> > > > > incompatibility (caused by new
> >> broker
> >> > > > >> property)
> >> > > > >> > >>> > between
> >> > > > >> > >>> > > > > > brokers
> >> > > > >> > >>> > > > > > > >> with
> >> > > > >> > >>> > > > > > > >> > > > > different versions will be solved
> >> > there.
> >> > > > In
> >> > > > >> > >>> addition,
> >> > > > >> > >>> > > > > having
> >> > > > >> > >>> > > > > > > >> rack
> >> > > > >> > >>> > > > > > > >> > in
> >> > > > >> > >>> > > > > > > >> > > > > broker property as meta data may
> >> also
> >> > > help
> >> > > > >> > >>> consumers
> >> > > > >> > >>> > in
> >> > > > >> > >>> > > > the
> >> > > > >> > >>> > > > > > > >> future.
> >> > > > >> > >>> > > > > > > >> > So
> >> > > > >> > >>> > > > > > > >> > > I
> >> > > > >> > >>> > > > > > > >> > > > am
> >> > > > >> > >>> > > > > > > >> > > > > open to adding rack property to
> >> > broker.
> >> > > > >> > >>> > > > > > > >> > > > >
> >> > > > >> > >>> > > > > > > >> > > > > Hopefully we can discuss this in
> the
> >> > next
> >> > > > KIP
> >> > > > >> > >>> hangout.
> >> > > > >> > >>> > > > > > > >> > > > >
> >> > > > >> > >>> > > > > > > >> > > > > On Wed, Sep 30, 2015 at 2:46 PM,
> >> Allen
> >> > > > Wang <
> >> > > > >> > >>> > > > > > > allenxwang@gmail.com
> >> > > > >> > >>> > > > > > > >> >
> >> > > > >> > >>> > > > > > > >> > > > wrote:
> >> > > > >> > >>> > > > > > > >> > > > >
> >> > > > >> > >>> > > > > > > >> > > > >> Can you send me the information
> on
> >> the
> >> > > > next
> >> > > > >> KIP
> >> > > > >> > >>> > > hangout?
> >> > > > >> > >>> > > > > > > >> > > > >>
> >> > > > >> > >>> > > > > > > >> > > > >> Currently the broker-rack mapping
> >> is
> >> > not
> >> > > > >> > cached.
> >> > > > >> > >>> In
> >> > > > >> > >>> > > > > > KafkaApis,
> >> > > > >> > >>> > > > > > > >> > > > >> RackLocator.getRackInfo() is
> called
> >> > each
> >> > > > >> time
> >> > > > >> > the
> >> > > > >> > >>> > > mapping
> >> > > > >> > >>> > > > > is
> >> > > > >> > >>> > > > > > > >> needed
> >> > > > >> > >>> > > > > > > >> > > for
> >> > > > >> > >>> > > > > > > >> > > > >> auto topic creation. This will
> >> ensure
> >> > > > latest
> >> > > > >> > >>> mapping
> >> > > > >> > >>> > is
> >> > > > >> > >>> > > > > used
> >> > > > >> > >>> > > > > > at
> >> > > > >> > >>> > > > > > > >> any
> >> > > > >> > >>> > > > > > > >> > > > time.
> >> > > > >> > >>> > > > > > > >> > > > >>
> >> > > > >> > >>> > > > > > > >> > > > >> The ability to get the complete
> >> > mapping
> >> > > > >> makes
> >> > > > >> > it
> >> > > > >> > >>> > simple
> >> > > > >> > >>> > > > to
> >> > > > >> > >>> > > > > > > reuse
> >> > > > >> > >>> > > > > > > >> the
> >> > > > >> > >>> > > > > > > >> > > > same
> >> > > > >> > >>> > > > > > > >> > > > >> interface in command line tools.
> >> > > > >> > >>> > > > > > > >> > > > >>
> >> > > > >> > >>> > > > > > > >> > > > >>
> >> > > > >> > >>> > > > > > > >> > > > >> On Wed, Sep 30, 2015 at 11:01 AM,
> >> > Aditya
> >> > > > >> > >>> Auradkar <
> >> > > > >> > >>> > > > > > > >> > > > >> aauradkar@linkedin.com.invalid>
> >> > wrote:
> >> > > > >> > >>> > > > > > > >> > > > >>
> >> > > > >> > >>> > > > > > > >> > > > >>> Perhaps we discuss this during
> the
> >> > next
> >> > > > KIP
> >> > > > >> > >>> hangout?
> >> > > > >> > >>> > > > > > > >> > > > >>>
> >> > > > >> > >>> > > > > > > >> > > > >>> I do see that a pluggable rack
> >> > locator
> >> > > > can
> >> > > > >> be
> >> > > > >> > >>> useful
> >> > > > >> > >>> > > > but I
> >> > > > >> > >>> > > > > > do
> >> > > > >> > >>> > > > > > > >> see a
> >> > > > >> > >>> > > > > > > >> > > few
> >> > > > >> > >>> > > > > > > >> > > > >>> concerns:
> >> > > > >> > >>> > > > > > > >> > > > >>>
> >> > > > >> > >>> > > > > > > >> > > > >>> - The RackLocator (as described
> in
> >> > the
> >> > > > >> > >>> document),
> >> > > > >> > >>> > > > implies
> >> > > > >> > >>> > > > > > that
> >> > > > >> > >>> > > > > > > >> it
> >> > > > >> > >>> > > > > > > >> > can
> >> > > > >> > >>> > > > > > > >> > > > >>> discover rack information for
> any
> >> > node
> >> > > in
> >> > > > >> the
> >> > > > >> > >>> > cluster.
> >> > > > >> > >>> > > > How
> >> > > > >> > >>> > > > > > > does
> >> > > > >> > >>> > > > > > > >> it
> >> > > > >> > >>> > > > > > > >> > > deal
> >> > > > >> > >>> > > > > > > >> > > > >>> with rack location changes? For
> >> > > example,
> >> > > > >> if I
> >> > > > >> > >>> moved
> >> > > > >> > >>> > > > broker
> >> > > > >> > >>> > > > > > id
> >> > > > >> > >>> > > > > > > >> (1)
> >> > > > >> > >>> > > > > > > >> > > from
> >> > > > >> > >>> > > > > > > >> > > > >>> rack
> >> > > > >> > >>> > > > > > > >> > > > >>> X to Y, I only have to start
> that
> >> > > broker
> >> > > > >> with
> >> > > > >> > a
> >> > > > >> > >>> > newer
> >> > > > >> > >>> > > > rack
> >> > > > >> > >>> > > > > > > >> config.
> >> > > > >> > >>> > > > > > > >> > If
> >> > > > >> > >>> > > > > > > >> > > > >>> RackLocator discovers broker ->
> >> rack
> >> > > > >> > >>> information at
> >> > > > >> > >>> > > > start
> >> > > > >> > >>> > > > > up
> >> > > > >> > >>> > > > > > > >> time,
> >> > > > >> > >>> > > > > > > >> > > any
> >> > > > >> > >>> > > > > > > >> > > > >>> change to a broker will require
> >> > > bouncing
> >> > > > >> the
> >> > > > >> > >>> entire
> >> > > > >> > >>> > > > > cluster
> >> > > > >> > >>> > > > > > > >> since
> >> > > > >> > >>> > > > > > > >> > > > >>> createTopic requests can be sent
> >> to
> >> > any
> >> > > > >> node
> >> > > > >> > in
> >> > > > >> > >>> the
> >> > > > >> > >>> > > > > cluster.
> >> > > > >> > >>> > > > > > > >> > > > >>> For this reason it may be
> simpler
> >> to
> >> > > have
> >> > > > >> each
> >> > > > >> > >>> node
> >> > > > >> > >>> > be
> >> > > > >> > >>> > > > > aware
> >> > > > >> > >>> > > > > > > of
> >> > > > >> > >>> > > > > > > >> its
> >> > > > >> > >>> > > > > > > >> > > own
> >> > > > >> > >>> > > > > > > >> > > > >>> rack and persist it in ZK during
> >> > start
> >> > > up
> >> > > > >> > time.
> >> > > > >> > >>> > > > > > > >> > > > >>>
> >> > > > >> > >>> > > > > > > >> > > > >>> - A pluggable RackLocator relies
> >> on
> >> > an
> >> > > > >> > external
> >> > > > >> > >>> > > service
> >> > > > >> > >>> > > > > > being
> >> > > > >> > >>> > > > > > > >> > > available
> >> > > > >> > >>> > > > > > > >> > > > >>> to
> >> > > > >> > >>> > > > > > > >> > > > >>> serve rack information.
> >> > > > >> > >>> > > > > > > >> > > > >>>
> >> > > > >> > >>> > > > > > > >> > > > >>> Out of curiosity, I looked up
> how
> >> a
> >> > > > couple
> >> > > > >> of
> >> > > > >> > >>> other
> >> > > > >> > >>> > > > > systems
> >> > > > >> > >>> > > > > > > deal
> >> > > > >> > >>> > > > > > > >> > with
> >> > > > >> > >>> > > > > > > >> > > > >>> zone/rack awareness.
> >> > > > >> > >>> > > > > > > >> > > > >>> For Cassandra some interesting
> >> modes
> >> > > are:
> >> > > > >> > >>> > > > > > > >> > > > >>> (Property File configuration)
> >> > > > >> > >>> > > > > > > >> > > > >>>
> >> > > > >> > >>> > > > > > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
> >> > > > >> > >>> > > > > > > >> > > > >>> (Dynamic inference)
> >> > > > >> > >>> > > > > > > >> > > > >>>
> >> > > > >> > >>> > > > > > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
> >> > > > >> > >>> > > > > > > >> > > > >>>
> >> > > > >> > >>> > > > > > > >> > > > >>> Voldemort does a static node ->
> >> zone
> >> > > > >> > assignment
> >> > > > >> > >>> > based
> >> > > > >> > >>> > > on
> >> > > > >> > >>> > > > > > > >> > > configuration.
> >> > > > >> > >>> > > > > > > >> > > > >>>
> >> > > > >> > >>> > > > > > > >> > > > >>> Aditya
> >> > > > >> > >>> > > > > > > >> > > > >>>
> >> > > > >> > >>> > > > > > > >> > > > >>> On Wed, Sep 30, 2015 at 10:05
> AM,
> >> > Allen
> >> > > > >> Wang <
> >> > > > >> > >>> > > > > > > >> allenxwang@gmail.com
> >> > > > >> > >>> > > > > > > >> > >
> >> > > > >> > >>> > > > > > > >> > > > >>> wrote:
> >> > > > >> > >>> > > > > > > >> > > > >>>
> >> > > > >> > >>> > > > > > > >> > > > >>> > I would like to see if we can
> do
> >> > > both:
> >> > > > >> > >>> > > > > > > >> > > > >>> >
> >> > > > >> > >>> > > > > > > >> > > > >>> > - Make RackLocator pluggable
> to
> >> > > > >> facilitate
> >> > > > >> > >>> > migration
> >> > > > >> > >>> > > > > with
> >> > > > >> > >>> > > > > > > >> > existing
> >> > > > >> > >>> > > > > > > >> > > > >>> > broker-rack mapping
> >> > > > >> > >>> > > > > > > >> > > > >>> >
> >> > > > >> > >>> > > > > > > >> > > > >>> > - Make rack an optional
> property
> >> > for
> >> > > > >> broker.
> >> > > > >> > >>> If
> >> > > > >> > >>> > rack
> >> > > > >> > >>> > > > is
> >> > > > >> > >>> > > > > > > >> available
> >> > > > >> > >>> > > > > > > >> > > > from
> >> > > > >> > >>> > > > > > > >> > > > >>> > broker, treat it as source of
> >> > truth.
> >> > > > For
> >> > > > >> > users
> >> > > > >> > >>> > with
> >> > > > >> > >>> > > > > > existing
> >> > > > >> > >>> > > > > > > >> > > > >>> broker-rack
> >> > > > >> > >>> > > > > > > >> > > > >>> > mapping somewhere else, they
> can
> >> > use
> >> > > > the
> >> > > > >> > >>> pluggable
> >> > > > >> > >>> > > way
> >> > > > >> > >>> > > > > or
> >> > > > >> > >>> > > > > > > they
> >> > > > >> > >>> > > > > > > >> > can
> >> > > > >> > >>> > > > > > > >> > > > >>> transfer
> >> > > > >> > >>> > > > > > > >> > > > >>> > the mapping to the broker rack
> >> > > > property.
> >> > > > >> > >>> > > > > > > >> > > > >>> >
> >> > > > >> > >>> > > > > > > >> > > > >>> > One thing I am not sure about is what
> what
> >> > > happens
> >> > > > >> at
> >> > > > >> > >>> rolling
> >> > > > >> > >>> > > > > upgrade
> >> > > > >> > >>> > > > > > > >> when
> >> > > > >> > >>> > > > > > > >> > we
> >> > > > >> > >>> > > > > > > >> > > > have
> >> > > > >> > >>> > > > > > > >> > > > >>> > rack as a broker property. For
> >> > > brokers
> >> > > > >> with
> >> > > > >> > >>> older
> >> > > > >> > >>> > > > > version
> >> > > > >> > >>> > > > > > of
> >> > > > >> > >>> > > > > > > >> > Kafka,
> >> > > > >> > >>> > > > > > > >> > > > >>> will it
> >> > > > >> > >>> > > > > > > >> > > > >>> > cause problems for them? If so,
> >> is
> >> > > there
> >> > > > >> any
> >> > > > >> > >>> > > > workaround?
> >> > > > >> > >>> > > > > I
> >> > > > >> > >>> > > > > > > also
> >> > > > >> > >>> > > > > > > >> > > think
> >> > > > >> > >>> > > > > > > >> > > > it
> >> > > > >> > >>> > > > > > > >> > > > >>> > would be better not to have
> >> rack in
> >> > > the
> >> > > > >> > >>> controller
> >> > > > >> > >>> > > > wire
> >> > > > >> > >>> > > > > > > >> protocol
> >> > > > >> > >>> > > > > > > >> > > but
> >> > > > >> > >>> > > > > > > >> > > > >>> not
> >> > > > >> > >>> > > > > > > >> > > > >>> > sure if it is achievable.
> >> > > > >> > >>> > > > > > > >> > > > >>> >
> >> > > > >> > >>> > > > > > > >> > > > >>> > Thanks,
> >> > > > >> > >>> > > > > > > >> > > > >>> > Allen
> >> > > > >> > >>> > > > > > > >> > > > >>> >
> >> > > > >> > >>> > > > > > > >> > > > >>> >
> >> > > > >> > >>> > > > > > > >> > > > >>> >
> >> > > > >> > >>> > > > > > > >> > > > >>> >
> >> > > > >> > >>> > > > > > > >> > > > >>> >
> >> > > > >> > >>> > > > > > > >> > > > >>> > On Mon, Sep 28, 2015 at 4:55
> PM,
> >> > Todd
> >> > > > >> > Palino <
> >> > > > >> > >>> > > > > > > >> tpalino@gmail.com>
> >> > > > >> > >>> > > > > > > >> > > > >>> wrote:
> >> > > > >> > >>> > > > > > > >> > > > >>> >
> >> > > > >> > >>> > > > > > > >> > > > >>> > > I tend to like the idea of a
> >> > > > pluggable
> >> > > > >> > >>> locator.
> >> > > > >> > >>> > > For
> >> > > > >> > >>> > > > > > > >> example, we
> >> > > > >> > >>> > > > > > > >> > > > >>> already
> >> > > > >> > >>> > > > > > > >> > > > >>> > > have an interface for
> >> discovering
> >> > > > >> > >>> information
> >> > > > >> > >>> > > about
> >> > > > >> > >>> > > > > the
> >> > > > >> > >>> > > > > > > >> > physical
> >> > > > >> > >>> > > > > > > >> > > > >>> location
> >> > > > >> > >>> > > > > > > >> > > > >>> > > of servers. I don't relish
> the
> >> > idea
> >> > > > of
> >> > > > >> > >>> having to
> >> > > > >> > >>> > > > > > maintain
> >> > > > >> > >>> > > > > > > >> data
> >> > > > >> > >>> > > > > > > >> > in
> >> > > > >> > >>> > > > > > > >> > > > >>> > multiple
> >> > > > >> > >>> > > > > > > >> > > > >>> > > places.
> >> > > > >> > >>> > > > > > > >> > > > >>> > >
> >> > > > >> > >>> > > > > > > >> > > > >>> > > -Todd
> >> > > > >> > >>> > > > > > > >> > > > >>> > >
> >> > > > >> > >>> > > > > > > >> > > > >>> > > On Mon, Sep 28, 2015 at 4:48
> >> PM,
> >> > > > Aditya
> >> > > > >> > >>> > Auradkar <
> >> > > > >> > >>> > > > > > > >> > > > >>> > >
> aauradkar@linkedin.com.invalid
> >> >
> >> > > > wrote:
> >> > > > >> > >>> > > > > > > >> > > > >>> > >
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > Thanks for starting this KIP Allen.
> >> > > > >> > >>> > > > > > > >> > > > >>> > > >
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > I agree with Gwen that having a RackLocator class that is pluggable seems
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > to be too complex. The KIP refers to potentially non-ZK storage for the
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > rack info which I don't think is necessary.
> >> > > > >> > >>> > > > > > > >> > > > >>> > > >
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > Perhaps we can persist this info in zk under /brokers/ids/<broker_id>
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > similar to other broker properties and add a config in KafkaConfig called
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > "rack".
> >> > > > >> > >>> > > > > > > >> > > > >>> > > >
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy, "rack": "abc"}
> >> > > > >> > >>> > > > > > > >> > > > >>> > > >
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > Aditya
> >> > > > >> > >>> > > > > > > >> > > > >>> > > >
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira <gwen@confluent.io> wrote:
> >> > > > >> > >>> > > > > > > >> > > > >>> > > >
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > > Hi,
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > > First, thanks for putting out a KIP for this. This is super
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > > important for production deployments of Kafka.
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > > Few questions:
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > > 1) Are we sure we want "as many racks as possible"? I'd want
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > > to balance between safety (more racks) and network
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > > utilization (traffic within a rack uses the high-bandwidth
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > > TOR switch). One replica on a different rack and the rest on
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > > same rack (if possible) sounds better to me.
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > > 2) Rack-locator class seems overly complex compared to adding
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > > a rack.number property to the broker properties file. Why do
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > > we want that?
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > > Gwen
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > > On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang <allenxwang@gmail.com> wrote:
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > Hello Kafka Developers,
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > I just created KIP-36 for rack aware replica assignment.
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > The goal is to utilize the isolation provided by the racks
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > in data center and distribute replicas to racks to provide
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > fault tolerance.
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > Comments are welcome.
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > Thanks,
> >> > > > >> > >>> > > > > > > >> > > > >>> > > > > > Allen
> >> > > > >> > >>>
> >> > > > >> > >>> --
> >> > > > >> > >>> Thanks,
> >> > > > >> > >>> Neha

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Allen Wang <al...@gmail.com>.
Regarding the JSON version of Broker:

I don't know why ZkUtils.getBrokerInfo() restricts the JSON versions it can
read. It throws an exception if the version is not 1 or 2. It seems to me
that this will cause a compatibility problem whenever the version needs to
change, and will make the upgrade path difficult.

One option we have is to make rack part of version 2 as well and keep the
version number unchanged for this update. This keeps old clients
compatible. During a rolling upgrade, it also avoids problems if the
controller/broker is still on the old version.

However, ZkUtils.getBrokerInfo() will be updated to return the Broker with
rack, so the rack information will be available once the server/client is
upgraded to the latest version.
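
As a rough illustration of the compatibility concern above, here is a
hypothetical Python sketch. It is not the actual ZkUtils code: the function
name parse_broker, the BrokerInfo type, and the sample payloads are all
assumptions made for this example. A reader that only accepts versions 1 and
2 keeps working when rack is added as an extra field under version 2, but
rejects the same payload once the version is bumped to 3:

```python
import json
from typing import NamedTuple, Optional

# Assumption: mirrors the "version must be 1 or 2" check described above.
SUPPORTED_VERSIONS = {1, 2}


class BrokerInfo(NamedTuple):
    broker_id: int
    host: str
    port: int
    rack: Optional[str]  # None when no rack is present in the JSON


def parse_broker(broker_id: int, raw: str) -> BrokerInfo:
    """Parse a broker registration JSON string (hypothetical helper)."""
    data = json.loads(raw)
    version = data.get("version")
    if version not in SUPPORTED_VERSIONS:
        # A strict reader fails here as soon as the version number changes.
        raise ValueError(f"unsupported broker JSON version: {version}")
    # Unknown fields are ignored; "rack" is read only if it is present.
    return BrokerInfo(broker_id, data["host"], data["port"], data.get("rack"))


# Sample payloads (illustrative values only).
v2_without_rack = '{"version": 2, "host": "h1", "port": 9092}'
v2_with_rack = '{"version": 2, "host": "h1", "port": 9092, "rack": "abc"}'
v3_with_rack = '{"version": 3, "host": "h1", "port": 9092, "rack": "abc"}'
```

With these payloads, parse_broker accepts both version 2 strings (with rack
set to None or "abc") and raises ValueError for the version 3 string, which
is the failure an old reader would hit during a rolling upgrade.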



On Wed, Jan 6, 2016 at 6:28 PM, Allen Wang <al...@gmail.com> wrote:

> Updated KIP according to Jun's comment and included changes to TMR.
>
> On Tue, Jan 5, 2016 at 5:59 PM, Jun Rao <ju...@confluent.io> wrote:
>
>> Hi, Allen,
>>
>> A couple of minor comments on the KIP.
>>
>> 1. The version of the broker JSON string says 2. It should be 3.
>>
>> 2. The new version of UpdateMetadataRequest should be 2, instead of 1.
>> Could you include the full wire protocol of version 2 of
>> UpdateMetadataRequest and highlight the changed part?
>>
>> Thanks,
>>
>> Jun
>>
>> On Tue, Jan 5, 2016 at 3:11 PM, Allen Wang <al...@gmail.com> wrote:
>>
>> > Jun and I had a chance to discuss it in a meeting and it is agreed to
>> > change the TMR in a different patch.
>> >
>> > I can change the KIP to include rack in TMR. The essential change is to
>> > add rack into class BrokerEndPoint and make TMR version aware.
>> >
>> >
>> >
>> > On Tue, Jan 5, 2016 at 10:21 AM, Aditya Auradkar <
>> > aauradkar@linkedin.com.invalid> wrote:
>> >
>> > > Jun/Allen -
>> > >
>> > > Did we ever actually agree on whether we should evolve the TMR to
>> > > include rack info or not?
>> > > I don't feel strongly about it, but if it's the right thing to do we
>> > > should probably do it in this KIP (can be a separate patch). It isn't
>> > > a large change.
>> > >
>> > > Aditya
>> > >
>> > > On Sat, Dec 26, 2015 at 3:01 PM, Allen Wang <al...@gmail.com> wrote:
>> > >
>> > > > Added the rolling upgrade instructions to the KIP, similar to those
>> > > > in the 0.9.0 release notes.
>> > > >
>> > > > On Wed, Dec 16, 2015 at 11:32 AM, Allen Wang <al...@gmail.com> wrote:
>> > > >
>> > > > > Hi Jun,
>> > > > >
>> > > > > The reason that TopicMetadataResponse is not included in the KIP is
>> > > > > that it is currently not version aware. So we would need to introduce
>> > > > > a version to it in order to ensure backward compatibility. That seems
>> > > > > to me a big change. Do we want to couple it with this KIP? Do we need
>> > > > > to further discuss what information to include in the new version
>> > > > > besides rack? For example, should we include broker security protocol
>> > > > > in TopicMetadataResponse?
>> > > > >
>> > > > > The other option is to have a separate KIP that makes
>> > > > > TopicMetadataResponse version aware and decides what to include, and
>> > > > > to keep this KIP focused on the rack aware algorithm, admin tools and
>> > > > > related changes to the inter-broker protocol.
>> > > > >
>> > > > > Thanks,
>> > > > > Allen
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Mon, Dec 14, 2015 at 8:30 AM, Jun Rao <ju...@confluent.io> wrote:
>> > > > >
>> > > > >> Allen,
>> > > > >>
>> > > > >> Thanks for the proposal. A few comments.
>> > > > >>
>> > > > >> 1. Since this KIP changes the inter broker communication protocol
>> > > > >> (UpdateMetadataRequest), we will need to document the upgrade path
>> > > > >> (similar to what's described in
>> > > > >> http://kafka.apache.org/090/documentation.html#upgrade).
>> > > > >>
>> > > > >> 2. It might be useful to include the rack info of the broker in
>> > > > >> TopicMetadataResponse. This can be useful for administrative tasks,
>> > > > >> as well as read affinity in the future.
>> > > > >>
>> > > > >> Jun
>> > > > >>
>> > > > >>
>> > > > >>
>> > > > >> On Thu, Dec 10, 2015 at 9:38 AM, Allen Wang <allenxwang@gmail.com> wrote:
>> > > > >>
>> > > > >> > If there are no more comments I would like to call for a vote.
>> > > > >> >
>> > > > >> >
>> > > > >> > On Sun, Nov 15, 2015 at 10:08 PM, Allen Wang <allenxwang@gmail.com> wrote:
>> > > > >> >
>> > > > >> > > The KIP is updated with more details on how to handle the situation
>> > > > >> > > where rack information is incomplete.
>> > > > >> > >
>> > > > >> > > In the situation where rack information is incomplete but we want to
>> > > > >> > > continue with the assignment, I have suggested ignoring all rack
>> > > > >> > > information and falling back to the original algorithm. The reason is
>> > > > >> > > explained below.
>> > > > >> > >
>> > > > >> > > The other options are to assume that each broker without a rack
>> > > > >> > > belongs to its own unique rack, or that they all belong to one
>> > > > >> > > "default" rack. Either way, it is highly likely to result in an
>> > > > >> > > uneven number of brokers per rack, and it is quite possible that the
>> > > > >> > > "made up" racks will have far fewer brokers. As I explained in the
>> > > > >> > > KIP, an uneven number of brokers per rack will lead to an uneven
>> > > > >> > > distribution of replicas among brokers (even though the leader
>> > > > >> > > distribution is still even). The brokers in a rack that has fewer
>> > > > >> > > brokers will get more replicas per broker than brokers in other
>> > > > >> > > racks.
>> > > > >> > >
>> > > > >> > > Given this, and given that the replica assignment produced will be
>> > > > >> > > incorrect anyway from a rack-aware point of view, ignoring all rack
>> > > > >> > > information and falling back to the original algorithm is not a bad
>> > > > >> > > choice, since it will at least give a better guarantee of replica
>> > > > >> > > distribution.
>> > > > >> > >
>> > > > >> > > Also, for command line tools it gives the user a choice if for any
>> > > > >> > > reason they want to ignore rack information and fall back to the
>> > > > >> > > original algorithm.
>> > > > >> > >
>> > > > >> > >
>> > > > >> > > On Tue, Nov 10, 2015 at 9:04 AM, Allen Wang <allenxwang@gmail.com> wrote:
>> > > > >> > >
>> > > > >> > >> I have been busy with some time-pressing issues for the last few
>> > > > >> > >> days. I will think about how the incomplete rack information will
>> > > > >> > >> affect the balance and update the KIP by early next week.
>> > > > >> > >>
>> > > > >> > >> Thanks,
>> > > > >> > >> Allen
>> > > > >> > >>
>> > > > >> > >>
>> > > > >> > >> On Tue, Nov 3, 2015 at 9:03 AM, Neha Narkhede <neha@confluent.io> wrote:
>> > > > >> > >>
>> > > > >> > >>> Few suggestions on improving the KIP
>> > > > >> > >>>
>> > > > >> > >>> *If some brokers have rack, and some do not, the algorithm will throw
>> > > > >> > >>> an exception. This is to prevent incorrect assignment caused by user
>> > > > >> > >>> error.*
>> > > > >> > >>>
>> > > > >> > >>>
>> > > > >> > >>> In the KIP, can you clearly state the user-facing behavior when some
>> > > > >> > >>> brokers have rack information and some don't? Which actions and
>> > > > >> > >>> requests will error out and how?
>> > > > >> > >>>
>> > > > >> > >>> *Even distribution of partition leadership among brokers*
>> > > > >> > >>>
>> > > > >> > >>>
>> > > > >> > >>> There is some information about arranging the sorted broker list
>> > > > >> > >>> interlaced with rack ids. Can you describe the changes to the current
>> > > > >> > >>> algorithm in a little more detail? How does this interlacing work if
>> > > > >> > >>> only a subset of brokers have the rack id configured? Does this still
>> > > > >> > >>> work if an uneven # of brokers is assigned to each rack? It might
>> > > > >> > >>> work; I'm looking for more details on the changes, since they will
>> > > > >> > >>> affect the behavior seen by the user - imbalance on either the
>> > > > >> > >>> leaders or data or both.
>> > > > >> > >>>
>> > > > >> > >>> On Mon, Nov 2, 2015 at 6:39 PM, Aditya Auradkar <aauradkar@linkedin.com> wrote:
>> > > > >> > >>>
>> > > > >> > >>> > I think this sounds reasonable. Anyone else have comments?
>> > > > >> > >>> >
>> > > > >> > >>> > Aditya
>> > > > >> > >>> >
>> > > > >> > >>> > On Tue, Oct 27, 2015 at 5:23 PM, Allen Wang <allenxwang@gmail.com> wrote:
>> > > > >> > >>> >
>> > > > >> > >>> > > During the discussion in the hangout, it was mentioned that it
>> > > > >> > >>> > > would be desirable for consumers to know the rack information of
>> > > > >> > >>> > > the brokers so that they can consume from a broker in the same rack
>> > > > >> > >>> > > to reduce latency. As I understand it, this will only be beneficial
>> > > > >> > >>> > > if a consumer can consume from any broker in the ISR, which is not
>> > > > >> > >>> > > possible now.
>> > > > >> > >>> > >
>> > > > >> > >>> > > I suggest we skip the change to TMR. Once the consumer is changed
>> > > > >> > >>> > > to be able to consume from any broker in the ISR, the rack
>> > > > >> > >>> > > information can be added to TMR.
>> > > > >> > >>> > >
>> > > > >> > >>> > > Another thing I want to confirm is the command line behavior. I
>> > > > >> > >>> > > think the desirable default behavior is to fail fast on the command
>> > > > >> > >>> > > line for incomplete rack mapping. The error message can include
>> > > > >> > >>> > > further instructions that tell the user to add an extra argument
>> > > > >> > >>> > > (like "--allow-partial-rackinfo") to suppress the error and do an
>> > > > >> > >>> > > imperfect rack aware assignment. If the default behavior is to
>> > > > >> > >>> > > allow incomplete mapping, the error can still be easily missed.
>> > > > >> > >>> > >
>> > > > >> > >>> > > The affected command line tools are TopicCommand and
>> > > > >> > >>> > > ReassignPartitionsCommand.
>> > > > >> > >>> > >
>> > > > >> > >>> > > Thanks,
>> > > > >> > >>> > > Allen
>> > > > >> > >>> > >
>> > > > >> > >>> > >
>> > > > >> > >>> > >
>> > > > >> > >>> > >
>> > > > >> > >>> > >
>> > > > >> > >>> > > On Mon, Oct 26, 2015 at 12:55 PM, Aditya Auradkar <aauradkar@linkedin.com> wrote:
>> > > > >> > >>> > >
>> > > > >> > >>> > > > Hi Allen,
>> > > > >> > >>> > > >
>> > > > >> > >>> > > > For TopicMetadataResponse to understand version, you can bump up
>> > > > >> > >>> > > > the request version itself. Based on the version of the request,
>> > > > >> > >>> > > > the response can be appropriately serialized. It shouldn't be a
>> > > > >> > >>> > > > huge change. For example: we went through something similar for
>> > > > >> > >>> > > > ProduceRequest recently (https://reviews.apache.org/r/33378/).
>> > > > >> > >>> > > > I guess the reason protocol information is not included in the
>> > > > >> > >>> > > > TMR is that the topic itself is independent of any particular
>> > > > >> > >>> > > > protocol (SSL vs Plaintext). Having said that, I'm not sure we
>> > > > >> > >>> > > > even need rack information in TMR. What use case were you
>> > > > >> > >>> > > > thinking of initially?
>> > > > >> > >>> > > >
>> > > > >> > >>> > > > For 1 - I'd be fine with adding an option to the command line
>> > > > >> > >>> > > > tools that checks rack assignment, e.g. "--strict-assignment" or
>> > > > >> > >>> > > > something similar.
>> > > > >> > >>> > > >
>> > > > >> > >>> > > > Aditya
>> > > > >> > >>> > > >
>> > > > >> > >>> > > > On Thu, Oct 22, 2015 at 6:44 PM, Allen Wang <allenxwang@gmail.com> wrote:
>> > > > >> > >>> > > >
>> > > > >> > >>> > > > > For 2 and 3, I have updated the KIP. Please take a look. One
>> > > > >> > >>> > > > > thing I have changed is removing the proposal to add rack to
>> > > > >> > >>> > > > > TopicMetadataResponse. The reason is that unlike
>> > > > >> > >>> > > > > UpdateMetadataRequest, TopicMetadataResponse does not understand
>> > > > >> > >>> > > > > version. I don't see a way to include rack without breaking old
>> > > > >> > >>> > > > > versions of clients. That's probably why the security protocol is
>> > > > >> > >>> > > > > not included in the TopicMetadataResponse either. I think it will
>> > > > >> > >>> > > > > be a much bigger change to include rack in TopicMetadataResponse.
>> > > > >> > >>> > > > >
>> > > > >> > >>> > > > > For 1, my concern is that doing rack aware assignment without a
>> > > > >> > >>> > > > > complete broker to rack mapping will result in an assignment that
>> > > > >> > >>> > > > > is not rack aware and fails to provide fault tolerance in the
>> > > > >> > >>> > > > > event of a rack outage. This kind of problem will be difficult to
>> > > > >> > >>> > > > > surface. And the cost of this problem is high: you have to do
>> > > > >> > >>> > > > > partition reassignment if you are lucky enough to spot the
>> > > > >> > >>> > > > > problem early on, or face the consequence of data loss during a
>> > > > >> > >>> > > > > real rack outage.
>> > > > >> > >>> > > > >
>> > > > >> > >>> > > > > I do see the concern with fail-fast, as it might also cause data
>> > > > >> > >>> > > > > loss if the producer is not able to produce the message due to a
>> > > > >> > >>> > > > > topic creation failure. Is it feasible to treat dynamic topic
>> > > > >> > >>> > > > > creation and command tools differently? We would allow dynamic
>> > > > >> > >>> > > > > topic creation with incomplete broker-rack mapping and fail fast
>> > > > >> > >>> > > > > in the command line. Another option is to let the user determine
>> > > > >> > >>> > > > > the behavior for the command line. For example, by default fail
>> > > > >> > >>> > > > > fast in the command line but allow incomplete broker-rack mapping
>> > > > >> > >>> > > > > if another switch is provided.
>> > > > >> > >>> > > > >
>> > > > >> > >>> > > > >
>> > > > >> > >>> > > > >
>> > > > >> > >>> > > > >
>> > > > >> > >>> > > > > On Tue, Oct 20, 2015 at 10:05 AM, Aditya Auradkar <
>> > > > >> > >>> > > > > aauradkar@linkedin.com.invalid> wrote:
>> > > > >> > >>> > > > >
>> > > > >> > >>> > > > > > Hey Allen,
>> > > > >> > >>> > > > > >
>> > > > >> > >>> > > > > > 1. If we choose fail fast topic creation, we will have topic
>> > > > >> > >>> > > > > > creation failures while upgrading the cluster. I really doubt
>> > > > >> > >>> > > > > > we want this behavior. Ideally, this should be invisible to
>> > > > >> > >>> > > > > > clients of a cluster. Currently, each broker is effectively its
>> > > > >> > >>> > > > > > own rack. So we probably can use the rack information whenever
>> > > > >> > >>> > > > > > possible but not make it a hard requirement. To extend Gwen's
>> > > > >> > >>> > > > > > example, one badly configured broker should not degrade topic
>> > > > >> > >>> > > > > > creation for the entire cluster.
>> > > > >> > >>> > > > > >
>> > > > >> > >>> > > > > > 2. Upgrade scenario - Can you add a section on the upgrade
>> > > > >> > >>> > > > > > piece to confirm that old clients will not see errors? I
>> > > > >> > >>> > > > > > believe ZookeeperConsumerConnector reads the Broker objects
>> > > > >> > >>> > > > > > from ZK. I wanted to confirm that this will not cause any
>> > > > >> > >>> > > > > > problems.
>> > > > >> > >>> > > > > >
>> > > > >> > >>> > > > > > 3. Could you elaborate your proposed changes to the
>> > > > >> > >>> > > > > > UpdateMetadataRequest in the "Public Interfaces" section?
>> > > > >> > >>> > > > > > Personally, I find this format easy to read in terms of wire
>> > > > >> > >>> > > > > > protocol changes:
>> > > > >> > >>> > > > > >
>> > > > >> > >>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicRequest
>> > > > >> > >>> > > > > >
>> > > > >> > >>> > > > > > Aditya
>> > > > >> > >>> > > > > >
>> > > > >> > >>> > > > > > On Fri, Oct 16, 2015 at 3:45 PM, Allen Wang <allenxwang@gmail.com> wrote:
>> > > > >> > >>> > > > > >
>> > > > >> > >>> > > > > > > The KIP is updated to include rack as an optional property
>> > > > >> > >>> > > > > > > for broker. Please take a look and let me know if more
>> > > > >> > >>> > > > > > > details are needed.
>> > > > >> > >>> > > > > > >
>> > > > >> > >>> > > > > > > For the case where some brokers have rack and some do not,
>> > > > >> > >>> > > > > > > the current KIP uses the fail-fast behavior. If there are
>> > > > >> > >>> > > > > > > concerns, we can further discuss this in the email thread or
>> > > > >> > >>> > > > > > > next hangout.
>> > > > >> > >>> > > > > > >
>> > > > >> > >>> > > > > > >
>> > > > >> > >>> > > > > > >
>> > > > >> > >>> > > > > > > On Thu, Oct 15, 2015 at 10:42 AM, Allen Wang <allenxwang@gmail.com> wrote:
>> > > > >> > >>> > > > > > >
>> > > > >> > >>> > > > > > > > That's a good question. I can think of three actions if the
>> > > > >> > >>> > > > > > > > rack information is incomplete:
>> > > > >> > >>> > > > > > > >
>> > > > >> > >>> > > > > > > > 1. Treat the node without rack as if it is on its own unique
>> > > > >> > >>> > > > > > > >    rack
>> > > > >> > >>> > > > > > > > 2. Disregard all rack information and fall back to the
>> > > > >> > >>> > > > > > > >    current algorithm
>> > > > >> > >>> > > > > > > > 3. Fail-fast
>> > > > >> > >>> > > > > > > >
>> > > > >> > >>> > > > > > > > Now that I think about it, one and three make more sense.
>> > > > >> > >>> > > > > > > > The reason for fail-fast is that a user mistake of not
>> > > > >> > >>> > > > > > > > providing the rack may never be found if we tolerate it, and
>> > > > >> > >>> > > > > > > > the assignment may not be rack aware as the user expected,
>> > > > >> > >>> > > > > > > > which creates debugging problems when things fail.
>> > > > >> > >>> > > > > > > >
>> > > > >> > >>> > > > > > > > What do you think? If not fail-fast, is there any way we can
>> > > > >> > >>> > > > > > > > make the user error stand out?
>> > > > >> > >>> > > > > > > >
>> > > > >> > >>> > > > > > > >
>> > > > >> > >>> > > > > > > > On Thu, Oct 15, 2015 at 10:17 AM, Gwen Shapira <gwen@confluent.io> wrote:
>> > > > >> > >>> > > > > > > >
>> > > > >> > >>> > > > > > > >> Thanks! Just to clarify, when some brokers have rack
>> > > > >> > >>> > > > > > > >> assignment and some don't, do we act like none of them have
>> > > > >> > >>> > > > > > > >> it? or like those without assignment are in their own rack?
>> > > > >> > >>> > > > > > > >>
>> > > > >> > >>> > > > > > > >> The first scenario is good when first setting up
>> > > > >> > >>> > > > > > > >> rack-awareness, but the second makes more sense for on-going
>> > > > >> > >>> > > > > > > >> maintenance (I can totally see someone adding a node and
>> > > > >> > >>> > > > > > > >> forgetting to set the rack property, we don't want this to
>> > > > >> > >>> > > > > > > >> change behavior for anything except the new node).
>> > > > >> > >>> > > > > > > >>
>> > > > >> > >>> > > > > > > >> What do you think?
>> > > > >> > >>> > > > > > > >>
>> > > > >> > >>> > > > > > > >> Gwen
>> > > > >> > >>> > > > > > > >>
>> > > > >> > >>> > > > > > > >> On Thu, Oct 15, 2015 at 10:13 AM, Allen Wang <allenxwang@gmail.com>
>> > > > >> > >>> > > > > > > >> wrote:
>> > > > >> > >>> > > > > > > >>
>> > > > >> > >>> > > > > > > >> > For scenario 1:
>> > > > >> > >>> > > > > > > >> >
>> > > > >> > >>> > > > > > > >> > - Add the rack information to the broker property file, or
>> > > > >> > >>> > > > > > > >> > dynamically set it in the wrapper code that bootstraps the Kafka
>> > > > >> > >>> > > > > > > >> > server. You would do that for all brokers and restart the
>> > > > >> > >>> > > > > > > >> > brokers one by one.
>> > > > >> > >>> > > > > > > >> >
>> > > > >> > >>> > > > > > > >> > In this scenario, the complete broker to rack mapping may not be
>> > > > >> > >>> > > > > > > >> > available until every broker is restarted. During that time we
>> > > > >> > >>> > > > > > > >> > fall back to the default replica assignment algorithm.
>> > > > >> > >>> > > > > > > >> >
>> > > > >> > >>> > > > > > > >> > For scenario 2:
>> > > > >> > >>> > > > > > > >> >
>> > > > >> > >>> > > > > > > >> > - Add the rack information to the broker property file, or
>> > > > >> > >>> > > > > > > >> > dynamically set it in the wrapper code, and start the broker.
>> > > > >> > >>> > > > > > > >> >
>> > > > >> > >>> > > > > > > >> >
>> > > > >> > >>> > > > > > > >> > On Wed, Oct 14, 2015 at 2:36 PM, Gwen Shapira <gwen@confluent.io>
>> > > > >> > >>> > > > > > > >> > wrote:
>> > > > >> > >>> > > > > > > >> >
>> > > > >> > >>> > > > > > > >> > > Can you clarify the workflow for the following scenarios:
>> > > > >> > >>> > > > > > > >> > >
>> > > > >> > >>> > > > > > > >> > > 1. I currently have 6 brokers and want to add rack information
>> > > > >> > >>> > > > > > > >> > > for each
>> > > > >> > >>> > > > > > > >> > > 2. I'm adding a new broker and I want to specify which rack it
>> > > > >> > >>> > > > > > > >> > > belongs on while adding it.
>> > > > >> > >>> > > > > > > >> > >
>> > > > >> > >>> > > > > > > >> > > Thanks!
>> > > > >> > >>> > > > > > > >> > >
>> > > > >> > >>> > > > > > > >> > > On Tue, Oct 13, 2015 at 2:21 PM, Allen Wang <allenxwang@gmail.com>
>> > > > >> > >>> > > > > > > >> > > wrote:
>> > > > >> > >>> > > > > > > >> > >
>> > > > >> > >>> > > > > > > >> > > > We discussed the KIP in the hangout today. The recommendation
>> > > > >> > >>> > > > > > > >> > > > is to make rack a broker property in ZooKeeper. Users with
>> > > > >> > >>> > > > > > > >> > > > existing rack information stored somewhere would need to
>> > > > >> > >>> > > > > > > >> > > > retrieve the information at broker start up and dynamically
>> > > > >> > >>> > > > > > > >> > > > set the rack property, which can be implemented as a wrapper
>> > > > >> > >>> > > > > > > >> > > > to bootstrap the broker. There will be no interface or
>> > > > >> > >>> > > > > > > >> > > > pluggable implementation to retrieve the rack information.
>> > > > >> > >>> > > > > > > >> > > >
>> > > > >> > >>> > > > > > > >> > > > The assumption is that you always need to restart the broker
>> > > > >> > >>> > > > > > > >> > > > to make a change to the rack.
>> > > > >> > >>> > > > > > > >> > > >
>> > > > >> > >>> > > > > > > >> > > > Once the rack becomes a broker property, it will be possible
>> > > > >> > >>> > > > > > > >> > > > to make rack part of the metadata to help the consumer choose
>> > > > >> > >>> > > > > > > >> > > > which in-sync replica to consume from as part of the future
>> > > > >> > >>> > > > > > > >> > > > consumer enhancement.
>> > > > >> > >>> > > > > > > >> > > >
>> > > > >> > >>> > > > > > > >> > > > I will update the KIP.
>> > > > >> > >>> > > > > > > >> > > >
>> > > > >> > >>> > > > > > > >> > > > Thanks,
>> > > > >> > >>> > > > > > > >> > > > Allen
>> > > > >> > >>> > > > > > > >> > > >
>> > > > >> > >>> > > > > > > >> > > >
>> > > > >> > >>> > > > > > > >> > > > On Thu, Oct 8, 2015 at 9:23 AM, Allen Wang <allenxwang@gmail.com>
>> > > > >> > >>> > > > > > > >> > > > wrote:
>> > > > >> > >>> > > > > > > >> > > >
>> > > > >> > >>> > > > > > > >> > > > > I attended Tuesday's KIP hangout but this KIP was not
>> > > > >> > >>> > > > > > > >> > > > > discussed due to time constraints.
>> > > > >> > >>> > > > > > > >> > > > >
>> > > > >> > >>> > > > > > > >> > > > > However, after hearing the discussion of KIP-35, I have the
>> > > > >> > >>> > > > > > > >> > > > > feeling that incompatibility (caused by a new broker
>> > > > >> > >>> > > > > > > >> > > > > property) between brokers with different versions will be
>> > > > >> > >>> > > > > > > >> > > > > solved there. In addition, having rack in the broker
>> > > > >> > >>> > > > > > > >> > > > > properties as metadata may also help consumers in the
>> > > > >> > >>> > > > > > > >> > > > > future. So I am open to adding a rack property to the
>> > > > >> > >>> > > > > > > >> > > > > broker.
>> > > > >> > >>> > > > > > > >> > > > >
>> > > > >> > >>> > > > > > > >> > > > > Hopefully we can discuss this in the next KIP hangout.
>> > > > >> > >>> > > > > > > >> > > > >
>> > > > >> > >>> > > > > > > >> > > > > On Wed, Sep 30, 2015 at 2:46 PM, Allen Wang
>> > > > >> > >>> > > > > > > >> > > > > <allenxwang@gmail.com> wrote:
>> > > > >> > >>> > > > > > > >> > > > >
>> > > > >> > >>> > > > > > > >> > > > >> Can you send me the information on the next KIP hangout?
>> > > > >> > >>> > > > > > > >> > > > >>
>> > > > >> > >>> > > > > > > >> > > > >> Currently the broker-rack mapping is not cached. In
>> > > > >> > >>> > > > > > > >> > > > >> KafkaApis, RackLocator.getRackInfo() is called each time
>> > > > >> > >>> > > > > > > >> > > > >> the mapping is needed for auto topic creation. This will
>> > > > >> > >>> > > > > > > >> > > > >> ensure the latest mapping is used at any time.
>> > > > >> > >>> > > > > > > >> > > > >>
>> > > > >> > >>> > > > > > > >> > > > >> The ability to get the complete mapping makes it simple to
>> > > > >> > >>> > > > > > > >> > > > >> reuse the same interface in command line tools.
>> > > > >> > >>> > > > > > > >> > > > >>
>> > > > >> > >>> > > > > > > >> > > > >>
>> > > > >> > >>> > > > > > > >> > > > >> On Wed, Sep 30, 2015 at 11:01 AM, Aditya Auradkar
>> > > > >> > >>> > > > > > > >> > > > >> <aauradkar@linkedin.com.invalid> wrote:
>> > > > >> > >>> > > > > > > >> > > > >>
>> > > > >> > >>> > > > > > > >> > > > >>> Perhaps we discuss this during the next KIP hangout?
>> > > > >> > >>> > > > > > > >> > > > >>>
>> > > > >> > >>> > > > > > > >> > > > >>> I do see that a pluggable rack locator can be useful but
>> > > > >> > >>> > > > > > > >> > > > >>> I do see a few concerns:
>> > > > >> > >>> > > > > > > >> > > > >>>
>> > > > >> > >>> > > > > > > >> > > > >>> - The RackLocator (as described in the document) implies
>> > > > >> > >>> > > > > > > >> > > > >>> that it can discover rack information for any node in the
>> > > > >> > >>> > > > > > > >> > > > >>> cluster. How does it deal with rack location changes? For
>> > > > >> > >>> > > > > > > >> > > > >>> example, if I moved broker id (1) from rack X to Y, I
>> > > > >> > >>> > > > > > > >> > > > >>> only have to start that broker with a newer rack config.
>> > > > >> > >>> > > > > > > >> > > > >>> If RackLocator discovers broker -> rack information at
>> > > > >> > >>> > > > > > > >> > > > >>> start up time, any change to a broker will require
>> > > > >> > >>> > > > > > > >> > > > >>> bouncing the entire cluster since createTopic requests
>> > > > >> > >>> > > > > > > >> > > > >>> can be sent to any node in the cluster.
>> > > > >> > >>> > > > > > > >> > > > >>> For this reason it may be simpler to have each node be
>> > > > >> > >>> > > > > > > >> > > > >>> aware of its own rack and persist it in ZK during start
>> > > > >> > >>> > > > > > > >> > > > >>> up time.
>> > > > >> > >>> > > > > > > >> > > > >>>
>> > > > >> > >>> > > > > > > >> > > > >>> - A pluggable RackLocator relies on an external service
>> > > > >> > >>> > > > > > > >> > > > >>> being available to serve rack information.
>> > > > >> > >>> > > > > > > >> > > > >>>
>> > > > >> > >>> > > > > > > >> > > > >>> Out of curiosity, I looked up how a couple of other
>> > > > >> > >>> > > > > > > >> > > > >>> systems deal with zone/rack awareness.
>> > > > >> > >>> > > > > > > >> > > > >>> For Cassandra some interesting modes are:
>> > > > >> > >>> > > > > > > >> > > > >>> (Property File configuration)
>> > > > >> > >>> > > > > > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
>> > > > >> > >>> > > > > > > >> > > > >>> (Dynamic inference)
>> > > > >> > >>> > > > > > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
>> > > > >> > >>> > > > > > > >> > > > >>>
>> > > > >> > >>> > > > > > > >> > > > >>> Voldemort does a static node -> zone assignment based on
>> > > > >> > >>> > > > > > > >> > > > >>> configuration.
>> > > > >> > >>> > > > > > > >> > > > >>>
>> > > > >> > >>> > > > > > > >> > > > >>> Aditya
>> > > > >> > >>> > > > > > > >> > > > >>>
>> > > > >> > >>> > > > > > > >> > > > >>> On Wed, Sep 30, 2015 at 10:05 AM, Allen Wang
>> > > > >> > >>> > > > > > > >> > > > >>> <allenxwang@gmail.com> wrote:
>> > > > >> > >>> > > > > > > >> > > > >>>
>> > > > >> > >>> > > > > > > >> > > > >>> > I would like to see if we can do both:
>> > > > >> > >>> > > > > > > >> > > > >>> >
>> > > > >> > >>> > > > > > > >> > > > >>> > - Make RackLocator pluggable to facilitate migration
>> > > > >> > >>> > > > > > > >> > > > >>> > with an existing broker-rack mapping
>> > > > >> > >>> > > > > > > >> > > > >>> >
>> > > > >> > >>> > > > > > > >> > > > >>> > - Make rack an optional property for the broker. If
>> > > > >> > >>> > > > > > > >> > > > >>> > rack is available from the broker, treat it as the
>> > > > >> > >>> > > > > > > >> > > > >>> > source of truth. Users with an existing broker-rack
>> > > > >> > >>> > > > > > > >> > > > >>> > mapping somewhere else can use the pluggable way, or
>> > > > >> > >>> > > > > > > >> > > > >>> > they can transfer the mapping to the broker rack
>> > > > >> > >>> > > > > > > >> > > > >>> > property.
>> > > > >> > >>> > > > > > > >> > > > >>> >
>> > > > >> > >>> > > > > > > >> > > > >>> > One thing I am not sure about is what happens at
>> > > > >> > >>> > > > > > > >> > > > >>> > rolling upgrade when we have rack as a broker property.
>> > > > >> > >>> > > > > > > >> > > > >>> > Will it cause problems for brokers with an older
>> > > > >> > >>> > > > > > > >> > > > >>> > version of Kafka? If so, is there any workaround? I
>> > > > >> > >>> > > > > > > >> > > > >>> > also think it would be better not to have rack in the
>> > > > >> > >>> > > > > > > >> > > > >>> > controller wire protocol, but I am not sure if that is
>> > > > >> > >>> > > > > > > >> > > > >>> > achievable.
>> > > > >> > >>> > > > > > > >> > > > >>> >
>> > > > >> > >>> > > > > > > >> > > > >>> > Thanks,
>> > > > >> > >>> > > > > > > >> > > > >>> > Allen
>> > > > >> > >>> > > > > > > >> > > > >>> >
>> > > > >> > >>> > > > > > > >> > > > >>> >
>> > > > >> > >>> > > > > > > >> > > > >>> >
>> > > > >> > >>> > > > > > > >> > > > >>> >
>> > > > >> > >>> > > > > > > >> > > > >>> >
>> > > > >> > >>> > > > > > > >> > > > >>> > On Mon, Sep 28, 2015 at 4:55 PM, Todd Palino
>> > > > >> > >>> > > > > > > >> > > > >>> > <tpalino@gmail.com> wrote:
>> > > > >> > >>> > > > > > > >> > > > >>> >
>> > > > >> > >>> > > > > > > >> > > > >>> > > I tend to like the idea of a pluggable locator. For
>> > > > >> > >>> > > > > > > >> > > > >>> > > example, we already have an interface for discovering
>> > > > >> > >>> > > > > > > >> > > > >>> > > information about the physical location of servers. I
>> > > > >> > >>> > > > > > > >> > > > >>> > > don't relish the idea of having to maintain data in
>> > > > >> > >>> > > > > > > >> > > > >>> > > multiple places.
>> > > > >> > >>> > > > > > > >> > > > >>> > >
>> > > > >> > >>> > > > > > > >> > > > >>> > > -Todd
>> > > > >> > >>> > > > > > > >> > > > >>> > >
>> > > > >> > >>> > > > > > > >> > > > >>> > > On Mon, Sep 28, 2015 at 4:48 PM, Aditya Auradkar
>> > > > >> > >>> > > > > > > >> > > > >>> > > <aauradkar@linkedin.com.invalid> wrote:
>> > > > >> > >>> > > > > > > >> > > > >>> > >
>> > > > >> > >>> > > > > > > >> > > > >>> > > > Thanks for starting this KIP Allen.
>> > > > >> > >>> > > > > > > >> > > > >>> > > >
>> > > > >> > >>> > > > > > > >> > > > >>> > > > I agree with Gwen that having a RackLocator class
>> > > > >> > >>> > > > > > > >> > > > >>> > > > that is pluggable seems to be too complex. The KIP
>> > > > >> > >>> > > > > > > >> > > > >>> > > > refers to potentially non-ZK storage for the rack
>> > > > >> > >>> > > > > > > >> > > > >>> > > > info which I don't think is necessary.
>> > > > >> > >>> > > > > > > >> > > > >>> > > >
>> > > > >> > >>> > > > > > > >> > > > >>> > > > Perhaps we can persist this info in zk under
>> > > > >> > >>> > > > > > > >> > > > >>> > > > /brokers/ids/<broker_id> similar to other broker
>> > > > >> > >>> > > > > > > >> > > > >>> > > > properties and add a config in KafkaConfig called
>> > > > >> > >>> > > > > > > >> > > > >>> > > > "rack".
>> > > > >> > >>> > > > > > > >> > > > >>> > > >
>> > > > >> > >>> > > > > > > >> > > > >>> > > > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy,"rack":"abc"}
>> > > > >> > >>> > > > > > > >> > > > >>> > > >
>> > > > >> > >>> > > > > > > >> > > > >>> > > > Aditya
>> > > > >> > >>> > > > > > > >> > > > >>> > > >
>> > > > >> > >>> > > > > > > >> > > > >>> > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira
>> > > > >> > >>> > > > > > > >> > > > >>> > > > <gwen@confluent.io> wrote:
>> > > > >> > >>> > > > > > > >> > > > >>> > > >
>> > > > >> > >>> > > > > > > >> > > > >>> > > > > Hi,
>> > > > >> > >>> > > > > > > >> > > > >>> > > > >
>> > > > >> > >>> > > > > > > >> > > > >>> > > > > First, thanks for putting out a KIP for this. This
>> > > > >> > >>> > > > > > > >> > > > >>> > > > > is super important for production deployments of
>> > > > >> > >>> > > > > > > >> > > > >>> > > > > Kafka.
>> > > > >> > >>> > > > > > > >> > > > >>> > > > >
>> > > > >> > >>> > > > > > > >> > > > >>> > > > > Few questions:
>> > > > >> > >>> > > > > > > >> > > > >>> > > > >
>> > > > >> > >>> > > > > > > >> > > > >>> > > > > 1) Are we sure we want "as many racks as possible"?
>> > > > >> > >>> > > > > > > >> > > > >>> > > > > I'd want to balance between safety (more racks) and
>> > > > >> > >>> > > > > > > >> > > > >>> > > > > network utilization (traffic within a rack uses the
>> > > > >> > >>> > > > > > > >> > > > >>> > > > > high-bandwidth TOR switch). One replica on a
>> > > > >> > >>> > > > > > > >> > > > >>> > > > > different rack and the rest on same rack (if
>> > > > >> > >>> > > > > > > >> > > > >>> > > > > possible) sounds better to me.
>> > > > >> > >>> > > > > > > >> > > > >>> > > > >
>> > > > >> > >>> > > > > > > >> > > > >>> > > > > 2) Rack-locator class seems overly complex compared
>> > > > >> > >>> > > > > > > >> > > > >>> > > > > to adding a rack.number property to the broker
>> > > > >> > >>> > > > > > > >> > > > >>> > > > > properties file. Why do we want that?
>> > > > >> > >>> > > > > > > >> > > > >>> > > > >
>> > > > >> > >>> > > > > > > >> > > > >>> > > > > Gwen
>> > > > >> > >>> > > > > > > >> > > > >>> > > > >
>> > > > >> > >>> > > > > > > >> > > > >>> > > > >
>> > > > >> > >>> > > > > > > >> > > > >>> > > > >
>> > > > >> > >>> > > > > > > >> > > > >>> > > > > On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang
>> > > > >> > >>> > > > > > > >> > > > >>> > > > > <allenxwang@gmail.com> wrote:
>> > > > >> > >>> > > > > > > >> > > > >>> > > > >
>> > > > >> > >>> > > > > > > >> > > > >>> > > > > > Hello Kafka Developers,
>> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
>> > > > >> > >>> > > > > > > >> > > > >>> > > > > > I just created KIP-36 for rack aware replica
>> > > > >> > >>> > > > > > > >> > > > >>> > > > > > assignment.
>> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
>> > > > >> > >>> > > > > > > >> > > > >>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
>> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
>> > > > >> > >>> > > > > > > >> > > > >>> > > > > > The goal is to utilize the isolation provided by
>> > > > >> > >>> > > > > > > >> > > > >>> > > > > > the racks in data center and distribute replicas
>> > > > >> > >>> > > > > > > >> > > > >>> > > > > > to racks to provide fault tolerance.
>> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
>> > > > >> > >>> > > > > > > >> > > > >>> > > > > > Comments are welcome.
>> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
>> > > > >> > >>> > > > > > > >> > > > >>> > > > > > Thanks,
>> > > > >> > >>> > > > > > > >> > > > >>> > > > > > Allen
>> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
>> > > > >> > >>> > > > > > > >> > > > >>> > > > >
>> > > > >> > >>> > > > > > > >> > > > >>> > > >
>> > > > >> > >>> > > > > > > >> > > > >>> > >
>> > > > >> > >>> > > > > > > >> > > > >>> >
>> > > > >> > >>> > > > > > > >> > > > >>>
>> > > > >> > >>> > > > > > > >> > > > >>
>> > > > >> > >>> > > > > > > >> > > > >>
>> > > > >> > >>> > > > > > > >> > > > >
>> > > > >> > >>> > > > > > > >> > > >
>> > > > >> > >>> > > > > > > >> > >
>> > > > >> > >>> > > > > > > >> >
>> > > > >> > >>> > > > > > > >>
>> > > > >> > >>> > > > > > > >
>> > > > >> > >>> > > > > > > >
>> > > > >> > >>> > > > > > >
>> > > > >> > >>> > > > > >
>> > > > >> > >>> > > > >
>> > > > >> > >>> > > >
>> > > > >> > >>> > >
>> > > > >> > >>> >
>> > > > >> > >>>
>> > > > >> > >>>
>> > > > >> > >>>
>> > > > >> > >>> --
>> > > > >> > >>> Thanks,
>> > > > >> > >>> Neha
>> > > > >> > >>>
>> > > > >> > >>
>> > > > >> > >>
>> > > > >> > >
>> > > > >> >
>> > > > >>
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Allen Wang <al...@gmail.com>.
Updated the KIP according to Jun's comments and included the changes to TMR.

On Tue, Jan 5, 2016 at 5:59 PM, Jun Rao <ju...@confluent.io> wrote:

> Hi, Allen,
>
> A couple of minor comments on the KIP.
>
> 1. The version of the broker JSON string says 2. It should be 3.
>
> 2. The new version of UpdateMetadataRequest should be 2, instead of 1.
> Could you include the full wire protocol of version 2 of
> UpdateMetadataRequest and highlight the changed part?
>
> Thanks,
>
> Jun
>
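Jun's first comment concerns the broker registration JSON stored in ZooKeeper. As a rough illustration only, the Python sketch below reads the rack from such a string and treats a missing field as "no rack configured". The field names follow the example given earlier in this thread plus the version number 3 mentioned above; the concrete values and the helper name broker_rack are assumptions made for this example, not the KIP's final format.

```python
import json
from typing import Optional


def broker_rack(registration_json: str) -> Optional[str]:
    """Return the rack from a broker registration string, or None if absent.

    Hypothetical helper for illustration; it is not part of Kafka.
    """
    info = json.loads(registration_json)
    # Rack is optional, so older registrations simply lack the field.
    return info.get("rack")


# Example values are made up; only the field names come from the thread.
with_rack = '{"version": 3, "jmx_port": -1, "host": "xxx", "port": 9092, "rack": "abc"}'
without_rack = '{"version": 3, "jmx_port": -1, "host": "xxx", "port": 9092}'

print(broker_rack(with_rack))
print(broker_rack(without_rack))
```

Reading the field with a default keeps registrations written without a rack parseable, which is the property a rolling upgrade would rely on.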
> On Tue, Jan 5, 2016 at 3:11 PM, Allen Wang <al...@gmail.com> wrote:
>
> > Jun and I had a chance to discuss it in a meeting and it was agreed to
> > change the TMR in a different patch.
> >
> > I can change the KIP to include rack in TMR. The essential change is to add
> > rack to class BrokerEndPoint and make TMR version aware.
> >
> >
> >
> > On Tue, Jan 5, 2016 at 10:21 AM, Aditya Auradkar <
> > aauradkar@linkedin.com.invalid> wrote:
> >
> > > Jun/Allen -
> > >
> > > Did we ever actually agree on whether we should evolve the TMR to include
> > > rack info or not?
> > > I don't feel strongly about it, but if it's the right thing to do we
> > > should probably do it in this KIP (can be a separate patch). It isn't a
> > > large change.
> > >
> > > Aditya
> > >
> > > On Sat, Dec 26, 2015 at 3:01 PM, Allen Wang <al...@gmail.com>
> > wrote:
> > >
> > > > Added the rolling upgrade instructions to the KIP, similar to those in
> > > > the 0.9.0 release notes.
> > > >
> > > > On Wed, Dec 16, 2015 at 11:32 AM, Allen Wang <al...@gmail.com>
> > > wrote:
> > > >
> > > > > Hi Jun,
> > > > >
> > > > > > The reason that TopicMetadataResponse is not included in the KIP is
> > > > > > that it is currently not version aware, so we would need to introduce
> > > > > > a version to it in order to ensure backward compatibility. That seems
> > > > > > to me a big change. Do we want to couple it with this KIP? Do we need
> > > > > > to discuss further what information to include in the new version
> > > > > > besides rack? For example, should we include the broker security
> > > > > > protocol in TopicMetadataResponse?
> > > > > >
> > > > > > The other option is to have a separate KIP that makes
> > > > > > TopicMetadataResponse version aware and decides what to include, and
> > > > > > to keep this KIP focused on the rack aware algorithm, admin tools and
> > > > > > related changes to the inter-broker protocol.
> > > > >
> > > > > Thanks,
> > > > > Allen
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Mon, Dec 14, 2015 at 8:30 AM, Jun Rao <ju...@confluent.io> wrote:
> > > > >
> > > > >> Allen,
> > > > >>
> > > > >> Thanks for the proposal. A few comments.
> > > > >>
> > > > > >> 1. Since this KIP changes the inter broker communication protocol
> > > > > >> (UpdateMetadataRequest), we will need to document the upgrade path
> > > > > >> (similar to what's described in
> > > > > >> http://kafka.apache.org/090/documentation.html#upgrade).
> > > > > >>
> > > > > >> 2. It might be useful to include the rack info of the broker in
> > > > > >> TopicMetadataResponse. This can be useful for administrative tasks,
> > > > > >> as well as read affinity in the future.
> > > > >>
> > > > >> Jun
> > > > >>
> > > > >>
> > > > >>
> > > > >> On Thu, Dec 10, 2015 at 9:38 AM, Allen Wang <allenxwang@gmail.com
> >
> > > > wrote:
> > > > >>
> > > > >> > If there are no more comments I would like to call for a vote.
> > > > >> >
> > > > >> >
> > > > >> > On Sun, Nov 15, 2015 at 10:08 PM, Allen Wang <
> > allenxwang@gmail.com>
> > > > >> wrote:
> > > > >> >
> > > > > >> > > The KIP is updated with more details, including how to handle the
> > > > > >> > > situation where rack information is incomplete.
> > > > >> > >
> > > > > >> > > When rack information is incomplete but we want to continue
> > > > > >> > > with the assignment, I suggest ignoring all rack information
> > > > > >> > > and falling back to the original algorithm. The reason is
> > > > > >> > > explained below:
> > > > >> > >
> > > > > >> > > The other options are to assume that each broker without a rack
> > > > > >> > > belongs to its own unique rack, or that they all belong to one
> > > > > >> > > "default" rack. Whichever we choose, it is highly likely to
> > > > > >> > > result in an uneven number of brokers per rack, and it is quite
> > > > > >> > > possible that the "made up" racks will have far fewer brokers.
> > > > > >> > > As I explained in the KIP, an uneven number of brokers per rack
> > > > > >> > > leads to an uneven distribution of replicas among brokers (even
> > > > > >> > > though the leader distribution is still even). Brokers in a
> > > > > >> > > rack with fewer brokers will get more replicas per broker than
> > > > > >> > > brokers in other racks.
> > > > >> > >
> > > > > >> > > Given this, and given that the replica assignment produced
> > > > > >> > > will be incorrect anyway from a rack-aware point of view,
> > > > > >> > > ignoring all rack information and falling back to the original
> > > > > >> > > algorithm is not a bad choice, since it will at least give a
> > > > > >> > > better guarantee of even replica distribution.
> > > > > >> > >
> > > > > >> > > Also, for command line tools it gives the user a choice if for
> > > > > >> > > any reason they want to ignore rack information and fall back
> > > > > >> > > to the original algorithm.
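The imbalance argument above can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes replicas are split evenly across racks and then evenly across the brokers inside each rack, which is a simplification and not necessarily what the KIP's algorithm does. The function name `replicas_per_broker` and the broker and rack names are made up for the example.

```python
from collections import Counter

def replicas_per_broker(rack_of, total_replicas):
    """Estimate replicas per broker, assuming replicas are split evenly
    across racks and then evenly across the brokers inside each rack."""
    brokers_in_rack = Counter(rack_of.values())
    per_rack = total_replicas / len(brokers_in_rack)
    return {b: per_rack / brokers_in_rack[r] for b, r in rack_of.items()}

# Five brokers with known racks, plus broker 5 placed in a made-up
# "default" rack because its rack is not configured (hypothetical data).
rack_of = {0: "r1", 1: "r1", 2: "r1", 3: "r2", 4: "r2", 5: "default"}
load = replicas_per_broker(rack_of, total_replicas=90)
for broker in sorted(load):
    print(broker, rack_of[broker], load[broker])
```

Under these assumptions the lone broker in the made-up rack carries as many replicas as a whole real rack, which is the kind of skew the fallback to the original algorithm is meant to avoid.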
> > > > >> > >
> > > > >> > >
> > > > >> > > On Tue, Nov 10, 2015 at 9:04 AM, Allen Wang <
> > allenxwang@gmail.com
> > > >
> > > > >> > wrote:
> > > > >> > >
> > > > > >> > >> I have been busy with some time-pressing issues for the last
> > > > > >> > >> few days. I will think about how incomplete rack information
> > > > > >> > >> affects the balance and update the KIP by early next week.
> > > > >> > >>
> > > > >> > >> Thanks,
> > > > >> > >> Allen
> > > > >> > >>
> > > > >> > >>
> > > > >> > >> On Tue, Nov 3, 2015 at 9:03 AM, Neha Narkhede <
> > neha@confluent.io
> > > >
> > > > >> > wrote:
> > > > >> > >>
> > > > > >> > >>> A few suggestions on improving the KIP:
> > > > > >> > >>>
> > > > > >> > >>> *If some brokers have rack, and some do not, the algorithm
> > > > > >> > >>> will throw an exception. This is to prevent incorrect
> > > > > >> > >>> assignment caused by user error.*
> > > > >> > >>>
> > > > >> > >>>
> > > > > >> > >>> In the KIP, can you clearly state the user-facing behavior
> > > > > >> > >>> when some brokers have rack information and some don't? Which
> > > > > >> > >>> actions and requests will error out, and how?
> > > > >> > >>>
> > > > >> > >>> *Even distribution of partition leadership among brokers*
> > > > >> > >>>
> > > > >> > >>>
> > > > > >> > >>> There is some information about arranging the sorted broker
> > > > > >> > >>> list interlaced with rack ids. Can you describe the changes to
> > > > > >> > >>> the current algorithm in a little more detail? How does this
> > > > > >> > >>> interlacing work if only a subset of brokers have the rack id
> > > > > >> > >>> configured? Does it still work if an uneven number of brokers
> > > > > >> > >>> is assigned to each rack? It might work; I'm looking for more
> > > > > >> > >>> details on the changes, since they will affect the behavior
> > > > > >> > >>> seen by the user: imbalance on the leaders, the data, or both.
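One possible reading of "sorted broker list interlaced with rack ids" is a round-robin over racks: take one broker from each rack in turn. The sketch below is an assumption about what interlacing could mean, not the KIP's actual algorithm, and `interlace_by_rack` is a hypothetical helper name. It also shows what happens with an uneven number of brokers per rack.

```python
from itertools import zip_longest

def interlace_by_rack(rack_of):
    """Order brokers so that consecutive entries come from different racks
    where possible, by taking one broker from each rack in turn."""
    by_rack = {}
    for broker in sorted(rack_of):
        by_rack.setdefault(rack_of[broker], []).append(broker)
    rounds = zip_longest(*(by_rack[r] for r in sorted(by_rack)))
    # zip_longest pads short racks with None; broker id 0 is valid, so test
    # against None explicitly instead of relying on truthiness.
    return [b for rnd in rounds for b in rnd if b is not None]

even = interlace_by_rack({0: "a", 1: "a", 2: "b", 3: "b", 4: "c", 5: "c"})
uneven = interlace_by_rack({0: "a", 1: "a", 2: "a", 3: "b"})
print(even)
print(uneven)
```

With even racks the neighbours in the list always sit on different racks. With uneven racks the tail of the list degenerates into brokers from the larger rack, which is one way the imbalance asked about here could show up.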
> > > > >> > >>>
> > > > >> > >>> On Mon, Nov 2, 2015 at 6:39 PM, Aditya Auradkar <
> > > > >> > aauradkar@linkedin.com>
> > > > >> > >>> wrote:
> > > > >> > >>>
> > > > >> > >>> > I think this sounds reasonable. Anyone else have comments?
> > > > >> > >>> >
> > > > >> > >>> > Aditya
> > > > >> > >>> >
> > > > >> > >>> > On Tue, Oct 27, 2015 at 5:23 PM, Allen Wang <
> > > > allenxwang@gmail.com
> > > > >> >
> > > > >> > >>> wrote:
> > > > >> > >>> >
> > > > > >> > >>> > > During the discussion in the hangout, it was mentioned that
> > > > > >> > >>> > > it would be desirable for consumers to know the rack
> > > > > >> > >>> > > information of the brokers so that they can consume from a
> > > > > >> > >>> > > broker in the same rack to reduce latency. As I understand it,
> > > > > >> > >>> > > this will only be beneficial if a consumer can consume from
> > > > > >> > >>> > > any broker in the ISR, which is not possible now.
> > > > > >> > >>> > >
> > > > > >> > >>> > > I suggest we skip the change to TMR. Once consumers are able
> > > > > >> > >>> > > to consume from any broker in the ISR, the rack information
> > > > > >> > >>> > > can be added to TMR.
> > > > >> > >>> > >
> > > > > >> > >>> > > Another thing I want to confirm is the command line behavior.
> > > > > >> > >>> > > I think the desirable default is to fail fast on the command
> > > > > >> > >>> > > line when the rack mapping is incomplete. The error message
> > > > > >> > >>> > > can tell the user to add an extra argument (like
> > > > > >> > >>> > > "--allow-partial-rackinfo") to suppress the error and do an
> > > > > >> > >>> > > imperfect rack-aware assignment. If the default is to allow an
> > > > > >> > >>> > > incomplete mapping, the error can still be easily missed.
> > > > > >> > >>> > >
> > > > > >> > >>> > > The affected command line tools are TopicCommand and
> > > > > >> > >>> > > ReassignPartitionsCommand.
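The fail-fast default described above could be sketched roughly as follows. This is a minimal illustration, not code from TopicCommand or ReassignPartitionsCommand: `check_rack_info`, `IncompleteRackInfoError`, and the returned status strings are hypothetical, and "--allow-partial-rackinfo" is only the example flag name floated in this message.

```python
class IncompleteRackInfoError(Exception):
    """Raised when only some brokers have a rack configured."""

def check_rack_info(rack_of, allow_partial=False):
    """rack_of maps broker id -> rack name, or None when no rack is set.
    Returns "complete", "none" or "partial"; raises on a partial mapping
    unless the caller explicitly opted in."""
    missing = sorted(b for b, rack in rack_of.items() if rack is None)
    if not missing:
        return "complete"
    if len(missing) == len(rack_of):
        return "none"  # no racks at all: plain, rack-unaware assignment
    if not allow_partial:
        raise IncompleteRackInfoError(
            f"brokers {missing} have no rack configured; "
            "pass --allow-partial-rackinfo to proceed anyway")
    return "partial"

print(check_rack_info({0: "r1", 1: "r2"}))
print(check_rack_info({0: "r1", 1: None}, allow_partial=True))
```

Making the strict path the default means a forgotten rack setting surfaces as an error naming the affected brokers, instead of silently producing an assignment that is not rack aware.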
> > > > >> > >>> > >
> > > > >> > >>> > > Thanks,
> > > > >> > >>> > > Allen
> > > > >> > >>> > >
> > > > >> > >>> > >
> > > > >> > >>> > >
> > > > >> > >>> > >
> > > > >> > >>> > >
> > > > >> > >>> > > On Mon, Oct 26, 2015 at 12:55 PM, Aditya Auradkar <
> > > > >> > >>> > aauradkar@linkedin.com>
> > > > >> > >>> > > wrote:
> > > > >> > >>> > >
> > > > >> > >>> > > > Hi Allen,
> > > > >> > >>> > > >
> > > > > >> > >>> > > > For TopicMetadataResponse to understand versions, you can
> > > > > >> > >>> > > > bump up the request version itself. Based on the version of
> > > > > >> > >>> > > > the request, the response can be serialized appropriately. It
> > > > > >> > >>> > > > shouldn't be a huge change. For example, we went through
> > > > > >> > >>> > > > something similar for ProduceRequest recently
> > > > > >> > >>> > > > (https://reviews.apache.org/r/33378/).
> > > > > >> > >>> > > > I guess the reason protocol information is not included in the
> > > > > >> > >>> > > > TMR is that the topic itself is independent of any particular
> > > > > >> > >>> > > > protocol (SSL vs Plaintext). Having said that, I'm not sure we
> > > > > >> > >>> > > > even need rack information in TMR. What use case were you
> > > > > >> > >>> > > > thinking of initially?
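The idea of serializing the response according to the request version can be sketched as below. This is only an illustration of the pattern: the field names, the dictionary encoding, and the version cutoff of 1 are assumptions made for the example and do not reflect Kafka's actual wire format.

```python
def serialize_broker(broker, version):
    """Encode one broker entry for a metadata response of the given version.
    The rack field is only written for version >= 1 (a hypothetical cutoff),
    so clients that send version 0 requests never see a field they cannot parse."""
    fields = {"id": broker["id"], "host": broker["host"], "port": broker["port"]}
    if version >= 1:
        fields["rack"] = broker.get("rack")  # None when no rack is configured
    return fields

broker = {"id": 1, "host": "kafka1.example.com", "port": 9092, "rack": "r1"}
v0 = serialize_broker(broker, version=0)
v1 = serialize_broker(broker, version=1)
print(v0)
print(v1)
```

Because the server picks the layout from the version the client asked for, old clients keep receiving the old layout and only clients that opt in to the newer version get the extra field.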
> > > > >> > >>> > > >
> > > > > >> > >>> > > > For 1 - I'd be fine with adding an option to the command line
> > > > > >> > >>> > > > tools that checks rack assignment, e.g. "--strict-assignment"
> > > > > >> > >>> > > > or something similar.
> > > > >> > >>> > > >
> > > > >> > >>> > > > Aditya
> > > > >> > >>> > > >
> > > > >> > >>> > > > On Thu, Oct 22, 2015 at 6:44 PM, Allen Wang <
> > > > >> > allenxwang@gmail.com>
> > > > >> > >>> > > wrote:
> > > > >> > >>> > > >
> > > > >> > >>> > > > > For 2 and 3, I have updated the KIP. Please take a
> > look.
> > > > One
> > > > >> > >>> thing I
> > > > >> > >>> > > have
> > > > >> > >>> > > > > changed is removing the proposal to add rack to
> > > > >> > >>> > TopicMetadataResponse.
> > > > >> > >>> > > > The
> > > > >> > >>> > > > > reason is that unlike UpdateMetadataRequest,
> > > > >> > >>> TopicMetadataResponse
> > > > >> > >>> > does
> > > > >> > >>> > > > not
> > > > >> > >>> > > > > understand version. I don't see a way to include
> rack
> > > > >> without
> > > > >> > >>> > breaking
> > > > >> > >>> > > > old
> > > > >> > >>> > > > > version of clients. That's probably why secure
> > protocol
> > > is
> > > > >> not
> > > > >> > >>> > included
> > > > >> > >>> > > > in
> > > > >> > >>> > > > > the TopicMetadataResponse either. I think it will
> be a
> > > > much
> > > > >> > >>> bigger
> > > > >> > >>> > > change
> > > > >> > >>> > > > > to include rack in TopicMetadataResponse.
> > > > >> > >>> > > > >
> > > > >> > >>> > > > > For 1, my concern is that doing rack aware
> assignment
> > > > >> without
> > > > >> > >>> > complete
> > > > >> > >>> > > > > broker to rack mapping will result in assignment
> that
> > is
> > > > not
> > > > >> > rack
> > > > >> > >>> > aware
> > > > >> > >>> > > > and
> > > > >> > >>> > > > > fail to provide fault tolerance in the event of rack
> > > > outage.
> > > > >> > This
> > > > >> > >>> > kind
> > > > >> > >>> > > of
> > > > >> > >>> > > > > problem will be difficult to surface. And the cost
> of
> > > this
> > > > >> > >>> problem is
> > > > >> > >>> > > > high:
> > > > >> > >>> > > > > you have to do partition reassignment if you are
> lucky
> > > to
> > > > >> spot
> > > > >> > >>> the
> > > > >> > >>> > > > problem
> > > > >> > >>> > > > > early on or face the consequence of data loss during
> > > real
> > > > >> rack
> > > > >> > >>> > outage.
> > > > >> > >>> > > > >
> > > > > >> > >>> > > > > I do see the concern with fail-fast, as it might also cause
> > > > > >> > >>> > > > > data loss if a producer is not able to produce messages due
> > > > > >> > >>> > > > > to a topic creation failure. Is it feasible to treat dynamic
> > > > > >> > >>> > > > > topic creation and command line tools differently? We would
> > > > > >> > >>> > > > > allow dynamic topic creation with an incomplete broker-rack
> > > > > >> > >>> > > > > mapping and fail fast on the command line. Another option is
> > > > > >> > >>> > > > > to let the user determine the command line behavior. For
> > > > > >> > >>> > > > > example, fail fast by default on the command line but allow
> > > > > >> > >>> > > > > an incomplete broker-rack mapping if another switch is
> > > > > >> > >>> > > > > provided.
> > > > >> > >>> > > > >
> > > > >> > >>> > > > >
> > > > >> > >>> > > > >
> > > > >> > >>> > > > >
> > > > >> > >>> > > > > On Tue, Oct 20, 2015 at 10:05 AM, Aditya Auradkar <
> > > > >> > >>> > > > > aauradkar@linkedin.com.invalid> wrote:
> > > > >> > >>> > > > >
> > > > >> > >>> > > > > > Hey Allen,
> > > > >> > >>> > > > > >
> > > > >> > >>> > > > > > 1. If we choose fail fast topic creation, we will
> > have
> > > > >> topic
> > > > >> > >>> > creation
> > > > >> > >>> > > > > > failures while upgrading the cluster. I really
> doubt
> > > we
> > > > >> want
> > > > >> > >>> this
> > > > >> > >>> > > > > behavior.
> > > > >> > >>> > > > > > Ideally, this should be invisible to clients of a
> > > > cluster.
> > > > >> > >>> > Currently,
> > > > >> > >>> > > > > each
> > > > >> > >>> > > > > > broker is effectively its own rack. So we probably
> > can
> > > > use
> > > > >> > the
> > > > >> > >>> rack
> > > > >> > >>> > > > > > information whenever possible but not make it a
> hard
> > > > >> > >>> requirement.
> > > > >> > >>> > To
> > > > >> > >>> > > > > extend
> > > > >> > >>> > > > > > Gwen's example, one badly configured broker should
> > not
> > > > >> > degrade
> > > > >> > >>> > topic
> > > > >> > >>> > > > > > creation for the entire cluster.
> > > > >> > >>> > > > > >
> > > > >> > >>> > > > > > 2. Upgrade scenario - Can you add a section on the
> > > > upgrade
> > > > >> > >>> piece to
> > > > >> > >>> > > > > confirm
> > > > >> > >>> > > > > > that old clients will not see errors? I believe
> > > > >> > >>> > > > > ZookeeperConsumerConnector
> > > > >> > >>> > > > > > reads the Broker objects from ZK. I wanted to
> > confirm
> > > > that
> > > > >> > this
> > > > >> > >>> > will
> > > > >> > >>> > > > not
> > > > >> > >>> > > > > > cause any problems.
> > > > >> > >>> > > > > >
> > > > >> > >>> > > > > > 3. Could you elaborate your proposed changes to
> the
> > > > >> > >>> > > > UpdateMetadataRequest
> > > > >> > >>> > > > > > in the "Public Interfaces" section? Personally, I
> > find
> > > > >> this
> > > > >> > >>> format
> > > > >> > >>> > > easy
> > > > >> > >>> > > > > to
> > > > >> > >>> > > > > > read in terms of wire protocol changes:
> > > > >> > >>> > > > > >
> > > > > >> > >>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicRequest
> > > > >> > >>> > > > > >
> > > > >> > >>> > > > > > Aditya
> > > > >> > >>> > > > > >
> > > > >> > >>> > > > > > On Fri, Oct 16, 2015 at 3:45 PM, Allen Wang <
> > > > >> > >>> allenxwang@gmail.com>
> > > > >> > >>> > > > > wrote:
> > > > >> > >>> > > > > >
> > > > > >> > >>> > > > > > > The KIP is updated to include rack as an optional property
> > > > > >> > >>> > > > > > > for the broker. Please take a look and let me know if more
> > > > > >> > >>> > > > > > > details are needed.
> > > > > >> > >>> > > > > > >
> > > > > >> > >>> > > > > > > For the case where some brokers have a rack and some do
> > > > > >> > >>> > > > > > > not, the current KIP uses the fail-fast behavior. If there
> > > > > >> > >>> > > > > > > are concerns, we can discuss this further in the email
> > > > > >> > >>> > > > > > > thread or at the next hangout.
> > > > >> > >>> > > > > > >
> > > > >> > >>> > > > > > >
> > > > >> > >>> > > > > > >
> > > > >> > >>> > > > > > > On Thu, Oct 15, 2015 at 10:42 AM, Allen Wang <
> > > > >> > >>> > allenxwang@gmail.com
> > > > >> > >>> > > >
> > > > >> > >>> > > > > > wrote:
> > > > >> > >>> > > > > > >
> > > > > >> > >>> > > > > > > > That's a good question. I can think of three actions if the
> > > > > >> > >>> > > > > > > > rack information is incomplete:
> > > > > >> > >>> > > > > > > >
> > > > > >> > >>> > > > > > > > 1. Treat a node without a rack as if it is on its own
> > > > > >> > >>> > > > > > > >    unique rack
> > > > > >> > >>> > > > > > > > 2. Disregard all rack information and fall back to the
> > > > > >> > >>> > > > > > > >    current algorithm
> > > > > >> > >>> > > > > > > > 3. Fail fast
> > > > >> > >>> > > > > > > >
> > > > > >> > >>> > > > > > > > Now that I think about it, one and three make more sense.
> > > > > >> > >>> > > > > > > > The reason for fail-fast is that a user's mistake in not
> > > > > >> > >>> > > > > > > > providing the rack may never be found if we tolerate it.
> > > > > >> > >>> > > > > > > > The assignment may then not be rack aware as the user
> > > > > >> > >>> > > > > > > > expects, which makes debugging hard when things fail.
> > > > > >> > >>> > > > > > > >
> > > > > >> > >>> > > > > > > > What do you think? If not fail-fast, is there any way we
> > > > > >> > >>> > > > > > > > can make the user error stand out?
> > > > >> > >>> > > > > > > >
> > > > >> > >>> > > > > > > >
> > > > >> > >>> > > > > > > > On Thu, Oct 15, 2015 at 10:17 AM, Gwen
> Shapira <
> > > > >> > >>> > > gwen@confluent.io>
> > > > >> > >>> > > > > > > wrote:
> > > > >> > >>> > > > > > > >
> > > > > >> > >>> > > > > > > >> Thanks! Just to clarify, when some brokers have a rack
> > > > > >> > >>> > > > > > > >> assignment and some don't, do we act as if none of them
> > > > > >> > >>> > > > > > > >> have it, or as if those without an assignment are in
> > > > > >> > >>> > > > > > > >> their own rack?
> > > > > >> > >>> > > > > > > >>
> > > > > >> > >>> > > > > > > >> The first scenario is good when first setting up
> > > > > >> > >>> > > > > > > >> rack-awareness, but the second makes more sense for
> > > > > >> > >>> > > > > > > >> ongoing maintenance (I can totally see someone adding a
> > > > > >> > >>> > > > > > > >> node and forgetting to set the rack property; we don't
> > > > > >> > >>> > > > > > > >> want this to change behavior for anything except the new
> > > > > >> > >>> > > > > > > >> node).
> > > > > >> > >>> > > > > > > >>
> > > > > >> > >>> > > > > > > >> What do you think?
> > > > >> > >>> > > > > > > >>
> > > > >> > >>> > > > > > > >> Gwen
> > > > >> > >>> > > > > > > >>
> > > > >> > >>> > > > > > > >> On Thu, Oct 15, 2015 at 10:13 AM, Allen Wang
> <
> > > > >> > >>> > > > allenxwang@gmail.com>
> > > > >> > >>> > > > > > > >> wrote:
> > > > >> > >>> > > > > > > >>
> > > > >> > >>> > > > > > > >> > For scenario 1:
> > > > >> > >>> > > > > > > >> >
> > > > >> > >>> > > > > > > >> > - Add the rack information to broker
> property
> > > > file
> > > > >> or
> > > > >> > >>> > > > dynamically
> > > > >> > >>> > > > > > set
> > > > >> > >>> > > > > > > >> it in
> > > > >> > >>> > > > > > > >> > the wrapper code to bootstrap Kafka server.
> > You
> > > > >> would
> > > > >> > do
> > > > >> > >>> > that
> > > > >> > >>> > > > for
> > > > >> > >>> > > > > > all
> > > > >> > >>> > > > > > > >> > brokers and restart the brokers one by one.
> > > > >> > >>> > > > > > > >> >
> > > > >> > >>> > > > > > > >> > In this scenario, the complete broker to
> rack
> > > > >> mapping
> > > > >> > >>> may
> > > > >> > >>> > not
> > > > >> > >>> > > be
> > > > >> > >>> > > > > > > >> available
> > > > >> > >>> > > > > > > >> > until every broker is restarted. During
> that
> > > time
> > > > >> we
> > > > >> > >>> fall
> > > > >> > >>> > back
> > > > >> > >>> > > > to
> > > > >> > >>> > > > > > > >> default
> > > > >> > >>> > > > > > > >> > replica assignment algorithm.
> > > > >> > >>> > > > > > > >> >
> > > > >> > >>> > > > > > > >> > For scenario 2:
> > > > >> > >>> > > > > > > >> >
> > > > >> > >>> > > > > > > >> > - Add the rack information to broker
> property
> > > > file
> > > > >> or
> > > > >> > >>> > > > dynamically
> > > > >> > >>> > > > > > set
> > > > >> > >>> > > > > > > >> it in
> > > > >> > >>> > > > > > > >> > the wrapper code and start the broker.
> > > > >> > >>> > > > > > > >> >
> > > > >> > >>> > > > > > > >> >
> > > > >> > >>> > > > > > > >> > On Wed, Oct 14, 2015 at 2:36 PM, Gwen
> > Shapira <
> > > > >> > >>> > > > gwen@confluent.io>
> > > > >> > >>> > > > > > > >> wrote:
> > > > >> > >>> > > > > > > >> >
> > > > >> > >>> > > > > > > >> > > Can you clarify the workflow for the
> > > following
> > > > >> > >>> scenarios:
> > > > >> > >>> > > > > > > >> > >
> > > > >> > >>> > > > > > > >> > > 1. I currently have 6 brokers and want to
> > add
> > > > >> rack
> > > > >> > >>> > > information
> > > > >> > >>> > > > > for
> > > > >> > >>> > > > > > > >> each
> > > > >> > >>> > > > > > > >> > > 2. I'm adding a new broker and I want to
> > > > specify
> > > > >> > which
> > > > >> > >>> > rack
> > > > >> > >>> > > it
> > > > >> > >>> > > > > > > >> belongs on
> > > > >> > >>> > > > > > > >> > > while adding it.
> > > > >> > >>> > > > > > > >> > >
> > > > >> > >>> > > > > > > >> > > Thanks!
> > > > >> > >>> > > > > > > >> > >
> > > > >> > >>> > > > > > > >> > > On Tue, Oct 13, 2015 at 2:21 PM, Allen
> > Wang <
> > > > >> > >>> > > > > allenxwang@gmail.com
> > > > >> > >>> > > > > > >
> > > > >> > >>> > > > > > > >> > wrote:
> > > > >> > >>> > > > > > > >> > >
> > > > > >> > >>> > > > > > > >> > > > We discussed the KIP in the hangout today. The
> > > > > >> > >>> > > > > > > >> > > > recommendation is to make rack a broker property in
> > > > > >> > >>> > > > > > > >> > > > ZooKeeper. Users with existing rack information stored
> > > > > >> > >>> > > > > > > >> > > > somewhere would need to retrieve it at broker startup
> > > > > >> > >>> > > > > > > >> > > > and dynamically set the rack property, which can be
> > > > > >> > >>> > > > > > > >> > > > implemented as a wrapper that bootstraps the broker.
> > > > > >> > >>> > > > > > > >> > > > There will be no interface or pluggable implementation
> > > > > >> > >>> > > > > > > >> > > > to retrieve the rack information.
> > > > >> > >>> > > > > > > >> > > >
> > > > >> > >>> > > > > > > >> > > > The assumption is that you always need
> to
> > > > >> restart
> > > > >> > >>> the
> > > > >> > >>> > > broker
> > > > >> > >>> > > > > to
> > > > >> > >>> > > > > > > >> make a
> > > > >> > >>> > > > > > > >> > > > change to the rack.
> > > > >> > >>> > > > > > > >> > > >
> > > > >> > >>> > > > > > > >> > > > Once the rack becomes a broker
> property,
> > it
> > > > >> will
> > > > >> > be
> > > > >> > >>> > > possible
> > > > >> > >>> > > > > to
> > > > >> > >>> > > > > > > make
> > > > >> > >>> > > > > > > >> > rack
> > > > >> > >>> > > > > > > >> > > > part of the meta data to help the
> > consumer
> > > > >> choose
> > > > >> > >>> which
> > > > >> > >>> > in
> > > > >> > >>> > > > > sync
> > > > >> > >>> > > > > > > >> replica
> > > > >> > >>> > > > > > > >> > > to
> > > > >> > >>> > > > > > > >> > > > consume from as part of the future
> > consumer
> > > > >> > >>> enhancement.
> > > > >> > >>> > > > > > > >> > > >
> > > > >> > >>> > > > > > > >> > > > I will update the KIP.
> > > > >> > >>> > > > > > > >> > > >
> > > > >> > >>> > > > > > > >> > > > Thanks,
> > > > >> > >>> > > > > > > >> > > > Allen
> > > > >> > >>> > > > > > > >> > > >
> > > > >> > >>> > > > > > > >> > > >
> > > > >> > >>> > > > > > > >> > > > On Thu, Oct 8, 2015 at 9:23 AM, Allen
> > Wang
> > > <
> > > > >> > >>> > > > > > allenxwang@gmail.com>
> > > > >> > >>> > > > > > > >> > wrote:
> > > > >> > >>> > > > > > > >> > > >
> > > > >> > >>> > > > > > > >> > > > > I attended Tuesday's KIP hangout but
> > this
> > > > KIP
> > > > >> > was
> > > > >> > >>> not
> > > > >> > >>> > > > > > discussed
> > > > >> > >>> > > > > > > >> due
> > > > >> > >>> > > > > > > >> > to
> > > > >> > >>> > > > > > > >> > > > > time constraint.
> > > > >> > >>> > > > > > > >> > > > >
> > > > > >> > >>> > > > > > > >> > > > > However, after hearing the discussion of KIP-35, I
> > > > > >> > >>> > > > > > > >> > > > > have the feeling that incompatibility (caused by a
> > > > > >> > >>> > > > > > > >> > > > > new broker property) between brokers with different
> > > > > >> > >>> > > > > > > >> > > > > versions will be solved there. In addition, having
> > > > > >> > >>> > > > > > > >> > > > > rack as a broker property in the metadata may also
> > > > > >> > >>> > > > > > > >> > > > > help consumers in the future. So I am open to adding
> > > > > >> > >>> > > > > > > >> > > > > a rack property to the broker.
> > > > >> > >>> > > > > > > >> > > > >
> > > > >> > >>> > > > > > > >> > > > > Hopefully we can discuss this in the
> > next
> > > > KIP
> > > > >> > >>> hangout.
> > > > >> > >>> > > > > > > >> > > > >
> > > > >> > >>> > > > > > > >> > > > > On Wed, Sep 30, 2015 at 2:46 PM,
> Allen
> > > > Wang <
> > > > >> > >>> > > > > > > allenxwang@gmail.com
> > > > >> > >>> > > > > > > >> >
> > > > >> > >>> > > > > > > >> > > > wrote:
> > > > >> > >>> > > > > > > >> > > > >
> > > > >> > >>> > > > > > > >> > > > >> Can you send me the information on
> the
> > > > next
> > > > >> KIP
> > > > >> > >>> > > hangout?
> > > > >> > >>> > > > > > > >> > > > >>
> > > > >> > >>> > > > > > > >> > > > >> Currently the broker-rack mapping is
> > not
> > > > >> > cached.
> > > > >> > >>> In
> > > > >> > >>> > > > > > KafkaApis,
> > > > >> > >>> > > > > > > >> > > > >> RackLocator.getRackInfo() is called
> > each
> > > > >> time
> > > > >> > the
> > > > >> > >>> > > mapping
> > > > >> > >>> > > > > is
> > > > >> > >>> > > > > > > >> needed
> > > > >> > >>> > > > > > > >> > > for
> > > > >> > >>> > > > > > > >> > > > >> auto topic creation. This will
> ensure
> > > > latest
> > > > >> > >>> mapping
> > > > >> > >>> > is
> > > > >> > >>> > > > > used
> > > > >> > >>> > > > > > at
> > > > >> > >>> > > > > > > >> any
> > > > >> > >>> > > > > > > >> > > > time.
> > > > >> > >>> > > > > > > >> > > > >>
> > > > >> > >>> > > > > > > >> > > > >> The ability to get the complete
> > mapping
> > > > >> makes
> > > > >> > it
> > > > >> > >>> > simple
> > > > >> > >>> > > > to
> > > > >> > >>> > > > > > > reuse
> > > > >> > >>> > > > > > > >> the
> > > > >> > >>> > > > > > > >> > > > same
> > > > >> > >>> > > > > > > >> > > > >> interface in command line tools.
> > > > >> > >>> > > > > > > >> > > > >>
> > > > >> > >>> > > > > > > >> > > > >>
> > > > >> > >>> > > > > > > >> > > > >> On Wed, Sep 30, 2015 at 11:01 AM,
> > Aditya
> > > > >> > >>> Auradkar <
> > > > >> > >>> > > > > > > >> > > > >> aauradkar@linkedin.com.invalid>
> > wrote:
> > > > >> > >>> > > > > > > >> > > > >>
> > > > > >> > >>> > > > > > > >> > > > >>> Perhaps we can discuss this during the next KIP
> > > > > >> > >>> > > > > > > >> > > > >>> hangout?
> > > > > >> > >>> > > > > > > >> > > > >>>
> > > > > >> > >>> > > > > > > >> > > > >>> I do see that a pluggable rack locator can be
> > > > > >> > >>> > > > > > > >> > > > >>> useful, but I also see a few concerns:
> > > > >> > >>> > > > > > > >> > > > >>>
> > > > >> > >>> > > > > > > >> > > > >>> - The RackLocator (as described in
> > the
> > > > >> > >>> document),
> > > > >> > >>> > > > implies
> > > > >> > >>> > > > > > that
> > > > >> > >>> > > > > > > >> it
> > > > >> > >>> > > > > > > >> > can
> > > > >> > >>> > > > > > > >> > > > >>> discover rack information for any
> > node
> > > in
> > > > >> the
> > > > >> > >>> > cluster.
> > > > >> > >>> > > > How
> > > > >> > >>> > > > > > > does
> > > > >> > >>> > > > > > > >> it
> > > > >> > >>> > > > > > > >> > > deal
> > > > >> > >>> > > > > > > >> > > > >>> with rack location changes? For
> > > example,
> > > > >> if I
> > > > >> > >>> moved
> > > > >> > >>> > > > broker
> > > > >> > >>> > > > > > id
> > > > >> > >>> > > > > > > >> (1)
> > > > >> > >>> > > > > > > >> > > from
> > > > >> > >>> > > > > > > >> > > > >>> rack
> > > > >> > >>> > > > > > > >> > > > >>> X to Y, I only have to start that
> > > broker
> > > > >> with
> > > > >> > a
> > > > >> > >>> > newer
> > > > >> > >>> > > > rack
> > > > >> > >>> > > > > > > >> config.
> > > > >> > >>> > > > > > > >> > If
> > > > >> > >>> > > > > > > >> > > > >>> RackLocator discovers broker ->
> rack
> > > > >> > >>> information at
> > > > >> > >>> > > > start
> > > > >> > >>> > > > > up
> > > > >> > >>> > > > > > > >> time,
> > > > >> > >>> > > > > > > >> > > any
> > > > >> > >>> > > > > > > >> > > > >>> change to a broker will require
> > > bouncing
> > > > >> the
> > > > >> > >>> entire
> > > > >> > >>> > > > > cluster
> > > > >> > >>> > > > > > > >> since
> > > > >> > >>> > > > > > > >> > > > >>> createTopic requests can be sent to
> > any
> > > > >> node
> > > > >> > in
> > > > >> > >>> the
> > > > >> > >>> > > > > cluster.
> > > > >> > >>> > > > > > > >> > > > >>> For this reason it may be simpler
> to
> > > have
> > > > >> each
> > > > >> > >>> node
> > > > >> > >>> > be
> > > > >> > >>> > > > > aware
> > > > >> > >>> > > > > > > of
> > > > >> > >>> > > > > > > >> its
> > > > >> > >>> > > > > > > >> > > own
> > > > >> > >>> > > > > > > >> > > > >>> rack and persist it in ZK during
> > start
> > > up
> > > > >> > time.
> > > > >> > >>> > > > > > > >> > > > >>>
> > > > >> > >>> > > > > > > >> > > > >>> - A pluggable RackLocator relies on
> > an
> > > > >> > external
> > > > >> > >>> > > service
> > > > >> > >>> > > > > > being
> > > > >> > >>> > > > > > > >> > > available
> > > > >> > >>> > > > > > > >> > > > >>> to
> > > > >> > >>> > > > > > > >> > > > >>> serve rack information.
> > > > >> > >>> > > > > > > >> > > > >>>
> > > > >> > >>> > > > > > > >> > > > >>> Out of curiosity, I looked up how a
> > > > couple
> > > > >> of
> > > > >> > >>> other
> > > > >> > >>> > > > > systems
> > > > >> > >>> > > > > > > deal
> > > > >> > >>> > > > > > > >> > with
> > > > >> > >>> > > > > > > >> > > > >>> zone/rack awareness.
> > > > >> > >>> > > > > > > >> > > > >>> For Cassandra some interesting
> modes
> > > are:
> > > > >> > >>> > > > > > > >> > > > >>> (Property File configuration)
> > > > > >> > >>> > > > > > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
> > > > >> > >>> > > > > > > >> > > > >>> (Dynamic inference)
> > > > > >> > >>> > > > > > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
> > > > >> > >>> > > > > > > >> > > > >>>
> > > > >> > >>> > > > > > > >> > > > >>> Voldemort does a static node -> zone assignment based on configuration.
> > > > >> > >>> > > > > > > >> > > > >>>
> > > > >> > >>> > > > > > > >> > > > >>> Aditya
> > > > >> > >>> > > > > > > >> > > > >>>
> > > > >> > >>> > > > > > > >> > > > >>> On Wed, Sep 30, 2015 at 10:05 AM, Allen Wang <allenxwang@gmail.com> wrote:
> > > > >> > >>> > > > > > > >> > > > >>>
> > > > >> > >>> > > > > > > >> > > > >>> > I would like to see if we can do both:
> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > >> > >>> > > > > > > >> > > > >>> > - Make RackLocator pluggable to facilitate migration with existing
> > > > >> > >>> > > > > > > >> > > > >>> > broker-rack mapping
> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > >> > >>> > > > > > > >> > > > >>> > - Make rack an optional property for broker. If rack is available from
> > > > >> > >>> > > > > > > >> > > > >>> > broker, treat it as source of truth. For users with existing broker-rack
> > > > >> > >>> > > > > > > >> > > > >>> > mapping somewhere else, they can use the pluggable way or they can transfer
> > > > >> > >>> > > > > > > >> > > > >>> > the mapping to the broker rack property.
> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > >> > >>> > > > > > > >> > > > >>> > One thing I am not sure about is what happens at rolling upgrade when we
> > > > >> > >>> > > > > > > >> > > > >>> > have rack as a broker property. For brokers with an older version of Kafka,
> > > > >> > >>> > > > > > > >> > > > >>> > will it cause problems for them? If so, is there any workaround? I also think
> > > > >> > >>> > > > > > > >> > > > >>> > it would be better not to have rack in the controller wire protocol, but I am
> > > > >> > >>> > > > > > > >> > > > >>> > not sure if it is achievable.
> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > >> > >>> > > > > > > >> > > > >>> > Thanks,
> > > > >> > >>> > > > > > > >> > > > >>> > Allen
> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > >> > >>> > > > > > > >> > > > >>> > On Mon, Sep 28, 2015 at 4:55 PM, Todd Palino <tpalino@gmail.com> wrote:
> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > >> > >>> > > > > > > >> > > > >>> > > I tend to like the idea of a pluggable locator. For example, we already
> > > > >> > >>> > > > > > > >> > > > >>> > > have an interface for discovering information about the physical location
> > > > >> > >>> > > > > > > >> > > > >>> > > of servers. I don't relish the idea of having to maintain data in multiple
> > > > >> > >>> > > > > > > >> > > > >>> > > places.
> > > > >> > >>> > > > > > > >> > > > >>> > >
> > > > >> > >>> > > > > > > >> > > > >>> > > -Todd
> > > > >> > >>> > > > > > > >> > > > >>> > >
> > > > >> > >>> > > > > > > >> > > > >>> > > On Mon, Sep 28, 2015 at 4:48 PM, Aditya Auradkar <aauradkar@linkedin.com.invalid> wrote:
> > > > >> > >>> > > > > > > >> > > > >>> > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > Thanks for starting this KIP, Allen.
> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > I agree with Gwen that having a RackLocator class that is pluggable seems
> > > > >> > >>> > > > > > > >> > > > >>> > > > to be too complex. The KIP refers to potentially non-ZK storage for the
> > > > >> > >>> > > > > > > >> > > > >>> > > > rack info, which I don't think is necessary.
> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > Perhaps we can persist this info in zk under /brokers/ids/<broker_id>
> > > > >> > >>> > > > > > > >> > > > >>> > > > similar to other broker properties and add a config in KafkaConfig called
> > > > >> > >>> > > > > > > >> > > > >>> > > > "rack".
> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy, "rack": "abc"}
> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > Aditya
> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira <gwen@confluent.io> wrote:
> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > > Hi,
> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > > First, thanks for putting out a KIP for this. This is super important for
> > > > >> > >>> > > > > > > >> > > > >>> > > > > production deployments of Kafka.
> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > > A few questions:
> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > > 1) Are we sure we want "as many racks as possible"? I'd want to balance
> > > > >> > >>> > > > > > > >> > > > >>> > > > > between safety (more racks) and network utilization (traffic within a rack
> > > > >> > >>> > > > > > > >> > > > >>> > > > > uses the high-bandwidth TOR switch). One replica on a different rack and
> > > > >> > >>> > > > > > > >> > > > >>> > > > > the rest on the same rack (if possible) sounds better to me.
> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > > 2) A rack-locator class seems overly complex compared to adding a rack.number
> > > > >> > >>> > > > > > > >> > > > >>> > > > > property to the broker properties file. Why do we want that?
> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > > Gwen
> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > > On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang <allenxwang@gmail.com> wrote:
> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > > > Hello Kafka Developers,
> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > > > I just created KIP-36 for rack aware replica assignment.
> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > > > The goal is to utilize the isolation provided by the racks in data center
> > > > >> > >>> > > > > > > >> > > > >>> > > > > > and distribute replicas to racks to provide fault tolerance.
> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > > > Comments are welcome.
> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > > > Thanks,
> > > > >> > >>> > > > > > > >> > > > >>> > > > > > Allen
> > > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > > >> > >>> > > > > > > >> > > > >>> > > >
> > > > >> > >>> > > > > > > >> > > > >>> > >
> > > > >> > >>> > > > > > > >> > > > >>> >
> > > > >> > >>> > > > > > > >> > > > >>>
> > > > >> > >>> > > > > > > >> > > > >>
> > > > >> > >>> > > > > > > >> > > > >>
> > > > >> > >>> > > > > > > >> > > > >
> > > > >> > >>> > > > > > > >> > > >
> > > > >> > >>> > > > > > > >> > >
> > > > >> > >>> > > > > > > >> >
> > > > >> > >>> > > > > > > >>
> > > > >> > >>> > > > > > > >
> > > > >> > >>> > > > > > > >
> > > > >> > >>> > > > > > >
> > > > >> > >>> > > > > >
> > > > >> > >>> > > > >
> > > > >> > >>> > > >
> > > > >> > >>> > >
> > > > >> > >>> >
> > > > >> > >>>
> > > > >> > >>>
> > > > >> > >>>
> > > > >> > >>> --
> > > > >> > >>> Thanks,
> > > > >> > >>> Neha
> > > > >> > >>>
> > > > >> > >>
> > > > >> > >>
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Jun Rao <ju...@confluent.io>.
Hi, Allen,

A couple of minor comments on the KIP.

1. The version of the broker JSON string says 2. It should be 3.

2. The new version of UpdateMetadataRequest should be 2, instead of 1.
Could you include the full wire protocol of version 2 of
UpdateMetadataRequest and highlight the changed part?
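
To make comment 1 concrete, here is a minimal Python sketch (not Kafka
code) of a reader that treats rack as an optional field of the broker
registration JSON. The field names follow the example JSON quoted earlier
in this thread; the "version" field name, the sample host, port and
endpoint values, and the function name parse_broker_rack are assumptions
made only for illustration.

```python
import json
from typing import Optional, Tuple


def parse_broker_rack(registration: str) -> Tuple[int, Optional[str]]:
    """Return (version, rack) from a broker registration JSON string.

    rack is None when the field is missing, so registrations written
    without rack information still parse.
    """
    info = json.loads(registration)
    # Assumption: the JSON carries its own "version" number.
    version = info.get("version", 1)
    # rack is optional; absence means no rack is configured for the broker.
    rack = info.get("rack")
    return version, rack


# Hypothetical registrations; host, port and endpoints are made-up values.
with_rack = ('{"version":3,"jmx_port":-1,"endpoints":["PLAINTEXT://host1:9092"],'
             '"host":"host1","port":9092,"rack":"abc"}')
without_rack = ('{"version":2,"jmx_port":-1,"endpoints":["PLAINTEXT://host1:9092"],'
                '"host":"host1","port":9092}')

print(parse_broker_rack(with_rack))
print(parse_broker_rack(without_rack))
```

A reader written this way does not fail on registrations that lack rack,
which is presumably the compatibility property the version bump is meant
to signal.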

Thanks,

Jun

On Tue, Jan 5, 2016 at 3:11 PM, Allen Wang <al...@gmail.com> wrote:

> Jun and I had a chance to discuss it in a meeting and we agreed to
> change the TMR in a different patch.
>
> I can change the KIP to include rack in TMR. The essential change is to add
> rack to class BrokerEndPoint and make TMR version aware.
>
>
>
> On Tue, Jan 5, 2016 at 10:21 AM, Aditya Auradkar <
> aauradkar@linkedin.com.invalid> wrote:
>
> > Jun/Allen -
> >
> > Did we ever actually agree on whether we should evolve the TMR to include
> > rack info or not?
> > I don't feel strongly about it, but if it's the right thing to do we
> > should probably do it in this KIP (can be a separate patch). It isn't a
> > large change.
> >
> > Aditya
> >
> > > On Sat, Dec 26, 2015 at 3:01 PM, Allen Wang <al...@gmail.com> wrote:
> > >
> > > > Added the rolling upgrade instructions to the KIP, similar to those in
> > > > the 0.9.0 release notes.
> > >
> > > > On Wed, Dec 16, 2015 at 11:32 AM, Allen Wang <al...@gmail.com> wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > > The reason that TopicMetadataResponse is not included in the KIP is
> > > > > that it currently is not version aware. So we need to introduce a
> > > > > version to it in order to ensure backward compatibility. That seems
> > > > > to me a big change. Do we want to couple it with this KIP? Do we need
> > > > > to further discuss what information to include in the new version
> > > > > besides rack? For example, should we include the broker security
> > > > > protocol in TopicMetadataResponse?
> > > > >
> > > > > The other option is to make it a separate KIP to make
> > > > > TopicMetadataResponse version aware and decide what to include, and
> > > > > make this KIP focus on the rack aware algorithm, admin tools and
> > > > > related changes to the inter-broker protocol.
> > > >
> > > > Thanks,
> > > > Allen
> > > >
> > > >
> > > >
> > > >
> > > > On Mon, Dec 14, 2015 at 8:30 AM, Jun Rao <ju...@confluent.io> wrote:
> > > >
> > > >> Allen,
> > > >>
> > > >> Thanks for the proposal. A few comments.
> > > >>
> > > > >> 1. Since this KIP changes the inter broker communication protocol
> > > > >> (UpdateMetadataRequest), we will need to document the upgrade path
> > > > >> (similar to what's described in
> > > > >> http://kafka.apache.org/090/documentation.html#upgrade).
> > > > >>
> > > > >> 2. It might be useful to include the rack info of the broker in
> > > > >> TopicMetadataResponse. This can be useful for administrative tasks,
> > > > >> as well as read affinity in the future.
> > > >>
> > > >> Jun
> > > >>
> > > >>
> > > >>
> > > > >> On Thu, Dec 10, 2015 at 9:38 AM, Allen Wang <al...@gmail.com> wrote:
> > > >>
> > > >> > If there are no more comments I would like to call for a vote.
> > > >> >
> > > >> >
> > > > >> > On Sun, Nov 15, 2015 at 10:08 PM, Allen Wang <allenxwang@gmail.com> wrote:
> > > >> >
> > > > >> > > KIP is updated with more details and how to handle the situation
> > > > >> > > where rack information is incomplete.
> > > > >> > >
> > > > >> > > In the situation where rack information is incomplete, but we
> > > > >> > > want to continue with the assignment, I have suggested ignoring
> > > > >> > > all rack information and falling back to the original algorithm.
> > > > >> > > The reason is explained below:
> > > > >> > >
> > > > >> > > The other options are to assume that each broker without a rack
> > > > >> > > belongs to its own unique rack, or that they all belong to one
> > > > >> > > "default" rack. Whichever we choose, it is highly likely to
> > > > >> > > result in an uneven number of brokers per rack, and it is quite
> > > > >> > > possible that the "made up" racks will have far fewer brokers.
> > > > >> > > As I explained in the KIP, an uneven number of brokers per rack
> > > > >> > > will lead to uneven distribution of replicas among brokers (even
> > > > >> > > though the leader distribution is still even). The brokers in a
> > > > >> > > rack that has fewer brokers will get more replicas per broker
> > > > >> > > than brokers in other racks.
> > > > >> > >
> > > > >> > > Given this fact, and that the replica assignment produced will
> > > > >> > > be incorrect anyway from a rack aware point of view, ignoring
> > > > >> > > all rack information and falling back to the original algorithm
> > > > >> > > is not a bad choice since it will at least have a better
> > > > >> > > guarantee of replica distribution.
> > > > >> > >
> > > > >> > > Also, for command line tools it gives the user a choice if for
> > > > >> > > any reason they want to ignore rack information and fall back to
> > > > >> > > the original algorithm.
> > > >> > >
> > > >> > >
> > > > >> > > On Tue, Nov 10, 2015 at 9:04 AM, Allen Wang <allenxwang@gmail.com> wrote:
> > > >> > >
> > > > >> > >> I have been busy with some time-pressing issues for the last few
> > > > >> > >> days. I will think about how the incomplete rack information will
> > > > >> > >> affect the balance and update the KIP by early next week.
> > > >> > >>
> > > >> > >> Thanks,
> > > >> > >> Allen
> > > >> > >>
> > > >> > >>
> > > > >> > >> On Tue, Nov 3, 2015 at 9:03 AM, Neha Narkhede <neha@confluent.io> wrote:
> > > >> > >>
> > > > >> > >>> A few suggestions on improving the KIP:
> > > > >> > >>>
> > > > >> > >>> *If some brokers have rack, and some do not, the algorithm will
> > > > >> > >>> throw an exception. This is to prevent incorrect assignment
> > > > >> > >>> caused by user error.*
> > > > >> > >>>
> > > > >> > >>>
> > > > >> > >>> In the KIP, can you clearly state the user-facing behavior when
> > > > >> > >>> some brokers have rack information and some don't? Which actions
> > > > >> > >>> and requests will error out and how?
> > > > >> > >>>
> > > > >> > >>> *Even distribution of partition leadership among brokers*
> > > > >> > >>>
> > > > >> > >>>
> > > > >> > >>> There is some information about arranging the sorted broker list
> > > > >> > >>> interlaced with rack ids. Can you describe the changes to the
> > > > >> > >>> current algorithm in a little more detail? How does this
> > > > >> > >>> interlacing work if only a subset of brokers have the rack id
> > > > >> > >>> configured? Does this still work if an uneven number of brokers
> > > > >> > >>> is assigned to each rack? It might work; I'm looking for more
> > > > >> > >>> details on the changes, since they will affect the behavior seen
> > > > >> > >>> by the user: imbalance on either the leaders or data or both.
> > > >> > >>>
> > > > >> > >>> On Mon, Nov 2, 2015 at 6:39 PM, Aditya Auradkar <aauradkar@linkedin.com> wrote:
> > > >> > >>>
> > > >> > >>> > I think this sounds reasonable. Anyone else have comments?
> > > >> > >>> >
> > > >> > >>> > Aditya
> > > >> > >>> >
> > > > >> > >>> > On Tue, Oct 27, 2015 at 5:23 PM, Allen Wang <allenxwang@gmail.com> wrote:
> > > >> > >>> >
> > > > >> > >>> > > During the discussion in the hangout, it was mentioned that it
> > > > >> > >>> > > would be desirable for consumers to know the rack information of
> > > > >> > >>> > > the brokers so that they can consume from the broker in the same
> > > > >> > >>> > > rack to reduce latency. As I understand it, this will only be
> > > > >> > >>> > > beneficial if a consumer can consume from any broker in the ISR,
> > > > >> > >>> > > which is not possible now.
> > > > >> > >>> > >
> > > > >> > >>> > > I suggest we skip the change to TMR. Once the consumer is changed
> > > > >> > >>> > > to be able to consume from any broker in the ISR, the rack
> > > > >> > >>> > > information can be added to TMR.
> > > > >> > >>> > >
> > > > >> > >>> > > Another thing I want to confirm is the command line behavior. I
> > > > >> > >>> > > think the desirable default behavior is to fail fast on the
> > > > >> > >>> > > command line for incomplete rack mapping. The error message can
> > > > >> > >>> > > include further instruction that tells the user to add an extra
> > > > >> > >>> > > argument (like "--allow-partial-rackinfo") to suppress the error
> > > > >> > >>> > > and do an imperfect rack aware assignment. If the default
> > > > >> > >>> > > behavior is to allow incomplete mapping, the error can still be
> > > > >> > >>> > > easily missed.
> > > > >> > >>> > >
> > > > >> > >>> > > The affected command line tools are TopicCommand and
> > > > >> > >>> > > ReassignPartitionsCommand.
> > > >> > >>> > >
> > > >> > >>> > > Thanks,
> > > >> > >>> > > Allen
> > > >> > >>> > >
> > > >> > >>> > >
> > > >> > >>> > >
> > > >> > >>> > >
> > > >> > >>> > >
> > > > >> > >>> > > On Mon, Oct 26, 2015 at 12:55 PM, Aditya Auradkar <aauradkar@linkedin.com> wrote:
> > > >> > >>> > >
> > > >> > >>> > > > Hi Allen,
> > > >> > >>> > > >
> > > > >> > >>> > > > For TopicMetadataResponse to understand version, you can bump up
> > > > >> > >>> > > > the request version itself. Based on the version of the request,
> > > > >> > >>> > > > the response can be appropriately serialized. It shouldn't be a
> > > > >> > >>> > > > huge change. For example: We went through something similar for
> > > > >> > >>> > > > ProduceRequest recently (https://reviews.apache.org/r/33378/).
> > > > >> > >>> > > > I guess the reason protocol information is not included in the
> > > > >> > >>> > > > TMR is because the topic itself is independent of any particular
> > > > >> > >>> > > > protocol (SSL vs Plaintext). Having said that, I'm not sure we
> > > > >> > >>> > > > even need rack information in TMR. What use case were you
> > > > >> > >>> > > > thinking of initially?
> > > > >> > >>> > > >
> > > > >> > >>> > > > For 1 - I'd be fine with adding an option to the command line
> > > > >> > >>> > > > tools that checks rack assignment, e.g. "--strict-assignment" or
> > > > >> > >>> > > > something similar.
> > > >> > >>> > > >
> > > >> > >>> > > > Aditya
> > > >> > >>> > > >
> > > > >> > >>> > > > On Thu, Oct 22, 2015 at 6:44 PM, Allen Wang <allenxwang@gmail.com> wrote:
> > > >> > >>> > > >
> > > > >> > >>> > > > > For 2 and 3, I have updated the KIP. Please take a look. One
> > > > >> > >>> > > > > thing I have changed is removing the proposal to add rack to
> > > > >> > >>> > > > > TopicMetadataResponse. The reason is that unlike
> > > > >> > >>> > > > > UpdateMetadataRequest, TopicMetadataResponse does not understand
> > > > >> > >>> > > > > version. I don't see a way to include rack without breaking old
> > > > >> > >>> > > > > versions of clients. That's probably why secure protocol is not
> > > > >> > >>> > > > > included in the TopicMetadataResponse either. I think it will be
> > > > >> > >>> > > > > a much bigger change to include rack in TopicMetadataResponse.
> > > > >> > >>> > > > >
> > > > >> > >>> > > > > For 1, my concern is that doing rack aware assignment without a
> > > > >> > >>> > > > > complete broker to rack mapping will result in an assignment
> > > > >> > >>> > > > > that is not rack aware and fails to provide fault tolerance in
> > > > >> > >>> > > > > the event of a rack outage. This kind of problem will be
> > > > >> > >>> > > > > difficult to surface. And the cost of this problem is high: you
> > > > >> > >>> > > > > have to do partition reassignment if you are lucky enough to
> > > > >> > >>> > > > > spot the problem early on, or face the consequence of data loss
> > > > >> > >>> > > > > during a real rack outage.
> > > > >> > >>> > > > >
> > > > >> > >>> > > > > I do see the concern with fail-fast, as it might also cause data
> > > > >> > >>> > > > > loss if the producer is not able to produce the message due to
> > > > >> > >>> > > > > topic creation failure. Is it feasible to treat dynamic topic
> > > > >> > >>> > > > > creation and command tools differently? We allow dynamic topic
> > > > >> > >>> > > > > creation with incomplete broker-rack mapping and fail fast in
> > > > >> > >>> > > > > the command line. Another option is to let the user determine
> > > > >> > >>> > > > > the behavior for the command line. For example, by default fail
> > > > >> > >>> > > > > fast in the command line but allow incomplete broker-rack
> > > > >> > >>> > > > > mapping if another switch is provided.
> > > >> > >>> > > > >
> > > >> > >>> > > > >
> > > >> > >>> > > > >
> > > >> > >>> > > > >
> > > >> > >>> > > > > On Tue, Oct 20, 2015 at 10:05 AM, Aditya Auradkar <
> > > >> > >>> > > > > aauradkar@linkedin.com.invalid> wrote:
> > > >> > >>> > > > >
> > > >> > >>> > > > > > Hey Allen,
> > > >> > >>> > > > > >
> > > > >> > >>> > > > > > 1. If we choose fail fast topic creation, we will have topic
> > > > >> > >>> > > > > > creation failures while upgrading the cluster. I really doubt
> > > > >> > >>> > > > > > we want this behavior. Ideally, this should be invisible to
> > > > >> > >>> > > > > > clients of a cluster. Currently, each broker is effectively its
> > > > >> > >>> > > > > > own rack. So we probably can use the rack information whenever
> > > > >> > >>> > > > > > possible but not make it a hard requirement. To extend Gwen's
> > > > >> > >>> > > > > > example, one badly configured broker should not degrade topic
> > > > >> > >>> > > > > > creation for the entire cluster.
> > > > >> > >>> > > > > >
> > > > >> > >>> > > > > > 2. Upgrade scenario - Can you add a section on the upgrade
> > > > >> > >>> > > > > > piece to confirm that old clients will not see errors? I
> > > > >> > >>> > > > > > believe ZookeeperConsumerConnector reads the Broker objects
> > > > >> > >>> > > > > > from ZK. I wanted to confirm that this will not cause any
> > > > >> > >>> > > > > > problems.
> > > > >> > >>> > > > > >
> > > > >> > >>> > > > > > 3. Could you elaborate on your proposed changes to the
> > > > >> > >>> > > > > > UpdateMetadataRequest in the "Public Interfaces" section?
> > > > >> > >>> > > > > > Personally, I find this format easy to read in terms of wire
> > > > >> > >>> > > > > > protocol changes:
> > > >> > >>> > > > > >
> > > >> > >>> > > > > >
> > > >> > >>> > > > >
> > > >> > >>> > > >
> > > >> > >>> > >
> > > >> > >>> >
> > > >> > >>>
> > > >> >
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicRequest
> > > >> > >>> > > > > >
> > > >> > >>> > > > > > Aditya
> > > >> > >>> > > > > >
> > > > >> > >>> > > > > > On Fri, Oct 16, 2015 at 3:45 PM, Allen Wang <allenxwang@gmail.com> wrote:
> > > >> > >>> > > > > >
> > > > >> > >>> > > > > > > KIP is updated to include rack as an optional property for
> > > > >> > >>> > > > > > > broker. Please take a look and let me know if more details are
> > > > >> > >>> > > > > > > needed.
> > > > >> > >>> > > > > > >
> > > > >> > >>> > > > > > > For the case where some brokers have rack and some do not, the
> > > > >> > >>> > > > > > > current KIP uses the fail-fast behavior. If there are concerns,
> > > > >> > >>> > > > > > > we can further discuss this in the email thread or next
> > > > >> > >>> > > > > > > hangout.
> > > >> > >>> > > > > > >
> > > >> > >>> > > > > > >
> > > >> > >>> > > > > > >
> > > > >> > >>> > > > > > > On Thu, Oct 15, 2015 at 10:42 AM, Allen Wang <allenxwang@gmail.com> wrote:
> > > >> > >>> > > > > > >
> > > > >> > >>> > > > > > > > That's a good question. I can think of three actions if the
> > > > >> > >>> > > > > > > > rack information is incomplete:
> > > > >> > >>> > > > > > > >
> > > > >> > >>> > > > > > > > 1. Treat the node without rack as if it is on its own unique rack
> > > > >> > >>> > > > > > > > 2. Disregard all rack information and fall back to the current algorithm
> > > > >> > >>> > > > > > > > 3. Fail-fast
> > > > >> > >>> > > > > > > >
> > > > >> > >>> > > > > > > > Now that I think about it, one and three make more sense. The
> > > > >> > >>> > > > > > > > reason for fail-fast is that a user's mistake of not providing
> > > > >> > >>> > > > > > > > the rack may never be found if we tolerate it, and the
> > > > >> > >>> > > > > > > > assignment may not be rack aware as the user expected, which
> > > > >> > >>> > > > > > > > creates debugging problems when things fail.
> > > > >> > >>> > > > > > > >
> > > > >> > >>> > > > > > > > What do you think? If not fail-fast, is there any way we can
> > > > >> > >>> > > > > > > > make the user error stand out?
> > > >> > >>> > > > > > > >
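
To make the three actions above concrete, here is a minimal sketch in Python. It is not Kafka code: the function name resolve_racks, the policy strings, and the "__broker-N" placeholder rack names are all assumptions made up for illustration. A broker without a configured rack is represented as None.

```python
from typing import Dict, Optional


def resolve_racks(broker_racks: Dict[int, Optional[str]],
                  policy: str) -> Dict[int, Optional[str]]:
    """Apply one of the three discussed policies to an incomplete mapping.

    broker_racks maps broker id -> rack, with None meaning "no rack set".
    The policy names are hypothetical labels for options 1, 2 and 3.
    """
    missing = sorted(b for b, rack in broker_racks.items() if rack is None)
    if not missing:
        # Complete information: nothing to decide.
        return dict(broker_racks)
    if policy == "unique-rack":
        # Option 1: each rack-less broker is treated as its own rack.
        return {b: rack if rack is not None else f"__broker-{b}"
                for b, rack in broker_racks.items()}
    if policy == "ignore-racks":
        # Option 2: drop all rack information and use the old algorithm.
        return {b: None for b in broker_racks}
    if policy == "fail-fast":
        # Option 3: surface the configuration mistake immediately.
        raise ValueError(f"brokers without rack: {missing}")
    raise ValueError(f"unknown policy: {policy}")
```

With fail-fast, the error message names the brokers that are missing a rack, which is one way to make the user error stand out.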
> > > >> > >>> > > > > > > >
> > > >> > >>> > > > > > > > On Thu, Oct 15, 2015 at 10:17 AM, Gwen Shapira <gwen@confluent.io> wrote:
> > > >> > >>> > > > > > > >
> > > >> > >>> > > > > > > >> Thanks! Just to clarify, when some brokers have a rack assignment
> > > >> > >>> > > > > > > >> and some don't, do we act as if none of them have it, or as if
> > > >> > >>> > > > > > > >> those without an assignment are in their own rack?
> > > >> > >>> > > > > > > >>
> > > >> > >>> > > > > > > >> The first scenario is good when first setting up rack awareness,
> > > >> > >>> > > > > > > >> but the second makes more sense for ongoing maintenance (I can
> > > >> > >>> > > > > > > >> totally see someone adding a node and forgetting to set the rack
> > > >> > >>> > > > > > > >> property; we don't want this to change behavior for anything
> > > >> > >>> > > > > > > >> except the new node).
> > > >> > >>> > > > > > > >>
> > > >> > >>> > > > > > > >> What do you think?
> > > >> > >>> > > > > > > >>
> > > >> > >>> > > > > > > >> Gwen
> > > >> > >>> > > > > > > >>
> > > >> > >>> > > > > > > >> On Thu, Oct 15, 2015 at 10:13 AM, Allen Wang <allenxwang@gmail.com> wrote:
> > > >> > >>> > > > > > > >>
> > > >> > >>> > > > > > > >> > For scenario 1:
> > > >> > >>> > > > > > > >> >
> > > >> > >>> > > > > > > >> > - Add the rack information to the broker property file, or set it
> > > >> > >>> > > > > > > >> > dynamically in the wrapper code that bootstraps the Kafka server. You
> > > >> > >>> > > > > > > >> > would do that for all brokers and restart the brokers one by one.
> > > >> > >>> > > > > > > >> >
> > > >> > >>> > > > > > > >> > In this scenario, the complete broker-to-rack mapping may not be
> > > >> > >>> > > > > > > >> > available until every broker is restarted. During that time we fall
> > > >> > >>> > > > > > > >> > back to the default replica assignment algorithm.
> > > >> > >>> > > > > > > >> >
> > > >> > >>> > > > > > > >> > For scenario 2:
> > > >> > >>> > > > > > > >> >
> > > >> > >>> > > > > > > >> > - Add the rack information to the broker property file, or set it
> > > >> > >>> > > > > > > >> > dynamically in the wrapper code, and start the broker.
> > > >> > >>> > > > > > > >> >
> > > >> > >>> > > > > > > >> >
> > > >> > >>> > > > > > > >> > On Wed, Oct 14, 2015 at 2:36 PM, Gwen Shapira <gwen@confluent.io> wrote:
> > > >> > >>> > > > > > > >> >
> > > >> > >>> > > > > > > >> > > Can you clarify the workflow for the following scenarios:
> > > >> > >>> > > > > > > >> > >
> > > >> > >>> > > > > > > >> > > 1. I currently have 6 brokers and want to add rack information for each
> > > >> > >>> > > > > > > >> > > 2. I'm adding a new broker and I want to specify which rack it belongs on
> > > >> > >>> > > > > > > >> > > while adding it.
> > > >> > >>> > > > > > > >> > >
> > > >> > >>> > > > > > > >> > > Thanks!
> > > >> > >>> > > > > > > >> > >
> > > >> > >>> > > > > > > >> > > On Tue, Oct 13, 2015 at 2:21 PM, Allen Wang <allenxwang@gmail.com> wrote:
> > > >> > >>> > > > > > > >> > >
> > > >> > >>> > > > > > > >> > > > We discussed the KIP in the hangout today. The recommendation is to
> > > >> > >>> > > > > > > >> > > > make rack a broker property in ZooKeeper. Users with existing rack
> > > >> > >>> > > > > > > >> > > > information stored somewhere would need to retrieve the information at
> > > >> > >>> > > > > > > >> > > > broker startup and dynamically set the rack property, which can be
> > > >> > >>> > > > > > > >> > > > implemented as a wrapper to bootstrap the broker. There will be no
> > > >> > >>> > > > > > > >> > > > interface or pluggable implementation to retrieve the rack information.
> > > >> > >>> > > > > > > >> > > >
> > > >> > >>> > > > > > > >> > > > The assumption is that you always need to restart the broker to make a
> > > >> > >>> > > > > > > >> > > > change to the rack.
> > > >> > >>> > > > > > > >> > > >
> > > >> > >>> > > > > > > >> > > > Once the rack becomes a broker property, it will be possible to make
> > > >> > >>> > > > > > > >> > > > rack part of the metadata to help the consumer choose which in-sync
> > > >> > >>> > > > > > > >> > > > replica to consume from as part of a future consumer enhancement.
> > > >> > >>> > > > > > > >> > > >
> > > >> > >>> > > > > > > >> > > > I will update the KIP.
> > > >> > >>> > > > > > > >> > > >
> > > >> > >>> > > > > > > >> > > > Thanks,
> > > >> > >>> > > > > > > >> > > > Allen
> > > >> > >>> > > > > > > >> > > >
> > > >> > >>> > > > > > > >> > > >
> > > >> > >>> > > > > > > >> > > > On Thu, Oct 8, 2015 at 9:23 AM, Allen Wang <allenxwang@gmail.com> wrote:
> > > >> > >>> > > > > > > >> > > >
> > > >> > >>> > > > > > > >> > > > > I attended Tuesday's KIP hangout but this KIP was not discussed due to
> > > >> > >>> > > > > > > >> > > > > time constraints.
> > > >> > >>> > > > > > > >> > > > >
> > > >> > >>> > > > > > > >> > > > > However, after hearing the discussion of KIP-35, I have the feeling
> > > >> > >>> > > > > > > >> > > > > that incompatibility (caused by a new broker property) between brokers
> > > >> > >>> > > > > > > >> > > > > with different versions will be solved there. In addition, having rack
> > > >> > >>> > > > > > > >> > > > > in the broker property as metadata may also help consumers in the
> > > >> > >>> > > > > > > >> > > > > future. So I am open to adding a rack property to the broker.
> > > >> > >>> > > > > > > >> > > > >
> > > >> > >>> > > > > > > >> > > > > Hopefully we can discuss this in the next KIP hangout.
> > > >> > >>> > > > > > > >> > > > >
> > > >> > >>> > > > > > > >> > > > > On Wed, Sep 30, 2015 at 2:46 PM, Allen Wang <allenxwang@gmail.com> wrote:
> > > >> > >>> > > > > > > >> > > > >
> > > >> > >>> > > > > > > >> > > > >> Can you send me the information on the next KIP hangout?
> > > >> > >>> > > > > > > >> > > > >>
> > > >> > >>> > > > > > > >> > > > >> Currently the broker-rack mapping is not cached. In KafkaApis,
> > > >> > >>> > > > > > > >> > > > >> RackLocator.getRackInfo() is called each time the mapping is needed for
> > > >> > >>> > > > > > > >> > > > >> auto topic creation. This ensures the latest mapping is used at any time.
> > > >> > >>> > > > > > > >> > > > >>
> > > >> > >>> > > > > > > >> > > > >> The ability to get the complete mapping makes it simple to reuse the same
> > > >> > >>> > > > > > > >> > > > >> interface in command line tools.
> > > >> > >>> > > > > > > >> > > > >>
> > > >> > >>> > > > > > > >> > > > >>
> > > >> > >>> > > > > > > >> > > > >> On Wed, Sep 30, 2015 at 11:01 AM, Aditya Auradkar <aauradkar@linkedin.com.invalid> wrote:
> > > >> > >>> > > > > > > >> > > > >>
> > > >> > >>> > > > > > > >> > > > >>> Perhaps we discuss this during the next KIP hangout?
> > > >> > >>> > > > > > > >> > > > >>>
> > > >> > >>> > > > > > > >> > > > >>> I do see that a pluggable rack locator can be useful, but I do see a few
> > > >> > >>> > > > > > > >> > > > >>> concerns:
> > > >> > >>> > > > > > > >> > > > >>>
> > > >> > >>> > > > > > > >> > > > >>> - The RackLocator (as described in the document) implies that it can
> > > >> > >>> > > > > > > >> > > > >>> discover rack information for any node in the cluster. How does it deal
> > > >> > >>> > > > > > > >> > > > >>> with rack location changes? For example, if I moved broker id (1) from
> > > >> > >>> > > > > > > >> > > > >>> rack X to Y, I only have to start that broker with a newer rack config. If
> > > >> > >>> > > > > > > >> > > > >>> RackLocator discovers broker -> rack information at startup time, any
> > > >> > >>> > > > > > > >> > > > >>> change to a broker will require bouncing the entire cluster, since
> > > >> > >>> > > > > > > >> > > > >>> createTopic requests can be sent to any node in the cluster.
> > > >> > >>> > > > > > > >> > > > >>> For this reason it may be simpler to have each node be aware of its own
> > > >> > >>> > > > > > > >> > > > >>> rack and persist it in ZK during startup.
> > > >> > >>> > > > > > > >> > > > >>>
> > > >> > >>> > > > > > > >> > > > >>> - A pluggable RackLocator relies on an external service being available to
> > > >> > >>> > > > > > > >> > > > >>> serve rack information.
> > > >> > >>> > > > > > > >> > > > >>>
> > > >> > >>> > > > > > > >> > > > >>> Out of curiosity, I looked up how a couple of other systems deal with
> > > >> > >>> > > > > > > >> > > > >>> zone/rack awareness.
> > > >> > >>> > > > > > > >> > > > >>> For Cassandra some interesting modes are:
> > > >> > >>> > > > > > > >> > > > >>> (Property File configuration)
> > > >> > >>> > > > > > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
> > > >> > >>> > > > > > > >> > > > >>> (Dynamic inference)
> > > >> > >>> > > > > > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
> > > >> > >>> > > > > > > >> > > > >>>
> > > >> > >>> > > > > > > >> > > > >>> Voldemort does a static node -> zone assignment based on configuration.
> > > >> > >>> > > > > > > >> > > > >>>
> > > >> > >>> > > > > > > >> > > > >>> Aditya
> > > >> > >>> > > > > > > >> > > > >>>
> > > >> > >>> > > > > > > >> > > > >>> On Wed, Sep 30, 2015 at 10:05 AM, Allen Wang <allenxwang@gmail.com> wrote:
> > > >> > >>> > > > > > > >> > > > >>>
> > > >> > >>> > > > > > > >> > > > >>> > I would like to see if we can do both:
> > > >> > >>> > > > > > > >> > > > >>> >
> > > >> > >>> > > > > > > >> > > > >>> > - Make RackLocator pluggable to facilitate migration with an existing
> > > >> > >>> > > > > > > >> > > > >>> > broker-rack mapping
> > > >> > >>> > > > > > > >> > > > >>> >
> > > >> > >>> > > > > > > >> > > > >>> > - Make rack an optional property for the broker. If rack is available from
> > > >> > >>> > > > > > > >> > > > >>> > the broker, treat it as the source of truth. Users with an existing
> > > >> > >>> > > > > > > >> > > > >>> > broker-rack mapping somewhere else can use the pluggable way, or they can
> > > >> > >>> > > > > > > >> > > > >>> > transfer the mapping to the broker rack property.
> > > >> > >>> > > > > > > >> > > > >>> >
> > > >> > >>> > > > > > > >> > > > >>> > One thing I am not sure about is what happens during a rolling upgrade
> > > >> > >>> > > > > > > >> > > > >>> > when we have rack as a broker property. Will it cause problems for
> > > >> > >>> > > > > > > >> > > > >>> > brokers running an older version of Kafka? If so, is there any
> > > >> > >>> > > > > > > >> > > > >>> > workaround? I also think it would be better not to have rack in the
> > > >> > >>> > > > > > > >> > > > >>> > controller wire protocol, but I am not sure if it is achievable.
> > > >> > >>> > > > > > > >> > > > >>> >
> > > >> > >>> > > > > > > >> > > > >>> > Thanks,
> > > >> > >>> > > > > > > >> > > > >>> > Allen
> > > >> > >>> > > > > > > >> > > > >>> >
> > > >> > >>> > > > > > > >> > > > >>> >
> > > >> > >>> > > > > > > >> > > > >>> >
> > > >> > >>> > > > > > > >> > > > >>> >
> > > >> > >>> > > > > > > >> > > > >>> >
> > > >> > >>> > > > > > > >> > > > >>> > On Mon, Sep 28, 2015 at 4:55 PM, Todd Palino <tpalino@gmail.com> wrote:
> > > >> > >>> > > > > > > >> > > > >>> >
> > > >> > >>> > > > > > > >> > > > >>> > > I tend to like the idea of a pluggable locator. For example, we already
> > > >> > >>> > > > > > > >> > > > >>> > > have an interface for discovering information about the physical location
> > > >> > >>> > > > > > > >> > > > >>> > > of servers. I don't relish the idea of having to maintain data in multiple
> > > >> > >>> > > > > > > >> > > > >>> > > places.
> > > >> > >>> > > > > > > >> > > > >>> > >
> > > >> > >>> > > > > > > >> > > > >>> > > -Todd
> > > >> > >>> > > > > > > >> > > > >>> > >
> > > >> > >>> > > > > > > >> > > > >>> > > On Mon, Sep 28, 2015 at 4:48 PM, Aditya Auradkar <aauradkar@linkedin.com.invalid> wrote:
> > > >> > >>> > > > > > > >> > > > >>> > >
> > > >> > >>> > > > > > > >> > > > >>> > > > Thanks for starting this KIP Allen.
> > > >> > >>> > > > > > > >> > > > >>> > > >
> > > >> > >>> > > > > > > >> > > > >>> > > > I agree with Gwen that having a RackLocator class that is pluggable seems
> > > >> > >>> > > > > > > >> > > > >>> > > > to be too complex. The KIP refers to potentially non-ZK storage for the
> > > >> > >>> > > > > > > >> > > > >>> > > > rack info, which I don't think is necessary.
> > > >> > >>> > > > > > > >> > > > >>> > > >
> > > >> > >>> > > > > > > >> > > > >>> > > > Perhaps we can persist this info in zk under /brokers/ids/<broker_id>,
> > > >> > >>> > > > > > > >> > > > >>> > > > similar to other broker properties, and add a config in KafkaConfig called
> > > >> > >>> > > > > > > >> > > > >>> > > > "rack".
> > > >> > >>> > > > > > > >> > > > >>> > > >
> > > >> > >>> > > > > > > >> > > > >>> > > > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy,"rack": "abc"}
> > > >> > >>> > > > > > > >> > > > >>> > > >
> > > >> > >>> > > > > > > >> > > > >>> > > > Aditya
> > > >> > >>> > > > > > > >> > > > >>> > > >
> > > >> > >>> > > > > > > >> > > > >>> > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira <gwen@confluent.io> wrote:
> > > >> > >>> > > > > > > >> > > > >>> > > >
> > > >> > >>> > > > > > > >> > > > >>> > > > > Hi,
> > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > >> > >>> > > > > > > >> > > > >>> > > > > First, thanks for putting out a KIP for this. This is super important for
> > > >> > >>> > > > > > > >> > > > >>> > > > > production deployments of Kafka.
> > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > >> > >>> > > > > > > >> > > > >>> > > > > A few questions:
> > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > >> > >>> > > > > > > >> > > > >>> > > > > 1) Are we sure we want "as many racks as possible"? I'd want to balance
> > > >> > >>> > > > > > > >> > > > >>> > > > > between safety (more racks) and network utilization (traffic within a rack
> > > >> > >>> > > > > > > >> > > > >>> > > > > uses the high-bandwidth TOR switch). One replica on a different rack and
> > > >> > >>> > > > > > > >> > > > >>> > > > > the rest on the same rack (if possible) sounds better to me.
> > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > >> > >>> > > > > > > >> > > > >>> > > > > 2) A rack-locator class seems overly complex compared to adding a
> > > >> > >>> > > > > > > >> > > > >>> > > > > rack.number property to the broker properties file. Why do we want that?
> > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > >> > >>> > > > > > > >> > > > >>> > > > > Gwen
> > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > >> > >>> > > > > > > >> > > > >>> > > > > On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang <allenxwang@gmail.com> wrote:
> > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > >> > >>> > > > > > > >> > > > >>> > > > > > Hello Kafka Developers,
> > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > >> > >>> > > > > > > >> > > > >>> > > > > > I just created KIP-36 for rack aware replica assignment.
> > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > >> > >>> > > > > > > >> > > > >>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
> > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > >> > >>> > > > > > > >> > > > >>> > > > > > The goal is to utilize the isolation provided by the racks in data center
> > > >> > >>> > > > > > > >> > > > >>> > > > > > and distribute replicas to racks to provide fault tolerance.
> > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > >> > >>> > > > > > > >> > > > >>> > > > > > Comments are welcome.
> > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > >> > >>> > > > > > > >> > > > >>> > > > > > Thanks,
> > > >> > >>> > > > > > > >> > > > >>> > > > > > Allen
> > > >> > >>> > > > > > > >> > > > >>> > > > > >
> > > >> > >>> > > > > > > >> > > > >>> > > > >
> > > >> > >>> > > > > > > >> > > > >>> > > >
> > > >> > >>> > > > > > > >> > > > >>> > >
> > > >> > >>> > > > > > > >> > > > >>> >
> > > >> > >>> > > > > > > >> > > > >>>
> > > >> > >>> > > > > > > >> > > > >>
> > > >> > >>> > > > > > > >> > > > >>
> > > >> > >>> > > > > > > >> > > > >
> > > >> > >>> > > > > > > >> > > >
> > > >> > >>> > > > > > > >> > >
> > > >> > >>> > > > > > > >> >
> > > >> > >>> > > > > > > >>
> > > >> > >>> > > > > > > >
> > > >> > >>> > > > > > > >
> > > >> > >>> > > > > > >
> > > >> > >>> > > > > >
> > > >> > >>> > > > >
> > > >> > >>> > > >
> > > >> > >>> > >
> > > >> > >>> >
> > > >> > >>>
> > > >> > >>>
> > > >> > >>>
> > > >> > >>> --
> > > >> > >>> Thanks,
> > > >> > >>> Neha
> > > >> > >>>
> > > >> > >>
> > > >> > >>
> > > >> > >
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Allen Wang <al...@gmail.com>.
Jun and I had a chance to discuss it in a meeting and agreed to change the
TMR (TopicMetadataResponse) in a different patch.

I can change the KIP to include rack in TMR. The essential change is to add
rack into class BrokerEndPoint and make TMR version aware.
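
As a rough illustration of what "version aware" could mean here, the sketch below (Python, for brevity) writes the rack only when the response version is new enough. It is not the Kafka wire format: the dictionary layout, the version numbers 0 and 1, and the encode_broker/decode_broker helpers are assumptions for illustration, and the BrokerEndPoint class is only a stand-in for the real one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BrokerEndPoint:
    """Hypothetical stand-in for the broker entry carried in a TMR."""
    broker_id: int
    host: str
    port: int
    rack: Optional[str] = None  # None when no rack is configured


def encode_broker(broker: BrokerEndPoint, version: int) -> dict:
    """Encode a broker entry for the given (assumed) response version.

    Version 0 stands for the existing layout; version 1 adds the rack.
    """
    entry = {"id": broker.broker_id, "host": broker.host, "port": broker.port}
    if version >= 1:
        entry["rack"] = broker.rack
    return entry


def decode_broker(entry: dict, version: int) -> BrokerEndPoint:
    """Decode a broker entry; version 0 responses never carry a rack."""
    rack = entry.get("rack") if version >= 1 else None
    return BrokerEndPoint(entry["id"], entry["host"], entry["port"], rack)
```

Under this scheme an old client that asks for version 0 sees exactly the layout it sees today, and only clients that request the newer version receive the rack.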



On Tue, Jan 5, 2016 at 10:21 AM, Aditya Auradkar <aauradkar@linkedin.com.invalid> wrote:

> Jun/Allen -
>
> Did we ever actually agree on whether we should evolve the TMR to include
> rack info or not?
> I don't feel strongly about it, but if it's the right thing to do we
> should probably do it in this KIP (can be a separate patch). It isn't a
> large change.
>
> Aditya
>
> On Sat, Dec 26, 2015 at 3:01 PM, Allen Wang <al...@gmail.com> wrote:
>
> > Added the rolling upgrade instructions to the KIP, similar to those in the
> > 0.9.0 release notes.
> >
> > On Wed, Dec 16, 2015 at 11:32 AM, Allen Wang <al...@gmail.com> wrote:
> >
> > > Hi Jun,
> > >
> > > The reason that TopicMetadataResponse is not included in the KIP is that
> > > it currently is not version aware. So we need to introduce a version to it
> > > in order to ensure backward compatibility. It seems to me a big change.
> > > Do we want to couple it with this KIP? Do we need to further discuss what
> > > information to include in the new version besides rack? For example,
> > > should we include broker security protocol in TopicMetadataResponse?
> > >
> > > The other option is to make it a separate KIP to make
> > > TopicMetadataResponse version aware and decide what to include, and make
> > > this KIP focus on the rack aware algorithm, admin tools, and related
> > > changes to the inter-broker protocol.
> > >
> > > Thanks,
> > > Allen
> > >
> > >
> > >
> > >
> > > On Mon, Dec 14, 2015 at 8:30 AM, Jun Rao <ju...@confluent.io> wrote:
> > >
> > >> Allen,
> > >>
> > >> Thanks for the proposal. A few comments.
> > >>
> > >> 1. Since this KIP changes the inter-broker communication protocol
> > >> (UpdateMetadataRequest), we will need to document the upgrade path
> > >> (similar to what's described in
> > >> http://kafka.apache.org/090/documentation.html#upgrade).
> > >>
> > >> 2. It might be useful to include the rack info of the broker in
> > >> TopicMetadataResponse. This can be useful for administrative tasks, as
> > >> well as read affinity in the future.
> > >>
> > >> Jun
> > >>
> > >>
> > >>
> > >> On Thu, Dec 10, 2015 at 9:38 AM, Allen Wang <al...@gmail.com> wrote:
> > >> > If there are no more comments I would like to call for a vote.
> > >> >
> > >> >
> > >> > On Sun, Nov 15, 2015 at 10:08 PM, Allen Wang <al...@gmail.com> wrote:
> > >> >
> > >> > > The KIP is updated with more details, including how to handle the
> > >> > > situation where rack information is incomplete.
> > >> > >
> > >> > > In the situation where rack information is incomplete but we want to
> > >> > > continue with the assignment, I have suggested ignoring all rack
> > >> > > information and falling back to the original algorithm. The reason is
> > >> > > explained below:
> > >> > >
> > >> > > The other options are to assume that brokers without a rack each belong
> > >> > > to their own unique rack, or that they all belong to one "default" rack.
> > >> > > Whichever we choose, it is highly likely to result in an uneven number
> > >> > > of brokers per rack, and it is quite possible that the "made up" racks
> > >> > > will have far fewer brokers. As I explained in the KIP, an uneven number
> > >> > > of brokers per rack leads to an uneven distribution of replicas among
> > >> > > brokers (even though the leader distribution is still even). The brokers
> > >> > > in a rack that has fewer brokers will get more replicas per broker than
> > >> > > brokers in other racks.
> > >> > >
> > >> > > Given this fact, and given that the replica assignment produced will be
> > >> > > incorrect anyway from a rack-aware point of view, ignoring all rack
> > >> > > information and falling back to the original algorithm is not a bad
> > >> > > choice, since it will at least give a better guarantee of replica
> > >> > > distribution.
> > >> > >
> > >> > > Also, for command line tools it gives the user a choice if for any reason
> > >> > > they want to ignore rack information and fall back to the original
> > >> > > algorithm.
> > >> > >
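
A small worked example of the imbalance described above, as a Python sketch. It uses a deliberately simplified placement model that is an assumption for illustration and not the KIP's algorithm: the replication factor equals the number of racks, every partition gets exactly one replica in each rack, and brokers inside a rack are picked round-robin. The function name replicas_per_broker and the rack names are made up.

```python
from collections import Counter
from itertools import cycle


def replicas_per_broker(racks, num_partitions):
    """Count replicas per broker under a simplified rack-aware placement.

    racks maps rack name -> list of broker ids. Each partition places one
    replica in every rack, cycling through that rack's brokers in order.
    """
    counts = Counter()
    cursors = {rack: cycle(brokers) for rack, brokers in racks.items()}
    for _ in range(num_partitions):
        for rack in racks:
            counts[next(cursors[rack])] += 1
    return dict(counts)


# Three brokers share a real rack; one rack-less broker sits alone in a
# "made up" rack. With 12 partitions the lone broker holds 12 replicas
# while each of the other three holds 4.
print(replicas_per_broker({"rack-a": [0, 1, 2], "made-up": [3]}, 12))
```

In this model the broker in the single-broker rack carries three times the replicas of the others, which is the kind of skew that motivates falling back to the original algorithm.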
> > >> > >
> > >> > > On Tue, Nov 10, 2015 at 9:04 AM, Allen Wang <allenxwang@gmail.com> wrote:
> > >> > >
> > >> > >> I have been busy with some time-pressing issues for the last few days. I
> > >> > >> will think about how the incomplete rack information will affect the
> > >> > >> balance and update the KIP by early next week.
> > >> > >>
> > >> > >> Thanks,
> > >> > >> Allen
> > >> > >>
> > >> > >>
> > >> > >> On Tue, Nov 3, 2015 at 9:03 AM, Neha Narkhede <neha@confluent.io>
> > >> > >> wrote:
> > >> > >>
> > >> > >>> A few suggestions on improving the KIP:
> > >> > >>>
> > >> > >>> *If some brokers have rack, and some do not, the algorithm will
> > >> > >>> throw an exception. This is to prevent incorrect assignment caused
> > >> > >>> by user error.*
> > >> > >>>
> > >> > >>>
> > >> > >>> In the KIP, can you clearly state the user-facing behavior when
> > >> > >>> some brokers have rack information and some don't? Which actions
> > >> > >>> and requests will error out, and how?
> > >> > >>>
> > >> > >>> *Even distribution of partition leadership among brokers*
> > >> > >>>
> > >> > >>>
> > >> > >>> There is some information about arranging the sorted broker list
> > >> > >>> interlaced with rack ids. Can you describe the changes to the
> > >> > >>> current algorithm in a little more detail? How does this
> > >> > >>> interlacing work if only a subset of brokers have the rack id
> > >> > >>> configured? Does this still work if an uneven number of brokers is
> > >> > >>> assigned to each rack? It might work; I'm looking for more details
> > >> > >>> on the changes, since they will affect the behavior seen by the
> > >> > >>> user: imbalance on either the leaders or data or both.
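As a rough illustration of what "a sorted broker list interlaced with rack ids" could mean, the sketch below round-robins across racks so that neighbouring entries in the list come from different racks where possible. This is an assumption about the general idea, not the KIP's actual algorithm; the function name and the sample broker-to-rack maps are hypothetical.

```python
from itertools import zip_longest


def interlace_by_rack(broker_racks):
    """Return broker ids ordered so that consecutive ids alternate racks.

    broker_racks maps broker id -> rack id. Brokers are grouped per rack in
    sorted order, then one broker is taken from each rack in turn.
    """
    by_rack = {}
    for broker in sorted(broker_racks):
        by_rack.setdefault(broker_racks[broker], []).append(broker)
    columns = [by_rack[rack] for rack in sorted(by_rack)]
    ordered = []
    for row in zip_longest(*columns):
        # zip_longest pads short racks with None; broker id 0 is valid,
        # so compare against None explicitly.
        ordered.extend(b for b in row if b is not None)
    return ordered


even = interlace_by_rack({0: "r1", 1: "r1", 2: "r2", 3: "r2", 4: "r3", 5: "r3"})
uneven = interlace_by_rack({0: "r1", 1: "r1", 2: "r1", 3: "r2"})
print(even)
print(uneven)
```

With an uneven layout the tail of the list ends up with same-rack neighbours, which is one way to see why uneven racks raise the balance questions asked above.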
> > >> > >>>
> > >> > >>> On Mon, Nov 2, 2015 at 6:39 PM, Aditya Auradkar
> > >> > >>> <aauradkar@linkedin.com> wrote:
> > >> > >>>
> > >> > >>> > I think this sounds reasonable. Anyone else have comments?
> > >> > >>> >
> > >> > >>> > Aditya
> > >> > >>> >
> > >> > >>> > On Tue, Oct 27, 2015 at 5:23 PM, Allen Wang <allenxwang@gmail.com>
> > >> > >>> > wrote:
> > >> > >>> >
> > >> > >>> > > During the discussion in the hangout, it was mentioned that it
> > >> > >>> > > would be desirable for consumers to know the rack information of
> > >> > >>> > > the brokers so that they can consume from a broker in the same
> > >> > >>> > > rack to reduce latency. As I understand it, this will only be
> > >> > >>> > > beneficial if the consumer can consume from any broker in the
> > >> > >>> > > ISR, which is not possible now.
> > >> > >>> > >
> > >> > >>> > > I suggest we skip the change to TMR. Once the consumer is changed
> > >> > >>> > > to be able to consume from any broker in the ISR, the rack
> > >> > >>> > > information can be added to TMR.
> > >> > >>> > >
> > >> > >>> > > Another thing I want to confirm is command line behavior. I think
> > >> > >>> > > the desirable default behavior is to fail fast on the command
> > >> > >>> > > line for incomplete rack mapping. The error message can include a
> > >> > >>> > > further instruction that tells the user to add an extra argument
> > >> > >>> > > (like "--allow-partial-rackinfo") to suppress the error and do an
> > >> > >>> > > imperfect rack-aware assignment. If the default behavior is to
> > >> > >>> > > allow incomplete mapping, the error can still be easily missed.
> > >> > >>> > >
> > >> > >>> > > The affected command line tools are TopicCommand and
> > >> > >>> > > ReassignPartitionsCommand.
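The fail-fast default with an explicit override could look roughly like the sketch below. It is not the code of TopicCommand or ReassignPartitionsCommand; the flag name is the one floated in this message, and the exception class, function names, and program name are hypothetical.

```python
import argparse


class IncompleteRackInfoError(Exception):
    """Raised when only some brokers have a rack configured."""


def check_rack_mapping(broker_racks, allow_partial):
    """Fail fast on a partial broker -> rack mapping unless overridden.

    broker_racks maps broker id -> rack id, or None when no rack is set.
    Returns the sorted ids of brokers without a rack.
    """
    missing = sorted(b for b, rack in broker_racks.items() if rack is None)
    partially_configured = 0 < len(missing) < len(broker_racks)
    if partially_configured and not allow_partial:
        raise IncompleteRackInfoError(
            f"brokers {missing} have no rack; pass --allow-partial-rackinfo "
            "to proceed with an imperfect rack-aware assignment"
        )
    return missing


def parse_args(argv):
    parser = argparse.ArgumentParser(prog="topic-tool-sketch")
    parser.add_argument("--allow-partial-rackinfo", action="store_true")
    return parser.parse_args(argv)


racks = {0: "r1", 1: "r2", 2: None}
args = parse_args(["--allow-partial-rackinfo"])
print(check_rack_mapping(racks, args.allow_partial_rackinfo))
```

Making the strict path the default means the operator has to acknowledge the incomplete mapping explicitly, which addresses the concern that a warning alone is easily missed.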
> > >> > >>> > >
> > >> > >>> > > Thanks,
> > >> > >>> > > Allen
> > >> > >>> > >
> > >> > >>> > >
> > >> > >>> > >
> > >> > >>> > >
> > >> > >>> > >
> > >> > >>> > > On Mon, Oct 26, 2015 at 12:55 PM, Aditya Auradkar
> > >> > >>> > > <aauradkar@linkedin.com> wrote:
> > >> > >>> > >
> > >> > >>> > > > Hi Allen,
> > >> > >>> > > >
> > >> > >>> > > > For TopicMetadataResponse to understand version, you can bump up
> > >> > >>> > > > the request version itself. Based on the version of the request,
> > >> > >>> > > > the response can be serialized appropriately. It shouldn't be a
> > >> > >>> > > > huge change. For example, we went through something similar for
> > >> > >>> > > > ProduceRequest recently (https://reviews.apache.org/r/33378/).
> > >> > >>> > > > I guess the reason protocol information is not included in the
> > >> > >>> > > > TMR is that the topic itself is independent of any particular
> > >> > >>> > > > protocol (SSL vs Plaintext). Having said that, I'm not sure we
> > >> > >>> > > > even need rack information in TMR. What use case were you
> > >> > >>> > > > thinking of initially?
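The idea of serializing the response according to the request version can be sketched as follows. This only illustrates the pattern; the field names, the dictionary representation, and the choice of version 1 as the first version carrying rack are assumptions, not the actual Kafka wire format.

```python
def serialize_broker(broker, request_version):
    """Build the broker entry of a metadata response for a given version.

    Version 0 clients never see the rack field, so they keep working
    unchanged. Version 1 (a hypothetical bump) adds an optional rack.
    """
    fields = {
        "node_id": broker["node_id"],
        "host": broker["host"],
        "port": broker["port"],
    }
    if request_version >= 1:
        # rack is optional for a broker, so it may be None
        fields["rack"] = broker.get("rack")
    return fields


broker = {"node_id": 1, "host": "kafka-1", "port": 9092, "rack": "abc"}
old_client_view = serialize_broker(broker, request_version=0)
new_client_view = serialize_broker(broker, request_version=1)
print(old_client_view)
print(new_client_view)
```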
> > >> > >>> > > >
> > >> > >>> > > > For 1 - I'd be fine with adding an option to the command line
> > >> > >>> > > > tools that checks rack assignment, e.g. "--strict-assignment" or
> > >> > >>> > > > something similar.
> > >> > >>> > > >
> > >> > >>> > > > Aditya
> > >> > >>> > > >
> > >> > >>> > > > On Thu, Oct 22, 2015 at 6:44 PM, Allen Wang <allenxwang@gmail.com>
> > >> > >>> > > > wrote:
> > >> > >>> > > >
> > >> > >>> > > > > For 2 and 3, I have updated the KIP. Please take a look. One
> > >> > >>> > > > > thing I have changed is removing the proposal to add rack to
> > >> > >>> > > > > TopicMetadataResponse. The reason is that, unlike
> > >> > >>> > > > > UpdateMetadataRequest, TopicMetadataResponse does not understand
> > >> > >>> > > > > version. I don't see a way to include rack without breaking old
> > >> > >>> > > > > versions of clients. That's probably why the secure protocol is
> > >> > >>> > > > > not included in the TopicMetadataResponse either. I think it
> > >> > >>> > > > > will be a much bigger change to include rack in
> > >> > >>> > > > > TopicMetadataResponse.
> > >> > >>> > > > >
> > >> > >>> > > > > For 1, my concern is that doing rack-aware assignment without a
> > >> > >>> > > > > complete broker-to-rack mapping will result in an assignment
> > >> > >>> > > > > that is not rack aware and fails to provide fault tolerance in
> > >> > >>> > > > > the event of a rack outage. This kind of problem will be
> > >> > >>> > > > > difficult to surface. And the cost of this problem is high: you
> > >> > >>> > > > > have to do partition reassignment if you are lucky enough to
> > >> > >>> > > > > spot the problem early on, or face the consequence of data loss
> > >> > >>> > > > > during a real rack outage.
> > >> > >>> > > > >
> > >> > >>> > > > > I do see the concern with fail-fast, as it might also cause data
> > >> > >>> > > > > loss if the producer is not able to produce messages due to
> > >> > >>> > > > > topic creation failure. Is it feasible to treat dynamic topic
> > >> > >>> > > > > creation and command line tools differently? We would allow
> > >> > >>> > > > > dynamic topic creation with incomplete broker-rack mapping and
> > >> > >>> > > > > fail fast on the command line. Another option is to let the user
> > >> > >>> > > > > determine the behavior for the command line: for example, fail
> > >> > >>> > > > > fast by default, but allow incomplete broker-rack mapping if
> > >> > >>> > > > > another switch is provided.
> > >> > >>> > > > >
> > >> > >>> > > > >
> > >> > >>> > > > >
> > >> > >>> > > > >
> > >> > >>> > > > > On Tue, Oct 20, 2015 at 10:05 AM, Aditya Auradkar <
> > >> > >>> > > > > aauradkar@linkedin.com.invalid> wrote:
> > >> > >>> > > > >
> > >> > >>> > > > > > Hey Allen,
> > >> > >>> > > > > >
> > >> > >>> > > > > > 1. If we choose fail-fast topic creation, we will have topic
> > >> > >>> > > > > > creation failures while upgrading the cluster. I really doubt
> > >> > >>> > > > > > we want this behavior. Ideally, this should be invisible to
> > >> > >>> > > > > > clients of a cluster. Currently, each broker is effectively its
> > >> > >>> > > > > > own rack. So we can probably use the rack information whenever
> > >> > >>> > > > > > possible but not make it a hard requirement. To extend Gwen's
> > >> > >>> > > > > > example, one badly configured broker should not degrade topic
> > >> > >>> > > > > > creation for the entire cluster.
> > >> > >>> > > > > >
> > >> > >>> > > > > > 2. Upgrade scenario - Can you add a section on the upgrade
> > >> > >>> > > > > > piece to confirm that old clients will not see errors? I
> > >> > >>> > > > > > believe ZookeeperConsumerConnector reads the Broker objects
> > >> > >>> > > > > > from ZK. I wanted to confirm that this will not cause any
> > >> > >>> > > > > > problems.
> > >> > >>> > > > > >
> > >> > >>> > > > > > 3. Could you elaborate on your proposed changes to the
> > >> > >>> > > > > > UpdateMetadataRequest in the "Public Interfaces" section?
> > >> > >>> > > > > > Personally, I find this format easy to read in terms of wire
> > >> > >>> > > > > > protocol changes:
> > >> > >>> > > > > >
> > >> > >>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicRequest
> > >> > >>> > > > > >
> > >> > >>> > > > > > Aditya
> > >> > >>> > > > > >
> > >> > >>> > > > > > On Fri, Oct 16, 2015 at 3:45 PM, Allen Wang <allenxwang@gmail.com>
> > >> > >>> > > > > > wrote:
> > >> > >>> > > > > >
> > >> > >>> > > > > > > KIP is updated to include rack as an optional property for
> > >> > >>> > > > > > > broker. Please take a look and let me know if more details are
> > >> > >>> > > > > > > needed.
> > >> > >>> > > > > > >
> > >> > >>> > > > > > > For the case where some brokers have rack and some do not, the
> > >> > >>> > > > > > > current KIP uses the fail-fast behavior. If there are
> > >> > >>> > > > > > > concerns, we can further discuss this in the email thread or
> > >> > >>> > > > > > > next hangout.
> > >> > >>> > > > > > >
> > >> > >>> > > > > > >
> > >> > >>> > > > > > >
> > >> > >>> > > > > > > On Thu, Oct 15, 2015 at 10:42 AM, Allen Wang <allenxwang@gmail.com>
> > >> > >>> > > > > > > wrote:
> > >> > >>> > > > > > >
> > >> > >>> > > > > > > > That's a good question. I can think of three actions if the
> > >> > >>> > > > > > > > rack information is incomplete:
> > >> > >>> > > > > > > >
> > >> > >>> > > > > > > > 1. Treat the node without rack as if it is on its own unique
> > >> > >>> > > > > > > >    rack
> > >> > >>> > > > > > > > 2. Disregard all rack information and fall back to the
> > >> > >>> > > > > > > >    current algorithm
> > >> > >>> > > > > > > > 3. Fail-fast
> > >> > >>> > > > > > > >
> > >> > >>> > > > > > > > Now that I think about it, one and three make more sense. The
> > >> > >>> > > > > > > > reason for fail-fast is that the user's mistake of not
> > >> > >>> > > > > > > > providing the rack may never be found if we tolerate it, and
> > >> > >>> > > > > > > > the assignment may not be rack aware as the user expected,
> > >> > >>> > > > > > > > which creates debugging problems when things fail.
> > >> > >>> > > > > > > >
> > >> > >>> > > > > > > > What do you think? If not fail-fast, is there any way we can
> > >> > >>> > > > > > > > make the user error stand out?
> > >> > >>> > > > > > > >
> > >> > >>> > > > > > > >
> > >> > >>> > > > > > > > On Thu, Oct 15, 2015 at 10:17 AM, Gwen Shapira <gwen@confluent.io>
> > >> > >>> > > > > > > > wrote:
> > >> > >>> > > > > > > >
> > >> > >>> > > > > > > >> Thanks! Just to clarify, when some brokers have rack
> > >> > >>> > > > > > > >> assignment and some don't, do we act like none of them have
> > >> > >>> > > > > > > >> it? Or like those without assignment are in their own rack?
> > >> > >>> > > > > > > >>
> > >> > >>> > > > > > > >> The first scenario is good when first setting up
> > >> > >>> > > > > > > >> rack-awareness, but the second makes more sense for on-going
> > >> > >>> > > > > > > >> maintenance (I can totally see someone adding a node and
> > >> > >>> > > > > > > >> forgetting to set the rack property; we don't want this to
> > >> > >>> > > > > > > >> change behavior for anything except the new node).
> > >> > >>> > > > > > > >>
> > >> > >>> > > > > > > >> What do you think?
> > >> > >>> > > > > > > >>
> > >> > >>> > > > > > > >> Gwen
> > >> > >>> > > > > > > >>
> > >> > >>> > > > > > > >> On Thu, Oct 15, 2015 at 10:13 AM, Allen Wang <allenxwang@gmail.com>
> > >> > >>> > > > > > > >> wrote:
> > >> > >>> > > > > > > >>
> > >> > >>> > > > > > > >> > For scenario 1:
> > >> > >>> > > > > > > >> >
> > >> > >>> > > > > > > >> > - Add the rack information to the broker property file, or
> > >> > >>> > > > > > > >> > dynamically set it in the wrapper code that bootstraps the
> > >> > >>> > > > > > > >> > Kafka server. You would do that for all brokers and restart
> > >> > >>> > > > > > > >> > the brokers one by one.
> > >> > >>> > > > > > > >> >
> > >> > >>> > > > > > > >> > In this scenario, the complete broker-to-rack mapping may
> > >> > >>> > > > > > > >> > not be available until every broker is restarted. During
> > >> > >>> > > > > > > >> > that time we fall back to the default replica assignment
> > >> > >>> > > > > > > >> > algorithm.
> > >> > >>> > > > > > > >> >
> > >> > >>> > > > > > > >> > For scenario 2:
> > >> > >>> > > > > > > >> >
> > >> > >>> > > > > > > >> > - Add the rack information to the broker property file, or
> > >> > >>> > > > > > > >> > dynamically set it in the wrapper code, and start the
> > >> > >>> > > > > > > >> > broker.
> > >> > >>> > > > > > > >> >
> > >> > >>> > > > > > > >> >
> > >> > >>> > > > > > > >> > On Wed, Oct 14, 2015 at 2:36 PM, Gwen Shapira <gwen@confluent.io>
> > >> > >>> > > > > > > >> > wrote:
> > >> > >>> > > > > > > >> >
> > >> > >>> > > > > > > >> > > Can you clarify the workflow for the following scenarios:
> > >> > >>> > > > > > > >> > >
> > >> > >>> > > > > > > >> > > 1. I currently have 6 brokers and want to add rack
> > >> > >>> > > > > > > >> > >    information for each
> > >> > >>> > > > > > > >> > > 2. I'm adding a new broker and I want to specify which
> > >> > >>> > > > > > > >> > >    rack it belongs to while adding it.
> > >> > >>> > > > > > > >> > >
> > >> > >>> > > > > > > >> > > Thanks!
> > >> > >>> > > > > > > >> > >
> > >> > >>> > > > > > > >> > > On Tue, Oct 13, 2015 at 2:21 PM, Allen Wang <allenxwang@gmail.com>
> > >> > >>> > > > > > > >> > > wrote:
> > >> > >>> > > > > > > >> > >
> > >> > >>> > > > > > > >> > > > We discussed the KIP in the hangout today. The
> > >> > >>> > > > > > > >> > > > recommendation is to make rack a broker property in
> > >> > >>> > > > > > > >> > > > ZooKeeper. Users with existing rack information stored
> > >> > >>> > > > > > > >> > > > somewhere would need to retrieve the information at
> > >> > >>> > > > > > > >> > > > broker start up and dynamically set the rack property,
> > >> > >>> > > > > > > >> > > > which can be implemented as a wrapper that bootstraps
> > >> > >>> > > > > > > >> > > > the broker. There will be no interface or pluggable
> > >> > >>> > > > > > > >> > > > implementation to retrieve the rack information.
> > >> > >>> > > > > > > >> > > >
> > >> > >>> > > > > > > >> > > > The assumption is that you always need to restart the
> > >> > >>> > > > > > > >> > > > broker to make a change to the rack.
> > >> > >>> > > > > > > >> > > >
> > >> > >>> > > > > > > >> > > > Once the rack becomes a broker property, it will be
> > >> > >>> > > > > > > >> > > > possible to make rack part of the metadata to help the
> > >> > >>> > > > > > > >> > > > consumer choose which in-sync replica to consume from,
> > >> > >>> > > > > > > >> > > > as part of a future consumer enhancement.
> > >> > >>> > > > > > > >> > > >
> > >> > >>> > > > > > > >> > > > I will update the KIP.
> > >> > >>> > > > > > > >> > > >
> > >> > >>> > > > > > > >> > > > Thanks,
> > >> > >>> > > > > > > >> > > > Allen
> > >> > >>> > > > > > > >> > > >
> > >> > >>> > > > > > > >> > > >
> > >> > >>> > > > > > > >> > > > On Thu, Oct 8, 2015 at 9:23 AM, Allen Wang <allenxwang@gmail.com>
> > >> > >>> > > > > > > >> > > > wrote:
> > >> > >>> > > > > > > >> > > >
> > >> > >>> > > > > > > >> > > > > I attended Tuesday's KIP hangout but this KIP was not
> > >> > >>> > > > > > > >> > > > > discussed due to time constraints.
> > >> > >>> > > > > > > >> > > > >
> > >> > >>> > > > > > > >> > > > > However, after hearing the discussion of KIP-35, I
> > >> > >>> > > > > > > >> > > > > have the feeling that incompatibility (caused by a new
> > >> > >>> > > > > > > >> > > > > broker property) between brokers with different
> > >> > >>> > > > > > > >> > > > > versions will be solved there. In addition, having
> > >> > >>> > > > > > > >> > > > > rack in the broker properties as metadata may also
> > >> > >>> > > > > > > >> > > > > help consumers in the future. So I am open to adding a
> > >> > >>> > > > > > > >> > > > > rack property to broker.
> > >> > >>> > > > > > > >> > > > >
> > >> > >>> > > > > > > >> > > > > Hopefully we can discuss this in the next KIP hangout.
> > >> > >>> > > > > > > >> > > > >
> > >> > >>> > > > > > > >> > > > > On Wed, Sep 30, 2015 at 2:46 PM, Allen Wang <allenxwang@gmail.com>
> > >> > >>> > > > > > > >> > > > > wrote:
> > >> > >>> > > > > > > >> > > > >
> > >> > >>> > > > > > > >> > > > >> Can you send me the information on the next KIP
> > >> > >>> > > > > > > >> > > > >> hangout?
> > >> > >>> > > > > > > >> > > > >>
> > >> > >>> > > > > > > >> > > > >> Currently the broker-rack mapping is not cached. In
> > >> > >>> > > > > > > >> > > > >> KafkaApis, RackLocator.getRackInfo() is called each
> > >> > >>> > > > > > > >> > > > >> time the mapping is needed for auto topic creation.
> > >> > >>> > > > > > > >> > > > >> This ensures the latest mapping is used at any time.
> > >> > >>> > > > > > > >> > > > >>
> > >> > >>> > > > > > > >> > > > >> The ability to get the complete mapping makes it
> > >> > >>> > > > > > > >> > > > >> simple to reuse the same interface in command line
> > >> > >>> > > > > > > >> > > > >> tools.
> > >> > >>> > > > > > > >> > > > >>
> > >> > >>> > > > > > > >> > > > >>
> > >> > >>> > > > > > > >> > > > >> On Wed, Sep 30, 2015 at 11:01 AM, Aditya Auradkar
> > >> > >>> > > > > > > >> > > > >> <aauradkar@linkedin.com.invalid> wrote:
> > >> > >>> > > > > > > >> > > > >>
> > >> > >>> > > > > > > >> > > > >>> Perhaps we can discuss this during the next KIP
> > >> > >>> > > > > > > >> > > > >>> hangout?
> > >> > >>> > > > > > > >> > > > >>>
> > >> > >>> > > > > > > >> > > > >>> I do see that a pluggable rack locator can be useful,
> > >> > >>> > > > > > > >> > > > >>> but I do see a few concerns:
> > >> > >>> > > > > > > >> > > > >>>
> > >> > >>> > > > > > > >> > > > >>> - The RackLocator (as described in the document)
> > >> > >>> > > > > > > >> > > > >>> implies that it can discover rack information for any
> > >> > >>> > > > > > > >> > > > >>> node in the cluster. How does it deal with rack
> > >> > >>> > > > > > > >> > > > >>> location changes? For example, if I moved broker id
> > >> > >>> > > > > > > >> > > > >>> (1) from rack X to Y, I only have to start that
> > >> > >>> > > > > > > >> > > > >>> broker with a newer rack config. If RackLocator
> > >> > >>> > > > > > > >> > > > >>> discovers broker -> rack information at start up
> > >> > >>> > > > > > > >> > > > >>> time, any change to a broker will require bouncing
> > >> > >>> > > > > > > >> > > > >>> the entire cluster, since createTopic requests can be
> > >> > >>> > > > > > > >> > > > >>> sent to any node in the cluster.
> > >> > >>> > > > > > > >> > > > >>> For this reason it may be simpler to have each node
> > >> > >>> > > > > > > >> > > > >>> be aware of its own rack and persist it in ZK during
> > >> > >>> > > > > > > >> > > > >>> start up time.
> > >> > >>> > > > > > > >> > > > >>>
> > >> > >>> > > > > > > >> > > > >>> - A pluggable RackLocator relies on an external
> > >> > >>> > > > > > > >> > > > >>> service being available to serve rack information.
> > >> > >>> > > > > > > >> > > > >>>
> > >> > >>> > > > > > > >> > > > >>> Out of curiosity, I looked up how a couple of other
> > >> > >>> > > > > > > >> > > > >>> systems deal with zone/rack awareness.
> > >> > >>> > > > > > > >> > > > >>> For Cassandra some interesting modes are:
> > >> > >>> > > > > > > >> > > > >>> (Property File configuration)
> > >> > >>> > > > > > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
> > >> > >>> > > > > > > >> > > > >>> (Dynamic inference)
> > >> > >>> > > > > > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
> > >> > >>> > > > > > > >> > > > >>>
> > >> > >>> > > > > > > >> > > > >>> Voldemort does a static node -> zone assignment based
> > >> > >>> > > > > > > >> > > > >>> on configuration.
> > >> > >>> > > > > > > >> > > > >>>
> > >> > >>> > > > > > > >> > > > >>> Aditya
> > >> > >>> > > > > > > >> > > > >>>
> > >> > >>> > > > > > > >> > > > >>> On Wed, Sep 30, 2015 at 10:05 AM, Allen Wang <allenxwang@gmail.com>
> > >> > >>> > > > > > > >> > > > >>> wrote:
> > >> > >>> > > > > > > >> > > > >>>
> > >> > >>> > > > > > > >> > > > >>> > I would like to see if we can do both:
> > >> > >>> > > > > > > >> > > > >>> >
> > >> > >>> > > > > > > >> > > > >>> > - Make RackLocator pluggable to facilitate
> > >> > >>> > > > > > > >> > > > >>> > migration with existing broker-rack mapping
> > >> > >>> > > > > > > >> > > > >>> >
> > >> > >>> > > > > > > >> > > > >>> > - Make rack an optional property for broker. If
> > >> > >>> > > > > > > >> > > > >>> > rack is available from the broker, treat it as the
> > >> > >>> > > > > > > >> > > > >>> > source of truth. Users with an existing broker-rack
> > >> > >>> > > > > > > >> > > > >>> > mapping somewhere else can use the pluggable way or
> > >> > >>> > > > > > > >> > > > >>> > transfer the mapping to the broker rack property.
> > >> > >>> > > > > > > >> > > > >>> >
> > >> > >>> > > > > > > >> > > > >>> > One thing I am not sure about is what happens
> > >> > >>> > > > > > > >> > > > >>> > during a rolling upgrade when we have rack as a
> > >> > >>> > > > > > > >> > > > >>> > broker property. Will it cause problems for brokers
> > >> > >>> > > > > > > >> > > > >>> > with an older version of Kafka? If so, is there any
> > >> > >>> > > > > > > >> > > > >>> > workaround? I also think it would be better not to
> > >> > >>> > > > > > > >> > > > >>> > have rack in the controller wire protocol, but I am
> > >> > >>> > > > > > > >> > > > >>> > not sure if that is achievable.
> > >> > >>> > > > > > > >> > > > >>> >
> > >> > >>> > > > > > > >> > > > >>> > Thanks,
> > >> > >>> > > > > > > >> > > > >>> > Allen
> > >> > >>> > > > > > > >> > > > >>> >
> > >> > >>> > > > > > > >> > > > >>> >
> > >> > >>> > > > > > > >> > > > >>> >
> > >> > >>> > > > > > > >> > > > >>> >
> > >> > >>> > > > > > > >> > > > >>> >
> > >> > >>> > > > > > > >> > > > >>> > On Mon, Sep 28, 2015 at 4:55 PM, Todd Palino <tpalino@gmail.com>
> > >> > >>> > > > > > > >> > > > >>> > wrote:
> > >> > >>> > > > > > > >> > > > >>> >
> > >> > >>> > > > > > > >> > > > >>> > > I tend to like the idea of a pluggable locator.
> > >> > >>> > > > > > > >> > > > >>> > > For example, we already have an interface for
> > >> > >>> > > > > > > >> > > > >>> > > discovering information about the physical
> > >> > >>> > > > > > > >> > > > >>> > > location of servers. I don't relish the idea of
> > >> > >>> > > > > > > >> > > > >>> > > having to maintain data in multiple places.
> > >> > >>> > > > > > > >> > > > >>> > >
> > >> > >>> > > > > > > >> > > > >>> > > -Todd
> > >> > >>> > > > > > > >> > > > >>> > >
> > >> > >>> > > > > > > >> > > > >>> > > On Mon, Sep 28, 2015 at 4:48 PM, Aditya Auradkar
> > >> > >>> > > > > > > >> > > > >>> > > <aauradkar@linkedin.com.invalid> wrote:
> > >> > >>> > > > > > > >> > > > >>> > >
> > >> > >>> > > > > > > >> > > > >>> > > > Thanks for starting this KIP, Allen.
> > >> > >>> > > > > > > >> > > > >>> > > >
> > >> > >>> > > > > > > >> > > > >>> > > > I agree with Gwen that having a pluggable
> > >> > >>> > > > > > > >> > > > >>> > > > RackLocator class seems to be too complex. The
> > >> > >>> > > > > > > >> > > > >>> > > > KIP refers to potentially non-ZK storage for the
> > >> > >>> > > > > > > >> > > > >>> > > > rack info, which I don't think is necessary.
> > >> > >>> > > > > > > >> > > > >>> > > >
> > >> > >>> > > > > > > >> > > > >>> > > > Perhaps we can persist this info in ZK under
> > >> > >>> > > > > > > >> > > > >>> > > > /brokers/ids/<broker_id>, similar to other
> > >> > >>> > > > > > > >> > > > >>> > > > broker properties, and add a config in
> > >> > >>> > > > > > > >> > > > >>> > > > KafkaConfig called "rack".
> > >> > >>> > > > > > > >> > > > >>> > > > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy,"rack":"abc"}
> > >> > >>> > > > > > > >> > > > >>> > > >
> > >> > >>> > > > > > > >> > > > >>> > > > Aditya
> > >> > >>> > > > > > > >> > > > >>> > > >
> > >> > >>> > > > > > > >> > > > >>> > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira <gwen@confluent.io> wrote:
> > >> > >>> > > > > > > >> > > > >>> > > >
> > >> > >>> > > > > > > >> > > > >>> > > > > Hi,
> > >> > >>> > > > > > > >> > > > >>> > > > >
> > >> > >>> > > > > > > >> > > > >>> > > > > First, thanks for putting out a KIP for this. This is super important
> > >> > >>> > > > > > > >> > > > >>> > > > > for production deployments of Kafka.
> > >> > >>> > > > > > > >> > > > >>> > > > >
> > >> > >>> > > > > > > >> > > > >>> > > > > Few questions:
> > >> > >>> > > > > > > >> > > > >>> > > > >
> > >> > >>> > > > > > > >> > > > >>> > > > > 1) Are we sure we want "as many racks as possible"? I'd want to
> > >> > >>> > > > > > > >> > > > >>> > > > > balance between safety (more racks) and network utilization (traffic
> > >> > >>> > > > > > > >> > > > >>> > > > > within a rack uses the high-bandwidth TOR switch). One replica on a
> > >> > >>> > > > > > > >> > > > >>> > > > > different rack and the rest on same rack (if possible) sounds better
> > >> > >>> > > > > > > >> > > > >>> > > > > to me.
> > >> > >>> > > > > > > >> > > > >>> > > > >
> > >> > >>> > > > > > > >> > > > >>> > > > > 2) Rack-locator class seems overly complex compared to adding a
> > >> > >>> > > > > > > >> > > > >>> > > > > rack.number property to the broker properties file. Why do we want
> > >> > >>> > > > > > > >> > > > >>> > > > > that?
> > >> > >>> > > > > > > >> > > > >>> > > > >
> > >> > >>> > > > > > > >> > > > >>> > > > > Gwen
> > >> > >>> > > > > > > >> > > > >>> > > > >
> > >> > >>> > > > > > > >> > > > >>> > > > >
> > >> > >>> > > > > > > >> > > > >>> > > > >
> > >> > >>> > > > > > > >> > > > >>> > > > > On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang <allenxwang@gmail.com> wrote:
> > >> > >>> > > > > > > >> > > > >>> > > > >
> > >> > >>> > > > > > > >> > > > >>> > > > > > Hello Kafka Developers,
> > >> > >>> > > > > > > >> > > > >>> > > > > >
> > >> > >>> > > > > > > >> > > > >>> > > > > > I just created KIP-36 for rack aware replica assignment.
> > >> > >>> > > > > > > >> > > > >>> > > > > >
> > >> > >>> > > > > > > >> > > > >>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
> > >> > >>> > > > > > > >> > > > >>> > > > > >
> > >> > >>> > > > > > > >> > > > >>> > > > > > The goal is to utilize the isolation provided by the racks in data
> > >> > >>> > > > > > > >> > > > >>> > > > > > center and distribute replicas to racks to provide fault tolerance.
> > >> > >>> > > > > > > >> > > > >>> > > > > >
> > >> > >>> > > > > > > >> > > > >>> > > > > > Comments are welcome.
> > >> > >>> > > > > > > >> > > > >>> > > > > >
> > >> > >>> > > > > > > >> > > > >>> > > > > > Thanks,
> > >> > >>> > > > > > > >> > > > >>> > > > > > Allen
> > >> > >>> > > > > > > >> > > > >>> > > > > >
> > >> > >>> > > > > > > >> > > > >>> > > > >
> > >> > >>> > > > > > > >> > > > >>> > > >
> > >> > >>> > > > > > > >> > > > >>> > >
> > >> > >>> > > > > > > >> > > > >>> >
> > >> > >>> > > > > > > >> > > > >>>
> > >> > >>> > > > > > > >> > > > >>
> > >> > >>> > > > > > > >> > > > >>
> > >> > >>> > > > > > > >> > > > >
> > >> > >>> > > > > > > >> > > >
> > >> > >>> > > > > > > >> > >
> > >> > >>> > > > > > > >> >
> > >> > >>> > > > > > > >>
> > >> > >>> > > > > > > >
> > >> > >>> > > > > > > >
> > >> > >>> > > > > > >
> > >> > >>> > > > > >
> > >> > >>> > > > >
> > >> > >>> > > >
> > >> > >>> > >
> > >> > >>> >
> > >> > >>>
> > >> > >>>
> > >> > >>>
> > >> > >>> --
> > >> > >>> Thanks,
> > >> > >>> Neha
> > >> > >>>
> > >> > >>
> > >> > >>
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>
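
The broker registration proposed in the quoted thread above stores the rack
as an optional field next to the other broker properties in ZooKeeper. A
minimal sketch of how a reader of that JSON might treat a missing rack is
shown below. It assumes the field name "rack" from the quoted example; the
function name and the sample host and port values are hypothetical, and this
is not Kafka's actual parsing code.

```python
import json
from typing import Optional


def parse_broker_rack(registration: str) -> Optional[str]:
    """Return the broker's rack, or None when no rack is registered.

    Hypothetical helper: a registration written by a broker without a rack
    configured carries no "rack" key, so a missing key is read as "no rack".
    """
    info = json.loads(registration)
    return info.get("rack")


# Sample registrations; host and port values are made up for illustration.
with_rack = '{"jmx_port": -1, "host": "broker1", "port": 9092, "rack": "abc"}'
without_rack = '{"jmx_port": -1, "host": "broker2", "port": 9092}'

print(parse_broker_rack(with_rack))
print(parse_broker_rack(without_rack))
```

Treating the rack as optional in this way keeps registrations without a rack
readable, which matters while brokers are being restarted one by one.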

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Aditya Auradkar <aa...@linkedin.com.INVALID>.
Jun/Allen -

Did we ever actually agree on whether we should evolve the TMR to include
rack info or not?
I don't feel strongly about it, but if it's the right thing to do we
should probably do it in this KIP (it can be a separate patch). It isn't a
large change.

Aditya
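
As a rough illustration of the versioning idea raised further down in the
quoted thread (bump the request version and serialize the response according
to it), here is a minimal Python sketch. The field names, the version
numbers, and the dict-based layout are assumptions made for illustration
only; they are not Kafka's wire format or API.

```python
from typing import Optional


def serialize_broker(node_id: int, host: str, port: int,
                     rack: Optional[str], version: int) -> dict:
    """Build a broker entry for a metadata response.

    Hypothetical layout: version 0 is the existing format, and version 1
    adds the rack. Clients that send version 0 never see the new field.
    """
    entry = {"node_id": node_id, "host": host, "port": port}
    if version >= 1:
        # None stands for "no rack configured" on this broker.
        entry["rack"] = rack
    return entry


old_client_view = serialize_broker(1, "broker1", 9092, "abc", version=0)
new_client_view = serialize_broker(1, "broker1", 9092, "abc", version=1)
print(old_client_view)
print(new_client_view)
```

The point of the sketch is only that the response shape is chosen from the
request version, so old clients keep receiving the format they understand.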

On Sat, Dec 26, 2015 at 3:01 PM, Allen Wang <al...@gmail.com> wrote:

> Added the rolling upgrade instruction in the KIP, similar to those in 0.9.0
> release notes.
>
> On Wed, Dec 16, 2015 at 11:32 AM, Allen Wang <al...@gmail.com> wrote:
>
> > Hi Jun,
> >
> > The reason that TopicMetadataResponse is not included in the KIP is that
> > it is currently not version aware, so we would need to introduce a version
> > to it in order to ensure backward compatibility. That seems to me a big
> > change. Do we want to couple it with this KIP? Do we need to further
> > discuss what information to include in the new version besides rack? For
> > example, should we include the broker security protocol in
> > TopicMetadataResponse?
> >
> > The other option is to make it a separate KIP to make
> > TopicMetadataResponse version aware and decide what to include, and make
> > this KIP focus on the rack aware algorithm, admin tools and related
> > changes to the inter-broker protocol.
> >
> > Thanks,
> > Allen
> >
> >
> >
> >
> > On Mon, Dec 14, 2015 at 8:30 AM, Jun Rao <ju...@confluent.io> wrote:
> >
> >> Allen,
> >>
> >> Thanks for the proposal. A few comments.
> >>
> >> 1. Since this KIP changes the inter broker communication protocol
> >> (UpdateMetadataRequest), we will need to document the upgrade path
> >> (similar to what's described in
> >> http://kafka.apache.org/090/documentation.html#upgrade).
> >>
> >> 2. It might be useful to include the rack info of the broker in
> >> TopicMetadataResponse. This can be useful for administrative tasks, as
> >> well as read affinity in the future.
> >>
> >> Jun
> >>
> >>
> >>
> >> On Thu, Dec 10, 2015 at 9:38 AM, Allen Wang <al...@gmail.com> wrote:
> >>
> >> > If there are no more comments I would like to call for a vote.
> >> >
> >> >
> >> > On Sun, Nov 15, 2015 at 10:08 PM, Allen Wang <al...@gmail.com> wrote:
> >> >
> >> > > The KIP is updated with more details, including how to handle the
> >> > > situation where rack information is incomplete.
> >> > >
> >> > > When rack information is incomplete but we want to continue with the
> >> > > assignment, I have suggested ignoring all rack information and falling
> >> > > back to the original algorithm. The reason is explained below.
> >> > >
> >> > > The other options are to assume that each broker without a rack
> >> > > belongs to its own unique rack, or that all such brokers belong to one
> >> > > "default" rack. Whichever we choose, it is highly likely to result in
> >> > > an uneven number of brokers per rack, and it is quite possible that
> >> > > the "made up" racks will have far fewer brokers. As I explained in the
> >> > > KIP, an uneven number of brokers per rack leads to an uneven
> >> > > distribution of replicas among brokers (even though the leader
> >> > > distribution is still even). Brokers in a rack with fewer brokers will
> >> > > get more replicas per broker than brokers in other racks.
> >> > >
> >> > > Given this, and given that the replica assignment produced will be
> >> > > incorrect from a rack aware point of view anyway, ignoring all rack
> >> > > information and falling back to the original algorithm is not a bad
> >> > > choice, since it at least gives a better guarantee of replica
> >> > > distribution.
> >> > >
> >> > > Also, for command line tools, it gives users a choice if for any
> >> > > reason they want to ignore rack information and fall back to the
> >> > > original algorithm.
> >> > >
> >> > >
> >> > > On Tue, Nov 10, 2015 at 9:04 AM, Allen Wang <al...@gmail.com> wrote:
> >> > >
> >> > >> I have been busy with some time-pressing issues for the last few days.
> >> > >> I will think about how the incomplete rack information will affect the
> >> > >> balance and update the KIP by early next week.
> >> > >>
> >> > >> Thanks,
> >> > >> Allen
> >> > >>
> >> > >>
> >> > >> On Tue, Nov 3, 2015 at 9:03 AM, Neha Narkhede <ne...@confluent.io> wrote:
> >> > >>
> >> > >>> A few suggestions on improving the KIP:
> >> > >>>
> >> > >>> *If some brokers have rack, and some do not, the algorithm will throw
> >> > >>> an exception. This is to prevent incorrect assignment caused by user
> >> > >>> error.*
> >> > >>>
> >> > >>>
> >> > >>> In the KIP, can you clearly state the user-facing behavior when some
> >> > >>> brokers have rack information and some don't? Which actions and
> >> > >>> requests will error out, and how?
> >> > >>>
> >> > >>> *Even distribution of partition leadership among brokers*
> >> > >>>
> >> > >>>
> >> > >>> There is some information about arranging the sorted broker list
> >> > >>> interlaced with rack ids. Can you describe the changes to the current
> >> > >>> algorithm in a little more detail? How does this interlacing work if
> >> > >>> only a subset of brokers have the rack id configured? Does this still
> >> > >>> work if an uneven number of brokers is assigned to each rack? It might
> >> > >>> work; I'm looking for more details on the changes, since they will
> >> > >>> affect the behavior seen by the user: imbalance on either the leaders
> >> > >>> or data or both.
> >> > >>>
> >> > >>> On Mon, Nov 2, 2015 at 6:39 PM, Aditya Auradkar <aauradkar@linkedin.com> wrote:
> >> > >>>
> >> > >>> > I think this sounds reasonable. Anyone else have comments?
> >> > >>> >
> >> > >>> > Aditya
> >> > >>> >
> >> > >>> > On Tue, Oct 27, 2015 at 5:23 PM, Allen Wang <allenxwang@gmail.com> wrote:
> >> > >>> >
> >> > >>> > > During the discussion in the hangout, it was mentioned that it
> >> > >>> > > would be desirable for consumers to know the rack information of
> >> > >>> > > the brokers so that they can consume from a broker in the same
> >> > >>> > > rack to reduce latency. As I understand it, this will only be
> >> > >>> > > beneficial if a consumer can consume from any broker in the ISR,
> >> > >>> > > which is not possible now.
> >> > >>> > >
> >> > >>> > > I suggest we skip the change to TMR. Once the consumer is changed
> >> > >>> > > to be able to consume from any broker in the ISR, the rack
> >> > >>> > > information can be added to TMR.
> >> > >>> > >
> >> > >>> > > Another thing I want to confirm is the command line behavior. I
> >> > >>> > > think the desirable default behavior is to fail fast on the
> >> > >>> > > command line for incomplete rack mapping. The error message can
> >> > >>> > > include further instruction that tells the user to add an extra
> >> > >>> > > argument (like "--allow-partial-rackinfo") to suppress the error
> >> > >>> > > and do an imperfect rack aware assignment. If the default
> >> > >>> > > behavior is to allow incomplete mapping, the error can still be
> >> > >>> > > easily missed.
> >> > >>> > >
> >> > >>> > > The affected command line tools are TopicCommand and
> >> > >>> > > ReassignPartitionsCommand.
> >> > >>> > >
> >> > >>> > > Thanks,
> >> > >>> > > Allen
> >> > >>> > >
> >> > >>> > >
> >> > >>> > >
> >> > >>> > >
> >> > >>> > >
> >> > >>> > > On Mon, Oct 26, 2015 at 12:55 PM, Aditya Auradkar <aauradkar@linkedin.com> wrote:
> >> > >>> > >
> >> > >>> > > > Hi Allen,
> >> > >>> > > >
> >> > >>> > > > For TopicMetadataResponse to understand version, you can bump up
> >> > >>> > > > the request version itself. Based on the version of the request,
> >> > >>> > > > the response can be appropriately serialized. It shouldn't be a
> >> > >>> > > > huge change. For example, we went through something similar for
> >> > >>> > > > ProduceRequest recently (https://reviews.apache.org/r/33378/).
> >> > >>> > > > I guess the reason protocol information is not included in the
> >> > >>> > > > TMR is because the topic itself is independent of any particular
> >> > >>> > > > protocol (SSL vs Plaintext). Having said that, I'm not sure we
> >> > >>> > > > even need rack information in TMR. What use case were you
> >> > >>> > > > thinking of initially?
> >> > >>> > > >
> >> > >>> > > > For 1 - I'd be fine with adding an option to the command line
> >> > >>> > > > tools that checks rack assignment, e.g. "--strict-assignment" or
> >> > >>> > > > something similar.
> >> > >>> > > >
> >> > >>> > > > Aditya
> >> > >>> > > >
> >> > >>> > > > On Thu, Oct 22, 2015 at 6:44 PM, Allen Wang <allenxwang@gmail.com> wrote:
> >> > >>> > > >
> >> > >>> > > > > For 2 and 3, I have updated the KIP. Please take a look. One
> >> > >>> > > > > thing I have changed is removing the proposal to add rack to
> >> > >>> > > > > TopicMetadataResponse. The reason is that unlike
> >> > >>> > > > > UpdateMetadataRequest, TopicMetadataResponse does not understand
> >> > >>> > > > > version. I don't see a way to include rack without breaking old
> >> > >>> > > > > versions of clients. That's probably why the secure protocol is
> >> > >>> > > > > not included in the TopicMetadataResponse either. I think it
> >> > >>> > > > > will be a much bigger change to include rack in
> >> > >>> > > > > TopicMetadataResponse.
> >> > >>> > > > >
> >> > >>> > > > > For 1, my concern is that doing rack aware assignment without a
> >> > >>> > > > > complete broker to rack mapping will result in an assignment
> >> > >>> > > > > that is not rack aware and fails to provide fault tolerance in
> >> > >>> > > > > the event of a rack outage. This kind of problem will be
> >> > >>> > > > > difficult to surface. And the cost of this problem is high: you
> >> > >>> > > > > have to do partition reassignment if you are lucky enough to
> >> > >>> > > > > spot the problem early on, or face the consequence of data loss
> >> > >>> > > > > during a real rack outage.
> >> > >>> > > > >
> >> > >>> > > > > I do see the concern with fail-fast, as it might also cause data
> >> > >>> > > > > loss if the producer is not able to produce the message due to
> >> > >>> > > > > topic creation failure. Is it feasible to treat dynamic topic
> >> > >>> > > > > creation and command tools differently? We allow dynamic topic
> >> > >>> > > > > creation with incomplete broker-rack mapping and fail fast in
> >> > >>> > > > > command line. Another option is to let the user determine the
> >> > >>> > > > > behavior for command line. For example, by default fail fast in
> >> > >>> > > > > command line but allow incomplete broker-rack mapping if another
> >> > >>> > > > > switch is provided.
> >> > >>> > > > >
> >> > >>> > > > >
> >> > >>> > > > >
> >> > >>> > > > >
> >> > >>> > > > > On Tue, Oct 20, 2015 at 10:05 AM, Aditya Auradkar <
> >> > >>> > > > > aauradkar@linkedin.com.invalid> wrote:
> >> > >>> > > > >
> >> > >>> > > > > > Hey Allen,
> >> > >>> > > > > >
> >> > >>> > > > > > 1. If we choose fail fast topic creation, we will have topic
> >> > >>> > > > > > creation failures while upgrading the cluster. I really doubt
> >> > >>> > > > > > we want this behavior. Ideally, this should be invisible to
> >> > >>> > > > > > clients of a cluster. Currently, each broker is effectively its
> >> > >>> > > > > > own rack. So we probably can use the rack information whenever
> >> > >>> > > > > > possible but not make it a hard requirement. To extend Gwen's
> >> > >>> > > > > > example, one badly configured broker should not degrade topic
> >> > >>> > > > > > creation for the entire cluster.
> >> > >>> > > > > >
> >> > >>> > > > > > 2. Upgrade scenario - Can you add a section on the upgrade
> >> > >>> > > > > > piece to confirm that old clients will not see errors? I
> >> > >>> > > > > > believe ZookeeperConsumerConnector reads the Broker objects
> >> > >>> > > > > > from ZK. I wanted to confirm that this will not cause any
> >> > >>> > > > > > problems.
> >> > >>> > > > > >
> >> > >>> > > > > > 3. Could you elaborate on your proposed changes to the
> >> > >>> > > > > > UpdateMetadataRequest in the "Public Interfaces" section?
> >> > >>> > > > > > Personally, I find this format easy to read in terms of wire
> >> > >>> > > > > > protocol changes:
> >> > >>> > > > > >
> >> > >>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicRequest
> >> > >>> > > > > >
> >> > >>> > > > > > Aditya
> >> > >>> > > > > >
> >> > >>> > > > > > On Fri, Oct 16, 2015 at 3:45 PM, Allen Wang <allenxwang@gmail.com> wrote:
> >> > >>> > > > > >
> >> > >>> > > > > > > The KIP is updated to include rack as an optional property
> >> > >>> > > > > > > for broker. Please take a look and let me know if more
> >> > >>> > > > > > > details are needed.
> >> > >>> > > > > > >
> >> > >>> > > > > > > For the case where some brokers have rack and some do not,
> >> > >>> > > > > > > the current KIP uses the fail-fast behavior. If there are
> >> > >>> > > > > > > concerns, we can further discuss this in the email thread or
> >> > >>> > > > > > > next hangout.
> >> > >>> > > > > > >
> >> > >>> > > > > > >
> >> > >>> > > > > > >
> >> > >>> > > > > > > On Thu, Oct 15, 2015 at 10:42 AM, Allen Wang <allenxwang@gmail.com> wrote:
> >> > >>> > > > > > >
> >> > >>> > > > > > > > That's a good question. I can think of three actions if the
> >> > >>> > > > > > > > rack information is incomplete:
> >> > >>> > > > > > > >
> >> > >>> > > > > > > > 1. Treat the node without rack as if it is on its unique rack
> >> > >>> > > > > > > > 2. Disregard all rack information and fall back to current
> >> > >>> > > > > > > > algorithm
> >> > >>> > > > > > > > 3. Fail-fast
> >> > >>> > > > > > > >
> >> > >>> > > > > > > > Now that I think about it, one and three make more sense.
> >> > >>> > > > > > > > The reason for fail-fast is that a user mistake of not
> >> > >>> > > > > > > > providing the rack may never be found if we tolerate it, the
> >> > >>> > > > > > > > assignment may not be rack aware as the user expected, and
> >> > >>> > > > > > > > this creates debugging problems when things fail.
> >> > >>> > > > > > > >
> >> > >>> > > > > > > > What do you think? If not fail-fast, is there any way we can
> >> > >>> > > > > > > > make the user error stand out?
> >> > >>> > > > > > > >
> >> > >>> > > > > > > >
> >> > >>> > > > > > > > On Thu, Oct 15, 2015 at 10:17 AM, Gwen Shapira <gwen@confluent.io> wrote:
> >> > >>> > > > > > > >
> >> > >>> > > > > > > >> Thanks! Just to clarify, when some brokers have rack
> >> > >>> > > > > > > >> assignment and some don't, do we act like none of them have
> >> > >>> > > > > > > >> it? or like those without assignment are in their own rack?
> >> > >>> > > > > > > >>
> >> > >>> > > > > > > >> The first scenario is good when first setting up
> >> > >>> > > > > > > >> rack-awareness, but the second makes more sense for on-going
> >> > >>> > > > > > > >> maintenance (I can totally see someone adding a node and
> >> > >>> > > > > > > >> forgetting to set the rack property, we don't want this to
> >> > >>> > > > > > > >> change behavior for anything except the new node).
> >> > >>> > > > > > > >>
> >> > >>> > > > > > > >> What do you think?
> >> > >>> > > > > > > >>
> >> > >>> > > > > > > >> Gwen
> >> > >>> > > > > > > >>
> >> > >>> > > > > > > >> On Thu, Oct 15, 2015 at 10:13 AM, Allen Wang <allenxwang@gmail.com> wrote:
> >> > >>> > > > > > > >>
> >> > >>> > > > > > > >> > For scenario 1:
> >> > >>> > > > > > > >> >
> >> > >>> > > > > > > >> > - Add the rack information to broker property file or
> >> > >>> > > > > > > >> > dynamically set it in the wrapper code to bootstrap Kafka
> >> > >>> > > > > > > >> > server. You would do that for all brokers and restart the
> >> > >>> > > > > > > >> > brokers one by one.
> >> > >>> > > > > > > >> >
> >> > >>> > > > > > > >> > In this scenario, the complete broker to rack mapping may
> >> > >>> > > > > > > >> > not be available until every broker is restarted. During
> >> > >>> > > > > > > >> > that time we fall back to default replica assignment
> >> > >>> > > > > > > >> > algorithm.
> >> > >>> > > > > > > >> >
> >> > >>> > > > > > > >> > For scenario 2:
> >> > >>> > > > > > > >> >
> >> > >>> > > > > > > >> > - Add the rack information to broker property file or
> >> > >>> > > > > > > >> > dynamically set it in the wrapper code and start the broker.
> >> > >>> > > > > > > >> >
> >> > >>> > > > > > > >> >
> >> > >>> > > > > > > >> > On Wed, Oct 14, 2015 at 2:36 PM, Gwen Shapira <gwen@confluent.io> wrote:
> >> > >>> > > > > > > >> >
> >> > >>> > > > > > > >> > > Can you clarify the workflow for the following scenarios:
> >> > >>> > > > > > > >> > >
> >> > >>> > > > > > > >> > > 1. I currently have 6 brokers and want to add rack
> >> > >>> > > > > > > >> > > information for each
> >> > >>> > > > > > > >> > > 2. I'm adding a new broker and I want to specify which
> >> > >>> > > > > > > >> > > rack it belongs on while adding it.
> >> > >>> > > > > > > >> > >
> >> > >>> > > > > > > >> > > Thanks!
> >> > >>> > > > > > > >> > >
> >> > >>> > > > > > > >> > > On Tue, Oct 13, 2015 at 2:21 PM, Allen Wang <allenxwang@gmail.com> wrote:
> >> > >>> > > > > > > >> > >
> >> > >>> > > > > > > >> > > > We discussed the KIP in the hangout today. The
> >> > >>> > > > > > > >> > > > recommendation is to make rack a broker property in
> >> > >>> > > > > > > >> > > > ZooKeeper. Users with existing rack information stored
> >> > >>> > > > > > > >> > > > somewhere would need to retrieve the information at
> >> > >>> > > > > > > >> > > > broker start up and dynamically set the rack property,
> >> > >>> > > > > > > >> > > > which can be implemented as a wrapper to bootstrap the
> >> > >>> > > > > > > >> > > > broker. There will be no interface or pluggable
> >> > >>> > > > > > > >> > > > implementation to retrieve the rack information.
> >> > >>> > > > > > > >> > > >
> >> > >>> > > > > > > >> > > > The assumption is that you always need to restart the
> >> > >>> > > > > > > >> > > > broker to make a change to the rack.
> >> > >>> > > > > > > >> > > >
> >> > >>> > > > > > > >> > > > Once the rack becomes a broker property, it will be
> >> > >>> > > > > > > >> > > > possible to make rack part of the meta data to help the
> >> > >>> > > > > > > >> > > > consumer choose which in sync replica to consume from
> >> > >>> > > > > > > >> > > > as part of the future consumer enhancement.
> >> > >>> > > > > > > >> > > >
> >> > >>> > > > > > > >> > > > I will update the KIP.
> >> > >>> > > > > > > >> > > >
> >> > >>> > > > > > > >> > > > Thanks,
> >> > >>> > > > > > > >> > > > Allen
> >> > >>> > > > > > > >> > > >
> >> > >>> > > > > > > >> > > >
> >> > >>> > > > > > > >> > > > On Thu, Oct 8, 2015 at 9:23 AM, Allen Wang <allenxwang@gmail.com> wrote:
> >> > >>> > > > > > > >> > > >
> >> > >>> > > > > > > >> > > > > I attended Tuesday's KIP hangout but this KIP was not
> >> > >>> > > > > > > >> > > > > discussed due to time constraints.
> >> > >>> > > > > > > >> > > > >
> >> > >>> > > > > > > >> > > > > However, after hearing the discussion of KIP-35, I
> >> > >>> > > > > > > >> > > > > have the feeling that incompatibility (caused by a
> >> > >>> > > > > > > >> > > > > new broker property) between brokers with different
> >> > >>> > > > > > > >> > > > > versions will be solved there. In addition, having
> >> > >>> > > > > > > >> > > > > rack in the broker properties as meta data may also
> >> > >>> > > > > > > >> > > > > help consumers in the future. So I am open to adding
> >> > >>> > > > > > > >> > > > > a rack property to the broker.
> >> > >>> > > > > > > >> > > > >
> >> > >>> > > > > > > >> > > > > Hopefully we can discuss this in the next KIP hangout.
> >> > >>> > > > > > > >> > > > >
> >> > >>> > > > > > > >> > > > > On Wed, Sep 30, 2015 at 2:46 PM, Allen Wang <allenxwang@gmail.com> wrote:
> >> > >>> > > > > > > >> > > > >
> >> > >>> > > > > > > >> > > > >> Can you send me the information on the next KIP hangout?
> >> > >>> > > > > > > >> > > > >>
> >> > >>> > > > > > > >> > > > >> Currently the broker-rack mapping is not cached. In
> >> > >>> > > > > > > >> > > > >> KafkaApis, RackLocator.getRackInfo() is called each
> >> > >>> > > > > > > >> > > > >> time the mapping is needed for auto topic creation.
> >> > >>> > > > > > > >> > > > >> This will ensure latest mapping is used at any time.
> >> > >>> > > > > > > >> > > > >>
> >> > >>> > > > > > > >> > > > >> The ability to get the complete mapping makes it
> >> > >>> > > > > > > >> > > > >> simple to reuse the same interface in command line
> >> > >>> > > > > > > >> > > > >> tools.
> >> > >>> > > > > > > >> > > > >>
> >> > >>> > > > > > > >> > > > >>
> >> > >>> > > > > > > >> > > > >> On Wed, Sep 30, 2015 at 11:01 AM, Aditya Auradkar <aauradkar@linkedin.com.invalid> wrote:
> >> > >>> > > > > > > >> > > > >>
> >> > >>> > > > > > > >> > > > >>> Perhaps we discuss this during the next KIP hangout?
> >> > >>> > > > > > > >> > > > >>>
> >> > >>> > > > > > > >> > > > >>> I do see that a pluggable rack locator can be useful
> >> > >>> > > > > > > >> > > > >>> but I do see a few concerns:
> >> > >>> > > > > > > >> > > > >>>
> >> > >>> > > > > > > >> > > > >>> - The RackLocator (as described in the document),
> >> > >>> > > > > > > >> > > > >>> implies that it can discover rack information for
> >> > >>> > > > > > > >> > > > >>> any node in the cluster. How does it deal with rack
> >> > >>> > > > > > > >> > > > >>> location changes? For example, if I moved broker id
> >> > >>> > > > > > > >> > > > >>> (1) from rack X to Y, I only have to start that
> >> > >>> > > > > > > >> > > > >>> broker with a newer rack config. If RackLocator
> >> > >>> > > > > > > >> > > > >>> discovers broker -> rack information at start up
> >> > >>> > > > > > > >> > > > >>> time, any change to a broker will require bouncing
> >> > >>> > > > > > > >> > > > >>> the entire cluster since createTopic requests can be
> >> > >>> > > > > > > >> > > > >>> sent to any node in the cluster.
> >> > >>> > > > > > > >> > > > >>> For this reason it may be simpler to have each node
> >> > >>> > > > > > > >> > > > >>> be aware of its own rack and persist it in ZK during
> >> > >>> > > > > > > >> > > > >>> start up time.
> >> > >>> > > > > > > >> > > > >>>
> >> > >>> > > > > > > >> > > > >>> - A pluggable RackLocator relies on an external
> >> > >>> > > > > > > >> > > > >>> service being available to serve rack information.
> >> > >>> > > > > > > >> > > > >>>
> >> > >>> > > > > > > >> > > > >>> Out of curiosity, I looked up how a couple of other
> >> > >>> > > > > > > >> > > > >>> systems deal with zone/rack awareness.
> >> > >>> > > > > > > >> > > > >>> For Cassandra some interesting modes are:
> >> > >>> > > > > > > >> > > > >>> (Property File configuration)
> >> > >>> > > > > > > >> > > > >>>
> >> > >>> > > > > > > >> > > > >>>
> >> > >>> > > > > > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
> >> > >>> > > > > > > >> > > > >>> (Dynamic inference)
> >> > >>> > > > > > > >> > > > >>>
> >> > >>> > > > > > > >> > > > >>>
> >> > >>> > > > > > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
> >> > >>> > > > > > > >> > > > >>>
> >> > >>> > > > > > > >> > > > >>> Voldemort does a static node -> zone assignment based on
> >> > >>> > > > > > > >> > > > >>> configuration.
> >> > >>> > > > > > > >> > > > >>>
> >> > >>> > > > > > > >> > > > >>> Aditya
> >> > >>> > > > > > > >> > > > >>>
> >> > >>> > > > > > > >> > > > >>> On Wed, Sep 30, 2015 at 10:05 AM, Allen Wang <allenxwang@gmail.com>
> >> > >>> > > > > > > >> > > > >>> wrote:
> >> > >>> > > > > > > >> > > > >>>
> >> > >>> > > > > > > >> > > > >>> > I would like to see if we can do both:
> >> > >>> > > > > > > >> > > > >>> >
> >> > >>> > > > > > > >> > > > >>> > - Make RackLocator pluggable to facilitate migration with existing
> >> > >>> > > > > > > >> > > > >>> > broker-rack mapping
> >> > >>> > > > > > > >> > > > >>> >
> >> > >>> > > > > > > >> > > > >>> > - Make rack an optional property for broker. If rack is available
> >> > >>> > > > > > > >> > > > >>> > from the broker, treat it as the source of truth. Users with an
> >> > >>> > > > > > > >> > > > >>> > existing broker-rack mapping somewhere else can use the pluggable
> >> > >>> > > > > > > >> > > > >>> > way, or they can transfer the mapping to the broker rack property.
> >> > >>> > > > > > > >> > > > >>> >
> >> > >>> > > > > > > >> > > > >>> > One thing I am not sure about is what happens during a rolling
> >> > >>> > > > > > > >> > > > >>> > upgrade when we have rack as a broker property. Will it cause
> >> > >>> > > > > > > >> > > > >>> > problems for brokers with an older version of Kafka? If so, is
> >> > >>> > > > > > > >> > > > >>> > there any workaround? I also think it would be better not to have
> >> > >>> > > > > > > >> > > > >>> > rack in the controller wire protocol, but I am not sure if that is
> >> > >>> > > > > > > >> > > > >>> > achievable.
> >> > >>> > > > > > > >> > > > >>> >
> >> > >>> > > > > > > >> > > > >>> > Thanks,
> >> > >>> > > > > > > >> > > > >>> > Allen
> >> > >>> > > > > > > >> > > > >>> >
> >> > >>> > > > > > > >> > > > >>> >
> >> > >>> > > > > > > >> > > > >>> >
> >> > >>> > > > > > > >> > > > >>> >
> >> > >>> > > > > > > >> > > > >>> >
> >> > >>> > > > > > > >> > > > >>> > On Mon, Sep 28, 2015 at 4:55 PM, Todd Palino <tpalino@gmail.com>
> >> > >>> > > > > > > >> > > > >>> > wrote:
> >> > >>> > > > > > > >> > > > >>> >
> >> > >>> > > > > > > >> > > > >>> > > I tend to like the idea of a pluggable locator. For example, we
> >> > >>> > > > > > > >> > > > >>> > > already have an interface for discovering information about the
> >> > >>> > > > > > > >> > > > >>> > > physical location of servers. I don't relish the idea of having
> >> > >>> > > > > > > >> > > > >>> > > to maintain data in multiple places.
> >> > >>> > > > > > > >> > > > >>> > >
> >> > >>> > > > > > > >> > > > >>> > > -Todd
> >> > >>> > > > > > > >> > > > >>> > >
> >> > >>> > > > > > > >> > > > >>> > > On Mon, Sep 28, 2015 at 4:48 PM, Aditya Auradkar <
> >> > >>> > > > > > > >> > > > >>> > > aauradkar@linkedin.com.invalid> wrote:
> >> > >>> > > > > > > >> > > > >>> > >
> >> > >>> > > > > > > >> > > > >>> > > > Thanks for starting this KIP Allen.
> >> > >>> > > > > > > >> > > > >>> > > >
> >> > >>> > > > > > > >> > > > >>> > > > I agree with Gwen that having a RackLocator class that is
> >> > >>> > > > > > > >> > > > >>> > > > pluggable seems to be too complex. The KIP refers to
> >> > >>> > > > > > > >> > > > >>> > > > potentially non-ZK storage for the rack info which I don't
> >> > >>> > > > > > > >> > > > >>> > > > think is necessary.
> >> > >>> > > > > > > >> > > > >>> > > >
> >> > >>> > > > > > > >> > > > >>> > > > Perhaps we can persist this info in zk under
> >> > >>> > > > > > > >> > > > >>> > > > /brokers/ids/<broker_id> similar to other broker properties
> >> > >>> > > > > > > >> > > > >>> > > > and add a config in KafkaConfig called "rack".
> >> > >>> > > > > > > >> > > > >>> > > >
> >> > >>> > > > > > > >> > > > >>> > > > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy,"rack": "abc"}
> >> > >>> > > > > > > >> > > > >>> > > >
> >> > >>> > > > > > > >> > > > >>> > > > Aditya
> >> > >>> > > > > > > >> > > > >>> > > >
> >> > >>> > > > > > > >> > > > >>> > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira <gwen@confluent.io>
> >> > >>> > > > > > > >> > > > >>> > > > wrote:
> >> > >>> > > > > > > >> > > > >>> > > >
> >> > >>> > > > > > > >> > > > >>> > > > > Hi,
> >> > >>> > > > > > > >> > > > >>> > > > >
> >> > >>> > > > > > > >> > > > >>> > > > > First, thanks for putting out a KIP for this. This is super
> >> > >>> > > > > > > >> > > > >>> > > > > important for production deployments of Kafka.
> >> > >>> > > > > > > >> > > > >>> > > > >
> >> > >>> > > > > > > >> > > > >>> > > > > A few questions:
> >> > >>> > > > > > > >> > > > >>> > > > >
> >> > >>> > > > > > > >> > > > >>> > > > > 1) Are we sure we want "as many racks as possible"? I'd want
> >> > >>> > > > > > > >> > > > >>> > > > > to balance between safety (more racks) and network
> >> > >>> > > > > > > >> > > > >>> > > > > utilization (traffic within a rack uses the high-bandwidth
> >> > >>> > > > > > > >> > > > >>> > > > > TOR switch). One replica on a different rack and the rest on
> >> > >>> > > > > > > >> > > > >>> > > > > the same rack (if possible) sounds better to me.
> >> > >>> > > > > > > >> > > > >>> > > > >
> >> > >>> > > > > > > >> > > > >>> > > > > 2) The rack-locator class seems overly complex compared to
> >> > >>> > > > > > > >> > > > >>> > > > > adding a rack.number property to the broker properties file.
> >> > >>> > > > > > > >> > > > >>> > > > > Why do we want that?
> >> > >>> > > > > > > >> > > > >>> > > > >
> >> > >>> > > > > > > >> > > > >>> > > > > Gwen
> >> > >>> > > > > > > >> > > > >>> > > > >
> >> > >>> > > > > > > >> > > > >>> > > > >
> >> > >>> > > > > > > >> > > > >>> > > > >
> >> > >>> > > > > > > >> > > > >>> > > > > On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang <allenxwang@gmail.com>
> >> > >>> > > > > > > >> > > > >>> > > > > wrote:
> >> > >>> > > > > > > >> > > > >>> > > > >
> >> > >>> > > > > > > >> > > > >>> > > > > > Hello Kafka Developers,
> >> > >>> > > > > > > >> > > > >>> > > > > >
> >> > >>> > > > > > > >> > > > >>> > > > > > I just created KIP-36 for rack aware replica assignment.
> >> > >>> > > > > > > >> > > > >>> > > > > >
> >> > >>> > > > > > > >> > > > >>> > > > > >
> >> > >>> > > > > > > >> > > > >>> > > > > >
> >> > >>> > > > > > > >> > > > >>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
> >> > >>> > > > > > > >> > > > >>> > > > > >
> >> > >>> > > > > > > >> > > > >>> > > > > > The goal is to utilize the isolation provided by the racks in
> >> > >>> > > > > > > >> > > > >>> > > > > > data center and distribute replicas to racks to provide fault
> >> > >>> > > > > > > >> > > > >>> > > > > > tolerance.
> >> > >>> > > > > > > >> > > > >>> > > > > >
> >> > >>> > > > > > > >> > > > >>> > > > > > Comments are welcome.
> >> > >>> > > > > > > >> > > > >>> > > > > >
> >> > >>> > > > > > > >> > > > >>> > > > > > Thanks,
> >> > >>> > > > > > > >> > > > >>> > > > > > Allen
> >> > >>> > > > > > > >> > > > >>> > > > > >
> >> > >>> > > > > > > >> > > > >>> > > > >
> >> > >>> > > > > > > >> > > > >>> > > >
> >> > >>> > > > > > > >> > > > >>> > >
> >> > >>> > > > > > > >> > > > >>> >
> >> > >>> > > > > > > >> > > > >>>
> >> > >>> > > > > > > >> > > > >>
> >> > >>> > > > > > > >> > > > >>
> >> > >>> > > > > > > >> > > > >
> >> > >>> > > > > > > >> > > >
> >> > >>> > > > > > > >> > >
> >> > >>> > > > > > > >> >
> >> > >>> > > > > > > >>
> >> > >>> > > > > > > >
> >> > >>> > > > > > > >
> >> > >>> > > > > > >
> >> > >>> > > > > >
> >> > >>> > > > >
> >> > >>> > > >
> >> > >>> > >
> >> > >>> >
> >> > >>>
> >> > >>>
> >> > >>>
> >> > >>> --
> >> > >>> Thanks,
> >> > >>> Neha
> >> > >>>
> >> > >>
> >> > >>
> >> > >
> >> >
> >>
> >
> >
>

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Allen Wang <al...@gmail.com>.
Added the rolling upgrade instructions to the KIP, similar to those in the
0.9.0 release notes.
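
For readers who have not seen them, the 0.9.0 upgrade notes describe a
two-phase rolling bounce driven by the inter.broker.protocol.version broker
setting. A minimal sketch of what the equivalent steps might look like here,
assuming the same mechanism is used; the version values below are
placeholders and are not taken from the KIP:

```properties
# Phase 1 (assumed): before upgrading any broker, pin the inter-broker
# protocol to the version the cluster currently runs (placeholder value),
# then upgrade the broker code and restart brokers one at a time.
inter.broker.protocol.version=<current-version>

# Phase 2 (assumed): once every broker runs the new code, raise the
# protocol version (placeholder value) and do a second rolling restart.
# inter.broker.protocol.version=<new-version>
```

The point of the two phases is that no broker starts sending the new
UpdateMetadataRequest format until every broker is able to read it.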

On Wed, Dec 16, 2015 at 11:32 AM, Allen Wang <al...@gmail.com> wrote:

> Hi Jun,
>
> The reason that TopicMetadataResponse is not included in the KIP is that
> it is currently not version aware, so we would need to introduce a version
> to it in order to ensure backward compatibility. That seems like a big
> change to me. Do we want to couple it with this KIP? Do we need to further
> discuss what information to include in the new version besides rack? For
> example, should we include the broker security protocol in
> TopicMetadataResponse?
>
> The other option is a separate KIP that makes TopicMetadataResponse
> version aware and decides what to include, leaving this KIP to focus on
> the rack aware algorithm, admin tools, and related changes to the
> inter-broker protocol.
>
> Thanks,
> Allen
>
>
>
>
> On Mon, Dec 14, 2015 at 8:30 AM, Jun Rao <ju...@confluent.io> wrote:
>
>> Allen,
>>
>> Thanks for the proposal. A few comments.
>>
>> 1. Since this KIP changes the inter broker communication protocol
>> (UpdateMetadataRequest), we will need to document the upgrade path
>> (similar to what's described in
>> http://kafka.apache.org/090/documentation.html#upgrade).
>>
>> 2. It might be useful to include the rack info of the broker in
>> TopicMetadataResponse. This can be useful for administrative tasks, as
>> well as read affinity in the future.
>>
>> Jun
>>
>>
>>
>> On Thu, Dec 10, 2015 at 9:38 AM, Allen Wang <al...@gmail.com> wrote:
>>
>> > If there are no more comments I would like to call for a vote.
>> >
>> >
>> > On Sun, Nov 15, 2015 at 10:08 PM, Allen Wang <al...@gmail.com>
>> > wrote:
>> >
>> > > The KIP is updated with more details, including how to handle the
>> > > situation where rack information is incomplete.
>> > >
>> > > In the situation where rack information is incomplete but we want to
>> > > continue with the assignment, I have suggested ignoring all rack
>> > > information and falling back to the original algorithm. The reason is
>> > > explained below:
>> > >
>> > > The other options are to assume that each broker without a rack
>> > > belongs to its own unique rack, or that they all belong to one
>> > > "default" rack. Whichever we choose, it is highly likely to result in
>> > > an uneven number of brokers across racks, and it is quite possible
>> > > that the "made up" racks will have far fewer brokers. As I explained
>> > > in the KIP, an uneven number of brokers in racks leads to an uneven
>> > > distribution of replicas among brokers (even though the leader
>> > > distribution is still even). The brokers in a rack that has fewer
>> > > brokers will get more replicas per broker than brokers in other racks.
>> > >
>> > > Given this, and given that the replica assignment produced will be
>> > > incorrect anyway from a rack aware point of view, ignoring all rack
>> > > information and falling back to the original algorithm is not a bad
>> > > choice, since it will at least give a better guarantee of replica
>> > > distribution.
>> > >
>> > > Also, for command line tools it gives the user a choice if for any
>> > > reason they want to ignore rack information and fall back to the
>> > > original algorithm.
>> > >
>> > >
>> > > On Tue, Nov 10, 2015 at 9:04 AM, Allen Wang <al...@gmail.com>
>> > > wrote:
>> > >
>> > >> I have been busy with some time-pressing issues for the last few
>> > >> days. I will think about how the incomplete rack information will
>> > >> affect the balance and update the KIP by early next week.
>> > >>
>> > >> Thanks,
>> > >> Allen
>> > >>
>> > >>
>> > >> On Tue, Nov 3, 2015 at 9:03 AM, Neha Narkhede <ne...@confluent.io>
>> > >> wrote:
>> > >>
>> > >>> A few suggestions on improving the KIP
>> > >>>
>> > >>> > *If some brokers have rack, and some do not, the algorithm will
>> > >>> > throw an exception. This is to prevent incorrect assignment caused
>> > >>> > by user error.*
>> > >>>
>> > >>>
>> > >>> In the KIP, can you clearly state the user-facing behavior when some
>> > >>> brokers have rack information and some don't? Which actions and
>> > >>> requests will error out, and how?
>> > >>>
>> > >>> *Even distribution of partition leadership among brokers*
>> > >>>
>> > >>>
>> > >>> There is some information about arranging the sorted broker list
>> > >>> interlaced with rack ids. Can you describe the changes to the current
>> > >>> algorithm in a little more detail? How does this interlacing work if
>> > >>> only a subset of brokers have the rack id configured? Does this still
>> > >>> work if an uneven number of brokers is assigned to each rack? It might
>> > >>> work; I'm looking for more details on the changes, since they will
>> > >>> affect the behavior seen by the user: imbalance on either the leaders
>> > >>> or data or both.
>> > >>>
>> > >>> On Mon, Nov 2, 2015 at 6:39 PM, Aditya Auradkar <
>> > >>> aauradkar@linkedin.com> wrote:
>> > >>>
>> > >>> > I think this sounds reasonable. Anyone else have comments?
>> > >>> >
>> > >>> > Aditya
>> > >>> >
>> > >>> > On Tue, Oct 27, 2015 at 5:23 PM, Allen Wang <allenxwang@gmail.com>
>> > >>> > wrote:
>> > >>> >
>> > >>> > > During the discussion in the hangout, it was mentioned that it
>> > >>> > > would be desirable for consumers to know the rack information of
>> > >>> > > the brokers so that they can consume from a broker in the same
>> > >>> > > rack to reduce latency. As I understand it, this will only be
>> > >>> > > beneficial if a consumer can consume from any broker in the ISR,
>> > >>> > > which is not possible now.
>> > >>> > >
>> > >>> > > I suggest we skip the change to TMR. Once the consumer is changed
>> > >>> > > to be able to consume from any broker in the ISR, the rack
>> > >>> > > information can be added to TMR.
>> > >>> > >
>> > >>> > > Another thing I want to confirm is the command line behavior. I
>> > >>> > > think the desirable default behavior is to fail fast on the
>> > >>> > > command line for an incomplete rack mapping. The error message can
>> > >>> > > include further instructions that tell the user to add an extra
>> > >>> > > argument (like "--allow-partial-rackinfo") to suppress the error
>> > >>> > > and do an imperfect rack aware assignment. If the default behavior
>> > >>> > > is to allow an incomplete mapping, the error can still be easily
>> > >>> > > missed.
>> > >>> > >
>> > >>> > > The affected command line tools are TopicCommand and
>> > >>> > > ReassignPartitionsCommand.
>> > >>> > >
>> > >>> > > Thanks,
>> > >>> > > Allen
>> > >>> > >
>> > >>> > >
>> > >>> > >
>> > >>> > >
>> > >>> > >
>> > >>> > > On Mon, Oct 26, 2015 at 12:55 PM, Aditya Auradkar <
>> > >>> > > aauradkar@linkedin.com> wrote:
>> > >>> > >
>> > >>> > > > Hi Allen,
>> > >>> > > >
>> > >>> > > > For TopicMetadataResponse to understand version, you can bump
>> > >>> > > > up the request version itself. Based on the version of the
>> > >>> > > > request, the response can be appropriately serialized. It
>> > >>> > > > shouldn't be a huge change. For example, we went through
>> > >>> > > > something similar for ProduceRequest recently
>> > >>> > > > (https://reviews.apache.org/r/33378/).
>> > >>> > > > I guess the reason protocol information is not included in the
>> > >>> > > > TMR is that the topic itself is independent of any particular
>> > >>> > > > protocol (SSL vs Plaintext). Having said that, I'm not sure we
>> > >>> > > > even need rack information in TMR. What use case were you
>> > >>> > > > thinking of initially?
>> > >>> > > >
>> > >>> > > > For 1 - I'd be fine with adding an option to the command line
>> > >>> > > > tools that checks rack assignment, e.g. "--strict-assignment"
>> > >>> > > > or something similar.
>> > >>> > > >
>> > >>> > > > Aditya
>> > >>> > > >
>> > >>> > > > On Thu, Oct 22, 2015 at 6:44 PM, Allen Wang <
>> > >>> > > > allenxwang@gmail.com> wrote:
>> > >>> > > >
>> > >>> > > > > For 2 and 3, I have updated the KIP. Please take a look. One
>> > >>> > > > > thing I have changed is removing the proposal to add rack to
>> > >>> > > > > TopicMetadataResponse. The reason is that, unlike
>> > >>> > > > > UpdateMetadataRequest, TopicMetadataResponse does not
>> > >>> > > > > understand version. I don't see a way to include rack without
>> > >>> > > > > breaking old versions of clients. That's probably why the
>> > >>> > > > > security protocol is not included in the
>> > >>> > > > > TopicMetadataResponse either. I think it will be a much
>> > >>> > > > > bigger change to include rack in TopicMetadataResponse.
>> > >>> > > > >
>> > >>> > > > > For 1, my concern is that doing rack aware assignment without
>> > >>> > > > > a complete broker to rack mapping will result in an
>> > >>> > > > > assignment that is not rack aware and fails to provide fault
>> > >>> > > > > tolerance in the event of a rack outage. This kind of problem
>> > >>> > > > > will be difficult to surface. And the cost of this problem is
>> > >>> > > > > high: you have to do partition reassignment if you are lucky
>> > >>> > > > > enough to spot the problem early on, or face the consequence
>> > >>> > > > > of data loss during a real rack outage.
>> > >>> > > > >
>> > >>> > > > > I do see the concern with fail-fast, as it might also cause
>> > >>> > > > > data loss if a producer is not able to produce messages due
>> > >>> > > > > to topic creation failure. Is it feasible to treat dynamic
>> > >>> > > > > topic creation and command line tools differently? We would
>> > >>> > > > > allow dynamic topic creation with an incomplete broker-rack
>> > >>> > > > > mapping and fail fast in the command line. Another option is
>> > >>> > > > > to let the user determine the behavior for the command line.
>> > >>> > > > > For example, by default fail fast in the command line but
>> > >>> > > > > allow an incomplete broker-rack mapping if another switch is
>> > >>> > > > > provided.
>> > >>> > > > >
>> > >>> > > > >
>> > >>> > > > >
>> > >>> > > > >
>> > >>> > > > > On Tue, Oct 20, 2015 at 10:05 AM, Aditya Auradkar <
>> > >>> > > > > aauradkar@linkedin.com.invalid> wrote:
>> > >>> > > > >
>> > >>> > > > > > Hey Allen,
>> > >>> > > > > >
>> > >>> > > > > > 1. If we choose fail-fast topic creation, we will have topic
>> > >>> > > > > > creation failures while upgrading the cluster. I really
>> > >>> > > > > > doubt we want this behavior. Ideally, this should be
>> > >>> > > > > > invisible to clients of a cluster. Currently, each broker is
>> > >>> > > > > > effectively its own rack. So we can probably use the rack
>> > >>> > > > > > information whenever possible but not make it a hard
>> > >>> > > > > > requirement. To extend Gwen's example, one badly configured
>> > >>> > > > > > broker should not degrade topic creation for the entire
>> > >>> > > > > > cluster.
>> > >>> > > > > >
>> > >>> > > > > > 2. Upgrade scenario - Can you add a section on the upgrade
>> > >>> > > > > > piece to confirm that old clients will not see errors? I
>> > >>> > > > > > believe ZookeeperConsumerConnector reads the Broker objects
>> > >>> > > > > > from ZK. I wanted to confirm that this will not cause any
>> > >>> > > > > > problems.
>> > >>> > > > > >
>> > >>> > > > > > 3. Could you elaborate on your proposed changes to the
>> > >>> > > > > > UpdateMetadataRequest in the "Public Interfaces" section?
>> > >>> > > > > > Personally, I find this format easy to read in terms of wire
>> > >>> > > > > > protocol changes:
>> > >>> > > > > >
>> > >>> > > > > >
>> > >>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicRequest
>> > >>> > > > > >
>> > >>> > > > > > Aditya
>> > >>> > > > > >
>> > >>> > > > > > On Fri, Oct 16, 2015 at 3:45 PM, Allen Wang <
>> > >>> > > > > > allenxwang@gmail.com> wrote:
>> > >>> > > > > >
>> > >>> > > > > > > The KIP is updated to include rack as an optional property
>> > >>> > > > > > > for broker. Please take a look and let me know if more
>> > >>> > > > > > > details are needed.
>> > >>> > > > > > >
>> > >>> > > > > > > For the case where some brokers have rack and some do not,
>> > >>> > > > > > > the current KIP uses the fail-fast behavior. If there are
>> > >>> > > > > > > concerns, we can further discuss this in the email thread
>> > >>> > > > > > > or next hangout.
>> > >>> > > > > > >
>> > >>> > > > > > >
>> > >>> > > > > > >
>> > >>> > > > > > > On Thu, Oct 15, 2015 at 10:42 AM, Allen Wang <
>> > >>> > > > > > > allenxwang@gmail.com> wrote:
>> > >>> > > > > > >
>> > >>> > > > > > > > That's a good question. I can think of three actions if
>> > >>> > > > > > > > the rack information is incomplete:
>> > >>> > > > > > > >
>> > >>> > > > > > > > 1. Treat the node without rack as if it is on its own
>> > >>> > > > > > > > unique rack
>> > >>> > > > > > > > 2. Disregard all rack information and fall back to the
>> > >>> > > > > > > > current algorithm
>> > >>> > > > > > > > 3. Fail-fast
>> > >>> > > > > > > >
>> > >>> > > > > > > > Now that I think about it, one and three make more sense.
>> > >>> > > > > > > > The reason for fail-fast is that a user's mistake of not
>> > >>> > > > > > > > providing the rack may never be found if we tolerate it;
>> > >>> > > > > > > > the assignment may not be rack aware as the user
>> > >>> > > > > > > > expected, and this creates debugging problems when
>> > >>> > > > > > > > things fail.
>> > >>> > > > > > > >
>> > >>> > > > > > > > What do you think? If not fail-fast, is there any way we
>> > >>> > > > > > > > can make the user error stand out?
>> > >>> > > > > > > >
>> > >>> > > > > > > >
>> > >>> > > > > > > > On Thu, Oct 15, 2015 at 10:17 AM, Gwen Shapira <
>> > >>> > > > > > > > gwen@confluent.io> wrote:
>> > >>> > > > > > > >
>> > >>> > > > > > > >> Thanks! Just to clarify, when some brokers have rack
>> > >>> > > > > > > >> assignment and some don't, do we act like none of them
>> > >>> > > > > > > >> have it? Or like those without assignment are in their
>> > >>> > > > > > > >> own rack?
>> > >>> > > > > > > >>
>> > >>> > > > > > > >> The first scenario is good when first setting up
>> > >>> > > > > > > >> rack-awareness, but the second makes more sense for
>> > >>> > > > > > > >> on-going maintenance (I can totally see someone adding a
>> > >>> > > > > > > >> node and forgetting to set the rack property; we don't
>> > >>> > > > > > > >> want this to change behavior for anything except the new
>> > >>> > > > > > > >> node).
>> > >>> > > > > > > >>
>> > >>> > > > > > > >> What do you think?
>> > >>> > > > > > > >>
>> > >>> > > > > > > >> Gwen
>> > >>> > > > > > > >>
>> > >>> > > > > > > >> On Thu, Oct 15, 2015 at 10:13 AM, Allen Wang <
>> > >>> > > > > > > >> allenxwang@gmail.com> wrote:
>> > >>> > > > > > > >>
>> > >>> > > > > > > >> > For scenario 1:
>> > >>> > > > > > > >> >
>> > >>> > > > > > > >> > - Add the rack information to the broker property file
>> > >>> > > > > > > >> > or dynamically set it in the wrapper code that
>> > >>> > > > > > > >> > bootstraps the Kafka server. You would do that for all
>> > >>> > > > > > > >> > brokers and restart the brokers one by one.
>> > >>> > > > > > > >> >
>> > >>> > > > > > > >> > In this scenario, the complete broker to rack mapping
>> > >>> > > > > > > >> > may not be available until every broker is restarted.
>> > >>> > > > > > > >> > During that time we fall back to the default replica
>> > >>> > > > > > > >> > assignment algorithm.
>> > >>> > > > > > > >> >
>> > >>> > > > > > > >> > For scenario 2:
>> > >>> > > > > > > >> >
>> > >>> > > > > > > >> > - Add the rack information to the broker property file
>> > >>> > > > > > > >> > or dynamically set it in the wrapper code and start the
>> > >>> > > > > > > >> > broker.
>> > >>> > > > > > > >> >
>> > >>> > > > > > > >> >
>> > >>> > > > > > > >> > On Wed, Oct 14, 2015 at 2:36 PM, Gwen Shapira <
>> > >>> > > > > > > >> > gwen@confluent.io> wrote:
>> > >>> > > > > > > >> >
>> > >>> > > > > > > >> > > Can you clarify the workflow for the following
>> > >>> > > > > > > >> > > scenarios:
>> > >>> > > > > > > >> > >
>> > >>> > > > > > > >> > > 1. I currently have 6 brokers and want to add rack
>> > >>> > > > > > > >> > > information for each
>> > >>> > > > > > > >> > > 2. I'm adding a new broker and I want to specify
>> > >>> > > > > > > >> > > which rack it belongs on while adding it.
>> > >>> > > > > > > >> > >
>> > >>> > > > > > > >> > > Thanks!
>> > >>> > > > > > > >> > >
>> > >>> > > > > > > >> > > On Tue, Oct 13, 2015 at 2:21 PM, Allen Wang <allenxwang@gmail.com> wrote:
>> > >>> > > > > > > >> > >
>> > >>> > > > > > > >> > > > We discussed the KIP in the hangout today. The recommendation is to make
>> > >>> > > > > > > >> > > > rack a broker property in ZooKeeper. Users with existing rack information
>> > >>> > > > > > > >> > > > stored elsewhere would need to retrieve it at broker startup and set the
>> > >>> > > > > > > >> > > > rack property dynamically, which can be implemented as a wrapper that
>> > >>> > > > > > > >> > > > bootstraps the broker. There will be no interface or pluggable
>> > >>> > > > > > > >> > > > implementation to retrieve the rack information.
>> > >>> > > > > > > >> > > >
>> > >>> > > > > > > >> > > > The assumption is that you always need to restart the broker to make a
>> > >>> > > > > > > >> > > > change to the rack.
>> > >>> > > > > > > >> > > >
>> > >>> > > > > > > >> > > > Once the rack becomes a broker property, it will be possible to make rack
>> > >>> > > > > > > >> > > > part of the metadata to help the consumer choose which in-sync replica to
>> > >>> > > > > > > >> > > > consume from as part of the future consumer enhancement.
>> > >>> > > > > > > >> > > >
>> > >>> > > > > > > >> > > > I will update the KIP.
>> > >>> > > > > > > >> > > >
>> > >>> > > > > > > >> > > > Thanks,
>> > >>> > > > > > > >> > > > Allen
>> > >>> > > > > > > >> > > >
>> > >>> > > > > > > >> > > >
>> > >>> > > > > > > >> > > > On Thu, Oct 8, 2015 at 9:23 AM, Allen Wang <allenxwang@gmail.com> wrote:
>> > >>> > > > > > > >> > > >
>> > >>> > > > > > > >> > > > > I attended Tuesday's KIP hangout but this KIP was not discussed due to
>> > >>> > > > > > > >> > > > > time constraints.
>> > >>> > > > > > > >> > > > >
>> > >>> > > > > > > >> > > > > However, after hearing the discussion of KIP-35, I have the feeling that
>> > >>> > > > > > > >> > > > > incompatibility (caused by a new broker property) between brokers with
>> > >>> > > > > > > >> > > > > different versions will be solved there. In addition, having rack in the
>> > >>> > > > > > > >> > > > > broker property as metadata may also help consumers in the future. So I
>> > >>> > > > > > > >> > > > > am open to adding a rack property to the broker.
>> > >>> > > > > > > >> > > > >
>> > >>> > > > > > > >> > > > > Hopefully we can discuss this in the next KIP hangout.
>> > >>> > > > > > > >> > > > >
>> > >>> > > > > > > >> > > > > On Wed, Sep 30, 2015 at 2:46 PM, Allen Wang <allenxwang@gmail.com> wrote:
>> > >>> > > > > > > >> > > > >
>> > >>> > > > > > > >> > > > >> Can you send me the information on the next KIP hangout?
>> > >>> > > > > > > >> > > > >>
>> > >>> > > > > > > >> > > > >> Currently the broker-rack mapping is not cached. In KafkaApis,
>> > >>> > > > > > > >> > > > >> RackLocator.getRackInfo() is called each time the mapping is needed for
>> > >>> > > > > > > >> > > > >> auto topic creation. This ensures the latest mapping is used at any time.
>> > >>> > > > > > > >> > > > >>
>> > >>> > > > > > > >> > > > >> The ability to get the complete mapping makes it simple to reuse the same
>> > >>> > > > > > > >> > > > >> interface in command line tools.
>> > >>> > > > > > > >> > > > >>
>> > >>> > > > > > > >> > > > >>
>> > >>> > > > > > > >> > > > >> On Wed, Sep 30, 2015 at 11:01 AM, Aditya Auradkar <
>> > >>> > > > > > > >> > > > >> aauradkar@linkedin.com.invalid> wrote:
>> > >>> > > > > > > >> > > > >>
>> > >>> > > > > > > >> > > > >>> Perhaps we discuss this during the next KIP hangout?
>> > >>> > > > > > > >> > > > >>>
>> > >>> > > > > > > >> > > > >>> I do see that a pluggable rack locator can be useful but I do see a few
>> > >>> > > > > > > >> > > > >>> concerns:
>> > >>> > > > > > > >> > > > >>>
>> > >>> > > > > > > >> > > > >>> - The RackLocator (as described in the document) implies that it can
>> > >>> > > > > > > >> > > > >>> discover rack information for any node in the cluster. How does it deal
>> > >>> > > > > > > >> > > > >>> with rack location changes? For example, if I moved broker id (1) from
>> > >>> > > > > > > >> > > > >>> rack X to Y, I only have to start that broker with a newer rack config.
>> > >>> > > > > > > >> > > > >>> If RackLocator discovers broker -> rack information at startup time, any
>> > >>> > > > > > > >> > > > >>> change to a broker will require bouncing the entire cluster since
>> > >>> > > > > > > >> > > > >>> createTopic requests can be sent to any node in the cluster.
>> > >>> > > > > > > >> > > > >>> For this reason it may be simpler to have each node be aware of its own
>> > >>> > > > > > > >> > > > >>> rack and persist it in ZK during startup.
>> > >>> > > > > > > >> > > > >>>
>> > >>> > > > > > > >> > > > >>> - A pluggable RackLocator relies on an external service being available
>> > >>> > > > > > > >> > > > >>> to serve rack information.
>> > >>> > > > > > > >> > > > >>>
>> > >>> > > > > > > >> > > > >>> Out of curiosity, I looked up how a couple of other systems deal with
>> > >>> > > > > > > >> > > > >>> zone/rack awareness.
>> > >>> > > > > > > >> > > > >>> For Cassandra some interesting modes are:
>> > >>> > > > > > > >> > > > >>> (Property File configuration)
>> > >>> > > > > > > >> > > > >>>
>> > >>> > > > > > > >> > > > >>>
>> > >>> > > > > > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
>> > >>> > > > > > > >> > > > >>> (Dynamic inference)
>> > >>> > > > > > > >> > > > >>>
>> > >>> > > > > > > >> > > > >>>
>> > >>> > > > > > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
>> > >>> > > > > > > >> > > > >>>
>> > >>> > > > > > > >> > > > >>> Voldemort does a static node -> zone assignment based on configuration.
>> > >>> > > > > > > >> > > > >>>
>> > >>> > > > > > > >> > > > >>> Aditya
>> > >>> > > > > > > >> > > > >>>
>> > >>> > > > > > > >> > > > >>> On Wed, Sep 30, 2015 at 10:05 AM, Allen Wang <allenxwang@gmail.com> wrote:
>> > >>> > > > > > > >> > > > >>>
>> > >>> > > > > > > >> > > > >>> > I would like to see if we can do both:
>> > >>> > > > > > > >> > > > >>> >
>> > >>> > > > > > > >> > > > >>> > - Make RackLocator pluggable to facilitate migration with existing
>> > >>> > > > > > > >> > > > >>> > broker-rack mapping
>> > >>> > > > > > > >> > > > >>> >
>> > >>> > > > > > > >> > > > >>> > - Make rack an optional property for broker. If rack is available from
>> > >>> > > > > > > >> > > > >>> > broker, treat it as source of truth. Users with an existing broker-rack
>> > >>> > > > > > > >> > > > >>> > mapping somewhere else can use the pluggable way or transfer the mapping
>> > >>> > > > > > > >> > > > >>> > to the broker rack property.
>> > >>> > > > > > > >> > > > >>> >
>> > >>> > > > > > > >> > > > >>> > One thing I am not sure about is what happens during a rolling upgrade
>> > >>> > > > > > > >> > > > >>> > when we have rack as a broker property. Will it cause problems for
>> > >>> > > > > > > >> > > > >>> > brokers running an older version of Kafka? If so, is there any
>> > >>> > > > > > > >> > > > >>> > workaround? I also think it would be better not to have rack in the
>> > >>> > > > > > > >> > > > >>> > controller wire protocol, but I am not sure if it is achievable.
>> > >>> > > > > > > >> > > > >>> >
>> > >>> > > > > > > >> > > > >>> > Thanks,
>> > >>> > > > > > > >> > > > >>> > Allen
>> > >>> > > > > > > >> > > > >>> >
>> > >>> > > > > > > >> > > > >>> >
>> > >>> > > > > > > >> > > > >>> >
>> > >>> > > > > > > >> > > > >>> >
>> > >>> > > > > > > >> > > > >>> >
>> > >>> > > > > > > >> > > > >>> > On Mon, Sep 28, 2015 at 4:55 PM, Todd Palino <tpalino@gmail.com> wrote:
>> > >>> > > > > > > >> > > > >>> >
>> > >>> > > > > > > >> > > > >>> > > I tend to like the idea of a pluggable locator. For example, we already
>> > >>> > > > > > > >> > > > >>> > > have an interface for discovering information about the physical location
>> > >>> > > > > > > >> > > > >>> > > of servers. I don't relish the idea of having to maintain data in
>> > >>> > > > > > > >> > > > >>> > > multiple places.
>> > >>> > > > > > > >> > > > >>> > >
>> > >>> > > > > > > >> > > > >>> > > -Todd
>> > >>> > > > > > > >> > > > >>> > >
>> > >>> > > > > > > >> > > > >>> > > On Mon, Sep 28, 2015 at 4:48 PM, Aditya Auradkar <
>> > >>> > > > > > > >> > > > >>> > > aauradkar@linkedin.com.invalid> wrote:
>> > >>> > > > > > > >> > > > >>> > >
>> > >>> > > > > > > >> > > > >>> > > > Thanks for starting this KIP Allen.
>> > >>> > > > > > > >> > > > >>> > > >
>> > >>> > > > > > > >> > > > >>> > > > I agree with Gwen that having a RackLocator class that is pluggable seems
>> > >>> > > > > > > >> > > > >>> > > > to be too complex. The KIP refers to potentially non-ZK storage for the
>> > >>> > > > > > > >> > > > >>> > > > rack info which I don't think is necessary.
>> > >>> > > > > > > >> > > > >>> > > >
>> > >>> > > > > > > >> > > > >>> > > > Perhaps we can persist this info in zk under /brokers/ids/<broker_id>
>> > >>> > > > > > > >> > > > >>> > > > similar to other broker properties and add a config in KafkaConfig called
>> > >>> > > > > > > >> > > > >>> > > > "rack".
>> > >>> > > > > > > >> > > > >>> > > >
>> > >>> > > > > > > >> > > > >>> > > > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy,"rack": "abc"}
>> > >>> > > > > > > >> > > > >>> > > >
>> > >>> > > > > > > >> > > > >>> > > > Aditya
>> > >>> > > > > > > >> > > > >>> > > >
>> > >>> > > > > > > >> > > > >>> > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira <gwen@confluent.io> wrote:
>> > >>> > > > > > > >> > > > >>> > > >
>> > >>> > > > > > > >> > > > >>> > > > > Hi,
>> > >>> > > > > > > >> > > > >>> > > > >
>> > >>> > > > > > > >> > > > >>> > > > > First, thanks for putting out a KIP for this. This is super important for
>> > >>> > > > > > > >> > > > >>> > > > > production deployments of Kafka.
>> > >>> > > > > > > >> > > > >>> > > > >
>> > >>> > > > > > > >> > > > >>> > > > > Few questions:
>> > >>> > > > > > > >> > > > >>> > > > >
>> > >>> > > > > > > >> > > > >>> > > > > 1) Are we sure we want "as many racks as possible"? I'd want to balance
>> > >>> > > > > > > >> > > > >>> > > > > between safety (more racks) and network utilization (traffic within a
>> > >>> > > > > > > >> > > > >>> > > > > rack uses the high-bandwidth TOR switch). One replica on a different rack
>> > >>> > > > > > > >> > > > >>> > > > > and the rest on same rack (if possible) sounds better to me.
>> > >>> > > > > > > >> > > > >>> > > > >
>> > >>> > > > > > > >> > > > >>> > > > > 2) Rack-locator class seems overly complex compared to adding a
>> > >>> > > > > > > >> > > > >>> > > > > rack.number property to the broker properties file. Why do we want that?
>> > >>> > > > > > > >> > > > >>> > > > >
>> > >>> > > > > > > >> > > > >>> > > > > Gwen
>> > >>> > > > > > > >> > > > >>> > > > >
>> > >>> > > > > > > >> > > > >>> > > > >
>> > >>> > > > > > > >> > > > >>> > > > >
>> > >>> > > > > > > >> > > > >>> > > > > On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang <allenxwang@gmail.com> wrote:
>> > >>> > > > > > > >> > > > >>> > > > >
>> > >>> > > > > > > >> > > > >>> > > > > > Hello Kafka Developers,
>> > >>> > > > > > > >> > > > >>> > > > > >
>> > >>> > > > > > > >> > > > >>> > > > > > I just created KIP-36 for rack aware replica assignment.
>> > >>> > > > > > > >> > > > >>> > > > > >
>> > >>> > > > > > > >> > > > >>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
>> > >>> > > > > > > >> > > > >>> > > > > >
>> > >>> > > > > > > >> > > > >>> > > > > > The goal is to utilize the isolation provided by the racks in data center
>> > >>> > > > > > > >> > > > >>> > > > > > and distribute replicas to racks to provide fault tolerance.
>> > >>> > > > > > > >> > > > >>> > > > > >
>> > >>> > > > > > > >> > > > >>> > > > > > Comments are welcome.
>> > >>> > > > > > > >> > > > >>> > > > > >
>> > >>> > > > > > > >> > > > >>> > > > > > Thanks,
>> > >>> > > > > > > >> > > > >>> > > > > > Allen
>> > >>> > > > > > > >> > > > >>> > > > > >
>> > >>> > > > > > > >> > > > >>> > > > >
>> > >>> > > > > > > >> > > > >>> > > >
>> > >>> > > > > > > >> > > > >>> > >
>> > >>> > > > > > > >> > > > >>> >
>> > >>> > > > > > > >> > > > >>>
>> > >>> > > > > > > >> > > > >>
>> > >>> > > > > > > >> > > > >>
>> > >>> > > > > > > >> > > > >
>> > >>> > > > > > > >> > > >
>> > >>> > > > > > > >> > >
>> > >>> > > > > > > >> >
>> > >>> > > > > > > >>
>> > >>> > > > > > > >
>> > >>> > > > > > > >
>> > >>> > > > > > >
>> > >>> > > > > >
>> > >>> > > > >
>> > >>> > > >
>> > >>> > >
>> > >>> >
>> > >>>
>> > >>>
>> > >>>
>> > >>> --
>> > >>> Thanks,
>> > >>> Neha
>> > >>>
>> > >>
>> > >>
>> > >
>> >
>>
>
>

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Allen Wang <al...@gmail.com>.
Hi Jun,

The reason that TopicMetadataResponse is not included in the KIP is that it
is currently not version aware. We would need to introduce a version to it
in order to ensure backward compatibility, which seems to me a big change.
Do we want to couple it with this KIP? Do we need to further discuss what
information to include in the new version besides rack? For example, should
we include broker security protocol in TopicMetadataResponse?
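
For illustration only, here is a rough sketch of what "version aware" could
mean here, along the lines suggested earlier in this thread: the response
layout is chosen from the version of the request. The function name, the
field names and the version number 1 are all assumptions made for this
sketch; none of them are taken from the Kafka protocol definition.

```python
def serialize_broker(broker, request_version):
    """Return the broker fields that a client speaking `request_version` can parse.

    Hypothetical layout: version 0 is the existing response, version 1 adds rack.
    """
    fields = {
        "node_id": broker["node_id"],
        "host": broker["host"],
        "port": broker["port"],
    }
    if request_version >= 1:
        # rack is optional for a broker; None stands for "no rack configured"
        fields["rack"] = broker.get("rack")
    return fields


broker = {"node_id": 1, "host": "kafka-1", "port": 9092, "rack": "rack-a"}
old_view = serialize_broker(broker, 0)  # old clients never see the new field
new_view = serialize_broker(broker, 1)  # new clients get rack as well
```

The point of the sketch is only that old clients keep receiving the layout
they already understand, so adding rack would not break them.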

The other option is a separate KIP that makes TopicMetadataResponse
version aware and decides what to include, leaving this KIP focused on the
rack aware algorithm, admin tools, and related changes to the inter-broker
protocol.
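
As a rough sketch of the rack aware part discussed in this thread: brokers
are arranged in a list that alternates between racks, replicas are placed
round-robin over that list, and all rack information is ignored when the
mapping is incomplete. This is a simplified illustration, not the algorithm
in the KIP (which also randomizes the starting broker and shifts follower
replicas); the function names are made up for the example.

```python
from collections import defaultdict
from itertools import zip_longest


def rack_alternated_brokers(broker_racks):
    """Interlace brokers by rack so neighbours in the list sit on different racks
    where possible (with uneven rack sizes the tail of the list may not alternate)."""
    by_rack = defaultdict(list)
    for broker, rack in sorted(broker_racks.items()):
        by_rack[rack].append(broker)
    columns = [by_rack[rack] for rack in sorted(by_rack)]
    return [b for row in zip_longest(*columns) for b in row if b is not None]


def assign_replicas(broker_racks, partitions, replication_factor):
    """Map partition -> replica list; rack value None means 'rack not configured'."""
    brokers = sorted(broker_racks)
    if replication_factor > len(brokers):
        raise ValueError("replication factor exceeds number of brokers")
    # Incomplete rack information: ignore all of it and use the plain broker order.
    rack_aware = all(rack is not None for rack in broker_racks.values())
    ordered = rack_alternated_brokers(broker_racks) if rack_aware else brokers
    return {
        p: [ordered[(p + i) % len(ordered)] for i in range(replication_factor)]
        for p in range(partitions)
    }


complete = {0: "a", 1: "a", 2: "b", 3: "b"}
incomplete = {0: "a", 1: "a", 2: "b", 3: None}
rack_aware_assignment = assign_replicas(complete, 4, 2)
fallback_assignment = assign_replicas(incomplete, 4, 2)
```

With the complete mapping every partition ends up with replicas on both
racks; with one broker missing its rack, the assignment is the same as it
would be with no rack information at all.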

Thanks,
Allen




On Mon, Dec 14, 2015 at 8:30 AM, Jun Rao <ju...@confluent.io> wrote:

> Allen,
>
> Thanks for the proposal. A few comments.
>
> 1. Since this KIP changes the inter broker communication protocol
> (UpdateMetadataRequest), we will need to document the upgrade path (similar
> to what's described in
> http://kafka.apache.org/090/documentation.html#upgrade).
>
> 2. It might be useful to include the rack info of the broker in
> TopicMetadataResponse. This can be useful for administrative tasks, as well
> as read affinity in the future.
>
> Jun
>
>
>
> On Thu, Dec 10, 2015 at 9:38 AM, Allen Wang <al...@gmail.com> wrote:
>
> > If there are no more comments I would like to call for a vote.
> >
> >
> > On Sun, Nov 15, 2015 at 10:08 PM, Allen Wang <al...@gmail.com> wrote:
> >
> > > KIP is updated with more details and how to handle the situation where
> > > rack information is incomplete.
> > >
> > > In the situation where rack information is incomplete, but we want to
> > > continue with the assignment, I have suggested ignoring all rack
> > > information and falling back to the original algorithm. The reason is
> > > explained below:
> > >
> > > The other options are to assume that each broker without a rack belongs
> > > to its own unique rack, or that they all belong to one "default" rack.
> > > Either way we choose, it is highly likely to result in an uneven number
> > > of brokers in racks, and it is quite possible that the "made up" racks
> > > will have far fewer brokers. As I explained in the KIP, an uneven number
> > > of brokers in racks will lead to uneven distribution of replicas among
> > > brokers (even though the leader distribution is still even). The brokers
> > > in the rack that has fewer brokers will get more replicas per broker
> > > than brokers in other racks.
> > >
> > > Given this fact, and that the replica assignment produced will be
> > > incorrect anyway from a rack aware point of view, ignoring all rack
> > > information and falling back to the original algorithm is not a bad
> > > choice since it will at least have a better guarantee of replica
> > > distribution.
> > >
> > > Also, for command line tools it gives the user a choice if for any
> > > reason they want to ignore rack information and fall back to the
> > > original algorithm.
> > >
> > >
> > > On Tue, Nov 10, 2015 at 9:04 AM, Allen Wang <al...@gmail.com> wrote:
> > >
> > >> I have been busy with some time-pressing issues for the last few days.
> > >> I will think about how the incomplete rack information will affect the
> > >> balance and update the KIP by early next week.
> > >>
> > >> Thanks,
> > >> Allen
> > >>
> > >>
> > > >> On Tue, Nov 3, 2015 at 9:03 AM, Neha Narkhede <ne...@confluent.io> wrote:
> > > >>
> > > >>> Few suggestions on improving the KIP
> > > >>>
> > > >>> *If some brokers have rack, and some do not, the algorithm will throw an
> > > >>> exception. This is to prevent incorrect assignment caused by user error.*
> > > >>>
> > > >>>
> > > >>> In the KIP, can you clearly state the user-facing behavior when some
> > > >>> brokers have rack information and some don't? Which actions and requests
> > > >>> will error out and how?
> > > >>>
> > > >>> *Even distribution of partition leadership among brokers*
> > > >>>
> > > >>>
> > > >>> There is some information about arranging the sorted broker list
> > > >>> interlaced with rack ids. Can you describe the changes to the current
> > > >>> algorithm in a little more detail? How does this interlacing work if only
> > > >>> a subset of brokers have the rack id configured? Does this still work if
> > > >>> an uneven # of brokers are assigned to each rack? It might work, I'm
> > > >>> looking for more details on the changes, since it will affect the
> > > >>> behavior seen by the user - imbalance on either the leaders or data or
> > > >>> both.
> > >>>
> > > >>> On Mon, Nov 2, 2015 at 6:39 PM, Aditya Auradkar <aauradkar@linkedin.com> wrote:
> > >>>
> > >>> > I think this sounds reasonable. Anyone else have comments?
> > >>> >
> > >>> > Aditya
> > >>> >
> > > >>> > On Tue, Oct 27, 2015 at 5:23 PM, Allen Wang <al...@gmail.com> wrote:
> > > >>> >
> > > >>> > > During the discussion in the hangout, it was mentioned that it would be
> > > >>> > > desirable that consumers know the rack information of the brokers so
> > > >>> > > that they can consume from the broker in the same rack to reduce
> > > >>> > > latency. As I understand it, this will only be beneficial if the
> > > >>> > > consumer can consume from any broker in ISR, which is not possible now.
> > > >>> > >
> > > >>> > > I suggest we skip the change to TMR. Once the change is made to the
> > > >>> > > consumer to be able to consume from any broker in ISR, the rack
> > > >>> > > information can be added to TMR.
> > > >>> > >
> > > >>> > > Another thing I want to confirm is command line behavior. I think the
> > > >>> > > desirable default behavior is to fail fast on command line for
> > > >>> > > incomplete rack mapping. The error message can include further
> > > >>> > > instruction that tells the user to add an extra argument (like
> > > >>> > > "--allow-partial-rackinfo") to suppress the error and do an imperfect
> > > >>> > > rack aware assignment. If the default behavior is to allow incomplete
> > > >>> > > mapping, the error can still be easily missed.
> > > >>> > >
> > > >>> > > The affected command line tools are TopicCommand and
> > > >>> > > ReassignPartitionsCommand.
> > >>> > >
> > >>> > > Thanks,
> > >>> > > Allen
> > >>> > >
> > >>> > >
> > >>> > >
> > >>> > >
> > >>> > >
> > > >>> > > On Mon, Oct 26, 2015 at 12:55 PM, Aditya Auradkar <aauradkar@linkedin.com> wrote:
> > > >>> > >
> > > >>> > > > Hi Allen,
> > > >>> > > >
> > > >>> > > > For TopicMetadataResponse to understand version, you can bump up the
> > > >>> > > > request version itself. Based on the version of the request, the
> > > >>> > > > response can be appropriately serialized. It shouldn't be a huge
> > > >>> > > > change. For example: We went through something similar for
> > > >>> > > > ProduceRequest recently (https://reviews.apache.org/r/33378/)
> > > >>> > > > I guess the reason protocol information is not included in the TMR is
> > > >>> > > > because the topic itself is independent of any particular protocol
> > > >>> > > > (SSL vs Plaintext). Having said that, I'm not sure we even need rack
> > > >>> > > > information in TMR. What usecase were you thinking of initially?
> > > >>> > > >
> > > >>> > > > For 1 - I'd be fine with adding an option to the command line tools
> > > >>> > > > that check rack assignment. For e.g. "--strict-assignment" or
> > > >>> > > > something similar.
> > >>> > > >
> > >>> > > > Aditya
> > >>> > > >
> > > >>> > > > On Thu, Oct 22, 2015 at 6:44 PM, Allen Wang <allenxwang@gmail.com> wrote:
> > > >>> > > >
> > > >>> > > > > For 2 and 3, I have updated the KIP. Please take a look. One thing I
> > > >>> > > > > have changed is removing the proposal to add rack to
> > > >>> > > > > TopicMetadataResponse. The reason is that unlike
> > > >>> > > > > UpdateMetadataRequest, TopicMetadataResponse does not understand
> > > >>> > > > > version. I don't see a way to include rack without breaking old
> > > >>> > > > > versions of clients. That's probably why secure protocol is not
> > > >>> > > > > included in the TopicMetadataResponse either. I think it will be a
> > > >>> > > > > much bigger change to include rack in TopicMetadataResponse.
> > > >>> > > > >
> > > >>> > > > > For 1, my concern is that doing rack aware assignment without a
> > > >>> > > > > complete broker to rack mapping will result in an assignment that is
> > > >>> > > > > not rack aware and fails to provide fault tolerance in the event of a
> > > >>> > > > > rack outage. This kind of problem will be difficult to surface. And
> > > >>> > > > > the cost of this problem is high: you have to do partition
> > > >>> > > > > reassignment if you are lucky enough to spot the problem early on, or
> > > >>> > > > > face the consequence of data loss during a real rack outage.
> > > >>> > > > >
> > > >>> > > > > I do see the concern with fail-fast, as it might also cause data loss
> > > >>> > > > > if the producer is not able to produce the message due to topic
> > > >>> > > > > creation failure. Is it feasible to treat dynamic topic creation and
> > > >>> > > > > command tools differently? We allow dynamic topic creation with
> > > >>> > > > > incomplete broker-rack mapping and fail fast in command line. Another
> > > >>> > > > > option is to let the user determine the behavior for command line.
> > > >>> > > > > For example, by default fail fast in command line but allow
> > > >>> > > > > incomplete broker-rack mapping if another switch is provided.
> > >>> > > > >
> > >>> > > > >
> > >>> > > > >
> > >>> > > > >
> > > >>> > > > > On Tue, Oct 20, 2015 at 10:05 AM, Aditya Auradkar <
> > > >>> > > > > aauradkar@linkedin.com.invalid> wrote:
> > > >>> > > > >
> > > >>> > > > > > Hey Allen,
> > > >>> > > > > >
> > > >>> > > > > > 1. If we choose fail fast topic creation, we will have topic
> > > >>> > > > > > creation failures while upgrading the cluster. I really doubt we
> > > >>> > > > > > want this behavior. Ideally, this should be invisible to clients of
> > > >>> > > > > > a cluster. Currently, each broker is effectively its own rack. So
> > > >>> > > > > > we probably can use the rack information whenever possible but not
> > > >>> > > > > > make it a hard requirement. To extend Gwen's example, one badly
> > > >>> > > > > > configured broker should not degrade topic creation for the entire
> > > >>> > > > > > cluster.
> > > >>> > > > > >
> > > >>> > > > > > 2. Upgrade scenario - Can you add a section on the upgrade piece to
> > > >>> > > > > > confirm that old clients will not see errors? I believe
> > > >>> > > > > > ZookeeperConsumerConnector reads the Broker objects from ZK. I
> > > >>> > > > > > wanted to confirm that this will not cause any problems.
> > > >>> > > > > >
> > > >>> > > > > > 3. Could you elaborate your proposed changes to the
> > > >>> > > > > > UpdateMetadataRequest in the "Public Interfaces" section?
> > > >>> > > > > > Personally, I find this format easy to read in terms of wire
> > > >>> > > > > > protocol changes:
> > > >>> > > > > >
> > > >>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicRequest
> > >>> > > > > >
> > >>> > > > > > Aditya
> > >>> > > > > >
> > > >>> > > > > > On Fri, Oct 16, 2015 at 3:45 PM, Allen Wang <allenxwang@gmail.com> wrote:
> > > >>> > > > > >
> > > >>> > > > > > > KIP is updated to include rack as an optional property for
> > > >>> > > > > > > broker. Please take a look and let me know if more details are
> > > >>> > > > > > > needed.
> > > >>> > > > > > >
> > > >>> > > > > > > For the case where some brokers have rack and some do not, the
> > > >>> > > > > > > current KIP uses the fail-fast behavior. If there are concerns,
> > > >>> > > > > > > we can further discuss this in the email thread or next hangout.
> > >>> > > > > > >
> > >>> > > > > > >
> > >>> > > > > > >
> > >>> > > > > > > On Thu, Oct 15, 2015 at 10:42 AM, Allen Wang <
> > >>> > allenxwang@gmail.com
> > >>> > > >
> > >>> > > > > > wrote:
> > >>> > > > > > >
> > >>> > > > > > > > That's a good question. I can think of three actions if
> > >>> > > > > > > > the rack information is incomplete:
> > >>> > > > > > > >
> > >>> > > > > > > > 1. Treat the node without rack as if it is on its own
> > >>> > > > > > > > unique rack
> > >>> > > > > > > > 2. Disregard all rack information and fall back to the
> > >>> > > > > > > > current algorithm
> > >>> > > > > > > > 3. Fail-fast
> > >>> > > > > > > >
> > >>> > > > > > > > Now that I think about it, one and three make more sense.
> > >>> > > > > > > > The reason for fail-fast is that a user's mistake of not
> > >>> > > > > > > > providing the rack may never be found if we tolerate it.
> > >>> > > > > > > > The assignment may then not be rack aware as the user
> > >>> > > > > > > > expected, which makes debugging hard when things fail.
> > >>> > > > > > > >
> > >>> > > > > > > > What do you think? If not fail-fast, is there any way we
> > >>> > > > > > > > can make the user error stand out?
> > >>> > > > > > > >
> > >>> > > > > > > >
> > >>> > > > > > > > On Thu, Oct 15, 2015 at 10:17 AM, Gwen Shapira <
> > >>> > > gwen@confluent.io>
> > >>> > > > > > > wrote:
> > >>> > > > > > > >
> > >>> > > > > > > >> Thanks! Just to clarify, when some brokers have rack
> > >>> > > > > > > >> assignment and some don't, do we act like none of them
> > >>> > > > > > > >> have it? Or like those without assignment are in their
> > >>> > > > > > > >> own rack?
> > >>> > > > > > > >>
> > >>> > > > > > > >> The first scenario is good when first setting up
> > >>> > > > > > > >> rack-awareness, but the second makes more sense for
> > >>> > > > > > > >> on-going maintenance (I can totally see someone adding a
> > >>> > > > > > > >> node and forgetting to set the rack property; we don't
> > >>> > > > > > > >> want this to change behavior for anything except the new
> > >>> > > > > > > >> node).
> > >>> > > > > > > >>
> > >>> > > > > > > >> What do you think?
> > >>> > > > > > > >>
> > >>> > > > > > > >> Gwen
> > >>> > > > > > > >>
> > >>> > > > > > > >> On Thu, Oct 15, 2015 at 10:13 AM, Allen Wang <
> > >>> > > > allenxwang@gmail.com>
> > >>> > > > > > > >> wrote:
> > >>> > > > > > > >>
> > >>> > > > > > > >> > For scenario 1:
> > >>> > > > > > > >> >
> > >>> > > > > > > >> > - Add the rack information to the broker property file or
> > >>> > > > > > > >> > dynamically set it in the wrapper code that bootstraps
> > >>> > > > > > > >> > the Kafka server. You would do that for all brokers and
> > >>> > > > > > > >> > restart the brokers one by one.
> > >>> > > > > > > >> >
> > >>> > > > > > > >> > In this scenario, the complete broker to rack mapping may
> > >>> > > > > > > >> > not be available until every broker is restarted. During
> > >>> > > > > > > >> > that time we fall back to the default replica assignment
> > >>> > > > > > > >> > algorithm.
> > >>> > > > > > > >> >
> > >>> > > > > > > >> > For scenario 2:
> > >>> > > > > > > >> >
> > >>> > > > > > > >> > - Add the rack information to the broker property file or
> > >>> > > > > > > >> > dynamically set it in the wrapper code and start the
> > >>> > > > > > > >> > broker.
> > >>> > > > > > > >> >
> > >>> > > > > > > >> >
> > >>> > > > > > > >> > On Wed, Oct 14, 2015 at 2:36 PM, Gwen Shapira <
> > >>> > > > gwen@confluent.io>
> > >>> > > > > > > >> wrote:
> > >>> > > > > > > >> >
> > >>> > > > > > > >> > > Can you clarify the workflow for the following
> > >>> scenarios:
> > >>> > > > > > > >> > >
> > >>> > > > > > > >> > > 1. I currently have 6 brokers and want to add rack
> > >>> > > information
> > >>> > > > > for
> > >>> > > > > > > >> each
> > >>> > > > > > > >> > > 2. I'm adding a new broker and I want to specify
> > which
> > >>> > rack
> > >>> > > it
> > >>> > > > > > > >> belongs on
> > >>> > > > > > > >> > > while adding it.
> > >>> > > > > > > >> > >
> > >>> > > > > > > >> > > Thanks!
> > >>> > > > > > > >> > >
> > >>> > > > > > > >> > > On Tue, Oct 13, 2015 at 2:21 PM, Allen Wang <
> > >>> > > > > allenxwang@gmail.com
> > >>> > > > > > >
> > >>> > > > > > > >> > wrote:
> > >>> > > > > > > >> > >
> > >>> > > > > > > >> > > > We discussed the KIP in the hangout today. The
> > >>> > > > > > > >> > > > recommendation is to make rack a broker property in
> > >>> > > > > > > >> > > > ZooKeeper. Users with existing rack information stored
> > >>> > > > > > > >> > > > somewhere would need to retrieve the information at
> > >>> > > > > > > >> > > > broker start up and dynamically set the rack property,
> > >>> > > > > > > >> > > > which can be implemented as a wrapper to bootstrap the
> > >>> > > > > > > >> > > > broker. There will be no interface or pluggable
> > >>> > > > > > > >> > > > implementation to retrieve the rack information.
> > >>> > > > > > > >> > > >
> > >>> > > > > > > >> > > > The assumption is that you always need to restart the
> > >>> > > > > > > >> > > > broker to make a change to the rack.
> > >>> > > > > > > >> > > >
> > >>> > > > > > > >> > > > Once the rack becomes a broker property, it will be
> > >>> > > > > > > >> > > > possible to make rack part of the metadata to help the
> > >>> > > > > > > >> > > > consumer choose which in-sync replica to consume from
> > >>> > > > > > > >> > > > as part of the future consumer enhancement.
> > >>> > > > > > > >> > > >
> > >>> > > > > > > >> > > > I will update the KIP.
> > >>> > > > > > > >> > > >
> > >>> > > > > > > >> > > > Thanks,
> > >>> > > > > > > >> > > > Allen
> > >>> > > > > > > >> > > >
> > >>> > > > > > > >> > > >
> > >>> > > > > > > >> > > > On Thu, Oct 8, 2015 at 9:23 AM, Allen Wang <
> > >>> > > > > > allenxwang@gmail.com>
> > >>> > > > > > > >> > wrote:
> > >>> > > > > > > >> > > >
> > >>> > > > > > > >> > > > > I attended Tuesday's KIP hangout but this KIP
> > was
> > >>> not
> > >>> > > > > > discussed
> > >>> > > > > > > >> due
> > >>> > > > > > > >> > to
> > >>> > > > > > > >> > > > > time constraint.
> > >>> > > > > > > >> > > > >
> > >>> > > > > > > >> > > > > However, after hearing the discussion of KIP-35, I
> > >>> > > > > > > >> > > > > have the feeling that incompatibility (caused by a
> > >>> > > > > > > >> > > > > new broker property) between brokers with different
> > >>> > > > > > > >> > > > > versions will be solved there. In addition, having
> > >>> > > > > > > >> > > > > rack in the broker properties as metadata may also
> > >>> > > > > > > >> > > > > help consumers in the future. So I am open to adding
> > >>> > > > > > > >> > > > > a rack property to the broker.
> > >>> > > > > > > >> > > > >
> > >>> > > > > > > >> > > > > Hopefully we can discuss this in the next KIP
> > >>> hangout.
> > >>> > > > > > > >> > > > >
> > >>> > > > > > > >> > > > > On Wed, Sep 30, 2015 at 2:46 PM, Allen Wang <
> > >>> > > > > > > allenxwang@gmail.com
> > >>> > > > > > > >> >
> > >>> > > > > > > >> > > > wrote:
> > >>> > > > > > > >> > > > >
> > >>> > > > > > > >> > > > >> Can you send me the information on the next
> KIP
> > >>> > > hangout?
> > >>> > > > > > > >> > > > >>
> > >>> > > > > > > >> > > > >> Currently the broker-rack mapping is not
> > cached.
> > >>> In
> > >>> > > > > > KafkaApis,
> > >>> > > > > > > >> > > > >> RackLocator.getRackInfo() is called each time
> > the
> > >>> > > mapping
> > >>> > > > > is
> > >>> > > > > > > >> needed
> > >>> > > > > > > >> > > for
> > >>> > > > > > > >> > > > >> auto topic creation. This will ensure latest
> > >>> mapping
> > >>> > is
> > >>> > > > > used
> > >>> > > > > > at
> > >>> > > > > > > >> any
> > >>> > > > > > > >> > > > time.
> > >>> > > > > > > >> > > > >>
> > >>> > > > > > > >> > > > >> The ability to get the complete mapping makes
> > it
> > >>> > simple
> > >>> > > > to
> > >>> > > > > > > reuse
> > >>> > > > > > > >> the
> > >>> > > > > > > >> > > > same
> > >>> > > > > > > >> > > > >> interface in command line tools.
> > >>> > > > > > > >> > > > >>
> > >>> > > > > > > >> > > > >>
> > >>> > > > > > > >> > > > >> On Wed, Sep 30, 2015 at 11:01 AM, Aditya
> > >>> Auradkar <
> > >>> > > > > > > >> > > > >> aauradkar@linkedin.com.invalid> wrote:
> > >>> > > > > > > >> > > > >>
> > >>> > > > > > > >> > > > >>> Perhaps we discuss this during the next KIP
> > >>> hangout?
> > >>> > > > > > > >> > > > >>>
> > >>> > > > > > > >> > > > >>> I do see that a pluggable rack locator can
> be
> > >>> useful
> > >>> > > > but I
> > >>> > > > > > do
> > >>> > > > > > > >> see a
> > >>> > > > > > > >> > > few
> > >>> > > > > > > >> > > > >>> concerns:
> > >>> > > > > > > >> > > > >>>
> > >>> > > > > > > >> > > > >>> - The RackLocator (as described in the
> > >>> document),
> > >>> > > > implies
> > >>> > > > > > that
> > >>> > > > > > > >> it
> > >>> > > > > > > >> > can
> > >>> > > > > > > >> > > > >>> discover rack information for any node in
> the
> > >>> > cluster.
> > >>> > > > How
> > >>> > > > > > > does
> > >>> > > > > > > >> it
> > >>> > > > > > > >> > > deal
> > >>> > > > > > > >> > > > >>> with rack location changes? For example, if
> I
> > >>> moved
> > >>> > > > broker
> > >>> > > > > > id
> > >>> > > > > > > >> (1)
> > >>> > > > > > > >> > > from
> > >>> > > > > > > >> > > > >>> rack
> > >>> > > > > > > >> > > > >>> X to Y, I only have to start that broker
> with
> > a
> > >>> > newer
> > >>> > > > rack
> > >>> > > > > > > >> config.
> > >>> > > > > > > >> > If
> > >>> > > > > > > >> > > > >>> RackLocator discovers broker -> rack
> > >>> information at
> > >>> > > > start
> > >>> > > > > up
> > >>> > > > > > > >> time,
> > >>> > > > > > > >> > > any
> > >>> > > > > > > >> > > > >>> change to a broker will require bouncing the
> > >>> entire
> > >>> > > > > cluster
> > >>> > > > > > > >> since
> > >>> > > > > > > >> > > > >>> createTopic requests can be sent to any node
> > in
> > >>> the
> > >>> > > > > cluster.
> > >>> > > > > > > >> > > > >>> For this reason it may be simpler to have
> each
> > >>> node
> > >>> > be
> > >>> > > > > aware
> > >>> > > > > > > of
> > >>> > > > > > > >> its
> > >>> > > > > > > >> > > own
> > >>> > > > > > > >> > > > >>> rack and persist it in ZK during start up
> > time.
> > >>> > > > > > > >> > > > >>>
> > >>> > > > > > > >> > > > >>> - A pluggable RackLocator relies on an
> > external
> > >>> > > service
> > >>> > > > > > being
> > >>> > > > > > > >> > > available
> > >>> > > > > > > >> > > > >>> to
> > >>> > > > > > > >> > > > >>> serve rack information.
> > >>> > > > > > > >> > > > >>>
> > >>> > > > > > > >> > > > >>> Out of curiosity, I looked up how a couple
> of
> > >>> other
> > >>> > > > > systems
> > >>> > > > > > > deal
> > >>> > > > > > > >> > with
> > >>> > > > > > > >> > > > >>> zone/rack awareness.
> > >>> > > > > > > >> > > > >>> For Cassandra some interesting modes are:
> > >>> > > > > > > >> > > > >>> (Property File configuration)
> > >>> > > > > > > >> > > > >>>
> > >>> > > > > > > >> > > > >>>
> > >>> > > > > > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
> > >>> > > > > > > >> > > > >>> (Dynamic inference)
> > >>> > > > > > > >> > > > >>>
> > >>> > > > > > > >> > > > >>>
> > >>> > > > > > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
> > >>> > > > > > > >> > > > >>>
> > >>> > > > > > > >> > > > >>> Voldemort does a static node -> zone
> > assignment
> > >>> > based
> > >>> > > on
> > >>> > > > > > > >> > > configuration.
> > >>> > > > > > > >> > > > >>>
> > >>> > > > > > > >> > > > >>> Aditya
> > >>> > > > > > > >> > > > >>>
> > >>> > > > > > > >> > > > >>> On Wed, Sep 30, 2015 at 10:05 AM, Allen
> Wang <
> > >>> > > > > > > >> allenxwang@gmail.com
> > >>> > > > > > > >> > >
> > >>> > > > > > > >> > > > >>> wrote:
> > >>> > > > > > > >> > > > >>>
> > >>> > > > > > > >> > > > >>> > I would like to see if we can do both:
> > >>> > > > > > > >> > > > >>> >
> > >>> > > > > > > >> > > > >>> > - Make RackLocator pluggable to facilitate
> > >>> > migration
> > >>> > > > > with
> > >>> > > > > > > >> > existing
> > >>> > > > > > > >> > > > >>> > broker-rack mapping
> > >>> > > > > > > >> > > > >>> >
> > >>> > > > > > > >> > > > >>> > - Make rack an optional property for
> broker.
> > >>> If
> > >>> > rack
> > >>> > > > is
> > >>> > > > > > > >> available
> > >>> > > > > > > >> > > > from
> > >>> > > > > > > >> > > > >>> > broker, treat it as source of truth. For
> > users
> > >>> > with
> > >>> > > > > > existing
> > >>> > > > > > > >> > > > >>> broker-rack
> > >>> > > > > > > >> > > > >>> > mapping somewhere else, they can use the
> > >>> pluggable
> > >>> > > way
> > >>> > > > > or
> > >>> > > > > > > they
> > >>> > > > > > > >> > can
> > >>> > > > > > > >> > > > >>> transfer
> > >>> > > > > > > >> > > > >>> > the mapping to the broker rack property.
> > >>> > > > > > > >> > > > >>> >
> > >>> > > > > > > >> > > > >>> > One thing I am not sure is what happens at
> > >>> rolling
> > >>> > > > > upgrade
> > >>> > > > > > > >> when
> > >>> > > > > > > >> > we
> > >>> > > > > > > >> > > > have
> > >>> > > > > > > >> > > > >>> > rack as a broker property. For brokers
> with
> > >>> older
> > >>> > > > > version
> > >>> > > > > > of
> > >>> > > > > > > >> > Kafka,
> > >>> > > > > > > >> > > > >>> will it
> > >>> > > > > > > >> > > > >>> > cause problem for them? If so, is there
> any
> > >>> > > > workaround?
> > >>> > > > > I
> > >>> > > > > > > also
> > >>> > > > > > > >> > > think
> > >>> > > > > > > >> > > > it
> > >>> > > > > > > >> > > > >>> > would be better not to have rack in the
> > >>> controller
> > >>> > > > wire
> > >>> > > > > > > >> protocol
> > >>> > > > > > > >> > > but
> > >>> > > > > > > >> > > > >>> not
> > >>> > > > > > > >> > > > >>> > sure if it is achievable.
> > >>> > > > > > > >> > > > >>> >
> > >>> > > > > > > >> > > > >>> > Thanks,
> > >>> > > > > > > >> > > > >>> > Allen
> > >>> > > > > > > >> > > > >>> >
> > >>> > > > > > > >> > > > >>> >
> > >>> > > > > > > >> > > > >>> >
> > >>> > > > > > > >> > > > >>> >
> > >>> > > > > > > >> > > > >>> >
> > >>> > > > > > > >> > > > >>> > On Mon, Sep 28, 2015 at 4:55 PM, Todd
> > Palino <
> > >>> > > > > > > >> tpalino@gmail.com>
> > >>> > > > > > > >> > > > >>> wrote:
> > >>> > > > > > > >> > > > >>> >
> > >>> > > > > > > >> > > > >>> > > I tend to like the idea of a pluggable
> > >>> locator.
> > >>> > > For
> > >>> > > > > > > >> example, we
> > >>> > > > > > > >> > > > >>> already
> > >>> > > > > > > >> > > > >>> > > have an interface for discovering
> > >>> information
> > >>> > > about
> > >>> > > > > the
> > >>> > > > > > > >> > physical
> > >>> > > > > > > >> > > > >>> location
> > >>> > > > > > > >> > > > >>> > > of servers. I don't relish the idea of
> > >>> having to
> > >>> > > > > > maintain
> > >>> > > > > > > >> data
> > >>> > > > > > > >> > in
> > >>> > > > > > > >> > > > >>> > multiple
> > >>> > > > > > > >> > > > >>> > > places.
> > >>> > > > > > > >> > > > >>> > >
> > >>> > > > > > > >> > > > >>> > > -Todd
> > >>> > > > > > > >> > > > >>> > >
> > >>> > > > > > > >> > > > >>> > > On Mon, Sep 28, 2015 at 4:48 PM, Aditya
> > >>> > Auradkar <
> > >>> > > > > > > >> > > > >>> > > aauradkar@linkedin.com.invalid> wrote:
> > >>> > > > > > > >> > > > >>> > >
> > >>> > > > > > > >> > > > >>> > > > Thanks for starting this KIP Allen.
> > >>> > > > > > > >> > > > >>> > > >
> > >>> > > > > > > >> > > > >>> > > > I agree with Gwen that having a
> > >>> RackLocator
> > >>> > > class
> > >>> > > > > that
> > >>> > > > > > > is
> > >>> > > > > > > >> > > > pluggable
> > >>> > > > > > > >> > > > >>> > seems
> > >>> > > > > > > >> > > > >>> > > > to be too complex. The KIP refers to
> > >>> > potentially
> > >>> > > > > > non-ZK
> > >>> > > > > > > >> > storage
> > >>> > > > > > > >> > > > >>> for the
> > >>> > > > > > > >> > > > >>> > > > rack info which I don't think is
> > >>> necessary.
> > >>> > > > > > > >> > > > >>> > > >
> > >>> > > > > > > >> > > > >>> > > > Perhaps we can persist this info in zk under
> > >>> > > > > > > >> > > > >>> > > > /brokers/ids/<broker_id> similar to other broker
> > >>> > > > > > > >> > > > >>> > > > properties and add a config in KafkaConfig called
> > >>> > > > > > > >> > > > >>> > > > "rack".
> > >>> > > > > > > >> > > > >>> > > >
> > >>> > > > > > > >> > > > >>> > > > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy,"rack":"abc"}
> > >>> > > > > > > >> > > > >>> > > >
> > >>> > > > > > > >> > > > >>> > > > Aditya
> > >>> > > > > > > >> > > > >>> > > >
> > >>> > > > > > > >> > > > >>> > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen
> > >>> Shapira
> > >>> > <
> > >>> > > > > > > >> > > gwen@confluent.io
> > >>> > > > > > > >> > > > >
> > >>> > > > > > > >> > > > >>> > wrote:
> > >>> > > > > > > >> > > > >>> > > >
> > >>> > > > > > > >> > > > >>> > > > > Hi,
> > >>> > > > > > > >> > > > >>> > > > >
> > >>> > > > > > > >> > > > >>> > > > > First, thanks for putting out a KIP
> > for
> > >>> > this.
> > >>> > > > This
> > >>> > > > > > is
> > >>> > > > > > > >> super
> > >>> > > > > > > >> > > > >>> important
> > >>> > > > > > > >> > > > >>> > > for
> > >>> > > > > > > >> > > > >>> > > > > production deployments of Kafka.
> > >>> > > > > > > >> > > > >>> > > > >
> > >>> > > > > > > >> > > > >>> > > > > Few questions:
> > >>> > > > > > > >> > > > >>> > > > >
> > >>> > > > > > > >> > > > >>> > > > > 1) Are we sure we want "as many
> racks
> > as
> > >>> > > > > possible"?
> > >>> > > > > > > I'd
> > >>> > > > > > > >> > want
> > >>> > > > > > > >> > > to
> > >>> > > > > > > >> > > > >>> > balance
> > >>> > > > > > > >> > > > >>> > > > > between safety (more racks) and
> > network
> > >>> > > > > utilization
> > >>> > > > > > > >> > (traffic
> > >>> > > > > > > >> > > > >>> within a
> > >>> > > > > > > >> > > > >>> > > > rack
> > >>> > > > > > > >> > > > >>> > > > > uses the high-bandwidth TOR switch).
> > One
> > >>> > > replica
> > >>> > > > > on
> > >>> > > > > > a
> > >>> > > > > > > >> > > different
> > >>> > > > > > > >> > > > >>> rack
> > >>> > > > > > > >> > > > >>> > > and
> > >>> > > > > > > >> > > > >>> > > > > the rest on same rack (if possible)
> > >>> sounds
> > >>> > > > better
> > >>> > > > > to
> > >>> > > > > > > me.
> > >>> > > > > > > >> > > > >>> > > > >
> > >>> > > > > > > >> > > > >>> > > > > 2) Rack-locator class seems overly
> > >>> complex
> > >>> > > > > compared
> > >>> > > > > > to
> > >>> > > > > > > >> > > adding a
> > >>> > > > > > > >> > > > >>> > > > rack.number
> > >>> > > > > > > >> > > > >>> > > > > property to the broker properties
> > file.
> > >>> Why
> > >>> > do
> > >>> > > > we
> > >>> > > > > > want
> > >>> > > > > > > >> > that?
> > >>> > > > > > > >> > > > >>> > > > >
> > >>> > > > > > > >> > > > >>> > > > > Gwen
> > >>> > > > > > > >> > > > >>> > > > >
> > >>> > > > > > > >> > > > >>> > > > >
> > >>> > > > > > > >> > > > >>> > > > >
> > >>> > > > > > > >> > > > >>> > > > > On Mon, Sep 28, 2015 at 12:15 PM,
> > Allen
> > >>> > Wang <
> > >>> > > > > > > >> > > > >>> allenxwang@gmail.com>
> > >>> > > > > > > >> > > > >>> > > > wrote:
> > >>> > > > > > > >> > > > >>> > > > >
> > >>> > > > > > > >> > > > >>> > > > > > Hello Kafka Developers,
> > >>> > > > > > > >> > > > >>> > > > > >
> > >>> > > > > > > >> > > > >>> > > > > > I just created KIP-36 for rack
> aware
> > >>> > replica
> > >>> > > > > > > >> assignment.
> > >>> > > > > > > >> > > > >>> > > > > >
> > >>> > > > > > > >> > > > >>> > > > > >
> > >>> > > > > > > >> > > > >>> > > > > >
> > >>> > > > > > > >> > > > >>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
> > >>> > > > > > > >> > > > >>> > > > > >
> > >>> > > > > > > >> > > > >>> > > > > > The goal is to utilize the
> isolation
> > >>> > > provided
> > >>> > > > by
> > >>> > > > > > the
> > >>> > > > > > > >> > racks
> > >>> > > > > > > >> > > in
> > >>> > > > > > > >> > > > >>> data
> > >>> > > > > > > >> > > > >>> > > > center
> > >>> > > > > > > >> > > > >>> > > > > > and distribute replicas to racks
> to
> > >>> > provide
> > >>> > > > > fault
> > >>> > > > > > > >> > > tolerance.
> > >>> > > > > > > >> > > > >>> > > > > >
> > >>> > > > > > > >> > > > >>> > > > > > Comments are welcome.
> > >>> > > > > > > >> > > > >>> > > > > >
> > >>> > > > > > > >> > > > >>> > > > > > Thanks,
> > >>> > > > > > > >> > > > >>> > > > > > Allen
> > >>> > > > > > > >> > > > >>> > > > > >
> > >>> > > > > > > >> > > > >>> > > > >
> > >>> > > > > > > >> > > > >>> > > >
> > >>> > > > > > > >> > > > >>> > >
> > >>> > > > > > > >> > > > >>> >
> > >>> > > > > > > >> > > > >>>
> > >>> > > > > > > >> > > > >>
> > >>> > > > > > > >> > > > >>
> > >>> > > > > > > >> > > > >
> > >>> > > > > > > >> > > >
> > >>> > > > > > > >> > >
> > >>> > > > > > > >> >
> > >>> > > > > > > >>
> > >>> > > > > > > >
> > >>> > > > > > > >
> > >>> > > > > > >
> > >>> > > > > >
> > >>> > > > >
> > >>> > > >
> > >>> > >
> > >>> >
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Thanks,
> > >>> Neha
> > >>>
> > >>
> > >>
> > >
> >
>

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Jun Rao <ju...@confluent.io>.
Allen,

Thanks for the proposal. A few comments.

1. Since this KIP changes the inter broker communication protocol
(UpdateMetadataRequest), we will need to document the upgrade path (similar
to what's described in
http://kafka.apache.org/090/documentation.html#upgrade).

2. It might be useful to include the rack info of the broker in
TopicMetadataResponse. This can be useful for administrative tasks, as well
as read affinity in the future.

Jun
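
The replica imbalance described in the quoted discussion below can be checked
with a small sketch. This is not the KIP-36 algorithm: it assumes a
deliberately simplified placement that puts exactly one replica of each
partition on every rack, round-robin over that rack's brokers. The rack names,
broker ids, and the function name are made up for illustration.

```python
from collections import Counter
from itertools import cycle


def assign_one_replica_per_rack(racks, num_partitions):
    """Hypothetical, simplified placement (not the KIP-36 algorithm).

    racks maps a rack name to its list of broker ids. Every partition gets
    one replica on each rack, chosen round-robin within that rack.
    """
    cursors = {rack: cycle(brokers) for rack, brokers in racks.items()}
    return [
        [next(cursors[rack]) for rack in sorted(racks)]
        for _ in range(num_partitions)
    ]


# Two real racks with three brokers each, plus one broker with no rack
# configured that is treated as its own "made up" rack.
racks = {"rack-a": [0, 1, 2], "rack-b": [3, 4, 5], "made-up": [6]}
assignment = assign_one_replica_per_rack(racks, num_partitions=12)

# Count how many replicas land on each broker.
replicas_per_broker = Counter(b for replicas in assignment for b in replicas)
for broker in sorted(replicas_per_broker):
    print(broker, replicas_per_broker[broker])
```

With 12 partitions, each broker in the three-broker racks holds 4 replicas,
while the lone broker in the made-up rack holds all 12 of its rack's replicas.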



On Thu, Dec 10, 2015 at 9:38 AM, Allen Wang <al...@gmail.com> wrote:

> If there are no more comments I would like to call for a vote.
>
>
> On Sun, Nov 15, 2015 at 10:08 PM, Allen Wang <al...@gmail.com> wrote:
>
> > KIP is updated with more details and how to handle the situation where
> > rack information is incomplete.
> >
> > In the situation where rack information is incomplete, but we want to
> > continue with the assignment, I have suggested ignoring all rack
> > information and falling back to the original algorithm. The reason is
> > explained below:
> >
> > The other options are to assume that each broker without a rack belongs
> > to its own unique rack, or that they all belong to one "default" rack.
> > Either way, it is highly likely to result in an uneven number of brokers
> > per rack, and it is quite possible that the "made up" racks will have far
> > fewer brokers. As I explained in the KIP, an uneven number of brokers per
> > rack leads to an uneven distribution of replicas among brokers (even
> > though the leader distribution is still even). The brokers in a rack
> > that has fewer brokers will get more replicas per broker than brokers
> > in other racks.
> >
> > Given this, and given that the replica assignment produced will be
> > incorrect anyway from a rack-aware point of view, ignoring all rack
> > information and falling back to the original algorithm is not a bad
> > choice, since it will at least give a better guarantee of replica
> > distribution.
> >
> > Also, for command line tools it gives the user a choice if for any reason
> > they want to ignore rack information and fall back to the original
> > algorithm.
> >
> >
> > On Tue, Nov 10, 2015 at 9:04 AM, Allen Wang <al...@gmail.com> wrote:
> >
> >> I have been busy with some time-pressing issues for the last few days. I
> >> will think about how the incomplete rack information will affect the
> >> balance and update the KIP by early next week.
> >>
> >> Thanks,
> >> Allen
> >>
> >>
> >> On Tue, Nov 3, 2015 at 9:03 AM, Neha Narkhede <ne...@confluent.io> wrote:
> >>
> >>> A few suggestions on improving the KIP:
> >>>
> >>> *If some brokers have rack, and some do not, the algorithm will throw an
> >>> exception. This is to prevent incorrect assignment caused by user
> >>> error.*
> >>>
> >>>
> >>> In the KIP, can you clearly state the user-facing behavior when some
> >>> brokers have rack information and some don't? Which actions and requests
> >>> will error out and how?
> >>>
> >>> *Even distribution of partition leadership among brokers*
> >>>
> >>>
> >>> There is some information about arranging the sorted broker list
> >>> interlaced with rack ids. Can you describe the changes to the current
> >>> algorithm in a little more detail? How does this interlacing work if only
> >>> a subset of brokers have the rack id configured? Does this still work if
> >>> an uneven number of brokers is assigned to each rack? It might work; I'm
> >>> looking for more details on the changes, since they will affect the
> >>> behavior seen by the user: imbalance on either the leaders or data or
> >>> both.
> >>>
> >>> On Mon, Nov 2, 2015 at 6:39 PM, Aditya Auradkar <
> aauradkar@linkedin.com>
> >>> wrote:
> >>>
> >>> > I think this sounds reasonable. Anyone else have comments?
> >>> >
> >>> > Aditya
> >>> >
> >>> > On Tue, Oct 27, 2015 at 5:23 PM, Allen Wang <al...@gmail.com>
> >>> wrote:
> >>> >
> >>> > > During the discussion in the hangout, it was mentioned that it would
> >>> > > be desirable for consumers to know the rack information of the
> >>> > > brokers so that they can consume from a broker in the same rack to
> >>> > > reduce latency. As I understand it, this will only be beneficial if
> >>> > > a consumer can consume from any broker in the ISR, which is not
> >>> > > possible now.
> >>> > >
> >>> > > I suggest we skip the change to TMR. Once the consumer is changed to
> >>> > > be able to consume from any broker in the ISR, the rack information
> >>> > > can be added to TMR.
> >>> > >
> >>> > > Another thing I want to confirm is the command line behavior. I think
> >>> > > the desirable default behavior is to fail fast on the command line
> >>> > > for an incomplete rack mapping. The error message can include further
> >>> > > instructions that tell the user to add an extra argument (like
> >>> > > "--allow-partial-rackinfo") to suppress the error and do an imperfect
> >>> > > rack aware assignment. If the default behavior is to allow an
> >>> > > incomplete mapping, the error can still be easily missed.
> >>> > >
> >>> > > The affected command line tools are TopicCommand and
> >>> > > ReassignPartitionsCommand.
> >>> > >
> >>> > > Thanks,
> >>> > > Allen
> >>> > >
> >>> > >
> >>> > >
> >>> > >
> >>> > >
> >>> > > On Mon, Oct 26, 2015 at 12:55 PM, Aditya Auradkar <
> >>> > aauradkar@linkedin.com>
> >>> > > wrote:
> >>> > >
> >>> > > > Hi Allen,
> >>> > > >
> >>> > > > For TopicMetadataResponse to understand version, you can bump up
> >>> > > > the request version itself. Based on the version of the request,
> >>> > > > the response can be appropriately serialized. It shouldn't be a
> >>> > > > huge change. For example, we went through something similar for
> >>> > > > ProduceRequest recently (https://reviews.apache.org/r/33378/).
> >>> > > > I guess the reason protocol information is not included in the TMR
> >>> > > > is that the topic itself is independent of any particular protocol
> >>> > > > (SSL vs Plaintext). Having said that, I'm not sure we even need
> >>> > > > rack information in TMR. What use case were you thinking of
> >>> > > > initially?
> >>> > > >
> >>> > > > For 1 - I'd be fine with adding an option to the command line tools
> >>> > > > that checks rack assignment, e.g. "--strict-assignment" or
> >>> > > > something similar.
> >>> > > >
> >>> > > > Aditya
> >>> > > >
> >>> > > > On Thu, Oct 22, 2015 at 6:44 PM, Allen Wang <
> allenxwang@gmail.com>
> >>> > > wrote:
> >>> > > >
> >>> > > > > For 2 and 3, I have updated the KIP. Please take a look. One thing
> >>> > > > > I have changed is removing the proposal to add rack to
> >>> > > > > TopicMetadataResponse. The reason is that unlike
> >>> > > > > UpdateMetadataRequest, TopicMetadataResponse does not understand
> >>> > > > > version. I don't see a way to include rack without breaking old
> >>> > > > > versions of clients. That's probably why the secure protocol is
> >>> > > > > not included in the TopicMetadataResponse either. I think it will
> >>> > > > > be a much bigger change to include rack in TopicMetadataResponse.
> >>> > > > >
> >>> > > > > For 1, my concern is that doing rack aware assignment without a
> >>> > > > > complete broker to rack mapping will result in an assignment that
> >>> > > > > is not rack aware and fails to provide fault tolerance in the
> >>> > > > > event of a rack outage. This kind of problem will be difficult to
> >>> > > > > surface. And the cost of this problem is high: you have to do
> >>> > > > > partition reassignment if you are lucky enough to spot the problem
> >>> > > > > early on, or face the consequence of data loss during a real rack
> >>> > > > > outage.
> >>> > > > >
> >>> > > > > I do see the concern with fail-fast, as it might also cause data
> >>> > > > > loss if a producer is not able to produce messages due to topic
> >>> > > > > creation failure. Is it feasible to treat dynamic topic creation
> >>> > > > > and command line tools differently? We would allow dynamic topic
> >>> > > > > creation with an incomplete broker-rack mapping and fail fast in
> >>> > > > > the command line. Another option is to let the user determine the
> >>> > > > > behavior for the command line. For example, by default fail fast
> >>> > > > > in the command line but allow an incomplete broker-rack mapping if
> >>> > > > > another switch is provided.
> >>> > > > >
> >>> > > > >
> >>> > > > >
> >>> > > > >
> >>> > > > > On Tue, Oct 20, 2015 at 10:05 AM, Aditya Auradkar <
> >>> > > > > aauradkar@linkedin.com.invalid> wrote:
> >>> > > > >
> >>> > > > > > Hey Allen,
> >>> > > > > >
> >>> > > > > > 1. If we choose fail fast topic creation, we will have topic
> >>> > creation
> >>> > > > > > failures while upgrading the cluster. I really doubt we want
> >>> this
> >>> > > > > behavior.
> >>> > > > > > Ideally, this should be invisible to clients of a cluster.
> >>> > Currently,
> >>> > > > > each
> >>> > > > > > broker is effectively its own rack. So we probably can use
> the
> >>> rack
> >>> > > > > > information whenever possible but not make it a hard
> >>> requirement.
> >>> > To
> >>> > > > > extend
> >>> > > > > > Gwen's example, one badly configured broker should not
> degrade
> >>> > topic
> >>> > > > > > creation for the entire cluster.
> >>> > > > > >
> >>> > > > > > 2. Upgrade scenario - Can you add a section on the upgrade
> >>> piece to
> >>> > > > > confirm
> >>> > > > > > that old clients will not see errors? I believe
> >>> > > > > ZookeeperConsumerConnector
> >>> > > > > > reads the Broker objects from ZK. I wanted to confirm that
> this
> >>> > will
> >>> > > > not
> >>> > > > > > cause any problems.
> >>> > > > > >
> >>> > > > > > 3. Could you elaborate your proposed changes to the
> >>> > > > UpdateMetadataRequest
> >>> > > > > > in the "Public Interfaces" section? Personally, I find this
> >>> format
> >>> > > easy
> >>> > > > > to
> >>> > > > > > read in terms of wire protocol changes:
> >>> > > > > >
> >>> > > > > >
> >>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicRequest
> >>> > > > > >
> >>> > > > > > Aditya
> >>> > > > > >
> >>> > > > > > On Fri, Oct 16, 2015 at 3:45 PM, Allen Wang <
> >>> allenxwang@gmail.com>
> >>> > > > > wrote:
> >>> > > > > >
> >>> > > > > > > KIP is updated to include rack as an optional property for
> >>> > > > > > > broker. Please take a look and let me know if more details are
> >>> > > > > > > needed.
> >>> > > > > > >
> >>> > > > > > > For the case where some brokers have rack and some do not,
> >>> the
> >>> > > > current
> >>> > > > > > KIP
> >>> > > > > > > uses the fail-fast behavior. If there are concerns, we can
> >>> > further
> >>> > > > > > discuss
> >>> > > > > > > this in the email thread or next hangout.
> >>> > > > > > >
> >>> > > > > > >
> >>> > > > > > >
> >>> > > > > > > On Thu, Oct 15, 2015 at 10:42 AM, Allen Wang <
> >>> > allenxwang@gmail.com
> >>> > > >
> >>> > > > > > wrote:
> >>> > > > > > >
> >>> > > > > > > > That's a good question. I can think of three actions if the
> >>> > > > > > > > rack information is incomplete:
> >>> > > > > > > >
> >>> > > > > > > > 1. Treat the node without rack as if it is on its own unique rack
> >>> > > > > > > > 2. Disregard all rack information and fall back to the current
> >>> > > > > > > > algorithm
> >>> > > > > > > > 3. Fail-fast
> >>> > > > > > > >
> >>> > > > > > > > Now that I think about it, one and three make more sense. The
> >>> > > > > > > > reason for fail-fast is that a user's mistake of not providing
> >>> > > > > > > > the rack may never be found if we tolerate it; the assignment
> >>> > > > > > > > may then not be rack aware as the user expected, which creates
> >>> > > > > > > > debugging problems when things fail.
> >>> > > > > > > >
> >>> > > > > > > > What do you think? If not fail-fast, is there any way we can
> >>> > > > > > > > make the user error stand out?
> >>> > > > > > > >
> >>> > > > > > > >
> >>> > > > > > > > On Thu, Oct 15, 2015 at 10:17 AM, Gwen Shapira <
> >>> > > gwen@confluent.io>
> >>> > > > > > > wrote:
> >>> > > > > > > >
> >>> > > > > > > >> Thanks! Just to clarify, when some brokers have rack
> >>> > assignment
> >>> > > > and
> >>> > > > > > some
> >>> > > > > > > >> don't, do we act like none of them have it? or like
> those
> >>> > > without
> >>> > > > > > > >> assignment are in their own rack?
> >>> > > > > > > >>
> >>> > > > > > > >> The first scenario is good when first setting up
> >>> > rack-awareness,
> >>> > > > but
> >>> > > > > > the
> >>> > > > > > > >> second makes more sense for on-going maintenance (I can
> >>> > totally
> >>> > > > see
> >>> > > > > > > >> someone
> >>> > > > > > > >> adding a node and forgetting to set the rack property,
> we
> >>> > don't
> >>> > > > want
> >>> > > > > > > this
> >>> > > > > > > >> to change behavior for anything except the new node).
> >>> > > > > > > >>
> >>> > > > > > > >> What do you think?
> >>> > > > > > > >>
> >>> > > > > > > >> Gwen
> >>> > > > > > > >>
> >>> > > > > > > >> On Thu, Oct 15, 2015 at 10:13 AM, Allen Wang <
> >>> > > > allenxwang@gmail.com>
> >>> > > > > > > >> wrote:
> >>> > > > > > > >>
> >>> > > > > > > >> > For scenario 1:
> >>> > > > > > > >> >
> >>> > > > > > > >> > - Add the rack information to broker property file or
> >>> > > > dynamically
> >>> > > > > > set
> >>> > > > > > > >> it in
> >>> > > > > > > >> > the wrapper code to bootstrap Kafka server. You would
> do
> >>> > that
> >>> > > > for
> >>> > > > > > all
> >>> > > > > > > >> > brokers and restart the brokers one by one.
> >>> > > > > > > >> >
> >>> > > > > > > >> > In this scenario, the complete broker to rack mapping
> >>> may
> >>> > not
> >>> > > be
> >>> > > > > > > >> available
> >>> > > > > > > >> > until every broker is restarted. During that time we
> >>> fall
> >>> > back
> >>> > > > to
> >>> > > > > > > >> default
> >>> > > > > > > >> > replica assignment algorithm.
> >>> > > > > > > >> >
> >>> > > > > > > >> > For scenario 2:
> >>> > > > > > > >> >
> >>> > > > > > > >> > - Add the rack information to broker property file or
> >>> > > > dynamically
> >>> > > > > > set
> >>> > > > > > > >> it in
> >>> > > > > > > >> > the wrapper code and start the broker.
> >>> > > > > > > >> >
> >>> > > > > > > >> >
> >>> > > > > > > >> > On Wed, Oct 14, 2015 at 2:36 PM, Gwen Shapira <
> >>> > > > gwen@confluent.io>
> >>> > > > > > > >> wrote:
> >>> > > > > > > >> >
> >>> > > > > > > >> > > Can you clarify the workflow for the following
> >>> scenarios:
> >>> > > > > > > >> > >
> >>> > > > > > > >> > > 1. I currently have 6 brokers and want to add rack
> >>> > > information
> >>> > > > > for
> >>> > > > > > > >> each
> >>> > > > > > > >> > > 2. I'm adding a new broker and I want to specify
> which
> >>> > rack
> >>> > > it
> >>> > > > > > > >> belongs on
> >>> > > > > > > >> > > while adding it.
> >>> > > > > > > >> > >
> >>> > > > > > > >> > > Thanks!
> >>> > > > > > > >> > >
> >>> > > > > > > >> > > On Tue, Oct 13, 2015 at 2:21 PM, Allen Wang <
> >>> > > > > allenxwang@gmail.com
> >>> > > > > > >
> >>> > > > > > > >> > wrote:
> >>> > > > > > > >> > >
> >>> > > > > > > >> > > > We discussed the KIP in the hangout today. The
> >>> > > > recommendation
> >>> > > > > is
> >>> > > > > > > to
> >>> > > > > > > >> > make
> >>> > > > > > > >> > > > rack as a broker property in ZooKeeper. For users
> >>> with
> >>> > > > > existing
> >>> > > > > > > rack
> >>> > > > > > > >> > > > information stored somewhere, they would need to
> >>> > retrieve
> >>> > > > the
> >>> > > > > > > >> > information
> >>> > > > > > > >> > > > at broker start up and dynamically set the rack
> >>> > property,
> >>> > > > > which
> >>> > > > > > > can
> >>> > > > > > > >> be
> >>> > > > > > > >> > > > implemented as a wrapper to bootstrap broker.
> There
> >>> will
> >>> > > be
> >>> > > > no
> >>> > > > > > > >> > interface
> >>> > > > > > > >> > > or
> >>> > > > > > > >> > > > pluggable implementation to retrieve the rack
> >>> > information.
> >>> > > > > > > >> > > >
> >>> > > > > > > >> > > > The assumption is that you always need to restart
> >>> the
> >>> > > broker
> >>> > > > > to
> >>> > > > > > > >> make a
> >>> > > > > > > >> > > > change to the rack.
> >>> > > > > > > >> > > >
> >>> > > > > > > >> > > > Once the rack becomes a broker property, it will
> be
> >>> > > possible
> >>> > > > > to
> >>> > > > > > > make
> >>> > > > > > > >> > rack
> >>> > > > > > > >> > > > part of the meta data to help the consumer choose
> >>> which
> >>> > in
> >>> > > > > sync
> >>> > > > > > > >> replica
> >>> > > > > > > >> > > to
> >>> > > > > > > >> > > > consume from as part of the future consumer
> >>> enhancement.
> >>> > > > > > > >> > > >
> >>> > > > > > > >> > > > I will update the KIP.
> >>> > > > > > > >> > > >
> >>> > > > > > > >> > > > Thanks,
> >>> > > > > > > >> > > > Allen
> >>> > > > > > > >> > > >
> >>> > > > > > > >> > > >
> >>> > > > > > > >> > > > On Thu, Oct 8, 2015 at 9:23 AM, Allen Wang <
> >>> > > > > > allenxwang@gmail.com>
> >>> > > > > > > >> > wrote:
> >>> > > > > > > >> > > >
> >>> > > > > > > >> > > > > I attended Tuesday's KIP hangout but this KIP
> was
> >>> not
> >>> > > > > > discussed
> >>> > > > > > > >> due
> >>> > > > > > > >> > to
> >>> > > > > > > >> > > > > time constraint.
> >>> > > > > > > >> > > > >
> >>> > > > > > > >> > > > > However, after hearing the discussion of KIP-35, I have the
> >>> > > > > > > >> > > > > feeling that incompatibility (caused by the new broker
> >>> > > > > > > >> > > > > property) between brokers with different versions will be
> >>> > > > > > > >> > > > > solved there. In addition, having rack in the broker
> >>> > > > > > > >> > > > > property as meta data may also help consumers in the future.
> >>> > > > > > > >> > > > > So I am open to adding a rack property to the broker.
> >>> > > > > > > >> > > > >
> >>> > > > > > > >> > > > > Hopefully we can discuss this in the next KIP
> >>> hangout.
> >>> > > > > > > >> > > > >
> >>> > > > > > > >> > > > > On Wed, Sep 30, 2015 at 2:46 PM, Allen Wang <
> >>> > > > > > > allenxwang@gmail.com
> >>> > > > > > > >> >
> >>> > > > > > > >> > > > wrote:
> >>> > > > > > > >> > > > >
> >>> > > > > > > >> > > > >> Can you send me the information on the next KIP
> >>> > > hangout?
> >>> > > > > > > >> > > > >>
> >>> > > > > > > >> > > > >> Currently the broker-rack mapping is not
> cached.
> >>> In
> >>> > > > > > KafkaApis,
> >>> > > > > > > >> > > > >> RackLocator.getRackInfo() is called each time
> the
> >>> > > mapping
> >>> > > > > is
> >>> > > > > > > >> needed
> >>> > > > > > > >> > > for
> >>> > > > > > > >> > > > >> auto topic creation. This will ensure latest
> >>> mapping
> >>> > is
> >>> > > > > used
> >>> > > > > > at
> >>> > > > > > > >> any
> >>> > > > > > > >> > > > time.
> >>> > > > > > > >> > > > >>
> >>> > > > > > > >> > > > >> The ability to get the complete mapping makes
> it
> >>> > simple
> >>> > > > to
> >>> > > > > > > reuse
> >>> > > > > > > >> the
> >>> > > > > > > >> > > > same
> >>> > > > > > > >> > > > >> interface in command line tools.
> >>> > > > > > > >> > > > >>
> >>> > > > > > > >> > > > >>
> >>> > > > > > > >> > > > >> On Wed, Sep 30, 2015 at 11:01 AM, Aditya
> >>> Auradkar <
> >>> > > > > > > >> > > > >> aauradkar@linkedin.com.invalid> wrote:
> >>> > > > > > > >> > > > >>
> >>> > > > > > > >> > > > >>> Perhaps we discuss this during the next KIP
> >>> hangout?
> >>> > > > > > > >> > > > >>>
> >>> > > > > > > >> > > > >>> I do see that a pluggable rack locator can be
> >>> useful
> >>> > > > but I
> >>> > > > > > do
> >>> > > > > > > >> see a
> >>> > > > > > > >> > > few
> >>> > > > > > > >> > > > >>> concerns:
> >>> > > > > > > >> > > > >>>
> >>> > > > > > > >> > > > >>> - The RackLocator (as described in the
> >>> document),
> >>> > > > implies
> >>> > > > > > that
> >>> > > > > > > >> it
> >>> > > > > > > >> > can
> >>> > > > > > > >> > > > >>> discover rack information for any node in the
> >>> > cluster.
> >>> > > > How
> >>> > > > > > > does
> >>> > > > > > > >> it
> >>> > > > > > > >> > > deal
> >>> > > > > > > >> > > > >>> with rack location changes? For example, if I
> >>> moved
> >>> > > > broker
> >>> > > > > > id
> >>> > > > > > > >> (1)
> >>> > > > > > > >> > > from
> >>> > > > > > > >> > > > >>> rack
> >>> > > > > > > >> > > > >>> X to Y, I only have to start that broker with
> a
> >>> > newer
> >>> > > > rack
> >>> > > > > > > >> config.
> >>> > > > > > > >> > If
> >>> > > > > > > >> > > > >>> RackLocator discovers broker -> rack
> >>> information at
> >>> > > > start
> >>> > > > > up
> >>> > > > > > > >> time,
> >>> > > > > > > >> > > any
> >>> > > > > > > >> > > > >>> change to a broker will require bouncing the
> >>> entire
> >>> > > > > cluster
> >>> > > > > > > >> since
> >>> > > > > > > >> > > > >>> createTopic requests can be sent to any node
> in
> >>> the
> >>> > > > > cluster.
> >>> > > > > > > >> > > > >>> For this reason it may be simpler to have each
> >>> node
> >>> > be
> >>> > > > > aware
> >>> > > > > > > of
> >>> > > > > > > >> its
> >>> > > > > > > >> > > own
> >>> > > > > > > >> > > > >>> rack and persist it in ZK during start up
> time.
> >>> > > > > > > >> > > > >>>
> >>> > > > > > > >> > > > >>> - A pluggable RackLocator relies on an
> external
> >>> > > service
> >>> > > > > > being
> >>> > > > > > > >> > > available
> >>> > > > > > > >> > > > >>> to
> >>> > > > > > > >> > > > >>> serve rack information.
> >>> > > > > > > >> > > > >>>
> >>> > > > > > > >> > > > >>> Out of curiosity, I looked up how a couple of
> >>> other
> >>> > > > > systems
> >>> > > > > > > deal
> >>> > > > > > > >> > with
> >>> > > > > > > >> > > > >>> zone/rack awareness.
> >>> > > > > > > >> > > > >>> For Cassandra some interesting modes are:
> >>> > > > > > > >> > > > >>> (Property File configuration)
> >>> > > > > > > >> > > > >>>
> >>> > > > > > > >> > > > >>>
> >>> > > > > > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
> >>> > > > > > > >> > > > >>> (Dynamic inference)
> >>> > > > > > > >> > > > >>>
> >>> > > > > > > >> > > > >>>
> >>> > > > > > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
> >>> > > > > > > >> > > > >>>
> >>> > > > > > > >> > > > >>> Voldemort does a static node -> zone
> assignment
> >>> > based
> >>> > > on
> >>> > > > > > > >> > > configuration.
> >>> > > > > > > >> > > > >>>
> >>> > > > > > > >> > > > >>> Aditya
> >>> > > > > > > >> > > > >>>
> >>> > > > > > > >> > > > >>> On Wed, Sep 30, 2015 at 10:05 AM, Allen Wang <
> >>> > > > > > > >> allenxwang@gmail.com
> >>> > > > > > > >> > >
> >>> > > > > > > >> > > > >>> wrote:
> >>> > > > > > > >> > > > >>>
> >>> > > > > > > >> > > > >>> > I would like to see if we can do both:
> >>> > > > > > > >> > > > >>> >
> >>> > > > > > > >> > > > >>> > - Make RackLocator pluggable to facilitate
> >>> > migration
> >>> > > > > with
> >>> > > > > > > >> > existing
> >>> > > > > > > >> > > > >>> > broker-rack mapping
> >>> > > > > > > >> > > > >>> >
> >>> > > > > > > >> > > > >>> > - Make rack an optional property for broker.
> >>> If
> >>> > rack
> >>> > > > is
> >>> > > > > > > >> available
> >>> > > > > > > >> > > > from
> >>> > > > > > > >> > > > >>> > broker, treat it as source of truth. For
> users
> >>> > with
> >>> > > > > > existing
> >>> > > > > > > >> > > > >>> broker-rack
> >>> > > > > > > >> > > > >>> > mapping somewhere else, they can use the
> >>> pluggable
> >>> > > way
> >>> > > > > or
> >>> > > > > > > they
> >>> > > > > > > >> > can
> >>> > > > > > > >> > > > >>> transfer
> >>> > > > > > > >> > > > >>> > the mapping to the broker rack property.
> >>> > > > > > > >> > > > >>> >
> >>> > > > > > > >> > > > >>> > One thing I am not sure is what happens at
> >>> rolling
> >>> > > > > upgrade
> >>> > > > > > > >> when
> >>> > > > > > > >> > we
> >>> > > > > > > >> > > > have
> >>> > > > > > > >> > > > >>> > rack as a broker property. For brokers with
> >>> older
> >>> > > > > version
> >>> > > > > > of
> >>> > > > > > > >> > Kafka,
> >>> > > > > > > >> > > > >>> will it
> >>> > > > > > > >> > > > >>> > cause problem for them? If so, is there any
> >>> > > > workaround?
> >>> > > > > I
> >>> > > > > > > also
> >>> > > > > > > >> > > think
> >>> > > > > > > >> > > > it
> >>> > > > > > > >> > > > >>> > would be better not to have rack in the
> >>> controller
> >>> > > > wire
> >>> > > > > > > >> protocol
> >>> > > > > > > >> > > but
> >>> > > > > > > >> > > > >>> not
> >>> > > > > > > >> > > > >>> > sure if it is achievable.
> >>> > > > > > > >> > > > >>> >
> >>> > > > > > > >> > > > >>> > Thanks,
> >>> > > > > > > >> > > > >>> > Allen
> >>> > > > > > > >> > > > >>> >
> >>> > > > > > > >> > > > >>> >
> >>> > > > > > > >> > > > >>> >
> >>> > > > > > > >> > > > >>> >
> >>> > > > > > > >> > > > >>> >
> >>> > > > > > > >> > > > >>> > On Mon, Sep 28, 2015 at 4:55 PM, Todd
> Palino <
> >>> > > > > > > >> tpalino@gmail.com>
> >>> > > > > > > >> > > > >>> wrote:
> >>> > > > > > > >> > > > >>> >
> >>> > > > > > > >> > > > >>> > > I tend to like the idea of a pluggable
> >>> locator.
> >>> > > For
> >>> > > > > > > >> example, we
> >>> > > > > > > >> > > > >>> already
> >>> > > > > > > >> > > > >>> > > have an interface for discovering
> >>> information
> >>> > > about
> >>> > > > > the
> >>> > > > > > > >> > physical
> >>> > > > > > > >> > > > >>> location
> >>> > > > > > > >> > > > >>> > > of servers. I don't relish the idea of
> >>> having to
> >>> > > > > > maintain
> >>> > > > > > > >> data
> >>> > > > > > > >> > in
> >>> > > > > > > >> > > > >>> > multiple
> >>> > > > > > > >> > > > >>> > > places.
> >>> > > > > > > >> > > > >>> > >
> >>> > > > > > > >> > > > >>> > > -Todd
> >>> > > > > > > >> > > > >>> > >
> >>> > > > > > > >> > > > >>> > > On Mon, Sep 28, 2015 at 4:48 PM, Aditya
> >>> > Auradkar <
> >>> > > > > > > >> > > > >>> > > aauradkar@linkedin.com.invalid> wrote:
> >>> > > > > > > >> > > > >>> > >
> >>> > > > > > > >> > > > >>> > > > Thanks for starting this KIP Allen.
> >>> > > > > > > >> > > > >>> > > >
> >>> > > > > > > >> > > > >>> > > > I agree with Gwen that having a
> >>> RackLocator
> >>> > > class
> >>> > > > > that
> >>> > > > > > > is
> >>> > > > > > > >> > > > pluggable
> >>> > > > > > > >> > > > >>> > seems
> >>> > > > > > > >> > > > >>> > > > to be too complex. The KIP refers to
> >>> > potentially
> >>> > > > > > non-ZK
> >>> > > > > > > >> > storage
> >>> > > > > > > >> > > > >>> for the
> >>> > > > > > > >> > > > >>> > > > rack info which I don't think is
> >>> necessary.
> >>> > > > > > > >> > > > >>> > > >
> >>> > > > > > > >> > > > >>> > > > Perhaps we can persist this info in zk
> >>> under
> >>> > > > > > > >> > > > >>> /brokers/ids/<broker_id>
> >>> > > > > > > >> > > > >>> > > > similar to other broker properties and
> >>> add a
> >>> > > > config
> >>> > > > > in
> >>> > > > > > > >> > > > KafkaConfig
> >>> > > > > > > >> > > > >>> > called
> >>> > > > > > > >> > > > >>> > > > "rack".
> >>> > > > > > > >> > > > >>> > > >
> >>> > > > > > > >> > > > >>> > > > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy, "rack": "abc"}
> >>> > > > > > > >> > > > >>> > > >
> >>> > > > > > > >> > > > >>> > > > Aditya
> >>> > > > > > > >> > > > >>> > > >
> >>> > > > > > > >> > > > >>> > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen
> >>> Shapira
> >>> > <
> >>> > > > > > > >> > > gwen@confluent.io
> >>> > > > > > > >> > > > >
> >>> > > > > > > >> > > > >>> > wrote:
> >>> > > > > > > >> > > > >>> > > >
> >>> > > > > > > >> > > > >>> > > > > Hi,
> >>> > > > > > > >> > > > >>> > > > >
> >>> > > > > > > >> > > > >>> > > > > First, thanks for putting out a KIP
> for
> >>> > this.
> >>> > > > This
> >>> > > > > > is
> >>> > > > > > > >> super
> >>> > > > > > > >> > > > >>> important
> >>> > > > > > > >> > > > >>> > > for
> >>> > > > > > > >> > > > >>> > > > > production deployments of Kafka.
> >>> > > > > > > >> > > > >>> > > > >
> >>> > > > > > > >> > > > >>> > > > > Few questions:
> >>> > > > > > > >> > > > >>> > > > >
> >>> > > > > > > >> > > > >>> > > > > 1) Are we sure we want "as many racks
> as
> >>> > > > > possible"?
> >>> > > > > > > I'd
> >>> > > > > > > >> > want
> >>> > > > > > > >> > > to
> >>> > > > > > > >> > > > >>> > balance
> >>> > > > > > > >> > > > >>> > > > > between safety (more racks) and
> network
> >>> > > > > utilization
> >>> > > > > > > >> > (traffic
> >>> > > > > > > >> > > > >>> within a
> >>> > > > > > > >> > > > >>> > > > rack
> >>> > > > > > > >> > > > >>> > > > > uses the high-bandwidth TOR switch).
> One
> >>> > > replica
> >>> > > > > on
> >>> > > > > > a
> >>> > > > > > > >> > > different
> >>> > > > > > > >> > > > >>> rack
> >>> > > > > > > >> > > > >>> > > and
> >>> > > > > > > >> > > > >>> > > > > the rest on same rack (if possible)
> >>> sounds
> >>> > > > better
> >>> > > > > to
> >>> > > > > > > me.
> >>> > > > > > > >> > > > >>> > > > >
> >>> > > > > > > >> > > > >>> > > > > 2) Rack-locator class seems overly
> >>> complex
> >>> > > > > compared
> >>> > > > > > to
> >>> > > > > > > >> > > adding a
> >>> > > > > > > >> > > > >>> > > > rack.number
> >>> > > > > > > >> > > > >>> > > > > property to the broker properties
> file.
> >>> Why
> >>> > do
> >>> > > > we
> >>> > > > > > want
> >>> > > > > > > >> > that?
> >>> > > > > > > >> > > > >>> > > > >
> >>> > > > > > > >> > > > >>> > > > > Gwen
> >>> > > > > > > >> > > > >>> > > > >
> >>> > > > > > > >> > > > >>> > > > >
> >>> > > > > > > >> > > > >>> > > > >
> >>> > > > > > > >> > > > >>> > > > > On Mon, Sep 28, 2015 at 12:15 PM,
> Allen
> >>> > Wang <
> >>> > > > > > > >> > > > >>> allenxwang@gmail.com>
> >>> > > > > > > >> > > > >>> > > > wrote:
> >>> > > > > > > >> > > > >>> > > > >
> >>> > > > > > > >> > > > >>> > > > > > Hello Kafka Developers,
> >>> > > > > > > >> > > > >>> > > > > >
> >>> > > > > > > >> > > > >>> > > > > > I just created KIP-36 for rack aware
> >>> > replica
> >>> > > > > > > >> assignment.
> >>> > > > > > > >> > > > >>> > > > > >
> >>> > > > > > > >> > > > >>> > > > > >
> >>> > > > > > > >> > > > >>> > > > > >
> >>> > > > > > > >> > > > >>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
> >>> > > > > > > >> > > > >>> > > > > >
> >>> > > > > > > >> > > > >>> > > > > > The goal is to utilize the isolation
> >>> > > provided
> >>> > > > by
> >>> > > > > > the
> >>> > > > > > > >> > racks
> >>> > > > > > > >> > > in
> >>> > > > > > > >> > > > >>> data
> >>> > > > > > > >> > > > >>> > > > center
> >>> > > > > > > >> > > > >>> > > > > > and distribute replicas to racks to
> >>> > provide
> >>> > > > > fault
> >>> > > > > > > >> > > tolerance.
> >>> > > > > > > >> > > > >>> > > > > >
> >>> > > > > > > >> > > > >>> > > > > > Comments are welcome.
> >>> > > > > > > >> > > > >>> > > > > >
> >>> > > > > > > >> > > > >>> > > > > > Thanks,
> >>> > > > > > > >> > > > >>> > > > > > Allen
> >>> > > > > > > >> > > > >>> > > > > >
> >>> > > > > > > >> > > > >>> > > > >
> >>> > > > > > > >> > > > >>> > > >
> >>> > > > > > > >> > > > >>> > >
> >>> > > > > > > >> > > > >>> >
> >>> > > > > > > >> > > > >>>
> >>> > > > > > > >> > > > >>
> >>> > > > > > > >> > > > >>
> >>> > > > > > > >> > > > >
> >>> > > > > > > >> > > >
> >>> > > > > > > >> > >
> >>> > > > > > > >> >
> >>> > > > > > > >>
> >>> > > > > > > >
> >>> > > > > > > >
> >>> > > > > > >
> >>> > > > > >
> >>> > > > >
> >>> > > >
> >>> > >
> >>> >
> >>>
> >>>
> >>>
> >>> --
> >>> Thanks,
> >>> Neha
> >>>
> >>
> >>
> >
>

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Allen Wang <al...@gmail.com>.
If there are no more comments I would like to call for a vote.


On Sun, Nov 15, 2015 at 10:08 PM, Allen Wang <al...@gmail.com> wrote:

> The KIP is updated with more details, including how to handle the
> situation where rack information is incomplete.
>
> In the situation where rack information is incomplete but we want to
> continue with the assignment, I have suggested ignoring all rack
> information and falling back to the original algorithm. The reason is
> explained below:
>
> The other options are to assume that each broker without a rack belongs to
> its own unique rack, or that they all belong to one "default" rack.
> Whichever we choose, it is highly likely to result in an uneven number of
> brokers per rack, and it is quite possible that the "made up" racks will
> have far fewer brokers. As I explained in the KIP, an uneven number of
> brokers per rack leads to an uneven distribution of replicas among brokers
> (even though the leader distribution is still even). The brokers in a rack
> that has fewer brokers will get more replicas per broker than brokers in
> other racks.
>
> Given this, and given that the replica assignment produced will be
> incorrect from a rack aware point of view anyway, ignoring all rack
> information and falling back to the original algorithm is not a bad choice,
> since it will at least give a better guarantee of even replica distribution.
>
> Also, for command line tools it gives the user a choice if for any reason
> they want to ignore rack information and fall back to the original algorithm.
>
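To make the imbalance argument above concrete, here is a small Python sketch. It assumes a deliberately simplified model, not the KIP's actual algorithm: every partition places exactly one replica in each rack, and within a rack the replicas are handed out round-robin. The function name and the broker/rack layout are hypothetical.

```python
from collections import Counter

def replicas_per_broker(rack_to_brokers, num_partitions):
    """Count replicas per broker when each partition puts one replica in every rack.

    Simplified model for illustration only: within a rack, replicas are
    handed out round-robin over that rack's brokers.
    """
    counts = Counter()
    for partition in range(num_partitions):
        for brokers in rack_to_brokers.values():
            counts[brokers[partition % len(brokers)]] += 1
    return dict(counts)

# Brokers 0-3 report rack "r1"; broker 4 has no rack configured and is
# treated as its own "made up" rack.
layout = {"r1": [0, 1, 2, 3], "made-up": [4]}
load = replicas_per_broker(layout, num_partitions=12)
print(load)
```

Under this model the lone broker in the made-up rack carries 12 replicas while each broker in the real rack carries 3, which is the kind of skew the message describes.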
>
> On Tue, Nov 10, 2015 at 9:04 AM, Allen Wang <al...@gmail.com> wrote:
>
>> I have been busy with some time-pressing issues for the last few days. I
>> will think about how the incomplete rack information will affect the
>> balance and update the KIP by early next week.
>>
>> Thanks,
>> Allen
>>
>>
>> On Tue, Nov 3, 2015 at 9:03 AM, Neha Narkhede <ne...@confluent.io> wrote:
>>
>>> A few suggestions on improving the KIP:
>>>
>>> *If some brokers have rack, and some do not, the algorithm will throw an
>>> exception. This is to prevent incorrect assignment caused by user
>>> error.*
>>>
>>>
>>> In the KIP, can you clearly state the user-facing behavior when some
>>> brokers have rack information and some don't? Which actions and requests
>>> will error out, and how?
>>>
>>> *Even distribution of partition leadership among brokers*
>>>
>>>
>>> There is some information about arranging the sorted broker list
>>> interlaced with rack ids. Can you describe the changes to the current
>>> algorithm in a little more detail? How does this interlacing work if only
>>> a subset of brokers have the rack id configured? Does this still work if
>>> an uneven number of brokers is assigned to each rack? It might work; I'm
>>> looking for more details on the changes, since they will affect the
>>> behavior seen by the user: imbalance on either the leaders or data or both.
>>>
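One possible reading of the "interlaced" broker list, sketched in Python. This is an assumption for illustration, not the algorithm text from the KIP: brokers are grouped by rack, and the list is built by taking one broker from each rack in turn, so that neighbouring entries fall in different racks where possible. The function name and the sample mappings are hypothetical.

```python
from itertools import zip_longest

def interlace_by_rack(broker_to_rack):
    """Return broker ids ordered so that consecutive ids alternate racks."""
    by_rack = {}
    for broker in sorted(broker_to_rack):
        by_rack.setdefault(broker_to_rack[broker], []).append(broker)
    racks = [by_rack[r] for r in sorted(by_rack)]
    ordered = []
    # Take the i-th broker of every rack before moving to the (i+1)-th.
    for column in zip_longest(*racks):
        ordered.extend(b for b in column if b is not None)
    return ordered

even = interlace_by_rack({0: "r1", 1: "r1", 2: "r2", 3: "r2"})
uneven = interlace_by_rack({0: "r1", 1: "r1", 2: "r1", 3: "r2"})
print(even, uneven)
```

With uneven racks the tail of the list ends up in a single rack (brokers 1 and 2 in the second call), which is the case the question about uneven broker counts is probing.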
>>> On Mon, Nov 2, 2015 at 6:39 PM, Aditya Auradkar <aa...@linkedin.com>
>>> wrote:
>>>
>>> > I think this sounds reasonable. Anyone else have comments?
>>> >
>>> > Aditya
>>> >
>>> > On Tue, Oct 27, 2015 at 5:23 PM, Allen Wang <al...@gmail.com>
>>> wrote:
>>> >
>>> > > During the discussion in the hangout, it was mentioned that it would
>>> > > be desirable for consumers to know the rack information of the brokers
>>> > > so that they can consume from a broker in the same rack to reduce
>>> > > latency. As I understand it, this will only be beneficial if a consumer
>>> > > can consume from any broker in the ISR, which is not possible now.
>>> > >
>>> > > I suggest we skip the change to TMR. Once the consumer is changed to be
>>> > > able to consume from any broker in the ISR, the rack information can be
>>> > > added to TMR.
>>> > >
>>> > > Another thing I want to confirm is the command line behavior. I think
>>> > > the desirable default behavior is to fail fast on the command line for
>>> > > an incomplete rack mapping. The error message can include a further
>>> > > instruction that tells the user to add an extra argument (like
>>> > > "--allow-partial-rackinfo") to suppress the error and do an imperfect
>>> > > rack aware assignment. If the default behavior is to allow an
>>> > > incomplete mapping, the error can still be easily missed.
>>> > >
>>> > > The affected command line tools are TopicCommand and
>>> > > ReassignPartitionsCommand.
>>> > >
>>> > > Thanks,
>>> > > Allen
>>> > >
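The command line behavior proposed above could look roughly like the following Python sketch. The function, the exception, and the allow_partial parameter are hypothetical; "--allow-partial-rackinfo" is only the example flag name floated in this message, not an existing option. What the override actually does with the partial mapping is left to the caller here.

```python
class IncompleteRackInfoError(Exception):
    """Raised when only some brokers have a rack configured."""

def resolve_rack_mode(broker_to_rack, allow_partial=False):
    """Decide how to assign replicas given an optional rack per broker.

    broker_to_rack maps broker id -> rack name, or None if not configured.
    Returns "rack-aware", "rack-unaware", or "partial".
    """
    missing = sorted(b for b, rack in broker_to_rack.items() if rack is None)
    if not missing:
        return "rack-aware"
    if len(missing) == len(broker_to_rack):
        return "rack-unaware"  # no broker has a rack: current behavior
    if allow_partial:
        return "partial"
    # Fail fast by default so the misconfiguration cannot be overlooked.
    raise IncompleteRackInfoError(
        f"brokers {missing} have no rack configured; "
        "pass --allow-partial-rackinfo to proceed anyway"
    )
```

Failing by default and requiring an explicit opt-in keeps the misconfiguration visible, which is the point made about errors being easily missed.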
>>> > >
>>> > >
>>> > >
>>> > >
>>> > > On Mon, Oct 26, 2015 at 12:55 PM, Aditya Auradkar <
>>> > aauradkar@linkedin.com>
>>> > > wrote:
>>> > >
>>> > > > Hi Allen,
>>> > > >
>>> > > > For TopicMetadataResponse to understand version, you can bump up the
>>> > > > request version itself. Based on the version of the request, the
>>> > > > response can be appropriately serialized. It shouldn't be a huge
>>> > > > change. For example, we went through something similar for
>>> > > > ProduceRequest recently (https://reviews.apache.org/r/33378/).
>>> > > > I guess the reason protocol information is not included in the TMR is
>>> > > > because the topic itself is independent of any particular protocol
>>> > > > (SSL vs Plaintext). Having said that, I'm not sure we even need rack
>>> > > > information in TMR. What use case were you thinking of initially?
>>> > > >
>>> > > > For 1 - I'd be fine with adding an option to the command line tools
>>> > > > that checks rack assignment, e.g. "--strict-assignment" or something
>>> > > > similar.
>>> > > >
>>> > > > Aditya
>>> > > >
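The version-bump idea can be sketched as follows in Python. This is a toy model, not Kafka's protocol code: the function, the dict-based encoding, the sample port, and the choice of version 1 as the first version carrying rack are all assumptions for illustration.

```python
def serialize_broker(broker, request_version):
    """Encode broker metadata according to the version the client asked for.

    Older clients (version 0) get only the fields they already understand;
    clients that sent version >= 1 also see the "rack" field.
    """
    encoded = {"id": broker["id"], "host": broker["host"], "port": broker["port"]}
    if request_version >= 1:
        # Rack is optional, so it is None for brokers without one.
        encoded["rack"] = broker.get("rack")
    return encoded

broker = {"id": 1, "host": "xxx", "port": 9092, "rack": "abc"}
v0 = serialize_broker(broker, request_version=0)
v1 = serialize_broker(broker, request_version=1)
```

Because the response shape is chosen from the request version, an old client never receives a field it cannot parse.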
>>> > > > On Thu, Oct 22, 2015 at 6:44 PM, Allen Wang <al...@gmail.com>
>>> > > wrote:
>>> > > >
>>> > > > > For 2 and 3, I have updated the KIP. Please take a look. One thing I
>>> > > > > have changed is removing the proposal to add rack to
>>> > > > > TopicMetadataResponse. The reason is that unlike
>>> > > > > UpdateMetadataRequest, TopicMetadataResponse does not understand
>>> > > > > version. I don't see a way to include rack without breaking old
>>> > > > > versions of clients. That's probably why the secure protocol is not
>>> > > > > included in the TopicMetadataResponse either. I think it will be a
>>> > > > > much bigger change to include rack in TopicMetadataResponse.
>>> > > > >
>>> > > > > For 1, my concern is that doing rack aware assignment without a
>>> > > > > complete broker to rack mapping will result in an assignment that is
>>> > > > > not rack aware and fails to provide fault tolerance in the event of a
>>> > > > > rack outage. This kind of problem will be difficult to surface. And
>>> > > > > the cost of this problem is high: you have to do partition
>>> > > > > reassignment if you are lucky enough to spot the problem early on, or
>>> > > > > face the consequence of data loss during a real rack outage.
>>> > > > >
>>> > > > > I do see the concern with fail-fast, as it might also cause data loss
>>> > > > > if the producer is not able to produce the message due to topic
>>> > > > > creation failure. Is it feasible to treat dynamic topic creation and
>>> > > > > command tools differently? We allow dynamic topic creation with an
>>> > > > > incomplete broker-rack mapping and fail fast in the command line.
>>> > > > > Another option is to let the user determine the behavior for the
>>> > > > > command line. For example, by default fail fast in the command line
>>> > > > > but allow an incomplete broker-rack mapping if another switch is
>>> > > > > provided.
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > > On Tue, Oct 20, 2015 at 10:05 AM, Aditya Auradkar <
>>> > > > > aauradkar@linkedin.com.invalid> wrote:
>>> > > > >
>>> > > > > > Hey Allen,
>>> > > > > >
>>> > > > > > 1. If we choose fail fast topic creation, we will have topic
>>> > creation
>>> > > > > > failures while upgrading the cluster. I really doubt we want
>>> this
>>> > > > > behavior.
>>> > > > > > Ideally, this should be invisible to clients of a cluster.
>>> > Currently,
>>> > > > > each
>>> > > > > > broker is effectively its own rack. So we probably can use the
>>> rack
>>> > > > > > information whenever possible but not make it a hard
>>> requirement.
>>> > To
>>> > > > > extend
>>> > > > > > Gwen's example, one badly configured broker should not degrade
>>> > topic
>>> > > > > > creation for the entire cluster.
>>> > > > > >
>>> > > > > > 2. Upgrade scenario - Can you add a section on the upgrade
>>> piece to
>>> > > > > confirm
>>> > > > > > that old clients will not see errors? I believe
>>> > > > > ZookeeperConsumerConnector
>>> > > > > > reads the Broker objects from ZK. I wanted to confirm that this
>>> > will
>>> > > > not
>>> > > > > > cause any problems.
>>> > > > > >
>>> > > > > > 3. Could you elaborate your proposed changes to the
>>> > > > UpdateMetadataRequest
>>> > > > > > in the "Public Interfaces" section? Personally, I find this
>>> format
>>> > > easy
>>> > > > > to
>>> > > > > > read in terms of wire protocol changes:
>>> > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicRequest
>>> > > > > >
>>> > > > > > Aditya
>>> > > > > >
>>> > > > > > On Fri, Oct 16, 2015 at 3:45 PM, Allen Wang <
>>> allenxwang@gmail.com>
>>> > > > > wrote:
>>> > > > > >
>>> > > > > > > KIP is updated to include rack as an optional property for
>>> broker.
>>> > > > Please
>>> > > > > > take
>>> > > > > > > a look and let me know if more details are needed.
>>> > > > > > >
>>> > > > > > > For the case where some brokers have rack and some do not,
>>> the
>>> > > > current
>>> > > > > > KIP
>>> > > > > > > uses the fail-fast behavior. If there are concerns, we can
>>> > further
>>> > > > > > discuss
>>> > > > > > > this in the email thread or next hangout.
>>> > > > > > >
>>> > > > > > >
>>> > > > > > >
>>> > > > > > > On Thu, Oct 15, 2015 at 10:42 AM, Allen Wang <
>>> > allenxwang@gmail.com
>>> > > >
>>> > > > > > wrote:
>>> > > > > > >
>>> > > > > > > > That's a good question. I can think of three actions if the
>>> > rack
>>> > > > > > > > information is incomplete:
>>> > > > > > > >
>>> > > > > > > > 1. Treat the node without rack as if it is on its unique
>>> rack
>>> > > > > > > > 2. Disregard all rack information and fallback to current
>>> > > algorithm
>>> > > > > > > > 3. Fail-fast
>>> > > > > > > >
>>> > > > > > > > Now I think about it, one and three make more sense. The
>>> reason
>>> > > for
>>> > > > > > > > fail-fast is that user mistake for not providing the rack
>>> may
>>> > > never
>>> > > > > be
>>> > > > > > > > found if we tolerate that and the assignment may not be
>>> rack
>>> > > aware
>>> > > > as
>>> > > > > > the
>>> > > > > > > > user has expected and this creates debug problems when
>>> things
>>> > > fail.
>>> > > > > > > >
>>> > > > > > > > What do you think? If not fail-fast, is there anyway we can
>>> > make
>>> > > > the
>>> > > > > > user
>>> > > > > > > > error standing out?
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > > > On Thu, Oct 15, 2015 at 10:17 AM, Gwen Shapira <
>>> > > gwen@confluent.io>
>>> > > > > > > wrote:
>>> > > > > > > >
>>> > > > > > > >> Thanks! Just to clarify, when some brokers have rack
>>> > assignment
>>> > > > and
>>> > > > > > some
>>> > > > > > > >> don't, do we act like none of them have it? or like those
>>> > > without
>>> > > > > > > >> assignment are in their own rack?
>>> > > > > > > >>
>>> > > > > > > >> The first scenario is good when first setting up
>>> > rack-awareness,
>>> > > > but
>>> > > > > > the
>>> > > > > > > >> second makes more sense for on-going maintenance (I can
>>> > totally
>>> > > > see
>>> > > > > > > >> someone
>>> > > > > > > >> adding a node and forgetting to set the rack property, we
>>> > don't
>>> > > > want
>>> > > > > > > this
>>> > > > > > > >> to change behavior for anything except the new node).
>>> > > > > > > >>
>>> > > > > > > >> What do you think?
>>> > > > > > > >>
>>> > > > > > > >> Gwen
>>> > > > > > > >>
>>> > > > > > > >> On Thu, Oct 15, 2015 at 10:13 AM, Allen Wang <
>>> > > > allenxwang@gmail.com>
>>> > > > > > > >> wrote:
>>> > > > > > > >>
>>> > > > > > > >> > For scenario 1:
>>> > > > > > > >> >
>>> > > > > > > >> > - Add the rack information to broker property file or
>>> > > > dynamically
>>> > > > > > set
>>> > > > > > > >> it in
>>> > > > > > > >> > the wrapper code to bootstrap Kafka server. You would do
>>> > that
>>> > > > for
>>> > > > > > all
>>> > > > > > > >> > brokers and restart the brokers one by one.
>>> > > > > > > >> >
>>> > > > > > > >> > In this scenario, the complete broker to rack mapping
>>> may
>>> > not
>>> > > be
>>> > > > > > > >> available
>>> > > > > > > >> > until every broker is restarted. During that time we
>>> fall
>>> > back
>>> > > > to
>>> > > > > > > >> default
>>> > > > > > > >> > replica assignment algorithm.
>>> > > > > > > >> >
>>> > > > > > > >> > For scenario 2:
>>> > > > > > > >> >
>>> > > > > > > >> > - Add the rack information to broker property file or
>>> > > > dynamically
>>> > > > > > set
>>> > > > > > > >> it in
>>> > > > > > > >> > the wrapper code and start the broker.
>>> > > > > > > >> >
>>> > > > > > > >> >
>>> > > > > > > >> > On Wed, Oct 14, 2015 at 2:36 PM, Gwen Shapira <
>>> > > > gwen@confluent.io>
>>> > > > > > > >> wrote:
>>> > > > > > > >> >
>>> > > > > > > >> > > Can you clarify the workflow for the following
>>> scenarios:
>>> > > > > > > >> > >
>>> > > > > > > >> > > 1. I currently have 6 brokers and want to add rack
>>> > > information
>>> > > > > for
>>> > > > > > > >> each
>>> > > > > > > >> > > 2. I'm adding a new broker and I want to specify which
>>> > rack
>>> > > it
>>> > > > > > > >> belongs on
>>> > > > > > > >> > > while adding it.
>>> > > > > > > >> > >
>>> > > > > > > >> > > Thanks!
>>> > > > > > > >> > >
>>> > > > > > > >> > > On Tue, Oct 13, 2015 at 2:21 PM, Allen Wang <
>>> > > > > allenxwang@gmail.com
>>> > > > > > >
>>> > > > > > > >> > wrote:
>>> > > > > > > >> > >
>>> > > > > > > >> > > > We discussed the KIP in the hangout today. The
>>> > > > recommendation
>>> > > > > is
>>> > > > > > > to
>>> > > > > > > >> > make
>>> > > > > > > >> > > > rack as a broker property in ZooKeeper. For users
>>> with
>>> > > > > existing
>>> > > > > > > rack
>>> > > > > > > >> > > > information stored somewhere, they would need to
>>> > retrieve
>>> > > > the
>>> > > > > > > >> > information
>>> > > > > > > >> > > > at broker start up and dynamically set the rack
>>> > property,
>>> > > > > which
>>> > > > > > > can
>>> > > > > > > >> be
>>> > > > > > > >> > > > implemented as a wrapper to bootstrap broker. There
>>> will
>>> > > be
>>> > > > no
>>> > > > > > > >> > interface
>>> > > > > > > >> > > or
>>> > > > > > > >> > > > pluggable implementation to retrieve the rack
>>> > information.
>>> > > > > > > >> > > >
>>> > > > > > > >> > > > The assumption is that you always need to restart
>>> the
>>> > > broker
>>> > > > > to
>>> > > > > > > >> make a
>>> > > > > > > >> > > > change to the rack.
>>> > > > > > > >> > > >
>>> > > > > > > >> > > > Once the rack becomes a broker property, it will be
>>> > > possible
>>> > > > > to
>>> > > > > > > make
>>> > > > > > > >> > rack
>>> > > > > > > >> > > > part of the meta data to help the consumer choose
>>> which
>>> > in
>>> > > > > sync
>>> > > > > > > >> replica
>>> > > > > > > >> > > to
>>> > > > > > > >> > > > consume from as part of the future consumer
>>> enhancement.
>>> > > > > > > >> > > >
>>> > > > > > > >> > > > I will update the KIP.
>>> > > > > > > >> > > >
>>> > > > > > > >> > > > Thanks,
>>> > > > > > > >> > > > Allen
>>> > > > > > > >> > > >
>>> > > > > > > >> > > >
>>> > > > > > > >> > > > On Thu, Oct 8, 2015 at 9:23 AM, Allen Wang <
>>> > > > > > allenxwang@gmail.com>
>>> > > > > > > >> > wrote:
>>> > > > > > > >> > > >
>>> > > > > > > >> > > > > I attended Tuesday's KIP hangout but this KIP was
>>> not
>>> > > > > > discussed
>>> > > > > > > >> due
>>> > > > > > > >> > to
>>> > > > > > > >> > > > > time constraint.
>>> > > > > > > >> > > > >
>>> > > > > > > >> > > > > However, after hearing discussion of KIP-35, I
>>> have
>>> > the
>>> > > > > > feeling
>>> > > > > > > >> that
>>> > > > > > > >> > > > > incompatibility (caused by new broker property)
>>> > between
>>> > > > > > brokers
>>> > > > > > > >> with
>>> > > > > > > >> > > > > different versions  will be solved there. In
>>> addition,
>>> > > > > having
>>> > > > > > > >> rack
>>> > > > > > > >> > in
>>> > > > > > > >> > > > > broker property as meta data may also help
>>> consumers
>>> > in
>>> > > > the
>>> > > > > > > >> future.
>>> > > > > > > >> > So
>>> > > > > > > >> > > I
>>> > > > > > > >> > > > am
>>> > > > > > > >> > > > > open to adding rack property to broker.
>>> > > > > > > >> > > > >
>>> > > > > > > >> > > > > Hopefully we can discuss this in the next KIP
>>> hangout.
>>> > > > > > > >> > > > >
>>> > > > > > > >> > > > > On Wed, Sep 30, 2015 at 2:46 PM, Allen Wang <
>>> > > > > > > allenxwang@gmail.com
>>> > > > > > > >> >
>>> > > > > > > >> > > > wrote:
>>> > > > > > > >> > > > >
>>> > > > > > > >> > > > >> Can you send me the information on the next KIP
>>> > > hangout?
>>> > > > > > > >> > > > >>
>>> > > > > > > >> > > > >> Currently the broker-rack mapping is not cached.
>>> In
>>> > > > > > KafkaApis,
>>> > > > > > > >> > > > >> RackLocator.getRackInfo() is called each time the
>>> > > mapping
>>> > > > > is
>>> > > > > > > >> needed
>>> > > > > > > >> > > for
>>> > > > > > > >> > > > >> auto topic creation. This will ensure latest
>>> mapping
>>> > is
>>> > > > > used
>>> > > > > > at
>>> > > > > > > >> any
>>> > > > > > > >> > > > time.
>>> > > > > > > >> > > > >>
>>> > > > > > > >> > > > >> The ability to get the complete mapping makes it
>>> > simple
>>> > > > to
>>> > > > > > > reuse
>>> > > > > > > >> the
>>> > > > > > > >> > > > same
>>> > > > > > > >> > > > >> interface in command line tools.
>>> > > > > > > >> > > > >>
>>> > > > > > > >> > > > >>
>>> > > > > > > >> > > > >> On Wed, Sep 30, 2015 at 11:01 AM, Aditya
>>> Auradkar <
>>> > > > > > > >> > > > >> aauradkar@linkedin.com.invalid> wrote:
>>> > > > > > > >> > > > >>
>>> > > > > > > >> > > > >>> Perhaps we discuss this during the next KIP
>>> hangout?
>>> > > > > > > >> > > > >>>
>>> > > > > > > >> > > > >>> I do see that a pluggable rack locator can be
>>> useful
>>> > > > but I
>>> > > > > > do
>>> > > > > > > >> see a
>>> > > > > > > >> > > few
>>> > > > > > > >> > > > >>> concerns:
>>> > > > > > > >> > > > >>>
>>> > > > > > > >> > > > >>> - The RackLocator (as described in the
>>> document),
>>> > > > implies
>>> > > > > > that
>>> > > > > > > >> it
>>> > > > > > > >> > can
>>> > > > > > > >> > > > >>> discover rack information for any node in the
>>> > cluster.
>>> > > > How
>>> > > > > > > does
>>> > > > > > > >> it
>>> > > > > > > >> > > deal
>>> > > > > > > >> > > > >>> with rack location changes? For example, if I
>>> moved
>>> > > > broker
>>> > > > > > id
>>> > > > > > > >> (1)
>>> > > > > > > >> > > from
>>> > > > > > > >> > > > >>> rack
>>> > > > > > > >> > > > >>> X to Y, I only have to start that broker with a
>>> > newer
>>> > > > rack
>>> > > > > > > >> config.
>>> > > > > > > >> > If
>>> > > > > > > >> > > > >>> RackLocator discovers broker -> rack
>>> information at
>>> > > > start
>>> > > > > up
>>> > > > > > > >> time,
>>> > > > > > > >> > > any
>>> > > > > > > >> > > > >>> change to a broker will require bouncing the
>>> entire
>>> > > > > cluster
>>> > > > > > > >> since
>>> > > > > > > >> > > > >>> createTopic requests can be sent to any node in
>>> the
>>> > > > > cluster.
>>> > > > > > > >> > > > >>> For this reason it may be simpler to have each
>>> node
>>> > be
>>> > > > > aware
>>> > > > > > > of
>>> > > > > > > >> its
>>> > > > > > > >> > > own
>>> > > > > > > >> > > > >>> rack and persist it in ZK during start up time.
>>> > > > > > > >> > > > >>>
>>> > > > > > > >> > > > >>> - A pluggable RackLocator relies on an external
>>> > > service
>>> > > > > > being
>>> > > > > > > >> > > available
>>> > > > > > > >> > > > >>> to
>>> > > > > > > >> > > > >>> serve rack information.
>>> > > > > > > >> > > > >>>
>>> > > > > > > >> > > > >>> Out of curiosity, I looked up how a couple of
>>> other
>>> > > > > systems
>>> > > > > > > deal
>>> > > > > > > >> > with
>>> > > > > > > >> > > > >>> zone/rack awareness.
>>> > > > > > > >> > > > >>> For Cassandra some interesting modes are:
>>> > > > > > > >> > > > >>> (Property File configuration)
>>> > > > > > > >> > > > >>>
>>> > > > > > > >> > > > >>>
>>> > > > > > > >> > > >
>>> > > > > > > >> > >
>>> > > > > > > >> >
>>> > > > > > > >>
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
>>> > > > > > > >> > > > >>> (Dynamic inference)
>>> > > > > > > >> > > > >>>
>>> > > > > > > >> > > > >>>
>>> > > > > > > >> > > >
>>> > > > > > > >> > >
>>> > > > > > > >> >
>>> > > > > > > >>
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
>>> > > > > > > >> > > > >>>
>>> > > > > > > >> > > > >>> Voldemort does a static node -> zone assignment
>>> > based
>>> > > on
>>> > > > > > > >> > > configuration.
>>> > > > > > > >> > > > >>>
>>> > > > > > > >> > > > >>> Aditya
>>> > > > > > > >> > > > >>>
>>> > > > > > > >> > > > >>> On Wed, Sep 30, 2015 at 10:05 AM, Allen Wang <
>>> > > > > > > >> allenxwang@gmail.com
>>> > > > > > > >> > >
>>> > > > > > > >> > > > >>> wrote:
>>> > > > > > > >> > > > >>>
>>> > > > > > > >> > > > >>> > I would like to see if we can do both:
>>> > > > > > > >> > > > >>> >
>>> > > > > > > >> > > > >>> > - Make RackLocator pluggable to facilitate
>>> > migration
>>> > > > > with
>>> > > > > > > >> > existing
>>> > > > > > > >> > > > >>> > broker-rack mapping
>>> > > > > > > >> > > > >>> >
>>> > > > > > > >> > > > >>> > - Make rack an optional property for broker.
>>> If
>>> > rack
>>> > > > is
>>> > > > > > > >> available
>>> > > > > > > >> > > > from
>>> > > > > > > >> > > > >>> > broker, treat it as source of truth. For users
>>> > with
>>> > > > > > existing
>>> > > > > > > >> > > > >>> broker-rack
>>> > > > > > > >> > > > >>> > mapping somewhere else, they can use the
>>> pluggable
>>> > > way
>>> > > > > or
>>> > > > > > > they
>>> > > > > > > >> > can
>>> > > > > > > >> > > > >>> transfer
>>> > > > > > > >> > > > >>> > the mapping to the broker rack property.
>>> > > > > > > >> > > > >>> >
>>> > > > > > > >> > > > >>> > One thing I am not sure is what happens at
>>> rolling
>>> > > > > upgrade
>>> > > > > > > >> when
>>> > > > > > > >> > we
>>> > > > > > > >> > > > have
>>> > > > > > > >> > > > >>> > rack as a broker property. For brokers with
>>> older
>>> > > > > version
>>> > > > > > of
>>> > > > > > > >> > Kafka,
>>> > > > > > > >> > > > >>> will it
>>> > > > > > > >> > > > >>> > cause problem for them? If so, is there any
>>> > > > workaround?
>>> > > > > I
>>> > > > > > > also
>>> > > > > > > >> > > think
>>> > > > > > > >> > > > it
>>> > > > > > > >> > > > >>> > would be better not to have rack in the
>>> controller
>>> > > > wire
>>> > > > > > > >> protocol
>>> > > > > > > >> > > but
>>> > > > > > > >> > > > >>> not
>>> > > > > > > >> > > > >>> > sure if it is achievable.
>>> > > > > > > >> > > > >>> >
>>> > > > > > > >> > > > >>> > Thanks,
>>> > > > > > > >> > > > >>> > Allen
>>> > > > > > > >> > > > >>> >
>>> > > > > > > >> > > > >>> >
>>> > > > > > > >> > > > >>> >
>>> > > > > > > >> > > > >>> >
>>> > > > > > > >> > > > >>> >
>>> > > > > > > >> > > > >>> > On Mon, Sep 28, 2015 at 4:55 PM, Todd Palino <
>>> > > > > > > >> tpalino@gmail.com>
>>> > > > > > > >> > > > >>> wrote:
>>> > > > > > > >> > > > >>> >
>>> > > > > > > >> > > > >>> > > I tend to like the idea of a pluggable
>>> locator.
>>> > > For
>>> > > > > > > >> example, we
>>> > > > > > > >> > > > >>> already
>>> > > > > > > >> > > > >>> > > have an interface for discovering
>>> information
>>> > > about
>>> > > > > the
>>> > > > > > > >> > physical
>>> > > > > > > >> > > > >>> location
>>> > > > > > > >> > > > >>> > > of servers. I don't relish the idea of
>>> having to
>>> > > > > > maintain
>>> > > > > > > >> data
>>> > > > > > > >> > in
>>> > > > > > > >> > > > >>> > multiple
>>> > > > > > > >> > > > >>> > > places.
>>> > > > > > > >> > > > >>> > >
>>> > > > > > > >> > > > >>> > > -Todd
>>> > > > > > > >> > > > >>> > >
>>> > > > > > > >> > > > >>> > > On Mon, Sep 28, 2015 at 4:48 PM, Aditya
>>> > Auradkar <
>>> > > > > > > >> > > > >>> > > aauradkar@linkedin.com.invalid> wrote:
>>> > > > > > > >> > > > >>> > >
>>> > > > > > > >> > > > >>> > > > Thanks for starting this KIP Allen.
>>> > > > > > > >> > > > >>> > > >
>>> > > > > > > >> > > > >>> > > > I agree with Gwen that having a
>>> RackLocator
>>> > > class
>>> > > > > that
>>> > > > > > > is
>>> > > > > > > >> > > > pluggable
>>> > > > > > > >> > > > >>> > seems
>>> > > > > > > >> > > > >>> > > > to be too complex. The KIP refers to
>>> > potentially
>>> > > > > > non-ZK
>>> > > > > > > >> > storage
>>> > > > > > > >> > > > >>> for the
>>> > > > > > > >> > > > >>> > > > rack info which I don't think is
>>> necessary.
>>> > > > > > > >> > > > >>> > > >
>>> > > > > > > >> > > > >>> > > > Perhaps we can persist this info in zk
>>> under
>>> > > > > > > >> > > > >>> /brokers/ids/<broker_id>
>>> > > > > > > >> > > > >>> > > > similar to other broker properties and
>>> add a
>>> > > > config
>>> > > > > in
>>> > > > > > > >> > > > KafkaConfig
>>> > > > > > > >> > > > >>> > called
>>> > > > > > > >> > > > >>> > > > "rack".
>>> > > > > > > >> > > > >>> > > >
>>> > > > > > > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy,
>>> > > > > > > >> > > "rack":
>>> > > > > > > >> > > > >>> > "abc"}
>>> > > > > > > >> > > > >>> > > >
>>> > > > > > > >> > > > >>> > > > Aditya
>>> > > > > > > >> > > > >>> > > >
>>> > > > > > > >> > > > >>> > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen
>>> Shapira
>>> > <
>>> > > > > > > >> > > gwen@confluent.io
>>> > > > > > > >> > > > >
>>> > > > > > > >> > > > >>> > wrote:
>>> > > > > > > >> > > > >>> > > >
>>> > > > > > > >> > > > >>> > > > > Hi,
>>> > > > > > > >> > > > >>> > > > >
>>> > > > > > > >> > > > >>> > > > > First, thanks for putting out a KIP for
>>> > this.
>>> > > > This
>>> > > > > > is
>>> > > > > > > >> super
>>> > > > > > > >> > > > >>> important
>>> > > > > > > >> > > > >>> > > for
>>> > > > > > > >> > > > >>> > > > > production deployments of Kafka.
>>> > > > > > > >> > > > >>> > > > >
>>> > > > > > > >> > > > >>> > > > > Few questions:
>>> > > > > > > >> > > > >>> > > > >
>>> > > > > > > >> > > > >>> > > > > 1) Are we sure we want "as many racks as
>>> > > > > possible"?
>>> > > > > > > I'd
>>> > > > > > > >> > want
>>> > > > > > > >> > > to
>>> > > > > > > >> > > > >>> > balance
>>> > > > > > > >> > > > >>> > > > > between safety (more racks) and network
>>> > > > > utilization
>>> > > > > > > >> > (traffic
>>> > > > > > > >> > > > >>> within a
>>> > > > > > > >> > > > >>> > > > rack
>>> > > > > > > >> > > > >>> > > > > uses the high-bandwidth TOR switch). One
>>> > > replica
>>> > > > > on
>>> > > > > > a
>>> > > > > > > >> > > different
>>> > > > > > > >> > > > >>> rack
>>> > > > > > > >> > > > >>> > > and
>>> > > > > > > >> > > > >>> > > > > the rest on same rack (if possible)
>>> sounds
>>> > > > better
>>> > > > > to
>>> > > > > > > me.
>>> > > > > > > >> > > > >>> > > > >
>>> > > > > > > >> > > > >>> > > > > 2) Rack-locator class seems overly
>>> complex
>>> > > > > compared
>>> > > > > > to
>>> > > > > > > >> > > adding a
>>> > > > > > > >> > > > >>> > > > rack.number
>>> > > > > > > >> > > > >>> > > > > property to the broker properties file.
>>> Why
>>> > do
>>> > > > we
>>> > > > > > want
>>> > > > > > > >> > that?
>>> > > > > > > >> > > > >>> > > > >
>>> > > > > > > >> > > > >>> > > > > Gwen
>>> > > > > > > >> > > > >>> > > > >
>>> > > > > > > >> > > > >>> > > > >
>>> > > > > > > >> > > > >>> > > > >
>>> > > > > > > >> > > > >>> > > > > On Mon, Sep 28, 2015 at 12:15 PM, Allen
>>> > Wang <
>>> > > > > > > >> > > > >>> allenxwang@gmail.com>
>>> > > > > > > >> > > > >>> > > > wrote:
>>> > > > > > > >> > > > >>> > > > >
>>> > > > > > > >> > > > >>> > > > > > Hello Kafka Developers,
>>> > > > > > > >> > > > >>> > > > > >
>>> > > > > > > >> > > > >>> > > > > > I just created KIP-36 for rack aware
>>> > replica
>>> > > > > > > >> assignment.
>>> > > > > > > >> > > > >>> > > > > >
>>> > > > > > > >> > > > >>> > > > > >
>>> > > > > > > >> > > > >>> > > > > >
>>> > > > > > > >> > > > >>> > > > >
>>> > > > > > > >> > > > >>> > > >
>>> > > > > > > >> > > > >>> > >
>>> > > > > > > >> > > > >>> >
>>> > > > > > > >> > > > >>>
>>> > > > > > > >> > > >
>>> > > > > > > >> > >
>>> > > > > > > >> >
>>> > > > > > > >>
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
>>> > > > > > > >> > > > >>> > > > > >
>>> > > > > > > >> > > > >>> > > > > > The goal is to utilize the isolation
>>> > > provided
>>> > > > by
>>> > > > > > the
>>> > > > > > > >> > racks
>>> > > > > > > >> > > in
>>> > > > > > > >> > > > >>> data
>>> > > > > > > >> > > > >>> > > > center
>>> > > > > > > >> > > > >>> > > > > > and distribute replicas to racks to
>>> > provide
>>> > > > > fault
>>> > > > > > > >> > > tolerance.
>>> > > > > > > >> > > > >>> > > > > >
>>> > > > > > > >> > > > >>> > > > > > Comments are welcome.
>>> > > > > > > >> > > > >>> > > > > >
>>> > > > > > > >> > > > >>> > > > > > Thanks,
>>> > > > > > > >> > > > >>> > > > > > Allen
>>> > > > > > > >> > > > >>> > > > > >
>>> > > > > > > >> > > > >>> > > > >
>>> > > > > > > >> > > > >>> > > >
>>> > > > > > > >> > > > >>> > >
>>> > > > > > > >> > > > >>> >
>>> > > > > > > >> > > > >>>
>>> > > > > > > >> > > > >>
>>> > > > > > > >> > > > >>
>>> > > > > > > >> > > > >
>>> > > > > > > >> > > >
>>> > > > > > > >> > >
>>> > > > > > > >> >
>>> > > > > > > >>
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>>
>>>
>>>
>>> --
>>> Thanks,
>>> Neha
>>>
>>
>>
>

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Allen Wang <al...@gmail.com>.
The KIP is updated with more details, including how to handle the situation
where rack information is incomplete.

In the situation where rack information is incomplete but we want to
continue with the assignment, I have suggested ignoring all rack
information and falling back to the original algorithm. The reason is
explained below:

The other options are to assume that each broker without a rack belongs to
its own unique rack, or that they all belong to one "default" rack. Whichever
we choose, it is highly likely to result in an uneven number of brokers per
rack, and it is quite possible that the "made up" racks will have far fewer
brokers. As I explained in the KIP, an uneven number of brokers per rack
leads to an uneven distribution of replicas among brokers (even though the
leader distribution is still even). The brokers in a rack that has fewer
brokers will get more replicas per broker than brokers in other racks.
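
To make the imbalance concrete, here is a rough sketch (not the algorithm
in the KIP). It assumes the replication factor equals the number of racks,
so every partition places exactly one replica in each rack, and it picks
brokers round-robin within a rack. The function name, rack names, and
broker ids are made up for illustration.

```python
from collections import Counter


def replica_counts(brokers_by_rack, num_partitions):
    """Count replicas per broker for a toy rack-aware assignment.

    Assumption: replication factor == number of racks, so each partition
    puts one replica in every rack, rotating over that rack's brokers.
    """
    counts = Counter()
    for partition in range(num_partitions):
        for brokers in brokers_by_rack.values():
            counts[brokers[partition % len(brokers)]] += 1
    return counts


# Broker 6 has no rack configured and is treated as its own "made up" rack.
racks = {"rack-a": [0, 1, 2], "rack-b": [3, 4, 5], "made-up": [6]}
for broker, replicas in sorted(replica_counts(racks, 12).items()):
    print(broker, replicas)
```

With 12 partitions, each broker in the three-broker racks holds 4 replicas,
while the single broker in the "made up" rack holds all 12 replicas placed
in that rack.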

Given this, and given that the replica assignment produced will be incorrect
anyway from a rack-aware point of view, ignoring all rack information and
falling back to the original algorithm is not a bad choice, since it will at
least give a better guarantee of even replica distribution.

Also, for command line tools, it gives users a choice if for any reason they
want to ignore rack information and fall back to the original algorithm.
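
The decision could be sketched roughly as below. This is only an
illustration: the function name and the ignore_rack_info switch are
hypothetical, and it assumes the fail-fast default for partial mappings
that was discussed earlier in the thread. It also assumes that when no
broker has a rack at all, the original algorithm is used.

```python
from typing import Dict, Optional


def choose_algorithm(rack_of_broker: Dict[int, Optional[str]],
                     ignore_rack_info: bool = False) -> str:
    """Return "rack-aware" or "original" for a broker -> rack mapping."""
    with_rack = [b for b, rack in rack_of_broker.items() if rack]
    if ignore_rack_info or not with_rack:
        # The user opted out, or no broker has a rack configured.
        return "original"
    if len(with_rack) == len(rack_of_broker):
        # Complete mapping: rack-aware assignment is safe to use.
        return "rack-aware"
    # Partial mapping: fail fast so the missing configuration is noticed.
    missing = sorted(set(rack_of_broker) - set(with_rack))
    raise ValueError(f"brokers without rack information: {missing}")
```

For example, choose_algorithm({0: "a", 1: None}) raises an error, while
passing ignore_rack_info=True for the same mapping returns "original".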


On Tue, Nov 10, 2015 at 9:04 AM, Allen Wang <al...@gmail.com> wrote:

> I have been busy with some time-pressing issues for the last few days. I will
> think about how the incomplete rack information will affect the balance and
> update the KIP by early next week.
>
> Thanks,
> Allen
>
>
> On Tue, Nov 3, 2015 at 9:03 AM, Neha Narkhede <ne...@confluent.io> wrote:
>
>> Few suggestions on improving the KIP
>>
>> *If some brokers have rack, and some do not, the algorithm will throw an
>> > exception. This is to prevent incorrect assignment caused by user
>> error.*
>>
>>
>> In the KIP, can you clearly state the user-facing behavior when some
>> brokers have rack information and some don't. Which actions and requests
>> will error out and how?
>>
>> *Even distribution of partition leadership among brokers*
>>
>>
>> There is some information about arranging the sorted broker list
>> interlaced
>> with rack ids. Can you describe the changes to the current algorithm in a
>> little more detail? How does this interlacing work if only a subset of
>> brokers have the rack id configured? Does this still work if uneven # of
>> brokers are assigned to each rack? It might work, I'm looking for more
>> details on the changes, since it will affect the behavior seen by the user
>> - imbalance on either the leaders or data or both.
>>
>> On Mon, Nov 2, 2015 at 6:39 PM, Aditya Auradkar <aa...@linkedin.com>
>> wrote:
>>
>> > I think this sounds reasonable. Anyone else have comments?
>> >
>> > Aditya
>> >
>> > On Tue, Oct 27, 2015 at 5:23 PM, Allen Wang <al...@gmail.com>
>> wrote:
>> >
>> > > During the discussion in the hangout, it was mentioned that it would
>> be
>> > > desirable that consumers know the rack information of the brokers so
>> that
>> > > they can consume from the broker in the same rack to reduce latency.
>> As I
>> > > understand this will only be beneficial if consumer can consume from
>> any
>> > > broker in ISR, which is not possible now.
>> > >
>> > > I suggest we skip the change to TMR. Once the change is made to
>> consumer
>> > to
>> > > be able to consume from any broker in ISR, the rack information can be
>> > > added to TMR.
>> > >
>> > > Another thing I want to confirm is  command line behavior. I think the
>> > > desirable default behavior is to fail fast on command line for
>> incomplete
>> > > rack mapping. The error message can include further instruction that
>> > tells
>> > > the user to add an extra argument (like "--allow-partial-rackinfo") to
>> > > suppress the error and do an imperfect rack aware assignment. If the
>> > > default behavior is to allow incomplete mapping, the error can still
>> be
>> > > easily missed.
>> > >
>> > > The affected command line tools are TopicCommand and
>> > > ReassignPartitionsCommand.
>> > >
>> > > Thanks,
>> > > Allen
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > On Mon, Oct 26, 2015 at 12:55 PM, Aditya Auradkar <
>> > aauradkar@linkedin.com>
>> > > wrote:
>> > >
>> > > > Hi Allen,
>> > > >
>> > > > For TopicMetadataResponse to understand version, you can bump up the
>> > > > request version itself. Based on the version of the request, the
>> > response
>> > > > can be appropriately serialized. It shouldn't be a huge change. For
>> > > > example: We went through something similar for ProduceRequest
>> recently
>> > (
>> > > > https://reviews.apache.org/r/33378/)
>> > > > I guess the reason protocol information is not included in the TMR
>> is
>> > > > because the topic itself is independent of any particular protocol
>> (SSL
>> > > vs
>> > > > Plaintext). Having said that, I'm not sure we even need rack
>> > information
>> > > in
>> > > > TMR. What usecase were you thinking of initially?
>> > > >
>> > > > For 1 - I'd be fine with adding an option to the command line tools
>> > that
>> > > > check rack assignment. For e.g. "--strict-assignment" or something
>> > > similar.
>> > > >
>> > > > Aditya
>> > > >
>> > > > On Thu, Oct 22, 2015 at 6:44 PM, Allen Wang <al...@gmail.com>
>> > > wrote:
>> > > >
>> > > > > For 2 and 3, I have updated the KIP. Please take a look. One
>> thing I
>> > > have
>> > > > > changed is removing the proposal to add rack to
>> > TopicMetadataResponse.
>> > > > The
>> > > > > reason is that unlike UpdateMetadataRequest, TopicMetadataResponse
>> > does
>> > > > not
>> > > > > understand version. I don't see a way to include rack without
>> > breaking
>> > > > old
>> > > > > version of clients. That's probably why secure protocol is not
>> > included
>> > > > in
>> > > > > the TopicMetadataResponse either. I think it will be a much bigger
>> > > change
>> > > > > to include rack in TopicMetadataResponse.
>> > > > >
>> > > > > For 1, my concern is that doing rack aware assignment without
>> > complete
>> > > > > broker to rack mapping will result in assignment that is not rack
>> > aware
>> > > > and
>> > > > > fail to provide fault tolerance in the event of rack outage. This
>> > kind
>> > > of
>> > > > > problem will be difficult to surface. And the cost of this
>> problem is
>> > > > high:
>> > > > > you have to do partition reassignment if you are lucky to spot the
>> > > > problem
>> > > > > early on or face the consequence of data loss during real rack
>> > outage.
>> > > > >
>> > > > > I do see the concern of fail-fast as it might also cause data
>> loss if
>> > > > > producer is not able produce the message due to topic creation
>> > failure.
>> > > > Is
>> > > > > it feasible to treat dynamic topic creation and command tools
>> > > > differently?
>> > > > > We allow dynamic topic creation with incomplete broker-rack
>> mapping
>> > and
>> > > > > fail fast in command line. Another option is to let user determine
>> > the
>> > > > > behavior for command line. For example, by default fail fast in
>> > command
>> > > > > line but allow incomplete broker-rack mapping if another switch is
>> > > > > provided.
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Tue, Oct 20, 2015 at 10:05 AM, Aditya Auradkar <
>> > > > > aauradkar@linkedin.com.invalid> wrote:
>> > > > >
>> > > > > > Hey Allen,
>> > > > > >
>> > > > > > 1. If we choose fail fast topic creation, we will have topic
>> > creation
>> > > > > > failures while upgrading the cluster. I really doubt we want
>> this
>> > > > > behavior.
>> > > > > > Ideally, this should be invisible to clients of a cluster.
>> > Currently,
>> > > > > each
>> > > > > > broker is effectively its own rack. So we probably can use the
>> rack
>> > > > > > information whenever possible but not make it a hard
>> requirement.
>> > To
>> > > > > extend
>> > > > > > Gwen's example, one badly configured broker should not degrade
>> > topic
>> > > > > > creation for the entire cluster.
>> > > > > >
>> > > > > > 2. Upgrade scenario - Can you add a section on the upgrade
>> piece to
>> > > > > confirm
>> > > > > > that old clients will not see errors? I believe
>> > > > > ZookeeperConsumerConnector
>> > > > > > reads the Broker objects from ZK. I wanted to confirm that this
>> > will
>> > > > not
>> > > > > > cause any problems.
>> > > > > >
>> > > > > > 3. Could you elaborate your proposed changes to the
>> > > > UpdateMetadataRequest
>> > > > > > in the "Public Interfaces" section? Personally, I find this
>> format
>> > > easy
>> > > > > to
>> > > > > > read in terms of wire protocol changes:
>> > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicRequest
>> > > > > >
>> > > > > > Aditya
>> > > > > >
>> > > > > > On Fri, Oct 16, 2015 at 3:45 PM, Allen Wang <
>> allenxwang@gmail.com>
>> > > > > wrote:
>> > > > > >
>> > > > > > > KIP is updated to include rack as an optional property for
>> broker.
>> > > > Please
>> > > > > > take
>> > > > > > > a look and let me know if more details are needed.
>> > > > > > >
>> > > > > > > For the case where some brokers have rack and some do not, the
>> > > > current
>> > > > > > KIP
>> > > > > > > uses the fail-fast behavior. If there are concerns, we can
>> > further
>> > > > > > discuss
>> > > > > > > this in the email thread or next hangout.
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > On Thu, Oct 15, 2015 at 10:42 AM, Allen Wang <
>> > allenxwang@gmail.com
>> > > >
>> > > > > > wrote:
>> > > > > > >
>> > > > > > > > That's a good question. I can think of three actions if the
>> > rack
>> > > > > > > > information is incomplete:
>> > > > > > > >
>> > > > > > > > 1. Treat the node without rack as if it is on its unique
>> rack
>> > > > > > > > 2. Disregard all rack information and fall back to current
>> > > algorithm
>> > > > > > > > 3. Fail-fast
>> > > > > > > >
>> > > > > > > > Now that I think about it, one and three make more sense. The
>> reason
>> > > for
>> > > > > > > > fail-fast is that user mistake for not providing the rack
>> may
>> > > never
>> > > > > be
>> > > > > > > > found if we tolerate that and the assignment may not be rack
>> > > aware
>> > > > as
>> > > > > > the
>> > > > > > > > user has expected and this creates debug problems when
>> things
>> > > fail.
>> > > > > > > >
>> > > > > > > > What do you think? If not fail-fast, is there any way we can
>> > make
>> > > > the
>> > > > > > user
>> > > > > > > > error stand out?
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > On Thu, Oct 15, 2015 at 10:17 AM, Gwen Shapira <
>> > > gwen@confluent.io>
>> > > > > > > wrote:
>> > > > > > > >
>> > > > > > > >> Thanks! Just to clarify, when some brokers have rack
>> > assignment
>> > > > and
>> > > > > > some
>> > > > > > > >> don't, do we act like none of them have it? or like those
>> > > without
>> > > > > > > >> assignment are in their own rack?
>> > > > > > > >>
>> > > > > > > >> The first scenario is good when first setting up
>> > rack-awareness,
>> > > > but
>> > > > > > the
>> > > > > > > >> second makes more sense for on-going maintenance (I can
>> > totally
>> > > > see
>> > > > > > > >> someone
>> > > > > > > >> adding a node and forgetting to set the rack property, we
>> > don't
>> > > > want
>> > > > > > > this
>> > > > > > > >> to change behavior for anything except the new node).
>> > > > > > > >>
>> > > > > > > >> What do you think?
>> > > > > > > >>
>> > > > > > > >> Gwen
>> > > > > > > >>
>> > > > > > > >> On Thu, Oct 15, 2015 at 10:13 AM, Allen Wang <
>> > > > allenxwang@gmail.com>
>> > > > > > > >> wrote:
>> > > > > > > >>
>> > > > > > > >> > For scenario 1:
>> > > > > > > >> >
>> > > > > > > >> > - Add the rack information to broker property file or
>> > > > dynamically
>> > > > > > set
>> > > > > > > >> it in
>> > > > > > > >> > the wrapper code to bootstrap Kafka server. You would do
>> > that
>> > > > for
>> > > > > > all
>> > > > > > > >> > brokers and restart the brokers one by one.
>> > > > > > > >> >
>> > > > > > > >> > In this scenario, the complete broker to rack mapping may
>> > not
>> > > be
>> > > > > > > >> available
>> > > > > > > >> > until every broker is restarted. During that time we fall
>> > back
>> > > > to
>> > > > > > > >> default
>> > > > > > > >> > replica assignment algorithm.
>> > > > > > > >> >
>> > > > > > > >> > For scenario 2:
>> > > > > > > >> >
>> > > > > > > >> > - Add the rack information to broker property file or
>> > > > dynamically
>> > > > > > set
>> > > > > > > >> it in
>> > > > > > > >> > the wrapper code and start the broker.
>> > > > > > > >> >
>> > > > > > > >> >
>> > > > > > > >> > On Wed, Oct 14, 2015 at 2:36 PM, Gwen Shapira <
>> > > > gwen@confluent.io>
>> > > > > > > >> wrote:
>> > > > > > > >> >
>> > > > > > > >> > > Can you clarify the workflow for the following
>> scenarios:
>> > > > > > > >> > >
>> > > > > > > >> > > 1. I currently have 6 brokers and want to add rack
>> > > information
>> > > > > for
>> > > > > > > >> each
>> > > > > > > >> > > 2. I'm adding a new broker and I want to specify which
>> > rack
>> > > it
>> > > > > > > >> belongs on
>> > > > > > > >> > > while adding it.
>> > > > > > > >> > >
>> > > > > > > >> > > Thanks!
>> > > > > > > >> > >
>> > > > > > > >> > > On Tue, Oct 13, 2015 at 2:21 PM, Allen Wang <
>> > > > > allenxwang@gmail.com
>> > > > > > >
>> > > > > > > >> > wrote:
>> > > > > > > >> > >
>> > > > > > > >> > > > We discussed the KIP in the hangout today. The
>> > > > recommendation
>> > > > > is
>> > > > > > > to
>> > > > > > > >> > make
>> > > > > > > >> > > > rack as a broker property in ZooKeeper. For users
>> with
>> > > > > existing
>> > > > > > > rack
>> > > > > > > >> > > > information stored somewhere, they would need to
>> > retrieve
>> > > > the
>> > > > > > > >> > information
>> > > > > > > >> > > > at broker start up and dynamically set the rack
>> > property,
>> > > > > which
>> > > > > > > can
>> > > > > > > >> be
>> > > > > > > >> > > > implemented as a wrapper to bootstrap broker. There
>> will
>> > > be
>> > > > no
>> > > > > > > >> > interface
>> > > > > > > >> > > or
>> > > > > > > >> > > > pluggable implementation to retrieve the rack
>> > information.
>> > > > > > > >> > > >
>> > > > > > > >> > > > The assumption is that you always need to restart the
>> > > broker
>> > > > > to
>> > > > > > > >> make a
>> > > > > > > >> > > > change to the rack.
>> > > > > > > >> > > >
>> > > > > > > >> > > > Once the rack becomes a broker property, it will be
>> > > possible
>> > > > > to
>> > > > > > > make
>> > > > > > > >> > rack
>> > > > > > > >> > > > part of the meta data to help the consumer choose
>> which
>> > in
>> > > > > sync
>> > > > > > > >> replica
>> > > > > > > >> > > to
>> > > > > > > >> > > > consume from as part of the future consumer
>> enhancement.
>> > > > > > > >> > > >
>> > > > > > > >> > > > I will update the KIP.
>> > > > > > > >> > > >
>> > > > > > > >> > > > Thanks,
>> > > > > > > >> > > > Allen
>> > > > > > > >> > > >
>> > > > > > > >> > > >
>> > > > > > > >> > > > On Thu, Oct 8, 2015 at 9:23 AM, Allen Wang <
>> > > > > > allenxwang@gmail.com>
>> > > > > > > >> > wrote:
>> > > > > > > >> > > >
>> > > > > > > >> > > > > I attended Tuesday's KIP hangout but this KIP was
>> not
>> > > > > > discussed
>> > > > > > > >> due
>> > > > > > > >> > to
>> > > > > > > >> > > > > time constraint.
>> > > > > > > >> > > > >
>> > > > > > > >> > > > > However, after hearing discussion of KIP-35, I have
>> > the
>> > > > > > feeling
>> > > > > > > >> that
>> > > > > > > >> > > > > incompatibility (caused by new broker property)
>> > between
>> > > > > > brokers
>> > > > > > > >> with
>> > > > > > > >> > > > > different versions will be solved there. In
>> addition,
>> > > > > having
>> > > > > > > >> rack
>> > > > > > > >> > in
>> > > > > > > >> > > > > broker property as meta data may also help
>> consumers
>> > in
>> > > > the
>> > > > > > > >> future.
>> > > > > > > >> > So
>> > > > > > > >> > > I
>> > > > > > > >> > > > am
>> > > > > > > >> > > > > open to adding rack property to broker.
>> > > > > > > >> > > > >
>> > > > > > > >> > > > > Hopefully we can discuss this in the next KIP
>> hangout.
>> > > > > > > >> > > > >
>> > > > > > > >> > > > > On Wed, Sep 30, 2015 at 2:46 PM, Allen Wang <
>> > > > > > > allenxwang@gmail.com
>> > > > > > > >> >
>> > > > > > > >> > > > wrote:
>> > > > > > > >> > > > >
>> > > > > > > >> > > > >> Can you send me the information on the next KIP
>> > > hangout?
>> > > > > > > >> > > > >>
>> > > > > > > >> > > > >> Currently the broker-rack mapping is not cached.
>> In
>> > > > > > KafkaApis,
>> > > > > > > >> > > > >> RackLocator.getRackInfo() is called each time the
>> > > mapping
>> > > > > is
>> > > > > > > >> needed
>> > > > > > > >> > > for
>> > > > > > > >> > > > >> auto topic creation. This will ensure latest
>> mapping
>> > is
>> > > > > used
>> > > > > > at
>> > > > > > > >> any
>> > > > > > > >> > > > time.
>> > > > > > > >> > > > >>
>> > > > > > > >> > > > >> The ability to get the complete mapping makes it
>> > simple
>> > > > to
>> > > > > > > reuse
>> > > > > > > >> the
>> > > > > > > >> > > > same
>> > > > > > > >> > > > >> interface in command line tools.
>> > > > > > > >> > > > >>
>> > > > > > > >> > > > >>
>> > > > > > > >> > > > >> On Wed, Sep 30, 2015 at 11:01 AM, Aditya Auradkar
>> <
>> > > > > > > >> > > > >> aauradkar@linkedin.com.invalid> wrote:
>> > > > > > > >> > > > >>
>> > > > > > > >> > > > >>> Perhaps we discuss this during the next KIP
>> hangout?
>> > > > > > > >> > > > >>>
>> > > > > > > >> > > > >>> I do see that a pluggable rack locator can be
>> useful
>> > > > but I
>> > > > > > do
>> > > > > > > >> see a
>> > > > > > > >> > > few
>> > > > > > > >> > > > >>> concerns:
>> > > > > > > >> > > > >>>
>> > > > > > > >> > > > >>> - The RackLocator (as described in the document),
>> > > > implies
>> > > > > > that
>> > > > > > > >> it
>> > > > > > > >> > can
>> > > > > > > >> > > > >>> discover rack information for any node in the
>> > cluster.
>> > > > How
>> > > > > > > does
>> > > > > > > >> it
>> > > > > > > >> > > deal
>> > > > > > > >> > > > >>> with rack location changes? For example, if I
>> moved
>> > > > broker
>> > > > > > id
>> > > > > > > >> (1)
>> > > > > > > >> > > from
>> > > > > > > >> > > > >>> rack
>> > > > > > > >> > > > >>> X to Y, I only have to start that broker with a
>> > newer
>> > > > rack
>> > > > > > > >> config.
>> > > > > > > >> > If
>> > > > > > > >> > > > >>> RackLocator discovers broker -> rack information
>> at
>> > > > start
>> > > > > up
>> > > > > > > >> time,
>> > > > > > > >> > > any
>> > > > > > > >> > > > >>> change to a broker will require bouncing the
>> entire
>> > > > > cluster
>> > > > > > > >> since
>> > > > > > > >> > > > >>> createTopic requests can be sent to any node in
>> the
>> > > > > cluster.
>> > > > > > > >> > > > >>> For this reason it may be simpler to have each
>> node
>> > be
>> > > > > aware
>> > > > > > > of
>> > > > > > > >> its
>> > > > > > > >> > > own
>> > > > > > > >> > > > >>> rack and persist it in ZK during start up time.
>> > > > > > > >> > > > >>>
>> > > > > > > >> > > > >>> - A pluggable RackLocator relies on an external
>> > > service
>> > > > > > being
>> > > > > > > >> > > available
>> > > > > > > >> > > > >>> to
>> > > > > > > >> > > > >>> serve rack information.
>> > > > > > > >> > > > >>>
>> > > > > > > >> > > > >>> Out of curiosity, I looked up how a couple of
>> other
>> > > > > systems
>> > > > > > > deal
>> > > > > > > >> > with
>> > > > > > > >> > > > >>> zone/rack awareness.
>> > > > > > > >> > > > >>> For Cassandra some interesting modes are:
>> > > > > > > >> > > > >>> (Property File configuration)
>> > > > > > > >> > > > >>>
>> > > > > > > >> > > > >>>
>> > > > > > > >> > > >
>> > > > > > > >> > >
>> > > > > > > >> >
>> > > > > > > >>
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
>> > > > > > > >> > > > >>> (Dynamic inference)
>> > > > > > > >> > > > >>>
>> > > > > > > >> > > > >>>
>> > > > > > > >> > > >
>> > > > > > > >> > >
>> > > > > > > >> >
>> > > > > > > >>
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
>> > > > > > > >> > > > >>>
>> > > > > > > >> > > > >>> Voldemort does a static node -> zone assignment
>> > based
>> > > on
>> > > > > > > >> > > configuration.
>> > > > > > > >> > > > >>>
>> > > > > > > >> > > > >>> Aditya
>> > > > > > > >> > > > >>>
>> > > > > > > >> > > > >>> On Wed, Sep 30, 2015 at 10:05 AM, Allen Wang <
>> > > > > > > >> allenxwang@gmail.com
>> > > > > > > >> > >
>> > > > > > > >> > > > >>> wrote:
>> > > > > > > >> > > > >>>
>> > > > > > > >> > > > >>> > I would like to see if we can do both:
>> > > > > > > >> > > > >>> >
>> > > > > > > >> > > > >>> > - Make RackLocator pluggable to facilitate
>> > migration
>> > > > > with
>> > > > > > > >> > existing
>> > > > > > > >> > > > >>> > broker-rack mapping
>> > > > > > > >> > > > >>> >
>> > > > > > > >> > > > >>> > - Make rack an optional property for broker. If
>> > rack
>> > > > is
>> > > > > > > >> available
>> > > > > > > >> > > > from
>> > > > > > > >> > > > >>> > broker, treat it as source of truth. For users
>> > with
>> > > > > > existing
>> > > > > > > >> > > > >>> broker-rack
>> > > > > > > >> > > > >>> > mapping somewhere else, they can use the
>> pluggable
>> > > way
>> > > > > or
>> > > > > > > they
>> > > > > > > >> > can
>> > > > > > > >> > > > >>> transfer
>> > > > > > > >> > > > >>> > the mapping to the broker rack property.
>> > > > > > > >> > > > >>> >
>> > > > > > > >> > > > >>> > One thing I am not sure is what happens at
>> rolling
>> > > > > upgrade
>> > > > > > > >> when
>> > > > > > > >> > we
>> > > > > > > >> > > > have
>> > > > > > > >> > > > >>> > rack as a broker property. For brokers with
>> older
>> > > > > version
>> > > > > > of
>> > > > > > > >> > Kafka,
>> > > > > > > >> > > > >>> will it
>> > > > > > > >> > > > >>> > cause problem for them? If so, is there any
>> > > > workaround?
>> > > > > I
>> > > > > > > also
>> > > > > > > >> > > think
>> > > > > > > >> > > > it
>> > > > > > > >> > > > >>> > would be better not to have rack in the
>> controller
>> > > > wire
>> > > > > > > >> protocol
>> > > > > > > >> > > but
>> > > > > > > >> > > > >>> not
>> > > > > > > >> > > > >>> > sure if it is achievable.
>> > > > > > > >> > > > >>> >
>> > > > > > > >> > > > >>> > Thanks,
>> > > > > > > >> > > > >>> > Allen
>> > > > > > > >> > > > >>> >
>> > > > > > > >> > > > >>> >
>> > > > > > > >> > > > >>> >
>> > > > > > > >> > > > >>> >
>> > > > > > > >> > > > >>> >
>> > > > > > > >> > > > >>> > On Mon, Sep 28, 2015 at 4:55 PM, Todd Palino <
>> > > > > > > >> tpalino@gmail.com>
>> > > > > > > >> > > > >>> wrote:
>> > > > > > > >> > > > >>> >
>> > > > > > > >> > > > >>> > > I tend to like the idea of a pluggable
>> locator.
>> > > For
>> > > > > > > >> example, we
>> > > > > > > >> > > > >>> already
>> > > > > > > >> > > > >>> > > have an interface for discovering information
>> > > about
>> > > > > the
>> > > > > > > >> > physical
>> > > > > > > >> > > > >>> location
>> > > > > > > >> > > > >>> > > of servers. I don't relish the idea of
>> having to
>> > > > > > maintain
>> > > > > > > >> data
>> > > > > > > >> > in
>> > > > > > > >> > > > >>> > multiple
>> > > > > > > >> > > > >>> > > places.
>> > > > > > > >> > > > >>> > >
>> > > > > > > >> > > > >>> > > -Todd
>> > > > > > > >> > > > >>> > >
>> > > > > > > >> > > > >>> > > On Mon, Sep 28, 2015 at 4:48 PM, Aditya
>> > Auradkar <
>> > > > > > > >> > > > >>> > > aauradkar@linkedin.com.invalid> wrote:
>> > > > > > > >> > > > >>> > >
>> > > > > > > >> > > > >>> > > > Thanks for starting this KIP Allen.
>> > > > > > > >> > > > >>> > > >
>> > > > > > > >> > > > >>> > > > I agree with Gwen that having a RackLocator
>> > > class
>> > > > > that
>> > > > > > > is
>> > > > > > > >> > > > pluggable
>> > > > > > > >> > > > >>> > seems
>> > > > > > > >> > > > >>> > > > to be too complex. The KIP refers to
>> > potentially
>> > > > > > non-ZK
>> > > > > > > >> > storage
>> > > > > > > >> > > > >>> for the
>> > > > > > > >> > > > >>> > > > rack info which I don't think is necessary.
>> > > > > > > >> > > > >>> > > >
>> > > > > > > >> > > > >>> > > > Perhaps we can persist this info in zk
>> under
>> > > > > > > >> > > > >>> /brokers/ids/<broker_id>
>> > > > > > > >> > > > >>> > > > similar to other broker properties and add
>> a
>> > > > config
>> > > > > in
>> > > > > > > >> > > > KafkaConfig
>> > > > > > > >> > > > >>> > called
>> > > > > > > >> > > > >>> > > > "rack".
>> > > > > > > >> > > > >>> > > >
>> > > > > > > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy,
>> > > > > > > >> > > "rack":
>> > > > > > > >> > > > >>> > "abc"}
>> > > > > > > >> > > > >>> > > >
>> > > > > > > >> > > > >>> > > > Aditya
>> > > > > > > >> > > > >>> > > >
>> > > > > > > >> > > > >>> > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen
>> Shapira
>> > <
>> > > > > > > >> > > gwen@confluent.io
>> > > > > > > >> > > > >
>> > > > > > > >> > > > >>> > wrote:
>> > > > > > > >> > > > >>> > > >
>> > > > > > > >> > > > >>> > > > > Hi,
>> > > > > > > >> > > > >>> > > > >
>> > > > > > > >> > > > >>> > > > > First, thanks for putting out a KIP for
>> > this.
>> > > > This
>> > > > > > is
>> > > > > > > >> super
>> > > > > > > >> > > > >>> important
>> > > > > > > >> > > > >>> > > for
>> > > > > > > >> > > > >>> > > > > production deployments of Kafka.
>> > > > > > > >> > > > >>> > > > >
>> > > > > > > >> > > > >>> > > > > Few questions:
>> > > > > > > >> > > > >>> > > > >
>> > > > > > > >> > > > >>> > > > > 1) Are we sure we want "as many racks as
>> > > > > possible"?
>> > > > > > > I'd
>> > > > > > > >> > want
>> > > > > > > >> > > to
>> > > > > > > >> > > > >>> > balance
>> > > > > > > >> > > > >>> > > > > between safety (more racks) and network
>> > > > > utilization
>> > > > > > > >> > (traffic
>> > > > > > > >> > > > >>> within a
>> > > > > > > >> > > > >>> > > > rack
>> > > > > > > >> > > > >>> > > > > uses the high-bandwidth TOR switch). One
>> > > replica
>> > > > > on
>> > > > > > a
>> > > > > > > >> > > different
>> > > > > > > >> > > > >>> rack
>> > > > > > > >> > > > >>> > > and
>> > > > > > > >> > > > >>> > > > > the rest on same rack (if possible)
>> sounds
>> > > > better
>> > > > > to
>> > > > > > > me.
>> > > > > > > >> > > > >>> > > > >
>> > > > > > > >> > > > >>> > > > > 2) Rack-locator class seems overly
>> complex
>> > > > > compared
>> > > > > > to
>> > > > > > > >> > > adding a
>> > > > > > > >> > > > >>> > > > rack.number
>> > > > > > > >> > > > >>> > > > > property to the broker properties file.
>> Why
>> > do
>> > > > we
>> > > > > > want
>> > > > > > > >> > that?
>> > > > > > > >> > > > >>> > > > >
>> > > > > > > >> > > > >>> > > > > Gwen
>> > > > > > > >> > > > >>> > > > >
>> > > > > > > >> > > > >>> > > > >
>> > > > > > > >> > > > >>> > > > >
>> > > > > > > >> > > > >>> > > > > On Mon, Sep 28, 2015 at 12:15 PM, Allen
>> > Wang <
>> > > > > > > >> > > > >>> allenxwang@gmail.com>
>> > > > > > > >> > > > >>> > > > wrote:
>> > > > > > > >> > > > >>> > > > >
>> > > > > > > >> > > > >>> > > > > > Hello Kafka Developers,
>> > > > > > > >> > > > >>> > > > > >
>> > > > > > > >> > > > >>> > > > > > I just created KIP-36 for rack aware
>> > replica
>> > > > > > > >> assignment.
>> > > > > > > >> > > > >>> > > > > >
>> > > > > > > >> > > > >>> > > > > >
>> > > > > > > >> > > > >>> > > > > >
>> > > > > > > >> > > > >>> > > > >
>> > > > > > > >> > > > >>> > > >
>> > > > > > > >> > > > >>> > >
>> > > > > > > >> > > > >>> >
>> > > > > > > >> > > > >>>
>> > > > > > > >> > > >
>> > > > > > > >> > >
>> > > > > > > >> >
>> > > > > > > >>
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
>> > > > > > > >> > > > >>> > > > > >
>> > > > > > > >> > > > >>> > > > > > The goal is to utilize the isolation
>> > > provided
>> > > > by
>> > > > > > the
>> > > > > > > >> > racks
>> > > > > > > >> > > in
>> > > > > > > >> > > > >>> data
>> > > > > > > >> > > > >>> > > > center
>> > > > > > > >> > > > >>> > > > > > and distribute replicas to racks to
>> > provide
>> > > > > fault
>> > > > > > > >> > > tolerance.
>> > > > > > > >> > > > >>> > > > > >
>> > > > > > > >> > > > >>> > > > > > Comments are welcome.
>> > > > > > > >> > > > >>> > > > > >
>> > > > > > > >> > > > >>> > > > > > Thanks,
>> > > > > > > >> > > > >>> > > > > > Allen
>> > > > > > > >> > > > >>> > > > > >
>> > > > > > > >> > > > >>> > > > >
>> > > > > > > >> > > > >>> > > >
>> > > > > > > >> > > > >>> > >
>> > > > > > > >> > > > >>> >
>> > > > > > > >> > > > >>>
>> > > > > > > >> > > > >>
>> > > > > > > >> > > > >>
>> > > > > > > >> > > > >
>> > > > > > > >> > > >
>> > > > > > > >> > >
>> > > > > > > >> >
>> > > > > > > >>
>> > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>>
>>
>> --
>> Thanks,
>> Neha
>>
>
>

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Allen Wang <al...@gmail.com>.
I have been busy with some time-pressing issues for the last few days. I will
think about how incomplete rack information will affect the balance and
update the KIP by early next week.

Thanks,
Allen
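
As a rough illustration of the interlacing and fail-fast behavior raised in
the quoted message below, here is a minimal Python sketch. It is not the
KIP's algorithm: the function name `interlace_by_rack`, the dict-based
broker-to-rack mapping, and the round-robin rule are all assumptions made
for the example.

```python
from collections import defaultdict
from itertools import zip_longest


def interlace_by_rack(broker_rack):
    """Order broker ids so that consecutive ids rotate across racks.

    broker_rack is a hypothetical mapping {broker_id: rack or None}.
    """
    missing = [b for b, rack in broker_rack.items() if rack is None]
    # Fail fast when only some brokers carry rack information.
    if missing and len(missing) < len(broker_rack):
        raise ValueError("brokers without rack: %s" % sorted(missing))
    # No broker has a rack: keep the plain sorted order.
    if missing:
        return sorted(broker_rack)

    # Group the sorted broker ids by rack.
    by_rack = defaultdict(list)
    for broker_id in sorted(broker_rack):
        by_rack[broker_rack[broker_id]].append(broker_id)

    # Take one broker from each rack in turn; shorter racks simply run out.
    ordered = []
    for group in zip_longest(*(by_rack[rack] for rack in sorted(by_rack))):
        ordered.extend(b for b in group if b is not None)
    return ordered
```

With racks a = {0, 1}, b = {2, 3} and c = {4}, this sketch yields the order
0, 2, 4, 1, 3, so an uneven number of brokers per rack still produces an
ordering, only with a less regular tail.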


On Tue, Nov 3, 2015 at 9:03 AM, Neha Narkhede <ne...@confluent.io> wrote:

> A few suggestions on improving the KIP:
>
> *If some brokers have rack, and some do not, the algorithm will throw an
> > exception. This is to prevent incorrect assignment caused by user error.*
>
>
> In the KIP, can you clearly state the user-facing behavior when some
> brokers have rack information and some don't? Which actions and requests
> will error out and how?
>
> *Even distribution of partition leadership among brokers*
>
>
> There is some information about arranging the sorted broker list interlaced
> with rack ids. Can you describe the changes to the current algorithm in a
> little more detail? How does this interlacing work if only a subset of
> brokers have the rack id configured? Does this still work if an uneven number of
> brokers are assigned to each rack? It might work, I'm looking for more
> details on the changes, since it will affect the behavior seen by the user
> - imbalance on either the leaders or data or both.
>
> On Mon, Nov 2, 2015 at 6:39 PM, Aditya Auradkar <aa...@linkedin.com>
> wrote:
>
> > I think this sounds reasonable. Anyone else have comments?
> >
> > Aditya
> >
> > On Tue, Oct 27, 2015 at 5:23 PM, Allen Wang <al...@gmail.com>
> wrote:
> >
> > > During the discussion in the hangout, it was mentioned that it would be
> > > desirable that consumers know the rack information of the brokers so
> that
> > > they can consume from the broker in the same rack to reduce latency.
> As I
> > > understand this will only be beneficial if consumer can consume from
> any
> > > broker in ISR, which is not possible now.
> > >
> > > I suggest we skip the change to TMR. Once the change is made to
> consumer
> > to
> > > be able to consume from any broker in ISR, the rack information can be
> > > added to TMR.
> > >
> > > > Another thing I want to confirm is the command line behavior. I think the
> > > desirable default behavior is to fail fast on command line for
> incomplete
> > > rack mapping. The error message can include further instruction that
> > tells
> > > the user to add an extra argument (like "--allow-partial-rackinfo") to
> > > suppress the error and do an imperfect rack aware assignment. If the
> > > default behavior is to allow incomplete mapping, the error can still be
> > > easily missed.
> > >
> > > The affected command line tools are TopicCommand and
> > > ReassignPartitionsCommand.
> > >
> > > Thanks,
> > > Allen
> > >
> > >
> > >
> > >
> > >
> > > On Mon, Oct 26, 2015 at 12:55 PM, Aditya Auradkar <
> > aauradkar@linkedin.com>
> > > wrote:
> > >
> > > > Hi Allen,
> > > >
> > > > For TopicMetadataResponse to understand version, you can bump up the
> > > > request version itself. Based on the version of the request, the
> > response
> > > > can be appropriately serialized. It shouldn't be a huge change. For
> > > > example: We went through something similar for ProduceRequest
> recently
> > (
> > > > https://reviews.apache.org/r/33378/)
> > > > I guess the reason protocol information is not included in the TMR is
> > > > because the topic itself is independent of any particular protocol
> (SSL
> > > vs
> > > > Plaintext). Having said that, I'm not sure we even need rack
> > information
> > > in
> > > > TMR. What usecase were you thinking of initially?
> > > >
> > > > For 1 - I'd be fine with adding an option to the command line tools
> > that
> > > > > check rack assignment. For example, "--strict-assignment" or something
> > > similar.
> > > >
> > > > Aditya
> > > >
> > > > On Thu, Oct 22, 2015 at 6:44 PM, Allen Wang <al...@gmail.com>
> > > wrote:
> > > >
> > > > > For 2 and 3, I have updated the KIP. Please take a look. One thing
> I
> > > have
> > > > > changed is removing the proposal to add rack to
> > TopicMetadataResponse.
> > > > The
> > > > > reason is that unlike UpdateMetadataRequest, TopicMetadataResponse
> > does
> > > > not
> > > > > understand version. I don't see a way to include rack without
> > breaking
> > > > old
> > > > > version of clients. That's probably why secure protocol is not
> > included
> > > > in
> > > > > the TopicMetadataResponse either. I think it will be a much bigger
> > > change
> > > > > to include rack in TopicMetadataResponse.
> > > > >
> > > > > For 1, my concern is that doing rack aware assignment without
> > complete
> > > > > broker to rack mapping will result in assignment that is not rack
> > aware
> > > > and
> > > > > fail to provide fault tolerance in the event of rack outage. This
> > kind
> > > of
> > > > > problem will be difficult to surface. And the cost of this problem
> is
> > > > high:
> > > > > you have to do partition reassignment if you are lucky to spot the
> > > > problem
> > > > > early on or face the consequence of data loss during real rack
> > outage.
> > > > >
> > > > > I do see the concern of fail-fast as it might also cause data loss
> if
> > > > > > producer is not able to produce the message due to topic creation
> > failure.
> > > > Is
> > > > > it feasible to treat dynamic topic creation and command tools
> > > > differently?
> > > > > We allow dynamic topic creation with incomplete broker-rack mapping
> > and
> > > > > fail fast in command line. Another option is to let user determine
> > the
> > > > > behavior for command line. For example, by default fail fast in
> > command
> > > > > line but allow incomplete broker-rack mapping if another switch is
> > > > > provided.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Oct 20, 2015 at 10:05 AM, Aditya Auradkar <
> > > > > aauradkar@linkedin.com.invalid> wrote:
> > > > >
> > > > > > Hey Allen,
> > > > > >
> > > > > > 1. If we choose fail fast topic creation, we will have topic
> > creation
> > > > > > failures while upgrading the cluster. I really doubt we want this
> > > > > behavior.
> > > > > > Ideally, this should be invisible to clients of a cluster.
> > Currently,
> > > > > each
> > > > > > broker is effectively its own rack. So we probably can use the
> rack
> > > > > > information whenever possible but not make it a hard requirement.
> > To
> > > > > extend
> > > > > > Gwen's example, one badly configured broker should not degrade
> > topic
> > > > > > creation for the entire cluster.
> > > > > >
> > > > > > 2. Upgrade scenario - Can you add a section on the upgrade piece
> to
> > > > > confirm
> > > > > > that old clients will not see errors? I believe
> > > > > ZookeeperConsumerConnector
> > > > > > reads the Broker objects from ZK. I wanted to confirm that this
> > will
> > > > not
> > > > > > cause any problems.
> > > > > >
> > > > > > 3. Could you elaborate your proposed changes to the
> > > > UpdateMetadataRequest
> > > > > > in the "Public Interfaces" section? Personally, I find this
> format
> > > easy
> > > > > to
> > > > > > read in terms of wire protocol changes:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicRequest
> > > > > >
> > > > > > Aditya
> > > > > >
> > > > > > On Fri, Oct 16, 2015 at 3:45 PM, Allen Wang <
> allenxwang@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > > KIP is updated to include rack as an optional property for broker.
> > > > Please
> > > > > > take
> > > > > > > a look and let me know if more details are needed.
> > > > > > >
> > > > > > > For the case where some brokers have rack and some do not, the
> > > > current
> > > > > > KIP
> > > > > > > uses the fail-fast behavior. If there are concerns, we can
> > further
> > > > > > discuss
> > > > > > > this in the email thread or next hangout.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Oct 15, 2015 at 10:42 AM, Allen Wang <
> > allenxwang@gmail.com
> > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > > That's a good question. I can think of three actions if the
> > rack
> > > > > > > > information is incomplete:
> > > > > > > >
> > > > > > > > 1. Treat the node without rack as if it is on its unique rack
> > > > > > > > > 2. Disregard all rack information and fall back to current
> > > algorithm
> > > > > > > > 3. Fail-fast
> > > > > > > >
> > > > > > > > > Now that I think about it, one and three make more sense. The
> reason
> > > for
> > > > > > > > fail-fast is that user mistake for not providing the rack may
> > > never
> > > > > be
> > > > > > > > found if we tolerate that and the assignment may not be rack
> > > aware
> > > > as
> > > > > > the
> > > > > > > > user has expected and this creates debug problems when things
> > > fail.
> > > > > > > >
> > > > > > > > > What do you think? If not fail-fast, is there any way we can
> > make
> > > > the
> > > > > > user
> > > > > > > > > error stand out?
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Oct 15, 2015 at 10:17 AM, Gwen Shapira <
> > > gwen@confluent.io>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > >> Thanks! Just to clarify, when some brokers have rack
> > assignment
> > > > and
> > > > > > some
> > > > > > > >> don't, do we act like none of them have it? or like those
> > > without
> > > > > > > >> assignment are in their own rack?
> > > > > > > >>
> > > > > > > >> The first scenario is good when first setting up
> > rack-awareness,
> > > > but
> > > > > > the
> > > > > > > >> second makes more sense for on-going maintenance (I can
> > totally
> > > > see
> > > > > > > >> someone
> > > > > > > >> adding a node and forgetting to set the rack property, we
> > don't
> > > > want
> > > > > > > this
> > > > > > > >> to change behavior for anything except the new node).
> > > > > > > >>
> > > > > > > >> What do you think?
> > > > > > > >>
> > > > > > > >> Gwen
> > > > > > > >>
> > > > > > > >> On Thu, Oct 15, 2015 at 10:13 AM, Allen Wang <
> > > > allenxwang@gmail.com>
> > > > > > > >> wrote:
> > > > > > > >>
> > > > > > > >> > For scenario 1:
> > > > > > > >> >
> > > > > > > >> > - Add the rack information to broker property file or
> > > > dynamically
> > > > > > set
> > > > > > > >> it in
> > > > > > > >> > the wrapper code to bootstrap Kafka server. You would do
> > that
> > > > for
> > > > > > all
> > > > > > > >> > brokers and restart the brokers one by one.
> > > > > > > >> >
> > > > > > > >> > In this scenario, the complete broker to rack mapping may
> > not
> > > be
> > > > > > > >> available
> > > > > > > >> > until every broker is restarted. During that time we fall
> > back
> > > > to
> > > > > > > >> default
> > > > > > > >> > replica assignment algorithm.
> > > > > > > >> >
> > > > > > > >> > For scenario 2:
> > > > > > > >> >
> > > > > > > >> > - Add the rack information to broker property file or
> > > > dynamically
> > > > > > set
> > > > > > > >> it in
> > > > > > > >> > the wrapper code and start the broker.
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> > On Wed, Oct 14, 2015 at 2:36 PM, Gwen Shapira <
> > > > gwen@confluent.io>
> > > > > > > >> wrote:
> > > > > > > >> >
> > > > > > > >> > > Can you clarify the workflow for the following
> scenarios:
> > > > > > > >> > >
> > > > > > > >> > > 1. I currently have 6 brokers and want to add rack
> > > information
> > > > > for
> > > > > > > >> each
> > > > > > > >> > > 2. I'm adding a new broker and I want to specify which
> > rack
> > > it
> > > > > > > >> belongs on
> > > > > > > >> > > while adding it.
> > > > > > > >> > >
> > > > > > > >> > > Thanks!
> > > > > > > >> > >
> > > > > > > >> > > On Tue, Oct 13, 2015 at 2:21 PM, Allen Wang <
> > > > > allenxwang@gmail.com
> > > > > > >
> > > > > > > >> > wrote:
> > > > > > > >> > >
> > > > > > > >> > > > We discussed the KIP in the hangout today. The
> > > > recommendation
> > > > > is
> > > > > > > to
> > > > > > > >> > make
> > > > > > > >> > > > rack as a broker property in ZooKeeper. For users with
> > > > > existing
> > > > > > > rack
> > > > > > > >> > > > information stored somewhere, they would need to
> > retrieve
> > > > the
> > > > > > > >> > information
> > > > > > > >> > > > at broker start up and dynamically set the rack
> > property,
> > > > > which
> > > > > > > can
> > > > > > > >> be
> > > > > > > >> > > > implemented as a wrapper to bootstrap broker. There
> will
> > > be
> > > > no
> > > > > > > >> > interface
> > > > > > > >> > > or
> > > > > > > >> > > > pluggable implementation to retrieve the rack
> > information.
> > > > > > > >> > > >
> > > > > > > >> > > > The assumption is that you always need to restart the
> > > broker
> > > > > to
> > > > > > > >> make a
> > > > > > > >> > > > change to the rack.
> > > > > > > >> > > >
> > > > > > > >> > > > Once the rack becomes a broker property, it will be
> > > possible
> > > > > to
> > > > > > > make
> > > > > > > >> > rack
> > > > > > > >> > > > part of the meta data to help the consumer choose
> which
> > in
> > > > > sync
> > > > > > > >> replica
> > > > > > > >> > > to
> > > > > > > >> > > > consume from as part of the future consumer
> enhancement.
> > > > > > > >> > > >
> > > > > > > >> > > > I will update the KIP.
> > > > > > > >> > > >
> > > > > > > >> > > > Thanks,
> > > > > > > >> > > > Allen
> > > > > > > >> > > >
> > > > > > > >> > > >
> > > > > > > >> > > > On Thu, Oct 8, 2015 at 9:23 AM, Allen Wang <
> > > > > > allenxwang@gmail.com>
> > > > > > > >> > wrote:
> > > > > > > >> > > >
> > > > > > > >> > > > > I attended Tuesday's KIP hangout but this KIP was
> not
> > > > > > discussed
> > > > > > > >> due
> > > > > > > >> > to
> > > > > > > >> > > > > time constraint.
> > > > > > > >> > > > >
> > > > > > > >> > > > > However, after hearing discussion of KIP-35, I have
> > the
> > > > > > feeling
> > > > > > > >> that
> > > > > > > >> > > > > incompatibility (caused by new broker property)
> > between
> > > > > > brokers
> > > > > > > >> with
> > > > > > > >> > > > > different versions  will be solved there. In
> addition,
> > > > > having
> > > > > > > > >> rack
> > > > > > > >> > in
> > > > > > > >> > > > > broker property as meta data may also help consumers
> > in
> > > > the
> > > > > > > >> future.
> > > > > > > >> > So
> > > > > > > >> > > I
> > > > > > > >> > > > am
> > > > > > > > >> > > > > open to adding rack property to broker.
> > > > > > > >> > > > >
> > > > > > > >> > > > > Hopefully we can discuss this in the next KIP
> hangout.
> > > > > > > >> > > > >
> > > > > > > >> > > > > On Wed, Sep 30, 2015 at 2:46 PM, Allen Wang <
> > > > > > > allenxwang@gmail.com
> > > > > > > >> >
> > > > > > > >> > > > wrote:
> > > > > > > >> > > > >
> > > > > > > >> > > > >> Can you send me the information on the next KIP
> > > hangout?
> > > > > > > >> > > > >>
> > > > > > > >> > > > >> Currently the broker-rack mapping is not cached. In
> > > > > > KafkaApis,
> > > > > > > >> > > > >> RackLocator.getRackInfo() is called each time the
> > > mapping
> > > > > is
> > > > > > > >> needed
> > > > > > > >> > > for
> > > > > > > >> > > > >> auto topic creation. This will ensure latest
> mapping
> > is
> > > > > used
> > > > > > at
> > > > > > > >> any
> > > > > > > >> > > > time.
> > > > > > > >> > > > >>
> > > > > > > >> > > > >> The ability to get the complete mapping makes it
> > simple
> > > > to
> > > > > > > reuse
> > > > > > > >> the
> > > > > > > >> > > > same
> > > > > > > >> > > > >> interface in command line tools.
> > > > > > > >> > > > >>
> > > > > > > >> > > > >>
> > > > > > > >> > > > >> On Wed, Sep 30, 2015 at 11:01 AM, Aditya Auradkar <
> > > > > > > >> > > > >> aauradkar@linkedin.com.invalid> wrote:
> > > > > > > >> > > > >>
> > > > > > > >> > > > >>> Perhaps we discuss this during the next KIP
> hangout?
> > > > > > > >> > > > >>>
> > > > > > > >> > > > >>> I do see that a pluggable rack locator can be
> useful
> > > > but I
> > > > > > do
> > > > > > > >> see a
> > > > > > > >> > > few
> > > > > > > >> > > > >>> concerns:
> > > > > > > >> > > > >>>
> > > > > > > >> > > > >>> - The RackLocator (as described in the document),
> > > > implies
> > > > > > that
> > > > > > > >> it
> > > > > > > >> > can
> > > > > > > >> > > > >>> discover rack information for any node in the
> > cluster.
> > > > How
> > > > > > > does
> > > > > > > >> it
> > > > > > > >> > > deal
> > > > > > > >> > > > >>> with rack location changes? For example, if I
> moved
> > > > broker
> > > > > > id
> > > > > > > >> (1)
> > > > > > > >> > > from
> > > > > > > >> > > > >>> rack
> > > > > > > >> > > > >>> X to Y, I only have to start that broker with a
> > newer
> > > > rack
> > > > > > > >> config.
> > > > > > > >> > If
> > > > > > > >> > > > >>> RackLocator discovers broker -> rack information
> at
> > > > start
> > > > > up
> > > > > > > >> time,
> > > > > > > >> > > any
> > > > > > > >> > > > >>> change to a broker will require bouncing the
> entire
> > > > > cluster
> > > > > > > >> since
> > > > > > > >> > > > >>> createTopic requests can be sent to any node in
> the
> > > > > cluster.
> > > > > > > >> > > > >>> For this reason it may be simpler to have each
> node
> > be
> > > > > aware
> > > > > > > of
> > > > > > > >> its
> > > > > > > >> > > own
> > > > > > > >> > > > >>> rack and persist it in ZK during start up time.
> > > > > > > >> > > > >>>
> > > > > > > >> > > > >>> - A pluggable RackLocator relies on an external
> > > service
> > > > > > being
> > > > > > > >> > > available
> > > > > > > >> > > > >>> to
> > > > > > > >> > > > >>> serve rack information.
> > > > > > > >> > > > >>>
> > > > > > > >> > > > >>> Out of curiosity, I looked up how a couple of
> other
> > > > > systems
> > > > > > > deal
> > > > > > > >> > with
> > > > > > > >> > > > >>> zone/rack awareness.
> > > > > > > >> > > > >>> For Cassandra some interesting modes are:
> > > > > > > >> > > > >>> (Property File configuration)
> > > > > > > >> > > > >>>
> > > > > > > >> > > > >>>
> > > > > > > >> > > >
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
> > > > > > > >> > > > >>> (Dynamic inference)
> > > > > > > >> > > > >>>
> > > > > > > >> > > > >>>
> > > > > > > >> > > >
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
> > > > > > > >> > > > >>>
> > > > > > > >> > > > >>> Voldemort does a static node -> zone assignment
> > based
> > > on
> > > > > > > >> > > configuration.
> > > > > > > >> > > > >>>
> > > > > > > >> > > > >>> Aditya
> > > > > > > >> > > > >>>
> > > > > > > >> > > > >>> On Wed, Sep 30, 2015 at 10:05 AM, Allen Wang <
> > > > > > > >> allenxwang@gmail.com
> > > > > > > >> > >
> > > > > > > >> > > > >>> wrote:
> > > > > > > >> > > > >>>
> > > > > > > >> > > > >>> > I would like to see if we can do both:
> > > > > > > >> > > > >>> >
> > > > > > > >> > > > >>> > - Make RackLocator pluggable to facilitate
> > migration
> > > > > with
> > > > > > > >> > existing
> > > > > > > >> > > > >>> > broker-rack mapping
> > > > > > > >> > > > >>> >
> > > > > > > >> > > > >>> > - Make rack an optional property for broker. If
> > rack
> > > > is
> > > > > > > >> available
> > > > > > > >> > > > from
> > > > > > > >> > > > >>> > broker, treat it as source of truth. For users
> > with
> > > > > > existing
> > > > > > > >> > > > >>> broker-rack
> > > > > > > >> > > > >>> > mapping somewhere else, they can use the
> pluggable
> > > way
> > > > > or
> > > > > > > they
> > > > > > > >> > can
> > > > > > > >> > > > >>> transfer
> > > > > > > >> > > > >>> > the mapping to the broker rack property.
> > > > > > > >> > > > >>> >
> > > > > > > >> > > > >>> > One thing I am not sure is what happens at
> rolling
> > > > > upgrade
> > > > > > > >> when
> > > > > > > >> > we
> > > > > > > >> > > > have
> > > > > > > >> > > > >>> > rack as a broker property. For brokers with
> older
> > > > > version
> > > > > > of
> > > > > > > >> > Kafka,
> > > > > > > >> > > > >>> will it
> > > > > > > >> > > > >>> > cause problem for them? If so, is there any
> > > > workaround?
> > > > > I
> > > > > > > also
> > > > > > > >> > > think
> > > > > > > >> > > > it
> > > > > > > >> > > > >>> > would be better not to have rack in the
> controller
> > > > wire
> > > > > > > >> protocol
> > > > > > > >> > > but
> > > > > > > >> > > > >>> not
> > > > > > > >> > > > >>> > sure if it is achievable.
> > > > > > > >> > > > >>> >
> > > > > > > >> > > > >>> > Thanks,
> > > > > > > >> > > > >>> > Allen
> > > > > > > >> > > > >>> >
> > > > > > > >> > > > >>> >
> > > > > > > >> > > > >>> >
> > > > > > > >> > > > >>> >
> > > > > > > >> > > > >>> >
> > > > > > > >> > > > >>> > On Mon, Sep 28, 2015 at 4:55 PM, Todd Palino <
> > > > > > > >> tpalino@gmail.com>
> > > > > > > >> > > > >>> wrote:
> > > > > > > >> > > > >>> >
> > > > > > > >> > > > >>> > > I tend to like the idea of a pluggable
> locator.
> > > For
> > > > > > > >> example, we
> > > > > > > >> > > > >>> already
> > > > > > > >> > > > >>> > > have an interface for discovering information
> > > about
> > > > > the
> > > > > > > >> > physical
> > > > > > > >> > > > >>> location
> > > > > > > >> > > > >>> > > of servers. I don't relish the idea of having
> to
> > > > > > maintain
> > > > > > > >> data
> > > > > > > >> > in
> > > > > > > >> > > > >>> > multiple
> > > > > > > >> > > > >>> > > places.
> > > > > > > >> > > > >>> > >
> > > > > > > >> > > > >>> > > -Todd
> > > > > > > >> > > > >>> > >
> > > > > > > >> > > > >>> > > On Mon, Sep 28, 2015 at 4:48 PM, Aditya
> > Auradkar <
> > > > > > > >> > > > >>> > > aauradkar@linkedin.com.invalid> wrote:
> > > > > > > >> > > > >>> > >
> > > > > > > >> > > > >>> > > > Thanks for starting this KIP Allen.
> > > > > > > >> > > > >>> > > >
> > > > > > > >> > > > >>> > > > I agree with Gwen that having a RackLocator
> > > class
> > > > > that
> > > > > > > is
> > > > > > > >> > > > pluggable
> > > > > > > >> > > > >>> > seems
> > > > > > > >> > > > >>> > > > to be too complex. The KIP refers to
> > potentially
> > > > > > non-ZK
> > > > > > > >> > storage
> > > > > > > >> > > > >>> for the
> > > > > > > >> > > > >>> > > > rack info which I don't think is necessary.
> > > > > > > >> > > > >>> > > >
> > > > > > > >> > > > >>> > > > Perhaps we can persist this info in zk under
> > > > > > > >> > > > >>> /brokers/ids/<broker_id>
> > > > > > > >> > > > >>> > > > similar to other broker properties and add a
> > > > config
> > > > > in
> > > > > > > >> > > > KafkaConfig
> > > > > > > >> > > > >>> > called
> > > > > > > >> > > > >>> > > > "rack".
> > > > > > > >> > > > >>> > > >
> > > > > > > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy,
> > > > > > > >> > > "rack":
> > > > > > > >> > > > >>> > "abc"}
> > > > > > > >> > > > >>> > > >
> > > > > > > >> > > > >>> > > > Aditya
> > > > > > > >> > > > >>> > > >
> > > > > > > >> > > > >>> > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen
> Shapira
> > <
> > > > > > > >> > > gwen@confluent.io
> > > > > > > >> > > > >
> > > > > > > >> > > > >>> > wrote:
> > > > > > > >> > > > >>> > > >
> > > > > > > >> > > > >>> > > > > Hi,
> > > > > > > >> > > > >>> > > > >
> > > > > > > >> > > > >>> > > > > First, thanks for putting out a KIP for
> > this.
> > > > This
> > > > > > is
> > > > > > > >> super
> > > > > > > >> > > > >>> important
> > > > > > > >> > > > >>> > > for
> > > > > > > >> > > > >>> > > > > production deployments of Kafka.
> > > > > > > >> > > > >>> > > > >
> > > > > > > >> > > > >>> > > > > Few questions:
> > > > > > > >> > > > >>> > > > >
> > > > > > > >> > > > >>> > > > > 1) Are we sure we want "as many racks as
> > > > > possible"?
> > > > > > > I'd
> > > > > > > >> > want
> > > > > > > >> > > to
> > > > > > > >> > > > >>> > balance
> > > > > > > >> > > > >>> > > > > between safety (more racks) and network
> > > > > utilization
> > > > > > > >> > (traffic
> > > > > > > >> > > > >>> within a
> > > > > > > >> > > > >>> > > > rack
> > > > > > > >> > > > >>> > > > > uses the high-bandwidth TOR switch). One
> > > replica
> > > > > on
> > > > > > a
> > > > > > > >> > > different
> > > > > > > >> > > > >>> rack
> > > > > > > >> > > > >>> > > and
> > > > > > > >> > > > >>> > > > > the rest on same rack (if possible) sounds
> > > > better
> > > > > to
> > > > > > > me.
> > > > > > > >> > > > >>> > > > >
> > > > > > > >> > > > >>> > > > > 2) Rack-locator class seems overly complex
> > > > > compared
> > > > > > to
> > > > > > > >> > > adding a
> > > > > > > >> > > > >>> > > > rack.number
> > > > > > > >> > > > >>> > > > > property to the broker properties file.
> Why
> > do
> > > > we
> > > > > > want
> > > > > > > >> > that?
> > > > > > > >> > > > >>> > > > >
> > > > > > > >> > > > >>> > > > > Gwen
> > > > > > > >> > > > >>> > > > >
> > > > > > > >> > > > >>> > > > >
> > > > > > > >> > > > >>> > > > >
> > > > > > > >> > > > >>> > > > > On Mon, Sep 28, 2015 at 12:15 PM, Allen
> > Wang <
> > > > > > > >> > > > >>> allenxwang@gmail.com>
> > > > > > > >> > > > >>> > > > wrote:
> > > > > > > >> > > > >>> > > > >
> > > > > > > >> > > > >>> > > > > > Hello Kafka Developers,
> > > > > > > >> > > > >>> > > > > >
> > > > > > > >> > > > >>> > > > > > I just created KIP-36 for rack aware
> > replica
> > > > > > > >> assignment.
> > > > > > > >> > > > >>> > > > > >
> > > > > > > >> > > > >>> > > > > >
> > > > > > > >> > > > >>> > > > > >
> > > > > > > >> > > > >>> > > > >
> > > > > > > >> > > > >>> > > >
> > > > > > > >> > > > >>> > >
> > > > > > > >> > > > >>> >
> > > > > > > >> > > > >>>
> > > > > > > >> > > >
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
> > > > > > > >> > > > >>> > > > > >
> > > > > > > >> > > > >>> > > > > > The goal is to utilize the isolation
> > > provided
> > > > by
> > > > > > the
> > > > > > > >> > racks
> > > > > > > >> > > in
> > > > > > > >> > > > >>> data
> > > > > > > >> > > > >>> > > > center
> > > > > > > >> > > > >>> > > > > > and distribute replicas to racks to
> > provide
> > > > > fault
> > > > > > > >> > > tolerance.
> > > > > > > >> > > > >>> > > > > >
> > > > > > > >> > > > >>> > > > > > Comments are welcome.
> > > > > > > >> > > > >>> > > > > >
> > > > > > > >> > > > >>> > > > > > Thanks,
> > > > > > > >> > > > >>> > > > > > Allen
> > > > > > > >> > > > >>> > > > > >
> > > > > > > >> > > > >>> > > > >
> > > > > > > >> > > > >>> > > >
> > > > > > > >> > > > >>> > >
> > > > > > > >> > > > >>> >
> > > > > > > >> > > > >>>
> > > > > > > >> > > > >>
> > > > > > > >> > > > >>
> > > > > > > >> > > > >
> > > > > > > >> > > >
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > >>
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
>
> --
> Thanks,
> Neha
>

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Neha Narkhede <ne...@confluent.io>.
A few suggestions on improving the KIP:

*If some brokers have rack, and some do not, the algorithm will throw an
exception. This is to prevent incorrect assignment caused by user error.*


In the KIP, can you clearly state the user-facing behavior when some
brokers have rack information and some don't? Which actions and requests
will error out, and how?
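
To make the question concrete, here is a minimal sketch of the three behaviors
raised earlier in this thread for a partially configured cluster: fail fast,
disregard all rack information, or treat each rack-less broker as its own rack.
The function name, the policy strings, the exception type and the placeholder
rack naming are all hypothetical and are not taken from the KIP or from Kafka.

```python
from typing import Dict, Optional


class IncompleteRackInfoError(Exception):
    """Hypothetical error raised when only some brokers report a rack."""


def resolve_racks(broker_racks: Dict[int, Optional[str]],
                  policy: str) -> Optional[Dict[int, str]]:
    """Return a complete broker -> rack map, or None for rack-unaware assignment.

    The policy values are illustrative names for the options in this thread:
      "fail_fast"    - reject a partially configured cluster
      "ignore_racks" - fall back to the existing rack-unaware algorithm
      "own_rack"     - treat every rack-less broker as its own rack
    """
    missing = sorted(b for b, rack in broker_racks.items() if rack is None)
    if not missing:
        return dict(broker_racks)          # every broker has a rack
    if len(missing) == len(broker_racks):
        return None                        # no broker has a rack: nothing to enforce
    if policy == "fail_fast":
        raise IncompleteRackInfoError(f"brokers without rack: {missing}")
    if policy == "ignore_racks":
        return None
    if policy == "own_rack":
        # Synthesize a placeholder rack name per rack-less broker (assumed naming).
        return {b: rack if rack is not None else f"__broker-{b}"
                for b, rack in broker_racks.items()}
    raise ValueError(f"unknown policy: {policy}")
```

Whichever policy is chosen, the KIP would still need to say which entry points
(auto topic creation, TopicCommand, ReassignPartitionsCommand) apply it.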

*Even distribution of partition leadership among brokers*


There is some information about arranging the sorted broker list interlaced
with rack ids. Can you describe the changes to the current algorithm in a
little more detail? How does this interlacing work if only a subset of
brokers have the rack id configured? Does this still work if an uneven number
of brokers is assigned to each rack? It might work; I'm looking for more
details on the changes, since they will affect the behavior seen by the user:
imbalance on either the leaders or data or both.
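
As an illustration of what I understand by "interlaced", here is a rough
sketch that round-robins across racks so that neighbouring entries in the
broker list sit on different racks where possible. This is my reading only,
not the KIP's actual algorithm; the function name and the rack labels are made
up for the example.

```python
from collections import defaultdict
from itertools import zip_longest
from typing import Dict, List


def interlace_by_rack(broker_racks: Dict[int, str]) -> List[int]:
    """Order broker ids by taking one broker from each rack in turn."""
    by_rack = defaultdict(list)
    for broker in sorted(broker_racks):
        by_rack[broker_racks[broker]].append(broker)
    # One column per rack, racks in sorted order for determinism.
    columns = [by_rack[rack] for rack in sorted(by_rack)]
    # zip_longest pads exhausted racks with None, which is dropped here.
    return [b for row in zip_longest(*columns) for b in row if b is not None]


# Even racks: every neighbouring pair is on different racks.
print(interlace_by_rack({0: "r1", 1: "r1", 2: "r2", 3: "r2", 4: "r3", 5: "r3"}))
# [0, 2, 4, 1, 3, 5]

# Uneven racks: brokers 1 and 2 end up adjacent although both are on r1.
print(interlace_by_rack({0: "r1", 1: "r1", 2: "r1", 3: "r2"}))
# [0, 3, 1, 2]
```

The second call is the case I am asking about: once the smaller rack runs out,
consecutive list entries share a rack, so replicas or leaders may skew.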

On Mon, Nov 2, 2015 at 6:39 PM, Aditya Auradkar <aa...@linkedin.com>
wrote:

> I think this sounds reasonable. Anyone else have comments?
>
> Aditya
>
> On Tue, Oct 27, 2015 at 5:23 PM, Allen Wang <al...@gmail.com> wrote:
>
> > During the discussion in the hangout, it was mentioned that it would be
> > desirable that consumers know the rack information of the brokers so that
> > they can consume from the broker in the same rack to reduce latency. As I
> > understand this will only be beneficial if consumer can consume from any
> > broker in ISR, which is not possible now.
> >
> > I suggest we skip the change to TMR. Once the change is made to consumer
> to
> > be able to consume from any broker in ISR, the rack information can be
> > added to TMR.
> >
> > Another thing I want to confirm is  command line behavior. I think the
> > desirable default behavior is to fail fast on command line for incomplete
> > rack mapping. The error message can include further instruction that
> tells
> > the user to add an extra argument (like "--allow-partial-rackinfo") to
> > suppress the error and do an imperfect rack aware assignment. If the
> > default behavior is to allow incomplete mapping, the error can still be
> > easily missed.
> >
> > The affected command line tools are TopicCommand and
> > ReassignPartitionsCommand.
> >
> > Thanks,
> > Allen
> >
> >
> >
> >
> >
> > On Mon, Oct 26, 2015 at 12:55 PM, Aditya Auradkar <
> aauradkar@linkedin.com>
> > wrote:
> >
> > > Hi Allen,
> > >
> > > For TopicMetadataResponse to understand version, you can bump up the
> > > request version itself. Based on the version of the request, the
> response
> > > can be appropriately serialized. It shouldn't be a huge change. For
> > > example: We went through something similar for ProduceRequest recently
> (
> > > https://reviews.apache.org/r/33378/)
> > > I guess the reason protocol information is not included in the TMR is
> > > because the topic itself is independent of any particular protocol (SSL
> > vs
> > > Plaintext). Having said that, I'm not sure we even need rack
> information
> > in
> > > TMR. What usecase were you thinking of initially?
> > >
> > > For 1 - I'd be fine with adding an option to the command line tools
> that
> > > check rack assignment. For e.g. "--strict-assignment" or something
> > similar.
> > >
> > > Aditya
> > >
> > > On Thu, Oct 22, 2015 at 6:44 PM, Allen Wang <al...@gmail.com>
> > wrote:
> > >
> > > > For 2 and 3, I have updated the KIP. Please take a look. One thing I
> > have
> > > > changed is removing the proposal to add rack to
> TopicMetadataResponse.
> > > The
> > > > reason is that unlike UpdateMetadataRequest, TopicMetadataResponse
> does
> > > not
> > > > understand version. I don't see a way to include rack without
> breaking
> > > old
> > > > version of clients. That's probably why secure protocol is not
> included
> > > in
> > > > the TopicMetadataResponse either. I think it will be a much bigger
> > change
> > > > to include rack in TopicMetadataResponse.
> > > >
> > > > For 1, my concern is that doing rack aware assignment without
> complete
> > > > broker to rack mapping will result in assignment that is not rack
> aware
> > > and
> > > > fail to provide fault tolerance in the event of rack outage. This
> kind
> > of
> > > > problem will be difficult to surface. And the cost of this problem is
> > > high:
> > > > you have to do partition reassignment if you are lucky to spot the
> > > problem
> > > > early on or face the consequence of data loss during real rack
> outage.
> > > >
> > > > I do see the concern of fail-fast as it might also cause data loss if
> > > > producer is not able produce the message due to topic creation
> failure.
> > > Is
> > > > it feasible to treat dynamic topic creation and command tools
> > > differently?
> > > > We allow dynamic topic creation with incomplete broker-rack mapping
> and
> > > > fail fast in command line. Another option is to let user determine
> the
> > > > behavior for command line. For example, by default fail fast in
> command
> > > > line but allow incomplete broker-rack mapping if another switch is
> > > > provided.
> > > >
> > > >
> > > >
> > > >
> > > > On Tue, Oct 20, 2015 at 10:05 AM, Aditya Auradkar <
> > > > aauradkar@linkedin.com.invalid> wrote:
> > > >
> > > > > Hey Allen,
> > > > >
> > > > > 1. If we choose fail fast topic creation, we will have topic
> creation
> > > > > failures while upgrading the cluster. I really doubt we want this
> > > > behavior.
> > > > > Ideally, this should be invisible to clients of a cluster.
> Currently,
> > > > each
> > > > > broker is effectively its own rack. So we probably can use the rack
> > > > > information whenever possible but not make it a hard requirement.
> To
> > > > extend
> > > > > Gwen's example, one badly configured broker should not degrade
> topic
> > > > > creation for the entire cluster.
> > > > >
> > > > > 2. Upgrade scenario - Can you add a section on the upgrade piece to
> > > > confirm
> > > > > that old clients will not see errors? I believe
> > > > ZookeeperConsumerConnector
> > > > > reads the Broker objects from ZK. I wanted to confirm that this
> will
> > > not
> > > > > cause any problems.
> > > > >
> > > > > 3. Could you elaborate your proposed changes to the
> > > UpdateMetadataRequest
> > > > > in the "Public Interfaces" section? Personally, I find this format
> > easy
> > > > to
> > > > > read in terms of wire protocol changes:
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicRequest
> > > > >
> > > > > Aditya
> > > > >
> > > > > On Fri, Oct 16, 2015 at 3:45 PM, Allen Wang <al...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > > KIP is updated to include rack as an optional property for broker.
> > > Please
> > > > > take
> > > > > > a look and let me know if more details are needed.
> > > > > >
> > > > > > For the case where some brokers have rack and some do not, the
> > > current
> > > > > KIP
> > > > > > uses the fail-fast behavior. If there are concerns, we can
> further
> > > > > discuss
> > > > > > this in the email thread or next hangout.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Thu, Oct 15, 2015 at 10:42 AM, Allen Wang <
> allenxwang@gmail.com
> > >
> > > > > wrote:
> > > > > >
> > > > > > > That's a good question. I can think of three actions if the
> rack
> > > > > > > information is incomplete:
> > > > > > >
> > > > > > > 1. Treat the node without rack as if it is on its unique rack
> > > > > > > 2. Disregard all rack information and fallback to current
> > algorithm
> > > > > > > 3. Fail-fast
> > > > > > >
> > > > > > > Now I think about it, one and three make more sense. The reason
> > for
> > > > > > > fail-fast is that user mistake for not providing the rack may
> > never
> > > > be
> > > > > > > found if we tolerate that and the assignment may not be rack
> > aware
> > > as
> > > > > the
> > > > > > > user has expected and this creates debug problems when things
> > fail.
> > > > > > >
> > > > > > > > What do you think? If not fail-fast, is there any way we can
> make
> > > the
> > > > > > user
> > > > > > > > error stand out?
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Oct 15, 2015 at 10:17 AM, Gwen Shapira <
> > gwen@confluent.io>
> > > > > > wrote:
> > > > > > >
> > > > > > >> Thanks! Just to clarify, when some brokers have rack
> assignment
> > > and
> > > > > some
> > > > > > >> don't, do we act like none of them have it? or like those
> > without
> > > > > > >> assignment are in their own rack?
> > > > > > >>
> > > > > > >> The first scenario is good when first setting up
> rack-awareness,
> > > but
> > > > > the
> > > > > > >> second makes more sense for on-going maintenance (I can
> totally
> > > see
> > > > > > >> someone
> > > > > > >> adding a node and forgetting to set the rack property, we
> don't
> > > want
> > > > > > this
> > > > > > >> to change behavior for anything except the new node).
> > > > > > >>
> > > > > > >> What do you think?
> > > > > > >>
> > > > > > >> Gwen
> > > > > > >>
> > > > > > >> On Thu, Oct 15, 2015 at 10:13 AM, Allen Wang <
> > > allenxwang@gmail.com>
> > > > > > >> wrote:
> > > > > > >>
> > > > > > >> > For scenario 1:
> > > > > > >> >
> > > > > > >> > - Add the rack information to broker property file or
> > > dynamically
> > > > > set
> > > > > > >> it in
> > > > > > >> > the wrapper code to bootstrap Kafka server. You would do
> that
> > > for
> > > > > all
> > > > > > >> > brokers and restart the brokers one by one.
> > > > > > >> >
> > > > > > >> > In this scenario, the complete broker to rack mapping may
> not
> > be
> > > > > > >> available
> > > > > > >> > until every broker is restarted. During that time we fall
> back
> > > to
> > > > > > >> default
> > > > > > >> > replica assignment algorithm.
> > > > > > >> >
> > > > > > >> > For scenario 2:
> > > > > > >> >
> > > > > > >> > - Add the rack information to broker property file or
> > > dynamically
> > > > > set
> > > > > > >> it in
> > > > > > >> > the wrapper code and start the broker.
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > On Wed, Oct 14, 2015 at 2:36 PM, Gwen Shapira <
> > > gwen@confluent.io>
> > > > > > >> wrote:
> > > > > > >> >
> > > > > > >> > > Can you clarify the workflow for the following scenarios:
> > > > > > >> > >
> > > > > > >> > > 1. I currently have 6 brokers and want to add rack
> > information
> > > > for
> > > > > > >> each
> > > > > > >> > > 2. I'm adding a new broker and I want to specify which
> rack
> > it
> > > > > > >> belongs on
> > > > > > >> > > while adding it.
> > > > > > >> > >
> > > > > > >> > > Thanks!
> > > > > > >> > >
> > > > > > >> > > On Tue, Oct 13, 2015 at 2:21 PM, Allen Wang <
> > > > allenxwang@gmail.com
> > > > > >
> > > > > > >> > wrote:
> > > > > > >> > >
> > > > > > >> > > > We discussed the KIP in the hangout today. The
> > > recommendation
> > > > is
> > > > > > to
> > > > > > >> > make
> > > > > > >> > > > rack as a broker property in ZooKeeper. For users with
> > > > existing
> > > > > > rack
> > > > > > >> > > > information stored somewhere, they would need to
> retrieve
> > > the
> > > > > > >> > information
> > > > > > >> > > > at broker start up and dynamically set the rack
> property,
> > > > which
> > > > > > can
> > > > > > >> be
> > > > > > >> > > > implemented as a wrapper to bootstrap broker. There will
> > be
> > > no
> > > > > > >> > interface
> > > > > > >> > > or
> > > > > > >> > > > pluggable implementation to retrieve the rack
> information.
> > > > > > >> > > >
> > > > > > >> > > > The assumption is that you always need to restart the
> > broker
> > > > to
> > > > > > >> make a
> > > > > > >> > > > change to the rack.
> > > > > > >> > > >
> > > > > > >> > > > Once the rack becomes a broker property, it will be
> > possible
> > > > to
> > > > > > make
> > > > > > >> > rack
> > > > > > >> > > > part of the meta data to help the consumer choose which
> in
> > > > sync
> > > > > > >> replica
> > > > > > >> > > to
> > > > > > >> > > > consume from as part of the future consumer enhancement.
> > > > > > >> > > >
> > > > > > >> > > > I will update the KIP.
> > > > > > >> > > >
> > > > > > >> > > > Thanks,
> > > > > > >> > > > Allen
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > > On Thu, Oct 8, 2015 at 9:23 AM, Allen Wang <
> > > > > allenxwang@gmail.com>
> > > > > > >> > wrote:
> > > > > > >> > > >
> > > > > > >> > > > > I attended Tuesday's KIP hangout but this KIP was not
> > > > > discussed
> > > > > > >> due
> > > > > > >> > to
> > > > > > >> > > > > time constraint.
> > > > > > >> > > > >
> > > > > > >> > > > > However, after hearing discussion of KIP-35, I have
> the
> > > > > feeling
> > > > > > >> that
> > > > > > >> > > > > incompatibility (caused by new broker property)
> between
> > > > > brokers
> > > > > > >> with
> > > > > > >> > > > > different versions  will be solved there. In addition,
> > > > having
> > > > > > > >> rack
> > > > > > >> > in
> > > > > > >> > > > > broker property as meta data may also help consumers
> in
> > > the
> > > > > > >> future.
> > > > > > >> > So
> > > > > > >> > > I
> > > > > > >> > > > am
> > > > > > > >> > > > > open to adding rack property to broker.
> > > > > > >> > > > >
> > > > > > >> > > > > Hopefully we can discuss this in the next KIP hangout.
> > > > > > >> > > > >
> > > > > > >> > > > > On Wed, Sep 30, 2015 at 2:46 PM, Allen Wang <
> > > > > > allenxwang@gmail.com
> > > > > > >> >
> > > > > > >> > > > wrote:
> > > > > > >> > > > >
> > > > > > >> > > > >> Can you send me the information on the next KIP
> > hangout?
> > > > > > >> > > > >>
> > > > > > >> > > > >> Currently the broker-rack mapping is not cached. In
> > > > > KafkaApis,
> > > > > > >> > > > >> RackLocator.getRackInfo() is called each time the
> > mapping
> > > > is
> > > > > > >> needed
> > > > > > >> > > for
> > > > > > >> > > > >> auto topic creation. This will ensure latest mapping
> is
> > > > used
> > > > > at
> > > > > > >> any
> > > > > > >> > > > time.
> > > > > > >> > > > >>
> > > > > > >> > > > >> The ability to get the complete mapping makes it
> simple
> > > to
> > > > > > reuse
> > > > > > >> the
> > > > > > >> > > > same
> > > > > > >> > > > >> interface in command line tools.
> > > > > > >> > > > >>
> > > > > > >> > > > >>
> > > > > > >> > > > >> On Wed, Sep 30, 2015 at 11:01 AM, Aditya Auradkar <
> > > > > > >> > > > >> aauradkar@linkedin.com.invalid> wrote:



-- 
Thanks,
Neha

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Aditya Auradkar <aa...@linkedin.com>.
I think this sounds reasonable. Anyone else have comments?

Aditya

On Tue, Oct 27, 2015 at 5:23 PM, Allen Wang <al...@gmail.com> wrote:

> During the discussion in the hangout, it was mentioned that it would be
> desirable that consumers know the rack information of the brokers so that
> they can consume from the broker in the same rack to reduce latency. As I
> understand this will only be beneficial if consumer can consume from any
> broker in ISR, which is not possible now.
>
> I suggest we skip the change to TMR. Once the change is made to consumer to
> be able to consume from any broker in ISR, the rack information can be
> added to TMR.
>
> Another thing I want to confirm is  command line behavior. I think the
> desirable default behavior is to fail fast on command line for incomplete
> rack mapping. The error message can include further instruction that tells
> the user to add an extra argument (like "--allow-partial-rackinfo") to
> suppress the error and do an imperfect rack aware assignment. If the
> default behavior is to allow incomplete mapping, the error can still be
> easily missed.
>
> The affected command line tools are TopicCommand and
> ReassignPartitionsCommand.
>
> Thanks,
> Allen

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Allen Wang <al...@gmail.com>.
During the discussion in the hangout, it was mentioned that it would be
desirable for consumers to know the rack information of the brokers so that
they can consume from a broker in the same rack to reduce latency. As I
understand it, this will only be beneficial if a consumer can consume from
any broker in the ISR, which is not possible now.

I suggest we skip the change to TopicMetadataResponse (TMR) for now. Once
consumers are able to consume from any broker in the ISR, the rack
information can be added to TMR.

Another thing I want to confirm is the command line behavior. I think the
desirable default is to fail fast on the command line when the broker-to-rack
mapping is incomplete. The error message can tell the user to add an extra
argument (like "--allow-partial-rackinfo") to suppress the error and do an
imperfect rack aware assignment. If the default were to allow an incomplete
mapping, the problem could still be easily missed.

The affected command line tools are TopicCommand and
ReassignPartitionsCommand.
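
To make the proposed behavior concrete, here is a minimal sketch in Python
(not Kafka code). The function and exception names are hypothetical, and it
assumes that a cluster where no broker has a rack is not an error and simply
gets the default assignment. Only the "--allow-partial-rackinfo" flag name
comes from the proposal above.

```python
import argparse


class IncompleteRackMappingError(Exception):
    """Raised when only some brokers report a rack (hypothetical name)."""


def check_rack_mapping(broker_racks, allow_partial=False):
    """Return the sorted ids of brokers that have no rack configured.

    broker_racks maps broker id -> rack name, or None when unset.
    Assumption: if no broker has a rack at all, nothing is raised and the
    caller falls back to the default (not rack aware) assignment.
    """
    missing = sorted(b for b, rack in broker_racks.items() if not rack)
    partial = 0 < len(missing) < len(broker_racks)
    if partial and not allow_partial:
        # Fail fast, and tell the user how to override the check.
        raise IncompleteRackMappingError(
            "No rack configured for broker(s) %s. Set the rack on every "
            "broker, or pass --allow-partial-rackinfo to proceed with an "
            "imperfect rack aware assignment." % missing
        )
    return missing


def build_parser():
    """Command line parser exposing only the proposed override switch."""
    parser = argparse.ArgumentParser(prog="topic-tool-sketch")
    parser.add_argument(
        "--allow-partial-rackinfo",
        action="store_true",
        help="suppress the incomplete rack mapping error",
    )
    return parser
```

A tool would call check_rack_mapping(racks, args.allow_partial_rackinfo)
before computing the assignment, so the default path stops on a partial
mapping and the override has to be requested explicitly.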

Thanks,
Allen





On Mon, Oct 26, 2015 at 12:55 PM, Aditya Auradkar <aa...@linkedin.com>
wrote:

> Hi Allen,
>
> For TopicMetadataResponse to understand version, you can bump up the
> request version itself. Based on the version of the request, the response
> can be appropriately serialized. It shouldn't be a huge change. For
> example: We went through something similar for ProduceRequest recently (
> https://reviews.apache.org/r/33378/)
> I guess the reason protocol information is not included in the TMR is
> because the topic itself is independent of any particular protocol (SSL vs
> Plaintext). Having said that, I'm not sure we even need rack information in
> TMR. What usecase were you thinking of initially?
>
> For 1 - I'd be fine with adding an option to the command line tools that
> check rack assignment. For e.g. "--strict-assignment" or something similar.
>
> Aditya
>
> On Thu, Oct 22, 2015 at 6:44 PM, Allen Wang <al...@gmail.com> wrote:
>
> > For 2 and 3, I have updated the KIP. Please take a look. One thing I have
> > changed is removing the proposal to add rack to TopicMetadataResponse.
> The
> > reason is that unlike UpdateMetadataRequest, TopicMetadataResponse does
> not
> > understand version. I don't see a way to include rack without breaking
> old
> > version of clients. That's probably why secure protocol is not included
> in
> > the TopicMetadataResponse either. I think it will be a much bigger change
> > to include rack in TopicMetadataResponse.
> >
> > For 1, my concern is that doing rack aware assignment without complete
> > broker to rack mapping will result in assignment that is not rack aware
> and
> > fail to provide fault tolerance in the event of rack outage. This kind of
> > problem will be difficult to surface. And the cost of this problem is
> high:
> > you have to do partition reassignment if you are lucky to spot the
> problem
> > early on or face the consequence of data loss during real rack outage.
> >
> > I do see the concern of fail-fast as it might also cause data loss if
> > producer is not able produce the message due to topic creation failure.
> Is
> > it feasible to treat dynamic topic creation and command tools
> differently?
> > We allow dynamic topic creation with incomplete broker-rack mapping and
> > fail fast in command line. Another option is to let user determine the
> > behavior for command line. For example, by default fail fast in command
> > line but allow incomplete broker-rack mapping if another switch is
> > provided.
> >
> >
> >
> >
> > On Tue, Oct 20, 2015 at 10:05 AM, Aditya Auradkar <
> > aauradkar@linkedin.com.invalid> wrote:
> >
> > > Hey Allen,
> > >
> > > 1. If we choose fail fast topic creation, we will have topic creation
> > > failures while upgrading the cluster. I really doubt we want this
> > behavior.
> > > Ideally, this should be invisible to clients of a cluster. Currently,
> > each
> > > broker is effectively its own rack. So we probably can use the rack
> > > information whenever possible but not make it a hard requirement. To
> > extend
> > > Gwen's example, one badly configured broker should not degrade topic
> > > creation for the entire cluster.
> > >
> > > 2. Upgrade scenario - Can you add a section on the upgrade piece to
> > confirm
> > > that old clients will not see errors? I believe
> > ZookeeperConsumerConnector
> > > reads the Broker objects from ZK. I wanted to confirm that this will
> not
> > > cause any problems.
> > >
> > > 3. Could you elaborate your proposed changes to the
> UpdateMetadataRequest
> > > in the "Public Interfaces" section? Personally, I find this format easy
> > to
> > > read in terms of wire protocol changes:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicRequest
> > >
> > > Aditya
> > >
> > > On Fri, Oct 16, 2015 at 3:45 PM, Allen Wang <al...@gmail.com>
> > wrote:
> > >
> > > > KIP is updated include rack as an optional property for broker.
> Please
> > > take
> > > > a look and let me know if more details are needed.
> > > >
> > > > For the case where some brokers have rack and some do not, the
> current
> > > KIP
> > > > uses the fail-fast behavior. If there are concerns, we can further
> > > discuss
> > > > this in the email thread or next hangout.
> > > >
> > > >
> > > >
> > > > On Thu, Oct 15, 2015 at 10:42 AM, Allen Wang <al...@gmail.com>
> > > wrote:
> > > >
> > > > > That's a good question. I can think of three actions if the rack
> > > > > information is incomplete:
> > > > >
> > > > > 1. Treat the node without rack as if it is on its unique rack
> > > > > 2. Disregard all rack information and fallback to current algorithm
> > > > > 3. Fail-fast
> > > > >
> > > > > Now I think about it, one and three make more sense. The reason for
> > > > > fail-fast is that user mistake for not providing the rack may never
> > be
> > > > > found if we tolerate that and the assignment may not be rack aware
> as
> > > the
> > > > > user has expected and this creates debug problems when things fail.
> > > > >
> > > > > What do you think? If not fail-fast, is there anyway we can make
> the
> > > user
> > > > > error standing out?
> > > > >
> > > > >
> > > > > On Thu, Oct 15, 2015 at 10:17 AM, Gwen Shapira <gw...@confluent.io>
> > > > wrote:
> > > > >
> > > > >> Thanks! Just to clarify, when some brokers have rack assignment
> and
> > > some
> > > > >> don't, do we act like none of them have it? or like those without
> > > > >> assignment are in their own rack?
> > > > >>
> > > > >> The first scenario is good when first setting up rack-awareness,
> but
> > > the
> > > > >> second makes more sense for on-going maintenance (I can totally
> see
> > > > >> someone
> > > > >> adding a node and forgetting to set the rack property, we don't
> want
> > > > this
> > > > >> to change behavior for anything except the new node).
> > > > >>
> > > > >> What do you think?
> > > > >>
> > > > >> Gwen
> > > > >>
> > > > >> On Thu, Oct 15, 2015 at 10:13 AM, Allen Wang <
> allenxwang@gmail.com>
> > > > >> wrote:
> > > > >>
> > > > >> > For scenario 1:
> > > > >> >
> > > > >> > - Add the rack information to broker property file or
> dynamically
> > > set
> > > > >> it in
> > > > >> > the wrapper code to bootstrap Kafka server. You would do that
> for
> > > all
> > > > >> > brokers and restart the brokers one by one.
> > > > >> >
> > > > >> > In this scenario, the complete broker to rack mapping may not be
> > > > >> available
> > > > >> > until every broker is restarted. During that time we fall back
> to
> > > > >> default
> > > > >> > replica assignment algorithm.
> > > > >> >
> > > > >> > For scenario 2:
> > > > >> >
> > > > >> > - Add the rack information to broker property file or
> dynamically
> > > set
> > > > >> it in
> > > > >> > the wrapper code and start the broker.
> > > > >> >
> > > > >> >
> > > > >> > On Wed, Oct 14, 2015 at 2:36 PM, Gwen Shapira <
> gwen@confluent.io>
> > > > >> wrote:
> > > > >> >
> > > > >> > > Can you clarify the workflow for the following scenarios:
> > > > >> > >
> > > > >> > > 1. I currently have 6 brokers and want to add rack information
> > for
> > > > >> each
> > > > >> > > 2. I'm adding a new broker and I want to specify which rack it
> > > > >> belongs on
> > > > >> > > while adding it.
> > > > >> > >
> > > > >> > > Thanks!
> > > > >> > >
> > > > >> > > On Tue, Oct 13, 2015 at 2:21 PM, Allen Wang <
> > allenxwang@gmail.com
> > > >
> > > > >> > wrote:
> > > > >> > >
> > > > >> > > > We discussed the KIP in the hangout today. The
> recommendation
> > is
> > > > to
> > > > >> > make
> > > > >> > > > rack as a broker property in ZooKeeper. For users with
> > existing
> > > > rack
> > > > >> > > > information stored somewhere, they would need to retrieve
> the
> > > > >> > information
> > > > >> > > > at broker start up and dynamically set the rack property,
> > which
> > > > can
> > > > >> be
> > > > >> > > > implemented as a wrapper to bootstrap broker. There will be
> no
> > > > >> > interface
> > > > >> > > or
> > > > >> > > > pluggable implementation to retrieve the rack information.
> > > > >> > > >
> > > > >> > > > The assumption is that you always need to restart the broker
> > to
> > > > >> make a
> > > > >> > > > change to the rack.
> > > > >> > > >
> > > > >> > > > Once the rack becomes a broker property, it will be possible
> > to
> > > > make
> > > > >> > rack
> > > > >> > > > part of the meta data to help the consumer choose which in
> > sync
> > > > >> replica
> > > > >> > > to
> > > > >> > > > consume from as part of the future consumer enhancement.
> > > > >> > > >
> > > > >> > > > I will update the KIP.
> > > > >> > > >
> > > > >> > > > Thanks,
> > > > >> > > > Allen
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > On Thu, Oct 8, 2015 at 9:23 AM, Allen Wang <
> > > allenxwang@gmail.com>
> > > > >> > wrote:
> > > > >> > > >
> > > > >> > > > > I attended Tuesday's KIP hangout but this KIP was not
> > > discussed
> > > > >> due
> > > > >> > to
> > > > >> > > > > time constraint.
> > > > >> > > > >
> > > > >> > > > > However, after hearing discussion of KIP-35, I have the
> > > feeling
> > > > >> that
> > > > >> > > > > incompatibility (caused by new broker property) between
> > > brokers
> > > > >> with
> > > > >> > > > > different versions  will be solved there. In addition,
> > having
> > > > > >> rack
> > > > >> > in
> > > > >> > > > > broker property as meta data may also help consumers in
> the
> > > > >> future.
> > > > >> > So
> > > > >> > > I
> > > > >> > > > am
> > > > > >> > > > > open to adding rack property to broker.
> > > > >> > > > >
> > > > >> > > > > Hopefully we can discuss this in the next KIP hangout.
> > > > >> > > > >
> > > > >> > > > > On Wed, Sep 30, 2015 at 2:46 PM, Allen Wang <
> > > > allenxwang@gmail.com
> > > > >> >
> > > > >> > > > wrote:
> > > > >> > > > >
> > > > >> > > > >> Can you send me the information on the next KIP hangout?
> > > > >> > > > >>
> > > > >> > > > >> Currently the broker-rack mapping is not cached. In
> > > KafkaApis,
> > > > >> > > > >> RackLocator.getRackInfo() is called each time the mapping
> > is
> > > > >> needed
> > > > >> > > for
> > > > >> > > > >> auto topic creation. This will ensure latest mapping is
> > used
> > > at
> > > > >> any
> > > > >> > > > time.
> > > > >> > > > >>
> > > > >> > > > >> The ability to get the complete mapping makes it simple
> to
> > > > reuse
> > > > >> the
> > > > >> > > > same
> > > > >> > > > >> interface in command line tools.
> > > > >> > > > >>
> > > > >> > > > >>
> > > > >> > > > >> On Wed, Sep 30, 2015 at 11:01 AM, Aditya Auradkar <
> > > > >> > > > >> aauradkar@linkedin.com.invalid> wrote:
> > > > >> > > > >>
> > > > >> > > > >>> Perhaps we discuss this during the next KIP hangout?
> > > > >> > > > >>>
> > > > >> > > > >>> I do see that a pluggable rack locator can be useful
> but I
> > > do
> > > > >> see a
> > > > >> > > few
> > > > >> > > > >>> concerns:
> > > > >> > > > >>>
> > > > >> > > > >>> - The RackLocator (as described in the document),
> implies
> > > that
> > > > >> it
> > > > >> > can
> > > > >> > > > >>> discover rack information for any node in the cluster.
> How
> > > > does
> > > > >> it
> > > > >> > > deal
> > > > >> > > > >>> with rack location changes? For example, if I moved
> broker
> > > id
> > > > >> (1)
> > > > >> > > from
> > > > >> > > > >>> rack
> > > > >> > > > >>> X to Y, I only have to start that broker with a newer
> rack
> > > > >> config.
> > > > >> > If
> > > > >> > > > >>> RackLocator discovers broker -> rack information at
> start
> > up
> > > > >> time,
> > > > >> > > any
> > > > >> > > > >>> change to a broker will require bouncing the entire
> > cluster
> > > > >> since
> > > > >> > > > >>> createTopic requests can be sent to any node in the
> > cluster.
> > > > >> > > > >>> For this reason it may be simpler to have each node be
> > aware
> > > > of
> > > > >> its
> > > > >> > > own
> > > > >> > > > >>> rack and persist it in ZK during start up time.
> > > > >> > > > >>>
> > > > >> > > > >>> - A pluggable RackLocator relies on an external service
> > > being
> > > > >> > > available
> > > > >> > > > >>> to
> > > > >> > > > >>> serve rack information.
> > > > >> > > > >>>
> > > > >> > > > >>> Out of curiosity, I looked up how a couple of other
> > systems
> > > > deal
> > > > >> > with
> > > > >> > > > >>> zone/rack awareness.
> > > > >> > > > >>> For Cassandra some interesting modes are:
> > > > >> > > > >>> (Property File configuration)
> > > > >> > > > >>>
> > > > >> > > > >>>
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
> > > > >> > > > >>> (Dynamic inference)
> > > > >> > > > >>>
> > > > >> > > > >>>
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
> > > > >> > > > >>>
> > > > >> > > > >>> Voldemort does a static node -> zone assignment based on
> > > > >> > > configuration.
> > > > >> > > > >>>
> > > > >> > > > >>> Aditya
> > > > >> > > > >>>
> > > > >> > > > >>> On Wed, Sep 30, 2015 at 10:05 AM, Allen Wang <
> > > > >> allenxwang@gmail.com
> > > > >> > >
> > > > >> > > > >>> wrote:
> > > > >> > > > >>>
> > > > >> > > > >>> > I would like to see if we can do both:
> > > > >> > > > >>> >
> > > > >> > > > >>> > - Make RackLocator pluggable to facilitate migration
> > with
> > > > >> > existing
> > > > >> > > > >>> > broker-rack mapping
> > > > >> > > > >>> >
> > > > >> > > > >>> > - Make rack an optional property for broker. If rack
> is
> > > > >> available
> > > > >> > > > from
> > > > >> > > > >>> > broker, treat it as source of truth. For users with
> > > existing
> > > > >> > > > >>> broker-rack
> > > > >> > > > >>> > mapping somewhere else, they can use the pluggable way
> > or
> > > > they
> > > > >> > can
> > > > >> > > > >>> transfer
> > > > >> > > > >>> > the mapping to the broker rack property.
> > > > >> > > > >>> >
> > > > >> > > > >>> > One thing I am not sure is what happens at rolling
> > upgrade
> > > > >> when
> > > > >> > we
> > > > >> > > > have
> > > > >> > > > >>> > rack as a broker property. For brokers with older
> > version
> > > of
> > > > >> > Kafka,
> > > > >> > > > >>> will it
> > > > >> > > > >>> > cause problem for them? If so, is there any
> workaround?
> > I
> > > > also
> > > > >> > > think
> > > > >> > > > it
> > > > >> > > > >>> > would be better not to have rack in the controller
> wire
> > > > >> protocol
> > > > >> > > but
> > > > >> > > > >>> not
> > > > >> > > > >>> > sure if it is achievable.
> > > > >> > > > >>> >
> > > > >> > > > >>> > Thanks,
> > > > >> > > > >>> > Allen
> > > > >> > > > >>> >
> > > > >> > > > >>> >
> > > > >> > > > >>> >
> > > > >> > > > >>> >
> > > > >> > > > >>> >
> > > > >> > > > >>> > On Mon, Sep 28, 2015 at 4:55 PM, Todd Palino <
> > > > >> tpalino@gmail.com>
> > > > >> > > > >>> wrote:
> > > > >> > > > >>> >
> > > > >> > > > >>> > > I tend to like the idea of a pluggable locator. For
> > > > >> example, we
> > > > >> > > > >>> already
> > > > >> > > > >>> > > have an interface for discovering information about
> > the
> > > > >> > physical
> > > > >> > > > >>> location
> > > > >> > > > >>> > > of servers. I don't relish the idea of having to
> > > maintain
> > > > >> data
> > > > >> > in
> > > > >> > > > >>> > multiple
> > > > >> > > > >>> > > places.
> > > > >> > > > >>> > >
> > > > >> > > > >>> > > -Todd
> > > > >> > > > >>> > >
> > > > >> > > > >>> > > On Mon, Sep 28, 2015 at 4:48 PM, Aditya Auradkar <
> > > > >> > > > >>> > > aauradkar@linkedin.com.invalid> wrote:
> > > > >> > > > >>> > >
> > > > >> > > > >>> > > > Thanks for starting this KIP Allen.
> > > > >> > > > >>> > > >
> > > > >> > > > >>> > > > I agree with Gwen that having a RackLocator class
> > that
> > > > is
> > > > >> > > > pluggable
> > > > >> > > > >>> > seems
> > > > >> > > > >>> > > > to be too complex. The KIP refers to potentially
> > > non-ZK
> > > > >> > storage
> > > > >> > > > >>> for the
> > > > >> > > > >>> > > > rack info which I don't think is necessary.
> > > > >> > > > >>> > > >
> > > > >> > > > >>> > > > Perhaps we can persist this info in zk under
> > > > >> > > > >>> /brokers/ids/<broker_id>
> > > > >> > > > >>> > > > similar to other broker properties and add a
> config
> > in
> > > > >> > > > KafkaConfig
> > > > >> > > > >>> > called
> > > > >> > > > >>> > > > "rack".
> > > > >> > > > >>> > > >
> > > > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy,
> > > > >> > > "rack":
> > > > >> > > > >>> > "abc"}
> > > > >> > > > >>> > > >
> > > > >> > > > >>> > > > Aditya
> > > > >> > > > >>> > > >
> > > > >> > > > >>> > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira <
> > > > >> > > gwen@confluent.io
> > > > >> > > > >
> > > > >> > > > >>> > wrote:
> > > > >> > > > >>> > > >
> > > > >> > > > >>> > > > > Hi,
> > > > >> > > > >>> > > > >
> > > > >> > > > >>> > > > > First, thanks for putting out a KIP for this.
> This
> > > is
> > > > >> super
> > > > >> > > > >>> important
> > > > >> > > > >>> > > for
> > > > >> > > > >>> > > > > production deployments of Kafka.
> > > > >> > > > >>> > > > >
> > > > >> > > > >>> > > > > Few questions:
> > > > >> > > > >>> > > > >
> > > > >> > > > >>> > > > > 1) Are we sure we want "as many racks as
> > possible"?
> > > > I'd
> > > > >> > want
> > > > >> > > to
> > > > >> > > > >>> > balance
> > > > >> > > > >>> > > > > between safety (more racks) and network
> > utilization
> > > > >> > (traffic
> > > > >> > > > >>> within a
> > > > >> > > > >>> > > > rack
> > > > >> > > > >>> > > > > uses the high-bandwidth TOR switch). One replica
> > on
> > > a
> > > > >> > > different
> > > > >> > > > >>> rack
> > > > >> > > > >>> > > and
> > > > >> > > > >>> > > > > the rest on same rack (if possible) sounds
> better
> > to
> > > > me.
> > > > >> > > > >>> > > > >
> > > > >> > > > >>> > > > > 2) Rack-locator class seems overly complex
> > compared
> > > to
> > > > >> > > adding a
> > > > >> > > > >>> > > > rack.number
> > > > >> > > > >>> > > > > property to the broker properties file. Why do
> we
> > > want
> > > > >> > that?
> > > > >> > > > >>> > > > >
> > > > >> > > > >>> > > > > Gwen
> > > > >> > > > >>> > > > >
> > > > >> > > > >>> > > > >
> > > > >> > > > >>> > > > >
> > > > >> > > > >>> > > > > On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang <
> > > > >> > > > >>> allenxwang@gmail.com>
> > > > >> > > > >>> > > > wrote:
> > > > >> > > > >>> > > > >
> > > > >> > > > >>> > > > > > Hello Kafka Developers,
> > > > >> > > > >>> > > > > >
> > > > >> > > > >>> > > > > > I just created KIP-36 for rack aware replica
> > > > >> assignment.
> > > > >> > > > >>> > > > > >
> > > > >> > > > >>> > > > > >
> > > > >> > > > >>> > > > > >
> > > > >> > > > >>> > > > >
> > > > >> > > > >>> > > >
> > > > >> > > > >>> > >
> > > > >> > > > >>> >
> > > > >> > > > >>>
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
> > > > >> > > > >>> > > > > >
> > > > >> > > > >>> > > > > > The goal is to utilize the isolation provided
> by
> > > the
> > > > >> > racks
> > > > >> > > in
> > > > >> > > > >>> data
> > > > >> > > > >>> > > > center
> > > > >> > > > >>> > > > > > and distribute replicas to racks to provide
> > fault
> > > > >> > > tolerance.
> > > > >> > > > >>> > > > > >
> > > > >> > > > >>> > > > > > Comments are welcome.
> > > > >> > > > >>> > > > > >
> > > > >> > > > >>> > > > > > Thanks,
> > > > >> > > > >>> > > > > > Allen
> > > > >> > > > >>> > > > > >
> > > > >> > > > >>> > > > >
> > > > >> > > > >>> > > >
> > > > >> > > > >>> > >
> > > > >> > > > >>> >
> > > > >> > > > >>>
> > > > >> > > > >>
> > > > >> > > > >>
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Aditya Auradkar <aa...@linkedin.com>.
Hi Allen,

For TopicMetadataResponse to understand version, you can bump up the
request version itself. Based on the version of the request, the response
can be appropriately serialized. It shouldn't be a huge change. For
example, we went through something similar for ProduceRequest recently
(https://reviews.apache.org/r/33378/).
I guess the reason protocol information is not included in the TMR is
because the topic itself is independent of any particular protocol (SSL vs
Plaintext). Having said that, I'm not sure we even need rack information in
TMR. What use case were you thinking of initially?
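
A rough sketch of serializing a response based on the request version,
written in Python rather than Kafka's own code. The broker layout (int32
id, length-prefixed host, int32 port) is illustrative, and the rack field
in version 1 and the empty string for an unset rack are assumptions made
for the example, not a settled wire format.

```python
import struct


def _write_string(value):
    """Encode a string as a big-endian int16 length followed by UTF-8 bytes."""
    data = value.encode("utf-8")
    return struct.pack(">h", len(data)) + data


def serialize_broker(broker, request_version):
    """Serialize one broker entry for the given request version.

    Version 0 writes id, host and port only, so old clients see the bytes
    they already expect. Version 1 (hypothetical) appends the rack.
    """
    out = struct.pack(">i", broker["id"])
    out += _write_string(broker["host"])
    out += struct.pack(">i", broker["port"])
    if request_version >= 1:
        # Assumption: an unset rack is sent as an empty string.
        out += _write_string(broker.get("rack") or "")
    return out
```

Because the extra field is only written when the client asked with the
newer version, a version 0 client never receives bytes it cannot parse.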

For 1 - I'd be fine with adding an option to the command line tools that
checks rack assignment, e.g. "--strict-assignment" or something similar.

Aditya

On Thu, Oct 22, 2015 at 6:44 PM, Allen Wang <al...@gmail.com> wrote:

> For 2 and 3, I have updated the KIP. Please take a look. One thing I have
> changed is removing the proposal to add rack to TopicMetadataResponse. The
> reason is that unlike UpdateMetadataRequest, TopicMetadataResponse does not
> understand version. I don't see a way to include rack without breaking old
> version of clients. That's probably why secure protocol is not included in
> the TopicMetadataResponse either. I think it will be a much bigger change
> to include rack in TopicMetadataResponse.
>
> For 1, my concern is that doing rack aware assignment without complete
> broker to rack mapping will result in assignment that is not rack aware and
> fail to provide fault tolerance in the event of rack outage. This kind of
> problem will be difficult to surface. And the cost of this problem is high:
> you have to do partition reassignment if you are lucky to spot the problem
> early on or face the consequence of data loss during real rack outage.
>
> I do see the concern of fail-fast as it might also cause data loss if
> producer is not able produce the message due to topic creation failure. Is
> it feasible to treat dynamic topic creation and command tools differently?
> We allow dynamic topic creation with incomplete broker-rack mapping and
> fail fast in command line. Another option is to let user determine the
> behavior for command line. For example, by default fail fast in command
> line but allow incomplete broker-rack mapping if another switch is
> provided.
>
>
>
>
> On Tue, Oct 20, 2015 at 10:05 AM, Aditya Auradkar <
> aauradkar@linkedin.com.invalid> wrote:
>
> > Hey Allen,
> >
> > 1. If we choose fail fast topic creation, we will have topic creation
> > failures while upgrading the cluster. I really doubt we want this
> behavior.
> > Ideally, this should be invisible to clients of a cluster. Currently,
> each
> > broker is effectively its own rack. So we probably can use the rack
> > information whenever possible but not make it a hard requirement. To
> extend
> > Gwen's example, one badly configured broker should not degrade topic
> > creation for the entire cluster.
> >
> > 2. Upgrade scenario - Can you add a section on the upgrade piece to
> confirm
> > that old clients will not see errors? I believe
> ZookeeperConsumerConnector
> > reads the Broker objects from ZK. I wanted to confirm that this will not
> > cause any problems.
> >
> > 3. Could you elaborate your proposed changes to the UpdateMetadataRequest
> > in the "Public Interfaces" section? Personally, I find this format easy
> to
> > read in terms of wire protocol changes:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicRequest
> >
> > Aditya
> >
> > On Fri, Oct 16, 2015 at 3:45 PM, Allen Wang <al...@gmail.com>
> wrote:
> >
> > > KIP is updated include rack as an optional property for broker. Please
> > take
> > > a look and let me know if more details are needed.
> > >
> > > For the case where some brokers have rack and some do not, the current
> > KIP
> > > uses the fail-fast behavior. If there are concerns, we can further
> > discuss
> > > this in the email thread or next hangout.
> > >
> > >
> > >
> > > On Thu, Oct 15, 2015 at 10:42 AM, Allen Wang <al...@gmail.com>
> > wrote:
> > >
> > > > That's a good question. I can think of three actions if the rack
> > > > information is incomplete:
> > > >
> > > > 1. Treat the node without rack as if it is on its unique rack
> > > > 2. Disregard all rack information and fallback to current algorithm
> > > > 3. Fail-fast
> > > >
> > > > Now I think about it, one and three make more sense. The reason for
> > > > fail-fast is that user mistake for not providing the rack may never
> be
> > > > found if we tolerate that and the assignment may not be rack aware as
> > the
> > > > user has expected and this creates debug problems when things fail.
> > > >
> > > > What do you think? If not fail-fast, is there anyway we can make the
> > user
> > > > error standing out?
> > > >
> > > >
> > > > On Thu, Oct 15, 2015 at 10:17 AM, Gwen Shapira <gw...@confluent.io>
> > > wrote:
> > > >
> > > >> Thanks! Just to clarify, when some brokers have rack assignment and
> > some
> > > >> don't, do we act like none of them have it? or like those without
> > > >> assignment are in their own rack?
> > > >>
> > > >> The first scenario is good when first setting up rack-awareness, but
> > the
> > > >> second makes more sense for on-going maintenance (I can totally see
> > > >> someone
> > > >> adding a node and forgetting to set the rack property, we don't want
> > > this
> > > >> to change behavior for anything except the new node).
> > > >>
> > > >> What do you think?
> > > >>
> > > >> Gwen
> > > >>
> > > >> On Thu, Oct 15, 2015 at 10:13 AM, Allen Wang <al...@gmail.com>
> > > >> wrote:
> > > >>
> > > >> > For scenario 1:
> > > >> >
> > > >> > - Add the rack information to broker property file or dynamically
> > set
> > > >> it in
> > > >> > the wrapper code to bootstrap Kafka server. You would do that for
> > all
> > > >> > brokers and restart the brokers one by one.
> > > >> >
> > > >> > In this scenario, the complete broker to rack mapping may not be
> > > >> available
> > > >> > until every broker is restarted. During that time we fall back to
> > > >> default
> > > >> > replica assignment algorithm.
> > > >> >
> > > >> > For scenario 2:
> > > >> >
> > > >> > - Add the rack information to broker property file or dynamically
> > set
> > > >> it in
> > > >> > the wrapper code and start the broker.
> > > >> >
> > > >> >
> > > >> > On Wed, Oct 14, 2015 at 2:36 PM, Gwen Shapira <gw...@confluent.io>
> > > >> wrote:
> > > >> >
> > > >> > > Can you clarify the workflow for the following scenarios:
> > > >> > >
> > > >> > > 1. I currently have 6 brokers and want to add rack information
> for
> > > >> each
> > > >> > > 2. I'm adding a new broker and I want to specify which rack it
> > > >> belongs on
> > > >> > > while adding it.
> > > >> > >
> > > >> > > Thanks!
> > > >> > >
> > > >> > > On Tue, Oct 13, 2015 at 2:21 PM, Allen Wang <
> allenxwang@gmail.com
> > >
> > > >> > wrote:
> > > >> > >
> > > >> > > > We discussed the KIP in the hangout today. The recommendation
> is
> > > to
> > > >> > make
> > > >> > > > rack as a broker property in ZooKeeper. For users with
> existing
> > > rack
> > > >> > > > information stored somewhere, they would need to retrieve the
> > > >> > information
> > > >> > > > at broker start up and dynamically set the rack property,
> which
> > > can
> > > >> be
> > > >> > > > implemented as a wrapper to bootstrap broker. There will be no
> > > >> > interface
> > > >> > > or
> > > >> > > > pluggable implementation to retrieve the rack information.
> > > >> > > >
> > > >> > > > The assumption is that you always need to restart the broker
> to
> > > >> make a
> > > >> > > > change to the rack.
> > > >> > > >
> > > >> > > > Once the rack becomes a broker property, it will be possible
> to
> > > make
> > > >> > rack
> > > >> > > > part of the meta data to help the consumer choose which in
> sync
> > > >> replica
> > > >> > > to
> > > >> > > > consume from as part of the future consumer enhancement.
> > > >> > > >
> > > >> > > > I will update the KIP.
> > > >> > > >
> > > >> > > > Thanks,
> > > >> > > > Allen
> > > >> > > >
> > > >> > > >
> > > >> > > > On Thu, Oct 8, 2015 at 9:23 AM, Allen Wang <
> > allenxwang@gmail.com>
> > > >> > wrote:
> > > >> > > >
> > > >> > > > > I attended Tuesday's KIP hangout but this KIP was not
> > discussed
> > > >> due
> > > >> > to
> > > >> > > > > time constraint.
> > > >> > > > >
> > > >> > > > > However, after hearing discussion of KIP-35, I have the
> > feeling
> > > >> that
> > > >> > > > > incompatibility (caused by new broker property) between
> > brokers
> > > >> with
> > > >> > > > > different versions  will be solved there. In addition,
> having
> > > > >> rack
> > > >> > in
> > > >> > > > > broker property as meta data may also help consumers in the
> > > >> future.
> > > >> > So
> > > >> > > I
> > > >> > > > am
> > > > >> > > > > open to adding rack property to broker.
> > > >> > > > >
> > > >> > > > > Hopefully we can discuss this in the next KIP hangout.
> > > >> > > > >
> > > >> > > > > On Wed, Sep 30, 2015 at 2:46 PM, Allen Wang <
> > > allenxwang@gmail.com
> > > >> >
> > > >> > > > wrote:
> > > >> > > > >
> > > >> > > > >> Can you send me the information on the next KIP hangout?
> > > >> > > > >>
> > > >> > > > >> Currently the broker-rack mapping is not cached. In
> > KafkaApis,
> > > >> > > > >> RackLocator.getRackInfo() is called each time the mapping
> is
> > > >> needed
> > > >> > > for
> > > >> > > > >> auto topic creation. This will ensure latest mapping is
> used
> > at
> > > >> any
> > > >> > > > time.
> > > >> > > > >>
> > > >> > > > >> The ability to get the complete mapping makes it simple to
> > > reuse
> > > >> the
> > > >> > > > same
> > > >> > > > >> interface in command line tools.
> > > >> > > > >>
> > > >> > > > >>
> > > >> > > > >> On Wed, Sep 30, 2015 at 11:01 AM, Aditya Auradkar <
> > > >> > > > >> aauradkar@linkedin.com.invalid> wrote:
> > > >> > > > >>
> > > >> > > > >>> Perhaps we discuss this during the next KIP hangout?
> > > >> > > > >>>
> > > >> > > > >>> I do see that a pluggable rack locator can be useful but I
> > do
> > > >> see a
> > > >> > > few
> > > >> > > > >>> concerns:
> > > >> > > > >>>
> > > >> > > > >>> - The RackLocator (as described in the document), implies
> > that
> > > >> it
> > > >> > can
> > > >> > > > >>> discover rack information for any node in the cluster. How
> > > does
> > > >> it
> > > >> > > deal
> > > >> > > > >>> with rack location changes? For example, if I moved broker
> > id
> > > >> (1)
> > > >> > > from
> > > >> > > > >>> rack
> > > >> > > > >>> X to Y, I only have to start that broker with a newer rack
> > > >> config.
> > > >> > If
> > > >> > > > >>> RackLocator discovers broker -> rack information at start
> up
> > > >> time,
> > > >> > > any
> > > >> > > > >>> change to a broker will require bouncing the entire
> cluster
> > > >> since
> > > >> > > > >>> createTopic requests can be sent to any node in the
> cluster.
> > > >> > > > >>> For this reason it may be simpler to have each node be
> aware
> > > of
> > > >> its
> > > >> > > own
> > > >> > > > >>> rack and persist it in ZK during start up time.
> > > >> > > > >>>
> > > >> > > > >>> - A pluggable RackLocator relies on an external service
> > being
> > > >> > > available
> > > >> > > > >>> to
> > > >> > > > >>> serve rack information.
> > > >> > > > >>>
> > > >> > > > >>> Out of curiosity, I looked up how a couple of other
> systems
> > > deal
> > > >> > with
> > > >> > > > >>> zone/rack awareness.
> > > >> > > > >>> For Cassandra some interesting modes are:
> > > >> > > > >>> (Property File configuration)
> > > >> > > > >>>
> > > >> > > > >>>
> > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
> > > >> > > > >>> (Dynamic inference)
> > > >> > > > >>>
> > > >> > > > >>>
> > > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
> > > >> > > > >>>
> > > >> > > > >>> Voldemort does a static node -> zone assignment based on
> > > >> > > configuration.
> > > >> > > > >>>
> > > >> > > > >>> Aditya
> > > >> > > > >>>
> > > >> > > > >>> On Wed, Sep 30, 2015 at 10:05 AM, Allen Wang <
> > > >> allenxwang@gmail.com
> > > >> > >
> > > >> > > > >>> wrote:
> > > >> > > > >>>
> > > >> > > > >>> > I would like to see if we can do both:
> > > >> > > > >>> >
> > > >> > > > >>> > - Make RackLocator pluggable to facilitate migration
> with
> > > >> > existing
> > > >> > > > >>> > broker-rack mapping
> > > >> > > > >>> >
> > > >> > > > >>> > - Make rack an optional property for broker. If rack is
> > > >> available
> > > >> > > > from
> > > >> > > > >>> > broker, treat it as source of truth. For users with
> > existing
> > > >> > > > >>> broker-rack
> > > >> > > > >>> > mapping somewhere else, they can use the pluggable way
> or
> > > they
> > > >> > can
> > > >> > > > >>> transfer
> > > >> > > > >>> > the mapping to the broker rack property.
> > > >> > > > >>> >
> > > >> > > > >>> > One thing I am not sure is what happens at rolling
> upgrade
> > > >> when
> > > >> > we
> > > >> > > > have
> > > >> > > > >>> > rack as a broker property. For brokers with older
> version
> > of
> > > >> > Kafka,
> > > >> > > > >>> will it
> > > >> > > > >>> > cause problem for them? If so, is there any workaround?
> I
> > > also
> > > >> > > think
> > > >> > > > it
> > > >> > > > >>> > would be better not to have rack in the controller wire
> > > >> protocol
> > > >> > > but
> > > >> > > > >>> not
> > > >> > > > >>> > sure if it is achievable.
> > > >> > > > >>> >
> > > >> > > > >>> > Thanks,
> > > >> > > > >>> > Allen
> > > >> > > > >>> >
> > > >> > > > >>> >
> > > >> > > > >>> >
> > > >> > > > >>> >
> > > >> > > > >>> >
> > > >> > > > >>> > On Mon, Sep 28, 2015 at 4:55 PM, Todd Palino <
> > > >> tpalino@gmail.com>
> > > >> > > > >>> wrote:
> > > >> > > > >>> >
> > > >> > > > >>> > > I tend to like the idea of a pluggable locator. For
> > > >> example, we
> > > >> > > > >>> already
> > > >> > > > >>> > > have an interface for discovering information about
> the
> > > >> > physical
> > > >> > > > >>> location
> > > >> > > > >>> > > of servers. I don't relish the idea of having to
> > maintain
> > > >> data
> > > >> > in
> > > >> > > > >>> > multiple
> > > >> > > > >>> > > places.
> > > >> > > > >>> > >
> > > >> > > > >>> > > -Todd
> > > >> > > > >>> > >
> > > >> > > > >>> > > On Mon, Sep 28, 2015 at 4:48 PM, Aditya Auradkar <
> > > >> > > > >>> > > aauradkar@linkedin.com.invalid> wrote:
> > > >> > > > >>> > >
> > > >> > > > >>> > > > Thanks for starting this KIP Allen.
> > > >> > > > >>> > > >
> > > >> > > > >>> > > > I agree with Gwen that having a RackLocator class
> that
> > > is
> > > >> > > > pluggable
> > > >> > > > >>> > seems
> > > >> > > > >>> > > > to be too complex. The KIP refers to potentially
> > non-ZK
> > > >> > storage
> > > >> > > > >>> for the
> > > >> > > > >>> > > > rack info which I don't think is necessary.
> > > >> > > > >>> > > >
> > > >> > > > >>> > > > Perhaps we can persist this info in zk under
> > > >> > > > >>> /brokers/ids/<broker_id>
> > > >> > > > >>> > > > similar to other broker properties and add a config
> in
> > > >> > > > KafkaConfig
> > > >> > > > >>> > called
> > > >> > > > >>> > > > "rack".
> > > >> > > > >>> > > >
> > > >> > > > >>> > > > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy,"rack":"abc"}
> > > >> > > > >>> > > >
> > > >> > > > >>> > > > Aditya
> > > >> > > > >>> > > >
> > > >> > > > >>> > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira <
> > > >> > > gwen@confluent.io
> > > >> > > > >
> > > >> > > > >>> > wrote:
> > > >> > > > >>> > > >
> > > >> > > > >>> > > > > Hi,
> > > >> > > > >>> > > > >
> > > >> > > > >>> > > > > First, thanks for putting out a KIP for this. This
> > is
> > > >> super
> > > >> > > > >>> important
> > > >> > > > >>> > > for
> > > >> > > > >>> > > > > production deployments of Kafka.
> > > >> > > > >>> > > > >
> > > >> > > > >>> > > > > Few questions:
> > > >> > > > >>> > > > >
> > > >> > > > >>> > > > > 1) Are we sure we want "as many racks as
> possible"?
> > > I'd
> > > >> > want
> > > >> > > to
> > > >> > > > >>> > balance
> > > >> > > > >>> > > > > between safety (more racks) and network
> utilization
> > > >> > (traffic
> > > >> > > > >>> within a
> > > >> > > > >>> > > > rack
> > > >> > > > >>> > > > > uses the high-bandwidth TOR switch). One replica
> on
> > a
> > > >> > > different
> > > >> > > > >>> rack
> > > >> > > > >>> > > and
> > > >> > > > >>> > > > > the rest on same rack (if possible) sounds better
> to
> > > me.
> > > >> > > > >>> > > > >
> > > >> > > > >>> > > > > 2) Rack-locator class seems overly complex
> compared
> > to
> > > >> > > adding a
> > > >> > > > >>> > > > rack.number
> > > >> > > > >>> > > > > property to the broker properties file. Why do we
> > want
> > > >> > that?
> > > >> > > > >>> > > > >
> > > >> > > > >>> > > > > Gwen
> > > >> > > > >>> > > > >
> > > >> > > > >>> > > > >
> > > >> > > > >>> > > > >
> > > >> > > > >>> > > > > On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang <
> > > >> > > > >>> allenxwang@gmail.com>
> > > >> > > > >>> > > > wrote:
> > > >> > > > >>> > > > >
> > > >> > > > >>> > > > > > Hello Kafka Developers,
> > > >> > > > >>> > > > > >
> > > >> > > > >>> > > > > > I just created KIP-36 for rack aware replica
> > > >> assignment.
> > > >> > > > >>> > > > > >
> > > >> > > > >>> > > > > >
> > > >> > > > >>> > > > > >
> > > >> > > > >>> > > > >
> > > >> > > > >>> > > >
> > > >> > > > >>> > >
> > > >> > > > >>> >
> > > >> > > > >>>
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
> > > >> > > > >>> > > > > >
> > > >> > > > >>> > > > > > The goal is to utilize the isolation provided by
> > the
> > > >> > racks
> > > >> > > in
> > > >> > > > >>> data
> > > >> > > > >>> > > > center
> > > >> > > > >>> > > > > > and distribute replicas to racks to provide
> fault
> > > >> > > tolerance.
> > > >> > > > >>> > > > > >
> > > >> > > > >>> > > > > > Comments are welcome.
> > > >> > > > >>> > > > > >
> > > >> > > > >>> > > > > > Thanks,
> > > >> > > > >>> > > > > > Allen
> > > >> > > > >>> > > > > >

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Allen Wang <al...@gmail.com>.
For 2 and 3, I have updated the KIP. Please take a look. One thing I have
changed is removing the proposal to add rack to TopicMetadataResponse. The
reason is that, unlike UpdateMetadataRequest, TopicMetadataResponse does not
understand versions. I don't see a way to include rack without breaking older
clients. That's probably why the security protocol is not included in
TopicMetadataResponse either. I think including rack in
TopicMetadataResponse would be a much bigger change.

For 1, my concern is that doing rack aware assignment without a complete
broker to rack mapping will result in an assignment that is not rack aware
and fails to provide fault tolerance in the event of a rack outage. This kind
of problem will be difficult to surface, and its cost is high: either you do
a partition reassignment if you are lucky enough to spot the problem early
on, or you face the consequence of data loss during a real rack outage.
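
To make the failure mode above concrete, here is a minimal sketch of a check
that would surface it. RackSpreadCheck and survivesRackOutage are hypothetical
names used only for illustration; nothing like this is part of Kafka or the
KIP. It assumes broker ids map to rack names and that a broker missing from
the map has an unknown rack.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper (not a Kafka API): checks the rack spread of one partition. */
final class RackSpreadCheck {
    private RackSpreadCheck() {}

    /**
     * Returns true only if the replicas are known to sit on at least two
     * distinct racks, so that losing any single rack leaves a replica alive.
     * Replicas whose rack is unknown (absent from the map) are ignored,
     * because their placement cannot be confirmed.
     */
    static boolean survivesRackOutage(List<Integer> replicas, Map<Integer, String> brokerToRack) {
        Set<String> knownRacks = new HashSet<>();
        for (int broker : replicas) {
            String rack = brokerToRack.get(broker);
            if (rack != null) {
                knownRacks.add(rack);
            }
        }
        return knownRacks.size() >= 2;
    }
}
```

With an incomplete mapping, an assignment such as replicas [0, 1] where only
broker 0 has a rack cannot be confirmed as rack aware, which is exactly the
case that is hard to notice without an explicit check.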

I do see the concern with fail-fast, as it might also cause data loss if the
producer is not able to produce messages due to a topic creation failure. Is
it feasible to treat dynamic topic creation and command line tools
differently? We would allow dynamic topic creation with an incomplete
broker-rack mapping and fail fast in the command line. Another option is to
let the user determine the behavior for the command line: for example, fail
fast by default, but allow an incomplete broker-rack mapping if another
switch is provided.
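
The two behaviors could be sketched roughly as below. This is only an
illustration under assumptions: RackMappingPolicy, Mode and resolve are
hypothetical names, and falling back to the rack-unaware algorithm in the
lenient case is an assumption borrowed from the earlier rolling-restart
discussion, not something the KIP specifies.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical sketch (not a Kafka API) of the two proposed behaviors. */
final class RackMappingPolicy {
    /** FAIL_FAST for command line tools by default, ALLOW_INCOMPLETE for auto topic creation. */
    enum Mode { FAIL_FAST, ALLOW_INCOMPLETE }

    private RackMappingPolicy() {}

    /**
     * Returns the broker -> rack mapping to use for rack aware assignment, or
     * Optional.empty() when the caller should use the rack-unaware algorithm.
     * Throws only when the mapping is partial and the mode is FAIL_FAST.
     */
    static Optional<Map<Integer, String>> resolve(Map<Integer, Optional<String>> brokers, Mode mode) {
        Map<Integer, String> known = new TreeMap<>();
        for (Map.Entry<Integer, Optional<String>> e : brokers.entrySet()) {
            e.getValue().ifPresent(rack -> known.put(e.getKey(), rack));
        }
        if (known.isEmpty()) {
            return Optional.empty();      // no broker has a rack: rack awareness is not in use
        }
        if (known.size() == brokers.size()) {
            return Optional.of(known);    // complete mapping
        }
        if (mode == Mode.FAIL_FAST) {
            throw new IllegalStateException(
                "Only " + known.size() + " of " + brokers.size() + " brokers have a rack configured");
        }
        return Optional.empty();          // partial mapping tolerated: fall back (assumption)
    }
}
```

A command line tool would call resolve with FAIL_FAST unless the extra switch
is given, while auto topic creation would always pass ALLOW_INCOMPLETE.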




On Tue, Oct 20, 2015 at 10:05 AM, Aditya Auradkar <
aauradkar@linkedin.com.invalid> wrote:

> Hey Allen,
>
> 1. If we choose fail fast topic creation, we will have topic creation
> failures while upgrading the cluster. I really doubt we want this behavior.
> Ideally, this should be invisible to clients of a cluster. Currently, each
> broker is effectively its own rack. So we probably can use the rack
> information whenever possible but not make it a hard requirement. To extend
> Gwen's example, one badly configured broker should not degrade topic
> creation for the entire cluster.
>
> 2. Upgrade scenario - Can you add a section on the upgrade piece to confirm
> that old clients will not see errors? I believe ZookeeperConsumerConnector
> reads the Broker objects from ZK. I wanted to confirm that this will not
> cause any problems.
>
> 3. Could you elaborate your proposed changes to the UpdateMetadataRequest
> in the "Public Interfaces" section? Personally, I find this format easy to
> read in terms of wire protocol changes:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicRequest
>
> Aditya
>
> On Fri, Oct 16, 2015 at 3:45 PM, Allen Wang <al...@gmail.com> wrote:
>
> > KIP is updated to include rack as an optional property for broker. Please
> > take a look and let me know if more details are needed.
> >
> > For the case where some brokers have rack and some do not, the current
> KIP
> > uses the fail-fast behavior. If there are concerns, we can further
> discuss
> > this in the email thread or next hangout.
> >
> >
> >
> > On Thu, Oct 15, 2015 at 10:42 AM, Allen Wang <al...@gmail.com>
> wrote:
> >
> > > That's a good question. I can think of three actions if the rack
> > > information is incomplete:
> > >
> > > 1. Treat the node without rack as if it is on its unique rack
> > > 2. Disregard all rack information and fallback to current algorithm
> > > 3. Fail-fast
> > >
> > > Now I think about it, one and three make more sense. The reason for
> > > fail-fast is that user mistake for not providing the rack may never be
> > > found if we tolerate that and the assignment may not be rack aware as
> the
> > > user has expected and this creates debug problems when things fail.
> > >
> > > What do you think? If not fail-fast, is there any way we can make the
> > > user error stand out?
> > >
> > > On Thu, Oct 15, 2015 at 10:17 AM, Gwen Shapira <gw...@confluent.io>
> > wrote:
> > >
> > >> Thanks! Just to clarify, when some brokers have rack assignment and
> some
> > >> don't, do we act like none of them have it? or like those without
> > >> assignment are in their own rack?
> > >>
> > >> The first scenario is good when first setting up rack-awareness, but
> the
> > >> second makes more sense for on-going maintenance (I can totally see
> > >> someone
> > >> adding a node and forgetting to set the rack property, we don't want
> > this
> > >> to change behavior for anything except the new node).
> > >>
> > >> What do you think?
> > >>
> > >> Gwen
> > >>
> > >> On Thu, Oct 15, 2015 at 10:13 AM, Allen Wang <al...@gmail.com>
> > >> wrote:
> > >>
> > >> > For scenario 1:
> > >> >
> > >> > - Add the rack information to broker property file or dynamically
> set
> > >> it in
> > >> > the wrapper code to bootstrap Kafka server. You would do that for
> all
> > >> > brokers and restart the brokers one by one.
> > >> >
> > >> > In this scenario, the complete broker to rack mapping may not be
> > >> available
> > >> > until every broker is restarted. During that time we fall back to
> > >> default
> > >> > replica assignment algorithm.
> > >> >
> > >> > For scenario 2:
> > >> >
> > >> > - Add the rack information to broker property file or dynamically
> set
> > >> it in
> > >> > the wrapper code and start the broker.
> > >> >
> > >> >
> > >> > On Wed, Oct 14, 2015 at 2:36 PM, Gwen Shapira <gw...@confluent.io>
> > >> wrote:
> > >> >
> > >> > > Can you clarify the workflow for the following scenarios:
> > >> > >
> > >> > > 1. I currently have 6 brokers and want to add rack information for
> > >> each
> > >> > > 2. I'm adding a new broker and I want to specify which rack it
> > >> belongs on
> > >> > > while adding it.
> > >> > >
> > >> > > Thanks!
> > >> > >
> > >> > > On Tue, Oct 13, 2015 at 2:21 PM, Allen Wang <allenxwang@gmail.com
> >
> > >> > wrote:
> > >> > >
> > >> > > > We discussed the KIP in the hangout today. The recommendation is
> > to
> > >> > make
> > >> > > > rack as a broker property in ZooKeeper. For users with existing
> > rack
> > >> > > > information stored somewhere, they would need to retrieve the
> > >> > information
> > >> > > > at broker start up and dynamically set the rack property, which
> > can
> > >> be
> > >> > > > implemented as a wrapper to bootstrap broker. There will be no
> > >> > interface
> > >> > > or
> > >> > > > pluggable implementation to retrieve the rack information.
> > >> > > >
> > >> > > > The assumption is that you always need to restart the broker to
> > >> make a
> > >> > > > change to the rack.
> > >> > > >
> > >> > > > Once the rack becomes a broker property, it will be possible to
> > make
> > >> > rack
> > >> > > > part of the meta data to help the consumer choose which in sync
> > >> replica
> > >> > > to
> > >> > > > consume from as part of the future consumer enhancement.
> > >> > > >
> > >> > > > I will update the KIP.
> > >> > > >
> > >> > > > Thanks,
> > >> > > > Allen
> > >> > > >
> > >> > > >
> > >> > > > On Thu, Oct 8, 2015 at 9:23 AM, Allen Wang <
> allenxwang@gmail.com>
> > >> > wrote:
> > >> > > >
> > >> > > > > I attended Tuesday's KIP hangout but this KIP was not
> discussed
> > >> due
> > >> > to
> > >> > > > > time constraint.
> > >> > > > >
> > >> > > > > However, after hearing discussion of KIP-35, I have the
> feeling
> > >> that
> > >> > > > > incompatibility (caused by new broker property) between
> brokers
> > >> with
> > >> > > > > different versions  will be solved there. In addition, having
> > >> rack
> > >> > in
> > >> > > > > broker property as meta data may also help consumers in the
> > >> future.
> > >> > So
> > >> > > I
> > >> > > > am
> > >> > > > > open to adding rack property to broker.
> > >> > > > >
> > >> > > > > Hopefully we can discuss this in the next KIP hangout.
> > >> > > > >
> > >> > > > > On Wed, Sep 30, 2015 at 2:46 PM, Allen Wang <
> > allenxwang@gmail.com
> > >> >
> > >> > > > wrote:
> > >> > > > >
> > >> > > > >> Can you send me the information on the next KIP hangout?
> > >> > > > >>
> > >> > > > >> Currently the broker-rack mapping is not cached. In
> KafkaApis,
> > >> > > > >> RackLocator.getRackInfo() is called each time the mapping is
> > >> needed
> > >> > > for
> > >> > > > >> auto topic creation. This will ensure latest mapping is used
> at
> > >> any
> > >> > > > time.
> > >> > > > >>
> > >> > > > >> The ability to get the complete mapping makes it simple to
> > reuse
> > >> the
> > >> > > > same
> > >> > > > >> interface in command line tools.
> > >> > > > >>
> > >> > > > >>
> > >> > > > >> On Wed, Sep 30, 2015 at 11:01 AM, Aditya Auradkar <
> > >> > > > >> aauradkar@linkedin.com.invalid> wrote:
> > >> > > > >>
> > >> > > > >>> Perhaps we discuss this during the next KIP hangout?
> > >> > > > >>>
> > >> > > > >>> I do see that a pluggable rack locator can be useful but I
> do
> > >> see a
> > >> > > few
> > >> > > > >>> concerns:
> > >> > > > >>>
> > >> > > > >>> - The RackLocator (as described in the document), implies
> that
> > >> it
> > >> > can
> > >> > > > >>> discover rack information for any node in the cluster. How
> > does
> > >> it
> > >> > > deal
> > >> > > > >>> with rack location changes? For example, if I moved broker
> id
> > >> (1)
> > >> > > from
> > >> > > > >>> rack
> > >> > > > >>> X to Y, I only have to start that broker with a newer rack
> > >> config.
> > >> > If
> > >> > > > >>> RackLocator discovers broker -> rack information at start up
> > >> time,
> > >> > > any
> > >> > > > >>> change to a broker will require bouncing the entire cluster
> > >> since
> > >> > > > >>> createTopic requests can be sent to any node in the cluster.
> > >> > > > >>> For this reason it may be simpler to have each node be aware
> > of
> > >> its
> > >> > > own
> > >> > > > >>> rack and persist it in ZK during start up time.
> > >> > > > >>>
> > >> > > > >>> - A pluggable RackLocator relies on an external service
> being
> > >> > > available
> > >> > > > >>> to
> > >> > > > >>> serve rack information.
> > >> > > > >>>
> > >> > > > >>> Out of curiosity, I looked up how a couple of other systems
> > deal
> > >> > with
> > >> > > > >>> zone/rack awareness.
> > >> > > > >>> For Cassandra some interesting modes are:
> > >> > > > >>> (Property File configuration)
> > >> > > > >>>
> > >> > > > >>>
> > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
> > >> > > > >>> (Dynamic inference)
> > >> > > > >>>
> > >> > > > >>>
> > >> > > > >>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
> > >> > > > >>>
> > >> > > > >>> Voldemort does a static node -> zone assignment based on
> > >> > > configuration.
> > >> > > > >>>
> > >> > > > >>> Aditya
> > >> > > > >>>
> > >> > > > >>> On Wed, Sep 30, 2015 at 10:05 AM, Allen Wang <
> > >> allenxwang@gmail.com
> > >> > >
> > >> > > > >>> wrote:
> > >> > > > >>>
> > >> > > > >>> > I would like to see if we can do both:
> > >> > > > >>> >
> > >> > > > >>> > - Make RackLocator pluggable to facilitate migration with
> > >> > existing
> > >> > > > >>> > broker-rack mapping
> > >> > > > >>> >
> > >> > > > >>> > - Make rack an optional property for broker. If rack is
> > >> available
> > >> > > > from
> > >> > > > >>> > broker, treat it as source of truth. For users with
> existing
> > >> > > > >>> broker-rack
> > >> > > > >>> > mapping somewhere else, they can use the pluggable way or
> > they
> > >> > can
> > >> > > > >>> transfer
> > >> > > > >>> > the mapping to the broker rack property.
> > >> > > > >>> >
> > >> > > > >>> > One thing I am not sure is what happens at rolling upgrade
> > >> when
> > >> > we
> > >> > > > have
> > >> > > > >>> > rack as a broker property. For brokers with older version
> of
> > >> > Kafka,
> > >> > > > >>> will it
> > >> > > > >>> > cause problem for them? If so, is there any workaround? I
> > also
> > >> > > think
> > >> > > > it
> > >> > > > >>> > would be better not to have rack in the controller wire
> > >> protocol
> > >> > > but
> > >> > > > >>> not
> > >> > > > >>> > sure if it is achievable.
> > >> > > > >>> >
> > >> > > > >>> > Thanks,
> > >> > > > >>> > Allen
> > >> > > > >>> >
> > >> > > > >>> >
> > >> > > > >>> >
> > >> > > > >>> >
> > >> > > > >>> >
> > >> > > > >>> > On Mon, Sep 28, 2015 at 4:55 PM, Todd Palino <
> > >> tpalino@gmail.com>
> > >> > > > >>> wrote:
> > >> > > > >>> >
> > >> > > > >>> > > I tend to like the idea of a pluggable locator. For
> > >> example, we
> > >> > > > >>> already
> > >> > > > >>> > > have an interface for discovering information about the
> > >> > physical
> > >> > > > >>> location
> > >> > > > >>> > > of servers. I don't relish the idea of having to
> maintain
> > >> data
> > >> > in
> > >> > > > >>> > multiple
> > >> > > > >>> > > places.
> > >> > > > >>> > >
> > >> > > > >>> > > -Todd
> > >> > > > >>> > >
> > >> > > > >>> > > On Mon, Sep 28, 2015 at 4:48 PM, Aditya Auradkar <
> > >> > > > >>> > > aauradkar@linkedin.com.invalid> wrote:
> > >> > > > >>> > >
> > >> > > > >>> > > > Thanks for starting this KIP Allen.
> > >> > > > >>> > > >
> > >> > > > >>> > > > I agree with Gwen that having a RackLocator class that
> > is
> > >> > > > pluggable
> > >> > > > >>> > seems
> > >> > > > >>> > > > to be too complex. The KIP refers to potentially
> non-ZK
> > >> > storage
> > >> > > > >>> for the
> > >> > > > >>> > > > rack info which I don't think is necessary.
> > >> > > > >>> > > >
> > >> > > > >>> > > > Perhaps we can persist this info in zk under
> > >> > > > >>> /brokers/ids/<broker_id>
> > >> > > > >>> > > > similar to other broker properties and add a config in
> > >> > > > KafkaConfig
> > >> > > > >>> > called
> > >> > > > >>> > > > "rack".
> > >> > > > >>> > > >
> > >> > > > >>> > > > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy,"rack":"abc"}
> > >> > > > >>> > > >
> > >> > > > >>> > > > Aditya
> > >> > > > >>> > > >
> > >> > > > >>> > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira <
> > >> > > gwen@confluent.io
> > >> > > > >
> > >> > > > >>> > wrote:
> > >> > > > >>> > > >
> > >> > > > >>> > > > > Hi,
> > >> > > > >>> > > > >
> > >> > > > >>> > > > > First, thanks for putting out a KIP for this. This
> is
> > >> super
> > >> > > > >>> important
> > >> > > > >>> > > for
> > >> > > > >>> > > > > production deployments of Kafka.
> > >> > > > >>> > > > >
> > >> > > > >>> > > > > Few questions:
> > >> > > > >>> > > > >
> > >> > > > >>> > > > > 1) Are we sure we want "as many racks as possible"?
> > I'd
> > >> > want
> > >> > > to
> > >> > > > >>> > balance
> > >> > > > >>> > > > > between safety (more racks) and network utilization
> > >> > (traffic
> > >> > > > >>> within a
> > >> > > > >>> > > > rack
> > >> > > > >>> > > > > uses the high-bandwidth TOR switch). One replica on
> a
> > >> > > different
> > >> > > > >>> rack
> > >> > > > >>> > > and
> > >> > > > >>> > > > > the rest on same rack (if possible) sounds better to
> > me.
> > >> > > > >>> > > > >
> > >> > > > >>> > > > > 2) Rack-locator class seems overly complex compared
> to
> > >> > > adding a
> > >> > > > >>> > > > rack.number
> > >> > > > >>> > > > > property to the broker properties file. Why do we
> want
> > >> > that?
> > >> > > > >>> > > > >
> > >> > > > >>> > > > > Gwen
> > >> > > > >>> > > > >
> > >> > > > >>> > > > >
> > >> > > > >>> > > > >
> > >> > > > >>> > > > > On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang <
> > >> > > > >>> allenxwang@gmail.com>
> > >> > > > >>> > > > wrote:
> > >> > > > >>> > > > >
> > >> > > > >>> > > > > > Hello Kafka Developers,
> > >> > > > >>> > > > > >
> > >> > > > >>> > > > > > I just created KIP-36 for rack aware replica
> > >> assignment.
> > >> > > > >>> > > > > >
> > >> > > > >>> > > > > >
> > >> > > > >>> > > > > >
> > >> > > > >>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
> > >> > > > >>> > > > > >
> > >> > > > >>> > > > > > The goal is to utilize the isolation provided by
> the
> > >> > racks
> > >> > > in
> > >> > > > >>> data
> > >> > > > >>> > > > center
> > >> > > > >>> > > > > > and distribute replicas to racks to provide fault
> > >> > > tolerance.
> > >> > > > >>> > > > > >
> > >> > > > >>> > > > > > Comments are welcome.
> > >> > > > >>> > > > > >
> > >> > > > >>> > > > > > Thanks,
> > >> > > > >>> > > > > > Allen
> > >> > > > >>> > > > > >

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Aditya Auradkar <aa...@linkedin.com.INVALID>.
Hey Allen,

1. If we choose fail-fast topic creation, we will have topic creation
failures while upgrading the cluster. I really doubt we want this behavior.
Ideally, this should be invisible to clients of a cluster. Currently, each
broker is effectively its own rack. So we probably can use the rack
information whenever possible but not make it a hard requirement. To extend
Gwen's example, one badly configured broker should not degrade topic
creation for the entire cluster.
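
One way to read "each broker is effectively its own rack" is sketched below.
This is an illustration only: ImplicitRacks, withImplicitRacks and the
"__broker-" prefix are hypothetical and do not come from the KIP. A broker
without a configured rack is given a synthetic rack of its own, so a single
misconfigured broker does not block or change assignment for the others.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical sketch (not a Kafka API): brokers without a rack become their own rack. */
final class ImplicitRacks {
    private ImplicitRacks() {}

    /**
     * Returns a complete broker -> rack mapping. Brokers with a configured
     * rack keep it; every other broker gets a synthetic rack name derived
     * from its id. The "__broker-" prefix is an assumption and could collide
     * with a real rack name if users chose the same string.
     */
    static Map<Integer, String> withImplicitRacks(Map<Integer, Optional<String>> brokers) {
        Map<Integer, String> result = new TreeMap<>();
        for (Map.Entry<Integer, Optional<String>> e : brokers.entrySet()) {
            int id = e.getKey();
            result.put(id, e.getValue().orElse("__broker-" + id));
        }
        return result;
    }
}
```

The trade-off is the one raised earlier in the thread: a forgotten rack
setting no longer fails loudly, so it would need to be surfaced some other
way, for example in a warning.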

2. Upgrade scenario - Can you add a section on the upgrade piece to confirm
that old clients will not see errors? I believe ZookeeperConsumerConnector
reads the Broker objects from ZK. I wanted to confirm that this will not
cause any problems.

3. Could you elaborate on your proposed changes to the UpdateMetadataRequest
in the "Public Interfaces" section? Personally, I find this format easy to
read in terms of wire protocol changes:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicRequest

Aditya

On Fri, Oct 16, 2015 at 3:45 PM, Allen Wang <al...@gmail.com> wrote:

> KIP is updated to include rack as an optional property for broker. Please
> take a look and let me know if more details are needed.
>
> For the case where some brokers have rack and some do not, the current KIP
> uses the fail-fast behavior. If there are concerns, we can further discuss
> this in the email thread or next hangout.
>
>
>
> On Thu, Oct 15, 2015 at 10:42 AM, Allen Wang <al...@gmail.com> wrote:
>
> > That's a good question. I can think of three actions if the rack
> > information is incomplete:
> >
> > 1. Treat the node without rack as if it is on its unique rack
> > 2. Disregard all rack information and fallback to current algorithm
> > 3. Fail-fast
> >
> > Now I think about it, one and three make more sense. The reason for
> > fail-fast is that user mistake for not providing the rack may never be
> > found if we tolerate that and the assignment may not be rack aware as the
> > user has expected and this creates debug problems when things fail.
> >
> > What do you think? If not fail-fast, is there any way we can make the
> > user error stand out?
> >
> >
> > On Thu, Oct 15, 2015 at 10:17 AM, Gwen Shapira <gw...@confluent.io>
> wrote:
> >
> >> Thanks! Just to clarify, when some brokers have rack assignment and some
> >> don't, do we act like none of them have it? or like those without
> >> assignment are in their own rack?
> >>
> >> The first scenario is good when first setting up rack-awareness, but the
> >> second makes more sense for on-going maintenance (I can totally see
> >> someone
> >> adding a node and forgetting to set the rack property, we don't want
> this
> >> to change behavior for anything except the new node).
> >>
> >> What do you think?
> >>
> >> Gwen
> >>
> >> On Thu, Oct 15, 2015 at 10:13 AM, Allen Wang <al...@gmail.com>
> >> wrote:
> >>
> >> > For scenario 1:
> >> >
> >> > - Add the rack information to broker property file or dynamically set
> >> it in
> >> > the wrapper code to bootstrap Kafka server. You would do that for all
> >> > brokers and restart the brokers one by one.
> >> >
> >> > In this scenario, the complete broker to rack mapping may not be
> >> available
> >> > until every broker is restarted. During that time we fall back to
> >> default
> >> > replica assignment algorithm.
> >> >
> >> > For scenario 2:
> >> >
> >> > - Add the rack information to broker property file or dynamically set
> >> it in
> >> > the wrapper code and start the broker.
> >> >
> >> >
> >> > On Wed, Oct 14, 2015 at 2:36 PM, Gwen Shapira <gw...@confluent.io>
> >> wrote:
> >> >
> >> > > Can you clarify the workflow for the following scenarios:
> >> > >
> >> > > 1. I currently have 6 brokers and want to add rack information for
> >> each
> >> > > 2. I'm adding a new broker and I want to specify which rack it
> >> belongs on
> >> > > while adding it.
> >> > >
> >> > > Thanks!
> >> > >
> >> > > On Tue, Oct 13, 2015 at 2:21 PM, Allen Wang <al...@gmail.com>
> >> > wrote:
> >> > >
> >> > > > We discussed the KIP in the hangout today. The recommendation is
> to
> >> > make
> >> > > > rack as a broker property in ZooKeeper. For users with existing
> rack
> >> > > > information stored somewhere, they would need to retrieve the
> >> > information
> >> > > > at broker start up and dynamically set the rack property, which
> can
> >> be
> >> > > > implemented as a wrapper to bootstrap broker. There will be no
> >> > interface
> >> > > or
> >> > > > pluggable implementation to retrieve the rack information.
> >> > > >
> >> > > > The assumption is that you always need to restart the broker to
> >> make a
> >> > > > change to the rack.
> >> > > >
> >> > > > Once the rack becomes a broker property, it will be possible to
> make
> >> > rack
> >> > > > part of the meta data to help the consumer choose which in sync
> >> replica
> >> > > to
> >> > > > consume from as part of the future consumer enhancement.
> >> > > >
> >> > > > I will update the KIP.
> >> > > >
> >> > > > Thanks,
> >> > > > Allen
> >> > > >
> >> > > >
> >> > > > On Thu, Oct 8, 2015 at 9:23 AM, Allen Wang <al...@gmail.com>
> >> > wrote:
> >> > > >
> >> > > > > I attended Tuesday's KIP hangout but this KIP was not discussed
> >> due
> >> > to
> >> > > > > time constraint.
> >> > > > >
> >> > > > > However, after hearing discussion of KIP-35, I have the feeling
> >> that
> >> > > > > incompatibility (caused by new broker property) between brokers
> >> with
> >> > > > > different versions  will be solved there. In addition, having
> >> rack
> >> > in
> >> > > > > broker property as meta data may also help consumers in the
> >> future.
> >> > So
> >> > > I
> >> > > > am
> >> > > > > open to adding rack property to broker.
> >> > > > >
> >> > > > > Hopefully we can discuss this in the next KIP hangout.
> >> > > > >
> >> > > > > On Wed, Sep 30, 2015 at 2:46 PM, Allen Wang <
> allenxwang@gmail.com
> >> >
> >> > > > wrote:
> >> > > > >
> >> > > > >> Can you send me the information on the next KIP hangout?
> >> > > > >>
> >> > > > >> Currently the broker-rack mapping is not cached. In KafkaApis,
> >> > > > >> RackLocator.getRackInfo() is called each time the mapping is
> >> needed
> >> > > for
> >> > > > >> auto topic creation. This will ensure latest mapping is used at
> >> any
> >> > > > time.
> >> > > > >>
> >> > > > >> The ability to get the complete mapping makes it simple to
> reuse
> >> the
> >> > > > same
> >> > > > >> interface in command line tools.
> >> > > > >>
> >> > > > >>
> >> > > > >> On Wed, Sep 30, 2015 at 11:01 AM, Aditya Auradkar <
> >> > > > >> aauradkar@linkedin.com.invalid> wrote:
> >> > > > >>
> >> > > > >>> Perhaps we discuss this during the next KIP hangout?
> >> > > > >>>
> >> > > > >>> I do see that a pluggable rack locator can be useful but I do
> >> see a
> >> > > few
> >> > > > >>> concerns:
> >> > > > >>>
> >> > > > >>> - The RackLocator (as described in the document), implies that
> >> it
> >> > can
> >> > > > >>> discover rack information for any node in the cluster. How
> does
> >> it
> >> > > deal
> >> > > > >>> with rack location changes? For example, if I moved broker id
> >> (1)
> >> > > from
> >> > > > >>> rack
> >> > > > >>> X to Y, I only have to start that broker with a newer rack
> >> config.
> >> > If
> >> > > > >>> RackLocator discovers broker -> rack information at start up
> >> time,
> >> > > any
> >> > > > >>> change to a broker will require bouncing the entire cluster
> >> since
> >> > > > >>> createTopic requests can be sent to any node in the cluster.
> >> > > > >>> For this reason it may be simpler to have each node be aware
> of
> >> its
> >> > > own
> >> > > > >>> rack and persist it in ZK during start up time.
> >> > > > >>>
> >> > > > >>> - A pluggable RackLocator relies on an external service being
> >> > > available
> >> > > > >>> to
> >> > > > >>> serve rack information.
> >> > > > >>>
> >> > > > >>> Out of curiosity, I looked up how a couple of other systems
> deal
> >> > with
> >> > > > >>> zone/rack awareness.
> >> > > > >>> For Cassandra some interesting modes are:
> >> > > > >>> (Property File configuration)
> >> > > > >>>
> >> > > > >>>
> >> > > >
> >> > >
> >> >
> >>
> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
> >> > > > >>> (Dynamic inference)
> >> > > > >>>
> >> > > > >>>
> >> > > >
> >> > >
> >> >
> >>
> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
> >> > > > >>>
> >> > > > >>> Voldemort does a static node -> zone assignment based on
> >> > > configuration.
> >> > > > >>>
> >> > > > >>> Aditya
> >> > > > >>>
> >> > > > >>> On Wed, Sep 30, 2015 at 10:05 AM, Allen Wang <
> >> allenxwang@gmail.com
> >> > >
> >> > > > >>> wrote:
> >> > > > >>>
> >> > > > >>> > I would like to see if we can do both:
> >> > > > >>> >
> >> > > > >>> > - Make RackLocator pluggable to facilitate migration with
> >> > existing
> >> > > > >>> > broker-rack mapping
> >> > > > >>> >
> >> > > > >>> > - Make rack an optional property for broker. If rack is
> >> available
> >> > > > from
> >> > > > >>> > broker, treat it as source of truth. For users with existing
> >> > > > >>> broker-rack
> >> > > > >>> > mapping somewhere else, they can use the pluggable way or
> they
> >> > can
> >> > > > >>> transfer
> >> > > > >>> > the mapping to the broker rack property.
> >> > > > >>> >
> >> > > > >>> > One thing I am not sure is what happens at rolling upgrade
> >> when
> >> > we
> >> > > > have
> >> > > > >>> > rack as a broker property. For brokers with older version of
> >> > Kafka,
> >> > > > >>> will it
> >> > > > >>> > cause problem for them? If so, is there any workaround? I
> also
> >> > > think
> >> > > > it
> >> > > > >>> > would be better not to have rack in the controller wire
> >> protocol
> >> > > but
> >> > > > >>> not
> >> > > > >>> > sure if it is achievable.
> >> > > > >>> >
> >> > > > >>> > Thanks,
> >> > > > >>> > Allen
> >> > > > >>> >
> >> > > > >>> >
> >> > > > >>> >
> >> > > > >>> >
> >> > > > >>> >
> >> > > > >>> > On Mon, Sep 28, 2015 at 4:55 PM, Todd Palino <
> >> tpalino@gmail.com>
> >> > > > >>> wrote:
> >> > > > >>> >
> >> > > > >>> > > I tend to like the idea of a pluggable locator. For
> >> example, we
> >> > > > >>> already
> >> > > > >>> > > have an interface for discovering information about the
> >> > physical
> >> > > > >>> location
> >> > > > >>> > > of servers. I don't relish the idea of having to maintain
> >> data
> >> > in
> >> > > > >>> > multiple
> >> > > > >>> > > places.
> >> > > > >>> > >
> >> > > > >>> > > -Todd
> >> > > > >>> > >
> >> > > > >>> > > On Mon, Sep 28, 2015 at 4:48 PM, Aditya Auradkar <
> >> > > > >>> > > aauradkar@linkedin.com.invalid> wrote:
> >> > > > >>> > >
> >> > > > >>> > > > Thanks for starting this KIP Allen.
> >> > > > >>> > > >
> >> > > > >>> > > > I agree with Gwen that having a RackLocator class that
> is
> >> > > > pluggable
> >> > > > >>> > seems
> >> > > > >>> > > > to be too complex. The KIP refers to potentially non-ZK
> >> > storage
> >> > > > >>> for the
> >> > > > >>> > > > rack info which I don't think is necessary.
> >> > > > >>> > > >
> >> > > > >>> > > > Perhaps we can persist this info in zk under
> >> > > > >>> /brokers/ids/<broker_id>
> >> > > > >>> > > > similar to other broker properties and add a config in
> >> > > > KafkaConfig
> >> > > > >>> > called
> >> > > > >>> > > > "rack".
> >> > > > >>> > > >
> {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy,
> >> > > "rack":
> >> > > > >>> > "abc"}
> >> > > > >>> > > >
> >> > > > >>> > > > Aditya
> >> > > > >>> > > >
> >> > > > >>> > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira <
> >> > > gwen@confluent.io
> >> > > > >
> >> > > > >>> > wrote:
> >> > > > >>> > > >
> >> > > > >>> > > > > Hi,
> >> > > > >>> > > > >
> >> > > > >>> > > > > First, thanks for putting out a KIP for this. This is
> >> super
> >> > > > >>> important
> >> > > > >>> > > for
> >> > > > >>> > > > > production deployments of Kafka.
> >> > > > >>> > > > >
> >> > > > >>> > > > > Few questions:
> >> > > > >>> > > > >
> >> > > > >>> > > > > 1) Are we sure we want "as many racks as possible"?
> I'd
> >> > want
> >> > > to
> >> > > > >>> > balance
> >> > > > >>> > > > > between safety (more racks) and network utilization
> >> > (traffic
> >> > > > >>> within a
> >> > > > >>> > > > rack
> >> > > > >>> > > > > uses the high-bandwidth TOR switch). One replica on a
> >> > > different
> >> > > > >>> rack
> >> > > > >>> > > and
> >> > > > >>> > > > > the rest on same rack (if possible) sounds better to
> me.
> >> > > > >>> > > > >
> >> > > > >>> > > > > 2) Rack-locator class seems overly complex compared to
> >> > > adding a
> >> > > > >>> > > > rack.number
> >> > > > >>> > > > > property to the broker properties file. Why do we want
> >> > that?
> >> > > > >>> > > > >
> >> > > > >>> > > > > Gwen
> >> > > > >>> > > > >
> >> > > > >>> > > > >
> >> > > > >>> > > > >
> >> > > > >>> > > > > On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang <
> >> > > > >>> allenxwang@gmail.com>
> >> > > > >>> > > > wrote:
> >> > > > >>> > > > >
> >> > > > >>> > > > > > Hello Kafka Developers,
> >> > > > >>> > > > > >
> >> > > > >>> > > > > > I just created KIP-36 for rack aware replica
> >> assignment.
> >> > > > >>> > > > > >
> >> > > > >>> > > > > >
> >> > > > >>> > > > > >
> >> > > > >>> > > > >
> >> > > > >>> > > >
> >> > > > >>> > >
> >> > > > >>> >
> >> > > > >>>
> >> > > >
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
> >> > > > >>> > > > > >
> >> > > > >>> > > > > > The goal is to utilize the isolation provided by the
> >> > racks
> >> > > in
> >> > > > >>> data
> >> > > > >>> > > > center
> >> > > > >>> > > > > > and distribute replicas to racks to provide fault
> >> > > tolerance.
> >> > > > >>> > > > > >
> >> > > > >>> > > > > > Comments are welcome.
> >> > > > >>> > > > > >
> >> > > > >>> > > > > > Thanks,
> >> > > > >>> > > > > > Allen
> >> > > > >>> > > > > >
> >> > > > >>> > > > >
> >> > > > >>> > > >
> >> > > > >>> > >
> >> > > > >>> >
> >> > > > >>>
> >> > > > >>
> >> > > > >>
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Allen Wang <al...@gmail.com>.
The KIP is updated to include rack as an optional property for the broker.
Please take a look and let me know if more details are needed.

For the case where some brokers have rack and some do not, the current KIP
uses the fail-fast behavior. If there are concerns, we can further discuss
this in the email thread or next hangout.
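
To illustrate what "optional" means here, below is a minimal sketch in Python
(not Kafka code) of reading the rack from the broker registration JSON that
was suggested earlier in this thread. The function name parse_broker_rack is
hypothetical, and the sketch assumes that a broker without a configured rack
simply omits the "rack" key.

```python
import json
from typing import Optional


def parse_broker_rack(registration_json: str) -> Optional[str]:
    """Return the rack from a broker registration, or None if it is absent.

    Hypothetical helper: assumes the registration is a JSON object and that
    a broker with no rack configured omits the "rack" key entirely.
    """
    info = json.loads(registration_json)
    # dict.get returns None when the key is missing, so registrations
    # written by brokers that know nothing about racks still parse.
    return info.get("rack")
```

With this shape, a registration written by an older broker parses the same
way as one from a newer broker that has no rack configured.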



On Thu, Oct 15, 2015 at 10:42 AM, Allen Wang <al...@gmail.com> wrote:

> That's a good question. I can think of three actions if the rack
> information is incomplete:
>
> 1. Treat the node without rack as if it is on its own unique rack
> 2. Disregard all rack information and fall back to the current algorithm
> 3. Fail-fast
>
> Now that I think about it, one and three make more sense. The reason for
> fail-fast is that a user's mistake of not providing the rack may never be
> found if we tolerate it: the assignment may not be rack aware as the user
> expected, which makes debugging hard when things fail.
>
> What do you think? If not fail-fast, is there any way we can make the user
> error stand out?
>
>
> On Thu, Oct 15, 2015 at 10:17 AM, Gwen Shapira <gw...@confluent.io> wrote:
>
>> Thanks! Just to clarify, when some brokers have rack assignment and some
>> don't, do we act like none of them have it? or like those without
>> assignment are in their own rack?
>>
>> The first scenario is good when first setting up rack-awareness, but the
>> second makes more sense for on-going maintenance (I can totally see
>> someone
>> adding a node and forgetting to set the rack property, we don't want this
>> to change behavior for anything except the new node).
>>
>> What do you think?
>>
>> Gwen
>>
>> On Thu, Oct 15, 2015 at 10:13 AM, Allen Wang <al...@gmail.com>
>> wrote:
>>
>> > For scenario 1:
>> >
>> > - Add the rack information to broker property file or dynamically set
>> it in
>> > the wrapper code to bootstrap Kafka server. You would do that for all
>> > brokers and restart the brokers one by one.
>> >
>> > In this scenario, the complete broker to rack mapping may not be
>> available
>> > until every broker is restarted. During that time we fall back to
>> default
>> > replica assignment algorithm.
>> >
>> > For scenario 2:
>> >
>> > - Add the rack information to broker property file or dynamically set
>> it in
>> > the wrapper code and start the broker.
>> >
>> >
>> > On Wed, Oct 14, 2015 at 2:36 PM, Gwen Shapira <gw...@confluent.io>
>> wrote:
>> >
>> > > Can you clarify the workflow for the following scenarios:
>> > >
>> > > 1. I currently have 6 brokers and want to add rack information for
>> each
>> > > 2. I'm adding a new broker and I want to specify which rack it
>> belongs on
>> > > while adding it.
>> > >
>> > > Thanks!
>> > >
>> > > On Tue, Oct 13, 2015 at 2:21 PM, Allen Wang <al...@gmail.com>
>> > wrote:
>> > >
>> > > > We discussed the KIP in the hangout today. The recommendation is to
>> > make
>> > > > rack as a broker property in ZooKeeper. For users with existing rack
>> > > > information stored somewhere, they would need to retrieve the
>> > information
>> > > > at broker start up and dynamically set the rack property, which can
>> be
>> > > > implemented as a wrapper to bootstrap broker. There will be no
>> > interface
>> > > or
>> > > > pluggable implementation to retrieve the rack information.
>> > > >
>> > > > The assumption is that you always need to restart the broker to
>> make a
>> > > > change to the rack.
>> > > >
>> > > > Once the rack becomes a broker property, it will be possible to make
>> > rack
>> > > > part of the meta data to help the consumer choose which in sync
>> replica
>> > > to
>> > > > consume from as part of the future consumer enhancement.
>> > > >
>> > > > I will update the KIP.
>> > > >
>> > > > Thanks,
>> > > > Allen
>> > > >
>> > > >
>> > > > On Thu, Oct 8, 2015 at 9:23 AM, Allen Wang <al...@gmail.com>
>> > wrote:
>> > > >
>> > > > > I attended Tuesday's KIP hangout but this KIP was not discussed
>> due
>> > to
>> > > > > time constraint.
>> > > > >
>> > > > > However, after hearing discussion of KIP-35, I have the feeling
>> that
>> > > > > incompatibility (caused by new broker property) between brokers
>> with
>> > > > > different versions  will be solved there. In addition, having
>> rack
>> > in
>> > > > > broker property as meta data may also help consumers in the
>> future.
>> > So
>> > > I
>> > > > am
>> > > > > open to adding rack property to broker.
>> > > > >
>> > > > > Hopefully we can discuss this in the next KIP hangout.
>> > > > >
>> > > > > On Wed, Sep 30, 2015 at 2:46 PM, Allen Wang <allenxwang@gmail.com
>> >
>> > > > wrote:
>> > > > >
>> > > > >> Can you send me the information on the next KIP hangout?
>> > > > >>
>> > > > >> Currently the broker-rack mapping is not cached. In KafkaApis,
>> > > > >> RackLocator.getRackInfo() is called each time the mapping is
>> needed
>> > > for
>> > > > >> auto topic creation. This will ensure latest mapping is used at
>> any
>> > > > time.
>> > > > >>
>> > > > >> The ability to get the complete mapping makes it simple to reuse
>> the
>> > > > same
>> > > > >> interface in command line tools.
>> > > > >>
>> > > > >>
>> > > > >> On Wed, Sep 30, 2015 at 11:01 AM, Aditya Auradkar <
>> > > > >> aauradkar@linkedin.com.invalid> wrote:
>> > > > >>
>> > > > >>> Perhaps we discuss this during the next KIP hangout?
>> > > > >>>
>> > > > >>> I do see that a pluggable rack locator can be useful but I do
>> see a
>> > > few
>> > > > >>> concerns:
>> > > > >>>
>> > > > >>> - The RackLocator (as described in the document), implies that
>> it
>> > can
>> > > > >>> discover rack information for any node in the cluster. How does
>> it
>> > > deal
>> > > > >>> with rack location changes? For example, if I moved broker id
>> (1)
>> > > from
>> > > > >>> rack
>> > > > >>> X to Y, I only have to start that broker with a newer rack
>> config.
>> > If
>> > > > >>> RackLocator discovers broker -> rack information at start up
>> time,
>> > > any
>> > > > >>> change to a broker will require bouncing the entire cluster
>> since
>> > > > >>> createTopic requests can be sent to any node in the cluster.
>> > > > >>> For this reason it may be simpler to have each node be aware of
>> its
>> > > own
>> > > > >>> rack and persist it in ZK during start up time.
>> > > > >>>
>> > > > >>> - A pluggable RackLocator relies on an external service being
>> > > available
>> > > > >>> to
>> > > > >>> serve rack information.
>> > > > >>>
>> > > > >>> Out of curiosity, I looked up how a couple of other systems deal
>> > with
>> > > > >>> zone/rack awareness.
>> > > > >>> For Cassandra some interesting modes are:
>> > > > >>> (Property File configuration)
>> > > > >>>
>> > > > >>>
>> > > >
>> > >
>> >
>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
>> > > > >>> (Dynamic inference)
>> > > > >>>
>> > > > >>>
>> > > >
>> > >
>> >
>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
>> > > > >>>
>> > > > >>> Voldemort does a static node -> zone assignment based on
>> > > configuration.
>> > > > >>>
>> > > > >>> Aditya
>> > > > >>>
>> > > > >>> On Wed, Sep 30, 2015 at 10:05 AM, Allen Wang <
>> allenxwang@gmail.com
>> > >
>> > > > >>> wrote:
>> > > > >>>
>> > > > >>> > I would like to see if we can do both:
>> > > > >>> >
>> > > > >>> > - Make RackLocator pluggable to facilitate migration with
>> > existing
>> > > > >>> > broker-rack mapping
>> > > > >>> >
>> > > > >>> > - Make rack an optional property for broker. If rack is
>> available
>> > > > from
>> > > > >>> > broker, treat it as source of truth. For users with existing
>> > > > >>> broker-rack
>> > > > >>> > mapping somewhere else, they can use the pluggable way or they
>> > can
>> > > > >>> transfer
>> > > > >>> > the mapping to the broker rack property.
>> > > > >>> >
>> > > > >>> > One thing I am not sure is what happens at rolling upgrade
>> when
>> > we
>> > > > have
>> > > > >>> > rack as a broker property. For brokers with older version of
>> > Kafka,
>> > > > >>> will it
>> > > > >>> > cause problem for them? If so, is there any workaround? I also
>> > > think
>> > > > it
>> > > > >>> > would be better not to have rack in the controller wire
>> protocol
>> > > but
>> > > > >>> not
>> > > > >>> > sure if it is achievable.
>> > > > >>> >
>> > > > >>> > Thanks,
>> > > > >>> > Allen
>> > > > >>> >
>> > > > >>> >
>> > > > >>> >
>> > > > >>> >
>> > > > >>> >
>> > > > >>> > On Mon, Sep 28, 2015 at 4:55 PM, Todd Palino <
>> tpalino@gmail.com>
>> > > > >>> wrote:
>> > > > >>> >
>> > > > >>> > > I tend to like the idea of a pluggable locator. For
>> example, we
>> > > > >>> already
>> > > > >>> > > have an interface for discovering information about the
>> > physical
>> > > > >>> location
>> > > > >>> > > of servers. I don't relish the idea of having to maintain
>> data
>> > in
>> > > > >>> > multiple
>> > > > >>> > > places.
>> > > > >>> > >
>> > > > >>> > > -Todd
>> > > > >>> > >
>> > > > >>> > > On Mon, Sep 28, 2015 at 4:48 PM, Aditya Auradkar <
>> > > > >>> > > aauradkar@linkedin.com.invalid> wrote:
>> > > > >>> > >
>> > > > >>> > > > Thanks for starting this KIP Allen.
>> > > > >>> > > >
>> > > > >>> > > > I agree with Gwen that having a RackLocator class that is
>> > > > pluggable
>> > > > >>> > seems
>> > > > >>> > > > to be too complex. The KIP refers to potentially non-ZK
>> > storage
>> > > > >>> for the
>> > > > >>> > > > rack info which I don't think is necessary.
>> > > > >>> > > >
>> > > > >>> > > > Perhaps we can persist this info in zk under
>> > > > >>> /brokers/ids/<broker_id>
>> > > > >>> > > > similar to other broker properties and add a config in
>> > > > KafkaConfig
>> > > > >>> > called
>> > > > >>> > > > "rack".
>> > > > >>> > > > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy,
>> > > "rack":
>> > > > >>> > "abc"}
>> > > > >>> > > >
>> > > > >>> > > > Aditya
>> > > > >>> > > >
>> > > > >>> > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira <
>> > > gwen@confluent.io
>> > > > >
>> > > > >>> > wrote:
>> > > > >>> > > >
>> > > > >>> > > > > Hi,
>> > > > >>> > > > >
>> > > > >>> > > > > First, thanks for putting out a KIP for this. This is
>> super
>> > > > >>> important
>> > > > >>> > > for
>> > > > >>> > > > > production deployments of Kafka.
>> > > > >>> > > > >
>> > > > >>> > > > > Few questions:
>> > > > >>> > > > >
>> > > > >>> > > > > 1) Are we sure we want "as many racks as possible"? I'd
>> > want
>> > > to
>> > > > >>> > balance
>> > > > >>> > > > > between safety (more racks) and network utilization
>> > (traffic
>> > > > >>> within a
>> > > > >>> > > > rack
>> > > > >>> > > > > uses the high-bandwidth TOR switch). One replica on a
>> > > different
>> > > > >>> rack
>> > > > >>> > > and
>> > > > >>> > > > > the rest on same rack (if possible) sounds better to me.
>> > > > >>> > > > >
>> > > > >>> > > > > 2) Rack-locator class seems overly complex compared to
>> > > adding a
>> > > > >>> > > > rack.number
>> > > > >>> > > > > property to the broker properties file. Why do we want
>> > that?
>> > > > >>> > > > >
>> > > > >>> > > > > Gwen
>> > > > >>> > > > >
>> > > > >>> > > > >
>> > > > >>> > > > >
>> > > > >>> > > > > On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang <
>> > > > >>> allenxwang@gmail.com>
>> > > > >>> > > > wrote:
>> > > > >>> > > > >
>> > > > >>> > > > > > Hello Kafka Developers,
>> > > > >>> > > > > >
>> > > > >>> > > > > > I just created KIP-36 for rack aware replica
>> assignment.
>> > > > >>> > > > > >
>> > > > >>> > > > > >
>> > > > >>> > > > > >
>> > > > >>> > > > >
>> > > > >>> > > >
>> > > > >>> > >
>> > > > >>> >
>> > > > >>>
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
>> > > > >>> > > > > >
>> > > > >>> > > > > > The goal is to utilize the isolation provided by the
>> > racks
>> > > in
>> > > > >>> data
>> > > > >>> > > > center
>> > > > >>> > > > > > and distribute replicas to racks to provide fault
>> > > tolerance.
>> > > > >>> > > > > >
>> > > > >>> > > > > > Comments are welcome.
>> > > > >>> > > > > >
>> > > > >>> > > > > > Thanks,
>> > > > >>> > > > > > Allen
>> > > > >>> > > > > >
>> > > > >>> > > > >
>> > > > >>> > > >
>> > > > >>> > >
>> > > > >>> >
>> > > > >>>
>> > > > >>
>> > > > >>
>> > > > >
>> > > >
>> > >
>> >
>>
>
>

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Allen Wang <al...@gmail.com>.
That's a good question. I can think of three actions if the rack
information is incomplete:

1. Treat the node without rack as if it is on its own unique rack
2. Disregard all rack information and fall back to the current algorithm
3. Fail-fast

Now that I think about it, one and three make more sense. The reason for
fail-fast is that a user's mistake of not providing the rack may never be
found if we tolerate it: the assignment may not be rack aware as the user
expected, which makes debugging hard when things fail.

What do you think? If not fail-fast, is there any way we can make the user
error stand out?
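
To make the three options concrete, here is a rough sketch in Python (not
Kafka code). The names MissingRackPolicy and resolve_racks and the
"__unassigned-<id>" placeholder are hypothetical. It also assumes that when
no broker has a rack at all, we just use the current algorithm; that case is
not part of the question above.

```python
from enum import Enum
from typing import Dict, Optional


class MissingRackPolicy(Enum):
    UNIQUE_RACK = 1   # option 1: a rackless broker sits on its own rack
    IGNORE_RACKS = 2  # option 2: drop all rack info, use current algorithm
    FAIL_FAST = 3     # option 3: reject the request


def resolve_racks(
    broker_racks: Dict[int, Optional[str]],
    policy: MissingRackPolicy,
) -> Optional[Dict[int, str]]:
    """Return a broker id -> rack map, or None to mean "not rack aware"."""
    missing = sorted(b for b, r in broker_racks.items() if r is None)
    if not missing:
        return dict(broker_racks)
    if len(missing) == len(broker_racks):
        # Assumption: no rack info anywhere means nobody asked for it.
        return None
    if policy is MissingRackPolicy.FAIL_FAST:
        raise ValueError(f"rack not configured for brokers {missing}")
    if policy is MissingRackPolicy.IGNORE_RACKS:
        return None
    # UNIQUE_RACK: give each rackless broker a placeholder rack of its own.
    return {
        b: (r if r is not None else f"__unassigned-{b}")
        for b, r in broker_racks.items()
    }
```

With fail-fast the error names the brokers that are missing a rack, which is
one way to make the user error visible instead of silently degrading the
assignment.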


On Thu, Oct 15, 2015 at 10:17 AM, Gwen Shapira <gw...@confluent.io> wrote:

> Thanks! Just to clarify, when some brokers have rack assignment and some
> don't, do we act like none of them have it? or like those without
> assignment are in their own rack?
>
> The first scenario is good when first setting up rack-awareness, but the
> second makes more sense for on-going maintenance (I can totally see someone
> adding a node and forgetting to set the rack property, we don't want this
> to change behavior for anything except the new node).
>
> What do you think?
>
> Gwen
>
> On Thu, Oct 15, 2015 at 10:13 AM, Allen Wang <al...@gmail.com> wrote:
>
> > For scenario 1:
> >
> > - Add the rack information to broker property file or dynamically set it
> in
> > the wrapper code to bootstrap Kafka server. You would do that for all
> > brokers and restart the brokers one by one.
> >
> > In this scenario, the complete broker to rack mapping may not be
> available
> > until every broker is restarted. During that time we fall back to default
> > replica assignment algorithm.
> >
> > For scenario 2:
> >
> > - Add the rack information to broker property file or dynamically set it
> in
> > the wrapper code and start the broker.
> >
> >
> > On Wed, Oct 14, 2015 at 2:36 PM, Gwen Shapira <gw...@confluent.io> wrote:
> >
> > > Can you clarify the workflow for the following scenarios:
> > >
> > > 1. I currently have 6 brokers and want to add rack information for each
> > > 2. I'm adding a new broker and I want to specify which rack it belongs
> on
> > > while adding it.
> > >
> > > Thanks!
> > >
> > > On Tue, Oct 13, 2015 at 2:21 PM, Allen Wang <al...@gmail.com>
> > wrote:
> > >
> > > > We discussed the KIP in the hangout today. The recommendation is to
> > make
> > > > rack as a broker property in ZooKeeper. For users with existing rack
> > > > information stored somewhere, they would need to retrieve the
> > information
> > > > at broker start up and dynamically set the rack property, which can
> be
> > > > implemented as a wrapper to bootstrap broker. There will be no
> > interface
> > > or
> > > > pluggable implementation to retrieve the rack information.
> > > >
> > > > The assumption is that you always need to restart the broker to make
> a
> > > > change to the rack.
> > > >
> > > > Once the rack becomes a broker property, it will be possible to make
> > rack
> > > > part of the meta data to help the consumer choose which in sync
> replica
> > > to
> > > > consume from as part of the future consumer enhancement.
> > > >
> > > > I will update the KIP.
> > > >
> > > > Thanks,
> > > > Allen
> > > >
> > > >
> > > > On Thu, Oct 8, 2015 at 9:23 AM, Allen Wang <al...@gmail.com>
> > wrote:
> > > >
> > > > > I attended Tuesday's KIP hangout but this KIP was not discussed due
> > to
> > > > > time constraint.
> > > > >
> > > > > However, after hearing discussion of KIP-35, I have the feeling
> that
> > > > > incompatibility (caused by new broker property) between brokers
> with
> > > > > > different versions will be solved there. In addition, having rack
> > in
> > > > > broker property as meta data may also help consumers in the future.
> > So
> > > I
> > > > am
> > > > > > open to adding rack property to broker.
> > > > >
> > > > > Hopefully we can discuss this in the next KIP hangout.
> > > > >
> > > > > On Wed, Sep 30, 2015 at 2:46 PM, Allen Wang <al...@gmail.com>
> > > > wrote:
> > > > >
> > > > >> Can you send me the information on the next KIP hangout?
> > > > >>
> > > > >> Currently the broker-rack mapping is not cached. In KafkaApis,
> > > > >> RackLocator.getRackInfo() is called each time the mapping is
> needed
> > > for
> > > > >> auto topic creation. This will ensure latest mapping is used at
> any
> > > > time.
> > > > >>
> > > > >> The ability to get the complete mapping makes it simple to reuse
> the
> > > > same
> > > > >> interface in command line tools.
> > > > >>
> > > > >>
> > > > >> On Wed, Sep 30, 2015 at 11:01 AM, Aditya Auradkar <
> > > > >> aauradkar@linkedin.com.invalid> wrote:
> > > > >>
> > > > >>> Perhaps we discuss this during the next KIP hangout?
> > > > >>>
> > > > >>> I do see that a pluggable rack locator can be useful but I do
> see a
> > > few
> > > > >>> concerns:
> > > > >>>
> > > > >>> - The RackLocator (as described in the document), implies that it
> > can
> > > > >>> discover rack information for any node in the cluster. How does
> it
> > > deal
> > > > >>> with rack location changes? For example, if I moved broker id (1)
> > > from
> > > > >>> rack
> > > > >>> X to Y, I only have to start that broker with a newer rack
> config.
> > If
> > > > >>> RackLocator discovers broker -> rack information at start up
> time,
> > > any
> > > > >>> change to a broker will require bouncing the entire cluster since
> > > > >>> createTopic requests can be sent to any node in the cluster.
> > > > >>> For this reason it may be simpler to have each node be aware of
> its
> > > own
> > > > >>> rack and persist it in ZK during start up time.
> > > > >>>
> > > > >>> - A pluggable RackLocator relies on an external service being
> > > available
> > > > >>> to
> > > > >>> serve rack information.
> > > > >>>
> > > > >>> Out of curiosity, I looked up how a couple of other systems deal
> > with
> > > > >>> zone/rack awareness.
> > > > >>> For Cassandra some interesting modes are:
> > > > >>> (Property File configuration)
> > > > >>>
> > > > >>>
> > > >
> > >
> >
> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
> > > > >>> (Dynamic inference)
> > > > >>>
> > > > >>>
> > > >
> > >
> >
> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
> > > > >>>
> > > > >>> Voldemort does a static node -> zone assignment based on
> > > configuration.
> > > > >>>
> > > > >>> Aditya
> > > > >>>
> > > > >>> On Wed, Sep 30, 2015 at 10:05 AM, Allen Wang <
> allenxwang@gmail.com
> > >
> > > > >>> wrote:
> > > > >>>
> > > > >>> > I would like to see if we can do both:
> > > > >>> >
> > > > >>> > - Make RackLocator pluggable to facilitate migration with
> > existing
> > > > >>> > broker-rack mapping
> > > > >>> >
> > > > >>> > - Make rack an optional property for broker. If rack is
> available
> > > > from
> > > > >>> > broker, treat it as source of truth. For users with existing
> > > > >>> broker-rack
> > > > >>> > mapping somewhere else, they can use the pluggable way or they
> > can
> > > > >>> transfer
> > > > >>> > the mapping to the broker rack property.
> > > > >>> >
> > > > >>> > One thing I am not sure is what happens at rolling upgrade when
> > we
> > > > have
> > > > >>> > rack as a broker property. For brokers with older version of
> > Kafka,
> > > > >>> will it
> > > > >>> > cause problem for them? If so, is there any workaround? I also
> > > think
> > > > it
> > > > >>> > would be better not to have rack in the controller wire
> protocol
> > > but
> > > > >>> not
> > > > >>> > sure if it is achievable.
> > > > >>> >
> > > > >>> > Thanks,
> > > > >>> > Allen
> > > > >>> >
> > > > >>> >
> > > > >>> >
> > > > >>> >
> > > > >>> >
> > > > >>> > On Mon, Sep 28, 2015 at 4:55 PM, Todd Palino <
> tpalino@gmail.com>
> > > > >>> wrote:
> > > > >>> >
> > > > >>> > > I tend to like the idea of a pluggable locator. For example,
> we
> > > > >>> already
> > > > >>> > > have an interface for discovering information about the
> > physical
> > > > >>> location
> > > > >>> > > of servers. I don't relish the idea of having to maintain
> data
> > in
> > > > >>> > multiple
> > > > >>> > > places.
> > > > >>> > >
> > > > >>> > > -Todd
> > > > >>> > >
> > > > >>> > > On Mon, Sep 28, 2015 at 4:48 PM, Aditya Auradkar <
> > > > >>> > > aauradkar@linkedin.com.invalid> wrote:
> > > > >>> > >
> > > > >>> > > > Thanks for starting this KIP Allen.
> > > > >>> > > >
> > > > >>> > > > I agree with Gwen that having a RackLocator class that is
> > > > pluggable
> > > > >>> > seems
> > > > >>> > > > to be too complex. The KIP refers to potentially non-ZK
> > storage
> > > > >>> for the
> > > > >>> > > > rack info which I don't think is necessary.
> > > > >>> > > >
> > > > >>> > > > Perhaps we can persist this info in zk under
> > > > >>> /brokers/ids/<broker_id>
> > > > >>> > > > similar to other broker properties and add a config in
> > > > KafkaConfig
> > > > >>> > called
> > > > >>> > > > "rack".
> > > > >>> > > > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy,
> > > "rack":
> > > > >>> > "abc"}
> > > > >>> > > >
> > > > >>> > > > Aditya
> > > > >>> > > >
> > > > >>> > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira <
> > > gwen@confluent.io
> > > > >
> > > > >>> > wrote:
> > > > >>> > > >
> > > > >>> > > > > Hi,
> > > > >>> > > > >
> > > > >>> > > > > First, thanks for putting out a KIP for this. This is
> super
> > > > >>> important
> > > > >>> > > for
> > > > >>> > > > > production deployments of Kafka.
> > > > >>> > > > >
> > > > >>> > > > > Few questions:
> > > > >>> > > > >
> > > > >>> > > > > 1) Are we sure we want "as many racks as possible"? I'd
> > want
> > > to
> > > > >>> > balance
> > > > >>> > > > > between safety (more racks) and network utilization
> > (traffic
> > > > >>> within a
> > > > >>> > > > rack
> > > > >>> > > > > uses the high-bandwidth TOR switch). One replica on a
> > > different
> > > > >>> rack
> > > > >>> > > and
> > > > >>> > > > > the rest on same rack (if possible) sounds better to me.
> > > > >>> > > > >
> > > > >>> > > > > 2) Rack-locator class seems overly complex compared to
> > > adding a
> > > > >>> > > > rack.number
> > > > >>> > > > > property to the broker properties file. Why do we want
> > that?
> > > > >>> > > > >
> > > > >>> > > > > Gwen
> > > > >>> > > > >
> > > > >>> > > > >
> > > > >>> > > > >
> > > > >>> > > > > On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang <
> > > > >>> allenxwang@gmail.com>
> > > > >>> > > > wrote:
> > > > >>> > > > >
> > > > >>> > > > > > Hello Kafka Developers,
> > > > >>> > > > > >
> > > > >>> > > > > > I just created KIP-36 for rack aware replica
> assignment.
> > > > >>> > > > > >
> > > > >>> > > > > >
> > > > >>> > > > > >
> > > > >>> > > > >
> > > > >>> > > >
> > > > >>> > >
> > > > >>> >
> > > > >>>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
> > > > >>> > > > > >
> > > > >>> > > > > > The goal is to utilize the isolation provided by the
> > racks
> > > in
> > > > >>> data
> > > > >>> > > > center
> > > > >>> > > > > > and distribute replicas to racks to provide fault
> > > tolerance.
> > > > >>> > > > > >
> > > > >>> > > > > > Comments are welcome.
> > > > >>> > > > > >
> > > > >>> > > > > > Thanks,
> > > > >>> > > > > > Allen
> > > > >>> > > > > >
> > > > >>> > > > >
> > > > >>> > > >
> > > > >>> > >
> > > > >>> >
> > > > >>>
> > > > >>
> > > > >>
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Gwen Shapira <gw...@confluent.io>.
Thanks! Just to clarify, when some brokers have rack assignment and some
don't, do we act like none of them have it, or like those without
assignment are in their own rack?

The first scenario is good when first setting up rack-awareness, but the
second makes more sense for on-going maintenance (I can totally see someone
adding a node and forgetting to set the rack property; we don't want this
to change behavior for anything except the new node).

What do you think?

Gwen

On Thu, Oct 15, 2015 at 10:13 AM, Allen Wang <al...@gmail.com> wrote:

> For scenario 1:
>
> - Add the rack information to broker property file or dynamically set it in
> the wrapper code to bootstrap Kafka server. You would do that for all
> brokers and restart the brokers one by one.
>
> In this scenario, the complete broker to rack mapping may not be available
> until every broker is restarted. During that time we fall back to default
> replica assignment algorithm.
>
> For scenario 2:
>
> - Add the rack information to broker property file or dynamically set it in
> the wrapper code and start the broker.
>
>
> On Wed, Oct 14, 2015 at 2:36 PM, Gwen Shapira <gw...@confluent.io> wrote:
>
> > Can you clarify the workflow for the following scenarios:
> >
> > 1. I currently have 6 brokers and want to add rack information for each
> > 2. I'm adding a new broker and I want to specify which rack it belongs on
> > while adding it.
> >
> > Thanks!
> >
> > On Tue, Oct 13, 2015 at 2:21 PM, Allen Wang <al...@gmail.com>
> wrote:
> >
> > > We discussed the KIP in the hangout today. The recommendation is to
> make
> > > rack as a broker property in ZooKeeper. For users with existing rack
> > > information stored somewhere, they would need to retrieve the
> information
> > > at broker start up and dynamically set the rack property, which can be
> > > implemented as a wrapper to bootstrap broker. There will be no
> interface
> > or
> > > pluggable implementation to retrieve the rack information.
> > >
> > > The assumption is that you always need to restart the broker to make a
> > > change to the rack.
> > >
> > > Once the rack becomes a broker property, it will be possible to make
> rack
> > > part of the meta data to help the consumer choose which in sync replica
> > to
> > > consume from as part of the future consumer enhancement.
> > >
> > > I will update the KIP.
> > >
> > > Thanks,
> > > Allen
> > >
> > >
> > > On Thu, Oct 8, 2015 at 9:23 AM, Allen Wang <al...@gmail.com>
> wrote:
> > >
> > > > I attended Tuesday's KIP hangout but this KIP was not discussed due
> to
> > > > time constraint.
> > > >
> > > > However, after hearing discussion of KIP-35, I have the feeling that
> > > > incompatibility (caused by new broker property) between brokers with
> > > > different versions will be solved there. In addition, having rack in
> > > > broker property as metadata may also help consumers in the future.
> > > > So I am open to adding rack property to broker.
> > > >
> > > > Hopefully we can discuss this in the next KIP hangout.
> > > >
> > > > On Wed, Sep 30, 2015 at 2:46 PM, Allen Wang <al...@gmail.com>
> > > wrote:
> > > >
> > > >> Can you send me the information on the next KIP hangout?
> > > >>
> > > >> Currently the broker-rack mapping is not cached. In KafkaApis,
> > > >> RackLocator.getRackInfo() is called each time the mapping is needed
> > for
> > > >> auto topic creation. This will ensure latest mapping is used at any
> > > time.
> > > >>
> > > >> The ability to get the complete mapping makes it simple to reuse the
> > > same
> > > >> interface in command line tools.
> > > >>
> > > >>
> > > >> On Wed, Sep 30, 2015 at 11:01 AM, Aditya Auradkar <
> > > >> aauradkar@linkedin.com.invalid> wrote:
> > > >>
> > > >>> Perhaps we discuss this during the next KIP hangout?
> > > >>>
> > > >>> I do see that a pluggable rack locator can be useful but I do see a
> > few
> > > >>> concerns:
> > > >>>
> > > >>> - The RackLocator (as described in the document), implies that it
> can
> > > >>> discover rack information for any node in the cluster. How does it
> > deal
> > > >>> with rack location changes? For example, if I moved broker id (1)
> > from
> > > >>> rack
> > > >>> X to Y, I only have to start that broker with a newer rack config.
> If
> > > >>> RackLocator discovers broker -> rack information at start up time,
> > any
> > > >>> change to a broker will require bouncing the entire cluster since
> > > >>> createTopic requests can be sent to any node in the cluster.
> > > >>> For this reason it may be simpler to have each node be aware of its
> > own
> > > >>> rack and persist it in ZK during start up time.
> > > >>>
> > > >>> - A pluggable RackLocator relies on an external service being
> > available
> > > >>> to
> > > >>> serve rack information.
> > > >>>
> > > >>> Out of curiosity, I looked up how a couple of other systems deal
> with
> > > >>> zone/rack awareness.
> > > >>> For Cassandra some interesting modes are:
> > > >>> (Property File configuration)
> > > >>>
> > > >>>
> > >
> >
> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
> > > >>> (Dynamic inference)
> > > >>>
> > > >>>
> > >
> >
> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
> > > >>>
> > > >>> Voldemort does a static node -> zone assignment based on
> > configuration.
> > > >>>
> > > >>> Aditya
> > > >>>
> > > >>> On Wed, Sep 30, 2015 at 10:05 AM, Allen Wang <allenxwang@gmail.com
> >
> > > >>> wrote:
> > > >>>
> > > >>> > I would like to see if we can do both:
> > > >>> >
> > > >>> > - Make RackLocator pluggable to facilitate migration with
> existing
> > > >>> > broker-rack mapping
> > > >>> >
> > > >>> > - Make rack an optional property for broker. If rack is available
> > > from
> > > >>> > broker, treat it as source of truth. For users with existing
> > > >>> broker-rack
> > > >>> > mapping somewhere else, they can use the pluggable way or they
> can
> > > >>> transfer
> > > >>> > the mapping to the broker rack property.
> > > >>> >
> > > >>> > One thing I am not sure is what happens at rolling upgrade when
> we
> > > have
> > > >>> > rack as a broker property. For brokers with older version of
> Kafka,
> > > >>> will it
> > > >>> > cause problem for them? If so, is there any workaround? I also
> > think
> > > it
> > > >>> > would be better not to have rack in the controller wire protocol
> > but
> > > >>> not
> > > >>> > sure if it is achievable.
> > > >>> >
> > > >>> > Thanks,
> > > >>> > Allen
> > > >>> >
> > > >>> >
> > > >>> >
> > > >>> >
> > > >>> >
> > > >>> > On Mon, Sep 28, 2015 at 4:55 PM, Todd Palino <tp...@gmail.com>
> > > >>> wrote:
> > > >>> >
> > > >>> > > I tend to like the idea of a pluggable locator. For example, we
> > > >>> already
> > > >>> > > have an interface for discovering information about the
> physical
> > > >>> location
> > > >>> > > of servers. I don't relish the idea of having to maintain data
> in
> > > >>> > multiple
> > > >>> > > places.
> > > >>> > >
> > > >>> > > -Todd
> > > >>> > >
> > > >>> > > On Mon, Sep 28, 2015 at 4:48 PM, Aditya Auradkar <
> > > >>> > > aauradkar@linkedin.com.invalid> wrote:
> > > >>> > >
> > > >>> > > > Thanks for starting this KIP Allen.
> > > >>> > > >
> > > >>> > > > I agree with Gwen that having a RackLocator class that is
> > > pluggable
> > > >>> > seems
> > > >>> > > > to be too complex. The KIP refers to potentially non-ZK
> storage
> > > >>> for the
> > > >>> > > > rack info which I don't think is necessary.
> > > >>> > > >
> > > >>> > > > Perhaps we can persist this info in zk under
> > > >>> /brokers/ids/<broker_id>
> > > >>> > > > similar to other broker properties and add a config in
> > > KafkaConfig
> > > >>> > called
> > > >>> > > > "rack".
> > > >>> > > > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy,
> > "rack":
> > > >>> > "abc"}
> > > >>> > > >
> > > >>> > > > Aditya
> > > >>> > > >
> > > >>> > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira <
> > gwen@confluent.io
> > > >
> > > >>> > wrote:
> > > >>> > > >
> > > >>> > > > > Hi,
> > > >>> > > > >
> > > >>> > > > > First, thanks for putting out a KIP for this. This is super
> > > >>> important
> > > >>> > > for
> > > >>> > > > > production deployments of Kafka.
> > > >>> > > > >
> > > >>> > > > > Few questions:
> > > >>> > > > >
> > > >>> > > > > 1) Are we sure we want "as many racks as possible"? I'd
> want
> > to
> > > >>> > balance
> > > >>> > > > > between safety (more racks) and network utilization
> (traffic
> > > >>> within a
> > > >>> > > > rack
> > > >>> > > > > uses the high-bandwidth TOR switch). One replica on a
> > different
> > > >>> rack
> > > >>> > > and
> > > >>> > > > > the rest on same rack (if possible) sounds better to me.
> > > >>> > > > >
> > > >>> > > > > 2) Rack-locator class seems overly complex compared to
> > adding a
> > > >>> > > > rack.number
> > > >>> > > > > property to the broker properties file. Why do we want
> that?
> > > >>> > > > >
> > > >>> > > > > Gwen
> > > >>> > > > >
> > > >>> > > > >
> > > >>> > > > >
> > > >>> > > > > On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang <
> > > >>> allenxwang@gmail.com>
> > > >>> > > > wrote:
> > > >>> > > > >
> > > >>> > > > > > Hello Kafka Developers,
> > > >>> > > > > >
> > > >>> > > > > > I just created KIP-36 for rack aware replica assignment.
> > > >>> > > > > >
> > > >>> > > > > >
> > > >>> > > > > >
> > > >>> > > > >
> > > >>> > > >
> > > >>> > >
> > > >>> >
> > > >>>
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
> > > >>> > > > > >
> > > >>> > > > > > The goal is to utilize the isolation provided by the
> racks
> > in
> > > >>> data
> > > >>> > > > center
> > > >>> > > > > > and distribute replicas to racks to provide fault
> > tolerance.
> > > >>> > > > > >
> > > >>> > > > > > Comments are welcome.
> > > >>> > > > > >
> > > >>> > > > > > Thanks,
> > > >>> > > > > > Allen
> > > >>> > > > > >
> > > >>> > > > >
> > > >>> > > >
> > > >>> > >
> > > >>> >
> > > >>>
> > > >>
> > > >>
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Allen Wang <al...@gmail.com>.
For scenario 1:

- Add the rack information to the broker properties file, or set it
dynamically in the wrapper code that bootstraps the Kafka server. You would
do that for all brokers and restart the brokers one by one.

In this scenario, the complete broker-to-rack mapping may not be available
until every broker has been restarted. During that time we fall back to the
default replica assignment algorithm.

For scenario 2:

- Add the rack information to the broker properties file, or set it
dynamically in the wrapper code, and start the broker.
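
The fallback during a rolling restart can be sketched roughly as follows, in
Python. This is an illustration only: both assignment functions are
simplified stand-ins (plain round-robin and a rack-interleaved round-robin),
not the algorithms in Kafka or in the KIP, and all names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def default_assignment(
    broker_ids: Iterable[int], partitions: int, rf: int
) -> Dict[int, List[int]]:
    """Stand-in for the default algorithm: round-robin over sorted broker ids."""
    ids = sorted(broker_ids)
    return {p: [ids[(p + k) % len(ids)] for k in range(rf)] for p in range(partitions)}


def rack_interleaved(broker_racks: Dict[int, str]) -> List[int]:
    """Order brokers by taking one broker from each rack in turn."""
    by_rack: Dict[str, List[int]] = {}
    for broker in sorted(broker_racks):
        by_rack.setdefault(broker_racks[broker], []).append(broker)
    ordered: List[int] = []
    depth = 0
    while len(ordered) < len(broker_racks):
        for rack in sorted(by_rack):
            if depth < len(by_rack[rack]):
                ordered.append(by_rack[rack][depth])
        depth += 1
    return ordered


def assign(
    broker_racks: Dict[int, Optional[str]], partitions: int, rf: int
) -> Dict[int, List[int]]:
    """Use rack information only when every broker has reported a rack."""
    if any(rack is None for rack in broker_racks.values()):
        # Mapping is incomplete (e.g. mid rolling restart): fall back.
        return default_assignment(broker_racks.keys(), partitions, rf)
    ids = rack_interleaved({b: r for b, r in broker_racks.items() if r is not None})
    return {p: [ids[(p + k) % len(ids)] for k in range(rf)] for p in range(partitions)}


# Broker 3 has not been restarted with a rack yet, so the default is used.
print(assign({0: "a", 1: "a", 2: "b", 3: None}, partitions=2, rf=2))
# Once every broker reports a rack, replicas are spread across racks.
print(assign({0: "a", 1: "a", 2: "b", 3: "b"}, partitions=4, rf=2))
```

Topics created during the restart window keep whatever assignment they were
given; the sketch does not move existing replicas.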


On Wed, Oct 14, 2015 at 2:36 PM, Gwen Shapira <gw...@confluent.io> wrote:

> Can you clarify the workflow for the following scenarios:
>
> 1. I currently have 6 brokers and want to add rack information for each
> 2. I'm adding a new broker and I want to specify which rack it belongs on
> while adding it.
>
> Thanks!
>
> On Tue, Oct 13, 2015 at 2:21 PM, Allen Wang <al...@gmail.com> wrote:
>
> > We discussed the KIP in the hangout today. The recommendation is to make
> > rack as a broker property in ZooKeeper. For users with existing rack
> > information stored somewhere, they would need to retrieve the information
> > at broker start up and dynamically set the rack property, which can be
> > implemented as a wrapper to bootstrap broker. There will be no interface
> or
> > pluggable implementation to retrieve the rack information.
> >
> > The assumption is that you always need to restart the broker to make a
> > change to the rack.
> >
> > Once the rack becomes a broker property, it will be possible to make rack
> > part of the meta data to help the consumer choose which in sync replica
> to
> > consume from as part of the future consumer enhancement.
> >
> > I will update the KIP.
> >
> > Thanks,
> > Allen
> >
> >
> > On Thu, Oct 8, 2015 at 9:23 AM, Allen Wang <al...@gmail.com> wrote:
> >
> > > I attended Tuesday's KIP hangout but this KIP was not discussed due to
> > > time constraint.
> > >
> > > However, after hearing discussion of KIP-35, I have the feeling that
> > > incompatibility (caused by new broker property) between brokers with
> > > different versions will be solved there. In addition, having rack in
> > > broker property as metadata may also help consumers in the future. So
> > > I am open to adding rack property to broker.
> > >
> > > Hopefully we can discuss this in the next KIP hangout.
> > >
> > > On Wed, Sep 30, 2015 at 2:46 PM, Allen Wang <al...@gmail.com>
> > wrote:
> > >
> > >> Can you send me the information on the next KIP hangout?
> > >>
> > >> Currently the broker-rack mapping is not cached. In KafkaApis,
> > >> RackLocator.getRackInfo() is called each time the mapping is needed
> for
> > >> auto topic creation. This will ensure latest mapping is used at any
> > time.
> > >>
> > >> The ability to get the complete mapping makes it simple to reuse the
> > same
> > >> interface in command line tools.
> > >>
> > >>
> > >> On Wed, Sep 30, 2015 at 11:01 AM, Aditya Auradkar <
> > >> aauradkar@linkedin.com.invalid> wrote:
> > >>
> > >>> Perhaps we discuss this during the next KIP hangout?
> > >>>
> > >>> I do see that a pluggable rack locator can be useful but I do see a
> few
> > >>> concerns:
> > >>>
> > >>> - The RackLocator (as described in the document), implies that it can
> > >>> discover rack information for any node in the cluster. How does it
> deal
> > >>> with rack location changes? For example, if I moved broker id (1)
> from
> > >>> rack
> > >>> X to Y, I only have to start that broker with a newer rack config. If
> > >>> RackLocator discovers broker -> rack information at start up time,
> any
> > >>> change to a broker will require bouncing the entire cluster since
> > >>> createTopic requests can be sent to any node in the cluster.
> > >>> For this reason it may be simpler to have each node be aware of its
> own
> > >>> rack and persist it in ZK during start up time.
> > >>>
> > >>> - A pluggable RackLocator relies on an external service being
> available
> > >>> to
> > >>> serve rack information.
> > >>>
> > >>> Out of curiosity, I looked up how a couple of other systems deal with
> > >>> zone/rack awareness.
> > >>> For Cassandra some interesting modes are:
> > >>> (Property File configuration)
> > >>>
> > >>>
> >
> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
> > >>> (Dynamic inference)
> > >>>
> > >>>
> >
> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
> > >>>
> > >>> Voldemort does a static node -> zone assignment based on
> configuration.
> > >>>
> > >>> Aditya
> > >>>
> > >>> On Wed, Sep 30, 2015 at 10:05 AM, Allen Wang <al...@gmail.com>
> > >>> wrote:
> > >>>
> > >>> > I would like to see if we can do both:
> > >>> >
> > >>> > - Make RackLocator pluggable to facilitate migration with existing
> > >>> > broker-rack mapping
> > >>> >
> > >>> > - Make rack an optional property for broker. If rack is available
> > from
> > >>> > broker, treat it as source of truth. For users with existing
> > >>> broker-rack
> > >>> > mapping somewhere else, they can use the pluggable way or they can
> > >>> transfer
> > >>> > the mapping to the broker rack property.
> > >>> >
> > >>> > One thing I am not sure is what happens at rolling upgrade when we
> > have
> > >>> > rack as a broker property. For brokers with older version of Kafka,
> > >>> will it
> > >>> > cause problem for them? If so, is there any workaround? I also
> think
> > it
> > >>> > would be better not to have rack in the controller wire protocol
> but
> > >>> not
> > >>> > sure if it is achievable.
> > >>> >
> > >>> > Thanks,
> > >>> > Allen
> > >>> >
> > >>> >
> > >>> >
> > >>> >
> > >>> >
> > >>> > On Mon, Sep 28, 2015 at 4:55 PM, Todd Palino <tp...@gmail.com>
> > >>> wrote:
> > >>> >
> > >>> > > I tend to like the idea of a pluggable locator. For example, we
> > >>> already
> > >>> > > have an interface for discovering information about the physical
> > >>> location
> > >>> > > of servers. I don't relish the idea of having to maintain data in
> > >>> > multiple
> > >>> > > places.
> > >>> > >
> > >>> > > -Todd
> > >>> > >
> > >>> > > On Mon, Sep 28, 2015 at 4:48 PM, Aditya Auradkar <
> > >>> > > aauradkar@linkedin.com.invalid> wrote:
> > >>> > >
> > >>> > > > Thanks for starting this KIP Allen.
> > >>> > > >
> > >>> > > > I agree with Gwen that having a RackLocator class that is
> > pluggable
> > >>> > seems
> > >>> > > > to be too complex. The KIP refers to potentially non-ZK storage
> > >>> for the
> > >>> > > > rack info which I don't think is necessary.
> > >>> > > >
> > >>> > > > Perhaps we can persist this info in zk under
> > >>> /brokers/ids/<broker_id>
> > >>> > > > similar to other broker properties and add a config in
> > KafkaConfig
> > >>> > called
> > >>> > > > "rack".
> > >>> > > > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy,
> "rack":
> > >>> > "abc"}
> > >>> > > >
> > >>> > > > Aditya
> > >>> > > >
> > >>> > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira <
> gwen@confluent.io
> > >
> > >>> > wrote:
> > >>> > > >
> > >>> > > > > Hi,
> > >>> > > > >
> > >>> > > > > First, thanks for putting out a KIP for this. This is super
> > >>> important
> > >>> > > for
> > >>> > > > > production deployments of Kafka.
> > >>> > > > >
> > >>> > > > > Few questions:
> > >>> > > > >
> > >>> > > > > 1) Are we sure we want "as many racks as possible"? I'd want
> to
> > >>> > balance
> > >>> > > > > between safety (more racks) and network utilization (traffic
> > >>> within a
> > >>> > > > rack
> > >>> > > > > uses the high-bandwidth TOR switch). One replica on a
> different
> > >>> rack
> > >>> > > and
> > >>> > > > > the rest on same rack (if possible) sounds better to me.
> > >>> > > > >
> > >>> > > > > 2) Rack-locator class seems overly complex compared to
> adding a
> > >>> > > > rack.number
> > >>> > > > > property to the broker properties file. Why do we want that?
> > >>> > > > >
> > >>> > > > > Gwen
> > >>> > > > >
> > >>> > > > >
> > >>> > > > >
> > >>> > > > > On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang <
> > >>> allenxwang@gmail.com>
> > >>> > > > wrote:
> > >>> > > > >
> > >>> > > > > > Hello Kafka Developers,
> > >>> > > > > >
> > >>> > > > > > I just created KIP-36 for rack aware replica assignment.
> > >>> > > > > >
> > >>> > > > > >
> > >>> > > > > >
> > >>> > > > >
> > >>> > > >
> > >>> > >
> > >>> >
> > >>>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
> > >>> > > > > >
> > >>> > > > > > The goal is to utilize the isolation provided by the racks
> in
> > >>> data
> > >>> > > > center
> > >>> > > > > > and distribute replicas to racks to provide fault
> tolerance.
> > >>> > > > > >
> > >>> > > > > > Comments are welcome.
> > >>> > > > > >
> > >>> > > > > > Thanks,
> > >>> > > > > > Allen
> > >>> > > > > >
> > >>> > > > >
> > >>> > > >
> > >>> > >
> > >>> >
> > >>>
> > >>
> > >>
> > >
> >
>

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Gwen Shapira <gw...@confluent.io>.
Can you clarify the workflow for the following scenarios:

1. I currently have 6 brokers and want to add rack information for each.
2. I'm adding a new broker and I want to specify which rack it belongs to
while adding it.

Thanks!

On Tue, Oct 13, 2015 at 2:21 PM, Allen Wang <al...@gmail.com> wrote:

> We discussed the KIP in the hangout today. The recommendation is to make
> rack as a broker property in ZooKeeper. For users with existing rack
> information stored somewhere, they would need to retrieve the information
> at broker start up and dynamically set the rack property, which can be
> implemented as a wrapper to bootstrap broker. There will be no interface or
> pluggable implementation to retrieve the rack information.
>
> The assumption is that you always need to restart the broker to make a
> change to the rack.
>
> Once the rack becomes a broker property, it will be possible to make rack
> part of the meta data to help the consumer choose which in sync replica to
> consume from as part of the future consumer enhancement.
>
> I will update the KIP.
>
> Thanks,
> Allen
>
>
> On Thu, Oct 8, 2015 at 9:23 AM, Allen Wang <al...@gmail.com> wrote:
>
> > I attended Tuesday's KIP hangout but this KIP was not discussed due to
> > time constraint.
> >
> > However, after hearing discussion of KIP-35, I have the feeling that
> > incompatibility (caused by new broker property) between brokers with
> > different versions will be solved there. In addition, having rack in
> > broker property as metadata may also help consumers in the future. So I
> > am open to adding rack property to broker.
> >
> > Hopefully we can discuss this in the next KIP hangout.
> >
> > On Wed, Sep 30, 2015 at 2:46 PM, Allen Wang <al...@gmail.com>
> wrote:
> >
> >> Can you send me the information on the next KIP hangout?
> >>
> >> Currently the broker-rack mapping is not cached. In KafkaApis,
> >> RackLocator.getRackInfo() is called each time the mapping is needed for
> >> auto topic creation. This will ensure latest mapping is used at any
> time.
> >>
> >> The ability to get the complete mapping makes it simple to reuse the
> same
> >> interface in command line tools.
> >>
> >>
> >> On Wed, Sep 30, 2015 at 11:01 AM, Aditya Auradkar <
> >> aauradkar@linkedin.com.invalid> wrote:
> >>
> >>> Perhaps we discuss this during the next KIP hangout?
> >>>
> >>> I do see that a pluggable rack locator can be useful but I do see a few
> >>> concerns:
> >>>
> >>> - The RackLocator (as described in the document), implies that it can
> >>> discover rack information for any node in the cluster. How does it deal
> >>> with rack location changes? For example, if I moved broker id (1) from
> >>> rack
> >>> X to Y, I only have to start that broker with a newer rack config. If
> >>> RackLocator discovers broker -> rack information at start up time, any
> >>> change to a broker will require bouncing the entire cluster since
> >>> createTopic requests can be sent to any node in the cluster.
> >>> For this reason it may be simpler to have each node be aware of its own
> >>> rack and persist it in ZK during start up time.
> >>>
> >>> - A pluggable RackLocator relies on an external service being available
> >>> to
> >>> serve rack information.
> >>>
> >>> Out of curiosity, I looked up how a couple of other systems deal with
> >>> zone/rack awareness.
> >>> For Cassandra some interesting modes are:
> >>> (Property File configuration)
> >>>
> >>>
> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
> >>> (Dynamic inference)
> >>>
> >>>
> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
> >>>
> >>> Voldemort does a static node -> zone assignment based on configuration.
> >>>
> >>> Aditya
> >>>
> >>> On Wed, Sep 30, 2015 at 10:05 AM, Allen Wang <al...@gmail.com>
> >>> wrote:
> >>>
> >>> > I would like to see if we can do both:
> >>> >
> >>> > - Make RackLocator pluggable to facilitate migration with existing
> >>> > broker-rack mapping
> >>> >
> >>> > - Make rack an optional property for broker. If rack is available
> from
> >>> > broker, treat it as source of truth. For users with existing
> >>> broker-rack
> >>> > mapping somewhere else, they can use the pluggable way or they can
> >>> transfer
> >>> > the mapping to the broker rack property.
> >>> >
> >>> > One thing I am not sure is what happens at rolling upgrade when we
> have
> >>> > rack as a broker property. For brokers with older version of Kafka,
> >>> will it
> >>> > cause problem for them? If so, is there any workaround? I also think
> it
> >>> > would be better not to have rack in the controller wire protocol but
> >>> not
> >>> > sure if it is achievable.
> >>> >
> >>> > Thanks,
> >>> > Allen
> >>> >
> >>> >
> >>> >
> >>> >
> >>> >
> >>> > On Mon, Sep 28, 2015 at 4:55 PM, Todd Palino <tp...@gmail.com>
> >>> wrote:
> >>> >
> >>> > > I tend to like the idea of a pluggable locator. For example, we
> >>> already
> >>> > > have an interface for discovering information about the physical
> >>> location
> >>> > > of servers. I don't relish the idea of having to maintain data in
> >>> > multiple
> >>> > > places.
> >>> > >
> >>> > > -Todd
> >>> > >
> >>> > > On Mon, Sep 28, 2015 at 4:48 PM, Aditya Auradkar <
> >>> > > aauradkar@linkedin.com.invalid> wrote:
> >>> > >
> >>> > > > Thanks for starting this KIP Allen.
> >>> > > >
> >>> > > > I agree with Gwen that having a RackLocator class that is
> pluggable
> >>> > seems
> >>> > > > to be too complex. The KIP refers to potentially non-ZK storage
> >>> for the
> >>> > > > rack info which I don't think is necessary.
> >>> > > >
> >>> > > > Perhaps we can persist this info in zk under
> >>> /brokers/ids/<broker_id>
> >>> > > > similar to other broker properties and add a config in
> KafkaConfig
> >>> > called
> >>> > > > "rack".
> >>> > > > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy, "rack":
> >>> > "abc"}
> >>> > > >
> >>> > > > Aditya
> >>> > > >
> >>> > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira <gwen@confluent.io
> >
> >>> > wrote:
> >>> > > >
> >>> > > > > Hi,
> >>> > > > >
> >>> > > > > First, thanks for putting out a KIP for this. This is super
> >>> important
> >>> > > for
> >>> > > > > production deployments of Kafka.
> >>> > > > >
> >>> > > > > Few questions:
> >>> > > > >
> >>> > > > > 1) Are we sure we want "as many racks as possible"? I'd want to
> >>> > balance
> >>> > > > > between safety (more racks) and network utilization (traffic
> >>> within a
> >>> > > > rack
> >>> > > > > uses the high-bandwidth TOR switch). One replica on a different
> >>> rack
> >>> > > and
> >>> > > > > the rest on same rack (if possible) sounds better to me.
> >>> > > > >
> >>> > > > > 2) Rack-locator class seems overly complex compared to adding a
> >>> > > > rack.number
> >>> > > > > property to the broker properties file. Why do we want that?
> >>> > > > >
> >>> > > > > Gwen
> >>> > > > >
> >>> > > > >
> >>> > > > >
> >>> > > > > On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang <
> >>> allenxwang@gmail.com>
> >>> > > > wrote:
> >>> > > > >
> >>> > > > > > Hello Kafka Developers,
> >>> > > > > >
> >>> > > > > > I just created KIP-36 for rack aware replica assignment.
> >>> > > > > >
> >>> > > > > >
> >>> > > > > >
> >>> > > > >
> >>> > > >
> >>> > >
> >>> >
> >>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
> >>> > > > > >
> >>> > > > > > The goal is to utilize the isolation provided by the racks in
> >>> data
> >>> > > > center
> >>> > > > > > and distribute replicas to racks to provide fault tolerance.
> >>> > > > > >
> >>> > > > > > Comments are welcome.
> >>> > > > > >
> >>> > > > > > Thanks,
> >>> > > > > > Allen
> >>> > > > > >
> >>> > > > >
> >>> > > >
> >>> > >
> >>> >
> >>>
> >>
> >>
> >
>

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Allen Wang <al...@gmail.com>.
We discussed the KIP in the hangout today. The recommendation is to make
rack a broker property in ZooKeeper. Users who already keep rack
information somewhere else would need to retrieve it at broker startup and
set the rack property dynamically, which can be implemented as a wrapper
that bootstraps the broker. There will be no interface or pluggable
implementation to retrieve the rack information.
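
As a rough idea of what such a wrapper might do before starting the broker,
here is a small Python sketch. It only rewrites the properties text; it does
not start Kafka. The property key "broker.rack", the inventory dictionary
and the host names are assumptions for illustration, since the thread has
not fixed a key name or a source for the rack data.

```python
from typing import Dict, Optional


def lookup_rack(hostname: str, inventory: Dict[str, str]) -> Optional[str]:
    """Hypothetical lookup against whatever system already holds rack data."""
    return inventory.get(hostname)


def with_rack(properties_text: str, rack: Optional[str], key: str = "broker.rack") -> str:
    """Return the properties text with the rack key replaced, added or dropped."""
    kept = [
        line
        for line in properties_text.splitlines()
        # Drop any existing setting of the key; comments and other keys stay.
        if line.split("=", 1)[0].strip() != key
    ]
    if rack:
        kept.append(f"{key}={rack}")
    return "\n".join(kept) + "\n"


# Assumed inventory and base configuration, for illustration only.
inventory = {"kafka-01.example.com": "rack-7"}
base = "broker.id=1\nlog.dirs=/var/kafka\n"
rack = lookup_rack("kafka-01.example.com", inventory)
print(with_rack(base, rack), end="")
```

The wrapper would write the result to the file the broker is started with,
which matches the assumption that a rack change requires a broker restart.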

The assumption is that you always need to restart the broker to make a
change to the rack.

Once the rack becomes a broker property, it will be possible to make the rack
part of the metadata, which could help the consumer choose which in-sync
replica to consume from as part of a future consumer enhancement.
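
For illustration, a consumer-side choice based on rack metadata might look
like the Python sketch below. Nothing like this exists in the consumer
today; the function, its parameters and the idea of a client-side rack
setting are all assumptions.

```python
from typing import Dict, List, Optional


def choose_replica(
    isr: List[int],
    leader: int,
    broker_racks: Dict[int, Optional[str]],
    client_rack: Optional[str],
) -> int:
    """Prefer an in-sync replica in the client's rack, else use the leader."""
    if client_rack is not None:
        for broker in isr:
            if broker_racks.get(broker) == client_rack:
                return broker
    return leader


racks = {1: "a", 2: "b", 3: "b"}
print(choose_replica([1, 2, 3], leader=1, broker_racks=racks, client_rack="b"))
```

If the client has no rack, or no in-sync replica shares it, the sketch
simply falls back to the leader.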

I will update the KIP.

Thanks,
Allen


On Thu, Oct 8, 2015 at 9:23 AM, Allen Wang <al...@gmail.com> wrote:

> I attended Tuesday's KIP hangout but this KIP was not discussed due to
> time constraint.
>
> However, after hearing discussion of KIP-35, I have the feeling that
> incompatibility (caused by new broker property) between brokers with
> different versions will be solved there. In addition, having rack in
> broker property as metadata may also help consumers in the future. So I am
> open to adding rack property to broker.
>
> Hopefully we can discuss this in the next KIP hangout.
>
> On Wed, Sep 30, 2015 at 2:46 PM, Allen Wang <al...@gmail.com> wrote:
>
>> Can you send me the information on the next KIP hangout?
>>
>> Currently the broker-rack mapping is not cached. In KafkaApis,
>> RackLocator.getRackInfo() is called each time the mapping is needed for
>> auto topic creation. This will ensure latest mapping is used at any time.
>>
>> The ability to get the complete mapping makes it simple to reuse the same
>> interface in command line tools.
>>
>>
>> On Wed, Sep 30, 2015 at 11:01 AM, Aditya Auradkar <
>> aauradkar@linkedin.com.invalid> wrote:
>>
>>> Perhaps we discuss this during the next KIP hangout?
>>>
>>> I do see that a pluggable rack locator can be useful but I do see a few
>>> concerns:
>>>
>>> - The RackLocator (as described in the document), implies that it can
>>> discover rack information for any node in the cluster. How does it deal
>>> with rack location changes? For example, if I moved broker id (1) from
>>> rack
>>> X to Y, I only have to start that broker with a newer rack config. If
>>> RackLocator discovers broker -> rack information at start up time, any
>>> change to a broker will require bouncing the entire cluster since
>>> createTopic requests can be sent to any node in the cluster.
>>> For this reason it may be simpler to have each node be aware of its own
>>> rack and persist it in ZK during start up time.
>>>
>>> - A pluggable RackLocator relies on an external service being available
>>> to
>>> serve rack information.
>>>
>>> Out of curiosity, I looked up how a couple of other systems deal with
>>> zone/rack awareness.
>>> For Cassandra some interesting modes are:
>>> (Property File configuration)
>>>
>>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
>>> (Dynamic inference)
>>>
>>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
>>>
>>> Voldemort does a static node -> zone assignment based on configuration.
>>>
>>> Aditya
>>>
>>> On Wed, Sep 30, 2015 at 10:05 AM, Allen Wang <al...@gmail.com>
>>> wrote:
>>>
>>> > I would like to see if we can do both:
>>> >
>>> > - Make RackLocator pluggable to facilitate migration with existing
>>> > broker-rack mapping
>>> >
>>> > - Make rack an optional property for broker. If rack is available from
>>> > broker, treat it as source of truth. For users with existing
>>> broker-rack
>>> > mapping somewhere else, they can use the pluggable way or they can
>>> transfer
>>> > the mapping to the broker rack property.
>>> >
>>> > One thing I am not sure is what happens at rolling upgrade when we have
>>> > rack as a broker property. For brokers with older version of Kafka,
>>> will it
>>> > cause problem for them? If so, is there any workaround? I also think it
>>> > would be better not to have rack in the controller wire protocol but
>>> not
>>> > sure if it is achievable.
>>> >
>>> > Thanks,
>>> > Allen
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > On Mon, Sep 28, 2015 at 4:55 PM, Todd Palino <tp...@gmail.com>
>>> wrote:
>>> >
>>> > > I tend to like the idea of a pluggable locator. For example, we
>>> already
>>> > > have an interface for discovering information about the physical
>>> location
>>> > > of servers. I don't relish the idea of having to maintain data in
>>> > multiple
>>> > > places.
>>> > >
>>> > > -Todd
>>> > >
>>> > > On Mon, Sep 28, 2015 at 4:48 PM, Aditya Auradkar <
>>> > > aauradkar@linkedin.com.invalid> wrote:
>>> > >
>>> > > > Thanks for starting this KIP Allen.
>>> > > >
>>> > > > I agree with Gwen that having a RackLocator class that is pluggable
>>> > seems
>>> > > > to be too complex. The KIP refers to potentially non-ZK storage
>>> for the
>>> > > > rack info which I don't think is necessary.
>>> > > >
>>> > > > Perhaps we can persist this info in zk under
>>> /brokers/ids/<broker_id>
>>> > > > similar to other broker properties and add a config in KafkaConfig
>>> > called
>>> > > > "rack".
>>> > > > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy, "rack":
>>> > "abc"}
>>> > > >
>>> > > > Aditya
>>> > > >
>>> > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira <gw...@confluent.io>
>>> > wrote:
>>> > > >
>>> > > > > Hi,
>>> > > > >
>>> > > > > First, thanks for putting out a KIP for this. This is super
>>> important
>>> > > for
>>> > > > > production deployments of Kafka.
>>> > > > >
>>> > > > > Few questions:
>>> > > > >
>>> > > > > 1) Are we sure we want "as many racks as possible"? I'd want to
>>> > balance
>>> > > > > between safety (more racks) and network utilization (traffic
>>> within a
>>> > > > rack
>>> > > > > uses the high-bandwidth TOR switch). One replica on a different
>>> rack
>>> > > and
>>> > > > > the rest on same rack (if possible) sounds better to me.
>>> > > > >
>>> > > > > 2) Rack-locator class seems overly complex compared to adding a
>>> > > > rack.number
>>> > > > > property to the broker properties file. Why do we want that?
>>> > > > >
>>> > > > > Gwen
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > > On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang <
>>> allenxwang@gmail.com>
>>> > > > wrote:
>>> > > > >
>>> > > > > > Hello Kafka Developers,
>>> > > > > >
>>> > > > > > I just created KIP-36 for rack aware replica assignment.
>>> > > > > >
>>> > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
>>> > > > > >
>>> > > > > > The goal is to utilize the isolation provided by the racks in
>>> data
>>> > > > center
>>> > > > > > and distribute replicas to racks to provide fault tolerance.
>>> > > > > >
>>> > > > > > Comments are welcome.
>>> > > > > >
>>> > > > > > Thanks,
>>> > > > > > Allen
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>>
>>
>>
>

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Allen Wang <al...@gmail.com>.
I attended Tuesday's KIP hangout, but this KIP was not discussed due to time
constraints.

However, after hearing the discussion of KIP-35, I have the feeling that the
incompatibility (caused by a new broker property) between brokers with
different versions will be solved there. In addition, having the rack in the
broker properties as metadata may also help consumers in the future. So I am
open to adding a rack property to the broker.

Hopefully we can discuss this in the next KIP hangout.

On Wed, Sep 30, 2015 at 2:46 PM, Allen Wang <al...@gmail.com> wrote:

> Can you send me the information on the next KIP hangout?
>
> Currently the broker-rack mapping is not cached. In KafkaApis,
> RackLocator.getRackInfo() is called each time the mapping is needed for
> auto topic creation. This will ensure latest mapping is used at any time.
>
> The ability to get the complete mapping makes it simple to reuse the same
> interface in command line tools.
>
>
> On Wed, Sep 30, 2015 at 11:01 AM, Aditya Auradkar <
> aauradkar@linkedin.com.invalid> wrote:
>
>> Perhaps we discuss this during the next KIP hangout?
>>
>> I do see that a pluggable rack locator can be useful but I do see a few
>> concerns:
>>
>> - The RackLocator (as described in the document), implies that it can
>> discover rack information for any node in the cluster. How does it deal
>> with rack location changes? For example, if I moved broker id (1) from
>> rack
>> X to Y, I only have to start that broker with a newer rack config. If
>> RackLocator discovers broker -> rack information at start up time, any
>> change to a broker will require bouncing the entire cluster since
>> createTopic requests can be sent to any node in the cluster.
>> For this reason it may be simpler to have each node be aware of its own
>> rack and persist it in ZK during start up time.
>>
>> - A pluggable RackLocator relies on an external service being available to
>> serve rack information.
>>
>> Out of curiosity, I looked up how a couple of other systems deal with
>> zone/rack awareness.
>> For Cassandra some interesting modes are:
>> (Property File configuration)
>>
>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
>> (Dynamic inference)
>>
>> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
>>
>> Voldemort does a static node -> zone assignment based on configuration.
>>
>> Aditya
>>
>> On Wed, Sep 30, 2015 at 10:05 AM, Allen Wang <al...@gmail.com>
>> wrote:
>>
>> > I would like to see if we can do both:
>> >
>> > - Make RackLocator pluggable to facilitate migration with existing
>> > broker-rack mapping
>> >
>> > - Make rack an optional property for broker. If rack is available from
>> > broker, treat it as source of truth. For users with existing broker-rack
>> > mapping somewhere else, they can use the pluggable way or they can
>> transfer
>> > the mapping to the broker rack property.
>> >
>> > One thing I am not sure is what happens at rolling upgrade when we have
>> > rack as a broker property. For brokers with older version of Kafka,
>> will it
>> > cause problem for them? If so, is there any workaround? I also think it
>> > would be better not to have rack in the controller wire protocol but not
>> > sure if it is achievable.
>> >
>> > Thanks,
>> > Allen
>> >
>> >
>> >
>> >
>> >
>> > On Mon, Sep 28, 2015 at 4:55 PM, Todd Palino <tp...@gmail.com> wrote:
>> >
>> > > I tend to like the idea of a pluggable locator. For example, we
>> already
>> > > have an interface for discovering information about the physical
>> location
>> > > of servers. I don't relish the idea of having to maintain data in
>> > multiple
>> > > places.
>> > >
>> > > -Todd
>> > >
>> > > On Mon, Sep 28, 2015 at 4:48 PM, Aditya Auradkar <
>> > > aauradkar@linkedin.com.invalid> wrote:
>> > >
>> > > > Thanks for starting this KIP Allen.
>> > > >
>> > > > I agree with Gwen that having a RackLocator class that is pluggable
>> > seems
>> > > > to be too complex. The KIP refers to potentially non-ZK storage for
>> the
>> > > > rack info which I don't think is necessary.
>> > > >
>> > > > Perhaps we can persist this info in zk under
>> /brokers/ids/<broker_id>
>> > > > similar to other broker properties and add a config in KafkaConfig
>> > called
>> > > > "rack".
>> > > > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy, "rack":
>> > "abc"}
>> > > >
>> > > > Aditya
>> > > >
>> > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira <gw...@confluent.io>
>> > wrote:
>> > > >
>> > > > > Hi,
>> > > > >
>> > > > > First, thanks for putting out a KIP for this. This is super
>> important
>> > > for
>> > > > > production deployments of Kafka.
>> > > > >
>> > > > > Few questions:
>> > > > >
>> > > > > 1) Are we sure we want "as many racks as possible"? I'd want to
>> > balance
>> > > > > between safety (more racks) and network utilization (traffic
>> within a
>> > > > rack
>> > > > > uses the high-bandwidth TOR switch). One replica on a different
>> rack
>> > > and
>> > > > > the rest on same rack (if possible) sounds better to me.
>> > > > >
>> > > > > 2) Rack-locator class seems overly complex compared to adding a
>> > > > rack.number
>> > > > > property to the broker properties file. Why do we want that?
>> > > > >
>> > > > > Gwen
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang <
>> allenxwang@gmail.com>
>> > > > wrote:
>> > > > >
>> > > > > > Hello Kafka Developers,
>> > > > > >
>> > > > > > I just created KIP-36 for rack aware replica assignment.
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
>> > > > > >
>> > > > > > The goal is to utilize the isolation provided by the racks in
>> data
>> > > > center
>> > > > > > and distribute replicas to racks to provide fault tolerance.
>> > > > > >
>> > > > > > Comments are welcome.
>> > > > > >
>> > > > > > Thanks,
>> > > > > > Allen
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Allen Wang <al...@gmail.com>.
Can you send me the information on the next KIP hangout?

Currently the broker-rack mapping is not cached. In KafkaApis,
RackLocator.getRackInfo() is called each time the mapping is needed for
auto topic creation. This ensures that the latest mapping is always used.

The ability to get the complete mapping makes it simple to reuse the same
interface in command line tools.
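
A minimal sketch of the uncached lookup described above, written in Python
for brevity. The names RackLocator, get_rack_info and racks_for_new_topic
are stand-ins chosen for illustration, not the actual Kafka classes. The
only point is that the complete mapping is fetched on every call, so a
change in the backing store is visible to the next topic creation.

```python
from typing import Callable, Dict


class RackLocator:
    """Hypothetical locator that returns the complete broker -> rack mapping."""

    def __init__(self, source: Callable[[], Dict[int, str]]):
        # `source` stands in for whatever store holds the mapping.
        self._source = source

    def get_rack_info(self) -> Dict[int, str]:
        # No caching: every call goes back to the source.
        return dict(self._source())


def racks_for_new_topic(locator: RackLocator) -> Dict[int, str]:
    # Called once per auto topic creation, so it sees the latest mapping.
    return locator.get_rack_info()


store = {1: "rack-x", 2: "rack-y"}
locator = RackLocator(lambda: store)

before = racks_for_new_topic(locator)
store[1] = "rack-y"  # broker 1 is moved to another rack
after = racks_for_new_topic(locator)
```

The cost of this choice is one lookup per topic creation; the same
get_rack_info call could be reused by a command line tool that needs the
whole mapping.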


On Wed, Sep 30, 2015 at 11:01 AM, Aditya Auradkar <
aauradkar@linkedin.com.invalid> wrote:

> Perhaps we discuss this during the next KIP hangout?
>
> I do see that a pluggable rack locator can be useful but I do see a few
> concerns:
>
> - The RackLocator (as described in the document), implies that it can
> discover rack information for any node in the cluster. How does it deal
> with rack location changes? For example, if I moved broker id (1) from rack
> X to Y, I only have to start that broker with a newer rack config. If
> RackLocator discovers broker -> rack information at start up time, any
> change to a broker will require bouncing the entire cluster since
> createTopic requests can be sent to any node in the cluster.
> For this reason it may be simpler to have each node be aware of its own
> rack and persist it in ZK during start up time.
>
> - A pluggable RackLocator relies on an external service being available to
> serve rack information.
>
> Out of curiosity, I looked up how a couple of other systems deal with
> zone/rack awareness.
> For Cassandra some interesting modes are:
> (Property File configuration)
>
> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
> (Dynamic inference)
>
> http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html
>
> Voldemort does a static node -> zone assignment based on configuration.
>
> Aditya
>
> On Wed, Sep 30, 2015 at 10:05 AM, Allen Wang <al...@gmail.com> wrote:
>
> > I would like to see if we can do both:
> >
> > - Make RackLocator pluggable to facilitate migration with existing
> > broker-rack mapping
> >
> > - Make rack an optional property for broker. If rack is available from
> > broker, treat it as source of truth. For users with existing broker-rack
> > mapping somewhere else, they can use the pluggable way or they can
> transfer
> > the mapping to the broker rack property.
> >
> > One thing I am not sure is what happens at rolling upgrade when we have
> > rack as a broker property. For brokers with older version of Kafka, will
> it
> > cause problem for them? If so, is there any workaround? I also think it
> > would be better not to have rack in the controller wire protocol but not
> > sure if it is achievable.
> >
> > Thanks,
> > Allen
> >
> >
> >
> >
> >
> > On Mon, Sep 28, 2015 at 4:55 PM, Todd Palino <tp...@gmail.com> wrote:
> >
> > > I tend to like the idea of a pluggable locator. For example, we already
> > > have an interface for discovering information about the physical
> location
> > > of servers. I don't relish the idea of having to maintain data in
> > multiple
> > > places.
> > >
> > > -Todd
> > >
> > > On Mon, Sep 28, 2015 at 4:48 PM, Aditya Auradkar <
> > > aauradkar@linkedin.com.invalid> wrote:
> > >
> > > > Thanks for starting this KIP Allen.
> > > >
> > > > I agree with Gwen that having a RackLocator class that is pluggable
> > seems
> > > > to be too complex. The KIP refers to potentially non-ZK storage for
> the
> > > > rack info which I don't think is necessary.
> > > >
> > > > Perhaps we can persist this info in zk under /brokers/ids/<broker_id>
> > > > similar to other broker properties and add a config in KafkaConfig
> > called
> > > > "rack".
> > > > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy, "rack":
> > "abc"}
> > > >
> > > > Aditya
> > > >
> > > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira <gw...@confluent.io>
> > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > First, thanks for putting out a KIP for this. This is super
> important
> > > for
> > > > > production deployments of Kafka.
> > > > >
> > > > > Few questions:
> > > > >
> > > > > 1) Are we sure we want "as many racks as possible"? I'd want to
> > balance
> > > > > between safety (more racks) and network utilization (traffic
> within a
> > > > rack
> > > > > uses the high-bandwidth TOR switch). One replica on a different
> rack
> > > and
> > > > > the rest on same rack (if possible) sounds better to me.
> > > > >
> > > > > 2) Rack-locator class seems overly complex compared to adding a
> > > > rack.number
> > > > > property to the broker properties file. Why do we want that?
> > > > >
> > > > > Gwen
> > > > >
> > > > >
> > > > >
> > > > > On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang <allenxwang@gmail.com
> >
> > > > wrote:
> > > > >
> > > > > > Hello Kafka Developers,
> > > > > >
> > > > > > I just created KIP-36 for rack aware replica assignment.
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
> > > > > >
> > > > > > The goal is to utilize the isolation provided by the racks in
> data
> > > > center
> > > > > > and distribute replicas to racks to provide fault tolerance.
> > > > > >
> > > > > > Comments are welcome.
> > > > > >
> > > > > > Thanks,
> > > > > > Allen
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Aditya Auradkar <aa...@linkedin.com.INVALID>.
Perhaps we can discuss this during the next KIP hangout?

I do see that a pluggable rack locator can be useful, but I have a few
concerns:

- The RackLocator (as described in the document) implies that it can
discover rack information for any node in the cluster. How does it deal
with rack location changes? For example, if I moved broker id (1) from rack
X to Y, I should only have to start that broker with a newer rack config. If
RackLocator discovers broker -> rack information at startup time, any
change to a broker will require bouncing the entire cluster, since
createTopic requests can be sent to any node in the cluster.
For this reason it may be simpler to have each node be aware of its own
rack and persist it in ZK at startup time.

- A pluggable RackLocator relies on an external service being available to
serve rack information.
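
The first concern can be sketched as follows, in Python for brevity. Both
classes are hypothetical and only model the two behaviours being compared:
a locator that copies the full mapping once at startup, and a shared
registry (standing in for ZK) that each broker writes its own rack into
when it starts.

```python
from typing import Dict


class StartupSnapshotLocator:
    """Hypothetical locator that reads the whole mapping once, at startup."""

    def __init__(self, external: Dict[int, str]):
        self._snapshot = dict(external)  # copied once, never refreshed

    def rack_of(self, broker_id: int) -> str:
        return self._snapshot[broker_id]


class BrokerRegistry:
    """Stand-in for ZK: each broker registers its own rack when it starts."""

    def __init__(self) -> None:
        self._racks: Dict[int, str] = {}

    def register(self, broker_id: int, rack: str) -> None:
        self._racks[broker_id] = rack

    def rack_of(self, broker_id: int) -> str:
        return self._racks[broker_id]


external = {1: "X", 2: "X"}
view_on_broker_2 = StartupSnapshotLocator(external)  # broker 2 started earlier

registry = BrokerRegistry()
registry.register(1, "X")
registry.register(2, "X")

# Broker 1 is moved from rack X to rack Y and only broker 1 is restarted.
external[1] = "Y"
registry.register(1, "Y")

stale = view_on_broker_2.rack_of(1)  # broker 2 still holds its old snapshot
fresh = registry.rack_of(1)          # the shared registry has the new rack
```

In this model broker 2 keeps answering with the old rack until it is
bounced, while the registry reflects the move as soon as broker 1 restarts.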

Out of curiosity, I looked up how a couple of other systems deal with
zone/rack awareness.
For Cassandra some interesting modes are:
(Property File configuration)
http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchPFSnitch_t.html
(Dynamic inference)
http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchRackInf_c.html

Voldemort does a static node -> zone assignment based on configuration.

Aditya

On Wed, Sep 30, 2015 at 10:05 AM, Allen Wang <al...@gmail.com> wrote:

> I would like to see if we can do both:
>
> - Make RackLocator pluggable to facilitate migration with existing
> broker-rack mapping
>
> - Make rack an optional property for broker. If rack is available from
> broker, treat it as source of truth. For users with existing broker-rack
> mapping somewhere else, they can use the pluggable way or they can transfer
> the mapping to the broker rack property.
>
> One thing I am not sure is what happens at rolling upgrade when we have
> rack as a broker property. For brokers with older version of Kafka, will it
> cause problem for them? If so, is there any workaround? I also think it
> would be better not to have rack in the controller wire protocol but not
> sure if it is achievable.
>
> Thanks,
> Allen
>
>
>
>
>
> On Mon, Sep 28, 2015 at 4:55 PM, Todd Palino <tp...@gmail.com> wrote:
>
> > I tend to like the idea of a pluggable locator. For example, we already
> > have an interface for discovering information about the physical location
> > of servers. I don't relish the idea of having to maintain data in
> multiple
> > places.
> >
> > -Todd
> >
> > On Mon, Sep 28, 2015 at 4:48 PM, Aditya Auradkar <
> > aauradkar@linkedin.com.invalid> wrote:
> >
> > > Thanks for starting this KIP Allen.
> > >
> > > I agree with Gwen that having a RackLocator class that is pluggable
> seems
> > > to be too complex. The KIP refers to potentially non-ZK storage for the
> > > rack info which I don't think is necessary.
> > >
> > > Perhaps we can persist this info in zk under /brokers/ids/<broker_id>
> > > similar to other broker properties and add a config in KafkaConfig
> called
> > > "rack".
> > > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy, "rack":
> "abc"}
> > >
> > > Aditya
> > >
> > > On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira <gw...@confluent.io>
> wrote:
> > >
> > > > Hi,
> > > >
> > > > First, thanks for putting out a KIP for this. This is super important
> > for
> > > > production deployments of Kafka.
> > > >
> > > > Few questions:
> > > >
> > > > 1) Are we sure we want "as many racks as possible"? I'd want to
> balance
> > > > between safety (more racks) and network utilization (traffic within a
> > > rack
> > > > uses the high-bandwidth TOR switch). One replica on a different rack
> > and
> > > > the rest on same rack (if possible) sounds better to me.
> > > >
> > > > 2) Rack-locator class seems overly complex compared to adding a
> > > rack.number
> > > > property to the broker properties file. Why do we want that?
> > > >
> > > > Gwen
> > > >
> > > >
> > > >
> > > > On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang <al...@gmail.com>
> > > wrote:
> > > >
> > > > > Hello Kafka Developers,
> > > > >
> > > > > I just created KIP-36 for rack aware replica assignment.
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
> > > > >
> > > > > The goal is to utilize the isolation provided by the racks in data
> > > center
> > > > > and distribute replicas to racks to provide fault tolerance.
> > > > >
> > > > > Comments are welcome.
> > > > >
> > > > > Thanks,
> > > > > Allen
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Allen Wang <al...@gmail.com>.
I would like to see if we can do both:

- Make RackLocator pluggable to facilitate migration with existing
broker-rack mapping

- Make rack an optional property for broker. If rack is available from
broker, treat it as source of truth. For users with existing broker-rack
mapping somewhere else, they can use the pluggable way or they can transfer
the mapping to the broker rack property.
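
A minimal sketch of the lookup order proposed in the two points above, in
Python for brevity. The function name resolve_rack and its parameters are
made up for illustration; the sketch only assumes that the broker's own
rack property wins when it is set and that the pluggable mapping is
consulted otherwise.

```python
from typing import Dict, Optional


def resolve_rack(
    broker_id: int,
    broker_rack: Optional[str],
    locator_mapping: Dict[int, str],
) -> Optional[str]:
    """Return the rack for a broker, or None if no source knows it."""
    # The broker's own property is treated as the source of truth.
    if broker_rack is not None:
        return broker_rack
    # Otherwise fall back to the externally maintained mapping.
    return locator_mapping.get(broker_id)


external_mapping = {1: "rack-b", 2: "rack-b"}
from_property = resolve_rack(1, "rack-a", external_mapping)
from_locator = resolve_rack(2, None, external_mapping)
unknown = resolve_rack(3, None, external_mapping)
```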

One thing I am not sure about is what happens during a rolling upgrade when
we have rack as a broker property. For brokers running an older version of
Kafka, will it cause problems? If so, is there any workaround? I also think
it would be better not to have rack in the controller wire protocol, but I
am not sure if that is achievable.

Thanks,
Allen





On Mon, Sep 28, 2015 at 4:55 PM, Todd Palino <tp...@gmail.com> wrote:

> I tend to like the idea of a pluggable locator. For example, we already
> have an interface for discovering information about the physical location
> of servers. I don't relish the idea of having to maintain data in multiple
> places.
>
> -Todd
>
> On Mon, Sep 28, 2015 at 4:48 PM, Aditya Auradkar <
> aauradkar@linkedin.com.invalid> wrote:
>
> > Thanks for starting this KIP Allen.
> >
> > I agree with Gwen that having a RackLocator class that is pluggable seems
> > to be too complex. The KIP refers to potentially non-ZK storage for the
> > rack info which I don't think is necessary.
> >
> > Perhaps we can persist this info in zk under /brokers/ids/<broker_id>
> > similar to other broker properties and add a config in KafkaConfig called
> > "rack".
> > {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy, "rack": "abc"}
> >
> > Aditya
> >
> > On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira <gw...@confluent.io> wrote:
> >
> > > Hi,
> > >
> > > First, thanks for putting out a KIP for this. This is super important
> for
> > > production deployments of Kafka.
> > >
> > > Few questions:
> > >
> > > 1) Are we sure we want "as many racks as possible"? I'd want to balance
> > > between safety (more racks) and network utilization (traffic within a
> > rack
> > > uses the high-bandwidth TOR switch). One replica on a different rack
> and
> > > the rest on same rack (if possible) sounds better to me.
> > >
> > > 2) Rack-locator class seems overly complex compared to adding a
> > rack.number
> > > property to the broker properties file. Why do we want that?
> > >
> > > Gwen
> > >
> > >
> > >
> > > On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang <al...@gmail.com>
> > wrote:
> > >
> > > > Hello Kafka Developers,
> > > >
> > > > I just created KIP-36 for rack aware replica assignment.
> > > >
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
> > > >
> > > > The goal is to utilize the isolation provided by the racks in data
> > center
> > > > and distribute replicas to racks to provide fault tolerance.
> > > >
> > > > Comments are welcome.
> > > >
> > > > Thanks,
> > > > Allen
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Todd Palino <tp...@gmail.com>.
I tend to like the idea of a pluggable locator. For example, we already
have an interface for discovering information about the physical location
of servers. I don't relish the idea of having to maintain data in multiple
places.

-Todd

On Mon, Sep 28, 2015 at 4:48 PM, Aditya Auradkar <
aauradkar@linkedin.com.invalid> wrote:

> Thanks for starting this KIP Allen.
>
> I agree with Gwen that having a RackLocator class that is pluggable seems
> to be too complex. The KIP refers to potentially non-ZK storage for the
> rack info which I don't think is necessary.
>
> Perhaps we can persist this info in zk under /brokers/ids/<broker_id>
> similar to other broker properties and add a config in KafkaConfig called
> "rack".
> {"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy, "rack": "abc"}
>
> Aditya
>
> On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira <gw...@confluent.io> wrote:
>
> > Hi,
> >
> > First, thanks for putting out a KIP for this. This is super important for
> > production deployments of Kafka.
> >
> > Few questions:
> >
> > 1) Are we sure we want "as many racks as possible"? I'd want to balance
> > between safety (more racks) and network utilization (traffic within a
> rack
> > uses the high-bandwidth TOR switch). One replica on a different rack and
> > the rest on same rack (if possible) sounds better to me.
> >
> > 2) Rack-locator class seems overly complex compared to adding a
> rack.number
> > property to the broker properties file. Why do we want that?
> >
> > Gwen
> >
> >
> >
> > On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang <al...@gmail.com>
> wrote:
> >
> > > Hello Kafka Developers,
> > >
> > > I just created KIP-36 for rack aware replica assignment.
> > >
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
> > >
> > > The goal is to utilize the isolation provided by the racks in data
> center
> > > and distribute replicas to racks to provide fault tolerance.
> > >
> > > Comments are welcome.
> > >
> > > Thanks,
> > > Allen
> > >
> >
>

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Aditya Auradkar <aa...@linkedin.com.INVALID>.
Thanks for starting this KIP, Allen.

I agree with Gwen that having a pluggable RackLocator class seems to be too
complex. The KIP refers to potentially non-ZK storage for the rack info,
which I don't think is necessary.

Perhaps we can persist this info in ZK under /brokers/ids/<broker_id>,
similar to other broker properties, and add a config in KafkaConfig called
"rack".
{"jmx_port":-1,"endpoints":[...],"host":"xxx","port":yyy, "rack": "abc"}
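
A small sketch of how a reader of that znode could treat "rack" as an
optional field, in Python for brevity. The host, port and endpoints values
below are invented for the example, and rack_from_registration is a
hypothetical helper, not an existing Kafka function. A registration written
without the field simply yields no rack.

```python
import json
from typing import Optional


def rack_from_registration(payload: str) -> Optional[str]:
    """Extract the optional "rack" field from a broker registration."""
    data = json.loads(payload)
    # Registrations that lack the field have no rack.
    return data.get("rack")


with_rack = (
    '{"jmx_port": -1, "endpoints": [], "host": "h1", "port": 9092, '
    '"rack": "abc"}'
)
without_rack = '{"jmx_port": -1, "endpoints": [], "host": "h2", "port": 9092}'

rack_a = rack_from_registration(with_rack)
rack_b = rack_from_registration(without_rack)
```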

Aditya

On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira <gw...@confluent.io> wrote:

> Hi,
>
> First, thanks for putting out a KIP for this. This is super important for
> production deployments of Kafka.
>
> Few questions:
>
> 1) Are we sure we want "as many racks as possible"? I'd want to balance
> between safety (more racks) and network utilization (traffic within a rack
> uses the high-bandwidth TOR switch). One replica on a different rack and
> the rest on same rack (if possible) sounds better to me.
>
> 2) Rack-locator class seems overly complex compared to adding a rack.number
> property to the broker properties file. Why do we want that?
>
> Gwen
>
>
>
> On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang <al...@gmail.com> wrote:
>
> > Hello Kafka Developers,
> >
> > I just created KIP-36 for rack aware replica assignment.
> >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
> >
> > The goal is to utilize the isolation provided by the racks in data center
> > and distribute replicas to racks to provide fault tolerance.
> >
> > Comments are welcome.
> >
> > Thanks,
> > Allen
> >
>

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Allen Wang <al...@gmail.com>.
Hi Gwen,

For question 1, I agree there is a balance issue. The other perspective is
that the more racks the replicas are assigned to, the more resilient the
partition is to rack outages. If we have three replicas, each on a different
rack, we can in theory tolerate the failure of 2 racks. Also, the current
assignment does not make any optimization toward traffic within a rack, so
there is no regression in this regard.
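
The arithmetic behind the "2 racks" figure can be sketched as below, in
Python for brevity. It assumes the simplest possible model, in which a
partition stays available as long as at least one rack holding a replica
is still up; in-sync replica requirements are ignored. The function name
and the broker and rack ids are made up for the example.

```python
from typing import Dict, List


def tolerated_rack_failures(replicas: List[int], rack_of: Dict[int, str]) -> int:
    """Number of racks that can fail while one replica still survives."""
    distinct_racks = {rack_of[broker] for broker in replicas}
    # All but one of the racks hosting a replica may be lost.
    return len(distinct_racks) - 1


rack_of = {1: "r1", 2: "r2", 3: "r3", 4: "r1"}

spread = tolerated_rack_failures([1, 2, 3], rack_of)   # three racks
partial = tolerated_rack_failures([1, 4, 2], rack_of)  # two racks
packed = tolerated_rack_failures([1, 4], rack_of)      # one rack
```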

For question 2, think of a situation where people already have rack
information stored somewhere outside of ZooKeeper, or in ZooKeeper but not
as part of the broker properties. Having a generic Map would help them
migrate to and use the new feature. There are also some comments in
the code review of the previous patch (see
https://reviews.apache.org/r/17248/diff/3/) that suggest that avoiding
changes to the core data structure would make migration easier when
doing a rolling upgrade.

Thanks,
Allen


On Mon, Sep 28, 2015 at 2:30 PM, Gwen Shapira <gw...@confluent.io> wrote:

> Hi,
>
> First, thanks for putting out a KIP for this. This is super important for
> production deployments of Kafka.
>
> Few questions:
>
> 1) Are we sure we want "as many racks as possible"? I'd want to balance
> between safety (more racks) and network utilization (traffic within a rack
> uses the high-bandwidth TOR switch). One replica on a different rack and
> the rest on same rack (if possible) sounds better to me.
>
> 2) Rack-locator class seems overly complex compared to adding a rack.number
> property to the broker properties file. Why do we want that?
>
> Gwen
>
>
>
> On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang <al...@gmail.com> wrote:
>
> > Hello Kafka Developers,
> >
> > I just created KIP-36 for rack aware replica assignment.
> >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
> >
> > The goal is to utilize the isolation provided by the racks in data center
> > and distribute replicas to racks to provide fault tolerance.
> >
> > Comments are welcome.
> >
> > Thanks,
> > Allen
> >
>

Re: [DISCUSS] KIP-36 - Rack aware replica assignment

Posted by Gwen Shapira <gw...@confluent.io>.
Hi,

First, thanks for putting out a KIP for this. This is super important for
production deployments of Kafka.

A few questions:

1) Are we sure we want "as many racks as possible"? I'd want to balance
between safety (more racks) and network utilization (traffic within a rack
uses the high-bandwidth TOR switch). One replica on a different rack and
the rest on the same rack (if possible) sounds better to me.

2) The rack-locator class seems overly complex compared to adding a
rack.number property to the broker properties file. Why do we want that?
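
The placement suggested in question 1 could be sketched roughly as follows,
in Python for brevity. This is only an illustration of "one replica on a
different rack, the rest on the same rack if possible", not the algorithm
in the KIP; place_one_remote and the broker and rack ids are made up.

```python
from typing import Dict, List


def place_one_remote(
    leader: int, rack_of: Dict[int, str], replication_factor: int
) -> List[int]:
    """Pick replicas: the leader, one off-rack broker, then same-rack brokers."""
    home = rack_of[leader]
    local = sorted(b for b, r in rack_of.items() if r == home and b != leader)
    remote = sorted(b for b, r in rack_of.items() if r != home)

    replicas = [leader]
    # One replica goes to another rack for safety.
    if remote and replication_factor > 1:
        replicas.append(remote[0])
    # The rest stay on the leader's rack to keep traffic behind one switch.
    for broker in local:
        if len(replicas) >= replication_factor:
            break
        replicas.append(broker)
    # "If possible": use further remote brokers when the home rack is too small.
    for broker in remote[1:]:
        if len(replicas) >= replication_factor:
            break
        replicas.append(broker)
    return replicas


rack_of = {1: "r1", 2: "r1", 3: "r1", 4: "r2", 5: "r2"}
typical = place_one_remote(1, rack_of, 3)
small_home_rack = place_one_remote(1, {1: "r1", 4: "r2", 5: "r2"}, 3)
```

With three replicas this survives the loss of either rack, at the price of
tolerating only one rack failure rather than two.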

Gwen



On Mon, Sep 28, 2015 at 12:15 PM, Allen Wang <al...@gmail.com> wrote:

> Hello Kafka Developers,
>
> I just created KIP-36 for rack aware replica assignment.
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
>
> The goal is to utilize the isolation provided by the racks in data center
> and distribute replicas to racks to provide fault tolerance.
>
> Comments are welcome.
>
> Thanks,
> Allen
>