Posted to users@kafka.apache.org by Aniket Bhatnagar <an...@gmail.com> on 2013/10/01 12:51:35 UTC

Strategies for auto generating broker ID

I would like to revive an older thread around auto generating broker ID. As
an AWS user, I would like Kafka to just use the instance's ID, the instance's
IP or the instance's internal domain (whichever is easier). This would mean I
can easily clone from an AMI to launch Kafka instances without having to
worry about setting a unique broker ID. This also allows me to set up auto
scaling.

I realize one size may not fit all in this case. Other strategies that may
work for other cloud providers include generating a UUID and persisting it
on disk, etc.

What I propose is a way to define a broker ID generation strategy in the
configuration file which points to a class that is responsible for
generating the ID. Is this something that is already being worked on?
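
To make the proposal concrete, here is a minimal sketch of a configurable ID
generation strategy. It is an illustration only: the config key
`broker.id.strategy`, the strategy names and the function name are all
hypothetical and are not existing Kafka settings, and a name-to-function
table stands in for the "class named in the configuration file" idea.

```python
import socket
import uuid

# Hypothetical strategy names; none of these are real Kafka settings.
STRATEGIES = {
    "uuid": lambda: str(uuid.uuid4()),   # random id, independent of the machine
    "hostname": socket.gethostname,      # id tied to the machine's name
}


def generate_broker_id(config):
    """Look up the strategy named in the config and call it."""
    name = config.get("broker.id.strategy", "uuid")  # hypothetical key
    try:
        strategy = STRATEGIES[name]
    except KeyError:
        raise ValueError("unknown broker id strategy: " + name)
    return strategy()
```

A call such as `generate_broker_id({"broker.id.strategy": "uuid"})` would
return a fresh UUID string each time, so the result would still need to be
persisted somewhere to survive a restart.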

Re: Strategies for auto generating broker ID

Posted by Neha Narkhede <ne...@gmail.com>.

>> what is the procedure for
>> having the new broker assume the identity of the previously failed one?

Copying the meta file to any one of the data directories on the new broker
prior to the initial startup will work.

Thanks,
Neha


On Wed, Oct 2, 2013 at 11:47 AM, Jason Rosenberg <jb...@squareup.com> wrote:

> The one concern with this metadata approach is that it seems like a
> pretty low-level thing to have to manage. If I have a broker failure, and
> I want to bring in a new node to replace it, what is the procedure for
> having the new broker assume the identity of the previously failed one?
> Manually setting it in the config for the broker is perhaps error prone,
> but is pretty straightforward. I'm not sure it's cleaner to manually edit
> replicated meta files in the data log dirs for the broker.

Re: Strategies for auto generating broker ID

Posted by Jason Rosenberg <jb...@squareup.com>.

The one concern with this metadata approach is that it seems like a
pretty low-level thing to have to manage. If I have a broker failure, and
I want to bring in a new node to replace it, what is the procedure for
having the new broker assume the identity of the previously failed one?
Manually setting it in the config for the broker is perhaps error prone,
but is pretty straightforward. I'm not sure it's cleaner to manually edit
replicated meta files in the data log dirs for the broker.


On Wed, Oct 2, 2013 at 2:20 PM, Sriram Subramanian <
srsubramanian@linkedin.com> wrote:

> Jason - You should be able to solve that with Jay's proposal below. If you
> just persist the id in a meta file, you can copy the meta file over to the
> new broker and broker will not re-generate another id.

Re: Strategies for auto generating broker ID

Posted by Sriram Subramanian <sr...@linkedin.com>.

Jason - You should be able to solve that with Jay's proposal below. If you
just persist the id in a meta file, you can copy the meta file over to the
new broker and the broker will not generate another id.
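
As a rough sketch of that idea (not Kafka's actual implementation): the
broker looks for a persisted id in its data directories and only generates
one when none is found, so copying the file to a replacement machine carries
the identity over. The file name `broker.meta` and both function names are
made up for illustration, and the id is a UUID string here even though Jay
recommends keeping an integer.

```python
import os
import shutil
import uuid

META_FILE = "broker.meta"  # hypothetical file name, not Kafka's actual one


def load_or_create_broker_id(data_dirs):
    """Return the persisted id if any data dir has one, else create one."""
    for d in data_dirs:
        path = os.path.join(d, META_FILE)
        if os.path.exists(path):
            with open(path) as f:
                return f.read().strip()
    # No meta file found: generate a new id and persist it in the first dir.
    broker_id = str(uuid.uuid4())
    os.makedirs(data_dirs[0], exist_ok=True)
    with open(os.path.join(data_dirs[0], META_FILE), "w") as f:
        f.write(broker_id + "\n")
    return broker_id


def adopt_identity(failed_broker_dir, new_broker_dir):
    """Copy the meta file so the replacement reuses the failed broker's id."""
    os.makedirs(new_broker_dir, exist_ok=True)
    shutil.copy(os.path.join(failed_broker_dir, META_FILE),
                os.path.join(new_broker_dir, META_FILE))
```

In this sketch, calling `adopt_identity` before the replacement's first
startup means `load_or_create_broker_id` finds the copied file and never
reaches the generation branch.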

On 10/2/13 11:10 AM, "Jason Rosenberg" <jb...@squareup.com> wrote:

>I recently moved away from generating a unique brokerId for each node, in
>favor of assigning ids in configuration.  The reason for this, is that in
>0.8, there isn't a convenient way yet to reassign partitions to a new
>brokerid, should one broker have a failure.  So, it seems the only
>work-around at the moment is to bring up a replacement broker, assign it
>the same brokerId as one that has failed and is no longer running.  The
>cluster will then automatically replicate all the partitions that were
>assigned to the failed broker to the new broker.
>
>This appears the only operational way to deal with failed brokers, at the
>moment.
>
>Longer term, it would be great if the cluster were self-healing, and if a
>broker went down, we could mark it as no longer available somehow, and the
>cluster would then reassign and re-replicate partitions to new brokers,
>that were previously assigned to the failed broker.  I expect something
>like this will be available in future versions, but that doesn't appear
>the case at present.
>
>And related, it would be nice, in the interests of horizontal scalability,
>to have an easy way for the cluster to dynamically rebalance load, if new
>nodes are added to the cluster (or to at least prefer assigning new
>partitions to brokers which have more space available).  I expect this
>will be something to prioritize in the future versions as well.
>
>Jason

Re: Strategies for auto generating broker ID

Posted by Jason Rosenberg <jb...@squareup.com>.

I recently moved away from generating a unique brokerId for each node, in
favor of assigning ids in configuration. The reason for this is that in
0.8, there isn't a convenient way yet to reassign partitions to a new
brokerId, should one broker have a failure. So, it seems the only
work-around at the moment is to bring up a replacement broker and assign it
the same brokerId as the one that has failed and is no longer running. The
cluster will then automatically replicate all the partitions that were
assigned to the failed broker to the new broker.

This appears to be the only operational way to deal with failed brokers at
the moment.

Longer term, it would be great if the cluster were self-healing: if a
broker went down, we could mark it as no longer available somehow, and the
cluster would then reassign and re-replicate the partitions that were
previously assigned to the failed broker to new brokers. I expect something
like this will be available in future versions, but that doesn't appear to
be the case at present.

Relatedly, it would be nice, in the interests of horizontal scalability,
to have an easy way for the cluster to dynamically rebalance load if new
nodes are added to the cluster (or to at least prefer assigning new
partitions to brokers which have more space available). I expect this will
be something to prioritize in future versions as well.

Jason


On Wed, Oct 2, 2013 at 1:00 PM, Sriram Subramanian <
srsubramanian@linkedin.com> wrote:

> I agree that we need a unique id and have something independent of the
> machine. I am not sure you want a dependency on ZK to generate the unique
> id though. There are other ways to generate an unique id (Example - UUID).
> In case there was a collision (highly unlikely), the node creation in ZK
> will anyways fail and the broker can regenerate another id.

Re: Strategies for auto generating broker ID

Posted by Sriram Subramanian <sr...@linkedin.com>.

I agree that we need a unique id and have something independent of the
machine. I am not sure you want a dependency on ZK to generate the unique
id though. There are other ways to generate a unique id (for example, a
UUID). In case there was a collision (highly unlikely), the node creation
in ZK will fail anyway and the broker can generate another id.
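
A minimal sketch of that generate-and-retry loop, under stated assumptions:
`InMemoryRegistry` is a made-up stand-in for the broker registration in ZK
(it only mimics "create fails if the id already exists"), and the class,
exception and function names are hypothetical rather than taken from Kafka
or any ZooKeeper client.

```python
import uuid


class NodeExistsError(Exception):
    """Raised when a broker id is already registered."""


class InMemoryRegistry:
    """Stand-in for the ZK broker registry; create() fails on duplicates."""

    def __init__(self):
        self._nodes = set()

    def create(self, broker_id):
        if broker_id in self._nodes:
            raise NodeExistsError(broker_id)
        self._nodes.add(broker_id)


def register_with_fresh_id(registry, generate=lambda: str(uuid.uuid4()),
                           max_attempts=5):
    """Generate an id, try to register it, and regenerate on collision."""
    for _ in range(max_attempts):
        candidate = generate()
        try:
            registry.create(candidate)
            return candidate
        except NodeExistsError:
            continue  # collision: pick another id and retry
    raise RuntimeError("could not register a unique broker id")
```

The `generate` parameter is there only so the collision path can be
exercised with a deterministic generator; with real UUIDs the retry branch
should almost never run.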

On 10/2/13 9:52 AM, "Jay Kreps" <ja...@gmail.com> wrote:

>There are scenarios in which you want a hostname to change or you want to
>move the stored data off one machine onto another. This is the motivation
>systems have for having a layer of indirection between the location and
>the identity of the nodes.
>
>-Jay
>
>
>On Wed, Oct 2, 2013 at 9:23 AM, Guozhang Wang <wa...@gmail.com> wrote:
>
>> Wondering what is the reason behind decoupling the node id with its
>> physical host(port)? If we found that for example, node 1 is not owning
>> any partitions, how would we know which physical machine is this node then?
>>
>> Guozhang
>>
>>
>> On Wed, Oct 2, 2013 at 9:07 AM, Jay Kreps <ja...@gmail.com> wrote:
>>
>> > I'm in favor of doing this if someone is willing to work on it! I agree
>> > it would really help with easy provisioning.
>> >
>> > I filed a bug to discuss and track:
>> > https://issues.apache.org/jira/browse/KAFKA-1070
>> >
>> > Some comments:
>> > 1. I'm not in favor of having a pluggable strategy, unless we are really
>> > really sure this is an area where people are going to get a lot of value
>> > by writing lots of plugins. I am not at all sure why you would want to
>> > retain the current behavior if you had a good strategy for automatically
>> > generating ids. Basically plugins are an evil we only want to accept when
>> > either we don't understand the problem or the solutions have such extreme
>> > tradeoffs that there is no single "good approach". Plugins cause problems
>> > for upgrades, testing, documentation, user understandability, code
>> > understandability, etc.
>> > 2. The node id can't be the host or port or anything tied to the physical
>> > machine or its location on the network because you need to be able to
>> > change these things. I recommend we just keep an integer.
>> >
>> > -Jay
>> >
>> >
>> > On Tue, Oct 1, 2013 at 7:08 AM, Aniket Bhatnagar <aniket.bhatnagar@gmail.com> wrote:
>> >
>> > > Right. It is currently java integer. However, as per previous thread, it
>> > > seems possible to change it to a string. In that case, we can use instance
>> > > IDs, IP addresses, custom ID generators, etc.
>> > > How are you currently generating broker IDs from IP address? Chef script or
>> > > custom shell script?
>> > >
>> > >
>> > > On 1 October 2013 18:34, Maxime Brugidou <ma...@gmail.com>
>> > > wrote:
>> > >
>> > > > I think it currently is a java (signed) integer or maybe this was
>> > > > zookeeper?
>> > > > We are generating the id from IP address for now but this is not
>> ideal
>> > > (and
>> > > > can cause integer overflow with java signed ints)
>> > > > On Oct 1, 2013 12:52 PM, "Aniket Bhatnagar" <
>> > aniket.bhatnagar@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > I would like to revive an older thread around auto generating
>> broker
>> > > ID.
>> > > > As
>> > > > > a AWS user, I would like Kafka to just use the instance's ID or
>> > > > instance's
>> > > > > IP or instance's internal domain (whichever is easier). This
>>would
>> > > mean I
>> > > > > can easily clone from a AMI to launch kafka instances without
>> having
>> > to
>> > > > > worry about setting a unique broker ID. This also alows me to
>>setup
>> > > auto
>> > > > > scaling.
>> > > > >
>> > > > > I realize 1 size may not fit all in this case. Other strategies
>> that
>> > > may
>> > > > > work for other cloud providers are generate the UUID and
>>persist it
>> > on
>> > > a
>> > > > > disk, etc.
>> > > > >
>> > > > > What I propose is a way to define a a broker ID generation
>>strategy
>> > in
>> > > > the
>> > > > > configuration file which points to a class file that is
>>responsible
>> > for
>> > > > > generating the ID. Is this something being already worked upon?
>> > > > >
>> > > >
>> > >
>> >
>>
>>
>>
>> --
>> -- Guozhang
>>

Re: Strategies for auto generating broker ID

Posted by Jay Kreps <ja...@gmail.com>.

There are scenarios in which you want a hostname to change or you want to
move the stored data off one machine onto another. This is the motivation
systems have for having a layer of indirection between the location and the
identity of the nodes.

-Jay
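
A minimal sketch of the indirection described above, assuming a simple in-memory lookup table. The `NodeRegistry` class and the host names are hypothetical and are not part of Kafka; the point is only that the integer id stays fixed while the location it resolves to can change.

```python
class NodeRegistry:
    """Hypothetical lookup table from a stable integer node id to its current location."""

    def __init__(self):
        self._locations = {}

    def register(self, node_id, host, port):
        # Re-registering an existing id simply overwrites its location.
        self._locations[node_id] = (host, port)

    def locate(self, node_id):
        # Answers "which physical machine is node N right now?"
        return self._locations[node_id]


registry = NodeRegistry()
registry.register(1, "kafka-a.example.com", 9092)

# The data for node 1 is moved to a different machine. Its identity (1) is
# unchanged; only the location it maps to is updated.
registry.register(1, "kafka-b.example.com", 9092)
```

Anything that refers to node 1, such as partition assignments, keeps working after the move because it never stored the host name in the first place.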


On Wed, Oct 2, 2013 at 9:23 AM, Guozhang Wang <wa...@gmail.com> wrote:

> I am wondering what the reason is for decoupling the node id from its
> physical host (port). If we found, for example, that node 1 does not own any
> partitions, how would we know which physical machine that node is?
>
> Guozhang

Re: Strategies for auto generating broker ID

Posted by Guozhang Wang <wa...@gmail.com>.

I am wondering what the reason is for decoupling the node id from its
physical host (port). If we found, for example, that node 1 does not own any
partitions, how would we know which physical machine that node is?

Guozhang


On Wed, Oct 2, 2013 at 9:07 AM, Jay Kreps <ja...@gmail.com> wrote:

> I'm in favor of doing this if someone is willing to work on it! I agree it
> would really help with easy provisioning.
>
> I filed a bug to discuss and track:
> https://issues.apache.org/jira/browse/KAFKA-1070
>
> Some comments:
> 1. I'm not in favor of having a pluggable strategy, unless we are really
> really sure this is an area where people are going to get a lot of value by
> writing lots of plugins. I am not at all sure why you would want to retain
> the current behavior if you had a good strategy for automatically
> generating ids. Basically plugins are an evil we only want to accept when
> either we don't understand the problem or the solutions have such extreme
> tradeoffs that there is no single "good approach". Plugins cause problems
> for upgrades, testing, documentation, user understandability, code
> understandability, etc.
> 2. The node id can't be the host or port or anything tied to the physical
> machine or its location on the network because you need to be able to
> change these things. I recommend we just keep an integer.
>
> -Jay



-- 
-- Guozhang

Re: Strategies for auto generating broker ID

Posted by Jay Kreps <ja...@gmail.com>.

Hey Aniket,

Yeah we usually discuss on the tickets just to keep it in one place but
either is totally fine.

1. Actually that wasn't quite what I was proposing. What I am saying is
that there are three cases: (a) metadata missing in all dirs, (b)
metadata missing in some dirs, (c) metadata inconsistent between dirs. In
the case of (a) we should generate an id, in the case of (b) we should fill
in the missing data (this would be the case where a drive is destroyed and
replaced), and in the case of (c) someone has done something sketchy and we
should just error out. Let me know if you think that makes sense. An
alternative approach would be to designate a special place to keep this
kind of metadata but the question is always what happens in the case of
drive failure with multiple independently mounted drives.

2. Yup. We have a metadata api that does this.

-Jay
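
The three cases in point 1 can be sketched roughly as follows. This is an illustration under stated assumptions, not Kafka's implementation: the file name `broker.meta`, the one-integer file format, and the helper names `read_id`, `write_id` and `resolve_broker_id` are all hypothetical, and the id generator is passed in by the caller.

```python
import os

META_FILE = "broker.meta"  # hypothetical file name, not taken from the thread


def read_id(data_dir):
    """Return the broker id stored in data_dir, or None if no meta file exists."""
    path = os.path.join(data_dir, META_FILE)
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return int(f.read().strip())


def write_id(data_dir, broker_id):
    """Store the broker id in data_dir (assumed format: a single integer)."""
    with open(os.path.join(data_dir, META_FILE), "w") as f:
        f.write(str(broker_id))


def resolve_broker_id(data_dirs, generate_id):
    """Apply cases (a), (b) and (c) to a list of data directories."""
    found = {d: read_id(d) for d in data_dirs}
    distinct = {i for i in found.values() if i is not None}

    if len(distinct) > 1:
        # Case (c): the directories disagree, so refuse to start.
        raise RuntimeError("inconsistent broker ids: %s" % sorted(distinct))

    # Case (a): nothing is stored anywhere, so generate a fresh id.
    # Case (b): reuse the single id that was found.
    broker_id = distinct.pop() if distinct else generate_id()

    # Fill in every directory that lacks the metadata (all of them in case (a)).
    for d, stored in found.items():
        if stored is None:
            write_id(d, broker_id)
    return broker_id
```

Under this sketch, adding a new empty data dir falls into case (b): the existing id is copied into it on startup, so nothing has to be copied by hand.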


On Wed, Oct 2, 2013 at 9:50 AM, Aniket Bhatnagar <aniket.bhatnagar@gmail.com> wrote:

> Thanks Jay. I read through the JIRA defect and had some queries. Apologies
> if I was supposed to comment on the JIRA ticket instead of discussing it here.
> If so, let me know and I will repost my comments on JIRA.
>
> 1. With the suggested approach, each time a new disk/data dir is added to
> the configuration, Kafka will fail to start unless the meta file is copied to
> the new disk. Copying over the meta file would result in copying over all
> other values like data format. Not sure if that would be intentional.
>
> 2. Is there a way to look up the hostname, etc. for a broker id via ZooKeeper
> or Kafka?

Re: Strategies for auto generating broker ID

Posted by Aniket Bhatnagar <an...@gmail.com>.

Thanks Jay. I read through the JIRA defect and had some queries. Apologies
if I was supposed to comment on the JIRA ticket instead of discussing it here.
If so, let me know and I will repost my comments on JIRA.

1. With the suggested approach, each time a new disk/data dir is added to
the configuration, Kafka will fail to start unless the meta file is copied to
the new disk. Copying over the meta file would result in copying over all
other values like data format. Not sure if that would be intentional.

2. Is there a way to look up the hostname, etc. for a broker id via ZooKeeper
or Kafka?

On 2 Oct 2013 21:38, "Jay Kreps" <ja...@gmail.com> wrote:

> I'm in favor of doing this if someone is willing to work on it! I agree it
> would really help with easy provisioning.
>
> I filed a bug to discuss and track:
> https://issues.apache.org/jira/browse/KAFKA-1070
>
> Some comments:
> 1. I'm not in favor of having a pluggable strategy, unless we are really
> really sure this is an area where people are going to get a lot of value by
> writing lots of plugins. I am not at all sure why you would want to retain
> the current behavior if you had a good strategy for automatically
> generating ids. Basically plugins are an evil we only want to accept when
> either we don't understand the problem or the solutions have such extreme
> tradeoffs that there is no single "good approach". Plugins cause problems
> for upgrades, testing, documentation, user understandability, code
> understandability, etc.
> 2. The node id can't be the host or port or anything tied to the physical
> machine or its location on the network because you need to be able to
> change these things. I recommend we just keep an integer.
>
> -Jay

Re: Strategies for auto generating broker ID

Posted by Jay Kreps <ja...@gmail.com>.

I'm in favor of doing this if someone is willing to work on it! I agree it
would really help with easy provisioning.

I filed a bug to discuss and track:
https://issues.apache.org/jira/browse/KAFKA-1070

Some comments:
1. I'm not in favor of having a pluggable strategy, unless we are really
really sure this is an area where people are going to get a lot of value by
writing lots of plugins. I am not at all sure why you would want to retain
the current behavior if you had a good strategy for automatically
generating ids. Basically plugins are an evil we only want to accept when
either we don't understand the problem or the solutions have such extreme
tradeoffs that there is no single "good approach". Plugins cause problems
for upgrades, testing, documentation, user understandability, code
understandability, etc.
2. The node id can't be the host or port or anything tied to the physical
machine or its location on the network because you need to be able to
change these things. I recommend we just keep an integer.

-Jay

On Tue, Oct 1, 2013 at 7:08 AM, Aniket Bhatnagar <aniket.bhatnagar@gmail.com> wrote:

> Right. It is currently a Java integer. However, as per the previous thread,
> it seems possible to change it to a string. In that case, we can use instance
> IDs, IP addresses, custom ID generators, etc.
> How are you currently generating broker IDs from the IP address? A Chef
> script or a custom shell script?

Re: Strategies for auto generating broker ID

Posted by Aniket Bhatnagar <an...@gmail.com>.

Right. It is currently a Java integer. However, as per the previous thread,
it seems possible to change it to a string. In that case, we can use instance
IDs, IP addresses, custom ID generators, etc.
How are you currently generating broker IDs from the IP address? A Chef
script or a custom shell script?


On 1 October 2013 18:34, Maxime Brugidou <ma...@gmail.com> wrote:

> I think it is currently a Java (signed) integer, or maybe that was
> ZooKeeper?
> We are generating the id from the IP address for now, but this is not ideal
> (and can cause integer overflow with Java signed ints).

Re: Strategies for auto generating broker ID

Posted by Maxime Brugidou <ma...@gmail.com>.

I think it is currently a Java (signed) integer, or maybe that was ZooKeeper?
We are generating the id from the IP address for now, but this is not ideal
(and can cause integer overflow with Java signed ints).
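
To illustrate the overflow: the sketch below assumes the id is derived by packing the four IPv4 octets into 32 bits. The thread does not say how the id is actually computed, so that derivation and the helper names are assumptions. Any address whose first octet is 128 or higher then exceeds the largest signed 32-bit value and comes out negative when read as a Java int.

```python
import socket
import struct

JAVA_INT_MAX = 2**31 - 1  # largest value a signed 32-bit Java int can hold


def ip_to_unsigned(ip):
    """Pack a dotted-quad IPv4 address into an unsigned 32-bit integer."""
    return struct.unpack("!I", socket.inet_aton(ip))[0]


def ip_to_java_int(ip):
    """Reinterpret the same four bytes as a signed 32-bit integer, as Java would."""
    return struct.unpack("!i", socket.inet_aton(ip))[0]


for ip in ("10.0.0.5", "172.16.0.5", "192.168.0.5"):
    unsigned = ip_to_unsigned(ip)
    # Columns: address, unsigned value, whether it overflows, signed value.
    print(ip, unsigned, unsigned > JAVA_INT_MAX, ip_to_java_int(ip))
```

Addresses in 10.0.0.0/8 happen to fit, while the 172.16.0.0/12 and 192.168.0.0/16 private ranges do not, so whether the problem shows up depends on the network the brokers run in.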
On Oct 1, 2013 12:52 PM, "Aniket Bhatnagar" <an...@gmail.com>
wrote:

> I would like to revive an older thread around auto generating broker ID. As
> an AWS user, I would like Kafka to just use the instance's ID or instance's
> IP or instance's internal domain (whichever is easier). This would mean I
> can easily clone from an AMI to launch Kafka instances without having to
> worry about setting a unique broker ID. This also allows me to set up auto
> scaling.
>
> I realize one size may not fit all in this case. Other strategies that may
> work for other cloud providers are generating a UUID and persisting it on
> disk, etc.
>
> What I propose is a way to define a broker ID generation strategy in the
> configuration file which points to a class file that is responsible for
> generating the ID. Is this something already being worked on?
>