Posted to dev@kafka.apache.org by Jay Kreps <ja...@gmail.com> on 2013/01/18 06:09:06 UTC

default configs

Currently kafka broker config is all statically defined in a properties
file on the broker. This mostly works pretty well, but for per-topic
configuration (the flush policy, partition count, etc.) it is pretty painful
to have to bounce the broker every time you make a config change.

That led to this proposal:
https://cwiki.apache.org/confluence/display/KAFKA/Dynamic+Topic+Config

An open question is how topic-default configurations should work.

Currently each of our topic-level configs is paired with a default. So you
would have something like
  segment.size.bytes
which would be the default, and then you can override this for topics that
need something different using a map:
  segment.size.bytes.per.topic
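For illustration only (these are the hypothetical key names above, not the
broker's real property names, and the topic:value map syntax is just an
assumption):
  # default for all topics
  segment.size.bytes=1073741824
  # per-topic overrides, expressed as a topic:value map (format assumed)
  segment.size.bytes.per.topic=my-topic:536870912,important-topic:268435456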

The proposal is to move the topic configuration into zookeeper so that for
a topic "my-topic" we would have a znode
  /brokers/topics/my-topic/config
and the contents of this znode would be the topic configuration either as
json or properties or whatever.
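For example, the znode might contain something like the following (both the
format and the property names are illustrative, not a settled schema):
  /brokers/topics/my-topic/config:
    {"segment.size.bytes": 536870912, "retention.hours": 168}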

There are two ways this config could work:
1. Defaults resolved at topic creation time: At the time a topic is created
the user would specify some properties they wanted for that topic, and any
property they didn't specify would take the server default. ALL these
properties would be stored in the znode.
2. Defaults resolved at config read time: When a topic is created the user
specifies the particular properties they want, and ONLY those explicitly
specified properties would be stored. At runtime we would merge these
properties with whatever the server defaults currently are (see the sketch
just below).
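To make the difference concrete, here is a rough Scala sketch of the two
resolution strategies (the names and types are made up for illustration,
not proposed code):

  import java.util.Properties

  object ConfigResolution {
    // Shared helper: overlay overrides on top of defaults.
    def merge(defaults: Properties, overrides: Properties): Properties = {
      val p = new Properties()
      p.putAll(defaults)
      p.putAll(overrides)
      p
    }

    // Proposal 1: merge once, at create time; the znode stores the full result,
    // so later changes to the server defaults do not affect existing topics.
    def configToStoreAtCreate(serverDefaults: Properties, userOverrides: Properties): Properties =
      merge(serverDefaults, userOverrides)

    // Proposal 2: the znode stores only the overrides; every read re-merges
    // against whatever the server defaults are right now.
    def effectiveConfigAtRead(currentDefaults: Properties, storedOverrides: Properties): Properties =
      merge(currentDefaults, storedOverrides)
  }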

This is a somewhat nuanced point, but perhaps important.

The advantage of the first proposal is that it is simple. If you want to
know the configuration for a particular topic you go to zookeeper and look
at that topic's config. Dynamically mixing server config and zookeeper
config makes it a little harder to figure out what the current state of
anything is.

The disadvantage of the first proposal (and the advantage of the second
proposal) shows up when making global changes. For example, if you want
to globally lower the retention for all topics, in proposal one you would
have to iterate over all topics and update each topic's config (this could
be done automatically with tooling, but under the covers the tool would do
exactly that). In the second case you would just update the default value.
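As a concrete sketch of what that global change would look like under each
proposal (the zookeeper paths, the "retention.hours" property name, and the
ZkClient usage here are assumptions for illustration; error handling and
serialization are omitted):

  import org.I0Itec.zkclient.ZkClient
  import scala.collection.JavaConverters._

  object GlobalRetentionChange {
    val zk = new ZkClient("localhost:2181")

    // Proposal 1: every topic's znode holds the full config, so a global change
    // means rewriting every topic's config (a tool would hide this loop).
    def lowerRetentionPerTopic(newValue: String): Unit = {
      for (topic <- zk.getChildren("/brokers/topics").asScala) {
        val path = "/brokers/topics/" + topic + "/config"
        val config: java.util.Properties = zk.readData(path) // assumes a Properties serializer
        config.setProperty("retention.hours", newValue)      // illustrative property name
        zk.writeData(path, config)
      }
    }

    // Proposal 2: topics store only their overrides, so a global change is a
    // single write to wherever the defaults live (an assumed defaults znode here).
    def lowerDefaultRetention(newValue: String): Unit = {
      val defaults: java.util.Properties = zk.readData("/config/defaults")
      defaults.setProperty("retention.hours", newValue)
      zk.writeData("/config/defaults", defaults)
    }
  }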

Thoughts? If no one cares, I will just pick whatever seems best.

-Jay

Re: default configs

Posted by Jay Kreps <ja...@gmail.com>.
Well, but the proposal was that topic-level configs are loaded when you run
the create_topic command, so wouldn't that be what you are asking for?

-Jay


On Fri, Jan 18, 2013 at 1:57 PM, Joe Stein <cr...@gmail.com> wrote:

> How about a command line script (bin/kafka-config-init.sh) that loads a
> file of configs to initialize the config values in zookeeper, while kafka
> reads the configs from zookeeper?
>
> Another script (bin/kafka-config-update.sh) could also have options for
> doing updates.
>
> If we provide a write mechanism then the config management systems (we
> use chef) can interact nicely with the zookeeper updates in a standard
> way that we document and support.
>
> win? win?

Re: default configs

Posted by Joe Stein <cr...@gmail.com>.
How about a command line script (bin/kafka-config-init.sh) that loads a
file of configs to initialize the config values in zookeeper, while kafka
reads the configs from zookeeper?

Another script (bin/kafka-config-update.sh) could also have options for
doing updates.

If we provide a write mechanism then the config management systems (we
use chef) can interact nicely with the zookeeper updates in a standard
way that we document and support.
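For example, usage could look something like this (neither script exists
yet, and every flag shown is purely hypothetical):

  # seed zookeeper from a properties file at install time (hypothetical flags)
  bin/kafka-config-init.sh --zookeeper zk1:2181 --config-file topic-configs.properties

  # later updates; chef (or any config management tool) can shell out to this
  bin/kafka-config-update.sh --zookeeper zk1:2181 --topic my-topic --set retention.hours=48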

win? win?

On Fri, Jan 18, 2013 at 4:19 PM, Jay Kreps <ja...@gmail.com> wrote:

> Yes please, any help very much appreciated.
>
> I am not sure if I understand what you are proposing, though. Are you
> saying support both the config file and zk for topic-level configs? I hate
> to do things where the answer is "do both"...I guess I feel that although
> everyone walks away happy it ends up being a lot of code and combinatorial
> testing. So if there is a different plan that hits all requirements I like
> that better. I am very sensitive to the fact that zookeeper is an okay
> key/value store but a really poor replacement for a config management
> system. It might be worth while to try to work out a way that meets all
> needs, if such a thing exists.
>
> Is bouncing brokers for topic-overrides a problem for you in your
> environment? If so how would you fix it?
>
> -Jay



-- 

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
*/

Re: default configs

Posted by Jay Kreps <ja...@gmail.com>.
Yes please, any help very much appreciated.

I am not sure if I understand what you are proposing, though. Are you
saying support both the config file and zk for topic-level configs? I hate
to do things where the answer is "do both"...I guess I feel that although
everyone walks away happy it ends up being a lot of code and combinatorial
testing. So if there is a different plan that hits all requirements I like
that better. I am very sensitive to the fact that zookeeper is an okay
key/value store but a really poor replacement for a config management
system. It might be worth while to try to work out a way that meets all
needs, if such a thing exists.

Is bouncing brokers for topic-overrides a problem for you in your
environment? If so how would you fix it?

-Jay

On Fri, Jan 18, 2013 at 7:53 AM, Joe Stein <jo...@medialets.com> wrote:

> Can I help out?
>
> Also, can we abstract the config call too? We have so much in chef; it's
> not that I don't want to call our zookeeper cluster for it, but we don't
> have our topology mapped out in znodes yet; it lives in our own instances
> of code.
>
> It should have both a pull and a push for changes; that is one thing
> that's nice about zookeeper and having a watcher.
>
> /*
> Joe Stein, Chief Architect
> http://www.medialets.com
> Twitter: @allthingshadoop
> Mobile: 917-597-9771
> */

Re: default configs

Posted by Joe Stein <jo...@medialets.com>.
Can I help out?

Also, can we abstract the config call too? We have so much in chef; it's not that I don't want to call our zookeeper cluster for it, but we don't have our topology mapped out in znodes yet; it lives in our own instances of code.

It should have both a pull and a push for changes; that is one thing that's nice about zookeeper and having a watcher.
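For what it's worth, the push side could be as simple as a data-change
listener on the config znode. A rough Scala sketch with the zkclient library
kafka already uses (the znode path and the callback bodies are just
illustrative):

  import org.I0Itec.zkclient.{IZkDataListener, ZkClient}

  object TopicConfigWatcher {
    val zk = new ZkClient("localhost:2181")
    val configPath = "/brokers/topics/my-topic/config"

    // Pull: read the current config on demand.
    def currentConfig(): AnyRef = zk.readData(configPath)

    // Push: get notified whenever the config znode changes.
    def watch(): Unit = zk.subscribeDataChanges(configPath, new IZkDataListener {
      def handleDataChange(dataPath: String, data: AnyRef): Unit = {
        // re-apply the new topic config here
        println("config changed at " + dataPath)
      }
      def handleDataDeleted(dataPath: String): Unit = {
        // fall back to the server defaults here
        println("config deleted at " + dataPath)
      }
    })
  }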

/*
Joe Stein, Chief Architect
http://www.medialets.com
Twitter: @allthingshadoop
Mobile: 917-597-9771
*/
