Posted to users@kafka.apache.org by Jay Kreps <ja...@gmail.com> on 2014/02/04 18:34:23 UTC

Config for new clients (and server)

We touched on this a bit in previous discussions, but I wanted to draw out
the approach to config specifically as an item of discussion.

The new producer and consumer use a similar key-value config approach as
the existing scala clients but have different implementation code to help
define these configs. The plan is to use the same approach on the server,
once the new clients are complete; so if we agree on this approach it will
be the new default across the board.

Let me split this into two parts. First I will try to motivate the use of
key-value pairs as a configuration api. Then let me discuss the mechanics
of specifying and parsing these. If we agree on the public api then the
implementation details are interesting as this will be
shared across producer, consumer, and broker and potentially some tools;
but if we disagree about the api then there is no point in discussing the
implementation.

Let me explain the rationale for this. In a sense a key-value map of
configs is the worst possible API to the programmer using the clients. Let
me contrast the pros and cons versus a POJO and motivate why I think it is
still superior overall.

Pro: An application can externalize the configuration of its kafka clients
into its own configuration. Whatever config management system the client
application is using will likely support key-value pairs, so the client
should be able to directly pull whatever configurations are present and use
them in its client. This means that any configuration the client supports
can be added to any application at runtime. With the pojo approach the
client application has to expose each pojo getter as some config parameter.
The result of many applications doing this is that the config is different
for each and it is very hard to have a standard client config shared
across applications. Moving config into config files allows the usual tooling (version
control, review, audit, config deployments separate from code pushes, etc.).
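
For example, wiring the client into an application's existing config could
be as simple as the following sketch (assuming the client constructor ends
up accepting plain Properties, which is still being discussed; the file
name is just for illustration):

    import java.io.FileInputStream;
    import java.util.Properties;

    // Any Kafka config carried by the application's own config system can
    // be passed straight through, with no code changes per setting.
    Properties props = new Properties();
    props.load(new FileInputStream("kafka-producer.properties"));
    KafkaProducer producer = new KafkaProducer(props);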

Pro: Backwards and forwards compatibility. Provided we stick to our java
api, many internals can evolve and expose new configs. The application can
support both the new and old client by just specifying a config that will
be unused in the older version (and of course the reverse--we can remove
obsolete configs).

Pro: We can use a similar mechanism for both the client and the server.
Since most people run the server as a stand-alone process it needs a config
file.

Pro: Systems like Samza that need to ship configs across the network can
easily do so as configs have a natural serialized form. This can be done
with pojos using java serialization but it is ugly and has bizarre failure
cases.
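
Since the configs are just string key-value pairs, the "natural serialized
form" can be as simple as java.util.Properties' own text format, e.g. (a
sketch only):

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.util.Properties;

    Properties props = new Properties();
    props.put("metadata.timeout.ms", "60000");

    // Serialize to bytes that can be shipped across the network...
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    props.store(out, "kafka client config");

    // ...and reconstructed on the other side.
    Properties copy = new Properties();
    copy.load(new ByteArrayInputStream(out.toByteArray()));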

Con: The IDE gives nice auto-completion for pojos.

Con: There are some advantages to javadoc as a documentation mechanism for
java people.

Basically to me this is about operability versus niceness of api and I
think operability is more important.

Let me now give some details of the config support classes in
kafka.common.config and how they are intended to be used.

The goal of this code is the following:
1. Make specifying configs and their expected type (string, numbers, lists,
etc.) simple and declarative
2. Allow for simple validation checks (numeric range checks, etc.)
3. Make the config "self-documenting". I.e. we should be able to write code
that generates the configuration documentation off the config def.
4. Specify default values.
5. Track which configs actually get used.
6. Make it easy to get config values.

There are two classes there: ConfigDef and AbstractConfig. ConfigDef
defines the specification of the accepted configurations and AbstractConfig
is a helper class for implementing the configuration class. The difference
is kind of like the difference between a "class" and an "object": ConfigDef
is for specifying the configurations that are accepted, AbstractConfig is
the base class for an instance of these configs.

You can see this in action here:
https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=blob_plain;f=clients/src/main/java/kafka/clients/producer/ProducerConfig.java;hb=HEAD

(Ignore the static config names in there for now...I'm not actually sure
that is the best approach).

So the way this works is that the config specification is defined as:

        config = new ConfigDef().define("bootstrap.brokers", Type.LIST, "documentation")
                                .define("metadata.timeout.ms", Type.LONG, 60 * 1000, atLeast(0), "documentation")
                                .define("max.partition.size", Type.INT, 16384, atLeast(0), "documentation");


This is used in a ProducerConfig class which extends AbstractConfig to get
access to some helper methods as well as the logic for tracking which
configs get accessed.
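
Concretely, such a class might look roughly like this (a sketch only: the
constructor signature and the typed getter names are my assumptions about
the AbstractConfig helpers, not a settled API):

    // imports from java.util and kafka.common.config (Map, AbstractConfig,
    // ConfigDef, Type, atLeast, ...) omitted
    public class ProducerConfig extends AbstractConfig {

        private static final ConfigDef config = new ConfigDef()
            .define("bootstrap.brokers", Type.LIST, "documentation")
            .define("metadata.timeout.ms", Type.LONG, 60 * 1000, atLeast(0), "documentation")
            .define("max.partition.size", Type.INT, 16384, atLeast(0), "documentation");

        public ProducerConfig(Map<String, Object> originals) {
            super(config, originals);   // parses, validates, applies defaults
        }
    }

    // Client code then reads typed, validated values:
    ProducerConfig cfg = new ProducerConfig(props);
    long timeout = cfg.getLong("metadata.timeout.ms");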

Currently I have included static String variables for each of the config
names in that class. However I actually think that is not very helpful as
the javadoc for them doesn't give the constant value and requires
duplicating the documentation. To understand this point look at the javadoc
and note that the doc on the string is not the same as what we define in
the ConfigDef. We could just have the javadoc for the config string be the
source of truth but it is actually pretty inconvenient for that as it doesn't
show you the value of the constant, just the variable name (unless you
discover how to unhide it). That is fine for the clients, but for the
server would be very weird especially for non-java people. We could attempt
to duplicate documentation between the javadoc and the ConfigDef but given
our struggle to get well-documented config in a single place this seems
unwise.

So I recommend we have a single source of documentation for these: the
website documentation on configuration, covering both clients and server,
generated off the config defs. The
javadoc on KafkaProducer will link to this table so it should be quite
convenient to discover. This makes things a little more typo-prone, but
that should be easily caught by the unused-key detection. This will also make it
possible for us to retire configs in the future without causing compile
failures, and to add configs without their use breaking backwards
compatibility. This is useful during upgrades where you want to be
compatible with the old and new version so you can roll forwards and
backwards.
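
Generating that table is then mechanical; something along these lines
(purely illustrative: it assumes ConfigDef exposes its entries, and all
the names here are made up):

    StringBuilder table = new StringBuilder("<table>\n");
    table.append("<tr><th>Name</th><th>Type</th><th>Default</th><th>Description</th></tr>\n");
    // walk every defined config and emit one table row per entry
    for (ConfigKey key : config.configKeys()) {
        table.append("<tr><td>").append(key.name)
             .append("</td><td>").append(key.type)
             .append("</td><td>").append(key.defaultValue)
             .append("</td><td>").append(key.documentation)
             .append("</td></tr>\n");
    }
    table.append("</table>");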

-Jay

Re: Config for new clients (and server)

Posted by Pradeep Gollakota <pr...@gmail.com>.
+1 Jun.


On Mon, Feb 10, 2014 at 2:17 PM, Sriram Subramanian <
srsubramanian@linkedin.com> wrote:

> +1 on Jun's suggestion.
>
> On 2/10/14 2:01 PM, "Jun Rao" <ju...@gmail.com> wrote:
>
> >I actually prefer to see those at INFO level. The reason is that the
> >config system in an application can be complex. Some configs can be
> >overridden in different layers and it may not be easy to determine what
> >the final binding value is. The logging in Kafka will serve as the source
> >of truth.
> >
> >For reference, ZK client logs all overridden values during initialization.
> >It's a one time thing during starting up, so shouldn't add much noise.
> >It's very useful for debugging subtle config issues.
> >
> >Exposing final configs programmatically is potentially useful. If we don't
> >want to log overridden values out of box, an app can achieve the same
> >thing using the programming api. The only missing thing is that we won't
> >know those unused property keys, which is probably less important than
> >seeing the overridden values.
> >
> >Thanks,
> >
> >Jun
> >
> >
> >On Mon, Feb 10, 2014 at 10:15 AM, Jay Kreps <ja...@gmail.com> wrote:
> >
> >> Hey Jun,
> >>
> >> I think that is reasonable but would object to having it be debug
> >> logging? I think logging out a bunch of noise during normal operation
> >> in a client library is pretty ugly. Also, is there value in exposing
> >> the final configs programmatically?
> >>
> >> -Jay
> >>
> >>
> >>
> >> On Sun, Feb 9, 2014 at 9:23 PM, Jun Rao <ju...@gmail.com> wrote:
> >>
> >> > +1 on the new config. Just one comment. Currently, when initiating a
> >> > config (e.g. ProducerConfig), we log those overridden property values
> >> > and unused property keys (likely due to mis-spelling). This has been
> >> > very useful for config verification. It would be good to add similar
> >> > support in the new config.
> >> >
> >> > Thanks,
> >> >
> >> > Jun

Re: Config for new clients (and server)

Posted by Steve Morin <st...@gmail.com>.
I would agree Jun


Re: Config for new clients (and server)

Posted by Jun Rao <ju...@gmail.com>.
I actually think this is useful for non-LinkedIn users as well. The
following is the tradeoff that I see.

Most users probably won't care about seeing an extra 10-20 lines of INFO
level logging when starting up a client. However, it's very easy for users
to (1) mis-spell a config name (there was an issue in the mailing list just
a few days ago when a user mistyped "advertised.host" as "advertise.host")
or (2) inadvertently override a config value through the config systems. In
both cases, the INFO level logging will make it much easier for the user to
realize the human mistake. So, I think this is a case where the benefit
outweighs the disadvantage.

Thanks,

Jun



Re: Config for new clients (and server)

Posted by Jay Kreps <ja...@gmail.com>.
Yeah I am aware of how zookeeper behaves; I think it is kind of gross.

I think logging it at DEBUG gets you what you want--by default we don't
pollute logs, but anyone who wants to log this can enable DEBUG logging on
org.apache.kafka.clients.producer.ProducerConfig.
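
With log4j, for instance, that's a one-line override in the application's
logging config:

    log4j.logger.org.apache.kafka.clients.producer.ProducerConfig=DEBUG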

If we want this on by default at LinkedIn we can just set this logger to
debug in our wrapper; we don't need to inflict this on everyone.

The point is that spewing out each config IS a debug according to our
definition:
  http://kafka.apache.org/coding-guide.html

-Jay


On Mon, Feb 10, 2014 at 2:01 PM, Jun Rao <ju...@gmail.com> wrote:

> I actually prefer to see those at INFO level. The reason is that the config
> system in an application can be complex. Some configs can be overridden in
> different layers and it may not be easy to determine what the final binding
> value is. The logging in Kafka will serve as the source of truth.
>
> For reference, ZK client logs all overridden values during initialization.
> It's a one time thing during starting up, so shouldn't add much noise. It's
> very useful for debugging subtle config issues.
>
> Exposing final configs programmatically is potentially useful. If we don't
> want to log overridden values out of box, an app can achieve the same thing
> using the programming api. The only missing thing is that we won't know
> those unused property keys, which is probably less important than seeing
> the overridden values.
>
> Thanks,
>
> Jun
>
>
> On Mon, Feb 10, 2014 at 10:15 AM, Jay Kreps <ja...@gmail.com> wrote:
>
> > Hey Jun,
> >
> > I think that is reasonable but would object to having it be debug
> logging?
> > I think logging out a bunch of noise during normal operation in a client
> > library is pretty ugly. Also, is there value in exposing the final
> configs
> > programmatically?
> >
> > -Jay
> >
> >
> >
> > On Sun, Feb 9, 2014 at 9:23 PM, Jun Rao <ju...@gmail.com> wrote:
> >
> > > +1 on the new config. Just one comment. Currently, when initiating a
> > config
> > > (e.g. ProducerConfig), we log those overridden property values and
> unused
> > > property keys (likely due to mis-spelling). This has been very useful
> for
> > > config verification. It would be good to add similar support in the new
> > > config.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Tue, Feb 4, 2014 at 9:34 AM, Jay Kreps <ja...@gmail.com> wrote:
> > >
> > > > We touched on this a bit in previous discussions, but I wanted to
> draw
> > > out
> > > > the approach to config specifically as an item of discussion.
> > > >
> > > > The new producer and consumer use a similar key-value config approach
> > as
> > > > the existing scala clients but have different implementation code to
> > help
> > > > define these configs. The plan is to use the same approach on the
> > server,
> > > > once the new clients are complete; so if we agree on this approach it
> > > will
> > > > be the new default across the board.
> > > >
> > > > Let me split this into two parts. First I will try to motivate the
> use
> > of
> > > > key-value pairs as a configuration api. Then let me discuss the
> > mechanics
> > > > of specifying and parsing these. If we agree on the public api then
> the
> > > > public api then the implementation details are interesting as this
> will
> > > be
> > > > shared across producer, consumer, and broker and potentially some
> > tools;
> > > > but if we disagree about the api then there is no point in discussing
> > the
> > > > implementation.
> > > >
> > > > Let me explain the rationale for this. In a sense a key-value map of
> > > > configs is the worst possible API to the programmer using the
> clients.
> > > Let
> > > > me contrast the pros and cons versus a POJO and motivate why I think
> it
> > > is
> > > > still superior overall.
> > > >
> > > > [...]
> > > >
> > > > -Jay
> > > >
> > >
> >
>

Re: Config for new clients (and server)

Posted by Jay Kreps <ja...@gmail.com>.
Yeah, I am aware of how ZooKeeper behaves; I think it is kind of gross.

I think logging it at DEBUG gets you what you want--by default we don't
pollute logs, but anyone who wants to log this can enable DEBUG logging on
org.apache.kafka.clients.producer.ProducerConfig.

If we want this on by default at LinkedIn we can just set this logger to
debug in our wrapper; we don't need to inflict this on everyone.

The point is that spewing out each config IS a debug-level message according
to our definition:
  http://kafka.apache.org/coding-guide.html
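
For example, a wrapper that wants the old behavior back could do something
like this (just a sketch--the KafkaClientWrapper name is made up, and it
assumes log4j 1.x is the logging backend in use):

    import org.apache.log4j.Level;
    import org.apache.log4j.Logger;

    public class KafkaClientWrapper {
        static {
            // Opt back in to config logging for just this one logger,
            // without changing the default for anyone else.
            Logger.getLogger("org.apache.kafka.clients.producer.ProducerConfig")
                  .setLevel(Level.DEBUG);
        }
    }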

-Jay


On Mon, Feb 10, 2014 at 2:01 PM, Jun Rao <ju...@gmail.com> wrote:

> I actually prefer to see those at INFO level. The reason is that the config
> system in an application can be complex. Some configs can be overridden in
> different layers and it may not be easy to determine what the final binding
> value is. The logging in Kafka will serve as the source of truth.
>
> For reference, ZK client logs all overridden values during initialization.
> It's a one time thing during starting up, so shouldn't add much noise. It's
> very useful for debugging subtle config issues.
>
> Exposing final configs programmatically is potentially useful. If we don't
> want to log overridden values out of box, an app can achieve the same thing
> using the programming api. The only missing thing is that we won't know
> those unused property keys, which is probably less important than seeing
> the overridden values.
>
> Thanks,
>
> Jun
>
>
> On Mon, Feb 10, 2014 at 10:15 AM, Jay Kreps <ja...@gmail.com> wrote:
>
> > Hey Jun,
> >
> > I think that is reasonable but would object to having it be debug
> > logging? I think logging out a bunch of noise during normal operation in
> > a client library is pretty ugly. Also, is there value in exposing the
> > final configs programmatically?
> >
> > -Jay
> >
> >
> >
> > On Sun, Feb 9, 2014 at 9:23 PM, Jun Rao <ju...@gmail.com> wrote:
> >
> > > +1 on the new config. Just one comment. Currently, when initiating a
> > > config (e.g. ProducerConfig), we log those overridden property values
> > > and unused property keys (likely due to mis-spelling). This has been
> > > very useful for config verification. It would be good to add similar
> > > support in the new config.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Tue, Feb 4, 2014 at 9:34 AM, Jay Kreps <ja...@gmail.com> wrote:
> > >
> > > > [...]
> > > >
> > > > -Jay
> > > >
> > >
> >
>

Re: Config for new clients (and server)

Posted by Sriram Subramanian <sr...@linkedin.com>.
+1 on Jun's suggestion.

On 2/10/14 2:01 PM, "Jun Rao" <ju...@gmail.com> wrote:

>I actually prefer to see those at INFO level. The reason is that the
>config system in an application can be complex. Some configs can be
>overridden in different layers and it may not be easy to determine what
>the final binding value is. The logging in Kafka will serve as the source
>of truth.
>
>For reference, ZK client logs all overridden values during initialization.
>It's a one time thing during starting up, so shouldn't add much noise.
>It's very useful for debugging subtle config issues.
>
>Exposing final configs programmatically is potentially useful. If we don't
>want to log overridden values out of box, an app can achieve the same
>thing using the programming api. The only missing thing is that we won't
>know those unused property keys, which is probably less important than
>seeing the overridden values.
>
>Thanks,
>
>Jun
>
>
>On Mon, Feb 10, 2014 at 10:15 AM, Jay Kreps <ja...@gmail.com> wrote:
>
>> Hey Jun,
>>
>> I think that is reasonable but would object to having it be debug
>> logging? I think logging out a bunch of noise during normal operation in
>> a client library is pretty ugly. Also, is there value in exposing the
>> final configs programmatically?
>>
>> -Jay
>>
>>
>>
>> On Sun, Feb 9, 2014 at 9:23 PM, Jun Rao <ju...@gmail.com> wrote:
>>
>> > +1 on the new config. Just one comment. Currently, when initiating a
>> > config (e.g. ProducerConfig), we log those overridden property values
>> > and unused property keys (likely due to mis-spelling). This has been
>> > very useful for config verification. It would be good to add similar
>> > support in the new config.
>> >
>> > Thanks,
>> >
>> > Jun
>> >
>> >
>> > On Tue, Feb 4, 2014 at 9:34 AM, Jay Kreps <ja...@gmail.com> wrote:
>> >
>> > > [...]
>> > >
>> > > -Jay
>> > >
>> >
>>


Re: Config for new clients (and server)

Posted by Jun Rao <ju...@gmail.com>.
I actually prefer to see those at INFO level. The reason is that the config
system in an application can be complex. Some configs can be overridden in
different layers and it may not be easy to determine what the final binding
value is. The logging in Kafka will serve as the source of truth.

For reference, the ZK client logs all overridden values during
initialization. It's a one-time thing at startup, so it shouldn't add much
noise, and it's very useful for debugging subtle config issues.

Exposing final configs programmatically is potentially useful. If we don't
want to log overridden values out of the box, an app can achieve the same
thing using the programmatic api. The only missing piece is that we won't
know about unused property keys, which is probably less important than
seeing the overridden values.
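
If the client did track key usage itself, the shape of it is simple enough;
roughly something like this (a sketch only--the field and method names here
are made up, not actual Kafka code):

    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public abstract class AbstractConfig {
        private static final Logger log =
            LoggerFactory.getLogger(AbstractConfig.class);

        private final Map<String, Object> originals; // what the user supplied
        private final Map<String, Object> values;    // parsed and validated
        private final Set<String> used = new HashSet<String>();

        protected AbstractConfig(Map<String, Object> originals,
                                 Map<String, Object> values) {
            this.originals = originals;
            this.values = values;
        }

        protected Object get(String key) {
            used.add(key); // record every key the client actually reads
            return values.get(key);
        }

        // Anything supplied but never read is probably a mis-spelled key.
        public void logUnused() {
            for (String key : originals.keySet())
                if (!used.contains(key))
                    log.warn("Configuration '{}' was supplied but never used.",
                             key);
        }
    }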

Thanks,

Jun


On Mon, Feb 10, 2014 at 10:15 AM, Jay Kreps <ja...@gmail.com> wrote:

> Hey Jun,
>
> I think that is reasonable but would object to having it be debug logging?
> I think logging out a bunch of noise during normal operation in a client
> library is pretty ugly. Also, is there value in exposing the final configs
> programmatically?
>
> -Jay
>
>
>
> On Sun, Feb 9, 2014 at 9:23 PM, Jun Rao <ju...@gmail.com> wrote:
>
> > +1 on the new config. Just one comment. Currently, when initiating a
> > config (e.g. ProducerConfig), we log those overridden property values and
> > unused property keys (likely due to mis-spelling). This has been very
> > useful for config verification. It would be good to add similar support
> > in the new config.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Tue, Feb 4, 2014 at 9:34 AM, Jay Kreps <ja...@gmail.com> wrote:
> >
> > > [...]
> > >
> > > -Jay
> > >
> >
>

Re: Config for new clients (and server)

Posted by Jay Kreps <ja...@gmail.com>.
Hey Jun,

I think that is reasonable, but would you object to having it be debug
logging? I think logging out a bunch of noise during normal operation in a
client library is pretty ugly. Also, is there value in exposing the final
configs programmatically?
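
Concretely I am imagining the config object could just hand back its final
parsed values, along these lines (a sketch; the values() accessor and the
ProducerConfig(Properties) constructor here are assumptions, not a
committed api):

    import java.util.Map;
    import java.util.Properties;

    public class ConfigDump {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.brokers", "broker1:9092");

            // values() would expose the final parsed config, defaults
            // included, so the app can log or export it however it likes.
            ProducerConfig config = new ProducerConfig(props);
            for (Map.Entry<String, Object> entry : config.values().entrySet())
                System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }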

-Jay



On Sun, Feb 9, 2014 at 9:23 PM, Jun Rao <ju...@gmail.com> wrote:

> +1 on the new config. Just one comment. Currently, when initiating a config
> (e.g. ProducerConfig), we log those overridden property values and unused
> property keys (likely due to mis-spelling). This has been very useful for
> config verification. It would be good to add similar support in the new
> config.
>
> Thanks,
>
> Jun
>
>
> On Tue, Feb 4, 2014 at 9:34 AM, Jay Kreps <ja...@gmail.com> wrote:
>
> > [...]
> >
> > -Jay
> >
>

Re: Config for new clients (and server)

Posted by Jay Kreps <ja...@gmail.com>.
Hey Jun,

I think that is reasonable, but would you object to having it be debug
logging? I think logging out a bunch of noise during normal operation in a
client library is pretty ugly. Also, is there value in exposing the final
configs programmatically?
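
For concreteness, a rough sketch of what that tracking could look like in an
AbstractConfig-style helper (the names here are hypothetical, not the actual
helper code):

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    // Hypothetical sketch: record which keys get read, then report the rest.
    public class ExampleConfig {
        private final Map<String, Object> originals;             // what the user passed in
        private final Set<String> used = new HashSet<String>();  // keys actually read

        public ExampleConfig(Map<String, Object> originals) {
            this.originals = new HashMap<String, Object>(originals);
        }

        public Object get(String key) {
            used.add(key); // track the access
            return originals.get(key);
        }

        // Unused keys are likely misspellings; these could be logged at
        // whatever level we settle on, or returned to the caller directly.
        public Set<String> unused() {
            Set<String> keys = new HashSet<String>(originals.keySet());
            keys.removeAll(used);
            return keys;
        }
    }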

-Jay



On Sun, Feb 9, 2014 at 9:23 PM, Jun Rao <ju...@gmail.com> wrote:

> +1 on the new config. Just one comment. Currently, when instantiating a
> config (e.g. ProducerConfig), we log the overridden property values and
> unused property keys (likely due to misspelling). This has been very useful
> for config verification. It would be good to add similar support in the new
> config.
>
> Thanks,
>
> Jun
>
>
> On Tue, Feb 4, 2014 at 9:34 AM, Jay Kreps <ja...@gmail.com> wrote:
>
> > [original message trimmed; see the first post in this thread]
>

Re: Config for new clients (and server)

Posted by Jun Rao <ju...@gmail.com>.
+1 on the new config. Just one comment. Currently, when instantiating a
config (e.g. ProducerConfig), we log the overridden property values and
unused property keys (likely due to misspelling). This has been very useful
for config verification. It would be good to add similar support in the new
config.

Thanks,

Jun


On Tue, Feb 4, 2014 at 9:34 AM, Jay Kreps <ja...@gmail.com> wrote:

> [original message trimmed; see the first post in this thread]

Re: Config for new clients (and server)

Posted by Jay Kreps <ja...@gmail.com>.
And you guys like the existing helper code for that?

-Jay


On Wed, Feb 5, 2014 at 10:17 AM, Neha Narkhede <ne...@gmail.com> wrote:

> I'm not so sure about the static config names used in the producer, but I'm
> +1 on using the key-value approach for configs to ease operability.
>
> Thanks,
> Neha
>
>
> On Wed, Feb 5, 2014 at 10:10 AM, Guozhang Wang <wa...@gmail.com> wrote:
>
> > +1 for the key-value approach.
> >
> > Guozhang
> >
> >
> > On Tue, Feb 4, 2014 at 9:34 AM, Jay Kreps <ja...@gmail.com> wrote:
> >
> > > [original message trimmed; see the first post in this thread]
> >
> >
> >
> > --
> > -- Guozhang
> >
>

Re: Config for new clients (and server)

Posted by Neha Narkhede <ne...@gmail.com>.
I'm not so sure about the static config names used in the producer, but I'm
+1 on using the key-value approach for configs to ease operability.

Thanks,
Neha


On Wed, Feb 5, 2014 at 10:10 AM, Guozhang Wang <wa...@gmail.com> wrote:

> +1 for the key-value approach.
>
> Guozhang
>
>
> On Tue, Feb 4, 2014 at 9:34 AM, Jay Kreps <ja...@gmail.com> wrote:
>
> > [original message trimmed; see the first post in this thread]
>
>
>
> --
> -- Guozhang
>

Re: Config for new clients (and server)

Posted by Guozhang Wang <wa...@gmail.com>.
+1 for the key-value approach.

Guozhang


On Tue, Feb 4, 2014 at 9:34 AM, Jay Kreps <ja...@gmail.com> wrote:

> [original message trimmed; see the first post in this thread]



-- 
-- Guozhang

Re: Config for new clients (and server)

Posted by Jay Kreps <ja...@gmail.com>.
Joel,

Ah, I actually don't think the internal usage is a problem for *us*. We
just use config in one place, whereas it gets set in 1000s of apps, so I am
implicitly optimizing for the application interface. I agree that we can
add getters and setters on the ProducerConfig if we like. Basically I was
just concerned about the user, the nice thing about a pojo is that when you
type
   config.set
the IDE pops up a list of configs with documentation right there, which is
just very convenient.
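
Just to illustrate that point, a hypothetical pojo-style config for contrast
(not a proposal):

    // Hypothetical pojo config, purely to illustrate the auto-completion point.
    public class PojoProducerConfig {
        private long metadataTimeoutMs = 60 * 1000;

        /** Typing "config.set" surfaces this method and its javadoc in the IDE. */
        public PojoProducerConfig setMetadataTimeoutMs(long timeoutMs) {
            this.metadataTimeoutMs = timeoutMs;
            return this; // fluent style
        }

        public long getMetadataTimeoutMs() {
            return metadataTimeoutMs;
        }
    }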

-Jay


On Wed, Feb 5, 2014 at 5:06 PM, Joel Koshy <jj...@gmail.com> wrote:

> Overall, +1 on sticking with key-values for configs.
>
>
> > Con: The IDE gives nice auto-completion for pojos.
> >
> > Con: There are some advantages to javadoc as a documentation mechanism
> for
> > java people.
>
> Optionally, both the above cons can be addressed (to some degree) by
> wrapper config POJOs that read in the config. i.e., the client will
> provide a KV config, but then we (internally) would load that into a
> specific config POJO that will be helpful for auto-completion and
> javadocs and convenience for our internal implementation (as opposed
> to using getLong/getString, etc. which could cause runtime exceptions
> if done incorrectly). The javadoc in the pojo would need a @value link
> to the original config key string if it is to show up in the generated
> javadoc.
>
>
> > show you the value of the constant, just the variable name (unless you
> > discover how to unhide it). That is fine for the clients, but for the
>
> Figuring out a way to un-hide it would be preferable to having to keep
> the website as the single source of documentation (even if it is
> generated from the javadoc) and make the javadoc link to it. I tried,
> but was unsuccessful so unless someone knows how to do that the above
> approach is the next-best alternative.
>
> > server would be very weird especially for non-java people. We could
> attempt
> > to duplicate documentation between the javadoc and the ConfigDef but
> given
> > our struggle to get well-documented config in a single place this seems
> > unwise.
> >
> > So I recommend we have a single source for documentation of these and
> that
> > that source be the website documentation on configuration that covers
> > clients and server and that that be generated off the config defs. The
> > javadoc on KafkaProducer will link to this table so it should be quite
> > convenient to discover.
>
>

Re: Config for new clients (and server)

Posted by Jay Kreps <ja...@gmail.com>.
Hey Guozhang,

You have to accept the boolean true to support convenient programmatic
configuration (the example I showed). You have to support taking a string to
allow injecting config from a config file or an external application config
system. E.g., in the case where you supply a Properties instance, if that
instance was read from a .properties file the value will always be a string.
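
The kind of coercion I have in mind is roughly this (a sketch with made-up
names, not the real ConfigDef code):

    // Sketch of lenient parsing: accept a real Boolean or its string form,
    // since values read from a .properties file always arrive as strings.
    public class ParseSketch {
        public static Object parseBoolean(String name, Object value) {
            if (value instanceof Boolean)
                return value;
            if (value instanceof String) {
                String s = ((String) value).trim();
                if (s.equalsIgnoreCase("true") || s.equalsIgnoreCase("false"))
                    return Boolean.valueOf(s);
            }
            throw new IllegalArgumentException(
                    "Invalid value " + value + " for config " + name + ": expected a boolean");
        }
    }

Because the expected type is declared up front in the ConfigDef, the string
form is unambiguous.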

-Jay


On Thu, Feb 6, 2014 at 9:51 AM, Guozhang Wang <wa...@gmail.com> wrote:

> I made a typo in my previous email; I meant to say "not allowing", i.e. if
> a config's type is Boolean, do not accept "true" as the value.
>
>
> On Thu, Feb 6, 2014 at 8:30 AM, Jay Kreps <ja...@gmail.com> wrote:
>
> > Guozhang,
> >
> > In spirit I agree with you, but it is really annoying if I can't say
> >    props.put("config.a", 45)
> >    props.put("config.b", true)
> >
> > This is especially true because config will actually come from elsewhere
> > and may have the appropriate type so in reality you end up with
> >    props.put("config.a", Integer.toString(theValue))
> > and the error is usually pretty confusing because it is like
> >   "Invalid value true, not a boolean!"
> >
> > Since the type is specified I can't think of any corner case with
> confusing
> > behavior.
> >
> > -Jay
> >
> >
> > On Wed, Feb 5, 2014 at 5:30 PM, Guozhang Wang <wa...@gmail.com>
> wrote:
> >
> > > I like the helper function in all except in parseType: is it better to
> be
> > > strict about types, i.e. now allowing "true" if the type is really
> > > Boolean?
> > >
> > >
> > > On Wed, Feb 5, 2014 at 5:06 PM, Joel Koshy <jj...@gmail.com>
> wrote:
> > >
> > > > Overall, +1 on sticking with key-values for configs.
> > > >
> > > >
> > > > > Con: The IDE gives nice auto-completion for pojos.
> > > > >
> > > > > Con: There are some advantages to javadoc as a documentation
> > mechanism
> > > > for
> > > > > java people.
> > > >
> > > > Optionally, both the above cons can be addressed (to some degree) by
> > > > wrapper config POJOs that read in the config. i.e., the client will
> > > > provide a KV config, but then we (internally) would load that into a
> > > > specific config POJO that will be helpful for auto-completion and
> > > > javadocs and convenience for our internal implementation (as opposed
> > > > to using getLong/getString, etc. which could cause runtime exceptions
> > > > if done incorrectly). The javadoc in the pojo would need a @value
> link
> > > > to the original config key string if it is to show up in the
> generated
> > > > javadoc.
> > > >
> > > >
> > > > > show you the value of the constant, just the variable name (unless
> > you
> > > > > discover how to unhide it). That is fine for the clients, but for
> the
> > > >
> > > > Figuring out a way to un-hide it would be preferable to having to
> keep
> > > > the website as the single source of documentation (even if it is
> > > > generated from the javadoc) and make the javadoc link to it. I tried,
> > > > but was unsuccessful so unless someone knows how to do that the above
> > > > approach is the next-best alternative.
> > > >
> > > > > server would be very weird especially for non-java people. We could
> > > > attempt
> > > > > to duplicate documentation between the javadoc and the ConfigDef
> but
> > > > given
> > > > > our struggle to get well-documented config in a single place this
> > seems
> > > > > unwise.
> > > > >
> > > > > So I recommend we have a single source for documentation of these
> and
> > > > that
> > > > > that source be the website documentation on configuration that
> covers
> > > > > clients and server and that that be generated off the config defs.
> > The
> > > > > javadoc on KafkaProducer will link to this table so it should be
> > quite
> > > > > convenient to discover.
> > > >
> > > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
>
>
>
> --
> -- Guozhang
>

Re: Config for new clients (and server)

Posted by Guozhang Wang <wa...@gmail.com>.
I made a typo in my previous email; I meant to say "not allowing", i.e. if
a config's type is Boolean, do not accept "true" as the value.


On Thu, Feb 6, 2014 at 8:30 AM, Jay Kreps <ja...@gmail.com> wrote:

> Guozhang,
>
> In spirit I agree with you, but it is really annoying if I can't say
>    props.put("config.a", 45)
>    props.put("config.b", true)
>
> This is especially true because config will actually come from elsewhere
> and may have the appropriate type so in reality you end up with
>    props.put("config.a", Integer.toString(theValue))
> and the error is usually pretty confusing because it is like
>   "Invalid value true, not a boolean!"
>
> Since the type is specified I can't think of any corner case with confusing
> behavior.
>
> -Jay
>
>
> On Wed, Feb 5, 2014 at 5:30 PM, Guozhang Wang <wa...@gmail.com> wrote:
>
> > I like the helper function in all except in parseType: is it better to be
> > strict about types, i.e. now allowing "true" if the type is really
> > Boolean?
> >
> >
> > On Wed, Feb 5, 2014 at 5:06 PM, Joel Koshy <jj...@gmail.com> wrote:
> >
> > > Overall, +1 on sticking with key-values for configs.
> > >
> > >
> > > > Con: The IDE gives nice auto-completion for pojos.
> > > >
> > > > Con: There are some advantages to javadoc as a documentation
> mechanism
> > > for
> > > > java people.
> > >
> > > Optionally, both the above cons can be addressed (to some degree) by
> > > wrapper config POJOs that read in the config. i.e., the client will
> > > provide a KV config, but then we (internally) would load that into a
> > > specific config POJO that will be helpful for auto-completion and
> > > javadocs and convenience for our internal implementation (as opposed
> > > to using getLong/getString, etc. which could cause runtime exceptions
> > > if done incorrectly). The javadoc in the pojo would need a @value link
> > > to the original config key string if it is to show up in the generated
> > > javadoc.
> > >
> > >
> > > > show you the value of the constant, just the variable name (unless
> you
> > > > discover how to unhide it). That is fine for the clients, but for the
> > >
> > > Figuring out a way to un-hide it would be preferable to having to keep
> > > the website as the single source of documentation (even if it is
> > > generated from the javadoc) and make the javadoc link to it. I tried,
> > > but was unsuccessful so unless someone knows how to do that the above
> > > approach is the next-best alternative.
> > >
> > > > server would be very weird especially for non-java people. We could
> > > attempt
> > > > to duplicate documentation between the javadoc and the ConfigDef but
> > > given
> > > > our struggle to get well-documented config in a single place this
> seems
> > > > unwise.
> > > >
> > > > So I recommend we have a single source for documentation of these and
> > > that
> > > > that source be the website documentation on configuration that covers
> > > > clients and server and that that be generated off the config defs.
> The
> > > > javadoc on KafkaProducer will link to this table so it should be
> quite
> > > > convenient to discover.
> > >
> > >
> >
> >
> > --
> > -- Guozhang
> >
>



-- 
-- Guozhang

Re: Config for new clients (and server)

Posted by Jay Kreps <ja...@gmail.com>.
Guozhang,

In spirit I agree with you, but it is really annoying if I can't say
   props.put("config.a", 45)
   props.put("config.b", true)

This is especially true because config will actually come from elsewhere
and may have the appropriate type so in reality you end up with
   props.put("config.a", Integer.toString(theValue))
and the error is usually pretty confusing because it is like
  "Invalid value true, not a boolean!"

Since the type is specified I can't think of any corner case with confusing
behavior.

-Jay


On Wed, Feb 5, 2014 at 5:30 PM, Guozhang Wang <wa...@gmail.com> wrote:

> I like the helper function in all except in parseType: is it better to be
> strict about types, i.e. now allowing "true" if the type is really
> Boolean?
>
>
> On Wed, Feb 5, 2014 at 5:06 PM, Joel Koshy <jj...@gmail.com> wrote:
>
> > Overall, +1 on sticking with key-values for configs.
> >
> >
> > > Con: The IDE gives nice auto-completion for pojos.
> > >
> > > Con: There are some advantages to javadoc as a documentation mechanism
> > for
> > > java people.
> >
> > Optionally, both the above cons can be addressed (to some degree) by
> > wrapper config POJOs that read in the config. i.e., the client will
> > provide a KV config, but then we (internally) would load that into a
> > specific config POJO that will be helpful for auto-completion and
> > javadocs and convenience for our internal implementation (as opposed
> > to using getLong/getString, etc. which could cause runtime exceptions
> > if done incorrectly). The javadoc in the pojo would need a @value link
> > to the original config key string if it is to show up in the generated
> > javadoc.
> >
> >
> > > show you the value of the constant, just the variable name (unless you
> > > discover how to unhide it). That is fine for the clients, but for the
> >
> > Figuring out a way to un-hide it would be preferable to having to keep
> > the website as the single source of documentation (even if it is
> > generated from the javadoc) and make the javadoc link to it. I tried,
> > but was unsuccessful so unless someone knows how to do that the above
> > approach is the next-best alternative.
> >
> > > server would be very weird especially for non-java people. We could
> > attempt
> > > to duplicate documentation between the javadoc and the ConfigDef but
> > given
> > > our struggle to get well-documented config in a single place this seems
> > > unwise.
> > >
> > > So I recommend we have a single source for documentation of these and
> > that
> > > that source be the website documentation on configuration that covers
> > > clients and server and that that be generated off the config defs. The
> > > javadoc on KafkaProducer will link to this table so it should be quite
> > > convenient to discover.
> >
> >
>
>
> --
> -- Guozhang
>

Re: Config for new clients (and server)

Posted by Guozhang Wang <wa...@gmail.com>.
I like the helper function in all except in parseType: is it better to be
strict about types, i.e. now allowing "true" if the type is really
Boolean?


On Wed, Feb 5, 2014 at 5:06 PM, Joel Koshy <jj...@gmail.com> wrote:

> Overall, +1 on sticking with key-values for configs.
>
>
> > Con: The IDE gives nice auto-completion for pojos.
> >
> > Con: There are some advantages to javadoc as a documentation mechanism
> for
> > java people.
>
> Optionally, both the above cons can be addressed (to some degree) by
> wrapper config POJOs that read in the config. i.e., the client will
> provide a KV config, but then we (internally) would load that into a
> specific config POJO that will be helpful for auto-completion and
> javadocs and convenience for our internal implementation (as opposed
> to using getLong/getString, etc. which could cause runtime exceptions
> if done incorrectly). The javadoc in the pojo would need a @value link
> to the original config key string if it is to show up in the generated
> javadoc.
>
>
> > show you the value of the constant, just the variable name (unless you
> > discover how to unhide it). That is fine for the clients, but for the
>
> Figuring out a way to un-hide it would be preferable to having to keep
> the website as the single source of documentation (even if it is
> generated from the javadoc) and make the javadoc link to it. I tried,
> but was unsuccessful so unless someone knows how to do that the above
> approach is the next-best alternative.
>
> > server would be very weird especially for non-java people. We could
> attempt
> > to duplicate documentation between the javadoc and the ConfigDef but
> given
> > our struggle to get well-documented config in a single place this seems
> > unwise.
> >
> > So I recommend we have a single source for documentation of these and
> that
> > that source be the website documentation on configuration that covers
> > clients and server and that that be generated off the config defs. The
> > javadoc on KafkaProducer will link to this table so it should be quite
> > convenient to discover.
>
>


-- 
-- Guozhang

Re: Config for new clients (and server)

Posted by Joel Koshy <jj...@gmail.com>.
Overall, +1 on sticking with key-values for configs.


> Con: The IDE gives nice auto-completion for pojos.
> 
> Con: There are some advantages to javadoc as a documentation mechanism for
> java people.

Optionally, both the above cons can be addressed (to some degree) by
wrapper config POJOs that read in the config. i.e., the client will
provide a KV config, but then we (internally) would load that into a
specific config POJO that will be helpful for auto-completion and
javadocs and convenience for our internal implementation (as opposed
to using getLong/getString, etc. which could cause runtime exceptions
if done incorrectly). The javadoc in the pojo would need a @value link
to the original config key string if it is to show up in the generated
javadoc.
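
E.g., something like this hypothetical internal wrapper (a sketch with
made-up names):

    import java.util.Map;

    // Hypothetical internal wrapper pojo over the KV config (sketch only).
    public class ProducerSettings {
        /** Config key; other javadoc can reference it via {@value #METADATA_TIMEOUT}. */
        public static final String METADATA_TIMEOUT = "metadata.timeout.ms";

        private final long metadataTimeoutMs;

        public ProducerSettings(Map<String, ?> config) {
            // Parse once up front so typos and type errors fail here,
            // not as runtime exceptions scattered across the use sites.
            this.metadataTimeoutMs =
                    Long.parseLong(String.valueOf(config.get(METADATA_TIMEOUT)));
        }

        public long metadataTimeoutMs() {
            return metadataTimeoutMs;
        }
    }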


> show you the value of the constant, just the variable name (unless you
> discover how to unhide it). That is fine for the clients, but for the

Figuring out a way to un-hide it would be preferable to having to keep
the website as the single source of documentation (even if it is
generated from the javadoc) and make the javadoc link to it. I tried,
but was unsuccessful so unless someone knows how to do that the above
approach is the next-best alternative.

> server would be very weird especially for non-java people. We could attempt
> to duplicate documentation between the javadoc and the ConfigDef but given
> our struggle to get well-documented config in a single place this seems
> unwise.
> 
> So I recommend we have a single source for documentation of these and that
> that source be the website documentation on configuration that covers
> clients and server and that that be generated off the config defs. The
> javadoc on KafkaProducer will link to this table so it should be quite
> convenient to discover.


Re: Config for new clients (and server)

Posted by Jay Kreps <ja...@gmail.com>.
Hey David,

Great questions.

The Configurable interface was meant to be used with plugins. Originally we
had two of these in the producer, Serializer and Partitioner, but we removed
them, so now it is unused. However, I think we will likely have several in
the consumer, so I will leave it there for now. The idea was that for
interfaces where the user provides an implementation, you often need some configuration
in your implementation (e.g. a range partitioner may need to fetch its list
of break points, a string serializer may need to take a configurable
charset, etc). That interface is a way to pass in the configuration when
instantiating the object.
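
As a rough sketch of that hook (the exact interface shape is an assumption
based on the description above, and the config key is made up):

import java.nio.charset.Charset;
import java.util.Map;

// Assumed shape: a single configure() call made when the plugin is instantiated.
interface Configurable {
    void configure(Map<String, ?> configs);
}

// Example plugin: a string serializer with a configurable charset.
class ConfigurableStringSerializer implements Configurable {
    private Charset charset = Charset.forName("UTF-8");

    @Override
    public void configure(Map<String, ?> configs) {
        Object encoding = configs.get("serializer.encoding"); // hypothetical key
        if (encoding != null)
            charset = Charset.forName(encoding.toString());
    }

    public byte[] serialize(String s) {
        return s.getBytes(charset);
    }
}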

In the existing scala code we have the same basic idea but there is no
configure method; instead we just look for a constructor that takes
properties. The problem with this is that it isn't very discoverable so no
one knows how to use it.

That is a good point that this approach does assume static configuration. I
actually prefer this because we follow a dependency injection style of
config usage which allows you to take values and hold them in final/val
variables. Getting away from this to allow any config to change dynamically
is pretty hard. I think dynamic config tends to have a lot of corner cases
and synchronization to handle correctly. Some things are very difficult to
dynamically configure and others are impossible (like socket buffers). My
preference would just be to work on making startup/shutdown quick.
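
A minimal sketch of the dependency-injection style in question (names are
illustrative): the value is read once and held in a final field, so there is
no dynamic-update path to synchronize.

// Config is injected at construction and held in a final field; nothing
// can change it underneath running code.
class RecordAccumulator {
    private final int maxPartitionSize;

    RecordAccumulator(int maxPartitionSize) {
        this.maxPartitionSize = maxPartitionSize;
    }

    int maxPartitionSize() {
        return maxPartitionSize;
    }
}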

 -Jay


On Thu, Feb 6, 2014 at 11:52 AM, David Arthur <mu...@gmail.com> wrote:

> Jay, thanks for the clarification. I think I understand it a bit better
> now, and it all pretty much makes sense. See inline
>
> Also, slightly OT, but what are Configurable and AbstractConfig#getConfiguredInstance
> for? I don't see anything using them.
>
>
> On 2/6/14 11:46 AM, Jay Kreps wrote:
>
>> Hey David,
>>
>> You raise a couple points, let me give the rationale and see if you are
>> convinced.
>>
>> 1. Why not reuse commons config? This is exactly the right question for
>> application programming. We could save ourselves a few hundred lines of
>> code we would maintain by just adding a library dependency, which is
>> almost
>> always the right decision if you are building an application as the
>> application is itself the end result. But the issue here is that these
>> clients will be embedded in thousands of applications which will in turn
>> have other dependencies. So as a result any common libraries we use will
>> also be used by others and this inevitably leads to
>> dependency/incompatibility hell. As a result I really think for this kind
>> of helper code we should just do it ourselves. I think of this as kind of
>> our sacrifice for the users. :-)
>>
>> Meanwhile since we are decoupling the client and the server and since the
>> server should always be a stand-alone thing we should be able to be much
>> more liberal in our dependencies there.
>>
> Yea, good point. I'd like to think that reusing existing stuff is always
> the right thing to do, but I've been through dependency hell plenty to know
> that's not the reality.
>
>
>> 2. The next question is why not use a commons-config-like approach where
>> the config object is a map/properties thing and we wrap it in a pojo that
>> provides getters that parse values. This is the approach we took in the
>> existing client and server code. The problem with this is that it is
>> impossible to determine the set of configs programmatically. The goal was
>> to have our config documentation automatically generated off the code
>> (including the type, default value, documentation, etc). This also has the
>> nice side effect that all the configs are validated up front, you don't
>> have to wait until someone calls the getter to check validity.
>>
> I can see where the declarative stuff is useful for auto-generating
> documentation, however that's another thing we'd have to build - that is of
> course if you haven't built it already ;)
>
> Loading the config eagerly also breaks the ability for config to change at
> runtime, which can be appealing for changing things like buffer sizes,
> timeouts, etc. I've been using Archaius lately, and it's pretty awesome.
> But again, it's more geared towards application development (like myself).
>
>
>> 3. This approach doesn't prohibit custom validation. All validation is
>> done
>> with a Validator, so you can plug in any validation you like. This can be
>> a
>> bit ugly for one-off code (annonymous inner class, yuck...but will get
>> better with java 8 and in scala is already good). But we can also just do
>> validation in line in the wiring class. I see the "responsibility" of
>> KafkaProducer or KafkaServer classes as turning configs into a constructed
>> assembled hierarchy of objects so checking stuff there is totally legit.
>>
> Gotcha
>
>
>> 4. If the pojo wrapper you describe is just an internal convenience for
>> our
>> code (as Joel described), then I have no objection to that. But the
>> objection to a user facing pojo was what I described before...
>>
> Internal/user-facing POJO might be nice so we _do_ get compiler warnings
> when trying to get non-existent configs.
>
>
>
>> Make sense?
>>
>> -Jay
>>
>>
>> On Thu, Feb 6, 2014 at 6:27 AM, David Arthur <mu...@gmail.com> wrote:
>>
>>> The declarative approach strikes me as a bit odd. Why not put the
>>> precondition logic in a POJO wrapper? Also, why reinvent
>>> commons-configuration? It's got a nice API and tons of people use it.
>>>
>>> public class ProducerConfig
>>>     extends org.apache.commons.configuration.AbstractConfiguration {
>>>
>>>    /**
>>>     * Blah, short doc {@link #getRequiredAcks}
>>>     */
>>>    public static final String REQUIRED_ACKS_CONFIG = "request.required.acks";
>>>
>>>    /**
>>>     * Blah blah, detailed docs
>>>     */
>>>    public int getRequiredAcks() {
>>>      int acks = super.getInt(REQUIRED_ACKS_CONFIG, 1);
>>>      if (acks < -1 || acks > Short.MAX_VALUE) {
>>>        throw new ConfigurationError("Config precondition failed, " +
>>>            REQUIRED_ACKS_CONFIG + " must be between -1 and 32767");
>>>      }
>>>      return acks;
>>>    }
>>> }
>>>
>>> After typing that all out, I can see how the declarative
>>> configuration saves a bit of boilerplate, but it also limits you
>>> to
>>> what preconditions you define. E.g., what if you wanted to validate a ZK
>>> string in the POJO wrapper?
>>>
>>> -David
>>>
>>>
>>> On 2/4/14 12:34 PM, Jay Kreps wrote:
>>>
>>>> We touched on this a bit in previous discussions, but I wanted to draw
>>>> out
>>>> the approach to config specifically as an item of discussion.
>>>>
>>>> The new producer and consumer use a similar key-value config approach as
>>>> the existing scala clients but have different implementation code to
>>>> help
>>>> define these configs. The plan is to use the same approach on the
>>>> server,
>>>> once the new clients are complete; so if we agree on this approach it
>>>> will
>>>> be the new default across the board.
>>>>
>>>> Let me split this into two parts. First I will try to motivate the use
>>>> of
>>>> key-value pairs as a configuration api. Then let me discuss the
>>>> mechanics
>>>> of specifying and parsing these. If we agree on the public api then the
>>>> public api then the implementation details are interesting as this will
>>>> be
>>>> shared across producer, consumer, and broker and potentially some tools;
>>>> but if we disagree about the api then there is no point in discussing
>>>> the
>>>> implementation.
>>>>
>>>> Let me explain the rationale for this. In a sense a key-value map of
>>>> configs is the worst possible API to the programmer using the clients.
>>>> Let
>>>> me contrast the pros and cons versus a POJO and motivate why I think it
>>>> is
>>>> still superior overall.
>>>>
>>>> Pro: An application can externalize the configuration of its kafka
>>>> clients
>>>> into its own configuration. Whatever config management system the client
>>>> application is using will likely support key-value pairs, so the client
>>>> should be able to directly pull whatever configurations are present and
>>>> use
>>>> them in its client. This means that any configuration the client
>>>> supports
>>>> can be added to any application at runtime. With the pojo approach the
>>>> client application has to expose each pojo getter as some config
>>>> parameter.
>>>> The result of many applications doing this is that the config is
>>>> different
>>>> for each and it is very hard to have a standard client config shared
>>>> across. Moving config into config files allows the usual tooling
>>>> (version
>>>> control, review, audit, config deployments separate from code pushes,
>>>> etc.).
>>>>
>>>> Pro: Backwards and forwards compatibility. Provided we stick to our java
>>>> api many internals can evolve and expose new configs. The application
>>>> can
>>>> support both the new and old client by just specifying a config that
>>>> will
>>>> be unused in the older version (and of course the reverse--we can remove
>>>> obsolete configs).
>>>>
>>>> Pro: We can use a similar mechanism for both the client and the server.
>>>> Since most people run the server as a stand-alone process it needs a
>>>> config
>>>> file.
>>>>
>>>> Pro: Systems like Samza that need to ship configs across the network can
>>>> easily do so as configs have a natural serialized form. This can be done
>>>> with pojos using java serialization but it is ugly and has bizarre
>>>> failure
>>>> cases.
>>>>
>>>> Con: The IDE gives nice auto-completion for pojos.
>>>>
>>>> Con: There are some advantages to javadoc as a documentation mechanism
>>>> for
>>>> java people.
>>>>
>>>> Basically to me this is about operability versus niceness of api and I
>>>> think operability is more important.
>>>>
>>>> Let me now give some details of the config support classes in
>>>> kafka.common.config and how they are intended to be used.
>>>>
>>>> The goal of this code is the following:
>>>> 1. Make specifying configs, their expected type (string, numbers, lists,
>>>> etc) simple and declarative
>>>> 2. Allow for validating simple checks (numeric range checks, etc)
>>>> 3. Make the config "self-documenting". I.e. we should be able to write
>>>> code
>>>> that generates the configuration documentation off the config def.
>>>> 4. Specify default values.
>>>> 5. Track which configs actually get used.
>>>> 6. Make it easy to get config values.
>>>>
>>>> There are two classes there: ConfigDef and AbstractConfig. ConfigDef
>>>> defines the specification of the accepted configurations and
>>>> AbstractConfig
>>>> is a helper class for implementing the configuration class. The
>>>> difference
>>>> is kind of like the difference between a "class" and an "object":
>>>> ConfigDef
>>>> is for specifying the configurations that are accepted, AbstractConfig
>>>> is
>>>> the base class for an instance of these configs.
>>>>
>>>> You can see this in action here:
>>>> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=blob_plain;f=clients/src/main/java/kafka/clients/producer/ProducerConfig.java;hb=HEAD
>>>>
>>>> (Ignore the static config names in there for now...I'm not actually sure
>>>> that is the best approach).
>>>>
>>>> So the way this works is that the config specification is defined as:
>>>>
>>>>          config = new ConfigDef().define("bootstrap.brokers", Type.LIST, "documentation")
>>>>                                  .define("metadata.timeout.ms", Type.LONG, 60 * 1000, atLeast(0), "documentation")
>>>>                                  .define("max.partition.size", Type.INT, 16384, atLeast(0), "documentation")
>>>>
>>>>
>>>> This is used in a ProducerConfig class which extends AbstractConfig to
>>>> get
>>>> access to some helper methods as well as the logic for tracking which
>>>> configs get accessed.
>>>>
>>>> Currently I have included static String variables for each of the config
>>>> names in that class. However I actually think that is not very helpful
>>>> as
>>>> the javadoc for them doesn't give the constant value and requires
>>>> duplicating the documentation. To understand this point look at the
>>>> javadoc
>>>> and note that the doc on the string is not the same as what we define in
>>>> the ConfigDef. We could just have the javadoc for the config string be
>>>> the
>>>> source of truth but it is actually pretty inconvenient for that as it
>>>> doesn't
>>>> show you the value of the constant, just the variable name (unless you
>>>> discover how to unhide it). That is fine for the clients, but for the
>>>> server would be very weird especially for non-java people. We could
>>>> attempt
>>>> to duplicate documentation between the javadoc and the ConfigDef but
>>>> given
>>>> our struggle to get well-documented config in a single place this seems
>>>> unwise.
>>>>
>>>> So I recommend we have a single source for documentation of these and
>>>> that
>>>> that source be the website documentation on configuration that covers
>>>> clients and server and that that be generated off the config defs. The
>>>> javadoc on KafkaProducer will link to this table so it should be quite
>>>> convenient to discover. This makes things a little more typo prone, but
>>>> that should be easily caught by the key detection. This will also make
>>>> it
>>>> possible for us to retire configs in the future without causing compile
>>>> failures and add configs without having use of them break backwards
>>>> compatibility. This is useful during upgrades where you want to be
>>>> compatible with the old and new version so you can roll forwards and
>>>> backwards.
>>>>
>>>> -Jay
>>>>
>>>>
>>>>
>

Re: Config for new clients (and server)

Posted by David Arthur <mu...@gmail.com>.
Jay, thanks for the clarification. I think I understand it a bit better 
now, and it all pretty much makes sense. See inline

Also, slightly OT, but what are Configurable and 
AbstractConfig#getConfiguredInstance for? I don't see anything using them.

On 2/6/14 11:46 AM, Jay Kreps wrote:
> Hey David,
>
> You raise a couple points, let me give the rationale and see if you are
> convinced.
>
> 1. Why not reuse commons config? This is exactly the right question for
> application programming. We could save ourselves a few hundred lines of
> code we would maintain by just adding a library dependency, which is almost
> always the right decision if you are building an application as the
> application is itself the end result. But the issue here is that these
> clients will be embedded in thousands of applications which will in turn
> have other dependencies. So as a result any common libraries we use will
> also be used by others and this inevitably leads to
> dependency/incompatibility hell. As a result I really think for this kind
> of helper code we should just do it ourselves. I think of this as kind of
> our sacrifice for the users. :-)
>
> Meanwhile since we are decoupling the client and the server and since the
> server should always be a stand-alone thing we should be able to be much
> more liberal in our dependencies there.
Yea, good point. I'd like to think that reusing existing stuff is always 
the right thing to do, but I've been through dependency hell plenty to 
know that's not the reality.
>
> 2. The next question is why not use a commons-config-like approach where
> the config object is a map/properties thing and we wrap it in a pojo that
> provides getters that parse values. This is the approach we took in the
> existing client and server code. The problem with this is that it is
> impossible to determine the set of configs programmatically. The goal was
> to have our config documentation automatically generated off the code
> (including the type, default value, documentation, etc). This also has the
> nice side effect that all the configs are validated up front, you don't
> have to wait until someone calls the getter to check validity.
I can see where the declarative stuff is useful for auto-generating 
documentation, however that's another thing we'd have to build - that is 
of course if you haven't built it already ;)

Loading the config eagerly also breaks the ability for config to change 
at runtime, which can be appealing for changing things like buffer 
sizes, timeouts, etc. I've been using Archaius lately, and it's pretty 
awesome. But again, it's more geared towards application development 
(like myself).
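
For reference, the Archaius style mentioned here looks roughly like this
(Archaius 1.x API; the property key is made up):

import com.netflix.config.DynamicIntProperty;
import com.netflix.config.DynamicPropertyFactory;

class DynamicBufferConfig {
    // Re-evaluated on each get(), so the value can change at runtime.
    private final DynamicIntProperty bufferSize =
        DynamicPropertyFactory.getInstance()
            .getIntProperty("my.app.buffer.size", 65536);

    int currentBufferSize() {
        return bufferSize.get();
    }
}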
>
> 3. This approach doesn't prohibit custom validation. All validation is done
> with a Validator, so you can plug in any validation you like. This can be a
> bit ugly for one-off code (anonymous inner class, yuck...but will get
> better with java 8 and in scala is already good). But we can also just do
> validation in line in the wiring class. I see the "responsibility" of
> KafkaProducer or KafkaServer classes as turning configs into a constructed
> assembled hierarchy of objects so checking stuff there is totally legit.
Gotcha
>
> 4. If the pojo wrapper you describe is just an internal convenience for our
> code (as Joel described), then I have no objection to that. But the
> objection to a user facing pojo was what I described before...
Internal/user-facing POJO might be nice so we _do_ get compiler warnings 
when trying to get non-existent configs.

>
> Make sense?
>
> -Jay
>
>
> On Thu, Feb 6, 2014 at 6:27 AM, David Arthur <mu...@gmail.com> wrote:
>
>> The declarative approach strikes me as a bit odd. Why not put the
>> precondition logic in a POJO wrapper? Also, why reinvent
>> commons-configuration? It's got a nice API and tons of people use it.
>>
>> public class ProducerConfig
>>     extends org.apache.commons.configuration.AbstractConfiguration {
>>
>>    /**
>>     * Blah, short doc {@link #getRequiredAcks}
>>     */
>>    public static final String REQUIRED_ACKS_CONFIG = "request.required.acks";
>>
>>    /**
>>     * Blah blah, detailed docs
>>     */
>>    public int getRequiredAcks() {
>>      int acks = super.getInt(REQUIRED_ACKS_CONFIG, 1);
>>      if (acks < -1 || acks > Short.MAX_VALUE) {
>>        throw new ConfigurationError("Config precondition failed, " +
>>            REQUIRED_ACKS_CONFIG + " must be between -1 and 32767");
>>      }
>>      return acks;
>>    }
>> }
>>
>> After typing that all out, I can see how the declarative
>> configuration saves a bit of boilerplate, but it also limits you to
>> what preconditions you define. E.g., what if you wanted to validate a ZK
>> string in the POJO wrapper?
>>
>> -David
>>
>>
>> On 2/4/14 12:34 PM, Jay Kreps wrote:
>>
>>> We touched on this a bit in previous discussions, but I wanted to draw out
>>> the approach to config specifically as an item of discussion.
>>>
>>> The new producer and consumer use a similar key-value config approach as
>>> the existing scala clients but have different implementation code to help
>>> define these configs. The plan is to use the same approach on the server,
>>> once the new clients are complete; so if we agree on this approach it will
>>> be the new default across the board.
>>>
>>> Let me split this into two parts. First I will try to motivate the use of
>>> key-value pairs as a configuration api. Then let me discuss the mechanics
>>> of specifying and parsing these. If we agree on the public api then the
>>> public api then the implementation details are interesting as this will be
>>> shared across producer, consumer, and broker and potentially some tools;
>>> but if we disagree about the api then there is no point in discussing the
>>> implementation.
>>>
>>> Let me explain the rationale for this. In a sense a key-value map of
>>> configs is the worst possible API to the programmer using the clients. Let
>>> me contrast the pros and cons versus a POJO and motivate why I think it is
>>> still superior overall.
>>>
>>> Pro: An application can externalize the configuration of its kafka clients
>>> into its own configuration. Whatever config management system the client
>>> application is using will likely support key-value pairs, so the client
>>> should be able to directly pull whatever configurations are present and
>>> use
>>> them in its client. This means that any configuration the client supports
>>> can be added to any application at runtime. With the pojo approach the
>>> client application has to expose each pojo getter as some config
>>> parameter.
>>> The result of many applications doing this is that the config is different
>>> for each and it is very hard to have a standard client config shared
>>> across. Moving config into config files allows the usual tooling (version
>>> control, review, audit, config deployments separate from code pushes,
>>> etc.).
>>>
>>> Pro: Backwards and forwards compatibility. Provided we stick to our java
>>> api many internals can evolve and expose new configs. The application can
>>> support both the new and old client by just specifying a config that will
>>> be unused in the older version (and of course the reverse--we can remove
>>> obsolete configs).
>>>
>>> Pro: We can use a similar mechanism for both the client and the server.
>>> Since most people run the server as a stand-alone process it needs a
>>> config
>>> file.
>>>
>>> Pro: Systems like Samza that need to ship configs across the network can
>>> easily do so as configs have a natural serialized form. This can be done
>>> with pojos using java serialization but it is ugly and has bizarre failure
>>> cases.
>>>
>>> Con: The IDE gives nice auto-completion for pojos.
>>>
>>> Con: There are some advantages to javadoc as a documentation mechanism for
>>> java people.
>>>
>>> Basically to me this is about operability versus niceness of api and I
>>> think operability is more important.
>>>
>>> Let me now give some details of the config support classes in
>>> kafka.common.config and how they are intended to be used.
>>>
>>> The goal of this code is the following:
>>> 1. Make specifying configs, their expected type (string, numbers, lists,
>>> etc) simple and declarative
>>> 2. Allow for validating simple checks (numeric range checks, etc)
>>> 3. Make the config "self-documenting". I.e. we should be able to write
>>> code
>>> that generates the configuration documentation off the config def.
>>> 4. Specify default values.
>>> 5. Track which configs actually get used.
>>> 6. Make it easy to get config values.
>>>
>>> There are two classes there: ConfigDef and AbstractConfig. ConfigDef
>>> defines the specification of the accepted configurations and
>>> AbstractConfig
>>> is a helper class for implementing the configuration class. The difference
>>> is kind of like the difference between a "class" and an "object":
>>> ConfigDef
>>> is for specifying the configurations that are accepted, AbstractConfig is
>>> the base class for an instance of these configs.
>>>
>>> You can see this in action here:
>>> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=blob_plain;f=clients/src/main/java/kafka/clients/producer/ProducerConfig.java;hb=HEAD
>>>
>>> (Ignore the static config names in there for now...I'm not actually sure
>>> that is the best approach).
>>>
>>> So the way this works is that the config specification is defined as:
>>>
>>>          config = new ConfigDef().define("bootstrap.brokers", Type.LIST, "documentation")
>>>                                  .define("metadata.timeout.ms", Type.LONG, 60 * 1000, atLeast(0), "documentation")
>>>                                  .define("max.partition.size", Type.INT, 16384, atLeast(0), "documentation")
>>>
>>>
>>> This is used in a ProducerConfig class which extends AbstractConfig to get
>>> access to some helper methods as well as the logic for tracking which
>>> configs get accessed.
>>>
>>> Currently I have included static String variables for each of the config
>>> names in that class. However I actually think that is not very helpful as
>>> the javadoc for them doesn't give the constant value and requires
>>> duplicating the documentation. To understand this point look at the
>>> javadoc
>>> and note that the doc on the string is not the same as what we define in
>>> the ConfigDef. We could just have the javadoc for the config string be the
>>> source of truth but it is actually pretty inconvenient for that as it
>>> doesn't
>>> show you the value of the constant, just the variable name (unless you
>>> discover how to unhide it). That is fine for the clients, but for the
>>> server would be very weird especially for non-java people. We could
>>> attempt
>>> to duplicate documentation between the javadoc and the ConfigDef but given
>>> our struggle to get well-documented config in a single place this seems
>>> unwise.
>>>
>>> So I recommend we have a single source for documentation of these and that
>>> that source be the website documentation on configuration that covers
>>> clients and server and that that be generated off the config defs. The
>>> javadoc on KafkaProducer will link to this table so it should be quite
>>> convenient to discover. This makes things a little more typo prone, but
>>> that should be easily caught by the key detection. This will also make it
>>> possible for us to retire configs in the future without causing compile
>>> failures and add configs without having use of them break backwards
>>> compatibility. This is useful during upgrades where you want to be
>>> compatible with the old and new version so you can roll forwards and
>>> backwards.
>>>
>>> -Jay
>>>
>>>


Re: Config for new clients (and server)

Posted by Jay Kreps <ja...@gmail.com>.
Hey David,

You raise a couple points, let me give the rationale and see if you are
convinced.

1. Why not reuse commons config? This is exactly the right question for
application programming. We could save ourselves a few hundred lines of
code we would maintain by just adding a library dependency, which is almost
always the right decision if you are building an application as the
application is itself the end result. But the issue here is that these
clients will be embedded in thousands of applications which will in turn
have other dependencies. So as a result any common libraries we use will
also be used by others and this inevitably leads to
dependency/incompatibility hell. As a result I really think for this kind
of helper code we should just do it ourselves. I think of this as kind of
our sacrifice for the users. :-)

Meanwhile since we are decoupling the client and the server and since the
server should always be a stand-alone thing we should be able to be much
more liberal in our dependencies there.

2. The next question is why not use a commons-config-like approach where
the config object is a map/properties thing and we wrap it in a pojo that
provides getters that parse values. This is the approach we took in the
existing client and server code. The problem with this is that it is
impossible to determine the set of configs programmatically. The goal was
to have our config documentation automatically generated off the code
(including the type, default value, documentation, etc). This also has the
nice side effect that all the configs are validated up front, you don't
have to wait until someone calls the getter to check validity.
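
A rough sketch of what that generation could look like; ConfigKey and the
fields it carries are assumptions about what a ConfigDef would record for
each define() call, not the actual classes:

import java.util.List;

class ConfigDocGenerator {

    static class ConfigKey {
        final String name, type, documentation;
        final Object defaultValue;

        ConfigKey(String name, String type, Object defaultValue, String documentation) {
            this.name = name;
            this.type = type;
            this.defaultValue = defaultValue;
            this.documentation = documentation;
        }
    }

    // Walk the declared keys and emit the website config table.
    static String toHtmlTable(List<ConfigKey> keys) {
        StringBuilder b = new StringBuilder("<table>\n");
        b.append("<tr><th>Name</th><th>Type</th><th>Default</th><th>Description</th></tr>\n");
        for (ConfigKey k : keys) {
            b.append("<tr><td>").append(k.name)
             .append("</td><td>").append(k.type)
             .append("</td><td>").append(k.defaultValue)
             .append("</td><td>").append(k.documentation)
             .append("</td></tr>\n");
        }
        return b.append("</table>").toString();
    }
}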

3. This approach doesn't prohibit custom validation. All validation is done
with a Validator, so you can plug in any validation you like. This can be a
bit ugly for one-off code (anonymous inner class, yuck...but will get
better with java 8 and in scala is already good). But we can also just do
validation in line in the wiring class. I see the "responsibility" of
KafkaProducer or KafkaServer classes as turning configs into a constructed
assembled hierarchy of objects so checking stuff there is totally legit.
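
For example, a one-off check like the ZK connect string David asks about
could plug in as a custom Validator; the interface shape and exception type
here are assumptions, shown in the anonymous-inner-class style just
mentioned:

interface Validator {
    void ensureValid(String name, Object value);
}

class ConfigException extends RuntimeException {
    ConfigException(String message) { super(message); }
}

class Validators {
    // Loose sanity check for host:port[,host:port...][/chroot].
    static final Validator ZK_CONNECT_STRING = new Validator() {
        @Override
        public void ensureValid(String name, Object value) {
            String s = (value == null) ? null : value.toString();
            if (s == null || !s.matches("[^,/:]+:\\d+(,[^,/:]+:\\d+)*(/.*)?"))
                throw new ConfigException(name + " is not a valid ZooKeeper connect string: " + s);
        }
    };
}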

4. If the pojo wrapper you describe is just an internal convenience for our
code (as Joel described), then I have no objection to that. But the
objection to a user facing pojo was what I described before...

Make sense?

-Jay


On Thu, Feb 6, 2014 at 6:27 AM, David Arthur <mu...@gmail.com> wrote:

> The declarative approach strikes me as a bit odd. Why not put the
> precondition logic in a POJO wrapper? Also, why reinvent
> commons-configuration? It's got a nice API and tons of people use it.
>
> public class ProducerConfig
>     extends org.apache.commons.configuration.AbstractConfiguration {
>
>   /**
>    * Blah, short doc {@link #getRequiredAcks}
>    */
>   public static final String REQUIRED_ACKS_CONFIG = "request.required.acks";
>
>   /**
>    * Blah blah, detailed docs
>    */
>   public int getRequiredAcks() {
>     int acks = super.getInt(REQUIRED_ACKS_CONFIG, 1);
>     if (acks < -1 || acks > Short.MAX_VALUE) {
>       throw new ConfigurationError("Config precondition failed, " +
>           REQUIRED_ACKS_CONFIG + " must be between -1 and 32767");
>     }
>     return acks;
>   }
> }
>
> After typing that all out, I can see how the declarative
> configuration saves a bit of boilerplate, but it also limits you to
> what preconditions you define. E.g., what if you wanted to validate a ZK
> string in the POJO wrapper?
>
> -David
>
>
> On 2/4/14 12:34 PM, Jay Kreps wrote:
>
>> We touched on this a bit in previous discussions, but I wanted to draw out
>> the approach to config specifically as an item of discussion.
>>
>> The new producer and consumer use a similar key-value config approach as
>> the existing scala clients but have different implementation code to help
>> define these configs. The plan is to use the same approach on the server,
>> once the new clients are complete; so if we agree on this approach it will
>> be the new default across the board.
>>
>> Let me split this into two parts. First I will try to motivate the use of
>> key-value pairs as a configuration api. Then let me discuss the mechanics
>> of specifying and parsing these. If we agree on the public api then the
>> public api then the implementation details are interesting as this will be
>> shared across producer, consumer, and broker and potentially some tools;
>> but if we disagree about the api then there is no point in discussing the
>> implementation.
>>
>> Let me explain the rationale for this. In a sense a key-value map of
>> configs is the worst possible API to the programmer using the clients. Let
>> me contrast the pros and cons versus a POJO and motivate why I think it is
>> still superior overall.
>>
>> Pro: An application can externalize the configuration of its kafka clients
>> into its own configuration. Whatever config management system the client
>> application is using will likely support key-value pairs, so the client
>> should be able to directly pull whatever configurations are present and
>> use
>> them in its client. This means that any configuration the client supports
>> can be added to any application at runtime. With the pojo approach the
>> client application has to expose each pojo getter as some config
>> parameter.
>> The result of many applications doing this is that the config is different
>> for each and it is very hard to have a standard client config shared
>> across. Moving config into config files allows the usual tooling (version
>> control, review, audit, config deployments separate from code pushes,
>> etc.).
>>
>> Pro: Backwards and forwards compatibility. Provided we stick to our java
>> api many internals can evolve and expose new configs. The application can
>> support both the new and old client by just specifying a config that will
>> be unused in the older version (and of course the reverse--we can remove
>> obsolete configs).
>>
>> Pro: We can use a similar mechanism for both the client and the server.
>> Since most people run the server as a stand-alone process it needs a
>> config
>> file.
>>
>> Pro: Systems like Samza that need to ship configs across the network can
>> easily do so as configs have a natural serialized form. This can be done
>> with pojos using java serialization but it is ugly and has bizarre failure
>> cases.
>>
>> Con: The IDE gives nice auto-completion for pojos.
>>
>> Con: There are some advantages to javadoc as a documentation mechanism for
>> java people.
>>
>> Basically to me this is about operability versus niceness of api and I
>> think operability is more important.
>>
>> Let me now give some details of the config support classes in
>> kafka.common.config and how they are intended to be used.
>>
>> The goal of this code is the following:
>> 1. Make specifying configs, their expected type (string, numbers, lists,
>> etc) simple and declarative
>> 2. Allow for validating simple checks (numeric range checks, etc)
>> 3. Make the config "self-documenting". I.e. we should be able to write
>> code
>> that generates the configuration documentation off the config def.
>> 4. Specify default values.
>> 5. Track which configs actually get used.
>> 6. Make it easy to get config values.
>>
>> There are two classes there: ConfigDef and AbstractConfig. ConfigDef
>> defines the specification of the accepted configurations and
>> AbstractConfig
>> is a helper class for implementing the configuration class. The difference
>> is kind of like the difference between a "class" and an "object":
>> ConfigDef
>> is for specifying the configurations that are accepted, AbstractConfig is
>> the base class for an instance of these configs.
>>
>> You can see this in action here:
>> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=blob_plain;f=clients/src/main/java/kafka/clients/producer/ProducerConfig.java;hb=HEAD
>>
>> (Ignore the static config names in there for now...I'm not actually sure
>> that is the best approach).
>>
>> So the way this works is that the config specification is defined as:
>>
>>          config = new ConfigDef().define("bootstrap.brokers", Type.LIST, "documentation")
>>                                  .define("metadata.timeout.ms", Type.LONG, 60 * 1000, atLeast(0), "documentation")
>>                                  .define("max.partition.size", Type.INT, 16384, atLeast(0), "documentation")
>>
>>
>> This is used in a ProducerConfig class which extends AbstractConfig to get
>> access to some helper methods as well as the logic for tracking which
>> configs get accessed.
>>
>> Currently I have included static String variables for each of the config
>> names in that class. However I actually think that is not very helpful as
>> the javadoc for them doesn't give the constant value and requires
>> duplicating the documentation. To understand this point look at the
>> javadoc
>> and note that the doc on the string is not the same as what we define in
>> the ConfigDef. We could just have the javadoc for the config string be the
>> source of truth but it is actually pretty inconvenient for that as it
>> doesn't
>> show you the value of the constant, just the variable name (unless you
>> discover how to unhide it). That is fine for the clients, but for the
>> server would be very weird especially for non-java people. We could
>> attempt
>> to duplicate documentation between the javadoc and the ConfigDef but given
>> our struggle to get well-documented config in a single place this seems
>> unwise.
>>
>> So I recommend we have a single source for documentation of these and that
>> that source be the website documentation on configuration that covers
>> clients and server and that that be generated off the config defs. The
>> javadoc on KafkaProducer will link to this table so it should be quite
>> convenient to discover. This makes things a little more typo prone, but
>> that should be easily caught by the key detection. This will also make it
>> possible for us to retire configs in the future without causing compile
>> failures and add configs without having use of them break backwards
>> compatibility. This is useful during upgrades where you want to be
>> compatible with the old and new version so you can roll forwards and
>> backwards.
>>
>> -Jay
>>
>>
>

Re: Config for new clients (and server)

Posted by David Arthur <mu...@gmail.com>.
The declarative approach strikes me as a bit odd. Why not put the 
precondition logic in a POJO wrapper? Also, why reinvent 
commons-configuration? It's got a nice API and tons of people use it.

public class ProducerConfig
    extends org.apache.commons.configuration.AbstractConfiguration {

   /**
    * Blah, short doc {@link #getRequiredAcks}
    */
   public static final String REQUIRED_ACKS_CONFIG = "request.required.acks";

   /**
    * Blah blah, detailed docs
    */
   public int getRequiredAcks() {
     int acks = super.getInt(REQUIRED_ACKS_CONFIG, 1);
     if (acks < -1 || acks > Short.MAX_VALUE) {
       throw new ConfigurationError("Config precondition failed, " +
           REQUIRED_ACKS_CONFIG + " must be between -1 and 32767");
     }
     return acks;
   }
}

After typing that all out, I can see how the declarative 
configuration saves a bit of boilerplate, but it also limits you
to what preconditions you define. E.g., what if you wanted to validate a 
ZK string in the POJO wrapper?

-David

On 2/4/14 12:34 PM, Jay Kreps wrote:
> We touched on this a bit in previous discussions, but I wanted to draw out
> the approach to config specifically as an item of discussion.
>
> The new producer and consumer use a similar key-value config approach as
> the existing scala clients but have different implementation code to help
> define these configs. The plan is to use the same approach on the server,
> once the new clients are complete; so if we agree on this approach it will
> be the new default across the board.
>
> Let me split this into two parts. First I will try to motivate the use of
> key-value pairs as a configuration api. Then let me discuss the mechanics
> of specifying and parsing these. If we agree on the public api then the
> public api then the implementation details are interesting as this will be
> shared across producer, consumer, and broker and potentially some tools;
> but if we disagree about the api then there is no point in discussing the
> implementation.
>
> Let me explain the rationale for this. In a sense a key-value map of
> configs is the worst possible API to the programmer using the clients. Let
> me contrast the pros and cons versus a POJO and motivate why I think it is
> still superior overall.
>
> Pro: An application can externalize the configuration of its kafka clients
> into its own configuration. Whatever config management system the client
> application is using will likely support key-value pairs, so the client
> should be able to directly pull whatever configurations are present and use
> them in its client. This means that any configuration the client supports
> can be added to any application at runtime. With the pojo approach the
> client application has to expose each pojo getter as some config parameter.
> The result of many applications doing this is that the config is different
> for each and it is very hard to have a standard client config shared
> across. Moving config into config files allows the usual tooling (version
> control, review, audit, config deployments separate from code pushes, etc.).
>
> Pro: Backwards and forwards compatibility. Provided we stick to our java
> api many internals can evolve and expose new configs. The application can
> support both the new and old client by just specifying a config that will
> be unused in the older version (and of course the reverse--we can remove
> obsolete configs).
>
> Pro: We can use a similar mechanism for both the client and the server.
> Since most people run the server as a stand-alone process it needs a config
> file.
>
> Pro: Systems like Samza that need to ship configs across the network can
> easily do so as configs have a natural serialized form. This can be done
> with pojos using java serialization but it is ugly and has bizarre failure
> cases.
>
> Con: The IDE gives nice auto-completion for pojos.
>
> Con: There are some advantages to javadoc as a documentation mechanism for
> java people.
>
> Basically to me this is about operability versus niceness of api and I
> think operability is more important.
>
> Let me now give some details of the config support classes in
> kafka.common.config and how they are intended to be used.
>
> The goal of this code is the following:
> 1. Make specifying configs, their expected type (string, numbers, lists,
> etc) simple and declarative
> 2. Allow for validating simple checks (numeric range checks, etc)
> 3. Make the config "self-documenting". I.e. we should be able to write code
> that generates the configuration documentation off the config def.
> 4. Specify default values.
> 5. Track which configs actually get used.
> 6. Make it easy to get config values.
>
> There are two classes there: ConfigDef and AbstractConfig. ConfigDef
> defines the specification of the accepted configurations and AbstractConfig
> is a helper class for implementing the configuration class. The difference
> is kind of like the difference between a "class" and an "object": ConfigDef
> is for specifying the configurations that are accepted, AbstractConfig is
> the base class for an instance of these configs.
>
> You can see this in action here:
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=blob_plain;f=clients/src/main/java/kafka/clients/producer/ProducerConfig.java;hb=HEAD
>
> (Ignore the static config names in there for now...I'm not actually sure
> that is the best approach).
>
> So the way this works is that the config specification is defined as:
>
>          config = new ConfigDef().define("bootstrap.brokers", Type.LIST, "documentation")
>                                  .define("metadata.timeout.ms", Type.LONG, 60 * 1000, atLeast(0), "documentation")
>                                  .define("max.partition.size", Type.INT, 16384, atLeast(0), "documentation")
>
>
> This is used in a ProducerConfig class which extends AbstractConfig to get
> access to some helper methods as well as the logic for tracking which
> configs get accessed.
>
> Currently I have included static String variables for each of the config
> names in that class. However I actually think that is not very helpful as
> the javadoc for them doesn't give the constant value and requires
> duplicating the documentation. To understand this point look at the javadoc
> and note that the doc on the string is not the same as what we define in
> the ConfigDef. We could just have the javadoc for the config string be the
> source of truth but it is actually pretty inconvenient for that as it doesn't
> show you the value of the constant, just the variable name (unless you
> discover how to unhide it). That is fine for the clients, but for the
> server would be very weird especially for non-java people. We could attempt
> to duplicate documentation between the javadoc and the ConfigDef but given
> our struggle to get well-documented config in a single place this seems
> unwise.
>
> So I recommend we have a single source for documentation of these and that
> that source be the website documentation on configuration that covers
> clients and server and that that be generated off the config defs. The
> javadoc on KafkaProducer will link to this table so it should be quite
> convenient to discover. This makes things a little more typo prone, but
> that should be easily caught by the key detection. This will also make it
> possible for us to retire configs in the future without causing compile
> failures and add configs without having use of them break backwards
> compatibility. This is useful during upgrades where you want to be
> compatible with the old and new version so you can roll forwards and
> backwards.
>
> -Jay
>


Re: Config for new clients (and server)

Posted by Jun Rao <ju...@gmail.com>.
+1 on the new config. Just one comment. Currently, when instantiating a config
(e.g. ProducerConfig), we log the overridden property values and unused
property keys (likely due to misspelling). This has been very useful for
config verification. It would be good to add similar support in the new
config.
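
A minimal sketch of how the used-key tracking could support that check
(method names are illustrative, not the actual API):

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class TrackingConfig {
    private final Map<String, Object> values;
    private final Set<String> used = new HashSet<String>();

    TrackingConfig(Map<String, Object> values) {
        this.values = new HashMap<String, Object>(values);
    }

    Object get(String key) {
        used.add(key); // record every key a getter touches
        return values.get(key);
    }

    // Called after wiring is complete: anything supplied but never read
    // is probably a misspelled key.
    void logUnused() {
        for (String key : values.keySet())
            if (!used.contains(key))
                System.err.println("Warning: config '" + key + "' was supplied but never used");
    }
}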

Thanks,

Jun


On Tue, Feb 4, 2014 at 9:34 AM, Jay Kreps <ja...@gmail.com> wrote:

> We touched on this a bit in previous discussions, but I wanted to draw out
> the approach to config specifically as an item of discussion.
>
> The new producer and consumer use a similar key-value config approach as
> the existing scala clients but have different implementation code to help
> define these configs. The plan is to use the same approach on the server,
> once the new clients are complete; so if we agree on this approach it will
> be the new default across the board.
>
> Let me split this into two parts. First I will try to motivate the use of
> key-value pairs as a configuration api. Then let me discuss the mechanics
> of specifying and parsing these. If we agree on the public api then the
> public api then the implementation details are interesting as this will be
> shared across producer, consumer, and broker and potentially some tools;
> but if we disagree about the api then there is no point in discussing the
> implementation.
>
> Let me explain the rationale for this. In a sense a key-value map of
> configs is the worst possible API to the programmer using the clients. Let
> me contrast the pros and cons versus a POJO and motivate why I think it is
> still superior overall.
>
> Pro: An application can externalize the configuration of its kafka clients
> into its own configuration. Whatever config management system the client
> application is using will likely support key-value pairs, so the client
> should be able to directly pull whatever configurations are present and use
> them in its client. This means that any configuration the client supports
> can be added to any application at runtime. With the pojo approach the
> client application has to expose each pojo getter as some config parameter.
> The result of many applications doing this is that the config is different
> for each and it is very hard to have a standard client config shared
> across. Moving config into config files allows the usual tooling (version
> control, review, audit, config deployments separate from code pushes,
> etc.).
>
> Pro: Backwards and forwards compatibility. Provided we stick to our java
> api many internals can evolve and expose new configs. The application can
> support both the new and old client by just specifying a config that will
> be unused in the older version (and of course the reverse--we can remove
> obsolete configs).
>
> Pro: We can use a similar mechanism for both the client and the server.
> Since most people run the server as a stand-alone process it needs a config
> file.
>
> Pro: Systems like Samza that need to ship configs across the network can
> easily do so as configs have a natural serialized form. This can be done
> with pojos using java serialization but it is ugly and has bizarre failure
> cases.
>
> Con: The IDE gives nice auto-completion for pojos.
>
> Con: There are some advantages to javadoc as a documentation mechanism for
> java people.
>
> Basically to me this is about operability versus niceness of api and I
> think operability is more important.
>
> Let me now give some details of the config support classes in
> kafka.common.config and how they are intended to be used.
>
> The goal of this code is the following:
> 1. Make specifying configs, their expected type (string, numbers, lists,
> etc) simple and declarative
> 2. Allow for validating simple checks (numeric range checks, etc)
> 3. Make the config "self-documenting". I.e. we should be able to write code
> that generates the configuration documentation off the config def.
> 4. Specify default values.
> 5. Track which configs actually get used.
> 6. Make it easy to get config values.
>
> There are two classes there: ConfigDef and AbstractConfig. ConfigDef
> defines the specification of the accepted configurations and AbstractConfig
> is a helper class for implementing the configuration class. The difference
> is kind of like the difference between a "class" and an "object": ConfigDef
> is for specifying the configurations that are accepted, AbstractConfig is
> the base class for an instance of these configs.
>
> You can see this in action here:
>
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=blob_plain;f=clients/src/main/java/kafka/clients/producer/ProducerConfig.java;hb=HEAD
>
> (Ignore the static config names in there for now...I'm not actually sure
> that is the best approach).
>
> So the way this works is that the config specification is defined as:
>
>         config = new ConfigDef().define("bootstrap.brokers", Type.LIST, "documentation")
>                                 .define("metadata.timeout.ms", Type.LONG, 60 * 1000, atLeast(0), "documentation")
>                                 .define("max.partition.size", Type.INT, 16384, atLeast(0), "documentation")
>
>
> This is used in a ProducerConfig class which extends AbstractConfig to get
> access to some helper methods as well as the logic for tracking which
> configs get accessed.
>
> Currently I have included static String variables for each of the config
> names in that class. However I actually think that is not very helpful as
> the javadoc for them doesn't give the constant value and requires
> duplicating the documentation. To understand this point look at the javadoc
> and note that the doc on the string is not the same as what we define in
> the ConfigDef. We could just have the javadoc for the config string be the
> source of truth but it is actually pretty inconvenient for that as it doesn't
> show you the value of the constant, just the variable name (unless you
> discover how to unhide it). That is fine for the clients, but for the
> server would be very weird especially for non-java people. We could attempt
> to duplicate documentation between the javadoc and the ConfigDef but given
> our struggle to get well-documented config in a single place this seems
> unwise.
>
> So I recommend we have a single source for documentation of these and that
> that source be the website documentation on configuration that covers
> clients and server and that that be generated off the config defs. The
> javadoc on KafkaProducer will link to this table so it should be quite
> convenient to discover. This makes things a little more typo prone, but
> that should be easily caught by the key detection. This will also make it
> possible for us to retire configs in the future without causing compile
> failures and add configs without having use of them break backwards
> compatibility. This is useful during upgrades where you want to be
> compatible with the old and new version so you can roll forwards and
> backwards.
>
> -Jay
>
