Posted to dev@kafka.apache.org by Jay Kreps <ja...@confluent.io> on 2015/01/10 01:29:39 UTC

[DISCUSS] Compatability and KIPs

Hey guys,

We had a bit of a compatibility slip-up in 0.8.2 with the offset commit
stuff. We caught this one before the final release so it's not too bad. But
I do think it kind of points to an area we could do better.

One piece of feedback we have gotten from going out and talking to users is
that compatibility is really, really important to them. Kafka is getting
deployed in big environments where the clients are embedded in lots of
applications and any kind of incompatibility is a huge pain for people
using it and generally makes upgrade difficult or impossible.

In practice what I think this means for development is a lot more pressure
to really think about the public interfaces we are making and try our best
to get them right. This can be hard sometimes as changes come in patches
and it is hard to follow every single rb with enough diligence to catch these things.

Compatibility really means a couple things:
1. Protocol changes
2. Binary data format changes
3. Changes in public apis in the clients
4. Configs
5. Metric names
6. Command line tools

I think 1-2 are critical. 3 is very important. And 4, 5 and 6 are pretty
important but not critical.

One thing this implies is that we are really going to have to do a good job
of thinking about apis and use cases. You can definitely see a number of
places in the old clients and in a couple of the protocols where enough
care was not given to thinking things through. Some of those were from long
long ago, but we should really try to avoid adding to that set because
increasingly we will have to carry around these mistakes for a long time.

Here are a few things I thought we could do that might help us get better
in this area:

1. Technically we are just in a really bad place with the protocol because
it is defined twice--once in the old scala request objects, and once in the
new protocol format for the clients. This makes changes massively painful.
The good news is that the new request definition DSL was intended to make
adding new protocol versions a lot easier and clearer. It will also make it
a lot more obvious when the protocol is changed since you will be checking
in or reviewing a change to Protocol.java. Getting the server moved over to
the new request objects and protocol definition will be a bit of a slog but
it will really help here I think.

2. We need to get some testing in place on cross-version compatibility.
This is work and no tests here will be perfect, but I suspect with some
effort we could catch a lot of things (a rough sketch of the idea appears after
this list).

3. I was also thinking it might be worth it to get a little bit more formal
about the review and discussion process for things which will have impact
to these public areas to ensure we end up with something we are happy with.
Python has a PEP process (https://www.python.org/dev/peps/pep-0257/) by
which major changes are made, and it might be worth it for us to do a
similar thing. We have essentially been doing this already--major changes
almost always have an associated wiki, but I think just getting a little
more rigorous might be good. The idea would be to just call out these wikis
as official proposals and do a full Apache discuss/vote thread for these
important changes. We would use these for big features (security, log
compaction, etc) as well as for small changes that introduce or change a
public api/config/etc. This is a little heavier weight, but I think it is
really just critical that we get these things right and this would be a way
to call out this kind of change so that everyone would take the time to
look at them.
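
As a rough illustration of the kind of cross-version check point 2 is after:
encode a request at an old version and verify that code which understands a
newer version still reads it correctly. The class, field layout, and version
numbers below are invented for the sketch; they are not Kafka's actual schemas.

    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;

    public class CrossVersionCheck {

        // v0 layout: [short version][short groupLen][group bytes][int partition][long offset]
        static ByteBuffer encodeV0(String group, int partition, long offset) {
            byte[] g = group.getBytes(StandardCharsets.UTF_8);
            ByteBuffer buf = ByteBuffer.allocate(2 + 2 + g.length + 4 + 8);
            buf.putShort((short) 0).putShort((short) g.length).put(g)
               .putInt(partition).putLong(offset);
            buf.flip();
            return buf;
        }

        // v1 adds a retention time; a v1-aware parser must still accept v0 bytes
        // and fall back to a default for the field that is not there.
        static long[] parseUpToV1(ByteBuffer buf) {
            short version = buf.getShort();
            short groupLen = buf.getShort();
            buf.position(buf.position() + groupLen);               // skip group for brevity
            int partition = buf.getInt();
            long offset = buf.getLong();
            long retentionMs = version >= 1 ? buf.getLong() : -1L; // -1 = broker default
            return new long[] {version, partition, offset, retentionMs};
        }

        public static void main(String[] args) {
            long[] parsed = parseUpToV1(encodeV0("my-group", 3, 42L));
            if (parsed[1] != 3 || parsed[2] != 42L || parsed[3] != -1L)
                throw new AssertionError("v1 parser broke compatibility with v0 bytes");
            System.out.println("v0 request still readable by v1-aware code");
        }
    }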

Thoughts?

-Jay

Re: [DISCUSS] Compatability and KIPs

Posted by Ted Yu <yu...@gmail.com>.
For projects written in Java, there is
http://ispras.linuxbase.org/index.php/Java_API_Compliance_Checker

I searched for a similar tool for Scala but haven't found one yet.

Cheers

On Sat, Jan 10, 2015 at 10:40 AM, Ashish Singh <as...@cloudera.com> wrote:

> Jay,
>
> I totally agree with paying more attention to compatibility across
> versions. Incompatibility is indeed a big cause of customers' woes. Human
> checks, stringent reviews, will help, but I think having compatibility
> tests will be more effective. +INT_MAX for compatibility tests.
>
> - Ashish
>

Re: [DISCUSS] Compatability and KIPs

Posted by Ashish Singh <as...@cloudera.com>.
Jay,

I totally agree with paying more attention to compatibility across
versions. Incompatibility is indeed a big cause of customers' woes. Human
checks, stringent reviews, will help, but I think having compatibility
tests will be more effective. +INT_MAX for compatibility tests.

- Ashish


-- 

Regards,
Ashish

Re: [DISCUSS] Compatability and KIPs

Posted by Jay Kreps <ja...@gmail.com>.
Yeah this is a good point.

I don't think we have really called that out in any way. I think ideally
the hashing should be documented and should be an official part of the
contract.
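
A rough sketch of the difference being discussed in the quoted message below:
the old producer partitions on the key object's hashCode(), while the new
producer hashes the serialized key bytes (murmur2 in the real implementation;
the stand-in hash below only keeps the sketch self-contained).

    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;

    public class DefaultPartitioningSketch {

        // Old (Scala) producer default: non-negative hashCode of the key object.
        static int oldStylePartition(Object key, int numPartitions) {
            return (key.hashCode() & 0x7fffffff) % numPartitions;
        }

        // New (Java) producer default: a hash over the serialized key bytes.
        // Arrays.hashCode is only a stand-in here for murmur2.
        static int newStylePartition(byte[] serializedKey, int numPartitions) {
            return (Arrays.hashCode(serializedKey) & 0x7fffffff) % numPartitions;
        }

        public static void main(String[] args) {
            String key = "user-42";
            int partitions = 12;
            System.out.println("old default -> partition "
                    + oldStylePartition(key, partitions));
            System.out.println("new default -> partition "
                    + newStylePartition(key.getBytes(StandardCharsets.UTF_8), partitions));
        }
    }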

-Jay

On Wed, Jan 14, 2015 at 6:15 PM, Joel Koshy <jj...@gmail.com> wrote:

> (deviating a bit from the points in the last set of emails) - just
> wanted to ask if there was any heads-up regarding the change in
> default partitioning behavior in the new producer. Chris (cc'd)
> encountered this while upgrading to the new producer - the new
> producer uses a murmur hash by default (which I agree is the right
> thing to do btw) and the old producer uses hashCode on the original
> partitioning key object. This affects users that depend on the default
> partitioning logic. Although the new producer allows you to partition
> outside explicitly, users will need to be aware of this change when
> switching to the new producer.
>
> On Mon, Jan 12, 2015 at 8:27 PM, Jay Kreps <ja...@gmail.com> wrote:
>
> > Yeah I think this makes sense. Some of the crazy nesting will get better
> > when we move to the new protocol definition I think, but we will always
> > need some kind of if statement that branches for the different behavior
> > and this makes testing difficult.
> >
> > Probably the best thing to do would be to announce a version deprecated
> > which will have no function but will serve as a warning that it is going
> > away and then remove it some time later. This would mean including
> > something that notes this in the protocol docs and maybe the release notes.
> > We should probably just always do this for all but the latest version of
> > all apis. I think probably a year of deprecation should be sufficient prior
> > to removal?
> >
> > I also think we can maybe use some common sense in deciding this. Removing
> > older versions will always be bad for users and client developers and
> > always be good for Kafka committers. I think we can be more aggressive on
> > things that are not heavily used (and hence less bad for users) or for
> > which supporting multiple versions is particularly onerous.
> >
> > -Jay
> >
> > On Mon, Jan 12, 2015 at 5:02 PM, Guozhang Wang <wa...@gmail.com> wrote:
> >
> > > +1 on version evolving with any protocol / data format / functionality
> > > changes, and I am wondering if we have a standard process of deprecating
> > > old versions? Today with just a couple of versions for the protocol (e.g.
> > > offset commit) the code on the server side is already pretty nested and
> > > complicated in order to support different version supports.
> > >
> > > On Mon, Jan 12, 2015 at 9:21 AM, Jay Kreps <ja...@confluent.io> wrote:
> > >
> > > > Hey Jun,
> > > >
> > > > Good points.
> > > >
> > > > I totally agree that the versioning needs to cover both format and
> > > > behavior if the behavior change is incompatible.
> > > >
> > > > I kind of agree about the stable/unstable stuff. What I think this
> > > > means is not that we would ever evolve the protocol without changing
> > > > the version, but rather that we would drop support for older versions
> > > > quicker. On one hand that makes sense and it is often a high bar to get
> > > > things right the first time. On the other hand I think in practice the
> > > > set of people who interact with the protocol is often different from
> > > > the end user. So the end-user experience may still be "hey my code just
> > > > broke" because some client they use relied on an unstable protocol
> > > > unbeknownst to them. But I think all that means is that we should be
> > > > thoughtful about removing support for old protocol versions even if
> > > > they were marked unstable.
> > > >
> > > > Does anyone else have feedback or thoughts on the KIP stuff?
> > > > Objections? Thoughts on structure?
> > > >
> > > > -Jay
> > > >
> > > > On Mon, Jan 12, 2015 at 8:20 AM, Jun Rao <ju...@confluent.io> wrote:
> > > >
> > > > > Jay,
> > > > >
> > > > > Thanks for bringing this up. Yes, we should increase the level of
> > > > > awareness of compatibility.
> > > > >
> > > > > For 1 and 2, they probably should include any functional change. For
> > > > > example, even if there is no change in the binary data format, but
> > > > > the interpretation is changed, we should consider this as a binary
> > > > > format change and bump up the version number.
> > > > >
> > > > > 3. Having a wider discussion on api/protocol/data changes in the
> > > > > mailing list seems like a good idea.
> > > > >
> > > > > 7. It might be good to also document api/protocol/data format that
> > > > > are considered stable (or unstable). For example, in 0.8.2 release,
> > > > > we will have a few new protocols (e.g. HeartBeat) for the development
> > > > > of the new consumer. Those new protocols probably shouldn't be
> > > > > considered stable until the new consumer is more fully developed.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun

Re: [DISCUSS] Compatability and KIPs

Posted by Joel Koshy <jj...@gmail.com>.
(deviating a bit from the points in the last set of emails) - just
wanted to ask if there was any heads-up regarding the change in
default partitioning behavior in the new producer. Chris (cc'd)
encountered this while upgrading to the new producer - the new
producer uses a murmur hash by default (which I agree is the right
thing to do btw) and the old producer uses hashCode on the original
partitioning key object. This affects users that depend on the default
partitioning logic. Although the new producer allows you to partition
outside explicitly, users will need to be aware of this change when
switching to the new producer.
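
A minimal sketch of the explicit-partitioning workaround mentioned above,
assuming the new producer's ProducerRecord(topic, partition, key, value)
constructor; the broker address, serializers, topic, and partition count are
placeholders.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class ExplicitPartitionExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            int numPartitions = 12;   // look this up from the cluster in real code
            String key = "user-42";
            // Compute the partition the way the old producer did, then pass it
            // explicitly so placement does not change after the upgrade.
            int partition = (key.hashCode() & 0x7fffffff) % numPartitions;

            KafkaProducer<String, String> producer =
                    new KafkaProducer<String, String>(props);
            producer.send(new ProducerRecord<String, String>(
                    "my-topic", partition, key, "some value"));
            producer.close();
        }
    }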

Re: [DISCUSS] Compatability and KIPs

Posted by Jay Kreps <ja...@gmail.com>.
Hey Joe,

Yeah I think a lot of those items are limitations in that document that we
should definitely fix.

The first issue you point out is a serious one: We give the total list of
errors but don't list which errors can result from which APIs. This is a
big issue because actually no one knows and even if you know the code base,
determining that from the code is not trivial (since errors can percolate
from lower layers). If you are writing a client, in practice, you just try
stuff and handle the errors that you've seen and add some generic catch all
for any new errors (which is actually a good forward-compatibility
practice). But it would be a lot easier if this kind of trial and error
wasn't required. Having just done the Java producer and consumer I
definitely felt that pain.
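
A minimal sketch of that catch-all pattern; the numeric codes and the
retry/fatal classification below are illustrative placeholders, not the
authoritative Kafka error mapping.

    public class ErrorCodeHandling {

        enum Action { OK, RETRY, FATAL }

        static Action handle(short errorCode) {
            switch (errorCode) {
                case 0:  return Action.OK;     // no error
                case 6:  return Action.RETRY;  // e.g. not leader -> refresh metadata, retry
                case 7:  return Action.RETRY;  // e.g. request timed out
                case 2:  return Action.FATAL;  // e.g. corrupt message
                default:
                    // Forward-compatibility: a newer broker may return codes this
                    // client has never seen; surface them instead of blowing up.
                    System.err.println("Unknown error code " + errorCode
                            + ", treating as retriable");
                    return Action.RETRY;
            }
        }

        public static void main(String[] args) {
            System.out.println(handle((short) 0));   // OK
            System.out.println(handle((short) 42));  // unknown -> RETRY
        }
    }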

The second issue I think we kind of tried to address by giving basic usage
info for things like metadata requests etc. But I think what you are
pointing out is that this just isn't nearly detailed enough. Ideally we
should give a lot of guidance on implementation options, optimizations,
best practices, etc. I agree with this. Especially as we start to get the
new consumer protocols in shape having this is really important for helping
people make use of them as there are several apis that work together. I
think we could expand this section of the docs a lot.

I think it also might be a good idea to move this document out of wiki and
into the main docs. This way we can version it with releases. Currently
there is no way to tell which api versions are supported in which Kafka
version as the document is always the current state of the protocol minus
stuff on trunk that isn't released yet. This mostly works since in practice
if you are developing a client you should probably target the latest
release, but it would be better to be able to tell what was in each release.

-Jay

Re: [DISCUSS] Compatability and KIPs

Posted by Joe Stein <jo...@stealth.ly>.
Having an index for every protocol/API change (like
https://www.python.org/dev/peps/pep-0257/) would be much better than the
flat wire protocol doc we have now. Right now it is impossible (without
jumping into the code) to know whether an error code, or even a whole
message, is supported in one version of Kafka versus another. Having
something iterative for each change that is explicit, clear and concise
for client developers would be wonderful. Some folks just try to keep
pace with the wire protocol doc regardless and often end up implementing
the wrong behavior, because the expected functionality is not always part
of the protocol itself but an expectation / extension of the producer
and/or consumer layer in the project code.

The "expected behavior" I think is a huge gap between the project and
client implementations. When you are a Kafka user you have certain
expectations when working with producers and consumers. e.g. if you fail a
produced message the expectation is to retry X times with a Y backoff
between each try. The wire protocol doc doesn't always expose these
"features" that are expected behaviors and often get missed. Assumptions
get made and in clients developed very large features take a while (often
seen via production issues) to get found out. I think this problem (which
is a big one IMHO) also will be better resolved with the KIP process.
Client application developers can look at new features, understand the
goals and expectations, develop those goals in the language/system required
and support the byte structure(s) for a complete use case.
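
As a rough illustration (the Sender interface and the exception below are
made up for the example, not a real Kafka client API), the kind of
retry-with-backoff behavior a client is expected to implement looks
something like this:

// Sketch of the "expected behavior" around the wire protocol: retry a
// failed produce X times with a fixed backoff of Y between attempts.
public class RetryExample {

    // Stand-in for a transient transport/produce failure.
    static class TransientSendException extends RuntimeException {}

    // Stand-in for whatever request machinery a client actually uses.
    interface Sender {
        void send(byte[] payload) throws TransientSendException;
    }

    static void sendWithRetries(Sender sender, byte[] payload,
                                int maxRetries, long backoffMs)
            throws InterruptedException {
        for (int attempt = 0; ; attempt++) {
            try {
                sender.send(payload);    // attempt the produce request
                return;                  // success, stop retrying
            } catch (TransientSendException e) {
                if (attempt >= maxRetries) {
                    throw e;             // give up after X retries
                }
                Thread.sleep(backoffMs); // wait Y before the next attempt
            }
        }
    }
}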

I think child pages from
https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol
might be the way to go. I only suggest that because people already use
that page now, so we can keep it as a high level "here is what you do"
guide and then link to the child pages where appropriate. I hate
completely abandoning something that is not entirely bad but just missing
updates in a few ways. So, maybe something like that, or keeping a
dedicated, committed section under git or svn, might also make sense.

I am not really opinionated on how we implement this, as long as we do
implement something that addresses these issues.

Feature and/or byte changes should bump the version number, +1
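
For reference, the thing being bumped is the api_version field that every
request already carries in its header. A rough sketch of how that header
goes on the wire (this is my reading of the protocol doc, so double check
it against Protocol.java rather than treating it as the spec):

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Rough sketch of encoding the common request header: api_key and
// api_version are int16, followed by a correlation id and a
// length-prefixed client id string.
public class RequestHeaderSketch {
    public static ByteBuffer encodeHeader(short apiKey, short apiVersion,
                                          int correlationId, String clientId) {
        byte[] clientIdBytes = clientId.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(2 + 2 + 4 + 2 + clientIdBytes.length);
        buf.putShort(apiKey);         // which API this request is for
        buf.putShort(apiVersion);     // bumped on any format/behavior change
        buf.putInt(correlationId);    // echoed back in the response
        buf.putShort((short) clientIdBytes.length);
        buf.put(clientIdBytes);       // client id as a length-prefixed string
        buf.flip();
        return buf;
    }
}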

/*******************************************
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
********************************************/

Re: [DISCUSS] Compatability and KIPs

Posted by Jay Kreps <ja...@gmail.com>.
Yeah I think this makes sense. Some of the crazy nesting will get better
when we move to the new protocol definition I think, but we will always
need some kind of if statement that branches for the different behavior and
this makes testing difficult.

Probably the best thing to do would be to announce a version as
deprecated, which would have no functional effect but would serve as a
warning that it is going away, and then remove it some time later. This
would mean noting the deprecation in the protocol docs and maybe the
release notes. We should probably just always do this for all but the
latest version of all apis. I think probably a year of deprecation should
be sufficient prior to removal?
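
To make that concrete, here is a rough sketch of the kind of per-version
branch plus deprecation warning this implies on the broker side (my own
illustration -- the versions, types and handler names are made up, this is
not the actual server code):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustration only: branch per protocol version and warn when a
// deprecated (but still supported) version is used.
public class OffsetCommitHandlerSketch {
    private static final Logger log =
        LoggerFactory.getLogger(OffsetCommitHandlerSketch.class);
    private static final short LATEST_VERSION = 1;

    public void handle(short apiVersion, Object request) {
        if (apiVersion < 0 || apiVersion > LATEST_VERSION) {
            throw new IllegalArgumentException(
                "Unsupported offset commit version " + apiVersion);
        }
        if (apiVersion < LATEST_VERSION) {
            // Deprecated but still functional: warn so people see it coming.
            log.warn("Offset commit version {} is deprecated and will be "
                + "removed in a future release; please upgrade clients.",
                apiVersion);
        }
        switch (apiVersion) {
            case 0:
                handleV0(request);  // old behavior kept for compatibility
                break;
            default:
                handleV1(request);  // current behavior
        }
    }

    private void handleV0(Object request) { /* version 0 semantics */ }
    private void handleV1(Object request) { /* version 1 semantics */ }
}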

I also think we can maybe use some common sense in deciding this. Removing
older versions will always be bad for users and client developers and
always be good for Kafka committers. I think we can be more aggressive on
things that are not heavily used (and hence less bad for users) or for
which supporting multiple versions is particularly onerous.

-Jay

Re: [DISCUSS] Compatability and KIPs

Posted by Guozhang Wang <wa...@gmail.com>.
+1 on evolving the version with any protocol / data format /
functionality change, and I am wondering if we have a standard process
for deprecating old versions? Today, with just a couple of versions for
the protocol (e.g. offset commit), the code on the server side is already
pretty nested and complicated in order to support the different versions.

-- 
-- Guozhang

Re: [DISCUSS] Compatability and KIPs

Posted by Jay Kreps <ja...@confluent.io>.
Hey Jun,

Good points.

I totally agree that the versioning needs to cover both format and behavior
if the behavior change is incompatible.

I kind of agree about the stable/unstable stuff. What I think this means is
not that we would ever evolve the protocol without changing the version,
but rather that we would drop support for older versions quicker. On one
hand that makes sense, since it is often a high bar to get things right the
first time. On the other hand I think in practice the set of people who
interact with the protocol is often different from the end user. So the
end-user experience may still be "hey my code just broke" because some
client they use relied on an unstable protocol unbeknownst to them. But I
think all that means is that we should be thoughtful about removing support
for old protocol versions even if they were marked unstable.

Does anyone else have feedback or thoughts on the KIP stuff? Objections?
Thoughts on structure?

-Jay

Re: [DISCUSS] Compatability and KIPs

Posted by Jun Rao <ju...@confluent.io>.
Jay,

Thanks for bringing this up. Yes, we should increase the level of awareness
of compatibility.

For 1 and 2, they probably should include any functional change. For
example, even if there is no change in the binary data format but the
interpretation changes, we should consider this a binary format change
and bump up the version number.

3. Having a wider discussion on api/protocol/data changes in the mailing
list seems like a good idea.

7. It might be good to also document which apis/protocols/data formats
are considered stable (or unstable). For example, in the 0.8.2 release,
we will have a few new protocols (e.g. HeartBeat) for the development of
the new consumer. Those new protocols probably shouldn't be considered
stable until the new consumer is more fully developed.
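
One (purely hypothetical) way to make this visible to client developers
would be a stability marker on the public apis and protocol definitions,
something like the sketch below. As far as I know nothing like this
exists in the code base today, so treat it as an idea rather than a
proposal:

import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical annotation for marking which public apis/protocols are
// stable vs. still evolving; nothing like this exists in Kafka today.
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD})
public @interface InterfaceStability {
    Level value();

    enum Level {
        STABLE,    // compatible across releases; changes go through a KIP
        EVOLVING,  // may change in compatible ways between minor releases
        UNSTABLE   // no compatibility guarantee yet (e.g. the new consumer
                   // protocols such as HeartBeat)
    }
}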

Thanks,

Jun


