You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by Gwen Shapira <gs...@cloudera.com> on 2015/01/15 00:51:32 UTC

Heads up: KAFKA-1697 - remove code related to ack>1 on the broker

Hi Kafka Devs,

We are working on KAFKA-1697 - remove code related to ack>1 on the
broker. Per Neha's suggestion, I'd like to give everyone a heads up on
what these changes mean.

Once this patch is included, any produce requests that include
request.required.acks > 1 will result in an exception. This will be
InvalidRequiredAcks in new versions (0.8.3 and up, I assume) and
UnknownException in existing versions (sorry, but I can't add error
codes retroactively).

This behavior is already enforced by 0.8.2 producers (sync and new),
but we expect impact on users with older producers that relied on acks
> 1 and external clients (i.e python, go, etc).

Users who relied on acks > 1 are expected to switch to using acks = -1
and a min.isr parameter than matches their user case.

This change was discussed in the past in the context of KAFKA-1555
(min.isr), but let us know if you have any questions or concerns
regarding this change.

Gwen

RE: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to ack>1 on the broker

Posted by Felix GV <fv...@linkedin.com.INVALID>.
Next time the protocol is evolved and new error codes can be introduced, would it make sense to add a new one called "Deprecated" (or "Deprecation" or "DeprecatedOperation" or whatever sounds best)? This would act as a more precise form of "Unknown" error. It could help identify what the problem is more easily when debugging clients.

Of course, this is the kind of lever one would prefer never pulling, but when you need it, you're better off having it than not, and if you end up having it and never using it, it does not do much harm either.


--

Felix GV
Data Infrastructure Engineer
Distributed Data Systems
LinkedIn

fgv@linkedin.com<ma...@linkedin.com>
linkedin.com/in/felixgv<http://linkedin.com/in/felixgv>
________________________________
From: kafka-clients@googlegroups.com [kafka-clients@googlegroups.com] on behalf of Magnus Edenhill [magnus@edenhill.se]
Sent: Thursday, January 15, 2015 10:40 AM
To: dev@kafka.apache.org
Cc: kafka-clients@googlegroups.com
Subject: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to ack>1 on the broker


I very much agree on what Joe is saying, let's use the version field as intended
and be very strict with not removing nor altering existing behaviour without bumping the version.
Old API versions could be deprecated (documentation only?) immediately and removed completely in the next minor version bump (0.8->0.9).

An API to query supported API versions would be a good addition in the long run but doesn't help current clients
much as such a request to an older broker version will kill the connection without any error reporting to the client, thus making it rather useless in the short term.

Regards,
Magnus


--
You received this message because you are subscribed to the Google Groups "kafka-clients" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kafka-clients+unsubscribe@googlegroups.com<ma...@googlegroups.com>.
To post to this group, send email to kafka-clients@googlegroups.com<ma...@googlegroups.com>.
Visit this group at http://groups.google.com/group/kafka-clients.
To view this discussion on the web visit https://groups.google.com/d/msgid/kafka-clients/CAHCQUcBtJ1nXi5_dEaHyR2QcRycQHh03rUCY%2BRo2Ussg9kM6UQ%40mail.gmail.com<https://groups.google.com/d/msgid/kafka-clients/CAHCQUcBtJ1nXi5_dEaHyR2QcRycQHh03rUCY%2BRo2Ussg9kM6UQ%40mail.gmail.com?utm_medium=email&utm_source=footer>.
For more options, visit https://groups.google.com/d/optout.

Re: Heads up: KAFKA-1697 - remove code related to ack>1 on the broker

Posted by Magnus Edenhill <ma...@edenhill.se>.
I very much agree on what Joe is saying, let's use the version field as
intended
and be very strict with not removing nor altering existing behaviour
without bumping the version.
Old API versions could be deprecated (documentation only?) immediately and
removed completely in the next minor version bump (0.8->0.9).

An API to query supported API versions would be a good addition in the long
run but doesn't help current clients
much as such a request to an older broker version will kill the connection
without any error reporting to the client, thus making it rather useless in
the short term.

Regards,
Magnus

Re: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to ack>1 on the broker

Posted by Guozhang Wang <wa...@gmail.com>.
+1 on Joe's suggestions, glad to see it happening!

On Thu, Jan 15, 2015 at 5:19 PM, Gwen Shapira <gs...@cloudera.com> wrote:

> The errors are part of the KIP process now, so I think the clients are
> safe :)
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
>
> On Thu, Jan 15, 2015 at 5:12 PM, Steve Morin <st...@gmail.com>
> wrote:
> > Agree errors should be part of the protocol
> >
> >> On Jan 15, 2015, at 17:59, Gwen Shapira <gs...@cloudera.com> wrote:
> >>
> >> Hi,
> >>
> >> I got convinced by Joe and Dana that errors are indeed part of the
> >> protocol and can't be randomly added.
> >>
> >> So, it looks like we need to bump version of ProduceRequest in the
> >> following way:
> >> Version 0 -> accept acks >1. I think we should keep the existing
> >> behavior too (i.e. not replace it with -1) to avoid surprising
> >> clients, but I'm willing to hear other opinions.
> >> Version 1 -> do not accept acks >1 and return an error.
> >> Are we ok with the error I added in KAFKA-1697? We can use something
> >> less specific like InvalidRequestParameter. This error can be reused
> >> in the future and reduce the need to add errors, but will also be less
> >> clear to the client and its users. Maybe even add the error message
> >> string to the protocol in addition to the error code? (since we are
> >> bumping versions....)
> >>
> >> I think maintaining the old version throughout 0.8.X makes sense. IMO
> >> dropping it for 0.9 is feasible, but I'll let client owners help make
> >> that call.
> >>
> >> Am I missing anything? Should I start a KIP? It seems like a KIP-type
> >> discussion :)
> >>
> >> Gwen
> >>
> >>
> >> On Thu, Jan 15, 2015 at 2:31 PM, Ewen Cheslack-Postava
> >> <ew...@confluent.io> wrote:
> >>> Gwen,
> >>>
> >>> I think the only option that wouldn't require a protocol version
> change is
> >>> the one where acks > 1 is converted to acks = -1 since it's the only
> one
> >>> that doesn't potentially break older clients. The protocol guide says
> that
> >>> the expected upgrade path is servers first, then clients, so old
> clients,
> >>> including non-java clients, that may be using acks > 1 should be able
> to
> >>> work with a new broker version.
> >>>
> >>> It's more work, but I think dealing with the protocol change is the
> right
> >>> thing to do since it eventually gets us to the behavior I think is
> better --
> >>> the broker should reject requests with invalid values. I think Joe and
> I
> >>> were basically in agreement. In my mind the major piece missing from
> his
> >>> description is how long we're going to maintain his "case 0" behavior.
> It's
> >>> impractical to maintain old versions forever, but it sounds like there
> >>> hasn't been a decision on how long to maintain them. Maybe that's
> another
> >>> item to add to KIPs -- protocol versions and behavior need to be
> listed as
> >>> deprecated and the earliest version in which they'll be removed should
> be
> >>> specified so users can understand which versions are guaranteed to be
> >>> compatible, even if they're using (well-written) non-java clients.
> >>>
> >>> -Ewen
> >>>
> >>>
> >>>> On Thu, Jan 15, 2015 at 12:52 PM, Dana Powers <da...@gmail.com>
> wrote:
> >>>>
> >>>>> clients don't break on unknown errors
> >>>>
> >>>> maybe true for the official java clients, but I dont think the
> assumption
> >>>> holds true for community-maintained clients and users of those
> clients.
> >>>> kafka-python generally follows the fail-fast philosophy and raises an
> >>>> exception on any unrecognized error code in any server response.  in
> this
> >>>> case, kafka-python allows users to set their own required-acks policy
> when
> >>>> creating a producer instance.  It is possible that users of
> kafka-python
> >>>> have deployed producer code that uses ack>1 -- perhaps in production
> >>>> environments -- and for those users the new error code will crash
> their
> >>>> producer code.  I would not be surprised if the same were true of
> other
> >>>> community clients.
> >>>>
> >>>> *one reason for the fail-fast approach is that there isn't great
> >>>> documentation on what errors to expect for each request / response --
> so
> >>>> we
> >>>> use failures to alert that some error case is not handled properly.
> and
> >>>> because of that, introducing new error cases without bumping the api
> >>>> version is likely to cause those errors to get raised/thrown all the
> way
> >>>> back up to the user.  of course we (client maintainers) can fix the
> issues
> >>>> in the client libraries and suggest users upgrade, but it's not the
> ideal
> >>>> situation.
> >>>>
> >>>>
> >>>> long-winded way of saying: I agree w/ Joe.
> >>>>
> >>>> -Dana
> >>>>
> >>>>
> >>>> On Thu, Jan 15, 2015 at 12:07 PM, Gwen Shapira <gshapira@cloudera.com
> >
> >>>> wrote:
> >>>>
> >>>>> Is the protocol bump caused by the behavior change or the new error
> >>>>> code?
> >>>>>
> >>>>> 1) IMO, error_codes are data, and clients can expect to receive
> errors
> >>>>> that they don't understand (i.e. unknown errors). AFAIK, clients
> don't
> >>>>> break on unknown errors, they are simple more challenging to debug.
> If
> >>>>> we document the new behavior, then its definitely debuggable and
> >>>>> fixable.
> >>>>>
> >>>>> 2) The behavior change is basically a deprecation - i.e. acks > 1
> were
> >>>>> never documented, and are not supported by Kafka clients starting
> with
> >>>>> version 0.8.2. I'm not sure this requires a protocol bump either,
> >>>>> although its a better case than new error codes.
> >>>>>
> >>>>> Thanks,
> >>>>> Gwen
> >>>>>
> >>>>> On Thu, Jan 15, 2015 at 10:10 AM, Joe Stein <jo...@stealth.ly>
> >>>>> wrote:
> >>>>>> Looping in the mailing list that the client developers live on
> because
> >>>>> they
> >>>>>> are all not on dev (though they should be if they want to be helping
> >>>>>> to
> >>>>>> build the best client libraries they can).
> >>>>>>
> >>>>>> I whole hardily believe that we need to not break existing
> >>>>>> functionality
> >>>>> of
> >>>>>> the client protocol, ever.
> >>>>>>
> >>>>>> There are many reasons for this and we have other threads on the
> >>>>>> mailing
> >>>>>> list where we are discussing that topic (no pun intended) that I
> don't
> >>>>> want
> >>>>>> to re-hash here.
> >>>>>>
> >>>>>> If we change wire protocol functionality OR the binary format
> (either)
> >>>>>> we
> >>>>>> must bump version AND treat version as a feature flag with backward
> >>>>>> compatibility support until it is deprecated for some time for folks
> >>>>>> to
> >>>>> deal
> >>>>>> with it.
> >>>>>>
> >>>>>> match version = {
> >>>>>> case 0: keepDoingWhatWeWereDoing()
> >>>>>> case 1: doNewStuff()
> >>>>>> case 2: doEvenMoreNewStuff()
> >>>>>> }
> >>>>>>
> >>>>>> has to be a practice we adopt imho ... I know feature flags can be
> >>>>> construed
> >>>>>> as "messy code" but I am eager to hear another (better? different?)
> >>>>> solution
> >>>>>> to this.
> >>>>>>
> >>>>>> If we don't do a feature flag like this specifically with this
> change
> >>>>> then
> >>>>>> what happens is that someone upgrades their brokers with a rolling
> >>>>> restart
> >>>>>> in 0.8.3 and every single one of their producer requests start to
> fail
> >>>>> and
> >>>>>> they have a major production outage. eeeek!!!!
> >>>>>>
> >>>>>> I do 100% agree that > 1 makes no sense and we *REALLY* need people
> to
> >>>>> start
> >>>>>> using 0,1,-1 but we need to-do that in a way that is going to work
> for
> >>>>>> everyone.
> >>>>>>
> >>>>>> Old producers and consumers must keep working with new brokers and
> if
> >>>>>> we
> >>>>> are
> >>>>>> not going to support that then I am unclear what the use of
> "version"
> >>>>>> is
> >>>>>> based on our original intentions of having it because of the
> >>>>>> 0.7=>-0.8.
> >>>>> We
> >>>>>> said no more breaking changes when we did that.
> >>>>>>
> >>>>>> - Joe Stein
> >>>>>>
> >>>>>> On Thu, Jan 15, 2015 at 12:38 PM, Ewen Cheslack-Postava <
> >>>>> ewen@confluent.io>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> Right, so this looks like it could create an issue similar to
> what's
> >>>>>>> currently being discussed in
> >>>>>>> https://issues.apache.org/jira/browse/KAFKA-1649 where users now
> get
> >>>>>>> errors
> >>>>>>> under conditions when they previously wouldn't. Old clients won't
> >>>>>>> even
> >>>>>>> know
> >>>>>>> about the error code, so besides failing they won't even be able to
> >>>>>>> log
> >>>>>>> any
> >>>>>>> meaningful error messages.
> >>>>>>>
> >>>>>>> I think there are two options for compatibility:
> >>>>>>>
> >>>>>>> 1. An alternative change is to remove the ack > 1 code, but
> silently
> >>>>>>> "upgrade" requests with acks > 1 to acks = -1. This isn't the same
> as
> >>>>>>> other
> >>>>>>> changes to behavior since the interaction between the client and
> >>>>>>> server
> >>>>>>> remains the same, no error codes change, etc. The client might just
> >>>>>>> see
> >>>>>>> some increased latency since the message might need to be
> replicated
> >>>>>>> to
> >>>>>>> more brokers than they requested.
> >>>>>>> 2. Split this into two patches, one that bumps the protocol version
> >>>>>>> on
> >>>>>>> that
> >>>>>>> message to include the new error code but maintains both old (now
> >>>>>>> deprecated) and new behavior, then a second that would be applied
> in
> >>>>>>> a
> >>>>>>> later release that removes the old protocol + code for handling
> acks
> >>>>> 1.
> >>>>>>>
> >>>>>>> 2 is probably the right thing to do. If we specify the release when
> >>>>> we'll
> >>>>>>> remove the deprecated protocol at the time of deprecation it makes
> >>>>> things
> >>>>>>> a
> >>>>>>> lot easier for people writing non-java clients and could give users
> >>>>> better
> >>>>>>> predictability (e.g. if clients are at most 1 major release behind
> >>>>>>> brokers,
> >>>>>>> they'll remain compatible but possibly use deprecated features).
> >>>>>>>
> >>>>>>>
> >>>>>>> On Wed, Jan 14, 2015 at 3:51 PM, Gwen Shapira <
> gshapira@cloudera.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Hi Kafka Devs,
> >>>>>>>>
> >>>>>>>> We are working on KAFKA-1697 - remove code related to ack>1 on the
> >>>>>>>> broker. Per Neha's suggestion, I'd like to give everyone a heads
> up
> >>>>>>>> on
> >>>>>>>> what these changes mean.
> >>>>>>>>
> >>>>>>>> Once this patch is included, any produce requests that include
> >>>>>>>> request.required.acks > 1 will result in an exception. This will
> be
> >>>>>>>> InvalidRequiredAcks in new versions (0.8.3 and up, I assume) and
> >>>>>>>> UnknownException in existing versions (sorry, but I can't add
> error
> >>>>>>>> codes retroactively).
> >>>>>>>>
> >>>>>>>> This behavior is already enforced by 0.8.2 producers (sync and
> >>>>>>>> new),
> >>>>>>>> but we expect impact on users with older producers that relied on
> >>>>>>>> acks
> >>>>>>>>> 1 and external clients (i.e python, go, etc).
> >>>>>>>>
> >>>>>>>> Users who relied on acks > 1 are expected to switch to using acks
> =
> >>>>>>>> -1
> >>>>>>>> and a min.isr parameter than matches their user case.
> >>>>>>>>
> >>>>>>>> This change was discussed in the past in the context of KAFKA-1555
> >>>>>>>> (min.isr), but let us know if you have any questions or concerns
> >>>>>>>> regarding this change.
> >>>>>>>>
> >>>>>>>> Gwen
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Thanks,
> >>>>>>> Ewen
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> You received this message because you are subscribed to the Google
> >>>>>> Groups
> >>>>>> "kafka-clients" group.
> >>>>>> To unsubscribe from this group and stop receiving emails from it,
> send
> >>>>>> an
> >>>>>> email to kafka-clients+unsubscribe@googlegroups.com.
> >>>>>> To post to this group, send email to kafka-clients@googlegroups.com
> .
> >>>>>> Visit this group at http://groups.google.com/group/kafka-clients.
> >>>>>> To view this discussion on the web visit
> >>>>>
> >>>>>
> https://groups.google.com/d/msgid/kafka-clients/CAA7ooCBtH2JjyQsArdx_%3DV25B4O1QJk0YvOu9U6kYt9sB4aqng%40mail.gmail.com
> >>>>> .
> >>>>>>
> >>>>>> For more options, visit https://groups.google.com/d/optout.
> >>>>>
> >>>>> --
> >>>>> You received this message because you are subscribed to the Google
> >>>>> Groups
> >>>>> "kafka-clients" group.
> >>>>> To unsubscribe from this group and stop receiving emails from it,
> send
> >>>>> an
> >>>>> email to kafka-clients+unsubscribe@googlegroups.com.
> >>>>> To post to this group, send email to kafka-clients@googlegroups.com.
> >>>>> Visit this group at http://groups.google.com/group/kafka-clients.
> >>>>> To view this discussion on the web visit
> >>>>>
> >>>>>
> https://groups.google.com/d/msgid/kafka-clients/CAHBV8WeUebxi%2B%2BSbjz8E9Yf4u4hkcPJ80Xsj0XTKcTac%3D%2B613A%40mail.gmail.com
> >>>>> .
> >>>>> For more options, visit https://groups.google.com/d/optout.
> >>>
> >>>
> >>>
> >>>
> >>> --
> >>> Thanks,
> >>> Ewen
>



-- 
-- Guozhang

Re: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to ack>1 on the broker

Posted by Gwen Shapira <gs...@cloudera.com>.
We have the new warning in 0.8.2.

I updated KIP-1 with the new plan:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-1+-+Remove+support+of+request.required.acks

I'm waiting a day for additional discussion and if there are no
replies, I'll send the [VOTE] email.

Gwen

On Mon, Jan 19, 2015 at 10:00 AM, Gwen Shapira <gs...@cloudera.com> wrote:
> Sounds good to me.
> I'll open a new JIRA for 0.8.2 with just an extra log warning, to
> avoid making KAFKA-1697 any more confusing.
>
> On Mon, Jan 19, 2015 at 9:46 AM, Joe Stein <jo...@stealth.ly> wrote:
>> << For 2, how about we make a change to log a warning for ack > 1 in 0.8.2
>> and then drop the ack > 1 support in trunk (w/o bumping up the protocol
>> version)?
>>
>> +1
>>
>>
>> On Mon, Jan 19, 2015 at 12:35 PM, Jun Rao <ju...@confluent.io> wrote:
>>>
>>> For 2, how about we make a change to log a warning for ack > 1 in 0.8.2
>>> and then drop the ack > 1 support in trunk (w/o bumping up the protocol
>>> version)? Thanks,
>>>
>>> Jun
>>>
>>> On Sun, Jan 18, 2015 at 8:24 PM, Gwen Shapira <gs...@cloudera.com>
>>> wrote:
>>>>
>>>> Overall, agree on point #1, less sure on point #2.
>>>>
>>>> 1. Some protocols never ever add new errors, while others add errors
>>>> without bumping versions. HTTP is a good example of the second type.
>>>> HTTP-451 was added fairly recently, there are some errors specific to
>>>> NGINX, etc. No one cares. I think we should properly document in the
>>>> wire-protocol doc that new errors can be added, and I think we should
>>>> strongly suggest (and implement ourselves) that unknown error codes
>>>> should be shown to users (or at least logged), so they can be googled
>>>> and understood through our documentation.
>>>> In addition, hierarchy of error codes, so clients will know if an
>>>> error is retry-able just by looking at the code could be nice. Same
>>>> for adding an error string to the protocol. These are future
>>>> enhancements that should be discussed separately.
>>>>
>>>> 2. I think we want to allow admins to upgrade their Kafka brokers
>>>> without having to chase down clients in their organization and without
>>>> getting blamed if clients break. I think it makes sense to have one
>>>> version that will support existing behavior, but log warnings, so
>>>> admins will know about misbehaving clients and can track them down
>>>> before an upgrade that breaks them (or before the broken config causes
>>>> them to lose data!). Hopefully this is indeed a very rare behavior and
>>>> we are taking extra precaution for nothing, but I have customers where
>>>> one traumatic upgrade means they will never upgrade a Kafka again, so
>>>> I'm being conservative.
>>>>
>>>> Gwen
>>>>
>>>>
>>>> On Sun, Jan 18, 2015 at 3:50 PM, Jun Rao <ju...@confluent.io> wrote:
>>>> > Overall, I agree with Jay on both points.
>>>> >
>>>> > 1. I think it's reasonable to add new error codes w/o bumping up the
>>>> > protocol version. In most cases, by adding new error codes, we are just
>>>> > refining the categorization of those unknown errors. So, a client
>>>> > shouldn't
>>>> > behave worse than before as long as unknown errors have been properly
>>>> > handled.
>>>> >
>>>> > 2. I think it's reasonable to just document that 0.8.2 will be the last
>>>> > release that will support ack > 1 and remove the support completely in
>>>> > trunk
>>>> > w/o bumping up the protocol. This is because (a) we never included ack
>>>> > > 1
>>>> > explicitly in the documentation and so the usage should be limited; (2)
>>>> > ack
>>>> >> 1 doesn't provide the guarantee that people really want and so it
>>>> > shouldn't really be used.
>>>> >
>>>> > Thanks,
>>>> >
>>>> > Jun
>>>> >
>>>> >
>>>> > On Sun, Jan 18, 2015 at 11:03 AM, Jay Kreps <ja...@gmail.com>
>>>> > wrote:
>>>> >>
>>>> >> Hey guys,
>>>> >>
>>>> >> I really think we are discussing two things here:
>>>> >>
>>>> >> How should we generally handle changes to the set of errors? Should
>>>> >> introducing new errors be considered a protocol change or should we
>>>> >> reserve
>>>> >> the right to introduce new error codes?
>>>> >> Given that this particular change is possibly incompatible, how should
>>>> >> we
>>>> >> handle it?
>>>> >>
>>>> >> I think it would be good for people who are responding here to be
>>>> >> specific
>>>> >> about which they are addressing.
>>>> >>
>>>> >> Here is what I think:
>>>> >>
>>>> >> 1. Errors should be extensible within a protocol version.
>>>> >>
>>>> >> We should change the protocol documentation to list the errors that
>>>> >> can be
>>>> >> given back from each api, their meaning, and how to handle them, BUT
>>>> >> we
>>>> >> should explicitly state that the set of errors are open ended. That is
>>>> >> we
>>>> >> should reserve the right to introduce new errors and explicitly state
>>>> >> that
>>>> >> clients need a blanket "unknown error" handling mechanism. The error
>>>> >> can
>>>> >> link to the protocol definition (something like "Unknown error 42, see
>>>> >> protocol definition at http://link"). We could make this work really
>>>> >> well by
>>>> >> instructing all the clients to report the error in a very googlable
>>>> >> way as
>>>> >> Oracle does with their error format (e.g. "ORA-32") so that if you
>>>> >> ever get
>>>> >> the raw error google will take you to the definition.
>>>> >>
>>>> >> I agree that a more rigid definition seems like "right thing", but
>>>> >> having
>>>> >> just implemented two clients and spent a bunch of time on the server
>>>> >> side, I
>>>> >> think, it will work out poorly in practice. Here is why:
>>>> >>
>>>> >> I think we will make a lot of mistakes in nailing down the set of
>>>> >> error
>>>> >> codes up front and we will end up going through 3-4 churns of the
>>>> >> protocol
>>>> >> definition just realizing the set of errors that can be thrown. I
>>>> >> think this
>>>> >> churn will actually make life worse for clients that now have to
>>>> >> figure out
>>>> >> 7 identical versions of the protocol and will be a mess in terms of
>>>> >> testing
>>>> >> on the server side. I actually know this to be true because while
>>>> >> implementing the clients I tried to guess the errors that could be
>>>> >> thrown,
>>>> >> then checked my guess by close code inspection. It turned out that I
>>>> >> always
>>>> >> missed things in my belief about errors, but more importantly even
>>>> >> after
>>>> >> close code inspection I found tons of other errors in my stress
>>>> >> testing.
>>>> >> In practice error handling always involves calling out one or two
>>>> >> meaningful failures that have special recovery and then a blanket case
>>>> >> that
>>>> >> just handles everything else. It's true that some clients may not have
>>>> >> done
>>>> >> this well, but I think it is for the best if they fix that.
>>>> >> Reserving the right to add errors doesn't mean we will do it without
>>>> >> care.
>>>> >> We will think through each change and decide whether giving a little
>>>> >> more
>>>> >> precision in the error is worth the overhead and churn of a protocol
>>>> >> version
>>>> >> bump.
>>>> >>
>>>> >> 2. In this case in particular we should not introduce a new protocol
>>>> >> version
>>>> >>
>>>> >> In this particular case we are saying that acks > 1 doesn't make sense
>>>> >> and
>>>> >> we want to give an error to people specifying this so that they change
>>>> >> their
>>>> >> configuration. This is a configuration that few people use and we want
>>>> >> to
>>>> >> just make it an error. The bad behavior will just be that the error
>>>> >> will not
>>>> >> be as good as it could be. I think that is a better tradeoff than
>>>> >> introducing a separate protocol version (this may be true of the java
>>>> >> clients too).
>>>> >>
>>>> >> We will have lots of cases like this in the future and we aren't going
>>>> >> to
>>>> >> want to churn the protocol for each of them. For example we previously
>>>> >> had
>>>> >> to get more precise about which characters were legal and which
>>>> >> illegal in
>>>> >> topic names.
>>>> >>
>>>> >> -Jay
>>>> >>
>>>> >> On Fri, Jan 16, 2015 at 11:55 AM, Gwen Shapira <gs...@cloudera.com>
>>>> >> wrote:
>>>> >>>
>>>> >>> I updated the KIP: Using acks > 1 in version 0 will log a WARN
>>>> >>> message
>>>> >>> in the broker about client using deprecated behavior (suggested by
>>>> >>> Joe
>>>> >>> in the JIRA, and I think it makes sense).
>>>> >>>
>>>> >>> Gwen
>>>> >>>
>>>> >>> On Fri, Jan 16, 2015 at 10:40 AM, Gwen Shapira
>>>> >>> <gs...@cloudera.com>
>>>> >>> wrote:
>>>> >>> > How about we continue the discussion on this thread, so we won't
>>>> >>> > lose
>>>> >>> > the context of this discussion, and put it up for VOTE when this
>>>> >>> > has
>>>> >>> > been finalized?
>>>> >>> >
>>>> >>> > On Fri, Jan 16, 2015 at 10:22 AM, Neha Narkhede <ne...@confluent.io>
>>>> >>> > wrote:
>>>> >>> >> Gwen,
>>>> >>> >>
>>>> >>> >> KIP write-up looks good. According to the rest of the KIP process
>>>> >>> >> proposal,
>>>> >>> >> would you like to start a DISCUSS/VOTE thread for it?
>>>> >>> >>
>>>> >>> >> Thanks,
>>>> >>> >> Neha
>>>> >>> >>
>>>> >>> >> On Fri, Jan 16, 2015 at 9:37 AM, Ewen Cheslack-Postava
>>>> >>> >> <ew...@confluent.io>
>>>> >>> >> wrote:
>>>> >>> >>
>>>> >>> >>> Gwen -- KIP write up looks good. Deprecation schedule probably
>>>> >>> >>> needs
>>>> >>> >>> to be
>>>> >>> >>> more specific, but I think that discussion probably needs to
>>>> >>> >>> happen
>>>> >>> >>> after a
>>>> >>> >>> solution is agreed upon.
>>>> >>> >>>
>>>> >>> >>> Jay -- I think "older clients will get a bad error message
>>>> >>> >>> instead of
>>>> >>> >>> a
>>>> >>> >>> good one" isn't what would be happening with this change.
>>>> >>> >>> Previously
>>>> >>> >>> they
>>>> >>> >>> wouldn't have received an error and they would have been able to
>>>> >>> >>> produce
>>>> >>> >>> messages. After the change they'll just receive this new error
>>>> >>> >>> message
>>>> >>> >>> which their clients can't possibly handle gracefully since it
>>>> >>> >>> didn't
>>>> >>> >>> exist
>>>> >>> >>> when the client was written. Whether the acks > 1 setting was
>>>> >>> >>> actually
>>>> >>> >>> accomplishing what they thought doesn't matter. Someone could
>>>> >>> >>> have
>>>> >>> >>> reasonably read the docs on 0.8.1.1, thought acks = 2 is an ok
>>>> >>> >>> setting for
>>>> >>> >>> their applications, set it as a default across a bunch of apps,
>>>> >>> >>> then
>>>> >>> >>> follow
>>>> >>> >>> the recommended upgrade path of updating brokers to 0.8.2 and all
>>>> >>> >>> their
>>>> >>> >>> apps will start failing on produce requests.
>>>> >>> >>>
>>>> >>> >>>
>>>> >>> >>> On Thu, Jan 15, 2015 at 8:27 PM, Jay Kreps <ja...@gmail.com>
>>>> >>> >>> wrote:
>>>> >>> >>>
>>>> >>> >>> > This is a good case to discuss.
>>>> >>> >>> >
>>>> >>> >>> > Let's figure the general case of how we want to handle errors
>>>> >>> >>> > and
>>>> >>> >>> > get
>>>> >>> >>> that
>>>> >>> >>> > documented in the protocol. The problem right now is that we
>>>> >>> >>> > give
>>>> >>> >>> > no
>>>> >>> >>> > guidance on this. I actually thought Gwen's suggestion made
>>>> >>> >>> > sense
>>>> >>> >>> > on the
>>>> >>> >>> > guidance we should have given which is that we will enumerate a
>>>> >>> >>> > set
>>>> >>> >>> > of
>>>> >>> >>> > errors and their meaning for each API but it is possible that
>>>> >>> >>> > other
>>>> >>> >>> errors
>>>> >>> >>> > will occur and they should be handled (maybe poorly) in the
>>>> >>> >>> > same
>>>> >>> >>> > way
>>>> >>> >>> > UNKNOWN_ERROR is handled which is our normal escape hatch for
>>>> >>> >>> > things like
>>>> >>> >>> > OOMException.
>>>> >>> >>> >
>>>> >>> >>> > I really do think we shouldn't be dogmatic here: In considering
>>>> >>> >>> > a
>>>> >>> >>> > change
>>>> >>> >>> > to errors we should consider the potential ill-effect vs the
>>>> >>> >>> > complexity
>>>> >>> >>> of
>>>> >>> >>> > yet another protocol version.
>>>> >>> >>> >
>>>> >>> >>> > In this case I actually am not sure we need to bump the
>>>> >>> >>> > protocol
>>>> >>> >>> > because
>>>> >>> >>> > the whole point of the change was to make a setting we think
>>>> >>> >>> > doesn't make
>>>> >>> >>> > sense break, right? Well this will break it. It seems like the
>>>> >>> >>> > only
>>>> >>> >>> > downside is that older clients will get a bad error message
>>>> >>> >>> > instead
>>>> >>> >>> > of a
>>>> >>> >>> > good one. But it isn't like we will have rendered a client
>>>> >>> >>> > unusable, it
>>>> >>> >>> is
>>>> >>> >>> > just that they will need to change their config.
>>>> >>> >>> >
>>>> >>> >>> > -Jay
>>>> >>> >>> >
>>>> >>> >>> > On Thu, Jan 15, 2015 at 6:14 PM, Gwen Shapira
>>>> >>> >>> > <gs...@cloudera.com>
>>>> >>> >>> > wrote:
>>>> >>> >>> >
>>>> >>> >>> >> I created a KIP for this suggestion:
>>>> >>> >>> >>
>>>> >>> >>> >>
>>>> >>> >>> >>
>>>> >>> >>>
>>>> >>> >>>
>>>> >>> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1+-+Remove+support+of+request.required.acks
>>>> >>> >>> >>
>>>> >>> >>> >> Basically documenting what was already discussed here.
>>>> >>> >>> >> Comments
>>>> >>> >>> >> will
>>>> >>> >>> >> be awesome!
>>>> >>> >>> >>
>>>> >>> >>> >> Gwen
>>>> >>> >>> >>
>>>> >>> >>> >> On Thu, Jan 15, 2015 at 5:19 PM, Gwen Shapira
>>>> >>> >>> >> <gs...@cloudera.com>
>>>> >>> >>> >> wrote:
>>>> >>> >>> >> > The errors are part of the KIP process now, so I think the
>>>> >>> >>> >> > clients are
>>>> >>> >>> >> safe :)
>>>> >>> >>> >> >
>>>> >>> >>> >> >
>>>> >>> >>> >>
>>>> >>> >>>
>>>> >>> >>>
>>>> >>> >>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
>>>> >>> >>> >> >
>>>> >>> >>> >> > On Thu, Jan 15, 2015 at 5:12 PM, Steve Morin
>>>> >>> >>> >> > <st...@gmail.com>
>>>> >>> >>> >> wrote:
>>>> >>> >>> >> >> Agree errors should be part of the protocol
>>>> >>> >>> >> >>
>>>> >>> >>> >> >>> On Jan 15, 2015, at 17:59, Gwen Shapira
>>>> >>> >>> >> >>> <gs...@cloudera.com>
>>>> >>> >>> >> wrote:
>>>> >>> >>> >> >>>
>>>> >>> >>> >> >>> Hi,
>>>> >>> >>> >> >>>
>>>> >>> >>> >> >>> I got convinced by Joe and Dana that errors are indeed
>>>> >>> >>> >> >>> part of
>>>> >>> >>> >> >>> the
>>>> >>> >>> >> >>> protocol and can't be randomly added.
>>>> >>> >>> >> >>>
>>>> >>> >>> >> >>> So, it looks like we need to bump version of
>>>> >>> >>> >> >>> ProduceRequest in
>>>> >>> >>> >> >>> the
>>>> >>> >>> >> >>> following way:
>>>> >>> >>> >> >>> Version 0 -> accept acks >1. I think we should keep the
>>>> >>> >>> >> >>> existing
>>>> >>> >>> >> >>> behavior too (i.e. not replace it with -1) to avoid
>>>> >>> >>> >> >>> surprising
>>>> >>> >>> >> >>> clients, but I'm willing to hear other opinions.
>>>> >>> >>> >> >>> Version 1 -> do not accept acks >1 and return an error.
>>>> >>> >>> >> >>> Are we ok with the error I added in KAFKA-1697? We can use
>>>> >>> >>> >> >>> something
>>>> >>> >>> >> >>> less specific like InvalidRequestParameter. This error can
>>>> >>> >>> >> >>> be
>>>> >>> >>> >> >>> reused
>>>> >>> >>> >> >>> in the future and reduce the need to add errors, but will
>>>> >>> >>> >> >>> also
>>>> >>> >>> >> >>> be
>>>> >>> >>> less
>>>> >>> >>> >> >>> clear to the client and its users. Maybe even add the
>>>> >>> >>> >> >>> error
>>>> >>> >>> >> >>> message
>>>> >>> >>> >> >>> string to the protocol in addition to the error code?
>>>> >>> >>> >> >>> (since
>>>> >>> >>> >> >>> we are
>>>> >>> >>> >> >>> bumping versions....)
>>>> >>> >>> >> >>>
>>>> >>> >>> >> >>> I think maintaining the old version throughout 0.8.X makes
>>>> >>> >>> >> >>> sense.
>>>> >>> >>> IMO
>>>> >>> >>> >> >>> dropping it for 0.9 is feasible, but I'll let client
>>>> >>> >>> >> >>> owners
>>>> >>> >>> >> >>> help
>>>> >>> >>> make
>>>> >>> >>> >> >>> that call.
>>>> >>> >>> >> >>>
>>>> >>> >>> >> >>> Am I missing anything? Should I start a KIP? It seems like
>>>> >>> >>> >> >>> a
>>>> >>> >>> KIP-type
>>>> >>> >>> >> >>> discussion :)
>>>> >>> >>> >> >>>
>>>> >>> >>> >> >>> Gwen
>>>> >>> >>> >> >>>
>>>> >>> >>> >> >>>
>>>> >>> >>> >> >>> On Thu, Jan 15, 2015 at 2:31 PM, Ewen Cheslack-Postava
>>>> >>> >>> >> >>> <ew...@confluent.io> wrote:
>>>> >>> >>> >> >>>> Gwen,
>>>> >>> >>> >> >>>>
>>>> >>> >>> >> >>>> I think the only option that wouldn't require a protocol
>>>> >>> >>> >> >>>> version
>>>> >>> >>> >> change is
>>>> >>> >>> >> >>>> the one where acks > 1 is converted to acks = -1 since
>>>> >>> >>> >> >>>> it's
>>>> >>> >>> >> >>>> the
>>>> >>> >>> only
>>>> >>> >>> >> one
>>>> >>> >>> >> >>>> that doesn't potentially break older clients. The
>>>> >>> >>> >> >>>> protocol
>>>> >>> >>> >> >>>> guide
>>>> >>> >>> >> says that
>>>> >>> >>> >> >>>> the expected upgrade path is servers first, then clients,
>>>> >>> >>> >> >>>> so
>>>> >>> >>> >> >>>> old
>>>> >>> >>> >> clients,
>>>> >>> >>> >> >>>> including non-java clients, that may be using acks > 1
>>>> >>> >>> >> >>>> should
>>>> >>> >>> >> >>>> be
>>>> >>> >>> >> able to
>>>> >>> >>> >> >>>> work with a new broker version.
>>>> >>> >>> >> >>>>
>>>> >>> >>> >> >>>> It's more work, but I think dealing with the protocol
>>>> >>> >>> >> >>>> change
>>>> >>> >>> >> >>>> is the
>>>> >>> >>> >> right
>>>> >>> >>> >> >>>> thing to do since it eventually gets us to the behavior I
>>>> >>> >>> >> >>>> think is
>>>> >>> >>> >> better --
>>>> >>> >>> >> >>>> the broker should reject requests with invalid values. I
>>>> >>> >>> >> >>>> think Joe
>>>> >>> >>> >> and I
>>>> >>> >>> >> >>>> were basically in agreement. In my mind the major piece
>>>> >>> >>> >> >>>> missing
>>>> >>> >>> from
>>>> >>> >>> >> his
>>>> >>> >>> >> >>>> description is how long we're going to maintain his "case
>>>> >>> >>> >> >>>> 0"
>>>> >>> >>> >> behavior. It's
>>>> >>> >>> >> >>>> impractical to maintain old versions forever, but it
>>>> >>> >>> >> >>>> sounds
>>>> >>> >>> >> >>>> like
>>>> >>> >>> >> there
>>>> >>> >>> >> >>>> hasn't been a decision on how long to maintain them.
>>>> >>> >>> >> >>>> Maybe
>>>> >>> >>> >> >>>> that's
>>>> >>> >>> >> another
>>>> >>> >>> >> >>>> item to add to KIPs -- protocol versions and behavior
>>>> >>> >>> >> >>>> need to
>>>> >>> >>> >> >>>> be
>>>> >>> >>> >> listed as
>>>> >>> >>> >> >>>> deprecated and the earliest version in which they'll be
>>>> >>> >>> >> >>>> removed
>>>> >>> >>> >> should be
>>>> >>> >>> >> >>>> specified so users can understand which versions are
>>>> >>> >>> >> >>>> guaranteed to
>>>> >>> >>> be
>>>> >>> >>> >> >>>> compatible, even if they're using (well-written) non-java
>>>> >>> >>> >> >>>> clients.
>>>> >>> >>> >> >>>>
>>>> >>> >>> >> >>>> -Ewen
>>>> >>> >>> >> >>>>
>>>> >>> >>> >> >>>>
>>>> >>> >>> >> >>>>> On Thu, Jan 15, 2015 at 12:52 PM, Dana Powers <
>>>> >>> >>> >> dana.powers@gmail.com> wrote:
>>>> >>> >>> >> >>>>>
>>>> >>> >>> >> >>>>>> clients don't break on unknown errors
>>>> >>> >>> >> >>>>>
>>>> >>> >>> >> >>>>> maybe true for the official java clients, but I dont
>>>> >>> >>> >> >>>>> think
>>>> >>> >>> >> >>>>> the
>>>> >>> >>> >> assumption
>>>> >>> >>> >> >>>>> holds true for community-maintained clients and users of
>>>> >>> >>> >> >>>>> those
>>>> >>> >>> >> clients.
>>>> >>> >>> >> >>>>> kafka-python generally follows the fail-fast philosophy
>>>> >>> >>> >> >>>>> and
>>>> >>> >>> >> >>>>> raises
>>>> >>> >>> >> an
>>>> >>> >>> >> >>>>> exception on any unrecognized error code in any server
>>>> >>> >>> >> >>>>> response.
>>>> >>> >>> >> in this
>>>> >>> >>> >> >>>>> case, kafka-python allows users to set their own
>>>> >>> >>> >> >>>>> required-acks
>>>> >>> >>> >> policy when
>>>> >>> >>> >> >>>>> creating a producer instance.  It is possible that users
>>>> >>> >>> >> >>>>> of
>>>> >>> >>> >> kafka-python
>>>> >>> >>> >> >>>>> have deployed producer code that uses ack>1 -- perhaps
>>>> >>> >>> >> >>>>> in
>>>> >>> >>> production
>>>> >>> >>> >> >>>>> environments -- and for those users the new error code
>>>> >>> >>> >> >>>>> will
>>>> >>> >>> >> >>>>> crash
>>>> >>> >>> >> their
>>>> >>> >>> >> >>>>> producer code.  I would not be surprised if the same
>>>> >>> >>> >> >>>>> were
>>>> >>> >>> >> >>>>> true of
>>>> >>> >>> >> other
>>>> >>> >>> >> >>>>> community clients.
>>>> >>> >>> >> >>>>>
>>>> >>> >>> >> >>>>> *one reason for the fail-fast approach is that there
>>>> >>> >>> >> >>>>> isn't
>>>> >>> >>> >> >>>>> great
>>>> >>> >>> >> >>>>> documentation on what errors to expect for each request
>>>> >>> >>> >> >>>>> /
>>>> >>> >>> >> >>>>> response
>>>> >>> >>> >> -- so
>>>> >>> >>> >> >>>>> we
>>>> >>> >>> >> >>>>> use failures to alert that some error case is not
>>>> >>> >>> >> >>>>> handled
>>>> >>> >>> >> properly.  and
>>>> >>> >>> >> >>>>> because of that, introducing new error cases without
>>>> >>> >>> >> >>>>> bumping
>>>> >>> >>> >> >>>>> the
>>>> >>> >>> api
>>>> >>> >>> >> >>>>> version is likely to cause those errors to get
>>>> >>> >>> >> >>>>> raised/thrown
>>>> >>> >>> >> >>>>> all
>>>> >>> >>> >> the way
>>>> >>> >>> >> >>>>> back up to the user.  of course we (client maintainers)
>>>> >>> >>> >> >>>>> can
>>>> >>> >>> >> >>>>> fix
>>>> >>> >>> the
>>>> >>> >>> >> issues
>>>> >>> >>> >> >>>>> in the client libraries and suggest users upgrade, but
>>>> >>> >>> >> >>>>> it's
>>>> >>> >>> >> >>>>> not
>>>> >>> >>> the
>>>> >>> >>> >> ideal
>>>> >>> >>> >> >>>>> situation.
>>>> >>> >>> >> >>>>>
>>>> >>> >>> >> >>>>>
>>>> >>> >>> >> >>>>> long-winded way of saying: I agree w/ Joe.
>>>> >>> >>> >> >>>>>
>>>> >>> >>> >> >>>>> -Dana
>>>> >>> >>> >> >>>>>
>>>> >>> >>> >> >>>>>
>>>> >>> >>> >> >>>>> On Thu, Jan 15, 2015 at 12:07 PM, Gwen Shapira <
>>>> >>> >>> >> gshapira@cloudera.com>
>>>> >>> >>> >> >>>>> wrote:
>>>> >>> >>> >> >>>>>
>>>> >>> >>> >> >>>>>> Is the protocol bump caused by the behavior change or
>>>> >>> >>> >> >>>>>> the
>>>> >>> >>> >> >>>>>> new
>>>> >>> >>> error
>>>> >>> >>> >> >>>>>> code?
>>>> >>> >>> >> >>>>>>
>>>> >>> >>> >> >>>>>> 1) IMO, error_codes are data, and clients can expect to
>>>> >>> >>> >> >>>>>> receive
>>>> >>> >>> >> errors
>>>> >>> >>> >> >>>>>> that they don't understand (i.e. unknown errors).
>>>> >>> >>> >> >>>>>> AFAIK,
>>>> >>> >>> >> >>>>>> clients
>>>> >>> >>> >> don't
>>>> >>> >>> >> >>>>>> break on unknown errors, they are simple more
>>>> >>> >>> >> >>>>>> challenging
>>>> >>> >>> >> >>>>>> to
>>>> >>> >>> >> debug. If
>>>> >>> >>> >> >>>>>> we document the new behavior, then its definitely
>>>> >>> >>> >> >>>>>> debuggable and
>>>> >>> >>> >> >>>>>> fixable.
>>>> >>> >>> >> >>>>>>
>>>> >>> >>> >> >>>>>> 2) The behavior change is basically a deprecation -
>>>> >>> >>> >> >>>>>> i.e.
>>>> >>> >>> >> >>>>>> acks > 1
>>>> >>> >>> >> were
>>>> >>> >>> >> >>>>>> never documented, and are not supported by Kafka
>>>> >>> >>> >> >>>>>> clients
>>>> >>> >>> >> >>>>>> starting
>>>> >>> >>> >> with
>>>> >>> >>> >> >>>>>> version 0.8.2. I'm not sure this requires a protocol
>>>> >>> >>> >> >>>>>> bump
>>>> >>> >>> >> >>>>>> either,
>>>> >>> >>> >> >>>>>> although its a better case than new error codes.
>>>> >>> >>> >> >>>>>>
>>>> >>> >>> >> >>>>>> Thanks,
>>>> >>> >>> >> >>>>>> Gwen
>>>> >>> >>> >> >>>>>>
>>>> >>> >>> >> >>>>>> On Thu, Jan 15, 2015 at 10:10 AM, Joe Stein <
>>>> >>> >>> joe.stein@stealth.ly>
>>>> >>> >>> >> >>>>>> wrote:
>>>> >>> >>> >> >>>>>>> Looping in the mailing list that the client developers
>>>> >>> >>> >> >>>>>>> live on
>>>> >>> >>> >> because
>>>> >>> >>> >> >>>>>> they
>>>> >>> >>> >> >>>>>>> are all not on dev (though they should be if they want
>>>> >>> >>> >> >>>>>>> to
>>>> >>> >>> >> >>>>>>> be
>>>> >>> >>> >> helping
>>>> >>> >>> >> >>>>>>> to
>>>> >>> >>> >> >>>>>>> build the best client libraries they can).
>>>> >>> >>> >> >>>>>>>
>>>> >>> >>> >> >>>>>>> I whole hardily believe that we need to not break
>>>> >>> >>> >> >>>>>>> existing
>>>> >>> >>> >> >>>>>>> functionality
>>>> >>> >>> >> >>>>>> of
>>>> >>> >>> >> >>>>>>> the client protocol, ever.
>>>> >>> >>> >> >>>>>>>
>>>> >>> >>> >> >>>>>>> There are many reasons for this and we have other
>>>> >>> >>> >> >>>>>>> threads
>>>> >>> >>> >> >>>>>>> on the
>>>> >>> >>> >> >>>>>>> mailing
>>>> >>> >>> >> >>>>>>> list where we are discussing that topic (no pun
>>>> >>> >>> >> >>>>>>> intended)
>>>> >>> >>> >> >>>>>>> that I
>>>> >>> >>> >> don't
>>>> >>> >>> >> >>>>>> want
>>>> >>> >>> >> >>>>>>> to re-hash here.
>>>> >>> >>> >> >>>>>>>
>>>> >>> >>> >> >>>>>>> If we change wire protocol functionality OR the binary
>>>> >>> >>> >> >>>>>>> format
>>>> >>> >>> >> (either)
>>>> >>> >>> >> >>>>>>> we
>>>> >>> >>> >> >>>>>>> must bump version AND treat version as a feature flag
>>>> >>> >>> >> >>>>>>> with
>>>> >>> >>> >> backward
>>>> >>> >>> >> >>>>>>> compatibility support until it is deprecated for some
>>>> >>> >>> >> >>>>>>> time
>>>> >>> >>> >> >>>>>>> for
>>>> >>> >>> >> folks
>>>> >>> >>> >> >>>>>>> to
>>>> >>> >>> >> >>>>>> deal
>>>> >>> >>> >> >>>>>>> with it.
>>>> >>> >>> >> >>>>>>>
>>>> >>> >>> >> >>>>>>> match version = {
>>>> >>> >>> >> >>>>>>> case 0: keepDoingWhatWeWereDoing()
>>>> >>> >>> >> >>>>>>> case 1: doNewStuff()
>>>> >>> >>> >> >>>>>>> case 2: doEvenMoreNewStuff()
>>>> >>> >>> >> >>>>>>> }
>>>> >>> >>> >> >>>>>>>
>>>> >>> >>> >> >>>>>>> has to be a practice we adopt imho ... I know feature
>>>> >>> >>> >> >>>>>>> flags can
>>>> >>> >>> be
>>>> >>> >>> >> >>>>>> construed
>>>> >>> >>> >> >>>>>>> as "messy code" but I am eager to hear another
>>>> >>> >>> >> >>>>>>> (better?
>>>> >>> >>> >> different?)
>>>> >>> >>> >> >>>>>> solution
>>>> >>> >>> >> >>>>>>> to this.
>>>> >>> >>> >> >>>>>>>
>>>> >>> >>> >> >>>>>>> If we don't do a feature flag like this specifically
>>>> >>> >>> >> >>>>>>> with
>>>> >>> >>> >> >>>>>>> this
>>>> >>> >>> >> change
>>>> >>> >>> >> >>>>>> then
>>>> >>> >>> >> >>>>>>> what happens is that someone upgrades their brokers
>>>> >>> >>> >> >>>>>>> with a
>>>> >>> >>> rolling
>>>> >>> >>> >> >>>>>> restart
>>>> >>> >>> >> >>>>>>> in 0.8.3 and every single one of their producer
>>>> >>> >>> >> >>>>>>> requests
>>>> >>> >>> >> >>>>>>> start
>>>> >>> >>> to
>>>> >>> >>> >> fail
>>>> >>> >>> >> >>>>>> and
>>>> >>> >>> >> >>>>>>> they have a major production outage. eeeek!!!!
>>>> >>> >>> >> >>>>>>>
>>>> >>> >>> >> >>>>>>> I do 100% agree that > 1 makes no sense and we
>>>> >>> >>> >> >>>>>>> *REALLY*
>>>> >>> >>> >> >>>>>>> need
>>>> >>> >>> >> people to
>>>> >>> >>> >> >>>>>> start
>>>> >>> >>> >> >>>>>>> using 0,1,-1 but we need to-do that in a way that is
>>>> >>> >>> >> >>>>>>> going
>>>> >>> >>> >> >>>>>>> to
>>>> >>> >>> >> work for
>>>> >>> >>> >> >>>>>>> everyone.
>>>> >>> >>> >> >>>>>>>
>>>> >>> >>> >> >>>>>>> Old producers and consumers must keep working with new
>>>> >>> >>> >> >>>>>>> brokers
>>>> >>> >>> >> and if
>>>> >>> >>> >> >>>>>>> we
>>>> >>> >>> >> >>>>>> are
>>>> >>> >>> >> >>>>>>> not going to support that then I am unclear what the
>>>> >>> >>> >> >>>>>>> use
>>>> >>> >>> >> >>>>>>> of
>>>> >>> >>> >> "version"
>>>> >>> >>> >> >>>>>>> is
>>>> >>> >>> >> >>>>>>> based on our original intentions of having it because
>>>> >>> >>> >> >>>>>>> of
>>>> >>> >>> >> >>>>>>> the
>>>> >>> >>> >> >>>>>>> 0.7=>-0.8.
>>>> >>> >>> >> >>>>>> We
>>>> >>> >>> >> >>>>>>> said no more breaking changes when we did that.
>>>> >>> >>> >> >>>>>>>
>>>> >>> >>> >> >>>>>>> - Joe Stein
>>>> >>> >>> >> >>>>>>>
>>>> >>> >>> >> >>>>>>> On Thu, Jan 15, 2015 at 12:38 PM, Ewen
>>>> >>> >>> >> >>>>>>> Cheslack-Postava <
>>>> >>> >>> >> >>>>>> ewen@confluent.io>
>>>> >>> >>> >> >>>>>>> wrote:
>>>> >>> >>> >> >>>>>>>>
>>>> >>> >>> >> >>>>>>>> Right, so this looks like it could create an issue
>>>> >>> >>> >> >>>>>>>> similar to
>>>> >>> >>> >> what's
>>>> >>> >>> >> >>>>>>>> currently being discussed in
>>>> >>> >>> >> >>>>>>>> https://issues.apache.org/jira/browse/KAFKA-1649
>>>> >>> >>> >> >>>>>>>> where
>>>> >>> >>> >> >>>>>>>> users
>>>> >>> >>> >> now get
>>>> >>> >>> >> >>>>>>>> errors
>>>> >>> >>> >> >>>>>>>> under conditions when they previously wouldn't. Old
>>>> >>> >>> >> >>>>>>>> clients
>>>> >>> >>> won't
>>>> >>> >>> >> >>>>>>>> even
>>>> >>> >>> >> >>>>>>>> know
>>>> >>> >>> >> >>>>>>>> about the error code, so besides failing they won't
>>>> >>> >>> >> >>>>>>>> even
>>>> >>> >>> >> >>>>>>>> be
>>>> >>> >>> able
>>>> >>> >>> >> to
>>>> >>> >>> >> >>>>>>>> log
>>>> >>> >>> >> >>>>>>>> any
>>>> >>> >>> >> >>>>>>>> meaningful error messages.
>>>> >>> >>> >> >>>>>>>>
>>>> >>> >>> >> >>>>>>>> I think there are two options for compatibility:
>>>> >>> >>> >> >>>>>>>>
>>>> >>> >>> >> >>>>>>>> 1. An alternative change is to remove the ack > 1
>>>> >>> >>> >> >>>>>>>> code,
>>>> >>> >>> >> >>>>>>>> but
>>>> >>> >>> >> silently
>>>> >>> >>> >> >>>>>>>> "upgrade" requests with acks > 1 to acks = -1. This
>>>> >>> >>> >> >>>>>>>> isn't
>>>> >>> >>> >> >>>>>>>> the
>>>> >>> >>> >> same as
>>>> >>> >>> >> >>>>>>>> other
>>>> >>> >>> >> >>>>>>>> changes to behavior since the interaction between the
>>>> >>> >>> >> >>>>>>>> client
>>>> >>> >>> and
>>>> >>> >>> >> >>>>>>>> server
>>>> >>> >>> >> >>>>>>>> remains the same, no error codes change, etc. The
>>>> >>> >>> >> >>>>>>>> client
>>>> >>> >>> >> >>>>>>>> might
>>>> >>> >>> >> just
>>>> >>> >>> >> >>>>>>>> see
>>>> >>> >>> >> >>>>>>>> some increased latency since the message might need
>>>> >>> >>> >> >>>>>>>> to be
>>>> >>> >>> >> replicated
>>>> >>> >>> >> >>>>>>>> to
>>>> >>> >>> >> >>>>>>>> more brokers than they requested.
>>>> >>> >>> >> >>>>>>>> 2. Split this into two patches, one that bumps the
>>>> >>> >>> >> >>>>>>>> protocol
>>>> >>> >>> >> version
>>>> >>> >>> >> >>>>>>>> on
>>>> >>> >>> >> >>>>>>>> that
>>>> >>> >>> >> >>>>>>>> message to include the new error code but maintains
>>>> >>> >>> >> >>>>>>>> both
>>>> >>> >>> >> >>>>>>>> old
>>>> >>> >>> (now
>>>> >>> >>> >> >>>>>>>> deprecated) and new behavior, then a second that
>>>> >>> >>> >> >>>>>>>> would be
>>>> >>> >>> >> applied in
>>>> >>> >>> >> >>>>>>>> a
>>>> >>> >>> >> >>>>>>>> later release that removes the old protocol + code
>>>> >>> >>> >> >>>>>>>> for
>>>> >>> >>> >> >>>>>>>> handling
>>>> >>> >>> >> acks
>>>> >>> >>> >> >>>>>> 1.
>>>> >>> >>> >> >>>>>>>>
>>>> >>> >>> >> >>>>>>>> 2 is probably the right thing to do. If we specify
>>>> >>> >>> >> >>>>>>>> the
>>>> >>> >>> >> >>>>>>>> release
>>>> >>> >>> >> when
>>>> >>> >>> >> >>>>>> we'll
>>>> >>> >>> >> >>>>>>>> remove the deprecated protocol at the time of
>>>> >>> >>> >> >>>>>>>> deprecation
>>>> >>> >>> >> >>>>>>>> it
>>>> >>> >>> >> makes
>>>> >>> >>> >> >>>>>> things
>>>> >>> >>> >> >>>>>>>> a
>>>> >>> >>> >> >>>>>>>> lot easier for people writing non-java clients and
>>>> >>> >>> >> >>>>>>>> could
>>>> >>> >>> >> >>>>>>>> give
>>>> >>> >>> >> users
>>>> >>> >>> >> >>>>>> better
>>>> >>> >>> >> >>>>>>>> predictability (e.g. if clients are at most 1 major
>>>> >>> >>> >> >>>>>>>> release
>>>> >>> >>> >> behind
>>>> >>> >>> >> >>>>>>>> brokers,
>>>> >>> >>> >> >>>>>>>> they'll remain compatible but possibly use deprecated
>>>> >>> >>> features).
>>>> >>> >>> >> >>>>>>>>
>>>> >>> >>> >> >>>>>>>>
>>>> >>> >>> >> >>>>>>>> On Wed, Jan 14, 2015 at 3:51 PM, Gwen Shapira <
>>>> >>> >>> >> gshapira@cloudera.com>
>>>> >>> >>> >> >>>>>>>> wrote:
>>>> >>> >>> >> >>>>>>>>
>>>> >>> >>> >> >>>>>>>>> Hi Kafka Devs,
>>>> >>> >>> >> >>>>>>>>>
>>>> >>> >>> >> >>>>>>>>> We are working on KAFKA-1697 - remove code related
>>>> >>> >>> >> >>>>>>>>> to
>>>> >>> >>> >> >>>>>>>>> ack>1 on
>>>> >>> >>> >> the
>>>> >>> >>> >> >>>>>>>>> broker. Per Neha's suggestion, I'd like to give
>>>> >>> >>> >> >>>>>>>>> everyone
>>>> >>> >>> >> >>>>>>>>> a
>>>> >>> >>> >> heads up
>>>> >>> >>> >> >>>>>>>>> on
>>>> >>> >>> >> >>>>>>>>> what these changes mean.
>>>> >>> >>> >> >>>>>>>>>
>>>> >>> >>> >> >>>>>>>>> Once this patch is included, any produce requests
>>>> >>> >>> >> >>>>>>>>> that
>>>> >>> >>> >> >>>>>>>>> include
>>>> >>> >>> >> >>>>>>>>> request.required.acks > 1 will result in an
>>>> >>> >>> >> >>>>>>>>> exception.
>>>> >>> >>> >> >>>>>>>>> This
>>>> >>> >>> >> will be
>>>> >>> >>> >> >>>>>>>>> InvalidRequiredAcks in new versions (0.8.3 and up, I
>>>> >>> >>> >> >>>>>>>>> assume)
>>>> >>> >>> and
>>>> >>> >>> >> >>>>>>>>> UnknownException in existing versions (sorry, but I
>>>> >>> >>> >> >>>>>>>>> can't add
>>>> >>> >>> >> error
>>>> >>> >>> >> >>>>>>>>> codes retroactively).
>>>> >>> >>> >> >>>>>>>>>
>>>> >>> >>> >> >>>>>>>>> This behavior is already enforced by 0.8.2 producers
>>>> >>> >>> >> >>>>>>>>> (sync and
>>>> >>> >>> >> >>>>>>>>> new),
>>>> >>> >>> >> >>>>>>>>> but we expect impact on users with older producers
>>>> >>> >>> >> >>>>>>>>> that
>>>> >>> >>> >> >>>>>>>>> relied
>>>> >>> >>> >> on
>>>> >>> >>> >> >>>>>>>>> acks
>>>> >>> >>> >> >>>>>>>>>> 1 and external clients (i.e python, go, etc).
>>>> >>> >>> >> >>>>>>>>>
>>>> >>> >>> >> >>>>>>>>> Users who relied on acks > 1 are expected to switch
>>>> >>> >>> >> >>>>>>>>> to
>>>> >>> >>> >> >>>>>>>>> using
>>>> >>> >>> >> acks =
>>>> >>> >>> >> >>>>>>>>> -1
>>>> >>> >>> >> >>>>>>>>> and a min.isr parameter than matches their user
>>>> >>> >>> >> >>>>>>>>> case.
>>>> >>> >>> >> >>>>>>>>>
>>>> >>> >>> >> >>>>>>>>> This change was discussed in the past in the context
>>>> >>> >>> >> >>>>>>>>> of
>>>> >>> >>> >> KAFKA-1555
>>>> >>> >>> >> >>>>>>>>> (min.isr), but let us know if you have any questions
>>>> >>> >>> >> >>>>>>>>> or
>>>> >>> >>> concerns
>>>> >>> >>> >> >>>>>>>>> regarding this change.
>>>> >>> >>> >> >>>>>>>>>
>>>> >>> >>> >> >>>>>>>>> Gwen
>>>> >>> >>> >> >>>>>>>>
>>>> >>> >>> >> >>>>>>>>
>>>> >>> >>> >> >>>>>>>>
>>>> >>> >>> >> >>>>>>>> --
>>>> >>> >>> >> >>>>>>>> Thanks,
>>>> >>> >>> >> >>>>>>>> Ewen
>>>> >>> >>> >> >>>>>>>
>>>> >>> >>> >> >>>>>>>
>>>> >>> >>> >> >>>>>>> --
>>>> >>> >>> >> >>>>>>> You received this message because you are subscribed
>>>> >>> >>> >> >>>>>>> to
>>>> >>> >>> >> >>>>>>> the
>>>> >>> >>> Google
>>>> >>> >>> >> >>>>>>> Groups
>>>> >>> >>> >> >>>>>>> "kafka-clients" group.
>>>> >>> >>> >> >>>>>>> To unsubscribe from this group and stop receiving
>>>> >>> >>> >> >>>>>>> emails
>>>> >>> >>> >> >>>>>>> from
>>>> >>> >>> it,
>>>> >>> >>> >> send
>>>> >>> >>> >> >>>>>>> an
>>>> >>> >>> >> >>>>>>> email to kafka-clients+unsubscribe@googlegroups.com.
>>>> >>> >>> >> >>>>>>> To post to this group, send email to
>>>> >>> >>> >> kafka-clients@googlegroups.com.
>>>> >>> >>> >> >>>>>>> Visit this group at
>>>> >>> >>> http://groups.google.com/group/kafka-clients.
>>>> >>> >>> >> >>>>>>> To view this discussion on the web visit
>>>> >>> >>> >> >>>>>>
>>>> >>> >>> >> >>>>>>
>>>> >>> >>> >>
>>>> >>> >>>
>>>> >>> >>>
>>>> >>> >>> https://groups.google.com/d/msgid/kafka-clients/CAA7ooCBtH2JjyQsArdx_%3DV25B4O1QJk0YvOu9U6kYt9sB4aqng%40mail.gmail.com
>>>> >>> >>> >> >>>>>> .
>>>> >>> >>> >> >>>>>>>
>>>> >>> >>> >> >>>>>>> For more options, visit
>>>> >>> >>> >> >>>>>>> https://groups.google.com/d/optout.
>>>> >>> >>> >> >>>>>>
>>>> >>> >>> >> >>>>>> --
>>>> >>> >>> >> >>>>>> You received this message because you are subscribed to
>>>> >>> >>> >> >>>>>> the
>>>> >>> >>> Google
>>>> >>> >>> >> >>>>>> Groups
>>>> >>> >>> >> >>>>>> "kafka-clients" group.
>>>> >>> >>> >> >>>>>> To unsubscribe from this group and stop receiving
>>>> >>> >>> >> >>>>>> emails
>>>> >>> >>> >> >>>>>> from it,
>>>> >>> >>> >> send
>>>> >>> >>> >> >>>>>> an
>>>> >>> >>> >> >>>>>> email to kafka-clients+unsubscribe@googlegroups.com.
>>>> >>> >>> >> >>>>>> To post to this group, send email to
>>>> >>> >>> >> kafka-clients@googlegroups.com.
>>>> >>> >>> >> >>>>>> Visit this group at
>>>> >>> >>> >> >>>>>> http://groups.google.com/group/kafka-clients
>>>> >>> >>> .
>>>> >>> >>> >> >>>>>> To view this discussion on the web visit
>>>> >>> >>> >> >>>>>>
>>>> >>> >>> >> >>>>>>
>>>> >>> >>> >>
>>>> >>> >>>
>>>> >>> >>>
>>>> >>> >>> https://groups.google.com/d/msgid/kafka-clients/CAHBV8WeUebxi%2B%2BSbjz8E9Yf4u4hkcPJ80Xsj0XTKcTac%3D%2B613A%40mail.gmail.com
>>>> >>> >>> >> >>>>>> .
>>>> >>> >>> >> >>>>>> For more options, visit
>>>> >>> >>> >> >>>>>> https://groups.google.com/d/optout.
>>>> >>> >>> >> >>>>
>>>> >>> >>> >> >>>>
>>>> >>> >>> >> >>>>
>>>> >>> >>> >> >>>>
>>>> >>> >>> >> >>>> --
>>>> >>> >>> >> >>>> Thanks,
>>>> >>> >>> >> >>>> Ewen
>>>> >>> >>> >>
>>>> >>> >>> >
>>>> >>> >>> >
>>>> >>> >>>
>>>> >>> >>>
>>>> >>> >>> --
>>>> >>> >>> Thanks,
>>>> >>> >>> Ewen
>>>> >>> >>>
>>>> >>> >>
>>>> >>> >>
>>>> >>> >>
>>>> >>> >> --
>>>> >>> >> Thanks,
>>>> >>> >> Neha
>>>> >>
>>>> >>
>>>> >> --
>>>> >> You received this message because you are subscribed to the Google
>>>> >> Groups
>>>> >> "kafka-clients" group.
>>>> >> To unsubscribe from this group and stop receiving emails from it, send
>>>> >> an
>>>> >> email to kafka-clients+unsubscribe@googlegroups.com.
>>>> >> To post to this group, send email to kafka-clients@googlegroups.com.
>>>> >> Visit this group at http://groups.google.com/group/kafka-clients.
>>>> >> To view this discussion on the web visit
>>>> >>
>>>> >> https://groups.google.com/d/msgid/kafka-clients/CAOeJiJh17CYq%3D-qgPu9rnArsPW%3D7RL9AAW_h%3DrrXx0%2BKhhKgNQ%40mail.gmail.com.
>>>> >>
>>>> >> For more options, visit https://groups.google.com/d/optout.
>>>> >
>>>> >
>>>
>>>
>>

Re: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to ack>1 on the broker

Posted by Gwen Shapira <gs...@cloudera.com>.
Sounds good to me.
I'll open a new JIRA for 0.8.2 with just an extra log warning, to
avoid making KAFKA-1697 any more confusing.

On Mon, Jan 19, 2015 at 9:46 AM, Joe Stein <jo...@stealth.ly> wrote:
> << For 2, how about we make a change to log a warning for ack > 1 in 0.8.2
> and then drop the ack > 1 support in trunk (w/o bumping up the protocol
> version)?
>
> +1
>
>
> On Mon, Jan 19, 2015 at 12:35 PM, Jun Rao <ju...@confluent.io> wrote:
>>
>> For 2, how about we make a change to log a warning for ack > 1 in 0.8.2
>> and then drop the ack > 1 support in trunk (w/o bumping up the protocol
>> version)? Thanks,
>>
>> Jun
>>
>> On Sun, Jan 18, 2015 at 8:24 PM, Gwen Shapira <gs...@cloudera.com>
>> wrote:
>>>
>>> Overall, agree on point #1, less sure on point #2.
>>>
>>> 1. Some protocols never ever add new errors, while others add errors
>>> without bumping versions. HTTP is a good example of the second type.
>>> HTTP-451 was added fairly recently, there are some errors specific to
>>> NGINX, etc. No one cares. I think we should properly document in the
>>> wire-protocol doc that new errors can be added, and I think we should
>>> strongly suggest (and implement ourselves) that unknown error codes
>>> should be shown to users (or at least logged), so they can be googled
>>> and understood through our documentation.
>>> In addition, hierarchy of error codes, so clients will know if an
>>> error is retry-able just by looking at the code could be nice. Same
>>> for adding an error string to the protocol. These are future
>>> enhancements that should be discussed separately.
>>>
>>> 2. I think we want to allow admins to upgrade their Kafka brokers
>>> without having to chase down clients in their organization and without
>>> getting blamed if clients break. I think it makes sense to have one
>>> version that will support existing behavior, but log warnings, so
>>> admins will know about misbehaving clients and can track them down
>>> before an upgrade that breaks them (or before the broken config causes
>>> them to lose data!). Hopefully this is indeed a very rare behavior and
>>> we are taking extra precaution for nothing, but I have customers where
>>> one traumatic upgrade means they will never upgrade a Kafka again, so
>>> I'm being conservative.
>>>
>>> Gwen
>>>
>>>
>>> On Sun, Jan 18, 2015 at 3:50 PM, Jun Rao <ju...@confluent.io> wrote:
>>> > Overall, I agree with Jay on both points.
>>> >
>>> > 1. I think it's reasonable to add new error codes w/o bumping up the
>>> > protocol version. In most cases, by adding new error codes, we are just
>>> > refining the categorization of those unknown errors. So, a client
>>> > shouldn't
>>> > behave worse than before as long as unknown errors have been properly
>>> > handled.
>>> >
>>> > 2. I think it's reasonable to just document that 0.8.2 will be the last
>>> > release that will support ack > 1 and remove the support completely in
>>> > trunk
>>> > w/o bumping up the protocol. This is because (a) we never included ack
>>> > > 1
>>> > explicitly in the documentation and so the usage should be limited; (2)
>>> > ack
>>> >> 1 doesn't provide the guarantee that people really want and so it
>>> > shouldn't really be used.
>>> >
>>> > Thanks,
>>> >
>>> > Jun
>>> >
>>> >
>>> > On Sun, Jan 18, 2015 at 11:03 AM, Jay Kreps <ja...@gmail.com>
>>> > wrote:
>>> >>
>>> >> Hey guys,
>>> >>
>>> >> I really think we are discussing two things here:
>>> >>
>>> >> How should we generally handle changes to the set of errors? Should
>>> >> introducing new errors be considered a protocol change or should we
>>> >> reserve
>>> >> the right to introduce new error codes?
>>> >> Given that this particular change is possibly incompatible, how should
>>> >> we
>>> >> handle it?
>>> >>
>>> >> I think it would be good for people who are responding here to be
>>> >> specific
>>> >> about which they are addressing.
>>> >>
>>> >> Here is what I think:
>>> >>
>>> >> 1. Errors should be extensible within a protocol version.
>>> >>
>>> >> We should change the protocol documentation to list the errors that
>>> >> can be
>>> >> given back from each api, their meaning, and how to handle them, BUT
>>> >> we
>>> >> should explicitly state that the set of errors are open ended. That is
>>> >> we
>>> >> should reserve the right to introduce new errors and explicitly state
>>> >> that
>>> >> clients need a blanket "unknown error" handling mechanism. The error
>>> >> can
>>> >> link to the protocol definition (something like "Unknown error 42, see
>>> >> protocol definition at http://link"). We could make this work really
>>> >> well by
>>> >> instructing all the clients to report the error in a very googlable
>>> >> way as
>>> >> Oracle does with their error format (e.g. "ORA-32") so that if you
>>> >> ever get
>>> >> the raw error google will take you to the definition.
>>> >>
>>> >> I agree that a more rigid definition seems like "right thing", but
>>> >> having
>>> >> just implemented two clients and spent a bunch of time on the server
>>> >> side, I
>>> >> think, it will work out poorly in practice. Here is why:
>>> >>
>>> >> I think we will make a lot of mistakes in nailing down the set of
>>> >> error
>>> >> codes up front and we will end up going through 3-4 churns of the
>>> >> protocol
>>> >> definition just realizing the set of errors that can be thrown. I
>>> >> think this
>>> >> churn will actually make life worse for clients that now have to
>>> >> figure out
>>> >> 7 identical versions of the protocol and will be a mess in terms of
>>> >> testing
>>> >> on the server side. I actually know this to be true because while
>>> >> implementing the clients I tried to guess the errors that could be
>>> >> thrown,
>>> >> then checked my guess by close code inspection. It turned out that I
>>> >> always
>>> >> missed things in my belief about errors, but more importantly even
>>> >> after
>>> >> close code inspection I found tons of other errors in my stress
>>> >> testing.
>>> >> In practice error handling always involves calling out one or two
>>> >> meaningful failures that have special recovery and then a blanket case
>>> >> that
>>> >> just handles everything else. It's true that some clients may not have
>>> >> done
>>> >> this well, but I think it is for the best if they fix that.
>>> >> Reserving the right to add errors doesn't mean we will do it without
>>> >> care.
>>> >> We will think through each change and decide whether giving a little
>>> >> more
>>> >> precision in the error is worth the overhead and churn of a protocol
>>> >> version
>>> >> bump.
>>> >>
>>> >> 2. In this case in particular we should not introduce a new protocol
>>> >> version
>>> >>
>>> >> In this particular case we are saying that acks > 1 doesn't make sense
>>> >> and
>>> >> we want to give an error to people specifying this so that they change
>>> >> their
>>> >> configuration. This is a configuration that few people use and we want
>>> >> to
>>> >> just make it an error. The bad behavior will just be that the error
>>> >> will not
>>> >> be as good as it could be. I think that is a better tradeoff than
>>> >> introducing a separate protocol version (this may be true of the java
>>> >> clients too).
>>> >>
>>> >> We will have lots of cases like this in the future and we aren't going
>>> >> to
>>> >> want to churn the protocol for each of them. For example we previously
>>> >> had
>>> >> to get more precise about which characters were legal and which
>>> >> illegal in
>>> >> topic names.
>>> >>
>>> >> -Jay
>>> >>
>>> >> On Fri, Jan 16, 2015 at 11:55 AM, Gwen Shapira <gs...@cloudera.com>
>>> >> wrote:
>>> >>>
>>> >>> I updated the KIP: Using acks > 1 in version 0 will log a WARN
>>> >>> message
>>> >>> in the broker about client using deprecated behavior (suggested by
>>> >>> Joe
>>> >>> in the JIRA, and I think it makes sense).
>>> >>>
>>> >>> Gwen
>>> >>>
>>> >>> On Fri, Jan 16, 2015 at 10:40 AM, Gwen Shapira
>>> >>> <gs...@cloudera.com>
>>> >>> wrote:
>>> >>> > How about we continue the discussion on this thread, so we won't
>>> >>> > lose
>>> >>> > the context of this discussion, and put it up for VOTE when this
>>> >>> > has
>>> >>> > been finalized?
>>> >>> >
>>> >>> > On Fri, Jan 16, 2015 at 10:22 AM, Neha Narkhede <ne...@confluent.io>
>>> >>> > wrote:
>>> >>> >> Gwen,
>>> >>> >>
>>> >>> >> KIP write-up looks good. According to the rest of the KIP process
>>> >>> >> proposal,
>>> >>> >> would you like to start a DISCUSS/VOTE thread for it?
>>> >>> >>
>>> >>> >> Thanks,
>>> >>> >> Neha
>>> >>> >>
>>> >>> >> On Fri, Jan 16, 2015 at 9:37 AM, Ewen Cheslack-Postava
>>> >>> >> <ew...@confluent.io>
>>> >>> >> wrote:
>>> >>> >>
>>> >>> >>> Gwen -- KIP write up looks good. Deprecation schedule probably
>>> >>> >>> needs
>>> >>> >>> to be
>>> >>> >>> more specific, but I think that discussion probably needs to
>>> >>> >>> happen
>>> >>> >>> after a
>>> >>> >>> solution is agreed upon.
>>> >>> >>>
>>> >>> >>> Jay -- I think "older clients will get a bad error message
>>> >>> >>> instead of
>>> >>> >>> a
>>> >>> >>> good one" isn't what would be happening with this change.
>>> >>> >>> Previously
>>> >>> >>> they
>>> >>> >>> wouldn't have received an error and they would have been able to
>>> >>> >>> produce
>>> >>> >>> messages. After the change they'll just receive this new error
>>> >>> >>> message
>>> >>> >>> which their clients can't possibly handle gracefully since it
>>> >>> >>> didn't
>>> >>> >>> exist
>>> >>> >>> when the client was written. Whether the acks > 1 setting was
>>> >>> >>> actually
>>> >>> >>> accomplishing what they thought doesn't matter. Someone could
>>> >>> >>> have
>>> >>> >>> reasonably read the docs on 0.8.1.1, thought acks = 2 is an ok
>>> >>> >>> setting for
>>> >>> >>> their applications, set it as a default across a bunch of apps,
>>> >>> >>> then
>>> >>> >>> follow
>>> >>> >>> the recommended upgrade path of updating brokers to 0.8.2 and all
>>> >>> >>> their
>>> >>> >>> apps will start failing on produce requests.
>>> >>> >>>
>>> >>> >>>
>>> >>> >>> On Thu, Jan 15, 2015 at 8:27 PM, Jay Kreps <ja...@gmail.com>
>>> >>> >>> wrote:
>>> >>> >>>
>>> >>> >>> > This is a good case to discuss.
>>> >>> >>> >
>>> >>> >>> > Let's figure the general case of how we want to handle errors
>>> >>> >>> > and
>>> >>> >>> > get
>>> >>> >>> that
>>> >>> >>> > documented in the protocol. The problem right now is that we
>>> >>> >>> > give
>>> >>> >>> > no
>>> >>> >>> > guidance on this. I actually thought Gwen's suggestion made
>>> >>> >>> > sense
>>> >>> >>> > on the
>>> >>> >>> > guidance we should have given which is that we will enumerate a
>>> >>> >>> > set
>>> >>> >>> > of
>>> >>> >>> > errors and their meaning for each API but it is possible that
>>> >>> >>> > other
>>> >>> >>> errors
>>> >>> >>> > will occur and they should be handled (maybe poorly) in the
>>> >>> >>> > same
>>> >>> >>> > way
>>> >>> >>> > UNKNOWN_ERROR is handled which is our normal escape hatch for
>>> >>> >>> > things like
>>> >>> >>> > OOMException.
>>> >>> >>> >
>>> >>> >>> > I really do think we shouldn't be dogmatic here: In considering
>>> >>> >>> > a
>>> >>> >>> > change
>>> >>> >>> > to errors we should consider the potential ill-effect vs the
>>> >>> >>> > complexity
>>> >>> >>> of
>>> >>> >>> > yet another protocol version.
>>> >>> >>> >
>>> >>> >>> > In this case I actually am not sure we need to bump the
>>> >>> >>> > protocol
>>> >>> >>> > because
>>> >>> >>> > the whole point of the change was to make a setting we think
>>> >>> >>> > doesn't make
>>> >>> >>> > sense break, right? Well this will break it. It seems like the
>>> >>> >>> > only
>>> >>> >>> > downside is that older clients will get a bad error message
>>> >>> >>> > instead
>>> >>> >>> > of a
>>> >>> >>> > good one. But it isn't like we will have rendered a client
>>> >>> >>> > unusable, it
>>> >>> >>> is
>>> >>> >>> > just that they will need to change their config.
>>> >>> >>> >
>>> >>> >>> > -Jay
>>> >>> >>> >
>>> >>> >>> > On Thu, Jan 15, 2015 at 6:14 PM, Gwen Shapira
>>> >>> >>> > <gs...@cloudera.com>
>>> >>> >>> > wrote:
>>> >>> >>> >
>>> >>> >>> >> I created a KIP for this suggestion:
>>> >>> >>> >>
>>> >>> >>> >>
>>> >>> >>> >>
>>> >>> >>>
>>> >>> >>>
>>> >>> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1+-+Remove+support+of+request.required.acks
>>> >>> >>> >>
>>> >>> >>> >> Basically documenting what was already discussed here.
>>> >>> >>> >> Comments
>>> >>> >>> >> will
>>> >>> >>> >> be awesome!
>>> >>> >>> >>
>>> >>> >>> >> Gwen
>>> >>> >>> >>
>>> >>> >>> >> On Thu, Jan 15, 2015 at 5:19 PM, Gwen Shapira
>>> >>> >>> >> <gs...@cloudera.com>
>>> >>> >>> >> wrote:
>>> >>> >>> >> > The errors are part of the KIP process now, so I think the
>>> >>> >>> >> > clients are
>>> >>> >>> >> safe :)
>>> >>> >>> >> >
>>> >>> >>> >> >
>>> >>> >>> >>
>>> >>> >>>
>>> >>> >>>
>>> >>> >>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
>>> >>> >>> >> >
>>> >>> >>> >> > On Thu, Jan 15, 2015 at 5:12 PM, Steve Morin
>>> >>> >>> >> > <st...@gmail.com>
>>> >>> >>> >> wrote:
>>> >>> >>> >> >> Agree errors should be part of the protocol
>>> >>> >>> >> >>
>>> >>> >>> >> >>> On Jan 15, 2015, at 17:59, Gwen Shapira
>>> >>> >>> >> >>> <gs...@cloudera.com>
>>> >>> >>> >> wrote:
>>> >>> >>> >> >>>
>>> >>> >>> >> >>> Hi,
>>> >>> >>> >> >>>
>>> >>> >>> >> >>> I got convinced by Joe and Dana that errors are indeed
>>> >>> >>> >> >>> part of
>>> >>> >>> >> >>> the
>>> >>> >>> >> >>> protocol and can't be randomly added.
>>> >>> >>> >> >>>
>>> >>> >>> >> >>> So, it looks like we need to bump version of
>>> >>> >>> >> >>> ProduceRequest in
>>> >>> >>> >> >>> the
>>> >>> >>> >> >>> following way:
>>> >>> >>> >> >>> Version 0 -> accept acks >1. I think we should keep the
>>> >>> >>> >> >>> existing
>>> >>> >>> >> >>> behavior too (i.e. not replace it with -1) to avoid
>>> >>> >>> >> >>> surprising
>>> >>> >>> >> >>> clients, but I'm willing to hear other opinions.
>>> >>> >>> >> >>> Version 1 -> do not accept acks >1 and return an error.
>>> >>> >>> >> >>> Are we ok with the error I added in KAFKA-1697? We can use
>>> >>> >>> >> >>> something
>>> >>> >>> >> >>> less specific like InvalidRequestParameter. This error can
>>> >>> >>> >> >>> be
>>> >>> >>> >> >>> reused
>>> >>> >>> >> >>> in the future and reduce the need to add errors, but will
>>> >>> >>> >> >>> also
>>> >>> >>> >> >>> be
>>> >>> >>> less
>>> >>> >>> >> >>> clear to the client and its users. Maybe even add the
>>> >>> >>> >> >>> error
>>> >>> >>> >> >>> message
>>> >>> >>> >> >>> string to the protocol in addition to the error code?
>>> >>> >>> >> >>> (since
>>> >>> >>> >> >>> we are
>>> >>> >>> >> >>> bumping versions....)
>>> >>> >>> >> >>>
>>> >>> >>> >> >>> I think maintaining the old version throughout 0.8.X makes
>>> >>> >>> >> >>> sense.
>>> >>> >>> IMO
>>> >>> >>> >> >>> dropping it for 0.9 is feasible, but I'll let client
>>> >>> >>> >> >>> owners
>>> >>> >>> >> >>> help
>>> >>> >>> make
>>> >>> >>> >> >>> that call.
>>> >>> >>> >> >>>
>>> >>> >>> >> >>> Am I missing anything? Should I start a KIP? It seems like
>>> >>> >>> >> >>> a
>>> >>> >>> KIP-type
>>> >>> >>> >> >>> discussion :)
>>> >>> >>> >> >>>
>>> >>> >>> >> >>> Gwen
>>> >>> >>> >> >>>
>>> >>> >>> >> >>>
>>> >>> >>> >> >>> On Thu, Jan 15, 2015 at 2:31 PM, Ewen Cheslack-Postava
>>> >>> >>> >> >>> <ew...@confluent.io> wrote:
>>> >>> >>> >> >>>> Gwen,
>>> >>> >>> >> >>>>
>>> >>> >>> >> >>>> I think the only option that wouldn't require a protocol
>>> >>> >>> >> >>>> version
>>> >>> >>> >> change is
>>> >>> >>> >> >>>> the one where acks > 1 is converted to acks = -1 since
>>> >>> >>> >> >>>> it's
>>> >>> >>> >> >>>> the
>>> >>> >>> only
>>> >>> >>> >> one
>>> >>> >>> >> >>>> that doesn't potentially break older clients. The
>>> >>> >>> >> >>>> protocol
>>> >>> >>> >> >>>> guide
>>> >>> >>> >> says that
>>> >>> >>> >> >>>> the expected upgrade path is servers first, then clients,
>>> >>> >>> >> >>>> so
>>> >>> >>> >> >>>> old
>>> >>> >>> >> clients,
>>> >>> >>> >> >>>> including non-java clients, that may be using acks > 1
>>> >>> >>> >> >>>> should
>>> >>> >>> >> >>>> be
>>> >>> >>> >> able to
>>> >>> >>> >> >>>> work with a new broker version.
>>> >>> >>> >> >>>>
>>> >>> >>> >> >>>> It's more work, but I think dealing with the protocol
>>> >>> >>> >> >>>> change
>>> >>> >>> >> >>>> is the
>>> >>> >>> >> right
>>> >>> >>> >> >>>> thing to do since it eventually gets us to the behavior I
>>> >>> >>> >> >>>> think is
>>> >>> >>> >> better --
>>> >>> >>> >> >>>> the broker should reject requests with invalid values. I
>>> >>> >>> >> >>>> think Joe
>>> >>> >>> >> and I
>>> >>> >>> >> >>>> were basically in agreement. In my mind the major piece
>>> >>> >>> >> >>>> missing
>>> >>> >>> from
>>> >>> >>> >> his
>>> >>> >>> >> >>>> description is how long we're going to maintain his "case
>>> >>> >>> >> >>>> 0"
>>> >>> >>> >> behavior. It's
>>> >>> >>> >> >>>> impractical to maintain old versions forever, but it
>>> >>> >>> >> >>>> sounds
>>> >>> >>> >> >>>> like
>>> >>> >>> >> there
>>> >>> >>> >> >>>> hasn't been a decision on how long to maintain them.
>>> >>> >>> >> >>>> Maybe
>>> >>> >>> >> >>>> that's
>>> >>> >>> >> another
>>> >>> >>> >> >>>> item to add to KIPs -- protocol versions and behavior
>>> >>> >>> >> >>>> need to
>>> >>> >>> >> >>>> be
>>> >>> >>> >> listed as
>>> >>> >>> >> >>>> deprecated and the earliest version in which they'll be
>>> >>> >>> >> >>>> removed
>>> >>> >>> >> should be
>>> >>> >>> >> >>>> specified so users can understand which versions are
>>> >>> >>> >> >>>> guaranteed to
>>> >>> >>> be
>>> >>> >>> >> >>>> compatible, even if they're using (well-written) non-java
>>> >>> >>> >> >>>> clients.
>>> >>> >>> >> >>>>
>>> >>> >>> >> >>>> -Ewen
>>> >>> >>> >> >>>>
>>> >>> >>> >> >>>>
>>> >>> >>> >> >>>>> On Thu, Jan 15, 2015 at 12:52 PM, Dana Powers <
>>> >>> >>> >> dana.powers@gmail.com> wrote:
>>> >>> >>> >> >>>>>
>>> >>> >>> >> >>>>>> clients don't break on unknown errors
>>> >>> >>> >> >>>>>
>>> >>> >>> >> >>>>> maybe true for the official java clients, but I dont
>>> >>> >>> >> >>>>> think
>>> >>> >>> >> >>>>> the
>>> >>> >>> >> assumption
>>> >>> >>> >> >>>>> holds true for community-maintained clients and users of
>>> >>> >>> >> >>>>> those
>>> >>> >>> >> clients.
>>> >>> >>> >> >>>>> kafka-python generally follows the fail-fast philosophy
>>> >>> >>> >> >>>>> and
>>> >>> >>> >> >>>>> raises
>>> >>> >>> >> an
>>> >>> >>> >> >>>>> exception on any unrecognized error code in any server
>>> >>> >>> >> >>>>> response.
>>> >>> >>> >> in this
>>> >>> >>> >> >>>>> case, kafka-python allows users to set their own
>>> >>> >>> >> >>>>> required-acks
>>> >>> >>> >> policy when
>>> >>> >>> >> >>>>> creating a producer instance.  It is possible that users
>>> >>> >>> >> >>>>> of
>>> >>> >>> >> kafka-python
>>> >>> >>> >> >>>>> have deployed producer code that uses ack>1 -- perhaps
>>> >>> >>> >> >>>>> in
>>> >>> >>> production
>>> >>> >>> >> >>>>> environments -- and for those users the new error code
>>> >>> >>> >> >>>>> will
>>> >>> >>> >> >>>>> crash
>>> >>> >>> >> their
>>> >>> >>> >> >>>>> producer code.  I would not be surprised if the same
>>> >>> >>> >> >>>>> were
>>> >>> >>> >> >>>>> true of
>>> >>> >>> >> other
>>> >>> >>> >> >>>>> community clients.
>>> >>> >>> >> >>>>>
>>> >>> >>> >> >>>>> *one reason for the fail-fast approach is that there
>>> >>> >>> >> >>>>> isn't
>>> >>> >>> >> >>>>> great
>>> >>> >>> >> >>>>> documentation on what errors to expect for each request
>>> >>> >>> >> >>>>> /
>>> >>> >>> >> >>>>> response
>>> >>> >>> >> -- so
>>> >>> >>> >> >>>>> we
>>> >>> >>> >> >>>>> use failures to alert that some error case is not
>>> >>> >>> >> >>>>> handled
>>> >>> >>> >> properly.  and
>>> >>> >>> >> >>>>> because of that, introducing new error cases without
>>> >>> >>> >> >>>>> bumping
>>> >>> >>> >> >>>>> the
>>> >>> >>> api
>>> >>> >>> >> >>>>> version is likely to cause those errors to get
>>> >>> >>> >> >>>>> raised/thrown
>>> >>> >>> >> >>>>> all
>>> >>> >>> >> the way
>>> >>> >>> >> >>>>> back up to the user.  of course we (client maintainers)
>>> >>> >>> >> >>>>> can
>>> >>> >>> >> >>>>> fix
>>> >>> >>> the
>>> >>> >>> >> issues
>>> >>> >>> >> >>>>> in the client libraries and suggest users upgrade, but
>>> >>> >>> >> >>>>> it's
>>> >>> >>> >> >>>>> not
>>> >>> >>> the
>>> >>> >>> >> ideal
>>> >>> >>> >> >>>>> situation.
>>> >>> >>> >> >>>>>
>>> >>> >>> >> >>>>>
>>> >>> >>> >> >>>>> long-winded way of saying: I agree w/ Joe.
>>> >>> >>> >> >>>>>
>>> >>> >>> >> >>>>> -Dana
>>> >>> >>> >> >>>>>
>>> >>> >>> >> >>>>>
>>> >>> >>> >> >>>>> On Thu, Jan 15, 2015 at 12:07 PM, Gwen Shapira <
>>> >>> >>> >> gshapira@cloudera.com>
>>> >>> >>> >> >>>>> wrote:
>>> >>> >>> >> >>>>>
>>> >>> >>> >> >>>>>> Is the protocol bump caused by the behavior change or
>>> >>> >>> >> >>>>>> the
>>> >>> >>> >> >>>>>> new
>>> >>> >>> error
>>> >>> >>> >> >>>>>> code?
>>> >>> >>> >> >>>>>>
>>> >>> >>> >> >>>>>> 1) IMO, error_codes are data, and clients can expect to
>>> >>> >>> >> >>>>>> receive
>>> >>> >>> >> errors
>>> >>> >>> >> >>>>>> that they don't understand (i.e. unknown errors).
>>> >>> >>> >> >>>>>> AFAIK,
>>> >>> >>> >> >>>>>> clients
>>> >>> >>> >> don't
>>> >>> >>> >> >>>>>> break on unknown errors, they are simple more
>>> >>> >>> >> >>>>>> challenging
>>> >>> >>> >> >>>>>> to
>>> >>> >>> >> debug. If
>>> >>> >>> >> >>>>>> we document the new behavior, then its definitely
>>> >>> >>> >> >>>>>> debuggable and
>>> >>> >>> >> >>>>>> fixable.
>>> >>> >>> >> >>>>>>
>>> >>> >>> >> >>>>>> 2) The behavior change is basically a deprecation -
>>> >>> >>> >> >>>>>> i.e.
>>> >>> >>> >> >>>>>> acks > 1
>>> >>> >>> >> were
>>> >>> >>> >> >>>>>> never documented, and are not supported by Kafka
>>> >>> >>> >> >>>>>> clients
>>> >>> >>> >> >>>>>> starting
>>> >>> >>> >> with
>>> >>> >>> >> >>>>>> version 0.8.2. I'm not sure this requires a protocol
>>> >>> >>> >> >>>>>> bump
>>> >>> >>> >> >>>>>> either,
>>> >>> >>> >> >>>>>> although its a better case than new error codes.
>>> >>> >>> >> >>>>>>
>>> >>> >>> >> >>>>>> Thanks,
>>> >>> >>> >> >>>>>> Gwen
>>> >>> >>> >> >>>>>>
>>> >>> >>> >> >>>>>> On Thu, Jan 15, 2015 at 10:10 AM, Joe Stein <
>>> >>> >>> joe.stein@stealth.ly>
>>> >>> >>> >> >>>>>> wrote:
>>> >>> >>> >> >>>>>>> Looping in the mailing list that the client developers
>>> >>> >>> >> >>>>>>> live on
>>> >>> >>> >> because
>>> >>> >>> >> >>>>>> they
>>> >>> >>> >> >>>>>>> are all not on dev (though they should be if they want
>>> >>> >>> >> >>>>>>> to
>>> >>> >>> >> >>>>>>> be
>>> >>> >>> >> helping
>>> >>> >>> >> >>>>>>> to
>>> >>> >>> >> >>>>>>> build the best client libraries they can).
>>> >>> >>> >> >>>>>>>
>>> >>> >>> >> >>>>>>> I whole hardily believe that we need to not break
>>> >>> >>> >> >>>>>>> existing
>>> >>> >>> >> >>>>>>> functionality
>>> >>> >>> >> >>>>>> of
>>> >>> >>> >> >>>>>>> the client protocol, ever.
>>> >>> >>> >> >>>>>>>
>>> >>> >>> >> >>>>>>> There are many reasons for this and we have other
>>> >>> >>> >> >>>>>>> threads
>>> >>> >>> >> >>>>>>> on the
>>> >>> >>> >> >>>>>>> mailing
>>> >>> >>> >> >>>>>>> list where we are discussing that topic (no pun
>>> >>> >>> >> >>>>>>> intended)
>>> >>> >>> >> >>>>>>> that I
>>> >>> >>> >> don't
>>> >>> >>> >> >>>>>> want
>>> >>> >>> >> >>>>>>> to re-hash here.
>>> >>> >>> >> >>>>>>>
>>> >>> >>> >> >>>>>>> If we change wire protocol functionality OR the binary
>>> >>> >>> >> >>>>>>> format
>>> >>> >>> >> (either)
>>> >>> >>> >> >>>>>>> we
>>> >>> >>> >> >>>>>>> must bump version AND treat version as a feature flag
>>> >>> >>> >> >>>>>>> with
>>> >>> >>> >> backward
>>> >>> >>> >> >>>>>>> compatibility support until it is deprecated for some
>>> >>> >>> >> >>>>>>> time
>>> >>> >>> >> >>>>>>> for
>>> >>> >>> >> folks
>>> >>> >>> >> >>>>>>> to
>>> >>> >>> >> >>>>>> deal
>>> >>> >>> >> >>>>>>> with it.
>>> >>> >>> >> >>>>>>>
>>> >>> >>> >> >>>>>>> match version = {
>>> >>> >>> >> >>>>>>> case 0: keepDoingWhatWeWereDoing()
>>> >>> >>> >> >>>>>>> case 1: doNewStuff()
>>> >>> >>> >> >>>>>>> case 2: doEvenMoreNewStuff()
>>> >>> >>> >> >>>>>>> }
>>> >>> >>> >> >>>>>>>
>>> >>> >>> >> >>>>>>> has to be a practice we adopt imho ... I know feature
>>> >>> >>> >> >>>>>>> flags can
>>> >>> >>> be
>>> >>> >>> >> >>>>>> construed
>>> >>> >>> >> >>>>>>> as "messy code" but I am eager to hear another
>>> >>> >>> >> >>>>>>> (better?
>>> >>> >>> >> different?)
>>> >>> >>> >> >>>>>> solution
>>> >>> >>> >> >>>>>>> to this.
>>> >>> >>> >> >>>>>>>
>>> >>> >>> >> >>>>>>> If we don't do a feature flag like this specifically
>>> >>> >>> >> >>>>>>> with
>>> >>> >>> >> >>>>>>> this
>>> >>> >>> >> change
>>> >>> >>> >> >>>>>> then
>>> >>> >>> >> >>>>>>> what happens is that someone upgrades their brokers
>>> >>> >>> >> >>>>>>> with a
>>> >>> >>> rolling
>>> >>> >>> >> >>>>>> restart
>>> >>> >>> >> >>>>>>> in 0.8.3 and every single one of their producer
>>> >>> >>> >> >>>>>>> requests
>>> >>> >>> >> >>>>>>> start
>>> >>> >>> to
>>> >>> >>> >> fail
>>> >>> >>> >> >>>>>> and
>>> >>> >>> >> >>>>>>> they have a major production outage. eeeek!!!!
>>> >>> >>> >> >>>>>>>
>>> >>> >>> >> >>>>>>> I do 100% agree that > 1 makes no sense and we
>>> >>> >>> >> >>>>>>> *REALLY*
>>> >>> >>> >> >>>>>>> need
>>> >>> >>> >> people to
>>> >>> >>> >> >>>>>> start
>>> >>> >>> >> >>>>>>> using 0,1,-1 but we need to-do that in a way that is
>>> >>> >>> >> >>>>>>> going
>>> >>> >>> >> >>>>>>> to
>>> >>> >>> >> work for
>>> >>> >>> >> >>>>>>> everyone.
>>> >>> >>> >> >>>>>>>
>>> >>> >>> >> >>>>>>> Old producers and consumers must keep working with new
>>> >>> >>> >> >>>>>>> brokers
>>> >>> >>> >> and if
>>> >>> >>> >> >>>>>>> we
>>> >>> >>> >> >>>>>> are
>>> >>> >>> >> >>>>>>> not going to support that then I am unclear what the
>>> >>> >>> >> >>>>>>> use
>>> >>> >>> >> >>>>>>> of
>>> >>> >>> >> "version"
>>> >>> >>> >> >>>>>>> is
>>> >>> >>> >> >>>>>>> based on our original intentions of having it because
>>> >>> >>> >> >>>>>>> of
>>> >>> >>> >> >>>>>>> the
>>> >>> >>> >> >>>>>>> 0.7=>-0.8.
>>> >>> >>> >> >>>>>> We
>>> >>> >>> >> >>>>>>> said no more breaking changes when we did that.
>>> >>> >>> >> >>>>>>>
>>> >>> >>> >> >>>>>>> - Joe Stein
>>> >>> >>> >> >>>>>>>
>>> >>> >>> >> >>>>>>> On Thu, Jan 15, 2015 at 12:38 PM, Ewen
>>> >>> >>> >> >>>>>>> Cheslack-Postava <
>>> >>> >>> >> >>>>>> ewen@confluent.io>
>>> >>> >>> >> >>>>>>> wrote:
>>> >>> >>> >> >>>>>>>>
>>> >>> >>> >> >>>>>>>> Right, so this looks like it could create an issue
>>> >>> >>> >> >>>>>>>> similar to
>>> >>> >>> >> what's
>>> >>> >>> >> >>>>>>>> currently being discussed in
>>> >>> >>> >> >>>>>>>> https://issues.apache.org/jira/browse/KAFKA-1649
>>> >>> >>> >> >>>>>>>> where
>>> >>> >>> >> >>>>>>>> users
>>> >>> >>> >> now get
>>> >>> >>> >> >>>>>>>> errors
>>> >>> >>> >> >>>>>>>> under conditions when they previously wouldn't. Old
>>> >>> >>> >> >>>>>>>> clients
>>> >>> >>> won't
>>> >>> >>> >> >>>>>>>> even
>>> >>> >>> >> >>>>>>>> know
>>> >>> >>> >> >>>>>>>> about the error code, so besides failing they won't
>>> >>> >>> >> >>>>>>>> even
>>> >>> >>> >> >>>>>>>> be
>>> >>> >>> able
>>> >>> >>> >> to
>>> >>> >>> >> >>>>>>>> log
>>> >>> >>> >> >>>>>>>> any
>>> >>> >>> >> >>>>>>>> meaningful error messages.
>>> >>> >>> >> >>>>>>>>
>>> >>> >>> >> >>>>>>>> I think there are two options for compatibility:
>>> >>> >>> >> >>>>>>>>
>>> >>> >>> >> >>>>>>>> 1. An alternative change is to remove the ack > 1
>>> >>> >>> >> >>>>>>>> code,
>>> >>> >>> >> >>>>>>>> but
>>> >>> >>> >> silently
>>> >>> >>> >> >>>>>>>> "upgrade" requests with acks > 1 to acks = -1. This
>>> >>> >>> >> >>>>>>>> isn't
>>> >>> >>> >> >>>>>>>> the
>>> >>> >>> >> same as
>>> >>> >>> >> >>>>>>>> other
>>> >>> >>> >> >>>>>>>> changes to behavior since the interaction between the
>>> >>> >>> >> >>>>>>>> client
>>> >>> >>> and
>>> >>> >>> >> >>>>>>>> server
>>> >>> >>> >> >>>>>>>> remains the same, no error codes change, etc. The
>>> >>> >>> >> >>>>>>>> client
>>> >>> >>> >> >>>>>>>> might
>>> >>> >>> >> just
>>> >>> >>> >> >>>>>>>> see
>>> >>> >>> >> >>>>>>>> some increased latency since the message might need
>>> >>> >>> >> >>>>>>>> to be
>>> >>> >>> >> replicated
>>> >>> >>> >> >>>>>>>> to
>>> >>> >>> >> >>>>>>>> more brokers than they requested.
>>> >>> >>> >> >>>>>>>> 2. Split this into two patches, one that bumps the
>>> >>> >>> >> >>>>>>>> protocol
>>> >>> >>> >> version
>>> >>> >>> >> >>>>>>>> on
>>> >>> >>> >> >>>>>>>> that
>>> >>> >>> >> >>>>>>>> message to include the new error code but maintains
>>> >>> >>> >> >>>>>>>> both
>>> >>> >>> >> >>>>>>>> old
>>> >>> >>> (now
>>> >>> >>> >> >>>>>>>> deprecated) and new behavior, then a second that
>>> >>> >>> >> >>>>>>>> would be
>>> >>> >>> >> applied in
>>> >>> >>> >> >>>>>>>> a
>>> >>> >>> >> >>>>>>>> later release that removes the old protocol + code
>>> >>> >>> >> >>>>>>>> for
>>> >>> >>> >> >>>>>>>> handling
>>> >>> >>> >> acks
>>> >>> >>> >> >>>>>> 1.
>>> >>> >>> >> >>>>>>>>
>>> >>> >>> >> >>>>>>>> 2 is probably the right thing to do. If we specify
>>> >>> >>> >> >>>>>>>> the
>>> >>> >>> >> >>>>>>>> release
>>> >>> >>> >> when
>>> >>> >>> >> >>>>>> we'll
>>> >>> >>> >> >>>>>>>> remove the deprecated protocol at the time of
>>> >>> >>> >> >>>>>>>> deprecation
>>> >>> >>> >> >>>>>>>> it
>>> >>> >>> >> makes
>>> >>> >>> >> >>>>>> things
>>> >>> >>> >> >>>>>>>> a
>>> >>> >>> >> >>>>>>>> lot easier for people writing non-java clients and
>>> >>> >>> >> >>>>>>>> could
>>> >>> >>> >> >>>>>>>> give
>>> >>> >>> >> users
>>> >>> >>> >> >>>>>> better
>>> >>> >>> >> >>>>>>>> predictability (e.g. if clients are at most 1 major
>>> >>> >>> >> >>>>>>>> release
>>> >>> >>> >> behind
>>> >>> >>> >> >>>>>>>> brokers,
>>> >>> >>> >> >>>>>>>> they'll remain compatible but possibly use deprecated
>>> >>> >>> features).
>>> >>> >>> >> >>>>>>>>
>>> >>> >>> >> >>>>>>>>
>>> >>> >>> >> >>>>>>>> On Wed, Jan 14, 2015 at 3:51 PM, Gwen Shapira <
>>> >>> >>> >> gshapira@cloudera.com>
>>> >>> >>> >> >>>>>>>> wrote:
>>> >>> >>> >> >>>>>>>>
>>> >>> >>> >> >>>>>>>>> Hi Kafka Devs,
>>> >>> >>> >> >>>>>>>>>
>>> >>> >>> >> >>>>>>>>> We are working on KAFKA-1697 - remove code related
>>> >>> >>> >> >>>>>>>>> to
>>> >>> >>> >> >>>>>>>>> ack>1 on
>>> >>> >>> >> the
>>> >>> >>> >> >>>>>>>>> broker. Per Neha's suggestion, I'd like to give
>>> >>> >>> >> >>>>>>>>> everyone
>>> >>> >>> >> >>>>>>>>> a
>>> >>> >>> >> heads up
>>> >>> >>> >> >>>>>>>>> on
>>> >>> >>> >> >>>>>>>>> what these changes mean.
>>> >>> >>> >> >>>>>>>>>
>>> >>> >>> >> >>>>>>>>> Once this patch is included, any produce requests
>>> >>> >>> >> >>>>>>>>> that
>>> >>> >>> >> >>>>>>>>> include
>>> >>> >>> >> >>>>>>>>> request.required.acks > 1 will result in an
>>> >>> >>> >> >>>>>>>>> exception.
>>> >>> >>> >> >>>>>>>>> This
>>> >>> >>> >> will be
>>> >>> >>> >> >>>>>>>>> InvalidRequiredAcks in new versions (0.8.3 and up, I
>>> >>> >>> >> >>>>>>>>> assume)
>>> >>> >>> and
>>> >>> >>> >> >>>>>>>>> UnknownException in existing versions (sorry, but I
>>> >>> >>> >> >>>>>>>>> can't add
>>> >>> >>> >> error
>>> >>> >>> >> >>>>>>>>> codes retroactively).
>>> >>> >>> >> >>>>>>>>>
>>> >>> >>> >> >>>>>>>>> This behavior is already enforced by 0.8.2 producers
>>> >>> >>> >> >>>>>>>>> (sync and
>>> >>> >>> >> >>>>>>>>> new),
>>> >>> >>> >> >>>>>>>>> but we expect impact on users with older producers
>>> >>> >>> >> >>>>>>>>> that
>>> >>> >>> >> >>>>>>>>> relied
>>> >>> >>> >> on
>>> >>> >>> >> >>>>>>>>> acks
>>> >>> >>> >> >>>>>>>>>> 1 and external clients (i.e python, go, etc).
>>> >>> >>> >> >>>>>>>>>
>>> >>> >>> >> >>>>>>>>> Users who relied on acks > 1 are expected to switch
>>> >>> >>> >> >>>>>>>>> to
>>> >>> >>> >> >>>>>>>>> using
>>> >>> >>> >> acks =
>>> >>> >>> >> >>>>>>>>> -1
>>> >>> >>> >> >>>>>>>>> and a min.isr parameter than matches their user
>>> >>> >>> >> >>>>>>>>> case.
>>> >>> >>> >> >>>>>>>>>
>>> >>> >>> >> >>>>>>>>> This change was discussed in the past in the context
>>> >>> >>> >> >>>>>>>>> of
>>> >>> >>> >> KAFKA-1555
>>> >>> >>> >> >>>>>>>>> (min.isr), but let us know if you have any questions
>>> >>> >>> >> >>>>>>>>> or
>>> >>> >>> concerns
>>> >>> >>> >> >>>>>>>>> regarding this change.
>>> >>> >>> >> >>>>>>>>>
>>> >>> >>> >> >>>>>>>>> Gwen
>>> >>> >>> >> >>>>>>>>
>>> >>> >>> >> >>>>>>>>
>>> >>> >>> >> >>>>>>>>
>>> >>> >>> >> >>>>>>>> --
>>> >>> >>> >> >>>>>>>> Thanks,
>>> >>> >>> >> >>>>>>>> Ewen
>>> >>> >>> >> >>>>>>>
>>> >>> >>> >> >>>>>>>
>>> >>> >>> >> >>>>>>> --
>>> >>> >>> >> >>>>>>> You received this message because you are subscribed
>>> >>> >>> >> >>>>>>> to
>>> >>> >>> >> >>>>>>> the
>>> >>> >>> Google
>>> >>> >>> >> >>>>>>> Groups
>>> >>> >>> >> >>>>>>> "kafka-clients" group.
>>> >>> >>> >> >>>>>>> To unsubscribe from this group and stop receiving
>>> >>> >>> >> >>>>>>> emails
>>> >>> >>> >> >>>>>>> from
>>> >>> >>> it,
>>> >>> >>> >> send
>>> >>> >>> >> >>>>>>> an
>>> >>> >>> >> >>>>>>> email to kafka-clients+unsubscribe@googlegroups.com.
>>> >>> >>> >> >>>>>>> To post to this group, send email to
>>> >>> >>> >> kafka-clients@googlegroups.com.
>>> >>> >>> >> >>>>>>> Visit this group at
>>> >>> >>> http://groups.google.com/group/kafka-clients.
>>> >>> >>> >> >>>>>>> To view this discussion on the web visit
>>> >>> >>> >> >>>>>>
>>> >>> >>> >> >>>>>>
>>> >>> >>> >>
>>> >>> >>>
>>> >>> >>>
>>> >>> >>> https://groups.google.com/d/msgid/kafka-clients/CAA7ooCBtH2JjyQsArdx_%3DV25B4O1QJk0YvOu9U6kYt9sB4aqng%40mail.gmail.com
>>> >>> >>> >> >>>>>> .
>>> >>> >>> >> >>>>>>>
>>> >>> >>> >> >>>>>>> For more options, visit
>>> >>> >>> >> >>>>>>> https://groups.google.com/d/optout.
>>> >>> >>> >> >>>>>>
>>> >>> >>> >> >>>>>> --
>>> >>> >>> >> >>>>>> You received this message because you are subscribed to
>>> >>> >>> >> >>>>>> the
>>> >>> >>> Google
>>> >>> >>> >> >>>>>> Groups
>>> >>> >>> >> >>>>>> "kafka-clients" group.
>>> >>> >>> >> >>>>>> To unsubscribe from this group and stop receiving
>>> >>> >>> >> >>>>>> emails
>>> >>> >>> >> >>>>>> from it,
>>> >>> >>> >> send
>>> >>> >>> >> >>>>>> an
>>> >>> >>> >> >>>>>> email to kafka-clients+unsubscribe@googlegroups.com.
>>> >>> >>> >> >>>>>> To post to this group, send email to
>>> >>> >>> >> kafka-clients@googlegroups.com.
>>> >>> >>> >> >>>>>> Visit this group at
>>> >>> >>> >> >>>>>> http://groups.google.com/group/kafka-clients
>>> >>> >>> .
>>> >>> >>> >> >>>>>> To view this discussion on the web visit
>>> >>> >>> >> >>>>>>
>>> >>> >>> >> >>>>>>
>>> >>> >>> >>
>>> >>> >>>
>>> >>> >>>
>>> >>> >>> https://groups.google.com/d/msgid/kafka-clients/CAHBV8WeUebxi%2B%2BSbjz8E9Yf4u4hkcPJ80Xsj0XTKcTac%3D%2B613A%40mail.gmail.com
>>> >>> >>> >> >>>>>> .
>>> >>> >>> >> >>>>>> For more options, visit
>>> >>> >>> >> >>>>>> https://groups.google.com/d/optout.
>>> >>> >>> >> >>>>
>>> >>> >>> >> >>>>
>>> >>> >>> >> >>>>
>>> >>> >>> >> >>>>
>>> >>> >>> >> >>>> --
>>> >>> >>> >> >>>> Thanks,
>>> >>> >>> >> >>>> Ewen
>>> >>> >>> >>
>>> >>> >>> >
>>> >>> >>> >
>>> >>> >>>
>>> >>> >>>
>>> >>> >>> --
>>> >>> >>> Thanks,
>>> >>> >>> Ewen
>>> >>> >>>
>>> >>> >>
>>> >>> >>
>>> >>> >>
>>> >>> >> --
>>> >>> >> Thanks,
>>> >>> >> Neha
>>> >>
>>> >>
>>> >> --
>>> >> You received this message because you are subscribed to the Google
>>> >> Groups
>>> >> "kafka-clients" group.
>>> >> To unsubscribe from this group and stop receiving emails from it, send
>>> >> an
>>> >> email to kafka-clients+unsubscribe@googlegroups.com.
>>> >> To post to this group, send email to kafka-clients@googlegroups.com.
>>> >> Visit this group at http://groups.google.com/group/kafka-clients.
>>> >> To view this discussion on the web visit
>>> >>
>>> >> https://groups.google.com/d/msgid/kafka-clients/CAOeJiJh17CYq%3D-qgPu9rnArsPW%3D7RL9AAW_h%3DrrXx0%2BKhhKgNQ%40mail.gmail.com.
>>> >>
>>> >> For more options, visit https://groups.google.com/d/optout.
>>> >
>>> >
>>
>>
>

Re: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to ack>1 on the broker

Posted by Joe Stein <jo...@stealth.ly>.
<< For 2, how about we make a change to log a warning for ack > 1 in 0.8.2
and then drop the ack > 1 support in trunk (w/o bumping up the protocol
version)?

+1

On Mon, Jan 19, 2015 at 12:35 PM, Jun Rao <ju...@confluent.io> wrote:

> For 2, how about we make a change to log a warning for ack > 1 in 0.8.2
> and then drop the ack > 1 support in trunk (w/o bumping up the protocol
> version)? Thanks,
>
> Jun
>
> On Sun, Jan 18, 2015 at 8:24 PM, Gwen Shapira <gs...@cloudera.com>
> wrote:
>
>> Overall, agree on point #1, less sure on point #2.
>>
>> 1. Some protocols never ever add new errors, while others add errors
>> without bumping versions. HTTP is a good example of the second type.
>> HTTP-451 was added fairly recently, there are some errors specific to
>> NGINX, etc. No one cares. I think we should properly document in the
>> wire-protocol doc that new errors can be added, and I think we should
>> strongly suggest (and implement ourselves) that unknown error codes
>> should be shown to users (or at least logged), so they can be googled
>> and understood through our documentation.
>> In addition, hierarchy of error codes, so clients will know if an
>> error is retry-able just by looking at the code could be nice. Same
>> for adding an error string to the protocol. These are future
>> enhancements that should be discussed separately.
>>
>> 2. I think we want to allow admins to upgrade their Kafka brokers
>> without having to chase down clients in their organization and without
>> getting blamed if clients break. I think it makes sense to have one
>> version that will support existing behavior, but log warnings, so
>> admins will know about misbehaving clients and can track them down
>> before an upgrade that breaks them (or before the broken config causes
>> them to lose data!). Hopefully this is indeed a very rare behavior and
>> we are taking extra precaution for nothing, but I have customers where
>> one traumatic upgrade means they will never upgrade a Kafka again, so
>> I'm being conservative.
>>
>> Gwen
>>
>>
>> On Sun, Jan 18, 2015 at 3:50 PM, Jun Rao <ju...@confluent.io> wrote:
>> > Overall, I agree with Jay on both points.
>> >
>> > 1. I think it's reasonable to add new error codes w/o bumping up the
>> > protocol version. In most cases, by adding new error codes, we are just
>> > refining the categorization of those unknown errors. So, a client
>> shouldn't
>> > behave worse than before as long as unknown errors have been properly
>> > handled.
>> >
>> > 2. I think it's reasonable to just document that 0.8.2 will be the last
>> > release that will support ack > 1 and remove the support completely in
>> trunk
>> > w/o bumping up the protocol. This is because (a) we never included ack
>> > 1
>> > explicitly in the documentation and so the usage should be limited; (2)
>> ack
>> >> 1 doesn't provide the guarantee that people really want and so it
>> > shouldn't really be used.
>> >
>> > Thanks,
>> >
>> > Jun
>> >
>> >
>> > On Sun, Jan 18, 2015 at 11:03 AM, Jay Kreps <ja...@gmail.com>
>> wrote:
>> >>
>> >> Hey guys,
>> >>
>> >> I really think we are discussing two things here:
>> >>
>> >> How should we generally handle changes to the set of errors? Should
>> >> introducing new errors be considered a protocol change or should we
>> reserve
>> >> the right to introduce new error codes?
>> >> Given that this particular change is possibly incompatible, how should
>> we
>> >> handle it?
>> >>
>> >> I think it would be good for people who are responding here to be
>> specific
>> >> about which they are addressing.
>> >>
>> >> Here is what I think:
>> >>
>> >> 1. Errors should be extensible within a protocol version.
>> >>
>> >> We should change the protocol documentation to list the errors that
>> can be
>> >> given back from each api, their meaning, and how to handle them, BUT we
>> >> should explicitly state that the set of errors are open ended. That is
>> we
>> >> should reserve the right to introduce new errors and explicitly state
>> that
>> >> clients need a blanket "unknown error" handling mechanism. The error
>> can
>> >> link to the protocol definition (something like "Unknown error 42, see
>> >> protocol definition at http://link"). We could make this work really
>> well by
>> >> instructing all the clients to report the error in a very googlable
>> way as
>> >> Oracle does with their error format (e.g. "ORA-32") so that if you
>> ever get
>> >> the raw error google will take you to the definition.
>> >>
>> >> I agree that a more rigid definition seems like "right thing", but
>> having
>> >> just implemented two clients and spent a bunch of time on the server
>> side, I
>> >> think, it will work out poorly in practice. Here is why:
>> >>
>> >> I think we will make a lot of mistakes in nailing down the set of error
>> >> codes up front and we will end up going through 3-4 churns of the
>> protocol
>> >> definition just realizing the set of errors that can be thrown. I
>> think this
>> >> churn will actually make life worse for clients that now have to
>> figure out
>> >> 7 identical versions of the protocol and will be a mess in terms of
>> testing
>> >> on the server side. I actually know this to be true because while
>> >> implementing the clients I tried to guess the errors that could be
>> thrown,
>> >> then checked my guess by close code inspection. It turned out that I
>> always
>> >> missed things in my belief about errors, but more importantly even
>> after
>> >> close code inspection I found tons of other errors in my stress
>> testing.
>> >> In practice error handling always involves calling out one or two
>> >> meaningful failures that have special recovery and then a blanket case
>> that
>> >> just handles everything else. It's true that some clients may not have
>> done
>> >> this well, but I think it is for the best if they fix that.
>> >> Reserving the right to add errors doesn't mean we will do it without
>> care.
>> >> We will think through each change and decide whether giving a little
>> more
>> >> precision in the error is worth the overhead and churn of a protocol
>> version
>> >> bump.
>> >>
>> >> 2. In this case in particular we should not introduce a new protocol
>> >> version
>> >>
>> >> In this particular case we are saying that acks > 1 doesn't make sense
>> and
>> >> we want to give an error to people specifying this so that they change
>> their
>> >> configuration. This is a configuration that few people use and we want
>> to
>> >> just make it an error. The bad behavior will just be that the error
>> will not
>> >> be as good as it could be. I think that is a better tradeoff than
>> >> introducing a separate protocol version (this may be true of the java
>> >> clients too).
>> >>
>> >> We will have lots of cases like this in the future and we aren't going
>> to
>> >> want to churn the protocol for each of them. For example we previously
>> had
>> >> to get more precise about which characters were legal and which
>> illegal in
>> >> topic names.
>> >>
>> >> -Jay
>> >>
>> >> On Fri, Jan 16, 2015 at 11:55 AM, Gwen Shapira <gs...@cloudera.com>
>> >> wrote:
>> >>>
>> >>> I updated the KIP: Using acks > 1 in version 0 will log a WARN message
>> >>> in the broker about client using deprecated behavior (suggested by Joe
>> >>> in the JIRA, and I think it makes sense).
>> >>>
>> >>> Gwen
>> >>>
>> >>> On Fri, Jan 16, 2015 at 10:40 AM, Gwen Shapira <gshapira@cloudera.com
>> >
>> >>> wrote:
>> >>> > How about we continue the discussion on this thread, so we won't
>> lose
>> >>> > the context of this discussion, and put it up for VOTE when this has
>> >>> > been finalized?
>> >>> >
>> >>> > On Fri, Jan 16, 2015 at 10:22 AM, Neha Narkhede <ne...@confluent.io>
>> >>> > wrote:
>> >>> >> Gwen,
>> >>> >>
>> >>> >> KIP write-up looks good. According to the rest of the KIP process
>> >>> >> proposal,
>> >>> >> would you like to start a DISCUSS/VOTE thread for it?
>> >>> >>
>> >>> >> Thanks,
>> >>> >> Neha
>> >>> >>
>> >>> >> On Fri, Jan 16, 2015 at 9:37 AM, Ewen Cheslack-Postava
>> >>> >> <ew...@confluent.io>
>> >>> >> wrote:
>> >>> >>
>> >>> >>> Gwen -- KIP write up looks good. Deprecation schedule probably
>> needs
>> >>> >>> to be
>> >>> >>> more specific, but I think that discussion probably needs to
>> happen
>> >>> >>> after a
>> >>> >>> solution is agreed upon.
>> >>> >>>
>> >>> >>> Jay -- I think "older clients will get a bad error message
>> instead of
>> >>> >>> a
>> >>> >>> good one" isn't what would be happening with this change.
>> Previously
>> >>> >>> they
>> >>> >>> wouldn't have received an error and they would have been able to
>> >>> >>> produce
>> >>> >>> messages. After the change they'll just receive this new error
>> >>> >>> message
>> >>> >>> which their clients can't possibly handle gracefully since it
>> didn't
>> >>> >>> exist
>> >>> >>> when the client was written. Whether the acks > 1 setting was
>> >>> >>> actually
>> >>> >>> accomplishing what they thought doesn't matter. Someone could have
>> >>> >>> reasonably read the docs on 0.8.1.1, thought acks = 2 is an ok
>> >>> >>> setting for
>> >>> >>> their applications, set it as a default across a bunch of apps,
>> then
>> >>> >>> follow
>> >>> >>> the recommended upgrade path of updating brokers to 0.8.2 and all
>> >>> >>> their
>> >>> >>> apps will start failing on produce requests.
>> >>> >>>
>> >>> >>>
>> >>> >>> On Thu, Jan 15, 2015 at 8:27 PM, Jay Kreps <ja...@gmail.com>
>> >>> >>> wrote:
>> >>> >>>
>> >>> >>> > This is a good case to discuss.
>> >>> >>> >
>> >>> >>> > Let's figure the general case of how we want to handle errors
>> and
>> >>> >>> > get
>> >>> >>> that
>> >>> >>> > documented in the protocol. The problem right now is that we
>> give
>> >>> >>> > no
>> >>> >>> > guidance on this. I actually thought Gwen's suggestion made
>> sense
>> >>> >>> > on the
>> >>> >>> > guidance we should have given which is that we will enumerate a
>> set
>> >>> >>> > of
>> >>> >>> > errors and their meaning for each API but it is possible that
>> other
>> >>> >>> errors
>> >>> >>> > will occur and they should be handled (maybe poorly) in the same
>> >>> >>> > way
>> >>> >>> > UNKNOWN_ERROR is handled which is our normal escape hatch for
>> >>> >>> > things like
>> >>> >>> > OOMException.
>> >>> >>> >
>> >>> >>> > I really do think we shouldn't be dogmatic here: In considering
>> a
>> >>> >>> > change
>> >>> >>> > to errors we should consider the potential ill-effect vs the
>> >>> >>> > complexity
>> >>> >>> of
>> >>> >>> > yet another protocol version.
>> >>> >>> >
>> >>> >>> > In this case I actually am not sure we need to bump the protocol
>> >>> >>> > because
>> >>> >>> > the whole point of the change was to make a setting we think
>> >>> >>> > doesn't make
>> >>> >>> > sense break, right? Well this will break it. It seems like the
>> only
>> >>> >>> > downside is that older clients will get a bad error message
>> instead
>> >>> >>> > of a
>> >>> >>> > good one. But it isn't like we will have rendered a client
>> >>> >>> > unusable, it
>> >>> >>> is
>> >>> >>> > just that they will need to change their config.
>> >>> >>> >
>> >>> >>> > -Jay
>> >>> >>> >
>> >>> >>> > On Thu, Jan 15, 2015 at 6:14 PM, Gwen Shapira
>> >>> >>> > <gs...@cloudera.com>
>> >>> >>> > wrote:
>> >>> >>> >
>> >>> >>> >> I created a KIP for this suggestion:
>> >>> >>> >>
>> >>> >>> >>
>> >>> >>> >>
>> >>> >>>
>> >>> >>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1+-+Remove+support+of+request.required.acks
>> >>> >>> >>
>> >>> >>> >> Basically documenting what was already discussed here.
>> Comments
>> >>> >>> >> will
>> >>> >>> >> be awesome!
>> >>> >>> >>
>> >>> >>> >> Gwen
>> >>> >>> >>
>> >>> >>> >> On Thu, Jan 15, 2015 at 5:19 PM, Gwen Shapira
>> >>> >>> >> <gs...@cloudera.com>
>> >>> >>> >> wrote:
>> >>> >>> >> > The errors are part of the KIP process now, so I think the
>> >>> >>> >> > clients are
>> >>> >>> >> safe :)
>> >>> >>> >> >
>> >>> >>> >> >
>> >>> >>> >>
>> >>> >>>
>> >>> >>>
>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
>> >>> >>> >> >
>> >>> >>> >> > On Thu, Jan 15, 2015 at 5:12 PM, Steve Morin
>> >>> >>> >> > <st...@gmail.com>
>> >>> >>> >> wrote:
>> >>> >>> >> >> Agree errors should be part of the protocol
>> >>> >>> >> >>
>> >>> >>> >> >>> On Jan 15, 2015, at 17:59, Gwen Shapira
>> >>> >>> >> >>> <gs...@cloudera.com>
>> >>> >>> >> wrote:
>> >>> >>> >> >>>
>> >>> >>> >> >>> Hi,
>> >>> >>> >> >>>
>> >>> >>> >> >>> I got convinced by Joe and Dana that errors are indeed
>> part of
>> >>> >>> >> >>> the
>> >>> >>> >> >>> protocol and can't be randomly added.
>> >>> >>> >> >>>
>> >>> >>> >> >>> So, it looks like we need to bump version of
>> ProduceRequest in
>> >>> >>> >> >>> the
>> >>> >>> >> >>> following way:
>> >>> >>> >> >>> Version 0 -> accept acks >1. I think we should keep the
>> >>> >>> >> >>> existing
>> >>> >>> >> >>> behavior too (i.e. not replace it with -1) to avoid
>> surprising
>> >>> >>> >> >>> clients, but I'm willing to hear other opinions.
>> >>> >>> >> >>> Version 1 -> do not accept acks >1 and return an error.
>> >>> >>> >> >>> Are we ok with the error I added in KAFKA-1697? We can use
>> >>> >>> >> >>> something
>> >>> >>> >> >>> less specific like InvalidRequestParameter. This error can
>> be
>> >>> >>> >> >>> reused
>> >>> >>> >> >>> in the future and reduce the need to add errors, but will
>> also
>> >>> >>> >> >>> be
>> >>> >>> less
>> >>> >>> >> >>> clear to the client and its users. Maybe even add the error
>> >>> >>> >> >>> message
>> >>> >>> >> >>> string to the protocol in addition to the error code?
>> (since
>> >>> >>> >> >>> we are
>> >>> >>> >> >>> bumping versions....)
>> >>> >>> >> >>>
>> >>> >>> >> >>> I think maintaining the old version throughout 0.8.X makes
>> >>> >>> >> >>> sense.
>> >>> >>> IMO
>> >>> >>> >> >>> dropping it for 0.9 is feasible, but I'll let client owners
>> >>> >>> >> >>> help
>> >>> >>> make
>> >>> >>> >> >>> that call.
>> >>> >>> >> >>>
>> >>> >>> >> >>> Am I missing anything? Should I start a KIP? It seems like
>> a
>> >>> >>> KIP-type
>> >>> >>> >> >>> discussion :)
>> >>> >>> >> >>>
>> >>> >>> >> >>> Gwen
>> >>> >>> >> >>>
>> >>> >>> >> >>>
>> >>> >>> >> >>> On Thu, Jan 15, 2015 at 2:31 PM, Ewen Cheslack-Postava
>> >>> >>> >> >>> <ew...@confluent.io> wrote:
>> >>> >>> >> >>>> Gwen,
>> >>> >>> >> >>>>
>> >>> >>> >> >>>> I think the only option that wouldn't require a protocol
>> >>> >>> >> >>>> version
>> >>> >>> >> change is
>> >>> >>> >> >>>> the one where acks > 1 is converted to acks = -1 since
>> it's
>> >>> >>> >> >>>> the
>> >>> >>> only
>> >>> >>> >> one
>> >>> >>> >> >>>> that doesn't potentially break older clients. The protocol
>> >>> >>> >> >>>> guide
>> >>> >>> >> says that
>> >>> >>> >> >>>> the expected upgrade path is servers first, then clients,
>> so
>> >>> >>> >> >>>> old
>> >>> >>> >> clients,
>> >>> >>> >> >>>> including non-java clients, that may be using acks > 1
>> should
>> >>> >>> >> >>>> be
>> >>> >>> >> able to
>> >>> >>> >> >>>> work with a new broker version.
>> >>> >>> >> >>>>
>> >>> >>> >> >>>> It's more work, but I think dealing with the protocol
>> change
>> >>> >>> >> >>>> is the
>> >>> >>> >> right
>> >>> >>> >> >>>> thing to do since it eventually gets us to the behavior I
>> >>> >>> >> >>>> think is
>> >>> >>> >> better --
>> >>> >>> >> >>>> the broker should reject requests with invalid values. I
>> >>> >>> >> >>>> think Joe
>> >>> >>> >> and I
>> >>> >>> >> >>>> were basically in agreement. In my mind the major piece
>> >>> >>> >> >>>> missing
>> >>> >>> from
>> >>> >>> >> his
>> >>> >>> >> >>>> description is how long we're going to maintain his "case
>> 0"
>> >>> >>> >> behavior. It's
>> >>> >>> >> >>>> impractical to maintain old versions forever, but it
>> sounds
>> >>> >>> >> >>>> like
>> >>> >>> >> there
>> >>> >>> >> >>>> hasn't been a decision on how long to maintain them. Maybe
>> >>> >>> >> >>>> that's
>> >>> >>> >> another
>> >>> >>> >> >>>> item to add to KIPs -- protocol versions and behavior
>> need to
>> >>> >>> >> >>>> be
>> >>> >>> >> listed as
>> >>> >>> >> >>>> deprecated and the earliest version in which they'll be
>> >>> >>> >> >>>> removed
>> >>> >>> >> should be
>> >>> >>> >> >>>> specified so users can understand which versions are
>> >>> >>> >> >>>> guaranteed to
>> >>> >>> be
>> >>> >>> >> >>>> compatible, even if they're using (well-written) non-java
>> >>> >>> >> >>>> clients.
>> >>> >>> >> >>>>
>> >>> >>> >> >>>> -Ewen
>> >>> >>> >> >>>>
>> >>> >>> >> >>>>
>> >>> >>> >> >>>>> On Thu, Jan 15, 2015 at 12:52 PM, Dana Powers <
>> >>> >>> >> dana.powers@gmail.com> wrote:
>> >>> >>> >> >>>>>
>> >>> >>> >> >>>>>> clients don't break on unknown errors
>> >>> >>> >> >>>>>
>> >>> >>> >> >>>>> maybe true for the official java clients, but I dont
>> think
>> >>> >>> >> >>>>> the
>> >>> >>> >> assumption
>> >>> >>> >> >>>>> holds true for community-maintained clients and users of
>> >>> >>> >> >>>>> those
>> >>> >>> >> clients.
>> >>> >>> >> >>>>> kafka-python generally follows the fail-fast philosophy
>> and
>> >>> >>> >> >>>>> raises
>> >>> >>> >> an
>> >>> >>> >> >>>>> exception on any unrecognized error code in any server
>> >>> >>> >> >>>>> response.
>> >>> >>> >> in this
>> >>> >>> >> >>>>> case, kafka-python allows users to set their own
>> >>> >>> >> >>>>> required-acks
>> >>> >>> >> policy when
>> >>> >>> >> >>>>> creating a producer instance.  It is possible that users
>> of
>> >>> >>> >> kafka-python
>> >>> >>> >> >>>>> have deployed producer code that uses ack>1 -- perhaps in
>> >>> >>> production
>> >>> >>> >> >>>>> environments -- and for those users the new error code
>> will
>> >>> >>> >> >>>>> crash
>> >>> >>> >> their
>> >>> >>> >> >>>>> producer code.  I would not be surprised if the same were
>> >>> >>> >> >>>>> true of
>> >>> >>> >> other
>> >>> >>> >> >>>>> community clients.
>> >>> >>> >> >>>>>
>> >>> >>> >> >>>>> *one reason for the fail-fast approach is that there
>> isn't
>> >>> >>> >> >>>>> great
>> >>> >>> >> >>>>> documentation on what errors to expect for each request /
>> >>> >>> >> >>>>> response
>> >>> >>> >> -- so
>> >>> >>> >> >>>>> we
>> >>> >>> >> >>>>> use failures to alert that some error case is not handled
>> >>> >>> >> properly.  and
>> >>> >>> >> >>>>> because of that, introducing new error cases without
>> bumping
>> >>> >>> >> >>>>> the
>> >>> >>> api
>> >>> >>> >> >>>>> version is likely to cause those errors to get
>> raised/thrown
>> >>> >>> >> >>>>> all
>> >>> >>> >> the way
>> >>> >>> >> >>>>> back up to the user.  of course we (client maintainers)
>> can
>> >>> >>> >> >>>>> fix
>> >>> >>> the
>> >>> >>> >> issues
>> >>> >>> >> >>>>> in the client libraries and suggest users upgrade, but
>> it's
>> >>> >>> >> >>>>> not
>> >>> >>> the
>> >>> >>> >> ideal
>> >>> >>> >> >>>>> situation.
>> >>> >>> >> >>>>>
>> >>> >>> >> >>>>>
>> >>> >>> >> >>>>> long-winded way of saying: I agree w/ Joe.
>> >>> >>> >> >>>>>
>> >>> >>> >> >>>>> -Dana
>> >>> >>> >> >>>>>
>> >>> >>> >> >>>>>
>> >>> >>> >> >>>>> On Thu, Jan 15, 2015 at 12:07 PM, Gwen Shapira <
>> >>> >>> >> gshapira@cloudera.com>
>> >>> >>> >> >>>>> wrote:
>> >>> >>> >> >>>>>
>> >>> >>> >> >>>>>> Is the protocol bump caused by the behavior change or
>> the
>> >>> >>> >> >>>>>> new
>> >>> >>> error
>> >>> >>> >> >>>>>> code?
>> >>> >>> >> >>>>>>
>> >>> >>> >> >>>>>> 1) IMO, error_codes are data, and clients can expect to
>> >>> >>> >> >>>>>> receive
>> >>> >>> >> errors
>> >>> >>> >> >>>>>> that they don't understand (i.e. unknown errors). AFAIK,
>> >>> >>> >> >>>>>> clients
>> >>> >>> >> don't
>> >>> >>> >> >>>>>> break on unknown errors, they are simple more
>> challenging
>> >>> >>> >> >>>>>> to
>> >>> >>> >> debug. If
>> >>> >>> >> >>>>>> we document the new behavior, then its definitely
>> >>> >>> >> >>>>>> debuggable and
>> >>> >>> >> >>>>>> fixable.
>> >>> >>> >> >>>>>>
>> >>> >>> >> >>>>>> 2) The behavior change is basically a deprecation - i.e.
>> >>> >>> >> >>>>>> acks > 1
>> >>> >>> >> were
>> >>> >>> >> >>>>>> never documented, and are not supported by Kafka clients
>> >>> >>> >> >>>>>> starting
>> >>> >>> >> with
>> >>> >>> >> >>>>>> version 0.8.2. I'm not sure this requires a protocol
>> bump
>> >>> >>> >> >>>>>> either,
>> >>> >>> >> >>>>>> although its a better case than new error codes.
>> >>> >>> >> >>>>>>
>> >>> >>> >> >>>>>> Thanks,
>> >>> >>> >> >>>>>> Gwen
>> >>> >>> >> >>>>>>
>> >>> >>> >> >>>>>> On Thu, Jan 15, 2015 at 10:10 AM, Joe Stein <
>> >>> >>> joe.stein@stealth.ly>
>> >>> >>> >> >>>>>> wrote:
>> >>> >>> >> >>>>>>> Looping in the mailing list that the client developers
>> >>> >>> >> >>>>>>> live on
>> >>> >>> >> because
>> >>> >>> >> >>>>>> they
>> >>> >>> >> >>>>>>> are all not on dev (though they should be if they want
>> to
>> >>> >>> >> >>>>>>> be
>> >>> >>> >> helping
>> >>> >>> >> >>>>>>> to
>> >>> >>> >> >>>>>>> build the best client libraries they can).
>> >>> >>> >> >>>>>>>
>> >>> >>> >> >>>>>>> I whole hardily believe that we need to not break
>> existing
>> >>> >>> >> >>>>>>> functionality
>> >>> >>> >> >>>>>> of
>> >>> >>> >> >>>>>>> the client protocol, ever.
>> >>> >>> >> >>>>>>>
>> >>> >>> >> >>>>>>> There are many reasons for this and we have other
>> threads
>> >>> >>> >> >>>>>>> on the
>> >>> >>> >> >>>>>>> mailing
>> >>> >>> >> >>>>>>> list where we are discussing that topic (no pun
>> intended)
>> >>> >>> >> >>>>>>> that I
>> >>> >>> >> don't
>> >>> >>> >> >>>>>> want
>> >>> >>> >> >>>>>>> to re-hash here.
>> >>> >>> >> >>>>>>>
>> >>> >>> >> >>>>>>> If we change wire protocol functionality OR the binary
>> >>> >>> >> >>>>>>> format
>> >>> >>> >> (either)
>> >>> >>> >> >>>>>>> we
>> >>> >>> >> >>>>>>> must bump version AND treat version as a feature flag
>> with
>> >>> >>> >> backward
>> >>> >>> >> >>>>>>> compatibility support until it is deprecated for some
>> time
>> >>> >>> >> >>>>>>> for
>> >>> >>> >> folks
>> >>> >>> >> >>>>>>> to
>> >>> >>> >> >>>>>> deal
>> >>> >>> >> >>>>>>> with it.
>> >>> >>> >> >>>>>>>
>> >>> >>> >> >>>>>>> match version = {
>> >>> >>> >> >>>>>>> case 0: keepDoingWhatWeWereDoing()
>> >>> >>> >> >>>>>>> case 1: doNewStuff()
>> >>> >>> >> >>>>>>> case 2: doEvenMoreNewStuff()
>> >>> >>> >> >>>>>>> }
>> >>> >>> >> >>>>>>>
>> >>> >>> >> >>>>>>> has to be a practice we adopt imho ... I know feature
>> >>> >>> >> >>>>>>> flags can
>> >>> >>> be
>> >>> >>> >> >>>>>> construed
>> >>> >>> >> >>>>>>> as "messy code" but I am eager to hear another (better?
>> >>> >>> >> different?)
>> >>> >>> >> >>>>>> solution
>> >>> >>> >> >>>>>>> to this.
>> >>> >>> >> >>>>>>>
>> >>> >>> >> >>>>>>> If we don't do a feature flag like this specifically
>> with
>> >>> >>> >> >>>>>>> this
>> >>> >>> >> change
>> >>> >>> >> >>>>>> then
>> >>> >>> >> >>>>>>> what happens is that someone upgrades their brokers
>> with a
>> >>> >>> rolling
>> >>> >>> >> >>>>>> restart
>> >>> >>> >> >>>>>>> in 0.8.3 and every single one of their producer
>> requests
>> >>> >>> >> >>>>>>> start
>> >>> >>> to
>> >>> >>> >> fail
>> >>> >>> >> >>>>>> and
>> >>> >>> >> >>>>>>> they have a major production outage. eeeek!!!!
>> >>> >>> >> >>>>>>>
>> >>> >>> >> >>>>>>> I do 100% agree that > 1 makes no sense and we *REALLY*
>> >>> >>> >> >>>>>>> need
>> >>> >>> >> people to
>> >>> >>> >> >>>>>> start
>> >>> >>> >> >>>>>>> using 0,1,-1 but we need to-do that in a way that is
>> going
>> >>> >>> >> >>>>>>> to
>> >>> >>> >> work for
>> >>> >>> >> >>>>>>> everyone.
>> >>> >>> >> >>>>>>>
>> >>> >>> >> >>>>>>> Old producers and consumers must keep working with new
>> >>> >>> >> >>>>>>> brokers
>> >>> >>> >> and if
>> >>> >>> >> >>>>>>> we
>> >>> >>> >> >>>>>> are
>> >>> >>> >> >>>>>>> not going to support that then I am unclear what the
>> use
>> >>> >>> >> >>>>>>> of
>> >>> >>> >> "version"
>> >>> >>> >> >>>>>>> is
>> >>> >>> >> >>>>>>> based on our original intentions of having it because
>> of
>> >>> >>> >> >>>>>>> the
>> >>> >>> >> >>>>>>> 0.7=>-0.8.
>> >>> >>> >> >>>>>> We
>> >>> >>> >> >>>>>>> said no more breaking changes when we did that.
>> >>> >>> >> >>>>>>>
>> >>> >>> >> >>>>>>> - Joe Stein
>> >>> >>> >> >>>>>>>
>> >>> >>> >> >>>>>>> On Thu, Jan 15, 2015 at 12:38 PM, Ewen
>> Cheslack-Postava <
>> >>> >>> >> >>>>>> ewen@confluent.io>
>> >>> >>> >> >>>>>>> wrote:
>> >>> >>> >> >>>>>>>>
>> >>> >>> >> >>>>>>>> Right, so this looks like it could create an issue
>> >>> >>> >> >>>>>>>> similar to
>> >>> >>> >> what's
>> >>> >>> >> >>>>>>>> currently being discussed in
>> >>> >>> >> >>>>>>>> https://issues.apache.org/jira/browse/KAFKA-1649
>> where
>> >>> >>> >> >>>>>>>> users
>> >>> >>> >> now get
>> >>> >>> >> >>>>>>>> errors
>> >>> >>> >> >>>>>>>> under conditions when they previously wouldn't. Old
>> >>> >>> >> >>>>>>>> clients
>> >>> >>> won't
>> >>> >>> >> >>>>>>>> even
>> >>> >>> >> >>>>>>>> know
>> >>> >>> >> >>>>>>>> about the error code, so besides failing they won't
>> even
>> >>> >>> >> >>>>>>>> be
>> >>> >>> able
>> >>> >>> >> to
>> >>> >>> >> >>>>>>>> log
>> >>> >>> >> >>>>>>>> any
>> >>> >>> >> >>>>>>>> meaningful error messages.
>> >>> >>> >> >>>>>>>>
>> >>> >>> >> >>>>>>>> I think there are two options for compatibility:
>> >>> >>> >> >>>>>>>>
>> >>> >>> >> >>>>>>>> 1. An alternative change is to remove the ack > 1
>> code,
>> >>> >>> >> >>>>>>>> but
>> >>> >>> >> silently
>> >>> >>> >> >>>>>>>> "upgrade" requests with acks > 1 to acks = -1. This
>> isn't
>> >>> >>> >> >>>>>>>> the
>> >>> >>> >> same as
>> >>> >>> >> >>>>>>>> other
>> >>> >>> >> >>>>>>>> changes to behavior since the interaction between the
>> >>> >>> >> >>>>>>>> client
>> >>> >>> and
>> >>> >>> >> >>>>>>>> server
>> >>> >>> >> >>>>>>>> remains the same, no error codes change, etc. The
>> client
>> >>> >>> >> >>>>>>>> might
>> >>> >>> >> just
>> >>> >>> >> >>>>>>>> see
>> >>> >>> >> >>>>>>>> some increased latency since the message might need
>> to be
>> >>> >>> >> replicated
>> >>> >>> >> >>>>>>>> to
>> >>> >>> >> >>>>>>>> more brokers than they requested.
>> >>> >>> >> >>>>>>>> 2. Split this into two patches, one that bumps the
>> >>> >>> >> >>>>>>>> protocol
>> >>> >>> >> version
>> >>> >>> >> >>>>>>>> on
>> >>> >>> >> >>>>>>>> that
>> >>> >>> >> >>>>>>>> message to include the new error code but maintains
>> both
>> >>> >>> >> >>>>>>>> old
>> >>> >>> (now
>> >>> >>> >> >>>>>>>> deprecated) and new behavior, then a second that
>> would be
>> >>> >>> >> applied in
>> >>> >>> >> >>>>>>>> a
>> >>> >>> >> >>>>>>>> later release that removes the old protocol + code for
>> >>> >>> >> >>>>>>>> handling
>> >>> >>> >> acks
>> >>> >>> >> >>>>>> 1.
>> >>> >>> >> >>>>>>>>
>> >>> >>> >> >>>>>>>> 2 is probably the right thing to do. If we specify the
>> >>> >>> >> >>>>>>>> release
>> >>> >>> >> when
>> >>> >>> >> >>>>>> we'll
>> >>> >>> >> >>>>>>>> remove the deprecated protocol at the time of
>> deprecation
>> >>> >>> >> >>>>>>>> it
>> >>> >>> >> makes
>> >>> >>> >> >>>>>> things
>> >>> >>> >> >>>>>>>> a
>> >>> >>> >> >>>>>>>> lot easier for people writing non-java clients and
>> could
>> >>> >>> >> >>>>>>>> give
>> >>> >>> >> users
>> >>> >>> >> >>>>>> better
>> >>> >>> >> >>>>>>>> predictability (e.g. if clients are at most 1 major
>> >>> >>> >> >>>>>>>> release
>> >>> >>> >> behind
>> >>> >>> >> >>>>>>>> brokers,
>> >>> >>> >> >>>>>>>> they'll remain compatible but possibly use deprecated
>> >>> >>> features).
>> >>> >>> >> >>>>>>>>
>> >>> >>> >> >>>>>>>>
>> >>> >>> >> >>>>>>>> On Wed, Jan 14, 2015 at 3:51 PM, Gwen Shapira <
>> >>> >>> >> gshapira@cloudera.com>
>> >>> >>> >> >>>>>>>> wrote:
>> >>> >>> >> >>>>>>>>
>> >>> >>> >> >>>>>>>>> Hi Kafka Devs,
>> >>> >>> >> >>>>>>>>>
>> >>> >>> >> >>>>>>>>> We are working on KAFKA-1697 - remove code related to
>> >>> >>> >> >>>>>>>>> ack>1 on
>> >>> >>> >> the
>> >>> >>> >> >>>>>>>>> broker. Per Neha's suggestion, I'd like to give
>> everyone
>> >>> >>> >> >>>>>>>>> a
>> >>> >>> >> heads up
>> >>> >>> >> >>>>>>>>> on
>> >>> >>> >> >>>>>>>>> what these changes mean.
>> >>> >>> >> >>>>>>>>>
>> >>> >>> >> >>>>>>>>> Once this patch is included, any produce requests
>> that
>> >>> >>> >> >>>>>>>>> include
>> >>> >>> >> >>>>>>>>> request.required.acks > 1 will result in an
>> exception.
>> >>> >>> >> >>>>>>>>> This
>> >>> >>> >> will be
>> >>> >>> >> >>>>>>>>> InvalidRequiredAcks in new versions (0.8.3 and up, I
>> >>> >>> >> >>>>>>>>> assume)
>> >>> >>> and
>> >>> >>> >> >>>>>>>>> UnknownException in existing versions (sorry, but I
>> >>> >>> >> >>>>>>>>> can't add
>> >>> >>> >> error
>> >>> >>> >> >>>>>>>>> codes retroactively).
>> >>> >>> >> >>>>>>>>>
>> >>> >>> >> >>>>>>>>> This behavior is already enforced by 0.8.2 producers
>> >>> >>> >> >>>>>>>>> (sync and
>> >>> >>> >> >>>>>>>>> new),
>> >>> >>> >> >>>>>>>>> but we expect impact on users with older producers
>> that
>> >>> >>> >> >>>>>>>>> relied
>> >>> >>> >> on
>> >>> >>> >> >>>>>>>>> acks
>> >>> >>> >> >>>>>>>>>> 1 and external clients (i.e python, go, etc).
>> >>> >>> >> >>>>>>>>>
>> >>> >>> >> >>>>>>>>> Users who relied on acks > 1 are expected to switch
>> to
>> >>> >>> >> >>>>>>>>> using
>> >>> >>> >> acks =
>> >>> >>> >> >>>>>>>>> -1
>> >>> >>> >> >>>>>>>>> and a min.isr parameter than matches their user case.
>> >>> >>> >> >>>>>>>>>
>> >>> >>> >> >>>>>>>>> This change was discussed in the past in the context
>> of
>> >>> >>> >> KAFKA-1555
>> >>> >>> >> >>>>>>>>> (min.isr), but let us know if you have any questions
>> or
>> >>> >>> concerns
>> >>> >>> >> >>>>>>>>> regarding this change.
>> >>> >>> >> >>>>>>>>>
>> >>> >>> >> >>>>>>>>> Gwen
>> >>> >>> >> >>>>>>>>
>> >>> >>> >> >>>>>>>>
>> >>> >>> >> >>>>>>>>
>> >>> >>> >> >>>>>>>> --
>> >>> >>> >> >>>>>>>> Thanks,
>> >>> >>> >> >>>>>>>> Ewen
>> >>> >>> >> >>>>>>>
>> >>> >>> >> >>>>>>>
>> >>> >>> >> >>>>>>> --
>> >>> >>> >> >>>>>>> You received this message because you are subscribed to
>> >>> >>> >> >>>>>>> the
>> >>> >>> Google
>> >>> >>> >> >>>>>>> Groups
>> >>> >>> >> >>>>>>> "kafka-clients" group.
>> >>> >>> >> >>>>>>> To unsubscribe from this group and stop receiving
>> emails
>> >>> >>> >> >>>>>>> from
>> >>> >>> it,
>> >>> >>> >> send
>> >>> >>> >> >>>>>>> an
>> >>> >>> >> >>>>>>> email to kafka-clients+unsubscribe@googlegroups.com.
>> >>> >>> >> >>>>>>> To post to this group, send email to
>> >>> >>> >> kafka-clients@googlegroups.com.
>> >>> >>> >> >>>>>>> Visit this group at
>> >>> >>> http://groups.google.com/group/kafka-clients.
>> >>> >>> >> >>>>>>> To view this discussion on the web visit
>> >>> >>> >> >>>>>>
>> >>> >>> >> >>>>>>
>> >>> >>> >>
>> >>> >>>
>> >>> >>>
>> https://groups.google.com/d/msgid/kafka-clients/CAA7ooCBtH2JjyQsArdx_%3DV25B4O1QJk0YvOu9U6kYt9sB4aqng%40mail.gmail.com
>> >>> >>> >> >>>>>> .
>> >>> >>> >> >>>>>>>
>> >>> >>> >> >>>>>>> For more options, visit
>> >>> >>> >> >>>>>>> https://groups.google.com/d/optout.
>> >>> >>> >> >>>>>>
>> >>> >>> >> >>>>>> --
>> >>> >>> >> >>>>>> You received this message because you are subscribed to
>> the
>> >>> >>> Google
>> >>> >>> >> >>>>>> Groups
>> >>> >>> >> >>>>>> "kafka-clients" group.
>> >>> >>> >> >>>>>> To unsubscribe from this group and stop receiving emails
>> >>> >>> >> >>>>>> from it,
>> >>> >>> >> send
>> >>> >>> >> >>>>>> an
>> >>> >>> >> >>>>>> email to kafka-clients+unsubscribe@googlegroups.com.
>> >>> >>> >> >>>>>> To post to this group, send email to
>> >>> >>> >> kafka-clients@googlegroups.com.
>> >>> >>> >> >>>>>> Visit this group at
>> >>> >>> >> >>>>>> http://groups.google.com/group/kafka-clients
>> >>> >>> .
>> >>> >>> >> >>>>>> To view this discussion on the web visit
>> >>> >>> >> >>>>>>
>> >>> >>> >> >>>>>>
>> >>> >>> >>
>> >>> >>>
>> >>> >>>
>> https://groups.google.com/d/msgid/kafka-clients/CAHBV8WeUebxi%2B%2BSbjz8E9Yf4u4hkcPJ80Xsj0XTKcTac%3D%2B613A%40mail.gmail.com
>> >>> >>> >> >>>>>> .
>> >>> >>> >> >>>>>> For more options, visit
>> https://groups.google.com/d/optout.
>> >>> >>> >> >>>>
>> >>> >>> >> >>>>
>> >>> >>> >> >>>>
>> >>> >>> >> >>>>
>> >>> >>> >> >>>> --
>> >>> >>> >> >>>> Thanks,
>> >>> >>> >> >>>> Ewen
>> >>> >>> >>
>> >>> >>> >
>> >>> >>> >
>> >>> >>>
>> >>> >>>
>> >>> >>> --
>> >>> >>> Thanks,
>> >>> >>> Ewen
>> >>> >>>
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> --
>> >>> >> Thanks,
>> >>> >> Neha
>> >>
>> >>
>> >> --
>> >> You received this message because you are subscribed to the Google
>> Groups
>> >> "kafka-clients" group.
>> >> To unsubscribe from this group and stop receiving emails from it, send
>> an
>> >> email to kafka-clients+unsubscribe@googlegroups.com.
>> >> To post to this group, send email to kafka-clients@googlegroups.com.
>> >> Visit this group at http://groups.google.com/group/kafka-clients.
>> >> To view this discussion on the web visit
>> >>
>> https://groups.google.com/d/msgid/kafka-clients/CAOeJiJh17CYq%3D-qgPu9rnArsPW%3D7RL9AAW_h%3DrrXx0%2BKhhKgNQ%40mail.gmail.com
>> .
>> >>
>> >> For more options, visit https://groups.google.com/d/optout.
>> >
>> >
>>
>
>

Re: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to ack>1 on the broker

Posted by Jun Rao <ju...@confluent.io>.
For 2, how about we make a change to log a warning for ack > 1 in 0.8.2 and
then drop the ack > 1 support in trunk (w/o bumping up the protocol
version)? Thanks,

Jun

On Sun, Jan 18, 2015 at 8:24 PM, Gwen Shapira <gs...@cloudera.com> wrote:

> Overall, agree on point #1, less sure on point #2.
>
> 1. Some protocols never ever add new errors, while others add errors
> without bumping versions. HTTP is a good example of the second type.
> HTTP-451 was added fairly recently, there are some errors specific to
> NGINX, etc. No one cares. I think we should properly document in the
> wire-protocol doc that new errors can be added, and I think we should
> strongly suggest (and implement ourselves) that unknown error codes
> should be shown to users (or at least logged), so they can be googled
> and understood through our documentation.
> In addition, hierarchy of error codes, so clients will know if an
> error is retry-able just by looking at the code could be nice. Same
> for adding an error string to the protocol. These are future
> enhancements that should be discussed separately.
>
> 2. I think we want to allow admins to upgrade their Kafka brokers
> without having to chase down clients in their organization and without
> getting blamed if clients break. I think it makes sense to have one
> version that will support existing behavior, but log warnings, so
> admins will know about misbehaving clients and can track them down
> before an upgrade that breaks them (or before the broken config causes
> them to lose data!). Hopefully this is indeed a very rare behavior and
> we are taking extra precaution for nothing, but I have customers where
> one traumatic upgrade means they will never upgrade a Kafka again, so
> I'm being conservative.
>
> Gwen
>
>
> On Sun, Jan 18, 2015 at 3:50 PM, Jun Rao <ju...@confluent.io> wrote:
> > Overall, I agree with Jay on both points.
> >
> > 1. I think it's reasonable to add new error codes w/o bumping up the
> > protocol version. In most cases, by adding new error codes, we are just
> > refining the categorization of those unknown errors. So, a client
> shouldn't
> > behave worse than before as long as unknown errors have been properly
> > handled.
> >
> > 2. I think it's reasonable to just document that 0.8.2 will be the last
> > release that will support ack > 1 and remove the support completely in
> trunk
> > w/o bumping up the protocol. This is because (a) we never included ack >
> 1
> > explicitly in the documentation and so the usage should be limited; (2)
> ack
> >> 1 doesn't provide the guarantee that people really want and so it
> > shouldn't really be used.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Sun, Jan 18, 2015 at 11:03 AM, Jay Kreps <ja...@gmail.com> wrote:
> >>
> >> Hey guys,
> >>
> >> I really think we are discussing two things here:
> >>
> >> How should we generally handle changes to the set of errors? Should
> >> introducing new errors be considered a protocol change or should we
> reserve
> >> the right to introduce new error codes?
> >> Given that this particular change is possibly incompatible, how should
> we
> >> handle it?
> >>
> >> I think it would be good for people who are responding here to be
> specific
> >> about which they are addressing.
> >>
> >> Here is what I think:
> >>
> >> 1. Errors should be extensible within a protocol version.
> >>
> >> We should change the protocol documentation to list the errors that can
> be
> >> given back from each api, their meaning, and how to handle them, BUT we
> >> should explicitly state that the set of errors are open ended. That is
> we
> >> should reserve the right to introduce new errors and explicitly state
> that
> >> clients need a blanket "unknown error" handling mechanism. The error can
> >> link to the protocol definition (something like "Unknown error 42, see
> >> protocol definition at http://link"). We could make this work really
> well by
> >> instructing all the clients to report the error in a very googlable way
> as
> >> Oracle does with their error format (e.g. "ORA-32") so that if you ever
> get
> >> the raw error google will take you to the definition.
> >>
> >> I agree that a more rigid definition seems like "right thing", but
> having
> >> just implemented two clients and spent a bunch of time on the server
> side, I
> >> think, it will work out poorly in practice. Here is why:
> >>
> >> I think we will make a lot of mistakes in nailing down the set of error
> >> codes up front and we will end up going through 3-4 churns of the
> protocol
> >> definition just realizing the set of errors that can be thrown. I think
> this
> >> churn will actually make life worse for clients that now have to figure
> out
> >> 7 identical versions of the protocol and will be a mess in terms of
> testing
> >> on the server side. I actually know this to be true because while
> >> implementing the clients I tried to guess the errors that could be
> thrown,
> >> then checked my guess by close code inspection. It turned out that I
> always
> >> missed things in my belief about errors, but more importantly even after
> >> close code inspection I found tons of other errors in my stress testing.
> >> In practice error handling always involves calling out one or two
> >> meaningful failures that have special recovery and then a blanket case
> that
> >> just handles everything else. It's true that some clients may not have
> done
> >> this well, but I think it is for the best if they fix that.
> >> Reserving the right to add errors doesn't mean we will do it without
> care.
> >> We will think through each change and decide whether giving a little
> more
> >> precision in the error is worth the overhead and churn of a protocol
> version
> >> bump.
> >>
> >> 2. In this case in particular we should not introduce a new protocol
> >> version
> >>
> >> In this particular case we are saying that acks > 1 doesn't make sense
> and
> >> we want to give an error to people specifying this so that they change
> their
> >> configuration. This is a configuration that few people use and we want
> to
> >> just make it an error. The bad behavior will just be that the error
> will not
> >> be as good as it could be. I think that is a better tradeoff than
> >> introducing a separate protocol version (this may be true of the java
> >> clients too).
> >>
> >> We will have lots of cases like this in the future and we aren't going
> to
> >> want to churn the protocol for each of them. For example we previously
> had
> >> to get more precise about which characters were legal and which illegal
> in
> >> topic names.
> >>
> >> -Jay
> >>
> >> On Fri, Jan 16, 2015 at 11:55 AM, Gwen Shapira <gs...@cloudera.com>
> >> wrote:
> >>>
> >>> I updated the KIP: Using acks > 1 in version 0 will log a WARN message
> >>> in the broker about client using deprecated behavior (suggested by Joe
> >>> in the JIRA, and I think it makes sense).
> >>>
> >>> Gwen
> >>>
> >>> On Fri, Jan 16, 2015 at 10:40 AM, Gwen Shapira <gs...@cloudera.com>
> >>> wrote:
> >>> > How about we continue the discussion on this thread, so we won't lose
> >>> > the context of this discussion, and put it up for VOTE when this has
> >>> > been finalized?
> >>> >
> >>> > On Fri, Jan 16, 2015 at 10:22 AM, Neha Narkhede <ne...@confluent.io>
> >>> > wrote:
> >>> >> Gwen,
> >>> >>
> >>> >> KIP write-up looks good. According to the rest of the KIP process
> >>> >> proposal,
> >>> >> would you like to start a DISCUSS/VOTE thread for it?
> >>> >>
> >>> >> Thanks,
> >>> >> Neha
> >>> >>
> >>> >> On Fri, Jan 16, 2015 at 9:37 AM, Ewen Cheslack-Postava
> >>> >> <ew...@confluent.io>
> >>> >> wrote:
> >>> >>
> >>> >>> Gwen -- KIP write up looks good. Deprecation schedule probably
> needs
> >>> >>> to be
> >>> >>> more specific, but I think that discussion probably needs to happen
> >>> >>> after a
> >>> >>> solution is agreed upon.
> >>> >>>
> >>> >>> Jay -- I think "older clients will get a bad error message instead
> of
> >>> >>> a
> >>> >>> good one" isn't what would be happening with this change.
> Previously
> >>> >>> they
> >>> >>> wouldn't have received an error and they would have been able to
> >>> >>> produce
> >>> >>> messages. After the change they'll just receive this new error
> >>> >>> message
> >>> >>> which their clients can't possibly handle gracefully since it
> didn't
> >>> >>> exist
> >>> >>> when the client was written. Whether the acks > 1 setting was
> >>> >>> actually
> >>> >>> accomplishing what they thought doesn't matter. Someone could have
> >>> >>> reasonably read the docs on 0.8.1.1, thought acks = 2 is an ok
> >>> >>> setting for
> >>> >>> their applications, set it as a default across a bunch of apps,
> then
> >>> >>> follow
> >>> >>> the recommended upgrade path of updating brokers to 0.8.2 and all
> >>> >>> their
> >>> >>> apps will start failing on produce requests.
> >>> >>>
> >>> >>>
> >>> >>> On Thu, Jan 15, 2015 at 8:27 PM, Jay Kreps <ja...@gmail.com>
> >>> >>> wrote:
> >>> >>>
> >>> >>> > This is a good case to discuss.
> >>> >>> >
> >>> >>> > Let's figure the general case of how we want to handle errors and
> >>> >>> > get
> >>> >>> that
> >>> >>> > documented in the protocol. The problem right now is that we give
> >>> >>> > no
> >>> >>> > guidance on this. I actually thought Gwen's suggestion made sense
> >>> >>> > on the
> >>> >>> > guidance we should have given which is that we will enumerate a
> set
> >>> >>> > of
> >>> >>> > errors and their meaning for each API but it is possible that
> other
> >>> >>> errors
> >>> >>> > will occur and they should be handled (maybe poorly) in the same
> >>> >>> > way
> >>> >>> > UNKNOWN_ERROR is handled which is our normal escape hatch for
> >>> >>> > things like
> >>> >>> > OOMException.
> >>> >>> >
> >>> >>> > I really do think we shouldn't be dogmatic here: In considering a
> >>> >>> > change
> >>> >>> > to errors we should consider the potential ill-effect vs the
> >>> >>> > complexity
> >>> >>> of
> >>> >>> > yet another protocol version.
> >>> >>> >
> >>> >>> > In this case I actually am not sure we need to bump the protocol
> >>> >>> > because
> >>> >>> > the whole point of the change was to make a setting we think
> >>> >>> > doesn't make
> >>> >>> > sense break, right? Well this will break it. It seems like the
> only
> >>> >>> > downside is that older clients will get a bad error message
> instead
> >>> >>> > of a
> >>> >>> > good one. But it isn't like we will have rendered a client
> >>> >>> > unusable, it
> >>> >>> is
> >>> >>> > just that they will need to change their config.
> >>> >>> >
> >>> >>> > -Jay
> >>> >>> >
> >>> >>> > On Thu, Jan 15, 2015 at 6:14 PM, Gwen Shapira
> >>> >>> > <gs...@cloudera.com>
> >>> >>> > wrote:
> >>> >>> >
> >>> >>> >> I created a KIP for this suggestion:
> >>> >>> >>
> >>> >>> >>
> >>> >>> >>
> >>> >>>
> >>> >>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1+-+Remove+support+of+request.required.acks
> >>> >>> >>
> >>> >>> >> Basically documenting what was already discussed here.  Comments
> >>> >>> >> will
> >>> >>> >> be awesome!
> >>> >>> >>
> >>> >>> >> Gwen
> >>> >>> >>
> >>> >>> >> On Thu, Jan 15, 2015 at 5:19 PM, Gwen Shapira
> >>> >>> >> <gs...@cloudera.com>
> >>> >>> >> wrote:
> >>> >>> >> > The errors are part of the KIP process now, so I think the
> >>> >>> >> > clients are
> >>> >>> >> safe :)
> >>> >>> >> >
> >>> >>> >> >
> >>> >>> >>
> >>> >>>
> >>> >>>
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> >>> >>> >> >
> >>> >>> >> > On Thu, Jan 15, 2015 at 5:12 PM, Steve Morin
> >>> >>> >> > <st...@gmail.com>
> >>> >>> >> wrote:
> >>> >>> >> >> Agree errors should be part of the protocol
> >>> >>> >> >>
> >>> >>> >> >>> On Jan 15, 2015, at 17:59, Gwen Shapira
> >>> >>> >> >>> <gs...@cloudera.com>
> >>> >>> >> wrote:
> >>> >>> >> >>>
> >>> >>> >> >>> Hi,
> >>> >>> >> >>>
> >>> >>> >> >>> I got convinced by Joe and Dana that errors are indeed part
> of
> >>> >>> >> >>> the
> >>> >>> >> >>> protocol and can't be randomly added.
> >>> >>> >> >>>
> >>> >>> >> >>> So, it looks like we need to bump version of ProduceRequest
> in
> >>> >>> >> >>> the
> >>> >>> >> >>> following way:
> >>> >>> >> >>> Version 0 -> accept acks >1. I think we should keep the
> >>> >>> >> >>> existing
> >>> >>> >> >>> behavior too (i.e. not replace it with -1) to avoid
> surprising
> >>> >>> >> >>> clients, but I'm willing to hear other opinions.
> >>> >>> >> >>> Version 1 -> do not accept acks >1 and return an error.
> >>> >>> >> >>> Are we ok with the error I added in KAFKA-1697? We can use
> >>> >>> >> >>> something
> >>> >>> >> >>> less specific like InvalidRequestParameter. This error can
> be
> >>> >>> >> >>> reused
> >>> >>> >> >>> in the future and reduce the need to add errors, but will
> also
> >>> >>> >> >>> be
> >>> >>> less
> >>> >>> >> >>> clear to the client and its users. Maybe even add the error
> >>> >>> >> >>> message
> >>> >>> >> >>> string to the protocol in addition to the error code? (since
> >>> >>> >> >>> we are
> >>> >>> >> >>> bumping versions....)
> >>> >>> >> >>>
> >>> >>> >> >>> I think maintaining the old version throughout 0.8.X makes
> >>> >>> >> >>> sense.
> >>> >>> IMO
> >>> >>> >> >>> dropping it for 0.9 is feasible, but I'll let client owners
> >>> >>> >> >>> help
> >>> >>> make
> >>> >>> >> >>> that call.
> >>> >>> >> >>>
> >>> >>> >> >>> Am I missing anything? Should I start a KIP? It seems like a
> >>> >>> KIP-type
> >>> >>> >> >>> discussion :)
> >>> >>> >> >>>
> >>> >>> >> >>> Gwen
> >>> >>> >> >>>
> >>> >>> >> >>>
> >>> >>> >> >>> On Thu, Jan 15, 2015 at 2:31 PM, Ewen Cheslack-Postava
> >>> >>> >> >>> <ew...@confluent.io> wrote:
> >>> >>> >> >>>> Gwen,
> >>> >>> >> >>>>
> >>> >>> >> >>>> I think the only option that wouldn't require a protocol
> >>> >>> >> >>>> version
> >>> >>> >> change is
> >>> >>> >> >>>> the one where acks > 1 is converted to acks = -1 since it's
> >>> >>> >> >>>> the
> >>> >>> only
> >>> >>> >> one
> >>> >>> >> >>>> that doesn't potentially break older clients. The protocol
> >>> >>> >> >>>> guide
> >>> >>> >> says that
> >>> >>> >> >>>> the expected upgrade path is servers first, then clients,
> so
> >>> >>> >> >>>> old
> >>> >>> >> clients,
> >>> >>> >> >>>> including non-java clients, that may be using acks > 1
> should
> >>> >>> >> >>>> be
> >>> >>> >> able to
> >>> >>> >> >>>> work with a new broker version.
> >>> >>> >> >>>>
> >>> >>> >> >>>> It's more work, but I think dealing with the protocol
> change
> >>> >>> >> >>>> is the
> >>> >>> >> right
> >>> >>> >> >>>> thing to do since it eventually gets us to the behavior I
> >>> >>> >> >>>> think is
> >>> >>> >> better --
> >>> >>> >> >>>> the broker should reject requests with invalid values. I
> >>> >>> >> >>>> think Joe
> >>> >>> >> and I
> >>> >>> >> >>>> were basically in agreement. In my mind the major piece
> >>> >>> >> >>>> missing
> >>> >>> from
> >>> >>> >> his
> >>> >>> >> >>>> description is how long we're going to maintain his "case
> 0"
> >>> >>> >> behavior. It's
> >>> >>> >> >>>> impractical to maintain old versions forever, but it sounds
> >>> >>> >> >>>> like
> >>> >>> >> there
> >>> >>> >> >>>> hasn't been a decision on how long to maintain them. Maybe
> >>> >>> >> >>>> that's
> >>> >>> >> another
> >>> >>> >> >>>> item to add to KIPs -- protocol versions and behavior need
> to
> >>> >>> >> >>>> be
> >>> >>> >> listed as
> >>> >>> >> >>>> deprecated and the earliest version in which they'll be
> >>> >>> >> >>>> removed
> >>> >>> >> should be
> >>> >>> >> >>>> specified so users can understand which versions are
> >>> >>> >> >>>> guaranteed to
> >>> >>> be
> >>> >>> >> >>>> compatible, even if they're using (well-written) non-java
> >>> >>> >> >>>> clients.
> >>> >>> >> >>>>
> >>> >>> >> >>>> -Ewen
> >>> >>> >> >>>>
> >>> >>> >> >>>>
> >>> >>> >> >>>>> On Thu, Jan 15, 2015 at 12:52 PM, Dana Powers <
> >>> >>> >> dana.powers@gmail.com> wrote:
> >>> >>> >> >>>>>
> >>> >>> >> >>>>>> clients don't break on unknown errors
> >>> >>> >> >>>>>
> >>> >>> >> >>>>> maybe true for the official java clients, but I dont think
> >>> >>> >> >>>>> the
> >>> >>> >> assumption
> >>> >>> >> >>>>> holds true for community-maintained clients and users of
> >>> >>> >> >>>>> those
> >>> >>> >> clients.
> >>> >>> >> >>>>> kafka-python generally follows the fail-fast philosophy
> and
> >>> >>> >> >>>>> raises
> >>> >>> >> an
> >>> >>> >> >>>>> exception on any unrecognized error code in any server
> >>> >>> >> >>>>> response.
> >>> >>> >> in this
> >>> >>> >> >>>>> case, kafka-python allows users to set their own
> >>> >>> >> >>>>> required-acks
> >>> >>> >> policy when
> >>> >>> >> >>>>> creating a producer instance.  It is possible that users
> of
> >>> >>> >> kafka-python
> >>> >>> >> >>>>> have deployed producer code that uses ack>1 -- perhaps in
> >>> >>> production
> >>> >>> >> >>>>> environments -- and for those users the new error code
> will
> >>> >>> >> >>>>> crash
> >>> >>> >> their
> >>> >>> >> >>>>> producer code.  I would not be surprised if the same were
> >>> >>> >> >>>>> true of
> >>> >>> >> other
> >>> >>> >> >>>>> community clients.
> >>> >>> >> >>>>>
> >>> >>> >> >>>>> *one reason for the fail-fast approach is that there isn't
> >>> >>> >> >>>>> great
> >>> >>> >> >>>>> documentation on what errors to expect for each request /
> >>> >>> >> >>>>> response
> >>> >>> >> -- so
> >>> >>> >> >>>>> we
> >>> >>> >> >>>>> use failures to alert that some error case is not handled
> >>> >>> >> properly.  and
> >>> >>> >> >>>>> because of that, introducing new error cases without
> bumping
> >>> >>> >> >>>>> the
> >>> >>> api
> >>> >>> >> >>>>> version is likely to cause those errors to get
> raised/thrown
> >>> >>> >> >>>>> all
> >>> >>> >> the way
> >>> >>> >> >>>>> back up to the user.  of course we (client maintainers)
> can
> >>> >>> >> >>>>> fix
> >>> >>> the
> >>> >>> >> issues
> >>> >>> >> >>>>> in the client libraries and suggest users upgrade, but
> it's
> >>> >>> >> >>>>> not
> >>> >>> the
> >>> >>> >> ideal
> >>> >>> >> >>>>> situation.
> >>> >>> >> >>>>>
> >>> >>> >> >>>>>
> >>> >>> >> >>>>> long-winded way of saying: I agree w/ Joe.
> >>> >>> >> >>>>>
> >>> >>> >> >>>>> -Dana
> >>> >>> >> >>>>>
> >>> >>> >> >>>>>
> >>> >>> >> >>>>> On Thu, Jan 15, 2015 at 12:07 PM, Gwen Shapira <
> >>> >>> >> gshapira@cloudera.com>
> >>> >>> >> >>>>> wrote:
> >>> >>> >> >>>>>
> >>> >>> >> >>>>>> Is the protocol bump caused by the behavior change or the
> >>> >>> >> >>>>>> new
> >>> >>> error
> >>> >>> >> >>>>>> code?
> >>> >>> >> >>>>>>
> >>> >>> >> >>>>>> 1) IMO, error_codes are data, and clients can expect to
> >>> >>> >> >>>>>> receive
> >>> >>> >> errors
> >>> >>> >> >>>>>> that they don't understand (i.e. unknown errors). AFAIK,
> >>> >>> >> >>>>>> clients
> >>> >>> >> don't
> >>> >>> >> >>>>>> break on unknown errors, they are simple more challenging
> >>> >>> >> >>>>>> to
> >>> >>> >> debug. If
> >>> >>> >> >>>>>> we document the new behavior, then its definitely
> >>> >>> >> >>>>>> debuggable and
> >>> >>> >> >>>>>> fixable.
> >>> >>> >> >>>>>>
> >>> >>> >> >>>>>> 2) The behavior change is basically a deprecation - i.e.
> >>> >>> >> >>>>>> acks > 1
> >>> >>> >> were
> >>> >>> >> >>>>>> never documented, and are not supported by Kafka clients
> >>> >>> >> >>>>>> starting
> >>> >>> >> with
> >>> >>> >> >>>>>> version 0.8.2. I'm not sure this requires a protocol bump
> >>> >>> >> >>>>>> either,
> >>> >>> >> >>>>>> although its a better case than new error codes.
> >>> >>> >> >>>>>>
> >>> >>> >> >>>>>> Thanks,
> >>> >>> >> >>>>>> Gwen
> >>> >>> >> >>>>>>
> >>> >>> >> >>>>>> On Thu, Jan 15, 2015 at 10:10 AM, Joe Stein <
> >>> >>> joe.stein@stealth.ly>
> >>> >>> >> >>>>>> wrote:
> >>> >>> >> >>>>>>> Looping in the mailing list that the client developers
> >>> >>> >> >>>>>>> live on
> >>> >>> >> because
> >>> >>> >> >>>>>> they
> >>> >>> >> >>>>>>> are all not on dev (though they should be if they want
> to
> >>> >>> >> >>>>>>> be
> >>> >>> >> helping
> >>> >>> >> >>>>>>> to
> >>> >>> >> >>>>>>> build the best client libraries they can).
> >>> >>> >> >>>>>>>
> >>> >>> >> >>>>>>> I whole hardily believe that we need to not break
> existing
> >>> >>> >> >>>>>>> functionality
> >>> >>> >> >>>>>> of
> >>> >>> >> >>>>>>> the client protocol, ever.
> >>> >>> >> >>>>>>>
> >>> >>> >> >>>>>>> There are many reasons for this and we have other
> threads
> >>> >>> >> >>>>>>> on the
> >>> >>> >> >>>>>>> mailing
> >>> >>> >> >>>>>>> list where we are discussing that topic (no pun
> intended)
> >>> >>> >> >>>>>>> that I
> >>> >>> >> don't
> >>> >>> >> >>>>>> want
> >>> >>> >> >>>>>>> to re-hash here.
> >>> >>> >> >>>>>>>
> >>> >>> >> >>>>>>> If we change wire protocol functionality OR the binary
> >>> >>> >> >>>>>>> format
> >>> >>> >> (either)
> >>> >>> >> >>>>>>> we
> >>> >>> >> >>>>>>> must bump version AND treat version as a feature flag
> with
> >>> >>> >> backward
> >>> >>> >> >>>>>>> compatibility support until it is deprecated for some
> time
> >>> >>> >> >>>>>>> for
> >>> >>> >> folks
> >>> >>> >> >>>>>>> to
> >>> >>> >> >>>>>> deal
> >>> >>> >> >>>>>>> with it.
> >>> >>> >> >>>>>>>
> >>> >>> >> >>>>>>> match version = {
> >>> >>> >> >>>>>>> case 0: keepDoingWhatWeWereDoing()
> >>> >>> >> >>>>>>> case 1: doNewStuff()
> >>> >>> >> >>>>>>> case 2: doEvenMoreNewStuff()
> >>> >>> >> >>>>>>> }
> >>> >>> >> >>>>>>>
> >>> >>> >> >>>>>>> has to be a practice we adopt imho ... I know feature
> >>> >>> >> >>>>>>> flags can
> >>> >>> be
> >>> >>> >> >>>>>> construed
> >>> >>> >> >>>>>>> as "messy code" but I am eager to hear another (better?
> >>> >>> >> different?)
> >>> >>> >> >>>>>> solution
> >>> >>> >> >>>>>>> to this.
> >>> >>> >> >>>>>>>
> >>> >>> >> >>>>>>> If we don't do a feature flag like this specifically
> with
> >>> >>> >> >>>>>>> this
> >>> >>> >> change
> >>> >>> >> >>>>>> then
> >>> >>> >> >>>>>>> what happens is that someone upgrades their brokers
> with a
> >>> >>> rolling
> >>> >>> >> >>>>>> restart
> >>> >>> >> >>>>>>> in 0.8.3 and every single one of their producer requests
> >>> >>> >> >>>>>>> start
> >>> >>> to
> >>> >>> >> fail
> >>> >>> >> >>>>>> and
> >>> >>> >> >>>>>>> they have a major production outage. eeeek!!!!
> >>> >>> >> >>>>>>>
> >>> >>> >> >>>>>>> I do 100% agree that > 1 makes no sense and we *REALLY*
> >>> >>> >> >>>>>>> need
> >>> >>> >> people to
> >>> >>> >> >>>>>> start
> >>> >>> >> >>>>>>> using 0,1,-1 but we need to-do that in a way that is
> going
> >>> >>> >> >>>>>>> to
> >>> >>> >> work for
> >>> >>> >> >>>>>>> everyone.
> >>> >>> >> >>>>>>>
> >>> >>> >> >>>>>>> Old producers and consumers must keep working with new
> >>> >>> >> >>>>>>> brokers
> >>> >>> >> and if
> >>> >>> >> >>>>>>> we
> >>> >>> >> >>>>>> are
> >>> >>> >> >>>>>>> not going to support that then I am unclear what the use
> >>> >>> >> >>>>>>> of
> >>> >>> >> "version"
> >>> >>> >> >>>>>>> is
> >>> >>> >> >>>>>>> based on our original intentions of having it because of
> >>> >>> >> >>>>>>> the
> >>> >>> >> >>>>>>> 0.7=>-0.8.
> >>> >>> >> >>>>>> We
> >>> >>> >> >>>>>>> said no more breaking changes when we did that.
> >>> >>> >> >>>>>>>
> >>> >>> >> >>>>>>> - Joe Stein
> >>> >>> >> >>>>>>>
> >>> >>> >> >>>>>>> On Thu, Jan 15, 2015 at 12:38 PM, Ewen Cheslack-Postava
> <
> >>> >>> >> >>>>>> ewen@confluent.io>
> >>> >>> >> >>>>>>> wrote:
> >>> >>> >> >>>>>>>>
> >>> >>> >> >>>>>>>> Right, so this looks like it could create an issue
> >>> >>> >> >>>>>>>> similar to
> >>> >>> >> what's
> >>> >>> >> >>>>>>>> currently being discussed in
> >>> >>> >> >>>>>>>> https://issues.apache.org/jira/browse/KAFKA-1649 where
> >>> >>> >> >>>>>>>> users
> >>> >>> >> now get
> >>> >>> >> >>>>>>>> errors
> >>> >>> >> >>>>>>>> under conditions when they previously wouldn't. Old
> >>> >>> >> >>>>>>>> clients
> >>> >>> won't
> >>> >>> >> >>>>>>>> even
> >>> >>> >> >>>>>>>> know
> >>> >>> >> >>>>>>>> about the error code, so besides failing they won't
> even
> >>> >>> >> >>>>>>>> be
> >>> >>> able
> >>> >>> >> to
> >>> >>> >> >>>>>>>> log
> >>> >>> >> >>>>>>>> any
> >>> >>> >> >>>>>>>> meaningful error messages.
> >>> >>> >> >>>>>>>>
> >>> >>> >> >>>>>>>> I think there are two options for compatibility:
> >>> >>> >> >>>>>>>>
> >>> >>> >> >>>>>>>> 1. An alternative change is to remove the ack > 1 code,
> >>> >>> >> >>>>>>>> but
> >>> >>> >> silently
> >>> >>> >> >>>>>>>> "upgrade" requests with acks > 1 to acks = -1. This
> isn't
> >>> >>> >> >>>>>>>> the
> >>> >>> >> same as
> >>> >>> >> >>>>>>>> other
> >>> >>> >> >>>>>>>> changes to behavior since the interaction between the
> >>> >>> >> >>>>>>>> client
> >>> >>> and
> >>> >>> >> >>>>>>>> server
> >>> >>> >> >>>>>>>> remains the same, no error codes change, etc. The
> client
> >>> >>> >> >>>>>>>> might
> >>> >>> >> just
> >>> >>> >> >>>>>>>> see
> >>> >>> >> >>>>>>>> some increased latency since the message might need to
> be
> >>> >>> >> replicated
> >>> >>> >> >>>>>>>> to
> >>> >>> >> >>>>>>>> more brokers than they requested.
> >>> >>> >> >>>>>>>> 2. Split this into two patches, one that bumps the
> >>> >>> >> >>>>>>>> protocol
> >>> >>> >> version
> >>> >>> >> >>>>>>>> on
> >>> >>> >> >>>>>>>> that
> >>> >>> >> >>>>>>>> message to include the new error code but maintains
> both
> >>> >>> >> >>>>>>>> old
> >>> >>> (now
> >>> >>> >> >>>>>>>> deprecated) and new behavior, then a second that would
> be
> >>> >>> >> applied in
> >>> >>> >> >>>>>>>> a
> >>> >>> >> >>>>>>>> later release that removes the old protocol + code for
> >>> >>> >> >>>>>>>> handling
> >>> >>> >> acks
> >>> >>> >> >>>>>> 1.
> >>> >>> >> >>>>>>>>
> >>> >>> >> >>>>>>>> 2 is probably the right thing to do. If we specify the
> >>> >>> >> >>>>>>>> release
> >>> >>> >> when
> >>> >>> >> >>>>>> we'll
> >>> >>> >> >>>>>>>> remove the deprecated protocol at the time of
> deprecation
> >>> >>> >> >>>>>>>> it
> >>> >>> >> makes
> >>> >>> >> >>>>>> things
> >>> >>> >> >>>>>>>> a
> >>> >>> >> >>>>>>>> lot easier for people writing non-java clients and
> could
> >>> >>> >> >>>>>>>> give
> >>> >>> >> users
> >>> >>> >> >>>>>> better
> >>> >>> >> >>>>>>>> predictability (e.g. if clients are at most 1 major
> >>> >>> >> >>>>>>>> release
> >>> >>> >> behind
> >>> >>> >> >>>>>>>> brokers,
> >>> >>> >> >>>>>>>> they'll remain compatible but possibly use deprecated
> >>> >>> features).
> >>> >>> >> >>>>>>>>
> >>> >>> >> >>>>>>>>
> >>> >>> >> >>>>>>>> On Wed, Jan 14, 2015 at 3:51 PM, Gwen Shapira <
> >>> >>> >> gshapira@cloudera.com>
> >>> >>> >> >>>>>>>> wrote:
> >>> >>> >> >>>>>>>>
> >>> >>> >> >>>>>>>>> Hi Kafka Devs,
> >>> >>> >> >>>>>>>>>
> >>> >>> >> >>>>>>>>> We are working on KAFKA-1697 - remove code related to
> >>> >>> >> >>>>>>>>> ack>1 on
> >>> >>> >> the
> >>> >>> >> >>>>>>>>> broker. Per Neha's suggestion, I'd like to give
> everyone
> >>> >>> >> >>>>>>>>> a
> >>> >>> >> heads up
> >>> >>> >> >>>>>>>>> on
> >>> >>> >> >>>>>>>>> what these changes mean.
> >>> >>> >> >>>>>>>>>
> >>> >>> >> >>>>>>>>> Once this patch is included, any produce requests that
> >>> >>> >> >>>>>>>>> include
> >>> >>> >> >>>>>>>>> request.required.acks > 1 will result in an exception.
> >>> >>> >> >>>>>>>>> This
> >>> >>> >> will be
> >>> >>> >> >>>>>>>>> InvalidRequiredAcks in new versions (0.8.3 and up, I
> >>> >>> >> >>>>>>>>> assume)
> >>> >>> and
> >>> >>> >> >>>>>>>>> UnknownException in existing versions (sorry, but I
> >>> >>> >> >>>>>>>>> can't add
> >>> >>> >> error
> >>> >>> >> >>>>>>>>> codes retroactively).
> >>> >>> >> >>>>>>>>>
> >>> >>> >> >>>>>>>>> This behavior is already enforced by 0.8.2 producers
> >>> >>> >> >>>>>>>>> (sync and
> >>> >>> >> >>>>>>>>> new),
> >>> >>> >> >>>>>>>>> but we expect impact on users with older producers
> that
> >>> >>> >> >>>>>>>>> relied
> >>> >>> >> on
> >>> >>> >> >>>>>>>>> acks
> >>> >>> >> >>>>>>>>>> 1 and external clients (i.e python, go, etc).
> >>> >>> >> >>>>>>>>>
> >>> >>> >> >>>>>>>>> Users who relied on acks > 1 are expected to switch to
> >>> >>> >> >>>>>>>>> using
> >>> >>> >> acks =
> >>> >>> >> >>>>>>>>> -1
> >>> >>> >> >>>>>>>>> and a min.isr parameter than matches their user case.
> >>> >>> >> >>>>>>>>>
> >>> >>> >> >>>>>>>>> This change was discussed in the past in the context
> of
> >>> >>> >> KAFKA-1555
> >>> >>> >> >>>>>>>>> (min.isr), but let us know if you have any questions
> or
> >>> >>> concerns
> >>> >>> >> >>>>>>>>> regarding this change.
> >>> >>> >> >>>>>>>>>
> >>> >>> >> >>>>>>>>> Gwen
> >>> >>> >> >>>>>>>>
> >>> >>> >> >>>>>>>>
> >>> >>> >> >>>>>>>>
> >>> >>> >> >>>>>>>> --
> >>> >>> >> >>>>>>>> Thanks,
> >>> >>> >> >>>>>>>> Ewen
> >>> >>> >> >>>>>>>
> >>> >>> >> >>>>>>>
> >>> >>> >> >>>>>>> --
> >>> >>> >> >>>>>>> You received this message because you are subscribed to
> >>> >>> >> >>>>>>> the
> >>> >>> Google
> >>> >>> >> >>>>>>> Groups
> >>> >>> >> >>>>>>> "kafka-clients" group.
> >>> >>> >> >>>>>>> To unsubscribe from this group and stop receiving emails
> >>> >>> >> >>>>>>> from
> >>> >>> it,
> >>> >>> >> send
> >>> >>> >> >>>>>>> an
> >>> >>> >> >>>>>>> email to kafka-clients+unsubscribe@googlegroups.com.
> >>> >>> >> >>>>>>> To post to this group, send email to
> >>> >>> >> kafka-clients@googlegroups.com.
> >>> >>> >> >>>>>>> Visit this group at
> >>> >>> http://groups.google.com/group/kafka-clients.
> >>> >>> >> >>>>>>> To view this discussion on the web visit
> >>> >>> >> >>>>>>
> >>> >>> >> >>>>>>
> >>> >>> >>
> >>> >>>
> >>> >>>
> https://groups.google.com/d/msgid/kafka-clients/CAA7ooCBtH2JjyQsArdx_%3DV25B4O1QJk0YvOu9U6kYt9sB4aqng%40mail.gmail.com
> >>> >>> >> >>>>>> .
> >>> >>> >> >>>>>>>
> >>> >>> >> >>>>>>> For more options, visit
> >>> >>> >> >>>>>>> https://groups.google.com/d/optout.
> >>> >>> >> >>>>>>
> >>> >>> >> >>>>>> --
> >>> >>> >> >>>>>> You received this message because you are subscribed to
> the
> >>> >>> Google
> >>> >>> >> >>>>>> Groups
> >>> >>> >> >>>>>> "kafka-clients" group.
> >>> >>> >> >>>>>> To unsubscribe from this group and stop receiving emails
> >>> >>> >> >>>>>> from it,
> >>> >>> >> send
> >>> >>> >> >>>>>> an
> >>> >>> >> >>>>>> email to kafka-clients+unsubscribe@googlegroups.com.
> >>> >>> >> >>>>>> To post to this group, send email to
> >>> >>> >> kafka-clients@googlegroups.com.
> >>> >>> >> >>>>>> Visit this group at
> >>> >>> >> >>>>>> http://groups.google.com/group/kafka-clients
> >>> >>> .
> >>> >>> >> >>>>>> To view this discussion on the web visit
> >>> >>> >> >>>>>>
> >>> >>> >> >>>>>>
> >>> >>> >>
> >>> >>>
> >>> >>>
> https://groups.google.com/d/msgid/kafka-clients/CAHBV8WeUebxi%2B%2BSbjz8E9Yf4u4hkcPJ80Xsj0XTKcTac%3D%2B613A%40mail.gmail.com
> >>> >>> >> >>>>>> .
> >>> >>> >> >>>>>> For more options, visit
> https://groups.google.com/d/optout.
> >>> >>> >> >>>>
> >>> >>> >> >>>>
> >>> >>> >> >>>>
> >>> >>> >> >>>>
> >>> >>> >> >>>> --
> >>> >>> >> >>>> Thanks,
> >>> >>> >> >>>> Ewen
> >>> >>> >>
> >>> >>> >
> >>> >>> >
> >>> >>>
> >>> >>>
> >>> >>> --
> >>> >>> Thanks,
> >>> >>> Ewen
> >>> >>>
> >>> >>
> >>> >>
> >>> >>
> >>> >> --
> >>> >> Thanks,
> >>> >> Neha
> >>
> >>
> >> --
> >> You received this message because you are subscribed to the Google
> Groups
> >> "kafka-clients" group.
> >> To unsubscribe from this group and stop receiving emails from it, send
> an
> >> email to kafka-clients+unsubscribe@googlegroups.com.
> >> To post to this group, send email to kafka-clients@googlegroups.com.
> >> Visit this group at http://groups.google.com/group/kafka-clients.
> >> To view this discussion on the web visit
> >>
> https://groups.google.com/d/msgid/kafka-clients/CAOeJiJh17CYq%3D-qgPu9rnArsPW%3D7RL9AAW_h%3DrrXx0%2BKhhKgNQ%40mail.gmail.com
> .
> >>
> >> For more options, visit https://groups.google.com/d/optout.
> >
> >
>

Re: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to ack>1 on the broker

Posted by Gwen Shapira <gs...@cloudera.com>.
Overall, agree on point #1, less sure on point #2.

1. Some protocols never ever add new errors, while others add errors
without bumping versions. HTTP is a good example of the second type.
HTTP-451 was added fairly recently, there are some errors specific to
NGINX, etc. No one cares. I think we should properly document in the
wire-protocol doc that new errors can be added, and I think we should
strongly suggest (and implement ourselves) that unknown error codes
should be shown to users (or at least logged), so they can be googled
and understood through our documentation.
In addition, hierarchy of error codes, so clients will know if an
error is retry-able just by looking at the code could be nice. Same
for adding an error string to the protocol. These are future
enhancements that should be discussed separately.

2. I think we want to allow admins to upgrade their Kafka brokers
without having to chase down clients in their organization and without
getting blamed if clients break. I think it makes sense to have one
version that will support existing behavior, but log warnings, so
admins will know about misbehaving clients and can track them down
before an upgrade that breaks them (or before the broken config causes
them to lose data!). Hopefully this is indeed a very rare behavior and
we are taking extra precaution for nothing, but I have customers where
one traumatic upgrade means they will never upgrade a Kafka again, so
I'm being conservative.

Gwen


On Sun, Jan 18, 2015 at 3:50 PM, Jun Rao <ju...@confluent.io> wrote:
> Overall, I agree with Jay on both points.
>
> 1. I think it's reasonable to add new error codes w/o bumping up the
> protocol version. In most cases, by adding new error codes, we are just
> refining the categorization of those unknown errors. So, a client shouldn't
> behave worse than before as long as unknown errors have been properly
> handled.
>
> 2. I think it's reasonable to just document that 0.8.2 will be the last
> release that will support ack > 1 and remove the support completely in trunk
> w/o bumping up the protocol. This is because (a) we never included ack > 1
> explicitly in the documentation and so the usage should be limited; (2) ack
>> 1 doesn't provide the guarantee that people really want and so it
> shouldn't really be used.
>
> Thanks,
>
> Jun
>
>
> On Sun, Jan 18, 2015 at 11:03 AM, Jay Kreps <ja...@gmail.com> wrote:
>>
>> Hey guys,
>>
>> I really think we are discussing two things here:
>>
>> How should we generally handle changes to the set of errors? Should
>> introducing new errors be considered a protocol change or should we reserve
>> the right to introduce new error codes?
>> Given that this particular change is possibly incompatible, how should we
>> handle it?
>>
>> I think it would be good for people who are responding here to be specific
>> about which they are addressing.
>>
>> Here is what I think:
>>
>> 1. Errors should be extensible within a protocol version.
>>
>> We should change the protocol documentation to list the errors that can be
>> given back from each api, their meaning, and how to handle them, BUT we
>> should explicitly state that the set of errors are open ended. That is we
>> should reserve the right to introduce new errors and explicitly state that
>> clients need a blanket "unknown error" handling mechanism. The error can
>> link to the protocol definition (something like "Unknown error 42, see
>> protocol definition at http://link"). We could make this work really well by
>> instructing all the clients to report the error in a very googlable way as
>> Oracle does with their error format (e.g. "ORA-32") so that if you ever get
>> the raw error google will take you to the definition.
>>
>> I agree that a more rigid definition seems like "right thing", but having
>> just implemented two clients and spent a bunch of time on the server side, I
>> think, it will work out poorly in practice. Here is why:
>>
>> I think we will make a lot of mistakes in nailing down the set of error
>> codes up front and we will end up going through 3-4 churns of the protocol
>> definition just realizing the set of errors that can be thrown. I think this
>> churn will actually make life worse for clients that now have to figure out
>> 7 identical versions of the protocol and will be a mess in terms of testing
>> on the server side. I actually know this to be true because while
>> implementing the clients I tried to guess the errors that could be thrown,
>> then checked my guess by close code inspection. It turned out that I always
>> missed things in my belief about errors, but more importantly even after
>> close code inspection I found tons of other errors in my stress testing.
>> In practice error handling always involves calling out one or two
>> meaningful failures that have special recovery and then a blanket case that
>> just handles everything else. It's true that some clients may not have done
>> this well, but I think it is for the best if they fix that.
>> Reserving the right to add errors doesn't mean we will do it without care.
>> We will think through each change and decide whether giving a little more
>> precision in the error is worth the overhead and churn of a protocol version
>> bump.
>>
>> 2. In this case in particular we should not introduce a new protocol
>> version
>>
>> In this particular case we are saying that acks > 1 doesn't make sense and
>> we want to give an error to people specifying this so that they change their
>> configuration. This is a configuration that few people use and we want to
>> just make it an error. The bad behavior will just be that the error will not
>> be as good as it could be. I think that is a better tradeoff than
>> introducing a separate protocol version (this may be true of the java
>> clients too).
>>
>> We will have lots of cases like this in the future and we aren't going to
>> want to churn the protocol for each of them. For example we previously had
>> to get more precise about which characters were legal and which illegal in
>> topic names.
>>
>> -Jay
>>
>> On Fri, Jan 16, 2015 at 11:55 AM, Gwen Shapira <gs...@cloudera.com>
>> wrote:
>>>
>>> I updated the KIP: Using acks > 1 in version 0 will log a WARN message
>>> in the broker about client using deprecated behavior (suggested by Joe
>>> in the JIRA, and I think it makes sense).
>>>
>>> Gwen
>>>
>>> On Fri, Jan 16, 2015 at 10:40 AM, Gwen Shapira <gs...@cloudera.com>
>>> wrote:
>>> > How about we continue the discussion on this thread, so we won't lose
>>> > the context of this discussion, and put it up for VOTE when this has
>>> > been finalized?
>>> >
>>> > On Fri, Jan 16, 2015 at 10:22 AM, Neha Narkhede <ne...@confluent.io>
>>> > wrote:
>>> >> Gwen,
>>> >>
>>> >> KIP write-up looks good. According to the rest of the KIP process
>>> >> proposal,
>>> >> would you like to start a DISCUSS/VOTE thread for it?
>>> >>
>>> >> Thanks,
>>> >> Neha
>>> >>
>>> >> On Fri, Jan 16, 2015 at 9:37 AM, Ewen Cheslack-Postava
>>> >> <ew...@confluent.io>
>>> >> wrote:
>>> >>
>>> >>> Gwen -- KIP write up looks good. Deprecation schedule probably needs
>>> >>> to be
>>> >>> more specific, but I think that discussion probably needs to happen
>>> >>> after a
>>> >>> solution is agreed upon.
>>> >>>
>>> >>> Jay -- I think "older clients will get a bad error message instead of
>>> >>> a
>>> >>> good one" isn't what would be happening with this change. Previously
>>> >>> they
>>> >>> wouldn't have received an error and they would have been able to
>>> >>> produce
>>> >>> messages. After the change they'll just receive this new error
>>> >>> message
>>> >>> which their clients can't possibly handle gracefully since it didn't
>>> >>> exist
>>> >>> when the client was written. Whether the acks > 1 setting was
>>> >>> actually
>>> >>> accomplishing what they thought doesn't matter. Someone could have
>>> >>> reasonably read the docs on 0.8.1.1, thought acks = 2 is an ok
>>> >>> setting for
>>> >>> their applications, set it as a default across a bunch of apps, then
>>> >>> follow
>>> >>> the recommended upgrade path of updating brokers to 0.8.2 and all
>>> >>> their
>>> >>> apps will start failing on produce requests.
>>> >>>
>>> >>>
>>> >>> On Thu, Jan 15, 2015 at 8:27 PM, Jay Kreps <ja...@gmail.com>
>>> >>> wrote:
>>> >>>
>>> >>> > This is a good case to discuss.
>>> >>> >
>>> >>> > Let's figure the general case of how we want to handle errors and
>>> >>> > get
>>> >>> that
>>> >>> > documented in the protocol. The problem right now is that we give
>>> >>> > no
>>> >>> > guidance on this. I actually thought Gwen's suggestion made sense
>>> >>> > on the
>>> >>> > guidance we should have given which is that we will enumerate a set
>>> >>> > of
>>> >>> > errors and their meaning for each API but it is possible that other
>>> >>> errors
>>> >>> > will occur and they should be handled (maybe poorly) in the same
>>> >>> > way
>>> >>> > UNKNOWN_ERROR is handled which is our normal escape hatch for
>>> >>> > things like
>>> >>> > OOMException.
>>> >>> >
>>> >>> > I really do think we shouldn't be dogmatic here: In considering a
>>> >>> > change
>>> >>> > to errors we should consider the potential ill-effect vs the
>>> >>> > complexity
>>> >>> of
>>> >>> > yet another protocol version.
>>> >>> >
>>> >>> > In this case I actually am not sure we need to bump the protocol
>>> >>> > because
>>> >>> > the whole point of the change was to make a setting we think
>>> >>> > doesn't make
>>> >>> > sense break, right? Well this will break it. It seems like the only
>>> >>> > downside is that older clients will get a bad error message instead
>>> >>> > of a
>>> >>> > good one. But it isn't like we will have rendered a client
>>> >>> > unusable, it
>>> >>> is
>>> >>> > just that they will need to change their config.
>>> >>> >
>>> >>> > -Jay
>>> >>> >
>>> >>> > On Thu, Jan 15, 2015 at 6:14 PM, Gwen Shapira
>>> >>> > <gs...@cloudera.com>
>>> >>> > wrote:
>>> >>> >
>>> >>> >> I created a KIP for this suggestion:
>>> >>> >>
>>> >>> >>
>>> >>> >>
>>> >>>
>>> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1+-+Remove+support+of+request.required.acks
>>> >>> >>
>>> >>> >> Basically documenting what was already discussed here.  Comments
>>> >>> >> will
>>> >>> >> be awesome!
>>> >>> >>
>>> >>> >> Gwen
>>> >>> >>
>>> >>> >> On Thu, Jan 15, 2015 at 5:19 PM, Gwen Shapira
>>> >>> >> <gs...@cloudera.com>
>>> >>> >> wrote:
>>> >>> >> > The errors are part of the KIP process now, so I think the
>>> >>> >> > clients are
>>> >>> >> safe :)
>>> >>> >> >
>>> >>> >> >
>>> >>> >>
>>> >>>
>>> >>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
>>> >>> >> >
>>> >>> >> > On Thu, Jan 15, 2015 at 5:12 PM, Steve Morin
>>> >>> >> > <st...@gmail.com>
>>> >>> >> wrote:
>>> >>> >> >> Agree errors should be part of the protocol
>>> >>> >> >>
>>> >>> >> >>> On Jan 15, 2015, at 17:59, Gwen Shapira
>>> >>> >> >>> <gs...@cloudera.com>
>>> >>> >> wrote:
>>> >>> >> >>>
>>> >>> >> >>> Hi,
>>> >>> >> >>>
>>> >>> >> >>> I got convinced by Joe and Dana that errors are indeed part of
>>> >>> >> >>> the
>>> >>> >> >>> protocol and can't be randomly added.
>>> >>> >> >>>
>>> >>> >> >>> So, it looks like we need to bump version of ProduceRequest in
>>> >>> >> >>> the
>>> >>> >> >>> following way:
>>> >>> >> >>> Version 0 -> accept acks >1. I think we should keep the
>>> >>> >> >>> existing
>>> >>> >> >>> behavior too (i.e. not replace it with -1) to avoid surprising
>>> >>> >> >>> clients, but I'm willing to hear other opinions.
>>> >>> >> >>> Version 1 -> do not accept acks >1 and return an error.
>>> >>> >> >>> Are we ok with the error I added in KAFKA-1697? We can use
>>> >>> >> >>> something
>>> >>> >> >>> less specific like InvalidRequestParameter. This error can be
>>> >>> >> >>> reused
>>> >>> >> >>> in the future and reduce the need to add errors, but will also
>>> >>> >> >>> be
>>> >>> less
>>> >>> >> >>> clear to the client and its users. Maybe even add the error
>>> >>> >> >>> message
>>> >>> >> >>> string to the protocol in addition to the error code? (since
>>> >>> >> >>> we are
>>> >>> >> >>> bumping versions....)
>>> >>> >> >>>
>>> >>> >> >>> I think maintaining the old version throughout 0.8.X makes
>>> >>> >> >>> sense.
>>> >>> IMO
>>> >>> >> >>> dropping it for 0.9 is feasible, but I'll let client owners
>>> >>> >> >>> help
>>> >>> make
>>> >>> >> >>> that call.
>>> >>> >> >>>
>>> >>> >> >>> Am I missing anything? Should I start a KIP? It seems like a
>>> >>> KIP-type
>>> >>> >> >>> discussion :)
>>> >>> >> >>>
>>> >>> >> >>> Gwen
>>> >>> >> >>>
>>> >>> >> >>>
>>> >>> >> >>> On Thu, Jan 15, 2015 at 2:31 PM, Ewen Cheslack-Postava
>>> >>> >> >>> <ew...@confluent.io> wrote:
>>> >>> >> >>>> Gwen,
>>> >>> >> >>>>
>>> >>> >> >>>> I think the only option that wouldn't require a protocol
>>> >>> >> >>>> version
>>> >>> >> change is
>>> >>> >> >>>> the one where acks > 1 is converted to acks = -1 since it's
>>> >>> >> >>>> the
>>> >>> only
>>> >>> >> one
>>> >>> >> >>>> that doesn't potentially break older clients. The protocol
>>> >>> >> >>>> guide
>>> >>> >> says that
>>> >>> >> >>>> the expected upgrade path is servers first, then clients, so
>>> >>> >> >>>> old
>>> >>> >> clients,
>>> >>> >> >>>> including non-java clients, that may be using acks > 1 should
>>> >>> >> >>>> be
>>> >>> >> able to
>>> >>> >> >>>> work with a new broker version.
>>> >>> >> >>>>
>>> >>> >> >>>> It's more work, but I think dealing with the protocol change
>>> >>> >> >>>> is the
>>> >>> >> right
>>> >>> >> >>>> thing to do since it eventually gets us to the behavior I
>>> >>> >> >>>> think is
>>> >>> >> better --
>>> >>> >> >>>> the broker should reject requests with invalid values. I
>>> >>> >> >>>> think Joe
>>> >>> >> and I
>>> >>> >> >>>> were basically in agreement. In my mind the major piece
>>> >>> >> >>>> missing
>>> >>> from
>>> >>> >> his
>>> >>> >> >>>> description is how long we're going to maintain his "case 0"
>>> >>> >> behavior. It's
>>> >>> >> >>>> impractical to maintain old versions forever, but it sounds
>>> >>> >> >>>> like
>>> >>> >> there
>>> >>> >> >>>> hasn't been a decision on how long to maintain them. Maybe
>>> >>> >> >>>> that's
>>> >>> >> another
>>> >>> >> >>>> item to add to KIPs -- protocol versions and behavior need to
>>> >>> >> >>>> be
>>> >>> >> listed as
>>> >>> >> >>>> deprecated and the earliest version in which they'll be
>>> >>> >> >>>> removed
>>> >>> >> should be
>>> >>> >> >>>> specified so users can understand which versions are
>>> >>> >> >>>> guaranteed to
>>> >>> be
>>> >>> >> >>>> compatible, even if they're using (well-written) non-java
>>> >>> >> >>>> clients.
>>> >>> >> >>>>
>>> >>> >> >>>> -Ewen
>>> >>> >> >>>>
>>> >>> >> >>>>
>>> >>> >> >>>>> On Thu, Jan 15, 2015 at 12:52 PM, Dana Powers <
>>> >>> >> dana.powers@gmail.com> wrote:
>>> >>> >> >>>>>
>>> >>> >> >>>>>> clients don't break on unknown errors
>>> >>> >> >>>>>
>>> >>> >> >>>>> maybe true for the official java clients, but I dont think
>>> >>> >> >>>>> the
>>> >>> >> assumption
>>> >>> >> >>>>> holds true for community-maintained clients and users of
>>> >>> >> >>>>> those
>>> >>> >> clients.
>>> >>> >> >>>>> kafka-python generally follows the fail-fast philosophy and
>>> >>> >> >>>>> raises
>>> >>> >> an
>>> >>> >> >>>>> exception on any unrecognized error code in any server
>>> >>> >> >>>>> response.
>>> >>> >> in this
>>> >>> >> >>>>> case, kafka-python allows users to set their own
>>> >>> >> >>>>> required-acks
>>> >>> >> policy when
>>> >>> >> >>>>> creating a producer instance.  It is possible that users of
>>> >>> >> kafka-python
>>> >>> >> >>>>> have deployed producer code that uses ack>1 -- perhaps in
>>> >>> production
>>> >>> >> >>>>> environments -- and for those users the new error code will
>>> >>> >> >>>>> crash
>>> >>> >> their
>>> >>> >> >>>>> producer code.  I would not be surprised if the same were
>>> >>> >> >>>>> true of
>>> >>> >> other
>>> >>> >> >>>>> community clients.
>>> >>> >> >>>>>
>>> >>> >> >>>>> *one reason for the fail-fast approach is that there isn't
>>> >>> >> >>>>> great
>>> >>> >> >>>>> documentation on what errors to expect for each request /
>>> >>> >> >>>>> response
>>> >>> >> -- so
>>> >>> >> >>>>> we
>>> >>> >> >>>>> use failures to alert that some error case is not handled
>>> >>> >> properly.  and
>>> >>> >> >>>>> because of that, introducing new error cases without bumping
>>> >>> >> >>>>> the
>>> >>> api
>>> >>> >> >>>>> version is likely to cause those errors to get raised/thrown
>>> >>> >> >>>>> all
>>> >>> >> the way
>>> >>> >> >>>>> back up to the user.  of course we (client maintainers) can
>>> >>> >> >>>>> fix
>>> >>> the
>>> >>> >> issues
>>> >>> >> >>>>> in the client libraries and suggest users upgrade, but it's
>>> >>> >> >>>>> not
>>> >>> the
>>> >>> >> ideal
>>> >>> >> >>>>> situation.
>>> >>> >> >>>>>
>>> >>> >> >>>>>
>>> >>> >> >>>>> long-winded way of saying: I agree w/ Joe.
>>> >>> >> >>>>>
>>> >>> >> >>>>> -Dana
>>> >>> >> >>>>>
>>> >>> >> >>>>>
>>> >>> >> >>>>> On Thu, Jan 15, 2015 at 12:07 PM, Gwen Shapira <
>>> >>> >> gshapira@cloudera.com>
>>> >>> >> >>>>> wrote:
>>> >>> >> >>>>>
>>> >>> >> >>>>>> Is the protocol bump caused by the behavior change or the
>>> >>> >> >>>>>> new
>>> >>> error
>>> >>> >> >>>>>> code?
>>> >>> >> >>>>>>
>>> >>> >> >>>>>> 1) IMO, error_codes are data, and clients can expect to
>>> >>> >> >>>>>> receive
>>> >>> >> errors
>>> >>> >> >>>>>> that they don't understand (i.e. unknown errors). AFAIK,
>>> >>> >> >>>>>> clients
>>> >>> >> don't
>>> >>> >> >>>>>> break on unknown errors, they are simple more challenging
>>> >>> >> >>>>>> to
>>> >>> >> debug. If
>>> >>> >> >>>>>> we document the new behavior, then its definitely
>>> >>> >> >>>>>> debuggable and
>>> >>> >> >>>>>> fixable.
>>> >>> >> >>>>>>
>>> >>> >> >>>>>> 2) The behavior change is basically a deprecation - i.e.
>>> >>> >> >>>>>> acks > 1
>>> >>> >> were
>>> >>> >> >>>>>> never documented, and are not supported by Kafka clients
>>> >>> >> >>>>>> starting
>>> >>> >> with
>>> >>> >> >>>>>> version 0.8.2. I'm not sure this requires a protocol bump
>>> >>> >> >>>>>> either,
>>> >>> >> >>>>>> although its a better case than new error codes.
>>> >>> >> >>>>>>
>>> >>> >> >>>>>> Thanks,
>>> >>> >> >>>>>> Gwen
>>> >>> >> >>>>>>
>>> >>> >> >>>>>> On Thu, Jan 15, 2015 at 10:10 AM, Joe Stein <
>>> >>> joe.stein@stealth.ly>
>>> >>> >> >>>>>> wrote:
>>> >>> >> >>>>>>> Looping in the mailing list that the client developers
>>> >>> >> >>>>>>> live on
>>> >>> >> because
>>> >>> >> >>>>>> they
>>> >>> >> >>>>>>> are all not on dev (though they should be if they want to
>>> >>> >> >>>>>>> be
>>> >>> >> helping
>>> >>> >> >>>>>>> to
>>> >>> >> >>>>>>> build the best client libraries they can).
>>> >>> >> >>>>>>>
>>> >>> >> >>>>>>> I whole hardily believe that we need to not break existing
>>> >>> >> >>>>>>> functionality
>>> >>> >> >>>>>> of
>>> >>> >> >>>>>>> the client protocol, ever.
>>> >>> >> >>>>>>>
>>> >>> >> >>>>>>> There are many reasons for this and we have other threads
>>> >>> >> >>>>>>> on the
>>> >>> >> >>>>>>> mailing
>>> >>> >> >>>>>>> list where we are discussing that topic (no pun intended)
>>> >>> >> >>>>>>> that I
>>> >>> >> don't
>>> >>> >> >>>>>> want
>>> >>> >> >>>>>>> to re-hash here.
>>> >>> >> >>>>>>>
>>> >>> >> >>>>>>> If we change wire protocol functionality OR the binary
>>> >>> >> >>>>>>> format
>>> >>> >> (either)
>>> >>> >> >>>>>>> we
>>> >>> >> >>>>>>> must bump version AND treat version as a feature flag with
>>> >>> >> backward
>>> >>> >> >>>>>>> compatibility support until it is deprecated for some time
>>> >>> >> >>>>>>> for
>>> >>> >> folks
>>> >>> >> >>>>>>> to
>>> >>> >> >>>>>> deal
>>> >>> >> >>>>>>> with it.
>>> >>> >> >>>>>>>
>>> >>> >> >>>>>>> match version = {
>>> >>> >> >>>>>>> case 0: keepDoingWhatWeWereDoing()
>>> >>> >> >>>>>>> case 1: doNewStuff()
>>> >>> >> >>>>>>> case 2: doEvenMoreNewStuff()
>>> >>> >> >>>>>>> }
>>> >>> >> >>>>>>>
>>> >>> >> >>>>>>> has to be a practice we adopt imho ... I know feature
>>> >>> >> >>>>>>> flags can
>>> >>> be
>>> >>> >> >>>>>> construed
>>> >>> >> >>>>>>> as "messy code" but I am eager to hear another (better?
>>> >>> >> different?)
>>> >>> >> >>>>>> solution
>>> >>> >> >>>>>>> to this.
>>> >>> >> >>>>>>>
>>> >>> >> >>>>>>> If we don't do a feature flag like this specifically with
>>> >>> >> >>>>>>> this
>>> >>> >> change
>>> >>> >> >>>>>> then
>>> >>> >> >>>>>>> what happens is that someone upgrades their brokers with a
>>> >>> rolling
>>> >>> >> >>>>>> restart
>>> >>> >> >>>>>>> in 0.8.3 and every single one of their producer requests
>>> >>> >> >>>>>>> start
>>> >>> to
>>> >>> >> fail
>>> >>> >> >>>>>> and
>>> >>> >> >>>>>>> they have a major production outage. eeeek!!!!
>>> >>> >> >>>>>>>
>>> >>> >> >>>>>>> I do 100% agree that > 1 makes no sense and we *REALLY*
>>> >>> >> >>>>>>> need
>>> >>> >> people to
>>> >>> >> >>>>>> start
>>> >>> >> >>>>>>> using 0,1,-1 but we need to-do that in a way that is going
>>> >>> >> >>>>>>> to
>>> >>> >> work for
>>> >>> >> >>>>>>> everyone.
>>> >>> >> >>>>>>>
>>> >>> >> >>>>>>> Old producers and consumers must keep working with new
>>> >>> >> >>>>>>> brokers
>>> >>> >> and if
>>> >>> >> >>>>>>> we
>>> >>> >> >>>>>> are
>>> >>> >> >>>>>>> not going to support that then I am unclear what the use
>>> >>> >> >>>>>>> of
>>> >>> >> "version"
>>> >>> >> >>>>>>> is
>>> >>> >> >>>>>>> based on our original intentions of having it because of
>>> >>> >> >>>>>>> the
>>> >>> >> >>>>>>> 0.7=>-0.8.
>>> >>> >> >>>>>> We
>>> >>> >> >>>>>>> said no more breaking changes when we did that.
>>> >>> >> >>>>>>>
>>> >>> >> >>>>>>> - Joe Stein
>>> >>> >> >>>>>>>
>>> >>> >> >>>>>>> On Thu, Jan 15, 2015 at 12:38 PM, Ewen Cheslack-Postava <
>>> >>> >> >>>>>> ewen@confluent.io>
>>> >>> >> >>>>>>> wrote:
>>> >>> >> >>>>>>>>
>>> >>> >> >>>>>>>> Right, so this looks like it could create an issue
>>> >>> >> >>>>>>>> similar to
>>> >>> >> what's
>>> >>> >> >>>>>>>> currently being discussed in
>>> >>> >> >>>>>>>> https://issues.apache.org/jira/browse/KAFKA-1649 where
>>> >>> >> >>>>>>>> users
>>> >>> >> now get
>>> >>> >> >>>>>>>> errors
>>> >>> >> >>>>>>>> under conditions when they previously wouldn't. Old
>>> >>> >> >>>>>>>> clients
>>> >>> won't
>>> >>> >> >>>>>>>> even
>>> >>> >> >>>>>>>> know
>>> >>> >> >>>>>>>> about the error code, so besides failing they won't even
>>> >>> >> >>>>>>>> be
>>> >>> able
>>> >>> >> to
>>> >>> >> >>>>>>>> log
>>> >>> >> >>>>>>>> any
>>> >>> >> >>>>>>>> meaningful error messages.
>>> >>> >> >>>>>>>>
>>> >>> >> >>>>>>>> I think there are two options for compatibility:
>>> >>> >> >>>>>>>>
>>> >>> >> >>>>>>>> 1. An alternative change is to remove the ack > 1 code,
>>> >>> >> >>>>>>>> but
>>> >>> >> silently
>>> >>> >> >>>>>>>> "upgrade" requests with acks > 1 to acks = -1. This isn't
>>> >>> >> >>>>>>>> the
>>> >>> >> same as
>>> >>> >> >>>>>>>> other
>>> >>> >> >>>>>>>> changes to behavior since the interaction between the
>>> >>> >> >>>>>>>> client
>>> >>> and
>>> >>> >> >>>>>>>> server
>>> >>> >> >>>>>>>> remains the same, no error codes change, etc. The client
>>> >>> >> >>>>>>>> might
>>> >>> >> just
>>> >>> >> >>>>>>>> see
>>> >>> >> >>>>>>>> some increased latency since the message might need to be
>>> >>> >> replicated
>>> >>> >> >>>>>>>> to
>>> >>> >> >>>>>>>> more brokers than they requested.
>>> >>> >> >>>>>>>> 2. Split this into two patches, one that bumps the
>>> >>> >> >>>>>>>> protocol
>>> >>> >> version
>>> >>> >> >>>>>>>> on
>>> >>> >> >>>>>>>> that
>>> >>> >> >>>>>>>> message to include the new error code but maintains both
>>> >>> >> >>>>>>>> old
>>> >>> (now
>>> >>> >> >>>>>>>> deprecated) and new behavior, then a second that would be
>>> >>> >> applied in
>>> >>> >> >>>>>>>> a
>>> >>> >> >>>>>>>> later release that removes the old protocol + code for
>>> >>> >> >>>>>>>> handling
>>> >>> >> acks
>>> >>> >> >>>>>> 1.
>>> >>> >> >>>>>>>>
>>> >>> >> >>>>>>>> 2 is probably the right thing to do. If we specify the
>>> >>> >> >>>>>>>> release
>>> >>> >> when
>>> >>> >> >>>>>> we'll
>>> >>> >> >>>>>>>> remove the deprecated protocol at the time of deprecation
>>> >>> >> >>>>>>>> it
>>> >>> >> makes
>>> >>> >> >>>>>> things
>>> >>> >> >>>>>>>> a
>>> >>> >> >>>>>>>> lot easier for people writing non-java clients and could
>>> >>> >> >>>>>>>> give
>>> >>> >> users
>>> >>> >> >>>>>> better
>>> >>> >> >>>>>>>> predictability (e.g. if clients are at most 1 major
>>> >>> >> >>>>>>>> release
>>> >>> >> behind
>>> >>> >> >>>>>>>> brokers,
>>> >>> >> >>>>>>>> they'll remain compatible but possibly use deprecated
>>> >>> features).
>>> >>> >> >>>>>>>>
>>> >>> >> >>>>>>>>
>>> >>> >> >>>>>>>> On Wed, Jan 14, 2015 at 3:51 PM, Gwen Shapira <
>>> >>> >> gshapira@cloudera.com>
>>> >>> >> >>>>>>>> wrote:
>>> >>> >> >>>>>>>>
>>> >>> >> >>>>>>>>> Hi Kafka Devs,
>>> >>> >> >>>>>>>>>
>>> >>> >> >>>>>>>>> We are working on KAFKA-1697 - remove code related to
>>> >>> >> >>>>>>>>> ack>1 on
>>> >>> >> the
>>> >>> >> >>>>>>>>> broker. Per Neha's suggestion, I'd like to give everyone
>>> >>> >> >>>>>>>>> a
>>> >>> >> heads up
>>> >>> >> >>>>>>>>> on
>>> >>> >> >>>>>>>>> what these changes mean.
>>> >>> >> >>>>>>>>>
>>> >>> >> >>>>>>>>> Once this patch is included, any produce requests that
>>> >>> >> >>>>>>>>> include
>>> >>> >> >>>>>>>>> request.required.acks > 1 will result in an exception.
>>> >>> >> >>>>>>>>> This
>>> >>> >> will be
>>> >>> >> >>>>>>>>> InvalidRequiredAcks in new versions (0.8.3 and up, I
>>> >>> >> >>>>>>>>> assume)
>>> >>> and
>>> >>> >> >>>>>>>>> UnknownException in existing versions (sorry, but I
>>> >>> >> >>>>>>>>> can't add
>>> >>> >> error
>>> >>> >> >>>>>>>>> codes retroactively).
>>> >>> >> >>>>>>>>>
>>> >>> >> >>>>>>>>> This behavior is already enforced by 0.8.2 producers
>>> >>> >> >>>>>>>>> (sync and
>>> >>> >> >>>>>>>>> new),
>>> >>> >> >>>>>>>>> but we expect impact on users with older producers that
>>> >>> >> >>>>>>>>> relied
>>> >>> >> on
>>> >>> >> >>>>>>>>> acks
>>> >>> >> >>>>>>>>>> 1 and external clients (i.e python, go, etc).
>>> >>> >> >>>>>>>>>
>>> >>> >> >>>>>>>>> Users who relied on acks > 1 are expected to switch to
>>> >>> >> >>>>>>>>> using
>>> >>> >> acks =
>>> >>> >> >>>>>>>>> -1
>>> >>> >> >>>>>>>>> and a min.isr parameter than matches their user case.
>>> >>> >> >>>>>>>>>
>>> >>> >> >>>>>>>>> This change was discussed in the past in the context of
>>> >>> >> KAFKA-1555
>>> >>> >> >>>>>>>>> (min.isr), but let us know if you have any questions or
>>> >>> concerns
>>> >>> >> >>>>>>>>> regarding this change.
>>> >>> >> >>>>>>>>>
>>> >>> >> >>>>>>>>> Gwen
>>> >>> >> >>>>>>>>
>>> >>> >> >>>>>>>>
>>> >>> >> >>>>>>>>
>>> >>> >> >>>>>>>> --
>>> >>> >> >>>>>>>> Thanks,
>>> >>> >> >>>>>>>> Ewen
>>> >>> >> >>>>>>>
>>> >>> >> >>>>>>>
>>> >>> >> >>>>>>> --
>>> >>> >> >>>>>>> You received this message because you are subscribed to
>>> >>> >> >>>>>>> the
>>> >>> Google
>>> >>> >> >>>>>>> Groups
>>> >>> >> >>>>>>> "kafka-clients" group.
>>> >>> >> >>>>>>> To unsubscribe from this group and stop receiving emails
>>> >>> >> >>>>>>> from
>>> >>> it,
>>> >>> >> send
>>> >>> >> >>>>>>> an
>>> >>> >> >>>>>>> email to kafka-clients+unsubscribe@googlegroups.com.
>>> >>> >> >>>>>>> To post to this group, send email to
>>> >>> >> kafka-clients@googlegroups.com.
>>> >>> >> >>>>>>> Visit this group at
>>> >>> http://groups.google.com/group/kafka-clients.
>>> >>> >> >>>>>>> To view this discussion on the web visit
>>> >>> >> >>>>>>
>>> >>> >> >>>>>>
>>> >>> >>
>>> >>>
>>> >>> https://groups.google.com/d/msgid/kafka-clients/CAA7ooCBtH2JjyQsArdx_%3DV25B4O1QJk0YvOu9U6kYt9sB4aqng%40mail.gmail.com
>>> >>> >> >>>>>> .
>>> >>> >> >>>>>>>
>>> >>> >> >>>>>>> For more options, visit
>>> >>> >> >>>>>>> https://groups.google.com/d/optout.
>>> >>> >> >>>>>>
>>> >>> >> >>>>>> --
>>> >>> >> >>>>>> You received this message because you are subscribed to the
>>> >>> Google
>>> >>> >> >>>>>> Groups
>>> >>> >> >>>>>> "kafka-clients" group.
>>> >>> >> >>>>>> To unsubscribe from this group and stop receiving emails
>>> >>> >> >>>>>> from it,
>>> >>> >> send
>>> >>> >> >>>>>> an
>>> >>> >> >>>>>> email to kafka-clients+unsubscribe@googlegroups.com.
>>> >>> >> >>>>>> To post to this group, send email to
>>> >>> >> kafka-clients@googlegroups.com.
>>> >>> >> >>>>>> Visit this group at
>>> >>> >> >>>>>> http://groups.google.com/group/kafka-clients
>>> >>> .
>>> >>> >> >>>>>> To view this discussion on the web visit
>>> >>> >> >>>>>>
>>> >>> >> >>>>>>
>>> >>> >>
>>> >>>
>>> >>> https://groups.google.com/d/msgid/kafka-clients/CAHBV8WeUebxi%2B%2BSbjz8E9Yf4u4hkcPJ80Xsj0XTKcTac%3D%2B613A%40mail.gmail.com
>>> >>> >> >>>>>> .
>>> >>> >> >>>>>> For more options, visit https://groups.google.com/d/optout.
>>> >>> >> >>>>
>>> >>> >> >>>>
>>> >>> >> >>>>
>>> >>> >> >>>>
>>> >>> >> >>>> --
>>> >>> >> >>>> Thanks,
>>> >>> >> >>>> Ewen
>>> >>> >>
>>> >>> >
>>> >>> >
>>> >>>
>>> >>>
>>> >>> --
>>> >>> Thanks,
>>> >>> Ewen
>>> >>>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Thanks,
>>> >> Neha
>>
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "kafka-clients" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to kafka-clients+unsubscribe@googlegroups.com.
>> To post to this group, send email to kafka-clients@googlegroups.com.
>> Visit this group at http://groups.google.com/group/kafka-clients.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/kafka-clients/CAOeJiJh17CYq%3D-qgPu9rnArsPW%3D7RL9AAW_h%3DrrXx0%2BKhhKgNQ%40mail.gmail.com.
>>
>> For more options, visit https://groups.google.com/d/optout.
>
>

Re: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to ack>1 on the broker

Posted by Jun Rao <ju...@confluent.io>.
Overall, I agree with Jay on both points.

1. I think it's reasonable to add new error codes w/o bumping up the
protocol version. In most cases, by adding new error codes, we are just
refining the categorization of those unknown errors. So, a client shouldn't
behave worse than before as long as unknown errors have been properly
handled.

2. I think it's reasonable to just document that 0.8.2 will be the last
release that will support ack > 1 and remove the support completely in
trunk w/o bumping up the protocol. This is because (a) we never included
ack > 1 explicitly in the documentation and so the usage should be limited;
(2) ack > 1 doesn't provide the guarantee that people really want and so it
shouldn't really be used.

Thanks,

Jun


On Sun, Jan 18, 2015 at 11:03 AM, Jay Kreps <ja...@gmail.com> wrote:

> Hey guys,
>
> I really think we are discussing two things here:
>
>    1. How should we generally handle changes to the set of errors? Should
>    introducing new errors be considered a protocol change or should we reserve
>    the right to introduce new error codes?
>    2. Given that this particular change is possibly incompatible, how
>    should we handle it?
>
> I think it would be good for people who are responding here to be specific
> about which they are addressing.
>
> Here is what I think:
>
> 1. Errors should be extensible within a protocol version.
>
> We should change the protocol documentation to list the errors that can be
> given back from each api, their meaning, and how to handle them, BUT we
> should explicitly state that the set of errors are open ended. That is we
> should reserve the right to introduce new errors and explicitly state that
> clients need a blanket "unknown error" handling mechanism. The error can
> link to the protocol definition (something like "Unknown error 42, see
> protocol definition at http://link"). We could make this work really well
> by instructing all the clients to report the error in a very googlable way
> as Oracle does with their error format (e.g. "ORA-32") so that if you ever
> get the raw error google will take you to the definition.
>
> I agree that a more rigid definition seems like "right thing", but having
> just implemented two clients and spent a bunch of time on the server side,
> I think, it will work out poorly in practice. Here is why:
>
>    - I think we will make a lot of mistakes in nailing down the set of
>    error codes up front and we will end up going through 3-4 churns of the
>    protocol definition just realizing the set of errors that can be thrown. I
>    think this churn will actually make life worse for clients that now have to
>    figure out 7 identical versions of the protocol and will be a mess in terms
>    of testing on the server side. I actually know this to be true because
>    while implementing the clients I tried to guess the errors that could be
>    thrown, then checked my guess by close code inspection. It turned out that
>    I always missed things in my belief about errors, but more importantly even
>    after close code inspection I found tons of other errors in my stress
>    testing.
>    - In practice error handling always involves calling out one or two
>    meaningful failures that have special recovery and then a blanket case that
>    just handles everything else. It's true that some clients may not have done
>    this well, but I think it is for the best if they fix that.
>    - Reserving the right to add errors doesn't mean we will do it without
>    care. We will think through each change and decide whether giving a little
>    more precision in the error is worth the overhead and churn of a protocol
>    version bump.
>
> 2. In this case in particular we should not introduce a new protocol
> version
>
> In this particular case we are saying that acks > 1 doesn't make sense and
> we want to give an error to people specifying this so that they change
> their configuration. This is a configuration that few people use and we
> want to just make it an error. The bad behavior will just be that the error
> will not be as good as it could be. I think that is a better tradeoff than
> introducing a separate protocol version (this may be true of the java
> clients too).
>
> We will have lots of cases like this in the future and we aren't going to
> want to churn the protocol for each of them. For example we previously had
> to get more precise about which characters were legal and which illegal in
> topic names.
>
> -Jay
>
> On Fri, Jan 16, 2015 at 11:55 AM, Gwen Shapira <gs...@cloudera.com>
> wrote:
>
>> I updated the KIP: Using acks > 1 in version 0 will log a WARN message
>> in the broker about client using deprecated behavior (suggested by Joe
>> in the JIRA, and I think it makes sense).
>>
>> Gwen
>>
>> On Fri, Jan 16, 2015 at 10:40 AM, Gwen Shapira <gs...@cloudera.com>
>> wrote:
>> > How about we continue the discussion on this thread, so we won't lose
>> > the context of this discussion, and put it up for VOTE when this has
>> > been finalized?
>> >
>> > On Fri, Jan 16, 2015 at 10:22 AM, Neha Narkhede <ne...@confluent.io>
>> wrote:
>> >> Gwen,
>> >>
>> >> KIP write-up looks good. According to the rest of the KIP process
>> proposal,
>> >> would you like to start a DISCUSS/VOTE thread for it?
>> >>
>> >> Thanks,
>> >> Neha
>> >>
>> >> On Fri, Jan 16, 2015 at 9:37 AM, Ewen Cheslack-Postava <
>> ewen@confluent.io>
>> >> wrote:
>> >>
>> >>> Gwen -- KIP write up looks good. Deprecation schedule probably needs
>> to be
>> >>> more specific, but I think that discussion probably needs to happen
>> after a
>> >>> solution is agreed upon.
>> >>>
>> >>> Jay -- I think "older clients will get a bad error message instead of
>> a
>> >>> good one" isn't what would be happening with this change. Previously
>> they
>> >>> wouldn't have received an error and they would have been able to
>> produce
>> >>> messages. After the change they'll just receive this new error message
>> >>> which their clients can't possibly handle gracefully since it didn't
>> exist
>> >>> when the client was written. Whether the acks > 1 setting was actually
>> >>> accomplishing what they thought doesn't matter. Someone could have
>> >>> reasonably read the docs on 0.8.1.1, thought acks = 2 is an ok
>> setting for
>> >>> their applications, set it as a default across a bunch of apps, then
>> follow
>> >>> the recommended upgrade path of updating brokers to 0.8.2 and all
>> their
>> >>> apps will start failing on produce requests.
>> >>>
>> >>>
>> >>> On Thu, Jan 15, 2015 at 8:27 PM, Jay Kreps <ja...@gmail.com>
>> wrote:
>> >>>
>> >>> > This is a good case to discuss.
>> >>> >
>> >>> > Let's figure the general case of how we want to handle errors and
>> get
>> >>> that
>> >>> > documented in the protocol. The problem right now is that we give no
>> >>> > guidance on this. I actually thought Gwen's suggestion made sense
>> on the
>> >>> > guidance we should have given which is that we will enumerate a set
>> of
>> >>> > errors and their meaning for each API but it is possible that other
>> >>> errors
>> >>> > will occur and they should be handled (maybe poorly) in the same way
>> >>> > UNKNOWN_ERROR is handled which is our normal escape hatch for
>> things like
>> >>> > OOMException.
>> >>> >
>> >>> > I really do think we shouldn't be dogmatic here: In considering a
>> change
>> >>> > to errors we should consider the potential ill-effect vs the
>> complexity
>> >>> of
>> >>> > yet another protocol version.
>> >>> >
>> >>> > In this case I actually am not sure we need to bump the protocol
>> because
>> >>> > the whole point of the change was to make a setting we think
>> doesn't make
>> >>> > sense break, right? Well this will break it. It seems like the only
>> >>> > downside is that older clients will get a bad error message instead
>> of a
>> >>> > good one. But it isn't like we will have rendered a client
>> unusable, it
>> >>> is
>> >>> > just that they will need to change their config.
>> >>> >
>> >>> > -Jay
>> >>> >
>> >>> > On Thu, Jan 15, 2015 at 6:14 PM, Gwen Shapira <
>> gshapira@cloudera.com>
>> >>> > wrote:
>> >>> >
>> >>> >> I created a KIP for this suggestion:
>> >>> >>
>> >>> >>
>> >>> >>
>> >>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1+-+Remove+support+of+request.required.acks
>> >>> >>
>> >>> >> Basically documenting what was already discussed here.  Comments
>> will
>> >>> >> be awesome!
>> >>> >>
>> >>> >> Gwen
>> >>> >>
>> >>> >> On Thu, Jan 15, 2015 at 5:19 PM, Gwen Shapira <
>> gshapira@cloudera.com>
>> >>> >> wrote:
>> >>> >> > The errors are part of the KIP process now, so I think the
>> clients are
>> >>> >> safe :)
>> >>> >> >
>> >>> >> >
>> >>> >>
>> >>>
>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
>> >>> >> >
>> >>> >> > On Thu, Jan 15, 2015 at 5:12 PM, Steve Morin <
>> steve.morin@gmail.com>
>> >>> >> wrote:
>> >>> >> >> Agree errors should be part of the protocol
>> >>> >> >>
>> >>> >> >>> On Jan 15, 2015, at 17:59, Gwen Shapira <gshapira@cloudera.com
>> >
>> >>> >> wrote:
>> >>> >> >>>
>> >>> >> >>> Hi,
>> >>> >> >>>
>> >>> >> >>> I got convinced by Joe and Dana that errors are indeed part of
>> the
>> >>> >> >>> protocol and can't be randomly added.
>> >>> >> >>>
>> >>> >> >>> So, it looks like we need to bump version of ProduceRequest in
>> the
>> >>> >> >>> following way:
>> >>> >> >>> Version 0 -> accept acks >1. I think we should keep the
>> existing
>> >>> >> >>> behavior too (i.e. not replace it with -1) to avoid surprising
>> >>> >> >>> clients, but I'm willing to hear other opinions.
>> >>> >> >>> Version 1 -> do not accept acks >1 and return an error.
>> >>> >> >>> Are we ok with the error I added in KAFKA-1697? We can use
>> something
>> >>> >> >>> less specific like InvalidRequestParameter. This error can be
>> reused
>> >>> >> >>> in the future and reduce the need to add errors, but will also
>> be
>> >>> less
>> >>> >> >>> clear to the client and its users. Maybe even add the error
>> message
>> >>> >> >>> string to the protocol in addition to the error code? (since
>> we are
>> >>> >> >>> bumping versions....)
>> >>> >> >>>
>> >>> >> >>> I think maintaining the old version throughout 0.8.X makes
>> sense.
>> >>> IMO
>> >>> >> >>> dropping it for 0.9 is feasible, but I'll let client owners
>> help
>> >>> make
>> >>> >> >>> that call.
>> >>> >> >>>
>> >>> >> >>> Am I missing anything? Should I start a KIP? It seems like a
>> >>> KIP-type
>> >>> >> >>> discussion :)
>> >>> >> >>>
>> >>> >> >>> Gwen
>> >>> >> >>>
>> >>> >> >>>
>> >>> >> >>> On Thu, Jan 15, 2015 at 2:31 PM, Ewen Cheslack-Postava
>> >>> >> >>> <ew...@confluent.io> wrote:
>> >>> >> >>>> Gwen,
>> >>> >> >>>>
>> >>> >> >>>> I think the only option that wouldn't require a protocol
>> version
>> >>> >> change is
>> >>> >> >>>> the one where acks > 1 is converted to acks = -1 since it's
>> the
>> >>> only
>> >>> >> one
>> >>> >> >>>> that doesn't potentially break older clients. The protocol
>> guide
>> >>> >> says that
>> >>> >> >>>> the expected upgrade path is servers first, then clients, so
>> old
>> >>> >> clients,
>> >>> >> >>>> including non-java clients, that may be using acks > 1 should
>> be
>> >>> >> able to
>> >>> >> >>>> work with a new broker version.
>> >>> >> >>>>
>> >>> >> >>>> It's more work, but I think dealing with the protocol change
>> is the
>> >>> >> right
>> >>> >> >>>> thing to do since it eventually gets us to the behavior I
>> think is
>> >>> >> better --
>> >>> >> >>>> the broker should reject requests with invalid values. I
>> think Joe
>> >>> >> and I
>> >>> >> >>>> were basically in agreement. In my mind the major piece
>> missing
>> >>> from
>> >>> >> his
>> >>> >> >>>> description is how long we're going to maintain his "case 0"
>> >>> >> behavior. It's
>> >>> >> >>>> impractical to maintain old versions forever, but it sounds
>> like
>> >>> >> there
>> >>> >> >>>> hasn't been a decision on how long to maintain them. Maybe
>> that's
>> >>> >> another
>> >>> >> >>>> item to add to KIPs -- protocol versions and behavior need to
>> be
>> >>> >> listed as
>> >>> >> >>>> deprecated and the earliest version in which they'll be
>> removed
>> >>> >> should be
>> >>> >> >>>> specified so users can understand which versions are
>> guaranteed to
>> >>> be
>> >>> >> >>>> compatible, even if they're using (well-written) non-java
>> clients.
>> >>> >> >>>>
>> >>> >> >>>> -Ewen
>> >>> >> >>>>
>> >>> >> >>>>
>> >>> >> >>>>> On Thu, Jan 15, 2015 at 12:52 PM, Dana Powers <
>> >>> >> dana.powers@gmail.com> wrote:
>> >>> >> >>>>>
>> >>> >> >>>>>> clients don't break on unknown errors
>> >>> >> >>>>>
>> >>> >> >>>>> maybe true for the official java clients, but I dont think
>> the
>> >>> >> assumption
>> >>> >> >>>>> holds true for community-maintained clients and users of
>> those
>> >>> >> clients.
>> >>> >> >>>>> kafka-python generally follows the fail-fast philosophy and
>> raises
>> >>> >> an
>> >>> >> >>>>> exception on any unrecognized error code in any server
>> response.
>> >>> >> in this
>> >>> >> >>>>> case, kafka-python allows users to set their own
>> required-acks
>> >>> >> policy when
>> >>> >> >>>>> creating a producer instance.  It is possible that users of
>> >>> >> kafka-python
>> >>> >> >>>>> have deployed producer code that uses ack>1 -- perhaps in
>> >>> production
>> >>> >> >>>>> environments -- and for those users the new error code will
>> crash
>> >>> >> their
>> >>> >> >>>>> producer code.  I would not be surprised if the same were
>> true of
>> >>> >> other
>> >>> >> >>>>> community clients.
>> >>> >> >>>>>
>> >>> >> >>>>> *one reason for the fail-fast approach is that there isn't
>> great
>> >>> >> >>>>> documentation on what errors to expect for each request /
>> response
>> >>> >> -- so
>> >>> >> >>>>> we
>> >>> >> >>>>> use failures to alert that some error case is not handled
>> >>> >> properly.  and
>> >>> >> >>>>> because of that, introducing new error cases without bumping
>> the
>> >>> api
>> >>> >> >>>>> version is likely to cause those errors to get raised/thrown
>> all
>> >>> >> the way
>> >>> >> >>>>> back up to the user.  of course we (client maintainers) can
>> fix
>> >>> the
>> >>> >> issues
>> >>> >> >>>>> in the client libraries and suggest users upgrade, but it's
>> not
>> >>> the
>> >>> >> ideal
>> >>> >> >>>>> situation.
>> >>> >> >>>>>
>> >>> >> >>>>>
>> >>> >> >>>>> long-winded way of saying: I agree w/ Joe.
>> >>> >> >>>>>
>> >>> >> >>>>> -Dana
>> >>> >> >>>>>
>> >>> >> >>>>>
>> >>> >> >>>>> On Thu, Jan 15, 2015 at 12:07 PM, Gwen Shapira <
>> >>> >> gshapira@cloudera.com>
>> >>> >> >>>>> wrote:
>> >>> >> >>>>>
>> >>> >> >>>>>> Is the protocol bump caused by the behavior change or the
>> new
>> >>> error
>> >>> >> >>>>>> code?
>> >>> >> >>>>>>
>> >>> >> >>>>>> 1) IMO, error_codes are data, and clients can expect to
>> receive
>> >>> >> errors
>> >>> >> >>>>>> that they don't understand (i.e. unknown errors). AFAIK,
>> clients
>> >>> >> don't
>> >>> >> >>>>>> break on unknown errors, they are simple more challenging to
>> >>> >> debug. If
>> >>> >> >>>>>> we document the new behavior, then its definitely
>> debuggable and
>> >>> >> >>>>>> fixable.
>> >>> >> >>>>>>
>> >>> >> >>>>>> 2) The behavior change is basically a deprecation - i.e.
>> acks > 1
>> >>> >> were
>> >>> >> >>>>>> never documented, and are not supported by Kafka clients
>> starting
>> >>> >> with
>> >>> >> >>>>>> version 0.8.2. I'm not sure this requires a protocol bump
>> either,
>> >>> >> >>>>>> although its a better case than new error codes.
>> >>> >> >>>>>>
>> >>> >> >>>>>> Thanks,
>> >>> >> >>>>>> Gwen
>> >>> >> >>>>>>
>> >>> >> >>>>>> On Thu, Jan 15, 2015 at 10:10 AM, Joe Stein <
>> >>> joe.stein@stealth.ly>
>> >>> >> >>>>>> wrote:
>> >>> >> >>>>>>> Looping in the mailing list that the client developers
>> live on
>> >>> >> because
>> >>> >> >>>>>> they
>> >>> >> >>>>>>> are all not on dev (though they should be if they want to
>> be
>> >>> >> helping
>> >>> >> >>>>>>> to
>> >>> >> >>>>>>> build the best client libraries they can).
>> >>> >> >>>>>>>
>> >>> >> >>>>>>> I whole hardily believe that we need to not break existing
>> >>> >> >>>>>>> functionality
>> >>> >> >>>>>> of
>> >>> >> >>>>>>> the client protocol, ever.
>> >>> >> >>>>>>>
>> >>> >> >>>>>>> There are many reasons for this and we have other threads
>> on the
>> >>> >> >>>>>>> mailing
>> >>> >> >>>>>>> list where we are discussing that topic (no pun intended)
>> that I
>> >>> >> don't
>> >>> >> >>>>>> want
>> >>> >> >>>>>>> to re-hash here.
>> >>> >> >>>>>>>
>> >>> >> >>>>>>> If we change wire protocol functionality OR the binary
>> format
>> >>> >> (either)
>> >>> >> >>>>>>> we
>> >>> >> >>>>>>> must bump version AND treat version as a feature flag with
>> >>> >> backward
>> >>> >> >>>>>>> compatibility support until it is deprecated for some time
>> for
>> >>> >> folks
>> >>> >> >>>>>>> to
>> >>> >> >>>>>> deal
>> >>> >> >>>>>>> with it.
>> >>> >> >>>>>>>
>> >>> >> >>>>>>> match version = {
>> >>> >> >>>>>>> case 0: keepDoingWhatWeWereDoing()
>> >>> >> >>>>>>> case 1: doNewStuff()
>> >>> >> >>>>>>> case 2: doEvenMoreNewStuff()
>> >>> >> >>>>>>> }
>> >>> >> >>>>>>>
>> >>> >> >>>>>>> has to be a practice we adopt imho ... I know feature
>> flags can
>> >>> be
>> >>> >> >>>>>> construed
>> >>> >> >>>>>>> as "messy code" but I am eager to hear another (better?
>> >>> >> different?)
>> >>> >> >>>>>> solution
>> >>> >> >>>>>>> to this.
>> >>> >> >>>>>>>
>> >>> >> >>>>>>> If we don't do a feature flag like this specifically with
>> this
>> >>> >> change
>> >>> >> >>>>>> then
>> >>> >> >>>>>>> what happens is that someone upgrades their brokers with a
>> >>> rolling
>> >>> >> >>>>>> restart
>> >>> >> >>>>>>> in 0.8.3 and every single one of their producer requests
>> start
>> >>> to
>> >>> >> fail
>> >>> >> >>>>>> and
>> >>> >> >>>>>>> they have a major production outage. eeeek!!!!
>> >>> >> >>>>>>>
>> >>> >> >>>>>>> I do 100% agree that > 1 makes no sense and we *REALLY*
>> need
>> >>> >> people to
>> >>> >> >>>>>> start
>> >>> >> >>>>>>> using 0,1,-1 but we need to-do that in a way that is going
>> to
>> >>> >> work for
>> >>> >> >>>>>>> everyone.
>> >>> >> >>>>>>>
>> >>> >> >>>>>>> Old producers and consumers must keep working with new
>> brokers
>> >>> >> and if
>> >>> >> >>>>>>> we
>> >>> >> >>>>>> are
>> >>> >> >>>>>>> not going to support that then I am unclear what the use of
>> >>> >> "version"
>> >>> >> >>>>>>> is
>> >>> >> >>>>>>> based on our original intentions of having it because of
>> the
>> >>> >> >>>>>>> 0.7=>-0.8.
>> >>> >> >>>>>> We
>> >>> >> >>>>>>> said no more breaking changes when we did that.
>> >>> >> >>>>>>>
>> >>> >> >>>>>>> - Joe Stein
>> >>> >> >>>>>>>
>> >>> >> >>>>>>> On Thu, Jan 15, 2015 at 12:38 PM, Ewen Cheslack-Postava <
>> >>> >> >>>>>> ewen@confluent.io>
>> >>> >> >>>>>>> wrote:
>> >>> >> >>>>>>>>
>> >>> >> >>>>>>>> Right, so this looks like it could create an issue
>> similar to
>> >>> >> what's
>> >>> >> >>>>>>>> currently being discussed in
>> >>> >> >>>>>>>> https://issues.apache.org/jira/browse/KAFKA-1649 where
>> users
>> >>> >> now get
>> >>> >> >>>>>>>> errors
>> >>> >> >>>>>>>> under conditions when they previously wouldn't. Old
>> clients
>> >>> won't
>> >>> >> >>>>>>>> even
>> >>> >> >>>>>>>> know
>> >>> >> >>>>>>>> about the error code, so besides failing they won't even
>> be
>> >>> able
>> >>> >> to
>> >>> >> >>>>>>>> log
>> >>> >> >>>>>>>> any
>> >>> >> >>>>>>>> meaningful error messages.
>> >>> >> >>>>>>>>
>> >>> >> >>>>>>>> I think there are two options for compatibility:
>> >>> >> >>>>>>>>
>> >>> >> >>>>>>>> 1. An alternative change is to remove the ack > 1 code,
>> but
>> >>> >> silently
>> >>> >> >>>>>>>> "upgrade" requests with acks > 1 to acks = -1. This isn't
>> the
>> >>> >> same as
>> >>> >> >>>>>>>> other
>> >>> >> >>>>>>>> changes to behavior since the interaction between the
>> client
>> >>> and
>> >>> >> >>>>>>>> server
>> >>> >> >>>>>>>> remains the same, no error codes change, etc. The client
>> might
>> >>> >> just
>> >>> >> >>>>>>>> see
>> >>> >> >>>>>>>> some increased latency since the message might need to be
>> >>> >> replicated
>> >>> >> >>>>>>>> to
>> >>> >> >>>>>>>> more brokers than they requested.
>> >>> >> >>>>>>>> 2. Split this into two patches, one that bumps the
>> protocol
>> >>> >> version
>> >>> >> >>>>>>>> on
>> >>> >> >>>>>>>> that
>> >>> >> >>>>>>>> message to include the new error code but maintains both
>> old
>> >>> (now
>> >>> >> >>>>>>>> deprecated) and new behavior, then a second that would be
>> >>> >> applied in
>> >>> >> >>>>>>>> a
>> >>> >> >>>>>>>> later release that removes the old protocol + code for
>> handling
>> >>> >> acks
>> >>> >> >>>>>> 1.
>> >>> >> >>>>>>>>
>> >>> >> >>>>>>>> 2 is probably the right thing to do. If we specify the
>> release
>> >>> >> when
>> >>> >> >>>>>> we'll
>> >>> >> >>>>>>>> remove the deprecated protocol at the time of deprecation
>> it
>> >>> >> makes
>> >>> >> >>>>>> things
>> >>> >> >>>>>>>> a
>> >>> >> >>>>>>>> lot easier for people writing non-java clients and could
>> give
>> >>> >> users
>> >>> >> >>>>>> better
>> >>> >> >>>>>>>> predictability (e.g. if clients are at most 1 major
>> release
>> >>> >> behind
>> >>> >> >>>>>>>> brokers,
>> >>> >> >>>>>>>> they'll remain compatible but possibly use deprecated
>> >>> features).
>> >>> >> >>>>>>>>
>> >>> >> >>>>>>>>
>> >>> >> >>>>>>>> On Wed, Jan 14, 2015 at 3:51 PM, Gwen Shapira <
>> >>> >> gshapira@cloudera.com>
>> >>> >> >>>>>>>> wrote:
>> >>> >> >>>>>>>>
>> >>> >> >>>>>>>>> Hi Kafka Devs,
>> >>> >> >>>>>>>>>
>> >>> >> >>>>>>>>> We are working on KAFKA-1697 - remove code related to
>> ack>1 on
>> >>> >> the
>> >>> >> >>>>>>>>> broker. Per Neha's suggestion, I'd like to give everyone
>> a
>> >>> >> heads up
>> >>> >> >>>>>>>>> on
>> >>> >> >>>>>>>>> what these changes mean.
>> >>> >> >>>>>>>>>
>> >>> >> >>>>>>>>> Once this patch is included, any produce requests that
>> include
>> >>> >> >>>>>>>>> request.required.acks > 1 will result in an exception.
>> This
>> >>> >> will be
>> >>> >> >>>>>>>>> InvalidRequiredAcks in new versions (0.8.3 and up, I
>> assume)
>> >>> and
>> >>> >> >>>>>>>>> UnknownException in existing versions (sorry, but I
>> can't add
>> >>> >> error
>> >>> >> >>>>>>>>> codes retroactively).
>> >>> >> >>>>>>>>>
>> >>> >> >>>>>>>>> This behavior is already enforced by 0.8.2 producers
>> (sync and
>> >>> >> >>>>>>>>> new),
>> >>> >> >>>>>>>>> but we expect impact on users with older producers that
>> relied
>> >>> >> on
>> >>> >> >>>>>>>>> acks
>> >>> >> >>>>>>>>>> 1 and external clients (i.e python, go, etc).
>> >>> >> >>>>>>>>>
>> >>> >> >>>>>>>>> Users who relied on acks > 1 are expected to switch to
>> using
>> >>> >> acks =
>> >>> >> >>>>>>>>> -1
>> >>> >> >>>>>>>>> and a min.isr parameter than matches their user case.
>> >>> >> >>>>>>>>>
>> >>> >> >>>>>>>>> This change was discussed in the past in the context of
>> >>> >> KAFKA-1555
>> >>> >> >>>>>>>>> (min.isr), but let us know if you have any questions or
>> >>> concerns
>> >>> >> >>>>>>>>> regarding this change.
>> >>> >> >>>>>>>>>
>> >>> >> >>>>>>>>> Gwen
>> >>> >> >>>>>>>>
>> >>> >> >>>>>>>>
>> >>> >> >>>>>>>>
>> >>> >> >>>>>>>> --
>> >>> >> >>>>>>>> Thanks,
>> >>> >> >>>>>>>> Ewen
>> >>> >> >>>>>>>
>> >>> >> >>>>>>>
>> >>> >> >>>>>>> --
>> >>> >> >>>>>>> You received this message because you are subscribed to the
>> >>> Google
>> >>> >> >>>>>>> Groups
>> >>> >> >>>>>>> "kafka-clients" group.
>> >>> >> >>>>>>> To unsubscribe from this group and stop receiving emails
>> from
>> >>> it,
>> >>> >> send
>> >>> >> >>>>>>> an
>> >>> >> >>>>>>> email to kafka-clients+unsubscribe@googlegroups.com.
>> >>> >> >>>>>>> To post to this group, send email to
>> >>> >> kafka-clients@googlegroups.com.
>> >>> >> >>>>>>> Visit this group at
>> >>> http://groups.google.com/group/kafka-clients.
>> >>> >> >>>>>>> To view this discussion on the web visit
>> >>> >> >>>>>>
>> >>> >> >>>>>>
>> >>> >>
>> >>>
>> https://groups.google.com/d/msgid/kafka-clients/CAA7ooCBtH2JjyQsArdx_%3DV25B4O1QJk0YvOu9U6kYt9sB4aqng%40mail.gmail.com
>> >>> >> >>>>>> .
>> >>> >> >>>>>>>
>> >>> >> >>>>>>> For more options, visit https://groups.google.com/d/optout
>> .
>> >>> >> >>>>>>
>> >>> >> >>>>>> --
>> >>> >> >>>>>> You received this message because you are subscribed to the
>> >>> Google
>> >>> >> >>>>>> Groups
>> >>> >> >>>>>> "kafka-clients" group.
>> >>> >> >>>>>> To unsubscribe from this group and stop receiving emails
>> from it,
>> >>> >> send
>> >>> >> >>>>>> an
>> >>> >> >>>>>> email to kafka-clients+unsubscribe@googlegroups.com.
>> >>> >> >>>>>> To post to this group, send email to
>> >>> >> kafka-clients@googlegroups.com.
>> >>> >> >>>>>> Visit this group at
>> http://groups.google.com/group/kafka-clients
>> >>> .
>> >>> >> >>>>>> To view this discussion on the web visit
>> >>> >> >>>>>>
>> >>> >> >>>>>>
>> >>> >>
>> >>>
>> https://groups.google.com/d/msgid/kafka-clients/CAHBV8WeUebxi%2B%2BSbjz8E9Yf4u4hkcPJ80Xsj0XTKcTac%3D%2B613A%40mail.gmail.com
>> >>> >> >>>>>> .
>> >>> >> >>>>>> For more options, visit https://groups.google.com/d/optout.
>> >>> >> >>>>
>> >>> >> >>>>
>> >>> >> >>>>
>> >>> >> >>>>
>> >>> >> >>>> --
>> >>> >> >>>> Thanks,
>> >>> >> >>>> Ewen
>> >>> >>
>> >>> >
>> >>> >
>> >>>
>> >>>
>> >>> --
>> >>> Thanks,
>> >>> Ewen
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Thanks,
>> >> Neha
>>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "kafka-clients" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to kafka-clients+unsubscribe@googlegroups.com.
> To post to this group, send email to kafka-clients@googlegroups.com.
> Visit this group at http://groups.google.com/group/kafka-clients.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/kafka-clients/CAOeJiJh17CYq%3D-qgPu9rnArsPW%3D7RL9AAW_h%3DrrXx0%2BKhhKgNQ%40mail.gmail.com
> <https://groups.google.com/d/msgid/kafka-clients/CAOeJiJh17CYq%3D-qgPu9rnArsPW%3D7RL9AAW_h%3DrrXx0%2BKhhKgNQ%40mail.gmail.com?utm_medium=email&utm_source=footer>
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

Re: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to ack>1 on the broker

Posted by Jay Kreps <ja...@gmail.com>.
Hey guys,

I really think we are discussing two things here:

   1. How should we generally handle changes to the set of errors? Should
   introducing new errors be considered a protocol change or should we reserve
   the right to introduce new error codes?
   2. Given that this particular change is possibly incompatible, how
   should we handle it?

I think it would be good for people who are responding here to be specific
about which they are addressing.

Here is what I think:

1. Errors should be extensible within a protocol version.

We should change the protocol documentation to list the errors that can be
given back from each api, their meaning, and how to handle them, BUT we
should explicitly state that the set of errors are open ended. That is we
should reserve the right to introduce new errors and explicitly state that
clients need a blanket "unknown error" handling mechanism. The error can
link to the protocol definition (something like "Unknown error 42, see
protocol definition at http://link"). We could make this work really well
by instructing all the clients to report the error in a very googlable way
as Oracle does with their error format (e.g. "ORA-32") so that if you ever
get the raw error google will take you to the definition.

I agree that a more rigid definition seems like "right thing", but having
just implemented two clients and spent a bunch of time on the server side,
I think, it will work out poorly in practice. Here is why:

   - I think we will make a lot of mistakes in nailing down the set of
   error codes up front and we will end up going through 3-4 churns of the
   protocol definition just realizing the set of errors that can be thrown. I
   think this churn will actually make life worse for clients that now have to
   figure out 7 identical versions of the protocol and will be a mess in terms
   of testing on the server side. I actually know this to be true because
   while implementing the clients I tried to guess the errors that could be
   thrown, then checked my guess by close code inspection. It turned out that
   I always missed things in my belief about errors, but more importantly even
   after close code inspection I found tons of other errors in my stress
   testing.
   - In practice error handling always involves calling out one or two
   meaningful failures that have special recovery and then a blanket case that
   just handles everything else. It's true that some clients may not have done
   this well, but I think it is for the best if they fix that.
   - Reserving the right to add errors doesn't mean we will do it without
   care. We will think through each change and decide whether giving a little
   more precision in the error is worth the overhead and churn of a protocol
   version bump.

2. In this case in particular we should not introduce a new protocol version

In this particular case we are saying that acks > 1 doesn't make sense and
we want to give an error to people specifying this so that they change
their configuration. This is a configuration that few people use and we
want to just make it an error. The bad behavior will just be that the error
will not be as good as it could be. I think that is a better tradeoff than
introducing a separate protocol version (this may be true of the java
clients too).

We will have lots of cases like this in the future and we aren't going to
want to churn the protocol for each of them. For example we previously had
to get more precise about which characters were legal and which illegal in
topic names.

-Jay

On Fri, Jan 16, 2015 at 11:55 AM, Gwen Shapira <gs...@cloudera.com>
wrote:

> I updated the KIP: Using acks > 1 in version 0 will log a WARN message
> in the broker about client using deprecated behavior (suggested by Joe
> in the JIRA, and I think it makes sense).
>
> Gwen
>
> On Fri, Jan 16, 2015 at 10:40 AM, Gwen Shapira <gs...@cloudera.com>
> wrote:
> > How about we continue the discussion on this thread, so we won't lose
> > the context of this discussion, and put it up for VOTE when this has
> > been finalized?
> >
> > On Fri, Jan 16, 2015 at 10:22 AM, Neha Narkhede <ne...@confluent.io>
> wrote:
> >> Gwen,
> >>
> >> KIP write-up looks good. According to the rest of the KIP process
> proposal,
> >> would you like to start a DISCUSS/VOTE thread for it?
> >>
> >> Thanks,
> >> Neha
> >>
> >> On Fri, Jan 16, 2015 at 9:37 AM, Ewen Cheslack-Postava <
> ewen@confluent.io>
> >> wrote:
> >>
> >>> Gwen -- KIP write up looks good. Deprecation schedule probably needs
> to be
> >>> more specific, but I think that discussion probably needs to happen
> after a
> >>> solution is agreed upon.
> >>>
> >>> Jay -- I think "older clients will get a bad error message instead of a
> >>> good one" isn't what would be happening with this change. Previously
> they
> >>> wouldn't have received an error and they would have been able to
> produce
> >>> messages. After the change they'll just receive this new error message
> >>> which their clients can't possibly handle gracefully since it didn't
> exist
> >>> when the client was written. Whether the acks > 1 setting was actually
> >>> accomplishing what they thought doesn't matter. Someone could have
> >>> reasonably read the docs on 0.8.1.1, thought acks = 2 is an ok setting
> for
> >>> their applications, set it as a default across a bunch of apps, then
> follow
> >>> the recommended upgrade path of updating brokers to 0.8.2 and all their
> >>> apps will start failing on produce requests.
> >>>
> >>>
> >>> On Thu, Jan 15, 2015 at 8:27 PM, Jay Kreps <ja...@gmail.com>
> wrote:
> >>>
> >>> > This is a good case to discuss.
> >>> >
> >>> > Let's figure the general case of how we want to handle errors and get
> >>> that
> >>> > documented in the protocol. The problem right now is that we give no
> >>> > guidance on this. I actually thought Gwen's suggestion made sense on
> the
> >>> > guidance we should have given which is that we will enumerate a set
> of
> >>> > errors and their meaning for each API but it is possible that other
> >>> errors
> >>> > will occur and they should be handled (maybe poorly) in the same way
> >>> > UNKNOWN_ERROR is handled which is our normal escape hatch for things
> like
> >>> > OOMException.
> >>> >
> >>> > I really do think we shouldn't be dogmatic here: In considering a
> change
> >>> > to errors we should consider the potential ill-effect vs the
> complexity
> >>> of
> >>> > yet another protocol version.
> >>> >
> >>> > In this case I actually am not sure we need to bump the protocol
> because
> >>> > the whole point of the change was to make a setting we think doesn't
> make
> >>> > sense break, right? Well this will break it. It seems like the only
> >>> > downside is that older clients will get a bad error message instead
> of a
> >>> > good one. But it isn't like we will have rendered a client unusable,
> it
> >>> is
> >>> > just that they will need to change their config.
> >>> >
> >>> > -Jay
> >>> >
> >>> > On Thu, Jan 15, 2015 at 6:14 PM, Gwen Shapira <gshapira@cloudera.com
> >
> >>> > wrote:
> >>> >
> >>> >> I created a KIP for this suggestion:
> >>> >>
> >>> >>
> >>> >>
> >>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1+-+Remove+support+of+request.required.acks
> >>> >>
> >>> >> Basically documenting what was already discussed here.  Comments
> will
> >>> >> be awesome!
> >>> >>
> >>> >> Gwen
> >>> >>
> >>> >> On Thu, Jan 15, 2015 at 5:19 PM, Gwen Shapira <
> gshapira@cloudera.com>
> >>> >> wrote:
> >>> >> > The errors are part of the KIP process now, so I think the
> clients are
> >>> >> safe :)
> >>> >> >
> >>> >> >
> >>> >>
> >>>
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> >>> >> >
> >>> >> > On Thu, Jan 15, 2015 at 5:12 PM, Steve Morin <
> steve.morin@gmail.com>
> >>> >> wrote:
> >>> >> >> Agree errors should be part of the protocol
> >>> >> >>
> >>> >> >>> On Jan 15, 2015, at 17:59, Gwen Shapira <gs...@cloudera.com>
> >>> >> wrote:
> >>> >> >>>
> >>> >> >>> Hi,
> >>> >> >>>
> >>> >> >>> I got convinced by Joe and Dana that errors are indeed part of
> the
> >>> >> >>> protocol and can't be randomly added.
> >>> >> >>>
> >>> >> >>> So, it looks like we need to bump version of ProduceRequest in
> the
> >>> >> >>> following way:
> >>> >> >>> Version 0 -> accept acks >1. I think we should keep the existing
> >>> >> >>> behavior too (i.e. not replace it with -1) to avoid surprising
> >>> >> >>> clients, but I'm willing to hear other opinions.
> >>> >> >>> Version 1 -> do not accept acks >1 and return an error.
> >>> >> >>> Are we ok with the error I added in KAFKA-1697? We can use
> something
> >>> >> >>> less specific like InvalidRequestParameter. This error can be
> reused
> >>> >> >>> in the future and reduce the need to add errors, but will also
> be
> >>> less
> >>> >> >>> clear to the client and its users. Maybe even add the error
> message
> >>> >> >>> string to the protocol in addition to the error code? (since we
> are
> >>> >> >>> bumping versions....)
> >>> >> >>>
> >>> >> >>> I think maintaining the old version throughout 0.8.X makes
> sense.
> >>> IMO
> >>> >> >>> dropping it for 0.9 is feasible, but I'll let client owners help
> >>> make
> >>> >> >>> that call.
> >>> >> >>>
> >>> >> >>> Am I missing anything? Should I start a KIP? It seems like a
> >>> KIP-type
> >>> >> >>> discussion :)
> >>> >> >>>
> >>> >> >>> Gwen
> >>> >> >>>
> >>> >> >>>
> >>> >> >>> On Thu, Jan 15, 2015 at 2:31 PM, Ewen Cheslack-Postava
> >>> >> >>> <ew...@confluent.io> wrote:
> >>> >> >>>> Gwen,
> >>> >> >>>>
> >>> >> >>>> I think the only option that wouldn't require a protocol
> version
> >>> >> change is
> >>> >> >>>> the one where acks > 1 is converted to acks = -1 since it's the
> >>> only
> >>> >> one
> >>> >> >>>> that doesn't potentially break older clients. The protocol
> guide
> >>> >> says that
> >>> >> >>>> the expected upgrade path is servers first, then clients, so
> old
> >>> >> clients,
> >>> >> >>>> including non-java clients, that may be using acks > 1 should
> be
> >>> >> able to
> >>> >> >>>> work with a new broker version.
> >>> >> >>>>
> >>> >> >>>> It's more work, but I think dealing with the protocol change
> is the
> >>> >> right
> >>> >> >>>> thing to do since it eventually gets us to the behavior I
> think is
> >>> >> better --
> >>> >> >>>> the broker should reject requests with invalid values. I think
> Joe
> >>> >> and I
> >>> >> >>>> were basically in agreement. In my mind the major piece missing
> >>> from
> >>> >> his
> >>> >> >>>> description is how long we're going to maintain his "case 0"
> >>> >> behavior. It's
> >>> >> >>>> impractical to maintain old versions forever, but it sounds
> like
> >>> >> there
> >>> >> >>>> hasn't been a decision on how long to maintain them. Maybe
> that's
> >>> >> another
> >>> >> >>>> item to add to KIPs -- protocol versions and behavior need to
> be
> >>> >> listed as
> >>> >> >>>> deprecated and the earliest version in which they'll be removed
> >>> >> should be
> >>> >> >>>> specified so users can understand which versions are
> guaranteed to
> >>> be
> >>> >> >>>> compatible, even if they're using (well-written) non-java
> clients.
> >>> >> >>>>
> >>> >> >>>> -Ewen
> >>> >> >>>>
> >>> >> >>>>
> >>> >> >>>>> On Thu, Jan 15, 2015 at 12:52 PM, Dana Powers <
> >>> >> dana.powers@gmail.com> wrote:
> >>> >> >>>>>
> >>> >> >>>>>> clients don't break on unknown errors
> >>> >> >>>>>
> >>> >> >>>>> maybe true for the official java clients, but I dont think the
> >>> >> assumption
> >>> >> >>>>> holds true for community-maintained clients and users of those
> >>> >> clients.
> >>> >> >>>>> kafka-python generally follows the fail-fast philosophy and
> raises
> >>> >> an
> >>> >> >>>>> exception on any unrecognized error code in any server
> response.
> >>> >> in this
> >>> >> >>>>> case, kafka-python allows users to set their own required-acks
> >>> >> policy when
> >>> >> >>>>> creating a producer instance.  It is possible that users of
> >>> >> kafka-python
> >>> >> >>>>> have deployed producer code that uses ack>1 -- perhaps in
> >>> production
> >>> >> >>>>> environments -- and for those users the new error code will
> crash
> >>> >> their
> >>> >> >>>>> producer code.  I would not be surprised if the same were
> true of
> >>> >> other
> >>> >> >>>>> community clients.
> >>> >> >>>>>
> >>> >> >>>>> *one reason for the fail-fast approach is that there isn't
> great
> >>> >> >>>>> documentation on what errors to expect for each request /
> response
> >>> >> -- so
> >>> >> >>>>> we
> >>> >> >>>>> use failures to alert that some error case is not handled
> >>> >> properly.  and
> >>> >> >>>>> because of that, introducing new error cases without bumping
> the
> >>> api
> >>> >> >>>>> version is likely to cause those errors to get raised/thrown
> all
> >>> >> the way
> >>> >> >>>>> back up to the user.  of course we (client maintainers) can
> fix
> >>> the
> >>> >> issues
> >>> >> >>>>> in the client libraries and suggest users upgrade, but it's
> not
> >>> the
> >>> >> ideal
> >>> >> >>>>> situation.
> >>> >> >>>>>
> >>> >> >>>>>
> >>> >> >>>>> long-winded way of saying: I agree w/ Joe.
> >>> >> >>>>>
> >>> >> >>>>> -Dana
> >>> >> >>>>>
> >>> >> >>>>>
> >>> >> >>>>> On Thu, Jan 15, 2015 at 12:07 PM, Gwen Shapira <
> >>> >> gshapira@cloudera.com>
> >>> >> >>>>> wrote:
> >>> >> >>>>>
> >>> >> >>>>>> Is the protocol bump caused by the behavior change or the new
> >>> error
> >>> >> >>>>>> code?
> >>> >> >>>>>>
> >>> >> >>>>>> 1) IMO, error_codes are data, and clients can expect to
> receive
> >>> >> errors
> >>> >> >>>>>> that they don't understand (i.e. unknown errors). AFAIK,
> clients
> >>> >> don't
> >>> >> >>>>>> break on unknown errors, they are simple more challenging to
> >>> >> debug. If
> >>> >> >>>>>> we document the new behavior, then its definitely debuggable
> and
> >>> >> >>>>>> fixable.
> >>> >> >>>>>>
> >>> >> >>>>>> 2) The behavior change is basically a deprecation - i.e.
> acks > 1
> >>> >> were
> >>> >> >>>>>> never documented, and are not supported by Kafka clients
> starting
> >>> >> with
> >>> >> >>>>>> version 0.8.2. I'm not sure this requires a protocol bump
> either,
> >>> >> >>>>>> although its a better case than new error codes.
> >>> >> >>>>>>
> >>> >> >>>>>> Thanks,
> >>> >> >>>>>> Gwen
> >>> >> >>>>>>
> >>> >> >>>>>> On Thu, Jan 15, 2015 at 10:10 AM, Joe Stein <
> >>> joe.stein@stealth.ly>
> >>> >> >>>>>> wrote:
> >>> >> >>>>>>> Looping in the mailing list that the client developers live
> on
> >>> >> because
> >>> >> >>>>>> they
> >>> >> >>>>>>> are all not on dev (though they should be if they want to be
> >>> >> helping
> >>> >> >>>>>>> to
> >>> >> >>>>>>> build the best client libraries they can).
> >>> >> >>>>>>>
> >>> >> >>>>>>> I whole hardily believe that we need to not break existing
> >>> >> >>>>>>> functionality
> >>> >> >>>>>> of
> >>> >> >>>>>>> the client protocol, ever.
> >>> >> >>>>>>>
> >>> >> >>>>>>> There are many reasons for this and we have other threads
> on the
> >>> >> >>>>>>> mailing
> >>> >> >>>>>>> list where we are discussing that topic (no pun intended)
> that I
> >>> >> don't
> >>> >> >>>>>> want
> >>> >> >>>>>>> to re-hash here.
> >>> >> >>>>>>>
> >>> >> >>>>>>> If we change wire protocol functionality OR the binary
> format
> >>> >> (either)
> >>> >> >>>>>>> we
> >>> >> >>>>>>> must bump version AND treat version as a feature flag with
> >>> >> backward
> >>> >> >>>>>>> compatibility support until it is deprecated for some time
> for
> >>> >> folks
> >>> >> >>>>>>> to
> >>> >> >>>>>> deal
> >>> >> >>>>>>> with it.
> >>> >> >>>>>>>
> >>> >> >>>>>>> match version = {
> >>> >> >>>>>>> case 0: keepDoingWhatWeWereDoing()
> >>> >> >>>>>>> case 1: doNewStuff()
> >>> >> >>>>>>> case 2: doEvenMoreNewStuff()
> >>> >> >>>>>>> }
> >>> >> >>>>>>>
> >>> >> >>>>>>> has to be a practice we adopt imho ... I know feature flags
> can
> >>> be
> >>> >> >>>>>> construed
> >>> >> >>>>>>> as "messy code" but I am eager to hear another (better?
> >>> >> different?)
> >>> >> >>>>>> solution
> >>> >> >>>>>>> to this.
> >>> >> >>>>>>>
> >>> >> >>>>>>> If we don't do a feature flag like this specifically with
> this
> >>> >> change
> >>> >> >>>>>> then
> >>> >> >>>>>>> what happens is that someone upgrades their brokers with a
> >>> rolling
> >>> >> >>>>>> restart
> >>> >> >>>>>>> in 0.8.3 and every single one of their producer requests
> start
> >>> to
> >>> >> fail
> >>> >> >>>>>> and
> >>> >> >>>>>>> they have a major production outage. eeeek!!!!
> >>> >> >>>>>>>
> >>> >> >>>>>>> I do 100% agree that > 1 makes no sense and we *REALLY* need
> >>> >> people to
> >>> >> >>>>>> start
> >>> >> >>>>>>> using 0,1,-1 but we need to-do that in a way that is going
> to
> >>> >> work for
> >>> >> >>>>>>> everyone.
> >>> >> >>>>>>>
> >>> >> >>>>>>> Old producers and consumers must keep working with new
> brokers
> >>> >> and if
> >>> >> >>>>>>> we
> >>> >> >>>>>> are
> >>> >> >>>>>>> not going to support that then I am unclear what the use of
> >>> >> "version"
> >>> >> >>>>>>> is
> >>> >> >>>>>>> based on our original intentions of having it because of the
> >>> >> >>>>>>> 0.7=>-0.8.
> >>> >> >>>>>> We
> >>> >> >>>>>>> said no more breaking changes when we did that.
> >>> >> >>>>>>>
> >>> >> >>>>>>> - Joe Stein
> >>> >> >>>>>>>
> >>> >> >>>>>>> On Thu, Jan 15, 2015 at 12:38 PM, Ewen Cheslack-Postava <
> >>> >> >>>>>> ewen@confluent.io>
> >>> >> >>>>>>> wrote:
> >>> >> >>>>>>>>
> >>> >> >>>>>>>> Right, so this looks like it could create an issue similar
> to
> >>> >> what's
> >>> >> >>>>>>>> currently being discussed in
> >>> >> >>>>>>>> https://issues.apache.org/jira/browse/KAFKA-1649 where
> users
> >>> >> now get
> >>> >> >>>>>>>> errors
> >>> >> >>>>>>>> under conditions when they previously wouldn't. Old clients
> >>> won't
> >>> >> >>>>>>>> even
> >>> >> >>>>>>>> know
> >>> >> >>>>>>>> about the error code, so besides failing they won't even be
> >>> able
> >>> >> to
> >>> >> >>>>>>>> log
> >>> >> >>>>>>>> any
> >>> >> >>>>>>>> meaningful error messages.
> >>> >> >>>>>>>>
> >>> >> >>>>>>>> I think there are two options for compatibility:
> >>> >> >>>>>>>>
> >>> >> >>>>>>>> 1. An alternative change is to remove the ack > 1 code, but
> >>> >> silently
> >>> >> >>>>>>>> "upgrade" requests with acks > 1 to acks = -1. This isn't
> the
> >>> >> same as
> >>> >> >>>>>>>> other
> >>> >> >>>>>>>> changes to behavior since the interaction between the
> client
> >>> and
> >>> >> >>>>>>>> server
> >>> >> >>>>>>>> remains the same, no error codes change, etc. The client
> might
> >>> >> just
> >>> >> >>>>>>>> see
> >>> >> >>>>>>>> some increased latency since the message might need to be
> >>> >> replicated
> >>> >> >>>>>>>> to
> >>> >> >>>>>>>> more brokers than they requested.
> >>> >> >>>>>>>> 2. Split this into two patches, one that bumps the protocol
> >>> >> version
> >>> >> >>>>>>>> on
> >>> >> >>>>>>>> that
> >>> >> >>>>>>>> message to include the new error code but maintains both
> old
> >>> (now
> >>> >> >>>>>>>> deprecated) and new behavior, then a second that would be
> >>> >> applied in
> >>> >> >>>>>>>> a
> >>> >> >>>>>>>> later release that removes the old protocol + code for
> handling
> >>> >> acks
> >>> >> >>>>>> 1.
> >>> >> >>>>>>>>
> >>> >> >>>>>>>> 2 is probably the right thing to do. If we specify the
> release
> >>> >> when
> >>> >> >>>>>> we'll
> >>> >> >>>>>>>> remove the deprecated protocol at the time of deprecation
> it
> >>> >> makes
> >>> >> >>>>>> things
> >>> >> >>>>>>>> a
> >>> >> >>>>>>>> lot easier for people writing non-java clients and could
> give
> >>> >> users
> >>> >> >>>>>> better
> >>> >> >>>>>>>> predictability (e.g. if clients are at most 1 major release
> >>> >> behind
> >>> >> >>>>>>>> brokers,
> >>> >> >>>>>>>> they'll remain compatible but possibly use deprecated
> >>> features).
> >>> >> >>>>>>>>
> >>> >> >>>>>>>>
> >>> >> >>>>>>>> On Wed, Jan 14, 2015 at 3:51 PM, Gwen Shapira <
> >>> >> gshapira@cloudera.com>
> >>> >> >>>>>>>> wrote:
> >>> >> >>>>>>>>
> >>> >> >>>>>>>>> Hi Kafka Devs,
> >>> >> >>>>>>>>>
> >>> >> >>>>>>>>> We are working on KAFKA-1697 - remove code related to
> ack>1 on
> >>> >> the
> >>> >> >>>>>>>>> broker. Per Neha's suggestion, I'd like to give everyone a
> >>> >> heads up
> >>> >> >>>>>>>>> on
> >>> >> >>>>>>>>> what these changes mean.
> >>> >> >>>>>>>>>
> >>> >> >>>>>>>>> Once this patch is included, any produce requests that
> include
> >>> >> >>>>>>>>> request.required.acks > 1 will result in an exception.
> This
> >>> >> will be
> >>> >> >>>>>>>>> InvalidRequiredAcks in new versions (0.8.3 and up, I
> assume)
> >>> and
> >>> >> >>>>>>>>> UnknownException in existing versions (sorry, but I can't
> add
> >>> >> error
> >>> >> >>>>>>>>> codes retroactively).
> >>> >> >>>>>>>>>
> >>> >> >>>>>>>>> This behavior is already enforced by 0.8.2 producers
> (sync and
> >>> >> >>>>>>>>> new),
> >>> >> >>>>>>>>> but we expect impact on users with older producers that
> relied
> >>> >> on
> >>> >> >>>>>>>>> acks
> >>> >> >>>>>>>>>> 1 and external clients (i.e python, go, etc).
> >>> >> >>>>>>>>>
> >>> >> >>>>>>>>> Users who relied on acks > 1 are expected to switch to
> using
> >>> >> acks =
> >>> >> >>>>>>>>> -1
> >>> >> >>>>>>>>> and a min.isr parameter than matches their user case.
> >>> >> >>>>>>>>>
> >>> >> >>>>>>>>> This change was discussed in the past in the context of
> >>> >> KAFKA-1555
> >>> >> >>>>>>>>> (min.isr), but let us know if you have any questions or
> >>> concerns
> >>> >> >>>>>>>>> regarding this change.
> >>> >> >>>>>>>>>
> >>> >> >>>>>>>>> Gwen
> >>> >> >>>>>>>>
> >>> >> >>>>>>>>
> >>> >> >>>>>>>>
> >>> >> >>>>>>>> --
> >>> >> >>>>>>>> Thanks,
> >>> >> >>>>>>>> Ewen
> >>> >> >>>>>>>
> >>> >> >>>>>>>
> >>> >> >>>>>>> --
> >>> >> >>>>>>> You received this message because you are subscribed to the
> >>> Google
> >>> >> >>>>>>> Groups
> >>> >> >>>>>>> "kafka-clients" group.
> >>> >> >>>>>>> To unsubscribe from this group and stop receiving emails
> from
> >>> it,
> >>> >> send
> >>> >> >>>>>>> an
> >>> >> >>>>>>> email to kafka-clients+unsubscribe@googlegroups.com.
> >>> >> >>>>>>> To post to this group, send email to
> >>> >> kafka-clients@googlegroups.com.
> >>> >> >>>>>>> Visit this group at
> >>> http://groups.google.com/group/kafka-clients.
> >>> >> >>>>>>> To view this discussion on the web visit
> >>> >> >>>>>>
> >>> >> >>>>>>
> >>> >>
> >>>
> https://groups.google.com/d/msgid/kafka-clients/CAA7ooCBtH2JjyQsArdx_%3DV25B4O1QJk0YvOu9U6kYt9sB4aqng%40mail.gmail.com
> >>> >> >>>>>> .
> >>> >> >>>>>>>
> >>> >> >>>>>>> For more options, visit https://groups.google.com/d/optout.
> >>> >> >>>>>>
> >>> >> >>>>>> --
> >>> >> >>>>>> You received this message because you are subscribed to the
> >>> Google
> >>> >> >>>>>> Groups
> >>> >> >>>>>> "kafka-clients" group.
> >>> >> >>>>>> To unsubscribe from this group and stop receiving emails
> from it,
> >>> >> send
> >>> >> >>>>>> an
> >>> >> >>>>>> email to kafka-clients+unsubscribe@googlegroups.com.
> >>> >> >>>>>> To post to this group, send email to
> >>> >> kafka-clients@googlegroups.com.
> >>> >> >>>>>> Visit this group at
> http://groups.google.com/group/kafka-clients
> >>> .
> >>> >> >>>>>> To view this discussion on the web visit
> >>> >> >>>>>>
> >>> >> >>>>>>
> >>> >>
> >>>
> https://groups.google.com/d/msgid/kafka-clients/CAHBV8WeUebxi%2B%2BSbjz8E9Yf4u4hkcPJ80Xsj0XTKcTac%3D%2B613A%40mail.gmail.com
> >>> >> >>>>>> .
> >>> >> >>>>>> For more options, visit https://groups.google.com/d/optout.
> >>> >> >>>>
> >>> >> >>>>
> >>> >> >>>>
> >>> >> >>>>
> >>> >> >>>> --
> >>> >> >>>> Thanks,
> >>> >> >>>> Ewen
> >>> >>
> >>> >
> >>> >
> >>>
> >>>
> >>> --
> >>> Thanks,
> >>> Ewen
> >>>
> >>
> >>
> >>
> >> --
> >> Thanks,
> >> Neha
>

Re: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to ack>1 on the broker

Posted by Gwen Shapira <gs...@cloudera.com>.
I updated the KIP: Using acks > 1 in version 0 will log a WARN message
in the broker about client using deprecated behavior (suggested by Joe
in the JIRA, and I think it makes sense).

Gwen

On Fri, Jan 16, 2015 at 10:40 AM, Gwen Shapira <gs...@cloudera.com> wrote:
> How about we continue the discussion on this thread, so we won't lose
> the context of this discussion, and put it up for VOTE when this has
> been finalized?
>
> On Fri, Jan 16, 2015 at 10:22 AM, Neha Narkhede <ne...@confluent.io> wrote:
>> Gwen,
>>
>> KIP write-up looks good. According to the rest of the KIP process proposal,
>> would you like to start a DISCUSS/VOTE thread for it?
>>
>> Thanks,
>> Neha
>>
>> On Fri, Jan 16, 2015 at 9:37 AM, Ewen Cheslack-Postava <ew...@confluent.io>
>> wrote:
>>
>>> Gwen -- KIP write up looks good. Deprecation schedule probably needs to be
>>> more specific, but I think that discussion probably needs to happen after a
>>> solution is agreed upon.
>>>
>>> Jay -- I think "older clients will get a bad error message instead of a
>>> good one" isn't what would be happening with this change. Previously they
>>> wouldn't have received an error and they would have been able to produce
>>> messages. After the change they'll just receive this new error message
>>> which their clients can't possibly handle gracefully since it didn't exist
>>> when the client was written. Whether the acks > 1 setting was actually
>>> accomplishing what they thought doesn't matter. Someone could have
>>> reasonably read the docs on 0.8.1.1, thought acks = 2 is an ok setting for
>>> their applications, set it as a default across a bunch of apps, then follow
>>> the recommended upgrade path of updating brokers to 0.8.2 and all their
>>> apps will start failing on produce requests.
>>>
>>>
>>> On Thu, Jan 15, 2015 at 8:27 PM, Jay Kreps <ja...@gmail.com> wrote:
>>>
>>> > This is a good case to discuss.
>>> >
>>> > Let's figure the general case of how we want to handle errors and get
>>> that
>>> > documented in the protocol. The problem right now is that we give no
>>> > guidance on this. I actually thought Gwen's suggestion made sense on the
>>> > guidance we should have given which is that we will enumerate a set of
>>> > errors and their meaning for each API but it is possible that other
>>> errors
>>> > will occur and they should be handled (maybe poorly) in the same way
>>> > UNKNOWN_ERROR is handled which is our normal escape hatch for things like
>>> > OOMException.
>>> >
>>> > I really do think we shouldn't be dogmatic here: In considering a change
>>> > to errors we should consider the potential ill-effect vs the complexity
>>> of
>>> > yet another protocol version.
>>> >
>>> > In this case I actually am not sure we need to bump the protocol because
>>> > the whole point of the change was to make a setting we think doesn't make
>>> > sense break, right? Well this will break it. It seems like the only
>>> > downside is that older clients will get a bad error message instead of a
>>> > good one. But it isn't like we will have rendered a client unusable, it
>>> is
>>> > just that they will need to change their config.
>>> >
>>> > -Jay
>>> >
>>> > On Thu, Jan 15, 2015 at 6:14 PM, Gwen Shapira <gs...@cloudera.com>
>>> > wrote:
>>> >
>>> >> I created a KIP for this suggestion:
>>> >>
>>> >>
>>> >>
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1+-+Remove+support+of+request.required.acks
>>> >>
>>> >> Basically documenting what was already discussed here.  Comments will
>>> >> be awesome!
>>> >>
>>> >> Gwen
>>> >>
>>> >> On Thu, Jan 15, 2015 at 5:19 PM, Gwen Shapira <gs...@cloudera.com>
>>> >> wrote:
>>> >> > The errors are part of the KIP process now, so I think the clients are
>>> >> safe :)
>>> >> >
>>> >> >
>>> >>
>>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
>>> >> >
>>> >> > On Thu, Jan 15, 2015 at 5:12 PM, Steve Morin <st...@gmail.com>
>>> >> wrote:
>>> >> >> Agree errors should be part of the protocol
>>> >> >>
>>> >> >>> On Jan 15, 2015, at 17:59, Gwen Shapira <gs...@cloudera.com>
>>> >> wrote:
>>> >> >>>
>>> >> >>> Hi,
>>> >> >>>
>>> >> >>> I got convinced by Joe and Dana that errors are indeed part of the
>>> >> >>> protocol and can't be randomly added.
>>> >> >>>
>>> >> >>> So, it looks like we need to bump version of ProduceRequest in the
>>> >> >>> following way:
>>> >> >>> Version 0 -> accept acks >1. I think we should keep the existing
>>> >> >>> behavior too (i.e. not replace it with -1) to avoid surprising
>>> >> >>> clients, but I'm willing to hear other opinions.
>>> >> >>> Version 1 -> do not accept acks >1 and return an error.
>>> >> >>> Are we ok with the error I added in KAFKA-1697? We can use something
>>> >> >>> less specific like InvalidRequestParameter. This error can be reused
>>> >> >>> in the future and reduce the need to add errors, but will also be
>>> less
>>> >> >>> clear to the client and its users. Maybe even add the error message
>>> >> >>> string to the protocol in addition to the error code? (since we are
>>> >> >>> bumping versions....)
>>> >> >>>
>>> >> >>> I think maintaining the old version throughout 0.8.X makes sense.
>>> IMO
>>> >> >>> dropping it for 0.9 is feasible, but I'll let client owners help
>>> make
>>> >> >>> that call.
>>> >> >>>
>>> >> >>> Am I missing anything? Should I start a KIP? It seems like a
>>> KIP-type
>>> >> >>> discussion :)
>>> >> >>>
>>> >> >>> Gwen
>>> >> >>>
>>> >> >>>
>>> >> >>> On Thu, Jan 15, 2015 at 2:31 PM, Ewen Cheslack-Postava
>>> >> >>> <ew...@confluent.io> wrote:
>>> >> >>>> Gwen,
>>> >> >>>>
>>> >> >>>> I think the only option that wouldn't require a protocol version
>>> >> change is
>>> >> >>>> the one where acks > 1 is converted to acks = -1 since it's the
>>> only
>>> >> one
>>> >> >>>> that doesn't potentially break older clients. The protocol guide
>>> >> says that
>>> >> >>>> the expected upgrade path is servers first, then clients, so old
>>> >> clients,
>>> >> >>>> including non-java clients, that may be using acks > 1 should be
>>> >> able to
>>> >> >>>> work with a new broker version.
>>> >> >>>>
>>> >> >>>> It's more work, but I think dealing with the protocol change is the
>>> >> right
>>> >> >>>> thing to do since it eventually gets us to the behavior I think is
>>> >> better --
>>> >> >>>> the broker should reject requests with invalid values. I think Joe
>>> >> and I
>>> >> >>>> were basically in agreement. In my mind the major piece missing
>>> from
>>> >> his
>>> >> >>>> description is how long we're going to maintain his "case 0"
>>> >> behavior. It's
>>> >> >>>> impractical to maintain old versions forever, but it sounds like
>>> >> there
>>> >> >>>> hasn't been a decision on how long to maintain them. Maybe that's
>>> >> another
>>> >> >>>> item to add to KIPs -- protocol versions and behavior need to be
>>> >> listed as
>>> >> >>>> deprecated and the earliest version in which they'll be removed
>>> >> should be
>>> >> >>>> specified so users can understand which versions are guaranteed to
>>> be
>>> >> >>>> compatible, even if they're using (well-written) non-java clients.
>>> >> >>>>
>>> >> >>>> -Ewen
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>> On Thu, Jan 15, 2015 at 12:52 PM, Dana Powers <
>>> >> dana.powers@gmail.com> wrote:
>>> >> >>>>>
>>> >> >>>>>> clients don't break on unknown errors
>>> >> >>>>>
>>> >> >>>>> maybe true for the official java clients, but I dont think the
>>> >> assumption
>>> >> >>>>> holds true for community-maintained clients and users of those
>>> >> clients.
>>> >> >>>>> kafka-python generally follows the fail-fast philosophy and raises
>>> >> an
>>> >> >>>>> exception on any unrecognized error code in any server response.
>>> >> in this
>>> >> >>>>> case, kafka-python allows users to set their own required-acks
>>> >> policy when
>>> >> >>>>> creating a producer instance.  It is possible that users of
>>> >> kafka-python
>>> >> >>>>> have deployed producer code that uses ack>1 -- perhaps in
>>> production
>>> >> >>>>> environments -- and for those users the new error code will crash
>>> >> their
>>> >> >>>>> producer code.  I would not be surprised if the same were true of
>>> >> other
>>> >> >>>>> community clients.
>>> >> >>>>>
>>> >> >>>>> *one reason for the fail-fast approach is that there isn't great
>>> >> >>>>> documentation on what errors to expect for each request / response
>>> >> -- so
>>> >> >>>>> we
>>> >> >>>>> use failures to alert that some error case is not handled
>>> >> properly.  and
>>> >> >>>>> because of that, introducing new error cases without bumping the
>>> api
>>> >> >>>>> version is likely to cause those errors to get raised/thrown all
>>> >> the way
>>> >> >>>>> back up to the user.  of course we (client maintainers) can fix
>>> the
>>> >> issues
>>> >> >>>>> in the client libraries and suggest users upgrade, but it's not
>>> the
>>> >> ideal
>>> >> >>>>> situation.
>>> >> >>>>>
>>> >> >>>>>
>>> >> >>>>> long-winded way of saying: I agree w/ Joe.
>>> >> >>>>>
>>> >> >>>>> -Dana
>>> >> >>>>>
>>> >> >>>>>
>>> >> >>>>> On Thu, Jan 15, 2015 at 12:07 PM, Gwen Shapira <
>>> >> gshapira@cloudera.com>
>>> >> >>>>> wrote:
>>> >> >>>>>
>>> >> >>>>>> Is the protocol bump caused by the behavior change or the new
>>> error
>>> >> >>>>>> code?
>>> >> >>>>>>
>>> >> >>>>>> 1) IMO, error_codes are data, and clients can expect to receive
>>> >> errors
>>> >> >>>>>> that they don't understand (i.e. unknown errors). AFAIK, clients
>>> >> don't
>>> >> >>>>>> break on unknown errors, they are simple more challenging to
>>> >> debug. If
>>> >> >>>>>> we document the new behavior, then its definitely debuggable and
>>> >> >>>>>> fixable.
>>> >> >>>>>>
>>> >> >>>>>> 2) The behavior change is basically a deprecation - i.e. acks > 1
>>> >> were
>>> >> >>>>>> never documented, and are not supported by Kafka clients starting
>>> >> with
>>> >> >>>>>> version 0.8.2. I'm not sure this requires a protocol bump either,
>>> >> >>>>>> although its a better case than new error codes.
>>> >> >>>>>>
>>> >> >>>>>> Thanks,
>>> >> >>>>>> Gwen
>>> >> >>>>>>
>>> >> >>>>>> On Thu, Jan 15, 2015 at 10:10 AM, Joe Stein <
>>> joe.stein@stealth.ly>
>>> >> >>>>>> wrote:
>>> >> >>>>>>> Looping in the mailing list that the client developers live on
>>> >> because
>>> >> >>>>>> they
>>> >> >>>>>>> are all not on dev (though they should be if they want to be
>>> >> helping
>>> >> >>>>>>> to
>>> >> >>>>>>> build the best client libraries they can).
>>> >> >>>>>>>
>>> >> >>>>>>> I whole hardily believe that we need to not break existing
>>> >> >>>>>>> functionality
>>> >> >>>>>> of
>>> >> >>>>>>> the client protocol, ever.
>>> >> >>>>>>>
>>> >> >>>>>>> There are many reasons for this and we have other threads on the
>>> >> >>>>>>> mailing
>>> >> >>>>>>> list where we are discussing that topic (no pun intended) that I
>>> >> don't
>>> >> >>>>>> want
>>> >> >>>>>>> to re-hash here.
>>> >> >>>>>>>
>>> >> >>>>>>> If we change wire protocol functionality OR the binary format
>>> >> (either)
>>> >> >>>>>>> we
>>> >> >>>>>>> must bump version AND treat version as a feature flag with
>>> >> backward
>>> >> >>>>>>> compatibility support until it is deprecated for some time for
>>> >> folks
>>> >> >>>>>>> to
>>> >> >>>>>> deal
>>> >> >>>>>>> with it.
>>> >> >>>>>>>
>>> >> >>>>>>> match version = {
>>> >> >>>>>>> case 0: keepDoingWhatWeWereDoing()
>>> >> >>>>>>> case 1: doNewStuff()
>>> >> >>>>>>> case 2: doEvenMoreNewStuff()
>>> >> >>>>>>> }
>>> >> >>>>>>>
>>> >> >>>>>>> has to be a practice we adopt imho ... I know feature flags can
>>> be
>>> >> >>>>>> construed
>>> >> >>>>>>> as "messy code" but I am eager to hear another (better?
>>> >> different?)
>>> >> >>>>>> solution
>>> >> >>>>>>> to this.
>>> >> >>>>>>>
>>> >> >>>>>>> If we don't do a feature flag like this specifically with this
>>> >> change
>>> >> >>>>>> then
>>> >> >>>>>>> what happens is that someone upgrades their brokers with a
>>> rolling
>>> >> >>>>>> restart
>>> >> >>>>>>> in 0.8.3 and every single one of their producer requests start
>>> to
>>> >> fail
>>> >> >>>>>> and
>>> >> >>>>>>> they have a major production outage. eeeek!!!!
>>> >> >>>>>>>
>>> >> >>>>>>> I do 100% agree that > 1 makes no sense and we *REALLY* need
>>> >> people to
>>> >> >>>>>> start
>>> >> >>>>>>> using 0,1,-1 but we need to-do that in a way that is going to
>>> >> work for
>>> >> >>>>>>> everyone.
>>> >> >>>>>>>
>>> >> >>>>>>> Old producers and consumers must keep working with new brokers
>>> >> and if
>>> >> >>>>>>> we
>>> >> >>>>>> are
>>> >> >>>>>>> not going to support that then I am unclear what the use of
>>> >> "version"
>>> >> >>>>>>> is
>>> >> >>>>>>> based on our original intentions of having it because of the
>>> >> >>>>>>> 0.7=>-0.8.
>>> >> >>>>>> We
>>> >> >>>>>>> said no more breaking changes when we did that.
>>> >> >>>>>>>
>>> >> >>>>>>> - Joe Stein
>>> >> >>>>>>>
>>> >> >>>>>>> On Thu, Jan 15, 2015 at 12:38 PM, Ewen Cheslack-Postava <
>>> >> >>>>>> ewen@confluent.io>
>>> >> >>>>>>> wrote:
>>> >> >>>>>>>>
>>> >> >>>>>>>> Right, so this looks like it could create an issue similar to
>>> >> what's
>>> >> >>>>>>>> currently being discussed in
>>> >> >>>>>>>> https://issues.apache.org/jira/browse/KAFKA-1649 where users
>>> >> now get
>>> >> >>>>>>>> errors
>>> >> >>>>>>>> under conditions when they previously wouldn't. Old clients
>>> won't
>>> >> >>>>>>>> even
>>> >> >>>>>>>> know
>>> >> >>>>>>>> about the error code, so besides failing they won't even be
>>> able
>>> >> to
>>> >> >>>>>>>> log
>>> >> >>>>>>>> any
>>> >> >>>>>>>> meaningful error messages.
>>> >> >>>>>>>>
>>> >> >>>>>>>> I think there are two options for compatibility:
>>> >> >>>>>>>>
>>> >> >>>>>>>> 1. An alternative change is to remove the ack > 1 code, but
>>> >> silently
>>> >> >>>>>>>> "upgrade" requests with acks > 1 to acks = -1. This isn't the
>>> >> same as
>>> >> >>>>>>>> other
>>> >> >>>>>>>> changes to behavior since the interaction between the client
>>> and
>>> >> >>>>>>>> server
>>> >> >>>>>>>> remains the same, no error codes change, etc. The client might
>>> >> just
>>> >> >>>>>>>> see
>>> >> >>>>>>>> some increased latency since the message might need to be
>>> >> replicated
>>> >> >>>>>>>> to
>>> >> >>>>>>>> more brokers than they requested.
>>> >> >>>>>>>> 2. Split this into two patches, one that bumps the protocol
>>> >> version
>>> >> >>>>>>>> on
>>> >> >>>>>>>> that
>>> >> >>>>>>>> message to include the new error code but maintains both old
>>> (now
>>> >> >>>>>>>> deprecated) and new behavior, then a second that would be
>>> >> applied in
>>> >> >>>>>>>> a
>>> >> >>>>>>>> later release that removes the old protocol + code for handling
>>> >> acks
>>> >> >>>>>> 1.
>>> >> >>>>>>>>
>>> >> >>>>>>>> 2 is probably the right thing to do. If we specify the release
>>> >> when
>>> >> >>>>>> we'll
>>> >> >>>>>>>> remove the deprecated protocol at the time of deprecation it
>>> >> makes
>>> >> >>>>>> things
>>> >> >>>>>>>> a
>>> >> >>>>>>>> lot easier for people writing non-java clients and could give
>>> >> users
>>> >> >>>>>> better
>>> >> >>>>>>>> predictability (e.g. if clients are at most 1 major release
>>> >> behind
>>> >> >>>>>>>> brokers,
>>> >> >>>>>>>> they'll remain compatible but possibly use deprecated
>>> features).
>>> >> >>>>>>>>
>>> >> >>>>>>>>
>>> >> >>>>>>>> On Wed, Jan 14, 2015 at 3:51 PM, Gwen Shapira <
>>> >> gshapira@cloudera.com>
>>> >> >>>>>>>> wrote:
>>> >> >>>>>>>>
>>> >> >>>>>>>>> Hi Kafka Devs,
>>> >> >>>>>>>>>
>>> >> >>>>>>>>> We are working on KAFKA-1697 - remove code related to ack>1 on
>>> >> the
>>> >> >>>>>>>>> broker. Per Neha's suggestion, I'd like to give everyone a
>>> >> heads up
>>> >> >>>>>>>>> on
>>> >> >>>>>>>>> what these changes mean.
>>> >> >>>>>>>>>
>>> >> >>>>>>>>> Once this patch is included, any produce requests that include
>>> >> >>>>>>>>> request.required.acks > 1 will result in an exception. This
>>> >> will be
>>> >> >>>>>>>>> InvalidRequiredAcks in new versions (0.8.3 and up, I assume)
>>> and
>>> >> >>>>>>>>> UnknownException in existing versions (sorry, but I can't add
>>> >> error
>>> >> >>>>>>>>> codes retroactively).
>>> >> >>>>>>>>>
>>> >> >>>>>>>>> This behavior is already enforced by 0.8.2 producers (sync and
>>> >> >>>>>>>>> new),
>>> >> >>>>>>>>> but we expect impact on users with older producers that relied
>>> >> on
>>> >> >>>>>>>>> acks
>>> >> >>>>>>>>>> 1 and external clients (i.e python, go, etc).
>>> >> >>>>>>>>>
>>> >> >>>>>>>>> Users who relied on acks > 1 are expected to switch to using
>>> >> acks =
>>> >> >>>>>>>>> -1
>>> >> >>>>>>>>> and a min.isr parameter than matches their user case.
>>> >> >>>>>>>>>
>>> >> >>>>>>>>> This change was discussed in the past in the context of
>>> >> KAFKA-1555
>>> >> >>>>>>>>> (min.isr), but let us know if you have any questions or
>>> concerns
>>> >> >>>>>>>>> regarding this change.
>>> >> >>>>>>>>>
>>> >> >>>>>>>>> Gwen
>>> >> >>>>>>>>
>>> >> >>>>>>>>
>>> >> >>>>>>>>
>>> >> >>>>>>>> --
>>> >> >>>>>>>> Thanks,
>>> >> >>>>>>>> Ewen
>>> >> >>>>>>>
>>> >> >>>>>>>
>>> >> >>>>>>> --
>>> >> >>>>>>> You received this message because you are subscribed to the
>>> Google
>>> >> >>>>>>> Groups
>>> >> >>>>>>> "kafka-clients" group.
>>> >> >>>>>>> To unsubscribe from this group and stop receiving emails from
>>> it,
>>> >> send
>>> >> >>>>>>> an
>>> >> >>>>>>> email to kafka-clients+unsubscribe@googlegroups.com.
>>> >> >>>>>>> To post to this group, send email to
>>> >> kafka-clients@googlegroups.com.
>>> >> >>>>>>> Visit this group at
>>> http://groups.google.com/group/kafka-clients.
>>> >> >>>>>>> To view this discussion on the web visit
>>> >> >>>>>>
>>> >> >>>>>>
>>> >>
>>> https://groups.google.com/d/msgid/kafka-clients/CAA7ooCBtH2JjyQsArdx_%3DV25B4O1QJk0YvOu9U6kYt9sB4aqng%40mail.gmail.com
>>> >> >>>>>> .
>>> >> >>>>>>>
>>> >> >>>>>>> For more options, visit https://groups.google.com/d/optout.
>>> >> >>>>>>
>>> >> >>>>>> --
>>> >> >>>>>> You received this message because you are subscribed to the
>>> Google
>>> >> >>>>>> Groups
>>> >> >>>>>> "kafka-clients" group.
>>> >> >>>>>> To unsubscribe from this group and stop receiving emails from it,
>>> >> send
>>> >> >>>>>> an
>>> >> >>>>>> email to kafka-clients+unsubscribe@googlegroups.com.
>>> >> >>>>>> To post to this group, send email to
>>> >> kafka-clients@googlegroups.com.
>>> >> >>>>>> Visit this group at http://groups.google.com/group/kafka-clients
>>> .
>>> >> >>>>>> To view this discussion on the web visit
>>> >> >>>>>>
>>> >> >>>>>>
>>> >>
>>> https://groups.google.com/d/msgid/kafka-clients/CAHBV8WeUebxi%2B%2BSbjz8E9Yf4u4hkcPJ80Xsj0XTKcTac%3D%2B613A%40mail.gmail.com
>>> >> >>>>>> .
>>> >> >>>>>> For more options, visit https://groups.google.com/d/optout.
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>
>>> >> >>>> --
>>> >> >>>> Thanks,
>>> >> >>>> Ewen
>>> >>
>>> >
>>> >
>>>
>>>
>>> --
>>> Thanks,
>>> Ewen
>>>
>>
>>
>>
>> --
>> Thanks,
>> Neha

Re: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to ack>1 on the broker

Posted by Gwen Shapira <gs...@cloudera.com>.
How about we continue the discussion on this thread, so we won't lose
the context of this discussion, and put it up for VOTE when this has
been finalized?

On Fri, Jan 16, 2015 at 10:22 AM, Neha Narkhede <ne...@confluent.io> wrote:
> Gwen,
>
> KIP write-up looks good. According to the rest of the KIP process proposal,
> would you like to start a DISCUSS/VOTE thread for it?
>
> Thanks,
> Neha
>
> On Fri, Jan 16, 2015 at 9:37 AM, Ewen Cheslack-Postava <ew...@confluent.io>
> wrote:
>
>> Gwen -- KIP write up looks good. Deprecation schedule probably needs to be
>> more specific, but I think that discussion probably needs to happen after a
>> solution is agreed upon.
>>
>> Jay -- I think "older clients will get a bad error message instead of a
>> good one" isn't what would be happening with this change. Previously they
>> wouldn't have received an error and they would have been able to produce
>> messages. After the change they'll just receive this new error message
>> which their clients can't possibly handle gracefully since it didn't exist
>> when the client was written. Whether the acks > 1 setting was actually
>> accomplishing what they thought doesn't matter. Someone could have
>> reasonably read the docs on 0.8.1.1, thought acks = 2 is an ok setting for
>> their applications, set it as a default across a bunch of apps, then follow
>> the recommended upgrade path of updating brokers to 0.8.2 and all their
>> apps will start failing on produce requests.
>>
>>
>> On Thu, Jan 15, 2015 at 8:27 PM, Jay Kreps <ja...@gmail.com> wrote:
>>
>> > This is a good case to discuss.
>> >
>> > Let's figure the general case of how we want to handle errors and get
>> that
>> > documented in the protocol. The problem right now is that we give no
>> > guidance on this. I actually thought Gwen's suggestion made sense on the
>> > guidance we should have given which is that we will enumerate a set of
>> > errors and their meaning for each API but it is possible that other
>> errors
>> > will occur and they should be handled (maybe poorly) in the same way
>> > UNKNOWN_ERROR is handled which is our normal escape hatch for things like
>> > OOMException.
>> >
>> > I really do think we shouldn't be dogmatic here: In considering a change
>> > to errors we should consider the potential ill-effect vs the complexity
>> of
>> > yet another protocol version.
>> >
>> > In this case I actually am not sure we need to bump the protocol because
>> > the whole point of the change was to make a setting we think doesn't make
>> > sense break, right? Well this will break it. It seems like the only
>> > downside is that older clients will get a bad error message instead of a
>> > good one. But it isn't like we will have rendered a client unusable, it
>> is
>> > just that they will need to change their config.
>> >
>> > -Jay
>> >
>> > On Thu, Jan 15, 2015 at 6:14 PM, Gwen Shapira <gs...@cloudera.com>
>> > wrote:
>> >
>> >> I created a KIP for this suggestion:
>> >>
>> >>
>> >>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1+-+Remove+support+of+request.required.acks
>> >>
>> >> Basically documenting what was already discussed here.  Comments will
>> >> be awesome!
>> >>
>> >> Gwen
>> >>
>> >> On Thu, Jan 15, 2015 at 5:19 PM, Gwen Shapira <gs...@cloudera.com>
>> >> wrote:
>> >> > The errors are part of the KIP process now, so I think the clients are
>> >> safe :)
>> >> >
>> >> >
>> >>
>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
>> >> >
>> >> > On Thu, Jan 15, 2015 at 5:12 PM, Steve Morin <st...@gmail.com>
>> >> wrote:
>> >> >> Agree errors should be part of the protocol
>> >> >>
>> >> >>> On Jan 15, 2015, at 17:59, Gwen Shapira <gs...@cloudera.com>
>> >> wrote:
>> >> >>>
>> >> >>> Hi,
>> >> >>>
>> >> >>> I got convinced by Joe and Dana that errors are indeed part of the
>> >> >>> protocol and can't be randomly added.
>> >> >>>
>> >> >>> So, it looks like we need to bump version of ProduceRequest in the
>> >> >>> following way:
>> >> >>> Version 0 -> accept acks >1. I think we should keep the existing
>> >> >>> behavior too (i.e. not replace it with -1) to avoid surprising
>> >> >>> clients, but I'm willing to hear other opinions.
>> >> >>> Version 1 -> do not accept acks >1 and return an error.
>> >> >>> Are we ok with the error I added in KAFKA-1697? We can use something
>> >> >>> less specific like InvalidRequestParameter. This error can be reused
>> >> >>> in the future and reduce the need to add errors, but will also be
>> less
>> >> >>> clear to the client and its users. Maybe even add the error message
>> >> >>> string to the protocol in addition to the error code? (since we are
>> >> >>> bumping versions....)
>> >> >>>
>> >> >>> I think maintaining the old version throughout 0.8.X makes sense.
>> IMO
>> >> >>> dropping it for 0.9 is feasible, but I'll let client owners help
>> make
>> >> >>> that call.
>> >> >>>
>> >> >>> Am I missing anything? Should I start a KIP? It seems like a
>> KIP-type
>> >> >>> discussion :)
>> >> >>>
>> >> >>> Gwen
>> >> >>>
>> >> >>>
>> >> >>> On Thu, Jan 15, 2015 at 2:31 PM, Ewen Cheslack-Postava
>> >> >>> <ew...@confluent.io> wrote:
>> >> >>>> Gwen,
>> >> >>>>
>> >> >>>> I think the only option that wouldn't require a protocol version
>> >> change is
>> >> >>>> the one where acks > 1 is converted to acks = -1 since it's the
>> only
>> >> one
>> >> >>>> that doesn't potentially break older clients. The protocol guide
>> >> says that
>> >> >>>> the expected upgrade path is servers first, then clients, so old
>> >> clients,
>> >> >>>> including non-java clients, that may be using acks > 1 should be
>> >> able to
>> >> >>>> work with a new broker version.
>> >> >>>>
>> >> >>>> It's more work, but I think dealing with the protocol change is the
>> >> right
>> >> >>>> thing to do since it eventually gets us to the behavior I think is
>> >> better --
>> >> >>>> the broker should reject requests with invalid values. I think Joe
>> >> and I
>> >> >>>> were basically in agreement. In my mind the major piece missing
>> from
>> >> his
>> >> >>>> description is how long we're going to maintain his "case 0"
>> >> behavior. It's
>> >> >>>> impractical to maintain old versions forever, but it sounds like
>> >> there
>> >> >>>> hasn't been a decision on how long to maintain them. Maybe that's
>> >> another
>> >> >>>> item to add to KIPs -- protocol versions and behavior need to be
>> >> listed as
>> >> >>>> deprecated and the earliest version in which they'll be removed
>> >> should be
>> >> >>>> specified so users can understand which versions are guaranteed to
>> be
>> >> >>>> compatible, even if they're using (well-written) non-java clients.
>> >> >>>>
>> >> >>>> -Ewen
>> >> >>>>
>> >> >>>>
>> >> >>>>> On Thu, Jan 15, 2015 at 12:52 PM, Dana Powers <
>> >> dana.powers@gmail.com> wrote:
>> >> >>>>>
>> >> >>>>>> clients don't break on unknown errors
>> >> >>>>>
>> >> >>>>> maybe true for the official java clients, but I dont think the
>> >> assumption
>> >> >>>>> holds true for community-maintained clients and users of those
>> >> clients.
>> >> >>>>> kafka-python generally follows the fail-fast philosophy and raises
>> >> an
>> >> >>>>> exception on any unrecognized error code in any server response.
>> >> in this
>> >> >>>>> case, kafka-python allows users to set their own required-acks
>> >> policy when
>> >> >>>>> creating a producer instance.  It is possible that users of
>> >> kafka-python
>> >> >>>>> have deployed producer code that uses ack>1 -- perhaps in
>> production
>> >> >>>>> environments -- and for those users the new error code will crash
>> >> their
>> >> >>>>> producer code.  I would not be surprised if the same were true of
>> >> other
>> >> >>>>> community clients.
>> >> >>>>>
>> >> >>>>> *one reason for the fail-fast approach is that there isn't great
>> >> >>>>> documentation on what errors to expect for each request / response
>> >> -- so
>> >> >>>>> we
>> >> >>>>> use failures to alert that some error case is not handled
>> >> properly.  and
>> >> >>>>> because of that, introducing new error cases without bumping the
>> api
>> >> >>>>> version is likely to cause those errors to get raised/thrown all
>> >> the way
>> >> >>>>> back up to the user.  of course we (client maintainers) can fix
>> the
>> >> issues
>> >> >>>>> in the client libraries and suggest users upgrade, but it's not
>> the
>> >> ideal
>> >> >>>>> situation.
>> >> >>>>>
>> >> >>>>>
>> >> >>>>> long-winded way of saying: I agree w/ Joe.
>> >> >>>>>
>> >> >>>>> -Dana
>> >> >>>>>
>> >> >>>>>
>> >> >>>>> On Thu, Jan 15, 2015 at 12:07 PM, Gwen Shapira <
>> >> gshapira@cloudera.com>
>> >> >>>>> wrote:
>> >> >>>>>
>> >> >>>>>> Is the protocol bump caused by the behavior change or the new
>> error
>> >> >>>>>> code?
>> >> >>>>>>
>> >> >>>>>> 1) IMO, error_codes are data, and clients can expect to receive
>> >> errors
>> >> >>>>>> that they don't understand (i.e. unknown errors). AFAIK, clients
>> >> don't
>> >> >>>>>> break on unknown errors, they are simple more challenging to
>> >> debug. If
>> >> >>>>>> we document the new behavior, then its definitely debuggable and
>> >> >>>>>> fixable.
>> >> >>>>>>
>> >> >>>>>> 2) The behavior change is basically a deprecation - i.e. acks > 1
>> >> were
>> >> >>>>>> never documented, and are not supported by Kafka clients starting
>> >> with
>> >> >>>>>> version 0.8.2. I'm not sure this requires a protocol bump either,
>> >> >>>>>> although its a better case than new error codes.
>> >> >>>>>>
>> >> >>>>>> Thanks,
>> >> >>>>>> Gwen
>> >> >>>>>>
>> >> >>>>>> On Thu, Jan 15, 2015 at 10:10 AM, Joe Stein <
>> joe.stein@stealth.ly>
>> >> >>>>>> wrote:
>> >> >>>>>>> Looping in the mailing list that the client developers live on
>> >> because
>> >> >>>>>> they
>> >> >>>>>>> are all not on dev (though they should be if they want to be
>> >> helping
>> >> >>>>>>> to
>> >> >>>>>>> build the best client libraries they can).
>> >> >>>>>>>
>> >> >>>>>>> I whole hardily believe that we need to not break existing
>> >> >>>>>>> functionality
>> >> >>>>>> of
>> >> >>>>>>> the client protocol, ever.
>> >> >>>>>>>
>> >> >>>>>>> There are many reasons for this and we have other threads on the
>> >> >>>>>>> mailing
>> >> >>>>>>> list where we are discussing that topic (no pun intended) that I
>> >> don't
>> >> >>>>>> want
>> >> >>>>>>> to re-hash here.
>> >> >>>>>>>
>> >> >>>>>>> If we change wire protocol functionality OR the binary format
>> >> (either)
>> >> >>>>>>> we
>> >> >>>>>>> must bump version AND treat version as a feature flag with
>> >> backward
>> >> >>>>>>> compatibility support until it is deprecated for some time for
>> >> folks
>> >> >>>>>>> to
>> >> >>>>>> deal
>> >> >>>>>>> with it.
>> >> >>>>>>>
>> >> >>>>>>> match version = {
>> >> >>>>>>> case 0: keepDoingWhatWeWereDoing()
>> >> >>>>>>> case 1: doNewStuff()
>> >> >>>>>>> case 2: doEvenMoreNewStuff()
>> >> >>>>>>> }
>> >> >>>>>>>
>> >> >>>>>>> has to be a practice we adopt imho ... I know feature flags can
>> be
>> >> >>>>>> construed
>> >> >>>>>>> as "messy code" but I am eager to hear another (better?
>> >> different?)
>> >> >>>>>> solution
>> >> >>>>>>> to this.
>> >> >>>>>>>
>> >> >>>>>>> If we don't do a feature flag like this specifically with this
>> >> change
>> >> >>>>>> then
>> >> >>>>>>> what happens is that someone upgrades their brokers with a
>> rolling
>> >> >>>>>> restart
>> >> >>>>>>> in 0.8.3 and every single one of their producer requests start
>> to
>> >> fail
>> >> >>>>>> and
>> >> >>>>>>> they have a major production outage. eeeek!!!!
>> >> >>>>>>>
>> >> >>>>>>> I do 100% agree that > 1 makes no sense and we *REALLY* need
>> >> people to
>> >> >>>>>> start
>> >> >>>>>>> using 0,1,-1 but we need to-do that in a way that is going to
>> >> work for
>> >> >>>>>>> everyone.
>> >> >>>>>>>
>> >> >>>>>>> Old producers and consumers must keep working with new brokers
>> >> and if
>> >> >>>>>>> we
>> >> >>>>>> are
>> >> >>>>>>> not going to support that then I am unclear what the use of
>> >> "version"
>> >> >>>>>>> is
>> >> >>>>>>> based on our original intentions of having it because of the
>> >> >>>>>>> 0.7=>-0.8.
>> >> >>>>>> We
>> >> >>>>>>> said no more breaking changes when we did that.
>> >> >>>>>>>
>> >> >>>>>>> - Joe Stein
>> >> >>>>>>>
>> >> >>>>>>> On Thu, Jan 15, 2015 at 12:38 PM, Ewen Cheslack-Postava <
>> >> >>>>>> ewen@confluent.io>
>> >> >>>>>>> wrote:
>> >> >>>>>>>>
>> >> >>>>>>>> Right, so this looks like it could create an issue similar to
>> >> what's
>> >> >>>>>>>> currently being discussed in
>> >> >>>>>>>> https://issues.apache.org/jira/browse/KAFKA-1649 where users
>> >> now get
>> >> >>>>>>>> errors
>> >> >>>>>>>> under conditions when they previously wouldn't. Old clients
>> won't
>> >> >>>>>>>> even
>> >> >>>>>>>> know
>> >> >>>>>>>> about the error code, so besides failing they won't even be
>> able
>> >> to
>> >> >>>>>>>> log
>> >> >>>>>>>> any
>> >> >>>>>>>> meaningful error messages.
>> >> >>>>>>>>
>> >> >>>>>>>> I think there are two options for compatibility:
>> >> >>>>>>>>
>> >> >>>>>>>> 1. An alternative change is to remove the ack > 1 code, but
>> >> silently
>> >> >>>>>>>> "upgrade" requests with acks > 1 to acks = -1. This isn't the
>> >> same as
>> >> >>>>>>>> other
>> >> >>>>>>>> changes to behavior since the interaction between the client
>> and
>> >> >>>>>>>> server
>> >> >>>>>>>> remains the same, no error codes change, etc. The client might
>> >> just
>> >> >>>>>>>> see
>> >> >>>>>>>> some increased latency since the message might need to be
>> >> replicated
>> >> >>>>>>>> to
>> >> >>>>>>>> more brokers than they requested.
>> >> >>>>>>>> 2. Split this into two patches, one that bumps the protocol
>> >> version
>> >> >>>>>>>> on
>> >> >>>>>>>> that
>> >> >>>>>>>> message to include the new error code but maintains both old
>> (now
>> >> >>>>>>>> deprecated) and new behavior, then a second that would be
>> >> applied in
>> >> >>>>>>>> a
>> >> >>>>>>>> later release that removes the old protocol + code for handling
>> >> acks
>> >> >>>>>> 1.
>> >> >>>>>>>>
>> >> >>>>>>>> 2 is probably the right thing to do. If we specify the release
>> >> when
>> >> >>>>>> we'll
>> >> >>>>>>>> remove the deprecated protocol at the time of deprecation it
>> >> makes
>> >> >>>>>> things
>> >> >>>>>>>> a
>> >> >>>>>>>> lot easier for people writing non-java clients and could give
>> >> users
>> >> >>>>>> better
>> >> >>>>>>>> predictability (e.g. if clients are at most 1 major release
>> >> behind
>> >> >>>>>>>> brokers,
>> >> >>>>>>>> they'll remain compatible but possibly use deprecated
>> features).
>> >> >>>>>>>>
>> >> >>>>>>>>
>> >> >>>>>>>> On Wed, Jan 14, 2015 at 3:51 PM, Gwen Shapira <
>> >> gshapira@cloudera.com>
>> >> >>>>>>>> wrote:
>> >> >>>>>>>>
>> >> >>>>>>>>> Hi Kafka Devs,
>> >> >>>>>>>>>
>> >> >>>>>>>>> We are working on KAFKA-1697 - remove code related to ack>1 on
>> >> the
>> >> >>>>>>>>> broker. Per Neha's suggestion, I'd like to give everyone a
>> >> heads up
>> >> >>>>>>>>> on
>> >> >>>>>>>>> what these changes mean.
>> >> >>>>>>>>>
>> >> >>>>>>>>> Once this patch is included, any produce requests that include
>> >> >>>>>>>>> request.required.acks > 1 will result in an exception. This
>> >> will be
>> >> >>>>>>>>> InvalidRequiredAcks in new versions (0.8.3 and up, I assume)
>> and
>> >> >>>>>>>>> UnknownException in existing versions (sorry, but I can't add
>> >> error
>> >> >>>>>>>>> codes retroactively).
>> >> >>>>>>>>>
>> >> >>>>>>>>> This behavior is already enforced by 0.8.2 producers (sync and
>> >> >>>>>>>>> new),
>> >> >>>>>>>>> but we expect impact on users with older producers that relied
>> >> on
>> >> >>>>>>>>> acks
>> >> >>>>>>>>>> 1 and external clients (i.e python, go, etc).
>> >> >>>>>>>>>
>> >> >>>>>>>>> Users who relied on acks > 1 are expected to switch to using
>> >> acks =
>> >> >>>>>>>>> -1
>> >> >>>>>>>>> and a min.isr parameter than matches their user case.
>> >> >>>>>>>>>
>> >> >>>>>>>>> This change was discussed in the past in the context of
>> >> KAFKA-1555
>> >> >>>>>>>>> (min.isr), but let us know if you have any questions or
>> concerns
>> >> >>>>>>>>> regarding this change.
>> >> >>>>>>>>>
>> >> >>>>>>>>> Gwen
>> >> >>>>>>>>
>> >> >>>>>>>>
>> >> >>>>>>>>
>> >> >>>>>>>> --
>> >> >>>>>>>> Thanks,
>> >> >>>>>>>> Ewen
>> >> >>>>>>>
>> >> >>>>>>>
>> >> >>>>>>> --
>> >> >>>>>>> You received this message because you are subscribed to the
>> Google
>> >> >>>>>>> Groups
>> >> >>>>>>> "kafka-clients" group.
>> >> >>>>>>> To unsubscribe from this group and stop receiving emails from
>> it,
>> >> send
>> >> >>>>>>> an
>> >> >>>>>>> email to kafka-clients+unsubscribe@googlegroups.com.
>> >> >>>>>>> To post to this group, send email to
>> >> kafka-clients@googlegroups.com.
>> >> >>>>>>> Visit this group at
>> http://groups.google.com/group/kafka-clients.
>> >> >>>>>>> To view this discussion on the web visit
>> >> >>>>>>
>> >> >>>>>>
>> >>
>> https://groups.google.com/d/msgid/kafka-clients/CAA7ooCBtH2JjyQsArdx_%3DV25B4O1QJk0YvOu9U6kYt9sB4aqng%40mail.gmail.com
>> >> >>>>>> .
>> >> >>>>>>>
>> >> >>>>>>> For more options, visit https://groups.google.com/d/optout.
>> >> >>>>>>
>> >> >>>>>> --
>> >> >>>>>> You received this message because you are subscribed to the
>> Google
>> >> >>>>>> Groups
>> >> >>>>>> "kafka-clients" group.
>> >> >>>>>> To unsubscribe from this group and stop receiving emails from it,
>> >> send
>> >> >>>>>> an
>> >> >>>>>> email to kafka-clients+unsubscribe@googlegroups.com.
>> >> >>>>>> To post to this group, send email to
>> >> kafka-clients@googlegroups.com.
>> >> >>>>>> Visit this group at http://groups.google.com/group/kafka-clients
>> .
>> >> >>>>>> To view this discussion on the web visit
>> >> >>>>>>
>> >> >>>>>>
>> >>
>> https://groups.google.com/d/msgid/kafka-clients/CAHBV8WeUebxi%2B%2BSbjz8E9Yf4u4hkcPJ80Xsj0XTKcTac%3D%2B613A%40mail.gmail.com
>> >> >>>>>> .
>> >> >>>>>> For more options, visit https://groups.google.com/d/optout.
>> >> >>>>
>> >> >>>>
>> >> >>>>
>> >> >>>>
>> >> >>>> --
>> >> >>>> Thanks,
>> >> >>>> Ewen
>> >>
>> >
>> >
>>
>>
>> --
>> Thanks,
>> Ewen
>>
>
>
>
> --
> Thanks,
> Neha

Re: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to ack>1 on the broker

Posted by Neha Narkhede <ne...@confluent.io>.
Gwen,

KIP write-up looks good. According to the rest of the KIP process proposal,
would you like to start a DISCUSS/VOTE thread for it?

Thanks,
Neha

On Fri, Jan 16, 2015 at 9:37 AM, Ewen Cheslack-Postava <ew...@confluent.io>
wrote:

> Gwen -- KIP write up looks good. Deprecation schedule probably needs to be
> more specific, but I think that discussion probably needs to happen after a
> solution is agreed upon.
>
> Jay -- I think "older clients will get a bad error message instead of a
> good one" isn't what would be happening with this change. Previously they
> wouldn't have received an error and they would have been able to produce
> messages. After the change they'll just receive this new error message
> which their clients can't possibly handle gracefully since it didn't exist
> when the client was written. Whether the acks > 1 setting was actually
> accomplishing what they thought doesn't matter. Someone could have
> reasonably read the docs on 0.8.1.1, thought acks = 2 is an ok setting for
> their applications, set it as a default across a bunch of apps, then follow
> the recommended upgrade path of updating brokers to 0.8.2 and all their
> apps will start failing on produce requests.
>
>
> On Thu, Jan 15, 2015 at 8:27 PM, Jay Kreps <ja...@gmail.com> wrote:
>
> > This is a good case to discuss.
> >
> > Let's figure the general case of how we want to handle errors and get
> that
> > documented in the protocol. The problem right now is that we give no
> > guidance on this. I actually thought Gwen's suggestion made sense on the
> > guidance we should have given which is that we will enumerate a set of
> > errors and their meaning for each API but it is possible that other
> errors
> > will occur and they should be handled (maybe poorly) in the same way
> > UNKNOWN_ERROR is handled which is our normal escape hatch for things like
> > OOMException.
> >
> > I really do think we shouldn't be dogmatic here: In considering a change
> > to errors we should consider the potential ill-effect vs the complexity
> of
> > yet another protocol version.
> >
> > In this case I actually am not sure we need to bump the protocol because
> > the whole point of the change was to make a setting we think doesn't make
> > sense break, right? Well this will break it. It seems like the only
> > downside is that older clients will get a bad error message instead of a
> > good one. But it isn't like we will have rendered a client unusable, it
> is
> > just that they will need to change their config.
> >
> > -Jay
> >
> > On Thu, Jan 15, 2015 at 6:14 PM, Gwen Shapira <gs...@cloudera.com>
> > wrote:
> >
> >> I created a KIP for this suggestion:
> >>
> >>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1+-+Remove+support+of+request.required.acks
> >>
> >> Basically documenting what was already discussed here.  Comments will
> >> be awesome!
> >>
> >> Gwen
> >>
> >> On Thu, Jan 15, 2015 at 5:19 PM, Gwen Shapira <gs...@cloudera.com>
> >> wrote:
> >> > The errors are part of the KIP process now, so I think the clients are
> >> safe :)
> >> >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> >> >
> >> > On Thu, Jan 15, 2015 at 5:12 PM, Steve Morin <st...@gmail.com>
> >> wrote:
> >> >> Agree errors should be part of the protocol
> >> >>
> >> >>> On Jan 15, 2015, at 17:59, Gwen Shapira <gs...@cloudera.com>
> >> wrote:
> >> >>>
> >> >>> Hi,
> >> >>>
> >> >>> I got convinced by Joe and Dana that errors are indeed part of the
> >> >>> protocol and can't be randomly added.
> >> >>>
> >> >>> So, it looks like we need to bump version of ProduceRequest in the
> >> >>> following way:
> >> >>> Version 0 -> accept acks >1. I think we should keep the existing
> >> >>> behavior too (i.e. not replace it with -1) to avoid surprising
> >> >>> clients, but I'm willing to hear other opinions.
> >> >>> Version 1 -> do not accept acks >1 and return an error.
> >> >>> Are we ok with the error I added in KAFKA-1697? We can use something
> >> >>> less specific like InvalidRequestParameter. This error can be reused
> >> >>> in the future and reduce the need to add errors, but will also be
> less
> >> >>> clear to the client and its users. Maybe even add the error message
> >> >>> string to the protocol in addition to the error code? (since we are
> >> >>> bumping versions....)
> >> >>>
> >> >>> I think maintaining the old version throughout 0.8.X makes sense.
> IMO
> >> >>> dropping it for 0.9 is feasible, but I'll let client owners help
> make
> >> >>> that call.
> >> >>>
> >> >>> Am I missing anything? Should I start a KIP? It seems like a
> KIP-type
> >> >>> discussion :)
> >> >>>
> >> >>> Gwen
> >> >>>
> >> >>>
> >> >>> On Thu, Jan 15, 2015 at 2:31 PM, Ewen Cheslack-Postava
> >> >>> <ew...@confluent.io> wrote:
> >> >>>> Gwen,
> >> >>>>
> >> >>>> I think the only option that wouldn't require a protocol version
> >> change is
> >> >>>> the one where acks > 1 is converted to acks = -1 since it's the
> only
> >> one
> >> >>>> that doesn't potentially break older clients. The protocol guide
> >> says that
> >> >>>> the expected upgrade path is servers first, then clients, so old
> >> clients,
> >> >>>> including non-java clients, that may be using acks > 1 should be
> >> able to
> >> >>>> work with a new broker version.
> >> >>>>
> >> >>>> It's more work, but I think dealing with the protocol change is the
> >> right
> >> >>>> thing to do since it eventually gets us to the behavior I think is
> >> better --
> >> >>>> the broker should reject requests with invalid values. I think Joe
> >> and I
> >> >>>> were basically in agreement. In my mind the major piece missing
> from
> >> his
> >> >>>> description is how long we're going to maintain his "case 0"
> >> behavior. It's
> >> >>>> impractical to maintain old versions forever, but it sounds like
> >> there
> >> >>>> hasn't been a decision on how long to maintain them. Maybe that's
> >> another
> >> >>>> item to add to KIPs -- protocol versions and behavior need to be
> >> listed as
> >> >>>> deprecated and the earliest version in which they'll be removed
> >> should be
> >> >>>> specified so users can understand which versions are guaranteed to
> be
> >> >>>> compatible, even if they're using (well-written) non-java clients.
> >> >>>>
> >> >>>> -Ewen
> >> >>>>
> >> >>>>
> >> >>>>> On Thu, Jan 15, 2015 at 12:52 PM, Dana Powers <
> >> dana.powers@gmail.com> wrote:
> >> >>>>>
> >> >>>>>> clients don't break on unknown errors
> >> >>>>>
> >> >>>>> maybe true for the official java clients, but I dont think the
> >> assumption
> >> >>>>> holds true for community-maintained clients and users of those
> >> clients.
> >> >>>>> kafka-python generally follows the fail-fast philosophy and raises
> >> an
> >> >>>>> exception on any unrecognized error code in any server response.
> >> in this
> >> >>>>> case, kafka-python allows users to set their own required-acks
> >> policy when
> >> >>>>> creating a producer instance.  It is possible that users of
> >> kafka-python
> >> >>>>> have deployed producer code that uses ack>1 -- perhaps in
> production
> >> >>>>> environments -- and for those users the new error code will crash
> >> their
> >> >>>>> producer code.  I would not be surprised if the same were true of
> >> other
> >> >>>>> community clients.
> >> >>>>>
> >> >>>>> *one reason for the fail-fast approach is that there isn't great
> >> >>>>> documentation on what errors to expect for each request / response
> >> -- so
> >> >>>>> we
> >> >>>>> use failures to alert that some error case is not handled
> >> properly.  and
> >> >>>>> because of that, introducing new error cases without bumping the
> api
> >> >>>>> version is likely to cause those errors to get raised/thrown all
> >> the way
> >> >>>>> back up to the user.  of course we (client maintainers) can fix
> the
> >> issues
> >> >>>>> in the client libraries and suggest users upgrade, but it's not
> the
> >> ideal
> >> >>>>> situation.
> >> >>>>>
> >> >>>>>
> >> >>>>> long-winded way of saying: I agree w/ Joe.
> >> >>>>>
> >> >>>>> -Dana
> >> >>>>>
> >> >>>>>
> >> >>>>> On Thu, Jan 15, 2015 at 12:07 PM, Gwen Shapira <
> >> gshapira@cloudera.com>
> >> >>>>> wrote:
> >> >>>>>
> >> >>>>>> Is the protocol bump caused by the behavior change or the new
> error
> >> >>>>>> code?
> >> >>>>>>
> >> >>>>>> 1) IMO, error_codes are data, and clients can expect to receive
> >> errors
> >> >>>>>> that they don't understand (i.e. unknown errors). AFAIK, clients
> >> don't
> >> >>>>>> break on unknown errors, they are simple more challenging to
> >> debug. If
> >> >>>>>> we document the new behavior, then its definitely debuggable and
> >> >>>>>> fixable.
> >> >>>>>>
> >> >>>>>> 2) The behavior change is basically a deprecation - i.e. acks > 1
> >> were
> >> >>>>>> never documented, and are not supported by Kafka clients starting
> >> with
> >> >>>>>> version 0.8.2. I'm not sure this requires a protocol bump either,
> >> >>>>>> although its a better case than new error codes.
> >> >>>>>>
> >> >>>>>> Thanks,
> >> >>>>>> Gwen
> >> >>>>>>
> >> >>>>>> On Thu, Jan 15, 2015 at 10:10 AM, Joe Stein <
> joe.stein@stealth.ly>
> >> >>>>>> wrote:
> >> >>>>>>> Looping in the mailing list that the client developers live on
> >> because
> >> >>>>>> they
> >> >>>>>>> are all not on dev (though they should be if they want to be
> >> helping
> >> >>>>>>> to
> >> >>>>>>> build the best client libraries they can).
> >> >>>>>>>
> >> >>>>>>> I whole hardily believe that we need to not break existing
> >> >>>>>>> functionality
> >> >>>>>> of
> >> >>>>>>> the client protocol, ever.
> >> >>>>>>>
> >> >>>>>>> There are many reasons for this and we have other threads on the
> >> >>>>>>> mailing
> >> >>>>>>> list where we are discussing that topic (no pun intended) that I
> >> don't
> >> >>>>>> want
> >> >>>>>>> to re-hash here.
> >> >>>>>>>
> >> >>>>>>> If we change wire protocol functionality OR the binary format
> >> (either)
> >> >>>>>>> we
> >> >>>>>>> must bump version AND treat version as a feature flag with
> >> backward
> >> >>>>>>> compatibility support until it is deprecated for some time for
> >> folks
> >> >>>>>>> to
> >> >>>>>> deal
> >> >>>>>>> with it.
> >> >>>>>>>
> >> >>>>>>> match version = {
> >> >>>>>>> case 0: keepDoingWhatWeWereDoing()
> >> >>>>>>> case 1: doNewStuff()
> >> >>>>>>> case 2: doEvenMoreNewStuff()
> >> >>>>>>> }
> >> >>>>>>>
> >> >>>>>>> has to be a practice we adopt imho ... I know feature flags can
> be
> >> >>>>>> construed
> >> >>>>>>> as "messy code" but I am eager to hear another (better?
> >> different?)
> >> >>>>>> solution
> >> >>>>>>> to this.
> >> >>>>>>>
> >> >>>>>>> If we don't do a feature flag like this specifically with this
> >> change
> >> >>>>>> then
> >> >>>>>>> what happens is that someone upgrades their brokers with a
> rolling
> >> >>>>>> restart
> >> >>>>>>> in 0.8.3 and every single one of their producer requests start
> to
> >> fail
> >> >>>>>> and
> >> >>>>>>> they have a major production outage. eeeek!!!!
> >> >>>>>>>
> >> >>>>>>> I do 100% agree that > 1 makes no sense and we *REALLY* need
> >> people to
> >> >>>>>> start
> >> >>>>>>> using 0,1,-1 but we need to-do that in a way that is going to
> >> work for
> >> >>>>>>> everyone.
> >> >>>>>>>
> >> >>>>>>> Old producers and consumers must keep working with new brokers
> >> and if
> >> >>>>>>> we
> >> >>>>>> are
> >> >>>>>>> not going to support that then I am unclear what the use of
> >> "version"
> >> >>>>>>> is
> >> >>>>>>> based on our original intentions of having it because of the
> >> >>>>>>> 0.7=>-0.8.
> >> >>>>>> We
> >> >>>>>>> said no more breaking changes when we did that.
> >> >>>>>>>
> >> >>>>>>> - Joe Stein
> >> >>>>>>>
> >> >>>>>>> On Thu, Jan 15, 2015 at 12:38 PM, Ewen Cheslack-Postava <
> >> >>>>>> ewen@confluent.io>
> >> >>>>>>> wrote:
> >> >>>>>>>>
> >> >>>>>>>> Right, so this looks like it could create an issue similar to
> >> what's
> >> >>>>>>>> currently being discussed in
> >> >>>>>>>> https://issues.apache.org/jira/browse/KAFKA-1649 where users
> >> now get
> >> >>>>>>>> errors
> >> >>>>>>>> under conditions when they previously wouldn't. Old clients
> won't
> >> >>>>>>>> even
> >> >>>>>>>> know
> >> >>>>>>>> about the error code, so besides failing they won't even be
> able
> >> to
> >> >>>>>>>> log
> >> >>>>>>>> any
> >> >>>>>>>> meaningful error messages.
> >> >>>>>>>>
> >> >>>>>>>> I think there are two options for compatibility:
> >> >>>>>>>>
> >> >>>>>>>> 1. An alternative change is to remove the ack > 1 code, but
> >> silently
> >> >>>>>>>> "upgrade" requests with acks > 1 to acks = -1. This isn't the
> >> same as
> >> >>>>>>>> other
> >> >>>>>>>> changes to behavior since the interaction between the client
> and
> >> >>>>>>>> server
> >> >>>>>>>> remains the same, no error codes change, etc. The client might
> >> just
> >> >>>>>>>> see
> >> >>>>>>>> some increased latency since the message might need to be
> >> replicated
> >> >>>>>>>> to
> >> >>>>>>>> more brokers than they requested.
> >> >>>>>>>> 2. Split this into two patches, one that bumps the protocol
> >> version
> >> >>>>>>>> on
> >> >>>>>>>> that
> >> >>>>>>>> message to include the new error code but maintains both old
> (now
> >> >>>>>>>> deprecated) and new behavior, then a second that would be
> >> applied in
> >> >>>>>>>> a
> >> >>>>>>>> later release that removes the old protocol + code for handling
> >> acks
> >> >>>>>> 1.
> >> >>>>>>>>
> >> >>>>>>>> 2 is probably the right thing to do. If we specify the release
> >> when
> >> >>>>>> we'll
> >> >>>>>>>> remove the deprecated protocol at the time of deprecation it
> >> makes
> >> >>>>>> things
> >> >>>>>>>> a
> >> >>>>>>>> lot easier for people writing non-java clients and could give
> >> users
> >> >>>>>> better
> >> >>>>>>>> predictability (e.g. if clients are at most 1 major release
> >> behind
> >> >>>>>>>> brokers,
> >> >>>>>>>> they'll remain compatible but possibly use deprecated
> features).
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>> On Wed, Jan 14, 2015 at 3:51 PM, Gwen Shapira <
> >> gshapira@cloudera.com>
> >> >>>>>>>> wrote:
> >> >>>>>>>>
> >> >>>>>>>>> Hi Kafka Devs,
> >> >>>>>>>>>
> >> >>>>>>>>> We are working on KAFKA-1697 - remove code related to ack>1 on
> >> the
> >> >>>>>>>>> broker. Per Neha's suggestion, I'd like to give everyone a
> >> heads up
> >> >>>>>>>>> on
> >> >>>>>>>>> what these changes mean.
> >> >>>>>>>>>
> >> >>>>>>>>> Once this patch is included, any produce requests that include
> >> >>>>>>>>> request.required.acks > 1 will result in an exception. This
> >> will be
> >> >>>>>>>>> InvalidRequiredAcks in new versions (0.8.3 and up, I assume)
> and
> >> >>>>>>>>> UnknownException in existing versions (sorry, but I can't add
> >> error
> >> >>>>>>>>> codes retroactively).
> >> >>>>>>>>>
> >> >>>>>>>>> This behavior is already enforced by 0.8.2 producers (sync and
> >> >>>>>>>>> new),
> >> >>>>>>>>> but we expect impact on users with older producers that relied
> >> on
> >> >>>>>>>>> acks
> >> >>>>>>>>>> 1 and external clients (i.e python, go, etc).
> >> >>>>>>>>>
> >> >>>>>>>>> Users who relied on acks > 1 are expected to switch to using
> >> acks =
> >> >>>>>>>>> -1
> >> >>>>>>>>> and a min.isr parameter than matches their user case.
> >> >>>>>>>>>
> >> >>>>>>>>> This change was discussed in the past in the context of
> >> KAFKA-1555
> >> >>>>>>>>> (min.isr), but let us know if you have any questions or
> concerns
> >> >>>>>>>>> regarding this change.
> >> >>>>>>>>>
> >> >>>>>>>>> Gwen
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>> --
> >> >>>>>>>> Thanks,
> >> >>>>>>>> Ewen
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>> --
> >> >>>>>>> You received this message because you are subscribed to the
> Google
> >> >>>>>>> Groups
> >> >>>>>>> "kafka-clients" group.
> >> >>>>>>> To unsubscribe from this group and stop receiving emails from
> it,
> >> send
> >> >>>>>>> an
> >> >>>>>>> email to kafka-clients+unsubscribe@googlegroups.com.
> >> >>>>>>> To post to this group, send email to
> >> kafka-clients@googlegroups.com.
> >> >>>>>>> Visit this group at
> http://groups.google.com/group/kafka-clients.
> >> >>>>>>> To view this discussion on the web visit
> >> >>>>>>
> >> >>>>>>
> >>
> https://groups.google.com/d/msgid/kafka-clients/CAA7ooCBtH2JjyQsArdx_%3DV25B4O1QJk0YvOu9U6kYt9sB4aqng%40mail.gmail.com
> >> >>>>>> .
> >> >>>>>>>
> >> >>>>>>> For more options, visit https://groups.google.com/d/optout.
> >> >>>>>>
> >> >>>>>> --
> >> >>>>>> You received this message because you are subscribed to the
> Google
> >> >>>>>> Groups
> >> >>>>>> "kafka-clients" group.
> >> >>>>>> To unsubscribe from this group and stop receiving emails from it,
> >> send
> >> >>>>>> an
> >> >>>>>> email to kafka-clients+unsubscribe@googlegroups.com.
> >> >>>>>> To post to this group, send email to
> >> kafka-clients@googlegroups.com.
> >> >>>>>> Visit this group at http://groups.google.com/group/kafka-clients
> .
> >> >>>>>> To view this discussion on the web visit
> >> >>>>>>
> >> >>>>>>
> >>
> https://groups.google.com/d/msgid/kafka-clients/CAHBV8WeUebxi%2B%2BSbjz8E9Yf4u4hkcPJ80Xsj0XTKcTac%3D%2B613A%40mail.gmail.com
> >> >>>>>> .
> >> >>>>>> For more options, visit https://groups.google.com/d/optout.
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>> --
> >> >>>> Thanks,
> >> >>>> Ewen
> >>
> >
> >
>
>
> --
> Thanks,
> Ewen
>



-- 
Thanks,
Neha

Re: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to ack>1 on the broker

Posted by Ewen Cheslack-Postava <ew...@confluent.io>.
Gwen -- KIP write up looks good. Deprecation schedule probably needs to be
more specific, but I think that discussion probably needs to happen after a
solution is agreed upon.

Jay -- I think "older clients will get a bad error message instead of a
good one" isn't what would be happening with this change. Previously they
wouldn't have received an error and they would have been able to produce
messages. After the change they'll just receive this new error message
which their clients can't possibly handle gracefully since it didn't exist
when the client was written. Whether the acks > 1 setting was actually
accomplishing what they thought doesn't matter. Someone could have
reasonably read the docs on 0.8.1.1, thought acks = 2 is an ok setting for
their applications, set it as a default across a bunch of apps, then follow
the recommended upgrade path of updating brokers to 0.8.2 and all their
apps will start failing on produce requests.


On Thu, Jan 15, 2015 at 8:27 PM, Jay Kreps <ja...@gmail.com> wrote:

> This is a good case to discuss.
>
> Let's figure the general case of how we want to handle errors and get that
> documented in the protocol. The problem right now is that we give no
> guidance on this. I actually thought Gwen's suggestion made sense on the
> guidance we should have given which is that we will enumerate a set of
> errors and their meaning for each API but it is possible that other errors
> will occur and they should be handled (maybe poorly) in the same way
> UNKNOWN_ERROR is handled which is our normal escape hatch for things like
> OOMException.
>
> I really do think we shouldn't be dogmatic here: In considering a change
> to errors we should consider the potential ill-effect vs the complexity of
> yet another protocol version.
>
> In this case I actually am not sure we need to bump the protocol because
> the whole point of the change was to make a setting we think doesn't make
> sense break, right? Well this will break it. It seems like the only
> downside is that older clients will get a bad error message instead of a
> good one. But it isn't like we will have rendered a client unusable, it is
> just that they will need to change their config.
>
> -Jay
>
> On Thu, Jan 15, 2015 at 6:14 PM, Gwen Shapira <gs...@cloudera.com>
> wrote:
>
>> I created a KIP for this suggestion:
>>
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1+-+Remove+support+of+request.required.acks
>>
>> Basically documenting what was already discussed here.  Comments will
>> be awesome!
>>
>> Gwen
>>
>> On Thu, Jan 15, 2015 at 5:19 PM, Gwen Shapira <gs...@cloudera.com>
>> wrote:
>> > The errors are part of the KIP process now, so I think the clients are
>> safe :)
>> >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
>> >
>> > On Thu, Jan 15, 2015 at 5:12 PM, Steve Morin <st...@gmail.com>
>> wrote:
>> >> Agree errors should be part of the protocol
>> >>
>> >>> On Jan 15, 2015, at 17:59, Gwen Shapira <gs...@cloudera.com>
>> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> I got convinced by Joe and Dana that errors are indeed part of the
>> >>> protocol and can't be randomly added.
>> >>>
>> >>> So, it looks like we need to bump version of ProduceRequest in the
>> >>> following way:
>> >>> Version 0 -> accept acks >1. I think we should keep the existing
>> >>> behavior too (i.e. not replace it with -1) to avoid surprising
>> >>> clients, but I'm willing to hear other opinions.
>> >>> Version 1 -> do not accept acks >1 and return an error.
>> >>> Are we ok with the error I added in KAFKA-1697? We can use something
>> >>> less specific like InvalidRequestParameter. This error can be reused
>> >>> in the future and reduce the need to add errors, but will also be less
>> >>> clear to the client and its users. Maybe even add the error message
>> >>> string to the protocol in addition to the error code? (since we are
>> >>> bumping versions....)
>> >>>
>> >>> I think maintaining the old version throughout 0.8.X makes sense. IMO
>> >>> dropping it for 0.9 is feasible, but I'll let client owners help make
>> >>> that call.
>> >>>
>> >>> Am I missing anything? Should I start a KIP? It seems like a KIP-type
>> >>> discussion :)
>> >>>
>> >>> Gwen
>> >>>
>> >>>
>> >>> On Thu, Jan 15, 2015 at 2:31 PM, Ewen Cheslack-Postava
>> >>> <ew...@confluent.io> wrote:
>> >>>> Gwen,
>> >>>>
>> >>>> I think the only option that wouldn't require a protocol version
>> change is
>> >>>> the one where acks > 1 is converted to acks = -1 since it's the only
>> one
>> >>>> that doesn't potentially break older clients. The protocol guide
>> says that
>> >>>> the expected upgrade path is servers first, then clients, so old
>> clients,
>> >>>> including non-java clients, that may be using acks > 1 should be
>> able to
>> >>>> work with a new broker version.
>> >>>>
>> >>>> It's more work, but I think dealing with the protocol change is the
>> right
>> >>>> thing to do since it eventually gets us to the behavior I think is
>> better --
>> >>>> the broker should reject requests with invalid values. I think Joe
>> and I
>> >>>> were basically in agreement. In my mind the major piece missing from
>> his
>> >>>> description is how long we're going to maintain his "case 0"
>> behavior. It's
>> >>>> impractical to maintain old versions forever, but it sounds like
>> there
>> >>>> hasn't been a decision on how long to maintain them. Maybe that's
>> another
>> >>>> item to add to KIPs -- protocol versions and behavior need to be
>> listed as
>> >>>> deprecated and the earliest version in which they'll be removed
>> should be
>> >>>> specified so users can understand which versions are guaranteed to be
>> >>>> compatible, even if they're using (well-written) non-java clients.
>> >>>>
>> >>>> -Ewen
>> >>>>
>> >>>>
>> >>>>> On Thu, Jan 15, 2015 at 12:52 PM, Dana Powers <
>> dana.powers@gmail.com> wrote:
>> >>>>>
>> >>>>>> clients don't break on unknown errors
>> >>>>>
>> >>>>> maybe true for the official java clients, but I dont think the
>> assumption
>> >>>>> holds true for community-maintained clients and users of those
>> clients.
>> >>>>> kafka-python generally follows the fail-fast philosophy and raises
>> an
>> >>>>> exception on any unrecognized error code in any server response.
>> in this
>> >>>>> case, kafka-python allows users to set their own required-acks
>> policy when
>> >>>>> creating a producer instance.  It is possible that users of
>> kafka-python
>> >>>>> have deployed producer code that uses ack>1 -- perhaps in production
>> >>>>> environments -- and for those users the new error code will crash
>> their
>> >>>>> producer code.  I would not be surprised if the same were true of
>> other
>> >>>>> community clients.
>> >>>>>
>> >>>>> *one reason for the fail-fast approach is that there isn't great
>> >>>>> documentation on what errors to expect for each request / response
>> -- so
>> >>>>> we
>> >>>>> use failures to alert that some error case is not handled
>> properly.  and
>> >>>>> because of that, introducing new error cases without bumping the api
>> >>>>> version is likely to cause those errors to get raised/thrown all
>> the way
>> >>>>> back up to the user.  of course we (client maintainers) can fix the
>> issues
>> >>>>> in the client libraries and suggest users upgrade, but it's not the
>> ideal
>> >>>>> situation.
>> >>>>>
>> >>>>>
>> >>>>> long-winded way of saying: I agree w/ Joe.
>> >>>>>
>> >>>>> -Dana
>> >>>>>
>> >>>>>
>> >>>>> On Thu, Jan 15, 2015 at 12:07 PM, Gwen Shapira <
>> gshapira@cloudera.com>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> Is the protocol bump caused by the behavior change or the new error
>> >>>>>> code?
>> >>>>>>
>> >>>>>> 1) IMO, error_codes are data, and clients can expect to receive
>> errors
>> >>>>>> that they don't understand (i.e. unknown errors). AFAIK, clients
>> don't
>> >>>>>> break on unknown errors, they are simple more challenging to
>> debug. If
>> >>>>>> we document the new behavior, then its definitely debuggable and
>> >>>>>> fixable.
>> >>>>>>
>> >>>>>> 2) The behavior change is basically a deprecation - i.e. acks > 1
>> were
>> >>>>>> never documented, and are not supported by Kafka clients starting
>> with
>> >>>>>> version 0.8.2. I'm not sure this requires a protocol bump either,
>> >>>>>> although its a better case than new error codes.
>> >>>>>>
>> >>>>>> Thanks,
>> >>>>>> Gwen
>> >>>>>>
>> >>>>>> On Thu, Jan 15, 2015 at 10:10 AM, Joe Stein <jo...@stealth.ly>
>> >>>>>> wrote:
>> >>>>>>> Looping in the mailing list that the client developers live on
>> because
>> >>>>>> they
>> >>>>>>> are all not on dev (though they should be if they want to be
>> helping
>> >>>>>>> to
>> >>>>>>> build the best client libraries they can).
>> >>>>>>>
>> >>>>>>> I whole hardily believe that we need to not break existing
>> >>>>>>> functionality
>> >>>>>> of
>> >>>>>>> the client protocol, ever.
>> >>>>>>>
>> >>>>>>> There are many reasons for this and we have other threads on the
>> >>>>>>> mailing
>> >>>>>>> list where we are discussing that topic (no pun intended) that I
>> don't
>> >>>>>> want
>> >>>>>>> to re-hash here.
>> >>>>>>>
>> >>>>>>> If we change wire protocol functionality OR the binary format
>> (either)
>> >>>>>>> we
>> >>>>>>> must bump version AND treat version as a feature flag with
>> backward
>> >>>>>>> compatibility support until it is deprecated for some time for
>> folks
>> >>>>>>> to
>> >>>>>> deal
>> >>>>>>> with it.
>> >>>>>>>
>> >>>>>>> match version = {
>> >>>>>>> case 0: keepDoingWhatWeWereDoing()
>> >>>>>>> case 1: doNewStuff()
>> >>>>>>> case 2: doEvenMoreNewStuff()
>> >>>>>>> }
>> >>>>>>>
>> >>>>>>> has to be a practice we adopt imho ... I know feature flags can be
>> >>>>>> construed
>> >>>>>>> as "messy code" but I am eager to hear another (better?
>> different?)
>> >>>>>> solution
>> >>>>>>> to this.
>> >>>>>>>
>> >>>>>>> If we don't do a feature flag like this specifically with this
>> change
>> >>>>>> then
>> >>>>>>> what happens is that someone upgrades their brokers with a rolling
>> >>>>>> restart
>> >>>>>>> in 0.8.3 and every single one of their producer requests start to
>> fail
>> >>>>>> and
>> >>>>>>> they have a major production outage. eeeek!!!!
>> >>>>>>>
>> >>>>>>> I do 100% agree that > 1 makes no sense and we *REALLY* need
>> people to
>> >>>>>> start
>> >>>>>>> using 0,1,-1 but we need to-do that in a way that is going to
>> work for
>> >>>>>>> everyone.
>> >>>>>>>
>> >>>>>>> Old producers and consumers must keep working with new brokers
>> and if
>> >>>>>>> we
>> >>>>>> are
>> >>>>>>> not going to support that then I am unclear what the use of
>> "version"
>> >>>>>>> is
>> >>>>>>> based on our original intentions of having it because of the
>> >>>>>>> 0.7=>-0.8.
>> >>>>>> We
>> >>>>>>> said no more breaking changes when we did that.
>> >>>>>>>
>> >>>>>>> - Joe Stein
>> >>>>>>>
>> >>>>>>> On Thu, Jan 15, 2015 at 12:38 PM, Ewen Cheslack-Postava <
>> >>>>>> ewen@confluent.io>
>> >>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>> Right, so this looks like it could create an issue similar to
>> what's
>> >>>>>>>> currently being discussed in
>> >>>>>>>> https://issues.apache.org/jira/browse/KAFKA-1649 where users
>> now get
>> >>>>>>>> errors
>> >>>>>>>> under conditions when they previously wouldn't. Old clients won't
>> >>>>>>>> even
>> >>>>>>>> know
>> >>>>>>>> about the error code, so besides failing they won't even be able
>> to
>> >>>>>>>> log
>> >>>>>>>> any
>> >>>>>>>> meaningful error messages.
>> >>>>>>>>
>> >>>>>>>> I think there are two options for compatibility:
>> >>>>>>>>
>> >>>>>>>> 1. An alternative change is to remove the ack > 1 code, but
>> silently
>> >>>>>>>> "upgrade" requests with acks > 1 to acks = -1. This isn't the
>> same as
>> >>>>>>>> other
>> >>>>>>>> changes to behavior since the interaction between the client and
>> >>>>>>>> server
>> >>>>>>>> remains the same, no error codes change, etc. The client might
>> just
>> >>>>>>>> see
>> >>>>>>>> some increased latency since the message might need to be
>> replicated
>> >>>>>>>> to
>> >>>>>>>> more brokers than they requested.
>> >>>>>>>> 2. Split this into two patches, one that bumps the protocol
>> version
>> >>>>>>>> on
>> >>>>>>>> that
>> >>>>>>>> message to include the new error code but maintains both old (now
>> >>>>>>>> deprecated) and new behavior, then a second that would be
>> applied in
>> >>>>>>>> a
>> >>>>>>>> later release that removes the old protocol + code for handling
>> acks
>> >>>>>> 1.
>> >>>>>>>>
>> >>>>>>>> 2 is probably the right thing to do. If we specify the release
>> when
>> >>>>>> we'll
>> >>>>>>>> remove the deprecated protocol at the time of deprecation it
>> makes
>> >>>>>> things
>> >>>>>>>> a
>> >>>>>>>> lot easier for people writing non-java clients and could give
>> users
>> >>>>>> better
>> >>>>>>>> predictability (e.g. if clients are at most 1 major release
>> behind
>> >>>>>>>> brokers,
>> >>>>>>>> they'll remain compatible but possibly use deprecated features).
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> On Wed, Jan 14, 2015 at 3:51 PM, Gwen Shapira <
>> gshapira@cloudera.com>
>> >>>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>>> Hi Kafka Devs,
>> >>>>>>>>>
>> >>>>>>>>> We are working on KAFKA-1697 - remove code related to ack>1 on
>> the
>> >>>>>>>>> broker. Per Neha's suggestion, I'd like to give everyone a
>> heads up
>> >>>>>>>>> on
>> >>>>>>>>> what these changes mean.
>> >>>>>>>>>
>> >>>>>>>>> Once this patch is included, any produce requests that include
>> >>>>>>>>> request.required.acks > 1 will result in an exception. This
>> will be
>> >>>>>>>>> InvalidRequiredAcks in new versions (0.8.3 and up, I assume) and
>> >>>>>>>>> UnknownException in existing versions (sorry, but I can't add
>> error
>> >>>>>>>>> codes retroactively).
>> >>>>>>>>>
>> >>>>>>>>> This behavior is already enforced by 0.8.2 producers (sync and
>> >>>>>>>>> new),
>> >>>>>>>>> but we expect impact on users with older producers that relied
>> on
>> >>>>>>>>> acks
>> >>>>>>>>>> 1 and external clients (i.e python, go, etc).
>> >>>>>>>>>
>> >>>>>>>>> Users who relied on acks > 1 are expected to switch to using
>> acks =
>> >>>>>>>>> -1
>> >>>>>>>>> and a min.isr parameter than matches their user case.
>> >>>>>>>>>
>> >>>>>>>>> This change was discussed in the past in the context of
>> KAFKA-1555
>> >>>>>>>>> (min.isr), but let us know if you have any questions or concerns
>> >>>>>>>>> regarding this change.
>> >>>>>>>>>
>> >>>>>>>>> Gwen
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> --
>> >>>>>>>> Thanks,
>> >>>>>>>> Ewen
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> --
>> >>>>>>> You received this message because you are subscribed to the Google
>> >>>>>>> Groups
>> >>>>>>> "kafka-clients" group.
>> >>>>>>> To unsubscribe from this group and stop receiving emails from it,
>> send
>> >>>>>>> an
>> >>>>>>> email to kafka-clients+unsubscribe@googlegroups.com.
>> >>>>>>> To post to this group, send email to
>> kafka-clients@googlegroups.com.
>> >>>>>>> Visit this group at http://groups.google.com/group/kafka-clients.
>> >>>>>>> To view this discussion on the web visit
>> >>>>>>
>> >>>>>>
>> https://groups.google.com/d/msgid/kafka-clients/CAA7ooCBtH2JjyQsArdx_%3DV25B4O1QJk0YvOu9U6kYt9sB4aqng%40mail.gmail.com
>> >>>>>> .
>> >>>>>>>
>> >>>>>>> For more options, visit https://groups.google.com/d/optout.
>> >>>>>>
>> >>>>>> --
>> >>>>>> You received this message because you are subscribed to the Google
>> >>>>>> Groups
>> >>>>>> "kafka-clients" group.
>> >>>>>> To unsubscribe from this group and stop receiving emails from it,
>> send
>> >>>>>> an
>> >>>>>> email to kafka-clients+unsubscribe@googlegroups.com.
>> >>>>>> To post to this group, send email to
>> kafka-clients@googlegroups.com.
>> >>>>>> Visit this group at http://groups.google.com/group/kafka-clients.
>> >>>>>> To view this discussion on the web visit
>> >>>>>>
>> >>>>>>
>> https://groups.google.com/d/msgid/kafka-clients/CAHBV8WeUebxi%2B%2BSbjz8E9Yf4u4hkcPJ80Xsj0XTKcTac%3D%2B613A%40mail.gmail.com
>> >>>>>> .
>> >>>>>> For more options, visit https://groups.google.com/d/optout.
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Thanks,
>> >>>> Ewen
>>
>
>


-- 
Thanks,
Ewen

Re: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to ack>1 on the broker

Posted by Jay Kreps <ja...@gmail.com>.
This is a good case to discuss.

Let's figure the general case of how we want to handle errors and get that
documented in the protocol. The problem right now is that we give no
guidance on this. I actually thought Gwen's suggestion made sense on the
guidance we should have given which is that we will enumerate a set of
errors and their meaning for each API but it is possible that other errors
will occur and they should be handled (maybe poorly) in the same way
UNKNOWN_ERROR is handled which is our normal escape hatch for things like
OOMException.

I really do think we shouldn't be dogmatic here: In considering a change to
errors we should consider the potential ill-effect vs the complexity of yet
another protocol version.

In this case I actually am not sure we need to bump the protocol because
the whole point of the change was to make a setting we think doesn't make
sense break, right? Well this will break it. It seems like the only
downside is that older clients will get a bad error message instead of a
good one. But it isn't like we will have rendered a client unusable, it is
just that they will need to change their config.

-Jay

On Thu, Jan 15, 2015 at 6:14 PM, Gwen Shapira <gs...@cloudera.com> wrote:

> I created a KIP for this suggestion:
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1+-+Remove+support+of+request.required.acks
>
> Basically documenting what was already discussed here.  Comments will
> be awesome!
>
> Gwen
>
> On Thu, Jan 15, 2015 at 5:19 PM, Gwen Shapira <gs...@cloudera.com>
> wrote:
> > The errors are part of the KIP process now, so I think the clients are
> safe :)
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> >
> > On Thu, Jan 15, 2015 at 5:12 PM, Steve Morin <st...@gmail.com>
> wrote:
> >> Agree errors should be part of the protocol
> >>
> >>> On Jan 15, 2015, at 17:59, Gwen Shapira <gs...@cloudera.com> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I got convinced by Joe and Dana that errors are indeed part of the
> >>> protocol and can't be randomly added.
> >>>
> >>> So, it looks like we need to bump version of ProduceRequest in the
> >>> following way:
> >>> Version 0 -> accept acks >1. I think we should keep the existing
> >>> behavior too (i.e. not replace it with -1) to avoid surprising
> >>> clients, but I'm willing to hear other opinions.
> >>> Version 1 -> do not accept acks >1 and return an error.
> >>> Are we ok with the error I added in KAFKA-1697? We can use something
> >>> less specific like InvalidRequestParameter. This error can be reused
> >>> in the future and reduce the need to add errors, but will also be less
> >>> clear to the client and its users. Maybe even add the error message
> >>> string to the protocol in addition to the error code? (since we are
> >>> bumping versions....)
> >>>
> >>> I think maintaining the old version throughout 0.8.X makes sense. IMO
> >>> dropping it for 0.9 is feasible, but I'll let client owners help make
> >>> that call.
> >>>
> >>> Am I missing anything? Should I start a KIP? It seems like a KIP-type
> >>> discussion :)
> >>>
> >>> Gwen
> >>>
> >>>
> >>> On Thu, Jan 15, 2015 at 2:31 PM, Ewen Cheslack-Postava
> >>> <ew...@confluent.io> wrote:
> >>>> Gwen,
> >>>>
> >>>> I think the only option that wouldn't require a protocol version
> change is
> >>>> the one where acks > 1 is converted to acks = -1 since it's the only
> one
> >>>> that doesn't potentially break older clients. The protocol guide says
> that
> >>>> the expected upgrade path is servers first, then clients, so old
> clients,
> >>>> including non-java clients, that may be using acks > 1 should be able
> to
> >>>> work with a new broker version.
> >>>>
> >>>> It's more work, but I think dealing with the protocol change is the
> right
> >>>> thing to do since it eventually gets us to the behavior I think is
> better --
> >>>> the broker should reject requests with invalid values. I think Joe
> and I
> >>>> were basically in agreement. In my mind the major piece missing from
> his
> >>>> description is how long we're going to maintain his "case 0"
> behavior. It's
> >>>> impractical to maintain old versions forever, but it sounds like there
> >>>> hasn't been a decision on how long to maintain them. Maybe that's
> another
> >>>> item to add to KIPs -- protocol versions and behavior need to be
> listed as
> >>>> deprecated and the earliest version in which they'll be removed
> should be
> >>>> specified so users can understand which versions are guaranteed to be
> >>>> compatible, even if they're using (well-written) non-java clients.
> >>>>
> >>>> -Ewen
> >>>>
> >>>>
> >>>>> On Thu, Jan 15, 2015 at 12:52 PM, Dana Powers <da...@gmail.com>
> wrote:
> >>>>>
> >>>>>> clients don't break on unknown errors
> >>>>>
> >>>>> maybe true for the official java clients, but I dont think the
> assumption
> >>>>> holds true for community-maintained clients and users of those
> clients.
> >>>>> kafka-python generally follows the fail-fast philosophy and raises an
> >>>>> exception on any unrecognized error code in any server response.  in
> this
> >>>>> case, kafka-python allows users to set their own required-acks
> policy when
> >>>>> creating a producer instance.  It is possible that users of
> kafka-python
> >>>>> have deployed producer code that uses ack>1 -- perhaps in production
> >>>>> environments -- and for those users the new error code will crash
> their
> >>>>> producer code.  I would not be surprised if the same were true of
> other
> >>>>> community clients.
> >>>>>
> >>>>> *one reason for the fail-fast approach is that there isn't great
> >>>>> documentation on what errors to expect for each request / response
> -- so
> >>>>> we
> >>>>> use failures to alert that some error case is not handled properly.
> and
> >>>>> because of that, introducing new error cases without bumping the api
> >>>>> version is likely to cause those errors to get raised/thrown all the
> way
> >>>>> back up to the user.  of course we (client maintainers) can fix the
> issues
> >>>>> in the client libraries and suggest users upgrade, but it's not the
> ideal
> >>>>> situation.
> >>>>>
> >>>>>
> >>>>> long-winded way of saying: I agree w/ Joe.
> >>>>>
> >>>>> -Dana
> >>>>>
> >>>>>
> >>>>> On Thu, Jan 15, 2015 at 12:07 PM, Gwen Shapira <
> gshapira@cloudera.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Is the protocol bump caused by the behavior change or the new error
> >>>>>> code?
> >>>>>>
> >>>>>> 1) IMO, error_codes are data, and clients can expect to receive
> errors
> >>>>>> that they don't understand (i.e. unknown errors). AFAIK, clients
> don't
> >>>>>> break on unknown errors, they are simple more challenging to debug.
> If
> >>>>>> we document the new behavior, then its definitely debuggable and
> >>>>>> fixable.
> >>>>>>
> >>>>>> 2) The behavior change is basically a deprecation - i.e. acks > 1
> were
> >>>>>> never documented, and are not supported by Kafka clients starting
> with
> >>>>>> version 0.8.2. I'm not sure this requires a protocol bump either,
> >>>>>> although its a better case than new error codes.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Gwen
> >>>>>>
> >>>>>> On Thu, Jan 15, 2015 at 10:10 AM, Joe Stein <jo...@stealth.ly>
> >>>>>> wrote:
> >>>>>>> Looping in the mailing list that the client developers live on
> because
> >>>>>> they
> >>>>>>> are all not on dev (though they should be if they want to be
> helping
> >>>>>>> to
> >>>>>>> build the best client libraries they can).
> >>>>>>>
> >>>>>>> I whole hardily believe that we need to not break existing
> >>>>>>> functionality
> >>>>>> of
> >>>>>>> the client protocol, ever.
> >>>>>>>
> >>>>>>> There are many reasons for this and we have other threads on the
> >>>>>>> mailing
> >>>>>>> list where we are discussing that topic (no pun intended) that I
> don't
> >>>>>> want
> >>>>>>> to re-hash here.
> >>>>>>>
> >>>>>>> If we change wire protocol functionality OR the binary format
> (either)
> >>>>>>> we
> >>>>>>> must bump version AND treat version as a feature flag with backward
> >>>>>>> compatibility support until it is deprecated for some time for
> folks
> >>>>>>> to
> >>>>>> deal
> >>>>>>> with it.
> >>>>>>>
> >>>>>>> match version = {
> >>>>>>> case 0: keepDoingWhatWeWereDoing()
> >>>>>>> case 1: doNewStuff()
> >>>>>>> case 2: doEvenMoreNewStuff()
> >>>>>>> }
> >>>>>>>
> >>>>>>> has to be a practice we adopt imho ... I know feature flags can be
> >>>>>> construed
> >>>>>>> as "messy code" but I am eager to hear another (better? different?)
> >>>>>> solution
> >>>>>>> to this.
> >>>>>>>
> >>>>>>> If we don't do a feature flag like this specifically with this
> change
> >>>>>> then
> >>>>>>> what happens is that someone upgrades their brokers with a rolling
> >>>>>> restart
> >>>>>>> in 0.8.3 and every single one of their producer requests start to
> fail
> >>>>>> and
> >>>>>>> they have a major production outage. eeeek!!!!
> >>>>>>>
> >>>>>>> I do 100% agree that > 1 makes no sense and we *REALLY* need
> people to
> >>>>>> start
> >>>>>>> using 0,1,-1 but we need to-do that in a way that is going to work
> for
> >>>>>>> everyone.
> >>>>>>>
> >>>>>>> Old producers and consumers must keep working with new brokers and
> if
> >>>>>>> we
> >>>>>> are
> >>>>>>> not going to support that then I am unclear what the use of
> "version"
> >>>>>>> is
> >>>>>>> based on our original intentions of having it because of the
> >>>>>>> 0.7=>-0.8.
> >>>>>> We
> >>>>>>> said no more breaking changes when we did that.
> >>>>>>>
> >>>>>>> - Joe Stein
> >>>>>>>
> >>>>>>> On Thu, Jan 15, 2015 at 12:38 PM, Ewen Cheslack-Postava <
> >>>>>> ewen@confluent.io>
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> Right, so this looks like it could create an issue similar to
> what's
> >>>>>>>> currently being discussed in
> >>>>>>>> https://issues.apache.org/jira/browse/KAFKA-1649 where users now
> get
> >>>>>>>> errors
> >>>>>>>> under conditions when they previously wouldn't. Old clients won't
> >>>>>>>> even
> >>>>>>>> know
> >>>>>>>> about the error code, so besides failing they won't even be able
> to
> >>>>>>>> log
> >>>>>>>> any
> >>>>>>>> meaningful error messages.
> >>>>>>>>
> >>>>>>>> I think there are two options for compatibility:
> >>>>>>>>
> >>>>>>>> 1. An alternative change is to remove the ack > 1 code, but
> silently
> >>>>>>>> "upgrade" requests with acks > 1 to acks = -1. This isn't the
> same as
> >>>>>>>> other
> >>>>>>>> changes to behavior since the interaction between the client and
> >>>>>>>> server
> >>>>>>>> remains the same, no error codes change, etc. The client might
> just
> >>>>>>>> see
> >>>>>>>> some increased latency since the message might need to be
> replicated
> >>>>>>>> to
> >>>>>>>> more brokers than they requested.
> >>>>>>>> 2. Split this into two patches, one that bumps the protocol
> version
> >>>>>>>> on
> >>>>>>>> that
> >>>>>>>> message to include the new error code but maintains both old (now
> >>>>>>>> deprecated) and new behavior, then a second that would be applied
> in
> >>>>>>>> a
> >>>>>>>> later release that removes the old protocol + code for handling
> acks
> >>>>>> 1.
> >>>>>>>>
> >>>>>>>> 2 is probably the right thing to do. If we specify the release
> when
> >>>>>> we'll
> >>>>>>>> remove the deprecated protocol at the time of deprecation it makes
> >>>>>> things
> >>>>>>>> a
> >>>>>>>> lot easier for people writing non-java clients and could give
> users
> >>>>>> better
> >>>>>>>> predictability (e.g. if clients are at most 1 major release behind
> >>>>>>>> brokers,
> >>>>>>>> they'll remain compatible but possibly use deprecated features).
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Wed, Jan 14, 2015 at 3:51 PM, Gwen Shapira <
> gshapira@cloudera.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi Kafka Devs,
> >>>>>>>>>
> >>>>>>>>> We are working on KAFKA-1697 - remove code related to ack>1 on
> the
> >>>>>>>>> broker. Per Neha's suggestion, I'd like to give everyone a heads
> up
> >>>>>>>>> on
> >>>>>>>>> what these changes mean.
> >>>>>>>>>
> >>>>>>>>> Once this patch is included, any produce requests that include
> >>>>>>>>> request.required.acks > 1 will result in an exception. This will
> be
> >>>>>>>>> InvalidRequiredAcks in new versions (0.8.3 and up, I assume) and
> >>>>>>>>> UnknownException in existing versions (sorry, but I can't add
> error
> >>>>>>>>> codes retroactively).
> >>>>>>>>>
> >>>>>>>>> This behavior is already enforced by 0.8.2 producers (sync and
> >>>>>>>>> new),
> >>>>>>>>> but we expect impact on users with older producers that relied on
> >>>>>>>>> acks
> >>>>>>>>>> 1 and external clients (i.e python, go, etc).
> >>>>>>>>>
> >>>>>>>>> Users who relied on acks > 1 are expected to switch to using
> acks =
> >>>>>>>>> -1
> >>>>>>>>> and a min.isr parameter than matches their user case.
> >>>>>>>>>
> >>>>>>>>> This change was discussed in the past in the context of
> KAFKA-1555
> >>>>>>>>> (min.isr), but let us know if you have any questions or concerns
> >>>>>>>>> regarding this change.
> >>>>>>>>>
> >>>>>>>>> Gwen
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Thanks,
> >>>>>>>> Ewen
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> You received this message because you are subscribed to the Google
> >>>>>>> Groups
> >>>>>>> "kafka-clients" group.
> >>>>>>> To unsubscribe from this group and stop receiving emails from it,
> send
> >>>>>>> an
> >>>>>>> email to kafka-clients+unsubscribe@googlegroups.com.
> >>>>>>> To post to this group, send email to
> kafka-clients@googlegroups.com.
> >>>>>>> Visit this group at http://groups.google.com/group/kafka-clients.
> >>>>>>> To view this discussion on the web visit
> >>>>>>
> >>>>>>
> https://groups.google.com/d/msgid/kafka-clients/CAA7ooCBtH2JjyQsArdx_%3DV25B4O1QJk0YvOu9U6kYt9sB4aqng%40mail.gmail.com
> >>>>>> .
> >>>>>>>
> >>>>>>> For more options, visit https://groups.google.com/d/optout.
> >>>>>>
> >>>>>> --
> >>>>>> You received this message because you are subscribed to the Google
> >>>>>> Groups
> >>>>>> "kafka-clients" group.
> >>>>>> To unsubscribe from this group and stop receiving emails from it,
> send
> >>>>>> an
> >>>>>> email to kafka-clients+unsubscribe@googlegroups.com.
> >>>>>> To post to this group, send email to kafka-clients@googlegroups.com
> .
> >>>>>> Visit this group at http://groups.google.com/group/kafka-clients.
> >>>>>> To view this discussion on the web visit
> >>>>>>
> >>>>>>
> https://groups.google.com/d/msgid/kafka-clients/CAHBV8WeUebxi%2B%2BSbjz8E9Yf4u4hkcPJ80Xsj0XTKcTac%3D%2B613A%40mail.gmail.com
> >>>>>> .
> >>>>>> For more options, visit https://groups.google.com/d/optout.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Thanks,
> >>>> Ewen
>

Re: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to ack>1 on the broker

Posted by Gwen Shapira <gs...@cloudera.com>.
I created a KIP for this suggestion:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-1+-+Remove+support+of+request.required.acks

Basically documenting what was already discussed here.  Comments will
be awesome!

Gwen

On Thu, Jan 15, 2015 at 5:19 PM, Gwen Shapira <gs...@cloudera.com> wrote:
> The errors are part of the KIP process now, so I think the clients are safe :)
>
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
>
> On Thu, Jan 15, 2015 at 5:12 PM, Steve Morin <st...@gmail.com> wrote:
>> Agree errors should be part of the protocol
>>
>>> On Jan 15, 2015, at 17:59, Gwen Shapira <gs...@cloudera.com> wrote:
>>>
>>> Hi,
>>>
>>> I got convinced by Joe and Dana that errors are indeed part of the
>>> protocol and can't be randomly added.
>>>
>>> So, it looks like we need to bump version of ProduceRequest in the
>>> following way:
>>> Version 0 -> accept acks >1. I think we should keep the existing
>>> behavior too (i.e. not replace it with -1) to avoid surprising
>>> clients, but I'm willing to hear other opinions.
>>> Version 1 -> do not accept acks >1 and return an error.
>>> Are we ok with the error I added in KAFKA-1697? We can use something
>>> less specific like InvalidRequestParameter. This error can be reused
>>> in the future and reduce the need to add errors, but will also be less
>>> clear to the client and its users. Maybe even add the error message
>>> string to the protocol in addition to the error code? (since we are
>>> bumping versions....)
>>>
>>> I think maintaining the old version throughout 0.8.X makes sense. IMO
>>> dropping it for 0.9 is feasible, but I'll let client owners help make
>>> that call.
>>>
>>> Am I missing anything? Should I start a KIP? It seems like a KIP-type
>>> discussion :)
>>>
>>> Gwen
>>>
>>>
>>> On Thu, Jan 15, 2015 at 2:31 PM, Ewen Cheslack-Postava
>>> <ew...@confluent.io> wrote:
>>>> Gwen,
>>>>
>>>> I think the only option that wouldn't require a protocol version change is
>>>> the one where acks > 1 is converted to acks = -1 since it's the only one
>>>> that doesn't potentially break older clients. The protocol guide says that
>>>> the expected upgrade path is servers first, then clients, so old clients,
>>>> including non-java clients, that may be using acks > 1 should be able to
>>>> work with a new broker version.
>>>>
>>>> It's more work, but I think dealing with the protocol change is the right
>>>> thing to do since it eventually gets us to the behavior I think is better --
>>>> the broker should reject requests with invalid values. I think Joe and I
>>>> were basically in agreement. In my mind the major piece missing from his
>>>> description is how long we're going to maintain his "case 0" behavior. It's
>>>> impractical to maintain old versions forever, but it sounds like there
>>>> hasn't been a decision on how long to maintain them. Maybe that's another
>>>> item to add to KIPs -- protocol versions and behavior need to be listed as
>>>> deprecated and the earliest version in which they'll be removed should be
>>>> specified so users can understand which versions are guaranteed to be
>>>> compatible, even if they're using (well-written) non-java clients.
>>>>
>>>> -Ewen
>>>>
>>>>
>>>>> On Thu, Jan 15, 2015 at 12:52 PM, Dana Powers <da...@gmail.com> wrote:
>>>>>
>>>>>> clients don't break on unknown errors
>>>>>
>>>>> maybe true for the official java clients, but I dont think the assumption
>>>>> holds true for community-maintained clients and users of those clients.
>>>>> kafka-python generally follows the fail-fast philosophy and raises an
>>>>> exception on any unrecognized error code in any server response.  in this
>>>>> case, kafka-python allows users to set their own required-acks policy when
>>>>> creating a producer instance.  It is possible that users of kafka-python
>>>>> have deployed producer code that uses ack>1 -- perhaps in production
>>>>> environments -- and for those users the new error code will crash their
>>>>> producer code.  I would not be surprised if the same were true of other
>>>>> community clients.
>>>>>
>>>>> *one reason for the fail-fast approach is that there isn't great
>>>>> documentation on what errors to expect for each request / response -- so
>>>>> we
>>>>> use failures to alert that some error case is not handled properly.  and
>>>>> because of that, introducing new error cases without bumping the api
>>>>> version is likely to cause those errors to get raised/thrown all the way
>>>>> back up to the user.  of course we (client maintainers) can fix the issues
>>>>> in the client libraries and suggest users upgrade, but it's not the ideal
>>>>> situation.
>>>>>
>>>>>
>>>>> long-winded way of saying: I agree w/ Joe.
>>>>>
>>>>> -Dana
>>>>>
>>>>>
>>>>> On Thu, Jan 15, 2015 at 12:07 PM, Gwen Shapira <gs...@cloudera.com>
>>>>> wrote:
>>>>>
>>>>>> Is the protocol bump caused by the behavior change or the new error
>>>>>> code?
>>>>>>
>>>>>> 1) IMO, error_codes are data, and clients can expect to receive errors
>>>>>> that they don't understand (i.e. unknown errors). AFAIK, clients don't
>>>>>> break on unknown errors, they are simple more challenging to debug. If
>>>>>> we document the new behavior, then its definitely debuggable and
>>>>>> fixable.
>>>>>>
>>>>>> 2) The behavior change is basically a deprecation - i.e. acks > 1 were
>>>>>> never documented, and are not supported by Kafka clients starting with
>>>>>> version 0.8.2. I'm not sure this requires a protocol bump either,
>>>>>> although its a better case than new error codes.
>>>>>>
>>>>>> Thanks,
>>>>>> Gwen
>>>>>>
>>>>>> On Thu, Jan 15, 2015 at 10:10 AM, Joe Stein <jo...@stealth.ly>
>>>>>> wrote:
>>>>>>> Looping in the mailing list that the client developers live on because
>>>>>> they
>>>>>>> are all not on dev (though they should be if they want to be helping
>>>>>>> to
>>>>>>> build the best client libraries they can).
>>>>>>>
>>>>>>> I whole hardily believe that we need to not break existing
>>>>>>> functionality
>>>>>> of
>>>>>>> the client protocol, ever.
>>>>>>>
>>>>>>> There are many reasons for this and we have other threads on the
>>>>>>> mailing
>>>>>>> list where we are discussing that topic (no pun intended) that I don't
>>>>>> want
>>>>>>> to re-hash here.
>>>>>>>
>>>>>>> If we change wire protocol functionality OR the binary format (either)
>>>>>>> we
>>>>>>> must bump version AND treat version as a feature flag with backward
>>>>>>> compatibility support until it is deprecated for some time for folks
>>>>>>> to
>>>>>> deal
>>>>>>> with it.
>>>>>>>
>>>>>>> match version = {
>>>>>>> case 0: keepDoingWhatWeWereDoing()
>>>>>>> case 1: doNewStuff()
>>>>>>> case 2: doEvenMoreNewStuff()
>>>>>>> }
>>>>>>>
>>>>>>> has to be a practice we adopt imho ... I know feature flags can be
>>>>>> construed
>>>>>>> as "messy code" but I am eager to hear another (better? different?)
>>>>>> solution
>>>>>>> to this.
>>>>>>>
>>>>>>> If we don't do a feature flag like this specifically with this change
>>>>>> then
>>>>>>> what happens is that someone upgrades their brokers with a rolling
>>>>>> restart
>>>>>>> in 0.8.3 and every single one of their producer requests start to fail
>>>>>> and
>>>>>>> they have a major production outage. eeeek!!!!
>>>>>>>
>>>>>>> I do 100% agree that > 1 makes no sense and we *REALLY* need people to
>>>>>> start
>>>>>>> using 0,1,-1 but we need to-do that in a way that is going to work for
>>>>>>> everyone.
>>>>>>>
>>>>>>> Old producers and consumers must keep working with new brokers and if
>>>>>>> we
>>>>>> are
>>>>>>> not going to support that then I am unclear what the use of "version"
>>>>>>> is
>>>>>>> based on our original intentions of having it because of the
>>>>>>> 0.7=>-0.8.
>>>>>> We
>>>>>>> said no more breaking changes when we did that.
>>>>>>>
>>>>>>> - Joe Stein
>>>>>>>
>>>>>>> On Thu, Jan 15, 2015 at 12:38 PM, Ewen Cheslack-Postava <
>>>>>> ewen@confluent.io>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Right, so this looks like it could create an issue similar to what's
>>>>>>>> currently being discussed in
>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-1649 where users now get
>>>>>>>> errors
>>>>>>>> under conditions when they previously wouldn't. Old clients won't
>>>>>>>> even
>>>>>>>> know
>>>>>>>> about the error code, so besides failing they won't even be able to
>>>>>>>> log
>>>>>>>> any
>>>>>>>> meaningful error messages.
>>>>>>>>
>>>>>>>> I think there are two options for compatibility:
>>>>>>>>
>>>>>>>> 1. An alternative change is to remove the ack > 1 code, but silently
>>>>>>>> "upgrade" requests with acks > 1 to acks = -1. This isn't the same as
>>>>>>>> other
>>>>>>>> changes to behavior since the interaction between the client and
>>>>>>>> server
>>>>>>>> remains the same, no error codes change, etc. The client might just
>>>>>>>> see
>>>>>>>> some increased latency since the message might need to be replicated
>>>>>>>> to
>>>>>>>> more brokers than they requested.
>>>>>>>> 2. Split this into two patches, one that bumps the protocol version
>>>>>>>> on
>>>>>>>> that
>>>>>>>> message to include the new error code but maintains both old (now
>>>>>>>> deprecated) and new behavior, then a second that would be applied in
>>>>>>>> a
>>>>>>>> later release that removes the old protocol + code for handling acks
>>>>>> 1.
>>>>>>>>
>>>>>>>> 2 is probably the right thing to do. If we specify the release when
>>>>>> we'll
>>>>>>>> remove the deprecated protocol at the time of deprecation it makes
>>>>>> things
>>>>>>>> a
>>>>>>>> lot easier for people writing non-java clients and could give users
>>>>>> better
>>>>>>>> predictability (e.g. if clients are at most 1 major release behind
>>>>>>>> brokers,
>>>>>>>> they'll remain compatible but possibly use deprecated features).
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jan 14, 2015 at 3:51 PM, Gwen Shapira <gs...@cloudera.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Kafka Devs,
>>>>>>>>>
>>>>>>>>> We are working on KAFKA-1697 - remove code related to ack>1 on the
>>>>>>>>> broker. Per Neha's suggestion, I'd like to give everyone a heads up
>>>>>>>>> on
>>>>>>>>> what these changes mean.
>>>>>>>>>
>>>>>>>>> Once this patch is included, any produce requests that include
>>>>>>>>> request.required.acks > 1 will result in an exception. This will be
>>>>>>>>> InvalidRequiredAcks in new versions (0.8.3 and up, I assume) and
>>>>>>>>> UnknownException in existing versions (sorry, but I can't add error
>>>>>>>>> codes retroactively).
>>>>>>>>>
>>>>>>>>> This behavior is already enforced by 0.8.2 producers (sync and
>>>>>>>>> new),
>>>>>>>>> but we expect impact on users with older producers that relied on
>>>>>>>>> acks
>>>>>>>>>> 1 and external clients (i.e python, go, etc).
>>>>>>>>>
>>>>>>>>> Users who relied on acks > 1 are expected to switch to using acks =
>>>>>>>>> -1
>>>>>>>>> and a min.isr parameter than matches their user case.
>>>>>>>>>
>>>>>>>>> This change was discussed in the past in the context of KAFKA-1555
>>>>>>>>> (min.isr), but let us know if you have any questions or concerns
>>>>>>>>> regarding this change.
>>>>>>>>>
>>>>>>>>> Gwen
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Thanks,
>>>>>>>> Ewen
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups
>>>>>>> "kafka-clients" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>>>> an
>>>>>>> email to kafka-clients+unsubscribe@googlegroups.com.
>>>>>>> To post to this group, send email to kafka-clients@googlegroups.com.
>>>>>>> Visit this group at http://groups.google.com/group/kafka-clients.
>>>>>>> To view this discussion on the web visit
>>>>>>
>>>>>> https://groups.google.com/d/msgid/kafka-clients/CAA7ooCBtH2JjyQsArdx_%3DV25B4O1QJk0YvOu9U6kYt9sB4aqng%40mail.gmail.com
>>>>>> .
>>>>>>>
>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups
>>>>>> "kafka-clients" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>>> an
>>>>>> email to kafka-clients+unsubscribe@googlegroups.com.
>>>>>> To post to this group, send email to kafka-clients@googlegroups.com.
>>>>>> Visit this group at http://groups.google.com/group/kafka-clients.
>>>>>> To view this discussion on the web visit
>>>>>>
>>>>>> https://groups.google.com/d/msgid/kafka-clients/CAHBV8WeUebxi%2B%2BSbjz8E9Yf4u4hkcPJ80Xsj0XTKcTac%3D%2B613A%40mail.gmail.com
>>>>>> .
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Thanks,
>>>> Ewen

Re: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to ack>1 on the broker

Posted by Gwen Shapira <gs...@cloudera.com>.
The errors are part of the KIP process now, so I think the clients are safe :)

https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals

On Thu, Jan 15, 2015 at 5:12 PM, Steve Morin <st...@gmail.com> wrote:
> Agree errors should be part of the protocol
>
>> On Jan 15, 2015, at 17:59, Gwen Shapira <gs...@cloudera.com> wrote:
>>
>> Hi,
>>
>> I got convinced by Joe and Dana that errors are indeed part of the
>> protocol and can't be randomly added.
>>
>> So, it looks like we need to bump version of ProduceRequest in the
>> following way:
>> Version 0 -> accept acks >1. I think we should keep the existing
>> behavior too (i.e. not replace it with -1) to avoid surprising
>> clients, but I'm willing to hear other opinions.
>> Version 1 -> do not accept acks >1 and return an error.
>> Are we ok with the error I added in KAFKA-1697? We can use something
>> less specific like InvalidRequestParameter. This error can be reused
>> in the future and reduce the need to add errors, but will also be less
>> clear to the client and its users. Maybe even add the error message
>> string to the protocol in addition to the error code? (since we are
>> bumping versions....)
>>
>> I think maintaining the old version throughout 0.8.X makes sense. IMO
>> dropping it for 0.9 is feasible, but I'll let client owners help make
>> that call.
>>
>> Am I missing anything? Should I start a KIP? It seems like a KIP-type
>> discussion :)
>>
>> Gwen
>>
>>
>> On Thu, Jan 15, 2015 at 2:31 PM, Ewen Cheslack-Postava
>> <ew...@confluent.io> wrote:
>>> Gwen,
>>>
>>> I think the only option that wouldn't require a protocol version change is
>>> the one where acks > 1 is converted to acks = -1 since it's the only one
>>> that doesn't potentially break older clients. The protocol guide says that
>>> the expected upgrade path is servers first, then clients, so old clients,
>>> including non-java clients, that may be using acks > 1 should be able to
>>> work with a new broker version.
>>>
>>> It's more work, but I think dealing with the protocol change is the right
>>> thing to do since it eventually gets us to the behavior I think is better --
>>> the broker should reject requests with invalid values. I think Joe and I
>>> were basically in agreement. In my mind the major piece missing from his
>>> description is how long we're going to maintain his "case 0" behavior. It's
>>> impractical to maintain old versions forever, but it sounds like there
>>> hasn't been a decision on how long to maintain them. Maybe that's another
>>> item to add to KIPs -- protocol versions and behavior need to be listed as
>>> deprecated and the earliest version in which they'll be removed should be
>>> specified so users can understand which versions are guaranteed to be
>>> compatible, even if they're using (well-written) non-java clients.
>>>
>>> -Ewen
>>>
>>>
>>>> On Thu, Jan 15, 2015 at 12:52 PM, Dana Powers <da...@gmail.com> wrote:
>>>>
>>>>> clients don't break on unknown errors
>>>>
>>>> maybe true for the official java clients, but I dont think the assumption
>>>> holds true for community-maintained clients and users of those clients.
>>>> kafka-python generally follows the fail-fast philosophy and raises an
>>>> exception on any unrecognized error code in any server response.  in this
>>>> case, kafka-python allows users to set their own required-acks policy when
>>>> creating a producer instance.  It is possible that users of kafka-python
>>>> have deployed producer code that uses ack>1 -- perhaps in production
>>>> environments -- and for those users the new error code will crash their
>>>> producer code.  I would not be surprised if the same were true of other
>>>> community clients.
>>>>
>>>> *one reason for the fail-fast approach is that there isn't great
>>>> documentation on what errors to expect for each request / response -- so
>>>> we
>>>> use failures to alert that some error case is not handled properly.  and
>>>> because of that, introducing new error cases without bumping the api
>>>> version is likely to cause those errors to get raised/thrown all the way
>>>> back up to the user.  of course we (client maintainers) can fix the issues
>>>> in the client libraries and suggest users upgrade, but it's not the ideal
>>>> situation.
>>>>
>>>>
>>>> long-winded way of saying: I agree w/ Joe.
>>>>
>>>> -Dana
>>>>
>>>>
>>>> On Thu, Jan 15, 2015 at 12:07 PM, Gwen Shapira <gs...@cloudera.com>
>>>> wrote:
>>>>
>>>>> Is the protocol bump caused by the behavior change or the new error
>>>>> code?
>>>>>
>>>>> 1) IMO, error_codes are data, and clients can expect to receive errors
>>>>> that they don't understand (i.e. unknown errors). AFAIK, clients don't
>>>>> break on unknown errors, they are simple more challenging to debug. If
>>>>> we document the new behavior, then its definitely debuggable and
>>>>> fixable.
>>>>>
>>>>> 2) The behavior change is basically a deprecation - i.e. acks > 1 were
>>>>> never documented, and are not supported by Kafka clients starting with
>>>>> version 0.8.2. I'm not sure this requires a protocol bump either,
>>>>> although its a better case than new error codes.
>>>>>
>>>>> Thanks,
>>>>> Gwen
>>>>>
>>>>> On Thu, Jan 15, 2015 at 10:10 AM, Joe Stein <jo...@stealth.ly>
>>>>> wrote:
>>>>>> Looping in the mailing list that the client developers live on because
>>>>> they
>>>>>> are all not on dev (though they should be if they want to be helping
>>>>>> to
>>>>>> build the best client libraries they can).
>>>>>>
>>>>>> I whole hardily believe that we need to not break existing
>>>>>> functionality
>>>>> of
>>>>>> the client protocol, ever.
>>>>>>
>>>>>> There are many reasons for this and we have other threads on the
>>>>>> mailing
>>>>>> list where we are discussing that topic (no pun intended) that I don't
>>>>> want
>>>>>> to re-hash here.
>>>>>>
>>>>>> If we change wire protocol functionality OR the binary format (either)
>>>>>> we
>>>>>> must bump version AND treat version as a feature flag with backward
>>>>>> compatibility support until it is deprecated for some time for folks
>>>>>> to
>>>>> deal
>>>>>> with it.
>>>>>>
>>>>>> match version = {
>>>>>> case 0: keepDoingWhatWeWereDoing()
>>>>>> case 1: doNewStuff()
>>>>>> case 2: doEvenMoreNewStuff()
>>>>>> }
>>>>>>
>>>>>> has to be a practice we adopt imho ... I know feature flags can be
>>>>> construed
>>>>>> as "messy code" but I am eager to hear another (better? different?)
>>>>> solution
>>>>>> to this.
>>>>>>
>>>>>> If we don't do a feature flag like this specifically with this change
>>>>> then
>>>>>> what happens is that someone upgrades their brokers with a rolling
>>>>> restart
>>>>>> in 0.8.3 and every single one of their producer requests start to fail
>>>>> and
>>>>>> they have a major production outage. eeeek!!!!
>>>>>>
>>>>>> I do 100% agree that > 1 makes no sense and we *REALLY* need people to
>>>>> start
>>>>>> using 0,1,-1 but we need to-do that in a way that is going to work for
>>>>>> everyone.
>>>>>>
>>>>>> Old producers and consumers must keep working with new brokers and if
>>>>>> we
>>>>> are
>>>>>> not going to support that then I am unclear what the use of "version"
>>>>>> is
>>>>>> based on our original intentions of having it because of the
>>>>>> 0.7=>-0.8.
>>>>> We
>>>>>> said no more breaking changes when we did that.
>>>>>>
>>>>>> - Joe Stein
>>>>>>
>>>>>> On Thu, Jan 15, 2015 at 12:38 PM, Ewen Cheslack-Postava <
>>>>> ewen@confluent.io>
>>>>>> wrote:
>>>>>>>
>>>>>>> Right, so this looks like it could create an issue similar to what's
>>>>>>> currently being discussed in
>>>>>>> https://issues.apache.org/jira/browse/KAFKA-1649 where users now get
>>>>>>> errors
>>>>>>> under conditions when they previously wouldn't. Old clients won't
>>>>>>> even
>>>>>>> know
>>>>>>> about the error code, so besides failing they won't even be able to
>>>>>>> log
>>>>>>> any
>>>>>>> meaningful error messages.
>>>>>>>
>>>>>>> I think there are two options for compatibility:
>>>>>>>
>>>>>>> 1. An alternative change is to remove the ack > 1 code, but silently
>>>>>>> "upgrade" requests with acks > 1 to acks = -1. This isn't the same as
>>>>>>> other
>>>>>>> changes to behavior since the interaction between the client and
>>>>>>> server
>>>>>>> remains the same, no error codes change, etc. The client might just
>>>>>>> see
>>>>>>> some increased latency since the message might need to be replicated
>>>>>>> to
>>>>>>> more brokers than they requested.
>>>>>>> 2. Split this into two patches, one that bumps the protocol version
>>>>>>> on
>>>>>>> that
>>>>>>> message to include the new error code but maintains both old (now
>>>>>>> deprecated) and new behavior, then a second that would be applied in
>>>>>>> a
>>>>>>> later release that removes the old protocol + code for handling acks
>>>>> 1.
>>>>>>>
>>>>>>> 2 is probably the right thing to do. If we specify the release when
>>>>> we'll
>>>>>>> remove the deprecated protocol at the time of deprecation it makes
>>>>> things
>>>>>>> a
>>>>>>> lot easier for people writing non-java clients and could give users
>>>>> better
>>>>>>> predictability (e.g. if clients are at most 1 major release behind
>>>>>>> brokers,
>>>>>>> they'll remain compatible but possibly use deprecated features).
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jan 14, 2015 at 3:51 PM, Gwen Shapira <gs...@cloudera.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Kafka Devs,
>>>>>>>>
>>>>>>>> We are working on KAFKA-1697 - remove code related to ack>1 on the
>>>>>>>> broker. Per Neha's suggestion, I'd like to give everyone a heads up
>>>>>>>> on
>>>>>>>> what these changes mean.
>>>>>>>>
>>>>>>>> Once this patch is included, any produce requests that include
>>>>>>>> request.required.acks > 1 will result in an exception. This will be
>>>>>>>> InvalidRequiredAcks in new versions (0.8.3 and up, I assume) and
>>>>>>>> UnknownException in existing versions (sorry, but I can't add error
>>>>>>>> codes retroactively).
>>>>>>>>
>>>>>>>> This behavior is already enforced by 0.8.2 producers (sync and
>>>>>>>> new),
>>>>>>>> but we expect impact on users with older producers that relied on
>>>>>>>> acks
>>>>>>>>> 1 and external clients (i.e python, go, etc).
>>>>>>>>
>>>>>>>> Users who relied on acks > 1 are expected to switch to using acks =
>>>>>>>> -1
>>>>>>>> and a min.isr parameter than matches their user case.
>>>>>>>>
>>>>>>>> This change was discussed in the past in the context of KAFKA-1555
>>>>>>>> (min.isr), but let us know if you have any questions or concerns
>>>>>>>> regarding this change.
>>>>>>>>
>>>>>>>> Gwen
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Thanks,
>>>>>>> Ewen
>>>>>>
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups
>>>>>> "kafka-clients" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>>> an
>>>>>> email to kafka-clients+unsubscribe@googlegroups.com.
>>>>>> To post to this group, send email to kafka-clients@googlegroups.com.
>>>>>> Visit this group at http://groups.google.com/group/kafka-clients.
>>>>>> To view this discussion on the web visit
>>>>>
>>>>> https://groups.google.com/d/msgid/kafka-clients/CAA7ooCBtH2JjyQsArdx_%3DV25B4O1QJk0YvOu9U6kYt9sB4aqng%40mail.gmail.com
>>>>> .
>>>>>>
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups
>>>>> "kafka-clients" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an
>>>>> email to kafka-clients+unsubscribe@googlegroups.com.
>>>>> To post to this group, send email to kafka-clients@googlegroups.com.
>>>>> Visit this group at http://groups.google.com/group/kafka-clients.
>>>>> To view this discussion on the web visit
>>>>>
>>>>> https://groups.google.com/d/msgid/kafka-clients/CAHBV8WeUebxi%2B%2BSbjz8E9Yf4u4hkcPJ80Xsj0XTKcTac%3D%2B613A%40mail.gmail.com
>>>>> .
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>>
>>>
>>>
>>> --
>>> Thanks,
>>> Ewen

Re: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to ack>1 on the broker

Posted by Steve Morin <st...@gmail.com>.
Agree errors should be part of the protocol

> On Jan 15, 2015, at 17:59, Gwen Shapira <gs...@cloudera.com> wrote:
> 
> Hi,
> 
> I got convinced by Joe and Dana that errors are indeed part of the
> protocol and can't be randomly added.
> 
> So, it looks like we need to bump version of ProduceRequest in the
> following way:
> Version 0 -> accept acks >1. I think we should keep the existing
> behavior too (i.e. not replace it with -1) to avoid surprising
> clients, but I'm willing to hear other opinions.
> Version 1 -> do not accept acks >1 and return an error.
> Are we ok with the error I added in KAFKA-1697? We can use something
> less specific like InvalidRequestParameter. This error can be reused
> in the future and reduce the need to add errors, but will also be less
> clear to the client and its users. Maybe even add the error message
> string to the protocol in addition to the error code? (since we are
> bumping versions....)
> 
> I think maintaining the old version throughout 0.8.X makes sense. IMO
> dropping it for 0.9 is feasible, but I'll let client owners help make
> that call.
> 
> Am I missing anything? Should I start a KIP? It seems like a KIP-type
> discussion :)
> 
> Gwen
> 
> 
> On Thu, Jan 15, 2015 at 2:31 PM, Ewen Cheslack-Postava
> <ew...@confluent.io> wrote:
>> Gwen,
>> 
>> I think the only option that wouldn't require a protocol version change is
>> the one where acks > 1 is converted to acks = -1 since it's the only one
>> that doesn't potentially break older clients. The protocol guide says that
>> the expected upgrade path is servers first, then clients, so old clients,
>> including non-java clients, that may be using acks > 1 should be able to
>> work with a new broker version.
>> 
>> It's more work, but I think dealing with the protocol change is the right
>> thing to do since it eventually gets us to the behavior I think is better --
>> the broker should reject requests with invalid values. I think Joe and I
>> were basically in agreement. In my mind the major piece missing from his
>> description is how long we're going to maintain his "case 0" behavior. It's
>> impractical to maintain old versions forever, but it sounds like there
>> hasn't been a decision on how long to maintain them. Maybe that's another
>> item to add to KIPs -- protocol versions and behavior need to be listed as
>> deprecated and the earliest version in which they'll be removed should be
>> specified so users can understand which versions are guaranteed to be
>> compatible, even if they're using (well-written) non-java clients.
>> 
>> -Ewen
>> 
>> 
>>> On Thu, Jan 15, 2015 at 12:52 PM, Dana Powers <da...@gmail.com> wrote:
>>> 
>>>> clients don't break on unknown errors
>>> 
>>> maybe true for the official java clients, but I dont think the assumption
>>> holds true for community-maintained clients and users of those clients.
>>> kafka-python generally follows the fail-fast philosophy and raises an
>>> exception on any unrecognized error code in any server response.  in this
>>> case, kafka-python allows users to set their own required-acks policy when
>>> creating a producer instance.  It is possible that users of kafka-python
>>> have deployed producer code that uses ack>1 -- perhaps in production
>>> environments -- and for those users the new error code will crash their
>>> producer code.  I would not be surprised if the same were true of other
>>> community clients.
>>> 
>>> *one reason for the fail-fast approach is that there isn't great
>>> documentation on what errors to expect for each request / response -- so
>>> we
>>> use failures to alert that some error case is not handled properly.  and
>>> because of that, introducing new error cases without bumping the api
>>> version is likely to cause those errors to get raised/thrown all the way
>>> back up to the user.  of course we (client maintainers) can fix the issues
>>> in the client libraries and suggest users upgrade, but it's not the ideal
>>> situation.
>>> 
>>> 
>>> long-winded way of saying: I agree w/ Joe.
>>> 
>>> -Dana
>>> 
>>> 
>>> On Thu, Jan 15, 2015 at 12:07 PM, Gwen Shapira <gs...@cloudera.com>
>>> wrote:
>>> 
>>>> Is the protocol bump caused by the behavior change or the new error
>>>> code?
>>>> 
>>>> 1) IMO, error_codes are data, and clients can expect to receive errors
>>>> that they don't understand (i.e. unknown errors). AFAIK, clients don't
>>>> break on unknown errors, they are simple more challenging to debug. If
>>>> we document the new behavior, then its definitely debuggable and
>>>> fixable.
>>>> 
>>>> 2) The behavior change is basically a deprecation - i.e. acks > 1 were
>>>> never documented, and are not supported by Kafka clients starting with
>>>> version 0.8.2. I'm not sure this requires a protocol bump either,
>>>> although its a better case than new error codes.
>>>> 
>>>> Thanks,
>>>> Gwen
>>>> 
>>>> On Thu, Jan 15, 2015 at 10:10 AM, Joe Stein <jo...@stealth.ly>
>>>> wrote:
>>>>> Looping in the mailing list that the client developers live on because
>>>> they
>>>>> are all not on dev (though they should be if they want to be helping
>>>>> to
>>>>> build the best client libraries they can).
>>>>> 
>>>>> I whole hardily believe that we need to not break existing
>>>>> functionality
>>>> of
>>>>> the client protocol, ever.
>>>>> 
>>>>> There are many reasons for this and we have other threads on the
>>>>> mailing
>>>>> list where we are discussing that topic (no pun intended) that I don't
>>>> want
>>>>> to re-hash here.
>>>>> 
>>>>> If we change wire protocol functionality OR the binary format (either)
>>>>> we
>>>>> must bump version AND treat version as a feature flag with backward
>>>>> compatibility support until it is deprecated for some time for folks
>>>>> to
>>>> deal
>>>>> with it.
>>>>> 
>>>>> match version = {
>>>>> case 0: keepDoingWhatWeWereDoing()
>>>>> case 1: doNewStuff()
>>>>> case 2: doEvenMoreNewStuff()
>>>>> }
>>>>> 
>>>>> has to be a practice we adopt imho ... I know feature flags can be
>>>> construed
>>>>> as "messy code" but I am eager to hear another (better? different?)
>>>> solution
>>>>> to this.
>>>>> 
>>>>> If we don't do a feature flag like this specifically with this change
>>>> then
>>>>> what happens is that someone upgrades their brokers with a rolling
>>>> restart
>>>>> in 0.8.3 and every single one of their producer requests start to fail
>>>> and
>>>>> they have a major production outage. eeeek!!!!
>>>>> 
>>>>> I do 100% agree that > 1 makes no sense and we *REALLY* need people to
>>>> start
>>>>> using 0,1,-1 but we need to-do that in a way that is going to work for
>>>>> everyone.
>>>>> 
>>>>> Old producers and consumers must keep working with new brokers and if
>>>>> we
>>>> are
>>>>> not going to support that then I am unclear what the use of "version"
>>>>> is
>>>>> based on our original intentions of having it because of the
>>>>> 0.7=>-0.8.
>>>> We
>>>>> said no more breaking changes when we did that.
>>>>> 
>>>>> - Joe Stein
>>>>> 
>>>>> On Thu, Jan 15, 2015 at 12:38 PM, Ewen Cheslack-Postava <
>>>> ewen@confluent.io>
>>>>> wrote:
>>>>>> 
>>>>>> Right, so this looks like it could create an issue similar to what's
>>>>>> currently being discussed in
>>>>>> https://issues.apache.org/jira/browse/KAFKA-1649 where users now get
>>>>>> errors
>>>>>> under conditions when they previously wouldn't. Old clients won't
>>>>>> even
>>>>>> know
>>>>>> about the error code, so besides failing they won't even be able to
>>>>>> log
>>>>>> any
>>>>>> meaningful error messages.
>>>>>> 
>>>>>> I think there are two options for compatibility:
>>>>>> 
>>>>>> 1. An alternative change is to remove the ack > 1 code, but silently
>>>>>> "upgrade" requests with acks > 1 to acks = -1. This isn't the same as
>>>>>> other
>>>>>> changes to behavior since the interaction between the client and
>>>>>> server
>>>>>> remains the same, no error codes change, etc. The client might just
>>>>>> see
>>>>>> some increased latency since the message might need to be replicated
>>>>>> to
>>>>>> more brokers than they requested.
>>>>>> 2. Split this into two patches, one that bumps the protocol version
>>>>>> on
>>>>>> that
>>>>>> message to include the new error code but maintains both old (now
>>>>>> deprecated) and new behavior, then a second that would be applied in
>>>>>> a
>>>>>> later release that removes the old protocol + code for handling acks
>>>> 1.
>>>>>> 
>>>>>> 2 is probably the right thing to do. If we specify the release when
>>>> we'll
>>>>>> remove the deprecated protocol at the time of deprecation it makes
>>>> things
>>>>>> a
>>>>>> lot easier for people writing non-java clients and could give users
>>>> better
>>>>>> predictability (e.g. if clients are at most 1 major release behind
>>>>>> brokers,
>>>>>> they'll remain compatible but possibly use deprecated features).
>>>>>> 
>>>>>> 
>>>>>> On Wed, Jan 14, 2015 at 3:51 PM, Gwen Shapira <gs...@cloudera.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi Kafka Devs,
>>>>>>> 
>>>>>>> We are working on KAFKA-1697 - remove code related to ack>1 on the
>>>>>>> broker. Per Neha's suggestion, I'd like to give everyone a heads up
>>>>>>> on
>>>>>>> what these changes mean.
>>>>>>> 
>>>>>>> Once this patch is included, any produce requests that include
>>>>>>> request.required.acks > 1 will result in an exception. This will be
>>>>>>> InvalidRequiredAcks in new versions (0.8.3 and up, I assume) and
>>>>>>> UnknownException in existing versions (sorry, but I can't add error
>>>>>>> codes retroactively).
>>>>>>> 
>>>>>>> This behavior is already enforced by 0.8.2 producers (sync and
>>>>>>> new),
>>>>>>> but we expect impact on users with older producers that relied on
>>>>>>> acks
>>>>>>>> 1 and external clients (i.e python, go, etc).
>>>>>>> 
>>>>>>> Users who relied on acks > 1 are expected to switch to using acks =
>>>>>>> -1
>>>>>>> and a min.isr parameter than matches their user case.
>>>>>>> 
>>>>>>> This change was discussed in the past in the context of KAFKA-1555
>>>>>>> (min.isr), but let us know if you have any questions or concerns
>>>>>>> regarding this change.
>>>>>>> 
>>>>>>> Gwen
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Thanks,
>>>>>> Ewen
>>>>> 
>>>>> 
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups
>>>>> "kafka-clients" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an
>>>>> email to kafka-clients+unsubscribe@googlegroups.com.
>>>>> To post to this group, send email to kafka-clients@googlegroups.com.
>>>>> Visit this group at http://groups.google.com/group/kafka-clients.
>>>>> To view this discussion on the web visit
>>>> 
>>>> https://groups.google.com/d/msgid/kafka-clients/CAA7ooCBtH2JjyQsArdx_%3DV25B4O1QJk0YvOu9U6kYt9sB4aqng%40mail.gmail.com
>>>> .
>>>>> 
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>> 
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups
>>>> "kafka-clients" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an
>>>> email to kafka-clients+unsubscribe@googlegroups.com.
>>>> To post to this group, send email to kafka-clients@googlegroups.com.
>>>> Visit this group at http://groups.google.com/group/kafka-clients.
>>>> To view this discussion on the web visit
>>>> 
>>>> https://groups.google.com/d/msgid/kafka-clients/CAHBV8WeUebxi%2B%2BSbjz8E9Yf4u4hkcPJ80Xsj0XTKcTac%3D%2B613A%40mail.gmail.com
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
>> 
>> 
>> 
>> 
>> --
>> Thanks,
>> Ewen

Re: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to ack>1 on the broker

Posted by Gwen Shapira <gs...@cloudera.com>.
Hi,

I got convinced by Joe and Dana that errors are indeed part of the
protocol and can't be randomly added.

So, it looks like we need to bump version of ProduceRequest in the
following way:
Version 0 -> accept acks >1. I think we should keep the existing
behavior too (i.e. not replace it with -1) to avoid surprising
clients, but I'm willing to hear other opinions.
Version 1 -> do not accept acks >1 and return an error.
Are we ok with the error I added in KAFKA-1697? We can use something
less specific like InvalidRequestParameter. This error can be reused
in the future and reduce the need to add errors, but will also be less
clear to the client and its users. Maybe even add the error message
string to the protocol in addition to the error code? (since we are
bumping versions....)

I think maintaining the old version throughout 0.8.X makes sense. IMO
dropping it for 0.9 is feasible, but I'll let client owners help make
that call.

Am I missing anything? Should I start a KIP? It seems like a KIP-type
discussion :)

Gwen


On Thu, Jan 15, 2015 at 2:31 PM, Ewen Cheslack-Postava
<ew...@confluent.io> wrote:
> Gwen,
>
> I think the only option that wouldn't require a protocol version change is
> the one where acks > 1 is converted to acks = -1 since it's the only one
> that doesn't potentially break older clients. The protocol guide says that
> the expected upgrade path is servers first, then clients, so old clients,
> including non-java clients, that may be using acks > 1 should be able to
> work with a new broker version.
>
> It's more work, but I think dealing with the protocol change is the right
> thing to do since it eventually gets us to the behavior I think is better --
> the broker should reject requests with invalid values. I think Joe and I
> were basically in agreement. In my mind the major piece missing from his
> description is how long we're going to maintain his "case 0" behavior. It's
> impractical to maintain old versions forever, but it sounds like there
> hasn't been a decision on how long to maintain them. Maybe that's another
> item to add to KIPs -- protocol versions and behavior need to be listed as
> deprecated and the earliest version in which they'll be removed should be
> specified so users can understand which versions are guaranteed to be
> compatible, even if they're using (well-written) non-java clients.
>
> -Ewen
>
>
> On Thu, Jan 15, 2015 at 12:52 PM, Dana Powers <da...@gmail.com> wrote:
>>
>> > clients don't break on unknown errors
>>
>> maybe true for the official java clients, but I dont think the assumption
>> holds true for community-maintained clients and users of those clients.
>>  kafka-python generally follows the fail-fast philosophy and raises an
>> exception on any unrecognized error code in any server response.  in this
>> case, kafka-python allows users to set their own required-acks policy when
>> creating a producer instance.  It is possible that users of kafka-python
>> have deployed producer code that uses ack>1 -- perhaps in production
>> environments -- and for those users the new error code will crash their
>> producer code.  I would not be surprised if the same were true of other
>> community clients.
>>
>> *one reason for the fail-fast approach is that there isn't great
>> documentation on what errors to expect for each request / response -- so
>> we
>> use failures to alert that some error case is not handled properly.  and
>> because of that, introducing new error cases without bumping the api
>> version is likely to cause those errors to get raised/thrown all the way
>> back up to the user.  of course we (client maintainers) can fix the issues
>> in the client libraries and suggest users upgrade, but it's not the ideal
>> situation.
>>
>>
>> long-winded way of saying: I agree w/ Joe.
>>
>> -Dana
>>
>>
>> On Thu, Jan 15, 2015 at 12:07 PM, Gwen Shapira <gs...@cloudera.com>
>> wrote:
>>
>> > Is the protocol bump caused by the behavior change or the new error
>> > code?
>> >
>> > 1) IMO, error_codes are data, and clients can expect to receive errors
>> > that they don't understand (i.e. unknown errors). AFAIK, clients don't
>> > break on unknown errors, they are simple more challenging to debug. If
>> > we document the new behavior, then its definitely debuggable and
>> > fixable.
>> >
>> > 2) The behavior change is basically a deprecation - i.e. acks > 1 were
>> > never documented, and are not supported by Kafka clients starting with
>> > version 0.8.2. I'm not sure this requires a protocol bump either,
>> > although its a better case than new error codes.
>> >
>> > Thanks,
>> > Gwen
>> >
>> > On Thu, Jan 15, 2015 at 10:10 AM, Joe Stein <jo...@stealth.ly>
>> > wrote:
>> > > Looping in the mailing list that the client developers live on because
>> > they
>> > > are all not on dev (though they should be if they want to be helping
>> > > to
>> > > build the best client libraries they can).
>> > >
>> > > I whole hardily believe that we need to not break existing
>> > > functionality
>> > of
>> > > the client protocol, ever.
>> > >
>> > > There are many reasons for this and we have other threads on the
>> > > mailing
>> > > list where we are discussing that topic (no pun intended) that I don't
>> > want
>> > > to re-hash here.
>> > >
>> > > If we change wire protocol functionality OR the binary format (either)
>> > > we
>> > > must bump version AND treat version as a feature flag with backward
>> > > compatibility support until it is deprecated for some time for folks
>> > > to
>> > deal
>> > > with it.
>> > >
>> > > match version = {
>> > > case 0: keepDoingWhatWeWereDoing()
>> > > case 1: doNewStuff()
>> > > case 2: doEvenMoreNewStuff()
>> > > }
>> > >
>> > > has to be a practice we adopt imho ... I know feature flags can be
>> > construed
>> > > as "messy code" but I am eager to hear another (better? different?)
>> > solution
>> > > to this.
>> > >
>> > > If we don't do a feature flag like this specifically with this change
>> > then
>> > > what happens is that someone upgrades their brokers with a rolling
>> > restart
>> > > in 0.8.3 and every single one of their producer requests start to fail
>> > and
>> > > they have a major production outage. eeeek!!!!
>> > >
>> > > I do 100% agree that > 1 makes no sense and we *REALLY* need people to
>> > start
>> > > using 0,1,-1 but we need to-do that in a way that is going to work for
>> > > everyone.
>> > >
>> > > Old producers and consumers must keep working with new brokers and if
>> > > we
>> > are
>> > > not going to support that then I am unclear what the use of "version"
>> > > is
>> > > based on our original intentions of having it because of the
>> > > 0.7=>-0.8.
>> > We
>> > > said no more breaking changes when we did that.
>> > >
>> > > - Joe Stein
>> > >
>> > > On Thu, Jan 15, 2015 at 12:38 PM, Ewen Cheslack-Postava <
>> > ewen@confluent.io>
>> > > wrote:
>> > >>
>> > >> Right, so this looks like it could create an issue similar to what's
>> > >> currently being discussed in
>> > >> https://issues.apache.org/jira/browse/KAFKA-1649 where users now get
>> > >> errors
>> > >> under conditions when they previously wouldn't. Old clients won't
>> > >> even
>> > >> know
>> > >> about the error code, so besides failing they won't even be able to
>> > >> log
>> > >> any
>> > >> meaningful error messages.
>> > >>
>> > >> I think there are two options for compatibility:
>> > >>
>> > >> 1. An alternative change is to remove the ack > 1 code, but silently
>> > >> "upgrade" requests with acks > 1 to acks = -1. This isn't the same as
>> > >> other
>> > >> changes to behavior since the interaction between the client and
>> > >> server
>> > >> remains the same, no error codes change, etc. The client might just
>> > >> see
>> > >> some increased latency since the message might need to be replicated
>> > >> to
>> > >> more brokers than they requested.
>> > >> 2. Split this into two patches, one that bumps the protocol version
>> > >> on
>> > >> that
>> > >> message to include the new error code but maintains both old (now
>> > >> deprecated) and new behavior, then a second that would be applied in
>> > >> a
>> > >> later release that removes the old protocol + code for handling acks
>> > >> >
>> > 1.
>> > >>
>> > >> 2 is probably the right thing to do. If we specify the release when
>> > we'll
>> > >> remove the deprecated protocol at the time of deprecation it makes
>> > things
>> > >> a
>> > >> lot easier for people writing non-java clients and could give users
>> > better
>> > >> predictability (e.g. if clients are at most 1 major release behind
>> > >> brokers,
>> > >> they'll remain compatible but possibly use deprecated features).
>> > >>
>> > >>
>> > >> On Wed, Jan 14, 2015 at 3:51 PM, Gwen Shapira <gs...@cloudera.com>
>> > >> wrote:
>> > >>
>> > >> > Hi Kafka Devs,
>> > >> >
>> > >> > We are working on KAFKA-1697 - remove code related to ack>1 on the
>> > >> > broker. Per Neha's suggestion, I'd like to give everyone a heads up
>> > >> > on
>> > >> > what these changes mean.
>> > >> >
>> > >> > Once this patch is included, any produce requests that include
>> > >> > request.required.acks > 1 will result in an exception. This will be
>> > >> > InvalidRequiredAcks in new versions (0.8.3 and up, I assume) and
>> > >> > UnknownException in existing versions (sorry, but I can't add error
>> > >> > codes retroactively).
>> > >> >
>> > >> > This behavior is already enforced by 0.8.2 producers (sync and
>> > >> > new),
>> > >> > but we expect impact on users with older producers that relied on
>> > >> > acks
>> > >> > > 1 and external clients (i.e python, go, etc).
>> > >> >
>> > >> > Users who relied on acks > 1 are expected to switch to using acks =
>> > >> > -1
>> > >> > and a min.isr parameter than matches their user case.
>> > >> >
>> > >> > This change was discussed in the past in the context of KAFKA-1555
>> > >> > (min.isr), but let us know if you have any questions or concerns
>> > >> > regarding this change.
>> > >> >
>> > >> > Gwen
>> > >> >
>> > >>
>> > >>
>> > >>
>> > >> --
>> > >> Thanks,
>> > >> Ewen
>> > >
>> > >
>> > > --
>> > > You received this message because you are subscribed to the Google
>> > > Groups
>> > > "kafka-clients" group.
>> > > To unsubscribe from this group and stop receiving emails from it, send
>> > > an
>> > > email to kafka-clients+unsubscribe@googlegroups.com.
>> > > To post to this group, send email to kafka-clients@googlegroups.com.
>> > > Visit this group at http://groups.google.com/group/kafka-clients.
>> > > To view this discussion on the web visit
>> > >
>> >
>> > https://groups.google.com/d/msgid/kafka-clients/CAA7ooCBtH2JjyQsArdx_%3DV25B4O1QJk0YvOu9U6kYt9sB4aqng%40mail.gmail.com
>> > .
>> > >
>> > > For more options, visit https://groups.google.com/d/optout.
>> >
>> > --
>> > You received this message because you are subscribed to the Google
>> > Groups
>> > "kafka-clients" group.
>> > To unsubscribe from this group and stop receiving emails from it, send
>> > an
>> > email to kafka-clients+unsubscribe@googlegroups.com.
>> > To post to this group, send email to kafka-clients@googlegroups.com.
>> > Visit this group at http://groups.google.com/group/kafka-clients.
>> > To view this discussion on the web visit
>> >
>> > https://groups.google.com/d/msgid/kafka-clients/CAHBV8WeUebxi%2B%2BSbjz8E9Yf4u4hkcPJ80Xsj0XTKcTac%3D%2B613A%40mail.gmail.com
>> > .
>> > For more options, visit https://groups.google.com/d/optout.
>> >
>
>
>
>
> --
> Thanks,
> Ewen

Re: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to ack>1 on the broker

Posted by Ewen Cheslack-Postava <ew...@confluent.io>.
Gwen,

I think the only option that wouldn't require a protocol version change is
the one where acks > 1 is converted to acks = -1 since it's the only one
that doesn't potentially break older clients. The protocol guide says that
the expected upgrade path is servers first, then clients, so old clients,
including non-java clients, that may be using acks > 1 should be able to
work with a new broker version.

It's more work, but I think dealing with the protocol change is the right
thing to do since it eventually gets us to the behavior I think is better
-- the broker should reject requests with invalid values. I think Joe and I
were basically in agreement. In my mind the major piece missing from his
description is how long we're going to maintain his "case 0" behavior. It's
impractical to maintain old versions forever, but it sounds like there
hasn't been a decision on how long to maintain them. Maybe that's another
item to add to KIPs -- protocol versions and behavior need to be listed as
deprecated and the earliest version in which they'll be removed should be
specified so users can understand which versions are guaranteed to be
compatible, even if they're using (well-written) non-java clients.

-Ewen


On Thu, Jan 15, 2015 at 12:52 PM, Dana Powers <da...@gmail.com> wrote:

> > clients don't break on unknown errors
>
> maybe true for the official java clients, but I dont think the assumption
> holds true for community-maintained clients and users of those clients.
>  kafka-python generally follows the fail-fast philosophy and raises an
> exception on any unrecognized error code in any server response.  in this
> case, kafka-python allows users to set their own required-acks policy when
> creating a producer instance.  It is possible that users of kafka-python
> have deployed producer code that uses ack>1 -- perhaps in production
> environments -- and for those users the new error code will crash their
> producer code.  I would not be surprised if the same were true of other
> community clients.
>
> *one reason for the fail-fast approach is that there isn't great
> documentation on what errors to expect for each request / response -- so we
> use failures to alert that some error case is not handled properly.  and
> because of that, introducing new error cases without bumping the api
> version is likely to cause those errors to get raised/thrown all the way
> back up to the user.  of course we (client maintainers) can fix the issues
> in the client libraries and suggest users upgrade, but it's not the ideal
> situation.
>
>
> long-winded way of saying: I agree w/ Joe.
>
> -Dana
>
>
> On Thu, Jan 15, 2015 at 12:07 PM, Gwen Shapira <gs...@cloudera.com>
> wrote:
>
> > Is the protocol bump caused by the behavior change or the new error code?
> >
> > 1) IMO, error_codes are data, and clients can expect to receive errors
> > that they don't understand (i.e. unknown errors). AFAIK, clients don't
> > break on unknown errors, they are simple more challenging to debug. If
> > we document the new behavior, then its definitely debuggable and
> > fixable.
> >
> > 2) The behavior change is basically a deprecation - i.e. acks > 1 were
> > never documented, and are not supported by Kafka clients starting with
> > version 0.8.2. I'm not sure this requires a protocol bump either,
> > although its a better case than new error codes.
> >
> > Thanks,
> > Gwen
> >
> > On Thu, Jan 15, 2015 at 10:10 AM, Joe Stein <jo...@stealth.ly>
> wrote:
> > > Looping in the mailing list that the client developers live on because
> > they
> > > are all not on dev (though they should be if they want to be helping to
> > > build the best client libraries they can).
> > >
> > > I whole hardily believe that we need to not break existing
> functionality
> > of
> > > the client protocol, ever.
> > >
> > > There are many reasons for this and we have other threads on the
> mailing
> > > list where we are discussing that topic (no pun intended) that I don't
> > want
> > > to re-hash here.
> > >
> > > If we change wire protocol functionality OR the binary format (either)
> we
> > > must bump version AND treat version as a feature flag with backward
> > > compatibility support until it is deprecated for some time for folks to
> > deal
> > > with it.
> > >
> > > match version = {
> > > case 0: keepDoingWhatWeWereDoing()
> > > case 1: doNewStuff()
> > > case 2: doEvenMoreNewStuff()
> > > }
> > >
> > > has to be a practice we adopt imho ... I know feature flags can be
> > construed
> > > as "messy code" but I am eager to hear another (better? different?)
> > solution
> > > to this.
> > >
> > > If we don't do a feature flag like this specifically with this change
> > then
> > > what happens is that someone upgrades their brokers with a rolling
> > restart
> > > in 0.8.3 and every single one of their producer requests start to fail
> > and
> > > they have a major production outage. eeeek!!!!
> > >
> > > I do 100% agree that > 1 makes no sense and we *REALLY* need people to
> > start
> > > using 0,1,-1 but we need to-do that in a way that is going to work for
> > > everyone.
> > >
> > > Old producers and consumers must keep working with new brokers and if
> we
> > are
> > > not going to support that then I am unclear what the use of "version"
> is
> > > based on our original intentions of having it because of the 0.7=>-0.8.
> > We
> > > said no more breaking changes when we did that.
> > >
> > > - Joe Stein
> > >
> > > On Thu, Jan 15, 2015 at 12:38 PM, Ewen Cheslack-Postava <
> > ewen@confluent.io>
> > > wrote:
> > >>
> > >> Right, so this looks like it could create an issue similar to what's
> > >> currently being discussed in
> > >> https://issues.apache.org/jira/browse/KAFKA-1649 where users now get
> > >> errors
> > >> under conditions when they previously wouldn't. Old clients won't even
> > >> know
> > >> about the error code, so besides failing they won't even be able to
> log
> > >> any
> > >> meaningful error messages.
> > >>
> > >> I think there are two options for compatibility:
> > >>
> > >> 1. An alternative change is to remove the ack > 1 code, but silently
> > >> "upgrade" requests with acks > 1 to acks = -1. This isn't the same as
> > >> other
> > >> changes to behavior since the interaction between the client and
> server
> > >> remains the same, no error codes change, etc. The client might just
> see
> > >> some increased latency since the message might need to be replicated
> to
> > >> more brokers than they requested.
> > >> 2. Split this into two patches, one that bumps the protocol version on
> > >> that
> > >> message to include the new error code but maintains both old (now
> > >> deprecated) and new behavior, then a second that would be applied in a
> > >> later release that removes the old protocol + code for handling acks >
> > 1.
> > >>
> > >> 2 is probably the right thing to do. If we specify the release when
> > we'll
> > >> remove the deprecated protocol at the time of deprecation it makes
> > things
> > >> a
> > >> lot easier for people writing non-java clients and could give users
> > better
> > >> predictability (e.g. if clients are at most 1 major release behind
> > >> brokers,
> > >> they'll remain compatible but possibly use deprecated features).
> > >>
> > >>
> > >> On Wed, Jan 14, 2015 at 3:51 PM, Gwen Shapira <gs...@cloudera.com>
> > >> wrote:
> > >>
> > >> > Hi Kafka Devs,
> > >> >
> > >> > We are working on KAFKA-1697 - remove code related to ack>1 on the
> > >> > broker. Per Neha's suggestion, I'd like to give everyone a heads up
> on
> > >> > what these changes mean.
> > >> >
> > >> > Once this patch is included, any produce requests that include
> > >> > request.required.acks > 1 will result in an exception. This will be
> > >> > InvalidRequiredAcks in new versions (0.8.3 and up, I assume) and
> > >> > UnknownException in existing versions (sorry, but I can't add error
> > >> > codes retroactively).
> > >> >
> > >> > This behavior is already enforced by 0.8.2 producers (sync and new),
> > >> > but we expect impact on users with older producers that relied on
> acks
> > >> > > 1 and external clients (i.e python, go, etc).
> > >> >
> > >> > Users who relied on acks > 1 are expected to switch to using acks =
> -1
> > >> > and a min.isr parameter than matches their user case.
> > >> >
> > >> > This change was discussed in the past in the context of KAFKA-1555
> > >> > (min.isr), but let us know if you have any questions or concerns
> > >> > regarding this change.
> > >> >
> > >> > Gwen
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> Thanks,
> > >> Ewen
> > >
> > >
> > > --
> > > You received this message because you are subscribed to the Google
> Groups
> > > "kafka-clients" group.
> > > To unsubscribe from this group and stop receiving emails from it, send
> an
> > > email to kafka-clients+unsubscribe@googlegroups.com.
> > > To post to this group, send email to kafka-clients@googlegroups.com.
> > > Visit this group at http://groups.google.com/group/kafka-clients.
> > > To view this discussion on the web visit
> > >
> >
> https://groups.google.com/d/msgid/kafka-clients/CAA7ooCBtH2JjyQsArdx_%3DV25B4O1QJk0YvOu9U6kYt9sB4aqng%40mail.gmail.com
> > .
> > >
> > > For more options, visit https://groups.google.com/d/optout.
> >
> > --
> > You received this message because you are subscribed to the Google Groups
> > "kafka-clients" group.
> > To unsubscribe from this group and stop receiving emails from it, send an
> > email to kafka-clients+unsubscribe@googlegroups.com.
> > To post to this group, send email to kafka-clients@googlegroups.com.
> > Visit this group at http://groups.google.com/group/kafka-clients.
> > To view this discussion on the web visit
> >
> https://groups.google.com/d/msgid/kafka-clients/CAHBV8WeUebxi%2B%2BSbjz8E9Yf4u4hkcPJ80Xsj0XTKcTac%3D%2B613A%40mail.gmail.com
> > .
> > For more options, visit https://groups.google.com/d/optout.
> >
>



-- 
Thanks,
Ewen

Re: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to ack>1 on the broker

Posted by Dana Powers <da...@gmail.com>.
> clients don't break on unknown errors

maybe true for the official java clients, but I dont think the assumption
holds true for community-maintained clients and users of those clients.
 kafka-python generally follows the fail-fast philosophy and raises an
exception on any unrecognized error code in any server response.  in this
case, kafka-python allows users to set their own required-acks policy when
creating a producer instance.  It is possible that users of kafka-python
have deployed producer code that uses ack>1 -- perhaps in production
environments -- and for those users the new error code will crash their
producer code.  I would not be surprised if the same were true of other
community clients.

*one reason for the fail-fast approach is that there isn't great
documentation on what errors to expect for each request / response -- so we
use failures to alert that some error case is not handled properly.  and
because of that, introducing new error cases without bumping the api
version is likely to cause those errors to get raised/thrown all the way
back up to the user.  of course we (client maintainers) can fix the issues
in the client libraries and suggest users upgrade, but it's not the ideal
situation.


long-winded way of saying: I agree w/ Joe.

-Dana


On Thu, Jan 15, 2015 at 12:07 PM, Gwen Shapira <gs...@cloudera.com>
wrote:

> Is the protocol bump caused by the behavior change or the new error code?
>
> 1) IMO, error_codes are data, and clients can expect to receive errors
> that they don't understand (i.e. unknown errors). AFAIK, clients don't
> break on unknown errors, they are simple more challenging to debug. If
> we document the new behavior, then its definitely debuggable and
> fixable.
>
> 2) The behavior change is basically a deprecation - i.e. acks > 1 were
> never documented, and are not supported by Kafka clients starting with
> version 0.8.2. I'm not sure this requires a protocol bump either,
> although its a better case than new error codes.
>
> Thanks,
> Gwen
>
> On Thu, Jan 15, 2015 at 10:10 AM, Joe Stein <jo...@stealth.ly> wrote:
> > Looping in the mailing list that the client developers live on because
> they
> > are all not on dev (though they should be if they want to be helping to
> > build the best client libraries they can).
> >
> > I whole hardily believe that we need to not break existing functionality
> of
> > the client protocol, ever.
> >
> > There are many reasons for this and we have other threads on the mailing
> > list where we are discussing that topic (no pun intended) that I don't
> want
> > to re-hash here.
> >
> > If we change wire protocol functionality OR the binary format (either) we
> > must bump version AND treat version as a feature flag with backward
> > compatibility support until it is deprecated for some time for folks to
> deal
> > with it.
> >
> > match version = {
> > case 0: keepDoingWhatWeWereDoing()
> > case 1: doNewStuff()
> > case 2: doEvenMoreNewStuff()
> > }
> >
> > has to be a practice we adopt imho ... I know feature flags can be
> construed
> > as "messy code" but I am eager to hear another (better? different?)
> solution
> > to this.
> >
> > If we don't do a feature flag like this specifically with this change
> then
> > what happens is that someone upgrades their brokers with a rolling
> restart
> > in 0.8.3 and every single one of their producer requests start to fail
> and
> > they have a major production outage. eeeek!!!!
> >
> > I do 100% agree that > 1 makes no sense and we *REALLY* need people to
> start
> > using 0,1,-1 but we need to-do that in a way that is going to work for
> > everyone.
> >
> > Old producers and consumers must keep working with new brokers and if we
> are
> > not going to support that then I am unclear what the use of "version" is
> > based on our original intentions of having it because of the 0.7=>-0.8.
> We
> > said no more breaking changes when we did that.
> >
> > - Joe Stein
> >
> > On Thu, Jan 15, 2015 at 12:38 PM, Ewen Cheslack-Postava <
> ewen@confluent.io>
> > wrote:
> >>
> >> Right, so this looks like it could create an issue similar to what's
> >> currently being discussed in
> >> https://issues.apache.org/jira/browse/KAFKA-1649 where users now get
> >> errors
> >> under conditions when they previously wouldn't. Old clients won't even
> >> know
> >> about the error code, so besides failing they won't even be able to log
> >> any
> >> meaningful error messages.
> >>
> >> I think there are two options for compatibility:
> >>
> >> 1. An alternative change is to remove the ack > 1 code, but silently
> >> "upgrade" requests with acks > 1 to acks = -1. This isn't the same as
> >> other
> >> changes to behavior since the interaction between the client and server
> >> remains the same, no error codes change, etc. The client might just see
> >> some increased latency since the message might need to be replicated to
> >> more brokers than they requested.
> >> 2. Split this into two patches, one that bumps the protocol version on
> >> that
> >> message to include the new error code but maintains both old (now
> >> deprecated) and new behavior, then a second that would be applied in a
> >> later release that removes the old protocol + code for handling acks >
> 1.
> >>
> >> 2 is probably the right thing to do. If we specify the release when
> we'll
> >> remove the deprecated protocol at the time of deprecation it makes
> things
> >> a
> >> lot easier for people writing non-java clients and could give users
> better
> >> predictability (e.g. if clients are at most 1 major release behind
> >> brokers,
> >> they'll remain compatible but possibly use deprecated features).
> >>
> >>
> >> On Wed, Jan 14, 2015 at 3:51 PM, Gwen Shapira <gs...@cloudera.com>
> >> wrote:
> >>
> >> > Hi Kafka Devs,
> >> >
> >> > We are working on KAFKA-1697 - remove code related to ack>1 on the
> >> > broker. Per Neha's suggestion, I'd like to give everyone a heads up on
> >> > what these changes mean.
> >> >
> >> > Once this patch is included, any produce requests that include
> >> > request.required.acks > 1 will result in an exception. This will be
> >> > InvalidRequiredAcks in new versions (0.8.3 and up, I assume) and
> >> > UnknownException in existing versions (sorry, but I can't add error
> >> > codes retroactively).
> >> >
> >> > This behavior is already enforced by 0.8.2 producers (sync and new),
> >> > but we expect impact on users with older producers that relied on acks
> >> > > 1 and external clients (i.e python, go, etc).
> >> >
> >> > Users who relied on acks > 1 are expected to switch to using acks = -1
> >> > and a min.isr parameter than matches their user case.
> >> >
> >> > This change was discussed in the past in the context of KAFKA-1555
> >> > (min.isr), but let us know if you have any questions or concerns
> >> > regarding this change.
> >> >
> >> > Gwen
> >> >
> >>
> >>
> >>
> >> --
> >> Thanks,
> >> Ewen
> >
> >
> > --
> > You received this message because you are subscribed to the Google Groups
> > "kafka-clients" group.
> > To unsubscribe from this group and stop receiving emails from it, send an
> > email to kafka-clients+unsubscribe@googlegroups.com.
> > To post to this group, send email to kafka-clients@googlegroups.com.
> > Visit this group at http://groups.google.com/group/kafka-clients.
> > To view this discussion on the web visit
> >
> https://groups.google.com/d/msgid/kafka-clients/CAA7ooCBtH2JjyQsArdx_%3DV25B4O1QJk0YvOu9U6kYt9sB4aqng%40mail.gmail.com
> .
> >
> > For more options, visit https://groups.google.com/d/optout.
>
> --
> You received this message because you are subscribed to the Google Groups
> "kafka-clients" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to kafka-clients+unsubscribe@googlegroups.com.
> To post to this group, send email to kafka-clients@googlegroups.com.
> Visit this group at http://groups.google.com/group/kafka-clients.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/kafka-clients/CAHBV8WeUebxi%2B%2BSbjz8E9Yf4u4hkcPJ80Xsj0XTKcTac%3D%2B613A%40mail.gmail.com
> .
> For more options, visit https://groups.google.com/d/optout.
>

Re: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to ack>1 on the broker

Posted by Gwen Shapira <gs...@cloudera.com>.
Is the protocol bump caused by the behavior change or the new error code?

1) IMO, error_codes are data, and clients can expect to receive errors
that they don't understand (i.e. unknown errors). AFAIK, clients don't
break on unknown errors, they are simple more challenging to debug. If
we document the new behavior, then its definitely debuggable and
fixable.

2) The behavior change is basically a deprecation - i.e. acks > 1 were
never documented, and are not supported by Kafka clients starting with
version 0.8.2. I'm not sure this requires a protocol bump either,
although its a better case than new error codes.

Thanks,
Gwen

On Thu, Jan 15, 2015 at 10:10 AM, Joe Stein <jo...@stealth.ly> wrote:
> Looping in the mailing list that the client developers live on because they
> are all not on dev (though they should be if they want to be helping to
> build the best client libraries they can).
>
> I whole hardily believe that we need to not break existing functionality of
> the client protocol, ever.
>
> There are many reasons for this and we have other threads on the mailing
> list where we are discussing that topic (no pun intended) that I don't want
> to re-hash here.
>
> If we change wire protocol functionality OR the binary format (either) we
> must bump version AND treat version as a feature flag with backward
> compatibility support until it is deprecated for some time for folks to deal
> with it.
>
> match version = {
> case 0: keepDoingWhatWeWereDoing()
> case 1: doNewStuff()
> case 2: doEvenMoreNewStuff()
> }
>
> has to be a practice we adopt imho ... I know feature flags can be construed
> as "messy code" but I am eager to hear another (better? different?) solution
> to this.
>
> If we don't do a feature flag like this specifically with this change then
> what happens is that someone upgrades their brokers with a rolling restart
> in 0.8.3 and every single one of their producer requests start to fail and
> they have a major production outage. eeeek!!!!
>
> I do 100% agree that > 1 makes no sense and we *REALLY* need people to start
> using 0,1,-1 but we need to-do that in a way that is going to work for
> everyone.
>
> Old producers and consumers must keep working with new brokers and if we are
> not going to support that then I am unclear what the use of "version" is
> based on our original intentions of having it because of the 0.7=>-0.8. We
> said no more breaking changes when we did that.
>
> - Joe Stein
>
> On Thu, Jan 15, 2015 at 12:38 PM, Ewen Cheslack-Postava <ew...@confluent.io>
> wrote:
>>
>> Right, so this looks like it could create an issue similar to what's
>> currently being discussed in
>> https://issues.apache.org/jira/browse/KAFKA-1649 where users now get
>> errors
>> under conditions when they previously wouldn't. Old clients won't even
>> know
>> about the error code, so besides failing they won't even be able to log
>> any
>> meaningful error messages.
>>
>> I think there are two options for compatibility:
>>
>> 1. An alternative change is to remove the ack > 1 code, but silently
>> "upgrade" requests with acks > 1 to acks = -1. This isn't the same as
>> other
>> changes to behavior since the interaction between the client and server
>> remains the same, no error codes change, etc. The client might just see
>> some increased latency since the message might need to be replicated to
>> more brokers than they requested.
>> 2. Split this into two patches, one that bumps the protocol version on
>> that
>> message to include the new error code but maintains both old (now
>> deprecated) and new behavior, then a second that would be applied in a
>> later release that removes the old protocol + code for handling acks > 1.
>>
>> 2 is probably the right thing to do. If we specify the release when we'll
>> remove the deprecated protocol at the time of deprecation it makes things
>> a
>> lot easier for people writing non-java clients and could give users better
>> predictability (e.g. if clients are at most 1 major release behind
>> brokers,
>> they'll remain compatible but possibly use deprecated features).
>>
>>
>> On Wed, Jan 14, 2015 at 3:51 PM, Gwen Shapira <gs...@cloudera.com>
>> wrote:
>>
>> > Hi Kafka Devs,
>> >
>> > We are working on KAFKA-1697 - remove code related to ack>1 on the
>> > broker. Per Neha's suggestion, I'd like to give everyone a heads up on
>> > what these changes mean.
>> >
>> > Once this patch is included, any produce requests that include
>> > request.required.acks > 1 will result in an exception. This will be
>> > InvalidRequiredAcks in new versions (0.8.3 and up, I assume) and
>> > UnknownException in existing versions (sorry, but I can't add error
>> > codes retroactively).
>> >
>> > This behavior is already enforced by 0.8.2 producers (sync and new),
>> > but we expect impact on users with older producers that relied on acks
>> > > 1 and external clients (i.e python, go, etc).
>> >
>> > Users who relied on acks > 1 are expected to switch to using acks = -1
>> > and a min.isr parameter than matches their user case.
>> >
>> > This change was discussed in the past in the context of KAFKA-1555
>> > (min.isr), but let us know if you have any questions or concerns
>> > regarding this change.
>> >
>> > Gwen
>> >
>>
>>
>>
>> --
>> Thanks,
>> Ewen
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "kafka-clients" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to kafka-clients+unsubscribe@googlegroups.com.
> To post to this group, send email to kafka-clients@googlegroups.com.
> Visit this group at http://groups.google.com/group/kafka-clients.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/kafka-clients/CAA7ooCBtH2JjyQsArdx_%3DV25B4O1QJk0YvOu9U6kYt9sB4aqng%40mail.gmail.com.
>
> For more options, visit https://groups.google.com/d/optout.

Re: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to ack>1 on the broker

Posted by Mark Roberts <wi...@gmail.com>.
This would sting a whole lot less if there was a programmatic way to get what server version is in use. Also, how will this work in mixed version clusters (during an upgrade, for example)?


> On Jan 15, 2015, at 10:10, Joe Stein <jo...@stealth.ly> wrote:
> 
> Looping in the mailing list that the client developers live on because they are all not on dev (though they should be if they want to be helping to build the best client libraries they can).
> 
> I whole hardily believe that we need to not break existing functionality of the client protocol, ever.
> 
> There are many reasons for this and we have other threads on the mailing list where we are discussing that topic (no pun intended) that I don't want to re-hash here.
> 
> If we change wire protocol functionality OR the binary format (either) we must bump version AND treat version as a feature flag with backward compatibility support until it is deprecated for some time for folks to deal with it.
> 
> match version = {
> case 0: keepDoingWhatWeWereDoing()
> case 1: doNewStuff()
> case 2: doEvenMoreNewStuff()
> }
> 
> has to be a practice we adopt imho ... I know feature flags can be construed as "messy code" but I am eager to hear another (better? different?) solution to this.
> 
> If we don't do a feature flag like this specifically with this change then what happens is that someone upgrades their brokers with a rolling restart in 0.8.3 and every single one of their producer requests start to fail and they have a major production outage. eeeek!!!!
> 
> I do 100% agree that > 1 makes no sense and we *REALLY* need people to start using 0,1,-1 but we need to-do that in a way that is going to work for everyone.
> 
> Old producers and consumers must keep working with new brokers and if we are not going to support that then I am unclear what the use of "version" is based on our original intentions of having it because of the 0.7=>-0.8. We said no more breaking changes when we did that.
> 
> - Joe Stein
> 
>> On Thu, Jan 15, 2015 at 12:38 PM, Ewen Cheslack-Postava <ew...@confluent.io> wrote:
>> Right, so this looks like it could create an issue similar to what's
>> currently being discussed in
>> https://issues.apache.org/jira/browse/KAFKA-1649 where users now get errors
>> under conditions when they previously wouldn't. Old clients won't even know
>> about the error code, so besides failing they won't even be able to log any
>> meaningful error messages.
>> 
>> I think there are two options for compatibility:
>> 
>> 1. An alternative change is to remove the ack > 1 code, but silently
>> "upgrade" requests with acks > 1 to acks = -1. This isn't the same as other
>> changes to behavior since the interaction between the client and server
>> remains the same, no error codes change, etc. The client might just see
>> some increased latency since the message might need to be replicated to
>> more brokers than they requested.
>> 2. Split this into two patches, one that bumps the protocol version on that
>> message to include the new error code but maintains both old (now
>> deprecated) and new behavior, then a second that would be applied in a
>> later release that removes the old protocol + code for handling acks > 1.
>> 
>> 2 is probably the right thing to do. If we specify the release when we'll
>> remove the deprecated protocol at the time of deprecation it makes things a
>> lot easier for people writing non-java clients and could give users better
>> predictability (e.g. if clients are at most 1 major release behind brokers,
>> they'll remain compatible but possibly use deprecated features).
>> 
>> 
>> On Wed, Jan 14, 2015 at 3:51 PM, Gwen Shapira <gs...@cloudera.com> wrote:
>> 
>> > Hi Kafka Devs,
>> >
>> > We are working on KAFKA-1697 - remove code related to ack>1 on the
>> > broker. Per Neha's suggestion, I'd like to give everyone a heads up on
>> > what these changes mean.
>> >
>> > Once this patch is included, any produce requests that include
>> > request.required.acks > 1 will result in an exception. This will be
>> > InvalidRequiredAcks in new versions (0.8.3 and up, I assume) and
>> > UnknownException in existing versions (sorry, but I can't add error
>> > codes retroactively).
>> >
>> > This behavior is already enforced by 0.8.2 producers (sync and new),
>> > but we expect impact on users with older producers that relied on acks
>> > > 1 and external clients (i.e python, go, etc).
>> >
>> > Users who relied on acks > 1 are expected to switch to using acks = -1
>> > and a min.isr parameter than matches their user case.
>> >
>> > This change was discussed in the past in the context of KAFKA-1555
>> > (min.isr), but let us know if you have any questions or concerns
>> > regarding this change.
>> >
>> > Gwen
>> >
>> 
>> 
>> 
>> --
>> Thanks,
>> Ewen
> 
> -- 
> You received this message because you are subscribed to the Google Groups "kafka-clients" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to kafka-clients+unsubscribe@googlegroups.com.
> To post to this group, send email to kafka-clients@googlegroups.com.
> Visit this group at http://groups.google.com/group/kafka-clients.
> To view this discussion on the web visit https://groups.google.com/d/msgid/kafka-clients/CAA7ooCBtH2JjyQsArdx_%3DV25B4O1QJk0YvOu9U6kYt9sB4aqng%40mail.gmail.com.
> For more options, visit https://groups.google.com/d/optout.

Re: Heads up: KAFKA-1697 - remove code related to ack>1 on the broker

Posted by Joe Stein <jo...@stealth.ly>.
Looping in the mailing list that the client developers live on because they
are all not on dev (though they should be if they want to be helping to
build the best client libraries they can).

I whole hardily believe that we need to not break existing functionality of
the client protocol, ever.

There are many reasons for this and we have other threads on the mailing
list where we are discussing that topic (no pun intended) that I don't want
to re-hash here.

If we change wire protocol functionality *OR* the binary format (either) we
must bump version *AND* treat version as a feature flag with backward
compatibility support until it is deprecated for some time for folks to
deal with it.

match version = {
case 0: keepDoingWhatWeWereDoing()
case 1: doNewStuff()
case 2: doEvenMoreNewStuff()
}

has to be a practice we adopt imho ... I know feature flags can be
construed as "messy code" but I am eager to hear another (better?
different?) solution to this.

If we don't do a feature flag like this specifically with this change then
what happens is that someone upgrades their brokers with a rolling restart
in 0.8.3 and every single one of their producer requests start to fail and
they have a major production outage. eeeek!!!!

I do 100% agree that > 1 makes no sense and we *REALLY* need people to
start using 0,1,-1 but we need to-do that in a way that is going to work
for everyone.

Old producers and consumers must keep working with new brokers and if we
are not going to support that then I am unclear what the use of "version"
is based on our original intentions of having it because of the 0.7=>-0.8.
We said no more breaking changes when we did that.

- Joe Stein

On Thu, Jan 15, 2015 at 12:38 PM, Ewen Cheslack-Postava <ew...@confluent.io>
wrote:

> Right, so this looks like it could create an issue similar to what's
> currently being discussed in
> https://issues.apache.org/jira/browse/KAFKA-1649 where users now get
> errors
> under conditions when they previously wouldn't. Old clients won't even know
> about the error code, so besides failing they won't even be able to log any
> meaningful error messages.
>
> I think there are two options for compatibility:
>
> 1. An alternative change is to remove the ack > 1 code, but silently
> "upgrade" requests with acks > 1 to acks = -1. This isn't the same as other
> changes to behavior since the interaction between the client and server
> remains the same, no error codes change, etc. The client might just see
> some increased latency since the message might need to be replicated to
> more brokers than they requested.
> 2. Split this into two patches, one that bumps the protocol version on that
> message to include the new error code but maintains both old (now
> deprecated) and new behavior, then a second that would be applied in a
> later release that removes the old protocol + code for handling acks > 1.
>
> 2 is probably the right thing to do. If we specify the release when we'll
> remove the deprecated protocol at the time of deprecation it makes things a
> lot easier for people writing non-java clients and could give users better
> predictability (e.g. if clients are at most 1 major release behind brokers,
> they'll remain compatible but possibly use deprecated features).
>
>
> On Wed, Jan 14, 2015 at 3:51 PM, Gwen Shapira <gs...@cloudera.com>
> wrote:
>
> > Hi Kafka Devs,
> >
> > We are working on KAFKA-1697 - remove code related to ack>1 on the
> > broker. Per Neha's suggestion, I'd like to give everyone a heads up on
> > what these changes mean.
> >
> > Once this patch is included, any produce requests that include
> > request.required.acks > 1 will result in an exception. This will be
> > InvalidRequiredAcks in new versions (0.8.3 and up, I assume) and
> > UnknownException in existing versions (sorry, but I can't add error
> > codes retroactively).
> >
> > This behavior is already enforced by 0.8.2 producers (sync and new),
> > but we expect impact on users with older producers that relied on acks
> > > 1 and external clients (i.e python, go, etc).
> >
> > Users who relied on acks > 1 are expected to switch to using acks = -1
> > and a min.isr parameter than matches their user case.
> >
> > This change was discussed in the past in the context of KAFKA-1555
> > (min.isr), but let us know if you have any questions or concerns
> > regarding this change.
> >
> > Gwen
> >
>
>
>
> --
> Thanks,
> Ewen
>

Re: Heads up: KAFKA-1697 - remove code related to ack>1 on the broker

Posted by Ewen Cheslack-Postava <ew...@confluent.io>.
Right, so this looks like it could create an issue similar to what's
currently being discussed in
https://issues.apache.org/jira/browse/KAFKA-1649 where users now get errors
under conditions when they previously wouldn't. Old clients won't even know
about the error code, so besides failing they won't even be able to log any
meaningful error messages.

I think there are two options for compatibility:

1. An alternative change is to remove the ack > 1 code, but silently
"upgrade" requests with acks > 1 to acks = -1. This isn't the same as other
changes to behavior since the interaction between the client and server
remains the same, no error codes change, etc. The client might just see
some increased latency since the message might need to be replicated to
more brokers than they requested.
2. Split this into two patches, one that bumps the protocol version on that
message to include the new error code but maintains both old (now
deprecated) and new behavior, then a second that would be applied in a
later release that removes the old protocol + code for handling acks > 1.

2 is probably the right thing to do. If we specify the release when we'll
remove the deprecated protocol at the time of deprecation it makes things a
lot easier for people writing non-java clients and could give users better
predictability (e.g. if clients are at most 1 major release behind brokers,
they'll remain compatible but possibly use deprecated features).


On Wed, Jan 14, 2015 at 3:51 PM, Gwen Shapira <gs...@cloudera.com> wrote:

> Hi Kafka Devs,
>
> We are working on KAFKA-1697 - remove code related to ack>1 on the
> broker. Per Neha's suggestion, I'd like to give everyone a heads up on
> what these changes mean.
>
> Once this patch is included, any produce requests that include
> request.required.acks > 1 will result in an exception. This will be
> InvalidRequiredAcks in new versions (0.8.3 and up, I assume) and
> UnknownException in existing versions (sorry, but I can't add error
> codes retroactively).
>
> This behavior is already enforced by 0.8.2 producers (sync and new),
> but we expect impact on users with older producers that relied on acks
> > 1 and external clients (i.e python, go, etc).
>
> Users who relied on acks > 1 are expected to switch to using acks = -1
> and a min.isr parameter than matches their user case.
>
> This change was discussed in the past in the context of KAFKA-1555
> (min.isr), but let us know if you have any questions or concerns
> regarding this change.
>
> Gwen
>



-- 
Thanks,
Ewen