Posted to dev@kafka.apache.org by Rajini Sivaram <ra...@gmail.com> on 2020/07/08 09:28:13 UTC

Re: [VOTE] KIP-392: Allow consumers to fetch from the closest replica

Hi all,

We would like to make a change related to this KIP to ensure that transient
exceptions during reassignments are handled as retriable exceptions. Kafka
brokers currently return REPLICA_NOT_AVAILABLE for Fetch requests if the
replica is not available on the broker. But ReplicaNotAvailableException is
not a retriable InvalidMetadataException. Java consumers handle this
correctly by explicitly matching error codes, but some non-Java clients
treat this as a fatal error. Brokers return NOT_LEADER_FOR_PARTITION for
Produce requests in this scenario, but that error is not currently suitable
for Fetch, since a fetch may be served by either the leader or a follower.
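[Editorial aside: a minimal sketch, not the actual consumer code, of the explicit error-code matching described above. The enum values mirror real Kafka error names, but the handler and its `Action` type are hypothetical, illustrating why a client that omits the REPLICA_NOT_AVAILABLE case treats a transient condition as fatal.]

```java
// Sketch of a client's fetch-error dispatch. Matching error codes
// explicitly lets REPLICA_NOT_AVAILABLE trigger a metadata refresh
// and retry instead of a fatal failure.
public class FetchErrorHandling {

    enum Errors { NONE, NOT_LEADER_FOR_PARTITION, REPLICA_NOT_AVAILABLE, UNKNOWN_SERVER_ERROR }

    enum Action { CONTINUE, REFRESH_METADATA_AND_RETRY, FAIL }

    static Action handle(Errors error) {
        switch (error) {
            case NONE:
                return Action.CONTINUE;
            // Both errors mean the client's metadata is stale; neither is fatal.
            case NOT_LEADER_FOR_PARTITION:
            case REPLICA_NOT_AVAILABLE:
                return Action.REFRESH_METADATA_AND_RETRY;
            default:
                // A client missing the REPLICA_NOT_AVAILABLE case above
                // falls through to here and fails unnecessarily.
                return Action.FAIL;
        }
    }

    public static void main(String[] args) {
        System.out.println(handle(Errors.REPLICA_NOT_AVAILABLE)); // prints REFRESH_METADATA_AND_RETRY
    }
}
```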

Following on from the discussion in the PR
https://github.com/apache/kafka/pull/8979 for fixing
https://issues.apache.org/jira/browse/KAFKA-10223, we would like to rename
NOT_LEADER_FOR_PARTITION to NOT_LEADER_OR_FOLLOWER so that it can be used
for fetch-from-follower as well. Both Produce and Fetch requests will
return this error if the replica is not available. This makes it
compatible with old and new clients and is a non-breaking change. A new
exception
class NotLeaderOrFollowerException will be added. This will be a subclass
of NotLeaderForPartitionException. We will deprecate
NotLeaderForPartitionException and ReplicaNotAvailableException.
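[Editorial aside: a hedged sketch of why subclassing keeps the change non-breaking. The two proposed class names come from the proposal above; the base classes here are simplified stand-ins for Kafka's actual RetriableException/InvalidMetadataException hierarchy, not its real definitions.]

```java
// New NotLeaderOrFollowerException extends the deprecated
// NotLeaderForPartitionException, so existing catch blocks keep working
// and the error stays retriable through InvalidMetadataException.
public class ExceptionHierarchy {

    static class RetriableException extends RuntimeException { }

    static class InvalidMetadataException extends RetriableException { }

    @Deprecated
    static class NotLeaderForPartitionException extends InvalidMetadataException { }

    // The new exception: old code catching the deprecated parent type
    // still handles it.
    static class NotLeaderOrFollowerException extends NotLeaderForPartitionException { }

    public static void main(String[] args) {
        try {
            throw new NotLeaderOrFollowerException();
        } catch (NotLeaderForPartitionException e) {
            // Caught by the old handler, and still retriable.
            System.out.println(e instanceof InvalidMetadataException); // prints true
        }
    }
}
```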

If you have any concerns about this change, please let us know. Otherwise,
I will update the KIP and PR.

Thank you,

Rajini


On Mon, Apr 8, 2019 at 5:19 PM Jason Gustafson <ja...@confluent.io> wrote:

> I'm going to call this vote.
>
> +1: Me, Ryanne, Guozhang, Eno, David, Viktor, Stephane, Gwen, Jun
>
> Binding total is +5 with no -1 or +0. Thanks all for the discussion and
> feedback.
>
> -Jason
>
>
> On Thu, Apr 4, 2019 at 4:55 PM Jun Rao <ju...@confluent.io> wrote:
>
> > Hi, Jason,
> >
> > Thanks for the updated KIP. +1 from me.
> >
> > Jun
> >
> > On Thu, Apr 4, 2019 at 2:26 PM Jason Gustafson <ja...@confluent.io>
> > wrote:
> >
> > > Hi Jun,
> > >
> > > I have updated the KIP to remove `replica.selection.policy` from the
> > > consumer configuration. Thanks for the suggestion.
> > >
> > > Best,
> > > Jason
> > >
> > > On Wed, Mar 27, 2019 at 9:46 AM Jason Gustafson <ja...@confluent.io>
> > > wrote:
> > >
> > > > @Jun
> > > >
> > > > Re: 200: It's a fair point that it is useful to minimize the
> > > > client changes that are needed to get a benefit from affinity. I
> > > > think the high level argument is that this is mostly the concern
> > > > of operators and should be under their control. Since there is a
> > > > protocol bump here, users will have to upgrade clients at a
> > > > minimum. An alternative would be to make "preferred" the default
> > > > option for `replica.selection.policy`. But I agree that the value
> > > > of the configuration becomes less clear in this case. Overall
> > > > this suggestion sounds good to me, but let me see if there is any
> > > > additional feedback before I update the KIP.
> > > >
> > > > Re: 201: Ack.
> > > >
> > > > @Guozhang
> > > >
> > > > I think rack.id is still an easier and more reliable way for
> > > > many users to determine local affinity. This lets us provide the
> > > > simple rack-aware implementation, which is probably sufficient
> > > > for a fair number of use cases and wouldn't require users to
> > > > write any custom code.
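[Editorial aside: a sketch of the rack-aware selection logic described above, preferring a replica whose rack matches the client's rack.id and otherwise falling back to the leader. The names `ReplicaInfo` and `selectPreferred` are illustrative, not the KIP's actual interface.]

```java
import java.util.List;
import java.util.Optional;

// Rack-aware replica selection: pick an in-rack replica when the
// client supplied a rack.id, otherwise fall back to the leader.
public class RackAwareSelection {

    record ReplicaInfo(int brokerId, String rack, boolean isLeader) { }

    static ReplicaInfo selectPreferred(String clientRack, List<ReplicaInfo> replicas) {
        if (clientRack != null) {
            Optional<ReplicaInfo> sameRack = replicas.stream()
                    .filter(r -> clientRack.equals(r.rack()))
                    .findFirst();
            if (sameRack.isPresent()) {
                return sameRack.get();
            }
        }
        // No rack.id, or no in-rack replica: keep the old behavior and
        // read from the leader.
        return replicas.stream().filter(ReplicaInfo::isLeader).findFirst().orElseThrow();
    }

    public static void main(String[] args) {
        List<ReplicaInfo> replicas = List.of(
                new ReplicaInfo(0, "us-east-1a", true),
                new ReplicaInfo(1, "us-east-1b", false));
        System.out.println(selectPreferred("us-east-1b", replicas).brokerId()); // prints 1
    }
}
```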
> > > >
> > > > Thanks,
> > > > Jason
> > > >
> > > >
> > > > On Wed, Mar 27, 2019 at 9:05 AM Guozhang Wang <wa...@gmail.com>
> > > > wrote:
> > > >
> > > >> Hello Jun,
> > > >>
> > > >> Regarding 200: if we assume that most clients would not bother
> > > >> setting rack.id at all, that affinity can be determined w/o
> > > >> rack.id via the TCP header, and that rack.id may not be
> > > >> "future-proof" since additional information may be needed as
> > > >> well, then do we still need to change the protocol of the
> > > >> metadata request to add `rack.id`?
> > > >>
> > > >>
> > > >> Guozhang
> > > >>
> > > >> On Tue, Mar 26, 2019 at 6:23 PM Jun Rao <ju...@confluent.io> wrote:
> > > >>
> > > >> > Hi, Jason,
> > > >> >
> > > >> > Thanks for the KIP. Just a couple of more comments.
> > > >> >
> > > >> > 200. I am wondering if we really need the
> > > >> > replica.selection.policy config in the consumer. A slight
> > > >> > variant is that we (1) let the consumer always fetch from the
> > > >> > PreferredReplica and (2) provide a default implementation of
> > > >> > ReplicaSelector that always returns the leader replica in
> > > >> > select() for backward compatibility. Then, we can get rid of
> > > >> > replica.selection.policy in the consumer. The benefits are
> > > >> > that (1) there are fewer configs, and (2) the affinity
> > > >> > optimization can potentially be turned on with just a
> > > >> > broker-side change (assuming affinity can be determined w/o
> > > >> > the client rack.id).
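[Editorial aside: a sketch of the variant described above, where a pluggable selector's default implementation always returns the leader, so behavior is unchanged unless an operator plugs in something else. The interface shape here is illustrative, not the KIP's exact API.]

```java
import java.util.List;

// A pluggable selector whose backward-compatible default ignores
// client metadata and always picks the leader.
public class DefaultSelection {

    record Replica(int brokerId, boolean isLeader) { }

    interface ReplicaSelector {
        Replica select(String clientRack, List<Replica> replicas);
    }

    // Default implementation: same read path as before the KIP.
    static final ReplicaSelector LEADER_SELECTOR =
            (clientRack, replicas) -> replicas.stream()
                    .filter(Replica::isLeader)
                    .findFirst()
                    .orElseThrow();

    public static void main(String[] args) {
        List<Replica> replicas =
                List.of(new Replica(5, false), new Replica(7, true));
        System.out.println(LEADER_SELECTOR.select(null, replicas).brokerId()); // prints 7
    }
}
```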
> > > >> >
> > > >> > 201. I am wondering if PreferredReplica in the protocol
> > > >> > should be named PreferredReadReplica since it's intended for
> > > >> > reads?
> > > >> >
> > > >> > Jun
> > > >> >
> > > >> > On Mon, Mar 25, 2019 at 9:07 AM Jason Gustafson <jason@confluent.io>
> > > >> > wrote:
> > > >> >
> > > >> > > Hi All, discussion on the KIP seems to have died down, so
> > > >> > > I'd like to go ahead and start a vote. Here is a link to
> > > >> > > the KIP:
> > > >> > >
> > > >> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica
> > > >> > >
> > > >> > > +1 from me (duh)
> > > >> > >
> > > >> > > -Jason
> > > >> > >
> > > >> >
> > > >>
> > > >>
> > > >> --
> > > >> -- Guozhang
> > > >>
> > > >
> > >
> >
>