Posted to dev@kafka.apache.org by "Matthias J. Sax" <mj...@apache.org> on 2020/07/06 19:29:54 UTC

Re: Potential Improvement for Kafka Producer

Arvin,

thanks for your email. This is definitely the right channel. I am
personally not familiar enough with the producer code, but what you say
makes sense to me from a high level.

Maybe it would be best if you filed a Jira to improve the producer
accordingly? I guess this change would require a KIP.

Of course, if you are interested, feel free to pick it up yourself.


-Matthias


On 6/28/20 8:53 AM, Arvin Zheng wrote:
> Hi All,
> 
> Not sure if this is the right channel and thread to ask, but I would like to
> discuss a potential improvement to the Java Kafka Producer.
> 
> ```
> Currently the Kafka Producer is able to identify unavailable partitions and
> avoid routing messages to them, but a partition is considered unavailable
> only when its leader is not available.
> From the Producer's point of view, acks for sending messages can be one of
> [all, -1, 0, 1]:
> 1. When acks is set to either 0 or 1, leader availability is good enough to
> determine whether we should route messages to that partition.
> 2. When acks is set to -1 or all, an available leader doesn't mean we are able
> to persist messages to that partition successfully. Instead, we need to
> make sure that
> a. the leader is available, and
> b. at least min.insync.replicas replicas are available.
> ```
> 
> To achieve 2, we need to carry the min.insync.replicas setting of a topic
> in the metadata response, so that the Producer is able to determine whether
> it should route messages to a partition when there are not enough replicas
> available and its acks is set to -1 or all.
> 
> Advantages that I can think of:
> 1. Avoid exhausting the entire Producer cache when a partition is not
> available for a long time and
> a. retries is set to a large value
> b. acks is set to all
> 2. Avoid unnecessary network tries
> 
> Not sure if this is a valid case but would like to hear any opinions.
> 
> Br,
> Arvin
> 
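
The routing rule proposed in the quoted message can be sketched roughly as
follows. This is only an illustration of the idea, not code from the Kafka
client: the class PartitionAvailability, its methods, and their parameters
are hypothetical names, and the sketch assumes the replica count passed in
is the number of currently available replicas, including the leader, and
that the producer has somehow learned the topic's min.insync.replicas.

```java
/** Hypothetical helper illustrating the proposal; not part of the Kafka client API. */
final class PartitionAvailability {

    private PartitionAvailability() {
    }

    /** Returns true when acks is "all" or "-1", the two spellings named in the proposal. */
    static boolean requiresInSyncReplicas(String acks) {
        return "all".equals(acks) || "-1".equals(acks);
    }

    /**
     * Decides whether a producer should route messages to a partition.
     *
     * @param acks              the producer's acks setting ("all", "-1", "0" or "1")
     * @param leaderAvailable   whether the partition currently has an available leader
     * @param availableReplicas assumed count of available replicas, leader included
     * @param minInSyncReplicas the topic's min.insync.replicas value (assumed to be known)
     */
    static boolean isWritable(String acks,
                              boolean leaderAvailable,
                              int availableReplicas,
                              int minInSyncReplicas) {
        // Without a leader the partition is unavailable for every acks setting.
        if (!leaderAvailable) {
            return false;
        }
        // Case 1: acks=0 or acks=1 only needs the leader.
        if (!requiresInSyncReplicas(acks)) {
            return true;
        }
        // Case 2: acks=all/-1 also needs at least min.insync.replicas replicas.
        return availableReplicas >= minInSyncReplicas;
    }
}
```

With this rule, a partition that has a leader but only one available replica
would still be writable for acks=1, and would be skipped for acks=all when
min.insync.replicas is 2.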


Re: Potential Improvement for Kafka Producer

Posted by Arvin Zheng <zm...@gmail.com>.
Thanks, Matthias, for the response. I've created a KIP and started another
email thread to discuss this; see the following:
[DISCUSS] Include min.insync.replicas in MetadataResponse to make Producer
smarter in partitioning events
