Posted to dev@kafka.apache.org by Jun Rao <ju...@confluent.io> on 2015/06/02 02:50:20 UTC

Re: [DISCUSSION] Partition Selection and Coordination By Brokers for Producers

Bhavesh,

I am not sure if load balancing based on the consumption rate (1.b) makes
sense. Each consumer typically consumes all partitions from a topic. So, as
long as the data in each partition is balanced, the consumption rate will
be balanced too. Selecting a partition based on the size of each partition
could be useful, but I am not sure if it's going to be significantly better
than just having the clients pick a random partition. Also, implementing
this on the broker side has downsides. First, having the broker forward each
produce request increases the network traffic on the broker. Second, this
likely will make the broker code more complicated since we probably have to
put every forwarded produce request in a purgatory. Third, we currently
don't maintain the size of each partition on every broker.

Given these, I think your best bet is probably to just fix those non-Java
clients to send data in a round-robin fashion.
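For illustration, the round-robin behavior suggested above for non-Java clients can be sketched in Python, one of the client languages mentioned in this thread. This is a minimal sketch and not any client library's API: the class `RoundRobinPartitioner` and its `partition(key, num_partitions)` method are hypothetical names. Non-keyed messages cycle through the partitions; keyed messages use `zlib.crc32` as an arbitrary stable hash, which is an assumption and not the hash the Java producer uses, so keyed placement would not match Java clients.

```python
import itertools
import zlib
from typing import Optional


class RoundRobinPartitioner:
    """Hypothetical partitioner: round robin for non-keyed messages,
    a stable hash for keyed messages."""

    def __init__(self) -> None:
        # Per-instance counter; advanced once per non-keyed message.
        self._counter = itertools.count()

    def partition(self, key: Optional[bytes], num_partitions: int) -> int:
        if num_partitions <= 0:
            raise ValueError("num_partitions must be positive")
        if key is None:
            # Non-keyed: cycle 0, 1, ..., n-1, 0, 1, ...
            return next(self._counter) % num_partitions
        # Keyed: the same key always maps to the same partition.
        # crc32 is an arbitrary stable choice, NOT the Java client's hash.
        return zlib.crc32(key) % num_partitions


p = RoundRobinPartitioner()
print([p.partition(None, 3) for _ in range(7)])  # [0, 1, 2, 0, 1, 2, 0]
```

Because the counter lives in each instance, every producer process cycles independently. That is enough to spread non-keyed data evenly over time, but it does not coordinate across producers.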

Thanks,

Jun

On Fri, May 29, 2015 at 1:22 PM, Bhavesh Mistry <mi...@gmail.com> wrote:

> Hi Kafka Dev Team,
>
> I would appreciate your feedback on moving producer partition selection
> from the producer to the broker. Also, please let me know the correct
> process for collecting feedback from the Kafka dev team and/or community.
>
> Thanks,
>
> Bhavesh
>
> On Tue, May 26, 2015 at 11:54 AM, Bhavesh Mistry <mistry.p.bhavesh@gmail.com>
> wrote:
>
> > Hi Kafka Dev Team,
> >
> > I am sorry, I am new to the discussion and/or KIP process, so I had
> > commented on another email voting thread. Please let me know the correct
> > process for collecting feedback and starting a discussion with the Kafka
> > dev group.
> >
> > Here is the original message:
> >
> > I have experience on both the producer and consumer sides, and I have a
> > different use case for this partition selection strategy.
> >
> >
> >
> > Problem:
> >
> >
> > We have a heterogeneous environment of producers (by that I mean we have
> > Node.js, Python, new Java, and old Scala-based producers writing to the
> > same topic). I have seen that not all producers employ a round-robin
> > strategy for non-keyed messages like the new producer does. Hence, data
> > is ingested into the partitions non-uniformly, which delays overall
> > message processing.
> >
> > How can we achieve a uniform distribution/message injection rate across
> > all partitions?
> >
> >
> >
> > Proposed Solution:
> >
> >
> > Let the broker cluster decide, more intelligently, the next partition of
> > a topic to send data to, rather than the producer itself.
> >
> > 1) When the client sends data to the brokers, the broker's response
> > (ProduceResponse in the Kafka wire protocol) sends the client a hint about
> > which partition to send to next, based on the following logic (which could
> > be customizable):
> >
> > a. Based on the overall data injection rate for the topic and the
> > current producer's injection rate.
> >
> > b. The ability to rank partitions based on consumer rate (an advanced use
> > case, since there may be many consumers, so a weighted average, etc.).
> >
> >
> >
> > Ultimately, brokers will coordinate among thousands of producers and
> > divert data based on the injection rate (an out-of-the-box feature) and
> > the consumption rate (a pluggable interface implemented on the brokers’
> > side). The goal here is to attain uniformity and/or a lower delivery rate
> > to consumers. This is similar to consumer coordination moving to the
> > brokers: producer-side partition selection would also move to the
> > brokers. This will benefit both Java and non-Java clients.
> >
> >
> >
> > Please let me know your feedback on this subject matter. I am sure many
> > of you run Kafka in enterprise environments where you may have different
> > types of producers for the same topic (e.g., logging clients in
> > JavaScript, PHP, Java, Python, etc. sending to a log topic). I would
> > really appreciate your feedback on this.
> >
> >
> >
> >
> >
> > Thanks,
> >
> >
> > Bhavesh
> >
>
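Item 1.a of the proposal quoted above, a broker-side hint based on injection rates with customizable logic, might look roughly like the following Python sketch. Everything here is hypothetical: neither `PartitionHintStrategy` nor `LowestInjectionRate` exists in Kafka, and the per-partition injection rates (bytes/sec) are assumed to be tracked somewhere on the broker. The sketch only compares per-partition rates; it does not model the individual producer's rate.

```python
from abc import ABC, abstractmethod
from typing import Dict


class PartitionHintStrategy(ABC):
    """Hypothetical pluggable hook: given recent per-partition injection
    rates (bytes/sec), return the partition to suggest in the next
    produce response."""

    @abstractmethod
    def next_partition(self, rates: Dict[int, float]) -> int: ...


class LowestInjectionRate(PartitionHintStrategy):
    """Suggest the partition currently receiving the least data.
    Ties go to the lowest partition id so the hint is deterministic."""

    def next_partition(self, rates: Dict[int, float]) -> int:
        if not rates:
            raise ValueError("no partitions to choose from")
        return min(rates, key=lambda p: (rates[p], p))


hint = LowestInjectionRate().next_partition({0: 120.0, 1: 45.5, 2: 45.5})
print(hint)  # 1
```

One design caveat: if many producers receive the same hint at the same moment, they may all move to that one partition, so a real implementation would probably need some randomization among the least-loaded partitions.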

Re: [DISCUSSION] Partition Selection and Coordination By Brokers for Producers

Posted by Bhavesh Mistry <mi...@gmail.com>.
Thanks for the info, Jun.



Does each broker (that hosts the topic) have the latest (end) offset for
each topic and partition? If so, I was planning to use its rate of change
versus the producer's incoming injection rate (calculated only for the
broker the producer is attached to) to decide which partition would be the
next optimal partition to inject data into.
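The end-offset idea above can be sketched as follows: take two snapshots of each partition's log-end offset, turn the difference into a per-partition growth rate, and suggest the slowest-growing partition. This is a hypothetical Python sketch; the function names and the snapshot dictionaries (partition id to log-end offset) are assumptions, and how the snapshots are obtained is left out. Offset deltas count messages rather than bytes, and the sketch leaves out the producer's own incoming rate.

```python
from typing import Dict


def offset_growth_rates(
    before: Dict[int, int], after: Dict[int, int], interval_s: float
) -> Dict[int, float]:
    """Messages per second appended to each partition between two
    snapshots of log-end offsets taken interval_s seconds apart."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return {p: (after[p] - before[p]) / interval_s for p in after if p in before}


def next_optimal_partition(
    before: Dict[int, int], after: Dict[int, int], interval_s: float
) -> int:
    rates = offset_growth_rates(before, after, interval_s)
    # Slowest-growing partition wins; ties break on partition id.
    return min(rates, key=lambda p: (rates[p], p))


t0 = {0: 1_000, 1: 1_000, 2: 1_000}
t1 = {0: 1_600, 1: 1_150, 2: 1_400}
print(offset_growth_rates(t0, t1, 10.0))  # {0: 60.0, 1: 15.0, 2: 40.0}
print(next_optimal_partition(t0, t1, 10.0))  # 1
```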



I was not thinking about coordination among brokers (but I agree the
consumer rate can be dropped).

Thanks for the feedback. I will have to study how the broker handles a
ProducerRequest (the purgatory stuff).


Thanks,


Bhavesh
