You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by Justine Olshan <jo...@confluent.io> on 2019/06/24 21:04:13 UTC

[DISCUSS] KIP-480 : Sticky Partitioner

Hello,
This is the discussion thread for KIP-480: Sticky Partitioner.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner

Thank you,
Justine Olshan

Re: [DISCUSS] KIP-480 : Sticky Partitioner

Posted by Justine Olshan <jo...@confluent.io>.
Hi Colin,

The first thing that comes to mind is AlwaysStickyPartitioner, but maybe I
could think about it a bit more and come up with something better.

The only reason why I hesitate about the RoundRobin part is that it doesn't
really distinguish that this partitioner will have this behavior on keyed
values. I know this is the case for the RoundRobinPartitioner, but unless
one is familiar with the code, it might not be obvious on first glance.

I can definitely include this in the KIP once we get a name settled.

Thanks,
Justine

On Fri, Jul 12, 2019 at 11:45 AM Colin McCabe <cm...@apache.org> wrote:

> On Fri, Jul 12, 2019, at 09:02, Justine Olshan wrote:
> > Hello all,
> >
> > Jun, thanks for taking a look at my KIP! We were also concerned about
> > batches containing a single record so we kept this in mind for the
> > implementation. The decision to switch the sticky partition actually
> > involves returning from the record accumulator and assigning the new
> > partition before the new batch is created. That way all of the records
> will
> > go to this new partition's batch. If you would like to get a better look
> at
> > how this works, please check out the PR:
> > https://github.com/apache/kafka/pull/6997/files. The most important
> lines
> > are in the append method of the RecordAccumulator and doSend in
> > KafkaProducer.
>
> Thanks for the explanation.
>
> >
> > Colin, I think this makes sense to me except for the name
> > StickyRoundRobinPartitioner seems to not really explain the behavior of
> > what would be implemented. Perhaps a name indicating the sticky behavior
> is
> > always used, or that it will be used on keys is more descriptive. Calling
> > it "RoundRobin" seems a bit misleading to me.
>
> Hmm, what name would you propose here?
>
> Keep in mind we don't have to implement the new configurable partitioner
> in the initial PR :)
>
> best,
> Colin
>
> >
> > Thanks again for reviewing,
> > Justine
> >
> > On Thu, Jul 11, 2019 at 6:07 PM Jun Rao <ju...@confluent.io> wrote:
> >
> > > Hi, Justine,
> > >
> > > Thanks for the KIP. Nice writeup and great results. Just one comment.
> > >
> > > 100. To add a record to the accumulator, the producer needs to know the
> > > partition id. The decision of whether the record can be added to the
> > > current batch is only made after the accumulator.append() call. So,
> when a
> > > batch is full, it seems that the KIP will try to append the next
> record to
> > > the same partition, which will trigger the creation of a new batch
> with a
> > > single record. After that, new records will be routed to a new
> partition.
> > > If the producer doesn't come back to the first partition in time, the
> > > producer will send a single record batch. In the worse case, it can be
> that
> > > every other batch has only a single record. Is this correct? If so,
> could
> > > we avoid that?
> > >
> > > Jun
> > >
> > > On Thu, Jul 11, 2019 at 5:23 PM Colin McCabe <cm...@apache.org>
> wrote:
> > >
> > > > Hi Justine,
> > > >
> > > > I agree that we shouldn't change RoundRobinPartitioner, since its
> > > behavior
> > > > is already specified.
> > > >
> > > > However, we could add a new, separate StickyRoundRobinPartitioner
> class
> > > to
> > > > KIP-480 which just implemented the sticky behavior regardless of
> whether
> > > > the key was null.  That seems pretty easy to add (and it wouldn't
> have to
> > > > implemented right away in the first PR, of course.)  It would be an
> > > option
> > > > for people who wanted to configure this behavior.
> > > >
> > > > best,
> > > > Colin
> > > >
> > > >
> > > > On Wed, Jul 10, 2019, at 08:48, Justine Olshan wrote:
> > > > > Hi M,
> > > > >
> > > > > I'm a little confused by what you mean by extending the behavior
> on to
> > > > the
> > > > > RoundRobinPartitioner.
> > > > > The sticky partitioner plans to remove the round-robin behavior
> from
> > > > > records with no keys. Instead of sending them to each partition in
> > > order,
> > > > > it sends them all to the same partition until the batch is sent.
> > > > > I don't think you can have both round-robin and sticky partition
> > > > behavior.
> > > > >
> > > > > Thank you,
> > > > > Justine Olshan
> > > > >
> > > > > On Wed, Jul 10, 2019 at 1:54 AM M. Manna <ma...@gmail.com>
> wrote:
> > > > >
> > > > > > Thanks for the comments Colin.
> > > > > >
> > > > > > My only concern is that this KIP is addressing a good feature and
> > > > having
> > > > > > that extended to RoundRobinPartitioner means 1 less KIP in the
> > > future.
> > > > > >
> > > > > > Would it be appropriate to extend the support to
> > > RoundRobinPartitioner
> > > > too?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > On Tue, 9 Jul 2019 at 17:24, Colin McCabe <cm...@apache.org>
> > > wrote:
> > > > > >
> > > > > > > Hi M,
> > > > > > >
> > > > > > > The RoundRobinPartitioner added by KIP-369 doesn't interact
> with
> > > this
> > > > > > > KIP.  If you configure your producer to use
> RoundRobinPartitioner,
> > > > then
> > > > > > the
> > > > > > > DefaultPartitioner will not be used.  And the "sticky"
> behavior is
> > > > > > > implemented only in the DefaultPartitioner.
> > > > > > >
> > > > > > > regards,
> > > > > > > Colin
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Jul 9, 2019, at 05:12, M. Manna wrote:
> > > > > > > > Hello Justine,
> > > > > > > >
> > > > > > > > I have one item I wanted to discuss.
> > > > > > > >
> > > > > > > > We are currently in review stage for KAFKA-3333 where we can
> > > choose
> > > > > > > always
> > > > > > > > RoundRobin regardless of null/usable key.
> > > > > > > >
> > > > > > > > If I understood this KIP motivation correctly, you are still
> > > > honouring
> > > > > > > how
> > > > > > > > the hashing of key works for DefaultPartitioner. Would you
> say
> > > that
> > > > > > > having
> > > > > > > > an always "Round-Robin" partitioning with "Sticky" assignment
> > > > > > (efficient
> > > > > > > > batching of records for a partition) doesn't deviate from
> your
> > > > original
> > > > > > > > intention?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > On Tue, 9 Jul 2019 at 01:00, Justine Olshan <
> > > jolshan@confluent.io>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hello all,
> > > > > > > > >
> > > > > > > > > If there are no more comments or concerns, I would like to
> > > start
> > > > the
> > > > > > > vote
> > > > > > > > > on this tomorrow afternoon.
> > > > > > > > >
> > > > > > > > > However, if there are still topics to discuss, feel free to
> > > bring
> > > > > > them
> > > > > > > up
> > > > > > > > > now.
> > > > > > > > >
> > > > > > > > > Thank you,
> > > > > > > > > Justine
> > > > > > > > >
> > > > > > > > > On Tue, Jul 2, 2019 at 4:25 PM Justine Olshan <
> > > > jolshan@confluent.io>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hello again,
> > > > > > > > > >
> > > > > > > > > > Another update to the interface has been made to the KIP.
> > > > > > > > > > Please let me know if you have any feedback!
> > > > > > > > > >
> > > > > > > > > > Thank you,
> > > > > > > > > > Justine
> > > > > > > > > >
> > > > > > > > > > On Fri, Jun 28, 2019 at 2:52 PM Justine Olshan <
> > > > > > jolshan@confluent.io
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > >> Hi all,
> > > > > > > > > >> I made some changes to the KIP.
> > > > > > > > > >> The idea is to clean up the code, make behavior more
> > > explicit,
> > > > > > > provide
> > > > > > > > > >> more flexibility, and to keep default behavior the same.
> > > > > > > > > >>
> > > > > > > > > >> Now we will change the partition in onNewBatch, and
> specify
> > > > the
> > > > > > > > > >> conditions for this function call (non-keyed values, no
> > > > explicit
> > > > > > > > > >> partitions) in willCallOnNewBatch.
> > > > > > > > > >> This clears up some of the issues with the
> implementation.
> > > I'm
> > > > > > > happy to
> > > > > > > > > >> hear further opinions and discuss this change!
> > > > > > > > > >>
> > > > > > > > > >> Thank you,
> > > > > > > > > >> Justine
> > > > > > > > > >>
> > > > > > > > > >> On Thu, Jun 27, 2019 at 2:53 PM Colin McCabe <
> > > > cmccabe@apache.org>
> > > > > > > > > wrote:
> > > > > > > > > >>
> > > > > > > > > >>> On Thu, Jun 27, 2019, at 01:31, Ismael Juma wrote:
> > > > > > > > > >>> > Thanks for the KIP Justine. It looks pretty good. A
> few
> > > > > > comments:
> > > > > > > > > >>> >
> > > > > > > > > >>> > 1. Should we favor partitions that are not under
> > > > replicated?
> > > > > > > This is
> > > > > > > > > >>> > something that Netflix did too.
> > > > > > > > > >>>
> > > > > > > > > >>> This seems like it could lead to cascading failures,
> right?
> > > > If a
> > > > > > > > > >>> partition becomes under-replicated because there is too
> > > much
> > > > > > > traffic,
> > > > > > > > > the
> > > > > > > > > >>> producer stops sending to it, which puts even more
> load on
> > > > the
> > > > > > > > > remaining
> > > > > > > > > >>> partitions, which are even more likely to fail then,
> etc.
> > > It
> > > > > > also
> > > > > > > will
> > > > > > > > > >>> create unbalanced load patterns on the consumers.
> > > > > > > > > >>>
> > > > > > > > > >>> >
> > > > > > > > > >>> > 2. If there's no measurable performance difference, I
> > > agree
> > > > > > with
> > > > > > > > > >>> Stanislav
> > > > > > > > > >>> > that Optional would be better than Integer.
> > > > > > > > > >>> >
> > > > > > > > > >>> > 3. We should include the javadoc for the newly
> introduced
> > > > > > method
> > > > > > > that
> > > > > > > > > >>> > specifies it and its parameters. In particular, it
> would
> > > > good
> > > > > > to
> > > > > > > > > >>> specify if
> > > > > > > > > >>> > it gets called when an explicit partition id has been
> > > > provided.
> > > > > > > > > >>>
> > > > > > > > > >>> Agreed.
> > > > > > > > > >>>
> > > > > > > > > >>> best,
> > > > > > > > > >>> Colin
> > > > > > > > > >>>
> > > > > > > > > >>> >
> > > > > > > > > >>> > Ismael
> > > > > > > > > >>> >
> > > > > > > > > >>> > On Mon, Jun 24, 2019, 2:04 PM Justine Olshan <
> > > > > > > jolshan@confluent.io>
> > > > > > > > > >>> wrote:
> > > > > > > > > >>> >
> > > > > > > > > >>> > > Hello,
> > > > > > > > > >>> > > This is the discussion thread for KIP-480: Sticky
> > > > > > Partitioner.
> > > > > > > > > >>> > >
> > > > > > > > > >>> > >
> > > > > > > > > >>> > >
> > > > > > > > > >>>
> > > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
> > > > > > > > > >>> > >
> > > > > > > > > >>> > > Thank you,
> > > > > > > > > >>> > > Justine Olshan
> > > > > > > > > >>> > >
> > > > > > > > > >>> >
> > > > > > > > > >>>
> > > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-480 : Sticky Partitioner

Posted by Colin McCabe <cm...@apache.org>.
On Fri, Jul 12, 2019, at 09:02, Justine Olshan wrote:
> Hello all,
> 
> Jun, thanks for taking a look at my KIP! We were also concerned about
> batches containing a single record so we kept this in mind for the
> implementation. The decision to switch the sticky partition actually
> involves returning from the record accumulator and assigning the new
> partition before the new batch is created. That way all of the records will
> go to this new partition's batch. If you would like to get a better look at
> how this works, please check out the PR:
> https://github.com/apache/kafka/pull/6997/files. The most important lines
> are in the append method of the RecordAccumulator and doSend in
> KafkaProducer.

Thanks for the explanation.

> 
> Colin, I think this makes sense to me except for the name
> StickyRoundRobinPartitioner seems to not really explain the behavior of
> what would be implemented. Perhaps a name indicating the sticky behavior is
> always used, or that it will be used on keys is more descriptive. Calling
> it "RoundRobin" seems a bit misleading to me.

Hmm, what name would you propose here?

Keep in mind we don't have to implement the new configurable partitioner in the initial PR :)

best,
Colin

> 
> Thanks again for reviewing,
> Justine
> 
> On Thu, Jul 11, 2019 at 6:07 PM Jun Rao <ju...@confluent.io> wrote:
> 
> > Hi, Justine,
> >
> > Thanks for the KIP. Nice writeup and great results. Just one comment.
> >
> > 100. To add a record to the accumulator, the producer needs to know the
> > partition id. The decision of whether the record can be added to the
> > current batch is only made after the accumulator.append() call. So, when a
> > batch is full, it seems that the KIP will try to append the next record to
> > the same partition, which will trigger the creation of a new batch with a
> > single record. After that, new records will be routed to a new partition.
> > If the producer doesn't come back to the first partition in time, the
> > producer will send a single record batch. In the worse case, it can be that
> > every other batch has only a single record. Is this correct? If so, could
> > we avoid that?
> >
> > Jun
> >
> > On Thu, Jul 11, 2019 at 5:23 PM Colin McCabe <cm...@apache.org> wrote:
> >
> > > Hi Justine,
> > >
> > > I agree that we shouldn't change RoundRobinPartitioner, since its
> > behavior
> > > is already specified.
> > >
> > > However, we could add a new, separate StickyRoundRobinPartitioner class
> > to
> > > KIP-480 which just implemented the sticky behavior regardless of whether
> > > the key was null.  That seems pretty easy to add (and it wouldn't have to
> > > implemented right away in the first PR, of course.)  It would be an
> > option
> > > for people who wanted to configure this behavior.
> > >
> > > best,
> > > Colin
> > >
> > >
> > > On Wed, Jul 10, 2019, at 08:48, Justine Olshan wrote:
> > > > Hi M,
> > > >
> > > > I'm a little confused by what you mean by extending the behavior on to
> > > the
> > > > RoundRobinPartitioner.
> > > > The sticky partitioner plans to remove the round-robin behavior from
> > > > records with no keys. Instead of sending them to each partition in
> > order,
> > > > it sends them all to the same partition until the batch is sent.
> > > > I don't think you can have both round-robin and sticky partition
> > > behavior.
> > > >
> > > > Thank you,
> > > > Justine Olshan
> > > >
> > > > On Wed, Jul 10, 2019 at 1:54 AM M. Manna <ma...@gmail.com> wrote:
> > > >
> > > > > Thanks for the comments Colin.
> > > > >
> > > > > My only concern is that this KIP is addressing a good feature and
> > > having
> > > > > that extended to RoundRobinPartitioner means 1 less KIP in the
> > future.
> > > > >
> > > > > Would it be appropriate to extend the support to
> > RoundRobinPartitioner
> > > too?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > On Tue, 9 Jul 2019 at 17:24, Colin McCabe <cm...@apache.org>
> > wrote:
> > > > >
> > > > > > Hi M,
> > > > > >
> > > > > > The RoundRobinPartitioner added by KIP-369 doesn't interact with
> > this
> > > > > > KIP.  If you configure your producer to use RoundRobinPartitioner,
> > > then
> > > > > the
> > > > > > DefaultPartitioner will not be used.  And the "sticky" behavior is
> > > > > > implemented only in the DefaultPartitioner.
> > > > > >
> > > > > > regards,
> > > > > > Colin
> > > > > >
> > > > > >
> > > > > > On Tue, Jul 9, 2019, at 05:12, M. Manna wrote:
> > > > > > > Hello Justine,
> > > > > > >
> > > > > > > I have one item I wanted to discuss.
> > > > > > >
> > > > > > > We are currently in review stage for KAFKA-3333 where we can
> > choose
> > > > > > always
> > > > > > > RoundRobin regardless of null/usable key.
> > > > > > >
> > > > > > > If I understood this KIP motivation correctly, you are still
> > > honouring
> > > > > > how
> > > > > > > the hashing of key works for DefaultPartitioner. Would you say
> > that
> > > > > > having
> > > > > > > an always "Round-Robin" partitioning with "Sticky" assignment
> > > > > (efficient
> > > > > > > batching of records for a partition) doesn't deviate from your
> > > original
> > > > > > > intention?
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > On Tue, 9 Jul 2019 at 01:00, Justine Olshan <
> > jolshan@confluent.io>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Hello all,
> > > > > > > >
> > > > > > > > If there are no more comments or concerns, I would like to
> > start
> > > the
> > > > > > vote
> > > > > > > > on this tomorrow afternoon.
> > > > > > > >
> > > > > > > > However, if there are still topics to discuss, feel free to
> > bring
> > > > > them
> > > > > > up
> > > > > > > > now.
> > > > > > > >
> > > > > > > > Thank you,
> > > > > > > > Justine
> > > > > > > >
> > > > > > > > On Tue, Jul 2, 2019 at 4:25 PM Justine Olshan <
> > > jolshan@confluent.io>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hello again,
> > > > > > > > >
> > > > > > > > > Another update to the interface has been made to the KIP.
> > > > > > > > > Please let me know if you have any feedback!
> > > > > > > > >
> > > > > > > > > Thank you,
> > > > > > > > > Justine
> > > > > > > > >
> > > > > > > > > On Fri, Jun 28, 2019 at 2:52 PM Justine Olshan <
> > > > > jolshan@confluent.io
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >> Hi all,
> > > > > > > > >> I made some changes to the KIP.
> > > > > > > > >> The idea is to clean up the code, make behavior more
> > explicit,
> > > > > > provide
> > > > > > > > >> more flexibility, and to keep default behavior the same.
> > > > > > > > >>
> > > > > > > > >> Now we will change the partition in onNewBatch, and specify
> > > the
> > > > > > > > >> conditions for this function call (non-keyed values, no
> > > explicit
> > > > > > > > >> partitions) in willCallOnNewBatch.
> > > > > > > > >> This clears up some of the issues with the implementation.
> > I'm
> > > > > > happy to
> > > > > > > > >> hear further opinions and discuss this change!
> > > > > > > > >>
> > > > > > > > >> Thank you,
> > > > > > > > >> Justine
> > > > > > > > >>
> > > > > > > > >> On Thu, Jun 27, 2019 at 2:53 PM Colin McCabe <
> > > cmccabe@apache.org>
> > > > > > > > wrote:
> > > > > > > > >>
> > > > > > > > >>> On Thu, Jun 27, 2019, at 01:31, Ismael Juma wrote:
> > > > > > > > >>> > Thanks for the KIP Justine. It looks pretty good. A few
> > > > > comments:
> > > > > > > > >>> >
> > > > > > > > >>> > 1. Should we favor partitions that are not under
> > > replicated?
> > > > > > This is
> > > > > > > > >>> > something that Netflix did too.
> > > > > > > > >>>
> > > > > > > > >>> This seems like it could lead to cascading failures, right?
> > > If a
> > > > > > > > >>> partition becomes under-replicated because there is too
> > much
> > > > > > traffic,
> > > > > > > > the
> > > > > > > > >>> producer stops sending to it, which puts even more load on
> > > the
> > > > > > > > remaining
> > > > > > > > >>> partitions, which are even more likely to fail then, etc.
> > It
> > > > > also
> > > > > > will
> > > > > > > > >>> create unbalanced load patterns on the consumers.
> > > > > > > > >>>
> > > > > > > > >>> >
> > > > > > > > >>> > 2. If there's no measurable performance difference, I
> > agree
> > > > > with
> > > > > > > > >>> Stanislav
> > > > > > > > >>> > that Optional would be better than Integer.
> > > > > > > > >>> >
> > > > > > > > >>> > 3. We should include the javadoc for the newly introduced
> > > > > method
> > > > > > that
> > > > > > > > >>> > specifies it and its parameters. In particular, it would
> > > good
> > > > > to
> > > > > > > > >>> specify if
> > > > > > > > >>> > it gets called when an explicit partition id has been
> > > provided.
> > > > > > > > >>>
> > > > > > > > >>> Agreed.
> > > > > > > > >>>
> > > > > > > > >>> best,
> > > > > > > > >>> Colin
> > > > > > > > >>>
> > > > > > > > >>> >
> > > > > > > > >>> > Ismael
> > > > > > > > >>> >
> > > > > > > > >>> > On Mon, Jun 24, 2019, 2:04 PM Justine Olshan <
> > > > > > jolshan@confluent.io>
> > > > > > > > >>> wrote:
> > > > > > > > >>> >
> > > > > > > > >>> > > Hello,
> > > > > > > > >>> > > This is the discussion thread for KIP-480: Sticky
> > > > > Partitioner.
> > > > > > > > >>> > >
> > > > > > > > >>> > >
> > > > > > > > >>> > >
> > > > > > > > >>>
> > > > > > > >
> > > > > >
> > > > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
> > > > > > > > >>> > >
> > > > > > > > >>> > > Thank you,
> > > > > > > > >>> > > Justine Olshan
> > > > > > > > >>> > >
> > > > > > > > >>> >
> > > > > > > > >>>
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-480 : Sticky Partitioner

Posted by Jun Rao <ju...@confluent.io>.
Hi, Justine,

Thanks for the explanation. Your PR made the RecordAccumulator more
cooperative with the new partitioning scheme. So, what I mentioned won't be
an issue. The KIP looks good to me then.

Jun

On Fri, Jul 12, 2019 at 9:02 AM Justine Olshan <jo...@confluent.io> wrote:

> Hello all,
>
> Jun, thanks for taking a look at my KIP! We were also concerned about
> batches containing a single record so we kept this in mind for the
> implementation. The decision to switch the sticky partition actually
> involves returning from the record accumulator and assigning the new
> partition before the new batch is created. That way all of the records will
> go to this new partition's batch. If you would like to get a better look at
> how this works, please check out the PR:
> https://github.com/apache/kafka/pull/6997/files. The most important lines
> are in the append method of the RecordAccumulator and doSend in
> KafkaProducer.
>
> Colin, I think this makes sense to me except for the name
> StickyRoundRobinPartitioner seems to not really explain the behavior of
> what would be implemented. Perhaps a name indicating the sticky behavior is
> always used, or that it will be used on keys is more descriptive. Calling
> it "RoundRobin" seems a bit misleading to me.
>
> Thanks again for reviewing,
> Justine
>
> On Thu, Jul 11, 2019 at 6:07 PM Jun Rao <ju...@confluent.io> wrote:
>
> > Hi, Justine,
> >
> > Thanks for the KIP. Nice writeup and great results. Just one comment.
> >
> > 100. To add a record to the accumulator, the producer needs to know the
> > partition id. The decision of whether the record can be added to the
> > current batch is only made after the accumulator.append() call. So, when
> a
> > batch is full, it seems that the KIP will try to append the next record
> to
> > the same partition, which will trigger the creation of a new batch with a
> > single record. After that, new records will be routed to a new partition.
> > If the producer doesn't come back to the first partition in time, the
> > producer will send a single record batch. In the worse case, it can be
> that
> > every other batch has only a single record. Is this correct? If so, could
> > we avoid that?
> >
> > Jun
> >
> > On Thu, Jul 11, 2019 at 5:23 PM Colin McCabe <cm...@apache.org> wrote:
> >
> > > Hi Justine,
> > >
> > > I agree that we shouldn't change RoundRobinPartitioner, since its
> > behavior
> > > is already specified.
> > >
> > > However, we could add a new, separate StickyRoundRobinPartitioner class
> > to
> > > KIP-480 which just implemented the sticky behavior regardless of
> whether
> > > the key was null.  That seems pretty easy to add (and it wouldn't have
> to
> > > implemented right away in the first PR, of course.)  It would be an
> > option
> > > for people who wanted to configure this behavior.
> > >
> > > best,
> > > Colin
> > >
> > >
> > > On Wed, Jul 10, 2019, at 08:48, Justine Olshan wrote:
> > > > Hi M,
> > > >
> > > > I'm a little confused by what you mean by extending the behavior on
> to
> > > the
> > > > RoundRobinPartitioner.
> > > > The sticky partitioner plans to remove the round-robin behavior from
> > > > records with no keys. Instead of sending them to each partition in
> > order,
> > > > it sends them all to the same partition until the batch is sent.
> > > > I don't think you can have both round-robin and sticky partition
> > > behavior.
> > > >
> > > > Thank you,
> > > > Justine Olshan
> > > >
> > > > On Wed, Jul 10, 2019 at 1:54 AM M. Manna <ma...@gmail.com> wrote:
> > > >
> > > > > Thanks for the comments Colin.
> > > > >
> > > > > My only concern is that this KIP is addressing a good feature and
> > > having
> > > > > that extended to RoundRobinPartitioner means 1 less KIP in the
> > future.
> > > > >
> > > > > Would it be appropriate to extend the support to
> > RoundRobinPartitioner
> > > too?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > On Tue, 9 Jul 2019 at 17:24, Colin McCabe <cm...@apache.org>
> > wrote:
> > > > >
> > > > > > Hi M,
> > > > > >
> > > > > > The RoundRobinPartitioner added by KIP-369 doesn't interact with
> > this
> > > > > > KIP.  If you configure your producer to use
> RoundRobinPartitioner,
> > > then
> > > > > the
> > > > > > DefaultPartitioner will not be used.  And the "sticky" behavior
> is
> > > > > > implemented only in the DefaultPartitioner.
> > > > > >
> > > > > > regards,
> > > > > > Colin
> > > > > >
> > > > > >
> > > > > > On Tue, Jul 9, 2019, at 05:12, M. Manna wrote:
> > > > > > > Hello Justine,
> > > > > > >
> > > > > > > I have one item I wanted to discuss.
> > > > > > >
> > > > > > > We are currently in review stage for KAFKA-3333 where we can
> > choose
> > > > > > always
> > > > > > > RoundRobin regardless of null/usable key.
> > > > > > >
> > > > > > > If I understood this KIP motivation correctly, you are still
> > > honouring
> > > > > > how
> > > > > > > the hashing of key works for DefaultPartitioner. Would you say
> > that
> > > > > > having
> > > > > > > an always "Round-Robin" partitioning with "Sticky" assignment
> > > > > (efficient
> > > > > > > batching of records for a partition) doesn't deviate from your
> > > original
> > > > > > > intention?
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > On Tue, 9 Jul 2019 at 01:00, Justine Olshan <
> > jolshan@confluent.io>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Hello all,
> > > > > > > >
> > > > > > > > If there are no more comments or concerns, I would like to
> > start
> > > the
> > > > > > vote
> > > > > > > > on this tomorrow afternoon.
> > > > > > > >
> > > > > > > > However, if there are still topics to discuss, feel free to
> > bring
> > > > > them
> > > > > > up
> > > > > > > > now.
> > > > > > > >
> > > > > > > > Thank you,
> > > > > > > > Justine
> > > > > > > >
> > > > > > > > On Tue, Jul 2, 2019 at 4:25 PM Justine Olshan <
> > > jolshan@confluent.io>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hello again,
> > > > > > > > >
> > > > > > > > > Another update to the interface has been made to the KIP.
> > > > > > > > > Please let me know if you have any feedback!
> > > > > > > > >
> > > > > > > > > Thank you,
> > > > > > > > > Justine
> > > > > > > > >
> > > > > > > > > On Fri, Jun 28, 2019 at 2:52 PM Justine Olshan <
> > > > > jolshan@confluent.io
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >> Hi all,
> > > > > > > > >> I made some changes to the KIP.
> > > > > > > > >> The idea is to clean up the code, make behavior more
> > explicit,
> > > > > > provide
> > > > > > > > >> more flexibility, and to keep default behavior the same.
> > > > > > > > >>
> > > > > > > > >> Now we will change the partition in onNewBatch, and
> specify
> > > the
> > > > > > > > >> conditions for this function call (non-keyed values, no
> > > explicit
> > > > > > > > >> partitions) in willCallOnNewBatch.
> > > > > > > > >> This clears up some of the issues with the implementation.
> > I'm
> > > > > > happy to
> > > > > > > > >> hear further opinions and discuss this change!
> > > > > > > > >>
> > > > > > > > >> Thank you,
> > > > > > > > >> Justine
> > > > > > > > >>
> > > > > > > > >> On Thu, Jun 27, 2019 at 2:53 PM Colin McCabe <
> > > cmccabe@apache.org>
> > > > > > > > wrote:
> > > > > > > > >>
> > > > > > > > >>> On Thu, Jun 27, 2019, at 01:31, Ismael Juma wrote:
> > > > > > > > >>> > Thanks for the KIP Justine. It looks pretty good. A few
> > > > > comments:
> > > > > > > > >>> >
> > > > > > > > >>> > 1. Should we favor partitions that are not under
> > > replicated?
> > > > > > This is
> > > > > > > > >>> > something that Netflix did too.
> > > > > > > > >>>
> > > > > > > > >>> This seems like it could lead to cascading failures,
> right?
> > > If a
> > > > > > > > >>> partition becomes under-replicated because there is too
> > much
> > > > > > traffic,
> > > > > > > > the
> > > > > > > > >>> producer stops sending to it, which puts even more load
> on
> > > the
> > > > > > > > remaining
> > > > > > > > >>> partitions, which are even more likely to fail then, etc.
> > It
> > > > > also
> > > > > > will
> > > > > > > > >>> create unbalanced load patterns on the consumers.
> > > > > > > > >>>
> > > > > > > > >>> >
> > > > > > > > >>> > 2. If there's no measurable performance difference, I
> > agree
> > > > > with
> > > > > > > > >>> Stanislav
> > > > > > > > >>> > that Optional would be better than Integer.
> > > > > > > > >>> >
> > > > > > > > >>> > 3. We should include the javadoc for the newly
> introduced
> > > > > method
> > > > > > that
> > > > > > > > >>> > specifies it and its parameters. In particular, it
> would
> > > good
> > > > > to
> > > > > > > > >>> specify if
> > > > > > > > >>> > it gets called when an explicit partition id has been
> > > provided.
> > > > > > > > >>>
> > > > > > > > >>> Agreed.
> > > > > > > > >>>
> > > > > > > > >>> best,
> > > > > > > > >>> Colin
> > > > > > > > >>>
> > > > > > > > >>> >
> > > > > > > > >>> > Ismael
> > > > > > > > >>> >
> > > > > > > > >>> > On Mon, Jun 24, 2019, 2:04 PM Justine Olshan <
> > > > > > jolshan@confluent.io>
> > > > > > > > >>> wrote:
> > > > > > > > >>> >
> > > > > > > > >>> > > Hello,
> > > > > > > > >>> > > This is the discussion thread for KIP-480: Sticky
> > > > > Partitioner.
> > > > > > > > >>> > >
> > > > > > > > >>> > >
> > > > > > > > >>> > >
> > > > > > > > >>>
> > > > > > > >
> > > > > >
> > > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
> > > > > > > > >>> > >
> > > > > > > > >>> > > Thank you,
> > > > > > > > >>> > > Justine Olshan
> > > > > > > > >>> > >
> > > > > > > > >>> >
> > > > > > > > >>>
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-480 : Sticky Partitioner

Posted by Justine Olshan <jo...@confluent.io>.
Hello all,

Jun, thanks for taking a look at my KIP! We were also concerned about
batches containing a single record so we kept this in mind for the
implementation. The decision to switch the sticky partition actually
involves returning from the record accumulator and assigning the new
partition before the new batch is created. That way all of the records will
go to this new partition's batch. If you would like to get a better look at
how this works, please check out the PR:
https://github.com/apache/kafka/pull/6997/files. The most important lines
are in the append method of the RecordAccumulator and doSend in
KafkaProducer.

Colin, I think this makes sense to me except for the name
StickyRoundRobinPartitioner seems to not really explain the behavior of
what would be implemented. Perhaps a name indicating the sticky behavior is
always used, or that it will be used on keys is more descriptive. Calling
it "RoundRobin" seems a bit misleading to me.

Thanks again for reviewing,
Justine

On Thu, Jul 11, 2019 at 6:07 PM Jun Rao <ju...@confluent.io> wrote:

> Hi, Justine,
>
> Thanks for the KIP. Nice writeup and great results. Just one comment.
>
> 100. To add a record to the accumulator, the producer needs to know the
> partition id. The decision of whether the record can be added to the
> current batch is only made after the accumulator.append() call. So, when a
> batch is full, it seems that the KIP will try to append the next record to
> the same partition, which will trigger the creation of a new batch with a
> single record. After that, new records will be routed to a new partition.
> If the producer doesn't come back to the first partition in time, the
> producer will send a single record batch. In the worse case, it can be that
> every other batch has only a single record. Is this correct? If so, could
> we avoid that?
>
> Jun
>
> On Thu, Jul 11, 2019 at 5:23 PM Colin McCabe <cm...@apache.org> wrote:
>
> > Hi Justine,
> >
> > I agree that we shouldn't change RoundRobinPartitioner, since its
> behavior
> > is already specified.
> >
> > However, we could add a new, separate StickyRoundRobinPartitioner class
> to
> > KIP-480 which just implemented the sticky behavior regardless of whether
> > the key was null.  That seems pretty easy to add (and it wouldn't have to
> > implemented right away in the first PR, of course.)  It would be an
> option
> > for people who wanted to configure this behavior.
> >
> > best,
> > Colin
> >
> >
> > On Wed, Jul 10, 2019, at 08:48, Justine Olshan wrote:
> > > Hi M,
> > >
> > > I'm a little confused by what you mean by extending the behavior on to
> > the
> > > RoundRobinPartitioner.
> > > The sticky partitioner plans to remove the round-robin behavior from
> > > records with no keys. Instead of sending them to each partition in
> order,
> > > it sends them all to the same partition until the batch is sent.
> > > I don't think you can have both round-robin and sticky partition
> > behavior.
> > >
> > > Thank you,
> > > Justine Olshan
> > >
> > > On Wed, Jul 10, 2019 at 1:54 AM M. Manna <ma...@gmail.com> wrote:
> > >
> > > > Thanks for the comments Colin.
> > > >
> > > > My only concern is that this KIP is addressing a good feature and
> > having
> > > > that extended to RoundRobinPartitioner means 1 less KIP in the
> future.
> > > >
> > > > Would it be appropriate to extend the support to
> RoundRobinPartitioner
> > too?
> > > >
> > > > Thanks,
> > > >
> > > > On Tue, 9 Jul 2019 at 17:24, Colin McCabe <cm...@apache.org>
> wrote:
> > > >
> > > > > Hi M,
> > > > >
> > > > > The RoundRobinPartitioner added by KIP-369 doesn't interact with
> this
> > > > > KIP.  If you configure your producer to use RoundRobinPartitioner,
> > then
> > > > the
> > > > > DefaultPartitioner will not be used.  And the "sticky" behavior is
> > > > > implemented only in the DefaultPartitioner.
> > > > >
> > > > > regards,
> > > > > Colin
> > > > >
> > > > >
> > > > > On Tue, Jul 9, 2019, at 05:12, M. Manna wrote:
> > > > > > Hello Justine,
> > > > > >
> > > > > > I have one item I wanted to discuss.
> > > > > >
> > > > > > We are currently in review stage for KAFKA-3333 where we can
> choose
> > > > > always
> > > > > > RoundRobin regardless of null/usable key.
> > > > > >
> > > > > > If I understood this KIP motivation correctly, you are still
> > honouring
> > > > > how
> > > > > > the hashing of key works for DefaultPartitioner. Would you say
> that
> > > > > having
> > > > > > an always "Round-Robin" partitioning with "Sticky" assignment
> > > > (efficient
> > > > > > batching of records for a partition) doesn't deviate from your
> > original
> > > > > > intention?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > On Tue, 9 Jul 2019 at 01:00, Justine Olshan <
> jolshan@confluent.io>
> > > > > wrote:
> > > > > >
> > > > > > > Hello all,
> > > > > > >
> > > > > > > If there are no more comments or concerns, I would like to
> start
> > the
> > > > > vote
> > > > > > > on this tomorrow afternoon.
> > > > > > >
> > > > > > > However, if there are still topics to discuss, feel free to
> bring
> > > > them
> > > > > up
> > > > > > > now.
> > > > > > >
> > > > > > > Thank you,
> > > > > > > Justine
> > > > > > >
> > > > > > > On Tue, Jul 2, 2019 at 4:25 PM Justine Olshan <
> > jolshan@confluent.io>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hello again,
> > > > > > > >
> > > > > > > > Another update to the interface has been made to the KIP.
> > > > > > > > Please let me know if you have any feedback!
> > > > > > > >
> > > > > > > > Thank you,
> > > > > > > > Justine
> > > > > > > >
> > > > > > > > On Fri, Jun 28, 2019 at 2:52 PM Justine Olshan <
> > > > jolshan@confluent.io
> > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > >> Hi all,
> > > > > > > >> I made some changes to the KIP.
> > > > > > > >> The idea is to clean up the code, make behavior more
> explicit,
> > > > > provide
> > > > > > > >> more flexibility, and to keep default behavior the same.
> > > > > > > >>
> > > > > > > >> Now we will change the partition in onNewBatch, and specify
> > the
> > > > > > > >> conditions for this function call (non-keyed values, no
> > explicit
> > > > > > > >> partitions) in willCallOnNewBatch.
> > > > > > > >> This clears up some of the issues with the implementation.
> I'm
> > > > > happy to
> > > > > > > >> hear further opinions and discuss this change!
> > > > > > > >>
> > > > > > > >> Thank you,
> > > > > > > >> Justine
> > > > > > > >>
> > > > > > > >> On Thu, Jun 27, 2019 at 2:53 PM Colin McCabe <
> > cmccabe@apache.org>
> > > > > > > wrote:
> > > > > > > >>
> > > > > > > >>> On Thu, Jun 27, 2019, at 01:31, Ismael Juma wrote:
> > > > > > > >>> > Thanks for the KIP Justine. It looks pretty good. A few
> > > > comments:
> > > > > > > >>> >
> > > > > > > >>> > 1. Should we favor partitions that are not under
> > replicated?
> > > > > This is
> > > > > > > >>> > something that Netflix did too.
> > > > > > > >>>
> > > > > > > >>> This seems like it could lead to cascading failures, right?
> > If a
> > > > > > > >>> partition becomes under-replicated because there is too
> much
> > > > > traffic,
> > > > > > > the
> > > > > > > >>> producer stops sending to it, which puts even more load on
> > the
> > > > > > > remaining
> > > > > > > >>> partitions, which are even more likely to fail then, etc.
> It
> > > > also
> > > > > will
> > > > > > > >>> create unbalanced load patterns on the consumers.
> > > > > > > >>>
> > > > > > > >>> >
> > > > > > > >>> > 2. If there's no measurable performance difference, I
> agree
> > > > with
> > > > > > > >>> Stanislav
> > > > > > > >>> > that Optional would be better than Integer.
> > > > > > > >>> >
> > > > > > > >>> > 3. We should include the javadoc for the newly introduced
> > > > method
> > > > > that
> > > > > > > >>> > specifies it and its parameters. In particular, it would
> > good
> > > > to
> > > > > > > >>> specify if
> > > > > > > >>> > it gets called when an explicit partition id has been
> > provided.
> > > > > > > >>>
> > > > > > > >>> Agreed.
> > > > > > > >>>
> > > > > > > >>> best,
> > > > > > > >>> Colin
> > > > > > > >>>
> > > > > > > >>> >
> > > > > > > >>> > Ismael
> > > > > > > >>> >
> > > > > > > >>> > On Mon, Jun 24, 2019, 2:04 PM Justine Olshan <
> > > > > jolshan@confluent.io>
> > > > > > > >>> wrote:
> > > > > > > >>> >
> > > > > > > >>> > > Hello,
> > > > > > > >>> > > This is the discussion thread for KIP-480: Sticky
> > > > Partitioner.
> > > > > > > >>> > >
> > > > > > > >>> > >
> > > > > > > >>> > >
> > > > > > > >>>
> > > > > > >
> > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
> > > > > > > >>> > >
> > > > > > > >>> > > Thank you,
> > > > > > > >>> > > Justine Olshan
> > > > > > > >>> > >
> > > > > > > >>> >
> > > > > > > >>>
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-480 : Sticky Partitioner

Posted by Jun Rao <ju...@confluent.io>.
Hi, Justine,

Thanks for the KIP. Nice writeup and great results. Just one comment.

100. To add a record to the accumulator, the producer needs to know the
partition id. The decision of whether the record can be added to the
current batch is only made after the accumulator.append() call. So, when a
batch is full, it seems that the KIP will try to append the next record to
the same partition, which will trigger the creation of a new batch with a
single record. After that, new records will be routed to a new partition.
If the producer doesn't come back to the first partition in time, the
producer will send a single record batch. In the worse case, it can be that
every other batch has only a single record. Is this correct? If so, could
we avoid that?

Jun

On Thu, Jul 11, 2019 at 5:23 PM Colin McCabe <cm...@apache.org> wrote:

> Hi Justine,
>
> I agree that we shouldn't change RoundRobinPartitioner, since its behavior
> is already specified.
>
> However, we could add a new, separate StickyRoundRobinPartitioner class to
> KIP-480 which just implemented the sticky behavior regardless of whether
> the key was null.  That seems pretty easy to add (and it wouldn't have to
> implemented right away in the first PR, of course.)  It would be an option
> for people who wanted to configure this behavior.
>
> best,
> Colin
>
>
> On Wed, Jul 10, 2019, at 08:48, Justine Olshan wrote:
> > Hi M,
> >
> > I'm a little confused by what you mean by extending the behavior on to
> the
> > RoundRobinPartitioner.
> > The sticky partitioner plans to remove the round-robin behavior from
> > records with no keys. Instead of sending them to each partition in order,
> > it sends them all to the same partition until the batch is sent.
> > I don't think you can have both round-robin and sticky partition
> behavior.
> >
> > Thank you,
> > Justine Olshan
> >
> > On Wed, Jul 10, 2019 at 1:54 AM M. Manna <ma...@gmail.com> wrote:
> >
> > > Thanks for the comments Colin.
> > >
> > > My only concern is that this KIP is addressing a good feature and
> having
> > > that extended to RoundRobinPartitioner means 1 less KIP in the future.
> > >
> > > Would it be appropriate to extend the support to RoundRobinPartitioner
> too?
> > >
> > > Thanks,
> > >
> > > On Tue, 9 Jul 2019 at 17:24, Colin McCabe <cm...@apache.org> wrote:
> > >
> > > > Hi M,
> > > >
> > > > The RoundRobinPartitioner added by KIP-369 doesn't interact with this
> > > > KIP.  If you configure your producer to use RoundRobinPartitioner,
> then
> > > the
> > > > DefaultPartitioner will not be used.  And the "sticky" behavior is
> > > > implemented only in the DefaultPartitioner.
> > > >
> > > > regards,
> > > > Colin
> > > >
> > > >
> > > > On Tue, Jul 9, 2019, at 05:12, M. Manna wrote:
> > > > > Hello Justine,
> > > > >
> > > > > I have one item I wanted to discuss.
> > > > >
> > > > > We are currently in review stage for KAFKA-3333 where we can choose
> > > > always
> > > > > RoundRobin regardless of null/usable key.
> > > > >
> > > > > If I understood this KIP motivation correctly, you are still
> honouring
> > > > how
> > > > > the hashing of key works for DefaultPartitioner. Would you say that
> > > > having
> > > > > an always "Round-Robin" partitioning with "Sticky" assignment
> > > (efficient
> > > > > batching of records for a partition) doesn't deviate from your
> original
> > > > > intention?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > On Tue, 9 Jul 2019 at 01:00, Justine Olshan <jo...@confluent.io>
> > > > wrote:
> > > > >
> > > > > > Hello all,
> > > > > >
> > > > > > If there are no more comments or concerns, I would like to start
> the
> > > > vote
> > > > > > on this tomorrow afternoon.
> > > > > >
> > > > > > However, if there are still topics to discuss, feel free to bring
> > > them
> > > > up
> > > > > > now.
> > > > > >
> > > > > > Thank you,
> > > > > > Justine
> > > > > >
> > > > > > On Tue, Jul 2, 2019 at 4:25 PM Justine Olshan <
> jolshan@confluent.io>
> > > > > > wrote:
> > > > > >
> > > > > > > Hello again,
> > > > > > >
> > > > > > > Another update to the interface has been made to the KIP.
> > > > > > > Please let me know if you have any feedback!
> > > > > > >
> > > > > > > Thank you,
> > > > > > > Justine
> > > > > > >
> > > > > > > On Fri, Jun 28, 2019 at 2:52 PM Justine Olshan <
> > > jolshan@confluent.io
> > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > >> Hi all,
> > > > > > >> I made some changes to the KIP.
> > > > > > >> The idea is to clean up the code, make behavior more explicit,
> > > > provide
> > > > > > >> more flexibility, and to keep default behavior the same.
> > > > > > >>
> > > > > > >> Now we will change the partition in onNewBatch, and specify
> the
> > > > > > >> conditions for this function call (non-keyed values, no
> explicit
> > > > > > >> partitions) in willCallOnNewBatch.
> > > > > > >> This clears up some of the issues with the implementation. I'm
> > > > happy to
> > > > > > >> hear further opinions and discuss this change!
> > > > > > >>
> > > > > > >> Thank you,
> > > > > > >> Justine
> > > > > > >>
> > > > > > >> On Thu, Jun 27, 2019 at 2:53 PM Colin McCabe <
> cmccabe@apache.org>
> > > > > > wrote:
> > > > > > >>
> > > > > > >>> On Thu, Jun 27, 2019, at 01:31, Ismael Juma wrote:
> > > > > > >>> > Thanks for the KIP Justine. It looks pretty good. A few
> > > comments:
> > > > > > >>> >
> > > > > > >>> > 1. Should we favor partitions that are not under
> replicated?
> > > > This is
> > > > > > >>> > something that Netflix did too.
> > > > > > >>>
> > > > > > >>> This seems like it could lead to cascading failures, right?
> If a
> > > > > > >>> partition becomes under-replicated because there is too much
> > > > traffic,
> > > > > > the
> > > > > > >>> producer stops sending to it, which puts even more load on
> the
> > > > > > remaining
> > > > > > >>> partitions, which are even more likely to fail then, etc.  It
> > > also
> > > > will
> > > > > > >>> create unbalanced load patterns on the consumers.
> > > > > > >>>
> > > > > > >>> >
> > > > > > >>> > 2. If there's no measurable performance difference, I agree
> > > with
> > > > > > >>> Stanislav
> > > > > > >>> > that Optional would be better than Integer.
> > > > > > >>> >
> > > > > > >>> > 3. We should include the javadoc for the newly introduced
> > > method
> > > > that
> > > > > > >>> > specifies it and its parameters. In particular, it would
> good
> > > to
> > > > > > >>> specify if
> > > > > > >>> > it gets called when an explicit partition id has been
> provided.
> > > > > > >>>
> > > > > > >>> Agreed.
> > > > > > >>>
> > > > > > >>> best,
> > > > > > >>> Colin
> > > > > > >>>
> > > > > > >>> >
> > > > > > >>> > Ismael
> > > > > > >>> >
> > > > > > >>> > On Mon, Jun 24, 2019, 2:04 PM Justine Olshan <
> > > > jolshan@confluent.io>
> > > > > > >>> wrote:
> > > > > > >>> >
> > > > > > >>> > > Hello,
> > > > > > >>> > > This is the discussion thread for KIP-480: Sticky
> > > Partitioner.
> > > > > > >>> > >
> > > > > > >>> > >
> > > > > > >>> > >
> > > > > > >>>
> > > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
> > > > > > >>> > >
> > > > > > >>> > > Thank you,
> > > > > > >>> > > Justine Olshan
> > > > > > >>> > >
> > > > > > >>> >
> > > > > > >>>
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-480 : Sticky Partitioner

Posted by Colin McCabe <cm...@apache.org>.
Hi Justine,

I agree that we shouldn't change RoundRobinPartitioner, since its behavior is already specified.

However, we could add a new, separate StickyRoundRobinPartitioner class to KIP-480 which just implemented the sticky behavior regardless of whether the key was null.  That seems pretty easy to add (and it wouldn't have to implemented right away in the first PR, of course.)  It would be an option for people who wanted to configure this behavior.

best,
Colin


On Wed, Jul 10, 2019, at 08:48, Justine Olshan wrote:
> Hi M,
> 
> I'm a little confused by what you mean by extending the behavior on to the
> RoundRobinPartitioner.
> The sticky partitioner plans to remove the round-robin behavior from
> records with no keys. Instead of sending them to each partition in order,
> it sends them all to the same partition until the batch is sent.
> I don't think you can have both round-robin and sticky partition behavior.
> 
> Thank you,
> Justine Olshan
> 
> On Wed, Jul 10, 2019 at 1:54 AM M. Manna <ma...@gmail.com> wrote:
> 
> > Thanks for the comments Colin.
> >
> > My only concern is that this KIP is addressing a good feature and having
> > that extended to RoundRobinPartitioner means 1 less KIP in the future.
> >
> > Would it be appropriate to extend the support to RoundRobinPartitioner too?
> >
> > Thanks,
> >
> > On Tue, 9 Jul 2019 at 17:24, Colin McCabe <cm...@apache.org> wrote:
> >
> > > Hi M,
> > >
> > > The RoundRobinPartitioner added by KIP-369 doesn't interact with this
> > > KIP.  If you configure your producer to use RoundRobinPartitioner, then
> > the
> > > DefaultPartitioner will not be used.  And the "sticky" behavior is
> > > implemented only in the DefaultPartitioner.
> > >
> > > regards,
> > > Colin
> > >
> > >
> > > On Tue, Jul 9, 2019, at 05:12, M. Manna wrote:
> > > > Hello Justine,
> > > >
> > > > I have one item I wanted to discuss.
> > > >
> > > > We are currently in review stage for KAFKA-3333 where we can choose
> > > always
> > > > RoundRobin regardless of null/usable key.
> > > >
> > > > If I understood this KIP motivation correctly, you are still honouring
> > > how
> > > > the hashing of key works for DefaultPartitioner. Would you say that
> > > having
> > > > an always "Round-Robin" partitioning with "Sticky" assignment
> > (efficient
> > > > batching of records for a partition) doesn't deviate from your original
> > > > intention?
> > > >
> > > > Thanks,
> > > >
> > > > On Tue, 9 Jul 2019 at 01:00, Justine Olshan <jo...@confluent.io>
> > > wrote:
> > > >
> > > > > Hello all,
> > > > >
> > > > > If there are no more comments or concerns, I would like to start the
> > > vote
> > > > > on this tomorrow afternoon.
> > > > >
> > > > > However, if there are still topics to discuss, feel free to bring
> > them
> > > up
> > > > > now.
> > > > >
> > > > > Thank you,
> > > > > Justine
> > > > >
> > > > > On Tue, Jul 2, 2019 at 4:25 PM Justine Olshan <jo...@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > Hello again,
> > > > > >
> > > > > > Another update to the interface has been made to the KIP.
> > > > > > Please let me know if you have any feedback!
> > > > > >
> > > > > > Thank you,
> > > > > > Justine
> > > > > >
> > > > > > On Fri, Jun 28, 2019 at 2:52 PM Justine Olshan <
> > jolshan@confluent.io
> > > >
> > > > > > wrote:
> > > > > >
> > > > > >> Hi all,
> > > > > >> I made some changes to the KIP.
> > > > > >> The idea is to clean up the code, make behavior more explicit,
> > > provide
> > > > > >> more flexibility, and to keep default behavior the same.
> > > > > >>
> > > > > >> Now we will change the partition in onNewBatch, and specify the
> > > > > >> conditions for this function call (non-keyed values, no explicit
> > > > > >> partitions) in willCallOnNewBatch.
> > > > > >> This clears up some of the issues with the implementation. I'm
> > > happy to
> > > > > >> hear further opinions and discuss this change!
> > > > > >>
> > > > > >> Thank you,
> > > > > >> Justine
> > > > > >>
> > > > > >> On Thu, Jun 27, 2019 at 2:53 PM Colin McCabe <cm...@apache.org>
> > > > > wrote:
> > > > > >>
> > > > > >>> On Thu, Jun 27, 2019, at 01:31, Ismael Juma wrote:
> > > > > >>> > Thanks for the KIP Justine. It looks pretty good. A few
> > comments:
> > > > > >>> >
> > > > > >>> > 1. Should we favor partitions that are not under replicated?
> > > This is
> > > > > >>> > something that Netflix did too.
> > > > > >>>
> > > > > >>> This seems like it could lead to cascading failures, right?  If a
> > > > > >>> partition becomes under-replicated because there is too much
> > > traffic,
> > > > > the
> > > > > >>> producer stops sending to it, which puts even more load on the
> > > > > remaining
> > > > > >>> partitions, which are even more likely to fail then, etc.  It
> > also
> > > will
> > > > > >>> create unbalanced load patterns on the consumers.
> > > > > >>>
> > > > > >>> >
> > > > > >>> > 2. If there's no measurable performance difference, I agree
> > with
> > > > > >>> Stanislav
> > > > > >>> > that Optional would be better than Integer.
> > > > > >>> >
> > > > > >>> > 3. We should include the javadoc for the newly introduced
> > method
> > > that
> > > > > >>> > specifies it and its parameters. In particular, it would good
> > to
> > > > > >>> specify if
> > > > > >>> > it gets called when an explicit partition id has been provided.
> > > > > >>>
> > > > > >>> Agreed.
> > > > > >>>
> > > > > >>> best,
> > > > > >>> Colin
> > > > > >>>
> > > > > >>> >
> > > > > >>> > Ismael
> > > > > >>> >
> > > > > >>> > On Mon, Jun 24, 2019, 2:04 PM Justine Olshan <
> > > jolshan@confluent.io>
> > > > > >>> wrote:
> > > > > >>> >
> > > > > >>> > > Hello,
> > > > > >>> > > This is the discussion thread for KIP-480: Sticky
> > Partitioner.
> > > > > >>> > >
> > > > > >>> > >
> > > > > >>> > >
> > > > > >>>
> > > > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
> > > > > >>> > >
> > > > > >>> > > Thank you,
> > > > > >>> > > Justine Olshan
> > > > > >>> > >
> > > > > >>> >
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-480 : Sticky Partitioner

Posted by Justine Olshan <jo...@confluent.io>.
Hi M,

I'm a little confused by what you mean by extending the behavior on to the
RoundRobinPartitioner.
The sticky partitioner plans to remove the round-robin behavior from
records with no keys. Instead of sending them to each partition in order,
it sends them all to the same partition until the batch is sent.
I don't think you can have both round-robin and sticky partition behavior.

Thank you,
Justine Olshan

On Wed, Jul 10, 2019 at 1:54 AM M. Manna <ma...@gmail.com> wrote:

> Thanks for the comments Colin.
>
> My only concern is that this KIP is addressing a good feature and having
> that extended to RoundRobinPartitioner means 1 less KIP in the future.
>
> Would it be appropriate to extend the support to RoundRobinPartitioner too?
>
> Thanks,
>
> On Tue, 9 Jul 2019 at 17:24, Colin McCabe <cm...@apache.org> wrote:
>
> > Hi M,
> >
> > The RoundRobinPartitioner added by KIP-369 doesn't interact with this
> > KIP.  If you configure your producer to use RoundRobinPartitioner, then
> the
> > DefaultPartitioner will not be used.  And the "sticky" behavior is
> > implemented only in the DefaultPartitioner.
> >
> > regards,
> > Colin
> >
> >
> > On Tue, Jul 9, 2019, at 05:12, M. Manna wrote:
> > > Hello Justine,
> > >
> > > I have one item I wanted to discuss.
> > >
> > > We are currently in review stage for KAFKA-3333 where we can choose
> > always
> > > RoundRobin regardless of null/usable key.
> > >
> > > If I understood this KIP motivation correctly, you are still honouring
> > how
> > > the hashing of key works for DefaultPartitioner. Would you say that
> > having
> > > an always "Round-Robin" partitioning with "Sticky" assignment
> (efficient
> > > batching of records for a partition) doesn't deviate from your original
> > > intention?
> > >
> > > Thanks,
> > >
> > > On Tue, 9 Jul 2019 at 01:00, Justine Olshan <jo...@confluent.io>
> > wrote:
> > >
> > > > Hello all,
> > > >
> > > > If there are no more comments or concerns, I would like to start the
> > vote
> > > > on this tomorrow afternoon.
> > > >
> > > > However, if there are still topics to discuss, feel free to bring
> them
> > up
> > > > now.
> > > >
> > > > Thank you,
> > > > Justine
> > > >
> > > > On Tue, Jul 2, 2019 at 4:25 PM Justine Olshan <jo...@confluent.io>
> > > > wrote:
> > > >
> > > > > Hello again,
> > > > >
> > > > > Another update to the interface has been made to the KIP.
> > > > > Please let me know if you have any feedback!
> > > > >
> > > > > Thank you,
> > > > > Justine
> > > > >
> > > > > On Fri, Jun 28, 2019 at 2:52 PM Justine Olshan <
> jolshan@confluent.io
> > >
> > > > > wrote:
> > > > >
> > > > >> Hi all,
> > > > >> I made some changes to the KIP.
> > > > >> The idea is to clean up the code, make behavior more explicit,
> > provide
> > > > >> more flexibility, and to keep default behavior the same.
> > > > >>
> > > > >> Now we will change the partition in onNewBatch, and specify the
> > > > >> conditions for this function call (non-keyed values, no explicit
> > > > >> partitions) in willCallOnNewBatch.
> > > > >> This clears up some of the issues with the implementation. I'm
> > happy to
> > > > >> hear further opinions and discuss this change!
> > > > >>
> > > > >> Thank you,
> > > > >> Justine
> > > > >>
> > > > >> On Thu, Jun 27, 2019 at 2:53 PM Colin McCabe <cm...@apache.org>
> > > > wrote:
> > > > >>
> > > > >>> On Thu, Jun 27, 2019, at 01:31, Ismael Juma wrote:
> > > > >>> > Thanks for the KIP Justine. It looks pretty good. A few
> comments:
> > > > >>> >
> > > > >>> > 1. Should we favor partitions that are not under replicated?
> > This is
> > > > >>> > something that Netflix did too.
> > > > >>>
> > > > >>> This seems like it could lead to cascading failures, right?  If a
> > > > >>> partition becomes under-replicated because there is too much
> > traffic,
> > > > the
> > > > >>> producer stops sending to it, which puts even more load on the
> > > > remaining
> > > > >>> partitions, which are even more likely to fail then, etc.  It
> also
> > will
> > > > >>> create unbalanced load patterns on the consumers.
> > > > >>>
> > > > >>> >
> > > > >>> > 2. If there's no measurable performance difference, I agree
> with
> > > > >>> Stanislav
> > > > >>> > that Optional would be better than Integer.
> > > > >>> >
> > > > >>> > 3. We should include the javadoc for the newly introduced
> method
> > that
> > > > >>> > specifies it and its parameters. In particular, it would good
> to
> > > > >>> specify if
> > > > >>> > it gets called when an explicit partition id has been provided.
> > > > >>>
> > > > >>> Agreed.
> > > > >>>
> > > > >>> best,
> > > > >>> Colin
> > > > >>>
> > > > >>> >
> > > > >>> > Ismael
> > > > >>> >
> > > > >>> > On Mon, Jun 24, 2019, 2:04 PM Justine Olshan <
> > jolshan@confluent.io>
> > > > >>> wrote:
> > > > >>> >
> > > > >>> > > Hello,
> > > > >>> > > This is the discussion thread for KIP-480: Sticky
> Partitioner.
> > > > >>> > >
> > > > >>> > >
> > > > >>> > >
> > > > >>>
> > > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
> > > > >>> > >
> > > > >>> > > Thank you,
> > > > >>> > > Justine Olshan
> > > > >>> > >
> > > > >>> >
> > > > >>>
> > > > >>
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-480 : Sticky Partitioner

Posted by "M. Manna" <ma...@gmail.com>.
Thanks for the comments Colin.

My only concern is that this KIP is addressing a good feature and having
that extended to RoundRobinPartitioner means 1 less KIP in the future.

Would it be appropriate to extend the support to RoundRobinPartitioner too?

Thanks,

On Tue, 9 Jul 2019 at 17:24, Colin McCabe <cm...@apache.org> wrote:

> Hi M,
>
> The RoundRobinPartitioner added by KIP-369 doesn't interact with this
> KIP.  If you configure your producer to use RoundRobinPartitioner, then the
> DefaultPartitioner will not be used.  And the "sticky" behavior is
> implemented only in the DefaultPartitioner.
>
> regards,
> Colin
>
>
> On Tue, Jul 9, 2019, at 05:12, M. Manna wrote:
> > Hello Justine,
> >
> > I have one item I wanted to discuss.
> >
> > We are currently in review stage for KAFKA-3333 where we can choose
> always
> > RoundRobin regardless of null/usable key.
> >
> > If I understood this KIP motivation correctly, you are still honouring
> how
> > the hashing of key works for DefaultPartitioner. Would you say that
> having
> > an always "Round-Robin" partitioning with "Sticky" assignment (efficient
> > batching of records for a partition) doesn't deviate from your original
> > intention?
> >
> > Thanks,
> >
> > On Tue, 9 Jul 2019 at 01:00, Justine Olshan <jo...@confluent.io>
> wrote:
> >
> > > Hello all,
> > >
> > > If there are no more comments or concerns, I would like to start the
> vote
> > > on this tomorrow afternoon.
> > >
> > > However, if there are still topics to discuss, feel free to bring them
> up
> > > now.
> > >
> > > Thank you,
> > > Justine
> > >
> > > On Tue, Jul 2, 2019 at 4:25 PM Justine Olshan <jo...@confluent.io>
> > > wrote:
> > >
> > > > Hello again,
> > > >
> > > > Another update to the interface has been made to the KIP.
> > > > Please let me know if you have any feedback!
> > > >
> > > > Thank you,
> > > > Justine
> > > >
> > > > On Fri, Jun 28, 2019 at 2:52 PM Justine Olshan <jolshan@confluent.io
> >
> > > > wrote:
> > > >
> > > >> Hi all,
> > > >> I made some changes to the KIP.
> > > >> The idea is to clean up the code, make behavior more explicit,
> provide
> > > >> more flexibility, and to keep default behavior the same.
> > > >>
> > > >> Now we will change the partition in onNewBatch, and specify the
> > > >> conditions for this function call (non-keyed values, no explicit
> > > >> partitions) in willCallOnNewBatch.
> > > >> This clears up some of the issues with the implementation. I'm
> happy to
> > > >> hear further opinions and discuss this change!
> > > >>
> > > >> Thank you,
> > > >> Justine
> > > >>
> > > >> On Thu, Jun 27, 2019 at 2:53 PM Colin McCabe <cm...@apache.org>
> > > wrote:
> > > >>
> > > >>> On Thu, Jun 27, 2019, at 01:31, Ismael Juma wrote:
> > > >>> > Thanks for the KIP Justine. It looks pretty good. A few comments:
> > > >>> >
> > > >>> > 1. Should we favor partitions that are not under replicated?
> This is
> > > >>> > something that Netflix did too.
> > > >>>
> > > >>> This seems like it could lead to cascading failures, right?  If a
> > > >>> partition becomes under-replicated because there is too much
> traffic,
> > > the
> > > >>> producer stops sending to it, which puts even more load on the
> > > remaining
> > > >>> partitions, which are even more likely to fail then, etc.  It also
> will
> > > >>> create unbalanced load patterns on the consumers.
> > > >>>
> > > >>> >
> > > >>> > 2. If there's no measurable performance difference, I agree with
> > > >>> Stanislav
> > > >>> > that Optional would be better than Integer.
> > > >>> >
> > > >>> > 3. We should include the javadoc for the newly introduced method
> that
> > > >>> > specifies it and its parameters. In particular, it would good to
> > > >>> specify if
> > > >>> > it gets called when an explicit partition id has been provided.
> > > >>>
> > > >>> Agreed.
> > > >>>
> > > >>> best,
> > > >>> Colin
> > > >>>
> > > >>> >
> > > >>> > Ismael
> > > >>> >
> > > >>> > On Mon, Jun 24, 2019, 2:04 PM Justine Olshan <
> jolshan@confluent.io>
> > > >>> wrote:
> > > >>> >
> > > >>> > > Hello,
> > > >>> > > This is the discussion thread for KIP-480: Sticky Partitioner.
> > > >>> > >
> > > >>> > >
> > > >>> > >
> > > >>>
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
> > > >>> > >
> > > >>> > > Thank you,
> > > >>> > > Justine Olshan
> > > >>> > >
> > > >>> >
> > > >>>
> > > >>
> > >
> >
>

Re: [DISCUSS] KIP-480 : Sticky Partitioner

Posted by Colin McCabe <cm...@apache.org>.
Hi M,

The RoundRobinPartitioner added by KIP-369 doesn't interact with this KIP.  If you configure your producer to use RoundRobinPartitioner, then the DefaultPartitioner will not be used.  And the "sticky" behavior is implemented only in the DefaultPartitioner.

regards,
Colin


On Tue, Jul 9, 2019, at 05:12, M. Manna wrote:
> Hello Justine,
> 
> I have one item I wanted to discuss.
> 
> We are currently in review stage for KAFKA-3333 where we can choose always
> RoundRobin regardless of null/usable key.
> 
> If I understood this KIP motivation correctly, you are still honouring how
> the hashing of key works for DefaultPartitioner. Would you say that having
> an always "Round-Robin" partitioning with "Sticky" assignment (efficient
> batching of records for a partition) doesn't deviate from your original
> intention?
> 
> Thanks,
> 
> On Tue, 9 Jul 2019 at 01:00, Justine Olshan <jo...@confluent.io> wrote:
> 
> > Hello all,
> >
> > If there are no more comments or concerns, I would like to start the vote
> > on this tomorrow afternoon.
> >
> > However, if there are still topics to discuss, feel free to bring them up
> > now.
> >
> > Thank you,
> > Justine
> >
> > On Tue, Jul 2, 2019 at 4:25 PM Justine Olshan <jo...@confluent.io>
> > wrote:
> >
> > > Hello again,
> > >
> > > Another update to the interface has been made to the KIP.
> > > Please let me know if you have any feedback!
> > >
> > > Thank you,
> > > Justine
> > >
> > > On Fri, Jun 28, 2019 at 2:52 PM Justine Olshan <jo...@confluent.io>
> > > wrote:
> > >
> > >> Hi all,
> > >> I made some changes to the KIP.
> > >> The idea is to clean up the code, make behavior more explicit, provide
> > >> more flexibility, and to keep default behavior the same.
> > >>
> > >> Now we will change the partition in onNewBatch, and specify the
> > >> conditions for this function call (non-keyed values, no explicit
> > >> partitions) in willCallOnNewBatch.
> > >> This clears up some of the issues with the implementation. I'm happy to
> > >> hear further opinions and discuss this change!
> > >>
> > >> Thank you,
> > >> Justine
> > >>
> > >> On Thu, Jun 27, 2019 at 2:53 PM Colin McCabe <cm...@apache.org>
> > wrote:
> > >>
> > >>> On Thu, Jun 27, 2019, at 01:31, Ismael Juma wrote:
> > >>> > Thanks for the KIP Justine. It looks pretty good. A few comments:
> > >>> >
> > >>> > 1. Should we favor partitions that are not under replicated? This is
> > >>> > something that Netflix did too.
> > >>>
> > >>> This seems like it could lead to cascading failures, right?  If a
> > >>> partition becomes under-replicated because there is too much traffic,
> > the
> > >>> producer stops sending to it, which puts even more load on the
> > remaining
> > >>> partitions, which are even more likely to fail then, etc.  It also will
> > >>> create unbalanced load patterns on the consumers.
> > >>>
> > >>> >
> > >>> > 2. If there's no measurable performance difference, I agree with
> > >>> Stanislav
> > >>> > that Optional would be better than Integer.
> > >>> >
> > >>> > 3. We should include the javadoc for the newly introduced method that
> > >>> > specifies it and its parameters. In particular, it would good to
> > >>> specify if
> > >>> > it gets called when an explicit partition id has been provided.
> > >>>
> > >>> Agreed.
> > >>>
> > >>> best,
> > >>> Colin
> > >>>
> > >>> >
> > >>> > Ismael
> > >>> >
> > >>> > On Mon, Jun 24, 2019, 2:04 PM Justine Olshan <jo...@confluent.io>
> > >>> wrote:
> > >>> >
> > >>> > > Hello,
> > >>> > > This is the discussion thread for KIP-480: Sticky Partitioner.
> > >>> > >
> > >>> > >
> > >>> > >
> > >>>
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
> > >>> > >
> > >>> > > Thank you,
> > >>> > > Justine Olshan
> > >>> > >
> > >>> >
> > >>>
> > >>
> >
>

Re: [DISCUSS] KIP-480 : Sticky Partitioner

Posted by "M. Manna" <ma...@gmail.com>.
Hello Justine,

I have one item I wanted to discuss.

We are currently in review stage for KAFKA-3333 where we can choose always
RoundRobin regardless of null/usable key.

If I understood this KIP motivation correctly, you are still honouring how
the hashing of key works for DefaultPartitioner. Would you say that having
an always "Round-Robin" partitioning with "Sticky" assignment (efficient
batching of records for a partition) doesn't deviate from your original
intention?

Thanks,

On Tue, 9 Jul 2019 at 01:00, Justine Olshan <jo...@confluent.io> wrote:

> Hello all,
>
> If there are no more comments or concerns, I would like to start the vote
> on this tomorrow afternoon.
>
> However, if there are still topics to discuss, feel free to bring them up
> now.
>
> Thank you,
> Justine
>
> On Tue, Jul 2, 2019 at 4:25 PM Justine Olshan <jo...@confluent.io>
> wrote:
>
> > Hello again,
> >
> > Another update to the interface has been made to the KIP.
> > Please let me know if you have any feedback!
> >
> > Thank you,
> > Justine
> >
> > On Fri, Jun 28, 2019 at 2:52 PM Justine Olshan <jo...@confluent.io>
> > wrote:
> >
> >> Hi all,
> >> I made some changes to the KIP.
> >> The idea is to clean up the code, make behavior more explicit, provide
> >> more flexibility, and to keep default behavior the same.
> >>
> >> Now we will change the partition in onNewBatch, and specify the
> >> conditions for this function call (non-keyed values, no explicit
> >> partitions) in willCallOnNewBatch.
> >> This clears up some of the issues with the implementation. I'm happy to
> >> hear further opinions and discuss this change!
> >>
> >> Thank you,
> >> Justine
> >>
> >> On Thu, Jun 27, 2019 at 2:53 PM Colin McCabe <cm...@apache.org>
> wrote:
> >>
> >>> On Thu, Jun 27, 2019, at 01:31, Ismael Juma wrote:
> >>> > Thanks for the KIP Justine. It looks pretty good. A few comments:
> >>> >
> >>> > 1. Should we favor partitions that are not under replicated? This is
> >>> > something that Netflix did too.
> >>>
> >>> This seems like it could lead to cascading failures, right?  If a
> >>> partition becomes under-replicated because there is too much traffic,
> the
> >>> producer stops sending to it, which puts even more load on the
> remaining
> >>> partitions, which are even more likely to fail then, etc.  It also will
> >>> create unbalanced load patterns on the consumers.
> >>>
> >>> >
> >>> > 2. If there's no measurable performance difference, I agree with
> >>> Stanislav
> >>> > that Optional would be better than Integer.
> >>> >
> >>> > 3. We should include the javadoc for the newly introduced method that
> >>> > specifies it and its parameters. In particular, it would good to
> >>> specify if
> >>> > it gets called when an explicit partition id has been provided.
> >>>
> >>> Agreed.
> >>>
> >>> best,
> >>> Colin
> >>>
> >>> >
> >>> > Ismael
> >>> >
> >>> > On Mon, Jun 24, 2019, 2:04 PM Justine Olshan <jo...@confluent.io>
> >>> wrote:
> >>> >
> >>> > > Hello,
> >>> > > This is the discussion thread for KIP-480: Sticky Partitioner.
> >>> > >
> >>> > >
> >>> > >
> >>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
> >>> > >
> >>> > > Thank you,
> >>> > > Justine Olshan
> >>> > >
> >>> >
> >>>
> >>
>

Re: [DISCUSS] KIP-480 : Sticky Partitioner

Posted by Justine Olshan <jo...@confluent.io>.
Hello all,

If there are no more comments or concerns, I would like to start the vote
on this tomorrow afternoon.

However, if there are still topics to discuss, feel free to bring them up
now.

Thank you,
Justine

On Tue, Jul 2, 2019 at 4:25 PM Justine Olshan <jo...@confluent.io> wrote:

> Hello again,
>
> Another update to the interface has been made to the KIP.
> Please let me know if you have any feedback!
>
> Thank you,
> Justine
>
> On Fri, Jun 28, 2019 at 2:52 PM Justine Olshan <jo...@confluent.io>
> wrote:
>
>> Hi all,
>> I made some changes to the KIP.
>> The idea is to clean up the code, make behavior more explicit, provide
>> more flexibility, and to keep default behavior the same.
>>
>> Now we will change the partition in onNewBatch, and specify the
>> conditions for this function call (non-keyed values, no explicit
>> partitions) in willCallOnNewBatch.
>> This clears up some of the issues with the implementation. I'm happy to
>> hear further opinions and discuss this change!
>>
>> Thank you,
>> Justine
>>
>> On Thu, Jun 27, 2019 at 2:53 PM Colin McCabe <cm...@apache.org> wrote:
>>
>>> On Thu, Jun 27, 2019, at 01:31, Ismael Juma wrote:
>>> > Thanks for the KIP Justine. It looks pretty good. A few comments:
>>> >
>>> > 1. Should we favor partitions that are not under replicated? This is
>>> > something that Netflix did too.
>>>
>>> This seems like it could lead to cascading failures, right?  If a
>>> partition becomes under-replicated because there is too much traffic, the
>>> producer stops sending to it, which puts even more load on the remaining
>>> partitions, which are even more likely to fail then, etc.  It also will
>>> create unbalanced load patterns on the consumers.
>>>
>>> >
>>> > 2. If there's no measurable performance difference, I agree with
>>> Stanislav
>>> > that Optional would be better than Integer.
>>> >
>>> > 3. We should include the javadoc for the newly introduced method that
>>> > specifies it and its parameters. In particular, it would good to
>>> specify if
>>> > it gets called when an explicit partition id has been provided.
>>>
>>> Agreed.
>>>
>>> best,
>>> Colin
>>>
>>> >
>>> > Ismael
>>> >
>>> > On Mon, Jun 24, 2019, 2:04 PM Justine Olshan <jo...@confluent.io>
>>> wrote:
>>> >
>>> > > Hello,
>>> > > This is the discussion thread for KIP-480: Sticky Partitioner.
>>> > >
>>> > >
>>> > >
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
>>> > >
>>> > > Thank you,
>>> > > Justine Olshan
>>> > >
>>> >
>>>
>>

Re: [DISCUSS] KIP-480 : Sticky Partitioner

Posted by Justine Olshan <jo...@confluent.io>.
Hello again,

Another update to the interface has been made to the KIP.
Please let me know if you have any feedback!

Thank you,
Justine

On Fri, Jun 28, 2019 at 2:52 PM Justine Olshan <jo...@confluent.io> wrote:

> Hi all,
> I made some changes to the KIP.
> The idea is to clean up the code, make behavior more explicit, provide
> more flexibility, and to keep default behavior the same.
>
> Now we will change the partition in onNewBatch, and specify the conditions
> for this function call (non-keyed values, no explicit partitions) in
> willCallOnNewBatch.
> This clears up some of the issues with the implementation. I'm happy to
> hear further opinions and discuss this change!
>
> Thank you,
> Justine
>
> On Thu, Jun 27, 2019 at 2:53 PM Colin McCabe <cm...@apache.org> wrote:
>
>> On Thu, Jun 27, 2019, at 01:31, Ismael Juma wrote:
>> > Thanks for the KIP Justine. It looks pretty good. A few comments:
>> >
>> > 1. Should we favor partitions that are not under replicated? This is
>> > something that Netflix did too.
>>
>> This seems like it could lead to cascading failures, right?  If a
>> partition becomes under-replicated because there is too much traffic, the
>> producer stops sending to it, which puts even more load on the remaining
>> partitions, which are even more likely to fail then, etc.  It also will
>> create unbalanced load patterns on the consumers.
>>
>> >
>> > 2. If there's no measurable performance difference, I agree with
>> Stanislav
>> > that Optional would be better than Integer.
>> >
>> > 3. We should include the javadoc for the newly introduced method that
>> > specifies it and its parameters. In particular, it would good to
>> specify if
>> > it gets called when an explicit partition id has been provided.
>>
>> Agreed.
>>
>> best,
>> Colin
>>
>> >
>> > Ismael
>> >
>> > On Mon, Jun 24, 2019, 2:04 PM Justine Olshan <jo...@confluent.io>
>> wrote:
>> >
>> > > Hello,
>> > > This is the discussion thread for KIP-480: Sticky Partitioner.
>> > >
>> > >
>> > >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
>> > >
>> > > Thank you,
>> > > Justine Olshan
>> > >
>> >
>>
>

Re: [DISCUSS] KIP-480 : Sticky Partitioner

Posted by Justine Olshan <jo...@confluent.io>.
Hi all,
I made some changes to the KIP.
The idea is to clean up the code, make behavior more explicit, provide more
flexibility, and to keep default behavior the same.

Now we will change the partition in onNewBatch, and specify the conditions
for this function call (non-keyed values, no explicit partitions) in
willCallOnNewBatch.
This clears up some of the issues with the implementation. I'm happy to
hear further opinions and discuss this change!

Thank you,
Justine

On Thu, Jun 27, 2019 at 2:53 PM Colin McCabe <cm...@apache.org> wrote:

> On Thu, Jun 27, 2019, at 01:31, Ismael Juma wrote:
> > Thanks for the KIP Justine. It looks pretty good. A few comments:
> >
> > 1. Should we favor partitions that are not under replicated? This is
> > something that Netflix did too.
>
> This seems like it could lead to cascading failures, right?  If a
> partition becomes under-replicated because there is too much traffic, the
> producer stops sending to it, which puts even more load on the remaining
> partitions, which are even more likely to fail then, etc.  It also will
> create unbalanced load patterns on the consumers.
>
> >
> > 2. If there's no measurable performance difference, I agree with
> Stanislav
> > that Optional would be better than Integer.
> >
> > 3. We should include the javadoc for the newly introduced method that
> > specifies it and its parameters. In particular, it would good to specify
> if
> > it gets called when an explicit partition id has been provided.
>
> Agreed.
>
> best,
> Colin
>
> >
> > Ismael
> >
> > On Mon, Jun 24, 2019, 2:04 PM Justine Olshan <jo...@confluent.io>
> wrote:
> >
> > > Hello,
> > > This is the discussion thread for KIP-480: Sticky Partitioner.
> > >
> > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
> > >
> > > Thank you,
> > > Justine Olshan
> > >
> >
>

Re: [DISCUSS] KIP-480 : Sticky Partitioner

Posted by Colin McCabe <cm...@apache.org>.
On Thu, Jun 27, 2019, at 01:31, Ismael Juma wrote:
> Thanks for the KIP Justine. It looks pretty good. A few comments:
> 
> 1. Should we favor partitions that are not under replicated? This is
> something that Netflix did too.

This seems like it could lead to cascading failures, right?  If a partition becomes under-replicated because there is too much traffic, the producer stops sending to it, which puts even more load on the remaining partitions, which are even more likely to fail then, etc.  It also will create unbalanced load patterns on the consumers.

> 
> 2. If there's no measurable performance difference, I agree with Stanislav
> that Optional would be better than Integer.
> 
> 3. We should include the javadoc for the newly introduced method that
> specifies it and its parameters. In particular, it would good to specify if
> it gets called when an explicit partition id has been provided.

Agreed.

best,
Colin

> 
> Ismael
> 
> On Mon, Jun 24, 2019, 2:04 PM Justine Olshan <jo...@confluent.io> wrote:
> 
> > Hello,
> > This is the discussion thread for KIP-480: Sticky Partitioner.
> >
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
> >
> > Thank you,
> > Justine Olshan
> >
>

Re: [DISCUSS] KIP-480 : Sticky Partitioner

Posted by Justine Olshan <jo...@confluent.io>.
Moving the previous comment to the PR discussion. :)

On Thu, Jun 27, 2019 at 10:51 AM Justine Olshan <jo...@confluent.io>
wrote:

> I was going through fixing some of the overloaded methods and I realized I
> overloaded the RecordAccumulator constructor. I added a parameter to
> include the partitioner so I can call my method. However, the tests for the
> record accumulator do not have a partitioner. There is the potential for a
> NPE when calling this method. Currently, none of the tests will enter the
> code block, but I was wondering if would be a good idea to include a
> partitioner != null in the if statement as well. I'm open to other
> suggestions if this is not clear about what is going on.
>
> Ismael,
> Oh I see now. It seems like Netflix just checks if the leader is available
> as well.
> I'll look into the case where no replica is down.
>
>
>
> On Thu, Jun 27, 2019 at 10:39 AM Ismael Juma <is...@gmail.com> wrote:
>
>> Hey Justine.
>>
>> Available could mean that some replicas are down but the leader is
>> available. The suggestion was to try a partition where no replica was down
>> if it's available. Such partitions are safer in general. There could be
>> some downsides too, so worth thinking about the trade-offs.
>>
>> Ismael
>>
>> On Thu, Jun 27, 2019, 10:24 AM Justine Olshan <jo...@confluent.io>
>> wrote:
>>
>> > Ismael,
>> >
>> > Thanks for the feedback!
>> >
>> > For 1, currently the sticky partitioner favors "available partitions."
>> From
>> > my understanding, these are partitions that are not under-replicated. If
>> > that is not the same, please let me know.
>> > As for 2, I've switched to Optional, and the few tests I've run so far
>> > suggest the performance is the same.
>> > And for 3, I've added a javadoc to my next commit, so that should be up
>> > soon.
>> >
>> > Thanks,
>> > Justine
>> >
>> > On Thu, Jun 27, 2019 at 1:31 AM Ismael Juma <is...@juma.me.uk> wrote:
>> >
>> > > Thanks for the KIP Justine. It looks pretty good. A few comments:
>> > >
>> > > 1. Should we favor partitions that are not under replicated? This is
>> > > something that Netflix did too.
>> > >
>> > > 2. If there's no measurable performance difference, I agree with
>> > Stanislav
>> > > that Optional would be better than Integer.
>> > >
>> > > 3. We should include the javadoc for the newly introduced method that
>> > > specifies it and its parameters. In particular, it would good to
>> specify
>> > if
>> > > it gets called when an explicit partition id has been provided.
>> > >
>> > > Ismael
>> > >
>> > > On Mon, Jun 24, 2019, 2:04 PM Justine Olshan <jo...@confluent.io>
>> > wrote:
>> > >
>> > > > Hello,
>> > > > This is the discussion thread for KIP-480: Sticky Partitioner.
>> > > >
>> > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
>> > > >
>> > > > Thank you,
>> > > > Justine Olshan
>> > > >
>> > >
>> >
>>
>

Re: [DISCUSS] KIP-480 : Sticky Partitioner

Posted by Justine Olshan <jo...@confluent.io>.
I was going through fixing some of the overloaded methods and I realized I
overloaded the RecordAccumulator constructor. I added a parameter to
include the partitioner so I can call my method. However, the tests for the
record accumulator do not have a partitioner. There is the potential for a
NPE when calling this method. Currently, none of the tests will enter the
code block, but I was wondering if would be a good idea to include a
partitioner != null in the if statement as well. I'm open to other
suggestions if this is not clear about what is going on.

Ismael,
Oh I see now. It seems like Netflix just checks if the leader is available
as well.
I'll look into the case where no replica is down.



On Thu, Jun 27, 2019 at 10:39 AM Ismael Juma <is...@gmail.com> wrote:

> Hey Justine.
>
> Available could mean that some replicas are down but the leader is
> available. The suggestion was to try a partition where no replica was down
> if it's available. Such partitions are safer in general. There could be
> some downsides too, so worth thinking about the trade-offs.
>
> Ismael
>
> On Thu, Jun 27, 2019, 10:24 AM Justine Olshan <jo...@confluent.io>
> wrote:
>
> > Ismael,
> >
> > Thanks for the feedback!
> >
> > For 1, currently the sticky partitioner favors "available partitions."
> From
> > my understanding, these are partitions that are not under-replicated. If
> > that is not the same, please let me know.
> > As for 2, I've switched to Optional, and the few tests I've run so far
> > suggest the performance is the same.
> > And for 3, I've added a javadoc to my next commit, so that should be up
> > soon.
> >
> > Thanks,
> > Justine
> >
> > On Thu, Jun 27, 2019 at 1:31 AM Ismael Juma <is...@juma.me.uk> wrote:
> >
> > > Thanks for the KIP Justine. It looks pretty good. A few comments:
> > >
> > > 1. Should we favor partitions that are not under replicated? This is
> > > something that Netflix did too.
> > >
> > > 2. If there's no measurable performance difference, I agree with
> > Stanislav
> > > that Optional would be better than Integer.
> > >
> > > 3. We should include the javadoc for the newly introduced method that
> > > specifies it and its parameters. In particular, it would good to
> specify
> > if
> > > it gets called when an explicit partition id has been provided.
> > >
> > > Ismael
> > >
> > > On Mon, Jun 24, 2019, 2:04 PM Justine Olshan <jo...@confluent.io>
> > wrote:
> > >
> > > > Hello,
> > > > This is the discussion thread for KIP-480: Sticky Partitioner.
> > > >
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
> > > >
> > > > Thank you,
> > > > Justine Olshan
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-480 : Sticky Partitioner

Posted by Ismael Juma <is...@gmail.com>.
Hey Justine.

Available could mean that some replicas are down but the leader is
available. The suggestion was to try a partition where no replica was down
if it's available. Such partitions are safer in general. There could be
some downsides too, so worth thinking about the trade-offs.

Ismael

On Thu, Jun 27, 2019, 10:24 AM Justine Olshan <jo...@confluent.io> wrote:

> Ismael,
>
> Thanks for the feedback!
>
> For 1, currently the sticky partitioner favors "available partitions." From
> my understanding, these are partitions that are not under-replicated. If
> that is not the same, please let me know.
> As for 2, I've switched to Optional, and the few tests I've run so far
> suggest the performance is the same.
> And for 3, I've added a javadoc to my next commit, so that should be up
> soon.
>
> Thanks,
> Justine
>
> On Thu, Jun 27, 2019 at 1:31 AM Ismael Juma <is...@juma.me.uk> wrote:
>
> > Thanks for the KIP Justine. It looks pretty good. A few comments:
> >
> > 1. Should we favor partitions that are not under replicated? This is
> > something that Netflix did too.
> >
> > 2. If there's no measurable performance difference, I agree with
> Stanislav
> > that Optional would be better than Integer.
> >
> > 3. We should include the javadoc for the newly introduced method that
> > specifies it and its parameters. In particular, it would good to specify
> if
> > it gets called when an explicit partition id has been provided.
> >
> > Ismael
> >
> > On Mon, Jun 24, 2019, 2:04 PM Justine Olshan <jo...@confluent.io>
> wrote:
> >
> > > Hello,
> > > This is the discussion thread for KIP-480: Sticky Partitioner.
> > >
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
> > >
> > > Thank you,
> > > Justine Olshan
> > >
> >
>

Re: [DISCUSS] KIP-480 : Sticky Partitioner

Posted by Justine Olshan <jo...@confluent.io>.
Ismael,

Thanks for the feedback!

For 1, currently the sticky partitioner favors "available partitions." From
my understanding, these are partitions that are not under-replicated. If
that is not the same, please let me know.
As for 2, I've switched to Optional, and the few tests I've run so far
suggest the performance is the same.
And for 3, I've added a javadoc to my next commit, so that should be up
soon.

Thanks,
Justine

On Thu, Jun 27, 2019 at 1:31 AM Ismael Juma <is...@juma.me.uk> wrote:

> Thanks for the KIP Justine. It looks pretty good. A few comments:
>
> 1. Should we favor partitions that are not under replicated? This is
> something that Netflix did too.
>
> 2. If there's no measurable performance difference, I agree with Stanislav
> that Optional would be better than Integer.
>
> 3. We should include the javadoc for the newly introduced method that
> specifies it and its parameters. In particular, it would good to specify if
> it gets called when an explicit partition id has been provided.
>
> Ismael
>
> On Mon, Jun 24, 2019, 2:04 PM Justine Olshan <jo...@confluent.io> wrote:
>
> > Hello,
> > This is the discussion thread for KIP-480: Sticky Partitioner.
> >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
> >
> > Thank you,
> > Justine Olshan
> >
>

Re: [DISCUSS] KIP-480 : Sticky Partitioner

Posted by Ismael Juma <is...@juma.me.uk>.
Thanks for the KIP Justine. It looks pretty good. A few comments:

1. Should we favor partitions that are not under replicated? This is
something that Netflix did too.

2. If there's no measurable performance difference, I agree with Stanislav
that Optional would be better than Integer.

3. We should include the javadoc for the newly introduced method that
specifies it and its parameters. In particular, it would good to specify if
it gets called when an explicit partition id has been provided.

Ismael

On Mon, Jun 24, 2019, 2:04 PM Justine Olshan <jo...@confluent.io> wrote:

> Hello,
> This is the discussion thread for KIP-480: Sticky Partitioner.
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
>
> Thank you,
> Justine Olshan
>

Re: [DISCUSS] KIP-480 : Sticky Partitioner

Posted by Colin McCabe <cm...@apache.org>.
Thanks Justine!  You might want to update the "JIRA" link on the KIP so that it links to this.

cheers,
Colin

On Tue, Jun 25, 2019, at 11:59, Justine Olshan wrote:
> Thank you for looking at my KIP!
> 
> I will get to work on these changes.
> 
> In addition, here is the JIRA ticket:
> https://issues.apache.org/jira/browse/KAFKA-8601
> 
> Thanks again,
> Justine
> 
> On Tue, Jun 25, 2019 at 11:55 AM Colin McCabe <cm...@apache.org> wrote:
> 
> > Hi Justine,
> >
> > The KIP discusses adding a new method to the partitioner interface.
> >
> > > default public Integer onNewBatch(String topic, Cluster cluster) { ... }
> >
> > However, this new method doesn't give the partitioner access to the key
> > and value of the message.  While this works for the case described here (no
> > key), in general we might need this information when re-assigning a
> > partitition based on the batch completing.  So I think we should add these
> > methods to onNewBatch.
> >
> > Also, it would be nice to call this something like "repartitionOnNewBatch"
> > or something, to make it clearer what is going on.
> >
> > best,
> > Colin
> >
> > On Mon, Jun 24, 2019, at 18:32, Boyang Chen wrote:
> > > Thank you Justine for the KIP! Do you mind creating a corresponding JIRA
> > > ticket too?
> > >
> > > On Mon, Jun 24, 2019 at 4:51 PM Colin McCabe <cm...@apache.org> wrote:
> > >
> > > > Hi Justine,
> > > >
> > > > Thanks for the KIP.  This looks great!
> > > >
> > > > In one place in the KIP, you write: "Remove
> > > > testRoundRobinWithUnavailablePartitions() and testRoundRobin() since
> > the
> > > > round robin functionality of the partitioner has been removed."  You
> > can
> > > > skip this and similar lines.  We don't need to describe changes to
> > internal
> > > > test classes in the KIP since they're not visible to users or external
> > > > developers.
> > > >
> > > > It seems like maybe the performance tests should get their own section.
> > > > Right now, the way the layout is makes it look like they are part of
> > the
> > > > "Compatibility, Deprecation, and Migration Plan"
> > > >
> > > > best,
> > > > Colin
> > > >
> > > >
> > > > On Mon, Jun 24, 2019, at 14:04, Justine Olshan wrote:
> > > > > Hello,
> > > > > This is the discussion thread for KIP-480: Sticky Partitioner.
> > > > >
> > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
> > > > >
> > > > > Thank you,
> > > > > Justine Olshan
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-480 : Sticky Partitioner

Posted by Justine Olshan <jo...@confluent.io>.
Thank you for looking at my KIP!

I will get to work on these changes.

In addition, here is the JIRA ticket:
https://issues.apache.org/jira/browse/KAFKA-8601

Thanks again,
Justine

On Tue, Jun 25, 2019 at 11:55 AM Colin McCabe <cm...@apache.org> wrote:

> Hi Justine,
>
> The KIP discusses adding a new method to the partitioner interface.
>
> > default public Integer onNewBatch(String topic, Cluster cluster) { ... }
>
> However, this new method doesn't give the partitioner access to the key
> and value of the message.  While this works for the case described here (no
> key), in general we might need this information when re-assigning a
> partitition based on the batch completing.  So I think we should add these
> methods to onNewBatch.
>
> Also, it would be nice to call this something like "repartitionOnNewBatch"
> or something, to make it clearer what is going on.
>
> best,
> Colin
>
> On Mon, Jun 24, 2019, at 18:32, Boyang Chen wrote:
> > Thank you Justine for the KIP! Do you mind creating a corresponding JIRA
> > ticket too?
> >
> > On Mon, Jun 24, 2019 at 4:51 PM Colin McCabe <cm...@apache.org> wrote:
> >
> > > Hi Justine,
> > >
> > > Thanks for the KIP.  This looks great!
> > >
> > > In one place in the KIP, you write: "Remove
> > > testRoundRobinWithUnavailablePartitions() and testRoundRobin() since
> the
> > > round robin functionality of the partitioner has been removed."  You
> can
> > > skip this and similar lines.  We don't need to describe changes to
> internal
> > > test classes in the KIP since they're not visible to users or external
> > > developers.
> > >
> > > It seems like maybe the performance tests should get their own section.
> > > Right now, the way the layout is makes it look like they are part of
> the
> > > "Compatibility, Deprecation, and Migration Plan"
> > >
> > > best,
> > > Colin
> > >
> > >
> > > On Mon, Jun 24, 2019, at 14:04, Justine Olshan wrote:
> > > > Hello,
> > > > This is the discussion thread for KIP-480: Sticky Partitioner.
> > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
> > > >
> > > > Thank you,
> > > > Justine Olshan
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-480 : Sticky Partitioner

Posted by Justine Olshan <jo...@confluent.io>.
Stanislav,
Thank you for looking at my KIP!

I did discuss with Colin about whether the null vs. Optional types and we
did not come to a strong conclusion either way.
I'd be happy to change it if it makes the logic more clear.

Thanks,
Justine

On Wed, Jun 26, 2019 at 2:46 PM Stanislav Kozlovski <st...@confluent.io>
wrote:

> Hey Justine,
>
> Thanks for the KIP! I am impressed by the performance results linked in the
> KIP and I like the data-driven approach. This looks like a great
> improvement.
>
> I had one minor question regarding the public interface
> `repartitionOnNewBatch` where we return null in the case of no change
> needed. Have we considered using Java's Optional type to avoid null values?
>
> Best,
> Stanislav
>
> On Tue, Jun 25, 2019 at 11:29 PM Colin McCabe <cm...@apache.org> wrote:
>
> > No worries.  Thanks for fixing it!
> > C.
> >
> > On Tue, Jun 25, 2019, at 13:47, Justine Olshan wrote:
> > > Also apologies on the late link to the jira, but apparently https links
> > do
> > > not work and it kept defaulting to an image on my desktop even when it
> > > looked like I put the correct link in. Weird...
> > >
> > > On Tue, Jun 25, 2019 at 1:41 PM Justine Olshan <jo...@confluent.io>
> > wrote:
> > >
> > > > I came up with a good solution for this and will push the commit
> soon.
> > The
> > > > repartition will be called only when a partition is not manually
> sent.
> > > >
> > > > On Tue, Jun 25, 2019 at 1:39 PM Colin McCabe <cm...@apache.org>
> > wrote:
> > > >
> > > >> Well, this is a generic partitioner method, so it shouldn't dictate
> > any
> > > >> particular behavior.
> > > >>
> > > >> Colin
> > > >>
> > > >>
> > > >> On Tue, Jun 25, 2019, at 12:04, Justine Olshan wrote:
> > > >> > I also just noticed that if we want to use this method on the
> keyed
> > > >> record
> > > >> > case, I will need to move the method outside of the sticky (no
> key,
> > no
> > > >> set
> > > >> > partition) check. Not a big problem, but something to keep in
> mind.
> > > >> > Perhaps, we should encapsulate the sticky vs. not behavior inside
> > the
> > > >> > method? More things to think about.
> > > >> >
> > > >> > On Tue, Jun 25, 2019 at 11:55 AM Colin McCabe <cmccabe@apache.org
> >
> > > >> wrote:
> > > >> >
> > > >> > > Hi Justine,
> > > >> > >
> > > >> > > The KIP discusses adding a new method to the partitioner
> > interface.
> > > >> > >
> > > >> > > > default public Integer onNewBatch(String topic, Cluster
> > cluster) {
> > > >> ... }
> > > >> > >
> > > >> > > However, this new method doesn't give the partitioner access to
> > the
> > > >> key
> > > >> > > and value of the message.  While this works for the case
> described
> > > >> here (no
> > > >> > > key), in general we might need this information when
> re-assigning
> > a
> > > >> > > partitition based on the batch completing.  So I think we should
> > add
> > > >> these
> > > >> > > methods to onNewBatch.
> > > >> > >
> > > >> > > Also, it would be nice to call this something like
> > > >> "repartitionOnNewBatch"
> > > >> > > or something, to make it clearer what is going on.
> > > >> > >
> > > >> > > best,
> > > >> > > Colin
> > > >> > >
> > > >> > > On Mon, Jun 24, 2019, at 18:32, Boyang Chen wrote:
> > > >> > > > Thank you Justine for the KIP! Do you mind creating a
> > corresponding
> > > >> JIRA
> > > >> > > > ticket too?
> > > >> > > >
> > > >> > > > On Mon, Jun 24, 2019 at 4:51 PM Colin McCabe <
> > cmccabe@apache.org>
> > > >> wrote:
> > > >> > > >
> > > >> > > > > Hi Justine,
> > > >> > > > >
> > > >> > > > > Thanks for the KIP.  This looks great!
> > > >> > > > >
> > > >> > > > > In one place in the KIP, you write: "Remove
> > > >> > > > > testRoundRobinWithUnavailablePartitions() and
> testRoundRobin()
> > > >> since
> > > >> > > the
> > > >> > > > > round robin functionality of the partitioner has been
> > removed."
> > > >> You
> > > >> > > can
> > > >> > > > > skip this and similar lines.  We don't need to describe
> > changes to
> > > >> > > internal
> > > >> > > > > test classes in the KIP since they're not visible to users
> or
> > > >> external
> > > >> > > > > developers.
> > > >> > > > >
> > > >> > > > > It seems like maybe the performance tests should get their
> own
> > > >> section.
> > > >> > > > > Right now, the way the layout is makes it look like they are
> > part
> > > >> of
> > > >> > > the
> > > >> > > > > "Compatibility, Deprecation, and Migration Plan"
> > > >> > > > >
> > > >> > > > > best,
> > > >> > > > > Colin
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > On Mon, Jun 24, 2019, at 14:04, Justine Olshan wrote:
> > > >> > > > > > Hello,
> > > >> > > > > > This is the discussion thread for KIP-480: Sticky
> > Partitioner.
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> > >
> > > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
> > > >> > > > > >
> > > >> > > > > > Thank you,
> > > >> > > > > > Justine Olshan
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > > >
> > >
> >
>
>
> --
> Best,
> Stanislav
>

Re: [DISCUSS] KIP-480 : Sticky Partitioner

Posted by Stanislav Kozlovski <st...@confluent.io>.
Hey Justine,

Thanks for the KIP! I am impressed by the performance results linked in the
KIP and I like the data-driven approach. This looks like a great
improvement.

I had one minor question regarding the public interface
`repartitionOnNewBatch` where we return null in the case of no change
needed. Have we considered using Java's Optional type to avoid null values?

Best,
Stanislav

On Tue, Jun 25, 2019 at 11:29 PM Colin McCabe <cm...@apache.org> wrote:

> No worries.  Thanks for fixing it!
> C.
>
> On Tue, Jun 25, 2019, at 13:47, Justine Olshan wrote:
> > Also apologies on the late link to the jira, but apparently https links
> do
> > not work and it kept defaulting to an image on my desktop even when it
> > looked like I put the correct link in. Weird...
> >
> > On Tue, Jun 25, 2019 at 1:41 PM Justine Olshan <jo...@confluent.io>
> wrote:
> >
> > > I came up with a good solution for this and will push the commit soon.
> The
> > > repartition will be called only when a partition is not manually sent.
> > >
> > > On Tue, Jun 25, 2019 at 1:39 PM Colin McCabe <cm...@apache.org>
> wrote:
> > >
> > >> Well, this is a generic partitioner method, so it shouldn't dictate
> any
> > >> particular behavior.
> > >>
> > >> Colin
> > >>
> > >>
> > >> On Tue, Jun 25, 2019, at 12:04, Justine Olshan wrote:
> > >> > I also just noticed that if we want to use this method on the keyed
> > >> record
> > >> > case, I will need to move the method outside of the sticky (no key,
> no
> > >> set
> > >> > partition) check. Not a big problem, but something to keep in mind.
> > >> > Perhaps, we should encapsulate the sticky vs. not behavior inside
> the
> > >> > method? More things to think about.
> > >> >
> > >> > On Tue, Jun 25, 2019 at 11:55 AM Colin McCabe <cm...@apache.org>
> > >> wrote:
> > >> >
> > >> > > Hi Justine,
> > >> > >
> > >> > > The KIP discusses adding a new method to the partitioner
> interface.
> > >> > >
> > >> > > > default public Integer onNewBatch(String topic, Cluster
> cluster) {
> > >> ... }
> > >> > >
> > >> > > However, this new method doesn't give the partitioner access to
> the
> > >> key
> > >> > > and value of the message.  While this works for the case described
> > >> here (no
> > >> > > key), in general we might need this information when re-assigning
> a
> > >> > > partitition based on the batch completing.  So I think we should
> add
> > >> these
> > >> > > methods to onNewBatch.
> > >> > >
> > >> > > Also, it would be nice to call this something like
> > >> "repartitionOnNewBatch"
> > >> > > or something, to make it clearer what is going on.
> > >> > >
> > >> > > best,
> > >> > > Colin
> > >> > >
> > >> > > On Mon, Jun 24, 2019, at 18:32, Boyang Chen wrote:
> > >> > > > Thank you Justine for the KIP! Do you mind creating a
> corresponding
> > >> JIRA
> > >> > > > ticket too?
> > >> > > >
> > >> > > > On Mon, Jun 24, 2019 at 4:51 PM Colin McCabe <
> cmccabe@apache.org>
> > >> wrote:
> > >> > > >
> > >> > > > > Hi Justine,
> > >> > > > >
> > >> > > > > Thanks for the KIP.  This looks great!
> > >> > > > >
> > >> > > > > In one place in the KIP, you write: "Remove
> > >> > > > > testRoundRobinWithUnavailablePartitions() and testRoundRobin()
> > >> since
> > >> > > the
> > >> > > > > round robin functionality of the partitioner has been
> removed."
> > >> You
> > >> > > can
> > >> > > > > skip this and similar lines.  We don't need to describe
> changes to
> > >> > > internal
> > >> > > > > test classes in the KIP since they're not visible to users or
> > >> external
> > >> > > > > developers.
> > >> > > > >
> > >> > > > > It seems like maybe the performance tests should get their own
> > >> section.
> > >> > > > > Right now, the way the layout is makes it look like they are
> part
> > >> of
> > >> > > the
> > >> > > > > "Compatibility, Deprecation, and Migration Plan"
> > >> > > > >
> > >> > > > > best,
> > >> > > > > Colin
> > >> > > > >
> > >> > > > >
> > >> > > > > On Mon, Jun 24, 2019, at 14:04, Justine Olshan wrote:
> > >> > > > > > Hello,
> > >> > > > > > This is the discussion thread for KIP-480: Sticky
> Partitioner.
> > >> > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > >
> > >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
> > >> > > > > >
> > >> > > > > > Thank you,
> > >> > > > > > Justine Olshan
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> >
>


-- 
Best,
Stanislav

Re: [DISCUSS] KIP-480 : Sticky Partitioner

Posted by Colin McCabe <cm...@apache.org>.
No worries.  Thanks for fixing it!
C.

On Tue, Jun 25, 2019, at 13:47, Justine Olshan wrote:
> Also apologies on the late link to the jira, but apparently https links do
> not work and it kept defaulting to an image on my desktop even when it
> looked like I put the correct link in. Weird...
> 
> On Tue, Jun 25, 2019 at 1:41 PM Justine Olshan <jo...@confluent.io> wrote:
> 
> > I came up with a good solution for this and will push the commit soon. The
> > repartition will be called only when a partition is not manually sent.
> >
> > On Tue, Jun 25, 2019 at 1:39 PM Colin McCabe <cm...@apache.org> wrote:
> >
> >> Well, this is a generic partitioner method, so it shouldn't dictate any
> >> particular behavior.
> >>
> >> Colin
> >>
> >>
> >> On Tue, Jun 25, 2019, at 12:04, Justine Olshan wrote:
> >> > I also just noticed that if we want to use this method on the keyed
> >> record
> >> > case, I will need to move the method outside of the sticky (no key, no
> >> set
> >> > partition) check. Not a big problem, but something to keep in mind.
> >> > Perhaps, we should encapsulate the sticky vs. not behavior inside the
> >> > method? More things to think about.
> >> >
> >> > On Tue, Jun 25, 2019 at 11:55 AM Colin McCabe <cm...@apache.org>
> >> wrote:
> >> >
> >> > > Hi Justine,
> >> > >
> >> > > The KIP discusses adding a new method to the partitioner interface.
> >> > >
> >> > > > default public Integer onNewBatch(String topic, Cluster cluster) {
> >> ... }
> >> > >
> >> > > However, this new method doesn't give the partitioner access to the
> >> key
> >> > > and value of the message.  While this works for the case described
> >> here (no
> >> > > key), in general we might need this information when re-assigning a
> >> > > partitition based on the batch completing.  So I think we should add
> >> these
> >> > > methods to onNewBatch.
> >> > >
> >> > > Also, it would be nice to call this something like
> >> "repartitionOnNewBatch"
> >> > > or something, to make it clearer what is going on.
> >> > >
> >> > > best,
> >> > > Colin
> >> > >
> >> > > On Mon, Jun 24, 2019, at 18:32, Boyang Chen wrote:
> >> > > > Thank you Justine for the KIP! Do you mind creating a corresponding
> >> JIRA
> >> > > > ticket too?
> >> > > >
> >> > > > On Mon, Jun 24, 2019 at 4:51 PM Colin McCabe <cm...@apache.org>
> >> wrote:
> >> > > >
> >> > > > > Hi Justine,
> >> > > > >
> >> > > > > Thanks for the KIP.  This looks great!
> >> > > > >
> >> > > > > In one place in the KIP, you write: "Remove
> >> > > > > testRoundRobinWithUnavailablePartitions() and testRoundRobin()
> >> since
> >> > > the
> >> > > > > round robin functionality of the partitioner has been removed."
> >> You
> >> > > can
> >> > > > > skip this and similar lines.  We don't need to describe changes to
> >> > > internal
> >> > > > > test classes in the KIP since they're not visible to users or
> >> external
> >> > > > > developers.
> >> > > > >
> >> > > > > It seems like maybe the performance tests should get their own
> >> section.
> >> > > > > Right now, the way the layout is makes it look like they are part
> >> of
> >> > > the
> >> > > > > "Compatibility, Deprecation, and Migration Plan"
> >> > > > >
> >> > > > > best,
> >> > > > > Colin
> >> > > > >
> >> > > > >
> >> > > > > On Mon, Jun 24, 2019, at 14:04, Justine Olshan wrote:
> >> > > > > > Hello,
> >> > > > > > This is the discussion thread for KIP-480: Sticky Partitioner.
> >> > > > > >
> >> > > > > >
> >> > > > >
> >> > >
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
> >> > > > > >
> >> > > > > > Thank you,
> >> > > > > > Justine Olshan
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
>

Re: [DISCUSS] KIP-480 : Sticky Partitioner

Posted by Justine Olshan <jo...@confluent.io>.
Also apologies on the late link to the jira, but apparently https links do
not work and it kept defaulting to an image on my desktop even when it
looked like I put the correct link in. Weird...

On Tue, Jun 25, 2019 at 1:41 PM Justine Olshan <jo...@confluent.io> wrote:

> I came up with a good solution for this and will push the commit soon. The
> repartition will be called only when a partition is not manually sent.
>
> On Tue, Jun 25, 2019 at 1:39 PM Colin McCabe <cm...@apache.org> wrote:
>
>> Well, this is a generic partitioner method, so it shouldn't dictate any
>> particular behavior.
>>
>> Colin
>>
>>
>> On Tue, Jun 25, 2019, at 12:04, Justine Olshan wrote:
>> > I also just noticed that if we want to use this method on the keyed
>> record
>> > case, I will need to move the method outside of the sticky (no key, no
>> set
>> > partition) check. Not a big problem, but something to keep in mind.
>> > Perhaps, we should encapsulate the sticky vs. not behavior inside the
>> > method? More things to think about.
>> >
>> > On Tue, Jun 25, 2019 at 11:55 AM Colin McCabe <cm...@apache.org>
>> wrote:
>> >
>> > > Hi Justine,
>> > >
>> > > The KIP discusses adding a new method to the partitioner interface.
>> > >
>> > > > default public Integer onNewBatch(String topic, Cluster cluster) {
>> ... }
>> > >
>> > > However, this new method doesn't give the partitioner access to the
>> key
>> > > and value of the message.  While this works for the case described
>> here (no
>> > > key), in general we might need this information when re-assigning a
>> > > partitition based on the batch completing.  So I think we should add
>> these
>> > > methods to onNewBatch.
>> > >
>> > > Also, it would be nice to call this something like
>> "repartitionOnNewBatch"
>> > > or something, to make it clearer what is going on.
>> > >
>> > > best,
>> > > Colin
>> > >
>> > > On Mon, Jun 24, 2019, at 18:32, Boyang Chen wrote:
>> > > > Thank you Justine for the KIP! Do you mind creating a corresponding
>> JIRA
>> > > > ticket too?
>> > > >
>> > > > On Mon, Jun 24, 2019 at 4:51 PM Colin McCabe <cm...@apache.org>
>> wrote:
>> > > >
>> > > > > Hi Justine,
>> > > > >
>> > > > > Thanks for the KIP.  This looks great!
>> > > > >
>> > > > > In one place in the KIP, you write: "Remove
>> > > > > testRoundRobinWithUnavailablePartitions() and testRoundRobin()
>> since
>> > > the
>> > > > > round robin functionality of the partitioner has been removed."
>> You
>> > > can
>> > > > > skip this and similar lines.  We don't need to describe changes to
>> > > internal
>> > > > > test classes in the KIP since they're not visible to users or
>> external
>> > > > > developers.
>> > > > >
>> > > > > It seems like maybe the performance tests should get their own
>> section.
>> > > > > Right now, the way the layout is makes it look like they are part
>> of
>> > > the
>> > > > > "Compatibility, Deprecation, and Migration Plan"
>> > > > >
>> > > > > best,
>> > > > > Colin
>> > > > >
>> > > > >
>> > > > > On Mon, Jun 24, 2019, at 14:04, Justine Olshan wrote:
>> > > > > > Hello,
>> > > > > > This is the discussion thread for KIP-480: Sticky Partitioner.
>> > > > > >
>> > > > > >
>> > > > >
>> > >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
>> > > > > >
>> > > > > > Thank you,
>> > > > > > Justine Olshan
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>

Re: [DISCUSS] KIP-480 : Sticky Partitioner

Posted by Justine Olshan <jo...@confluent.io>.
I came up with a good solution for this and will push the commit soon. The
repartition will be called only when a partition is not manually sent.

On Tue, Jun 25, 2019 at 1:39 PM Colin McCabe <cm...@apache.org> wrote:

> Well, this is a generic partitioner method, so it shouldn't dictate any
> particular behavior.
>
> Colin
>
>
> On Tue, Jun 25, 2019, at 12:04, Justine Olshan wrote:
> > I also just noticed that if we want to use this method on the keyed
> record
> > case, I will need to move the method outside of the sticky (no key, no
> set
> > partition) check. Not a big problem, but something to keep in mind.
> > Perhaps, we should encapsulate the sticky vs. not behavior inside the
> > method? More things to think about.
> >
> > On Tue, Jun 25, 2019 at 11:55 AM Colin McCabe <cm...@apache.org>
> wrote:
> >
> > > Hi Justine,
> > >
> > > The KIP discusses adding a new method to the partitioner interface.
> > >
> > > > default public Integer onNewBatch(String topic, Cluster cluster) {
> ... }
> > >
> > > However, this new method doesn't give the partitioner access to the key
> > > and value of the message.  While this works for the case described
> here (no
> > > key), in general we might need this information when re-assigning a
> > > partitition based on the batch completing.  So I think we should add
> these
> > > methods to onNewBatch.
> > >
> > > Also, it would be nice to call this something like
> "repartitionOnNewBatch"
> > > or something, to make it clearer what is going on.
> > >
> > > best,
> > > Colin
> > >
> > > On Mon, Jun 24, 2019, at 18:32, Boyang Chen wrote:
> > > > Thank you Justine for the KIP! Do you mind creating a corresponding
> JIRA
> > > > ticket too?
> > > >
> > > > On Mon, Jun 24, 2019 at 4:51 PM Colin McCabe <cm...@apache.org>
> wrote:
> > > >
> > > > > Hi Justine,
> > > > >
> > > > > Thanks for the KIP.  This looks great!
> > > > >
> > > > > In one place in the KIP, you write: "Remove
> > > > > testRoundRobinWithUnavailablePartitions() and testRoundRobin()
> since
> > > the
> > > > > round robin functionality of the partitioner has been removed."
> You
> > > can
> > > > > skip this and similar lines.  We don't need to describe changes to
> > > internal
> > > > > test classes in the KIP since they're not visible to users or
> external
> > > > > developers.
> > > > >
> > > > > It seems like maybe the performance tests should get their own
> section.
> > > > > Right now, the way the layout is makes it look like they are part
> of
> > > the
> > > > > "Compatibility, Deprecation, and Migration Plan"
> > > > >
> > > > > best,
> > > > > Colin
> > > > >
> > > > >
> > > > > On Mon, Jun 24, 2019, at 14:04, Justine Olshan wrote:
> > > > > > Hello,
> > > > > > This is the discussion thread for KIP-480: Sticky Partitioner.
> > > > > >
> > > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
> > > > > >
> > > > > > Thank you,
> > > > > > Justine Olshan
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-480 : Sticky Partitioner

Posted by Colin McCabe <cm...@apache.org>.
Well, this is a generic partitioner method, so it shouldn't dictate any particular behavior.

Colin


On Tue, Jun 25, 2019, at 12:04, Justine Olshan wrote:
> I also just noticed that if we want to use this method on the keyed record
> case, I will need to move the method outside of the sticky (no key, no set
> partition) check. Not a big problem, but something to keep in mind.
> Perhaps, we should encapsulate the sticky vs. not behavior inside the
> method? More things to think about.
> 
> On Tue, Jun 25, 2019 at 11:55 AM Colin McCabe <cm...@apache.org> wrote:
> 
> > Hi Justine,
> >
> > The KIP discusses adding a new method to the partitioner interface.
> >
> > > default public Integer onNewBatch(String topic, Cluster cluster) { ... }
> >
> > However, this new method doesn't give the partitioner access to the key
> > and value of the message.  While this works for the case described here (no
> > key), in general we might need this information when re-assigning a
> > partitition based on the batch completing.  So I think we should add these
> > methods to onNewBatch.
> >
> > Also, it would be nice to call this something like "repartitionOnNewBatch"
> > or something, to make it clearer what is going on.
> >
> > best,
> > Colin
> >
> > On Mon, Jun 24, 2019, at 18:32, Boyang Chen wrote:
> > > Thank you Justine for the KIP! Do you mind creating a corresponding JIRA
> > > ticket too?
> > >
> > > On Mon, Jun 24, 2019 at 4:51 PM Colin McCabe <cm...@apache.org> wrote:
> > >
> > > > Hi Justine,
> > > >
> > > > Thanks for the KIP.  This looks great!
> > > >
> > > > In one place in the KIP, you write: "Remove
> > > > testRoundRobinWithUnavailablePartitions() and testRoundRobin() since
> > the
> > > > round robin functionality of the partitioner has been removed."  You
> > can
> > > > skip this and similar lines.  We don't need to describe changes to
> > internal
> > > > test classes in the KIP since they're not visible to users or external
> > > > developers.
> > > >
> > > > It seems like maybe the performance tests should get their own section.
> > > > Right now, the way the layout is makes it look like they are part of
> > the
> > > > "Compatibility, Deprecation, and Migration Plan"
> > > >
> > > > best,
> > > > Colin
> > > >
> > > >
> > > > On Mon, Jun 24, 2019, at 14:04, Justine Olshan wrote:
> > > > > Hello,
> > > > > This is the discussion thread for KIP-480: Sticky Partitioner.
> > > > >
> > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
> > > > >
> > > > > Thank you,
> > > > > Justine Olshan
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-480 : Sticky Partitioner

Posted by Justine Olshan <jo...@confluent.io>.
I also just noticed that if we want to use this method on the keyed record
case, I will need to move the method outside of the sticky (no key, no set
partition) check. Not a big problem, but something to keep in mind.
Perhaps, we should encapsulate the sticky vs. not behavior inside the
method? More things to think about.

On Tue, Jun 25, 2019 at 11:55 AM Colin McCabe <cm...@apache.org> wrote:

> Hi Justine,
>
> The KIP discusses adding a new method to the partitioner interface.
>
> > default public Integer onNewBatch(String topic, Cluster cluster) { ... }
>
> However, this new method doesn't give the partitioner access to the key
> and value of the message.  While this works for the case described here (no
> key), in general we might need this information when re-assigning a
> partitition based on the batch completing.  So I think we should add these
> methods to onNewBatch.
>
> Also, it would be nice to call this something like "repartitionOnNewBatch"
> or something, to make it clearer what is going on.
>
> best,
> Colin
>
> On Mon, Jun 24, 2019, at 18:32, Boyang Chen wrote:
> > Thank you Justine for the KIP! Do you mind creating a corresponding JIRA
> > ticket too?
> >
> > On Mon, Jun 24, 2019 at 4:51 PM Colin McCabe <cm...@apache.org> wrote:
> >
> > > Hi Justine,
> > >
> > > Thanks for the KIP.  This looks great!
> > >
> > > In one place in the KIP, you write: "Remove
> > > testRoundRobinWithUnavailablePartitions() and testRoundRobin() since
> the
> > > round robin functionality of the partitioner has been removed."  You
> can
> > > skip this and similar lines.  We don't need to describe changes to
> internal
> > > test classes in the KIP since they're not visible to users or external
> > > developers.
> > >
> > > It seems like maybe the performance tests should get their own section.
> > > Right now, the way the layout is makes it look like they are part of
> the
> > > "Compatibility, Deprecation, and Migration Plan"
> > >
> > > best,
> > > Colin
> > >
> > >
> > > On Mon, Jun 24, 2019, at 14:04, Justine Olshan wrote:
> > > > Hello,
> > > > This is the discussion thread for KIP-480: Sticky Partitioner.
> > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
> > > >
> > > > Thank you,
> > > > Justine Olshan
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-480 : Sticky Partitioner

Posted by Colin McCabe <cm...@apache.org>.
Hi Justine,

The KIP discusses adding a new method to the partitioner interface.

> default public Integer onNewBatch(String topic, Cluster cluster) { ... }

However, this new method doesn't give the partitioner access to the key and value of the message.  While this works for the case described here (no key), in general we might need this information when re-assigning a partitition based on the batch completing.  So I think we should add these methods to onNewBatch.

Also, it would be nice to call this something like "repartitionOnNewBatch" or something, to make it clearer what is going on.

best,
Colin

On Mon, Jun 24, 2019, at 18:32, Boyang Chen wrote:
> Thank you Justine for the KIP! Do you mind creating a corresponding JIRA
> ticket too?
> 
> On Mon, Jun 24, 2019 at 4:51 PM Colin McCabe <cm...@apache.org> wrote:
> 
> > Hi Justine,
> >
> > Thanks for the KIP.  This looks great!
> >
> > In one place in the KIP, you write: "Remove
> > testRoundRobinWithUnavailablePartitions() and testRoundRobin() since the
> > round robin functionality of the partitioner has been removed."  You can
> > skip this and similar lines.  We don't need to describe changes to internal
> > test classes in the KIP since they're not visible to users or external
> > developers.
> >
> > It seems like maybe the performance tests should get their own section.
> > Right now, the way the layout is makes it look like they are part of the
> > "Compatibility, Deprecation, and Migration Plan"
> >
> > best,
> > Colin
> >
> >
> > On Mon, Jun 24, 2019, at 14:04, Justine Olshan wrote:
> > > Hello,
> > > This is the discussion thread for KIP-480: Sticky Partitioner.
> > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
> > >
> > > Thank you,
> > > Justine Olshan
> > >
> >
>

Re: [DISCUSS] KIP-480 : Sticky Partitioner

Posted by Boyang Chen <re...@gmail.com>.
Thank you Justine for the KIP! Do you mind creating a corresponding JIRA
ticket too?

On Mon, Jun 24, 2019 at 4:51 PM Colin McCabe <cm...@apache.org> wrote:

> Hi Justine,
>
> Thanks for the KIP.  This looks great!
>
> In one place in the KIP, you write: "Remove
> testRoundRobinWithUnavailablePartitions() and testRoundRobin() since the
> round robin functionality of the partitioner has been removed."  You can
> skip this and similar lines.  We don't need to describe changes to internal
> test classes in the KIP since they're not visible to users or external
> developers.
>
> It seems like maybe the performance tests should get their own section.
> Right now, the way the layout is makes it look like they are part of the
> "Compatibility, Deprecation, and Migration Plan"
>
> best,
> Colin
>
>
> On Mon, Jun 24, 2019, at 14:04, Justine Olshan wrote:
> > Hello,
> > This is the discussion thread for KIP-480: Sticky Partitioner.
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
> >
> > Thank you,
> > Justine Olshan
> >
>

Re: [DISCUSS] KIP-480 : Sticky Partitioner

Posted by Colin McCabe <cm...@apache.org>.
Hi Justine,

Thanks for the KIP.  This looks great!

In one place in the KIP, you write: "Remove testRoundRobinWithUnavailablePartitions() and testRoundRobin() since the round robin functionality of the partitioner has been removed."  You can skip this and similar lines.  We don't need to describe changes to internal test classes in the KIP since they're not visible to users or external developers.

It seems like maybe the performance tests should get their own section.  Right now, the way the layout is makes it look like they are part of the "Compatibility, Deprecation, and Migration Plan"

best,
Colin


On Mon, Jun 24, 2019, at 14:04, Justine Olshan wrote:
> Hello,
> This is the discussion thread for KIP-480: Sticky Partitioner.
> 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
> 
> Thank you,
> Justine Olshan
>