Posted to dev@kafka.apache.org by Sam Meder <sa...@jivesoftware.com> on 2013/09/19 17:17:36 UTC

Rebalancing failures during upgrade to latest code

The latest consumer changes to read data from ZooKeeper during rebalance have made the consumer rebalance code incompatible with older versions (making rolling upgrades without downtime hard). The problem relates to how partitions are ordered. The old code seems to have returned the partitions sorted:

... rebalancing the following partitions: ArrayBuffer(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19) for topic produce-indexable-views with consumers: ...

the new code instead uses:

... rebalancing the following partitions: List(0, 5, 10, 14, 1, 6, 9, 13, 2, 17, 12, 7, 3, 18, 16, 11, 8, 19, 4, 15) for topic produce-indexable-views with consumers: ...

This causes new and old consumers to claim the same partitions, because each consumer computes its share from the partition list in the order it sees it. I realize this may not be a big deal, because the code wasn't officially released (although it is painful for us, since it conflicts with our deployment automation), but it seems simple enough to sort the partitions, if you'd take such a patch.

/Sam
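The thread doesn't include the assignment code, so here is a hedged illustration of why two different partition orders break a mixed deployment. This is a minimal Python sketch that assumes a simple range-style assignment in which each consumer independently takes a contiguous slice of the partition list. The function `range_assign` and the consumer ids `c0`..`c3` are hypothetical, and the slicing rule is an assumption, not Kafka's actual code. Two consumers use the sorted order from the old code's log line and two use the order from the new code's log line:

```python
from collections import Counter


def range_assign(partitions, consumers, me):
    """Return the contiguous slice of `partitions` that consumer `me` claims.

    Hypothetical, simplified range-style assignment: the slice is taken in
    whatever order `partitions` arrives in, so the result depends on it.
    """
    n_per, n_extra = divmod(len(partitions), len(consumers))
    pos = consumers.index(me)
    start = n_per * pos + min(pos, n_extra)
    count = n_per + (1 if pos < n_extra else 0)
    return partitions[start:start + count]


# Partition order as logged by the old code (sorted) and by the new code.
old_order = list(range(20))
new_order = [0, 5, 10, 14, 1, 6, 9, 13, 2, 17, 12, 7, 3, 18, 16, 11, 8, 19, 4, 15]

# Hypothetical consumer ids, assumed to be sorted identically everywhere.
consumers = ["c0", "c1", "c2", "c3"]

# Mid-upgrade: c0 and c1 still run the old code, c2 and c3 the new code.
claims = {
    "c0": range_assign(old_order, consumers, "c0"),
    "c1": range_assign(old_order, consumers, "c1"),
    "c2": range_assign(new_order, consumers, "c2"),
    "c3": range_assign(new_order, consumers, "c3"),
}

# Count how many consumers claim each partition.
counts = Counter(p for owned in claims.values() for p in owned)
double_claimed = sorted(p for p, n in counts.items() if n > 1)
unclaimed = sorted(set(range(20)) - set(counts))
print("claimed twice:", double_claimed)  # claimed twice: [3, 4, 7, 8]
print("unclaimed:", unclaimed)           # unclaimed: [10, 13, 14, 17]
```

Under these assumptions, four partitions are claimed by two consumers and four are claimed by nobody. The sketch models only the assignment arithmetic, not how ownership is registered in ZooKeeper.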




Re: Rebalancing failures during upgrade to latest code

Posted by Sam Meder <sa...@jivesoftware.com>.
Filed KAFKA-1062, including a trivial patch.

/Sam

On Sep 19, 2013, at 5:52 PM, Neha Narkhede <ne...@gmail.com> wrote:

> Agreed. This is a regression and is not easy to reason about. It is a
> side effect of reading the partitions as a set from ZooKeeper. Could you
> please file a JIRA to get this fixed? Feel free to upload a patch as well.
> 
> Thanks,
> Neha


Re: Rebalancing failures during upgrade to latest code

Posted by Neha Narkhede <ne...@gmail.com>.
Agreed. This is a regression and is not easy to reason about. It is a
side effect of reading the partitions as a set from ZooKeeper. Could you
please file a JIRA to get this fixed? Feel free to upload a patch as well.

Thanks,
Neha


Re: Rebalancing failures during upgrade to latest code

Posted by Guozhang Wang <wa...@gmail.com>.
Hello Sam,

I agree that, with the fix, we should still sort the partition list before
handing it to the assignment algorithm. I will try to make a follow-up patch
to fix this.

Guozhang



-- 
-- Guozhang
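The fix discussed in this thread is to sort the partition list before handing it to the assignment algorithm. Below is a minimal Python sketch of that idea, assuming a simple range-style assignment where each consumer takes a contiguous slice of the list. The function `range_assign`, the helper `assignment`, and the consumer ids `c0`..`c2` are hypothetical. This is not the KAFKA-1062 patch, only an illustration. Sorting the partition ids (and the consumer ids) first means every consumer slices the same sequence, whatever order the ids come back from ZooKeeper:

```python
import random


def range_assign(partitions, consumers, me):
    """Hypothetical range-style assignment that sorts its inputs first, so
    every consumer slices the same sequence regardless of read order."""
    parts = sorted(partitions)
    cons = sorted(consumers)
    n_per, n_extra = divmod(len(parts), len(cons))
    pos = cons.index(me)
    start = n_per * pos + min(pos, n_extra)
    count = n_per + (1 if pos < n_extra else 0)
    return parts[start:start + count]


# Order logged by the new code after reading the partitions from ZooKeeper.
new_order = [0, 5, 10, 14, 1, 6, 9, 13, 2, 17, 12, 7, 3, 18, 16, 11, 8, 19, 4, 15]
# Hypothetical consumer ids; 20 partitions over 3 consumers exercises the
# "extra partition" branch.
consumers = ["c0", "c1", "c2"]


def assignment(order):
    """Map each consumer to the partitions it would claim given `order`."""
    return {c: range_assign(order, consumers, c) for c in consumers}


sorted_result = assignment(list(range(20)))

# The result no longer depends on the order the partitions arrive in.
assert assignment(new_order) == sorted_result
for seed in range(5):
    shuffled = new_order[:]
    random.Random(seed).shuffle(shuffled)
    assert assignment(shuffled) == sorted_result

print(sorted_result)
# {'c0': [0, 1, 2, 3, 4, 5, 6], 'c1': [7, 8, 9, 10, 11, 12, 13], 'c2': [14, 15, 16, 17, 18, 19]}
```

Sorting inside the assignment step, rather than relying on the order ZooKeeper returns, also keeps the result compatible with older consumers that already worked from a sorted list.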