Posted to users@kafka.apache.org by jun aoki <ja...@apache.org> on 2022/01/12 06:21:06 UTC

Questions around static membership partition assignment

Hi Kafka experts,

My understanding of static membership is that, assuming Kubernetes (for
example) can almost always provide a fixed number of healthy pods, Kafka
doesn't have to do any rebalancing.

This leads me to think that if the partition assignment starts out
unbalanced (say there are 10 partitions in total and 2 pods, with 8
partitions assigned to one pod and 2 to the other), it will stay unbalanced
forever, because k8s keeps the pods healthy and no rebalance ever occurs. I
don't think that is the desired behavior.

My questions are:
1. The desired behavior is that partitions should eventually be balanced
(but that conflicts with the "no rebalance" nature of static membership on
a healthy backbone). Could you point out what I am missing here?
2. What is recommended when partitions are unbalanced like this?
  2-a Leave it unbalanced? (unlikely)
  2-b Do I have to adjust the session timeout so that I can artificially
cause a rebalance that eventually evens out the assignment? (That seems
somewhat cumbersome.)
  2-c Is there a special PartitionAssignor implementation we should use
under static membership, one that magically guarantees an even assignment?

We've been suffering from rebalance storms from time to time, and static
membership seems to be the way to resolve them, but I want to make sure we
know how to work around edge cases like this one.

-- 
-jun

Re: Questions around static membership partition assignment

Posted by jun aoki <ja...@apache.org>.
Thank you Luke for responding!

The imbalance we've observed without static membership goes like this: say
we expect 2 pods, and the k8s StatefulSet starts them one by one
(PodManagementPolicy=OrderedReady). After the first pod is ready, all
partitions are assigned to it. At the very moment the second pod becomes
ready, it has 0 partitions assigned, and I see that moment as "unbalanced".
A rebalance then starts and eventually evens out the assignment. The
rebalance occurs because this is dynamic membership (no issues here;
everything is as expected).

My impression of static membership was:
1. Say there is only one pod, and we decide to add one more pod later.
2. I expected that a rebalance would *never* occur, leaving the second pod
with 0 partitions forever.

> --> Yes, the root cause of this issue is: there should not be any
"unbalanced" cases after rebalance completed

So if I understood your response correctly, my misunderstanding was in
point 2: my Kafka app would actually trigger a rebalance for the brand-new
second pod, at least once and likely only once, which would make the
assignment even, and the assignment would then stay even forever.

Do I understand the static membership partition assignment strategy
correctly?
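
The sequence described in this message (one pod holding every partition, a second pod joining, one rebalance evening things out) can be sketched as follows. This is only an illustration of the arithmetic, assuming that the new member's join triggers a rebalance and that partitions are dealt out round-robin; `assign_round_robin` is a hypothetical helper, not Kafka's assignor implementation.

```python
def assign_round_robin(partitions, members):
    """Deal partitions out to members in turn (hypothetical helper)."""
    ordered = sorted(members)
    assignment = {member: [] for member in ordered}
    for i, partition in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(partition)
    return assignment


partitions = range(10)

# Only the first pod is ready: it owns all 10 partitions.
before = assign_round_robin(partitions, ["pod-0"])

# The second pod joins and a rebalance runs: the 10 partitions are split 5/5.
after = assign_round_robin(partitions, ["pod-0", "pod-1"])
sizes_after = {member: len(owned) for member, owned in after.items()}
```

With an uneven count (for example 10 partitions over 3 members), the same helper produces sizes that differ by at most one.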


-- 
-jun

Re: Questions around static membership partition assignment

Posted by Luke Chen <sh...@gmail.com>.
Hi Jun

The goal of static membership is to hold off the rebalance when a consumer
drops out (as long as it comes back before the session timeout).
This works well for K8s: when a pod is broken (or during an upgrade), K8s
kills the pod and brings up a new one to replace it.
In this case, we don't want the consumer group to rebalance twice (once
when the old pod goes down, and once after the new pod comes up).
We hold the rebalance, and once the new pod is up and we have checked that
everything is good, no rebalance is triggered.
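
The behavior described above can be sketched as a toy model. This is a simplified simulation under stated assumptions, not Kafka's actual coordinator code: the `ToyCoordinator` class and its methods are hypothetical names, time is passed in explicitly as seconds, and every membership change is counted as one rebalance. It shows that a known static member rejoining with the same `group.instance.id` before the session timeout causes no rebalance, while a restart that takes longer than the timeout costs two.

```python
class ToyCoordinator:
    """Hypothetical, simplified model of a group coordinator with static members."""

    def __init__(self, session_timeout_s):
        self.session_timeout_s = session_timeout_s
        self.last_seen = {}   # group.instance.id -> time the member was last seen
        self.rebalances = 0   # one per membership change in this toy model

    def heartbeat(self, instance_id, now):
        """Record that a known member is still alive."""
        if instance_id in self.last_seen:
            self.last_seen[instance_id] = now

    def expire(self, now):
        """Drop members whose session timed out; each removal is a rebalance."""
        for instance_id, seen in list(self.last_seen.items()):
            if now - seen > self.session_timeout_s:
                del self.last_seen[instance_id]
                self.rebalances += 1

    def join(self, instance_id, now):
        """A consumer (re)joins. Returns True if the join triggers a rebalance."""
        self.expire(now)
        is_new = instance_id not in self.last_seen
        self.last_seen[instance_id] = now
        if is_new:
            self.rebalances += 1
        return is_new


coord = ToyCoordinator(session_timeout_s=30)
coord.join("pod-0", now=0)
coord.join("pod-1", now=0)
baseline = coord.rebalances

# pod-1 is replaced and rejoins 10s later under the same id: no rebalance.
coord.heartbeat("pod-0", now=10)
quick_restart = coord.join("pod-1", now=10)
after_quick = coord.rebalances

# pod-1 then stays away for 40s (> 30s timeout): one rebalance when it is
# removed, and another when it comes back as a new member.
coord.heartbeat("pod-0", now=30)
coord.heartbeat("pod-0", now=50)
slow_restart = coord.join("pod-1", now=50)
after_slow = coord.rebalances
```

In this model a member with a previously unknown id always triggers a rebalance on its first join, which is why adding a pod still redistributes partitions.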

So, answering your questions below:

1. The desired behavior is that partitions should eventually be balanced
(but that conflicts with the "no rebalance" nature of static membership on
a healthy backbone). Could you point out what I am missing here?
--> Yes, the root cause of this issue is that there should not be any
"unbalanced" cases after a rebalance has completed.

2. What is recommended when partitions are unbalanced like this?
  2-a Leave it unbalanced? (unlikely)
  2-b Do I have to adjust the session timeout so that I can artificially
cause a rebalance that eventually evens out the assignment? (That seems
somewhat cumbersome.)
  2-c Is there a special PartitionAssignor implementation we should use
under static membership, one that magically guarantees an even assignment?

--> Again, we expect that there should be no "unbalanced" situation after a
rebalance has completed.
If this really happens, please report it in JIRA here:
https://issues.apache.org/jira/projects/KAFKA/issues
Please also collect logs and the current consumer status for us (if
possible, enable DEBUG logging for troubleshooting).
And please let us know which partition.assignment.strategy
<https://kafka.apache.org/documentation/#consumerconfigs_partition.assignment.strategy>
you're using: RoundRobin? CooperativeStickyAssignor?
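
When reporting, it helps to state the relevant consumer settings explicitly. The following is a minimal sketch of what a static-membership configuration might look like, not a recommended setup. It assumes a Python client that takes librdkafka-style settings (where the strategy is named `cooperative-sticky`; the Java client expects an assignor class name instead), and it assumes pod names are stable across restarts, as with a StatefulSet. The broker address, group name, timeout value, and the `static_consumer_config` helper are all placeholders.

```python
def static_consumer_config(pod_name):
    """Build a consumer config dict for one pod (all values are illustrative)."""
    return {
        "bootstrap.servers": "localhost:9092",   # placeholder broker address
        "group.id": "example-group",             # placeholder group name
        # Static membership: a stable id that is unique per consumer instance.
        "group.instance.id": pod_name,
        # How long the group waits for a missing member before rebalancing.
        "session.timeout.ms": 30000,             # example value only
        # librdkafka-style strategy name; an assumption about the client in use.
        "partition.assignment.strategy": "cooperative-sticky",
    }


# One config per pod: same group, different group.instance.id.
configs = [static_consumer_config(name) for name in ("consumer-0", "consumer-1")]
```

The point of deriving `group.instance.id` from the pod name is that a replacement pod comes back under the same id, so the group can treat it as the same member.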

I saw you mentioned that you've been suffering from rebalance storms from
time to time.
I think you should first figure out why rebalances happen so frequently.
Then we can work out how to fix it.
Static membership is one way to improve things, but as I mentioned above, it
only works for some cases.

If you're interested in reading more about static membership, here's a good
blog post:
https://www.confluent.io/blog/kafka-rebalance-protocol-static-membership/

Thank you.
Luke
