Posted to users@kafka.apache.org by Jack Lund <ja...@braintreepayments.com> on 2017/01/05 15:55:53 UTC

Under-replicated Partitions while rolling Kafka nodes in AWS

Hello, all.

We're running multiple Kafka clusters in AWS, and thus multiple Zookeeper
clusters as well. When we roll out changes to our zookeeper nodes (which
involves changes to the AMI, which means terminating the zookeeper instance
and bringing up a new one in its place) we have to restart our Kafka
brokers one at a time so they can pick up the new zookeeper IP address.

What we've noticed is that, as the brokers are restarted, we get alerts for
under-replicated partitions, which seems strange, since the shutdown process
should take care of moving any replicas and handling leader election.
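
(For reference, the alerts come from the brokers' under-replicated-partitions
count. A rough way to see the same thing from the command line during a roll
is something like the following; the ZooKeeper address is a placeholder for
whatever your cluster uses:)

    # List partitions whose ISR is currently smaller than their replica list
    bin/kafka-topics.sh --zookeeper zk1.example.com:2181 \
        --describe --under-replicated-partitions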

This is causing us some pain because it means that we get pages whenever we
roll out changes to Zookeeper.

Does anybody have any ideas why this would be happening, and how we can
avoid it?

Thanks.

-Jack Lund
 Braintree Payments

Re: Under-replicated Partitions while rolling Kafka nodes in AWS

Posted by Abhishek Agarwal <ab...@gmail.com>.
FYI, DNS caching is still not fixed in 0.10. The ZooKeeper DNS caching has
been fixed on the ZooKeeper server side, where quorum members re-resolve the
IP addresses of their peers, but the client still doesn't re-resolve IP
addresses.

On Mon, Jan 9, 2017 at 9:56 PM, Jack Lund <ja...@braintreepayments.com>
wrote:

> On Thu, Jan 5, 2017 at 4:32 PM James Cheng <wu...@gmail.com> wrote:
>
> >
> > FYI, zookeeper 3.4.8 fixes the issue where you have to restart zookeeper
> > nodes when their DNS mapping changes. I'm not sure how it affects
> > restarting kafka though, when the zookeeper DNS changes.
> >
> > https://zookeeper.apache.org/doc/r3.4.8/releasenotes.html
> > https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> >
> >
> Actually, the problem for us isn't zookeeper but Kafka, because the IP
> addresses for the zookeeper instances change, and we have to update Kafka
> with the new ones.
>
> We chose not to have Kafka access the zookeeper nodes via DNS because we
> had issues with Kafka caching the DNS entries (IIRC), so we had to restart
> the Kafka nodes anyway to pick up the changes (this was with, I believe,
> 0.8, so it's possible this is fixed in 0.10).
>
>
> > > What we've noticed is that, as the brokers are restarted, we get
> > > alerts for under-replicated partitions, which seems strange, since
> > > the shutdown process should take care of moving any replicas and
> > > handling leader election.
> > >
> >
> > During a controlled shutdown, you are right that *leadership* is moved
> > from one broker to another. But the replica list does not change. A
> > topic assigned to brokers 1 2 3 for example will only live on 1 2 3.
> > If broker 1 is the leader for the topic, then during controlled
> > shutdown of 1, leadership may move to 2 or 3. But a broker 4 would
> > never automatically take over as replica for the topic.
> >
> > You can build such functionality yourself, if you wanted. You could,
> > for example, move the topic to 2 3 4 before shutting down 1, and then
> > move it back to 1 2 3 once 1 is back up. But that's a bunch of work
> > you'd have to do yourself.
>
>
> We don't actually want this behavior, but we thought it would explain what
> we're seeing with the under-replicated-partitions JMX metric: if the
> partition replicas aren't being moved (which makes sense), then the metric
> would report the partition as under-replicated, because, well, it is, at
> least briefly.
>
>
> >
> > -James
> >
> Thanks!
>
> -Jack
>



-- 
Regards,
Abhishek Agarwal

Re: Under-replicated Partitions while rolling Kafka nodes in AWS

Posted by Jack Lund <ja...@braintreepayments.com>.
On Thu, Jan 5, 2017 at 4:32 PM James Cheng <wu...@gmail.com> wrote:

>
> FYI, zookeeper 3.4.8 fixes the issue where you have to restart zookeeper
> nodes when their DNS mapping changes. I'm not sure how it affects
> restarting kafka though, when the zookeeper DNS changes.
>
> https://zookeeper.apache.org/doc/r3.4.8/releasenotes.html
> https://issues.apache.org/jira/browse/ZOOKEEPER-1506
>
>
Actually, the problem for us isn't zookeeper but Kafka, because the IP
addresses for the zookeeper instances change, and we have to update Kafka
with the new ones.

We chose not to have Kafka access the zookeeper nodes via DNS because we
had issues with Kafka caching the DNS entries (IIRC), so we had to restart
the Kafka nodes anyway to pick up the changes (this was with, I believe,
0.8, so it's possible this is fixed in 0.10).
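
(Concretely, the setting involved is the broker's zookeeper.connect list in
server.properties; whatever is listed there is what the broker resolves when
it connects, which is why a ZooKeeper IP change turns into a broker restart
for us. A minimal sketch, with placeholder addresses:)

    # server.properties (broker side)
    # Raw IPs mean a replaced ZooKeeper node forces a broker restart;
    # stable DNS names only help if the client actually re-resolves them.
    zookeeper.connect=10.0.1.11:2181,10.0.2.12:2181,10.0.3.13:2181
    zookeeper.connection.timeout.ms=6000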


> > What we've noticed is that, as the brokers are restarted, we get alerts
> > for under-replicated partitions, which seems strange, since the shutdown
> > process should take care of moving any replicas and handling leader
> > election.
> >
>
> During a controlled shutdown, you are right that *leadership* is moved
> from one broker to another. But the replica list does not change. A topic
> assigned to brokers 1 2 3 for example will only live on 1 2 3. If broker 1
> is the leader for the topic, then during controlled shutdown of 1,
> leadership may move to 2 or 3. But a broker 4 would never automatically
> take over as replica for the topic.
>
> You can build such functionality yourself, if you wanted. You could, for
> example, move the topic to 2 3 4 before shutting down 1, and then move it
> back to 1 2 3 once 1 is back up. But that's a bunch of work you'd have to
> do yourself.


We don't actually want this behavior, but we thought it would explain what
we're seeing with the under-replicated-partitions JMX metric: if the
partition replicas aren't being moved (which makes sense), then the metric
would report the partition as under-replicated, because, well, it is, at
least briefly.
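
(For anyone who wants to watch this during a roll: the gauge we alert on is
the per-broker JMX metric below, which stays above zero while a restarted
broker's replicas are catching up and are out of the ISR.)

    kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions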


>
> -James
>
Thanks!

-Jack

Re: Under-replicated Partitions while rolling Kafka nodes in AWS

Posted by James Cheng <wu...@gmail.com>.
> On Jan 5, 2017, at 7:55 AM, Jack Lund <ja...@braintreepayments.com> wrote:
> 
> Hello, all.
> 
> We're running multiple Kafka clusters in AWS, and thus multiple Zookeeper
> clusters as well. When we roll out changes to our zookeeper nodes (which
> involves changes to the AMI, which means terminating the zookeeper instance
> and bringing up a new one in its place) we have to restart our Kafka
> brokers one at a time so they can pick up the new zookeeper IP address.
> 

FYI, zookeeper 3.4.8 fixes the issue where you have to restart zookeeper nodes when their DNS mapping changes. I'm not sure how it affects restarting kafka though, when the zookeeper DNS changes.

https://zookeeper.apache.org/doc/r3.4.8/releasenotes.html
https://issues.apache.org/jira/browse/ZOOKEEPER-1506

> What we've noticed is that, as the brokers are restarted, we get alerts for
> under-replicated partitions, which seems strange, since the shutdown process
> should take care of moving any replicas and handling leader election.
> 

During a controlled shutdown, you are right that *leadership* is moved from one broker to another. But the replica list does not change. A topic assigned to brokers 1 2 3 for example will only live on 1 2 3. If broker 1 is the leader for the topic, then during controlled shutdown of 1, leadership may move to 2 or 3. But a broker 4 would never automatically take over as replica for the topic.
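
(For completeness, controlled shutdown is governed by broker settings along
these lines; the values shown are, as far as I know, the defaults, so this is
just to name the knobs rather than to suggest changing them:)

    # server.properties
    controlled.shutdown.enable=true           # migrate leadership before exiting
    controlled.shutdown.max.retries=3         # attempts before an unclean stop
    controlled.shutdown.retry.backoff.ms=5000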

You can build such functionality yourself, if you wanted. You could, for example, move the topic to 2 3 4 before shutting down 1, and then move it back to 1 2 3 once 1 is back up. But that's a bunch of work you'd have to do yourself.
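
(If someone did want to do that, the stock tool for it is
kafka-reassign-partitions.sh driven by a JSON file along these lines; the
topic name, broker ids, and ZooKeeper address are placeholders:)

    # reassign.json: move partition 0 of "my-topic" from brokers 1,2,3 to 2,3,4
    {"version":1,
     "partitions":[{"topic":"my-topic","partition":0,"replicas":[2,3,4]}]}

    # apply the reassignment, then check on it later
    bin/kafka-reassign-partitions.sh --zookeeper zk1.example.com:2181 \
        --reassignment-json-file reassign.json --execute
    bin/kafka-reassign-partitions.sh --zookeeper zk1.example.com:2181 \
        --reassignment-json-file reassign.json --verify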

-James

> This is causing us some pain because it means that we get pages whenever we
> roll out changes to Zookeeper.
> 
> Does anybody have any ideas why this would be happening, and how we can
> avoid it?
> 
> Thanks.
> 
> -Jack Lund
> Braintree Payments