Posted to users@kafka.apache.org by Jason Rosenberg <jb...@squareup.com> on 2014/11/17 20:06:51 UTC

ISR shrink to 0?

We have had 2 nodes in a 4 node cluster die this weekend, sadly.
Fortunately there was no critical data on these machines yet.

The cluster is running 0.8.1.1, and using replication factor of 2 for 2
topics, each with 20 partitions.

For the sake of discussion, assume that nodes A and B are still up, and C and D
are now down.

As expected, partitions that had one replica on a good host (A or B) and
one on a bad node (C or D), had their ISR shrink to just 1 node (A or B).

Roughly 1/6 of the partitions had their 2 replicas on the 2 bad nodes, C
and D.  For these, I was expecting the ISR to show up as empty, and the
partition unavailable.

However, that's not what I'm seeing.  When running TopicCommand --describe,
I see that the ISR still shows 1 replica, on node D (D was the second node
to go down).

And, producers are still periodically trying to produce to node D (but
failing and retrying to one of the good nodes).

So, it seems the cluster's metadata still thinks that node D is up
and serving the partitions that were only replicated on C and D.   However,
for partitions that were on A and D, or B and D, D is not shown as being in
the ISR.

Is this correct?  Should the cluster continue showing the last node to have
been alive for a partition as still in the ISR?

Jason
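
As a sanity check on the "roughly 1/6" figure above, here is a minimal sketch that enumerates the possible replica pairs. It assumes replica sets are spread evenly over all pairs of the four brokers, which is only an approximation of how assignments actually land; the names `nodes` and `down` are illustrative.

```python
from fractions import Fraction
from itertools import combinations

nodes = ["A", "B", "C", "D"]
down = {"C", "D"}

# With replication factor 2, each partition's replica set is a pair of brokers.
pairs = list(combinations(nodes, 2))

# Classify each possible pair by how many of its brokers are down.
both_down = [p for p in pairs if set(p) <= down]
one_down = [p for p in pairs if len(set(p) & down) == 1]
none_down = [p for p in pairs if not set(p) & down]

# Share of pairs that lost both replicas (assumes an even spread over pairs).
fraction_both_down = Fraction(len(both_down), len(pairs))
print(fraction_both_down)  # 1/6
```

Of the six possible pairs, only (C, D) loses both replicas, four pairs lose one, and (A, B) loses none.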

Re: ISR shrink to 0?

Posted by Jun Rao <ju...@gmail.com>.

In that case, no leader will be elected until at least 1 replica in the ISR
comes back.

Thanks,

Jun

On Wed, Nov 19, 2014 at 6:58 PM, Jason Rosenberg <jb...@squareup.com> wrote:

> What if it never comes back with unclean leader election disabled (but
> another broker does come back)?
>
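
The rule described in this reply can be sketched as a toy model. This is not Kafka's controller code: `elect_leader` and its parameters are hypothetical names, and `unclean_allowed` stands in for the unclean leader election setting discussed in the thread.

```python
from typing import List, Optional, Set


def elect_leader(assigned: List[str], isr: List[str], live: Set[str],
                 unclean_allowed: bool) -> Optional[str]:
    """Toy leader choice for one partition (illustrative only)."""
    # A live broker that is still in the ISR has every committed message.
    for broker in isr:
        if broker in live:
            return broker
    # Otherwise fall back to any live assigned replica, but only if
    # unclean election is allowed (committed messages may be lost).
    if unclean_allowed:
        for broker in assigned:
            if broker in live:
                return broker
    # No eligible replica: the partition stays without a leader.
    return None


# Partition replicated on C and D, ISR preserved as [D], and only C returns.
print(elect_leader(["C", "D"], ["D"], {"A", "B", "C"}, unclean_allowed=False))
print(elect_leader(["C", "D"], ["D"], {"A", "B", "C"}, unclean_allowed=True))
```

Under these assumptions the first call yields no leader, because C is not in the ISR, while the second picks C at the risk of losing whatever only D had.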

Re: ISR shrink to 0?

Posted by Jason Rosenberg <jb...@squareup.com>.

What if it never comes back with unclean leader election disabled (but
another broker does come back)?

On Wed, Nov 19, 2014 at 9:32 PM, Jun Rao <ju...@gmail.com> wrote:

> In that case, we just wait until the broker in ISR is back and make it the
> leader and take whatever data it has.

Re: ISR shrink to 0?

Posted by Jun Rao <ju...@gmail.com>.

In that case, we just wait until the broker in the ISR is back, make it the
leader, and take whatever data it has.

Thanks,

Jun

On Tue, Nov 18, 2014 at 10:36 PM, Jason Rosenberg <jb...@squareup.com> wrote:

> Ok,
>
> Makes sense.  But if the node is not actually healthy (and underwent a hard
> crash) it would likely not be able to avoid an 'unclean' restart.....what
> happens if unclean leader election is disabled, but there are no 'clean'
> partitions available?
>
> Jason

Re: ISR shrink to 0?

Posted by Jason Rosenberg <jb...@squareup.com>.

Ok,

Makes sense.  But if the node is not actually healthy (and underwent a hard
crash), it would likely not be able to avoid an 'unclean' restart.  What
happens if unclean leader election is disabled, but there are no 'clean'
partitions available?

Jason

On Wed, Nov 19, 2014 at 12:40 AM, Jun Rao <ju...@gmail.com> wrote:

> Yes, we will preserve the last replica in ISR. This way, we know which
> replica has all committed messages and can wait for it to come back as the
> leader, if unclean leader election is disabled.
>
> Thanks,
>
> Jun

Re: ISR shrink to 0?

Posted by Jun Rao <ju...@gmail.com>.

Yes, we will preserve the last replica in ISR. This way, we know which
replica has all committed messages and can wait for it to come back as the
leader, if unclean leader election is disabled.

Thanks,

Jun

On Mon, Nov 17, 2014 at 11:06 AM, Jason Rosenberg <jb...@squareup.com> wrote:

> Is this correct?  Should the cluster continue showing the last node to have
> been alive for a partition as still in the ISR?
>
> Jason
>
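
The "preserve the last replica in ISR" behaviour described in this reply can be illustrated with a small sketch. It is a toy model under stated assumptions, not the broker's actual implementation; `shrink_isr` is a hypothetical helper name.

```python
from typing import List


def shrink_isr(isr: List[str], failed: str) -> List[str]:
    """Drop a failed broker from the ISR unless it is the last member."""
    remaining = [b for b in isr if b != failed]
    # Keep the last member so we still know which replica has all
    # committed messages, even though that broker is currently down.
    return remaining if remaining else list(isr)


# Partition replicated on C and D: C dies first, then D.
isr_cd = shrink_isr(["C", "D"], "C")
isr_cd = shrink_isr(isr_cd, "D")
print(isr_cd)  # ['D']

# Partition replicated on A and D: D dies, A remains.
isr_ad = shrink_isr(["A", "D"], "D")
print(isr_ad)  # ['A']
```

This matches what was observed in the original post: D stays listed for the partitions that lived only on C and D, but drops out of the ISR wherever A or B also held a replica.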

Re: ISR shrink to 0?

Posted by Jason Rosenberg <jb...@squareup.com>.

Not sure what happened, but the issue went away once I revived the broker id
on a new host.

But it does seem host D's ISR leadership could not be cleared until another
member of the ISR came back.  Somehow D was stale and remained stuck (and
clients therefore kept trying to connect to it).

Jason

On Mon, Nov 17, 2014 at 2:06 PM, Jason Rosenberg <jb...@squareup.com> wrote:

> Is this correct?  Should the cluster continue showing the last node to
> have been alive for a partition as still in the ISR?
>
> Jason
>
>
>