Posted to users@kafka.apache.org by Jason Rosenberg <jb...@squareup.com> on 2013/10/25 10:18:02 UTC

Controlled shutdown failure, retry settings

I'm running into an issue where sometimes, the controlled shutdown fails to
complete after the default 3 retry attempts.  This ended up in one case,
with a broker undergoing an unclean shutdown, and then it was in a rather
bad state after restart.  Producers would connect to the metadata vip,
still think that this broker was the leader, and then fail on that leader,
and then reconnect to the metadata vip, and get sent back to that same
failed broker!   Does that make sense?

I'm trying to understand the conditions which cause the controlled shutdown
to fail?  There doesn't seem to be a setting for max amount of time to
wait, etc.

It would be nice to specify how long to try before giving up (hopefully
giving up in a graceful way).

Instead, we have a retry count, but it's not clear what this retry count is
really specifying, in terms of how long to keep trying, etc.

Also, what are the ramifications for different settings for the
controlled.shutdown.retry.backoff.ms?  Is there a reason we want to wait
before retrying again (again, it would be helpful to understand the reasons
for a controlled shutdown failure).
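
For reference, the settings I'm talking about are the following - names
and defaults as I understand them from the 0.8 docs, so please correct
me if I have any of these wrong:

  # must be enabled for controlled shutdown to be attempted at all
  controlled.shutdown.enable=true
  # number of attempts before giving up (default 3)
  controlled.shutdown.max.retries=3
  # wait between attempts (default 5000)
  controlled.shutdown.retry.backoff.ms=5000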

Thanks,

Jason

Re: Controlled shutdown failure, retry settings

Posted by Joel Koshy <jj...@gmail.com>.
On Fri, Oct 25, 2013 at 1:18 AM, Jason Rosenberg <jb...@squareup.com> wrote:
> I'm running into an issue where sometimes, the controlled shutdown fails to
> complete after the default 3 retry attempts.  This ended up in one case,
> with a broker undergoing an unclean shutdown, and then it was in a rather
> bad state after restart.  Producers would connect to the metadata vip,
> still think that this broker was the leader, and then fail on that leader,
> and then reconnect to the metadata vip, and get sent back to that same
> failed broker!   Does that make sense?
>
> I'm trying to understand the conditions which cause the controlled shutdown
> to fail?  There doesn't seem to be a setting for max amount of time to
> wait, etc.

There is a retry interval (controlled.shutdown.retry.backoff.ms).
Controlled shutdown fails when the controller is unable to move
leadership of partitions from the broker being shut down to another
broker. This happens, for instance, when the broker being shut down is
the only in-sync replica for a partition that it leads (i.e., if the
follower replicas are out of the ISR). Each attempt will report the
partitions remaining on the broker. If all retries are exhausted then
we proceed to an uncontrolled shutdown, so for the partitions for
which leadership could not be moved we will do an unclean leader
election (https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-Whathappenswhentherearenootherreplicasinthe%22insync%22setforapartition%3F).
If your producers are getting sent back to the same broker, that's
weird (assuming you have a replication factor > 1). You will need to
look into your controller and state change logs to determine what's
going on.

>
> Also, what are the ramifications for different settings for the
> controlled.shutdown.retry.backoff.ms?  Is there a reason we want to wait
> before retrying again (again, it would be helpful to understand the reasons
> for a controlled shutdown failure).

Yes - e.g., if a partition is under-replicated due to a follower being
slow for any reason, controlled shutdown would fail but may succeed
after a short while once the follower re-enters the ISR. That reason
could itself be a bounce: when you are doing a rolling bounce and bring
up a broker that was just shut down, the partitions assigned to it will
be under-replicated until it catches up - but by then you may have
already proceeded to do a controlled shutdown on the next broker in the
sequence. That broker may also be assigned partitions that were on the
preceding broker in the sequence, in which case those under-replicated
partitions would cause its controlled shutdown to fail. However, once
the preceding broker is fully caught up (in a few ms or seconds,
depending on how long it was down) the shutdown will succeed - i.e., if
the retry interval times the number of retries is set large enough to
cover the expected period of under-replication, it will succeed. There
are a couple of other scenarios that are outlined on the original
controlled shutdown jira which I can look up later in the day if you
are interested.
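
To make that concrete with made-up numbers: if a restarted broker in
your rolling bounce typically takes ~30 seconds to re-enter the ISR,
then something like

  controlled.shutdown.max.retries=10
  controlled.shutdown.retry.backoff.ms=5000

gives the next shutdown roughly 45-50 seconds of retrying - enough to
ride out that window - whereas the default of 3 retries with a 5000 ms
backoff would give up too early.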

Joel

Re: Controlled shutdown failure, retry settings

Posted by Jun Rao <ju...@gmail.com>.
A replica is dropped out of the ISR if (1) it hasn't issued a fetch
request for some time, or (2) it's behind the leader by more than some
number of messages. The replica will be added back to the ISR once
neither condition is true.

The actual values depend on the application. For example, if there is a
spike and the follower can't keep up, the application has to decide whether
to slow down the commit of messages or let the replicas drift apart.
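
To be explicit, the two broker-side settings being discussed are
(defaults shown):

  # (1) follower hasn't issued a fetch request for this long => failed
  replica.lag.time.max.ms=10000
  # (2) follower is more than this many messages behind => slow
  replica.lag.max.messages=4000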

Thanks,

Jun


On Sun, Nov 3, 2013 at 3:23 AM, Jason Rosenberg <jb...@squareup.com> wrote:

> Jun,
>
> Can you explain the difference between "failed" and "slow"?  In either
> case, the follower drops out of the ISR, and can come back later if it
> catches up, no?
>
> In the configuration doc, it seems to describe them both with the same
> language:  "if ....., the leader will remove the follower from ISR and
> treat it as dead."
>
> The *.max.messages setting seems to make the system somewhat susceptible to
> sudden spikes of message traffic.
>
> At first glance, the defaults seem a bit out of balance.  The default
> *.max.ms is 10 seconds, while the default *.max.messages is only 4000
> messages.  Given that we can handle tens of thousands of messages a
> second, what is the thinking behind these defaults?
>
> Jason
>
>
> On Sat, Nov 2, 2013 at 11:41 PM, Jun Rao <ju...@gmail.com> wrote:
>
> > replica.lag.time.max.ms is used to detect a failed broker.
> > replica.lag.max.messages is used to detect a slow broker.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Fri, Nov 1, 2013 at 10:36 PM, Jason Rosenberg <jb...@squareup.com>
> wrote:
> >
> > > In response to Joel's point, I think I do understand that messages can be
> > > lost, if in fact we have dropped down to only 1 member in the ISR at the
> > > time the message is written, and then that 1 node goes down.
> > >
> > > What I'm not clear on, is the conditions under which a node can drop out of
> > > the ISR.  You said:
> > >
> > > "- ISR = 0, leader = 0; (so 1 is out of the ISR - say if broker 0 is
> > > slow (but up))"
> > >
> > > Did you mean to say "if broker *1* is slow (but up)"?
> > >
> > > I assume by "slow", you mean when a follower hasn't made a fetch request
> > > within "replica.lag.time.max.ms"?  The default for this is 10000 ms, so it
> > > would have to be in pretty bad shape to be "up" but "slow", no?
> > >
> > > It also does seem odd that a node can be too slow to remain in an ISR, but
> > > then be made available to compete in an ISR leader election.....
> > >
> > > Jason
> > >
> > >
> > > On Fri, Nov 1, 2013 at 5:00 PM, Neha Narkhede <neha.narkhede@gmail.com
> > > >wrote:
> > >
> > > > For supporting more durability at the expense of availability, we have a
> > > > JIRA that we will fix on trunk. This will allow you to configure the
> > > > default as well as per topic durability vs availability behavior  -
> > > > https://issues.apache.org/jira/browse/KAFKA-1028
> > > >
> > > > Thanks,
> > > > Neha
> > > >
> > > >
> > > > On Fri, Nov 1, 2013 at 1:43 PM, Joel Koshy <jj...@gmail.com> wrote:
> > > >
> > > > > >>>>
> > > > > >>>> Unclean shutdown could result in data loss - since you are moving
> > > > > >>>> leadership to a replica that has fallen out of ISR. i.e., its log end
> > > > > >>>> offset is behind the last committed message to this partition.
> > > > > >>>>
> > > > > >>>>
> > > > > >>> But if data is written with 'request.required.acks=-1', no data should be
> > > > > >>> lost, no?  Or will partitions be truncated wholesale after an unclean
> > > > > >>> shutdown?
> > > > >
> > > > > Sorry about the delayed reply to this, but it is an important point -
> > > > > data can be lost even in this case. -1 means ack after all replicas in
> > > > > the current ISR have received the message. So for example:
> > > > > - assigned replicas for some partition = 0,1
> > > > > - ISR = 0, leader = 0; (so 1 is out of the ISR - say if broker 0 is
> > > > > slow (but up))
> > > > > - messages with acks=-1 will get committed (i.e., the producer will
> > > > > receive the ack)
> > > > > - shutdown broker 0. So this will result in an unclean leader election
> > > > > since there are no other replicas in ISR; and if the broker 1 is up it
> > > > > will become the new leader
> > > > > - however it only has messages up to the earlier high watermark
> > > > > - when broker 0 comes back up it will become a follower to 1. Its
> > > > > initial fetch will be for its last checkpointed high watermark (which
> > > > > is larger than the current leader's hw), but the leader will piggy
> > > > > back the current hw in the fetch response to the follower which will
> > > > > then truncate its log to the leader's current hw. At least this is my
> > > > > recollection of how it should be - but I'm staring at the code and
> > > > > think there may be an issue with it. Will update this thread if this
> > > > > is incorrect.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Joel
> > > > >
> > > >
> > >
> >
>

Re: Controlled shutdown failure, retry settings

Posted by Jason Rosenberg <jb...@squareup.com>.
Jun,

Can you explain the difference between "failed" and "slow"?  In either
case, the follower drops out of the ISR, and can come back later if it
catches up, no?

In the configuration doc, it seems to describe them both with the same
language:  "if ....., the leader will remove the follower from ISR and
treat it as dead."

The *.max.messages setting seems to make the system somewhat susceptible to
sudden spikes of message traffic.

At first glance, the defaults seem a bit out of balance.  The default
*.max.ms is 10 seconds, while the default *.max.messages is only 4000
messages.  Given that we can handle tens of thousands of messages a
second, what is the thinking behind these defaults?

Jason


On Sat, Nov 2, 2013 at 11:41 PM, Jun Rao <ju...@gmail.com> wrote:

> replica.lag.time.max.ms is used to detect a failed broker.
> replica.lag.max.messages is used to detect a slow broker.
>
> Thanks,
>
> Jun
>
>
> On Fri, Nov 1, 2013 at 10:36 PM, Jason Rosenberg <jb...@squareup.com> wrote:
>
> > In response to Joel's point, I think I do understand that messages can be
> > lost, if in fact we have dropped down to only 1 member in the ISR at the
> > time the message is written, and then that 1 node goes down.
> >
> > What I'm not clear on, is the conditions under which a node can drop out of
> > the ISR.  You said:
> >
> > "- ISR = 0, leader = 0; (so 1 is out of the ISR - say if broker 0 is
> > slow (but up))"
> >
> > Did you mean to say "if broker *1* is slow (but up)"?
> >
> > I assume by "slow", you mean when a follower hasn't made a fetch request
> > within "replica.lag.time.max.ms"?  The default for this is 10000 ms, so it
> > would have to be in pretty bad shape to be "up" but "slow", no?
> >
> > It also does seem odd that a node can be too slow to remain in an ISR, but
> > then be made available to compete in an ISR leader election.....
> >
> > Jason
> >
> >
> > On Fri, Nov 1, 2013 at 5:00 PM, Neha Narkhede <neha.narkhede@gmail.com
> > >wrote:
> >
> > > For supporting more durability at the expense of availability, we have a
> > > JIRA that we will fix on trunk. This will allow you to configure the
> > > default as well as per topic durability vs availability behavior  -
> > > https://issues.apache.org/jira/browse/KAFKA-1028
> > >
> > > Thanks,
> > > Neha
> > >
> > >
> > > On Fri, Nov 1, 2013 at 1:43 PM, Joel Koshy <jj...@gmail.com> wrote:
> > >
> > > > >>>>
> > > > >>>> Unclean shutdown could result in data loss - since you are moving
> > > > >>>> leadership to a replica that has fallen out of ISR. i.e., its log end
> > > > >>>> offset is behind the last committed message to this partition.
> > > > >>>>
> > > > >>>>
> > > > >>> But if data is written with 'request.required.acks=-1', no data should be
> > > > >>> lost, no?  Or will partitions be truncated wholesale after an unclean
> > > > >>> shutdown?
> > > >
> > > > Sorry about the delayed reply to this, but it is an important point -
> > > > data can be lost even in this case. -1 means ack after all replicas in
> > > > the current ISR have received the message. So for example:
> > > > - assigned replicas for some partition = 0,1
> > > > - ISR = 0, leader = 0; (so 1 is out of the ISR - say if broker 0 is
> > > > slow (but up))
> > > > - messages with acks=-1 will get committed (i.e., the producer will
> > > > receive the ack)
> > > > - shutdown broker 0. So this will result in an unclean leader election
> > > > since there are no other replicas in ISR; and if the broker 1 is up it
> > > > will become the new leader
> > > > - however it only has messages up to the earlier high watermark
> > > > - when broker 0 comes back up it will become a follower to 1. Its
> > > > initial fetch will be for its last checkpointed high watermark (which
> > > > is larger than the current leader's hw), but the leader will piggy
> > > > back the current hw in the fetch response to the follower which will
> > > > then truncate its log to the leader's current hw. At least this is my
> > > > recollection of how it should be - but I'm staring at the code and
> > > > think there may be an issue with it. Will update this thread if this
> > > > is incorrect.
> > > >
> > > > Thanks,
> > > >
> > > > Joel
> > > >
> > >
> >
>

Re: Controlled shutdown failure, retry settings

Posted by Jun Rao <ju...@gmail.com>.
replica.lag.time.max.ms is used to detect a failed broker.
replica.lag.max.messages is used to detect a slow broker.

Thanks,

Jun


On Fri, Nov 1, 2013 at 10:36 PM, Jason Rosenberg <jb...@squareup.com> wrote:

> In response to Joel's point, I think I do understand that messages can be
> lost, if in fact we have dropped down to only 1 member in the ISR at the
> time the message is written, and then that 1 node goes down.
>
> What I'm not clear on, is the conditions under which a node can drop out of
> the ISR.  You said:
>
> "- ISR = 0, leader = 0; (so 1 is out of the ISR - say if broker 0 is
> slow (but up))"
>
> Did you mean to say "if broker *1* is slow (but up)"?
>
> I assume by "slow", you mean when a follower hasn't made a fetch request
> within "replica.lag.time.max.ms"?  The default for this is 10000 ms, so it
> would have to be in pretty bad shape to be "up" but "slow", no?
>
> It also does seem odd that a node can be too slow to remain in an ISR, but
> then be made available to compete in an ISR leader election.....
>
> Jason
>
>
> On Fri, Nov 1, 2013 at 5:00 PM, Neha Narkhede <neha.narkhede@gmail.com
> >wrote:
>
> > For supporting more durability at the expense of availability, we have a
> > JIRA that we will fix on trunk. This will allow you to configure the
> > default as well as per topic durability vs availability behavior  -
> > https://issues.apache.org/jira/browse/KAFKA-1028
> >
> > Thanks,
> > Neha
> >
> >
> > On Fri, Nov 1, 2013 at 1:43 PM, Joel Koshy <jj...@gmail.com> wrote:
> >
> > > >>>>
> > > >>>> Unclean shutdown could result in data loss - since you are moving
> > > >>>> leadership to a replica that has fallen out of ISR. i.e., its log end
> > > >>>> offset is behind the last committed message to this partition.
> > > >>>>
> > > >>>>
> > > >>> But if data is written with 'request.required.acks=-1', no data should be
> > > >>> lost, no?  Or will partitions be truncated wholesale after an unclean
> > > >>> shutdown?
> > >
> > > Sorry about the delayed reply to this, but it is an important point -
> > > data can be lost even in this case. -1 means ack after all replicas in
> > > the current ISR have received the message. So for example:
> > > - assigned replicas for some partition = 0,1
> > > - ISR = 0, leader = 0; (so 1 is out of the ISR - say if broker 0 is
> > > slow (but up))
> > > - messages with acks=-1 will get committed (i.e., the producer will
> > > receive the ack)
> > > - shutdown broker 0. So this will result in an unclean leader election
> > > since there are no other replicas in ISR; and if the broker 1 is up it
> > > will become the new leader
> > > - however it only has messages up to the earlier high watermark
> > > - when broker 0 comes back up it will become a follower to 1. Its
> > > initial fetch will be for its last checkpointed high watermark (which
> > > is larger than the current leader's hw), but the leader will piggy
> > > back the current hw in the fetch response to the follower which will
> > > then truncate its log to the leader's current hw. At least this is my
> > > recollection of how it should be - but I'm staring at the code and
> > > think there may be an issue with it. Will update this thread if this
> > > is incorrect.
> > >
> > > Thanks,
> > >
> > > Joel
> > >
> >
>

Re: Controlled shutdown failure, retry settings

Posted by Jason Rosenberg <jb...@squareup.com>.
In response to Joel's point, I think I do understand that messages can be
lost, if in fact we have dropped down to only 1 member in the ISR at the
time the message is written, and then that 1 node goes down.

What I'm not clear on, is the conditions under which a node can drop out of
the ISR.  You said:

"- ISR = 0, leader = 0; (so 1 is out of the ISR - say if broker 0 is
slow (but up))"

Did you mean to say "if broker *1* is slow (but up)"?

I assume by "slow", you mean when a follower hasn't made a fetch request
within "replica.lag.time.max.ms"?  The default for this is 10000 ms, so it
would have to be in pretty bad shape to be "up" but "slow", no?

It also does seem odd that a node can be too slow to remain in an ISR, but
then be made available to compete in an ISR leader election.....

Jason


On Fri, Nov 1, 2013 at 5:00 PM, Neha Narkhede <ne...@gmail.com>wrote:

> For supporting more durability at the expense of availability, we have a
> JIRA that we will fix on trunk. This will allow you to configure the
> default as well as per topic durability vs availability behavior  -
> https://issues.apache.org/jira/browse/KAFKA-1028
>
> Thanks,
> Neha
>
>
> On Fri, Nov 1, 2013 at 1:43 PM, Joel Koshy <jj...@gmail.com> wrote:
>
> > >>>>
> > >>>> Unclean shutdown could result in data loss - since you are moving
> > >>>> leadership to a replica that has fallen out of ISR. i.e., its log end
> > >>>> offset is behind the last committed message to this partition.
> > >>>>
> > >>>>
> > >>> But if data is written with 'request.required.acks=-1', no data should be
> > >>> lost, no?  Or will partitions be truncated wholesale after an unclean
> > >>> shutdown?
> >
> > Sorry about the delayed reply to this, but it is an important point -
> > data can be lost even in this case. -1 means ack after all replicas in
> > the current ISR have received the message. So for example:
> > - assigned replicas for some partition = 0,1
> > - ISR = 0, leader = 0; (so 1 is out of the ISR - say if broker 0 is
> > slow (but up))
> > - messages with acks=-1 will get committed (i.e., the producer will
> > receive the ack)
> > - shutdown broker 0. So this will result in an unclean leader election
> > since there are no other replicas in ISR; and if the broker 1 is up it
> > will become the new leader
> > - however it only has messages up to the earlier high watermark
> > - when broker 0 comes back up it will become a follower to 1. Its
> > initial fetch will be for its last checkpointed high watermark (which
> > is larger than the current leader's hw), but the leader will piggy
> > back the current hw in the fetch response to the follower which will
> > then truncate its log to the leader's current hw. At least this is my
> > recollection of how it should be - but I'm staring at the code and
> > think there may be an issue with it. Will update this thread if this
> > is incorrect.
> >
> > Thanks,
> >
> > Joel
> >
>

Re: Controlled shutdown failure, retry settings

Posted by Neha Narkhede <ne...@gmail.com>.
For supporting more durability at the expense of availability, we have a
JIRA that we will fix on trunk. This will allow you to configure the
default as well as per topic durability vs availability behavior  -
https://issues.apache.org/jira/browse/KAFKA-1028
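
(The knob proposed on that JIRA is an unclean leader election switch,
roughly the following, settable as a broker-wide default and
overridable per topic - see the JIRA for the exact name and current
state:

  unclean.leader.election.enable=false
)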

Thanks,
Neha


On Fri, Nov 1, 2013 at 1:43 PM, Joel Koshy <jj...@gmail.com> wrote:

> >>>>
> >>>> Unclean shutdown could result in data loss - since you are moving
> >>>> leadership to a replica that has fallen out of ISR. i.e., its log end
> >>>> offset is behind the last committed message to this partition.
> >>>>
> >>>>
> >>> But if data is written with 'request.required.acks=-1', no data should be
> >>> lost, no?  Or will partitions be truncated wholesale after an unclean
> >>> shutdown?
>
> Sorry about the delayed reply to this, but it is an important point -
> data can be lost even in this case. -1 means ack after all replicas in
> the current ISR have received the message. So for example:
> - assigned replicas for some partition = 0,1
> - ISR = 0, leader = 0; (so 1 is out of the ISR - say if broker 0 is
> slow (but up))
> - messages with acks=-1 will get committed (i.e., the producer will
> receive the ack)
> - shutdown broker 0. So this will result in an unclean leader election
> since there are no other replicas in ISR; and if the broker 1 is up it
> will become the new leader
> - however it only has messages up to the earlier high watermark
> - when broker 0 comes back up it will become a follower to 1. Its
> initial fetch will be for its last checkpointed high watermark (which
> is larger than the current leader's hw), but the leader will piggy
> back the current hw in the fetch response to the follower which will
> then truncate its log to the leader's current hw. At least this is my
> recollection of how it should be - but I'm staring at the code and
> think there may be an issue with it. Will update this thread if this
> is incorrect.
>
> Thanks,
>
> Joel
>

Re: Controlled shutdown failure, retry settings

Posted by Joel Koshy <jj...@gmail.com>.
>>>>
>>>> Unclean shutdown could result in data loss - since you are moving
>>>> leadership to a replica that has fallen out of ISR. i.e., its log end
>>>> offset is behind the last committed message to this partition.
>>>>
>>>>
>>> But if data is written with 'request.required.acks=-1', no data should be
>>> lost, no?  Or will partitions be truncated wholesale after an unclean
>>> shutdown?

Sorry about the delayed reply to this, but it is an important point -
data can be lost even in this case. -1 means ack after all replicas in
the current ISR have received the message. So for example:
- assigned replicas for some partition = 0,1
- ISR = 0, leader = 0; (so 1 is out of the ISR - say if broker 0 is
slow (but up))
- messages with acks=-1 will get committed (i.e., the producer will
receive the ack)
- shutdown broker 0. So this will result in an unclean leader election
since there are no other replicas in ISR; and if the broker 1 is up it
will become the new leader
- however it only has messages up to the earlier high watermark
- when broker 0 comes back up it will become a follower to 1. Its
initial fetch will be for its last checkpointed high watermark (which
is larger than the current leader's hw), but the leader will piggy
back the current hw in the fetch response to the follower which will
then truncate its log to the leader's current hw. At least this is my
recollection of how it should be - but I'm staring at the code and
think there may be an issue with it. Will update this thread if this
is incorrect.
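
For reference, the producer-side setting in question:

  # producer config: wait for acks from all replicas in the current ISR
  request.required.acks=-1

The subtle part, as the example above shows, is that "all replicas"
means all replicas in the *current* ISR, which can shrink to just the
leader itself.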

Thanks,

Joel

Re: Controlled shutdown failure, retry settings

Posted by Jason Rosenberg <jb...@squareup.com>.
I've filed: https://issues.apache.org/jira/browse/KAFKA-1108


On Tue, Oct 29, 2013 at 4:29 PM, Jason Rosenberg <jb...@squareup.com> wrote:

> Here's another exception I see during controlled shutdown (this time there
> was not an unclean shutdown problem).  Should I be concerned about this
> exception? Is any data loss possible with this?  This one happened after
> the first "Retrying controlled shutdown after the previous attempt
> failed..." message.  The controlled shutdown subsequently succeeded without
> another retry (but with a few more of these exceptions).
>
> Again, there was no "Remaining partitions to move..." message before the
> retrying message, so I assume the retry happens after an IOException (which
> is not logged in KafkaServer.controlledShutdown).
>
> 2013-10-29 20:03:31,883  INFO [kafka-request-handler-4]
> controller.ReplicaStateMachine - [Replica state machine on controller 10]:
> Invoking state change to OfflineReplica for replicas
> PartitionAndReplica(mytopic,0,10)
> 2013-10-29 20:03:31,883 ERROR [kafka-request-handler-4] change.logger -
> Controller 10 epoch 190 initiated state change of replica 10 for partition
> [mytopic,0] to OfflineReplica failed
> java.lang.AssertionError: assertion failed: Replica 10 for partition
> [mytopic,0] should be in the NewReplica,OnlineReplica states before moving
> to OfflineReplica state. Instead it is in OfflineReplica state
>         at scala.Predef$.assert(Predef.scala:91)
>         at kafka.controller.ReplicaStateMachine.assertValidPreviousStates(ReplicaStateMachine.scala:209)
>         at kafka.controller.ReplicaStateMachine.handleStateChange(ReplicaStateMachine.scala:167)
>         at kafka.controller.ReplicaStateMachine$$anonfun$handleStateChanges$2.apply(ReplicaStateMachine.scala:89)
>         at kafka.controller.ReplicaStateMachine$$anonfun$handleStateChanges$2.apply(ReplicaStateMachine.scala:89)
>         at scala.collection.immutable.Set$Set1.foreach(Set.scala:81)
>         at kafka.controller.ReplicaStateMachine.handleStateChanges(ReplicaStateMachine.scala:89)
>         at kafka.controller.KafkaController$$anonfun$shutdownBroker$4$$anonfun$apply$2.apply(KafkaController.scala:199)
>         at kafka.controller.KafkaController$$anonfun$shutdownBroker$4$$anonfun$apply$2.apply(KafkaController.scala:184)
>         at scala.Option.foreach(Option.scala:121)
>         at kafka.controller.KafkaController$$anonfun$shutdownBroker$4.apply(KafkaController.scala:184)
>         at kafka.controller.KafkaController$$anonfun$shutdownBroker$4.apply(KafkaController.scala:180)
>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:57)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:43)
>         at kafka.controller.KafkaController.shutdownBroker(KafkaController.scala:180)
>         at kafka.server.KafkaApis.handleControlledShutdownRequest(KafkaApis.scala:133)
>         at kafka.server.KafkaApis.handle(KafkaApis.scala:72)
>         at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:42)
>         at java.lang.Thread.run(Thread.java:662)
>
> Jason
>
>
> On Fri, Oct 25, 2013 at 11:51 PM, Jason Rosenberg <jb...@squareup.com>wrote:
>
>>
>>
>> On Fri, Oct 25, 2013 at 9:16 PM, Joel Koshy <jj...@gmail.com> wrote:
>>
>>>
>>> Unclean shutdown could result in data loss - since you are moving
>>> leadership to a replica that has fallen out of ISR. i.e., its log end
>>> offset is behind the last committed message to this partition.
>>>
>>>
>> But if data is written with 'request.required.acks=-1', no data should be
>> lost, no?  Or will partitions be truncated wholesale after an unclean
>> shutdown?
>>
>>
>>
>>>
>>> Take a look at the packaged log4j.properties file. The controller's
>>> partition/replica state machines and its channel manager which
>> sends/receives leaderandisr requests/responses to brokers use a
>>> stateChangeLogger. The replica managers on all brokers also use this
>>> logger.
>>
>>
>> Ah.....so it looks like most things logged with the stateChangeLogger are
>> logged at the TRACE log level.....and that's the default setting in the
>> log4j.properties file.  Needless to say, my contained KafkaServer is not
>> currently using that log4j.properties (we are just using a rootLogger with
>> level = INFO by default).  I can probably add a special rule to use TRACE
>> for the state.change.logger category.  However, I'm not sure I can make it
>> so that all of that logging goes to its own separate log file.....
>>
>>>
>>> Our logging can improve - e.g., it looks like on controller movement
>>> we could retry without saying why.
>>>
>>
>> I can file a jira for this, but I'm not sure what it should say!
>>
>
>

Re: Controlled shutdown failure, retry settings

Posted by Jason Rosenberg <jb...@squareup.com>.
Here's another exception I see during controlled shutdown (this time there
was not an unclean shutdown problem).  Should I be concerned about this
exception? Is any data loss possible with this?  This one happened after
the first "Retrying controlled shutdown after the previous attempt
failed..." message.  The controlled shutdown subsequently succeeded without
another retry (but with a few more of these exceptions).

Again, there was no "Remaining partitions to move..." message before the
retrying message, so I assume the retry happens after an IOException (which
is not logged in KafkaServer.controlledShutdown).

2013-10-29 20:03:31,883  INFO [kafka-request-handler-4]
controller.ReplicaStateMachine - [Replica state machine on controller 10]:
Invoking state change to OfflineReplica for replicas
PartitionAndReplica(mytopic,0,10)
2013-10-29 20:03:31,883 ERROR [kafka-request-handler-4] change.logger -
Controller 10 epoch 190 initiated state change of replica 10 for partition
[mytopic,0] to OfflineReplica failed
java.lang.AssertionError: assertion failed: Replica 10 for partition
[mytopic,0] should be in the NewReplica,OnlineReplica states before moving
to OfflineReplica state. Instead it is in OfflineReplica state
        at scala.Predef$.assert(Predef.scala:91)
        at kafka.controller.ReplicaStateMachine.assertValidPreviousStates(ReplicaStateMachine.scala:209)
        at kafka.controller.ReplicaStateMachine.handleStateChange(ReplicaStateMachine.scala:167)
        at kafka.controller.ReplicaStateMachine$$anonfun$handleStateChanges$2.apply(ReplicaStateMachine.scala:89)
        at kafka.controller.ReplicaStateMachine$$anonfun$handleStateChanges$2.apply(ReplicaStateMachine.scala:89)
        at scala.collection.immutable.Set$Set1.foreach(Set.scala:81)
        at kafka.controller.ReplicaStateMachine.handleStateChanges(ReplicaStateMachine.scala:89)
        at kafka.controller.KafkaController$$anonfun$shutdownBroker$4$$anonfun$apply$2.apply(KafkaController.scala:199)
        at kafka.controller.KafkaController$$anonfun$shutdownBroker$4$$anonfun$apply$2.apply(KafkaController.scala:184)
        at scala.Option.foreach(Option.scala:121)
        at kafka.controller.KafkaController$$anonfun$shutdownBroker$4.apply(KafkaController.scala:184)
        at kafka.controller.KafkaController$$anonfun$shutdownBroker$4.apply(KafkaController.scala:180)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:57)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:43)
        at kafka.controller.KafkaController.shutdownBroker(KafkaController.scala:180)
        at kafka.server.KafkaApis.handleControlledShutdownRequest(KafkaApis.scala:133)
        at kafka.server.KafkaApis.handle(KafkaApis.scala:72)
        at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:42)
        at java.lang.Thread.run(Thread.java:662)

Jason


On Fri, Oct 25, 2013 at 11:51 PM, Jason Rosenberg <jb...@squareup.com> wrote:

>
>
> On Fri, Oct 25, 2013 at 9:16 PM, Joel Koshy <jj...@gmail.com> wrote:
>
>>
>> Unclean shutdown could result in data loss - since you are moving
>> leadership to a replica that has fallen out of ISR. i.e., its log end
>> offset is behind the last committed message to this partition.
>>
>>
> But if data is written with 'request.required.acks=-1', no data should be
> lost, no?  Or will partitions be truncated wholesale after an unclean
> shutdown?
>
>
>
>>
>> Take a look at the packaged log4j.properties file. The controller's
>> partition/replica state machines and its channel manager which
>> sends/receives leaderandisr requests/responses to brokers use a
>> stateChangeLogger. The replica managers on all brokers also use this
>> logger.
>
>
> Ah.....so it looks like most things logged with the stateChangeLogger are
> logged at the TRACE log level.....and that's the default setting in the
> log4j.properties file.  Needless to say, my contained KafkaServer is not
> currently using that log4j.properties (we are just using a rootLogger with
> level = INFO by default).  I can probably add a special rule to use TRACE
> for the state.change.logger category.  However, I'm not sure I can make it
> so that all of that logging goes to its own separate log file.....
>
>>
>> Our logging can improve - e.g., it looks like on controller movement
>> we could retry without saying why.
>>
>
> I can file a jira for this, but I'm not sure what it should say!
>

Re: Controlled shutdown failure, retry settings

Posted by Jason Rosenberg <jb...@squareup.com>.
On Fri, Oct 25, 2013 at 9:16 PM, Joel Koshy <jj...@gmail.com> wrote:

>
> Unclean shutdown could result in data loss - since you are moving
> leadership to a replica that has fallen out of ISR. i.e., its log end
> offset is behind the last committed message to this partition.
>
>
But if data is written with 'request.required.acks=-1', no data should be
lost, no?  Or will partitions be truncated wholesale after an unclean
shutdown?



>
> Take a look at the packaged log4j.properties file. The controller's
> partition/replica state machines and its channel manager which
> sends/receives leaderandisr requests/responses to brokers use a
> stateChangeLogger. The replica managers on all brokers also use this
> logger.


Ah.....so it looks like most things logged with the stateChangeLogger are
logged at the TRACE log level.....and that's the default setting in the
log4j.properties file.  Needless to say, my contained KafkaServer is not
currently using that log4j.properties (we are just using a rootLogger with
level = INFO by default).  I can probably add a special rule to use TRACE
for the state.change.logger category.  However, I'm not sure I can make it
so that all of that logging goes to its own separate log file.....
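
For the record, the packaged log4j.properties does it roughly like this
(the path and appender name here are illustrative - adapt as needed):

log4j.appender.stateChangeAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.stateChangeAppender.DatePattern='.'yyyy-MM-dd-HH
log4j.appender.stateChangeAppender.File=/var/log/kafka/state-change.log
log4j.appender.stateChangeAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.stateChangeAppender.layout.ConversionPattern=[%d] %p %m (%c)%n

log4j.logger.state.change.logger=TRACE, stateChangeAppender
log4j.additivity.state.change.logger=false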

>
> Our logging can improve - e.g., it looks like on controller movement
> we could retry without saying why.
>

I can file a jira for this, but I'm not sure what it should say!

Re: Controlled shutdown failure, retry settings

Posted by Joel Koshy <jj...@gmail.com>.
On Fri, Oct 25, 2013 at 3:22 PM, Jason Rosenberg <jb...@squareup.com> wrote:
> It looks like when the controlled shutdown fails with an IOException, the
> exception is swallowed, and we see nothing in the logs:
>
>             catch {
>               case ioe: java.io.IOException =>
>                 channel.disconnect()
>                 channel = null
>                 // ignore and try again
>             }

> Question.....what is the ramification of an 'unclean shutdown'?  Is it no
> different than a shutdown with no controlled shutdown ever attempted?  Or
> is it something more difficult to recover from?

Unclean shutdown could result in data loss - since you are moving
leadership to a replica that has fallen out of ISR. i.e., its log end
offset is behind the last committed message to this partition.

>
> I am still not clear on how to generate the state transition logs.....Does
> the StateChangeLogManagerTool run against the main logs for the server (and
> just collates entries there)?

Take a look at the packaged log4j.properties file. The controller's
partition/replica state machines and its channel manager which
sends/receives leaderandisr requests/responses to brokers use a
stateChangeLogger. The replica managers on all brokers also use this
logger.

>
>
> This one eventually succeeds, after a mysterious failure:
>
> 2013-10-25 00:11:53,891  INFO [Thread-13] server.KafkaServer - [Kafka
> Server 10], Starting controlled shutdown
> ....<no exceptions between these log lines><no "Remaining partitions to
> move....">....
> 2013-10-25 00:12:28,965  WARN [Thread-13] server.KafkaServer - [Kafka
> Server 10], Retrying controlled shutdown after the previous attempt
> failed...

Our logging can improve - e.g., it looks like on controller movement
we could retry without saying why.


Thanks,

Joel

>
>
>
> On Fri, Oct 25, 2013 at 12:51 PM, Jason Rosenberg <jb...@squareup.com> wrote:
>
>> Neha,
>>
>> It looks like the StateChangeLogMergerTool takes state change logs as
>> input.  I'm not sure I know where those live?  (Maybe update the doc on
>> that wiki page to describe!).
>>
>> Thanks,
>>
>> Jason
>>
>>
>> On Fri, Oct 25, 2013 at 12:38 PM, Neha Narkhede <ne...@gmail.com>wrote:
>>
>>> Jason,
>>>
>>> The state change log tool is described here -
>>>
>>> https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-7.StateChangeLogMergerTool
>>>
>>> I'm curious what the IOException is and if we can improve error reporting.
>>> Can you send around the stack trace?
>>>
>>> Thanks,
>>> Neha
>>>
>>>
>>> On Fri, Oct 25, 2013 at 8:26 AM, Jason Rosenberg <jb...@squareup.com>
>>> wrote:
>>>
>>> > Ok,
>>> >
>>> > Looking at the controlled shutdown code, it appears that it can fail
>>> with
>>> > an IOException too, in which case it won't report the remaining
>>> partitions
>>> > to replicate, etc.  (I think that might be what I'm seeing, since I
>>> never
>>> > saw the log line for "controlled shutdown failed, X remaining
>>> partitions",
>>> > etc.).  In my case, that may be the issue (it's happening during a
>>> rolling
>>> > restart, and the second of 3 nodes might be trying to shutdown before
>>> the
>>> > first one has completely come back up).
>>> >
>>> > I've heard you guys mention several times now about controller and state
>>> > change logs.  But I don't know where those live (or how to configure).
>>> >  Please advise!
>>> >
>>> > Thanks,
>>> >
>>> > Jason
>>> >
>>> >
>>> > On Fri, Oct 25, 2013 at 10:40 AM, Neha Narkhede <
>>> neha.narkhede@gmail.com
>>> > >wrote:
>>> >
>>> > > Controlled shutdown can fail if the cluster has a non-zero under-replicated
>>> > > partition count, since that means the leaders may not move off of the
>>> > > broker being shut down, causing controlled shutdown to fail. The backoff
>>> > > might help if the under-replication is just temporary due to a spike in
>>> > > traffic. This is the most common reason it might fail besides bugs. But you
>>> > > can check the logs to see why the shutdown failed.
>>> > >
>>> > > Thanks,
>>> > > Neha
>>> > > On Oct 25, 2013 1:18 AM, "Jason Rosenberg" <jb...@squareup.com> wrote:
>>> > >
>>> > > > I'm running into an issue where sometimes, the controlled shutdown
>>> > fails
>>> > > to
>>> > > > complete after the default 3 retry attempts.  This ended up in one
>>> > case,
>>> > > > with a broker undergoing an unclean shutdown, and then it was in a
>>> > > rather
>>> > > > bad state after restart.  Producers would connect to the metadata
>>> vip,
>>> > > > still think that this broker was the leader, and then fail on that
>>> > > leader,
>>> > > > and then reconnect to the metadata vip, and get sent back to that
>>> > same
>>> > > > failed broker!   Does that make sense?
>>> > > >
>>> > > > I'm trying to understand the conditions which cause the controlled
>>> > > shutdown
>>> > > > to fail?  There doesn't seem to be a setting for max amount of time
>>> to
>>> > > > wait, etc.
>>> > > >
>>> > > > It would be nice to specify how long to try before giving up
>>> (hopefully
>>> > > > giving up in a graceful way).
>>> > > >
>>> > > > Instead, we have a retry count, but it's not clear what this retry
>>> > count
>>> > > is
>>> > > > really specifying, in terms of how long to keep trying, etc.
>>> > > >
>>> > > > Also, what are the ramifications for different settings for the
>>> > > > controlled.shutdown.retry.backoff.ms?  Is there a reason we want to
>>> > wait
>>> > > > before retrying again (again, it would be helpful to understand the
>>> > > reasons
>>> > > > for a controlled shutdown failure).
>>> > > >
>>> > > > Thanks,
>>> > > >
>>> > > > Jason
>>> > > >
>>> > >
>>> >
>>>
>>
>>

Re: Controlled shutdown failure, retry settings

Posted by Jason Rosenberg <jb...@squareup.com>.
It looks like when the controlled shutdown fails with an IOException, the
exception is swallowed, and we see nothing in the logs:

            catch {
              case ioe: java.io.IOException =>
                channel.disconnect()
                channel = null
                // ignore and try again
            }
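
(A sketch of what logging the failure could look like - this assumes a
warn(msg, throwable) helper from the Logging trait is in scope there,
which I haven't verified:

            catch {
              case ioe: java.io.IOException =>
                channel.disconnect()
                channel = null
                // log the cause before retrying so failed attempts are visible
                warn("Controlled shutdown attempt failed with an IOException; retrying", ioe)
            }
)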

So, I don't really have visibility into why controlled shutdown fails when
it does.....Below are some logging examples.

The first one eventually succeeds, after a mysterious error.  The second
one exhibits the issue suggested, ("no other replicas in ISR").

So, I'll try adding more retries and more backoff delay.....

Question.....what is the ramification of an 'unclean shutdown'?  Is it no
different than a shutdown with no controlled shutdown ever attempted?  Or
is it something more difficult to recover from?

I am still not clear on how to generate the state transition logs.....Does
the StateChangeLogManagerTool run against the main logs for the server (and
just collates entries there)?


This one eventually succeeds, after a mysterious failure:

2013-10-25 00:11:53,891  INFO [Thread-13] server.KafkaServer - [Kafka
Server 10], Starting controlled shutdown
....<no exceptions between these log lines><no "Remaining partitions to
move....">....
2013-10-25 00:12:28,965  WARN [Thread-13] server.KafkaServer - [Kafka
Server 10], Retrying controlled shutdown after the previous attempt
failed...
....
2013-10-25 00:12:56,623  INFO [Thread-13] server.KafkaServer - [Kafka
Server 10], Controlled shutdown succeeded


This one fails ultimately, and proceeds with unclean shutdown:

2013-10-25 20:39:10,350  INFO [Thread-12] server.KafkaServer - [Kafka
Server 11], Starting controlled shutdown
...
<lots of exceptions like this><no "Remaining partitions to move....">....
2013-10-25 20:39:40,735 ERROR [kafka-request-handler-4] change.logger -
Controller 11 epoch 187 encountered error while electing leader for
partition [topicX,0] due to: No other replicas in ISR 11 for [topicX,0]
besides current leader 11 and shutting down brokers 11.
2013-10-25 20:39:40,735 ERROR [kafka-request-handler-4] change.logger -
Controller 11 epoch 187 initiated state change for partition [topicX,0]
from OnlinePartition to OnlinePartition failed
kafka.common.StateChangeFailedException: encountered error while electing
leader for partition [topicX,0] due to: No other replicas in ISR 11 for
[topicX,0] besides current leader 11 and shutting down brokers 11.
        at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:328)
        at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:155)
        at kafka.controller.PartitionStateMachine$$anonfun$handleStateChanges$2.apply(PartitionStateMachine.scala:111)
        at kafka.controller.PartitionStateMachine$$anonfun$handleStateChanges$2.apply(PartitionStateMachine.scala:110)
        at scala.collection.immutable.Set$Set1.foreach(Set.scala:81)
        at kafka.controller.PartitionStateMachine.handleStateChanges(PartitionStateMachine.scala:110)
        at kafka.controller.KafkaController$$anonfun$shutdownBroker$4$$anonfun$apply$2.apply(KafkaController.scala:188)
        at kafka.controller.KafkaController$$anonfun$shutdownBroker$4$$anonfun$apply$2.apply(KafkaController.scala:184)
        at scala.Option.foreach(Option.scala:121)
        at kafka.controller.KafkaController$$anonfun$shutdownBroker$4.apply(KafkaController.scala:184)
        at kafka.controller.KafkaController$$anonfun$shutdownBroker$4.apply(KafkaController.scala:180)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:57)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:43)
        at kafka.controller.KafkaController.shutdownBroker(KafkaController.scala:180)
        at kafka.server.KafkaApis.handleControlledShutdownRequest(KafkaApis.scala:133)
        at kafka.server.KafkaApis.handle(KafkaApis.scala:72)
        at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:42)
        at java.lang.Thread.run(Thread.java:662)
Caused by: kafka.common.StateChangeFailedException: No other replicas in ISR 11 for [topicX,0] besides current leader 11 and shutting down brokers 11
        at kafka.controller.ControlledShutdownLeaderSelector.selectLeader(PartitionLeaderSelector.scala:177)
        at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:304)
        ... 17 more
...
2013-10-25 20:39:45,404  WARN [Thread-12] server.KafkaServer - [Kafka
Server 11], Retrying controlled shutdown after the previous attempt
failed...
<lots more of the StateChangeFailedExceptions><no "Remaining partitions to
move....">....
2013-10-25 20:40:20,499  WARN [Thread-12] server.KafkaServer - [Kafka
Server 11], Retrying controlled shutdown after the previous attempt
failed...
<lots more of the StateChangeFailedExceptions><no "Remaining partitions to
move....">....
2013-10-25 20:40:55,598  WARN [Thread-12] server.KafkaServer - [Kafka
Server 11], Retrying controlled shutdown after the previous attempt
failed...
2013-10-25 20:40:55,598  WARN [Thread-12] server.KafkaServer - [Kafka
Server 11], Proceeding to do an unclean shutdown as all the controlled
shutdown attempts failed


So, this would seem to indicate the issue described previously (no leader
for partition so unclean shutdown).....

On Fri, Oct 25, 2013 at 12:51 PM, Jason Rosenberg <jb...@squareup.com> wrote:

> Neha,
>
> It looks like the StateChangeLogMergerTool takes state change logs as
> input.  I'm not sure I know where those live?  (Maybe update the doc on
> that wiki page to describe!).
>
> Thanks,
>
> Jason
>
>
> On Fri, Oct 25, 2013 at 12:38 PM, Neha Narkhede <ne...@gmail.com>wrote:
>
>> Jason,
>>
>> The state change log tool is described here -
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-7.StateChangeLogMergerTool
>>
>> I'm curious what the IOException is and if we can improve error reporting.
>> Can you send around the stack trace?
>>
>> Thanks,
>> Neha
>>
>>
>> On Fri, Oct 25, 2013 at 8:26 AM, Jason Rosenberg <jb...@squareup.com>
>> wrote:
>>
>> > Ok,
>> >
>> > Looking at the controlled shutdown code, it appears that it can fail
>> with
>> > an IOException too, in which case it won't report the remaining
>> partitions
>> > to replicate, etc.  (I think that might be what I'm seeing, since I
>> never
>> > saw the log line for "controlled shutdown failed, X remaining
>> partitions",
>> > etc.).  In my case, that may be the issue (it's happening during a
>> rolling
>> > restart, and the second of 3 nodes might be trying to shutdown before
>> the
>> > first one has completely come back up).
>> >
>> > I've heard you guys mention several times now about controller and state
>> > change logs.  But I don't know where those live (or how to configure).
>> >  Please advise!
>> >
>> > Thanks,
>> >
>> > Jason
>> >
>> >
>> > On Fri, Oct 25, 2013 at 10:40 AM, Neha Narkhede <
>> neha.narkhede@gmail.com
>> > >wrote:
>> >
>> > > Controlled shutdown can fail if the cluster has a non-zero under-replicated
>> > > partition count, since that means the leaders may not move off of the
>> > > broker being shut down, causing controlled shutdown to fail. The backoff
>> > > might help if the under-replication is just temporary due to a spike in
>> > > traffic. This is the most common reason it might fail besides bugs. But you
>> > > can check the logs to see why the shutdown failed.
>> > >
>> > > Thanks,
>> > > Neha
>> > > On Oct 25, 2013 1:18 AM, "Jason Rosenberg" <jb...@squareup.com> wrote:
>> > >
>> > > > I'm running into an issue where sometimes, the controlled shutdown
>> > fails
>> > > to
>> > > > complete after the default 3 retry attempts.  This ended up in one
>> > case,
>> > > > with a broker undergoing an unclean shutdown, and then it was in a
>> > > rather
>> > > > bad state after restart.  Producers would connect to the metadata
>> vip,
>> > > > still think that this broker was the leader, and then fail on that
>> > > leader,
>> > > > and then reconnect to the metadata vip, and get sent back to that
>> > same
>> > > > failed broker!   Does that make sense?
>> > > >
>> > > > I'm trying to understand the conditions which cause the controlled
>> > > shutdown
>> > > > to fail?  There doesn't seem to be a setting for max amount of time
>> to
>> > > > wait, etc.
>> > > >
>> > > > It would be nice to specify how long to try before giving up
>> (hopefully
>> > > > giving up in a graceful way).
>> > > >
>> > > > Instead, we have a retry count, but it's not clear what this retry
>> > count
>> > > is
>> > > > really specifying, in terms of how long to keep trying, etc.
>> > > >
>> > > > Also, what are the ramifications for different settings for the
>> > > > controlled.shutdown.retry.backoff.ms?  Is there a reason we want to
>> > wait
>> > > > before retrying again (again, it would be helpful to understand the
>> > > reasons
>> > > > for a controlled shutdown failure).
>> > > >
>> > > > Thanks,
>> > > >
>> > > > Jason
>> > > >
>> > >
>> >
>>
>
>

Re: Controlled shutdown failure, retry settings

Posted by Jason Rosenberg <jb...@squareup.com>.
Neha,

It looks like the StateChangeLogMergerTool takes state change logs as
input.  I'm not sure I know where those live?  (Maybe update the doc on
that wiki page to describe!).

Thanks,

Jason


On Fri, Oct 25, 2013 at 12:38 PM, Neha Narkhede <ne...@gmail.com>wrote:

> Jason,
>
> The state change log tool is described here -
>
> https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-7.StateChangeLogMergerTool
>
> I'm curious what the IOException is and if we can improve error reporting.
> Can you send around the stack trace?
>
> Thanks,
> Neha
>
>
> On Fri, Oct 25, 2013 at 8:26 AM, Jason Rosenberg <jb...@squareup.com> wrote:
>
> > Ok,
> >
> > Looking at the controlled shutdown code, it appears that it can fail with
> > an IOException too, in which case it won't report the remaining
> partitions
> > to replicate, etc.  (I think that might be what I'm seeing, since I never
> > saw the log line for "controlled shutdown failed, X remaining
> partitions",
> > etc.).  In my case, that may be the issue (it's happening during a
> rolling
> > restart, and the second of 3 nodes might be trying to shutdown before the
> > first one has completely come back up).
> >
> > I've heard you guys mention several times now about controller and state
> > change logs.  But I don't know where those live (or how to configure).
> >  Please advise!
> >
> > Thanks,
> >
> > Jason
> >
> >
> > On Fri, Oct 25, 2013 at 10:40 AM, Neha Narkhede <neha.narkhede@gmail.com
> > >wrote:
> >
> > > Controlled shutdown can fail if the cluster has a non-zero under-replicated
> > > partition count, since that means the leaders may not move off of the
> > > broker being shut down, causing controlled shutdown to fail. The backoff
> > > might help if the under-replication is just temporary due to a spike in
> > > traffic. This is the most common reason it might fail besides bugs. But you
> > > can check the logs to see why the shutdown failed.
> > >
> > > Thanks,
> > > Neha
> > > On Oct 25, 2013 1:18 AM, "Jason Rosenberg" <jb...@squareup.com> wrote:
> > >
> > > > I'm running into an issue where sometimes, the controlled shutdown
> > fails
> > > to
> > > > complete after the default 3 retry attempts.  This ended up in one
> > case,
> > > > with a broker undergoing an unclean shutdown, and then it was in a
> > > rather
> > > > bad state after restart.  Producers would connect to the metadata
> vip,
> > > > still think that this broker was the leader, and then fail on that
> > > leader,
> > > > and then reconnect to the metadata vip, and get sent back to that
> > same
> > > > failed broker!   Does that make sense?
> > > >
> > > > I'm trying to understand the conditions which cause the controlled
> > > shutdown
> > > > to fail?  There doesn't seem to be a setting for max amount of time
> to
> > > > wait, etc.
> > > >
> > > > It would be nice to specify how long to try before giving up
> (hopefully
> > > > giving up in a graceful way).
> > > >
> > > > Instead, we have a retry count, but it's not clear what this retry
> > count
> > > is
> > > > really specifying, in terms of how long to keep trying, etc.
> > > >
> > > > Also, what are the ramifications for different settings for the
> > > > controlled.shutdown.retry.backoff.ms?  Is there a reason we want to
> > wait
> > > > before retrying again (again, it would be helpful to understand the
> > > reasons
> > > > for a controlled shutdown failure).
> > > >
> > > > Thanks,
> > > >
> > > > Jason
> > > >
> > >
> >
>

Re: Controlled shutdown failure, retry settings

Posted by Neha Narkhede <ne...@gmail.com>.
Jason,

The state change log tool is described here -
https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-7.StateChangeLogMergerTool
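
A typical invocation looks something like the following - I'm writing
the options from memory, so check the wiki page above for the exact
usage:

bin/kafka-run-class.sh kafka.tools.StateChangeLogMerger \
  --logs /path/to/state-change.log \
  --topic mytopic --partitions 0,1,2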

I'm curious what the IOException is and if we can improve error reporting.
Can you send around the stack trace?

Thanks,
Neha


On Fri, Oct 25, 2013 at 8:26 AM, Jason Rosenberg <jb...@squareup.com> wrote:

> Ok,
>
> Looking at the controlled shutdown code, it appears that it can fail with
> an IOException too, in which case it won't report the remaining partitions
> to replicate, etc.  (I think that might be what I'm seeing, since I never
> saw the log line for "controlled shutdown failed, X remaining partitions",
> etc.).  In my case, that may be the issue (it's happening during a rolling
> restart, and the second of 3 nodes might be trying to shutdown before the
> first one has completely come back up).
>
> I've heard you guys mention several times now about controller and state
> change logs.  But I don't know where those live (or how to configure).
>  Please advise!
>
> Thanks,
>
> Jason
>
>
> On Fri, Oct 25, 2013 at 10:40 AM, Neha Narkhede <neha.narkhede@gmail.com
> >wrote:
>
> > Controlled shutdown can fail if the cluster has a non-zero under-replicated
> > partition count, since that means the leaders may not move off of the
> > broker being shut down, causing controlled shutdown to fail. The backoff
> > might help if the under-replication is just temporary due to a spike in
> > traffic. This is the most common reason it might fail besides bugs. But you
> > can check the logs to see why the shutdown failed.
> >
> > Thanks,
> > Neha
> > On Oct 25, 2013 1:18 AM, "Jason Rosenberg" <jb...@squareup.com> wrote:
> >
> > > I'm running into an issue where sometimes, the controlled shutdown
> fails
> > to
> > > complete after the default 3 retry attempts.  This ended up in one
> case,
> > > with a broker undergoing an unclean shutdown, and then it was in a
> > rather
> > > bad state after restart.  Producers would connect to the metadata vip,
> > > still think that this broker was the leader, and then fail on that
> > leader,
> > > and then reconnect to the metadata vip, and get sent back to that
> same
> > > failed broker!   Does that make sense?
> > >
> > > I'm trying to understand the conditions which cause the controlled
> > shutdown
> > > to fail?  There doesn't seem to be a setting for max amount of time to
> > > wait, etc.
> > >
> > > It would be nice to specify how long to try before giving up (hopefully
> > > giving up in a graceful way).
> > >
> > > Instead, we have a retry count, but it's not clear what this retry
> count
> > is
> > > really specifying, in terms of how long to keep trying, etc.
> > >
> > > Also, what are the ramifications for different settings for the
> > > controlled.shutdown.retry.backoff.ms?  Is there a reason we want to
> wait
> > > before retrying again (again, it would be helpful to understand the
> > reasons
> > > for a controlled shutdown failure).
> > >
> > > Thanks,
> > >
> > > Jason
> > >
> >
>

Re: Controlled shutdown failure, retry settings

Posted by Jason Rosenberg <jb...@squareup.com>.
Ok,

Looking at the controlled shutdown code, it appears that it can fail with
an IOException too, in which case it won't report the remaining partitions
to replicate, etc.  (I think that might be what I'm seeing, since I never
saw the log line for "controlled shutdown failed, X remaining partitions",
etc.).  In my case, that may be the issue (it's happening during a rolling
restart, and the second of 3 nodes might be trying to shutdown before the
first one has completely come back up).

I've heard you guys mention several times now about controller and state
change logs.  But I don't know where those live (or how to configure).
 Please advise!

Thanks,

Jason


On Fri, Oct 25, 2013 at 10:40 AM, Neha Narkhede <ne...@gmail.com>wrote:

> Controlled shutdown can fail if the cluster has a non-zero under-replicated
> partition count, since that means the leaders may not move off of the
> broker being shut down, causing controlled shutdown to fail. The backoff
> might help if the under-replication is just temporary due to a spike in
> traffic. This is the most common reason it might fail besides bugs. But you
> can check the logs to see why the shutdown failed.
>
> Thanks,
> Neha
> On Oct 25, 2013 1:18 AM, "Jason Rosenberg" <jb...@squareup.com> wrote:
>
> > I'm running into an issue where sometimes, the controlled shutdown fails
> to
> > complete after the default 3 retry attempts.  This ended up in one case,
> > with a broker undergoing an unclean shutdown, and then it was in a
> rather
> > bad state after restart.  Producers would connect to the metadata vip,
> > still think that this broker was the leader, and then fail on that
> leader,
> > and then reconnect to the metadata vip, and get sent back to that same
> > failed broker!   Does that make sense?
> >
> > I'm trying to understand the conditions which cause the controlled
> shutdown
> > to fail?  There doesn't seem to be a setting for max amount of time to
> > wait, etc.
> >
> > It would be nice to specify how long to try before giving up (hopefully
> > giving up in a graceful way).
> >
> > Instead, we have a retry count, but it's not clear what this retry count
> is
> > really specifying, in terms of how long to keep trying, etc.
> >
> > Also, what are the ramifications for different settings for the
> > controlled.shutdown.retry.backoff.ms?  Is there a reason we want to wait
> > before retrying again (again, it would be helpful to understand the
> reasons
> > for a controlled shutdown failure).
> >
> > Thanks,
> >
> > Jason
> >
>

Re: Controlled shutdown failure, retry settings

Posted by Neha Narkhede <ne...@gmail.com>.
Controlled shutdown can fail if the cluster has a non-zero under-replicated
partition count, since that means the leaders may not move off of the
broker being shut down, causing controlled shutdown to fail. The backoff
might help if the under-replication is just temporary due to a spike in
traffic. This is the most common reason it might fail besides bugs. But you
can check the logs to see why the shutdown failed.
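
One way to check for that before bouncing the next broker is to watch
the broker's under-replicated partition count over JMX - the mbean is
roughly the following (the exact name varies by version):

  kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions

and only proceed once it has dropped back to 0.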

Thanks,
Neha
On Oct 25, 2013 1:18 AM, "Jason Rosenberg" <jb...@squareup.com> wrote:

> I'm running into an issue where sometimes, the controlled shutdown fails to
> complete after the default 3 retry attempts.  This ended up in one case,
> with a broker undergoing an unclean shutdown, and then it was in a rather
> bad state after restart.  Producers would connect to the metadata vip,
> still think that this broker was the leader, and then fail on that leader,
> and then reconnect to the metadata vip, and get sent back to that same
> failed broker!   Does that make sense?
>
> I'm trying to understand the conditions which cause the controlled shutdown
> to fail?  There doesn't seem to be a setting for max amount of time to
> wait, etc.
>
> It would be nice to specify how long to try before giving up (hopefully
> giving up in a graceful way).
>
> Instead, we have a retry count, but it's not clear what this retry count is
> really specifying, in terms of how long to keep trying, etc.
>
> Also, what are the ramifications for different settings for the
> controlled.shutdown.retry.backoff.ms?  Is there a reason we want to wait
> before retrying again (again, it would be helpful to understand the reasons
> for a controlled shutdown failure).
>
> Thanks,
>
> Jason
>