Posted to dev@kafka.apache.org by Dong Lin <li...@gmail.com> on 2017/04/26 18:29:02 UTC

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

Hey Jun, Ismael,

Thanks for all the reviews! Can you vote for KIP-112 if you are OK with the
latest design doc?

Thanks,
Dong

On Thu, Mar 30, 2017 at 3:29 PM, Dong Lin <li...@gmail.com> wrote:

> Hi all,
>
> Thanks for all the comments. I am going to open the voting thread if
> there are no further concerns with the KIP.
>
> Dong
>
> On Wed, Mar 15, 2017 at 5:25 PM, Ismael Juma <is...@juma.me.uk> wrote:
>
>> Thanks for the updates, Dong. They look good to me.
>>
>> Ismael
>>
>> On Wed, Mar 15, 2017 at 5:50 PM, Dong Lin <li...@gmail.com> wrote:
>>
>> > Hey Ismael,
>> >
>> > Sure, I have updated the "Changes in Operational Procedures" section
>> > in KIP-113 to specify the problem and the solutions for known disk
>> > failures. I have also updated the "Test Plan" section to note that we
>> > have a test in KIP-113 to verify that replicas already created on the
>> > good log directories will not be affected by the failure of other log
>> > directories.
>> >
>> > Please let me know if there is any other improvement I can make.
>> > Thanks for your comment.
>> >
>> > Dong
>> >
>> >
>> > On Wed, Mar 15, 2017 at 3:18 AM, Ismael Juma <is...@juma.me.uk> wrote:
>> >
>> > > Hi Dong,
>> > >
>> > > Yes, that sounds good to me. I'd list option 2 first, since it is
>> > > safe and, as you said, no worse than what happens today. The file
>> > > approach is a bit hacky, as you said, so it may be a bit fragile.
>> > > Not sure if we really want to mention that. :)
>> > >
>> > > About the note in KIP-112 versus adding the test in KIP-113, I
>> > > think it would make sense to add a short sentence stating that this
>> > > scenario is covered in KIP-113. People won't necessarily read both
>> > > KIPs at the same time, and it's helpful to cross-reference when it
>> > > makes sense.
>> > >
>> > > Thanks for your work on this.
>> > >
>> > > Ismael
>> > >
>> > > On Tue, Mar 14, 2017 at 11:00 PM, Dong Lin <li...@gmail.com> wrote:
>> > >
>> > > > Hey Ismael,
>> > > >
>> > > > I get your concern that it is more likely for a disk to be slow,
>> > > > or to exhibit other forms of non-fatal symptoms, after some known
>> > > > fatal error. Then it is weird for the user to start the broker
>> > > > with the likely-problematic disk in the broker config. In that
>> > > > case, I think there are two things the user can do:
>> > > >
>> > > > 1) Intentionally change the log directory in the config to point
>> > > > to a file (see the sketch below). This is a bit hacky, but it
>> > > > works until we make a more appropriate long-term change in Kafka
>> > > > to handle this case.
>> > > > 2) Just don't start the broker with bad log directories. Always
>> > > > fix the disk before restarting the broker. This is a safe approach
>> > > > that is no worse than current practice.
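>> > > >
>> > > > For 1), a minimal sketch of what I mean (the paths here are made
>> > > > up; adjust to your deployment):
>> > > >
>> > > >     # suppose /data2 is the bad disk; create a regular file, not
>> > > >     # a directory, to stand in for it
>> > > >     touch /tmp/offline-dir-placeholder
>> > > >     # in server.properties, replace the bad disk's entry in log.dirs:
>> > > >     #   log.dirs=/data1/kafka-logs,/data2/kafka-logs
>> > > >     # becomes
>> > > >     #   log.dirs=/data1/kafka-logs,/tmp/offline-dir-placeholder
>> > > >     # on startup the broker cannot use the file as a log directory,
>> > > >     # so it treats that entry as offline instead of re-creating
>> > > >     # its replicas on the good disks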
>> > > >
>> > > > Would this address your concern if I specify the problem and the two
>> > > > solutions in the KIP?
>> > > >
>> > > > Thanks,
>> > > > Dong
>> > > >
>> > > > On Tue, Mar 14, 2017 at 3:29 PM, Dong Lin <li...@gmail.com> wrote:
>> > > >
>> > > > > Hey Ismael,
>> > > > >
>> > > > > Thanks for the comment. Please see my reply below.
>> > > > >
>> > > > > On Tue, Mar 14, 2017 at 10:31 AM, Ismael Juma <is...@juma.me.uk> wrote:
>> > > > >
>> > > > >> Thanks Dong. Comments inline.
>> > > > >>
>> > > > >> On Fri, Mar 10, 2017 at 6:25 PM, Dong Lin <li...@gmail.com> wrote:
>> > > > >> >
>> > > > >> > I get your point. But I am not sure we should recommend that
>> > > > >> > the user simply remove the disk from the broker config. If
>> > > > >> > the user does this without checking the utilization of the
>> > > > >> > good disks, the replicas on the bad disk will be re-created
>> > > > >> > on the good disks and may overload them, causing cascading
>> > > > >> > failures.
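>> > > > >> >
>> > > > >> > For example, before removing the entry, one could check (a
>> > > > >> > rough sketch; the paths are illustrative):
>> > > > >> >
>> > > > >> >     # free space on the remaining (good) log directories
>> > > > >> >     df -h /data1/kafka-logs /data3/kafka-logs
>> > > > >> >     # total size of the replicas that would be re-created
>> > > > >> >     du -sh /data2/kafka-logs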
>> > > > >> >
>> > > > >>
>> > > > >> Good point.
>> > > > >>
>> > > > >>
>> > > > >> >
>> > > > >> > I agree with you and Colin that a slow disk may cause
>> > > > >> > problems. However, performance degradation due to a slow disk
>> > > > >> > is an existing problem that is not detected or handled by
>> > > > >> > Kafka or KIP-112.
>> > > > >>
>> > > > >>
>> > > > >> I think an important difference is that a number of disk errors
>> > > > >> are currently fatal and won't be after KIP-112. So it introduces
>> > > > >> new scenarios (for example, bouncing a broker that is working
>> > > > >> fine although some disks have been marked bad).
>> > > > >>
>> > > > >
>> > > > > Hmm.. I am still trying to understand why KIP-112 creates new
>> > > > > scenarios. A slow disk is not considered a fatal error and won't
>> > > > > be caught by either the existing Kafka design or this KIP. If
>> > > > > any disk is marked bad, it means the broker encountered an
>> > > > > IOException when accessing the disk; most likely the broker will
>> > > > > encounter an IOException again when accessing this disk and will
>> > > > > mark the disk as bad after the bounce. I guess you are talking
>> > > > > about the case where a disk is marked bad, the broker is
>> > > > > bounced, and then the disk provides degraded performance without
>> > > > > being marked bad, right? But this seems to be an existing
>> > > > > problem we already have today with slow disks.
>> > > > >
>> > > > > Here are the possible scenarios with a bad disk after a broker
>> > > > > bounce:
>> > > > >
>> > > > > 1) bad disk -> broker bounce -> good disk. This would be great.
>> > > > > 2) bad disk -> broker bounce -> slow disk. A slow disk is an
>> > > > > existing problem that is not addressed by Kafka today.
>> > > > > 3) bad disk -> broker bounce -> bad disk. This is handled by
>> > > > > this KIP such that only the replicas on the bad disk become
>> > > > > offline.
>> > > > >
>> > > > >>
>> > > > >> > Detection and handling of slow disks is a separate problem
>> > > > >> > that needs to be addressed in a future KIP. It is currently
>> > > > >> > listed in the future work. Does this sound OK?
>> > > > >> >
>> > > > >>
>> > > > >> I'm OK with it being handled in the future. In the meantime, I
>> > > > >> was just hoping that we can make the potential issue clear to
>> > > > >> users: a disk marked as bad may become good again after a
>> > > > >> bounce (which can be dangerous).
>> > > > >>
>> > > > >> > The main benefit of creating the second topic after a log
>> > > > >> > directory goes offline is that we can make sure the second
>> > > > >> > topic is created on a good log directory. I am not sure we
>> > > > >> > can simply assume that the first topic will always be created
>> > > > >> > on the first log directory in the broker config and the
>> > > > >> > second topic on the second log directory in the broker
>> > > > >> > config.
>> > > > >> >
>> > > > >> > However, I can add this test in KIP-113, which allows the
>> > > > >> > user to re-assign a replica to a specific log directory of a
>> > > > >> > broker. Is this OK?
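>> > > > >> >
>> > > > >> > As a rough sketch of the KIP-113 mechanism (the topic name
>> > > > >> > and paths are made up, and the JSON format is still subject
>> > > > >> > to the KIP-113 discussion), the reassignment JSON would carry
>> > > > >> > a "log_dirs" field:
>> > > > >> >
>> > > > >> >     cat > reassign.json <<'EOF'
>> > > > >> >     {"version":1,
>> > > > >> >      "partitions":[{"topic":"t1","partition":0,"replicas":[1],
>> > > > >> >                     "log_dirs":["/data1/kafka-logs"]}]}
>> > > > >> >     EOF
>> > > > >> >     bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
>> > > > >> >       --reassignment-json-file reassign.json --execute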
>> > > > >> >
>> > > > >>
>> > > > >> OK. Please add a note to KIP-112 about this as well (so that
>> > > > >> it's clear why we only do it in KIP-113).
>> > > > >>
>> > > > >
>> > > > > Sure. Instead of adding a note to KIP-112, I have added a test
>> > > > > in KIP-113 to verify that bad log directories discovered at
>> > > > > runtime will not affect replicas on the good log directories.
>> > > > > Does this address the problem?
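>> > > > >
>> > > > > Roughly, the test will do something like this (the paths, topic
>> > > > > name, and failure injection are illustrative; the actual system
>> > > > > test may differ):
>> > > > >
>> > > > >     # create a topic with partitions on both log directories
>> > > > >     bin/kafka-topics.sh --create --zookeeper localhost:2181 \
>> > > > >       --topic test --partitions 2 --replication-factor 1
>> > > > >     # inject a log directory failure at runtime, e.g. by making
>> > > > >     # the directory unwritable
>> > > > >     chmod 000 /data2/kafka-logs
>> > > > >     # verify the partitions on /data1/kafka-logs stay online
>> > > > >     bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test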
>> > > > >
>> > > > >
>> > > > >> Ismael
>> > > > >>
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>