Posted to dev@kafka.apache.org by Aishwarya Gune <ai...@confluent.io> on 2019/05/06 20:20:48 UTC

[DISCUSS] KIP-461 - Improving replica fetcher behavior in case of partition failures

Hey All!

I have created a KIP to improve the behavior of the replica fetcher when a
partition failure occurs. Please do have a look at it and let me know what
you think.
KIP 461 -
https://cwiki.apache.org/confluence/display/KAFKA/KIP-461+-+Improve+Replica+Fetcher+behavior+at+handling+partition+failure

-- 
Thank you,
Aishwarya

Re: [DISCUSS] KIP-461 - Improving replica fetcher behavior in case of partition failures

Posted by Jason Gustafson <ja...@confluent.io>.
Hey Aishwarya,

Thanks for the KIP. I'd suggest we move to a vote since this is a
straightforward improvement with a large impact.

-Jason

On Tue, May 7, 2019 at 3:02 PM Aishwarya Gune <ai...@confluent.io>
wrote:

> Hi Colin!
>
> Whenever the thread has all of its partitions marked as failed (i.e. thread
> is idle), the thread would be shut down.
> The errors that are not per-partition would probably retry or behave just
> as before.
>
>
> On Tue, May 7, 2019 at 9:57 AM Colin McCabe <cm...@apache.org> wrote:
>
> > Hi Aishwarya,
> >
> > This looks like a great improvement!
> >
> > Will a fetcher thread exit if all of its partitions have been marked
> > failed?  Or will it continue to run?
> >
> > After this KIP is adopted, are there any remaining situations where we
> > would exit a fetcher thread?  I guess some errors are not per-partition,
> > like authentication exceptions.  How will those behave?
> >
> > best,
> > Colin
> >
> > On Mon, May 6, 2019, at 13:21, Aishwarya Gune wrote:
> > > Hey All!
> > >
> > > I have created a KIP to improve the behavior of replica fetcher when
> > > partition failure occurs. Please do have a look at it and let me know what
> > > you think.
> > > KIP 461 -
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-461+-+Improve+Replica+Fetcher+behavior+at+handling+partition+failure
> > >
> > > --
> > > Thank you,
> > > Aishwarya
> > >
> >
>
>
> --
> Thank you,
> Aishwarya
>

Re: [DISCUSS] KIP-461 - Improving replica fetcher behavior in case of partition failures

Posted by Colin McCabe <cm...@apache.org>.
Thanks-- that makes sense.

cheers,
Colin

On Tue, May 7, 2019, at 15:02, Aishwarya Gune wrote:
> Hi Colin!
> 
> Whenever the thread has all of its partitions marked as failed (i.e. thread
> is idle), the thread would be shut down.
> The errors that are not per-partition would probably retry or behave just
> as before.
> 
> 
> On Tue, May 7, 2019 at 9:57 AM Colin McCabe <cm...@apache.org> wrote:
> 
> > Hi Aishwarya,
> >
> > This looks like a great improvement!
> >
> > Will a fetcher thread exit if all of its partitions have been marked
> > failed?  Or will it continue to run?
> >
> > After this KIP is adopted, are there any remaining situations where we
> > would exit a fetcher thread?  I guess some errors are not per-partition,
> > like authentication exceptions.  How will those behave?
> >
> > best,
> > Colin
> >
> > On Mon, May 6, 2019, at 13:21, Aishwarya Gune wrote:
> > > Hey All!
> > >
> > > I have created a KIP to improve the behavior of replica fetcher when
> > > partition failure occurs. Please do have a look at it and let me know what
> > > you think.
> > > KIP 461 -
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-461+-+Improve+Replica+Fetcher+behavior+at+handling+partition+failure
> > >
> > > --
> > > Thank you,
> > > Aishwarya
> > >
> >
> 
> 
> -- 
> Thank you,
> Aishwarya
>

Re: [DISCUSS] KIP-461 - Improving replica fetcher behavior in case of partition failures

Posted by Aishwarya Gune <ai...@confluent.io>.
Hi Colin!

Whenever the thread has all of its partitions marked as failed (i.e. thread
is idle), the thread would be shut down.
Errors that are not per-partition would probably be retried or handled just
as before.
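
To illustrate the behavior described above, here is a minimal sketch. It is
not Kafka's actual code: the class FetcherThread, its fields, and the helper
shutdown_idle_threads are hypothetical names chosen for this example. It only
shows the idea that a failed partition is set aside without stopping the
thread, and that a thread is shut down once it has no partitions left.

```python
class FetcherThread:
    """Hypothetical stand-in for a replica fetcher thread (not Kafka's class)."""

    def __init__(self, name, partitions):
        self.name = name
        self.active = set(partitions)  # partitions this thread still fetches
        self.running = True

    def mark_failed(self, partition, failed_partitions):
        """Set one partition aside instead of stopping the whole thread."""
        if partition in self.active:
            self.active.remove(partition)
            failed_partitions.add(partition)

    def is_idle(self):
        """A thread is idle when every partition it owned has failed."""
        return not self.active


def shutdown_idle_threads(threads):
    """Stop idle threads and return the ones that are still running."""
    for thread in threads:
        if thread.running and thread.is_idle():
            thread.running = False
    return [thread for thread in threads if thread.running]


failed = set()
fetcher = FetcherThread("fetcher-0", ["topicA-0", "topicA-1"])

fetcher.mark_failed("topicA-0", failed)
still_running = shutdown_idle_threads([fetcher])  # thread keeps topicA-1

fetcher.mark_failed("topicA-1", failed)
after_all_failed = shutdown_idle_threads([fetcher])  # thread is now idle
```

In this sketch the thread survives the first failure and is shut down only
after its last partition has been marked as failed.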


On Tue, May 7, 2019 at 9:57 AM Colin McCabe <cm...@apache.org> wrote:

> Hi Aishwarya,
>
> This looks like a great improvement!
>
> Will a fetcher thread exit if all of its partitions have been marked
> failed?  Or will it continue to run?
>
> After this KIP is adopted, are there any remaining situations where we
> would exit a fetcher thread?  I guess some errors are not per-partition,
> like authentication exceptions.  How will those behave?
>
> best,
> Colin
>
> On Mon, May 6, 2019, at 13:21, Aishwarya Gune wrote:
> > Hey All!
> >
> > I have created a KIP to improve the behavior of replica fetcher when
> > partition failure occurs. Please do have a look at it and let me know what
> > you think.
> > KIP 461 -
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-461+-+Improve+Replica+Fetcher+behavior+at+handling+partition+failure
> >
> > --
> > Thank you,
> > Aishwarya
> >
>


-- 
Thank you,
Aishwarya

Re: [DISCUSS] KIP-461 - Improving replica fetcher behavior in case of partition failures

Posted by Colin McCabe <cm...@apache.org>.
Hi Aishwarya,

This looks like a great improvement!

Will a fetcher thread exit if all of its partitions have been marked failed?  Or will it continue to run?

After this KIP is adopted, are there any remaining situations where we would exit a fetcher thread?  I guess some errors are not per-partition, like authentication exceptions.  How will those behave?

best,
Colin

On Mon, May 6, 2019, at 13:21, Aishwarya Gune wrote:
> Hey All!
> 
> I have created a KIP to improve the behavior of replica fetcher when
> partition failure occurs. Please do have a look at it and let me know what
> you think.
> KIP 461 -
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-461+-+Improve+Replica+Fetcher+behavior+at+handling+partition+failure
> 
> -- 
> Thank you,
> Aishwarya
>

Re: [DISCUSS] KIP-461 - Improving replica fetcher behavior in case of partition failures

Posted by Aishwarya Gune <ai...@confluent.io>.
Hi Jun!

Yes, we should exclude it. When a replica is deleted with StopReplicaRequest,
the partition is removed from the set of failed partitions. I will update the
KIP to mention it.
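
A rough sketch of that bookkeeping, under stated assumptions: the class
FailedPartitions, the property failed_partitions_count, and the function
on_stop_replica are hypothetical names for illustration and do not come from
the Kafka code base. The count here is simply the size of the set, so removing
a deleted replica also lowers the value reported as FailedPartitionsCount.

```python
class FailedPartitions:
    """Hypothetical container for partitions whose fetch has failed."""

    def __init__(self):
        self._partitions = set()

    def add(self, topic_partition):
        self._partitions.add(topic_partition)

    def remove(self, topic_partition):
        # discard() is a no-op if the partition was never marked as failed
        self._partitions.discard(topic_partition)

    def __contains__(self, topic_partition):
        return topic_partition in self._partitions

    @property
    def failed_partitions_count(self):
        """Value a FailedPartitionsCount-style metric would report."""
        return len(self._partitions)


def on_stop_replica(failed, topic_partition, delete_partition):
    """Drop a partition from the failed set when its replica is deleted."""
    if delete_partition:
        failed.remove(topic_partition)


failed = FailedPartitions()
failed.add("topicA-0")
failed.add("topicB-3")

# The replica for topicA-0 is deleted while it sits in the failed set.
on_stop_replica(failed, "topicA-0", delete_partition=True)
remaining = failed.failed_partitions_count
```

Deriving the count from the set keeps the two from drifting apart, which is
one simple way to satisfy the point raised in the question.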

On Wed, May 8, 2019 at 1:59 PM Jun Rao <ju...@confluent.io> wrote:

> Hi, Aishwarya,
>
> Thanks for the KIP. Looks good to me. Just one minor comment. If a replica
> is deleted on a broker (through a StopReplicaRequest) while it's in the
> failed partition set, should we exclude that partition from the set and
> the FailedPartitionsCount?
>
> Jun
>
> On Mon, May 6, 2019 at 1:21 PM Aishwarya Gune <ai...@confluent.io>
> wrote:
>
> > Hey All!
> >
> > I have created a KIP to improve the behavior of replica fetcher when
> > partition failure occurs. Please do have a look at it and let me know what
> > you think.
> > KIP 461 -
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-461+-+Improve+Replica+Fetcher+behavior+at+handling+partition+failure
> >
> > --
> > Thank you,
> > Aishwarya
> >
>


-- 
Thank you,
Aishwarya

Re: [DISCUSS] KIP-461 - Improving replica fetcher behavior in case of partition failures

Posted by Jun Rao <ju...@confluent.io>.
Hi, Aishwarya,

Thanks for the KIP. Looks good to me. Just one minor comment. If a replica
is deleted on a broker (through a StopReplicaRequest) while it's in the
failed partition set, should we exclude that partition from the set and
the FailedPartitionsCount?

Jun

On Mon, May 6, 2019 at 1:21 PM Aishwarya Gune <ai...@confluent.io>
wrote:

> Hey All!
>
> I have created a KIP to improve the behavior of replica fetcher when
> partition failure occurs. Please do have a look at it and let me know what
> you think.
> KIP 461 -
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-461+-+Improve+Replica+Fetcher+behavior+at+handling+partition+failure
>
> --
> Thank you,
> Aishwarya
>