Posted to users@kafka.apache.org by Andrew Otto <ao...@wikimedia.org> on 2014/09/03 20:48:01 UTC

Manual Leader Assignment

Hiya,

During leader changes, we see short periods of message loss on some of our higher volume producers.  I suspect that this is because it takes a couple of seconds for Zookeeper to notice and notify the producers of the metadata change.  During this time, producer buffers can fill up and end up dropping some messages.

I’d like to do some troubleshooting.  Is it possible to manually change the leadership of a single partition?  I see here[1] that I can start a leadership election for a particular partition, but the JSON doesn’t show a way to choose the new leader of the partition.

Thanks!
-Andrew Otto

[1] https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-Howtousethetool?.1
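The election JSON on that wiki page only names partitions; it has no field for a target leader. One workaround, which nobody in this thread confirms, relies on the fact that the preferred replica is the first broker in a partition's replica list: reassign the partition to the same brokers with the desired leader listed first (kafka-reassign-partitions.sh), then trigger a preferred replica election (kafka-preferred-replica-election.sh). A minimal sketch of generating both JSON payloads follows. The topic name, broker ids, and helper function names are hypothetical, and whether a pure reordering takes effect on a given Kafka version is an assumption to verify on a test cluster.

```python
import json


def election_json(topic, partition):
    """Payload for the preferred replica election tool.

    It names partitions only; there is no way to name a leader here.
    """
    return json.dumps({"partitions": [{"topic": topic, "partition": partition}]})


def reassignment_json(topic, partition, replicas, new_leader):
    """Reassignment payload with the same brokers, reordered so that
    new_leader comes first and therefore becomes the preferred replica."""
    if new_leader not in replicas:
        raise ValueError("new_leader must already be one of the replicas")
    ordered = [new_leader] + [b for b in replicas if b != new_leader]
    return json.dumps({
        "version": 1,
        "partitions": [
            {"topic": topic, "partition": partition, "replicas": ordered}
        ],
    })


# Hypothetical topic and broker ids, for illustration only.
print(reassignment_json("example_topic", 0, [1, 2, 3], new_leader=3))
print(election_json("example_topic", 0))
```

Each payload would be written to a file and passed to the corresponding tool; the election step is what actually moves leadership to the new first replica.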

Re: Manual Leader Assignment

Posted by Jun Rao <ju...@gmail.com>.
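With ack=1, some acked messages can still be lost when the leader of a
partition changes: the leader acks a message as soon as it has written it
locally, so a message that has not yet been replicated to the followers is
gone if that leader fails.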
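To make the ack=1 point concrete, here is a deliberately simplified toy model, not Kafka code: one leader, one follower, one message. It ignores the ISR, batching, and retries, and the function name and parameters are made up for illustration.

```python
def lost_after_failover(acks, leader_fails_before_replication):
    """Return the acked messages that are missing on the new leader.

    Toy model: the leader writes one message, may or may not replicate it
    to a single follower, then fails; the follower becomes the new leader.
    """
    leader_log, follower_log, acked = [], [], []
    msg = "m1"

    leader_log.append(msg)
    if acks == 1:
        # ack=1: the producer gets its ack once the leader alone has the message.
        acked.append(msg)

    if not leader_fails_before_replication:
        follower_log.append(msg)
        if acks == -1:
            # ack=-1: the ack is sent only after the follower has the message.
            acked.append(msg)

    # The leader fails and the follower takes over with whatever it has.
    new_leader_log = follower_log
    return [m for m in acked if m not in new_leader_log]


print(lost_after_failover(acks=1, leader_fails_before_replication=True))
print(lost_after_failover(acks=-1, leader_fails_before_replication=True))
```

In the ack=-1 case the message was never acked, so a producer with retries configured would resend it to the new leader instead of treating it as delivered.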

Thanks,

Jun

On Tue, Sep 9, 2014 at 6:22 AM, Andrew Otto <ao...@wikimedia.org> wrote:

> > (2)
> > configure enough retries + backoff time in the producer (so that new
> > leaders can be elected during failure).
> Ya, I’ve been doing this, I need to get it just right.
>
> >  (1) use ack=-1 in the producer;
> We use ack=1.  If I configure (2) properly, I shouldn’t need ack=-1, should
> I?  I don’t think this will help me in my case anyway, as it is the
> producers themselves that are dropping the messages when their buffers fill
> up, while they are waiting for acks on messages before the new leadership
> metadata has been propagated.
>
> Anyway, my question is about being able to test if I have done (2)
> properly.  Right now

Re: Manual Leader Assignment

Posted by Andrew Otto <ao...@wikimedia.org>.
> (2)
> configure enough retries + backoff time in the producer (so that new
> leaders can be elected during failure).
Ya, I’ve been doing this, I need to get it just right.

>  (1) use ack=-1 in the producer;
We use ack=1.  If I configure (2) properly, I shouldn’t need ack=-1, should I?  I don’t think it would help in my case anyway, as it is the producers themselves that drop the messages when their buffers fill up while they wait for acks, before the new leadership metadata has been propagated.
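A rough way to sanity-check the buffer theory is to compare how many messages arrive during the leaderless window with how many the producer can hold. The sketch below is back-of-the-envelope arithmetic only; the rate, outage length, and buffer capacity are hypothetical numbers, not measurements from this cluster.

```python
def messages_buffered(rate_per_sec, outage_sec):
    """Messages that pile up while no leader is reachable."""
    return rate_per_sec * outage_sec


def buffer_overflows(rate_per_sec, outage_sec, buffer_capacity):
    """True if the backlog exceeds what the producer buffer can hold."""
    return messages_buffered(rate_per_sec, outage_sec) > buffer_capacity


# Hypothetical: 5000 msgs/s, a 3 second gap, room for 10000 buffered messages.
print(messages_buffered(5000, 3))        # 15000
print(buffer_overflows(5000, 3, 10000))  # True
```

If the backlog exceeds the capacity, either the buffer has to grow or the producer has to block rather than drop while it waits for the new leader.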

Anyway, my question is about being able to test if I have done (2) properly.  Right now



On Sep 9, 2014, at 12:10 AM, Jun Rao <ju...@gmail.com> wrote:

> To avoid data loss, you will need to (1) use ack=-1 in the producer; (2)
> configure enough retries + backoff time in the producer (so that new
> leaders can be elected during failure).
> 
> For controlled failure, you can reduce the unavailability window by using
> controlled shutdown. See
> http://kafka.apache.org/documentation.html#basic_ops_restarting
> 
> Thanks,
> 
> Jun


Re: Manual Leader Assignment

Posted by Jun Rao <ju...@gmail.com>.
To avoid data loss, you will need to (1) use ack=-1 in the producer; (2)
configure enough retries + backoff time in the producer (so that new
leaders can be elected during failure).
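As a sketch of what (1) and (2) might look like, assuming the 0.8-era Scala/Java producer and its property names (request.required.acks, message.send.max.retries, retry.backoff.ms); other clients name these settings differently. The values and the helper functions are illustrative assumptions, not recommendations.

```python
# Producer properties, assuming the 0.8-era Scala/Java producer names.
producer_config = {
    "request.required.acks": "-1",     # (1) wait for the in-sync replicas
    "message.send.max.retries": "10",  # (2) number of resend attempts
    "retry.backoff.ms": "500",         # (2) pause before each retry
}


def retry_window_ms(config):
    """Approximate time the producer keeps retrying before giving up."""
    retries = int(config["message.send.max.retries"])
    backoff_ms = int(config["retry.backoff.ms"])
    return retries * backoff_ms


def covers_election(config, expected_election_ms):
    """True if the retry window is at least as long as the election time."""
    return retry_window_ms(config) >= expected_election_ms


print(retry_window_ms(producer_config))        # 5000
print(covers_election(producer_config, 3000))  # True
```

The 3000 ms election time is a placeholder; the real figure has to be measured on the cluster, for example during a controlled shutdown.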

For controlled failure, you can reduce the unavailability window by using
controlled shutdown. See
http://kafka.apache.org/documentation.html#basic_ops_restarting

Thanks,

Jun

On Wed, Sep 3, 2014 at 11:48 AM, Andrew Otto <ao...@wikimedia.org> wrote:

> Hiya,
>
> During leader changes, we see short periods of message loss on some of our
> higher volume producers.  I suspect that this is because it takes a couple
> of seconds for Zookeeper to notice and notify the producers of the metadata
> change.  During this time, producer buffers can fill up and end up dropping
> some messages.
>
> I’d like to do some troubleshooting.  Is it possible to manually change
> the leadership of a single partition?  I see here[1] that I can start a
> leadership election for a particular partition, but the JSON doesn’t show a
> way to choose the new leader of the partition.
>
> Thanks!
> -Andrew Otto
>
> [1]
> https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-Howtousethetool?.1