Posted to dev@kafka.apache.org by Sam Meder <sa...@jivesoftware.com> on 2013/09/06 17:46:43 UTC

Re: Segment recovery and replication

On Aug 29, 2013, at 11:12 PM, Neha Narkhede <ne...@gmail.com> wrote:

>> How do you automate waiting for the broker to come up? Just keep
>> monitoring the process and keep trying to connect to the port?
> 
> Every leader in a Kafka cluster exposes the UnderReplicatedPartitionCount
> metric. The safest way to issue controlled shutdown is to wait until that
> metric reports 0 on the brokers.
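The wait Neha describes could be automated roughly as follows. This is only a sketch: the `read_metric` callable stands in for however you actually read the broker's UnderReplicatedPartitions gauge (jmxterm, Kafka's bundled JmxTool, a Jolokia agent, etc.), which the thread doesn't specify.

```python
import time

def wait_until_fully_replicated(read_metric, timeout=300.0, interval=5.0):
    """Poll an under-replicated-partitions gauge until it reports 0.

    `read_metric` is a caller-supplied callable (e.g. a JMX read against
    the broker); how the metric is fetched is left open here. Returns
    True once the count hits 0, False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if read_metric() == 0:
            return True
        time.sleep(interval)
    return False
```

Only once this returns True would the rolling-restart script issue the controlled shutdown for the next broker.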

Maybe I am missing something, but won't the topics for which I have partitions on the broker I am shutting down always report as under-replicated (unless I manually reassign the partition to another broker)? I thought that the shutdown logic really only dealt with transferring the leader status for a partition.

As a side note it would be great to have a minimum replication factor in addition to the regular replication factor so one can enforce durability guarantees (fail the producer when the message can't be sufficiently replicated).
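What this asks for is essentially what later Kafka releases shipped as `min.insync.replicas`: with ack = -1, the leader rejects a produce request when the ISR has shrunk below the configured minimum instead of accepting the write with fewer copies. A sketch of that check (the names below are illustrative, not Kafka's actual code):

```python
class NotEnoughReplicasError(Exception):
    """Raised when the ISR is too small to meet the requested durability."""

def check_produce(acks, isr_size, min_insync):
    """Fail an acks=-1 produce outright when the in-sync replica set is
    smaller than the configured minimum, rather than silently accepting
    an under-replicated write. acks=0/1 requests are unaffected."""
    if acks == -1 and isr_size < min_insync:
        raise NotEnoughReplicasError(
            f"ISR has {isr_size} replicas, need at least {min_insync}")
    return True
```

The producer then sees a hard error and can retry or surface the failure, instead of believing the message was durably replicated.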

> If you try to shutdown the last broker in
> the ISR, the controlled shutdown cannot succeed since there is no other
> broker to move the leader to. Waiting until under replicated partition
> count hits 0 prevents you from hitting this issue.
> 
> This also solves the problem of waiting until the broker comes up since you
> will automatically wait until the broker comes up and joins ISR.

Not sure I follow, but one start-up situation I am concerned about is what happens after abnormal termination (whether through a kill -9, OOM, or hardware failure - whatever floats your boat). For this scenario it would be great if there were a way to wait for the recovery process to finish. For now we can just wait for the server port to become available, but something more explicit would be great.
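The port-based wait mentioned above can be sketched as below. It is a crude readiness check, resting on the assumption (consistent with what the thread describes) that the broker only binds its listener socket after log recovery finishes, so a successful connect implies recovery is done:

```python
import socket
import time

def wait_for_port(host, port, timeout=600.0, interval=1.0):
    """Block until something accepts TCP connections on host:port.

    Returns True on a successful connect, False if the timeout expires.
    Recovery after an unclean shutdown can take minutes, so the default
    timeout is generous.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

A restart script would call `wait_for_port(broker_host, 9092)` before moving on to the next broker.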

/Sam

> 
> 
> Thanks,
> Neha
> 
> 
> On Thu, Aug 29, 2013 at 12:59 PM, Sam Meder <sa...@jivesoftware.com> wrote:
> 
>> Ok, I spent some more time staring at our logs and figured out that it was
>> our fault. We were not waiting around for the Kafka broker to fully
>> initialize before moving on to the next broker and loading the data logs
>> can take quite some time (~7 minutes in one case), so we ended up with no
>> replicas online at some point and the replica that came back first was a
>> little short on data...
>> 
>> How do you automate waiting for the broker to come up? Just keep
>> monitoring the process and keep trying to connect to the port?
>> 
>> /Sam
>> 
>> On Aug 29, 2013, at 6:40 PM, Sam Meder <sa...@jivesoftware.com> wrote:
>> 
>>> 
>>> On Aug 29, 2013, at 5:50 PM, Sriram Subramanian <
>> srsubramanian@linkedin.com> wrote:
>>> 
>>>> Do you know why you timed out on a regular shutdown?
>>> 
>>> No, though I think it may just have been that the timeout we put in was
>> too short.
>>> 
>>>> If the replica had
>>>> fallen off of the ISR and shutdown was forced on the leader this could
>>>> happen.
>>> 
>>> Hmm, but it shouldn't really be made leader if it isn't even in the ISR,
>> should it?
>>> 
>>> /Sam
>>> 
>>>> With ack = -1, we guarantee that all the replicas in the in-sync
>>>> set have received the message before exposing the message to the
>> consumer.
>>>> 
>>>> On 8/29/13 8:32 AM, "Sam Meder" <sa...@jivesoftware.com> wrote:
>>>> 
>>>>> We've recently come across a scenario where we see consumers resetting
>>>>> their offsets to earliest, which as far as I can tell may also lead
>> to
>>>>> data loss (we're running with ack = -1 to avoid loss). This seems to
>>>>> happen when we time out on doing a regular shutdown and instead kill -9
>>>>> the kafka broker, but does obviously apply to any scenario that
>> involves
>>>>> an unclean exit. As far as I can tell what happens is
>>>>> 
>>>>> 1. On restart the broker truncates the data for the affected
>> partitions,
>>>>> i.e. not all data was written to disk.
>>>>> 2. The new broker then becomes a leader for the affected partitions and
>>>>> consumers get confused because they've already consumed beyond the now
>>>>> available offset.
>>>>> 
>>>>> Does that seem like a possible failure scenario?
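The consumer-side behaviour in step 2 can be sketched as follows. This is a simplification of what the thread describes, assuming the consumer's auto.offset.reset policy is "smallest"/"earliest": when the saved offset lies beyond what the truncated new leader actually has, the fetch fails with an out-of-range error and the consumer jumps back to the earliest available offset.

```python
def next_fetch_offset(consumer_offset, log_start, log_end, reset="earliest"):
    """Decide where the consumer fetches from after a leader change.

    If the saved offset falls outside the range the new leader has
    (because it truncated its log on recovery), fall back per the
    reset policy: earliest available offset, or latest.
    """
    if log_start <= consumer_offset <= log_end:
        return consumer_offset
    return log_start if reset == "earliest" else log_end
```

That reset to the log start is exactly the "consumers get confused" symptom: they re-read old data, and the messages between the truncation point and their saved offset are gone.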
>>>>> 
>>>>> /Sam
>>>> 
>>> 
>> 
>> 


Re: Segment recovery and replication

Posted by Sam Meder <sa...@jivesoftware.com>.
Thinking about it some more I guess you are really talking about monitoring UnderReplicatedPartitionCount during a restart?

/Sam
