Posted to dev@kafka.apache.org by "Joe Stein (JIRA)" <ji...@apache.org> on 2014/09/22 19:52:35 UTC

[jira] [Comment Edited] (KAFKA-1555) provide strong consistency with reasonable availability

    [ https://issues.apache.org/jira/browse/KAFKA-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143521#comment-14143521 ] 

Joe Stein edited comment on KAFKA-1555 at 9/22/14 5:51 PM:
-----------------------------------------------------------

ah, my bad. 

<< Any suggestions regarding the problem with retries? 

I think that is an issue beyond this ticket. It also happens in other cases (e.g. a message exceeding MaxMessageSize: retrying it 3 times won't change a thing, same as this problem), and we don't yet have a solution that "classifies" exceptions. So I think we should fix it, but that is not related to this ticket IMHO. Unlike MaxMessageSize, though, it is possible that after the first failure another replica comes online and the retry succeeds, so that functionality might be desirable (I could see how it would be).
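
To make the idea of "classifying" exceptions concrete, here is a minimal sketch in Python. It is not Kafka code: the exception classes, the `retriable` flag, and `send_with_retries` are all hypothetical names invented for illustration. It assumes the producer could tag each error as retriable or not, so that an oversized message fails immediately while a transient replica shortage is retried.

```python
# Hypothetical sketch, not the Kafka producer API.

class SendError(Exception):
    """Base class for send failures; retriable by default (assumption)."""
    retriable = True


class MessageTooLarge(SendError):
    """Retrying cannot help: the message is the same size every time."""
    retriable = False


class NotEnoughReplicas(SendError):
    """Retrying may help: another replica can come online in the meantime."""
    retriable = True


def send_with_retries(send, message, max_retries=3):
    """Call send(message), retrying only errors classified as retriable.

    Returns (result, number_of_attempts). Re-raises the error at once if it
    is not retriable, or after the retries are used up.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return send(message), attempts
        except SendError as err:
            if not err.retriable or attempts > max_retries:
                raise
```

With this shape, a `MessageTooLarge` failure is attempted exactly once, while a `NotEnoughReplicas` failure gets up to `max_retries` further attempts.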


was (Author: joestein):
ah, my bad. 

<< Any suggestions regarding the problem with retries? 

I think that is an issue beyond this ticket. It also happens in other cases (e.g. a message exceeding MaxMessageSize: retrying it 3 times won't change a thing, same as this problem), and we don't yet have a solution that "classifies" exceptions. So I think we should fix it, but that is not related to this ticket IMHO.

> provide strong consistency with reasonable availability
> -------------------------------------------------------
>
>                 Key: KAFKA-1555
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1555
>             Project: Kafka
>          Issue Type: Improvement
>          Components: controller
>    Affects Versions: 0.8.1.1
>            Reporter: Jiang Wu
>            Assignee: Gwen Shapira
>             Fix For: 0.8.2
>
>         Attachments: KAFKA-1555.0.patch, KAFKA-1555.1.patch
>
>
> In a mission-critical application, we expect that a Kafka cluster with 3 brokers can satisfy two requirements:
> 1. When 1 broker is down, no message loss or service blocking happens.
> 2. In worse cases, such as when two brokers are down, service can be blocked, but no message loss happens.
> We found that the current Kafka version (0.8.1.1) cannot achieve the requirements due to three of its behaviors:
> 1. when choosing a new leader from 2 followers in ISR, the one with fewer messages may be chosen as the leader.
> 2. even when replica.lag.max.messages=0, a follower can stay in ISR when it has fewer messages than the leader.
> 3. ISR can contain only 1 broker, therefore acknowledged messages may be stored in only 1 broker.
> The following is an analytical proof. 
> We consider a cluster with 3 brokers and a topic with 3 replicas, and assume that at the beginning, all 3 replicas, leader A, followers B and C, are in sync, i.e., they have the same messages and are all in ISR.
> According to the value of request.required.acks (acks for short), there are the following cases.
> 1. acks=0, 1, 3. Obviously these settings do not satisfy the requirement.
> 2. acks=2. Producer sends a message m. It's acknowledged by A and B. At this time, although C hasn't received m, C is still in ISR. If A is killed, C can be elected as the new leader, and consumers will miss m.
> 3. acks=-1. B and C restart and are removed from ISR. Producer sends a message m to A, and receives an acknowledgement. Disk failure happens in A before B and C replicate m. Message m is lost.
> In summary, no existing configuration can satisfy the requirements.
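
The acks=2 and acks=-1 cases in the description above can be replayed with a small toy model. This is only a sketch under stated assumptions, not Kafka code: `Replica`, `produce`, and the two scenario functions are hypothetical names, and the acknowledgement rule (acks=-1 waits for every replica currently in ISR, acks=n waits for n replicas) is taken from the description rather than from the broker source.

```python
# Toy model of the scenarios in the issue description; not Kafka code.

class Replica:
    def __init__(self, name):
        self.name = name
        self.log = []


def produce(message, leader, isr, acks, replicated_to):
    """Write message to the leader and to the followers in replicated_to.

    Returns True if the producer would be acknowledged under the assumed
    rule: acks=-1 needs every replica in ISR, acks=n needs n replicas.
    """
    leader.log.append(message)
    for follower in replicated_to:
        follower.log.append(message)
    holders = [r for r in isr if message in r.log]
    if acks == -1:
        return len(holders) == len(isr)
    return len(holders) >= acks


def scenario_acks_2():
    a, b, c = Replica("A"), Replica("B"), Replica("C")
    isr = [a, b, c]
    # m reaches A and B only; C lags but is still in ISR.
    acked = produce("m", leader=a, isr=isr, acks=2, replicated_to=[b])
    new_leader = c  # A is killed and C, still in ISR, is elected
    return acked, "m" in new_leader.log


def scenario_acks_all():
    a, b, c = Replica("A"), Replica("B"), Replica("C")
    isr = [a]  # B and C restarted and were removed from ISR
    acked = produce("m", leader=a, isr=isr, acks=-1, replicated_to=[])
    a.log.clear()  # disk failure on A before B and C replicate m
    survivors = [r for r in (a, b, c) if "m" in r.log]
    return acked, bool(survivors)
```

In both scenarios the first element of the result is True (the producer was acknowledged) and the second is False (the message is not available afterwards), which is the loss the description argues for.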


