Posted to users@kafka.apache.org by Sanjay Awatramani <Sa...@guavus.com> on 2018/08/05 18:21:18 UTC

Reliability against rack failure

Hi,

I have run some experiments and read through the Kafka documentation, and I conclude that there is a small chance of data loss or loss of availability in a multi-rack scenario. Can someone please validate my understanding?

The minimum configuration for a single-rack system that must survive a single machine failure is replication factor = 3, min.insync.replicas=2, acks=all. This ensures that the leader plus at least one follower receive the data written by a producer, so there is no data loss and the system remains available for further writes by the producer when one broker goes down.
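
The arithmetic behind this configuration can be sketched as follows. This is only an illustration, not Kafka code: the function name `tolerated_broker_failures` is hypothetical, and it assumes every replica was in sync before the failure.

```python
def tolerated_broker_failures(replication_factor: int, min_insync_replicas: int) -> int:
    """Replicas that can be lost while acks=all writes are still accepted.

    Illustrative simplification: assumes all replicas were in sync beforehand.
    """
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("need 1 <= min.insync.replicas <= replication factor")
    return replication_factor - min_insync_replicas


# Replication factor 3 with min.insync.replicas=2: how many brokers may fail?
print(tolerated_broker_failures(3, 2))
```

Raising min.insync.replicas strengthens durability but shrinks this margin, which is the trade-off the rest of this message runs into.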

With rack awareness enabled, Kafka distributes the replicas of a partition across racks, which gives reliability in case of a rack failure. However, rack awareness is only concerned with the placement of replicas; it does not prioritise the order of replication when followers catch up with the leader.

Moving to a rack-aware setup with 2 racks, the above configuration creates a problem: one of the racks will hold 2 of the 3 replicas, and if that rack goes down, data can be lost.
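
A small sketch of why 3 replicas cannot be spread evenly over 2 racks. The round-robin placement here is a simplified stand-in for rack-aware assignment, not Kafka's actual algorithm, and the rack names and the helper `replicas_per_rack` are made up for illustration.

```python
from collections import Counter
from itertools import cycle, islice


def replicas_per_rack(racks: list[str], replication_factor: int) -> Counter:
    """Spread replicas over racks round-robin, as evenly as possible."""
    return Counter(islice(cycle(racks), replication_factor))


placement = replicas_per_rack(["rack-a", "rack-b"], 3)
# Replicas left if the most heavily loaded rack fails.
worst_case_survivors = 3 - max(placement.values())
print(placement, worst_case_survivors)
```

With a single surviving replica, the partition is below min.insync.replicas=2, so acks=all writes would be rejected until another replica rejoins.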

Extending the minimum configuration to a 2-rack setup gives replication factor = 4, min.insync.replicas=2, acks=all. My expectation was that when a rack goes down, one of the in-sync replicas would still be available because it would be on a different rack than the leader, but I cannot find any documentation to back this. I studied the mechanism by which a producer writes to the leader: all in-sync replicas (ISR) pull the newest data, and once the leader confirms that at least min.insync.replicas replicas have the newest data, it sends an ack back to the producer. In a rack-aware system, I think Kafka will send the ack even if the 2 replicas that are in sync are on the same rack. If that rack goes down at that instant, the data is lost.
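
The scenario described above can be enumerated with a short sketch. It assumes a hypothetical layout of 4 replicas, two per rack, and models only the worst case in which the in-sync set has shrunk to exactly min.insync.replicas members; the replica names and the helper `single_rack_ack_sets` are illustrative, not part of Kafka.

```python
from itertools import combinations

# Hypothetical layout: replication factor 4, two replicas on each of two racks.
replica_racks = {"r1": "rack-a", "r2": "rack-a", "r3": "rack-b", "r4": "rack-b"}
MIN_INSYNC_REPLICAS = 2


def single_rack_ack_sets(racks_by_replica: dict[str, str], min_isr: int) -> list[tuple[str, ...]]:
    """Return every replica set of size min_isr that sits entirely on one rack."""
    return [
        group
        for group in combinations(sorted(racks_by_replica), min_isr)
        if len({racks_by_replica[replica] for replica in group}) == 1
    ]


# In-sync sets that would lose acknowledged data if their rack failed.
risky = single_rack_ack_sets(replica_racks, MIN_INSYNC_REPLICAS)
print(risky)
```

Any non-empty result means there is an in-sync set that satisfies min.insync.replicas while living on a single rack, which is the window for data loss being asked about.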

If we set min.insync.replicas=3, we can guarantee that at least one in-sync replica is on a different rack, so data will not be lost. However, if either rack goes down, the producer's writes will start failing because fewer than the required number of replicas remain available.
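
The availability side of this trade-off can be checked with a few lines. This is a sketch under the same assumed layout (2 racks with 2 replicas each); `writes_available_after_rack_loss` is a hypothetical helper, not a Kafka API.

```python
def writes_available_after_rack_loss(
    replicas_per_rack: dict[str, int], lost_rack: str, min_isr: int
) -> bool:
    """True if enough replicas survive the rack loss to satisfy min.insync.replicas."""
    surviving = sum(
        count for rack, count in replicas_per_rack.items() if rack != lost_rack
    )
    return surviving >= min_isr


layout = {"rack-a": 2, "rack-b": 2}  # replication factor 4 over 2 racks
for min_isr in (2, 3):
    outcome = [writes_available_after_rack_loss(layout, rack, min_isr) for rack in layout]
    print(min_isr, outcome)
```

With min.insync.replicas=2 writes continue after either rack is lost, and with 3 they do not, so on 2 racks the setting that closes the durability gap is the one that gives up write availability.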

Is my understanding correct? Is there a way to configure Kafka in a multi-rack scenario so that it is protected against data loss and also remains available for further writes even when a single node or an entire rack goes down?

Regards,
Sanjay


Re: Reliability against rack failure

Posted by Sanjay Awatramani <Sa...@guavus.com>.
Hi Svante,

I just forgot about ZK. Thanks for setting me on the right track :)
 
Regards,
Sanjay

On 06/08/18, 12:14 AM, "Svante Karlsson" <sv...@csi.se> wrote:

    You need 3 racks for your zookeepers anyway. It needs 2 out of three. How
    have you solved that?
    


Re: Reliability against rack failure

Posted by Svante Karlsson <sv...@csi.se>.
You need 3 racks for your ZooKeeper nodes anyway: ZooKeeper needs 2 out of 3 nodes available to keep a quorum. How have you solved that?
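
The quorum point can be illustrated with a short sketch. It assumes a 3-node ZooKeeper ensemble and a strict-majority quorum rule; the rack layouts and the helper `quorum_survives` are hypothetical.

```python
def quorum_survives(nodes_per_rack: dict[str, int], lost_rack: str) -> bool:
    """True if a strict majority of the ensemble is still up after losing one rack."""
    total = sum(nodes_per_rack.values())
    surviving = total - nodes_per_rack.get(lost_rack, 0)
    return surviving > total // 2


two_racks = {"rack-a": 2, "rack-b": 1}
three_racks = {"rack-a": 1, "rack-b": 1, "rack-c": 1}

two_rack_result = {rack: quorum_survives(two_racks, rack) for rack in two_racks}
three_rack_result = {rack: quorum_survives(three_racks, rack) for rack in three_racks}
print(two_rack_result)
print(three_rack_result)
```

However 3 nodes are split over 2 racks, one rack holds at least 2 of them, so losing that rack loses the quorum; with one node on each of 3 racks, any single rack can fail.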

On Sun, 5 Aug 2018 at 20:31, Sanjay Awatramani <Sa...@guavus.com> wrote:

> Thanks for the quick response Svante.
> I forgot to mention that the deployment I am looking at has 2 racks. We
> came up with this solution, but for this specific deployment adding a rack
> is out of question.
> Is there a way to resolve this with 2 racks ?
>
> Regards,
> Sanjay

Re: Reliability against rack failure

Posted by Sanjay Awatramani <Sa...@guavus.com>.
Thanks for the quick response, Svante.
I forgot to mention that the deployment I am looking at has 2 racks. We
came up with this solution, but for this specific deployment adding a rack
is out of the question.
Is there a way to resolve this with 2 racks?

Regards,
Sanjay

On 05/08/18, 11:57 PM, "Svante Karlsson" <sv...@csi.se> wrote:

>3 racks,  Replication Factor = 3, min.insync.replicas=2, ack=all


Re: Reliability against rack failure

Posted by Svante Karlsson <sv...@csi.se>.
Use 3 racks, with replication factor = 3, min.insync.replicas=2, acks=all.


Re: Reliability against rack failure

Posted by Daniel Hanley <da...@confluent.io>.
Hi Sanjay

From Kafka 0.10.0 onwards you can use the optional broker.rack property to get
replicas distributed across racks.

See:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
and https://issues.apache.org/jira/browse/KAFKA-1215

https://docs.confluent.io/current/installation/configuration/broker-configs.html#broker-rack

Best Regards

Dan
