Posted to users@kafka.apache.org by Stevo Slavić <ss...@gmail.com> on 2015/07/07 16:13:43 UTC

Kafka settings for (more) reliable/durable messaging

Hello Apache Kafka community,

Documentation for min.insync.replicas in
http://kafka.apache.org/documentation.html#brokerconfigs states:

"When used together, min.insync.replicas and request.required.acks allow
you to enforce greater durability guarantees. A typical scenario would be
to create a topic with a replication factor of 3, set min.insync.replicas
to 2, and produce with request.required.acks of -1. This will ensure that
the producer raises an exception if a majority of replicas do not receive a
write."

Correct me if I'm wrong (doc reference?): I assume min.insync.replicas includes
the leader, so with min.insync.replicas=2, the leader and one more replica
will have to ACK writes.
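
To make the interaction of the two settings concrete, here is a minimal
sketch in Python. It is a toy model, not broker code: the function name
check_write and its return values are made up for illustration, and it
assumes (as above) that the leader counts as one of the in-sync replicas.

```python
def check_write(isr_size: int, min_insync_replicas: int, acks: int) -> str:
    """Toy model of the durability check for a single produce request.

    isr_size counts the leader itself (an assumption, see the text above).
    """
    if acks == -1 and isr_size < min_insync_replicas:
        # Too few in-sync replicas: the producer gets an error, not an ACK.
        return "rejected"
    # With other acks values the minimum is not enforced in this model.
    return "accepted"


# Replication factor 3, min.insync.replicas=2, request.required.acks=-1
for isr_size in (3, 2, 1):
    print(isr_size, check_write(isr_size, min_insync_replicas=2, acks=-1))
```

In this model the cluster keeps accepting writes with one replica down,
and starts rejecting them once only the leader is left in sync.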

In such a setup, with a minimal 3-broker cluster, given that
- all 3 replicas are in sync
- a batch of messages is written to the leader and one replica ACKs it
- another batch of messages is written to the leader and a different replica ACKs it

Is it possible that when the leader crashes before the replicas have caught up,
(part of) one batch of messages could be lost (since one replica becomes the
new leader, which alone serves all reads and requests, and replication is
one-way)?

Kind regards,
Stevo Slavic.

Re: Kafka settings for (more) reliable/durable messaging

Posted by Stevo Slavić <ss...@gmail.com>.

Great feedback, thank you very much to both!

Kind regards,
Stevo Slavic.

Re: Kafka settings for (more) reliable/durable messaging

Posted by Jiangjie Qin <jq...@linkedin.com.INVALID>.

The replica lag definition is now time-based, so as long as a replica can
catch up with the leader within replica.lag.time.max.ms, it is in the ISR, no
matter how many messages it is behind.
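
A minimal Python sketch of a time-based membership check may help here.
It is only an illustration: the function and parameter names are
hypothetical, and the broker's actual bookkeeping may differ. The point is
that the check looks at elapsed time only, never at a message count.

```python
def in_isr(now_ms: int, last_caught_up_ms: int,
           replica_lag_time_max_ms: int) -> bool:
    """Toy model: a follower stays in the ISR while the time since it was
    last caught up with the leader is within the configured limit."""
    return now_ms - last_caught_up_ms <= replica_lag_time_max_ms


# With a 10 s limit, a follower that caught up 4 s ago is still in the ISR,
# however many messages have been appended since then.
print(in_isr(now_ms=20_000, last_caught_up_ms=16_000,
             replica_lag_time_max_ms=10_000))
# A follower that last caught up 15 s ago is not.
print(in_isr(now_ms=20_000, last_caught_up_ms=5_000,
             replica_lag_time_max_ms=10_000))
```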

And yes, your understanding is correct - an ACK is sent back either when all
replicas in the ISR have the message or when the request times out.
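
The rule above can be sketched as a small Python function. This is a toy
model under stated assumptions, not the broker implementation: the names
ack_decision, required_offset and isr_log_end_offsets are made up for
illustration, and each replica is reduced to a single log end offset.

```python
from typing import Dict


def ack_decision(required_offset: int,
                 isr_log_end_offsets: Dict[str, int],
                 elapsed_ms: int,
                 timeout_ms: int) -> str:
    """Toy model of the acks=-1 decision for one pending produce request."""
    # The request is satisfied once every ISR member has reached the offset.
    if all(leo >= required_offset for leo in isr_log_end_offsets.values()):
        return "ack"
    # Otherwise the request fails once its timeout has elapsed.
    if elapsed_ms >= timeout_ms:
        return "timeout"
    # Neither condition holds yet: keep the request pending.
    return "wait"


isr = {"leader": 10, "follower-1": 10, "follower-2": 7}
print(ack_decision(10, isr, elapsed_ms=50, timeout_ms=1000))    # still waiting
isr["follower-2"] = 10
print(ack_decision(10, isr, elapsed_ms=80, timeout_ms=1000))    # all caught up
```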

I have some related slides here that might help a bit:
http://www.slideshare.net/JiangjieQin/no-data-loss-pipeline-with-apache-kafka-49753844

Thanks,

Jiangjie (Becket) Qin

Re: Kafka settings for (more) reliable/durable messaging

Posted by Stevo Slavić <ss...@gmail.com>.

Thanks for the heads-up and the code reference!

Traced back required offset to
https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/ReplicaManager.scala#L303

I have to investigate more, but from an initial check I was expecting to see
a reference to "replica.lag.max.messages" there (so that a replica between 0
and maxLagMessages behind the required offset would still be considered
in sync). Searching through trunk, I cannot find where in the main code the
"replica.lag.max.messages" configuration property is used.

Used search query
https://github.com/apache/kafka/search?utf8=%E2%9C%93&q=%22replica.lag.max.messages%22&type=Code

Maybe it's going to be removed in the next release?!

Time-based lag is still there.

Anyway, if I understood correctly, with request.required.acks=-1, when a
message/batch is published, it's first written to the leader. The other
partition replicas then either continuously poll and get in sync with the
leader, or get notified through ZooKeeper that they are behind and then poll
and get in sync with the leader. As soon as enough (min.insync.replicas - 1)
replicas are detected to be fully in sync with the leader, an ACK is sent to
the producer (unless the timeout occurs first).

Re: Kafka settings for (more) reliable/durable messaging

Posted by Gwen Shapira <gs...@cloudera.com>.

Ah, I think I see the confusion: replicas don't actually ACK at all.
What happens is that the replica manager waits for enough ISR replicas
to reach the correct offset;
Partition.checkEnoughReplicasReachOffset(...) has this logic. A
replica can't reach the offset of the second batch without first having
written the first batch. So I believe we are safe in this scenario.
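
The ordering argument can be illustrated with a small Python sketch. It is
a toy model, not the real replication code: the Follower class and
is_prefix helper are hypothetical, and leader changes and log truncation
are ignored. The only assumption is that a follower always fetches starting
at its own log end offset.

```python
class Follower:
    """Toy follower that copies the leader's log strictly in order."""

    def __init__(self):
        self.log = []  # messages copied from the leader so far

    def fetch(self, leader_log, max_messages):
        # Fetching starts at the follower's own log end offset,
        # so earlier messages can never be skipped.
        start = len(self.log)
        self.log.extend(leader_log[start:start + max_messages])


def is_prefix(follower_log, leader_log):
    """True if the follower's log equals the start of the leader's log."""
    return follower_log == leader_log[:len(follower_log)]


# First batch at offsets 0-1, second batch at offsets 2-3.
leader_log = ["a1", "a2", "b1", "b2"]
f1, f2 = Follower(), Follower()
f1.fetch(leader_log, 2)  # f1 has only the first batch
f2.fetch(leader_log, 4)  # f2 has reached the end of the second batch

print(is_prefix(f1.log, leader_log), is_prefix(f2.log, leader_log))
print(f2.log[:2])  # f2 necessarily holds the first batch as well
```

In this model there is no sequence of fetches that leaves a follower with
the second batch but without the first, which is the case the original
question worried about.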

Gwen

Re: Kafka settings for (more) reliable/durable messaging

Posted by Stevo Slavić <ss...@gmail.com>.

Hello Gwen,

Thanks for the fast response!

Btw, congrats on officially becoming a Kafka committer, and thanks, among
other things, for the great "Intro to Kafka" video
http://shop.oreilly.com/product/0636920038603.do !

I have to read more docs and/or source. I thought this scenario was possible
because a replica can fall behind (replica.lag.max.messages) and still be
considered part of the ISR. I also assumed a write can be ACKed by any ISR
member, so why not by one which has fallen further behind?

Kind regards,
Stevo Slavic.

Re: Kafka settings for (more) reliable/durable messaging

Posted by Gwen Shapira <gs...@cloudera.com>.

I am not sure a "different replica" can ACK the second batch of messages
while not having the first - from what I can see, it will need to be
up-to-date on the latest messages (i.e. correct HWM) in order to ACK.
