Posted to users@kafka.apache.org by Andreas Flinck <an...@digitalroute.com> on 2015/11/27 09:28:39 UTC

What is the benefit of using acks=all and minover e.g. acks=3

Hi all

The reason why I need to know is that we have seen an issue when using acks=all, forcing us to quickly find an alternative. I leave the issue out of this post, but will probably come back to that! 

My question is about acks=all and the min.insync.replicas property. Since we have found a workaround for an issue by using acks>1 instead of all (absolutely no clue why at this moment), I would like to know what benefit you get from e.g. acks=all and min.insync.replicas=3 instead of using acks=3 in a 5-broker cluster with a replication factor of 4. To my understanding you would get the exact same level of durability and security from either of those settings. However, I suspect this is not quite the case, since I keep finding hints, without proper explanation, that acks=all is preferred.


Regards
Andreas 
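
For reference while reading the replies below, a minimal sketch of the two producer configurations being compared, using the new Java producer that is discussed later in the thread. The broker address, topic name and serializers are placeholders of mine, not taken from the original posts; min.insync.replicas itself is a broker/topic-level setting, not a producer property.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AcksComparisonSketch {
    public static void main(String[] args) {
        // Option A (acks=all): the leader waits for every replica currently in the
        // ISR. Combined with a topic-level min.insync.replicas=3, the write is
        // rejected outright whenever fewer than 3 replicas are in sync.
        Properties acksAll = baseProps();
        acksAll.put("acks", "all");

        // Option B (acks=3 on a replication-factor-4 topic): the leader answers as
        // soon as any 3 replicas have the message, so the remaining replica may be
        // thousands of messages behind and can still be elected leader.
        Properties acksThree = baseProps();
        acksThree.put("acks", "3");   // accepted by the 0.8.2 producer, deprecated later

        KafkaProducer<String, String> producer = new KafkaProducer<String, String>(acksAll);
        producer.send(new ProducerRecord<String, String>("test-topic", "key", "value"));
        producer.close();
    }

    private static Properties baseProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // placeholder
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }
}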

Re: What is the benefit of using acks=all and minover e.g. acks=3

Posted by Andreas Flinck <an...@digitalroute.com>.
Hi

We have run the tests with your proposed properties, but with the same result. However, we noticed that the Kafka broker only seems to run on 1 out of 72 cores, with 600% CPU usage. It is obviously overloading one core instead of scaling its threading across the machine.

The test environment is running Red Hat 6.7 and Java 1.8.0_65.

Do you have any idea why the broker process is not scaling across cores? Are there any other Kafka broker properties or OS-level settings that could solve this issue?

Thanks in advance!

Andreas


On 28 Nov 2015, at 17:45, Prabhjot Bharaj <pr...@gmail.com> wrote:


Hi,

Of all the parameters, num.replica.fetchers is the one worth raising; keeping it higher, e.g. at 4, can help.
Please try it out and let us know if it worked.

Thanks,
Prabhjot

On Nov 28, 2015 4:59 PM, "Andreas Flinck" <an...@digitalroute.com> wrote:
Hi!

Here are our settings for the properties requested:

num.network.threads=3
socket.request.max.bytes=104857600
socket.receive.buffer.bytes=1048576
socket.send.buffer.bytes=1048576

The following properties we don't set at all, so I guess they will take the defaults from the documentation (shown in parentheses):

"num.replica.fetchers": (1)
"replica.fetch.wait.max.ms<http://replica.fetch.wait.max.ms/>": (500),
"num.recovery.threads.per.data.dir": (1)

The producer properties we explicitly set are the following:

block.on.buffer.full=false
client.id=MZ
max.request.size=1048576
acks=all
retries=0
timeout.ms=30000
buffer.memory=67108864
metadata.fetch.timeout.ms=3000

Do let me know what you think about it! We are currently setting up some tests using the broker properties that you suggested.

Regards
Andreas






________________________________________
From: Prabhjot Bharaj <pr...@gmail.com>
Sent: 28 November 2015 11:37
To: users@kafka.apache.org
Subject: Re: What is the benefit of using acks=all and minover e.g. acks=3

Hi,

Clogging can happen if, as seems to be the case for you, the requests are
network-bound.
Just to confirm your configuration, does your broker configuration look
like this?

"num.replica.fetchers": 4,
"replica.fetch.wait.max.ms<http://replica.fetch.wait.max.ms/>": 500,
"num.recovery.threads.per.data.dir": 4,


"num.network.threads": 8,
"socket.request.max.bytes": 104857600,
"socket.receive.buffer.bytes": 10485760,
"socket.send.buffer.bytes": 10485760,

Similarly, please share your producer config as well. I'm thinking it may
be related to how your cluster is tuned.

Thanks,
Prabhjot


On Sat, Nov 28, 2015 at 3:54 PM, Andreas Flinck <
andreas.flinck@digitalroute.com> wrote:

> Great, thanks for the information! So it is definitely acks=all we want to
> go for. Unfortunately we run into an blocking issue in our production like
> test environment which we have not been able to find a solution for. So
> here it is, ANY idea on how we could possibly find a solution is very much
> appreciated!
>
> Environment:
> Kafka version: kafka_2.11-0.8.2.1
> 5 kafka brokers and 5 ZK on spread out on 5 hosts
> Using new producer (async)
>
> Topic:
> partitions=10
> replication-factor=4
> min.insync.replicas=2
>
> Default property values used for broker configs and producer.
>
> Scenario and problem:
> Incoming diameter data (10k TPS) is sent to 5 topics via 5 producers which
> is working great until we start another 5 producers sending to another 5
> topics with the same rate (10k). What happens then is that the producers
> sending to 2 of the topics fills up the buffer and the throughput becomes
> very low, with BufferExhaustedExceptions for most of the messages. When
> checking the latency for the problematic topics it becomes really high
> (around 150ms). Stopping the 5 producers that were started in the second
> round, the latency goes down to about 1 ms again and the buffer will go
> back to normal. The load is not that high, about 10MB/s, it is not even
> near disk bound.
> So the questions right now are, why do we get such high latency to
> specifically two topics when starting more producers, even though cpu and
> disk load looks unproblematic? And why two topics specifically, is there an
> order of what topics to prioritize when things get clogged for some reason?
>
> Sorry for the quite messy description, we are all kind of new at kafka
> here!
>
> BR
> Andreas
>
> > On 28 Nov 2015, at 09:26, Prabhjot Bharaj <pr...@gmail.com> wrote:
> >
> > Hi,
> >
> > This should help :)
> >
> > During my benchmarks, I noticed that if 5 node kafka cluster running 1
> > topic is given a continuous injection of 50GB in one shot (using a
> modified
> > producer performance script, which writes my custom data to kafka), the
> > last replica can sometimes lag and it used to catch up at a speed of 1GB
> in
> > 20-25 seconds. This lag increases if producer performance injects 200GB
> in
> > one shot.
> >
> > I'm not sure how it will behave with multiple topics.  it could have an
> > impact on the overall throughput (because more partitions will be alive
> on
> > the same broker thereby dividing the network usage), but I have to test
> it
> > in staging environment
> >
> > Regards,
> > Prabhjot
> >
> >> On Sat, Nov 28, 2015 at 12:10 PM, Gwen Shapira <gw...@confluent.io>
> wrote:
> >
> >> Hi,
> >>
> >> min.insync.replica is alive and well in 0.9 :)
> >>
> >> Normally, you will have 4 our of 4 replicas in sync. However if one of
> the
> >> replicas will fall behind, you will have 3 out of 4 in sync.
> >> If you set min.insync.replica = 3, produce requests will fail if the
> number
> >> on in-sync replicas fall below 3.
> >>
> >> I hope this helps.
> >>
> >> Gwen
> >>
> >> On Fri, Nov 27, 2015 at 9:43 PM, Prabhjot Bharaj <pr...@gmail.com>
> >
> >> wrote:
> >>
> >>> Hi Gwen,
> >>>
> >>> How about min.isr.replicas property?
> >>> Is it still valid in the new version 0.9 ?
> >>>
> >>> We could get 3 out of 4 replicas in sync if we set it's value to 3.
> >>> Correct?
> >>>
> >>> Thanks,
> >>> Prabhjot
> >>> On Nov 28, 2015 10:20 AM, "Gwen Shapira" <gw...@confluent.io> wrote:
> >>>
> >>>> In your scenario, you are receiving acks from 3 replicas while it is
> >>>> possible to have 4 in the ISR. This means that one replica can be up
> to
> >>>> 4000 messages (by default) behind others. If a leader crashes, there
> is
> >>> 33%
> >>>> chance this replica will become the new leader, thereby losing up to
> >> 4000
> >>>> messages.
> >>>>
> >>>> acks = all requires all ISR to ack as long as they are in the ISR,
> >>>> protecting you from this scenario (but leading to high latency if a
> >>> replica
> >>>> is hanging and is just about to drop out of the ISR).
> >>>>
> >>>> Also, note that in future versions acks > 1 was deprecated, to protect
> >>>> against such subtle mistakes.
> >>>>
> >>>> Gwen
> >>>>
> >>>> On Fri, Nov 27, 2015 at 12:28 AM, Andreas Flinck <
> >>>> andreas.flinck@digitalroute.com> wrote:
> >>>>
> >>>>> Hi all
> >>>>>
> >>>>> The reason why I need to know is that we have seen an issue when
> >> using
> >>>>> acks=all, forcing us to quickly find an alternative. I leave the
> >> issue
> >>>> out
> >>>>> of this post, but will probably come back to that!
> >>>>>
> >>>>> My question is about acks=all and min.insync.replicas property. Since
> >>> we
> >>>>> have found a workaround for an issue by using acks>1 instead of all
> >>>>> (absolutely no clue why at this moment), I would like to know what
> >>>> benefit
> >>>>> you get from e.g. acks=all and min.insync.replicas=3 instead of using
> >>>>> acks=3 in a 5 broker cluster and replication-factor of 4. To my
> >>>>> understanding you would get the exact level of durability and
> >> security
> >>>> from
> >>>>> using either of those settings. However, I suspect this is not quite
> >>> the
> >>>>> case from finding hints without proper explanation that acks=all is
> >>>>> preferred.
> >>>>>
> >>>>>
> >>>>> Regards
> >>>>> Andreas
> >>>>
> >>>
> >>
> >
> >
> >
> > --
> > ---------------------------------------------------------
> > "There are only 10 types of people in the world: Those who understand
> > binary, and those who don't"
>
>


--
---------------------------------------------------------
"There are only 10 types of people in the world: Those who understand
binary, and those who don't"


Re: SV: What is the benefit of using acks=all and minover e.g. acks=3

Posted by Prabhjot Bharaj <pr...@gmail.com>.
Hi,

Of all the parameters, num.replica.fetchers is the one worth raising;
keeping it higher, e.g. at 4, can help.
Please try it out and let us know if it worked.

Thanks,
Prabhjot
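
For concreteness, here are the broker-side suggestions from this thread written out as plain server.properties lines. This is only a sketch collecting the values proposed in the quoted messages below, not verified recommendations; anything not listed stays at its default.

# server.properties (per broker) -- values as suggested in this thread
num.replica.fetchers=4
replica.fetch.wait.max.ms=500
num.recovery.threads.per.data.dir=4
num.network.threads=8
socket.request.max.bytes=104857600
socket.receive.buffer.bytes=10485760
socket.send.buffer.bytes=10485760
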
On Nov 28, 2015 4:59 PM, "Andreas Flinck" <an...@digitalroute.com>
wrote:

> Hi!
>
> Here are our settings for the properties requested:
>
> num.network.threads=3
> socket.request.max.bytes=104857600
> socket.receive.buffer.bytes=1048576
> socket.send.buffer.bytes=1048576
>
> The following properties we don't set at all, so I guess they will default
> according to the documentation (within parenthesis):
>
> "num.replica.fetchers": (1)
> "replica.fetch.wait.max.ms": (500),
> "num.recovery.threads.per.data.dir": (1)
>
> The producer properties we explicitly set are the following;
>
> block.on.buffer.full=false
> client.id=MZ
> max.request.size=1048576
> acks=all
> retries=0
> timeout.ms=30000
> buffer.memory=67108864
> metadata.fetch.timeout.ms=3000
>
> Do let me know what you think about it! We are currently setting up some
> tests using the broker properties that you suggested.
>
> Regards
> Andreas
>
>
>
>
>
>
> ________________________________________
> From: Prabhjot Bharaj <pr...@gmail.com>
> Sent: 28 November 2015 11:37
> To: users@kafka.apache.org
> Subject: Re: What is the benefit of using acks=all and minover e.g. acks=3
>
> Hi,
>
> Clogging can happen if, as seems in your case, the requests are bounded by
> network.
> Just to confirm your configurations, does your broker configuration look
> like this?? :-
>
> "num.replica.fetchers": 4,
> "replica.fetch.wait.max.ms": 500,
> "num.recovery.threads.per.data.dir": 4,
>
>
> "num.network.threads": 8,
> "socket.request.max.bytes": 104857600,
> "socket.receive.buffer.bytes": 10485760,
> "socket.send.buffer.bytes": 10485760,
>
> Similarly, please share your producer config as well. I'm thinking may be
> it is related to tuning your cluster.
>
> Thanks,
> Prabhjot
>
>
> On Sat, Nov 28, 2015 at 3:54 PM, Andreas Flinck <
> andreas.flinck@digitalroute.com> wrote:
>
> > Great, thanks for the information! So it is definitely acks=all we want
> to
> > go for. Unfortunately we run into an blocking issue in our production
> like
> > test environment which we have not been able to find a solution for. So
> > here it is, ANY idea on how we could possibly find a solution is very
> much
> > appreciated!
> >
> > Environment:
> > Kafka version: kafka_2.11-0.8.2.1
> > 5 kafka brokers and 5 ZK on spread out on 5 hosts
> > Using new producer (async)
> >
> > Topic:
> > partitions=10
> > replication-factor=4
> > min.insync.replicas=2
> >
> > Default property values used for broker configs and producer.
> >
> > Scenario and problem:
> > Incoming diameter data (10k TPS) is sent to 5 topics via 5 producers
> which
> > is working great until we start another 5 producers sending to another 5
> > topics with the same rate (10k). What happens then is that the producers
> > sending to 2 of the topics fills up the buffer and the throughput becomes
> > very low, with BufferExhaustedExceptions for most of the messages. When
> > checking the latency for the problematic topics it becomes really high
> > (around 150ms). Stopping the 5 producers that were started in the second
> > round, the latency goes down to about 1 ms again and the buffer will go
> > back to normal. The load is not that high, about 10MB/s, it is not even
> > near disk bound.
> > So the questions right now are, why do we get such high latency to
> > specifically two topics when starting more producers, even though cpu and
> > disk load looks unproblematic? And why two topics specifically, is there
> an
> > order of what topics to prioritize when things get clogged for some
> reason?
> >
> > Sorry for the quite messy description, we are all kind of new at kafka
> > here!
> >
> > BR
> > Andreas
> >
> > > On 28 Nov 2015, at 09:26, Prabhjot Bharaj <pr...@gmail.com>
> wrote:
> > >
> > > Hi,
> > >
> > > This should help :)
> > >
> > > During my benchmarks, I noticed that if 5 node kafka cluster running 1
> > > topic is given a continuous injection of 50GB in one shot (using a
> > modified
> > > producer performance script, which writes my custom data to kafka), the
> > > last replica can sometimes lag and it used to catch up at a speed of
> 1GB
> > in
> > > 20-25 seconds. This lag increases if producer performance injects 200GB
> > in
> > > one shot.
> > >
> > > I'm not sure how it will behave with multiple topics.  it could have an
> > > impact on the overall throughput (because more partitions will be alive
> > on
> > > the same broker thereby dividing the network usage), but I have to test
> > it
> > > in staging environment
> > >
> > > Regards,
> > > Prabhjot
> > >
> > > On Sat, Nov 28, 2015 at 12:10 PM, Gwen Shapira <gw...@confluent.io>
> > wrote:
> > >
> > >> Hi,
> > >>
> > >> min.insync.replica is alive and well in 0.9 :)
> > >>
> > >> Normally, you will have 4 our of 4 replicas in sync. However if one of
> > the
> > >> replicas will fall behind, you will have 3 out of 4 in sync.
> > >> If you set min.insync.replica = 3, produce requests will fail if the
> > number
> > >> on in-sync replicas fall below 3.
> > >>
> > >> I hope this helps.
> > >>
> > >> Gwen
> > >>
> > >> On Fri, Nov 27, 2015 at 9:43 PM, Prabhjot Bharaj <
> prabhbharaj@gmail.com
> > >
> > >> wrote:
> > >>
> > >>> Hi Gwen,
> > >>>
> > >>> How about min.isr.replicas property?
> > >>> Is it still valid in the new version 0.9 ?
> > >>>
> > >>> We could get 3 out of 4 replicas in sync if we set it's value to 3.
> > >>> Correct?
> > >>>
> > >>> Thanks,
> > >>> Prabhjot
> > >>> On Nov 28, 2015 10:20 AM, "Gwen Shapira" <gw...@confluent.io> wrote:
> > >>>
> > >>>> In your scenario, you are receiving acks from 3 replicas while it is
> > >>>> possible to have 4 in the ISR. This means that one replica can be up
> > to
> > >>>> 4000 messages (by default) behind others. If a leader crashes, there
> > is
> > >>> 33%
> > >>>> chance this replica will become the new leader, thereby losing up to
> > >> 4000
> > >>>> messages.
> > >>>>
> > >>>> acks = all requires all ISR to ack as long as they are in the ISR,
> > >>>> protecting you from this scenario (but leading to high latency if a
> > >>> replica
> > >>>> is hanging and is just about to drop out of the ISR).
> > >>>>
> > >>>> Also, note that in future versions acks > 1 was deprecated, to
> protect
> > >>>> against such subtle mistakes.
> > >>>>
> > >>>> Gwen
> > >>>>
> > >>>> On Fri, Nov 27, 2015 at 12:28 AM, Andreas Flinck <
> > >>>> andreas.flinck@digitalroute.com> wrote:
> > >>>>
> > >>>>> Hi all
> > >>>>>
> > >>>>> The reason why I need to know is that we have seen an issue when
> > >> using
> > >>>>> acks=all, forcing us to quickly find an alternative. I leave the
> > >> issue
> > >>>> out
> > >>>>> of this post, but will probably come back to that!
> > >>>>>
> > >>>>> My question is about acks=all and min.insync.replicas property.
> Since
> > >>> we
> > >>>>> have found a workaround for an issue by using acks>1 instead of all
> > >>>>> (absolutely no clue why at this moment), I would like to know what
> > >>>> benefit
> > >>>>> you get from e.g. acks=all and min.insync.replicas=3 instead of
> using
> > >>>>> acks=3 in a 5 broker cluster and replication-factor of 4. To my
> > >>>>> understanding you would get the exact level of durability and
> > >> security
> > >>>> from
> > >>>>> using either of those settings. However, I suspect this is not
> quite
> > >>> the
> > >>>>> case from finding hints without proper explanation that acks=all is
> > >>>>> preferred.
> > >>>>>
> > >>>>>
> > >>>>> Regards
> > >>>>> Andreas
> > >>>>
> > >>>
> > >>
> > >
> > >
> > >
> > > --
> > > ---------------------------------------------------------
> > > "There are only 10 types of people in the world: Those who understand
> > > binary, and those who don't"
> >
> >
>
>
> --
> ---------------------------------------------------------
> "There are only 10 types of people in the world: Those who understand
> binary, and those who don't"

SV: What is the benefit of using acks=all and minover e.g. acks=3

Posted by Andreas Flinck <an...@digitalroute.com>.
Hi!

Here are our settings for the properties requested:

num.network.threads=3
socket.request.max.bytes=104857600
socket.receive.buffer.bytes=1048576
socket.send.buffer.bytes=1048576

The following properties we don't set at all, so I guess they will take the defaults from the documentation (shown in parentheses):

"num.replica.fetchers": (1)
"replica.fetch.wait.max.ms": (500),
"num.recovery.threads.per.data.dir": (1)

The producer properties we explicitly set are the following:

block.on.buffer.full=false
client.id=MZ
max.request.size=1048576
acks=all
retries=0
timeout.ms=30000
buffer.memory=67108864
metadata.fetch.timeout.ms=3000

Do let me know what you think about it! We are currently setting up some tests using the broker properties that you suggested.

Regards
Andreas
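
A side note on the producer settings above: with block.on.buffer.full=false, the 0.8.2 Java producer throws BufferExhaustedException from send() as soon as the 64 MB buffer.memory fills up, instead of blocking, which matches the BufferExhaustedExceptions described in the test scenario quoted further down. A minimal sketch, with broker address, topic and payload as placeholders:

import java.util.Properties;
import org.apache.kafka.clients.producer.BufferExhaustedException;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class NonBlockingSendSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");    // placeholder
        props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("acks", "all");
        props.put("block.on.buffer.full", "false");         // throw instead of blocking when full
        props.put("buffer.memory", "67108864");             // 64 MB, as in the config above

        KafkaProducer<byte[], byte[]> producer = new KafkaProducer<byte[], byte[]>(props);
        try {
            // If records are produced faster than the brokers acknowledge them, the
            // accumulator fills up and send() fails immediately instead of waiting.
            producer.send(new ProducerRecord<byte[], byte[]>("example-topic", new byte[1024]));
        } catch (BufferExhaustedException e) {
            // Record dropped; the application must decide whether to retry, throttle or discard.
        } finally {
            producer.close();
        }
    }
}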






________________________________________
From: Prabhjot Bharaj <pr...@gmail.com>
Sent: 28 November 2015 11:37
To: users@kafka.apache.org
Subject: Re: What is the benefit of using acks=all and minover e.g. acks=3

Hi,

Clogging can happen if, as seems to be the case for you, the requests are
network-bound.
Just to confirm your configuration, does your broker configuration look
like this?

"num.replica.fetchers": 4,
"replica.fetch.wait.max.ms": 500,
"num.recovery.threads.per.data.dir": 4,


"num.network.threads": 8,
"socket.request.max.bytes": 104857600,
"socket.receive.buffer.bytes": 10485760,
"socket.send.buffer.bytes": 10485760,

Similarly, please share your producer config as well. I'm thinking it may
be related to how your cluster is tuned.

Thanks,
Prabhjot


On Sat, Nov 28, 2015 at 3:54 PM, Andreas Flinck <
andreas.flinck@digitalroute.com> wrote:

> Great, thanks for the information! So it is definitely acks=all we want to
> go for. Unfortunately we run into an blocking issue in our production like
> test environment which we have not been able to find a solution for. So
> here it is, ANY idea on how we could possibly find a solution is very much
> appreciated!
>
> Environment:
> Kafka version: kafka_2.11-0.8.2.1
> 5 kafka brokers and 5 ZK on spread out on 5 hosts
> Using new producer (async)
>
> Topic:
> partitions=10
> replication-factor=4
> min.insync.replicas=2
>
> Default property values used for broker configs and producer.
>
> Scenario and problem:
> Incoming diameter data (10k TPS) is sent to 5 topics via 5 producers which
> is working great until we start another 5 producers sending to another 5
> topics with the same rate (10k). What happens then is that the producers
> sending to 2 of the topics fills up the buffer and the throughput becomes
> very low, with BufferExhaustedExceptions for most of the messages. When
> checking the latency for the problematic topics it becomes really high
> (around 150ms). Stopping the 5 producers that were started in the second
> round, the latency goes down to about 1 ms again and the buffer will go
> back to normal. The load is not that high, about 10MB/s, it is not even
> near disk bound.
> So the questions right now are, why do we get such high latency to
> specifically two topics when starting more producers, even though cpu and
> disk load looks unproblematic? And why two topics specifically, is there an
> order of what topics to prioritize when things get clogged for some reason?
>
> Sorry for the quite messy description, we are all kind of new at kafka
> here!
>
> BR
> Andreas
>
> > On 28 Nov 2015, at 09:26, Prabhjot Bharaj <pr...@gmail.com> wrote:
> >
> > Hi,
> >
> > This should help :)
> >
> > During my benchmarks, I noticed that if 5 node kafka cluster running 1
> > topic is given a continuous injection of 50GB in one shot (using a
> modified
> > producer performance script, which writes my custom data to kafka), the
> > last replica can sometimes lag and it used to catch up at a speed of 1GB
> in
> > 20-25 seconds. This lag increases if producer performance injects 200GB
> in
> > one shot.
> >
> > I'm not sure how it will behave with multiple topics.  it could have an
> > impact on the overall throughput (because more partitions will be alive
> on
> > the same broker thereby dividing the network usage), but I have to test
> it
> > in staging environment
> >
> > Regards,
> > Prabhjot
> >
> > On Sat, Nov 28, 2015 at 12:10 PM, Gwen Shapira <gw...@confluent.io>
> wrote:
> >
> >> Hi,
> >>
> >> min.insync.replica is alive and well in 0.9 :)
> >>
> >> Normally, you will have 4 our of 4 replicas in sync. However if one of
> the
> >> replicas will fall behind, you will have 3 out of 4 in sync.
> >> If you set min.insync.replica = 3, produce requests will fail if the
> number
> >> on in-sync replicas fall below 3.
> >>
> >> I hope this helps.
> >>
> >> Gwen
> >>
> >> On Fri, Nov 27, 2015 at 9:43 PM, Prabhjot Bharaj <prabhbharaj@gmail.com
> >
> >> wrote:
> >>
> >>> Hi Gwen,
> >>>
> >>> How about min.isr.replicas property?
> >>> Is it still valid in the new version 0.9 ?
> >>>
> >>> We could get 3 out of 4 replicas in sync if we set it's value to 3.
> >>> Correct?
> >>>
> >>> Thanks,
> >>> Prabhjot
> >>> On Nov 28, 2015 10:20 AM, "Gwen Shapira" <gw...@confluent.io> wrote:
> >>>
> >>>> In your scenario, you are receiving acks from 3 replicas while it is
> >>>> possible to have 4 in the ISR. This means that one replica can be up
> to
> >>>> 4000 messages (by default) behind others. If a leader crashes, there
> is
> >>> 33%
> >>>> chance this replica will become the new leader, thereby losing up to
> >> 4000
> >>>> messages.
> >>>>
> >>>> acks = all requires all ISR to ack as long as they are in the ISR,
> >>>> protecting you from this scenario (but leading to high latency if a
> >>> replica
> >>>> is hanging and is just about to drop out of the ISR).
> >>>>
> >>>> Also, note that in future versions acks > 1 was deprecated, to protect
> >>>> against such subtle mistakes.
> >>>>
> >>>> Gwen
> >>>>
> >>>> On Fri, Nov 27, 2015 at 12:28 AM, Andreas Flinck <
> >>>> andreas.flinck@digitalroute.com> wrote:
> >>>>
> >>>>> Hi all
> >>>>>
> >>>>> The reason why I need to know is that we have seen an issue when
> >> using
> >>>>> acks=all, forcing us to quickly find an alternative. I leave the
> >> issue
> >>>> out
> >>>>> of this post, but will probably come back to that!
> >>>>>
> >>>>> My question is about acks=all and min.insync.replicas property. Since
> >>> we
> >>>>> have found a workaround for an issue by using acks>1 instead of all
> >>>>> (absolutely no clue why at this moment), I would like to know what
> >>>> benefit
> >>>>> you get from e.g. acks=all and min.insync.replicas=3 instead of using
> >>>>> acks=3 in a 5 broker cluster and replication-factor of 4. To my
> >>>>> understanding you would get the exact level of durability and
> >> security
> >>>> from
> >>>>> using either of those settings. However, I suspect this is not quite
> >>> the
> >>>>> case from finding hints without proper explanation that acks=all is
> >>>>> preferred.
> >>>>>
> >>>>>
> >>>>> Regards
> >>>>> Andreas
> >>>>
> >>>
> >>
> >
> >
> >
> > --
> > ---------------------------------------------------------
> > "There are only 10 types of people in the world: Those who understand
> > binary, and those who don't"
>
>


--
---------------------------------------------------------
"There are only 10 types of people in the world: Those who understand
binary, and those who don't"

Re: What is the benefit of using acks=all and minover e.g. acks=3

Posted by Prabhjot Bharaj <pr...@gmail.com>.
Hi,

Clogging can happen if, as seems to be the case for you, the requests are
network-bound.
Just to confirm your configuration, does your broker configuration look
like this?

"num.replica.fetchers": 4,
"replica.fetch.wait.max.ms": 500,
"num.recovery.threads.per.data.dir": 4,


"num.network.threads": 8,
"socket.request.max.bytes": 104857600,
"socket.receive.buffer.bytes": 10485760,
"socket.send.buffer.bytes": 10485760,

Similarly, please share your producer config as well. I'm thinking it may
be related to how your cluster is tuned.

Thanks,
Prabhjot


On Sat, Nov 28, 2015 at 3:54 PM, Andreas Flinck <
andreas.flinck@digitalroute.com> wrote:

> Great, thanks for the information! So it is definitely acks=all we want to
> go for. Unfortunately we run into an blocking issue in our production like
> test environment which we have not been able to find a solution for. So
> here it is, ANY idea on how we could possibly find a solution is very much
> appreciated!
>
> Environment:
> Kafka version: kafka_2.11-0.8.2.1
> 5 kafka brokers and 5 ZK on spread out on 5 hosts
> Using new producer (async)
>
> Topic:
> partitions=10
> replication-factor=4
> min.insync.replicas=2
>
> Default property values used for broker configs and producer.
>
> Scenario and problem:
> Incoming diameter data (10k TPS) is sent to 5 topics via 5 producers which
> is working great until we start another 5 producers sending to another 5
> topics with the same rate (10k). What happens then is that the producers
> sending to 2 of the topics fills up the buffer and the throughput becomes
> very low, with BufferExhaustedExceptions for most of the messages. When
> checking the latency for the problematic topics it becomes really high
> (around 150ms). Stopping the 5 producers that were started in the second
> round, the latency goes down to about 1 ms again and the buffer will go
> back to normal. The load is not that high, about 10MB/s, it is not even
> near disk bound.
> So the questions right now are, why do we get such high latency to
> specifically two topics when starting more producers, even though cpu and
> disk load looks unproblematic? And why two topics specifically, is there an
> order of what topics to prioritize when things get clogged for some reason?
>
> Sorry for the quite messy description, we are all kind of new at kafka
> here!
>
> BR
> Andreas
>
> > On 28 Nov 2015, at 09:26, Prabhjot Bharaj <pr...@gmail.com> wrote:
> >
> > Hi,
> >
> > This should help :)
> >
> > During my benchmarks, I noticed that if 5 node kafka cluster running 1
> > topic is given a continuous injection of 50GB in one shot (using a
> modified
> > producer performance script, which writes my custom data to kafka), the
> > last replica can sometimes lag and it used to catch up at a speed of 1GB
> in
> > 20-25 seconds. This lag increases if producer performance injects 200GB
> in
> > one shot.
> >
> > I'm not sure how it will behave with multiple topics.  it could have an
> > impact on the overall throughput (because more partitions will be alive
> on
> > the same broker thereby dividing the network usage), but I have to test
> it
> > in staging environment
> >
> > Regards,
> > Prabhjot
> >
> > On Sat, Nov 28, 2015 at 12:10 PM, Gwen Shapira <gw...@confluent.io>
> wrote:
> >
> >> Hi,
> >>
> >> min.insync.replica is alive and well in 0.9 :)
> >>
> >> Normally, you will have 4 our of 4 replicas in sync. However if one of
> the
> >> replicas will fall behind, you will have 3 out of 4 in sync.
> >> If you set min.insync.replica = 3, produce requests will fail if the
> number
> >> on in-sync replicas fall below 3.
> >>
> >> I hope this helps.
> >>
> >> Gwen
> >>
> >> On Fri, Nov 27, 2015 at 9:43 PM, Prabhjot Bharaj <prabhbharaj@gmail.com
> >
> >> wrote:
> >>
> >>> Hi Gwen,
> >>>
> >>> How about min.isr.replicas property?
> >>> Is it still valid in the new version 0.9 ?
> >>>
> >>> We could get 3 out of 4 replicas in sync if we set it's value to 3.
> >>> Correct?
> >>>
> >>> Thanks,
> >>> Prabhjot
> >>> On Nov 28, 2015 10:20 AM, "Gwen Shapira" <gw...@confluent.io> wrote:
> >>>
> >>>> In your scenario, you are receiving acks from 3 replicas while it is
> >>>> possible to have 4 in the ISR. This means that one replica can be up
> to
> >>>> 4000 messages (by default) behind others. If a leader crashes, there
> is
> >>> 33%
> >>>> chance this replica will become the new leader, thereby losing up to
> >> 4000
> >>>> messages.
> >>>>
> >>>> acks = all requires all ISR to ack as long as they are in the ISR,
> >>>> protecting you from this scenario (but leading to high latency if a
> >>> replica
> >>>> is hanging and is just about to drop out of the ISR).
> >>>>
> >>>> Also, note that in future versions acks > 1 was deprecated, to protect
> >>>> against such subtle mistakes.
> >>>>
> >>>> Gwen
> >>>>
> >>>> On Fri, Nov 27, 2015 at 12:28 AM, Andreas Flinck <
> >>>> andreas.flinck@digitalroute.com> wrote:
> >>>>
> >>>>> Hi all
> >>>>>
> >>>>> The reason why I need to know is that we have seen an issue when
> >> using
> >>>>> acks=all, forcing us to quickly find an alternative. I leave the
> >> issue
> >>>> out
> >>>>> of this post, but will probably come back to that!
> >>>>>
> >>>>> My question is about acks=all and min.insync.replicas property. Since
> >>> we
> >>>>> have found a workaround for an issue by using acks>1 instead of all
> >>>>> (absolutely no clue why at this moment), I would like to know what
> >>>> benefit
> >>>>> you get from e.g. acks=all and min.insync.replicas=3 instead of using
> >>>>> acks=3 in a 5 broker cluster and replication-factor of 4. To my
> >>>>> understanding you would get the exact level of durability and
> >> security
> >>>> from
> >>>>> using either of those settings. However, I suspect this is not quite
> >>> the
> >>>>> case from finding hints without proper explanation that acks=all is
> >>>>> preferred.
> >>>>>
> >>>>>
> >>>>> Regards
> >>>>> Andreas
> >>>>
> >>>
> >>
> >
> >
> >
> > --
> > ---------------------------------------------------------
> > "There are only 10 types of people in the world: Those who understand
> > binary, and those who don't"
>
>


-- 
---------------------------------------------------------
"There are only 10 types of people in the world: Those who understand
binary, and those who don't"

Re: What is the benefit of using acks=all and minover e.g. acks=3

Posted by Andreas Flinck <an...@digitalroute.com>.
Great, thanks for the information! So it is definitely acks=all we want to go for. Unfortunately, we have run into a blocking issue in our production-like test environment which we have not been able to find a solution for. So here it is; ANY idea on how we could possibly find a solution is very much appreciated!

Environment:
Kafka version: kafka_2.11-0.8.2.1
5 Kafka brokers and 5 ZK nodes spread out over 5 hosts
Using new producer (async)

Topic:
partitions=10
replication-factor=4
min.insync.replicas=2

Default property values used for broker configs and producer.

Scenario and problem:
Incoming Diameter data (10k TPS) is sent to 5 topics via 5 producers, which works great until we start another 5 producers sending to another 5 topics at the same rate (10k). What happens then is that the producers sending to 2 of the topics fill up their buffers and the throughput becomes very low, with BufferExhaustedExceptions for most of the messages. When checking the latency for the problematic topics, it becomes really high (around 150 ms). Stopping the 5 producers that were started in the second round, the latency goes down to about 1 ms again and the buffer goes back to normal. The load is not that high, about 10 MB/s; it is not even near disk-bound.
So the questions right now are: why do we get such high latency on specifically two topics when starting more producers, even though CPU and disk load look unproblematic? And why two topics specifically: is there an order in which topics are prioritized when things get clogged for some reason?

Sorry for the quite messy description; we are all kind of new to Kafka here!

BR
Andreas

> On 28 Nov 2015, at 09:26, Prabhjot Bharaj <pr...@gmail.com> wrote:
> 
> Hi,
> 
> This should help :)
> 
> During my benchmarks, I noticed that if 5 node kafka cluster running 1
> topic is given a continuous injection of 50GB in one shot (using a modified
> producer performance script, which writes my custom data to kafka), the
> last replica can sometimes lag and it used to catch up at a speed of 1GB in
> 20-25 seconds. This lag increases if producer performance injects 200GB in
> one shot.
> 
> I'm not sure how it will behave with multiple topics.  it could have an
> impact on the overall throughput (because more partitions will be alive on
> the same broker thereby dividing the network usage), but I have to test it
> in staging environment
> 
> Regards,
> Prabhjot
> 
> On Sat, Nov 28, 2015 at 12:10 PM, Gwen Shapira <gw...@confluent.io> wrote:
> 
>> Hi,
>> 
>> min.insync.replica is alive and well in 0.9 :)
>> 
>> Normally, you will have 4 our of 4 replicas in sync. However if one of the
>> replicas will fall behind, you will have 3 out of 4 in sync.
>> If you set min.insync.replica = 3, produce requests will fail if the number
>> on in-sync replicas fall below 3.
>> 
>> I hope this helps.
>> 
>> Gwen
>> 
>> On Fri, Nov 27, 2015 at 9:43 PM, Prabhjot Bharaj <pr...@gmail.com>
>> wrote:
>> 
>>> Hi Gwen,
>>> 
>>> How about min.isr.replicas property?
>>> Is it still valid in the new version 0.9 ?
>>> 
>>> We could get 3 out of 4 replicas in sync if we set it's value to 3.
>>> Correct?
>>> 
>>> Thanks,
>>> Prabhjot
>>> On Nov 28, 2015 10:20 AM, "Gwen Shapira" <gw...@confluent.io> wrote:
>>> 
>>>> In your scenario, you are receiving acks from 3 replicas while it is
>>>> possible to have 4 in the ISR. This means that one replica can be up to
>>>> 4000 messages (by default) behind others. If a leader crashes, there is
>>> 33%
>>>> chance this replica will become the new leader, thereby losing up to
>> 4000
>>>> messages.
>>>> 
>>>> acks = all requires all ISR to ack as long as they are in the ISR,
>>>> protecting you from this scenario (but leading to high latency if a
>>> replica
>>>> is hanging and is just about to drop out of the ISR).
>>>> 
>>>> Also, note that in future versions acks > 1 was deprecated, to protect
>>>> against such subtle mistakes.
>>>> 
>>>> Gwen
>>>> 
>>>> On Fri, Nov 27, 2015 at 12:28 AM, Andreas Flinck <
>>>> andreas.flinck@digitalroute.com> wrote:
>>>> 
>>>>> Hi all
>>>>> 
>>>>> The reason why I need to know is that we have seen an issue when
>> using
>>>>> acks=all, forcing us to quickly find an alternative. I leave the
>> issue
>>>> out
>>>>> of this post, but will probably come back to that!
>>>>> 
>>>>> My question is about acks=all and min.insync.replicas property. Since
>>> we
>>>>> have found a workaround for an issue by using acks>1 instead of all
>>>>> (absolutely no clue why at this moment), I would like to know what
>>>> benefit
>>>>> you get from e.g. acks=all and min.insync.replicas=3 instead of using
>>>>> acks=3 in a 5 broker cluster and replication-factor of 4. To my
>>>>> understanding you would get the exact level of durability and
>> security
>>>> from
>>>>> using either of those settings. However, I suspect this is not quite
>>> the
>>>>> case from finding hints without proper explanation that acks=all is
>>>>> preferred.
>>>>> 
>>>>> 
>>>>> Regards
>>>>> Andreas
>>>> 
>>> 
>> 
> 
> 
> 
> -- 
> ---------------------------------------------------------
> "There are only 10 types of people in the world: Those who understand
> binary, and those who don't"


Re: What is the benefit of using acks=all and minover e.g. acks=3

Posted by Prabhjot Bharaj <pr...@gmail.com>.
Hi,

This should help :)

During my benchmarks, I noticed that if a 5-node Kafka cluster running 1
topic is given a continuous injection of 50GB in one shot (using a modified
producer performance script, which writes my custom data to Kafka), the
last replica can sometimes lag, and it would catch up at a rate of roughly
1GB per 20-25 seconds. This lag increases if the producer performance script
injects 200GB in one shot.

I'm not sure how it will behave with multiple topics. It could have an
impact on the overall throughput (because more partitions will be alive on
the same broker, thereby dividing the network usage), but I would have to
test it in a staging environment.

Regards,
Prabhjot

On Sat, Nov 28, 2015 at 12:10 PM, Gwen Shapira <gw...@confluent.io> wrote:

> Hi,
>
> min.insync.replica is alive and well in 0.9 :)
>
> Normally, you will have 4 our of 4 replicas in sync. However if one of the
> replicas will fall behind, you will have 3 out of 4 in sync.
> If you set min.insync.replica = 3, produce requests will fail if the number
> on in-sync replicas fall below 3.
>
> I hope this helps.
>
> Gwen
>
> On Fri, Nov 27, 2015 at 9:43 PM, Prabhjot Bharaj <pr...@gmail.com>
> wrote:
>
> > Hi Gwen,
> >
> > How about min.isr.replicas property?
> > Is it still valid in the new version 0.9 ?
> >
> > We could get 3 out of 4 replicas in sync if we set it's value to 3.
> > Correct?
> >
> > Thanks,
> > Prabhjot
> > On Nov 28, 2015 10:20 AM, "Gwen Shapira" <gw...@confluent.io> wrote:
> >
> > > In your scenario, you are receiving acks from 3 replicas while it is
> > > possible to have 4 in the ISR. This means that one replica can be up to
> > > 4000 messages (by default) behind others. If a leader crashes, there is
> > 33%
> > > chance this replica will become the new leader, thereby losing up to
> 4000
> > > messages.
> > >
> > > acks = all requires all ISR to ack as long as they are in the ISR,
> > > protecting you from this scenario (but leading to high latency if a
> > replica
> > > is hanging and is just about to drop out of the ISR).
> > >
> > > Also, note that in future versions acks > 1 was deprecated, to protect
> > > against such subtle mistakes.
> > >
> > > Gwen
> > >
> > > On Fri, Nov 27, 2015 at 12:28 AM, Andreas Flinck <
> > > andreas.flinck@digitalroute.com> wrote:
> > >
> > > > Hi all
> > > >
> > > > The reason why I need to know is that we have seen an issue when
> using
> > > > acks=all, forcing us to quickly find an alternative. I leave the
> issue
> > > out
> > > > of this post, but will probably come back to that!
> > > >
> > > > My question is about acks=all and min.insync.replicas property. Since
> > we
> > > > have found a workaround for an issue by using acks>1 instead of all
> > > > (absolutely no clue why at this moment), I would like to know what
> > > benefit
> > > > you get from e.g. acks=all and min.insync.replicas=3 instead of using
> > > > acks=3 in a 5 broker cluster and replication-factor of 4. To my
> > > > understanding you would get the exact level of durability and
> security
> > > from
> > > > using either of those settings. However, I suspect this is not quite
> > the
> > > > case from finding hints without proper explanation that acks=all is
> > > > preferred.
> > > >
> > > >
> > > > Regards
> > > > Andreas
> > >
> >
>



-- 
---------------------------------------------------------
"There are only 10 types of people in the world: Those who understand
binary, and those who don't"

Re: What is the benefit of using acks=all and minover e.g. acks=3

Posted by Gwen Shapira <gw...@confluent.io>.
Hi,

min.insync.replicas is alive and well in 0.9 :)

Normally, you will have 4 out of 4 replicas in sync. However, if one of the
replicas falls behind, you will have 3 out of 4 in sync.
If you set min.insync.replicas = 3, produce requests will fail if the number
of in-sync replicas falls below 3.

I hope this helps.

Gwen
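
As a sketch of what that failure looks like from the producer side: with acks=all, a write made while fewer than min.insync.replicas replicas are in sync completes with a NotEnoughReplicasException. Class name, topic and broker address below are placeholders of mine.

import java.util.Properties;
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.errors.NotEnoughReplicasException;

public class MinIsrSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // placeholder
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");   // min.insync.replicas is only enforced for acks=all/-1

        KafkaProducer<String, String> producer = new KafkaProducer<String, String>(props);
        producer.send(new ProducerRecord<String, String>("test-topic", "value"),
                new Callback() {
                    public void onCompletion(RecordMetadata metadata, Exception exception) {
                        if (exception instanceof NotEnoughReplicasException) {
                            // Fewer replicas than min.insync.replicas are in sync, so the
                            // broker rejected the write rather than under-replicating it.
                        }
                    }
                });
        producer.close();
    }
}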

On Fri, Nov 27, 2015 at 9:43 PM, Prabhjot Bharaj <pr...@gmail.com>
wrote:

> Hi Gwen,
>
> How about min.isr.replicas property?
> Is it still valid in the new version 0.9 ?
>
> We could get 3 out of 4 replicas in sync if we set it's value to 3.
> Correct?
>
> Thanks,
> Prabhjot
> On Nov 28, 2015 10:20 AM, "Gwen Shapira" <gw...@confluent.io> wrote:
>
> > In your scenario, you are receiving acks from 3 replicas while it is
> > possible to have 4 in the ISR. This means that one replica can be up to
> > 4000 messages (by default) behind others. If a leader crashes, there is
> 33%
> > chance this replica will become the new leader, thereby losing up to 4000
> > messages.
> >
> > acks = all requires all ISR to ack as long as they are in the ISR,
> > protecting you from this scenario (but leading to high latency if a
> replica
> > is hanging and is just about to drop out of the ISR).
> >
> > Also, note that in future versions acks > 1 was deprecated, to protect
> > against such subtle mistakes.
> >
> > Gwen
> >
> > On Fri, Nov 27, 2015 at 12:28 AM, Andreas Flinck <
> > andreas.flinck@digitalroute.com> wrote:
> >
> > > Hi all
> > >
> > > The reason why I need to know is that we have seen an issue when using
> > > acks=all, forcing us to quickly find an alternative. I leave the issue
> > out
> > > of this post, but will probably come back to that!
> > >
> > > My question is about acks=all and min.insync.replicas property. Since
> we
> > > have found a workaround for an issue by using acks>1 instead of all
> > > (absolutely no clue why at this moment), I would like to know what
> > benefit
> > > you get from e.g. acks=all and min.insync.replicas=3 instead of using
> > > acks=3 in a 5 broker cluster and replication-factor of 4. To my
> > > understanding you would get the exact level of durability and security
> > from
> > > using either of those settings. However, I suspect this is not quite
> the
> > > case from finding hints without proper explanation that acks=all is
> > > preferred.
> > >
> > >
> > > Regards
> > > Andreas
> >
>

Re: What is the benefit of using acks=all and minover e.g. acks=3

Posted by Prabhjot Bharaj <pr...@gmail.com>.
Hi Gwen,

How about the min.isr.replicas property?
Is it still valid in the new version, 0.9?

We could get 3 out of 4 replicas in sync if we set its value to 3. Correct?

Thanks,
Prabhjot
On Nov 28, 2015 10:20 AM, "Gwen Shapira" <gw...@confluent.io> wrote:

> In your scenario, you are receiving acks from 3 replicas while it is
> possible to have 4 in the ISR. This means that one replica can be up to
> 4000 messages (by default) behind others. If a leader crashes, there is 33%
> chance this replica will become the new leader, thereby losing up to 4000
> messages.
>
> acks = all requires all ISR to ack as long as they are in the ISR,
> protecting you from this scenario (but leading to high latency if a replica
> is hanging and is just about to drop out of the ISR).
>
> Also, note that in future versions acks > 1 was deprecated, to protect
> against such subtle mistakes.
>
> Gwen
>
> On Fri, Nov 27, 2015 at 12:28 AM, Andreas Flinck <
> andreas.flinck@digitalroute.com> wrote:
>
> > Hi all
> >
> > The reason why I need to know is that we have seen an issue when using
> > acks=all, forcing us to quickly find an alternative. I leave the issue
> out
> > of this post, but will probably come back to that!
> >
> > My question is about acks=all and min.insync.replicas property. Since we
> > have found a workaround for an issue by using acks>1 instead of all
> > (absolutely no clue why at this moment), I would like to know what
> benefit
> > you get from e.g. acks=all and min.insync.replicas=3 instead of using
> > acks=3 in a 5 broker cluster and replication-factor of 4. To my
> > understanding you would get the exact level of durability and security
> from
> > using either of those settings. However, I suspect this is not quite the
> > case from finding hints without proper explanation that acks=all is
> > preferred.
> >
> >
> > Regards
> > Andreas
>

Re: What is the benefit of using acks=all and minover e.g. acks=3

Posted by Gwen Shapira <gw...@confluent.io>.
In your scenario, you are receiving acks from 3 replicas while it is
possible to have 4 in the ISR. This means that one replica can be up to
4000 messages (by default) behind the others. If the leader crashes, there is
a 33% chance this replica will become the new leader, thereby losing up to
4000 messages.

acks=all requires acks from all replicas for as long as they are in the ISR,
protecting you from this scenario (but it can lead to high latency if a
replica is hanging and is just about to drop out of the ISR).

Also, note that acks > 1 is deprecated in later versions, precisely to protect
against such subtle mistakes.

Gwen

On Fri, Nov 27, 2015 at 12:28 AM, Andreas Flinck <
andreas.flinck@digitalroute.com> wrote:

> Hi all
>
> The reason why I need to know is that we have seen an issue when using
> acks=all, forcing us to quickly find an alternative. I leave the issue out
> of this post, but will probably come back to that!
>
> My question is about acks=all and min.insync.replicas property. Since we
> have found a workaround for an issue by using acks>1 instead of all
> (absolutely no clue why at this moment), I would like to know what benefit
> you get from e.g. acks=all and min.insync.replicas=3 instead of using
> acks=3 in a 5 broker cluster and replication-factor of 4. To my
> understanding you would get the exact level of durability and security from
> using either of those settings. However, I suspect this is not quite the
> case from finding hints without proper explanation that acks=all is
> preferred.
>
>
> Regards
> Andreas