You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by Gwen Shapira <gs...@cloudera.com> on 2015/07/19 07:50:58 UTC

New Producer and "acks" configuration

Hi,

I was looking into the different between acks = 0 and acks = 1 in the
new producer, and was a bit surprised at what I found.

Basically, if I understand correctly, the only difference is that with
acks = 0, if the leader fails to append locally, it closes the network
connection silently and with acks = 1, it sends an actual error
message.

Which seems to mean that with acks = 0, any failed produce will lead
to metadata refresh and a retry (because network error), while acks =
1 will lead to either retries or abort, depending on the error.

Not only this doesn't match the documentation, it doesn't even make
much sense...
"acks = 0" was supposed to somehow makes things "less safe but
faster", and it doesn't seem to be doing that any more. I'm not even
sure there's any case where the "acks = 0" behavior is desirable.

Is it my misunderstanding, or did we somehow screw up the logic here?

Gwen

Re: New Producer and "acks" configuration

Posted by Gwen Shapira <gs...@cloudera.com>.
Would also be nice if max inflights requests was documented :)
https://issues.apache.org/jira/browse/KAFKA-2255

This is one of those things that would be nice to mention in docs....

On Mon, Jul 27, 2015 at 10:57 AM, Ewen Cheslack-Postava
<ew...@confluent.io> wrote:
> If only we had some sort of system test framework with a producer
> performance test that we could parameterize with the different acks
> settings to validate these performance differences...
>
> wrt out of order: yes, with > 1 in flight requests with retries, messages
> can get out of order. Becket had a great presentation addressing that and a
> bunch of other issues with no data loss pipelines:
> http://www.slideshare.net/JiangjieQin/no-data-loss-pipeline-with-apache-kafka-49753844
> Short version: as things are today, you have to *really* understand the
> producer settings, and some producer internals, to get the exact behavior
> you want.
>
>
> On Mon, Jul 27, 2015 at 9:44 AM, Gwen Shapira <gs...@cloudera.com> wrote:
>
>> Yeah, using acks=0 should result in higher throughput since we are not
>> limited by the roundtrip time to the broker.
>>
>> Btw. regarding in-flight requests: With acks = 1 (or -1), can we send
>> a message batch to a partition before the brokers "acked" a previous
>> request? Doesn't it risk getting messages out of order?
>>
>> On Mon, Jul 27, 2015 at 9:41 AM, Guozhang Wang <wa...@gmail.com> wrote:
>> > I think there is still a subtle difference between "async with acks = 0"
>> > and "async with callback", that when the #.max-inflight-requests has
>> > reached the subsequent requests cannot be sent until previous responses
>> are
>> > returned (which could happen, for example, when the broker is slow /
>> > network issue happens) in the second case but not in the first.
>> >
>> > Given this difference, I feel there are still scenarios, though probably
>> > rare, that users would like to use "acks = 0" even with new producer's
>> > callbacks.
>> >
>> > Guozhang
>> >
>> > On Mon, Jul 27, 2015 at 9:25 AM, Mayuresh Gharat <
>> gharatmayuresh15@gmail.com
>> >> wrote:
>> >
>> >> So basically this means that with acks = 0, their is no guarantee that
>> the
>> >> message has been received by Kafka broker. I am just wondering, why
>> would
>> >> anyone be using acks = 0, since anyone using kafka and doing
>> >> producer.send() would want that, their message got to kafka brokers.
>> Also
>> >> as Jay said, with new producer with async mode, clients will not have to
>> >> wait for the response since it will be handled in callbacks. So the use
>> of
>> >> acks = 0 sounds very rare to me and I am not able to think of an usecase
>> >> around it.
>> >>
>> >> Thanks,
>> >>
>> >> Mayuresh
>> >>
>> >> On Sun, Jul 26, 2015 at 2:40 PM, Gwen Shapira <gs...@cloudera.com>
>> >> wrote:
>> >>
>> >> > Aha! Yes, I was missing the part with the dummy response.
>> >> > Thank you!
>> >> >
>> >> > Gwen
>> >> >
>> >> >
>> >> > On Sun, Jul 26, 2015 at 2:17 PM, Ewen Cheslack-Postava
>> >> > <ew...@confluent.io> wrote:
>> >> > > It's different because it changes whether the client waits for the
>> >> > response
>> >> > > from the broker at all. Take a look at
>> >> > NetworkClient.handleCompletedSends,
>> >> > > which fills in dummy responses when a response is not expected (and
>> >> that
>> >> > > flag gets set via Sender.produceRequest using acks != 0 as a flag to
>> >> > > ClientRequest). This means that the producer will invoke the
>> callback &
>> >> > > resolve the future as soon as the request hits the TCP buffer on the
>> >> > > client. At that point, the behavior of the broker wrt generating a
>> >> > response
>> >> > > doesn't matter -- the client isn't waiting on that response anyway.
>> >> > >
>> >> > > This definitely is faster since you aren't waiting for the round
>> trip,
>> >> > but
>> >> > > it seems like it is of questionable value with the new producer as
>> Jay
>> >> > > explained. It is slightly better than just assuming records have
>> been
>> >> > sent
>> >> > > as soon as you call Producer.send() in this shouldn't trigger a
>> >> callback
>> >> > > until the records have made it through the internal KafkaProducer
>> >> > > buffering. But since it still has to make it through the TCP
>> buffers it
>> >> > > doesn't really guarantee anything that useful.
>> >> > >
>> >> > > -Ewen
>> >> > >
>> >> > >
>> >> > > On Sun, Jul 26, 2015 at 1:40 PM, Gwen Shapira <
>> gshapira@cloudera.com>
>> >> > wrote:
>> >> > >
>> >> > >> What bugs me is that even with acks = 0, the broker will append to
>> >> > >> local log before responding (unless I'm misreading the code), so I
>> >> > >> don't see why a client with acks = 0 will be any faster. Unless the
>> >> > >> client chooses to not wait for response, which is orthogonal to
>> acks
>> >> > >> parameter.
>> >> > >>
>> >> > >> On Mon, Jul 20, 2015 at 8:52 AM, Jay Kreps <ja...@confluent.io>
>> wrote:
>> >> > >> > acks=0 is a one-way send, the client doesn't need to wait on the
>> >> > >> response.
>> >> > >> > Whether this is useful sort of depends on the client
>> implementation.
>> >> > The
>> >> > >> > new java producer does all sends async so "waiting" on a response
>> >> > isn't
>> >> > >> > really a thing. For a client that lacks this, though, as some of
>> >> them
>> >> > do,
>> >> > >> > acks=0 will be a lot faster.
>> >> > >> >
>> >> > >> > It also makes some sense in terms of what is completed when the
>> >> > request
>> >> > >> is
>> >> > >> > considered satisfied
>> >> > >> >   acks = 0 - message is written to the network (buffer)
>> >> > >> >   acks = 1 - message is written to the leader log
>> >> > >> >   acks = -1 - message is committed
>> >> > >> >
>> >> > >> > -Jay
>> >> > >> >
>> >> > >> > On Sat, Jul 18, 2015 at 10:50 PM, Gwen Shapira <
>> >> gshapira@cloudera.com
>> >> > >
>> >> > >> > wrote:
>> >> > >> >
>> >> > >> >> Hi,
>> >> > >> >>
>> >> > >> >> I was looking into the different between acks = 0 and acks = 1
>> in
>> >> the
>> >> > >> >> new producer, and was a bit surprised at what I found.
>> >> > >> >>
>> >> > >> >> Basically, if I understand correctly, the only difference is
>> that
>> >> > with
>> >> > >> >> acks = 0, if the leader fails to append locally, it closes the
>> >> > network
>> >> > >> >> connection silently and with acks = 1, it sends an actual error
>> >> > >> >> message.
>> >> > >> >>
>> >> > >> >> Which seems to mean that with acks = 0, any failed produce will
>> >> lead
>> >> > >> >> to metadata refresh and a retry (because network error), while
>> >> acks =
>> >> > >> >> 1 will lead to either retries or abort, depending on the error.
>> >> > >> >>
>> >> > >> >> Not only this doesn't match the documentation, it doesn't even
>> make
>> >> > >> >> much sense...
>> >> > >> >> "acks = 0" was supposed to somehow makes things "less safe but
>> >> > >> >> faster", and it doesn't seem to be doing that any more. I'm not
>> >> even
>> >> > >> >> sure there's any case where the "acks = 0" behavior is
>> desirable.
>> >> > >> >>
>> >> > >> >> Is it my misunderstanding, or did we somehow screw up the logic
>> >> here?
>> >> > >> >>
>> >> > >> >> Gwen
>> >> > >> >>
>> >> > >>
>> >> > >
>> >> > >
>> >> > >
>> >> > > --
>> >> > > Thanks,
>> >> > > Ewen
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> -Regards,
>> >> Mayuresh R. Gharat
>> >> (862) 250-7125
>> >>
>> >
>> >
>> >
>> > --
>> > -- Guozhang
>>
>
>
>
> --
> Thanks,
> Ewen

Re: New Producer and "acks" configuration

Posted by Gwen Shapira <gs...@cloudera.com>.
But on the server side, the code path for acks = 0 and acks = 1 is the
same... its up to the client to implement acks = 0 and acks = 1
correctly (the way our java client fakes a response for acks = 0).
So its not really a server feature, more of a protocol feature that
needs to be implemented correctly by clients.

On Tue, Jul 28, 2015 at 11:40 AM, Jay Kreps <ja...@confluent.io> wrote:
> For the new java producer this doesn't result in a big perf difference (at
> least in my testing) so there isn't a good reason to use acks=0. However
> this is a protocol and server feature, not a client feature. For the scala
> client and many of the other blocking clients the perf difference is quite
> substantial since the latency of waiting for the ack blocks the client and
> reduces throughput.
>
> -Jay
>
> On Tue, Jul 28, 2015 at 10:21 AM, Jiangjie Qin <jq...@linkedin.com.invalid>
> wrote:
>
>> I seems to me that performance wide, acks=1 and acks=0 will be pretty much
>> the same if max.in.flight.request.per.connection is set to very high and
>> does not cause a request sending to block.
>>
>> In new producer, if there are send failure from time to time, I would guess
>> asks=1 would even have better performance than acks=0. Because in acks=0,
>> when error occurred, broker will disconnect the connection. In that case,
>> subsequent send from the producer needs to reconnect to the broker, that
>> means it has to go through 3-way handshake, TCP slow-start, etc, etc. On
>> the other hand, acks=1 will not have this issue.
>>
>> Maybe the major difference is still the delivery guarantee? acks=0 means
>> send and forget while acks=1 means user still want to know if the messages
>> were sent successfully or not.
>>
>> On Mon, Jul 27, 2015 at 10:57 AM, Ewen Cheslack-Postava <ewen@confluent.io
>> >
>> wrote:
>>
>> > If only we had some sort of system test framework with a producer
>> > performance test that we could parameterize with the different acks
>> > settings to validate these performance differences...
>> >
>> > wrt out of order: yes, with > 1 in flight requests with retries, messages
>> > can get out of order. Becket had a great presentation addressing that
>> and a
>> > bunch of other issues with no data loss pipelines:
>> >
>> >
>> http://www.slideshare.net/JiangjieQin/no-data-loss-pipeline-with-apache-kafka-49753844
>> > Short version: as things are today, you have to *really* understand the
>> > producer settings, and some producer internals, to get the exact behavior
>> > you want.
>> >
>> >
>> > On Mon, Jul 27, 2015 at 9:44 AM, Gwen Shapira <gs...@cloudera.com>
>> > wrote:
>> >
>> > > Yeah, using acks=0 should result in higher throughput since we are not
>> > > limited by the roundtrip time to the broker.
>> > >
>> > > Btw. regarding in-flight requests: With acks = 1 (or -1), can we send
>> > > a message batch to a partition before the brokers "acked" a previous
>> > > request? Doesn't it risk getting messages out of order?
>> > >
>> > > On Mon, Jul 27, 2015 at 9:41 AM, Guozhang Wang <wa...@gmail.com>
>> > wrote:
>> > > > I think there is still a subtle difference between "async with acks =
>> > 0"
>> > > > and "async with callback", that when the #.max-inflight-requests has
>> > > > reached the subsequent requests cannot be sent until previous
>> responses
>> > > are
>> > > > returned (which could happen, for example, when the broker is slow /
>> > > > network issue happens) in the second case but not in the first.
>> > > >
>> > > > Given this difference, I feel there are still scenarios, though
>> > probably
>> > > > rare, that users would like to use "acks = 0" even with new
>> producer's
>> > > > callbacks.
>> > > >
>> > > > Guozhang
>> > > >
>> > > > On Mon, Jul 27, 2015 at 9:25 AM, Mayuresh Gharat <
>> > > gharatmayuresh15@gmail.com
>> > > >> wrote:
>> > > >
>> > > >> So basically this means that with acks = 0, their is no guarantee
>> that
>> > > the
>> > > >> message has been received by Kafka broker. I am just wondering, why
>> > > would
>> > > >> anyone be using acks = 0, since anyone using kafka and doing
>> > > >> producer.send() would want that, their message got to kafka brokers.
>> > > Also
>> > > >> as Jay said, with new producer with async mode, clients will not
>> have
>> > to
>> > > >> wait for the response since it will be handled in callbacks. So the
>> > use
>> > > of
>> > > >> acks = 0 sounds very rare to me and I am not able to think of an
>> > usecase
>> > > >> around it.
>> > > >>
>> > > >> Thanks,
>> > > >>
>> > > >> Mayuresh
>> > > >>
>> > > >> On Sun, Jul 26, 2015 at 2:40 PM, Gwen Shapira <
>> gshapira@cloudera.com>
>> > > >> wrote:
>> > > >>
>> > > >> > Aha! Yes, I was missing the part with the dummy response.
>> > > >> > Thank you!
>> > > >> >
>> > > >> > Gwen
>> > > >> >
>> > > >> >
>> > > >> > On Sun, Jul 26, 2015 at 2:17 PM, Ewen Cheslack-Postava
>> > > >> > <ew...@confluent.io> wrote:
>> > > >> > > It's different because it changes whether the client waits for
>> the
>> > > >> > response
>> > > >> > > from the broker at all. Take a look at
>> > > >> > NetworkClient.handleCompletedSends,
>> > > >> > > which fills in dummy responses when a response is not expected
>> > (and
>> > > >> that
>> > > >> > > flag gets set via Sender.produceRequest using acks != 0 as a
>> flag
>> > to
>> > > >> > > ClientRequest). This means that the producer will invoke the
>> > > callback &
>> > > >> > > resolve the future as soon as the request hits the TCP buffer on
>> > the
>> > > >> > > client. At that point, the behavior of the broker wrt
>> generating a
>> > > >> > response
>> > > >> > > doesn't matter -- the client isn't waiting on that response
>> > anyway.
>> > > >> > >
>> > > >> > > This definitely is faster since you aren't waiting for the round
>> > > trip,
>> > > >> > but
>> > > >> > > it seems like it is of questionable value with the new producer
>> as
>> > > Jay
>> > > >> > > explained. It is slightly better than just assuming records have
>> > > been
>> > > >> > sent
>> > > >> > > as soon as you call Producer.send() in this shouldn't trigger a
>> > > >> callback
>> > > >> > > until the records have made it through the internal
>> KafkaProducer
>> > > >> > > buffering. But since it still has to make it through the TCP
>> > > buffers it
>> > > >> > > doesn't really guarantee anything that useful.
>> > > >> > >
>> > > >> > > -Ewen
>> > > >> > >
>> > > >> > >
>> > > >> > > On Sun, Jul 26, 2015 at 1:40 PM, Gwen Shapira <
>> > > gshapira@cloudera.com>
>> > > >> > wrote:
>> > > >> > >
>> > > >> > >> What bugs me is that even with acks = 0, the broker will append
>> > to
>> > > >> > >> local log before responding (unless I'm misreading the code),
>> so
>> > I
>> > > >> > >> don't see why a client with acks = 0 will be any faster. Unless
>> > the
>> > > >> > >> client chooses to not wait for response, which is orthogonal to
>> > > acks
>> > > >> > >> parameter.
>> > > >> > >>
>> > > >> > >> On Mon, Jul 20, 2015 at 8:52 AM, Jay Kreps <ja...@confluent.io>
>> > > wrote:
>> > > >> > >> > acks=0 is a one-way send, the client doesn't need to wait on
>> > the
>> > > >> > >> response.
>> > > >> > >> > Whether this is useful sort of depends on the client
>> > > implementation.
>> > > >> > The
>> > > >> > >> > new java producer does all sends async so "waiting" on a
>> > response
>> > > >> > isn't
>> > > >> > >> > really a thing. For a client that lacks this, though, as some
>> > of
>> > > >> them
>> > > >> > do,
>> > > >> > >> > acks=0 will be a lot faster.
>> > > >> > >> >
>> > > >> > >> > It also makes some sense in terms of what is completed when
>> the
>> > > >> > request
>> > > >> > >> is
>> > > >> > >> > considered satisfied
>> > > >> > >> >   acks = 0 - message is written to the network (buffer)
>> > > >> > >> >   acks = 1 - message is written to the leader log
>> > > >> > >> >   acks = -1 - message is committed
>> > > >> > >> >
>> > > >> > >> > -Jay
>> > > >> > >> >
>> > > >> > >> > On Sat, Jul 18, 2015 at 10:50 PM, Gwen Shapira <
>> > > >> gshapira@cloudera.com
>> > > >> > >
>> > > >> > >> > wrote:
>> > > >> > >> >
>> > > >> > >> >> Hi,
>> > > >> > >> >>
>> > > >> > >> >> I was looking into the different between acks = 0 and acks
>> = 1
>> > > in
>> > > >> the
>> > > >> > >> >> new producer, and was a bit surprised at what I found.
>> > > >> > >> >>
>> > > >> > >> >> Basically, if I understand correctly, the only difference is
>> > > that
>> > > >> > with
>> > > >> > >> >> acks = 0, if the leader fails to append locally, it closes
>> the
>> > > >> > network
>> > > >> > >> >> connection silently and with acks = 1, it sends an actual
>> > error
>> > > >> > >> >> message.
>> > > >> > >> >>
>> > > >> > >> >> Which seems to mean that with acks = 0, any failed produce
>> > will
>> > > >> lead
>> > > >> > >> >> to metadata refresh and a retry (because network error),
>> while
>> > > >> acks =
>> > > >> > >> >> 1 will lead to either retries or abort, depending on the
>> > error.
>> > > >> > >> >>
>> > > >> > >> >> Not only this doesn't match the documentation, it doesn't
>> even
>> > > make
>> > > >> > >> >> much sense...
>> > > >> > >> >> "acks = 0" was supposed to somehow makes things "less safe
>> but
>> > > >> > >> >> faster", and it doesn't seem to be doing that any more. I'm
>> > not
>> > > >> even
>> > > >> > >> >> sure there's any case where the "acks = 0" behavior is
>> > > desirable.
>> > > >> > >> >>
>> > > >> > >> >> Is it my misunderstanding, or did we somehow screw up the
>> > logic
>> > > >> here?
>> > > >> > >> >>
>> > > >> > >> >> Gwen
>> > > >> > >> >>
>> > > >> > >>
>> > > >> > >
>> > > >> > >
>> > > >> > >
>> > > >> > > --
>> > > >> > > Thanks,
>> > > >> > > Ewen
>> > > >> >
>> > > >>
>> > > >>
>> > > >>
>> > > >> --
>> > > >> -Regards,
>> > > >> Mayuresh R. Gharat
>> > > >> (862) 250-7125
>> > > >>
>> > > >
>> > > >
>> > > >
>> > > > --
>> > > > -- Guozhang
>> > >
>> >
>> >
>> >
>> > --
>> > Thanks,
>> > Ewen
>> >
>>

Re: New Producer and "acks" configuration

Posted by Jay Kreps <ja...@confluent.io>.
For the new java producer this doesn't result in a big perf difference (at
least in my testing) so there isn't a good reason to use acks=0. However
this is a protocol and server feature, not a client feature. For the scala
client and many of the other blocking clients the perf difference is quite
substantial since the latency of waiting for the ack blocks the client and
reduces throughput.

-Jay

On Tue, Jul 28, 2015 at 10:21 AM, Jiangjie Qin <jq...@linkedin.com.invalid>
wrote:

> I seems to me that performance wide, acks=1 and acks=0 will be pretty much
> the same if max.in.flight.request.per.connection is set to very high and
> does not cause a request sending to block.
>
> In new producer, if there are send failure from time to time, I would guess
> asks=1 would even have better performance than acks=0. Because in acks=0,
> when error occurred, broker will disconnect the connection. In that case,
> subsequent send from the producer needs to reconnect to the broker, that
> means it has to go through 3-way handshake, TCP slow-start, etc, etc. On
> the other hand, acks=1 will not have this issue.
>
> Maybe the major difference is still the delivery guarantee? acks=0 means
> send and forget while acks=1 means user still want to know if the messages
> were sent successfully or not.
>
> On Mon, Jul 27, 2015 at 10:57 AM, Ewen Cheslack-Postava <ewen@confluent.io
> >
> wrote:
>
> > If only we had some sort of system test framework with a producer
> > performance test that we could parameterize with the different acks
> > settings to validate these performance differences...
> >
> > wrt out of order: yes, with > 1 in flight requests with retries, messages
> > can get out of order. Becket had a great presentation addressing that
> and a
> > bunch of other issues with no data loss pipelines:
> >
> >
> http://www.slideshare.net/JiangjieQin/no-data-loss-pipeline-with-apache-kafka-49753844
> > Short version: as things are today, you have to *really* understand the
> > producer settings, and some producer internals, to get the exact behavior
> > you want.
> >
> >
> > On Mon, Jul 27, 2015 at 9:44 AM, Gwen Shapira <gs...@cloudera.com>
> > wrote:
> >
> > > Yeah, using acks=0 should result in higher throughput since we are not
> > > limited by the roundtrip time to the broker.
> > >
> > > Btw. regarding in-flight requests: With acks = 1 (or -1), can we send
> > > a message batch to a partition before the brokers "acked" a previous
> > > request? Doesn't it risk getting messages out of order?
> > >
> > > On Mon, Jul 27, 2015 at 9:41 AM, Guozhang Wang <wa...@gmail.com>
> > wrote:
> > > > I think there is still a subtle difference between "async with acks =
> > 0"
> > > > and "async with callback", that when the #.max-inflight-requests has
> > > > reached the subsequent requests cannot be sent until previous
> responses
> > > are
> > > > returned (which could happen, for example, when the broker is slow /
> > > > network issue happens) in the second case but not in the first.
> > > >
> > > > Given this difference, I feel there are still scenarios, though
> > probably
> > > > rare, that users would like to use "acks = 0" even with new
> producer's
> > > > callbacks.
> > > >
> > > > Guozhang
> > > >
> > > > On Mon, Jul 27, 2015 at 9:25 AM, Mayuresh Gharat <
> > > gharatmayuresh15@gmail.com
> > > >> wrote:
> > > >
> > > >> So basically this means that with acks = 0, their is no guarantee
> that
> > > the
> > > >> message has been received by Kafka broker. I am just wondering, why
> > > would
> > > >> anyone be using acks = 0, since anyone using kafka and doing
> > > >> producer.send() would want that, their message got to kafka brokers.
> > > Also
> > > >> as Jay said, with new producer with async mode, clients will not
> have
> > to
> > > >> wait for the response since it will be handled in callbacks. So the
> > use
> > > of
> > > >> acks = 0 sounds very rare to me and I am not able to think of an
> > usecase
> > > >> around it.
> > > >>
> > > >> Thanks,
> > > >>
> > > >> Mayuresh
> > > >>
> > > >> On Sun, Jul 26, 2015 at 2:40 PM, Gwen Shapira <
> gshapira@cloudera.com>
> > > >> wrote:
> > > >>
> > > >> > Aha! Yes, I was missing the part with the dummy response.
> > > >> > Thank you!
> > > >> >
> > > >> > Gwen
> > > >> >
> > > >> >
> > > >> > On Sun, Jul 26, 2015 at 2:17 PM, Ewen Cheslack-Postava
> > > >> > <ew...@confluent.io> wrote:
> > > >> > > It's different because it changes whether the client waits for
> the
> > > >> > response
> > > >> > > from the broker at all. Take a look at
> > > >> > NetworkClient.handleCompletedSends,
> > > >> > > which fills in dummy responses when a response is not expected
> > (and
> > > >> that
> > > >> > > flag gets set via Sender.produceRequest using acks != 0 as a
> flag
> > to
> > > >> > > ClientRequest). This means that the producer will invoke the
> > > callback &
> > > >> > > resolve the future as soon as the request hits the TCP buffer on
> > the
> > > >> > > client. At that point, the behavior of the broker wrt
> generating a
> > > >> > response
> > > >> > > doesn't matter -- the client isn't waiting on that response
> > anyway.
> > > >> > >
> > > >> > > This definitely is faster since you aren't waiting for the round
> > > trip,
> > > >> > but
> > > >> > > it seems like it is of questionable value with the new producer
> as
> > > Jay
> > > >> > > explained. It is slightly better than just assuming records have
> > > been
> > > >> > sent
> > > >> > > as soon as you call Producer.send() in this shouldn't trigger a
> > > >> callback
> > > >> > > until the records have made it through the internal
> KafkaProducer
> > > >> > > buffering. But since it still has to make it through the TCP
> > > buffers it
> > > >> > > doesn't really guarantee anything that useful.
> > > >> > >
> > > >> > > -Ewen
> > > >> > >
> > > >> > >
> > > >> > > On Sun, Jul 26, 2015 at 1:40 PM, Gwen Shapira <
> > > gshapira@cloudera.com>
> > > >> > wrote:
> > > >> > >
> > > >> > >> What bugs me is that even with acks = 0, the broker will append
> > to
> > > >> > >> local log before responding (unless I'm misreading the code),
> so
> > I
> > > >> > >> don't see why a client with acks = 0 will be any faster. Unless
> > the
> > > >> > >> client chooses to not wait for response, which is orthogonal to
> > > acks
> > > >> > >> parameter.
> > > >> > >>
> > > >> > >> On Mon, Jul 20, 2015 at 8:52 AM, Jay Kreps <ja...@confluent.io>
> > > wrote:
> > > >> > >> > acks=0 is a one-way send, the client doesn't need to wait on
> > the
> > > >> > >> response.
> > > >> > >> > Whether this is useful sort of depends on the client
> > > implementation.
> > > >> > The
> > > >> > >> > new java producer does all sends async so "waiting" on a
> > response
> > > >> > isn't
> > > >> > >> > really a thing. For a client that lacks this, though, as some
> > of
> > > >> them
> > > >> > do,
> > > >> > >> > acks=0 will be a lot faster.
> > > >> > >> >
> > > >> > >> > It also makes some sense in terms of what is completed when
> the
> > > >> > request
> > > >> > >> is
> > > >> > >> > considered satisfied
> > > >> > >> >   acks = 0 - message is written to the network (buffer)
> > > >> > >> >   acks = 1 - message is written to the leader log
> > > >> > >> >   acks = -1 - message is committed
> > > >> > >> >
> > > >> > >> > -Jay
> > > >> > >> >
> > > >> > >> > On Sat, Jul 18, 2015 at 10:50 PM, Gwen Shapira <
> > > >> gshapira@cloudera.com
> > > >> > >
> > > >> > >> > wrote:
> > > >> > >> >
> > > >> > >> >> Hi,
> > > >> > >> >>
> > > >> > >> >> I was looking into the different between acks = 0 and acks
> = 1
> > > in
> > > >> the
> > > >> > >> >> new producer, and was a bit surprised at what I found.
> > > >> > >> >>
> > > >> > >> >> Basically, if I understand correctly, the only difference is
> > > that
> > > >> > with
> > > >> > >> >> acks = 0, if the leader fails to append locally, it closes
> the
> > > >> > network
> > > >> > >> >> connection silently and with acks = 1, it sends an actual
> > error
> > > >> > >> >> message.
> > > >> > >> >>
> > > >> > >> >> Which seems to mean that with acks = 0, any failed produce
> > will
> > > >> lead
> > > >> > >> >> to metadata refresh and a retry (because network error),
> while
> > > >> acks =
> > > >> > >> >> 1 will lead to either retries or abort, depending on the
> > error.
> > > >> > >> >>
> > > >> > >> >> Not only this doesn't match the documentation, it doesn't
> even
> > > make
> > > >> > >> >> much sense...
> > > >> > >> >> "acks = 0" was supposed to somehow makes things "less safe
> but
> > > >> > >> >> faster", and it doesn't seem to be doing that any more. I'm
> > not
> > > >> even
> > > >> > >> >> sure there's any case where the "acks = 0" behavior is
> > > desirable.
> > > >> > >> >>
> > > >> > >> >> Is it my misunderstanding, or did we somehow screw up the
> > logic
> > > >> here?
> > > >> > >> >>
> > > >> > >> >> Gwen
> > > >> > >> >>
> > > >> > >>
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > --
> > > >> > > Thanks,
> > > >> > > Ewen
> > > >> >
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> -Regards,
> > > >> Mayuresh R. Gharat
> > > >> (862) 250-7125
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > >
> >
> >
> >
> > --
> > Thanks,
> > Ewen
> >
>

Re: New Producer and "acks" configuration

Posted by Jiangjie Qin <jq...@linkedin.com.INVALID>.
I seems to me that performance wide, acks=1 and acks=0 will be pretty much
the same if max.in.flight.request.per.connection is set to very high and
does not cause a request sending to block.

In new producer, if there are send failure from time to time, I would guess
asks=1 would even have better performance than acks=0. Because in acks=0,
when error occurred, broker will disconnect the connection. In that case,
subsequent send from the producer needs to reconnect to the broker, that
means it has to go through 3-way handshake, TCP slow-start, etc, etc. On
the other hand, acks=1 will not have this issue.

Maybe the major difference is still the delivery guarantee? acks=0 means
send and forget while acks=1 means user still want to know if the messages
were sent successfully or not.

On Mon, Jul 27, 2015 at 10:57 AM, Ewen Cheslack-Postava <ew...@confluent.io>
wrote:

> If only we had some sort of system test framework with a producer
> performance test that we could parameterize with the different acks
> settings to validate these performance differences...
>
> wrt out of order: yes, with > 1 in flight requests with retries, messages
> can get out of order. Becket had a great presentation addressing that and a
> bunch of other issues with no data loss pipelines:
>
> http://www.slideshare.net/JiangjieQin/no-data-loss-pipeline-with-apache-kafka-49753844
> Short version: as things are today, you have to *really* understand the
> producer settings, and some producer internals, to get the exact behavior
> you want.
>
>
> On Mon, Jul 27, 2015 at 9:44 AM, Gwen Shapira <gs...@cloudera.com>
> wrote:
>
> > Yeah, using acks=0 should result in higher throughput since we are not
> > limited by the roundtrip time to the broker.
> >
> > Btw. regarding in-flight requests: With acks = 1 (or -1), can we send
> > a message batch to a partition before the brokers "acked" a previous
> > request? Doesn't it risk getting messages out of order?
> >
> > On Mon, Jul 27, 2015 at 9:41 AM, Guozhang Wang <wa...@gmail.com>
> wrote:
> > > I think there is still a subtle difference between "async with acks =
> 0"
> > > and "async with callback", that when the #.max-inflight-requests has
> > > reached the subsequent requests cannot be sent until previous responses
> > are
> > > returned (which could happen, for example, when the broker is slow /
> > > network issue happens) in the second case but not in the first.
> > >
> > > Given this difference, I feel there are still scenarios, though
> probably
> > > rare, that users would like to use "acks = 0" even with new producer's
> > > callbacks.
> > >
> > > Guozhang
> > >
> > > On Mon, Jul 27, 2015 at 9:25 AM, Mayuresh Gharat <
> > gharatmayuresh15@gmail.com
> > >> wrote:
> > >
> > >> So basically this means that with acks = 0, their is no guarantee that
> > the
> > >> message has been received by Kafka broker. I am just wondering, why
> > would
> > >> anyone be using acks = 0, since anyone using kafka and doing
> > >> producer.send() would want that, their message got to kafka brokers.
> > Also
> > >> as Jay said, with new producer with async mode, clients will not have
> to
> > >> wait for the response since it will be handled in callbacks. So the
> use
> > of
> > >> acks = 0 sounds very rare to me and I am not able to think of an
> usecase
> > >> around it.
> > >>
> > >> Thanks,
> > >>
> > >> Mayuresh
> > >>
> > >> On Sun, Jul 26, 2015 at 2:40 PM, Gwen Shapira <gs...@cloudera.com>
> > >> wrote:
> > >>
> > >> > Aha! Yes, I was missing the part with the dummy response.
> > >> > Thank you!
> > >> >
> > >> > Gwen
> > >> >
> > >> >
> > >> > On Sun, Jul 26, 2015 at 2:17 PM, Ewen Cheslack-Postava
> > >> > <ew...@confluent.io> wrote:
> > >> > > It's different because it changes whether the client waits for the
> > >> > response
> > >> > > from the broker at all. Take a look at
> > >> > NetworkClient.handleCompletedSends,
> > >> > > which fills in dummy responses when a response is not expected
> (and
> > >> that
> > >> > > flag gets set via Sender.produceRequest using acks != 0 as a flag
> to
> > >> > > ClientRequest). This means that the producer will invoke the
> > callback &
> > >> > > resolve the future as soon as the request hits the TCP buffer on
> the
> > >> > > client. At that point, the behavior of the broker wrt generating a
> > >> > response
> > >> > > doesn't matter -- the client isn't waiting on that response
> anyway.
> > >> > >
> > >> > > This definitely is faster since you aren't waiting for the round
> > trip,
> > >> > but
> > >> > > it seems like it is of questionable value with the new producer as
> > Jay
> > >> > > explained. It is slightly better than just assuming records have
> > been
> > >> > sent
> > >> > > as soon as you call Producer.send() in this shouldn't trigger a
> > >> callback
> > >> > > until the records have made it through the internal KafkaProducer
> > >> > > buffering. But since it still has to make it through the TCP
> > buffers it
> > >> > > doesn't really guarantee anything that useful.
> > >> > >
> > >> > > -Ewen
> > >> > >
> > >> > >
> > >> > > On Sun, Jul 26, 2015 at 1:40 PM, Gwen Shapira <
> > gshapira@cloudera.com>
> > >> > wrote:
> > >> > >
> > >> > >> What bugs me is that even with acks = 0, the broker will append
> to
> > >> > >> local log before responding (unless I'm misreading the code), so
> I
> > >> > >> don't see why a client with acks = 0 will be any faster. Unless
> the
> > >> > >> client chooses to not wait for response, which is orthogonal to
> > acks
> > >> > >> parameter.
> > >> > >>
> > >> > >> On Mon, Jul 20, 2015 at 8:52 AM, Jay Kreps <ja...@confluent.io>
> > wrote:
> > >> > >> > acks=0 is a one-way send, the client doesn't need to wait on
> the
> > >> > >> response.
> > >> > >> > Whether this is useful sort of depends on the client
> > implementation.
> > >> > The
> > >> > >> > new java producer does all sends async so "waiting" on a
> response
> > >> > isn't
> > >> > >> > really a thing. For a client that lacks this, though, as some
> of
> > >> them
> > >> > do,
> > >> > >> > acks=0 will be a lot faster.
> > >> > >> >
> > >> > >> > It also makes some sense in terms of what is completed when the
> > >> > request
> > >> > >> is
> > >> > >> > considered satisfied
> > >> > >> >   acks = 0 - message is written to the network (buffer)
> > >> > >> >   acks = 1 - message is written to the leader log
> > >> > >> >   acks = -1 - message is committed
> > >> > >> >
> > >> > >> > -Jay
> > >> > >> >
> > >> > >> > On Sat, Jul 18, 2015 at 10:50 PM, Gwen Shapira <
> > >> gshapira@cloudera.com
> > >> > >
> > >> > >> > wrote:
> > >> > >> >
> > >> > >> >> Hi,
> > >> > >> >>
> > >> > >> >> I was looking into the different between acks = 0 and acks = 1
> > in
> > >> the
> > >> > >> >> new producer, and was a bit surprised at what I found.
> > >> > >> >>
> > >> > >> >> Basically, if I understand correctly, the only difference is
> > that
> > >> > with
> > >> > >> >> acks = 0, if the leader fails to append locally, it closes the
> > >> > network
> > >> > >> >> connection silently and with acks = 1, it sends an actual
> error
> > >> > >> >> message.
> > >> > >> >>
> > >> > >> >> Which seems to mean that with acks = 0, any failed produce
> will
> > >> lead
> > >> > >> >> to metadata refresh and a retry (because network error), while
> > >> acks =
> > >> > >> >> 1 will lead to either retries or abort, depending on the
> error.
> > >> > >> >>
> > >> > >> >> Not only this doesn't match the documentation, it doesn't even
> > make
> > >> > >> >> much sense...
> > >> > >> >> "acks = 0" was supposed to somehow makes things "less safe but
> > >> > >> >> faster", and it doesn't seem to be doing that any more. I'm
> not
> > >> even
> > >> > >> >> sure there's any case where the "acks = 0" behavior is
> > desirable.
> > >> > >> >>
> > >> > >> >> Is it my misunderstanding, or did we somehow screw up the
> logic
> > >> here?
> > >> > >> >>
> > >> > >> >> Gwen
> > >> > >> >>
> > >> > >>
> > >> > >
> > >> > >
> > >> > >
> > >> > > --
> > >> > > Thanks,
> > >> > > Ewen
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> -Regards,
> > >> Mayuresh R. Gharat
> > >> (862) 250-7125
> > >>
> > >
> > >
> > >
> > > --
> > > -- Guozhang
> >
>
>
>
> --
> Thanks,
> Ewen
>

Re: New Producer and "acks" configuration

Posted by Ewen Cheslack-Postava <ew...@confluent.io>.
If only we had some sort of system test framework with a producer
performance test that we could parameterize with the different acks
settings to validate these performance differences...

wrt out of order: yes, with > 1 in flight requests with retries, messages
can get out of order. Becket had a great presentation addressing that and a
bunch of other issues with no data loss pipelines:
http://www.slideshare.net/JiangjieQin/no-data-loss-pipeline-with-apache-kafka-49753844
Short version: as things are today, you have to *really* understand the
producer settings, and some producer internals, to get the exact behavior
you want.


On Mon, Jul 27, 2015 at 9:44 AM, Gwen Shapira <gs...@cloudera.com> wrote:

> Yeah, using acks=0 should result in higher throughput since we are not
> limited by the roundtrip time to the broker.
>
> Btw. regarding in-flight requests: With acks = 1 (or -1), can we send
> a message batch to a partition before the brokers "acked" a previous
> request? Doesn't it risk getting messages out of order?
>
> On Mon, Jul 27, 2015 at 9:41 AM, Guozhang Wang <wa...@gmail.com> wrote:
> > I think there is still a subtle difference between "async with acks = 0"
> > and "async with callback", that when the #.max-inflight-requests has
> > reached the subsequent requests cannot be sent until previous responses
> are
> > returned (which could happen, for example, when the broker is slow /
> > network issue happens) in the second case but not in the first.
> >
> > Given this difference, I feel there are still scenarios, though probably
> > rare, that users would like to use "acks = 0" even with new producer's
> > callbacks.
> >
> > Guozhang
> >
> > On Mon, Jul 27, 2015 at 9:25 AM, Mayuresh Gharat <
> gharatmayuresh15@gmail.com
> >> wrote:
> >
> >> So basically this means that with acks = 0, their is no guarantee that
> the
> >> message has been received by Kafka broker. I am just wondering, why
> would
> >> anyone be using acks = 0, since anyone using kafka and doing
> >> producer.send() would want that, their message got to kafka brokers.
> Also
> >> as Jay said, with new producer with async mode, clients will not have to
> >> wait for the response since it will be handled in callbacks. So the use
> of
> >> acks = 0 sounds very rare to me and I am not able to think of an usecase
> >> around it.
> >>
> >> Thanks,
> >>
> >> Mayuresh
> >>
> >> On Sun, Jul 26, 2015 at 2:40 PM, Gwen Shapira <gs...@cloudera.com>
> >> wrote:
> >>
> >> > Aha! Yes, I was missing the part with the dummy response.
> >> > Thank you!
> >> >
> >> > Gwen
> >> >
> >> >
> >> > On Sun, Jul 26, 2015 at 2:17 PM, Ewen Cheslack-Postava
> >> > <ew...@confluent.io> wrote:
> >> > > It's different because it changes whether the client waits for the
> >> > response
> >> > > from the broker at all. Take a look at
> >> > NetworkClient.handleCompletedSends,
> >> > > which fills in dummy responses when a response is not expected (and
> >> that
> >> > > flag gets set via Sender.produceRequest using acks != 0 as a flag to
> >> > > ClientRequest). This means that the producer will invoke the
> callback &
> >> > > resolve the future as soon as the request hits the TCP buffer on the
> >> > > client. At that point, the behavior of the broker wrt generating a
> >> > response
> >> > > doesn't matter -- the client isn't waiting on that response anyway.
> >> > >
> >> > > This definitely is faster since you aren't waiting for the round
> trip,
> >> > but
> >> > > it seems like it is of questionable value with the new producer as
> Jay
> >> > > explained. It is slightly better than just assuming records have
> been
> >> > sent
> >> > > as soon as you call Producer.send() in this shouldn't trigger a
> >> callback
> >> > > until the records have made it through the internal KafkaProducer
> >> > > buffering. But since it still has to make it through the TCP
> buffers it
> >> > > doesn't really guarantee anything that useful.
> >> > >
> >> > > -Ewen
> >> > >
> >> > >
> >> > > On Sun, Jul 26, 2015 at 1:40 PM, Gwen Shapira <
> gshapira@cloudera.com>
> >> > wrote:
> >> > >
> >> > >> What bugs me is that even with acks = 0, the broker will append to
> >> > >> local log before responding (unless I'm misreading the code), so I
> >> > >> don't see why a client with acks = 0 will be any faster. Unless the
> >> > >> client chooses to not wait for response, which is orthogonal to
> acks
> >> > >> parameter.
> >> > >>
> >> > >> On Mon, Jul 20, 2015 at 8:52 AM, Jay Kreps <ja...@confluent.io>
> wrote:
> >> > >> > acks=0 is a one-way send, the client doesn't need to wait on the
> >> > >> response.
> >> > >> > Whether this is useful sort of depends on the client
> implementation.
> >> > The
> >> > >> > new java producer does all sends async so "waiting" on a response
> >> > isn't
> >> > >> > really a thing. For a client that lacks this, though, as some of
> >> them
> >> > do,
> >> > >> > acks=0 will be a lot faster.
> >> > >> >
> >> > >> > It also makes some sense in terms of what is completed when the
> >> > request
> >> > >> is
> >> > >> > considered satisfied
> >> > >> >   acks = 0 - message is written to the network (buffer)
> >> > >> >   acks = 1 - message is written to the leader log
> >> > >> >   acks = -1 - message is committed
> >> > >> >
> >> > >> > -Jay
> >> > >> >
> >> > >> > On Sat, Jul 18, 2015 at 10:50 PM, Gwen Shapira <
> >> gshapira@cloudera.com
> >> > >
> >> > >> > wrote:
> >> > >> >
> >> > >> >> Hi,
> >> > >> >>
> >> > >> >> I was looking into the different between acks = 0 and acks = 1
> in
> >> the
> >> > >> >> new producer, and was a bit surprised at what I found.
> >> > >> >>
> >> > >> >> Basically, if I understand correctly, the only difference is
> that
> >> > with
> >> > >> >> acks = 0, if the leader fails to append locally, it closes the
> >> > network
> >> > >> >> connection silently and with acks = 1, it sends an actual error
> >> > >> >> message.
> >> > >> >>
> >> > >> >> Which seems to mean that with acks = 0, any failed produce will
> >> lead
> >> > >> >> to metadata refresh and a retry (because network error), while
> >> acks =
> >> > >> >> 1 will lead to either retries or abort, depending on the error.
> >> > >> >>
> >> > >> >> Not only this doesn't match the documentation, it doesn't even
> make
> >> > >> >> much sense...
> >> > >> >> "acks = 0" was supposed to somehow makes things "less safe but
> >> > >> >> faster", and it doesn't seem to be doing that any more. I'm not
> >> even
> >> > >> >> sure there's any case where the "acks = 0" behavior is
> desirable.
> >> > >> >>
> >> > >> >> Is it my misunderstanding, or did we somehow screw up the logic
> >> here?
> >> > >> >>
> >> > >> >> Gwen
> >> > >> >>
> >> > >>
> >> > >
> >> > >
> >> > >
> >> > > --
> >> > > Thanks,
> >> > > Ewen
> >> >
> >>
> >>
> >>
> >> --
> >> -Regards,
> >> Mayuresh R. Gharat
> >> (862) 250-7125
> >>
> >
> >
> >
> > --
> > -- Guozhang
>



-- 
Thanks,
Ewen

Re: New Producer and "acks" configuration

Posted by Gwen Shapira <gs...@cloudera.com>.
Yeah, using acks=0 should result in higher throughput since we are not
limited by the roundtrip time to the broker.

Btw. regarding in-flight requests: With acks = 1 (or -1), can we send
a message batch to a partition before the brokers "acked" a previous
request? Doesn't it risk getting messages out of order?

On Mon, Jul 27, 2015 at 9:41 AM, Guozhang Wang <wa...@gmail.com> wrote:
> I think there is still a subtle difference between "async with acks = 0"
> and "async with callback", that when the #.max-inflight-requests has
> reached the subsequent requests cannot be sent until previous responses are
> returned (which could happen, for example, when the broker is slow /
> network issue happens) in the second case but not in the first.
>
> Given this difference, I feel there are still scenarios, though probably
> rare, that users would like to use "acks = 0" even with new producer's
> callbacks.
>
> Guozhang
>
> On Mon, Jul 27, 2015 at 9:25 AM, Mayuresh Gharat <gharatmayuresh15@gmail.com
>> wrote:
>
>> So basically this means that with acks = 0, their is no guarantee that the
>> message has been received by Kafka broker. I am just wondering, why would
>> anyone be using acks = 0, since anyone using kafka and doing
>> producer.send() would want that, their message got to kafka brokers. Also
>> as Jay said, with new producer with async mode, clients will not have to
>> wait for the response since it will be handled in callbacks. So the use of
>> acks = 0 sounds very rare to me and I am not able to think of an usecase
>> around it.
>>
>> Thanks,
>>
>> Mayuresh
>>
>> On Sun, Jul 26, 2015 at 2:40 PM, Gwen Shapira <gs...@cloudera.com>
>> wrote:
>>
>> > Aha! Yes, I was missing the part with the dummy response.
>> > Thank you!
>> >
>> > Gwen
>> >
>> >
>> > On Sun, Jul 26, 2015 at 2:17 PM, Ewen Cheslack-Postava
>> > <ew...@confluent.io> wrote:
>> > > It's different because it changes whether the client waits for the
>> > response
>> > > from the broker at all. Take a look at
>> > NetworkClient.handleCompletedSends,
>> > > which fills in dummy responses when a response is not expected (and
>> that
>> > > flag gets set via Sender.produceRequest using acks != 0 as a flag to
>> > > ClientRequest). This means that the producer will invoke the callback &
>> > > resolve the future as soon as the request hits the TCP buffer on the
>> > > client. At that point, the behavior of the broker wrt generating a
>> > response
>> > > doesn't matter -- the client isn't waiting on that response anyway.
>> > >
>> > > This definitely is faster since you aren't waiting for the round trip,
>> > but
>> > > it seems like it is of questionable value with the new producer as Jay
>> > > explained. It is slightly better than just assuming records have been
>> > sent
>> > > as soon as you call Producer.send() in this shouldn't trigger a
>> callback
>> > > until the records have made it through the internal KafkaProducer
>> > > buffering. But since it still has to make it through the TCP buffers it
>> > > doesn't really guarantee anything that useful.
>> > >
>> > > -Ewen
>> > >
>> > >
>> > > On Sun, Jul 26, 2015 at 1:40 PM, Gwen Shapira <gs...@cloudera.com>
>> > wrote:
>> > >
>> > >> What bugs me is that even with acks = 0, the broker will append to
>> > >> local log before responding (unless I'm misreading the code), so I
>> > >> don't see why a client with acks = 0 will be any faster. Unless the
>> > >> client chooses to not wait for response, which is orthogonal to acks
>> > >> parameter.
>> > >>
>> > >> On Mon, Jul 20, 2015 at 8:52 AM, Jay Kreps <ja...@confluent.io> wrote:
>> > >> > acks=0 is a one-way send, the client doesn't need to wait on the
>> > >> response.
>> > >> > Whether this is useful sort of depends on the client implementation.
>> > The
>> > >> > new java producer does all sends async so "waiting" on a response
>> > isn't
>> > >> > really a thing. For a client that lacks this, though, as some of
>> them
>> > do,
>> > >> > acks=0 will be a lot faster.
>> > >> >
>> > >> > It also makes some sense in terms of what is completed when the
>> > request
>> > >> is
>> > >> > considered satisfied
>> > >> >   acks = 0 - message is written to the network (buffer)
>> > >> >   acks = 1 - message is written to the leader log
>> > >> >   acks = -1 - message is committed
>> > >> >
>> > >> > -Jay
>> > >> >
>> > >> > On Sat, Jul 18, 2015 at 10:50 PM, Gwen Shapira <
>> gshapira@cloudera.com
>> > >
>> > >> > wrote:
>> > >> >
>> > >> >> Hi,
>> > >> >>
>> > >> >> I was looking into the different between acks = 0 and acks = 1 in
>> the
>> > >> >> new producer, and was a bit surprised at what I found.
>> > >> >>
>> > >> >> Basically, if I understand correctly, the only difference is that
>> > with
>> > >> >> acks = 0, if the leader fails to append locally, it closes the
>> > network
>> > >> >> connection silently and with acks = 1, it sends an actual error
>> > >> >> message.
>> > >> >>
>> > >> >> Which seems to mean that with acks = 0, any failed produce will
>> lead
>> > >> >> to metadata refresh and a retry (because network error), while
>> acks =
>> > >> >> 1 will lead to either retries or abort, depending on the error.
>> > >> >>
>> > >> >> Not only this doesn't match the documentation, it doesn't even make
>> > >> >> much sense...
>> > >> >> "acks = 0" was supposed to somehow makes things "less safe but
>> > >> >> faster", and it doesn't seem to be doing that any more. I'm not
>> even
>> > >> >> sure there's any case where the "acks = 0" behavior is desirable.
>> > >> >>
>> > >> >> Is it my misunderstanding, or did we somehow screw up the logic
>> here?
>> > >> >>
>> > >> >> Gwen
>> > >> >>
>> > >>
>> > >
>> > >
>> > >
>> > > --
>> > > Thanks,
>> > > Ewen
>> >
>>
>>
>>
>> --
>> -Regards,
>> Mayuresh R. Gharat
>> (862) 250-7125
>>
>
>
>
> --
> -- Guozhang

Re: New Producer and "acks" configuration

Posted by Guozhang Wang <wa...@gmail.com>.
I think there is still a subtle difference between "async with acks = 0"
and "async with callback", that when the #.max-inflight-requests has
reached the subsequent requests cannot be sent until previous responses are
returned (which could happen, for example, when the broker is slow /
network issue happens) in the second case but not in the first.

Given this difference, I feel there are still scenarios, though probably
rare, that users would like to use "acks = 0" even with new producer's
callbacks.

Guozhang

On Mon, Jul 27, 2015 at 9:25 AM, Mayuresh Gharat <gharatmayuresh15@gmail.com
> wrote:

> So basically this means that with acks = 0, their is no guarantee that the
> message has been received by Kafka broker. I am just wondering, why would
> anyone be using acks = 0, since anyone using kafka and doing
> producer.send() would want that, their message got to kafka brokers. Also
> as Jay said, with new producer with async mode, clients will not have to
> wait for the response since it will be handled in callbacks. So the use of
> acks = 0 sounds very rare to me and I am not able to think of an usecase
> around it.
>
> Thanks,
>
> Mayuresh
>
> On Sun, Jul 26, 2015 at 2:40 PM, Gwen Shapira <gs...@cloudera.com>
> wrote:
>
> > Aha! Yes, I was missing the part with the dummy response.
> > Thank you!
> >
> > Gwen
> >
> >
> > On Sun, Jul 26, 2015 at 2:17 PM, Ewen Cheslack-Postava
> > <ew...@confluent.io> wrote:
> > > It's different because it changes whether the client waits for the
> > response
> > > from the broker at all. Take a look at
> > NetworkClient.handleCompletedSends,
> > > which fills in dummy responses when a response is not expected (and
> that
> > > flag gets set via Sender.produceRequest using acks != 0 as a flag to
> > > ClientRequest). This means that the producer will invoke the callback &
> > > resolve the future as soon as the request hits the TCP buffer on the
> > > client. At that point, the behavior of the broker wrt generating a
> > response
> > > doesn't matter -- the client isn't waiting on that response anyway.
> > >
> > > This definitely is faster since you aren't waiting for the round trip,
> > but
> > > it seems like it is of questionable value with the new producer as Jay
> > > explained. It is slightly better than just assuming records have been
> > sent
> > > as soon as you call Producer.send() in this shouldn't trigger a
> callback
> > > until the records have made it through the internal KafkaProducer
> > > buffering. But since it still has to make it through the TCP buffers it
> > > doesn't really guarantee anything that useful.
> > >
> > > -Ewen
> > >
> > >
> > > On Sun, Jul 26, 2015 at 1:40 PM, Gwen Shapira <gs...@cloudera.com>
> > wrote:
> > >
> > >> What bugs me is that even with acks = 0, the broker will append to
> > >> local log before responding (unless I'm misreading the code), so I
> > >> don't see why a client with acks = 0 will be any faster. Unless the
> > >> client chooses to not wait for response, which is orthogonal to acks
> > >> parameter.
> > >>
> > >> On Mon, Jul 20, 2015 at 8:52 AM, Jay Kreps <ja...@confluent.io> wrote:
> > >> > acks=0 is a one-way send, the client doesn't need to wait on the
> > >> response.
> > >> > Whether this is useful sort of depends on the client implementation.
> > The
> > >> > new java producer does all sends async so "waiting" on a response
> > isn't
> > >> > really a thing. For a client that lacks this, though, as some of
> them
> > do,
> > >> > acks=0 will be a lot faster.
> > >> >
> > >> > It also makes some sense in terms of what is completed when the
> > request
> > >> is
> > >> > considered satisfied
> > >> >   acks = 0 - message is written to the network (buffer)
> > >> >   acks = 1 - message is written to the leader log
> > >> >   acks = -1 - message is committed
> > >> >
> > >> > -Jay
> > >> >
> > >> > On Sat, Jul 18, 2015 at 10:50 PM, Gwen Shapira <
> gshapira@cloudera.com
> > >
> > >> > wrote:
> > >> >
> > >> >> Hi,
> > >> >>
> > >> >> I was looking into the different between acks = 0 and acks = 1 in
> the
> > >> >> new producer, and was a bit surprised at what I found.
> > >> >>
> > >> >> Basically, if I understand correctly, the only difference is that
> > with
> > >> >> acks = 0, if the leader fails to append locally, it closes the
> > network
> > >> >> connection silently and with acks = 1, it sends an actual error
> > >> >> message.
> > >> >>
> > >> >> Which seems to mean that with acks = 0, any failed produce will
> lead
> > >> >> to metadata refresh and a retry (because network error), while
> acks =
> > >> >> 1 will lead to either retries or abort, depending on the error.
> > >> >>
> > >> >> Not only this doesn't match the documentation, it doesn't even make
> > >> >> much sense...
> > >> >> "acks = 0" was supposed to somehow makes things "less safe but
> > >> >> faster", and it doesn't seem to be doing that any more. I'm not
> even
> > >> >> sure there's any case where the "acks = 0" behavior is desirable.
> > >> >>
> > >> >> Is it my misunderstanding, or did we somehow screw up the logic
> here?
> > >> >>
> > >> >> Gwen
> > >> >>
> > >>
> > >
> > >
> > >
> > > --
> > > Thanks,
> > > Ewen
> >
>
>
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125
>



-- 
-- Guozhang

Re: New Producer and "acks" configuration

Posted by Mayuresh Gharat <gh...@gmail.com>.
So basically this means that with acks = 0, their is no guarantee that the
message has been received by Kafka broker. I am just wondering, why would
anyone be using acks = 0, since anyone using kafka and doing
producer.send() would want that, their message got to kafka brokers. Also
as Jay said, with new producer with async mode, clients will not have to
wait for the response since it will be handled in callbacks. So the use of
acks = 0 sounds very rare to me and I am not able to think of an usecase
around it.

Thanks,

Mayuresh

On Sun, Jul 26, 2015 at 2:40 PM, Gwen Shapira <gs...@cloudera.com> wrote:

> Aha! Yes, I was missing the part with the dummy response.
> Thank you!
>
> Gwen
>
>
> On Sun, Jul 26, 2015 at 2:17 PM, Ewen Cheslack-Postava
> <ew...@confluent.io> wrote:
> > It's different because it changes whether the client waits for the
> response
> > from the broker at all. Take a look at
> NetworkClient.handleCompletedSends,
> > which fills in dummy responses when a response is not expected (and that
> > flag gets set via Sender.produceRequest using acks != 0 as a flag to
> > ClientRequest). This means that the producer will invoke the callback &
> > resolve the future as soon as the request hits the TCP buffer on the
> > client. At that point, the behavior of the broker wrt generating a
> response
> > doesn't matter -- the client isn't waiting on that response anyway.
> >
> > This definitely is faster since you aren't waiting for the round trip,
> but
> > it seems like it is of questionable value with the new producer as Jay
> > explained. It is slightly better than just assuming records have been
> sent
> > as soon as you call Producer.send() in this shouldn't trigger a callback
> > until the records have made it through the internal KafkaProducer
> > buffering. But since it still has to make it through the TCP buffers it
> > doesn't really guarantee anything that useful.
> >
> > -Ewen
> >
> >
> > On Sun, Jul 26, 2015 at 1:40 PM, Gwen Shapira <gs...@cloudera.com>
> wrote:
> >
> >> What bugs me is that even with acks = 0, the broker will append to
> >> local log before responding (unless I'm misreading the code), so I
> >> don't see why a client with acks = 0 will be any faster. Unless the
> >> client chooses to not wait for response, which is orthogonal to acks
> >> parameter.
> >>
> >> On Mon, Jul 20, 2015 at 8:52 AM, Jay Kreps <ja...@confluent.io> wrote:
> >> > acks=0 is a one-way send, the client doesn't need to wait on the
> >> response.
> >> > Whether this is useful sort of depends on the client implementation.
> The
> >> > new java producer does all sends async so "waiting" on a response
> isn't
> >> > really a thing. For a client that lacks this, though, as some of them
> do,
> >> > acks=0 will be a lot faster.
> >> >
> >> > It also makes some sense in terms of what is completed when the
> request
> >> is
> >> > considered satisfied
> >> >   acks = 0 - message is written to the network (buffer)
> >> >   acks = 1 - message is written to the leader log
> >> >   acks = -1 - message is committed
> >> >
> >> > -Jay
> >> >
> >> > On Sat, Jul 18, 2015 at 10:50 PM, Gwen Shapira <gshapira@cloudera.com
> >
> >> > wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> I was looking into the different between acks = 0 and acks = 1 in the
> >> >> new producer, and was a bit surprised at what I found.
> >> >>
> >> >> Basically, if I understand correctly, the only difference is that
> with
> >> >> acks = 0, if the leader fails to append locally, it closes the
> network
> >> >> connection silently and with acks = 1, it sends an actual error
> >> >> message.
> >> >>
> >> >> Which seems to mean that with acks = 0, any failed produce will lead
> >> >> to metadata refresh and a retry (because network error), while acks =
> >> >> 1 will lead to either retries or abort, depending on the error.
> >> >>
> >> >> Not only this doesn't match the documentation, it doesn't even make
> >> >> much sense...
> >> >> "acks = 0" was supposed to somehow makes things "less safe but
> >> >> faster", and it doesn't seem to be doing that any more. I'm not even
> >> >> sure there's any case where the "acks = 0" behavior is desirable.
> >> >>
> >> >> Is it my misunderstanding, or did we somehow screw up the logic here?
> >> >>
> >> >> Gwen
> >> >>
> >>
> >
> >
> >
> > --
> > Thanks,
> > Ewen
>



-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125

Re: New Producer and "acks" configuration

Posted by Gwen Shapira <gs...@cloudera.com>.
Aha! Yes, I was missing the part with the dummy response.
Thank you!

Gwen


On Sun, Jul 26, 2015 at 2:17 PM, Ewen Cheslack-Postava
<ew...@confluent.io> wrote:
> It's different because it changes whether the client waits for the response
> from the broker at all. Take a look at NetworkClient.handleCompletedSends,
> which fills in dummy responses when a response is not expected (and that
> flag gets set via Sender.produceRequest using acks != 0 as a flag to
> ClientRequest). This means that the producer will invoke the callback &
> resolve the future as soon as the request hits the TCP buffer on the
> client. At that point, the behavior of the broker wrt generating a response
> doesn't matter -- the client isn't waiting on that response anyway.
>
> This definitely is faster since you aren't waiting for the round trip, but
> it seems like it is of questionable value with the new producer as Jay
> explained. It is slightly better than just assuming records have been sent
> as soon as you call Producer.send() in this shouldn't trigger a callback
> until the records have made it through the internal KafkaProducer
> buffering. But since it still has to make it through the TCP buffers it
> doesn't really guarantee anything that useful.
>
> -Ewen
>
>
> On Sun, Jul 26, 2015 at 1:40 PM, Gwen Shapira <gs...@cloudera.com> wrote:
>
>> What bugs me is that even with acks = 0, the broker will append to
>> local log before responding (unless I'm misreading the code), so I
>> don't see why a client with acks = 0 will be any faster. Unless the
>> client chooses to not wait for response, which is orthogonal to acks
>> parameter.
>>
>> On Mon, Jul 20, 2015 at 8:52 AM, Jay Kreps <ja...@confluent.io> wrote:
>> > acks=0 is a one-way send, the client doesn't need to wait on the
>> response.
>> > Whether this is useful sort of depends on the client implementation. The
>> > new java producer does all sends async so "waiting" on a response isn't
>> > really a thing. For a client that lacks this, though, as some of them do,
>> > acks=0 will be a lot faster.
>> >
>> > It also makes some sense in terms of what is completed when the request
>> is
>> > considered satisfied
>> >   acks = 0 - message is written to the network (buffer)
>> >   acks = 1 - message is written to the leader log
>> >   acks = -1 - message is committed
>> >
>> > -Jay
>> >
>> > On Sat, Jul 18, 2015 at 10:50 PM, Gwen Shapira <gs...@cloudera.com>
>> > wrote:
>> >
>> >> Hi,
>> >>
>> >> I was looking into the different between acks = 0 and acks = 1 in the
>> >> new producer, and was a bit surprised at what I found.
>> >>
>> >> Basically, if I understand correctly, the only difference is that with
>> >> acks = 0, if the leader fails to append locally, it closes the network
>> >> connection silently and with acks = 1, it sends an actual error
>> >> message.
>> >>
>> >> Which seems to mean that with acks = 0, any failed produce will lead
>> >> to metadata refresh and a retry (because network error), while acks =
>> >> 1 will lead to either retries or abort, depending on the error.
>> >>
>> >> Not only this doesn't match the documentation, it doesn't even make
>> >> much sense...
>> >> "acks = 0" was supposed to somehow makes things "less safe but
>> >> faster", and it doesn't seem to be doing that any more. I'm not even
>> >> sure there's any case where the "acks = 0" behavior is desirable.
>> >>
>> >> Is it my misunderstanding, or did we somehow screw up the logic here?
>> >>
>> >> Gwen
>> >>
>>
>
>
>
> --
> Thanks,
> Ewen

Re: New Producer and "acks" configuration

Posted by Ewen Cheslack-Postava <ew...@confluent.io>.
It's different because it changes whether the client waits for the response
from the broker at all. Take a look at NetworkClient.handleCompletedSends,
which fills in dummy responses when a response is not expected (and that
flag gets set via Sender.produceRequest using acks != 0 as a flag to
ClientRequest). This means that the producer will invoke the callback &
resolve the future as soon as the request hits the TCP buffer on the
client. At that point, the behavior of the broker wrt generating a response
doesn't matter -- the client isn't waiting on that response anyway.

This definitely is faster since you aren't waiting for the round trip, but
it seems like it is of questionable value with the new producer as Jay
explained. It is slightly better than just assuming records have been sent
as soon as you call Producer.send() in this shouldn't trigger a callback
until the records have made it through the internal KafkaProducer
buffering. But since it still has to make it through the TCP buffers it
doesn't really guarantee anything that useful.

-Ewen


On Sun, Jul 26, 2015 at 1:40 PM, Gwen Shapira <gs...@cloudera.com> wrote:

> What bugs me is that even with acks = 0, the broker will append to
> local log before responding (unless I'm misreading the code), so I
> don't see why a client with acks = 0 will be any faster. Unless the
> client chooses to not wait for response, which is orthogonal to acks
> parameter.
>
> On Mon, Jul 20, 2015 at 8:52 AM, Jay Kreps <ja...@confluent.io> wrote:
> > acks=0 is a one-way send, the client doesn't need to wait on the
> response.
> > Whether this is useful sort of depends on the client implementation. The
> > new java producer does all sends async so "waiting" on a response isn't
> > really a thing. For a client that lacks this, though, as some of them do,
> > acks=0 will be a lot faster.
> >
> > It also makes some sense in terms of what is completed when the request
> is
> > considered satisfied
> >   acks = 0 - message is written to the network (buffer)
> >   acks = 1 - message is written to the leader log
> >   acks = -1 - message is committed
> >
> > -Jay
> >
> > On Sat, Jul 18, 2015 at 10:50 PM, Gwen Shapira <gs...@cloudera.com>
> > wrote:
> >
> >> Hi,
> >>
> >> I was looking into the different between acks = 0 and acks = 1 in the
> >> new producer, and was a bit surprised at what I found.
> >>
> >> Basically, if I understand correctly, the only difference is that with
> >> acks = 0, if the leader fails to append locally, it closes the network
> >> connection silently and with acks = 1, it sends an actual error
> >> message.
> >>
> >> Which seems to mean that with acks = 0, any failed produce will lead
> >> to metadata refresh and a retry (because network error), while acks =
> >> 1 will lead to either retries or abort, depending on the error.
> >>
> >> Not only this doesn't match the documentation, it doesn't even make
> >> much sense...
> >> "acks = 0" was supposed to somehow makes things "less safe but
> >> faster", and it doesn't seem to be doing that any more. I'm not even
> >> sure there's any case where the "acks = 0" behavior is desirable.
> >>
> >> Is it my misunderstanding, or did we somehow screw up the logic here?
> >>
> >> Gwen
> >>
>



-- 
Thanks,
Ewen

Re: New Producer and "acks" configuration

Posted by Gwen Shapira <gs...@cloudera.com>.
What bugs me is that even with acks = 0, the broker will append to
local log before responding (unless I'm misreading the code), so I
don't see why a client with acks = 0 will be any faster. Unless the
client chooses to not wait for response, which is orthogonal to acks
parameter.

On Mon, Jul 20, 2015 at 8:52 AM, Jay Kreps <ja...@confluent.io> wrote:
> acks=0 is a one-way send, the client doesn't need to wait on the response.
> Whether this is useful sort of depends on the client implementation. The
> new java producer does all sends async so "waiting" on a response isn't
> really a thing. For a client that lacks this, though, as some of them do,
> acks=0 will be a lot faster.
>
> It also makes some sense in terms of what is completed when the request is
> considered satisfied
>   acks = 0 - message is written to the network (buffer)
>   acks = 1 - message is written to the leader log
>   acks = -1 - message is committed
>
> -Jay
>
> On Sat, Jul 18, 2015 at 10:50 PM, Gwen Shapira <gs...@cloudera.com>
> wrote:
>
>> Hi,
>>
>> I was looking into the different between acks = 0 and acks = 1 in the
>> new producer, and was a bit surprised at what I found.
>>
>> Basically, if I understand correctly, the only difference is that with
>> acks = 0, if the leader fails to append locally, it closes the network
>> connection silently and with acks = 1, it sends an actual error
>> message.
>>
>> Which seems to mean that with acks = 0, any failed produce will lead
>> to metadata refresh and a retry (because network error), while acks =
>> 1 will lead to either retries or abort, depending on the error.
>>
>> Not only this doesn't match the documentation, it doesn't even make
>> much sense...
>> "acks = 0" was supposed to somehow makes things "less safe but
>> faster", and it doesn't seem to be doing that any more. I'm not even
>> sure there's any case where the "acks = 0" behavior is desirable.
>>
>> Is it my misunderstanding, or did we somehow screw up the logic here?
>>
>> Gwen
>>

Re: New Producer and "acks" configuration

Posted by Jay Kreps <ja...@confluent.io>.
acks=0 is a one-way send, the client doesn't need to wait on the response.
Whether this is useful sort of depends on the client implementation. The
new java producer does all sends async so "waiting" on a response isn't
really a thing. For a client that lacks this, though, as some of them do,
acks=0 will be a lot faster.

It also makes some sense in terms of what is completed when the request is
considered satisfied
  acks = 0 - message is written to the network (buffer)
  acks = 1 - message is written to the leader log
  acks = -1 - message is committed

-Jay

On Sat, Jul 18, 2015 at 10:50 PM, Gwen Shapira <gs...@cloudera.com>
wrote:

> Hi,
>
> I was looking into the different between acks = 0 and acks = 1 in the
> new producer, and was a bit surprised at what I found.
>
> Basically, if I understand correctly, the only difference is that with
> acks = 0, if the leader fails to append locally, it closes the network
> connection silently and with acks = 1, it sends an actual error
> message.
>
> Which seems to mean that with acks = 0, any failed produce will lead
> to metadata refresh and a retry (because network error), while acks =
> 1 will lead to either retries or abort, depending on the error.
>
> Not only this doesn't match the documentation, it doesn't even make
> much sense...
> "acks = 0" was supposed to somehow makes things "less safe but
> faster", and it doesn't seem to be doing that any more. I'm not even
> sure there's any case where the "acks = 0" behavior is desirable.
>
> Is it my misunderstanding, or did we somehow screw up the logic here?
>
> Gwen
>