Posted to users@kafka.apache.org by Yang <te...@gmail.com> on 2013/09/04 08:06:56 UTC

Re: at-least-once guarantee?

Thanks, Jay.

After setting up a Kafka instance and running some examples, I think I
understand it better now. As the paper points out, Kafka made a design
choice that keeps the problem simple: "at least once" delivery really
means "at least once, given sufficiently long time and sufficiently
many tries". In fact, as the paper says, the broker never bothers to
remember a consumer's offset, let alone whether the consumer succeeded
in reading the data; it assumes that, within the retention period, the
client will eventually succeed. Seen from the other side, the consumer
has to take on the responsibility of ensuring "at least once
consumption" (if the application requires it), by means such as
atomically committing the offset together with the data, as you
mentioned.
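
To make that concrete, here is a minimal sketch of the "commit the
offset only after processing" pattern. The consumer API usage is just
for illustration; the bootstrap server, group id, topic name, and the
process() step are placeholders of mine, not something from this
thread:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class AtLeastOnceConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "demo-group");
            props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            // Turn off auto-commit: the offset must only advance after
            // processing succeeds. A crash before commitSync() means
            // the records are redelivered -- hence "at least once".
            props.put("enable.auto.commit", "false");

            try (KafkaConsumer<String, String> consumer =
                     new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("demo-topic"));
                while (true) {
                    ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        process(record); // may run twice for a record
                    }
                    consumer.commitSync(); // commit only after processing
                }
            }
        }

        static void process(ConsumerRecord<String, String> record) {
            System.out.printf("offset=%d value=%s%n",
                record.offset(), record.value());
        }
    }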

Regards
Yang


On Wed, Aug 7, 2013 at 4:26 PM, Jay Kreps <ja...@gmail.com> wrote:

> Yeah I'm not sure how good our understanding was when we wrote that.
>
> Here is my take now:
>
> At-least-once delivery is not that hard, but you need the ability to
> deduplicate things--basically you turn the "at least once" delivery
> channel into an "exactly once" channel by throwing away duplicates.
> This means:
> 1. Some key assigned by the producer that allows the broker to detect a
> re-published message and make publishing idempotent. This solves the
> problem of producer retries. This key obviously has to be highly
> available--i.e. if the leader for a partition fails, the follower must
> correctly deduplicate all committed messages.
> 2. Some key that allows the consumer to detect a re-consumed message.
>
> The first item is actually pretty doable as we can track some producer
> sequence in the log and use it to avoid duplicate appends. We just need to
> implement it. I think this can be done in a way that is fairly low overhead
> and can be "on by default".
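>
> Very roughly, the bookkeeping for the first item could look like the
> sketch below. This is only an illustration of the idea, not actual
> broker code; the producerId/sequence fields are hypothetical, and a
> real implementation would also have to replicate this state so a
> follower can keep deduplicating after a leader failure:
>
>     import java.util.HashMap;
>     import java.util.Map;
>
>     // Sketch: drop a producer retry if we have already appended a
>     // message with an equal or higher sequence number from it.
>     class DedupLog {
>         private final Map<String, Long> lastSeqByProducer =
>             new HashMap<>();
>
>         /** Returns true if appended, false if it was a duplicate. */
>         synchronized boolean append(String producerId, long sequence,
>                                     byte[] message) {
>             Long last = lastSeqByProducer.get(producerId);
>             if (last != null && sequence <= last) {
>                 return false; // retry of an already-appended message
>             }
>             // ... write the message to the log here ...
>             lastSeqByProducer.put(producerId, sequence);
>             return true;
>         }
>     }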
>
> We already provide such a key to the consumer--the offset. Making use
> of it is somewhat application dependent. Obviously, providing
> exactly-once guarantees in the case of no failures is easy, and we
> already handle that case. The harder part is ensuring that, if a
> consumer process dies, it restarts in a position that exactly matches
> the state changes it has made in some destination system. If the
> consumer application uses the offset in a way that makes updates
> idempotent, that works; if it commits its offset and data atomically,
> that also works. However, in general the goal of a consumer is to
> produce some state change in another system (a db, hdfs, some other
> data system, etc.), and having a general solution that works with all
> of these is hard since they have very different limitations and
> features.
>
> -Jay
>
>
> On Wed, Aug 7, 2013 at 4:00 PM, Yang <te...@gmail.com> wrote:
>
> > I wonder why an at-least-once guarantee is easier to maintain than
> > exactly-once (in that the latter requires 2PC while the former does
> > not, according to
> >
> > http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
> > ).
> >
> > If you achieve an at-least-once guarantee, you are able to
> > distinguish between the two cases "nothing delivered" vs ">=1
> > delivered", which can be seen as two different answers, 0 and 1.
> > Isn't this as hard as the classic Byzantine generals problem?
> >
> > Thanks
> > Yang
> >
>