Posted to users@kafka.apache.org by Michael Luban <ml...@gmail.com> on 2016/07/11 13:18:21 UTC

Duplicates consumed on rebalance. No compression, autocommit enabled.

Using the 0.8.2.1 client.

Is it possible to statistically minimize the possibility of duplication in
this scenario or has this behavior been corrected in a later client
version?  Or is the test flawed?

https://gist.github.com/mluban/03a5c0d9221182e6ddbc37189c4d3eb0

Re: Duplicates consumed on rebalance. No compression, autocommit enabled.

Posted by Ewen Cheslack-Postava <ew...@confluent.io>.
I'd suggest using the new consumer instead of the old consumer. We've
refined the implementation so that even with auto-commit you should get
at-least-once processing in the worst case (and exactly-once when there are
no failures). The 0.10.0.0 release should get all of these semantics right.

-Ewen
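
For reference, a minimal sketch of the settings the new Java consumer
(0.9 and later) takes with auto-commit left on. The property keys are the
new consumer's own. The broker address, group id, and 1-second commit
interval are placeholder assumptions, not values from this thread. The
block only builds the Properties object so that it compiles without
kafka-clients on the classpath.

```java
import java.util.Properties;

final class NewConsumerConfigSketch {

    // Builds settings for the new Java consumer
    // (org.apache.kafka.clients.consumer.KafkaConsumer) with auto-commit enabled.
    static Properties newConsumerProps(String bootstrapServers,
                                       String groupId,
                                       long autoCommitIntervalMs) {
        Properties props = new Properties();
        // The new consumer talks to the brokers directly, not to ZooKeeper.
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        props.setProperty("enable.auto.commit", "true");
        props.setProperty("auto.commit.interval.ms", Long.toString(autoCommitIntervalMs));
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values, chosen only for illustration.
        Properties props = newConsumerProps("localhost:9092", "example-group", 1000L);
        // With kafka-clients available, these properties would be passed to
        // new KafkaConsumer<String, String>(props).
        props.stringPropertyNames().stream().sorted()
                .forEach(k -> System.out.println(k + "=" + props.getProperty(k)));
    }
}
```

Even with these settings the guarantee described above is at-least-once, so
processing that cannot tolerate a repeated message still needs to be
idempotent.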

On Mon, Jul 11, 2016 at 7:05 AM, Gerard Klijs <ge...@dizzit.com>
wrote:

> You could set auto.commit.interval.ms to a lower value. In your example it
> is 10 seconds, which can be a lot of messages. I don't really see how it
> could be prevented any further, since offsets can only be committed by a
> consumer for the partitions it is assigned. I do believe there is some
> work in progress to make the assignment of partitions to consumers
> somewhat sticky. In that case, when a consumer is assigned the same
> partitions after the rebalance as it had before, it should not be
> necessary to consume the same data again in those partitions.

Re: Duplicates consumed on rebalance. No compression, autocommit enabled.

Posted by Gerard Klijs <ge...@dizzit.com>.
You could set auto.commit.interval.ms to a lower value. In your example it
is 10 seconds, which can be a lot of messages. I don't really see how it
could be prevented any further, since offsets can only be committed by a
consumer for the partitions it is assigned. I do believe there is some
work in progress to make the assignment of partitions to consumers
somewhat sticky. In that case, when a consumer is assigned the same
partitions after the rebalance as it had before, it should not be
necessary to consume the same data again in those partitions.
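
To put a rough number on "a lot of messages", here is a small sketch for
the old (0.8.x) high-level consumer. The property keys are the old
consumer's own. The ZooKeeper address, group id, and the rate of 5,000
messages per second are hypothetical, chosen only to make the arithmetic
concrete. The estimate assumes a steady consumption rate and that
everything consumed since the last auto-commit is delivered again, so
treat it as an approximate upper bound rather than a guarantee.

```java
import java.util.Properties;

final class AutoCommitWindowSketch {

    // Settings for the old high-level consumer with a configurable commit interval.
    static Properties oldConsumerProps(String zookeeperConnect,
                                       String groupId,
                                       long autoCommitIntervalMs) {
        Properties props = new Properties();
        props.setProperty("zookeeper.connect", zookeeperConnect);
        props.setProperty("group.id", groupId);
        props.setProperty("auto.commit.enable", "true");
        props.setProperty("auto.commit.interval.ms", Long.toString(autoCommitIntervalMs));
        return props;
    }

    // Approximate upper bound on messages consumed since the last auto-commit,
    // i.e. the messages that could be seen again if that progress is lost.
    static long maxUncommittedMessages(long messagesPerSecond, long autoCommitIntervalMs) {
        return messagesPerSecond * autoCommitIntervalMs / 1000L;
    }

    public static void main(String[] args) {
        long rate = 5_000L; // hypothetical messages per second
        System.out.println("10 s interval: " + maxUncommittedMessages(rate, 10_000L));
        System.out.println(" 1 s interval: " + maxUncommittedMessages(rate, 1_000L));

        // Placeholder connection values, for illustration only.
        Properties props = oldConsumerProps("localhost:2181", "example-group", 1_000L);
        System.out.println(props.getProperty("auto.commit.interval.ms"));
    }
}
```

A shorter interval shrinks the window in proportion, at the cost of more
frequent offset commits.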
