Posted to users@kafka.apache.org by Adam Kunicki <ad...@streamsets.com> on 2016/02/01 22:35:02 UTC

Kafka Committed Offset Behavior off by 1

Hi,

I've been noticing that a restarted consumer in 0.9 will start consuming
from the last committed offset (inclusive). This means that any restarted
consumer will get the last read (and committed) message causing a duplicate
each time the consumer is restarted from the same position if there have
been no new messages.

Per:
http://www.confluent.io/blog/tutorial-getting-started-with-the-new-apache-kafka-0.9-consumer-client
it seems like this is the intended behavior.

Can anyone confirm this? If this is the case, how are we expected to handle
these duplicate messages?

-Adam
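One common way to handle the duplicates described above is at-least-once deduplication on the consumer side: remember the highest offset already processed per partition and skip anything at or below it. This is an editor's sketch, not from the thread; the class and method names (`OffsetDedup`, `processIfNew`) are illustrative, not Kafka APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of consumer-side dedup for at-least-once delivery: track the
// highest offset processed per partition, skip replayed records.
public class OffsetDedup {
    private final Map<Integer, Long> lastProcessed = new HashMap<>();

    // Returns true if the record at (partition, offset) has not been
    // seen yet; false if it is a duplicate from a replayed position.
    public boolean processIfNew(int partition, long offset) {
        long seen = lastProcessed.getOrDefault(partition, -1L);
        if (offset <= seen) {
            return false; // already processed before the restart
        }
        lastProcessed.put(partition, offset);
        return true;
    }

    public static void main(String[] args) {
        OffsetDedup dedup = new OffsetDedup();
        System.out.println(dedup.processIfNew(0, 5L)); // first time: true
        System.out.println(dedup.processIfNew(0, 5L)); // replayed: false
    }
}
```

The same idea works with any per-partition watermark store (in memory, or persisted alongside the processed results for exactly-once effects).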

Re: Kafka Committed Offset Behavior off by 1

Posted by Gwen Shapira <gw...@confluent.io>.
Assigned to you :)

On Mon, Feb 1, 2016 at 10:46 PM, Adam Kunicki <ad...@streamsets.com> wrote:

> Done: https://issues.apache.org/jira/browse/KAFKA-3191
> <
> https://mailtrack.io/trace/link/3bdac0fc9ba873124166ced41257394c37af6bdd?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FKAFKA-3191&signature=7188d1843f83499b
> >
> Feel free to assign it to me (wasn't able to do that myself)
>
> On Mon, Feb 1, 2016 at 9:55 PM, Gwen Shapira <gw...@confluent.io> wrote:
>
> > This is the second time I see this complaint, so we could probably make
> the
> > API docs clearer.
> >
> > Adam, feel like submitting a JIRA?
> >
> > On Mon, Feb 1, 2016 at 3:34 PM, Adam Kunicki <ad...@streamsets.com>
> wrote:
> >
> > > Thanks, actually found this out per:
> > >
> > >
> >
> http://www.confluent.io/blog/tutorial-getting-started-with-the-new-apache-kafka-0.9-consumer-client
> > > <
> > >
> >
> https://mailtrack.io/trace/link/f2e80a9ef7bfbabfc3e6f8951266d07b52051751?url=http%3A%2F%2Fwww.confluent.io%2Fblog%2Ftutorial-getting-started-with-the-new-apache-kafka-0.9-consumer-client&signature=44a0b68933da863e
> > > >
> > >
> > > from TFA:
> > >
> > > consumer.commitSync(Collections.singletonMap(record.partition(), new
> > > OffsetAndMetadata(record.offset() + 1)));
> > >
> > > The committed offset should always be the offset of the next message
> that
> > > your application will read.
> > >
> > > Wish this was a bit clearer in the API docs :)
> > >
> > > On Mon, Feb 1, 2016 at 1:52 PM, Dana Powers <da...@gmail.com>
> > wrote:
> > >
> > > > The committed offset is actually the next message to consume, not the
> > > last
> > > > message consumed. So that sounds like expected behavior to me. The
> > > consumer
> > > > code handles this internally, but if you write code to commit offsets
> > > > manually, it can be a gotcha.
> > > >
> > > > -Dana
> > > >
> > > > On Mon, Feb 1, 2016 at 1:35 PM, Adam Kunicki <ad...@streamsets.com>
> > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I've been noticing that a restarted consumer in 0.9 will start
> > > consuming
> > > > > from the last committed offset (inclusive). This means that any
> > > restarted
> > > > > consumer will get the last read (and committed) message causing a
> > > > duplicate
> > > > > each time the consumer is restarted from the same position if there
> > > have
> > > > > been no new messages.
> > > > >
> > > > > Per:
> > > > >
> > > > >
> > > >
> > >
> >
> http://www.confluent.io/blog/tutorial-getting-started-with-the-new-apache-kafka-0.9-consumer-client
> > > > <
> > >
> >
> https://mailtrack.io/trace/link/f2e80a9ef7bfbabfc3e6f8951266d07b52051751?url=http%3A%2F%2Fwww.confluent.io%2Fblog%2Ftutorial-getting-started-with-the-new-apache-kafka-0.9-consumer-client&signature=44a0b68933da863e
> > > >
> > > > > <
> > > > >
> > > >
> > >
> >
> https://mailtrack.io/trace/link/9853c5856f2b5862212148c1a969575c970a3dcc?url=http%3A%2F%2Fwww.confluent.io%2Fblog%2Ftutorial-getting-started-with-the-new-apache-kafka-0.9-consumer-client&signature=63a1a40b88347844
> > > > > >
> > > > > this seems like that is the intended behavior.
> > > > >
> > > > > Can anyone confirm this? If this is the case how are we expected to
> > > > handle
> > > > > these duplicated messages?
> > > > >
> > > > > -Adam
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Adam Kunicki
> > > StreamSets | Field Engineer
> > > mobile: 415.890.DATA (3282) | linkedin
> > > <
> > >
> >
> https://mailtrack.io/trace/link/50832933390e909694a7f2157c5d640476609cd1?url=http%3A%2F%2Fwww.adamkunicki.com&signature=c5598df83da6c7fa
> > > >
> > >
> >
>
>
>
> --
> Adam Kunicki
> StreamSets | Field Engineer
> mobile: 415.890.DATA (3282) | linkedin
> <
> https://mailtrack.io/trace/link/a7f1302905f447f07ec44b6eaaaf95463a26b7ea?url=http%3A%2F%2Fwww.adamkunicki.com&signature=c3965698083a715b
> >
>

Re: Kafka Committed Offset Behavior off by 1

Posted by Adam Kunicki <ad...@streamsets.com>.
Done: https://issues.apache.org/jira/browse/KAFKA-3191
Feel free to assign it to me (wasn't able to do that myself)




-- 
Adam Kunicki
StreamSets | Field Engineer
mobile: 415.890.DATA (3282) | linkedin
<http://www.adamkunicki.com>

Re: Kafka Committed Offset Behavior off by 1

Posted by Gwen Shapira <gw...@confluent.io>.
This is the second time I've seen this complaint, so we could probably make
the API docs clearer.

Adam, feel like submitting a JIRA?


Re: Kafka Committed Offset Behavior off by 1

Posted by Adam Kunicki <ad...@streamsets.com>.
Thanks, actually found this out per:
http://www.confluent.io/blog/tutorial-getting-started-with-the-new-apache-kafka-0.9-consumer-client

from TFA:

consumer.commitSync(Collections.singletonMap(record.partition(),
    new OffsetAndMetadata(record.offset() + 1)));

The committed offset should always be the offset of the next message that
your application will read.

Wish this was a bit clearer in the API docs :)





Re: Kafka Committed Offset Behavior off by 1

Posted by Dana Powers <da...@gmail.com>.
The committed offset is actually the next message to consume, not the last
message consumed. So that sounds like expected behavior to me. The consumer
code handles this internally, but if you write code to commit offsets
manually, it can be a gotcha.

-Dana
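Dana's point, that the committed offset is the position of the *next* record to read rather than the last one consumed, can be simulated without a broker. This is an editor's sketch of the semantics only; `CommitOffsetDemo` and `resume` are illustrative names, not Kafka APIs.

```java
import java.util.List;

// Toy simulation of Kafka committed-offset semantics: a restarted
// consumer resumes reading at exactly the committed offset.
public class CommitOffsetDemo {
    static List<String> log = List.of("m0", "m1", "m2");

    // Resume from a committed offset and return the records that
    // would be re-delivered.
    static List<String> resume(int committedOffset) {
        return log.subList(committedOffset, log.size());
    }

    public static void main(String[] args) {
        int lastConsumed = 2; // we already processed m0, m1, m2

        // Gotcha: committing the offset of the last record consumed
        // replays that record after a restart.
        System.out.println(resume(lastConsumed));      // [m2]

        // Correct: commit lastConsumed + 1, the next offset to read.
        System.out.println(resume(lastConsumed + 1));  // []
    }
}
```

This is why the blog's snippet commits `record.offset() + 1` rather than `record.offset()`.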
