Posted to user@spark.apache.org by Stevo Slavić <ss...@gmail.com> on 2015/07/24 18:13:27 UTC

New consumer - offset one gets in poll is not offset one is supposed to commit

Hello Apache Kafka community,

Say there is only one topic with a single partition and a single message on
it.
Calling poll with the new consumer will return a ConsumerRecord for that
message, and it will have offset 0.

After processing the message, the current KafkaConsumer implementation expects
one to commit not offset 0, the offset just processed, but offset 1 - the next
offset/position one would like to consume.

Does this sound strange to you as well?

I wonder whether this offset+1 handling for the next position to read could
have been done in one place - in the KafkaConsumer implementation or the
broker or wherever - instead of every user of KafkaConsumer having to do it.

Kind regards,
Stevo Slavic.
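
For illustration only, a self-contained sketch of the arithmetic being
described (Record and commitOffsetFor are hypothetical stand-ins, not the
actual Kafka API): the offset to commit after processing a record is the
record's own offset plus one, i.e. the next position to read.

```java
import java.util.List;

public class OffsetSketch {
    // Minimal stand-in for a ConsumerRecord: just an offset.
    record Record(long offset) {}

    // The offset to commit after processing a record is the NEXT position
    // to read: the record's own offset plus one.
    static long commitOffsetFor(Record processed) {
        return processed.offset() + 1;
    }

    public static void main(String[] args) {
        // One partition, one message at offset 0, as in the example above.
        List<Record> polled = List.of(new Record(0));
        Record last = polled.get(polled.size() - 1);
        System.out.println("processed offset = " + last.offset());
        System.out.println("offset to commit = " + commitOffsetFor(last));
    }
}
```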

Re: New consumer - offset one gets in poll is not offset one is supposed to commit

Posted by Stevo Slavić <ss...@gmail.com>.
Sorry, wrong ML.

On Fri, Jul 24, 2015 at 7:07 PM, Cody Koeninger <co...@koeninger.org> wrote:

> Are you intending to be mailing the spark list or the kafka list?
>
> On Fri, Jul 24, 2015 at 11:56 AM, Stevo Slavić <ss...@gmail.com> wrote:
>
>> Hello Cody,
>>
>> I'm not sure we're talking about same thing.
>>
>> Since you're mentioning streams I guess you were referring to current
>> high level consumer, while I'm talking about new yet unreleased high level
>> consumer.
>>
>> Poll I was referring to is
>> https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java#L715
>>
>> ConsumerRecord offset I was referring to is
>> https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerRecord.java#L22
>>
>> poll returns ConsumerRecords, per TopicPartition collection of
>> ConsumerRecord. And in example I gave ConsumerRecord offset would be 0.
>>
>> Commit I was referring to is
>> https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java#L805
>>
>> After processing read ConsumerRecord, commit expects me to submit not
>> offset 0 but offset 1...
>>
>> Kind regards,
>> Stevo Slavic
>>
>> On Fri, Jul 24, 2015 at 6:31 PM, Cody Koeninger <co...@koeninger.org>
>> wrote:
>>
>>> Well... there are only 2 hard problems in computer science: naming
>>> things, cache invalidation, and off-by-one errors.
>>>
>>> The direct stream implementation isn't asking you to "commit" anything.
>>> It's asking you to provide a starting point for the stream on startup.
>>>
>>> Because offset ranges are inclusive start, exclusive end, it's pretty
>>> natural to use the end of the previous offset range as the beginning of the
>>> next.
>>>
>>>
>>> On Fri, Jul 24, 2015 at 11:13 AM, Stevo Slavić <ss...@gmail.com>
>>> wrote:
>>>
>>>> Hello Apache Kafka community,
>>>>
>>>> Say there is only one topic with single partition and a single message
>>>> on it.
>>>> Result of calling a poll with new consumer will return ConsumerRecord
>>>> for that message and it will have offset of 0.
>>>>
>>>> After processing message, current KafkaConsumer implementation expects
>>>> one to commit not offset 0 as processed, but to commit offset 1 - next
>>>> offset/position one would like to consume.
>>>>
>>>> Does this sound strange to you as well?
>>>>
>>>> Wondering couldn't this offset+1 handling for next position to read
>>>> been done in one place, in KafkaConsumer implementation or broker or
>>>> whatever, instead of every user of KafkaConsumer having to do it.
>>>>
>>>> Kind regards,
>>>> Stevo Slavic.
>>>>
>>>
>>>
>>
>

Re: New consumer - offset one gets in poll is not offset one is supposed to commit

Posted by Cody Koeninger <co...@koeninger.org>.
Are you intending to be mailing the spark list or the kafka list?


Re: New consumer - offset one gets in poll is not offset one is supposed to commit

Posted by Stevo Slavić <ss...@gmail.com>.
Hello Cody,

I'm not sure we're talking about the same thing.

Since you're mentioning streams, I guess you were referring to the current
high-level consumer, while I'm talking about the new, not yet released
high-level consumer.

The poll I was referring to is
https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java#L715

The ConsumerRecord offset I was referring to is
https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerRecord.java#L22

poll returns ConsumerRecords, a per-TopicPartition collection of
ConsumerRecord. In the example I gave, the ConsumerRecord offset would be 0.

The commit I was referring to is
https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java#L805

After processing the read ConsumerRecord, commit expects me to submit not
offset 0 but offset 1...

Kind regards,
Stevo Slavic


Re: New consumer - offset one gets in poll is not offset one is supposed to commit

Posted by Cody Koeninger <co...@koeninger.org>.
Well... there are only 2 hard problems in computer science: naming things,
cache invalidation, and off-by-one errors.

The direct stream implementation isn't asking you to "commit" anything.
It's asking you to provide a starting point for the stream on startup.

Because offset ranges are inclusive start, exclusive end, it's pretty
natural to use the end of the previous offset range as the beginning of the
next.
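
The inclusive-start, exclusive-end convention above can be sketched as
follows (OffsetRange here is a hypothetical stand-in, not Spark's actual
class): the stored end of one batch is directly usable as the start of the
next, with no +1 adjustment and no record skipped or re-read.

```java
public class RangeSketch {
    // Inclusive-start / exclusive-end range of offsets, [from, until).
    record OffsetRange(long fromInclusive, long untilExclusive) {
        // Number of records covered by this range.
        long count() { return untilExclusive - fromInclusive; }

        // The next batch starts exactly where this one ended.
        OffsetRange next(long newUntilExclusive) {
            return new OffsetRange(untilExclusive, newUntilExclusive);
        }
    }

    public static void main(String[] args) {
        // One message at offset 0: the first batch covers [0, 1).
        OffsetRange first = new OffsetRange(0, 1);
        // On restart, the stored end (1) is the starting point as-is.
        OffsetRange second = first.next(5);
        System.out.println("first  = [" + first.fromInclusive() + ", "
                + first.untilExclusive() + ")");
        System.out.println("second = [" + second.fromInclusive() + ", "
                + second.untilExclusive() + ")");
        System.out.println("records in first = " + first.count());
    }
}
```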

