You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@beam.apache.org by "Raghu Angadi (JIRA)" <ji...@apache.org> on 2017/10/24 00:14:00 UTC

[jira] [Comment Edited] (BEAM-3093) add an option 'FirstPollOffsetStrategy' to KafkaIO

    [ https://issues.apache.org/jira/browse/BEAM-3093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16216107#comment-16216107 ] 

Raghu Angadi edited comment on BEAM-3093 at 10/24/17 12:13 AM:
---------------------------------------------------------------

(3) is same as setting "auto.offset.reset" Kafka consumer config to "earliest".
(4) is the default behavior. Same as setting ""auto.offset.reset" config to "latest".

It is better not to mix 'checkpointed' offsets and 'auto_committed' offsets. If you restart a job from scratch, 'checkpointed' offsets are discarded out as well.

(1) & (2) might be useful in the case of 'auto_committed' offsets. User can always remove auto_committed offsets in Kafka through admin commands. In that sense, (1) is same as '(3) with auto committed offsets reset.'. In fact, resetting these offsets using Kafka admin commands gives you much better control on where you want to start processing. E.g. you could resume from 24 hours ago rather then from max retention period for Kafka.





was (Author: rangadi):
(3) is same as setting "auto.offset.reset" Kafka consumer config to "earliest".
(4) is the default behavior. Same as setting ""auto.offset.reset" config to "latest".

It is better not to mix 'checkpointed' offsets and 'auto_committed' offsets. If you restart a job scratch, 'checkpointed' offsets are thrown out as well.

(1) & (2) might be useful in the case of 'auto_committed' offsets. User can always remove auto_committed offsets in Kafka through admin commands. In that sense, (1) is same as '(3) with auto committed offsets reset.'. In fact, resetting these offsets using Kafka admin commands gives you much better control on where you want to start processing. E.g. you could resume from 24 hours ago rather then from max retention period for Kafka.




> add an option 'FirstPollOffsetStrategy' to KafkaIO
> --------------------------------------------------
>
>                 Key: BEAM-3093
>                 URL: https://issues.apache.org/jira/browse/BEAM-3093
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>            Reporter: Xu Mingmin
>            Assignee: Kenneth Knowles
>
> This is a feature borrowed from Storm KafkaSpout.
> *What's the issue?*
> In KafkaIO, when offset is stored either in checkpoint or auto_committed, it cannot be changed in application, to force to read from earliest/latest. --This feature is important to reset the start offset when relaunching a job.
> *Proposed solution:*
> By borrowing the FirstPollOffsetStrategy concept, users can have more options:
> 1). *{{EARLIEST}}*: always start_from_beginning no matter of what's in checkpoint/auto_commit;
> 2). *{{LATEST}}*: always start_from_latest no matter of what's in checkpoint/auto_commit;
> 3). *{{UNCOMMITTED_EARLIEST}}*: if no offset in checkpoint/auto_commit then start_from_beginning if, otherwise start_from_previous_offset;
> 4). *{{UNCOMMITTED_LATEST}}*: if no offset in checkpoint/auto_commit then start_from_latest, otherwise start_from_previous_offset;
> [~rangadi], any comments?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)