Posted to commits@samza.apache.org by "Martin Kleppmann (JIRA)" <ji...@apache.org> on 2014/04/02 21:12:15 UTC

[jira] [Resolved] (SAMZA-180) Support one-time offset reset for a Samza job

     [ https://issues.apache.org/jira/browse/SAMZA-180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Kleppmann resolved SAMZA-180.
------------------------------------

       Resolution: Fixed
    Fix Version/s: 0.7.0

Got a "ship it" on the RB. Thanks for reviewing! I've committed this. Resolving.

> Support one-time offset reset for a Samza job
> ---------------------------------------------
>
>                 Key: SAMZA-180
>                 URL: https://issues.apache.org/jira/browse/SAMZA-180
>             Project: Samza
>          Issue Type: Bug
>          Components: container
>    Affects Versions: 0.6.0
>            Reporter: Chris Riccomini
>            Assignee: Martin Kleppmann
>             Fix For: 0.7.0
>
>         Attachments: SAMZA-180.1.patch, SAMZA-180.5.patch
>
>
> Samza currently has a systems.%s.streams.%s.samza.reset.offset configuration. When set to "true", this configuration tells each SamzaContainer to disregard the checkpointed offsets for a stream when starting up. The problem with this configuration is that the checkpoints are disregarded every time the SamzaContainer starts up, not just the first time. If a host that a SamzaContainer is running on fails, and YARN (or some other mechanism) restarts the SamzaContainer, the container will not pick up where it left off, but will instead disregard the checkpointed offsets, and start over again, as before.
> There are some use-cases where developers wish to have a one-time reset of the checkpointed offsets. That is, they want to reset the offsets exactly once, and then have subsequent failures not trigger another reset. This is typically useful in bootstrapping cases (related to SAMZA-179), where a developer wishes to reset their task back to offset 0, process all messages up to the head of a stream, and then shut down. Right now, the developer can set reset.offset=true and auto.offset.reset=smallest (if reprocessing a Kafka topic), but if the container ever restarts, processing will begin again from offset 0. This is not ideal.
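
For illustration, the configuration described above might look roughly like the following fragment. This is only a sketch: the system name "kafka" and the stream name "my-topic" are hypothetical placeholders for the two %s slots, and the "systems.kafka.consumer." prefix on auto.offset.reset is an assumption about how Kafka consumer settings are passed through in a job's properties file.

```properties
# Hypothetical system name ("kafka") and stream name ("my-topic").
# Disregard checkpointed offsets for this stream when the container starts.
# As described in the issue, this applies on EVERY start, not only the first.
systems.kafka.streams.my-topic.samza.reset.offset=true

# Assumed pass-through of the Kafka consumer setting: with no usable
# checkpoint, begin from the smallest (oldest) available offset.
systems.kafka.consumer.auto.offset.reset=smallest
```

With settings like these, a container restarted after a host failure would again start from the oldest offset rather than resuming from its last checkpoint, which is the behavior this issue set out to change.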



--
This message was sent by Atlassian JIRA
(v6.2#6252)