Posted to commits@samza.apache.org by "Ben Kirwin (JIRA)" <ji...@apache.org> on 2015/02/16 22:18:11 UTC

[jira] [Updated] (SAMZA-568) Start offset override in Task init

     [ https://issues.apache.org/jira/browse/SAMZA-568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Kirwin updated SAMZA-568:
-----------------------------
    Attachment: 0001-Allow-overriding-starting-offsets-in-TaskContext.patch

> Start offset override in Task init
> ----------------------------------
>
>                 Key: SAMZA-568
>                 URL: https://issues.apache.org/jira/browse/SAMZA-568
>             Project: Samza
>          Issue Type: Improvement
>            Reporter: Ben Kirwin
>            Priority: Minor
>         Attachments: 0001-Allow-overriding-starting-offsets-in-TaskContext.patch
>
>
> A couple of months back, [on the mailing list | http://mail-archives.apache.org/mod_mbox/incubator-samza-dev/201411.mbox/%3CCACuX-D_ZWZP2EmQSE4NOU76skFh6bkifitzSMnm_b8DxJuTqRw@mail.gmail.com%3E], I mentioned a couple of offset management issues I'd been having. (I'm happy to elaborate on this, but in short: I associate some extra state / ordering information with the input offsets, and keeping Samza's checkpoints and my task's state in sync carries a nontrivial performance cost.)
> It occurs to me now that there's a simple workaround for this: disable Samza's checkpointing entirely, and let `StreamTask.init` choose the starting offsets. The task can keep its checkpoints in an ordinary StorageEngine, and because all the state is managed from a single place, it's easy to keep everything in sync.
> The basic implementation seems fairly straightforward: the consumers are not started until after the tasks are initialized, so all we'd need to do is allow the `init` method to override the starting offsets. I've attached a small patch that exposes this through the TaskContext interface, just to illustrate the idea. If this seems like an interesting feature for Samza, I'm happy to add more tests / documentation / etc.
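
The idea described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the attached patch: `OffsetContext`, `RecordingContext`, `SelfCheckpointingTask`, and the `setStartingOffset` method are hypothetical stand-ins for whatever the TaskContext change exposes, and a plain `Map` stands in for a StorageEngine-backed store.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the part of TaskContext that would let a task
// override its starting offsets. Not Samza's actual interface.
interface OffsetContext {
    void setStartingOffset(String partition, String offset);
}

// Test double that records which starting offsets a task asked for.
class RecordingContext implements OffsetContext {
    final Map<String, String> startingOffsets = new HashMap<>();

    @Override
    public void setStartingOffset(String partition, String offset) {
        startingOffsets.put(partition, offset);
    }
}

// A task that keeps its own "resume from here" offsets next to its state,
// instead of relying on the framework's checkpointing.
class SelfCheckpointingTask {
    // Assumption: in a real job this would be backed by a StorageEngine.
    private final Map<String, String> resumeOffsets;

    SelfCheckpointingTask(Map<String, String> resumeOffsets) {
        this.resumeOffsets = resumeOffsets;
    }

    // Called before any consumer starts: push the stored offsets into the
    // context so that consumption resumes where the task's own state left off.
    void init(OffsetContext context) {
        for (Map.Entry<String, String> entry : resumeOffsets.entrySet()) {
            context.setStartingOffset(entry.getKey(), entry.getValue());
        }
    }

    // Record the offset to resume from in the same store as the task state,
    // so the two are updated from a single place.
    void recordProgress(String partition, String resumeOffset) {
        resumeOffsets.put(partition, resumeOffset);
    }
}
```

On restart, a fresh task instance built over the same store would call `init` and hand its saved offsets to the context before the consumers begin reading. Partitions with no stored offset are left untouched here, on the assumption that the framework's default starting position applies to them.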


