Posted to commits@samza.apache.org by "Jake Maes (JIRA)" <ji...@apache.org> on 2017/11/09 19:56:00 UTC

[jira] [Resolved] (SAMZA-1480) TaskStorageManager improperly initializes changelog consumer position when restoring a store from disk

     [ https://issues.apache.org/jira/browse/SAMZA-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jake Maes resolved SAMZA-1480.
------------------------------
    Resolution: Fixed

Issue resolved by pull request 350
[https://github.com/apache/samza/pull/350]

> TaskStorageManager improperly initializes changelog consumer position when restoring a store from disk
> ------------------------------------------------------------------------------------------------------
>
>                 Key: SAMZA-1480
>                 URL: https://issues.apache.org/jira/browse/SAMZA-1480
>             Project: Samza
>          Issue Type: Bug
>    Affects Versions: 0.10.0
>            Reporter: Jake Maes
>            Assignee: Jake Maes
>            Priority: Trivial
>             Fix For: 0.14.0
>
>
> For Host Affinity state restore, an OFFSET file is written to disk on each commit. This file contains the offset of the most recently written changelog event, which is also reflected in the on-disk state. When the container is restarted, it restores the on-disk store and then replays the changelog from the offset recorded in the OFFSET file, in order to restore any changelog events that were produced while the job ran on a different host.
> http://samza.apache.org/learn/documentation/0.13/yarn/yarn-host-affinity.html
> When TaskStorageManager initializes the consumer, it uses the offset from the OFFSET file, which is already reflected in the state. 
> Instead, it should use the SystemAdmin.getOffsetsAfter() method to get the next offset to consume. This will avoid the replay of 1 extra message for state restore.
> It should then use SystemAdmin.offsetComparator() to select the larger of the next offset (calculated above) and the oldest offset (according to the metadata). This is necessary for changelogs configured with TTL retention rather than infinite retention, where the offset from the OFFSET file may no longer be valid.
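
The two steps described above can be sketched roughly as follows. This is a minimal illustration, not the code from pull request 350: OffsetAdmin is a hypothetical single-partition stand-in for the two SystemAdmin methods named in the description, NUMERIC is a toy implementation that assumes Kafka-style numeric offsets, and the fallback to the oldest offset when there is no OFFSET file or the offsets cannot be compared is an assumption.

```java
public class RestoreOffsetSketch {

    /** Hypothetical, simplified stand-in for the SystemAdmin calls named above. */
    public interface OffsetAdmin {
        /** Returns the offset that immediately follows the given one. */
        String getOffsetAfter(String offset);

        /** Returns a negative, zero, or positive value, or null if the offsets cannot be compared. */
        Integer offsetComparator(String offset1, String offset2);
    }

    /**
     * Chooses where the changelog consumer should start during restore.
     *
     * @param fileOffset   offset read from the OFFSET file, or null if there is none
     * @param oldestOffset oldest offset still available, according to the stream metadata
     */
    public static String startingOffset(OffsetAdmin admin, String fileOffset, String oldestOffset) {
        if (fileOffset == null) {
            // Assumption: without an OFFSET file, restore from the oldest offset.
            return oldestOffset;
        }
        // Step 1: the OFFSET file entry is already in the store, so start after it.
        String next = admin.getOffsetAfter(fileOffset);

        // Step 2: if retention has already removed that position, use the oldest offset.
        Integer cmp = admin.offsetComparator(next, oldestOffset);
        if (cmp == null || cmp < 0) {
            // Assumption: incomparable offsets also fall back to the oldest offset.
            return oldestOffset;
        }
        return next;
    }

    /** Toy admin for numeric offsets (an assumption made for this example). */
    public static final OffsetAdmin NUMERIC = new OffsetAdmin() {
        @Override
        public String getOffsetAfter(String offset) {
            return Long.toString(Long.parseLong(offset) + 1);
        }

        @Override
        public Integer offsetComparator(String offset1, String offset2) {
            return Long.compare(Long.parseLong(offset1), Long.parseLong(offset2));
        }
    };

    public static void main(String[] args) {
        // OFFSET file says 41 and the changelog still starts at 10: resume at 42.
        System.out.println(startingOffset(NUMERIC, "41", "10"));
        // OFFSET file says 41 but retention has advanced the oldest offset to 100: resume at 100.
        System.out.println(startingOffset(NUMERIC, "41", "100"));
    }
}
```

With the toy numeric admin, the first call prints 42 (one past the stored offset, so nothing is replayed twice) and the second prints 100 (the stored position has expired, so the oldest available offset is used instead).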



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)