Posted to commits@samza.apache.org by "Chris Riccomini (JIRA)" <ji...@apache.org> on 2014/09/03 17:00:52 UTC

[jira] [Commented] (SAMZA-354) Write tool to convert old-style checkpoint log to post-SAMZA-123 format

    [ https://issues.apache.org/jira/browse/SAMZA-354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119926#comment-14119926 ] 

Chris Riccomini commented on SAMZA-354:
---------------------------------------

Another thought on this: what if we just update the KafkaCheckpointManager in 0.8 to automatically migrate from 0.7 if there is no 0.8 topic when the job starts up?

# Check if 0.8 topic exists.
# If not, check if 0.7 topic exists.
# If 0.8 doesn't exist and 0.7 does, create an 0.8 checkpoint topic, then read all partitions from 0.7, translate them to 0.8 checkpoints, and write them to the new topic.
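The three startup steps above can be sketched roughly as follows. This is a minimal Python illustration, not KafkaCheckpointManager code: `FakeKafka`, `translate`, `migrate_if_needed`, and the topic names are hypothetical, and both checkpoint formats are assumptions (0.7 entries keyed by input partition with a stream-to-offset map as the value, later entries superseding earlier ones; 0.8 entries keyed by a task name, with one task per partition as under GroupByPartition).

```python
from dataclasses import dataclass, field


@dataclass
class FakeKafka:
    """Hypothetical in-memory stand-in for Kafka: topic name -> list of (key, value)."""
    topics: dict = field(default_factory=dict)

    def exists(self, topic):
        return topic in self.topics

    def create(self, topic):
        self.topics.setdefault(topic, [])

    def read_all(self, topic):
        return list(self.topics.get(topic, []))

    def write(self, topic, key, value):
        self.topics[topic].append((key, value))


def translate(partition, offsets):
    """Turn an assumed 0.7 entry (partition -> {stream: offset}) into an assumed
    0.8 entry keyed by task name, with one task per partition (GroupByPartition)."""
    task_name = f"Partition {partition}"  # hypothetical task-name format
    return task_name, {(stream, partition): off for stream, off in offsets.items()}


def migrate_if_needed(kafka, old_topic, new_topic):
    """Return True if checkpoints were copied from old_topic to new_topic."""
    # Step 1: an existing 0.8 topic means there is nothing to migrate.
    if kafka.exists(new_topic):
        return False
    # Step 2: no 0.7 topic either, so this is a brand-new job.
    if not kafka.exists(old_topic):
        return False
    # Step 3: create the 0.8 topic, then write translated checkpoints.
    # A crash between create() and the writes leaves an empty 0.8 topic.
    kafka.create(new_topic)
    latest = {}
    for partition, offsets in kafka.read_all(old_topic):
        latest[partition] = offsets  # later log entries supersede earlier ones
    for partition, offsets in sorted(latest.items()):
        task_name, checkpoint = translate(partition, offsets)
        kafka.write(new_topic, task_name, checkpoint)
    return True


if __name__ == "__main__":
    kafka = FakeKafka({"checkpoints-0.7": [(0, {"events": 10}),
                                           (1, {"events": 7}),
                                           (0, {"events": 12})]})
    print(migrate_if_needed(kafka, "checkpoints-0.7", "checkpoints-0.8"))  # True
    print(kafka.read_all("checkpoints-0.8"))
    # [('Partition 0', {('events', 0): 12}), ('Partition 1', {('events', 1): 7})]
```

Only the newest 0.7 entry per partition is carried over, since that is the one a restarted job would resume from under the assumed log semantics.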

There is a race condition in (3): the container could create the 0.8 topic and then fail before writing the checkpoints, so future startups would just read an empty 0.8 topic. We could avoid this by tweaking the condition to "the 0.8 topic doesn't exist or is empty," though.
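A minimal sketch of the tweaked check, assuming topics are modeled as a plain dict from topic name to a list of messages (a hypothetical stand-in, not the Kafka API). `needs_migration` and `start_container` are illustrative names, and rekeying each 0.7 entry to a `Partition N` task name is a deliberate simplification of the real translation.

```python
def needs_migration(topics, old_topic, new_topic):
    # Tweaked rule: an empty 0.8 topic counts as missing, so a container that
    # died between creating the topic and writing checkpoints retries on restart.
    return old_topic in topics and not topics.get(new_topic)


def start_container(topics, old_topic, new_topic, crash_after_create=False):
    """Run the startup check; topics is a hypothetical dict of topic -> messages."""
    if needs_migration(topics, old_topic, new_topic):
        topics.setdefault(new_topic, [])  # create the 0.8 topic (no-op if present)
        if crash_after_create:
            raise RuntimeError("simulated container failure")
        for partition, offsets in topics[old_topic]:
            # Simplified translation: rekey each entry by a hypothetical task name.
            topics[new_topic].append((f"Partition {partition}", offsets))
    return topics.get(new_topic, [])


if __name__ == "__main__":
    topics = {"checkpoints-0.7": [(0, {"events": 12})]}
    try:
        start_container(topics, "checkpoints-0.7", "checkpoints-0.8",
                        crash_after_create=True)
    except RuntimeError:
        pass
    print(topics["checkpoints-0.8"])  # [] : topic created, checkpoints never written
    print(start_container(topics, "checkpoints-0.7", "checkpoints-0.8"))
    # [('Partition 0', {'events': 12})]
```

With the original rule (migrate only if the 0.8 topic does not exist), the second call would skip migration and the job would start with no checkpoints; treating an empty topic as missing lets the restart finish the copy. This sketch still would not catch a crash partway through the writes, which would leave a partially filled topic.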

This seems like the best solution for the end user, since it requires no tool and no coordination with other people (in the case of production) to do the migration. Code-cleanliness-wise, it's a bit ugly, but since it's temporary, we could remove it after 0.8.

> Write tool to convert old-style checkpoint log to post-SAMZA-123 format
> -----------------------------------------------------------------------
>
>                 Key: SAMZA-354
>                 URL: https://issues.apache.org/jira/browse/SAMZA-354
>             Project: Samza
>          Issue Type: Task
>    Affects Versions: 0.8.0
>            Reporter: Jakob Homan
>            Assignee: David Chen
>
> After SAMZA-123, the checkpoint log has a new format (keyed entries interspersed with statelog-partition mapping) and a new name.  It would be simple to write a tool that would consume an old-style log and write out a new-style log, using the GroupByPartition strategy.  This would allow existing jobs to not lose checkpointing.


