Posted to commits@samza.apache.org by "Martin Kleppmann (JIRA)" <ji...@apache.org> on 2014/04/08 07:22:16 UTC

[jira] [Created] (SAMZA-232) Keys and values in state should be versioned

Martin Kleppmann created SAMZA-232:
--------------------------------------

             Summary: Keys and values in state should be versioned
                 Key: SAMZA-232
                 URL: https://issues.apache.org/jira/browse/SAMZA-232
             Project: Samza
          Issue Type: Improvement
            Reporter: Martin Kleppmann
             Fix For: 0.7.0


At the moment, keys and values that are written to a task's key-value store (and the associated changelog stream) are just the raw bytes generated by the serde. This will become a problem, since it leaves us no way to change the storage format later.

For example, in order to implement exactly-once semantics, we may want to associate additional metadata with each value (metadata managed by the framework and never seen by serdes). The current implementation gives us no room to make such a change, because a job would not know whether the value it is reading includes metadata or not.

I propose that we prefix every key and every value in the key-value store and the changelog stream with a version number, currently just a zero byte. That is an incompatible change, so we should do it before the 0.7.0 release. In future, if we ever need to change the storage format, we can bump the version number and thus allow jobs to be gracefully upgraded in-place.
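A minimal sketch of the proposed framing, assuming a single leading version byte (currently 0) on every stored key and value. The class and method names (VersionedStoreFormat, wrap, unwrap) are hypothetical illustrations, not part of Samza's API:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical helper (not a Samza API): prefixes serde output with a one-byte
 * storage-format version before it goes to the key-value store or changelog,
 * and validates/strips that byte when reading it back.
 */
final class VersionedStoreFormat {
    /** Current storage format version: a single zero byte, as proposed. */
    static final byte CURRENT_VERSION = 0;

    private VersionedStoreFormat() {}

    /** Prepends the version byte to the bytes produced by the serde. */
    static byte[] wrap(byte[] serdeBytes) {
        byte[] stored = new byte[serdeBytes.length + 1];
        stored[0] = CURRENT_VERSION;
        System.arraycopy(serdeBytes, 0, stored, 1, serdeBytes.length);
        return stored;
    }

    /**
     * Reads the version byte and returns the serde payload. Unknown versions are
     * rejected rather than misinterpreted; a future format would add a new case.
     */
    static byte[] unwrap(byte[] stored) {
        if (stored == null || stored.length == 0) {
            throw new IllegalArgumentException("missing version byte");
        }
        switch (stored[0]) {
            case 0:
                // Version 0: everything after the prefix is exactly the serde output.
                return Arrays.copyOfRange(stored, 1, stored.length);
            default:
                throw new IllegalStateException("unsupported storage format version " + stored[0]);
        }
    }

    public static void main(String[] args) {
        byte[] serdeBytes = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] stored = wrap(serdeBytes);
        System.out.println(stored.length);                             // 6
        System.out.println(Arrays.equals(unwrap(stored), serdeBytes)); // true
    }
}
```

In this sketch, rejecting an unknown version makes an older job fail fast instead of silently misreading data written in a newer format. If the format ever changes, a new `case` for version 1 would let an upgraded job read both old and new entries during an in-place upgrade.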


