Posted to commits@samza.apache.org by "Chris Riccomini (JIRA)" <ji...@apache.org> on 2014/04/08 18:48:19 UTC

[jira] [Commented] (SAMZA-232) Keys and values in state should be versioned

    [ https://issues.apache.org/jira/browse/SAMZA-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963160#comment-13963160 ] 

Chris Riccomini commented on SAMZA-232:
---------------------------------------

Hmm. We should really think about this. Can you be more specific about the use case for this? What metadata do you want to attach?

I haven't wanted to do anything like this. It makes Samza's streams "different" from any other Kafka topic: non-Samza consumers won't be able to consume the changelogs without a custom Samza-specific decoder. It's an even weirder change because you're only talking about Samza's changelog streams.

In addition, there's nothing that dictates that a changelog need be a byte[]. LoggedStore is typed with K, V. It's very possible to have a changelog that's a JDBC connection, or some other object-based (not byte-based) connection, though I'm not sure why you'd want to.

> Keys and values in state should be versioned
> --------------------------------------------
>
>                 Key: SAMZA-232
>                 URL: https://issues.apache.org/jira/browse/SAMZA-232
>             Project: Samza
>          Issue Type: Improvement
>            Reporter: Martin Kleppmann
>             Fix For: 0.7.0
>
>
> At the moment, keys and values that are written to a task's key-value store (and the associated changelog stream) are just the bytes that were generated by the serde. This will be a problem in future, since it gives us no way of changing the storage format.
> For example, in order to implement exactly-once semantics, we may want to associate additional metadata with each value (and that metadata would be managed by the framework, and would not be seen by serdes). The current implementation does not give us any room to make such a change, because a job would not know whether the value it is reading includes metadata or not.
> I propose that we prefix every key and every value in the key-value store and the changelog stream with a version number, currently just a zero byte. That is an incompatible change, so we should do it before the 0.7.0 release. In future, if we ever need to change the storage format, we can bump the version number and thus allow jobs to be gracefully upgraded in-place.
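
For illustration only, the version prefix proposed in the issue description could look roughly like the sketch below. It is not Samza code: the names `wrap`, `unwrap`, and `CURRENT_VERSION` are hypothetical, and it assumes the serde output is a plain byte string and that the version occupies exactly one leading byte.

```python
from typing import Tuple

# Hypothetical constant: the proposal starts with a single zero byte.
CURRENT_VERSION = 0


def wrap(payload: bytes, version: int = CURRENT_VERSION) -> bytes:
    """Prefix serde output with a one-byte storage format version."""
    if not 0 <= version <= 255:
        raise ValueError("version must fit in one byte")
    return bytes([version]) + payload


def unwrap(stored: bytes) -> Tuple[int, bytes]:
    """Split a stored entry into (version, serde bytes).

    Rejects versions this reader does not understand, which is the
    point at which a future format change could branch instead.
    """
    if not stored:
        raise ValueError("stored entry is empty; expected a version byte")
    version, payload = stored[0], stored[1:]
    if version != CURRENT_VERSION:
        raise ValueError("unsupported storage format version %d" % version)
    return version, payload
```

A reader that only knows version 0 fails loudly on any other prefix rather than misinterpreting the bytes. It also makes the concern in the comment above concrete: any consumer of the changelog topic, Samza or not, would have to strip the leading byte before handing the rest to its own deserializer.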



--
This message was sent by Atlassian JIRA
(v6.2#6252)