You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@samza.apache.org by prateekm <gi...@git.apache.org> on 2017/09/08 21:54:11 UTC

[GitHub] samza pull request #293: Support for declaring serdes in high level API code...

GitHub user prateekm opened a pull request:

    https://github.com/apache/samza/pull/293

    Support for declaring serdes in high level API code for input/output/intermediate streams.

    @nickpan47 @vjagadish1989 @jmakes Please take a look.
    
    Some notes/considerations:
    * Serde interface is now Serializable. I've updated the built-in serde implementations to make sure they are Serializable, but custom serde implementations will need to be updated by users when they upgrade.
    * I've tried to preserve the number of serde objects and their usage before and after serialization, but don't guarantee that any references shared by these serde instances before serialization will be shared after deserialization. Ideally, serde of serdes should be done in one pass along with all other user provided objects in the DAG so that any shared references can be preserved.
    * I've left them as-is for now, but we probably need to move Serdes to samza-api now that users can interact with them directly.
    * Samza's JsonSerde uses Samza's custom ObjectMapper since it was probably meant to be an Samza internal serde. One peculiarity is that it forces the 'dasherized-names' property naming convention instead of the (default in Jackson and more common) camelCase naming convention. This makes user POJO definitions more verbose with JsonProperty annotations, and if users don't remember to dasherize property names the resulting errors are somewhat confusing to debug. Should we create a new JsonSerde with a default ObjectMapper when we move it to samza-api?
    * Note that the default value for a 'defaultKey/MsgSerde` is a no-op serde instead of null. This helps users of Systems that do the serialization/deserialization themselves (e.g. Wikipedia example), but the downside is that it is possible to forget setting the default serde before creating streams and end up with a no-op serde and ClassCastExceptions at runtime.
    * I've made a choice here to disallow setting system/stream serdes in config to make configs simpler. Imho the 'serdes only in code' and 'only 2 levels of serde' model is much simpler to understand. The risk is that there might be streams that aren't part of the user application code, but need their serdes to be set in configuration. An example is the logging topic. Do you think this is a reasonable choice?
    * There's a separate discussion to be had about introducing a KV type and specifying serdes for such KV types instead of specifying 'key' and 'msg' serdes separately.
    
    Tested with samza-hello-world Wikipedia application and all TestRepartitionWindowApp. Will add unit tests if the overall design is acceptable.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/prateekm/samza serde-instance

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/samza/pull/293.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #293
    
----
commit 0d399607064faecc983526c255c0b4037ae84aa0
Author: Prateek Maheshwari <pm...@linkedin.com>
Date:   2017-09-08T21:27:58Z

    Adding support for declaring serdes in high level API code for input/output/intermediate stream.

----


---

[GitHub] samza pull request #293: SAMZA-1109: Allow setting Serdes for fluent API ope...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/samza/pull/293


---