Posted to commits@samza.apache.org by "Chris Riccomini (JIRA)" <ji...@apache.org> on 2014/04/03 00:32:17 UTC

[jira] [Commented] (SAMZA-184) Add thin multi-language support for SamzaContainer

    [ https://issues.apache.org/jira/browse/SAMZA-184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13958267#comment-13958267 ] 

Chris Riccomini commented on SAMZA-184:
---------------------------------------

This is pretty cool. [~martinkl]'s findings make sense to me. I kind of like the STDIN/STDOUT approach for the transport, since it's pretty straightforward to implement. I'm cool with a faster serialization format, provided it has good multi-language support, which most do.
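
To make the STDIN/STDOUT idea concrete, here is a rough sketch of what the parent side of such a bridge might look like. Everything in it is an assumption for illustration only: the class name StdioBridge, the one-line-per-message framing, and the use of `cat` as a stand-in child process are all hypothetical, and none of it is part of the Samza API or a decided protocol.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

/** Hypothetical parent-side bridge to a child process; not a Samza class. */
public class StdioBridge implements AutoCloseable {
    private final Process child;
    private final BufferedWriter toChild;
    private final BufferedReader fromChild;

    public StdioBridge(String... command) throws IOException {
        // Let the child's stderr pass through to the parent's stderr so that
        // log output cannot corrupt the message stream on stdout.
        child = new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        toChild = new BufferedWriter(new OutputStreamWriter(
                child.getOutputStream(), StandardCharsets.UTF_8));
        fromChild = new BufferedReader(new InputStreamReader(
                child.getInputStream(), StandardCharsets.UTF_8));
    }

    /**
     * Sends one message (assumed to contain no newline) to the child's stdin
     * and blocks until the child writes one line back on its stdout.
     */
    public String process(String message) throws IOException {
        toChild.write(message);
        toChild.write('\n');
        toChild.flush(); // without a flush the child may never see the message
        String reply = fromChild.readLine();
        if (reply == null) {
            throw new IOException("child process closed its stdout");
        }
        return reply;
    }

    @Override
    public void close() throws IOException, InterruptedException {
        toChild.close();  // the child sees EOF on stdin and can shut down
        child.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // `cat` (assumed to be on the PATH) stands in for a real child task:
        // it copies each line it reads straight back to stdout.
        try (StdioBridge bridge = new StdioBridge("cat")) {
            System.out.println(bridge.process("{\"key\":\"k1\",\"value\":\"hello\"}"));
        }
    }
}
```

This sketch sends one message at a time and waits for exactly one reply, which is the simplest choice but not necessarily the right one; batching and the actual serialization format are left open.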

> Add thin multi-language support for SamzaContainer
> --------------------------------------------------
>
>                 Key: SAMZA-184
>                 URL: https://issues.apache.org/jira/browse/SAMZA-184
>             Project: Samza
>          Issue Type: Bug
>          Components: container
>    Affects Versions: 0.6.0
>            Reporter: Chris Riccomini
>         Attachments: Test.java
>
>
> There has been some interest in supporting languages other than Java (or JVM-based languages). We have already opened up SAMZA-18, which proposes supporting a C implementation of SamzaContainer.
> A second solution to this problem is to have a StreamTask implementation that starts a child process in another language, and acts as a bridge between the child process and the Java-based Samza APIs. This is the way that both Storm [1] and Hadoop work.
> A lot of design decisions need to be fleshed out to support this, but most people on the mailing list were very supportive of this approach. [2]
> Things that need to be decided:
> 1. Should we start one subprocess per SamzaContainer, or one subprocess per StreamTask?
> 2. How should the parent interact with the subprocess at both the transport level (stdin/stdout, Unix sockets, TCP, HTTP, Thrift, etc.) and the serialization level (Protobuf, JSON, etc.)?
> 3. What should the protocol look like? We should ideally support all of the operations in StreamTask, InitableTask, WindowableTask, ClosableTask, etc.
> 4. Should the child process receive the messages in batches, or one at a time?
> It'd be good to get a draft proposal up on the Wiki, so we can all discuss this and converge on an implementation.
> [1] https://github.com/nathanmarz/storm/wiki/Multilang-protocol
> [2] http://mail-archives.apache.org/mod_mbox/incubator-samza-dev/201403.mbox/%3CCAB%2B2NVXX2Fq_61WfvH%2BAfW8ZW7vQbVfTN-JPGU%2Bd7AdZ73oPDQ%40mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.2#6252)