Posted to commits@samza.apache.org by "David Chen (JIRA)" <ji...@apache.org> on 2014/09/11 07:46:34 UTC

[jira] [Updated] (SAMZA-184) Add thin multi-language support for SamzaContainer

     [ https://issues.apache.org/jira/browse/SAMZA-184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Chen updated SAMZA-184:
-----------------------------
    Description: 
There has been some interest in supporting languages other than Java (or JVM-based languages). We have already opened up SAMZA-18, which proposes supporting a C implementation of SamzaContainer.

A second solution to this problem is to have a StreamTask implementation that starts a child process in another language and acts as a bridge between the child process and the Java-based Samza APIs. This is the approach taken by both Storm's multilang protocol [1] and Hadoop Streaming.
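
To make the bridge idea concrete, here is a rough sketch of what the parent side could look like. Everything in it is an assumption for illustration, not a proposed design: the class name ChildProcessBridge, the use of stdin/stdout as the transport, and the wire format (one tab-separated line per request, with a Base64 payload, answered by one reply line) are all hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.List;

/** Hypothetical bridge: writes one request line to the child, reads one reply line back. */
class ChildProcessBridge {
    private final BufferedWriter toChild;
    private final BufferedReader fromChild;

    /** Wraps the child's stdin (our output) and stdout (our input). */
    ChildProcessBridge(OutputStream childStdin, InputStream childStdout) {
        this.toChild = new BufferedWriter(
                new OutputStreamWriter(childStdin, StandardCharsets.UTF_8));
        this.fromChild = new BufferedReader(
                new InputStreamReader(childStdout, StandardCharsets.UTF_8));
    }

    /** Starts the child process and connects the bridge to its stdin/stdout. */
    static ChildProcessBridge start(List<String> command) throws IOException {
        Process child = new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.INHERIT) // child's stderr goes to ours
                .start();
        return new ChildProcessBridge(child.getOutputStream(), child.getInputStream());
    }

    /** Assumed wire format: operation name, a tab, then the Base64-encoded payload. */
    static String encode(String operation, byte[] payload) {
        return operation + "\t" + Base64.getEncoder().encodeToString(payload);
    }

    /** Sends one request and blocks until the child answers with one line. */
    String send(String operation, byte[] payload) throws IOException {
        toChild.write(encode(operation, payload));
        toChild.write('\n');
        toChild.flush();
        String reply = fromChild.readLine();
        if (reply == null) {
            throw new IOException("child process closed its stdout");
        }
        return reply;
    }
}
```

A StreamTask wrapper could create the bridge with something like ChildProcessBridge.start(Arrays.asList("python", "task.py")) (script name made up here) and call send("process", bytes) for each incoming message. Base64 is used only so that arbitrary message bytes fit on a single line; a real protocol might prefer length-prefixed frames.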

A lot of design decisions need to be fleshed out to support this, but most people on the mailing list were very supportive of this approach. [2]

Things that need to be decided:

1. Should we start one subprocess per SamzaContainer, or one subprocess per StreamTask?
2. How should the parent interact with the subprocess at both the transport level (stdin/stdout, Unix sockets, TCP, HTTP, Thrift, etc.) and the serialization level (protobuf, JSON, etc.)?
3. What should the protocol look like? We should ideally support all of the operations in StreamTask, InitableTask, WindowableTask, ClosableTask, etc.
4. Should the child process receive the messages in batches, or one at a time?
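
As a starting point for discussing items 3 and 4, the sketch below shows one way the protocol operations and batching could be expressed. It is only an illustration under stated assumptions: the Operation enum, the Frames class, and the "OPERATION count" header format are hypothetical names and conventions, and payloads are assumed to be single-line strings.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical wire operations, loosely one per task lifecycle callback. */
enum Operation { INIT, PROCESS, WINDOW, CLOSE }

class Frames {
    /**
     * Builds one frame: a header line "OPERATION count" followed by
     * count payload lines (payloads are assumed to contain no newlines).
     */
    static List<String> frame(Operation op, List<String> messages) {
        List<String> lines = new ArrayList<>();
        lines.add(op.name() + " " + messages.size());
        lines.addAll(messages);
        return lines;
    }

    /**
     * Splits messages into PROCESS frames of at most batchSize messages.
     * A batchSize of 1 corresponds to delivering messages one at a time.
     */
    static List<List<String>> batches(List<String> messages, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        List<List<String>> frames = new ArrayList<>();
        for (int i = 0; i < messages.size(); i += batchSize) {
            int end = Math.min(i + batchSize, messages.size());
            frames.add(frame(Operation.PROCESS, messages.subList(i, end)));
        }
        return frames;
    }
}
```

With this framing, batching and one-at-a-time delivery share the same format, so the batch size could be a configuration knob rather than a protocol decision.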

It'd be good to get a draft proposal up on the Wiki, so we can all discuss this and converge on an implementation.

[1] http://storm.incubator.apache.org/documentation/Multilang-protocol.html
[2] http://mail-archives.apache.org/mod_mbox/incubator-samza-dev/201403.mbox/%3CCAB%2B2NVXX2Fq_61WfvH%2BAfW8ZW7vQbVfTN-JPGU%2Bd7AdZ73oPDQ%40mail.gmail.com%3E

  was:
There has been some interest in supporting languages other than Java (or JVM-based languages). We have already opened up SAMZA-18, which proposes supporting a C implementation of SamzaContainer.

A second solution to this problem is to have a StreamTask implementation that starts a child process in another language, and acts as a bridge between the child process and the java-based Samza APIs. This is the way that both Storm [1] and Hadoop work.

A lot of design decisions need to be fleshed out to support this, but most people on the mailing list were very supportive of this approach. [2]

Things that need to be decided:

1. Should we start one subprocess per SamzaContainer, or one subprocess per StreamTask?
2. How should the parent interact with the subprocess at both the transport (stdin/stdout, unix sockets, TCP, HTTP, Thrift, etc) and serialization level (protobuf, json, etc)?
3. What should the protocol look like? We should ideally support all of the operations in StreamTask, InitableTask, WindowableTask, ClosableTask, etc.
4. Should the child process receive the messages in batches, or one at a time?

It'd be good to get a draft proposal up on the Wiki, so we can all discuss this and converge on an implementation.

[1] https://github.com/nathanmarz/storm/wiki/Multilang-protocol
[2] http://mail-archives.apache.org/mod_mbox/incubator-samza-dev/201403.mbox/%3CCAB%2B2NVXX2Fq_61WfvH%2BAfW8ZW7vQbVfTN-JPGU%2Bd7AdZ73oPDQ%40mail.gmail.com%3E


> Add thin multi-language support for SamzaContainer
> --------------------------------------------------
>
>                 Key: SAMZA-184
>                 URL: https://issues.apache.org/jira/browse/SAMZA-184
>             Project: Samza
>          Issue Type: Bug
>          Components: container
>    Affects Versions: 0.6.0
>            Reporter: Chris Riccomini
>            Assignee: David Chen
>         Attachments: Test.java
>


