Posted to common-dev@hadoop.apache.org by "Runping Qi (JIRA)" <ji...@apache.org> on 2007/05/04 23:48:15 UTC
[jira] Created: (HADOOP-1330) Unifying Hadoop Steaming/Hadoop Pipe
Unifying Hadoop Steaming/Hadoop Pipe
------------------------------------
Key: HADOOP-1330
URL: https://issues.apache.org/jira/browse/HADOOP-1330
Project: Hadoop
Issue Type: Improvement
Reporter: Runping Qi
Hadoop Streaming and Pipe have many similarities. It is worthwhile to examine how to factor out the commonality in the implementation and to unify the user interface as much as possible.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Assigned: (HADOOP-1330) Unifying Hadoop Steaming/Hadoop Pipe
Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Devaraj Das reassigned HADOOP-1330:
-----------------------------------
Assignee: Devaraj Das (was: Owen O'Malley)
[jira] Updated: (HADOOP-1330) Unifying Hadoop Steaming/Hadoop Pipe
Posted by "Sameer Paranjpye (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sameer Paranjpye updated HADOOP-1330:
-------------------------------------
Component/s: contrib/streaming
Assignee: Owen O'Malley
Description:
Hadoop Streaming and Pipe have many similarities. It is worthwhile to examine how to factor out the commonality in the implementation and to unify the user interface as much as possible.
was:
Hadoop Streaming and Pipe have many similarities. It is worthwhile to examine how to factor out the commonality in the implementation and to unify the user interface as much as possible.
[jira] Commented: (HADOOP-1330) Unifying Hadoop Steaming/Hadoop Pipe
Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508781 ]
Devaraj Das commented on HADOOP-1330:
-------------------------------------
Some early thoughts on merging Streaming & Pipes (some of them are potential improvements in Streaming):
1) The command lines for both can be unified, since they share quite a few common arguments. We could have a base class that handles all the common arguments, and subclasses that handle the respective specific arguments. ToolBase is one candidate that can help here.
2) Both the Streaming and Pipes frameworks spawn Java Map/Reduce tasks that in turn spawn the executables (such as Perl scripts or C++ executables). The main difference between the two approaches is the communication protocol between the Java map/reduce processes and the executables: Streaming uses stdin/stdout streams and Pipes uses sockets. One thing to investigate here is the feasibility of implementing the Pipes protocol for the Streaming case.
3) The combiner in Pipes is more flexible in that it allows native combiners and, with some tweaks, can use Java combiners as well. This is missing in Streaming, where we are restricted to invoking the user's combiner only through the Java framework.
4) Use of FileCache in Streaming
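To make point 1 concrete, here is a rough sketch of a shared base class with one subclass per front end. It is only an illustration under stated assumptions: the class names (CommonJobOptions, StreamingOptions, PipesOptions) are hypothetical and do not exist in Hadoop, the option names are placeholders, and it deliberately does not use ToolBase.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: these classes and option names are illustrative, not Hadoop APIs.
abstract class CommonJobOptions {
    // Options assumed (for illustration only) to be shared by both front ends.
    private static final List<String> SHARED =
        Arrays.asList("-input", "-output", "-mapper", "-reducer");

    final Map<String, String> common = new LinkedHashMap<>();
    final Map<String, String> specific = new LinkedHashMap<>();

    // Each front end declares which extra options it understands.
    protected abstract boolean isSpecificOption(String name);

    // Parses "-name value" pairs, routing each to the shared or the specific map.
    final void parse(String[] args) {
        for (int i = 0; i < args.length; i += 2) {
            String name = args[i];
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("Missing value for " + name);
            }
            String value = args[i + 1];
            if (SHARED.contains(name)) {
                common.put(name, value);
            } else if (isSpecificOption(name)) {
                specific.put(name, value);
            } else {
                throw new IllegalArgumentException("Unknown option " + name);
            }
        }
    }
}

final class StreamingOptions extends CommonJobOptions {
    @Override
    protected boolean isSpecificOption(String name) {
        return name.equals("-file"); // placeholder for a Streaming-only option
    }
}

final class PipesOptions extends CommonJobOptions {
    @Override
    protected boolean isSpecificOption(String name) {
        return name.equals("-program"); // placeholder for a Pipes-only option
    }
}
```

With this split, the shared parsing and validation live in one place, and an option that belongs to the other front end is rejected rather than silently ignored.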
That is what I have so far. It would be great if others could add to this list. I am planning to reuse Pipes as much as possible for the Streaming framework. Also, please let me know if there are features in Streaming that we want to introduce in Pipes.
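For point 2, one possible way to share code is to hide the transport (stdin/stdout versus a socket) behind a small interface, so that the record-exchange logic does not care which one is in use. The sketch below is an assumption-laden illustration: ChildChannel, StdioChannel, SocketChildChannel and RecordWriter are hypothetical names, and the tab-separated line format is just a simple stand-in, not the actual Pipes protocol.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical abstraction over the link between the Java task and the child executable.
interface ChildChannel extends Closeable {
    OutputStream toChild() throws IOException;
    InputStream fromChild() throws IOException;
}

// Streaming-style transport: the child's stdin and stdout.
final class StdioChannel implements ChildChannel {
    private final Process child;

    StdioChannel(Process child) {
        this.child = child;
    }

    public OutputStream toChild() {
        return child.getOutputStream(); // connected to the child's stdin
    }

    public InputStream fromChild() {
        return child.getInputStream(); // connected to the child's stdout
    }

    public void close() {
        child.destroy();
    }
}

// Pipes-style transport: a socket connected to the child.
final class SocketChildChannel implements ChildChannel {
    private final Socket socket;

    SocketChildChannel(Socket socket) {
        this.socket = socket;
    }

    public OutputStream toChild() throws IOException {
        return socket.getOutputStream();
    }

    public InputStream fromChild() throws IOException {
        return socket.getInputStream();
    }

    public void close() throws IOException {
        socket.close();
    }
}

// Transport-independent record writer; the wire format here is a placeholder.
final class RecordWriter {
    static void writeRecord(ChildChannel channel, String key, String value) throws IOException {
        OutputStream out = channel.toChild();
        out.write((key + "\t" + value + "\n").getBytes(StandardCharsets.UTF_8));
        out.flush();
    }
}
```

Whether the real Pipes protocol can be carried over stdin/stdout in this way is exactly the feasibility question raised in point 2; the sketch only shows where such a seam could sit.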