Posted to common-dev@hadoop.apache.org by "Runping Qi (JIRA)" <ji...@apache.org> on 2007/05/04 23:48:15 UTC

[jira] Created: (HADOOP-1330) Unifying Hadoop Streaming/Hadoop Pipe

Unifying Hadoop Streaming/Hadoop Pipe
-------------------------------------

                 Key: HADOOP-1330
                 URL: https://issues.apache.org/jira/browse/HADOOP-1330
             Project: Hadoop
          Issue Type: Improvement
            Reporter: Runping Qi



Hadoop Streaming and Pipe have many similarities. It is worthwhile to examine how to factor out the commonality in the implementation and to unify the user interface as much as possible.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (HADOOP-1330) Unifying Hadoop Streaming/Hadoop Pipe

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das reassigned HADOOP-1330:
-----------------------------------

    Assignee: Devaraj Das  (was: Owen O'Malley)




[jira] Updated: (HADOOP-1330) Unifying Hadoop Streaming/Hadoop Pipe

Posted by "Sameer Paranjpye (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sameer Paranjpye updated HADOOP-1330:
-------------------------------------

    Component/s: contrib/streaming
       Assignee: Owen O'Malley
    Description: 
Hadoop Streaming and Pipe have many similarities. It is worthwhile to examine how to factor out the commonality in the implementation and to unify the user interface as much as possible.


  was:

Hadoop Streaming and Pipe have many similarities. It is worthwhile to examine how to factor out the commonality in the implementation and to unify the user interface as much as possible.






[jira] Commented: (HADOOP-1330) Unifying Hadoop Streaming/Hadoop Pipe

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508781 ] 

Devaraj Das commented on HADOOP-1330:
-------------------------------------

Some early thoughts on merging Streaming and Pipes (some of them are potential improvements to Streaming):
1) The command lines for both can be unified, since they share quite a few arguments. We could have a base class that handles all the common arguments, and subclasses that handle the tool-specific ones. ToolBase is one candidate that can help here.
2) Both the Streaming and Pipes frameworks spawn Java Map/Reduce tasks that in turn spawn the executables (such as Perl scripts or C++ executables). The main difference between the two approaches is the communication protocol between the Java map/reduce processes and the executables: Streaming uses stdin/stdout streams and Pipes uses sockets. One thing to investigate here is the feasibility of implementing the Pipes protocol for the Streaming case.
3) The combiner in Pipes is more flexible in that it allows native combiners and, with some tweaks, can use Java combiners as well. This is missing in Streaming, where we are restricted to invoking the user's combiner only through the Java framework.
4) Use of FileCache in Streaming.
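
As a rough illustration of point 1, the shared-argument idea could look something like the Java sketch below. It is not taken from the Hadoop code base: the class names (CommonJobArgs, StreamingArgs, PipesArgs) are hypothetical, and the flag split (-input and -output as common, -mapper for Streaming, -program for Pipes) is an assumption made only for this example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical base class: collects the options that both front ends share.
abstract class CommonJobArgs {
    // Assumption for illustration: these two flags are treated as common.
    private static final List<String> COMMON_FLAGS = Arrays.asList("-input", "-output");

    final Map<String, String> common = new LinkedHashMap<>();
    final Map<String, String> specific = new LinkedHashMap<>();

    // Each subclass declares which extra flags it understands.
    abstract boolean acceptsSpecific(String flag);

    // Expects alternating flag/value pairs and sorts them into the two maps.
    void parse(String[] args) {
        if (args.length % 2 != 0) {
            throw new IllegalArgumentException("expected flag/value pairs");
        }
        for (int i = 0; i < args.length; i += 2) {
            String flag = args[i];
            String value = args[i + 1];
            if (COMMON_FLAGS.contains(flag)) {
                common.put(flag, value);
            } else if (acceptsSpecific(flag)) {
                specific.put(flag, value);
            } else {
                throw new IllegalArgumentException("unknown option: " + flag);
            }
        }
    }
}

// Hypothetical Streaming front end: adds a mapper command.
class StreamingArgs extends CommonJobArgs {
    @Override
    boolean acceptsSpecific(String flag) {
        return flag.equals("-mapper");
    }
}

// Hypothetical Pipes front end: adds the native program to run.
class PipesArgs extends CommonJobArgs {
    @Override
    boolean acceptsSpecific(String flag) {
        return flag.equals("-program");
    }
}

class ArgsDemo {
    public static void main(String[] argv) {
        StreamingArgs args = new StreamingArgs();
        args.parse(new String[] {"-input", "in", "-output", "out", "-mapper", "cat"});
        System.out.println("common=" + args.common + " specific=" + args.specific);
    }
}
```

The point of the split is that an option unknown to one tool is rejected there while the common handling stays in one place; a real unification would have to start from the actual option sets of both tools.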

That is what I have so far. It would be great if others could add to this list. I am planning to reuse Pipes as much as possible for the Streaming framework. Also, please let me know if there are features in Streaming that we want to introduce in Pipes.
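
To make the stdin/stdout side of point 2 concrete, here is a minimal Java sketch of a line-oriented text framing of the kind Streaming uses by default: one record per line, with the text up to the first tab as the key and the rest as the value. The class and method names (TextRecordProtocol, encode, decode, readAll) are hypothetical, and the sketch shows only the framing, not process spawning or the socket-based Pipes protocol.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper showing tab-separated, newline-terminated records.
class TextRecordProtocol {
    // Frame one key/value pair as a single line of text.
    static String encode(String key, String value) {
        return key + "\t" + value + "\n";
    }

    // Split a line at the first tab; a line without a tab is all key, null value.
    static String[] decode(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new String[] {line, null};
        }
        return new String[] {line.substring(0, tab), line.substring(tab + 1)};
    }

    // Decode every line available from a reader, e.g. one wrapping a child's stdout.
    static List<String[]> readAll(BufferedReader childStdout) {
        return childStdout.lines()
                .map(TextRecordProtocol::decode)
                .collect(Collectors.toList());
    }
}

class ProtocolDemo {
    public static void main(String[] args) {
        // Stand-in for a child's output: one framed pair plus a line with no tab.
        String wire = TextRecordProtocol.encode("word", "1") + "solo\n";
        BufferedReader reader = new BufferedReader(new StringReader(wire));
        for (String[] kv : TextRecordProtocol.readAll(reader)) {
            System.out.println(kv[0] + " -> " + kv[1]);
        }
    }
}
```

A text framing like this is easy for any script to produce, but it cannot carry keys or values that contain tabs or newlines without escaping, which is one reason to weigh it against a binary socket protocol.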

