Posted to common-dev@hadoop.apache.org by "Michel Tourn (JIRA)" <ji...@apache.org> on 2006/05/03 00:10:46 UTC

[jira] Created: (HADOOP-191) add hadoopStreaming to src/contrib

add hadoopStreaming to src/contrib
----------------------------------

         Key: HADOOP-191
         URL: http://issues.apache.org/jira/browse/HADOOP-191
     Project: Hadoop
        Type: New Feature

    Reporter: Michel Tourn
 Assigned to: Doug Cutting 


This is a patch that adds a src/contrib/hadoopStreaming directory to the source tree.
hadoopStreaming is a bridge to run non-Java code as Map/Reduce tasks.
The unit test TestStreaming runs the Unix tools tr (as Map) and uniq (as Reduce).
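
A rough local imitation of the tr/uniq job described above may help show the data flow. This is only a sketch in plain Python, not the TestStreaming code: the exact tr and uniq arguments used by the test are not given in this message, so `tr -s ' ' '\n'` and `uniq -c` are assumptions, and the function names are made up for illustration.

```python
from itertools import groupby


def map_tr(lines):
    # Roughly what `tr -s ' ' '\n'` would do as the Map command (assumed
    # arguments): put every whitespace-separated word on its own line.
    for line in lines:
        for word in line.split():
            yield word


def reduce_uniq_c(sorted_words):
    # Roughly what `uniq -c` would do as the Reduce command (assumed
    # arguments): collapse runs of identical adjacent lines into a count.
    for word, run in groupby(sorted_words):
        yield (sum(1 for _ in run), word)


def run_job(lines):
    # Map output is sorted by key before the Reduce step reads it, which is
    # what makes an adjacent-duplicates tool like uniq usable as a reducer.
    return list(reduce_uniq_c(sorted(map_tr(lines))))


if __name__ == "__main__":
    for count, word in run_job(["roses are red", "violets are blue"]):
        print(f"{count}\t{word}")
```

The sort between the two steps is the important part: without it, uniq would only merge duplicates that happen to be adjacent in the map output.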


To test the patch:
Merge the patch.
The only existing file that is modified is trunk/build.xml.
trunk>ant deploy-contrib
trunk>bin/hadoopStreaming : should show the usage message
trunk>ant test-contrib    : should run one test successfully

To add src/contrib/someOtherProject:
edit src/contrib/build.xml





-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Commented: (HADOOP-191) add hadoopStreaming to src/contrib

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-191?page=comments#action_12377488 ] 

Doug Cutting commented on HADOOP-191:
-------------------------------------

I'm okay with all the contrib targets, but all of the other changes to the top-level build.xml seem spurious.  The new properties are unused, and the directory the patch creates will be created by another build script anyway.




[jira] Updated: (HADOOP-191) add hadoopStreaming to src/contrib

Posted by "Michel Tourn (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-191?page=all ]

Michel Tourn updated HADOOP-191:
--------------------------------

    Attachment: streaming.patch




Please don't re-open closed bugs

Posted by Doug Cutting <cu...@apache.org>.
Please don't re-open closed bugs that are included in releases, since 
that messes up Jira's change log for the release.  Bugs normally 
transition from Open to Resolved/Fixed, then to Closed when a release is 
made.  It's okay to re-open a bug whose fix has not yet been released, 
but it is generally better to create a new bug in this case too.

Ideally I could modify the Jira permissions so that non-administrators 
cannot transition bugs from Closed to Open and are forced to instead 
start a new bug, but that does not appear to be possible.

Doug

[jira] Reopened: (HADOOP-191) add hadoopStreaming to src/contrib

Posted by "Michel Tourn (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-191?page=all ]
     
Michel Tourn reopened HADOOP-191:
---------------------------------


An update to hadoop-streaming.






[jira] Updated: (HADOOP-191) add hadoopStreaming to src/contrib

Posted by "Michel Tourn (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-191?page=all ]

Michel Tourn updated HADOOP-191:
--------------------------------

    Attachment: streaming.3.patch

This patch depends on the LargeUTF8 patch:  http://issues.apache.org/jira/browse/HADOOP-136


Added a few more configurable options.

michel@cdev2004> bin/hadoop jar build/hadoop-streaming.jar -info
Usage: $HADOOP_HOME/bin/hadoop jar build/hadoop-streaming.jar [options]
Options:
  -input    <path>     DFS input file(s) for the Map step
  -output   <path>     DFS output directory for the Reduce step
  -mapper   <cmd>      The streaming command to run
  -combiner <cmd>      Not implemented. But you can pipe the mapper output
  -reducer  <cmd>      The streaming command to run
  -file     <file>     File/dir to be shipped in the Job jar file
  -cluster  <name>     Default uses hadoop-default.xml and hadoop-site.xml
  -config   <file>     Optional. One or more paths to xml config files
  -dfs      <h:p>      Optional. Override DFS configuration
  -jt       <h:p>      Optional. Override JobTracker configuration
  -inputreader <spec>  Optional.
  -jobconf  <n>=<v>    Optional.
  -cmdenv   <n>=<v>    Optional. Pass env.var to streaming commands
  -verbose

For more details about these options:
Use $HADOOP_HOME/bin/hadoop jar build/hadoop-streaming.jar -info
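
For illustration, a command passed to -mapper could be a small script like the one below. This is a hedged sketch, not part of the patch: the MIN_LENGTH variable is hypothetical (something one might set with -cmdenv MIN_LENGTH=3), and it assumes the streaming command reads records on stdin and writes key TAB value lines on stdout.

```python
import io
import sys


def run_mapper(stdin, stdout, environ):
    # MIN_LENGTH is a hypothetical setting used only in this sketch; the
    # assumption is that -cmdenv makes it visible in the command's environment.
    min_length = int(environ.get("MIN_LENGTH", "1"))
    for line in stdin:
        for word in line.split():
            if len(word) >= min_length:
                # One output record per line: key, TAB, value.
                stdout.write(f"{word}\t1\n")


if __name__ == "__main__":
    # As an actual streaming command the call would be
    #   run_mapper(sys.stdin, sys.stdout, os.environ)   (needs `import os`)
    # A fixed input is used here so the sketch runs on its own.
    demo_input = io.StringIO("to be or not to be\n")
    run_mapper(demo_input, sys.stdout, {"MIN_LENGTH": "3"})
```

Taking the streams and the environment as parameters keeps the mapper logic testable without a cluster.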





[jira] Resolved: (HADOOP-191) add hadoopStreaming to src/contrib

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-191?page=all ]
     
Doug Cutting resolved HADOOP-191:
---------------------------------

    Fix Version: 0.2
     Resolution: Fixed

I just committed this.  It looks great!  Thanks, Michel.




[jira] Updated: (HADOOP-191) add hadoopStreaming to src/contrib

Posted by "Michel Tourn (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-191?page=all ]

Michel Tourn updated HADOOP-191:
--------------------------------

    Attachment: streaming.2.patch

Updated patch:

1. The top-level build.xml has 3 contrib targets and no other changes.

2. The hadoopStreaming script is gone.
The new usage message is:
bin/hadoop jar build/hadoop-streaming.jar [options]

3. Removed some spurious exec permissions on source files.





[jira] Commented: (HADOOP-191) add hadoopStreaming to src/contrib

Posted by "Michel Tourn (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-191?page=comments#action_12377480 ] 

Michel Tourn commented on HADOOP-191:
-------------------------------------

The usage message:

hadoop-trunk>bin/hadoopStreaming

Usage: hadoopStreaming [options]
Options:
  -input   <path>     DFS input file(s) for the Map step
  -output  <path>     DFS output directory for the Reduce step
  -mapper  <cmd>      The streaming command to run
  -reducer <cmd>      The streaming command to run
  -files   <file>     Additional files to be shipped in the Job jar file
  -cluster <name>     Default uses hadoop-default.xml and hadoop-site.xml
  -config  <file>     Optional. One or more paths to xml config files
  -inputreader <spec> Optional. See below
  -verbose

In -input: globbing on <path> is supported and can have multiple -input
Default Map input format: a line is a record in UTF-8
  the key part ends at first TAB, the rest of the line is the value
Custom Map input format: -inputreader package.MyRecordReader,n=v,n=v
  comma-separated name-values can be specified to configure the InputFormat
  Ex: -inputreader 'StreamXmlRecordReader,begin=<doc>,end=</doc>'
Map output format, reduce input/output format:
  Format defined by what mapper command outputs. Line-oriented
Mapper and Reducer <cmd> syntax:
  If the mapper or reducer programs are prefixed with noship: then
  the paths are assumed to be valid absolute paths on the task tracker machines
  and are NOT packaged with the Job jar file.
Use -cluster <name> to switch between "local" Hadoop and one or more remote
  Hadoop clusters.
  The default is to use the normal hadoop-default.xml and hadoop-site.xml
  Else configuration will use $HADOOP_HOME/conf/hadoop-<name>.xml

Example: hadoopStreaming -mapper "noship:/usr/local/bin/perl5 filter.pl"
           -files /local/filter.pl -input "/logs/0604*/*" [...]
  Ships a script, invokes the non-shipped perl interpreter
  Shipped files go to the working directory so filter.pl is found by perl
  Input files are all the daily logs for days in month 2006-04
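
Two of the formats in the usage message above can be sketched in a few lines of Python: the default Map input record (key ends at the first TAB) and the comma-separated -inputreader spec. This is an illustration only, not the hadoopStreaming implementation. The function names are made up, and treating a line without a TAB as a key with an empty value is an assumption.

```python
def split_record(line):
    # Default Map input format: the key ends at the first TAB and the rest
    # of the line is the value. A line with no TAB becomes a key with an
    # empty value here, which is an assumption of this sketch.
    key, _, value = line.rstrip("\n").partition("\t")
    return key, value


def parse_inputreader_spec(spec):
    # "package.MyRecordReader,n=v,n=v": the first field names the reader,
    # the remaining comma-separated fields are name=value settings.
    reader, *pairs = spec.split(",")
    settings = {}
    for pair in pairs:
        name, _, value = pair.partition("=")
        settings[name] = value
    return reader, settings


if __name__ == "__main__":
    print(split_record("url1\tfirst\tsecond\n"))
    print(parse_inputreader_spec("StreamXmlRecordReader,begin=<doc>,end=</doc>"))
```

Only the first TAB is significant in split_record, so later TABs stay inside the value.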





[jira] Commented: (HADOOP-191) add hadoopStreaming to src/contrib

Posted by "Michel Tourn (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-191?page=comments#action_12377487 ] 

Michel Tourn commented on HADOOP-191:
-------------------------------------

>top-level build.xml : 
OK, I can remove the unnecessary changes.
Which contrib targets would you keep in?
I modeled deploy-contrib, test-contrib and clean-contrib on the ones used for Nutch plugins.
(It is true that, for now, the new targets are not required, since the nightly target does not call them.)

>bin/hadoop jar build/hadoop-streaming.jar ...
Looks cleaner. I'll try to do it this way.






[jira] Commented: (HADOOP-191) add hadoopStreaming to src/contrib

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-191?page=comments#action_12377484 ] 

Doug Cutting commented on HADOOP-191:
-------------------------------------

Most of the changes to the top-level build.xml don't seem to be required, and a number are spurious whitespace and comment changes.  It seems to build fine with only the new targets added.

Also, is the new bin/ script required?  Won't 'bin/hadoop jar build/hadoop-streaming.jar ...' suffice?  (You'll need to set the "Main-Class" attribute in the jar's manifest.)
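
As a rough illustration of the "Main-Class" point, the sketch below writes a tiny jar-like archive in memory with that manifest attribute and reads it back. It is not taken from the patch: the class name org.example.StreamJob is hypothetical, and the reader ignores manifest line continuations for brevity.

```python
import io
import zipfile

MANIFEST_PATH = "META-INF/MANIFEST.MF"


def build_jar(main_class):
    # A jar is a zip archive; the manifest entry names the class to run
    # when no class is given on the command line.
    manifest = f"Manifest-Version: 1.0\r\nMain-Class: {main_class}\r\n\r\n"
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as jar:
        jar.writestr(MANIFEST_PATH, manifest)
    return buf.getvalue()


def read_main_class(jar_bytes):
    # Return the Main-Class attribute, or None if the manifest lacks one.
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        text = jar.read(MANIFEST_PATH).decode("utf-8")
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep and name == "Main-Class":
            return value
    return None


if __name__ == "__main__":
    # org.example.StreamJob is a hypothetical class name.
    print(read_main_class(build_jar("org.example.StreamJob")))
```

A check like read_main_class can be handy for confirming that a built jar can be launched without naming the class explicitly.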





[jira] Closed: (HADOOP-191) add hadoopStreaming to src/contrib

Posted by "Michel Tourn (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-191?page=all ]
     
Michel Tourn closed HADOOP-191:
-------------------------------

    Resolution: Fixed

(Visible to jira-users)


