Posted to common-dev@hadoop.apache.org by "Michel Tourn (JIRA)" <ji...@apache.org> on 2006/05/03 00:10:46 UTC
[jira] Created: (HADOOP-191) add hadoopStreaming to src/contrib
add hadoopStreaming to src/contrib
----------------------------------
Key: HADOOP-191
URL: http://issues.apache.org/jira/browse/HADOOP-191
Project: Hadoop
Type: New Feature
Reporter: Michel Tourn
Assigned to: Doug Cutting
This is a patch that adds a src/contrib/hadoopStreaming directory to the source tree.
hadoopStreaming is a bridge to run non-Java code as Map/Reduce tasks.
The unit test TestStreaming runs the Unix tools tr (as Map) and uniq (as Reduce).
To test the patch:
Merge the patch.
The only existing file that is modified is trunk/build.xml
trunk>ant deploy-contrib
trunk>bin/hadoopStreaming : should show usage message
trunk>ant test-contrib : should run one test successfully
To add src/contrib/someOtherProject:
edit src/contrib/build.xml
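As a rough illustration of the tr-as-Map, uniq-as-Reduce idea described above, here is a minimal Python sketch of the data flow. It is not the TestStreaming code. The exact tr arguments used by the test are not given in this message, so the sketch assumes the mapper splits each line into one word per line, and it assumes the framework sorts map output before the reducer sees it. The function names are hypothetical.

```python
def tr_map(lines):
    # Assumed mapper behaviour: emit one whitespace-separated word per line,
    # roughly what tr does when it turns spaces into newlines.
    for line in lines:
        for word in line.split():
            yield word


def uniq_reduce(sorted_lines):
    # Mimics uniq: drop a line when it equals the line just before it.
    previous = None
    for line in sorted_lines:
        if line != previous:
            yield line
        previous = line


def run_pipeline(lines):
    # Map, then sort (standing in for the framework's sort), then reduce.
    mapped = list(tr_map(lines))
    shuffled = sorted(mapped)
    return list(uniq_reduce(shuffled))


if __name__ == "__main__":
    for word in run_pipeline(["roses are red", "violets are blue"]):
        print(word)
```

Because uniq only removes adjacent duplicates, the sort between the two steps is what makes the reduce output a list of distinct words.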
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira
[jira] Commented: (HADOOP-191) add hadoopStreaming to src/contrib
Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-191?page=comments#action_12377488 ]
Doug Cutting commented on HADOOP-191:
-------------------------------------
I'm okay with all the contrib targets, but all of the other changes to build.xml seem spurious. The new properties are unused, and the directory the patch creates will be created by another build script anyway.
[jira] Updated: (HADOOP-191) add hadoopStreaming to src/contrib
Posted by "Michel Tourn (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-191?page=all ]
Michel Tourn updated HADOOP-191:
--------------------------------
Attachment: streaming.patch
Please don't re-open closed bugs
Posted by Doug Cutting <cu...@apache.org>.
Please don't re-open closed bugs that are included in releases, since
that messes up Jira's change log for the release. Bugs normally
transition from Open to Resolved/Fixed, then to Closed when a release is
made. It's okay to re-open a bug whose fix has not yet been released,
but it is generally better to create a new bug in this case too.
Ideally I could modify the Jira permissions so that non-administrators
cannot transition bugs from Closed to Open and are forced to instead
start a new bug, but that does not appear to be possible.
Doug
[jira] Reopened: (HADOOP-191) add hadoopStreaming to src/contrib
Posted by "Michel Tourn (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-191?page=all ]
Michel Tourn reopened HADOOP-191:
---------------------------------
An update to hadoop-streaming.
[jira] Updated: (HADOOP-191) add hadoopStreaming to src/contrib
Posted by "Michel Tourn (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-191?page=all ]
Michel Tourn updated HADOOP-191:
--------------------------------
Attachment: streaming.3.patch
This patch depends on the LargeUTF8 patch: http://issues.apache.org/jira/browse/HADOOP-136
Added a few more configurable options.
michel@cdev2004> bin/hadoop jar build/hadoop-streaming.jar -info
Usage: $HADOOP_HOME/bin/hadoop jar build/hadoop-streaming.jar [options]
Options:
  -input       <path>   DFS input file(s) for the Map step
  -output      <path>   DFS output directory for the Reduce step
  -mapper      <cmd>    The streaming command to run
  -combiner    <cmd>    Not implemented. But you can pipe the mapper output
  -reducer     <cmd>    The streaming command to run
  -file        <file>   File/dir to be shipped in the Job jar file
  -cluster     <name>   Default uses hadoop-default.xml and hadoop-site.xml
  -config      <file>   Optional. One or more paths to xml config files
  -dfs         <h:p>    Optional. Override DFS configuration
  -jt          <h:p>    Optional. Override JobTracker configuration
  -inputreader <spec>   Optional.
  -jobconf     <n>=<v>  Optional.
  -cmdenv      <n>=<v>  Optional. Pass env.var to streaming commands
  -verbose
For more details about these options:
Use $HADOOP_HOME/bin/hadoop jar build/hadoop-streaming.jar -info
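To show how the options listed above might combine into one command line, here is a small Python sketch that assembles an argument list. It only uses option names from the usage text. The helper name build_streaming_command and its parameters are hypothetical, and it is an assumption that -input, -file and -cmdenv may each be given more than once.

```python
def build_streaming_command(input_paths, output_dir, mapper, reducer,
                            files=(), cmdenv=None,
                            jar="build/hadoop-streaming.jar"):
    # Start with the launcher shown in the usage message.
    argv = ["bin/hadoop", "jar", jar]
    # Assumption: one -input option per input path.
    for path in input_paths:
        argv += ["-input", path]
    argv += ["-output", output_dir, "-mapper", mapper, "-reducer", reducer]
    # Assumption: one -file option per file shipped in the job jar.
    for shipped in files:
        argv += ["-file", shipped]
    # -cmdenv takes name=value pairs; sorted for a stable order.
    for name, value in sorted((cmdenv or {}).items()):
        argv += ["-cmdenv", f"{name}={value}"]
    return argv


if __name__ == "__main__":
    print(" ".join(build_streaming_command(
        ["/logs/in"], "/logs/out", "tr", "uniq")))
```

Returning a list rather than a single string keeps each option and its value as separate arguments, so a mapper command containing spaces stays intact if the list is later handed to something like subprocess.run.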
[jira] Resolved: (HADOOP-191) add hadoopStreaming to src/contrib
Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-191?page=all ]
Doug Cutting resolved HADOOP-191:
---------------------------------
Fix Version: 0.2
Resolution: Fixed
I just committed this. It looks great! Thanks, Michel.
[jira] Updated: (HADOOP-191) add hadoopStreaming to src/contrib
Posted by "Michel Tourn (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-191?page=all ]
Michel Tourn updated HADOOP-191:
--------------------------------
Attachment: streaming.2.patch
Updated patch:
1. top-level build.xml has 3 contrib targets and no other changes.
2. script hadoopStreaming is gone.
new Usage message:
bin/hadoop jar build/hadoop-streaming.jar [options]
3. removed some spurious exec permissions on source files
[jira] Commented: (HADOOP-191) add hadoopStreaming to src/contrib
Posted by "Michel Tourn (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-191?page=comments#action_12377480 ]
Michel Tourn commented on HADOOP-191:
-------------------------------------
The usage message:
hadoop-trunk>bin/hadoopStreaming
Usage: hadoopStreaming [options]
Options:
  -input       <path>  DFS input file(s) for the Map step
  -output      <path>  DFS output directory for the Reduce step
  -mapper      <cmd>   The streaming command to run
  -reducer     <cmd>   The streaming command to run
  -files       <file>  Additional files to be shipped in the Job jar file
  -cluster     <name>  Default uses hadoop-default.xml and hadoop-site.xml
  -config      <file>  Optional. One or more paths to xml config files
  -inputreader <spec>  Optional. See below
  -verbose
In -input: globbing on <path> is supported and can have multiple -input
Default Map input format: a line is a record in UTF-8
  the key part ends at first TAB, the rest of the line is the value
Custom Map input format: -inputreader package.MyRecordReader,n=v,n=v
  comma-separated name-values can be specified to configure the InputFormat
  Ex: -inputreader 'StreamXmlRecordReader,begin=<doc>,end=</doc>'
Map output format, reduce input/output format:
  Format defined by what mapper command outputs. Line-oriented
Mapper and Reducer <cmd> syntax:
  If the mapper or reducer programs are prefixed with noship: then
  the paths are assumed to be valid absolute paths on the task tracker machines
  and are NOT packaged with the Job jar file.
Use -cluster <name> to switch between "local" Hadoop and one or more remote
  Hadoop clusters.
  The default is to use the normal hadoop-default.xml and hadoop-site.xml
  Else configuration will use $HADOOP_HOME/conf/hadoop-<name>.xml
Example: hadoopStreaming -mapper "noship:/usr/local/bin/perl5 filter.pl"
  -files /local/filter.pl -input "/logs/0604*/*" [...]
  Ships a script, invokes the non-shipped perl interpreter
  Shipped files go to the working directory so filter.pl is found by perl
  Input files are all the daily logs for days in month 2006-04
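The two text formats described in the usage message (a record whose key ends at the first TAB, and an -inputreader spec of the form class,n=v,n=v) can be sketched in Python as below. This is an illustration of the formats as worded above, not the streaming implementation. The function names are hypothetical, and treating a line with no TAB as a key with an empty value is an assumption that the usage text does not state.

```python
def split_record(line):
    # The key part ends at the first TAB; the rest of the line is the value.
    # Assumption: a line without any TAB is all key, with an empty value.
    key, _, value = line.rstrip("\n").partition("\t")
    return key, value


def parse_inputreader(spec):
    # Spec shape from the usage text: package.MyRecordReader,n=v,n=v
    class_name, *pairs = spec.split(",")
    options = {}
    for pair in pairs:
        # Split on the first '=' only, so values may contain '='.
        name, _, value = pair.partition("=")
        options[name] = value
    return class_name, options


if __name__ == "__main__":
    print(split_record("url\tfirst\tsecond\n"))
    print(parse_inputreader("StreamXmlRecordReader,begin=<doc>,end=</doc>"))
```

Note that this simple split cannot represent a value that itself contains a comma; how the real option handles that case is not covered by the usage message.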
[jira] Commented: (HADOOP-191) add hadoopStreaming to src/contrib
Posted by "Michel Tourn (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-191?page=comments#action_12377487 ]
Michel Tourn commented on HADOOP-191:
-------------------------------------
>top-level build.xml :
OK, I can remove the unnecessary changes.
Which contrib targets would you keep in?
I modeled deploy-contrib, test-contrib, and clean-contrib on the Nutch plugin targets.
(It is true that, for now, the new targets are not required, since the nightly target does not call them.)
>bin/hadoop jar build/hadoop-streaming.jar ...
Looks cleaner. I'll try to do it this way.
[jira] Commented: (HADOOP-191) add hadoopStreaming to src/contrib
Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-191?page=comments#action_12377484 ]
Doug Cutting commented on HADOOP-191:
-------------------------------------
Most of the changes to the top-level build.xml don't seem to be required, and a number are spurious whitespace and comment changes. It seems to build fine with only the new targets added.
Also, is the new bin/ script required? Won't 'bin/hadoop jar build/hadoop-streaming.jar ...' suffice? (You'll need to set the "Main-Class" attribute in the jar's manifest.)
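One way to check that a built jar carries the "Main-Class" attribute mentioned above is to read its manifest. The following Python sketch does that with the standard zipfile module. The helper name main_class_of is hypothetical, and the sketch ignores the manifest rule that long lines may be continued on the next line with a leading space, so it is only suitable for short class names.

```python
import zipfile


def main_class_of(jar):
    # A jar is a zip archive; the manifest lives at a fixed path inside it.
    with zipfile.ZipFile(jar) as archive:
        manifest = archive.read("META-INF/MANIFEST.MF").decode("utf-8")
    for line in manifest.splitlines():
        name, sep, value = line.partition(":")
        # Attribute names in a manifest are case-insensitive.
        if sep and name.strip().lower() == "main-class":
            return value.strip()
    # No Main-Class attribute was found.
    return None
```

For example, main_class_of("build/hadoop-streaming.jar") would return the class name when the attribute is set, and None when it is not.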
[jira] Closed: (HADOOP-191) add hadoopStreaming to src/contrib
Posted by "Michel Tourn (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/HADOOP-191?page=all ]
Michel Tourn closed HADOOP-191:
-------------------------------
Resolution: Fixed
(Visible to jira-users)