Posted to mapreduce-commits@hadoop.apache.org by am...@apache.org on 2010/06/23 09:13:31 UTC
svn commit: r957126 - in /hadoop/mapreduce/trunk: CHANGES.txt
src/docs/src/documentation/content/xdocs/streaming.xml
Author: amareshwari
Date: Wed Jun 23 07:13:30 2010
New Revision: 957126
URL: http://svn.apache.org/viewvc?rev=957126&view=rev
Log:
MAPREDUCE-1851. Documents configuration parameters in streaming. Contributed by Amareshwari Sriramadasu.
Modified:
hadoop/mapreduce/trunk/CHANGES.txt
hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/streaming.xml
Modified: hadoop/mapreduce/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/CHANGES.txt?rev=957126&r1=957125&r2=957126&view=diff
==============================================================================
--- hadoop/mapreduce/trunk/CHANGES.txt (original)
+++ hadoop/mapreduce/trunk/CHANGES.txt Wed Jun 23 07:13:30 2010
@@ -58,6 +58,9 @@ Trunk (unreleased changes)
of HADOOP-5472, MAPREDUCE-642 and HADOOP-5620. (Rodrigo Schmidt via
szetszwo)
+ MAPREDUCE-1851. Documents configuration parameters in streaming.
+ (amareshwari)
+
OPTIMIZATIONS
MAPREDUCE-1354. Enhancements to JobTracker for better performance and
Modified: hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/streaming.xml
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/streaming.xml?rev=957126&r1=957125&r2=957126&view=diff
==============================================================================
--- hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/streaming.xml (original)
+++ hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/streaming.xml Wed Jun 23 07:13:30 2010
@@ -653,6 +653,80 @@ from field 5 (corresponding to all the o
</section>
</section>
+<section>
+<title>Configurable parameters</title>
+<p>This section lists the streaming-specific configuration parameters that
+ can be set for a streaming job.</p>
+
+<p>The Hadoop streaming configurable parameters are</p>
+<table>
+<tr><th>Parameter</th><th>Default value </th><th>Description </th></tr>
+<tr><td> stream.map.streamprocessor </td><td> - </td>
+ <td> The command for the mapper. </td></tr>
+<tr><td> stream.map.input.ignoreKey </td><td> true </td><td> Specifies whether
+ to ignore the key while writing input to the mapper. This parameter is
+ honored only if stream.map.input.writer.class is
+ org.apache.hadoop.streaming.io.TextInputWriter.class. By default, for
+ TextInputFormat, it is true. </td></tr>
+<tr><td> stream.map.input.field.separator </td><td> \t </td><td> Separator
+ between the key and the value while writing input to the mapper. This is
+ honored only if "stream.map.input.ignoreKey" is false. </td></tr>
+<tr><td> stream.map.output.field.separator </td><td> \t </td><td> Separator
+ between the key and the value while reading output from the mapper. </td></tr>
+<tr><td> stream.num.map.output.key.fields </td><td> 1 </td><td> Specifies the
+ nth field separator in the line of the map output as the
+ separator between the key and the value. </td></tr>
+<tr><td> stream.combine.streamprocessor </td><td> - </td>
+ <td> The command for the combiner. </td></tr>
+<tr><td> stream.reduce.streamprocessor </td><td> - </td>
+ <td> The command for the reducer. </td></tr>
+<tr><td> stream.reduce.input.field.separator </td><td> \t </td><td> Separator
+ between the key and the value while writing input to the reducer. </td></tr>
+<tr><td> stream.reduce.output.field.separator </td><td> \t </td><td> Separator
+ between the key and the value while reading output from the reducer. </td></tr>
+<tr><td> stream.num.reduce.output.key.fields </td><td> 1 </td><td> Specifies the
+ nth field separator in the line of the reduce output as the separator
+ between the key and the value. </td></tr>
+<tr><td> stream.map.input </td><td> text </td><td> Identifier to specify the
+ communication format used for map input. Possible values are text,
+ typedbytes and rawbytes. This value is honored only if no identifier is
+ specified via -io option. </td></tr>
+<tr><td> stream.map.output </td><td> text </td><td> Identifier to specify the
+ communication format used for map output. Possible values are text,
+ typedbytes and rawbytes. This value is honored only if no identifier is
+ specified via -io option. </td></tr>
+<tr><td> stream.reduce.input </td><td> text </td><td> Identifier to specify the
+ communication format used for reduce input. Possible values are text,
+ typedbytes and rawbytes. This value is honored only if no identifier is
+ specified via -io option. </td></tr>
+<tr><td> stream.reduce.output </td><td> text </td><td> Identifier to specify the
+ communication format used for reduce output. Possible values are text,
+ typedbytes and rawbytes. This value is honored only if no identifier is
+ specified via -io option. </td></tr>
+<tr><td> stream.io.identifier.resolver.class </td>
+ <td> org.apache.hadoop.streaming.io.IdentifierResolver.class </td>
+ <td> The class to resolve iospec passed via option -io. </td></tr>
+<tr><td> stream.recordreader.class </td><td> - </td><td> RecordReader class
+ passed via -inputReader option. </td></tr>
+<tr><td> stream.recordreader.* </td><td> - </td><td> Configuration properties
+ for record reader passed via stream.recordreader.class. </td></tr>
+<tr><td> stream.shipped.hadoopstreaming </td><td> - </td><td> Custom streaming
+ build shipped along with the standard Hadoop install. </td></tr>
+<tr><td> stream.non.zero.exit.is.failure </td><td> true </td><td> Specifies
+ whether a non-zero exit code from the map/reduce process is treated as a
+ failure. </td></tr>
+<tr><td> stream.tmpdir </td><td> - </td><td> Temporary directory used for jar
+ packaging. </td></tr>
+<tr><td> stream.joindelay.milli </td><td> 0 </td><td> Timeout in milliseconds
+ for joining the error and output threads at the end of mapper/reducer. A
+ timeout of "0" means to wait forever. </td></tr>
+<tr><td> stream.minRecWrittenToEnableSkip_ </td><td> - </td><td> Minimum
+ number of input records that must be written before a map failure may be
+ skipped. </td></tr>
+<tr><td> stream.stderr.reporter.prefix </td><td> reporter: </td><td> Reporter
+ prefix to indicate reporter statements emitted from stderr. </td>
+ </tr>
+</table>
+</section>
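As an illustrative sketch (plain Python, not part of the patch), the separator
and key-field semantics documented above can be modeled as follows: with
stream.num.map.output.key.fields set to n, the prefix of the line up to the
nth field separator is the key and the rest is the value. The function name
here is hypothetical, chosen only for the illustration.

```python
# Sketch of how Hadoop streaming splits an output line into key and value,
# per stream.map.output.field.separator (default "\t") and
# stream.num.map.output.key.fields (default 1). This illustrates the
# documented semantics; it is not Hadoop's own implementation.

def split_key_value(line, separator="\t", num_key_fields=1):
    """The first num_key_fields fields form the key; the remainder is the
    value. If the line has fewer separators, the whole line is the key and
    the value is empty."""
    fields = line.split(separator)
    key = separator.join(fields[:num_key_fields])
    value = separator.join(fields[num_key_fields:])
    return key, value

print(split_key_value("a\tb\tc"))                    # ('a', 'b\tc')
print(split_key_value("a\tb\tc", num_key_fields=2))  # ('a\tb', 'c')
```

These properties are typically set on the command line through Hadoop's
generic -D option, e.g. -D stream.num.map.output.key.fields=2.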
<!-- FREQUENTLY ASKED QUESTIONS -->
<section>
@@ -824,7 +898,10 @@ Anything found between BEGIN_STRING and
<p>
A streaming process can use the stderr to emit counter information.
<code>reporter:counter:<group>,<counter>,<amount></code>
-should be sent to stderr to update the counter.
+should be sent to stderr to update the counter. You can change the reporter
+prefix by setting the configuration property
+<code>stream.stderr.reporter.prefix</code>; the default is
+<code>reporter:</code>.
</p>
</section>
@@ -835,7 +912,9 @@ should be sent to stderr to update the c
<p>
A streaming process can use the stderr to emit status information.
To set a status, <code>reporter:status:<message></code> should be sent
-to stderr.
+to stderr. You can change the reporter prefix by setting the configuration
+property <code>stream.stderr.reporter.prefix</code>; the default is
+<code>reporter:</code>.
</p>
</section>
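Analogously to counters, a status update is a single stderr line; this sketch
(not part of the patch, helper name hypothetical) shows both the default and a
custom prefix:

```python
import sys

# Build a status line in the format Hadoop streaming expects on stderr:
# <prefix>status:<message>. The default prefix "reporter:" can be changed
# via the stream.stderr.reporter.prefix configuration property.
def status_line(message, prefix="reporter:"):
    return "%sstatus:%s" % (prefix, message)

sys.stderr.write(status_line("processed 1000 records") + "\n")
# With stream.stderr.reporter.prefix=my-prefix: the matching line would be:
sys.stderr.write(status_line("still running", prefix="my-prefix:") + "\n")
```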