Posted to mapreduce-commits@hadoop.apache.org by to...@apache.org on 2010/04/13 00:35:46 UTC
svn commit: r933440 - in /hadoop/mapreduce/trunk: CHANGES.txt src/docs/src/documentation/content/xdocs/site.xml src/docs/src/documentation/content/xdocs/streaming.xml

Author: tomwhite
Date: Mon Apr 12 22:35:46 2010
New Revision: 933440

URL: http://svn.apache.org/viewvc?rev=933440&view=rev
Log:
MAPREDUCE-889. binary communication formats added to Streaming by HADOOP-1722 should be documented. Contributed by Klaas Bosteels.

Modified:
    hadoop/mapreduce/trunk/CHANGES.txt
    hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/site.xml
    hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/streaming.xml

Modified: hadoop/mapreduce/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/CHANGES.txt?rev=933440&r1=933439&r2=933440&view=diff
==============================================================================
--- hadoop/mapreduce/trunk/CHANGES.txt (original)
+++ hadoop/mapreduce/trunk/CHANGES.txt Mon Apr 12 22:35:46 2010
@@ -525,6 +525,9 @@ Trunk (unreleased changes)
     MAPREDUCE-1635. ResourceEstimator does not work after MAPREDUCE-842.
     (Amareshwari Sriramadasu via vinodkv)
 
+    MAPREDUCE-889. binary communication formats added to Streaming by
+    HADOOP-1722 should be documented. (Klaas Bosteels via tomwhite)
+
 Release 0.21.0 - Unreleased
 
   INCOMPATIBLE CHANGES

Modified: hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/site.xml
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/site.xml?rev=933440&r1=933439&r2=933440&view=diff
==============================================================================
--- hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/site.xml (original)
+++ hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/site.xml Mon Apr 12 22:35:46 2010
@@ -259,6 +259,9 @@ See http://forrest.apache.org/docs/linki
             <streaming href="streaming/">
               <package-summary href="package-summary.html" />
             </streaming>
+            <typedbytes href="typedbytes/">
+              <package-summary href="package-summary.html" />
+            </typedbytes>
             <util href="util/">
               <genericoptionsparser href="GenericOptionsParser.html" />
               <progress href="Progress.html" />

Modified: hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/streaming.xml
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/streaming.xml?rev=933440&r1=933439&r2=933440&view=diff
==============================================================================
--- hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/streaming.xml (original)
+++ hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/streaming.xml Mon Apr 12 22:35:46 2010
@@ -112,6 +112,7 @@ For an example, see <a href="streaming.h
 <tr><td> -numReduceTasks</td><td> Optional </td><td> Specify the number of reducers</td></tr>
 <tr><td> -mapdebug </td><td> Optional </td><td> Script to call when map task fails </td></tr>
 <tr><td> -reducedebug </td><td> Optional </td><td> Script to call when reduce task fails </td></tr>
+<tr><td> -io </td><td> Optional </td><td> Format to use for input to and output from client processes. </td></tr>
 </table>
 
 <section>
@@ -182,8 +183,25 @@ Since the TextInputFormat returns keys o
 </source>
 </section>
 
+<section>
+<title>Specifying the Communication Format</title>
+<p>
+By default, Hadoop Streaming uses tab-separated lines of text as the input/output format for passing data to and from client processes, but other formats can also be used. Specify the communication format as follows:
+</p>
+<source>
+   -io [identifier]
+</source>
+<p>
+where <code>[identifier]</code> can be <code>text</code>, <code>rawbytes</code> or <code>typedbytes</code>. These identifiers correspond to the following formats:
+</p>
+<ul>
+<li><code>text</code>: The default tab-separated lines of text.</li>
+<li><code>rawbytes</code>: Keys and values are each passed as a 4-byte length followed by the raw bytes.</li>
+<li><code>typedbytes</code>: The "typed bytes" format as described in the <a href="ext:api/org/apache/hadoop/typedbytes/package-summary">API documentation</a> for the package <code>org.apache.hadoop.typedbytes</code>.</li>
+</ul>
 </section>
 
+</section>
 
 <!-- GENERIC COMMAND OPTIONS-->
 <section>
@@ -294,8 +312,20 @@ the nth field separator in a line of the
 inputs. By default the separator is the tab character.</p>
 </section>
 
+<section>
+<title>Specifying Communication Formats in Detail</title>
+<p>
+The above-mentioned <code>-io [identifier]</code> option is fairly coarse-grained: it applies the format corresponding to the given identifier to the input and output of both the map and the reduce processes. For finer-grained control, set each communication format individually with the following generic options:
+</p>
+<source>
+    -D stream.map.input=[identifier]
+    -D stream.map.output=[identifier]
+    -D stream.reduce.input=[identifier]
+    -D stream.reduce.output=[identifier]
+</source>
 </section>
 
+</section>
 
 <section>
 <title>Working with Large Files and Archives</title>
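
Note on the rawbytes format documented in the diff above: the framing it describes (a 4-byte length followed by the raw bytes, for each key and each value) can be illustrated with a minimal Python sketch. This is not part of the commit. The function names are hypothetical, and the byte order is an assumption: the documentation only says "4 byte length", and the sketch takes it to be a big-endian signed 32-bit integer, as Java's DataOutput.writeInt would produce.

```python
import io
import struct


def _read_exact(stream, n):
    """Read exactly n bytes or raise if the stream ends early."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("stream ended in the middle of a record")
    return data


def _read_item(stream, first=False):
    """Read one length-prefixed item; return None on clean end of input."""
    header = stream.read(4)
    if first and not header:
        return None  # no more records
    if len(header) != 4:
        raise EOFError("stream ended in the middle of a length prefix")
    # Assumption: big-endian signed 32-bit length.
    (length,) = struct.unpack(">i", header)
    return _read_exact(stream, length)


def read_rawbytes_records(stream):
    """Yield (key, value) pairs of bytes from a rawbytes-framed stream."""
    while True:
        key = _read_item(stream, first=True)
        if key is None:
            return
        value = _read_item(stream)
        yield key, value


def write_rawbytes_record(stream, key, value):
    """Write one (key, value) pair using the same framing."""
    for item in (key, value):
        stream.write(struct.pack(">i", len(item)))
        stream.write(item)


if __name__ == "__main__":
    # Round-trip two records through an in-memory buffer.
    buf = io.BytesIO()
    write_rawbytes_record(buf, b"k1", b"value1")
    write_rawbytes_record(buf, b"k2", b"value2")
    buf.seek(0)
    for key, value in read_rawbytes_records(buf):
        print(key, value)
```

A streaming client started with -io rawbytes could apply the same reader and writer to sys.stdin.buffer and sys.stdout.buffer instead of the in-memory buffer used here.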