Posted to mapreduce-commits@hadoop.apache.org by tu...@apache.org on 2013/02/06 20:52:25 UTC
svn commit: r1443168 - in /hadoop/common/trunk/hadoop-mapreduce-project: CHANGES.txt hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/PluggableShuffleAndPluggableSort.apt.vm

Author: tucu
Date: Wed Feb  6 19:52:25 2013
New Revision: 1443168

URL: http://svn.apache.org/viewvc?rev=1443168&view=rev
Log:
MAPREDUCE-4977. Documentation for pluggable shuffle and pluggable sort. (tucu)

Added:
    hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/PluggableShuffleAndPluggableSort.apt.vm
Modified:
    hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt

Modified: hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt?rev=1443168&r1=1443167&r2=1443168&view=diff
==============================================================================
--- hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt (original)
+++ hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt Wed Feb  6 19:52:25 2013
@@ -230,6 +230,9 @@ Release 2.0.3-alpha - 2013-02-06 
     MAPREDUCE-4971. Minor extensibility enhancements to Counters & 
     FileOutputFormat. (Arun C Murthy via sseth)
 
+    MAPREDUCE-4977. Documentation for pluggable shuffle and pluggable sort. 
+    (tucu)
+
   OPTIMIZATIONS
 
     MAPREDUCE-4893. Fixed MR ApplicationMaster to do optimal assignment of

Added: hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/PluggableShuffleAndPluggableSort.apt.vm
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/PluggableShuffleAndPluggableSort.apt.vm?rev=1443168&view=auto
==============================================================================
--- hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/PluggableShuffleAndPluggableSort.apt.vm (added)
+++ hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/PluggableShuffleAndPluggableSort.apt.vm Wed Feb  6 19:52:25 2013
@@ -0,0 +1,96 @@
+~~ Licensed under the Apache License, Version 2.0 (the "License");
+~~ you may not use this file except in compliance with the License.
+~~ You may obtain a copy of the License at
+~~
+~~   http://www.apache.org/licenses/LICENSE-2.0
+~~
+~~ Unless required by applicable law or agreed to in writing, software
+~~ distributed under the License is distributed on an "AS IS" BASIS,
+~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+~~ See the License for the specific language governing permissions and
+~~ limitations under the License. See accompanying LICENSE file.
+
+  ---
+  Hadoop Map Reduce Next Generation-${project.version} - Pluggable Shuffle and Pluggable Sort
+  ---
+  ---
+  ${maven.build.timestamp}
+
+Hadoop MapReduce Next Generation - Pluggable Shuffle and Pluggable Sort
+
+  \[ {{{./index.html}Go Back}} \]
+
+* Introduction
+
+  The pluggable shuffle and pluggable sort capabilities allow replacing the
+  built-in shuffle and sort logic with alternate implementations. Example use
+  cases are: using an application protocol other than HTTP, such as RDMA,
+  for shuffling data from the Map nodes to the Reducer nodes; or replacing
+  the sort logic with custom algorithms that enable Hash aggregation and
+  Limit-N query.
+
+  <<IMPORTANT:>> The pluggable shuffle and pluggable sort capabilities are 
+  experimental and unstable. This means the provided APIs may change and break 
+  compatibility in future versions of Hadoop.
+
+* Implementing a Custom Shuffle and a Custom Sort 
+
+  A custom shuffle implementation requires a
+  <<<org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.AuxiliaryService>>> 
+  implementation class running in the NodeManagers and a 
+  <<<org.apache.hadoop.mapred.ShuffleConsumerPlugin>>> implementation class
+  running in the Reducer tasks.
+
+  The default implementations provided by Hadoop can be used as references:
+
+  * <<<org.apache.hadoop.mapred.ShuffleHandler>>>
+
+  * <<<org.apache.hadoop.mapreduce.task.reduce.Shuffle>>>
+
+  A custom sort implementation requires a <<<org.apache.hadoop.mapred.MapOutputCollector>>>
+  implementation class running in the Mapper tasks and (optionally, depending
+  on the sort implementation) a <<<org.apache.hadoop.mapred.ShuffleConsumerPlugin>>> 
+  implementation class running in the Reducer tasks.
+
+  The default implementations provided by Hadoop can be used as references:
+
+  * <<<org.apache.hadoop.mapred.MapTask$MapOutputBuffer>>>
+  
+  * <<<org.apache.hadoop.mapreduce.task.reduce.Shuffle>>>
+
+* Configuration
+
+  Except for the auxiliary service running in the NodeManagers serving the
+  shuffle (by default the <<<ShuffleHandler>>>), all the pluggable components
+  run in the job tasks. This means they can be configured on a per-job basis.
+  The auxiliary service serving the shuffle must be configured in the
+  NodeManagers' configuration.
+
+** Job Configuration Properties (on a per-job basis):
+
+*--------------------------------------+---------------------+-----------------+
+| <<Property>>                         | <<Default Value>>   | <<Explanation>> |
+*--------------------------------------+---------------------+-----------------+
+| <<<mapreduce.job.reduce.shuffle.consumer.plugin.class>>> | <<<org.apache.hadoop.mapreduce.task.reduce.Shuffle>>>         | The <<<ShuffleConsumerPlugin>>> implementation to use |
+*--------------------------------------+---------------------+-----------------+
+| <<<mapreduce.job.map.output.collector.class>>>   | <<<org.apache.hadoop.mapred.MapTask$MapOutputBuffer>>> | The <<<MapOutputCollector>>> implementation to use |
+*--------------------------------------+---------------------+-----------------+
+
+  These properties can also be set in the <<<mapred-site.xml>>> to change the default values for all jobs.
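+
+  As an illustrative sketch, a <<<mapred-site.xml>>> fragment overriding both
+  defaults could look like the following. The class names under
+  <<<com.example>>> are hypothetical placeholders, not classes shipped with
+  Hadoop.
+
++---+
+<property>
+  <name>mapreduce.job.reduce.shuffle.consumer.plugin.class</name>
+  <!-- hypothetical ShuffleConsumerPlugin implementation -->
+  <value>com.example.CustomShuffleConsumerPlugin</value>
+</property>
+<property>
+  <name>mapreduce.job.map.output.collector.class</name>
+  <!-- hypothetical MapOutputCollector implementation -->
+  <value>com.example.CustomMapOutputCollector</value>
+</property>
++---+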
+
+** NodeManager Configuration Properties (<<<yarn-site.xml>>> on all nodes):
+
+*--------------------------------------+---------------------+-----------------+
+| <<Property>>                         | <<Default Value>>   | <<Explanation>> |
+*--------------------------------------+---------------------+-----------------+
+| <<<yarn.nodemanager.aux-services>>> | <<<...,mapreduce.shuffle>>>  | The auxiliary service name |
+*--------------------------------------+---------------------+-----------------+
+| <<<yarn.nodemanager.aux-services.mapreduce.shuffle.class>>>   | <<<org.apache.hadoop.mapred.ShuffleHandler>>> | The auxiliary service class to use |
+*--------------------------------------+---------------------+-----------------+
+
+  <<IMPORTANT:>> If setting an auxiliary service in addition to the default
+  <<<mapreduce.shuffle>>> service, then a new service key should be added to the
+  <<<yarn.nodemanager.aux-services>>> property, for example <<<mapreduce.shufflex>>>.
+  Then the property defining the corresponding class must be
+  <<<yarn.nodemanager.aux-services.mapreduce.shufflex.class>>>.
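+
+  As a minimal sketch of that case, a <<<yarn-site.xml>>> fragment registering
+  a second auxiliary service next to the default one might look like the
+  following. The <<<mapreduce.shufflex>>> key follows the example above, and
+  <<<com.example.CustomShuffleHandler>>> is a hypothetical class name used only
+  for illustration.
+
++---+
+<property>
+  <name>yarn.nodemanager.aux-services</name>
+  <!-- default service plus the additional service key -->
+  <value>mapreduce.shuffle,mapreduce.shufflex</value>
+</property>
+<property>
+  <name>yarn.nodemanager.aux-services.mapreduce.shufflex.class</name>
+  <!-- hypothetical AuxiliaryService implementation -->
+  <value>com.example.CustomShuffleHandler</value>
+</property>
++---+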
+  
\ No newline at end of file