Posted to mapreduce-commits@hadoop.apache.org by tu...@apache.org on 2013/02/06 20:52:25 UTC
svn commit: r1443168 - in /hadoop/common/trunk/hadoop-mapreduce-project:
CHANGES.txt
hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/PluggableShuffleAndPluggableSort.apt.vm
Author: tucu
Date: Wed Feb 6 19:52:25 2013
New Revision: 1443168
URL: http://svn.apache.org/viewvc?rev=1443168&view=rev
Log:
MAPREDUCE-4977. Documentation for pluggable shuffle and pluggable sort. (tucu)
Added:
hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/PluggableShuffleAndPluggableSort.apt.vm
Modified:
hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
Modified: hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt?rev=1443168&r1=1443167&r2=1443168&view=diff
==============================================================================
--- hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt (original)
+++ hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt Wed Feb 6 19:52:25 2013
@@ -230,6 +230,9 @@ Release 2.0.3-alpha - 2013-02-06
MAPREDUCE-4971. Minor extensibility enhancements to Counters &
FileOutputFormat. (Arun C Murthy via sseth)
+ MAPREDUCE-4977. Documentation for pluggable shuffle and pluggable sort.
+ (tucu)
+
OPTIMIZATIONS
MAPREDUCE-4893. Fixed MR ApplicationMaster to do optimal assignment of
Added: hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/PluggableShuffleAndPluggableSort.apt.vm
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/PluggableShuffleAndPluggableSort.apt.vm?rev=1443168&view=auto
==============================================================================
--- hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/PluggableShuffleAndPluggableSort.apt.vm (added)
+++ hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/PluggableShuffleAndPluggableSort.apt.vm Wed Feb 6 19:52:25 2013
@@ -0,0 +1,96 @@
+~~ Licensed under the Apache License, Version 2.0 (the "License");
+~~ you may not use this file except in compliance with the License.
+~~ You may obtain a copy of the License at
+~~
+~~ http://www.apache.org/licenses/LICENSE-2.0
+~~
+~~ Unless required by applicable law or agreed to in writing, software
+~~ distributed under the License is distributed on an "AS IS" BASIS,
+~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+~~ See the License for the specific language governing permissions and
+~~ limitations under the License. See accompanying LICENSE file.
+
+ ---
+ Hadoop Map Reduce Next Generation-${project.version} - Pluggable Shuffle and Pluggable Sort
+ ---
+ ---
+ ${maven.build.timestamp}
+
+Hadoop MapReduce Next Generation - Pluggable Shuffle and Pluggable Sort
+
+ \[ {{{./index.html}Go Back}} \]
+
+* Introduction
+
+ The pluggable shuffle and pluggable sort capabilities allow replacing the
+ built-in shuffle and sort logic with alternate implementations. Example use
+ cases are: using an application protocol other than HTTP, such as RDMA, for
+ shuffling data from the Map nodes to the Reducer nodes; or replacing the
+ sort logic with custom algorithms that enable Hash aggregation and Limit-N
+ query.
+
+ <<IMPORTANT:>> The pluggable shuffle and pluggable sort capabilities are
+ experimental and unstable. This means the provided APIs may change and break
+ compatibility in future versions of Hadoop.
+
+* Implementing a Custom Shuffle and a Custom Sort
+
+ A custom shuffle implementation requires a
+ <<<org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.AuxiliaryService>>>
+ implementation class running in the NodeManagers and a
+ <<<org.apache.hadoop.mapred.ShuffleConsumerPlugin>>> implementation class
+ running in the Reducer tasks.
+
+ The default implementations provided by Hadoop can be used as references:
+
+ * <<<org.apache.hadoop.mapred.ShuffleHandler>>>
+
+ * <<<org.apache.hadoop.mapreduce.task.reduce.Shuffle>>>
+
+ A custom sort implementation requires a <<<org.apache.hadoop.mapred.MapOutputCollector>>>
+ implementation class running in the Mapper tasks and (optionally, depending
+ on the sort implementation) a <<<org.apache.hadoop.mapred.ShuffleConsumerPlugin>>>
+ implementation class running in the Reducer tasks.
+
+ The default implementations provided by Hadoop can be used as references:
+
+ * <<<org.apache.hadoop.mapred.MapTask$MapOutputBuffer>>>
+
+ * <<<org.apache.hadoop.mapreduce.task.reduce.Shuffle>>>
+
+* Configuration
+
+ Except for the auxiliary service running in the NodeManagers serving the
+ shuffle (by default the <<<ShuffleHandler>>>), all the pluggable components
+ run in the job tasks. This means they can be configured on a per-job basis.
+ The auxiliary service servicing the Shuffle must be configured in the
+ NodeManagers' configuration.
+
+** Job Configuration Properties (on a per-job basis):
+
+*--------------------------------------+---------------------+-----------------+
+| <<Property>> | <<Default Value>> | <<Explanation>> |
+*--------------------------------------+---------------------+-----------------+
+| <<<mapreduce.job.reduce.shuffle.consumer.plugin.class>>> | <<<org.apache.hadoop.mapreduce.task.reduce.Shuffle>>> | The <<<ShuffleConsumerPlugin>>> implementation to use |
+*--------------------------------------+---------------------+-----------------+
+| <<<mapreduce.job.map.output.collector.class>>> | <<<org.apache.hadoop.mapred.MapTask$MapOutputBuffer>>> | The <<<MapOutputCollector>>> implementation to use |
+*--------------------------------------+---------------------+-----------------+
+
+ These properties can also be set in the <<<mapred-site.xml>>> to change the default values for all jobs.
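+
+ As a minimal sketch of such a cluster-wide override, the two properties from
+ the table above could be set in <<<mapred-site.xml>>> as shown here. The
+ <<<com.example...>>> class names are hypothetical placeholders for custom
+ implementations and are not classes shipped with Hadoop.
+
++---+
+<!-- Hypothetical class names; substitute your own implementations. -->
+<property>
+  <name>mapreduce.job.reduce.shuffle.consumer.plugin.class</name>
+  <value>com.example.shuffle.CustomShuffleConsumer</value>
+</property>
+<property>
+  <name>mapreduce.job.map.output.collector.class</name>
+  <value>com.example.sort.CustomMapOutputCollector</value>
+</property>
++---+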
+
+** NodeManager Configuration Properties (<<<yarn-site.xml>>> on all nodes):
+
+*--------------------------------------+---------------------+-----------------+
+| <<Property>> | <<Default Value>> | <<Explanation>> |
+*--------------------------------------+---------------------+-----------------+
+| <<<yarn.nodemanager.aux-services>>> | <<<...,mapreduce.shuffle>>> | The auxiliary service name |
+*--------------------------------------+---------------------+-----------------+
+| <<<yarn.nodemanager.aux-services.mapreduce.shuffle.class>>> | <<<org.apache.hadoop.mapred.ShuffleHandler>>> | The auxiliary service class to use |
+*--------------------------------------+---------------------+-----------------+
+
+ <<IMPORTANT:>> If setting an auxiliary service in addition to the default
+ <<<mapreduce.shuffle>>> service, then a new service key should be added to the
+ <<<yarn.nodemanager.aux-services>>> property, for example <<<mapreduce.shufflex>>>.
+ Then the property defining the corresponding class must be
+ <<<yarn.nodemanager.aux-services.mapreduce.shufflex.class>>>.
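+
+ For illustration only, a <<<yarn-site.xml>>> fragment that registers an
+ additional service next to the default one might look like the sketch below.
+ It assumes the service key <<<mapreduce.shufflex>>>, chosen to match the class
+ property named above, and a hypothetical implementation class
+ <<<com.example.shuffle.CustomShuffleHandler>>>. Any other auxiliary services
+ already listed in <<<yarn.nodemanager.aux-services>>> should be kept in the
+ value.
+
++---+
+<property>
+  <name>yarn.nodemanager.aux-services</name>
+  <value>mapreduce.shuffle,mapreduce.shufflex</value>
+</property>
+<property>
+  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
+  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
+</property>
+<!-- Hypothetical class name; substitute your own AuxiliaryService. -->
+<property>
+  <name>yarn.nodemanager.aux-services.mapreduce.shufflex.class</name>
+  <value>com.example.shuffle.CustomShuffleHandler</value>
+</property>
++---+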
+
\ No newline at end of file