Posted to common-commits@hadoop.apache.org by aw...@apache.org on 2015/02/10 22:40:11 UTC
[7/7] hadoop git commit: HADOOP-11495. Convert site documentation from apt to markdown (Masatake Iwasaki via aw)
HADOOP-11495. Convert site documentation from apt to markdown (Masatake Iwasaki via aw)
Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/e9d26fe9
Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/e9d26fe9
Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/e9d26fe9
Branch: refs/heads/trunk
Commit: e9d26fe9eb16a0482d3581504ecad22b4cd65077
Parents: 6338ce3
Author: Allen Wittenauer <aw...@apache.org>
Authored: Tue Feb 10 13:39:57 2015 -0800
Committer: Allen Wittenauer <aw...@apache.org>
Committed: Tue Feb 10 13:39:57 2015 -0800
----------------------------------------------------------------------
hadoop-common-project/hadoop-common/CHANGES.txt | 3 +
.../src/site/apt/CLIMiniCluster.apt.vm | 83 --
.../src/site/apt/ClusterSetup.apt.vm | 651 --------------
.../src/site/apt/CommandsManual.apt.vm | 327 -------
.../src/site/apt/Compatibility.apt.vm | 541 ------------
.../src/site/apt/DeprecatedProperties.apt.vm | 552 ------------
.../src/site/apt/FileSystemShell.apt.vm | 764 ----------------
.../src/site/apt/HttpAuthentication.apt.vm | 98 ---
.../src/site/apt/InterfaceClassification.apt.vm | 239 -----
.../hadoop-common/src/site/apt/Metrics.apt.vm | 879 -------------------
.../src/site/apt/NativeLibraries.apt.vm | 205 -----
.../src/site/apt/RackAwareness.apt.vm | 140 ---
.../src/site/apt/SecureMode.apt.vm | 689 ---------------
.../src/site/apt/ServiceLevelAuth.apt.vm | 216 -----
.../src/site/apt/SingleCluster.apt.vm | 286 ------
.../src/site/apt/SingleNodeSetup.apt.vm | 24 -
.../src/site/apt/Superusers.apt.vm | 144 ---
.../hadoop-common/src/site/apt/Tracing.apt.vm | 233 -----
.../src/site/markdown/CLIMiniCluster.md.vm | 68 ++
.../src/site/markdown/ClusterSetup.md | 339 +++++++
.../src/site/markdown/CommandsManual.md | 227 +++++
.../src/site/markdown/Compatibility.md | 313 +++++++
.../src/site/markdown/DeprecatedProperties.md | 288 ++++++
.../src/site/markdown/FileSystemShell.md | 689 +++++++++++++++
.../src/site/markdown/HttpAuthentication.md | 58 ++
.../site/markdown/InterfaceClassification.md | 105 +++
.../hadoop-common/src/site/markdown/Metrics.md | 456 ++++++++++
.../src/site/markdown/NativeLibraries.md.vm | 145 +++
.../src/site/markdown/RackAwareness.md | 104 +++
.../src/site/markdown/SecureMode.md | 375 ++++++++
.../src/site/markdown/ServiceLevelAuth.md | 144 +++
.../src/site/markdown/SingleCluster.md.vm | 232 +++++
.../src/site/markdown/SingleNodeSetup.md | 20 +
.../src/site/markdown/Superusers.md | 106 +++
.../hadoop-common/src/site/markdown/Tracing.md | 182 ++++
35 files changed, 3854 insertions(+), 6071 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/hadoop/blob/e9d26fe9/hadoop-common-project/hadoop-common/CHANGES.txt
----------------------------------------------------------------------
diff --git a/hadoop-common-project/hadoop-common/CHANGES.txt b/hadoop-common-project/hadoop-common/CHANGES.txt
index fadc744..1ba93e8 100644
--- a/hadoop-common-project/hadoop-common/CHANGES.txt
+++ b/hadoop-common-project/hadoop-common/CHANGES.txt
@@ -168,6 +168,9 @@ Trunk (Unreleased)
HADOOP-6964. Allow compact property description in xml (Kengo Seki
via aw)
+ HADOOP-11495. Convert site documentation from apt to markdown
+ (Masatake Iwasaki via aw)
+
BUG FIXES
HADOOP-11473. test-patch says "-1 overall" even when all checks are +1
http://git-wip-us.apache.org/repos/asf/hadoop/blob/e9d26fe9/hadoop-common-project/hadoop-common/src/site/apt/CLIMiniCluster.apt.vm
----------------------------------------------------------------------
diff --git a/hadoop-common-project/hadoop-common/src/site/apt/CLIMiniCluster.apt.vm b/hadoop-common-project/hadoop-common/src/site/apt/CLIMiniCluster.apt.vm
deleted file mode 100644
index 2d12c39..0000000
--- a/hadoop-common-project/hadoop-common/src/site/apt/CLIMiniCluster.apt.vm
+++ /dev/null
@@ -1,83 +0,0 @@
-~~ Licensed under the Apache License, Version 2.0 (the "License");
-~~ you may not use this file except in compliance with the License.
-~~ You may obtain a copy of the License at
-~~
-~~ http://www.apache.org/licenses/LICENSE-2.0
-~~
-~~ Unless required by applicable law or agreed to in writing, software
-~~ distributed under the License is distributed on an "AS IS" BASIS,
-~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-~~ See the License for the specific language governing permissions and
-~~ limitations under the License. See accompanying LICENSE file.
-
- ---
- Hadoop MapReduce Next Generation ${project.version} - CLI MiniCluster.
- ---
- ---
- ${maven.build.timestamp}
-
-Hadoop MapReduce Next Generation - CLI MiniCluster.
-
-%{toc|section=1|fromDepth=0}
-
-* {Purpose}
-
- Using the CLI MiniCluster, users can simply start and stop a single-node
- Hadoop cluster with a single command, and without the need to set any
- environment variables or manage configuration files. The CLI MiniCluster
- starts both <<<YARN>>>/<<<MapReduce>>> and <<<HDFS>>> clusters.
-
- This is useful for cases where users want to quickly experiment with a real
- Hadoop cluster or test non-Java programs that rely on significant Hadoop
- functionality.
-
-* {Hadoop Tarball}
-
- You can obtain the Hadoop tarball from a release. Alternatively, you
- can create a tarball directly from the source:
-
-+---+
-$ mvn clean install -DskipTests
-$ mvn package -Pdist -Dtar -DskipTests -Dmaven.javadoc.skip
-+---+
- <<NOTE:>> You will need {{{http://code.google.com/p/protobuf/}protoc 2.5.0}}
- installed.
-
- The tarball should be available in the <<<hadoop-dist/target/>>> directory.
-
-* {Running the MiniCluster}
-
- From inside the root directory of the extracted tarball, you can start the CLI
- MiniCluster using the following command:
-
-+---+
-$ bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-${project.version}-tests.jar minicluster -rmport RM_PORT -jhsport JHS_PORT
-+---+
-
- In the example command above, <<<RM_PORT>>> and <<<JHS_PORT>>> should be
- replaced with port numbers of the user's choice. If they are not specified,
- random free ports will be used.
-
- There are a number of command line arguments that users can use to control
- which services to start and to pass other configuration properties.
- The available command line arguments are:
-
-+---+
-$ -D <property=value> Options to pass into configuration object
-$ -datanodes <arg> How many datanodes to start (default 1)
-$ -format Format the DFS (default false)
-$ -help Prints option help.
-$ -jhsport <arg> JobHistoryServer port (default 0--we choose)
-$ -namenode <arg> URL of the namenode (default is either the DFS
-$ cluster or a temporary dir)
-$ -nnport <arg> NameNode port (default 0--we choose)
-$ -nodemanagers <arg> How many nodemanagers to start (default 1)
-$ -nodfs Don't start a mini DFS cluster
-$ -nomr Don't start a mini MR cluster
-$ -rmport <arg> ResourceManager port (default 0--we choose)
-$ -writeConfig <path> Save configuration to this XML file.
-$ -writeDetails <path> Write basic information to this JSON file.
-+---+
-
- To display this full list of available arguments, the user can pass the
- <<<-help>>> argument to the above command.
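When the MiniCluster is started repeatedly (for example from a test harness), the invocation above can be wrapped in a small shell function. This is a minimal sketch under stated assumptions, not part of Hadoop: the function name run_minicluster and the DRY_RUN switch are hypothetical, the version argument stands in for the ${project.version} placeholder (pass the version of your extracted tarball), and only flags that appear in the argument list above are used. By default it only prints the command it would run.

```shell
#!/bin/sh
# Hypothetical helper around the CLI MiniCluster command described above.
# Run it from the root directory of the extracted Hadoop tarball.
#   $1 = Hadoop version of the tarball (stands in for ${project.version})
#   $2 = ResourceManager port, $3 = JobHistoryServer port (0 lets the MiniCluster choose)
#   any remaining arguments are passed through unchanged (e.g. -format, -writeDetails FILE)
# DRY_RUN defaults to 1, which only prints the command; set DRY_RUN=0 to launch it.
run_minicluster() {
  if [ "$#" -lt 3 ]; then
    echo "usage: run_minicluster VERSION RM_PORT JHS_PORT [minicluster args...]" >&2
    return 2
  fi
  local version="$1" rm_port="$2" jhs_port="$3"
  shift 3
  local jar="./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-${version}-tests.jar"
  # Rebuild the positional parameters as the full command line, keeping pass-through args last.
  set -- bin/hadoop jar "$jar" minicluster -rmport "$rm_port" -jhsport "$jhs_port" "$@"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

For example, `DRY_RUN=0 run_minicluster <version> 0 0 -writeDetails details.json` would start the cluster on ports chosen by the MiniCluster and record its basic information in details.json. Rebuilding the argument list with `set --` keeps the sketch POSIX-compatible without relying on bash arrays.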
http://git-wip-us.apache.org/repos/asf/hadoop/blob/e9d26fe9/hadoop-common-project/hadoop-common/src/site/apt/ClusterSetup.apt.vm
----------------------------------------------------------------------
diff --git a/hadoop-common-project/hadoop-common/src/site/apt/ClusterSetup.apt.vm b/hadoop-common-project/hadoop-common/src/site/apt/ClusterSetup.apt.vm
deleted file mode 100644
index 52b0552..0000000
--- a/hadoop-common-project/hadoop-common/src/site/apt/ClusterSetup.apt.vm
+++ /dev/null
@@ -1,651 +0,0 @@
-~~ Licensed under the Apache License, Version 2.0 (the "License");
-~~ you may not use this file except in compliance with the License.
-~~ You may obtain a copy of the License at
-~~
-~~ http://www.apache.org/licenses/LICENSE-2.0
-~~
-~~ Unless required by applicable law or agreed to in writing, software
-~~ distributed under the License is distributed on an "AS IS" BASIS,
-~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-~~ See the License for the specific language governing permissions and
-~~ limitations under the License. See accompanying LICENSE file.
-
- ---
- Hadoop ${project.version} - Cluster Setup
- ---
- ---
- ${maven.build.timestamp}
-
-%{toc|section=1|fromDepth=0}
-
-Hadoop Cluster Setup
-
-* {Purpose}
-
- This document describes how to install and configure
- Hadoop clusters ranging from a few nodes to extremely large clusters
- with thousands of nodes. To play with Hadoop, you may first want to
- install it on a single machine (see {{{./SingleCluster.html}Single Node Setup}}).
-
- This document does not cover advanced topics such as {{{./SecureMode.html}Security}} or
- High Availability.
-
-* {Prerequisites}
-
- * Install Java. See the {{{http://wiki.apache.org/hadoop/HadoopJavaVersions}Hadoop Wiki}} for known good versions.
- * Download a stable version of Hadoop from Apache mirrors.
-
-* {Installation}
-
- Installing a Hadoop cluster typically involves unpacking the software on all
- the machines in the cluster or installing it via a packaging system as
- appropriate for your operating system. It is important to divide up the hardware
- into functions.
-
- Typically one machine in the cluster is designated as the NameNode and
- another machine as the ResourceManager, exclusively. These are the masters. Other
- services (such as Web App Proxy Server and MapReduce Job History server) are usually
- run either on dedicated hardware or on shared infrastructure, depending upon the load.
-
- The rest of the machines in the cluster act as both DataNode and NodeManager.
- These are the slaves.
-
-* {Configuring Hadoop in Non-Secure Mode}
-
- Hadoop's Java configuration is driven by two types of important configuration files:
-
- * Read-only default configuration - <<<core-default.xml>>>,
- <<<hdfs-default.xml>>>, <<<yarn-default.xml>>> and
- <<<mapred-default.xml>>>.
-
- * Site-specific configuration - <<<etc/hadoop/core-site.xml>>>,
- <<<etc/hadoop/hdfs-site.xml>>>, <<<etc/hadoop/yarn-site.xml>>> and
- <<<etc/hadoop/mapred-site.xml>>>.
-
-
- Additionally, you can control the Hadoop scripts found in the bin/
- directory of the distribution by setting site-specific values via the
- <<<etc/hadoop/hadoop-env.sh>>> and <<<etc/hadoop/yarn-env.sh>>> scripts.
-
- To configure the Hadoop cluster you will need to configure the
- <<<environment>>> in which the Hadoop daemons execute as well as the
- <<<configuration parameters>>> for the Hadoop daemons.
-
- HDFS daemons are NameNode, SecondaryNameNode, and DataNode. YARN daemons
- are ResourceManager, NodeManager, and WebAppProxy. If MapReduce is to be
- used, then the MapReduce Job History Server will also be running. For
- large installations, these generally run on separate hosts.
-
-
-** {Configuring Environment of Hadoop Daemons}
-
- Administrators should use the <<<etc/hadoop/hadoop-env.sh>>> and optionally the
- <<<etc/hadoop/mapred-env.sh>>> and <<<etc/hadoop/yarn-env.sh>>> scripts to do
- site-specific customization of the Hadoop daemons' process environment.
-
- At the very least, you must specify the <<<JAVA_HOME>>> so that it is
- correctly defined on each remote node.
-
- Administrators can configure individual daemons using the configuration
- options shown below in the table:
-
-*--------------------------------------+--------------------------------------+
-|| Daemon || Environment Variable |
-*--------------------------------------+--------------------------------------+
-| NameNode | HADOOP_NAMENODE_OPTS |
-*--------------------------------------+--------------------------------------+
-| DataNode | HADOOP_DATANODE_OPTS |
-*--------------------------------------+--------------------------------------+
-| Secondary NameNode | HADOOP_SECONDARYNAMENODE_OPTS |
-*--------------------------------------+--------------------------------------+
-| ResourceManager | YARN_RESOURCEMANAGER_OPTS |
-*--------------------------------------+--------------------------------------+
-| NodeManager | YARN_NODEMANAGER_OPTS |
-*--------------------------------------+--------------------------------------+
-| WebAppProxy | YARN_PROXYSERVER_OPTS |
-*--------------------------------------+--------------------------------------+
-| Map Reduce Job History Server | HADOOP_JOB_HISTORYSERVER_OPTS |
-*--------------------------------------+--------------------------------------+
-
-
- For example, to configure the NameNode to use parallelGC, add the following
- statement to hadoop-env.sh:
-
-----
- export HADOOP_NAMENODE_OPTS="-XX:+UseParallelGC"
-----
-
- See <<<etc/hadoop/hadoop-env.sh>>> for other examples.
-
- Other useful configuration parameters that you can customize include:
-
- * <<<HADOOP_PID_DIR>>> - The directory where the
- daemons' process id files are stored.
-
- * <<<HADOOP_LOG_DIR>>> - The directory where the
- daemons' log files are stored. Log files are automatically created
- if they don't exist.
-
- * <<<HADOOP_HEAPSIZE_MAX>>> - The maximum amount of
- memory to use for the Java heapsize. Units supported by the JVM
- are also supported here. If no unit is present, the number is assumed
- to be in megabytes. By default, Hadoop will let the JVM
- determine how much to use. This value can be overridden on
- a per-daemon basis using the appropriate <<<_OPTS>>> variable listed above.
- For example, setting <<<HADOOP_HEAPSIZE_MAX=1g>>> and
- <<<HADOOP_NAMENODE_OPTS="-Xmx5g">>> will configure the NameNode with 5GB heap.
-
- In most cases, you should specify the <<<HADOOP_PID_DIR>>> and
- <<<HADOOP_LOG_DIR>>> directories such that they can only be
- written to by the users that are going to run the hadoop daemons.
- Otherwise there is the potential for a symlink attack.
-
- It is also traditional to configure <<<HADOOP_PREFIX>>> in the system-wide
- shell environment configuration. For example, a simple script inside
- <<</etc/profile.d>>>:
-
----
- HADOOP_PREFIX=/path/to/hadoop
- export HADOOP_PREFIX
----
-
-*--------------------------------------+--------------------------------------+
-|| Daemon || Environment Variable |
-*--------------------------------------+--------------------------------------+
-| ResourceManager | YARN_RESOURCEMANAGER_HEAPSIZE |
-*--------------------------------------+--------------------------------------+
-| NodeManager | YARN_NODEMANAGER_HEAPSIZE |
-*--------------------------------------+--------------------------------------+
-| WebAppProxy | YARN_PROXYSERVER_HEAPSIZE |
-*--------------------------------------+--------------------------------------+
-| Map Reduce Job History Server | HADOOP_JOB_HISTORYSERVER_HEAPSIZE |
-*--------------------------------------+--------------------------------------+
-
-** {Configuring the Hadoop Daemons}
-
- This section deals with important parameters to be specified in
- the given configuration files:
-
- * <<<etc/hadoop/core-site.xml>>>
-
-*-------------------------+-------------------------+------------------------+
-|| Parameter || Value || Notes |
-*-------------------------+-------------------------+------------------------+
-| <<<fs.defaultFS>>> | NameNode URI | <hdfs://host:port/> |
-*-------------------------+-------------------------+------------------------+
-| <<<io.file.buffer.size>>> | 131072 | |
-| | | Size of read/write buffer used in SequenceFiles. |
-*-------------------------+-------------------------+------------------------+
-
- * <<<etc/hadoop/hdfs-site.xml>>>
-
- * Configurations for NameNode:
-
-*-------------------------+-------------------------+------------------------+
-|| Parameter || Value || Notes |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.namenode.name.dir>>> | | |
-| | Path on the local filesystem where the NameNode stores the namespace | |
-| | and transaction logs persistently. | |
-| | | If this is a comma-delimited list of directories then the name table is |
-| | | replicated in all of the directories, for redundancy. |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.namenode.hosts>>> / <<<dfs.namenode.hosts.exclude>>> | | |
-| | List of permitted/excluded DataNodes. | |
-| | | If necessary, use these files to control the list of allowable |
-| | | datanodes. |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.blocksize>>> | 268435456 | |
-| | | HDFS blocksize of 256MB for large file-systems. |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.namenode.handler.count>>> | 100 | |
-| | | More NameNode server threads to handle RPCs from a large number of |
-| | | DataNodes. |
-*-------------------------+-------------------------+------------------------+
-
- * Configurations for DataNode:
-
-*-------------------------+-------------------------+------------------------+
-|| Parameter || Value || Notes |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.datanode.data.dir>>> | | |
-| | Comma separated list of paths on the local filesystem of a | |
-| | <<<DataNode>>> where it should store its blocks. | |
-| | | If this is a comma-delimited list of directories, then data will be |
-| | | stored in all named directories, typically on different devices. |
-*-------------------------+-------------------------+------------------------+
-
- * <<<etc/hadoop/yarn-site.xml>>>
-
- * Configurations for ResourceManager and NodeManager:
-
-*-------------------------+-------------------------+------------------------+
-|| Parameter || Value || Notes |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.acl.enable>>> | | |
-| | <<<true>>> / <<<false>>> | |
-| | | Enable ACLs? Defaults to <false>. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.admin.acl>>> | | |
-| | Admin ACL | |
-| | | ACL to set admins on the cluster. |
-| | | ACLs are of the form <comma-separated-users><space><comma-separated-groups>. |
-| | | Defaults to special value of <<*>> which means <anyone>. |
-| | | Special value of just <space> means no one has access. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.log-aggregation-enable>>> | | |
-| | <false> | |
-| | | Configuration to enable or disable log aggregation |
-*-------------------------+-------------------------+------------------------+
-
-
- * Configurations for ResourceManager:
-
-*-------------------------+-------------------------+------------------------+
-|| Parameter || Value || Notes |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.resourcemanager.address>>> | | |
-| | <<<ResourceManager>>> host:port for clients to submit jobs. | |
-| | | <host:port>\ |
-| | | If set, overrides the hostname set in <<<yarn.resourcemanager.hostname>>>. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.resourcemanager.scheduler.address>>> | | |
-| | <<<ResourceManager>>> host:port for ApplicationMasters to talk to | |
-| | Scheduler to obtain resources. | |
-| | | <host:port>\ |
-| | | If set, overrides the hostname set in <<<yarn.resourcemanager.hostname>>>. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.resourcemanager.resource-tracker.address>>> | | |
-| | <<<ResourceManager>>> host:port for NodeManagers. | |
-| | | <host:port>\ |
-| | | If set, overrides the hostname set in <<<yarn.resourcemanager.hostname>>>. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.resourcemanager.admin.address>>> | | |
-| | <<<ResourceManager>>> host:port for administrative commands. | |
-| | | <host:port>\ |
-| | | If set, overrides the hostname set in <<<yarn.resourcemanager.hostname>>>. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.resourcemanager.webapp.address>>> | | |
-| | <<<ResourceManager>>> web-ui host:port. | |
-| | | <host:port>\ |
-| | | If set, overrides the hostname set in <<<yarn.resourcemanager.hostname>>>. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.resourcemanager.hostname>>> | | |
-| | <<<ResourceManager>>> host. | |
-| | | <host>\ |
-| | | Single hostname that can be set in place of setting all <<<yarn.resourcemanager*address>>> resources. Results in default ports for ResourceManager components. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.resourcemanager.scheduler.class>>> | | |
-| | <<<ResourceManager>>> Scheduler class. | |
-| | | <<<CapacityScheduler>>> (recommended), <<<FairScheduler>>> (also recommended), or <<<FifoScheduler>>> |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.scheduler.minimum-allocation-mb>>> | | |
-| | Minimum limit of memory to allocate to each container request at the <<<ResourceManager>>>. | |
-| | | In MBs |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.scheduler.maximum-allocation-mb>>> | | |
-| | Maximum limit of memory to allocate to each container request at the <<<ResourceManager>>>. | |
-| | | In MBs |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.resourcemanager.nodes.include-path>>> / | | |
-| <<<yarn.resourcemanager.nodes.exclude-path>>> | | |
-| | List of permitted/excluded NodeManagers. | |
-| | | If necessary, use these files to control the list of allowable |
-| | | NodeManagers. |
-*-------------------------+-------------------------+------------------------+
-
- * Configurations for NodeManager:
-
-*-------------------------+-------------------------+------------------------+
-|| Parameter || Value || Notes |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.nodemanager.resource.memory-mb>>> | | |
-| | Resource, i.e. available physical memory in MB, for the given <<<NodeManager>>> | |
-| | | Defines total available resources on the <<<NodeManager>>> to be made |
-| | | available to running containers |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.nodemanager.vmem-pmem-ratio>>> | | |
-| | Maximum ratio by which virtual memory usage of tasks may exceed | |
-| | physical memory | |
-| | | The virtual memory usage of each task may exceed its physical memory |
-| | | limit by this ratio. The total amount of virtual memory used by tasks |
-| | | on the NodeManager may exceed its physical memory usage by this ratio. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.nodemanager.local-dirs>>> | | |
-| | Comma-separated list of paths on the local filesystem where | |
-| | intermediate data is written. | |
-| | | Multiple paths help spread disk i/o. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.nodemanager.log-dirs>>> | | |
-| | Comma-separated list of paths on the local filesystem where logs | |
-| | are written. | |
-| | | Multiple paths help spread disk i/o. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.nodemanager.log.retain-seconds>>> | | |
-| | <10800> | |
-| | | Default time (in seconds) to retain log files on the NodeManager |
-| | | Only applicable if log-aggregation is disabled. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.nodemanager.remote-app-log-dir>>> | | |
-| | </logs> | |
-| | | HDFS directory where the application logs are moved on application |
-| | | completion. Appropriate permissions need to be set. |
-| | | Only applicable if log-aggregation is enabled. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.nodemanager.remote-app-log-dir-suffix>>> | | |
-| | <logs> | |
-| | | Suffix appended to the remote log dir. Logs will be aggregated to |
-| | | $\{yarn.nodemanager.remote-app-log-dir\}/$\{user\}/$\{thisParam\} |
-| | | Only applicable if log-aggregation is enabled. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.nodemanager.aux-services>>> | | |
-| | mapreduce_shuffle | |
-| | | Shuffle service that needs to be set for Map Reduce applications. |
-*-------------------------+-------------------------+------------------------+
-
- * Configurations for History Server (Needs to be moved elsewhere):
-
-*-------------------------+-------------------------+------------------------+
-|| Parameter || Value || Notes |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.log-aggregation.retain-seconds>>> | | |
-| | <-1> | |
-| | | How long to keep aggregation logs before deleting them. -1 disables. |
-| | | Be careful: if this is set too small, you will spam the NameNode. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.log-aggregation.retain-check-interval-seconds>>> | | |
-| | <-1> | |
-| | | Time between checks for aggregated log retention. If set to 0 or a |
-| | | negative value then the value is computed as one-tenth of the |
-| | | aggregated log retention time. |
-| | | Be careful: if this is set too small, you will spam the NameNode. |
-*-------------------------+-------------------------+------------------------+
-
- * <<<etc/hadoop/mapred-site.xml>>>
-
- * Configurations for MapReduce Applications:
-
-*-------------------------+-------------------------+------------------------+
-|| Parameter || Value || Notes |
-*-------------------------+-------------------------+------------------------+
-| <<<mapreduce.framework.name>>> | | |
-| | yarn | |
-| | | Execution framework set to Hadoop YARN. |
-*-------------------------+-------------------------+------------------------+
-| <<<mapreduce.map.memory.mb>>> | 1536 | |
-| | | Larger resource limit for maps. |
-*-------------------------+-------------------------+------------------------+
-| <<<mapreduce.map.java.opts>>> | -Xmx1024M | |
-| | | Larger heap-size for child jvms of maps. |
-*-------------------------+-------------------------+------------------------+
-| <<<mapreduce.reduce.memory.mb>>> | 3072 | |
-| | | Larger resource limit for reduces. |
-*-------------------------+-------------------------+------------------------+
-| <<<mapreduce.reduce.java.opts>>> | -Xmx2560M | |
-| | | Larger heap-size for child jvms of reduces. |
-*-------------------------+-------------------------+------------------------+
-| <<<mapreduce.task.io.sort.mb>>> | 512 | |
-| | | Higher memory-limit while sorting data for efficiency. |
-*-------------------------+-------------------------+------------------------+
-| <<<mapreduce.task.io.sort.factor>>> | 100 | |
-| | | More streams merged at once while sorting files. |
-*-------------------------+-------------------------+------------------------+
-| <<<mapreduce.reduce.shuffle.parallelcopies>>> | 50 | |
-| | | Higher number of parallel copies run by reduces to fetch outputs |
-| | | from a very large number of maps. |
-*-------------------------+-------------------------+------------------------+
-
- * Configurations for MapReduce JobHistory Server:
-
-*-------------------------+-------------------------+------------------------+
-|| Parameter || Value || Notes |
-*-------------------------+-------------------------+------------------------+
-| <<<mapreduce.jobhistory.address>>> | | |
-| | MapReduce JobHistory Server <host:port> | Default port is 10020. |
-*-------------------------+-------------------------+------------------------+
-| <<<mapreduce.jobhistory.webapp.address>>> | | |
-| | MapReduce JobHistory Server Web UI <host:port> | Default port is 19888. |
-*-------------------------+-------------------------+------------------------+
-| <<<mapreduce.jobhistory.intermediate-done-dir>>> | /mr-history/tmp | |
-| | | Directory where history files are written by MapReduce jobs. |
-*-------------------------+-------------------------+------------------------+
-| <<<mapreduce.jobhistory.done-dir>>> | /mr-history/done| |
-| | | Directory where history files are managed by the MR JobHistory Server. |
-*-------------------------+-------------------------+------------------------+
-
-* {Monitoring Health of NodeManagers}
-
- Hadoop provides a mechanism by which administrators can configure the
- NodeManager to run an administrator-supplied script periodically to
- determine if a node is healthy or not.
-
- Administrators can determine if the node is in a healthy state by
- performing any checks of their choice in the script. If the script
- detects the node to be in an unhealthy state, it must print a line to
- standard output beginning with the string ERROR. The NodeManager spawns
- the script periodically and checks its output. If the script's output
- contains the string ERROR, as described above, the node's status is
- reported as <<<unhealthy>>> and the node is blacklisted by the
- ResourceManager. No further tasks will be assigned to this node.
- However, the NodeManager continues to run the script, so that if the
- node becomes healthy again, it will be removed from the blacklisted nodes
- on the ResourceManager automatically. The node's health along with the
- output of the script, if it is unhealthy, is available to the
- administrator in the ResourceManager web interface. The time since the
- node was healthy is also displayed on the web interface.
-
- The following parameters can be used to control the node health
- monitoring script in <<<etc/hadoop/yarn-site.xml>>>.
-
-*-------------------------+-------------------------+------------------------+
-|| Parameter || Value || Notes |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.nodemanager.health-checker.script.path>>> | | |
-| | Node health script | |
-| | | Script to check for node's health status. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.nodemanager.health-checker.script.opts>>> | | |
-| | Node health script options | |
-| | | Options for script to check for node's health status. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.nodemanager.health-checker.script.interval-ms>>> | | |
-| | Node health script interval | |
-| | | Time interval for running health script. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.nodemanager.health-checker.script.timeout-ms>>> | | |
-| | Node health script timeout interval | |
-| | | Timeout for health script execution. |
-*-------------------------+-------------------------+------------------------+
-
- The health checker script should not print ERROR if only some of the
- local disks go bad. The NodeManager can periodically check
- the health of the local disks (specifically nodemanager-local-dirs
- and nodemanager-log-dirs). Once the number of bad directories reaches the
- threshold based on the value set for the config property
- yarn.nodemanager.disk-health-checker.min-healthy-disks, the whole node is
- marked unhealthy and this information is also sent to the ResourceManager.
- The boot disk is either RAIDed or a failure in it is identified by the
- health checker script.
-
-* {Slaves File}
-
- List all slave hostnames or IP addresses in your <<<etc/hadoop/slaves>>>
- file, one per line. Helper scripts (described below) will use the
- <<<etc/hadoop/slaves>>> file to run commands on many hosts at once. It is not
- used for any of the Java-based Hadoop configuration. In order
- to use this functionality, ssh trust (via either passphraseless ssh or
- some other means, such as Kerberos) must be established for the accounts
- used to run Hadoop.
-
-* {Hadoop Rack Awareness}
-
- Many Hadoop components are rack-aware and take advantage of the
- network topology for performance and safety. Hadoop daemons obtain the
- rack information of the slaves in the cluster by invoking an
- administrator-configured module. See the {{{./RackAwareness.html}Rack Awareness}}
- documentation for more specific information.
-
- It is highly recommended to configure rack awareness prior to starting HDFS.
-
-* {Logging}
-
- Hadoop uses {{{http://logging.apache.org/log4j/2.x/}Apache log4j}} via the Apache Commons Logging framework for
- logging. Edit the <<<etc/hadoop/log4j.properties>>> file to customize the
- Hadoop daemons' logging configuration (log formats and so on).
-
-* {Operating the Hadoop Cluster}
-
- Once all the necessary configuration is complete, distribute the files to the
- <<<HADOOP_CONF_DIR>>> directory on all the machines. This should be the
- same directory on all machines.
-
- In general, it is recommended that HDFS and YARN run as separate users.
- In the majority of installations, HDFS processes execute as 'hdfs'. YARN
- typically uses the 'yarn' account.
-
-** Hadoop Startup
-
- To start a Hadoop cluster, you will need to start both the HDFS and YARN
- clusters.
-
- The first time you bring up HDFS, it must be formatted. Format a new
- distributed filesystem as <hdfs>:
-
-----
-[hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -format <cluster_name>
-----
-
- Start the HDFS NameNode with the following command on the
- designated node as <hdfs>:
-
-----
-[hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon start namenode
-----
-
- Start an HDFS DataNode with the following command on each
- designated node as <hdfs>:
-
-----
-[hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon start datanode
-----
-
- If <<<etc/hadoop/slaves>>> and ssh trusted access are configured
- (see {{{./SingleCluster.html}Single Node Setup}}), all of the
- HDFS processes can be started with a utility script. As <hdfs>:
-
-----
-[hdfs]$ $HADOOP_PREFIX/sbin/start-dfs.sh
-----
-
- Start the YARN ResourceManager with the following command, run on the
- designated ResourceManager host as <yarn>:
-
-----
-[yarn]$ $HADOOP_PREFIX/bin/yarn --daemon start resourcemanager
-----
-
- Start a NodeManager with the following command, run on each designated host as <yarn>:
-
-----
-[yarn]$ $HADOOP_PREFIX/bin/yarn --daemon start nodemanager
-----
-
- Start a standalone WebAppProxy server. Run on the WebAppProxy
- server as <yarn>. If multiple servers are used with load balancing,
- it should be run on each of them:
-
-----
-[yarn]$ $HADOOP_PREFIX/bin/yarn --daemon start proxyserver
-----
-
- If <<<etc/hadoop/slaves>>> and ssh trusted access are configured
- (see {{{./SingleCluster.html}Single Node Setup}}), all of the
- YARN processes can be started with a utility script. As <yarn>:
-
-----
-[yarn]$ $HADOOP_PREFIX/sbin/start-yarn.sh
-----
-
- Start the MapReduce JobHistory Server with the following command, run
- on the designated server as <mapred>:
-
-----
-[mapred]$ $HADOOP_PREFIX/bin/mapred --daemon start historyserver
-----
-
-** Hadoop Shutdown
-
- Stop the NameNode with the following command, run on the designated NameNode
- as <hdfs>:
-
-----
-[hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon stop namenode
-----
-
- Stop a DataNode with the following command, run on each designated node as <hdfs>:
-
-----
-[hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon stop datanode
-----
-
- If <<<etc/hadoop/slaves>>> and ssh trusted access are configured
- (see {{{./SingleCluster.html}Single Node Setup}}), all of the
- HDFS processes may be stopped with a utility script. As <hdfs>:
-
-----
-[hdfs]$ $HADOOP_PREFIX/sbin/stop-dfs.sh
-----
-
- Stop the ResourceManager with the following command, run on the designated
- ResourceManager as <yarn>:
-
-----
-[yarn]$ $HADOOP_PREFIX/bin/yarn --daemon stop resourcemanager
-----
-
- Stop a NodeManager with the following command, run on each slave as <yarn>:
-
-----
-[yarn]$ $HADOOP_PREFIX/bin/yarn --daemon stop nodemanager
-----
-
- If <<<etc/hadoop/slaves>>> and ssh trusted access are configured
- (see {{{./SingleCluster.html}Single Node Setup}}), all of the
- YARN processes can be stopped with a utility script. As <yarn>:
-
-----
-[yarn]$ $HADOOP_PREFIX/sbin/stop-yarn.sh
-----
-
- Stop the WebAppProxy server. Run on the WebAppProxy server as
- <yarn>. If multiple servers are used with load balancing, it
- should be run on each of them:
-
-----
-[yarn]$ $HADOOP_PREFIX/bin/yarn --daemon stop proxyserver
-----
-
- Stop the MapReduce JobHistory Server with the following command, run on the
- designated server as <mapred>:
-
-----
-[mapred]$ $HADOOP_PREFIX/bin/mapred --daemon stop historyserver
-----
-
-* {Web Interfaces}
-
- Once the Hadoop cluster is up and running, check the web UI of the
- components as described below:
-
-*-------------------------+-------------------------+------------------------+
-|| Daemon || Web Interface || Notes |
-*-------------------------+-------------------------+------------------------+
-| NameNode | http://<nn_host:port>/ | Default HTTP port is 50070. |
-*-------------------------+-------------------------+------------------------+
-| ResourceManager | http://<rm_host:port>/ | Default HTTP port is 8088. |
-*-------------------------+-------------------------+------------------------+
-| MapReduce JobHistory Server | http://<jhs_host:port>/ | Default HTTP port is 19888. |
-*-------------------------+-------------------------+------------------------+
-
-
http://git-wip-us.apache.org/repos/asf/hadoop/blob/e9d26fe9/hadoop-common-project/hadoop-common/src/site/apt/CommandsManual.apt.vm
----------------------------------------------------------------------
diff --git a/hadoop-common-project/hadoop-common/src/site/apt/CommandsManual.apt.vm b/hadoop-common-project/hadoop-common/src/site/apt/CommandsManual.apt.vm
deleted file mode 100644
index 67c8bc3..0000000
--- a/hadoop-common-project/hadoop-common/src/site/apt/CommandsManual.apt.vm
+++ /dev/null
@@ -1,327 +0,0 @@
-~~ Licensed to the Apache Software Foundation (ASF) under one or more
-~~ contributor license agreements. See the NOTICE file distributed with
-~~ this work for additional information regarding copyright ownership.
-~~ The ASF licenses this file to You under the Apache License, Version 2.0
-~~ (the "License"); you may not use this file except in compliance with
-~~ the License. You may obtain a copy of the License at
-~~
-~~ http://www.apache.org/licenses/LICENSE-2.0
-~~
-~~ Unless required by applicable law or agreed to in writing, software
-~~ distributed under the License is distributed on an "AS IS" BASIS,
-~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-~~ See the License for the specific language governing permissions and
-~~ limitations under the License.
-
- ---
- Hadoop Commands Guide
- ---
- ---
- ${maven.build.timestamp}
-
-%{toc}
-
-Hadoop Commands Guide
-
-* Overview
-
- All of the Hadoop commands and subprojects follow the same basic structure:
-
- Usage: <<<shellcommand [SHELL_OPTIONS] [COMMAND] [GENERIC_OPTIONS] [COMMAND_OPTIONS]>>>
-
-*--------+---------+
-|| FIELD || Description
-*-----------------------+---------------+
-| shellcommand | The command of the project being invoked. For example,
- | Hadoop common uses <<<hadoop>>>, HDFS uses <<<hdfs>>>,
- | and YARN uses <<<yarn>>>.
-*---------------+-------------------+
-| SHELL_OPTIONS | Options that the shell processes prior to executing Java.
-*-----------------------+---------------+
-| COMMAND | Action to perform.
-*-----------------------+---------------+
-| GENERIC_OPTIONS | The common set of options supported by
- | multiple commands.
-*-----------------------+---------------+
-| COMMAND_OPTIONS | Various commands with their options are
- | described in this documentation for the
- | Hadoop common sub-project. HDFS and YARN are
- | covered in other documents.
-*-----------------------+---------------+
-
-** {Shell Options}
-
- All of the shell commands will accept a common set of options. For some commands,
- these options are ignored. For example, passing <<<--hostnames>>> on a
- command that only executes on a single host will be ignored.
-
-*-----------------------+---------------+
-|| SHELL_OPTION || Description
-*-----------------------+---------------+
-| <<<--buildpaths>>> | Enables developer versions of jars.
-*-----------------------+---------------+
-| <<<--config confdir>>> | Overrides the default Configuration
- | directory. Default is <<<${HADOOP_PREFIX}/conf>>>.
-*-----------------------+----------------+
-| <<<--daemon mode>>> | If the command supports daemonization (e.g.,
- | <<<hdfs namenode>>>), execute in the appropriate
- | mode. Supported modes are <<<start>>> to start the
- | process in daemon mode, <<<stop>>> to stop the
- | process, and <<<status>>> to determine the active
- | status of the process. <<<status>>> will return
- | an {{{http://refspecs.linuxbase.org/LSB_3.0.0/LSB-generic/LSB-generic/iniscrptact.html}LSB-compliant}} result code.
- | If no option is provided, commands that support
- | daemonization will run in the foreground.
-*-----------------------+---------------+
-| <<<--debug>>> | Enables shell level configuration debugging information
-*-----------------------+---------------+
-| <<<--help>>> | Shell script usage information.
-*-----------------------+---------------+
-| <<<--hostnames>>> | A space-delimited list of hostnames on which to execute
- | a multi-host subcommand. By default, the content of
- | the <<<slaves>>> file is used.
-*-----------------------+----------------+
-| <<<--hosts>>> | A file that contains a list of hostnames on which to execute
- | a multi-host subcommand. By default, the content of the
- | <<<slaves>>> file is used.
-*-----------------------+----------------+
-| <<<--loglevel loglevel>>> | Overrides the log level. Valid log levels are
-| | FATAL, ERROR, WARN, INFO, DEBUG, and TRACE.
-| | Default is INFO.
-*-----------------------+---------------+
-
-** {Generic Options}
-
- Many subcommands honor a common set of configuration options to alter their behavior:
-
-*------------------------------------------------+-----------------------------+
-|| GENERIC_OPTION || Description
-*------------------------------------------------+-----------------------------+
-|<<<-archives \<comma separated list of archives\> >>> | Specify comma separated
- | archives to be unarchived on
- | the compute machines. Applies
- | only to job.
-*------------------------------------------------+-----------------------------+
-|<<<-conf \<configuration file\> >>> | Specify an application
- | configuration file.
-*------------------------------------------------+-----------------------------+
-|<<<-D \<property\>=\<value\> >>> | Use value for given property.
-*------------------------------------------------+-----------------------------+
-|<<<-files \<comma separated list of files\> >>> | Specify comma separated files
- | to be copied to the map
- | reduce cluster. Applies only
- | to job.
-*------------------------------------------------+-----------------------------+
-|<<<-jt \<local\> or \<resourcemanager:port\>>>> | Specify a ResourceManager.
- | Applies only to job.
-*------------------------------------------------+-----------------------------+
-|<<<-libjars \<comma separated list of jars\> >>>| Specify comma separated jar
- | files to include in the
- | classpath. Applies only to
- | job.
-*------------------------------------------------+-----------------------------+
-
-Hadoop Common Commands
-
- All of these commands are executed from the <<<hadoop>>> shell command. They
- have been broken up into {{User Commands}} and
- {{Administration Commands}}.
-
-* User Commands
-
- Commands useful for users of a Hadoop cluster.
-
-** <<<archive>>>
-
- Creates a Hadoop archive. More information can be found at
- {{{../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/HadoopArchives.html}
- Hadoop Archives Guide}}.
-
-** <<<checknative>>>
-
- Usage: <<<hadoop checknative [-a] [-h] >>>
-
-*-----------------+-----------------------------------------------------------+
-|| COMMAND_OPTION || Description
-*-----------------+-----------------------------------------------------------+
-| -a | Check that all libraries are available.
-*-----------------+-----------------------------------------------------------+
-| -h | print help
-*-----------------+-----------------------------------------------------------+
-
- This command checks the availability of the Hadoop native code. See
- {{{NativeLibraries.html}}} for more information. By default, this command
- only checks the availability of libhadoop.
-
-** <<<classpath>>>
-
- Usage: <<<hadoop classpath [--glob|--jar <path>|-h|--help]>>>
-
-*-----------------+-----------------------------------------------------------+
-|| COMMAND_OPTION || Description
-*-----------------+-----------------------------------------------------------+
-| --glob | expand wildcards
-*-----------------+-----------------------------------------------------------+
-| --jar <path> | write classpath as manifest in jar named <path>
-*-----------------+-----------------------------------------------------------+
-| -h, --help | print help
-*-----------------+-----------------------------------------------------------+
-
- Prints the class path needed to get the Hadoop jar and the required
- libraries. If called without arguments, it prints the classpath set up by
- the command scripts, which is likely to contain wildcards in the classpath
- entries. Additional options print the classpath after wildcard expansion or
- write the classpath into the manifest of a jar file. The latter is useful in
- environments where wildcards cannot be used and the expanded classpath exceeds
- the maximum supported command line length.
-
-** <<<credential>>>
-
- Usage: <<<hadoop credential <subcommand> [options]>>>
-
-*-------------------+-------------------------------------------------------+
-||COMMAND_OPTION || Description
-*-------------------+-------------------------------------------------------+
-| create <alias> [-v <value>][-provider <provider-path>]| Prompts the user for
- | a credential to be stored as the given alias when a value
- | is not provided via <<<-v>>>. The
- | <hadoop.security.credential.provider.path> within the
- | core-site.xml file will be used unless a <<<-provider>>> is
- | indicated.
-*-------------------+-------------------------------------------------------+
-| delete <alias> [-i][-provider <provider-path>] | Deletes the credential with
- | the provided alias and optionally warns the user when
- | <<<-i>>> is used.
- | The <hadoop.security.credential.provider.path> within the
- | core-site.xml file will be used unless a <<<-provider>>> is
- | indicated.
-*-------------------+-------------------------------------------------------+
-| list [-provider <provider-path>] | Lists all of the credential aliases.
- | The <hadoop.security.credential.provider.path> within the
- | core-site.xml file will be used unless a <<<-provider>>> is
- | indicated.
-*-------------------+-------------------------------------------------------+
-
- Command to manage credentials, passwords and secrets within credential providers.
-
- The CredentialProvider API in Hadoop allows for the separation of applications
- and how they store their required passwords/secrets. In order to indicate
- a particular provider type and location, the user must provide the
- <hadoop.security.credential.provider.path> configuration element in core-site.xml
- or use the command line option <<<-provider>>> on each of the following commands.
- This provider path is a comma-separated list of URLs that indicates the type and
- location of a list of providers that should be consulted. For example, the following path:
- <<<user:///,jceks://file/tmp/test.jceks,jceks://hdfs@nn1.example.com/my/path/test.jceks>>>
-
- indicates that the current user's credentials file should be consulted through
- the User Provider, that the local file located at <<</tmp/test.jceks>>> is a Java Keystore
- Provider and that the file located within HDFS at <<<nn1.example.com/my/path/test.jceks>>>
- is also a store for a Java Keystore Provider.
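Hadoop parses this value in Java. The sketch below only illustrates how such a comma-separated provider path breaks down into individual provider URIs, each identified by its scheme. The `list_providers` helper is hypothetical and is not part of Hadoop.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): split a
# hadoop.security.credential.provider.path value into its comma-separated
# provider URIs and print "<scheme> <uri>" for each entry.
list_providers() {
  local path=$1
  local IFS=','   # split only on commas while iterating
  local uri
  for uri in $path; do
    # The scheme is everything before the first "://".
    printf '%s %s\n' "${uri%%://*}" "$uri"
  done
}

list_providers 'user:///,jceks://file/tmp/test.jceks,jceks://hdfs@nn1.example.com/my/path/test.jceks'
```

For the example path above, this prints one line each for the `user` entry and the two `jceks` entries, in the order they appear in the path.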
-
- The credential command is often used to provision a password
- or secret to a particular credential store provider. To explicitly
- indicate which provider store to use, the <<<-provider>>> option should be used. Otherwise,
- given a path of multiple providers, the first non-transient provider will be used.
- This may or may not be the one that you intended.
-
- Example: <<<-provider jceks://file/tmp/test.jceks>>>
-
-** <<<distch>>>
-
- Usage: <<<hadoop distch [-f urilist_url] [-i] [-log logdir] path:owner:group:permissions>>>
-
-*-------------------+-------------------------------------------------------+
-||COMMAND_OPTION || Description
-*-------------------+-------------------------------------------------------+
-| -f | List of objects to change
-*----+------------+
-| -i | Ignore failures
-*----+------------+
-| -log | Directory to log output
-*-----+---------+
-
- Change the ownership and permissions on many files at once.
-
-** <<<distcp>>>
-
- Copies files or directories recursively. More information can be found at
- {{{../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/DistCp.html}
- Hadoop DistCp Guide}}.
-
-** <<<fs>>>
-
- This command is documented in the {{{./FileSystemShell.html}File System Shell Guide}}. It is a synonym for <<<hdfs dfs>>> when HDFS is in use.
-
-** <<<jar>>>
-
- Usage: <<<hadoop jar <jar> [mainClass] args...>>>
-
- Runs a jar file.
-
- Use {{{../../hadoop-yarn/hadoop-yarn-site/YarnCommands.html#jar}<<<yarn jar>>>}}
- to launch YARN applications instead.
-
-** <<<jnipath>>>
-
- Usage: <<<hadoop jnipath>>>
-
- Prints the computed java.library.path.
-
-** <<<key>>>
-
- Manage keys via the KeyProvider.
-
-** <<<trace>>>
-
- View and modify Hadoop tracing settings. See the {{{./Tracing.html}Tracing Guide}}.
-
-** <<<version>>>
-
- Usage: <<<hadoop version>>>
-
- Prints the version.
-
-** <<<CLASSNAME>>>
-
- Usage: <<<hadoop CLASSNAME>>>
-
- Runs the class named <<<CLASSNAME>>>. The class must be part of a package.
-
-* {Administration Commands}
-
- Commands useful for administrators of a Hadoop cluster.
-
-** <<<daemonlog>>>
-
- Usage: <<<hadoop daemonlog -getlevel <host:port> <name> >>>
- Usage: <<<hadoop daemonlog -setlevel <host:port> <name> <level> >>>
-
-*------------------------------+-----------------------------------------------------------+
-|| COMMAND_OPTION || Description
-*------------------------------+-----------------------------------------------------------+
-| -getlevel <host:port> <name> | Prints the log level of the daemon running at
- | <host:port>. This command internally connects
- | to http://<host:port>/logLevel?log=<name>
-*------------------------------+-----------------------------------------------------------+
-| -setlevel <host:port> <name> <level> | Sets the log level of the daemon
- | running at <host:port>. This command internally
- | connects to http://<host:port>/logLevel?log=<name>
-*------------------------------+-----------------------------------------------------------+
-
- Get/Set the log level for each daemon.
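The table above states that <<<-getlevel>>> connects to `http://<host:port>/logLevel?log=<name>`. The sketch below builds that URL with a hypothetical helper, `daemonlog_url`, which is not part of Hadoop. The host, port, and logger name in the example call are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): build the URL that
# "hadoop daemonlog -getlevel <host:port> <name>" connects to.
daemonlog_url() {
  local hostport=$1 name=$2
  printf 'http://%s/logLevel?log=%s\n' "$hostport" "$name"
}

# Host, port, and logger name below are illustrative.
daemonlog_url nn1.example.com:50070 org.apache.hadoop.hdfs.server.namenode.NameNode
```

Fetching this URL directly, for example with curl, should show the same log level as `hadoop daemonlog -getlevel`. This assumes the daemon's HTTP endpoint does not require authentication such as SPNEGO.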
-
-* Files
-
-** <<etc/hadoop/hadoop-env.sh>>
-
- This file stores the global settings used by all Hadoop shell commands.
-
-** <<etc/hadoop/hadoop-user-functions.sh>>
-
- This file allows advanced users to override some shell functionality.
-
-** <<~/.hadooprc>>
-
- This stores the personal environment for an individual user. It is
- processed after the hadoop-env.sh and hadoop-user-functions.sh files
- and can contain the same settings.
http://git-wip-us.apache.org/repos/asf/hadoop/blob/e9d26fe9/hadoop-common-project/hadoop-common/src/site/apt/Compatibility.apt.vm
----------------------------------------------------------------------
diff --git a/hadoop-common-project/hadoop-common/src/site/apt/Compatibility.apt.vm b/hadoop-common-project/hadoop-common/src/site/apt/Compatibility.apt.vm
deleted file mode 100644
index 98d1f57..0000000
--- a/hadoop-common-project/hadoop-common/src/site/apt/Compatibility.apt.vm
+++ /dev/null
@@ -1,541 +0,0 @@
-~~ Licensed under the Apache License, Version 2.0 (the "License");
-~~ you may not use this file except in compliance with the License.
-~~ You may obtain a copy of the License at
-~~
-~~ http://www.apache.org/licenses/LICENSE-2.0
-~~
-~~ Unless required by applicable law or agreed to in writing, software
-~~ distributed under the License is distributed on an "AS IS" BASIS,
-~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-~~ See the License for the specific language governing permissions and
-~~ limitations under the License. See accompanying LICENSE file.
-
- ---
-Apache Hadoop Compatibility
- ---
- ---
- ${maven.build.timestamp}
-
-Apache Hadoop Compatibility
-
-%{toc|section=1|fromDepth=0}
-
-* Purpose
-
- This document captures the compatibility goals of the Apache Hadoop
- project. The different types of compatibility between Hadoop
- releases that affect Hadoop developers, downstream projects, and
- end-users are enumerated. For each type of compatibility we:
-
- * describe the impact on downstream projects or end-users
-
- * where applicable, call out the policy adopted by the Hadoop
- developers when incompatible changes are permitted.
-
-* Compatibility types
-
-** Java API
-
- Hadoop interfaces and classes are annotated to describe the intended
- audience and stability in order to maintain compatibility with previous
- releases. See {{{./InterfaceClassification.html}Hadoop Interface
- Classification}}
- for details.
-
- * InterfaceAudience: captures the intended audience, possible
- values are Public (for end users and external projects),
- LimitedPrivate (for other Hadoop components, and closely related
- projects like YARN, MapReduce, HBase etc.), and Private (for intra component
- use).
-
- * InterfaceStability: describes what types of interface changes are
- permitted. Possible values are Stable, Evolving, Unstable, and Deprecated.
-
-*** Use Cases
-
- * Public-Stable API compatibility is required to ensure end-user programs
- and downstream projects continue to work without modification.
-
- * LimitedPrivate-Stable API compatibility is required to allow upgrade of
- individual components across minor releases.
-
- * Private-Stable API compatibility is required for rolling upgrades.
-
-*** Policy
-
- * Public-Stable APIs must be deprecated for at least one major release
- prior to their removal in a major release.
-
- * LimitedPrivate-Stable APIs can change across major releases,
- but not within a major release.
-
- * Private-Stable APIs can change across major releases,
- but not within a major release.
-
- * Classes not annotated are implicitly "Private". Class members not
- annotated inherit the annotations of the enclosing class.
-
- * Note: APIs generated from the proto files need to be compatible for
- rolling-upgrades. See the section on wire-compatibility for more details.
- The compatibility policies for APIs and wire-communication need to go
- hand-in-hand to address this.
-
-** Semantic compatibility
-
- Apache Hadoop strives to ensure that the behavior of APIs remains
- consistent over versions, though changes for correctness may result in
- changes in behavior. Tests and javadocs specify the API's behavior.
- The community is in the process of specifying some APIs more rigorously,
- and enhancing test suites to verify compliance with the specification,
- effectively creating a formal specification for the subset of behaviors
- that can be easily tested.
-
-*** Policy
-
- The behavior of an API may be changed to fix incorrect behavior;
- such a change is to be accompanied by updating existing buggy tests or adding
- tests in cases where there were none prior to the change.
-
-** Wire compatibility
-
- Wire compatibility concerns data being transmitted over the wire
- between Hadoop processes. Hadoop uses Protocol Buffers for most RPC
- communication. Preserving compatibility requires prohibiting
- modification as described below.
- Non-RPC communication should be considered as well,
- for example using HTTP to transfer an HDFS image as part of
- snapshotting or transferring MapTask output. The potential
- communications can be categorized as follows:
-
- * Client-Server: communication between Hadoop clients and servers (e.g.,
- the HDFS client to NameNode protocol, or the YARN client to
- ResourceManager protocol).
-
- * Client-Server (Admin): It is worth distinguishing a subset of the
- Client-Server protocols used solely by administrative commands (e.g.,
- the HAAdmin protocol) as these protocols only impact administrators
- who can tolerate changes that end users (who use general
- Client-Server protocols) cannot.
-
- * Server-Server: communication between servers (e.g., the protocol between
- the DataNode and NameNode, or NodeManager and ResourceManager)
-
-*** Use Cases
-
- * Client-Server compatibility is required to allow users to
- continue using the old clients even after upgrading the server
- (cluster) to a later version (or vice versa). For example, a
- Hadoop 2.1.0 client talking to a Hadoop 2.3.0 cluster.
-
- * Client-Server compatibility is also required to allow users to upgrade the
- client before upgrading the server (cluster). For example, a Hadoop 2.4.0
- client talking to a Hadoop 2.3.0 cluster. This allows deployment of
- client-side bug fixes ahead of full cluster upgrades. Note that new cluster
- features invoked by new client APIs or shell commands will not be usable.
- YARN applications that attempt to use new APIs (including new fields in data
- structures) that have not yet been deployed to the cluster can expect link
- exceptions.
-
- * Client-Server compatibility is also required to allow upgrading
- individual components without upgrading others. For example,
- upgrade HDFS from version 2.1.0 to 2.2.0 without upgrading MapReduce.
-
- * Server-Server compatibility is required to allow mixed versions
- within an active cluster so the cluster may be upgraded without
- downtime in a rolling fashion.
-
-*** Policy
-
- * Both Client-Server and Server-Server compatibility are preserved within a
- major release. (Different policies for different categories are yet to be
- considered.)
-
- * Compatibility can be broken only at a major release, though breaking compatibility
- even at major releases has grave consequences and should be discussed in the Hadoop community.
-
- * Hadoop protocols are defined in .proto (ProtocolBuffers) files.
- Client-Server protocols and Server-Server protocol .proto files are marked as stable.
- When a .proto file is marked as stable it means that changes should be made
- in a compatible fashion as described below:
-
- * The following changes are compatible and are allowed at any time:
-
- * Add an optional field, with the expectation that the code deals with the field missing due to communication with an older version of the code.
-
- * Add a new rpc/method to the service
-
- * Add a new optional request to a Message
-
- * Rename a field
-
- * Rename a .proto file
-
- * Change .proto annotations that affect code generation (e.g. name of java package)
-
- * The following changes are incompatible but can be considered only at a major release:
-
- * Change the rpc/method name
-
- * Change the rpc/method parameter type or return type
-
- * Remove an rpc/method
-
- * Change the service name
-
- * Change the name of a Message
-
- * Modify a field type in an incompatible way (as defined recursively)
-
- * Change an optional field to required
-
- * Add or delete a required field
-
- * Delete an optional field as long as the optional field has reasonable defaults to allow deletions
-
- * The following changes are incompatible and hence never allowed:
-
- * Change a field id
-
- * Reuse an old field that was previously deleted.
-
- * Field numbers are cheap; changing or reusing them is not a good idea.
-
-
-** Java Binary compatibility for end-user applications i.e. Apache Hadoop ABI
-
- As Apache Hadoop revisions are upgraded end-users reasonably expect that
- their applications should continue to work without any modifications.
- This is fulfilled as a result of supporting API compatibility, Semantic
- compatibility and Wire compatibility.
-
- However, Apache Hadoop is a very complex, distributed system and services a
- very wide variety of use-cases. In particular, Apache Hadoop MapReduce is a
- very, very wide API; in the sense that end-users may make wide-ranging
- assumptions such as layout of the local disk when their map/reduce tasks are
- executing, environment variables for their tasks etc. In such cases, it
- becomes very hard to fully specify, and support, absolute compatibility.
-
-*** Use cases
-
- * Existing MapReduce applications, including jars of existing packaged
- end-user applications and projects such as Apache Pig, Apache Hive,
- Cascading etc. should work unmodified when pointed to an upgraded Apache
- Hadoop cluster within a major release.
-
- * Existing YARN applications, including jars of existing packaged
- end-user applications and projects such as Apache Tez etc. should work
- unmodified when pointed to an upgraded Apache Hadoop cluster within a
- major release.
-
- * Existing applications which transfer data in/out of HDFS, including jars
- of existing packaged end-user applications and frameworks such as Apache
- Flume, should work unmodified when pointed to an upgraded Apache Hadoop
- cluster within a major release.
-
-*** Policy
-
- * Existing MapReduce, YARN & HDFS applications and frameworks should work
- unmodified within a major release i.e. Apache Hadoop ABI is supported.
-
- * A very minor fraction of applications may be affected by changes to disk
- layouts etc.; the developer community will strive to minimize these
- changes and will not make them within a minor version. In more egregious
- cases, we will consider strongly reverting these breaking changes and
- invalidating offending releases if necessary.
-
- * In particular for MapReduce applications, the developer community will
- try its best to provide binary compatibility across major
- releases, e.g., for applications using org.apache.hadoop.mapred.
-
- * APIs are supported compatibly across hadoop-1.x and hadoop-2.x. See
- {{{../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html}
- Compatibility for MapReduce applications between hadoop-1.x and hadoop-2.x}}
- for more details.
-
-** REST APIs
-
- REST API compatibility corresponds to both the request (URLs) and responses
- to each request (content, which may contain other URLs). Hadoop REST APIs
- are specifically meant for stable use by clients across releases,
- even major releases. The following are the exposed REST APIs:
-
- * {{{../hadoop-hdfs/WebHDFS.html}WebHDFS}} - Stable
-
- * {{{../../hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html}ResourceManager}}
-
- * {{{../../hadoop-yarn/hadoop-yarn-site/NodeManagerRest.html}NodeManager}}
-
- * {{{../../hadoop-yarn/hadoop-yarn-site/MapredAppMasterRest.html}MR Application Master}}
-
- * {{{../../hadoop-yarn/hadoop-yarn-site/HistoryServerRest.html}History Server}}
-
-*** Policy
-
- The APIs annotated stable in the text above preserve compatibility
- across at least one major release, and may be deprecated by a newer
- version of the REST API in a major release.
-
-** Metrics/JMX
-
- While the Metrics API compatibility is governed by Java API compatibility,
- the actual metrics exposed by Hadoop need to be compatible for users to
- be able to automate against them (in scripts, etc.). Adding new metrics
- is compatible. Modifying (e.g. changing the unit or measurement) or removing
- existing metrics breaks compatibility. Similarly, changes to JMX MBean
- object names also break compatibility.
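As an illustration of this rule, a monitoring script could compare snapshots of the exposed metrics taken before and after an upgrade. The Python sketch below is hypothetical. It assumes each snapshot is a plain mapping from metric name to unit, and the metric names are invented for the example rather than taken from a specific Hadoop release. It flags removals and unit changes as incompatible and ignores added metrics.

```python
def metric_incompatibilities(old, new):
    """Return the changes between two metric snapshots that break compatibility.

    Each snapshot is a hypothetical dict mapping metric name -> unit.
    Added metrics are compatible; removed metrics and metrics whose
    unit changed are reported as incompatible.
    """
    problems = []
    for name, unit in sorted(old.items()):
        if name not in new:
            problems.append(f"removed: {name}")
        elif new[name] != unit:
            problems.append(f"unit changed: {name} ({unit} -> {new[name]})")
    return problems


# Illustrative snapshots; "BlocksCached" is new and therefore compatible.
old = {"BytesWritten": "bytes", "RpcQueueTime": "ms"}
new = {"BytesWritten": "bytes", "RpcQueueTime": "us", "BlocksCached": "count"}
print(metric_incompatibilities(old, new))
# prints ['unit changed: RpcQueueTime (ms -> us)']
```

The same comparison could be applied to JMX MBean object names by treating any missing name as a removal.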
-
-*** Policy
-
- Metrics should preserve compatibility within the major release.
-
-** File formats & Metadata
-
- User and system level data (including metadata) is stored in files of
- different formats. Changes to the metadata or the file formats used to
- store data/metadata can lead to incompatibilities between versions.
-
-*** User-level file formats
-
- Changes to formats that end-users use to store their data can prevent
- them from accessing the data in later releases, and hence it is highly
- important to keep those file formats compatible. One can always add a
- "new" format improving upon an existing format. Examples of these formats
- include har, war, SequenceFileFormat etc.
-
-**** Policy
-
- * Non-forward-compatible user-file format changes are
- restricted to major releases. When user-file formats change, new
- releases are expected to read existing formats, but may write data
- in formats incompatible with prior releases. Also, the community
- shall prefer to create a new format that programs must opt in to
- instead of making incompatible changes to existing formats.
-
-*** System-internal file formats
-
- Hadoop internal data is also stored in files and again changing these
- formats can lead to incompatibilities. While such changes are not as
- devastating as the user-level file formats, a policy on when the
- compatibility can be broken is important.
-
-**** MapReduce
-
- MapReduce uses formats like IFile to store MapReduce-specific data.
-
-***** Policy
-
- MapReduce-internal formats like IFile maintain compatibility within a
- major release. Changes to these formats can cause in-flight jobs to fail
- and hence we should ensure newer clients can fetch shuffle-data from old
- servers in a compatible manner.
-
-**** HDFS Metadata
-
- HDFS persists metadata (the image and edit logs) in a particular format.
- Incompatible changes to either the format or the metadata prevent
- subsequent releases from reading older metadata. Such incompatible
- changes might require an HDFS "upgrade" to convert the metadata to make
- it accessible. Some changes can require more than one such "upgrade".
-
- Depending on the degree of incompatibility in the changes, the following
- potential scenarios can arise:
-
- * Automatic: The image upgrades automatically; no explicit
- "upgrade" is needed.
-
- * Direct: The image is upgradeable, but might require one explicit release
- "upgrade".
-
- * Indirect: The image is upgradeable, but might require upgrading to
- intermediate release(s) first.
-
- * Not upgradeable: The image is not upgradeable.
-
-***** Policy
-
- * A release upgrade must allow a cluster to roll back to the older
- version and its older disk format. The rollback needs to restore the
- original data, but is not required to restore the updated data.
-
- * HDFS metadata changes must be upgradeable via any of the upgrade
- paths - automatic, direct or indirect.
-
- * More detailed policies based on the kind of upgrade are yet to be
- considered.
-
-** Command Line Interface (CLI)
-
- The Hadoop command line programs may be used either directly via the
- system shell or via shell scripts. Changing the path of a command,
- removing or renaming command line options, changing the order of arguments,
- or changing the command return code or output breaks compatibility and
- may adversely affect users.
-
-*** Policy
-
- CLI commands are to be deprecated (warning when used) for one
- major release before they are removed or incompatibly modified in
- a subsequent major release.
-
-** Web UI
-
- Changes to the Web UI, particularly to the content and layout of web pages,
- could potentially interfere with attempts to screen-scrape the web
- pages for information.
-
-*** Policy
-
- Web pages are not meant to be scraped and hence incompatible
- changes to them are allowed at any time. Users are expected to use
- REST APIs to get any information.
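For example, instead of scraping the NameNode web pages, a client can request the same information through the stable WebHDFS REST API. The sketch below only builds a WebHDFS URL of the form http://<host>:<port>/webhdfs/v1/<path>?op=<OPERATION>. The host name, port and user name are placeholders, and the code does not send the request itself (something like urllib.request would do that).

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, params=None):
    """Build a WebHDFS REST URL for an absolute HDFS path.

    The result has the form
    http://<host>:<port>/webhdfs/v1/<path>?op=<op>&<params>
    """
    if not path.startswith("/"):
        raise ValueError("path must be absolute, e.g. /user/alice")
    # quote() leaves "/" intact but escapes spaces and other reserved characters.
    query = urlencode({"op": op, **(params or {})})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Placeholder host, port and user; LISTSTATUS lists a directory.
print(webhdfs_url("namenode.example.com", 50070, "/user/alice",
                  "LISTSTATUS", {"user.name": "alice"}))
# prints http://namenode.example.com:50070/webhdfs/v1/user/alice?op=LISTSTATUS&user.name=alice
```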
-
-** Hadoop Configuration Files
-
- Users use (1) Hadoop-defined properties to configure and provide hints to
- Hadoop and (2) custom properties to pass information to jobs. Hence,
- compatibility of config properties is two-fold:
-
- * Modifying key-names, units of values, and default values of Hadoop-defined
- properties.
-
- * Custom configuration property keys should not conflict with the
- namespace of Hadoop-defined properties. Typically, users should
- avoid using prefixes used by Hadoop: hadoop, io, ipc, fs, net,
- file, ftp, s3, kfs, ha, dfs, mapred, mapreduce, yarn.
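A project could check its own custom keys against this prefix list before shipping a job configuration. The following is a minimal sketch. It assumes that a key conflicts when its first dot-separated component equals one of the prefixes listed above. That matching rule is an assumption for illustration and is not defined by Hadoop.

```python
# Prefixes used by Hadoop, as listed in this document.
HADOOP_PREFIXES = frozenset({
    "hadoop", "io", "ipc", "fs", "net", "file", "ftp", "s3",
    "kfs", "ha", "dfs", "mapred", "mapreduce", "yarn",
})


def conflicts_with_hadoop(key):
    """Return True if a custom property key uses a Hadoop prefix.

    Assumption: only the first dot-separated component is compared, so
    "filesystem.x" does not match the prefix "file".
    """
    first = key.split(".", 1)[0]
    return first in HADOOP_PREFIXES


print(conflicts_with_hadoop("mapreduce.myjob.threshold"))    # prints True
print(conflicts_with_hadoop("com.example.myjob.threshold"))  # prints False
```

A reverse-domain style namespace such as com.example.* is one way to stay clear of the listed prefixes.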
-
-*** Policy
-
- * Hadoop-defined properties are to be deprecated at least for one
- major release before being removed. Modifying units for existing
- properties is not allowed.
-
- * The default values of Hadoop-defined properties can
- be changed across minor/major releases, but will remain the same
- across point releases within a minor release.
-
- * Currently, there is NO explicit policy regarding when new
- prefixes can be added/removed, and the list of prefixes to be
- avoided for custom configuration properties. However, as noted above,
- users should avoid using prefixes used by Hadoop: hadoop, io, ipc, fs,
- net, file, ftp, s3, kfs, ha, dfs, mapred, mapreduce, yarn.
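The rule on default values depends only on which version component changes between two releases. The sketch below assumes plain "major.minor.point" version strings without suffixes such as -SNAPSHOT or -alpha. The function names are hypothetical. It returns whether a Hadoop-defined property's default may differ between two releases.

```python
def parse_version(version):
    """Split a plain "major.minor.point" string into a tuple of ints."""
    major, minor, point = (int(part) for part in version.split("."))
    return major, minor, point


def default_change_allowed(old, new):
    """Return True if a default value may differ between releases old and new.

    Defaults may change across minor or major releases, but must stay the
    same across point releases within one minor release, i.e. when the
    (major, minor) pair is unchanged.
    """
    return parse_version(old)[:2] != parse_version(new)[:2]


print(default_change_allowed("2.6.0", "2.6.1"))  # prints False (point release)
print(default_change_allowed("2.6.0", "2.7.0"))  # prints True (minor release)
```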
-
-** Directory Structure
-
- Source code, artifacts (source and tests), user logs, configuration files,
- output and job history are all stored on disk, either on the local file
- system or in HDFS. Changing the directory structure of these user-accessible
- files breaks compatibility, even in cases where the original path is
- preserved via symbolic links (if, for example, the path is accessed
- by a servlet that is configured to not follow symbolic links).
-
-*** Policy
-
- * The layout of source code and build artifacts can change
- at any time, particularly so across major versions. Within a major
- version, the developers will attempt (no guarantees) to preserve
- the directory structure; however, individual files can be
- added/moved/deleted. The best way to ensure patches stay in sync
- with the code is to get them committed to the Apache source tree.
-
- * The directory structure of configuration files, user logs, and
- job history will be preserved across minor and point releases
- within a major release.
-
-** Java Classpath
-
- User applications built against Hadoop might add all Hadoop jars
- (including Hadoop's library dependencies) to the application's
- classpath. Adding new dependencies or updating the version of
- existing dependencies may interfere with those in applications'
- classpaths.
-
-*** Policy
-
- Currently, there is NO policy on when Hadoop's dependencies can
- change.
-
-** Environment variables
-
- Users and related projects often utilize the exported environment
- variables (e.g. HADOOP_CONF_DIR); therefore, removing or renaming
- environment variables is an incompatible change.
-
-*** Policy
-
- Currently, there is NO policy on when the environment variables
- can change. Developers try to limit changes to major releases.
-
-** Build artifacts
-
- Hadoop uses Maven for project management, and changing the artifacts
- can affect existing user workflows.
-
-*** Policy
-
- * Test artifacts: The test jars generated are strictly for internal
- use and are not expected to be used outside of Hadoop, similar to
- APIs annotated @Private, @Unstable.
-
- * Built artifacts: The hadoop-client artifact (maven
- groupId:artifactId) stays compatible within a major release,
- while the other artifacts can change in incompatible ways.
-
-** Hardware/Software Requirements
-
- To keep up with the latest advances in hardware, operating systems,
- JVMs, and other software, new Hadoop releases or some of their
- features might require higher versions of the same. For a specific
- environment, upgrading Hadoop might require upgrading other
- dependent software components.
-
-*** Policies
-
- * Hardware
-
- * Architecture: The community has no plans to restrict Hadoop to
- specific architectures, but can have family-specific
- optimizations.
-
- * Minimum resources: While there are no guarantees on the
- minimum resources required by Hadoop daemons, the community
- attempts not to increase requirements within a minor release.
-
- * Operating Systems: The community will attempt to maintain the
- same OS requirements (OS kernel versions) within a minor
- release. Currently GNU/Linux and Microsoft Windows are the OSes officially
- supported by the community while Apache Hadoop is known to work reasonably
- well on other OSes such as Apple Mac OS X, Solaris etc.
-
- * JVM: The JVM requirements will not change across point releases
- within the same minor release except if the JVM version in
- question becomes unsupported. Minor/major releases might require
- later versions of the JVM for some/all of the supported operating
- systems.
-
- * Other software: The community tries to maintain the minimum
- versions of additional software required by Hadoop, for example
- ssh, Kerberos, etc.
-
-* References
-
- Here are some relevant JIRAs and pages related to the topic:
-
- * The evolution of this document -
- {{{https://issues.apache.org/jira/browse/HADOOP-9517}HADOOP-9517}}
-
- * Binary compatibility for MapReduce end-user applications between hadoop-1.x and hadoop-2.x -
- {{{../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html}
- MapReduce Compatibility between hadoop-1.x and hadoop-2.x}}
-
- * Annotations for interfaces as per interface classification
- schedule -
- {{{https://issues.apache.org/jira/browse/HADOOP-7391}HADOOP-7391}}
- {{{./InterfaceClassification.html}Hadoop Interface Classification}}
-
- * Compatibility for Hadoop 1.x releases -
- {{{https://issues.apache.org/jira/browse/HADOOP-5071}HADOOP-5071}}
-
- * The {{{http://wiki.apache.org/hadoop/Roadmap}Hadoop Roadmap}} page
- that captures other release policies
-