Posted to common-commits@hadoop.apache.org by zj...@apache.org on 2015/01/27 19:40:48 UTC
[30/50] [abbrv] hadoop git commit: HDFS-7667. Various typos and
improvements to HDFS Federation doc (Charles Lamb via aw)
Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/d411460e
Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/d411460e
Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/d411460e
Branch: refs/heads/YARN-2928
Commit: d411460e0d66b9b9d58924df295a957ba84b17d7
Parents: 4b00935
Author: Allen Wittenauer <aw...@apache.org>
Authored: Fri Jan 23 13:37:46 2015 -0800
Committer: Allen Wittenauer <aw...@apache.org>
Committed: Fri Jan 23 13:37:52 2015 -0800
----------------------------------------------------------------------
hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt | 3 +
.../hadoop-hdfs/src/site/apt/Federation.apt.vm | 207 +++++++++----------
2 files changed, 105 insertions(+), 105 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/hadoop/blob/d411460e/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt b/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
index 9176ec7..c9bee1a 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
+++ b/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
@@ -290,6 +290,9 @@ Trunk (Unreleased)
HADOOP-11484. hadoop-mapreduce-client-nativetask fails to build on ARM
AARCH64 due to x86 asm statements (Edward Nevill via Colin P. McCabe)
+ HDFS-7667. Various typos and improvements to HDFS Federation doc
+ (Charles Lamb via aw)
+
Release 2.7.0 - UNRELEASED
INCOMPATIBLE CHANGES
http://git-wip-us.apache.org/repos/asf/hadoop/blob/d411460e/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/Federation.apt.vm
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/Federation.apt.vm b/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/Federation.apt.vm
index 29278b7..17aaf3c 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/Federation.apt.vm
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/Federation.apt.vm
@@ -32,16 +32,16 @@ HDFS Federation
* <<Namespace>>
- * Consists of directories, files and blocks
+ * Consists of directories, files and blocks.
* It supports all the namespace related file system operations such as
create, delete, modify and list files and directories.
- * <<Block Storage Service>> has two parts
+ * <<Block Storage Service>>, which has two parts:
- * Block Management (which is done in Namenode)
+ * Block Management (performed in the Namenode)
- * Provides datanode cluster membership by handling registrations, and
+ * Provides Datanode cluster membership by handling registrations and
periodic heartbeats.
* Processes block reports and maintains location of blocks.
@@ -49,29 +49,29 @@ HDFS Federation
* Supports block related operations such as create, delete, modify and
get block location.
- * Manages replica placement and replication of a block for under
- replicated blocks and deletes blocks that are over replicated.
+ * Manages replica placement and block replication for under-replicated
+ blocks, and deletes over-replicated blocks.
- * Storage - is provided by datanodes by storing blocks on the local file
- system and allows read/write access.
+ * Storage - is provided by Datanodes by storing blocks on the local file
+ system and allowing read/write access.
The prior HDFS architecture allows only a single namespace for the
- entire cluster. A single Namenode manages this namespace. HDFS
- Federation addresses limitation of the prior architecture by adding
- support multiple Namenodes/namespaces to HDFS file system.
+ entire cluster. In that configuration, a single Namenode manages the
+ namespace. HDFS Federation addresses this limitation by adding
+ support for multiple Namenodes/namespaces to HDFS.
* {Multiple Namenodes/Namespaces}
In order to scale the name service horizontally, federation uses multiple
- independent Namenodes/namespaces. The Namenodes are federated, that is, the
+ independent Namenodes/namespaces. The Namenodes are federated; the
Namenodes are independent and do not require coordination with each other.
- The datanodes are used as common storage for blocks by all the Namenodes.
- Each datanode registers with all the Namenodes in the cluster. Datanodes
- send periodic heartbeats and block reports and handles commands from the
- Namenodes.
+ The Datanodes are used as common storage for blocks by all the Namenodes.
+ Each Datanode registers with all the Namenodes in the cluster. Datanodes
+ send periodic heartbeats and block reports. They also handle
+ commands from the Namenodes.
- Users may use {{{./ViewFs.html}ViewFs}} to create personalized namespace views,
- where ViewFs is analogous to client side mount tables in some Unix/Linux systems.
+ Users may use {{{./ViewFs.html}ViewFs}} to create personalized namespace views.
+ ViewFs is analogous to client side mount tables in some Unix/Linux systems.
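For illustration only, a client-side mount table for ViewFs maps directories onto different namespaces in core-site.xml. The property names below follow the standard ViewFs configuration; the cluster name, hosts, and paths are placeholders, not values from this document:

----
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>viewfs://clusterX</value>
  </property>
  <!-- Map /data to one namespace and /logs to another (placeholder hosts) -->
  <property>
    <name>fs.viewfs.mounttable.clusterX.link./data</name>
    <value>hdfs://nn1-host:8020/data</value>
  </property>
  <property>
    <name>fs.viewfs.mounttable.clusterX.link./logs</name>
    <value>hdfs://nn2-host:8020/logs</value>
  </property>
</configuration>
----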
[./images/federation.gif] HDFS Federation Architecture
@@ -79,66 +79,67 @@ HDFS Federation
<<Block Pool>>
A Block Pool is a set of blocks that belong to a single namespace.
- Datanodes store blocks for all the block pools in the cluster.
- It is managed independently of other block pools. This allows a namespace
- to generate Block IDs for new blocks without the need for coordination
- with the other namespaces. The failure of a Namenode does not prevent
- the datanode from serving other Namenodes in the cluster.
+ Datanodes store blocks for all the block pools in the cluster. Each
+ Block Pool is managed independently. This allows a namespace to
+ generate Block IDs for new blocks without the need for coordination
+ with the other namespaces. A Namenode failure does not prevent the
+ Datanode from serving other Namenodes in the cluster.
A Namespace and its block pool together are called a Namespace Volume.
It is a self-contained unit of management. When a Namenode/namespace
- is deleted, the corresponding block pool at the datanodes is deleted.
+ is deleted, the corresponding block pool at the Datanodes is deleted.
Each namespace volume is upgraded as a unit during a cluster upgrade.
<<ClusterID>>
- A new identifier <<ClusterID>> is added to identify all the nodes in
- the cluster. When a Namenode is formatted, this identifier is provided
- or auto generated. This ID should be used for formatting the other
- Namenodes into the cluster.
+ A <<ClusterID>> identifier is used to identify all the nodes in the
+ cluster. When a Namenode is formatted, this identifier is either
+ provided or auto-generated. This ID should be used for formatting
+ the other Namenodes into the cluster.
** Key Benefits
- * Namespace Scalability - HDFS cluster storage scales horizontally but
- the namespace does not. Large deployments or deployments using lot
- of small files benefit from scaling the namespace by adding more
- Namenodes to the cluster
+ * Namespace Scalability - Federation adds namespace horizontal
+ scaling. Large deployments or deployments using a lot of small files
+ benefit from namespace scaling by allowing more Namenodes to be
+ added to the cluster.
- * Performance - File system operation throughput is limited by a single
- Namenode in the prior architecture. Adding more Namenodes to the cluster
- scales the file system read/write operations throughput.
+ * Performance - File system throughput is not limited by a single
+ Namenode. Adding more Namenodes to the cluster scales the file
+ system read/write throughput.
- * Isolation - A single Namenode offers no isolation in multi user
- environment. An experimental application can overload the Namenode
- and slow down production critical applications. With multiple Namenodes,
- different categories of applications and users can be isolated to
- different namespaces.
+ * Isolation - A single Namenode offers no isolation in a multi user
+ environment. For example, an experimental application can overload
+ the Namenode and slow down production critical applications. By using
+ multiple Namenodes, different categories of applications and users
+ can be isolated to different namespaces.
* {Federation Configuration}
- Federation configuration is <<backward compatible>> and allows existing
- single Namenode configuration to work without any change. The new
- configuration is designed such that all the nodes in the cluster have
- same configuration without the need for deploying different configuration
- based on the type of the node in the cluster.
+ Federation configuration is <<backward compatible>> and allows
+ existing single Namenode configurations to work without any
+ change. The new configuration is designed such that all the nodes in
+ the cluster have the same configuration without the need for
+ deploying different configurations based on the type of the node in
+ the cluster.
- A new abstraction called <<<NameServiceID>>> is added with
- federation. The Namenode and its corresponding secondary/backup/checkpointer
- nodes belong to this. To support single configuration file, the Namenode and
- secondary/backup/checkpointer configuration parameters are suffixed with
- <<<NameServiceID>>> and are added to the same configuration file.
+ Federation adds a new <<<NameServiceID>>> abstraction. A Namenode
+ and its corresponding secondary/backup/checkpointer nodes all belong
+ to a NameServiceID. In order to support a single configuration file,
+ the Namenode and secondary/backup/checkpointer configuration
+ parameters are suffixed with the <<<NameServiceID>>>.
** Configuration:
- <<Step 1>>: Add the following parameters to your configuration:
- <<<dfs.nameservices>>>: Configure with list of comma separated
- NameServiceIDs. This will be used by Datanodes to determine all the
+ <<Step 1>>: Add the <<<dfs.nameservices>>> parameter to your
+ configuration and configure it with a list of comma-separated
+ NameServiceIDs. This will be used by the Datanodes to determine the
Namenodes in the cluster.
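As a minimal sketch of Step 1 (the NameServiceIDs ns1 and ns2 are placeholder values), this amounts to a single property in hdfs-site.xml:

----
<property>
  <name>dfs.nameservices</name>
  <value>ns1,ns2</value>
</property>
----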
<<Step 2>>: For each Namenode and Secondary Namenode/BackupNode/Checkpointer
- add the following configuration suffixed with the corresponding
- <<<NameServiceID>>> into the common configuration file.
+ add the following configuration parameters suffixed with the corresponding
+ <<<NameServiceID>>> into the common configuration file:
*---------------------+--------------------------------------------+
|| Daemon || Configuration Parameter |
@@ -160,7 +161,7 @@ HDFS Federation
| | <<<dfs.secondary.namenode.keytab.file>>> |
*---------------------+--------------------------------------------+
- Here is an example configuration with two namenodes:
+ Here is an example configuration with two Namenodes:
----
<configuration>
@@ -199,16 +200,16 @@ HDFS Federation
** Formatting Namenodes
- <<Step 1>>: Format a namenode using the following command:
+ <<Step 1>>: Format a Namenode using the following command:
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -format [-clusterId <cluster_id>]
----
- Choose a unique cluster_id, which will not conflict other clusters in
- your environment. If it is not provided, then a unique ClusterID is
+ Choose a unique cluster_id which will not conflict with other clusters in
+ your environment. If a cluster_id is not provided, then a unique one is
auto-generated.
- <<Step 2>>: Format additional namenode using the following command:
+ <<Step 2>>: Format additional Namenodes using the following command:
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -format -clusterId <cluster_id>
@@ -219,40 +220,38 @@ HDFS Federation
** Upgrading from an older release and configuring federation
- Older releases supported a single Namenode.
- Upgrade the cluster to newer release to enable federation
+ Older releases only support a single Namenode.
+ Upgrade the cluster to a newer release in order to enable federation.
During upgrade you can provide a ClusterID as follows:
----
-[hdfs]$ $HADOOP_PREFIX/bin/hdfs start namenode --config $HADOOP_CONF_DIR -upgrade -clusterId <cluster_ID>
+[hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon start namenode -upgrade -clusterId <cluster_ID>
----
- If ClusterID is not provided, it is auto generated.
+ If a cluster_id is not provided, it is auto-generated.
** Adding a new Namenode to an existing HDFS cluster
- Follow the following steps:
+ Perform the following steps:
- * Add configuration parameter <<<dfs.nameservices>>> to the configuration.
+ * Add <<<dfs.nameservices>>> to the configuration.
- * Update the configuration with NameServiceID suffix. Configuration
- key names have changed post release 0.20. You must use new configuration
- parameter names, for federation.
+ * Update the configuration with the NameServiceID suffix. Configuration
+ key names changed post release 0.20. You must use the new configuration
+ parameter names in order to use federation.
- * Add new Namenode related config to the configuration files.
+ * Add the new Namenode related config to the configuration file.
* Propagate the configuration file to all the nodes in the cluster.
- * Start the new Namenode, Secondary/Backup.
+ * Start the new Namenode and Secondary/Backup.
- * Refresh the datanodes to pickup the newly added Namenode by running
- the following command:
+ * Refresh the Datanodes to pick up the newly added Namenode by running
+ the following command against all the Datanodes in the cluster:
----
-[hdfs]$ $HADOOP_PREFIX/bin/hdfs dfadmin -refreshNameNode <datanode_host_name>:<datanode_rpc_port>
+[hdfs]$ $HADOOP_PREFIX/bin/hdfs dfsadmin -refreshNameNode <datanode_host_name>:<datanode_rpc_port>
----
- * The above command must be run against all the datanodes in the cluster.
-
* {Managing the cluster}
** Starting and stopping cluster
@@ -270,28 +269,28 @@ HDFS Federation
----
These commands can be run from any node where the HDFS configuration is
- available. The command uses configuration to determine the Namenodes
- in the cluster and starts the Namenode process on those nodes. The
- datanodes are started on nodes specified in the <<<slaves>>> file. The
- script can be used as reference for building your own scripts for
- starting and stopping the cluster.
+ available. The command uses the configuration to determine the Namenodes
+ in the cluster and then starts the Namenode process on those nodes. The
+ Datanodes are started on the nodes specified in the <<<slaves>>> file. The
+ script can be used as a reference for building your own scripts to
+ start and stop the cluster.
** Balancer
- Balancer has been changed to work with multiple Namenodes in the cluster to
- balance the cluster. Balancer can be run using the command:
+ The Balancer has been changed to work with multiple
+ Namenodes. The Balancer can be run using the command:
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon start balancer [-policy <policy>]
----
- Policy could be:
+ The policy parameter can be any of the following:
* <<<datanode>>> - this is the <default> policy. This balances the storage at
- the datanode level. This is similar to balancing policy from prior releases.
+ the Datanode level. This is similar to the balancing policy from prior releases.
- * <<<blockpool>>> - this balances the storage at the block pool level.
- Balancing at block pool level balances storage at the datanode level also.
+ * <<<blockpool>>> - this balances the storage at the block pool
+ level which also balances at the Datanode level.
Note that Balancer only balances the data and does not balance the namespace.
For the complete command usage, see {{{../hadoop-common/CommandsManual.html#balancer}balancer}}.
@@ -299,44 +298,42 @@ HDFS Federation
** Decommissioning
Decommissioning is similar to prior releases. The nodes that need to be
- decomissioned are added to the exclude file at all the Namenode. Each
+ decommissioned are added to the exclude file at all of the Namenodes. Each
Namenode decommissions its Block Pool. When all the Namenodes finish
- decommissioning a datanode, the datanode is considered to be decommissioned.
+ decommissioning a Datanode, the Datanode is considered decommissioned.
- <<Step 1>>: To distributed an exclude file to all the Namenodes, use the
+ <<Step 1>>: To distribute an exclude file to all the Namenodes, use the
following command:
----
-[hdfs]$ $HADOOP_PREFIX/sbin/distributed-exclude.sh <exclude_file>
+[hdfs]$ $HADOOP_PREFIX/sbin/distribute-exclude.sh <exclude_file>
----
- <<Step 2>>: Refresh all the Namenodes to pick up the new exclude file.
+ <<Step 2>>: Refresh all the Namenodes to pick up the new exclude file:
----
[hdfs]$ $HADOOP_PREFIX/sbin/refresh-namenodes.sh
----
- The above command uses HDFS configuration to determine the Namenodes
- configured in the cluster and refreshes all the Namenodes to pick up
+ The above command uses HDFS configuration to determine the
+ configured Namenodes in the cluster and refreshes them to pick up
the new exclude file.
** Cluster Web Console
- Similar to Namenode status web page, a Cluster Web Console is added in
- federation to monitor the federated cluster at
+ Similar to the Namenode status web page, when using federation a
+ Cluster Web Console is available to monitor the federated cluster at
<<<http://<any_nn_host:port>/dfsclusterhealth.jsp>>>.
Any Namenode in the cluster can be used to access this web page.
- The web page provides the following information:
+ The Cluster Web Console provides the following information:
- * Cluster summary that shows number of files, number of blocks and
- total configured storage capacity, available and used storage information
+ * A cluster summary that shows the number of files, number of blocks,
+ total configured storage capacity, and the available and used storage
for the entire cluster.
- * Provides list of Namenodes and summary that includes number of files,
- blocks, missing blocks, number of live and dead data nodes for each
- Namenode. It also provides a link to conveniently access Namenode web UI.
-
- * It also provides decommissioning status of datanodes.
-
+ * A list of Namenodes and a summary that includes the number of files,
+ blocks, missing blocks, and live and dead data nodes for each
+ Namenode. It also provides a link to access each Namenode's web UI.
+ * The decommissioning status of Datanodes.