Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2011/08/31 19:45:57 UTC

[Hadoop Wiki] Update of "FAQ" by MichaelSchmitz

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "FAQ" page has been changed by MichaelSchmitz:
http://wiki.apache.org/hadoop/FAQ?action=diff&rev1=107&rev2=108

Comment:
Added some detail to the part about decommissioning a node.

  == I want to make a large cluster smaller by taking out a bunch of nodes simultaneously. How can this be done? ==
  On a large cluster, removing one or two data-nodes will not lead to any data loss, because the name-node will re-replicate their blocks as soon as it detects that the nodes are dead. With a large number of nodes being removed or dying, the probability of losing data is higher.
  
- Hadoop offers the ''decommission'' feature to retire a set of existing data-nodes. The nodes to be retired should be included into the ''exclude file'', and the exclude file name should  be specified as a configuration parameter [[http://hadoop.apache.org/core/docs/current/hadoop-default.html#dfs.hosts.exclude|dfs.hosts.exclude]]. This file should have been specified during namenode startup. It could be a zero length file. You must use the full hostname, ip or ip:port format in this file.  Then the shell command
+ Hadoop offers the ''decommission'' feature to retire a set of existing data-nodes. The nodes to be retired should be included in the ''exclude file'', whose name is specified as the configuration parameter [[http://hadoop.apache.org/core/docs/current/hadoop-default.html#dfs.hosts.exclude|dfs.hosts.exclude]]. This file must have been specified during namenode startup; it may be a zero-length file. You must use the full hostname, IP, or IP:port format in this file. (Note that some users have trouble using the host name: if your namenode shows some nodes as "Live" or "Dead" but none as decommissioning, try the full IP:port format.)
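  
  A minimal sketch of the two pieces involved (the file path, host name, and addresses below are placeholders): the namenode configuration points at the exclude file,
  
  {{{
  <property>
    <name>dfs.hosts.exclude</name>
    <value>/etc/hadoop/conf/dfs.exclude</value>
  </property>
  }}}
  
  and the exclude file lists the nodes to retire, one per line:
  
  {{{
  datanode1.example.com
  10.0.0.12
  10.0.0.13:50010
  }}}
  
  Then the shell command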
  
  {{{
  bin/hadoop dfsadmin -refreshNodes
  }}}
  should be called, which forces the name-node to re-read the exclude file and start the decommission process.
  
- Decommission does not happen momentarily since it requires replication of potentially a large number of blocks  and we do not want the cluster to be overwhelmed with just this one job. The decommission progress can be monitored on the name-node Web UI.  Until all blocks are replicated the node will be in "Decommission In Progress" state. When decommission is done the state will change to "Decommissioned".  The nodes can be removed whenever decommission is finished.
+ Decommission is not instant, since it requires re-replication of a potentially large number of blocks, and we do not want the cluster to be overwhelmed with just this one job. Decommission progress can be monitored on the name-node Web UI: until all blocks are replicated, the node is in the "Decommission In Progress" state; when decommission is done, the state changes to "Decommissioned". The nodes can be removed once decommission is finished.
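  
  The same states are visible from the command line: {{{bin/hadoop dfsadmin -report}}} prints an entry per data-node, including (in most versions) a "Decommission Status" line. A sketch, with a placeholder address and exact output varying by Hadoop version:
  
  {{{
  Name: 10.0.0.12:50010
  Decommission Status : Decommission in progress
  }}}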
  
  The decommission process can be terminated at any time by editing the configuration or the exclude files and repeating the {{{-refreshNodes}}} command.
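  
  For example, to cancel an in-progress decommission of one node (host name below is a placeholder), delete its line from the exclude file and refresh:
  
  {{{
  # remove datanode1.example.com from the exclude file, then:
  bin/hadoop dfsadmin -refreshNodes
  }}}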