Posted to commits@hbase.apache.org by st...@apache.org on 2012/12/05 22:31:58 UTC
svn commit: r1417657 - /hbase/trunk/src/docbkx/ops_mgt.xml
Author: stack
Date: Wed Dec 5 21:31:57 2012
New Revision: 1417657
URL: http://svn.apache.org/viewvc?rev=1417657&view=rev
Log:
More on bad disk handling
Modified:
hbase/trunk/src/docbkx/ops_mgt.xml
Modified: hbase/trunk/src/docbkx/ops_mgt.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/ops_mgt.xml?rev=1417657&r1=1417656&r2=1417657&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/ops_mgt.xml (original)
+++ hbase/trunk/src/docbkx/ops_mgt.xml Wed Dec 5 21:31:57 2012
@@ -387,11 +387,18 @@ false
to go down spewing errors in <filename>dmesg</filename> -- or for some reason, run much slower than their
companions. In this case you want to decommission the disk. You have two options. You can
<xlink href="http://wiki.apache.org/hadoop/FAQ#I_want_to_make_a_large_cluster_smaller_by_taking_out_a_bunch_of_nodes_simultaneously._How_can_this_be_done.3F">decommission the datanode</xlink>
- or, less disruptive in that only the bad disks data will be rereplicated, is that you can stop the datanode,
+ or, less disruptively, in that only the bad disk's data will be re-replicated, you can stop the datanode,
unmount the bad volume (you cannot umount a volume while the datanode is using it), and then restart the
datanode (presuming you have set dfs.datanode.failed.volumes.tolerated > 0). The regionserver will
throw some errors in its logs as it recalibrates where to get its data from -- it will likely
roll its WAL log too -- but in general, apart from some latency spikes, it should keep on chugging.
+ <note>
+ <para>If you are doing short-circuit reads, you will have to move the regions off the regionserver
+ before you stop the datanode; with short-circuit reads enabled, even though the blocks are chmod'd
+ so the regionserver cannot access them, because it already has the files open it will keep reading
+ the file blocks from the bad disk even though the datanode is down. Move the regions back after
+ you restart the datanode.</para>
+ </note>
</para>
</section>
</section>
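
The stop-unmount-restart procedure in the diff above only works if the datanode is configured to tolerate failed volumes; the default is 0, in which case any volume failure shuts the datanode down. A minimal hdfs-site.xml sketch (the value 1 is illustrative; set it to however many dead volumes a datanode should survive):

```xml
<!-- hdfs-site.xml fragment: let the datanode keep running when up to
     this many of its configured data volumes have failed. With the
     default of 0, a single failed volume stops the whole datanode. -->
<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>1</value>
</property>
```

After unmounting the bad volume for good, you may also want to drop its directory from the datanode's data-dir list (dfs.data.dir in older Hadoop releases, dfs.datanode.data.dir in newer ones) so the datanode stops counting it as a failed volume on subsequent restarts.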