Posted to commits@hbase.apache.org by st...@apache.org on 2012/11/26 20:26:15 UTC

svn commit: r1413793 - /hbase/trunk/src/docbkx/performance.xml

Author: stack
Date: Mon Nov 26 19:26:15 2012
New Revision: 1413793

URL: http://svn.apache.org/viewvc?rev=1413793&view=rev
Log:
HBASE-7217 Documentation: Update section 11.5.1 to recommend that hbase.regionserver.checksum.verify is set

Modified:
    hbase/trunk/src/docbkx/performance.xml

Modified: hbase/trunk/src/docbkx/performance.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/performance.xml?rev=1413793&r1=1413792&r2=1413793&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/performance.xml (original)
+++ hbase/trunk/src/docbkx/performance.xml Mon Nov 26 19:26:15 2012
@@ -208,38 +208,10 @@
     </section>
 
   </section>
-  <section xml:id="perf.hdfs.configs">
-    <title>HDFS Configuration</title>
-    <section xml:id="perf.hdfs.configs.localread">
-    <title>Leveraging local data</title>
-<para>Since Hadoop 1.0.0 (also 0.22.1, 0.23.1, CDH3u3 and HDP 1.0) via
-<link xlink:href="https://issues.apache.org/jira/browse/HDFS-2246">HDFS-2246</link>,
-it is possible for the DFSClient to take a "short circuit" and
-read directly from disk instead of going through the DataNode when the
-data is local. What this means for HBase is that the RegionServers can
-read directly off their machine's disks instead of having to open a
-socket to talk to the DataNode, the former being generally much
-faster<footnote><para>See JD's <link xlink:href="http://files.meetup.com/1350427/hug_ebay_jdcryans.pdf">Performance Talk</link></para></footnote>.
-Also see <link xlink:href="http://search-hadoop.com/m/zV6dKrLCVh1">HBase, mail # dev - read short circuit</link> thread for
-more discussion around short circuit reads.
-</para>
-<para>To enable "short circuit" reads, you must set two configurations.
-First, the hdfs-site.xml needs to be amended. Set
-the property  <varname>dfs.block.local-path-access.user</varname>
-to be the <emphasis>only</emphasis> user that can use the shortcut.
-This has to be the user that started HBase.  Then in hbase-site.xml,
-set <varname>dfs.client.read.shortcircuit</varname> to be <varname>true</varname>
-</para>
-<para>
-The DataNodes need to be restarted in order to pick up the new
-configuration. Be aware that if a process started under another
-username than the one configured here also has the shortcircuit
-enabled, it will get an Exception regarding an unauthorized access but
-the data will still be read.
-</para>
-  </section>
 
-  </section>
+
+
+
   <section xml:id="perf.zookeeper">
     <title>ZooKeeper</title>
     <para>See <xref linkend="zookeeper"/> for information on configuring ZooKeeper, and see the part
@@ -658,6 +630,39 @@ htable.close();</programlisting></para>
       <link xlink:href="https://issues.apache.org/jira/browse/HDFS-1599">Umbrella Jira Ticket for HDFS Improvements for HBase</link>.
       </para>
     </section>
+    <section xml:id="perf.hdfs.configs.localread">
+    <title>Leveraging local data</title>
+<para>Since Hadoop 1.0.0 (also 0.22.1, 0.23.1, CDH3u3 and HDP 1.0) via
+<link xlink:href="https://issues.apache.org/jira/browse/HDFS-2246">HDFS-2246</link>,
+it is possible for the DFSClient to take a "short circuit" and
+read directly from disk instead of going through the DataNode when the
+data is local. What this means for HBase is that the RegionServers can
+read directly off their machine's disks instead of having to open a
+socket to talk to the DataNode, the former being generally much
+faster<footnote><para>See JD's <link xlink:href="http://files.meetup.com/1350427/hug_ebay_jdcryans.pdf">Performance Talk</link></para></footnote>.
+Also see <link xlink:href="http://search-hadoop.com/m/zV6dKrLCVh1">HBase, mail # dev - read short circuit</link> thread for
+more discussion around short circuit reads.
+</para>
+<para>To enable "short circuit" reads, you must set two configurations.
+First, the hdfs-site.xml needs to be amended. Set
+the property  <varname>dfs.block.local-path-access.user</varname>
+to be the <emphasis>only</emphasis> user that can use the shortcut.
+This has to be the user that started HBase.  Then in hbase-site.xml,
+set <varname>dfs.client.read.shortcircuit</varname> to be <varname>true</varname>
+</para>
+<para>
+    For optimal performance when short-circuit reads are enabled, it is recommended that HDFS checksums are disabled.
+    To maintain data integrity with HDFS checksums disabled, HBase can be configured to write its own checksums into
+    its datablocks and verify against these. See <xref linkend="hbase.regionserver.checksum.verify" />.
+</para>
+<para>
+The DataNodes need to be restarted in order to pick up the new
+configuration. Be aware that if a process started under another
+username than the one configured here also has the shortcircuit
+enabled, it will get an Exception regarding an unauthorized access but
+the data will still be read.
+</para>
+  </section>
     <section xml:id="perf.hdfs.comp"><title>Performance Comparisons of HBase vs. HDFS</title>
      <para>A fairly common question on the dist-list is why HBase isn't as performant as HDFS files in a batch context (e.g., as
      a MapReduce source or sink).  The short answer is that HBase is doing a lot more than HDFS (e.g., reading the KeyValues,