Posted to commits@hbase.apache.org by st...@apache.org on 2012/11/26 20:26:15 UTC
svn commit: r1413793 - /hbase/trunk/src/docbkx/performance.xml
Author: stack
Date: Mon Nov 26 19:26:15 2012
New Revision: 1413793
URL: http://svn.apache.org/viewvc?rev=1413793&view=rev
Log:
HBASE-7217 Documentation: Update section 11.5.1 to recommend that hbase.regionserver.checksum.verify is set
Modified:
hbase/trunk/src/docbkx/performance.xml
Modified: hbase/trunk/src/docbkx/performance.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/performance.xml?rev=1413793&r1=1413792&r2=1413793&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/performance.xml (original)
+++ hbase/trunk/src/docbkx/performance.xml Mon Nov 26 19:26:15 2012
@@ -208,38 +208,10 @@
</section>
</section>
- <section xml:id="perf.hdfs.configs">
- <title>HDFS Configuration</title>
- <section xml:id="perf.hdfs.configs.localread">
- <title>Leveraging local data</title>
-<para>Since Hadoop 1.0.0 (also 0.22.1, 0.23.1, CDH3u3 and HDP 1.0) via
-<link xlink:href="https://issues.apache.org/jira/browse/HDFS-2246">HDFS-2246</link>,
-it is possible for the DFSClient to take a "short circuit" and
-read directly from disk instead of going through the DataNode when the
-data is local. What this means for HBase is that the RegionServers can
-read directly off their machine's disks instead of having to open a
-socket to talk to the DataNode, the former being generally much
-faster<footnote><para>See JD's <link xlink:href="http://files.meetup.com/1350427/hug_ebay_jdcryans.pdf">Performance Talk</link></para></footnote>.
-Also see <link xlink:href="http://search-hadoop.com/m/zV6dKrLCVh1">HBase, mail # dev - read short circuit</link> thread for
-more discussion around short circuit reads.
-</para>
-<para>To enable "short circuit" reads, you must set two configurations.
-First, the hdfs-site.xml needs to be amended. Set
-the property <varname>dfs.block.local-path-access.user</varname>
-to be the <emphasis>only</emphasis> user that can use the shortcut.
-This has to be the user that started HBase. Then in hbase-site.xml,
-set <varname>dfs.client.read.shortcircuit</varname> to be <varname>true</varname>
-</para>
-<para>
-The DataNodes need to be restarted in order to pick up the new
-configuration. Be aware that if a process started under another
-username than the one configured here also has the shortcircuit
-enabled, it will get an Exception regarding an unauthorized access but
-the data will still be read.
-</para>
- </section>
- </section>
+
+
+
<section xml:id="perf.zookeeper">
<title>ZooKeeper</title>
<para>See <xref linkend="zookeeper"/> for information on configuring ZooKeeper, and see the part
@@ -658,6 +630,39 @@ htable.close();</programlisting></para>
<link xlink:href="https://issues.apache.org/jira/browse/HDFS-1599">Umbrella Jira Ticket for HDFS Improvements for HBase</link>.
</para>
</section>
+ <section xml:id="perf.hdfs.configs.localread">
+ <title>Leveraging local data</title>
+<para>Since Hadoop 1.0.0 (also 0.22.1, 0.23.1, CDH3u3 and HDP 1.0) via
+<link xlink:href="https://issues.apache.org/jira/browse/HDFS-2246">HDFS-2246</link>,
+it is possible for the DFSClient to take a "short circuit" and
+read directly from disk instead of going through the DataNode when the
+data is local. What this means for HBase is that the RegionServers can
+read directly off their machine's disks instead of having to open a
+socket to talk to the DataNode, the former being generally much
+faster<footnote><para>See JD's <link xlink:href="http://files.meetup.com/1350427/hug_ebay_jdcryans.pdf">Performance Talk</link></para></footnote>.
+Also see <link xlink:href="http://search-hadoop.com/m/zV6dKrLCVh1">HBase, mail # dev - read short circuit</link> thread for
+more discussion around short circuit reads.
+</para>
+<para>To enable "short circuit" reads, you must set two configurations.
+First, amend hdfs-site.xml: set
+the property <varname>dfs.block.local-path-access.user</varname>
+to be the <emphasis>only</emphasis> user that can use the shortcut.
+This must be the user that started HBase. Then, in hbase-site.xml,
+set <varname>dfs.client.read.shortcircuit</varname> to <varname>true</varname>.
+</para>
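+<para>As an illustration only, the two settings described above might look like the
+following. Each <literal>property</literal> element goes inside the file's existing
+<literal>configuration</literal> element. The user name <literal>hbase</literal> is an
+assumption; substitute the account that actually runs your RegionServers.</para>
+<programlisting><![CDATA[
+<!-- hdfs-site.xml on every DataNode (assumes HBase runs as user "hbase") -->
+<property>
+  <name>dfs.block.local-path-access.user</name>
+  <value>hbase</value>
+</property>
+
+<!-- hbase-site.xml on every RegionServer (the DFSClient side) -->
+<property>
+  <name>dfs.client.read.shortcircuit</name>
+  <value>true</value>
+</property>
+]]></programlisting>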
+<para>
+ For optimal performance when short-circuit reads are enabled, it is recommended that HDFS checksums be disabled.
+ To maintain data integrity with HDFS checksums disabled, HBase can be configured to write its own checksums into
+ its data blocks and verify reads against them. See <xref linkend="hbase.regionserver.checksum.verify" />.
+</para>
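+<para>A minimal sketch of the HBase side follows. It assumes an HBase release that supports
+<varname>hbase.regionserver.checksum.verify</varname>, and it belongs in hbase-site.xml on the
+RegionServers, inside the existing <literal>configuration</literal> element. How HDFS-level
+checksum verification is then switched off may vary by HBase and Hadoop version. Check the
+property description referenced above rather than treating this fragment as a complete setup.</para>
+<programlisting><![CDATA[
+<!-- hbase-site.xml: have HBase verify its own checksums stored in its data blocks -->
+<property>
+  <name>hbase.regionserver.checksum.verify</name>
+  <value>true</value>
+</property>
+]]></programlisting>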
+<para>
+The DataNodes must be restarted to pick up the new
+configuration. Be aware that if a process running as a different
+user than the one configured here also has short circuit reads
+enabled, it will get an exception about unauthorized access, but
+the data will still be read.
+</para>
+ </section>
<section xml:id="perf.hdfs.comp"><title>Performance Comparisons of HBase vs. HDFS</title>
<para>A fairly common question on the dist-list is why HBase isn't as performant as HDFS files in a batch context (e.g., as
a MapReduce source or sink). The short answer is that HBase is doing a lot more than HDFS (e.g., reading the KeyValues,