Posted to commits@hbase.apache.org by en...@apache.org on 2014/12/03 06:53:26 UTC

[3/9] hbase git commit: Blanket update of src/main/docbkx from master

http://git-wip-us.apache.org/repos/asf/hbase/blob/48d9d27d/src/main/docbkx/performance.xml
----------------------------------------------------------------------
diff --git a/src/main/docbkx/performance.xml b/src/main/docbkx/performance.xml
index 689b26f..1757d3f 100644
--- a/src/main/docbkx/performance.xml
+++ b/src/main/docbkx/performance.xml
@@ -182,6 +182,8 @@
           save a bit of YGC churn and allocate in the old gen directly. </para>
         <para>For more information about GC logs, see <xref
             linkend="trouble.log.gc" />. </para>
+    <para>Consider also enabling the offheap Block Cache. This has been shown to mitigate
+        GC pause times. See <xref linkend="block.cache" />.</para>
       </section>
     </section>
   </section>
@@ -627,7 +629,7 @@ hbase> <userinput>create 'mytable',{NAME => 'colfam1', BLOOMFILTER => 'ROWCOL'}<
       <title>Constants</title>
       <para>When people get started with HBase they have a tendency to write code that looks like
         this:</para>
-      <programlisting>
+      <programlisting language="java">
 Get get = new Get(rowkey);
 Result r = htable.get(get);
 byte[] b = r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("attr"));  // returns current version of value
@@ -635,7 +637,7 @@ byte[] b = r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("attr"));  // returns c
       <para>But especially inside loops (and MapReduce jobs), repeatedly converting the
         columnFamily and column names to byte arrays is surprisingly expensive. It's better to use
         constants for the byte-arrays, like this:</para>
-      <programlisting>
+      <programlisting language="java">
 public static final byte[] CF = "cf".getBytes();
 public static final byte[] ATTR = "attr".getBytes();
 ...
@@ -669,14 +671,14 @@ byte[] b = r.getValue(CF, ATTR);  // returns current version of value
       <para>There are two different approaches to pre-creating splits. The first approach is to rely
         on the default <code>HBaseAdmin</code> strategy (which is implemented in
           <code>Bytes.split</code>)... </para>
-      <programlisting>
-byte[] startKey = ...;   	// your lowest keuy
+      <programlisting language="java">
+byte[] startKey = ...;   	// your lowest key
 byte[] endKey = ...;   		// your highest key
 int numberOfRegions = ...;	// # of regions to create
 admin.createTable(table, startKey, endKey, numberOfRegions);
       </programlisting>
       <para>And the other approach is to define the splits yourself... </para>
-      <programlisting>
+      <programlisting language="java">
 byte[][] splits = ...;   // create your own splits
 admin.createTable(table, splits);
 </programlisting>
@@ -829,7 +831,7 @@ admin.createTable(table, splits);
          <code>Scan.HINT_LOOKAHEAD</code> can be set on the Scan object. The following code
         instructs the RegionServer to attempt two iterations of next before a seek is
         scheduled:</para>
-      <programlisting>
+      <programlisting language="java">
 Scan scan = new Scan();
 scan.addColumn(...);
 scan.setAttribute(Scan.HINT_LOOKAHEAD, Bytes.toBytes(2));
@@ -854,7 +856,7 @@ table.getScanner(scan);
           xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/ResultScanner.html">ResultScanners</link>
        you can cause problems on the RegionServers. Always have ResultScanner processing enclosed
        in try/finally blocks so the scanner is always closed...</para>
-      <programlisting>
+      <programlisting language="java">
 Scan scan = new Scan();
 // set attrs...
 ResultScanner rs = htable.getScanner(scan);
@@ -878,6 +880,8 @@ htable.close();
           <methodname>setCacheBlocks</methodname> method. For input Scans to MapReduce jobs, this
         should be <varname>false</varname>. For frequently accessed rows, it is advisable to use the
         block cache.</para>
+
+    <para>Cache more data by moving your Block Cache offheap. See <xref linkend="offheap.blockcache" />.</para>
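+    <para>As a minimal sketch (assuming an existing <code>htable</code> reference, with
+        illustrative values), a MapReduce input Scan would disable block caching like this:</para>
+    <programlisting language="java">
+Scan scan = new Scan();
+scan.setCaching(500);        // an illustrative caching value for a large sequential scan
+scan.setCacheBlocks(false);  // don't pollute the block cache with one-time scan data
+ResultScanner rs = htable.getScanner(scan);
+</programlisting>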
     </section>
     <section
       xml:id="perf.hbase.client.rowkeyonly">
@@ -984,6 +988,58 @@ htable.close();
       </section>
     </section>
     <!--  bloom  -->
+    <section>
+      <title>Hedged Reads</title>
+      <para>Hedged reads are a feature of HDFS, introduced in <link
+          xlink:href="https://issues.apache.org/jira/browse/HDFS-5776">HDFS-5776</link>. Normally, a
+        single thread is spawned for each read request. However, if hedged reads are enabled, the
+        client waits some configurable amount of time, and if the read does not return, the client
+        spawns a second read request, against a different block replica of the same data. Whichever
+        read returns first is used, and the other read request is discarded. Hedged reads can be
+        helpful when a rare slow read is caused by a transient error such as a failing
+        disk or a flaky network connection.</para>
+      <para>Because an HBase RegionServer is an HDFS client, you can enable hedged reads in HBase
+        by adding the following properties to the RegionServer's <filename>hbase-site.xml</filename>
+        and tuning the values to suit your environment.</para>
+      <itemizedlist>
+        <title>Configuration for Hedged Reads</title>
+        <listitem>
+          <para><code>dfs.client.hedged.read.threadpool.size</code> - the number of threads
+            dedicated to servicing hedged reads. If this is set to 0 (the default), hedged reads are
+            disabled.</para>
+        </listitem>
+        <listitem>
+          <para><code>dfs.client.hedged.read.threshold.millis</code> - the number of milliseconds to
+            wait before spawning a second read thread.</para>
+        </listitem>
+      </itemizedlist>
+      <example>
+        <title>Hedged Reads Configuration Example</title>
+        <screen><![CDATA[<property>
+  <name>dfs.client.hedged.read.threadpool.size</name>
+  <value>20</value>  <!-- 20 threads -->
+</property>
+<property>
+  <name>dfs.client.hedged.read.threshold.millis</name>
+  <value>10</value>  <!-- 10 milliseconds -->
+</property>]]></screen>
+      </example>
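+      <para>As a minimal sketch (illustrative values only), the same two properties can also be
+        set programmatically on a client-side <code>Configuration</code> before the client is
+        created:</para>
+      <programlisting language="java">
+Configuration conf = HBaseConfiguration.create();
+// number of threads dedicated to servicing hedged reads; 0 (the default) disables the feature
+conf.setInt("dfs.client.hedged.read.threadpool.size", 20);
+// milliseconds to wait before spawning the second, hedged read
+conf.setInt("dfs.client.hedged.read.threshold.millis", 10);
+</programlisting>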
+      <para>Use the following metrics to tune the settings for hedged reads on
+        your cluster. See <xref linkend="hbase_metrics"/> for more information.</para>
+      <itemizedlist>
+        <title>Metrics for Hedged Reads</title>
+        <listitem>
+          <para><code>hedgedReadOps</code> - the number of times hedged read threads have been
+            triggered. This could indicate that read requests are often slow, or that hedged
+            reads are triggered too quickly.</para>
+        </listitem>
+        <listitem>
+          <para><code>hedgedReadOpsWin</code> - the number of times the hedged read thread was
+            faster than the original thread. This could indicate that a given RegionServer is
+            having trouble servicing requests.</para>
+        </listitem>
+      </itemizedlist>
+    </section>
 
   </section>
   <!--  reading -->
@@ -1052,7 +1108,7 @@ htable.close();
           shortcircuit reads configuration page</link> for how to enable the latter, better version
        of shortcircuit. For example, here is a minimal configuration enabling short-circuit
        reads, added to <filename>hbase-site.xml</filename>: </para>
-      <programlisting><![CDATA[<property>
+      <programlisting language="xml"><![CDATA[<property>
   <name>dfs.client.read.shortcircuit</name>
   <value>true</value>
   <description>

http://git-wip-us.apache.org/repos/asf/hbase/blob/48d9d27d/src/main/docbkx/preface.xml
----------------------------------------------------------------------
diff --git a/src/main/docbkx/preface.xml b/src/main/docbkx/preface.xml
index ff8efb9..a8f6895 100644
--- a/src/main/docbkx/preface.xml
+++ b/src/main/docbkx/preface.xml
@@ -39,15 +39,29 @@
             xlink:href="http://wiki.apache.org/hadoop/Hbase">wiki</link> where the pertinent
         information can be found.</para>
 
-    <para>This reference guide is a work in progress. The source for this guide can be found at
-            <filename>src/main/docbkx</filename> in a checkout of the hbase project. This reference
-        guide is marked up using <link
-            xlink:href="http://www.docbook.com/">DocBook</link> from which the the finished guide is
-        generated as part of the 'site' build target. Run <programlisting>mvn site</programlisting>
-        to generate this documentation. Amendments and improvements to the documentation are
-        welcomed. Add a patch to an issue up in the HBase <link
-            xlink:href="https://issues.apache.org/jira/browse/HBASE">JIRA</link>.</para>
-
+    <formalpara>
+        <title>About This Guide</title>
+        <para>This reference guide is a work in progress. The source for this guide can be found in
+            the <filename>src/main/docbkx</filename> directory of the HBase source. This reference
+            guide is marked up using <link xlink:href="http://www.docbook.org/">DocBook</link> from
+            which the finished guide is generated as part of the 'site' build target. Run
+            <programlisting language="bourne">mvn site</programlisting> to generate this documentation. Amendments and
+            improvements to the documentation are welcomed. Click <link
+                xlink:href="https://issues.apache.org/jira/secure/CreateIssueDetails!init.jspa?pid=12310753&amp;issuetype=1&amp;components=12312132&amp;summary=SHORT+DESCRIPTION"
+                >this link</link> to file a new documentation bug against Apache HBase with some
+            values pre-selected.</para>
+    </formalpara>
+    <formalpara>
+        <title>Contributing to the Documentation</title>
+        <para>For an overview of Docbook and suggestions to get started contributing to the documentation, see <xref linkend="appendix_contributing_to_documentation" />.</para>
+    </formalpara>
+    <formalpara>
+        <title>Providing Feedback</title>
+        <para>This guide allows you to leave comments or questions on any page, using Disqus. Look
+            for the Comments area at the bottom of the page. Answering these questions is a
+            volunteer effort, and may be delayed.</para>
+    </formalpara>
+    
     <note
         xml:id="headsup">
         <title>Heads-up if this is your first foray into the world of distributed

http://git-wip-us.apache.org/repos/asf/hbase/blob/48d9d27d/src/main/docbkx/schema_design.xml
----------------------------------------------------------------------
diff --git a/src/main/docbkx/schema_design.xml b/src/main/docbkx/schema_design.xml
index 614dab7..65e64b0 100644
--- a/src/main/docbkx/schema_design.xml
+++ b/src/main/docbkx/schema_design.xml
@@ -44,7 +44,7 @@
         xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html">HBaseAdmin</link>
       in the Java API. </para>
     <para>Tables must be disabled when making ColumnFamily modifications, for example:</para>
-    <programlisting>
+    <programlisting language="java">
 Configuration config = HBaseConfiguration.create();
 HBaseAdmin admin = new HBaseAdmin(config);
 String table = "myTable";
@@ -280,7 +280,7 @@ d-foo0002
           in those eight bytes. If you stored this number as a String -- presuming a byte per
           character -- you need nearly 3x the bytes. </para>
         <para>Not convinced? Below is some sample code that you can run on your own.</para>
-        <programlisting>
+        <programlisting language="java">
 // long
 //
 long l = 1234567890L;
@@ -403,7 +403,7 @@ COLUMN                                        CELL
         are accessible in the keyspace. </para>
       <para>To conclude this example, the following is an example of how appropriate splits can be
         pre-created for hex-keys: </para>
-      <programlisting><![CDATA[public static boolean createTable(HBaseAdmin admin, HTableDescriptor table, byte[][] splits)
+      <programlisting language="java"><![CDATA[public static boolean createTable(HBaseAdmin admin, HTableDescriptor table, byte[][] splits)
 throws IOException {
   try {
     admin.createTable( table, splits );
@@ -439,18 +439,15 @@ public static byte[][] getHexSplits(String startKey, String endKey, int numRegio
       xml:id="schema.versions.max">
       <title>Maximum Number of Versions</title>
       <para>The maximum number of row versions to store is configured per column family via <link
-          xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html"
-          >HColumnDescriptor</link>. The default for max versions is 3 prior to HBase 0.96.x, and 1
-        in newer versions. This is an important parameter because as described in <xref
-          linkend="datamodel"/> section HBase does <emphasis>not</emphasis> overwrite row values,
+          xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link>.
+        The default for max versions is 1. This is an important parameter because, as described
+        in the <xref linkend="datamodel" /> section, HBase does <emphasis>not</emphasis> overwrite row values,
         but rather stores different values per row by time (and qualifier). Excess versions are
         removed during major compactions. The number of max versions may need to be increased or
         decreased depending on application needs. </para>
       <para>Setting the number of max versions to an exceedingly high level (e.g., hundreds or
         more) is not recommended unless those old values are very dear to you, because this will
         greatly increase StoreFile size. </para>
-      <para>See <xref linkend="specify.number.of.versions"/> for examples for setting the maximum
-        number of versions on a given column or globally.</para>
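+      <para>As a minimal sketch (assuming an existing <code>admin</code> connection and that the
+        table has been disabled), the maximum number of versions for a column family could be
+        raised like this:</para>
+      <programlisting language="java">
+HColumnDescriptor cf = new HColumnDescriptor("colfam1");
+cf.setMaxVersions(5);               // keep up to five versions of each cell
+admin.modifyColumn("myTable", cf);  // apply the change to the existing table
+</programlisting>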
     </section>
     <section
       xml:id="schema.minversions">
@@ -465,8 +462,6 @@ public static byte[][] getHexSplits(String startKey, String endKey, int numRegio
           around</emphasis>" (where M is the value for minimum number of row versions, M&lt;N). This
         parameter should only be set when time-to-live is enabled for a column family and must be
         less than the number of row versions. </para>
-      <para>See <xref linkend="specify.number.of.versions"/> for examples for setting the minimum
-        number of versions on a given column.</para>
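+      <para>A minimal sketch (again assuming an existing <code>admin</code> connection and a
+        disabled table) that combines time-to-live with a minimum number of row versions:</para>
+      <programlisting language="java">
+HColumnDescriptor cf = new HColumnDescriptor("colfam1");
+cf.setTimeToLive(60 * 60 * 24);     // expire values older than one day (TTL is in seconds)
+cf.setMinVersions(1);               // but always keep at least one version around
+admin.modifyColumn("myTable", cf);  // apply the change to the existing table
+</programlisting>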
     </section>
   </section>
   <section
@@ -700,7 +695,7 @@ HColumnDescriptor.setKeepDeletedCells(true);
           timestamps, by performing a mod operation on the timestamp. If time-oriented scans are
           important, this could be a useful approach. Attention must be paid to the number of
           buckets, because this will require the same number of scans to return results.</para>
-        <programlisting>
+        <programlisting language="java">
 long bucket = timestamp % numBuckets;
         </programlisting>
         <para>… to construct:</para>
@@ -1161,13 +1156,13 @@ long bucket = timestamp % numBuckets;
 ]]></programlisting>
 
       <para>The other option we had was to do this entirely using:</para>
-      <programlisting><![CDATA[
+      <programlisting language="xml"><![CDATA[
 <FixedWidthUserName><FixedWidthPageNum0>:<FixedWidthLength><FixedIdNextPageNum><ValueId1><ValueId2><ValueId3>...
 <FixedWidthUserName><FixedWidthPageNum1>:<FixedWidthLength><FixedIdNextPageNum><ValueId1><ValueId2><ValueId3>...
     		]]></programlisting>
       <para> where each row would contain multiple values. So in one case reading the first thirty
         values would be: </para>
-      <programlisting><![CDATA[
+      <programlisting language="java"><![CDATA[
 scan { STARTROW => 'FixedWidthUsername' LIMIT => 30}
     		]]></programlisting>
       <para>And in the second case it would be </para>