Posted to commits@hbase.apache.org by mi...@apache.org on 2014/10/07 08:46:42 UTC
git commit: HBASE-11692 Document how and why to do a manual region split

Repository: hbase
Updated Branches:
  refs/heads/master 3557a3235 -> a3b65c45a


HBASE-11692 Document how and why to do a manual region split

Incorporated Stack's feedback


Project: http://git-wip-us.apache.org/repos/asf/hbase/repo
Commit: http://git-wip-us.apache.org/repos/asf/hbase/commit/a3b65c45
Tree: http://git-wip-us.apache.org/repos/asf/hbase/tree/a3b65c45
Diff: http://git-wip-us.apache.org/repos/asf/hbase/diff/a3b65c45

Branch: refs/heads/master
Commit: a3b65c45ad3c55fc5ce12e6d69701a2bcd84f055
Parents: 3557a32
Author: Misty Stanley-Jones <ms...@cloudera.com>
Authored: Thu Oct 2 09:21:57 2014 +1000
Committer: Misty Stanley-Jones <ms...@cloudera.com>
Committed: Tue Oct 7 16:46:31 2014 +1000

----------------------------------------------------------------------
 src/main/docbkx/book.xml          | 86 ++++++++++++++++++++++++++++++++++
 src/main/docbkx/configuration.xml |  4 +-
 src/main/docbkx/ops_mgt.xml       |  4 +-
 src/main/docbkx/performance.xml   |  6 +--
 4 files changed, 94 insertions(+), 6 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hbase/blob/a3b65c45/src/main/docbkx/book.xml
----------------------------------------------------------------------
diff --git a/src/main/docbkx/book.xml b/src/main/docbkx/book.xml
index b2b4c78..eea00d6 100644
--- a/src/main/docbkx/book.xml
+++ b/src/main/docbkx/book.xml
@@ -3298,6 +3298,92 @@ myHtd.setValue(HTableDescriptor.SPLIT_POLICY, MyCustomSplitPolicy.class.getName(
         </section>
       </section>
 
+      <section xml:id="manual_region_splitting_decisions">
+        <title>Manual Region Splitting</title>
+        <para>You can split a table manually, either at table creation (pre-splitting) or
+          later as an administrative action. You might choose to split a table or region manually
+          for one or more of the following reasons. There may be other valid reasons, but the need
+          to split manually might also point to problems with your schema design.</para>
+        <itemizedlist>
+          <title>Reasons to Manually Split Your Table</title>
+          <listitem>
+            <para>Your rowkeys are time-based or otherwise monotonically increasing, so new data
+              always sorts to the end of the table. This means that the Region Server holding the
+              last region is always under load, and the other Region Servers are idle, or mostly
+              idle. See also <xref linkend="timeseries"/>.</para>
+          </listitem>
+          <listitem>
+            <para>You have developed an unexpected hotspot in one region of your table. For
+              instance, an application which tracks web searches might be inundated by a lot of
+              searches for a celebrity in the event of news about that celebrity. See <xref
+                linkend="perf.one.region"/> for more discussion about this particular
+              scenario.</para>
+          </listitem>
+          <listitem>
+            <para>You have added a large number of Region Servers to your cluster and want to
+              spread the load across them quickly.</para>
+          </listitem>
+          <listitem>
+            <para>You are about to run a bulk load that is likely to cause unusual and uneven load
+              across regions.</para>
+          </listitem>
+        </itemizedlist>
+        <para>See <xref linkend="disable.splitting"/> for a discussion about the dangers and
+          possible benefits of managing splitting completely manually.</para>
+        <section>
+          <title>Determining Split Points</title>
+          <para>The goal of splitting your table manually is to improve the chances of balancing the
+            load across the cluster in situations where good rowkey design alone won't get you
+            there. Keeping that in mind, the way you split your regions is very dependent upon the
+            characteristics of your data. It may be that you already know the best way to split your
+            table. If not, the way you split your table depends on what your keys are like.</para>
+          <variablelist>
+            <varlistentry>
+              <term>Alphanumeric Rowkeys</term>
+              <listitem>
+                <para>If your rowkeys start with a letter or number, you can split your table at
+                  letter or number boundaries. For instance, the following command creates a table
+                  split at each lowercase vowel. Five split points produce six regions: keys that
+                  sort before <literal>a</literal>, then a-d, e-h, i-n, o-t, and finally keys
+                  starting at <literal>u</literal> or later.</para>
+                <screen>hbase> create 'test_table', 'f1', SPLITS => ['a', 'e', 'i', 'o', 'u']</screen>
+                <para>The following command splits an existing table at split point '2'.</para>
+                <screen>hbase> split 'test_table', '2'</screen>
+                <para>You can also split a specific region by referring to its name. You can find the
+                  region name by looking at either the table or region in the Web UI. It will be a
+                  long string such as
+                    <literal>t2,1,1410227759524.829850c6eaba1acc689480acd8f081bd.</literal>. The
+                  format is <replaceable>table_name,start_key,region_id</replaceable>. To split that
+                  region into two, as close to equally as possible (at the nearest row boundary),
+                  issue the following command.</para>
+                <screen>hbase> split 't2,1,1410227759524.829850c6eaba1acc689480acd8f081bd.'</screen>
+                <para>The split key is optional. If it is omitted, the table or region is split in
+                  half.</para>
+                <para>The following example shows how to use the RegionSplitter to create 10
+                  regions, split at hexadecimal values.</para>
+                <screen>hbase org.apache.hadoop.hbase.util.RegionSplitter test_table HexStringSplit -c 10 -f f1</screen>
+              </listitem>
+            </varlistentry>
+            <varlistentry>
+              <term>Using a Custom Algorithm</term>
+              <listitem>
+                <para>The RegionSplitter tool is provided with HBase, and uses a <firstterm><link
+                      xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/RegionSplitter.SplitAlgorithm.html"
+                      >SplitAlgorithm</link></firstterm> to determine split points for you. As
+                  parameters, you give it the algorithm, desired number of regions, and column
+                  families. It includes two split algorithms. The first is the <code><link
+                      xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/RegionSplitter.HexStringSplit.html"
+                      >HexStringSplit</link></code> algorithm, which assumes the row keys are
+                  hexadecimal strings. The second, <code><link
+                      xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/RegionSplitter.UniformSplit.html"
+                      >UniformSplit</link></code>, assumes the row keys are random byte arrays. You
+                  will probably need to develop your own SplitAlgorithm, using the provided ones as
+                  models.</para>
+              </listitem>
+            </varlistentry>
+          </variablelist>
+        </section>
+      </section>
        <section>
         <title>Online Region Merges</title>
 

http://git-wip-us.apache.org/repos/asf/hbase/blob/a3b65c45/src/main/docbkx/configuration.xml
----------------------------------------------------------------------
diff --git a/src/main/docbkx/configuration.xml b/src/main/docbkx/configuration.xml
index 0af2b3c..aec8a00 100644
--- a/src/main/docbkx/configuration.xml
+++ b/src/main/docbkx/configuration.xml
@@ -1355,7 +1355,9 @@ export HBASE_HEAPSIZE=4096
             <varname>hbase.hregion.max.filesize</varname>,
             <varname>hbase.regionserver.regionSplitLimit</varname>. A simplistic view of splitting
           is that when a region grows to <varname>hbase.hregion.max.filesize</varname>, it is split.
-          For most use patterns, most of the time, you should use automatic splitting.</para>
+          For most use patterns, most of the time, you should use automatic splitting. See <xref
+            linkend="manual_region_splitting_decisions"/> for more information about manual region
+          splitting.</para>
         <para>Instead of allowing HBase to split your regions automatically, you can choose to
           manage the splitting yourself. This feature was added in HBase 0.90.0. Manually managing
           splits works if you know your keyspace well, otherwise let HBase figure where to split for you.

http://git-wip-us.apache.org/repos/asf/hbase/blob/a3b65c45/src/main/docbkx/ops_mgt.xml
----------------------------------------------------------------------
diff --git a/src/main/docbkx/ops_mgt.xml b/src/main/docbkx/ops_mgt.xml
index 1f83a15..ea7883b 100644
--- a/src/main/docbkx/ops_mgt.xml
+++ b/src/main/docbkx/ops_mgt.xml
@@ -2227,8 +2227,8 @@ hbase> restore_snapshot 'myTableSnapshot-122112'
           pre-split 1 region per RS at most), especially if you don't know how much each table will
           grow. If you split too much, you may end up with too many regions, with some tables having
           too many small regions.</para>
-        <para>For pre-splitting howto, see <xref
-            linkend="precreate.regions" />.</para>
+        <para>For pre-splitting howto, see <xref linkend="manual_region_splitting_decisions"/> and
+            <xref linkend="precreate.regions"/>.</para>
       </section>
       <!-- ops.capacity.config.presplit -->
     </section>

http://git-wip-us.apache.org/repos/asf/hbase/blob/a3b65c45/src/main/docbkx/performance.xml
----------------------------------------------------------------------
diff --git a/src/main/docbkx/performance.xml b/src/main/docbkx/performance.xml
index e7c0fc7..59287ee 100644
--- a/src/main/docbkx/performance.xml
+++ b/src/main/docbkx/performance.xml
@@ -682,9 +682,9 @@ admin.createTable(table, startKey, endKey, numberOfRegions);
 byte[][] splits = ...;   // create your own splits
 admin.createTable(table, splits);
 </programlisting>
-      <para> See <xref
-          linkend="rowkey.regionsplits" /> for issues related to understanding your keyspace and
-        pre-creating regions. </para>
+      <para> See <xref linkend="rowkey.regionsplits"/> for issues related to understanding your
+        keyspace and pre-creating regions. See <xref linkend="manual_region_splitting_decisions"/>
+        for discussion on manually pre-splitting regions.</para>
     </section>
     <section
       xml:id="def.log.flush">
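
The `byte[][] splits = ...` placeholder in the performance.xml context above leaves the
split points to the reader. As a minimal sketch, and not the code RegionSplitter itself
uses, the following Java computes evenly spaced, zero-padded hexadecimal split keys. It
assumes that rowkeys are fixed-width lowercase hex strings. The class name HexSplitPoints
and the method splitPoints are hypothetical names chosen for illustration.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

public class HexSplitPoints {

    /**
     * Returns numRegions - 1 split keys that divide the keyspace of
     * fixed-width hex strings (hexDigits characters wide) into
     * numRegions roughly equal ranges.
     * Assumption: rowkeys are lowercase, zero-padded hex strings.
     */
    static byte[][] splitPoints(int numRegions, int hexDigits) {
        if (numRegions < 2 || hexDigits < 1) {
            throw new IllegalArgumentException("need at least 2 regions and 1 hex digit");
        }
        // Size of the keyspace: 16^hexDigits == 2^(4 * hexDigits).
        BigInteger keyspace = BigInteger.ONE.shiftLeft(4 * hexDigits);
        BigInteger regions = BigInteger.valueOf(numRegions);
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            // Boundary i sits at i/numRegions of the way through the keyspace.
            BigInteger boundary = keyspace.multiply(BigInteger.valueOf(i)).divide(regions);
            // Zero-pad so that string order matches numeric order.
            String key = String.format("%0" + hexDigits + "x", boundary);
            splits[i - 1] = key.getBytes(StandardCharsets.US_ASCII);
        }
        return splits;
    }

    public static void main(String[] args) {
        // Print the split keys for 4 regions over 8-digit hex rowkeys.
        for (byte[] split : splitPoints(4, 8)) {
            System.out.println(new String(split, StandardCharsets.US_ASCII));
        }
    }
}
```

For 4 regions and 8 hex digits the split keys are 40000000, 80000000, and c0000000. The
returned array could be passed as the splits argument of admin.createTable(table, splits),
as in the snippet shown in the hunk above.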