Posted to commits@hbase.apache.org by dm...@apache.org on 2011/12/14 20:14:10 UTC

svn commit: r1214412 - /hbase/trunk/src/docbkx/book.xml

Author: dmeil
Date: Wed Dec 14 19:14:10 2011
New Revision: 1214412

URL: http://svn.apache.org/viewvc?rev=1214412&view=rev
Log:
hbase-5028 book.xml - adding info on region assignment and file locality

Modified:
    hbase/trunk/src/docbkx/book.xml

Modified: hbase/trunk/src/docbkx/book.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/book.xml?rev=1214412&r1=1214411&r2=1214412&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/book.xml (original)
+++ hbase/trunk/src/docbkx/book.xml Wed Dec 14 19:14:10 2011
@@ -1554,6 +1554,8 @@ scan.setFilter(filter);
            <para>Periodically, and when there are not any regions in transition,
              a load balancer will run and move regions around to balance cluster load.
              See <xref linkend="balancer_config" /> for configuring this property.</para>
+             <para>See <xref linkend="regions.arch.assignment"/> for more information on region assignment.
+             </para>
          </section>
          <section xml:id="master.processes.catalog"><title>CatalogJanitor</title>
            <para>Periodically checks and cleans up the .META. table.  See <xref linkend="arch.catalog.meta" /> for more information on META.</para>
@@ -1714,6 +1716,90 @@ scan.setFilter(filter);
       </para>
     </section>
 
+      <section xml:id="regions.arch.assignment">
+        <title>Region-RegionServer Assignment</title>
+        <para>This section describes how Regions are assigned to RegionServers.
+         </para>
+
+        <section xml:id="regions.arch.assignment.startup">
+          <title>Startup</title>
+          <para>When HBase starts, regions are assigned as follows (short version):
+           </para>
+            <orderedlist>
+              <listitem>
+                <para>The Master invokes the <code>AssignmentManager</code> upon startup.</para>
+              </listitem>
+              <listitem>
+                <para>The <code>AssignmentManager</code> looks at the existing region assignments
+                in META.</para>
+              </listitem>
+              <listitem>
+                <para>If the region assignment is still valid (i.e., if the RegionServer is still online),
+                then the assignment is kept.
+                </para>
+              </listitem>
+              <listitem>
+                <para>If the assignment is invalid, then the <code>LoadBalancerFactory</code> is invoked to assign the 
+                region.  The <code>DefaultLoadBalancer</code> will randomly assign the region to a RegionServer. 
+                </para>
+              </listitem>
+      </orderedlist>
+        
+        </section>
+
+        <section xml:id="regions.arch.assignment.failover">
+          <title>Failover</title>
+          <para>When a RegionServer fails (short version):
+           </para>
+            <orderedlist>
+              <listitem>
+                <para>The regions immediately become unavailable because the RegionServer is down.</para>
+              </listitem>
+              <listitem>
+                <para>The Master will detect that the RegionServer has failed.</para>
+              </listitem>
+              <listitem>
+                <para>The region assignments will be considered invalid, and the regions will be
+                re-assigned just as in the startup sequence.
+                </para>
+              </listitem>
+            </orderedlist>
+        
+        </section>
+
+        <section xml:id="regions.arch.balancer">
+          <title>Region Load Balancing</title>
+          <para>
+          Regions can be periodically moved by the <xref linkend="master.processes.loadbalancer" />.
+          </para>
+        </section>
+
+      </section>  <!--  assignment -->
+
+      <section xml:id="regions.arch.locality">
+        <title>Region-RegionServer Locality</title>
+        <para>Over time, Region-RegionServer locality is achieved via an aspect of
+        HDFS block replication.  When choosing where to write its replicas, the HDFS client
+        by default does as follows:
+           <orderedlist>
+             <listitem><para>First replica is written to the local node</para>
+             </listitem>
+             <listitem><para>Second replica to another node in the same rack</para>
+             </listitem>
+             <listitem><para>Third replica to a node in another rack (if sufficient nodes)</para>
+             </listitem>
+           </orderedlist>
+          HBase eventually achieves locality for a region after a flush or a compaction.
+          In a RegionServer failover situation, a RegionServer may be assigned regions with non-local
+          StoreFiles (i.e., none of the replicas are local).  However, as new data is written
+          in the region, or the table is compacted and StoreFiles are re-written, they will eventually
+          become "local" to the RegionServer.
+        </para>
+        <para>For more information, see <link xlink:href="http://hadoop.apache.org/common/docs/r0.20.205.0/hdfs_design.html#Replica+Placement%3A+The+First+Baby+Steps">HDFS Design on Replica Placement</link>
+        and also Lars George's blog on <link xlink:href="http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html">HBase and HDFS locality</link>.      
+        </para>
+      </section>
+
       <section>
         <title>Region Splits</title>
 
@@ -1725,15 +1811,6 @@ scan.setFilter(filter);
         splits (and for why you might do this)</para>
       </section>
 
-      <section xml:id="regions.arch.balancer">
-        <title>Region Load Balancing</title>
-
-        <para>
-        Regions can be periodically moved by the <xref linkend="master.processes.loadbalancer" />.
-        </para>
-       
-      </section>
-
       <section xml:id="store">
           <title>Store</title>
           <para>A Store hosts a MemStore and 0 or more StoreFiles (HFiles). A Store corresponds to a column family for a table for a given region.
@@ -2729,13 +2806,15 @@ Comparator class used for Bloom filter k
          <para><link xlink:href="http://www.slideshare.net/cloudera/hw09-practical-h-base-getting-the-most-from-your-h-base-install">Getting The Most From Your HBase Install</link> by Ryan Rawson, Jonathan Gray (Hadoop World 2009).
          </para>
        </section>
-       <section xml:id="other.info.papers"><title>Papers</title>
+       <section xml:id="other.info.papers"><title>HBase Papers</title>
          <para><link xlink:href="http://research.google.com/archive/bigtable.html">BigTable</link> by Google (2006).
          </para>
+         <para><link xlink:href="http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html">HBase and HDFS Locality</link> by Lars George (2010).
+         </para>
          <para><link xlink:href="http://ianvarley.com/UT/MR/Varley_MastersReport_Full_2009-08-07.pdf">No Relation: The Mixed Blessings of Non-Relational Databases</link> by Ian Varley (2009).
          </para>
        </section>
-       <section xml:id="other.info.sites"><title>Sites</title>
+       <section xml:id="other.info.sites"><title>HBase Sites</title>
          <para><link xlink:href="http://www.cloudera.com/blog/category/hbase/">Cloudera's HBase Blog</link> has a lot of links to useful HBase information.
 		<itemizedlist>
 			<listitem><link xlink:href="http://www.cloudera.com/blog/2010/04/cap-confusion-problems-with-partition-tolerance/">CAP Confusion</link> is a relevant entry for background information on
@@ -2746,10 +2825,15 @@ Comparator class used for Bloom filter k
          <para><link xlink:href="http://wiki.apache.org/hadoop/HBase/HBasePresentations">HBase Wiki</link> has a page with a number of presentations.
          </para>
        </section>
-       <section xml:id="other.info.books"><title>Books</title>
+       <section xml:id="other.info.books"><title>HBase Books</title>
          <para><link xlink:href="http://shop.oreilly.com/product/0636920014348.do">HBase:  The Definitive Guide</link> by Lars George.
          </para>
        </section>
+       <section xml:id="other.info.books.hadoop"><title>Hadoop Books</title>
+         <para><link xlink:href="http://shop.oreilly.com/product/9780596521981.do">Hadoop:  The Definitive Guide</link> by Tom White.
+         </para>
+       </section>
+       
   </appendix>
 
   <appendix xml:id="asf" ><title>HBase and the Apache Software Foundation</title>