Posted to commits@hbase.apache.org by dm...@apache.org on 2011/12/14 20:14:10 UTC
svn commit: r1214412 - /hbase/trunk/src/docbkx/book.xml
Author: dmeil
Date: Wed Dec 14 19:14:10 2011
New Revision: 1214412
URL: http://svn.apache.org/viewvc?rev=1214412&view=rev
Log:
hbase-5028 book.xml - adding info on region assignment and file locality
Modified:
hbase/trunk/src/docbkx/book.xml
Modified: hbase/trunk/src/docbkx/book.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/book.xml?rev=1214412&r1=1214411&r2=1214412&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/book.xml (original)
+++ hbase/trunk/src/docbkx/book.xml Wed Dec 14 19:14:10 2011
@@ -1554,6 +1554,8 @@ scan.setFilter(filter);
<para>Periodically, and when there are not any regions in transition,
a load balancer will run and move regions around to balance cluster load.
See <xref linkend="balancer_config" /> for configuring this property.</para>
+ <para>See <xref linkend="regions.arch.assignment"/> for more information on region assignment.
+ </para>
</section>
<section xml:id="master.processes.catalog"><title>CatalogJanitor</title>
<para>Periodically checks and cleans up the .META. table. See <xref linkend="arch.catalog.meta" /> for more information on META.</para>
@@ -1714,6 +1716,90 @@ scan.setFilter(filter);
</para>
</section>
+ <section xml:id="regions.arch.assignment">
+ <title>Region-RegionServer Assignment</title>
+ <para>This section describes how Regions are assigned to RegionServers.
+ </para>
+
+ <section xml:id="regions.arch.assignment.startup">
+ <title>Startup</title>
+ <para>When HBase starts, regions are assigned as follows (short version):
+ </para>
+ <orderedlist>
+ <listitem>
+ <para>The Master invokes the <code>AssignmentManager</code> upon startup.</para>
+ </listitem>
+ <listitem>
+ <para>The <code>AssignmentManager</code> looks at the existing region assignments
+ in META.</para>
+ </listitem>
+ <listitem>
+ <para>If the region assignment is still valid (i.e., if the RegionServer is still online),
+ then the assignment is kept.
+ </para>
+ </listitem>
+ <listitem>
+ <para>If the assignment is invalid, then the <code>LoadBalancerFactory</code> is invoked to assign the
+ region. The <code>DefaultLoadBalancer</code> will randomly assign the region to a RegionServer.
+ </para>
+ </listitem>
+ </orderedlist>
+
+ </section>
+
+ <section xml:id="regions.arch.assignment.failover">
+ <title>Failover</title>
+ <para>When a RegionServer fails (short version):
+ </para>
+ <orderedlist>
+ <listitem>
+ <para>The regions immediately become unavailable because the RegionServer is down.</para>
+ </listitem>
+ <listitem>
+ <para>The Master will detect that the RegionServer has failed.</para>
+ </listitem>
+ <listitem>
+ <para>The region assignments will be considered invalid, and the regions will be re-assigned just
+ as in the startup sequence.
+ </para>
+ </listitem>
+ </orderedlist>
+
+ </section>
+
+ <section xml:id="regions.arch.balancer">
+ <title>Region Load Balancing</title>
+ <para>
+ Regions can be periodically moved by the <xref linkend="master.processes.loadbalancer" />.
+ </para>
+ </section>
+
+ </section> <!-- assignment -->
+
+ <section xml:id="regions.arch.locality">
+ <title>Region-RegionServer Locality</title>
+ <para>Over time, Region-RegionServer locality is achieved via an aspect of
+ HDFS block replication. When choosing where to write its replicas, the HDFS client
+ by default does as follows:
+ <orderedlist>
+ <listitem><para>First replica is written to the local node.</para>
+ </listitem>
+ <listitem><para>Second replica is written to a node in a different rack.</para>
+ </listitem>
+ <listitem><para>Third replica is written to a different node in the same rack as the second (if sufficient nodes).</para>
+ </listitem>
+ </orderedlist>
+ HBase eventually achieves locality for a region after a flush or a compaction.
+ In a RegionServer failover situation, a RegionServer may be assigned regions with non-local
+ StoreFiles (i.e., none of the replicas are local); however, as new data is written
+ in the region, or as the table is compacted and StoreFiles are re-written, they will eventually
+ become "local" to the RegionServer.
+ </para>
+ <para>For more information, see <link xlink:href="http://hadoop.apache.org/common/docs/r0.20.205.0/hdfs_design.html#Replica+Placement%3A+The+First+Baby+Steps">HDFS Design on Replica Placement</link>
+ and also Lars George's blog on <link xlink:href="http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html">HBase and HDFS locality</link>.
+ </para>
+ </section>
+
<section>
<title>Region Splits</title>
@@ -1725,15 +1811,6 @@ scan.setFilter(filter);
splits (and for why you might do this)</para>
</section>
- <section xml:id="regions.arch.balancer">
- <title>Region Load Balancing</title>
-
- <para>
- Regions can be periodically moved by the <xref linkend="master.processes.loadbalancer" />.
- </para>
-
- </section>
-
<section xml:id="store">
<title>Store</title>
<para>A Store hosts a MemStore and 0 or more StoreFiles (HFiles). A Store corresponds to a column family for a table for a given region.
@@ -2729,13 +2806,15 @@ Comparator class used for Bloom filter k
<para><link xlink:href="http://www.slideshare.net/cloudera/hw09-practical-h-base-getting-the-most-from-your-h-base-install">Getting The Most From Your HBase Install</link> by Ryan Rawson, Jonathan Gray (Hadoop World 2009).
</para>
</section>
- <section xml:id="other.info.papers"><title>Papers</title>
+ <section xml:id="other.info.papers"><title>HBase Papers</title>
<para><link xlink:href="http://research.google.com/archive/bigtable.html">BigTable</link> by Google (2006).
</para>
+ <para><link xlink:href="http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html">HBase and HDFS Locality</link> by Lars George (2010).
+ </para>
<para><link xlink:href="http://ianvarley.com/UT/MR/Varley_MastersReport_Full_2009-08-07.pdf">No Relation: The Mixed Blessings of Non-Relational Databases</link> by Ian Varley (2009).
</para>
</section>
- <section xml:id="other.info.sites"><title>Sites</title>
+ <section xml:id="other.info.sites"><title>HBase Sites</title>
<para><link xlink:href="http://www.cloudera.com/blog/category/hbase/">Cloudera's HBase Blog</link> has a lot of links to useful HBase information.
<itemizedlist>
<listitem><link xlink:href="http://www.cloudera.com/blog/2010/04/cap-confusion-problems-with-partition-tolerance/">CAP Confusion</link> is a relevant entry for background information on
@@ -2746,10 +2825,15 @@ Comparator class used for Bloom filter k
<para><link xlink:href="http://wiki.apache.org/hadoop/HBase/HBasePresentations">HBase Wiki</link> has a page with a number of presentations.
</para>
</section>
- <section xml:id="other.info.books"><title>Books</title>
+ <section xml:id="other.info.books"><title>HBase Books</title>
<para><link xlink:href="http://shop.oreilly.com/product/0636920014348.do">HBase: The Definitive Guide</link> by Lars George.
</para>
</section>
+ <section xml:id="other.info.books.hadoop"><title>Hadoop Books</title>
+ <para><link xlink:href="http://shop.oreilly.com/product/9780596521981.do">Hadoop: The Definitive Guide</link> by Tom White.
+ </para>
+ </section>
+
</appendix>
<appendix xml:id="asf" ><title>HBase and the Apache Software Foundation</title>