Posted to commits@hbase.apache.org by dm...@apache.org on 2011/12/02 21:57:36 UTC
svn commit: r1209688 - in /hbase/trunk/src/docbkx: book.xml
troubleshooting.xml
Author: dmeil
Date: Fri Dec 2 20:57:35 2011
New Revision: 1209688
URL: http://svn.apache.org/viewvc?rev=1209688&view=rev
Log:
hbase-4939 book.xml (architecture/faq), troubleshooting.xml (created resources section)
Modified:
hbase/trunk/src/docbkx/book.xml
hbase/trunk/src/docbkx/troubleshooting.xml
Modified: hbase/trunk/src/docbkx/book.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/book.xml?rev=1209688&r1=1209687&r2=1209688&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/book.xml (original)
+++ hbase/trunk/src/docbkx/book.xml Fri Dec 2 20:57:35 2011
@@ -1200,6 +1200,63 @@ if (!b) {
<chapter xml:id="architecture">
<title>Architecture</title>
+ <section xml:id="arch.overview">
+ <title>Overview</title>
+ <section xml:id="arch.overview.nosql">
+ <title>NoSQL?</title>
+ <para>HBase is a type of "NoSQL" database. "NoSQL" is a general term meaning that the database isn't an RDBMS which
+ supports SQL as its primary access language, but there are many types of NoSQL databases: BerkeleyDB is an
+ example of a local NoSQL database, whereas HBase is very much a distributed database. Technically speaking,
+ HBase is really more a "Data Store" than a "Data Base" because it lacks many of the features you find in an RDBMS,
+ such as typed columns, secondary indexes, triggers, and advanced query languages.
+ </para>
+ <para>However, HBase has many features which support both linear and modular scaling. HBase clusters expand
+ by adding RegionServers that are hosted on commodity-class servers. If a cluster expands from 10 to 20
+ RegionServers, for example, it doubles in both storage and processing capacity.
+ An RDBMS can scale well, but only up to a point - specifically, the size of a single database server - and for the best
+ performance it requires specialized hardware and storage devices. Notable HBase features are:
+ <itemizedlist>
+ <listitem>Strongly consistent reads/writes: HBase is not an "eventually consistent" DataStore. This
+ makes it very suitable for tasks such as high-speed counter aggregation. </listitem>
+ <listitem>Automatic sharding: HBase tables are distributed on the cluster via regions, and regions are
+ automatically split and re-distributed as your data grows.</listitem>
+ <listitem>Automatic RegionServer failover</listitem>
+ <listitem>Hadoop/HDFS Integration: HBase supports HDFS out of the box as its distributed file system.</listitem>
+ <listitem>MapReduce: HBase supports massively parallelized processing via MapReduce, using HBase as both
+ source and sink.</listitem>
+ <listitem>Java Client API: HBase supports an easy-to-use Java API for programmatic access.</listitem>
+ <listitem>Thrift/REST API: HBase also supports Thrift and REST for non-Java front-ends.</listitem>
+ <listitem>Block Cache and Bloom Filters: HBase supports a Block Cache and Bloom Filters for high-volume query optimization.</listitem>
+ <listitem>Operational Management: HBase provides built-in web pages for operational insight as well as JMX metrics.</listitem>
+ </itemizedlist>
+ </para>
+ </section>
+
+ <section xml:id="arch.overview.when">
+ <title>When Should I Use HBase?</title>
+ <para>First, make sure you have enough data. HBase isn't suitable for every problem. If you have
+ hundreds of millions or billions of rows, then HBase is a good candidate. If you only have a few
+ thousand or a few million rows, then a traditional RDBMS might be a better choice, because all of
+ your data might wind up on a single node (or two) while the rest of the cluster sits idle.
+ </para>
+ <para>Second, make sure you have enough hardware. Even HDFS doesn't do well with anything less than
+ 5 DataNodes (due to things such as HDFS block replication, which has a default of 3), plus a NameNode.
+ </para>
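As a reminder of where that replication figure comes from, the number of copies HDFS keeps of each block is governed by the standard `dfs.replication` property in hdfs-site.xml. A minimal sketch (the property and its default of 3 are standard HDFS; shown here only to illustrate the sizing argument above):

```xml
<!-- hdfs-site.xml: HDFS keeps this many copies of each block (default 3).
     With fewer DataNodes than replicas plus some headroom, writes and
     re-replication after a node failure become problematic - hence the
     suggested floor of 5 DataNodes. -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```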
+ <para>HBase can run quite well stand-alone on a laptop, but this should be considered a development
+ configuration only.
+ </para>
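For that laptop/development case, a stand-alone setup only needs to point HBase at a local directory via `hbase.rootdir` in hbase-site.xml. A minimal sketch (the property is real HBase configuration; the path is illustrative):

```xml
<!-- hbase-site.xml: stand-alone mode stores data on the local
     filesystem instead of HDFS. The path below is illustrative. -->
<property>
  <name>hbase.rootdir</name>
  <value>file:///home/testuser/hbase</value>
</property>
```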
+ </section>
+ <section xml:id="arch.overview.hbasehdfs">
+ <title>What Is The Difference Between HBase and Hadoop/HDFS?</title>
+ <para><link xlink:href="http://hadoop.apache.org/hdfs/">HDFS</link> is a distributed file system that is well suited for the storage of large files.
+ Its documentation states that it is not, however, a general-purpose file system and does not provide fast individual record lookups in files.
+ HBase, on the other hand, is built on top of HDFS and provides fast record lookups (and updates) for large tables.
+ This can sometimes be a point of conceptual confusion. HBase internally puts your data in indexed "StoreFiles" that exist
+ on HDFS for high-speed lookups. See the <xref linkend="datamodel" /> and the rest of this chapter for more information on how HBase achieves its goals.
+ </para>
+ </section>
+ </section>
<section xml:id="arch.catalog">
<title>Catalog Tables</title>
@@ -2000,17 +2057,7 @@ hbase> describe 't1'</programlisting>
<qandaentry>
<question><para>When should I use HBase?</para></question>
<answer>
- <para>
- Anybody can download and give HBase a spin, even on a laptop. The scope of this answer is when
- would it be best to use HBase in a <emphasis>real</emphasis> deployment.
- </para>
- <para>First, make sure you have enough hardware. Even HDFS doesn't do well with anything less than
- 5 DataNodes (due to things such as HDFS block replication which has a default of 3), plus a NameNode.
- Second, make sure you have enough data. HBase isn't suitable for every problem. If you have
- hundreds of millions or billions of rows, then HBase is a good candidate. If you only have a few
- thousand/million rows, then using a traditional RDBMS might be a better choice due to the
- fact that all of your data might wind up on a single node (or two) and the rest of the cluster may
- be sitting idle.
+ <para>See the <xref linkend="arch.overview" /> in the Architecture chapter.
</para>
</answer>
</qandaentry>
@@ -2031,17 +2078,6 @@ hbase> describe 't1'</programlisting>
</para>
</answer>
</qandaentry>
- <qandaentry xml:id="faq.hdfs.hbase">
- <question><para>How does HBase work on top of HDFS?</para></question>
- <answer>
- <para>
- <link xlink:href="http://hadoop.apache.org/hdfs/">HDFS</link> is a distributed file system that is well suited for the storage of large files. It's documentation
- states that it is not, however, a general purpose file system, and does not provide fast individual record lookups in files.
- HBase, on the other hand, is built on top of HDFS and provides fast record lookups (and updates) for large tables. This can sometimes be a point of conceptual confusion.
- See the <xref linkend="datamodel" /> and <xref linkend="architecture" /> sections for more information on how HBase achieves its goals.
- </para>
- </answer>
- </qandaentry>
</qandadiv>
<qandadiv xml:id="faq.config"><title>Configuration</title>
<qandaentry xml:id="faq.config.started">
@@ -2109,6 +2145,16 @@ hbase> describe 't1'</programlisting>
</answer>
</qandaentry>
</qandadiv>
+ <qandadiv xml:id="faq.mapreduce"><title>MapReduce</title>
+ <qandaentry xml:id="faq.mapreduce.use">
+ <question><para>How can I use MapReduce with HBase?</para></question>
+ <answer>
+ <para>
+ See <xref linkend="mapreduce" />
+ </para>
+ </answer>
+ </qandaentry>
+ </qandadiv>
<qandadiv><title>Performance and Troubleshooting</title>
<qandaentry>
<question><para>
Modified: hbase/trunk/src/docbkx/troubleshooting.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/troubleshooting.xml?rev=1209688&r1=1209687&r2=1209688&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/troubleshooting.xml (original)
+++ hbase/trunk/src/docbkx/troubleshooting.xml Fri Dec 2 20:57:35 2011
@@ -196,6 +196,28 @@ export HBASE_OPTS="-XX:NewSize=64m -XX:M
</para>
</section>
</section>
+ <section xml:id="trouble.resources">
+ <title>Resources</title>
+ <section xml:id="trouble.resources.lists">
+ <title>Dist-Lists</title>
+ <para>Sign up for the <link xlink:href="http://hbase.apache.org/mail-lists.html">HBase Dist-Lists</link> and post a question. 'Dev' is aimed at the
+ community of developers actually building HBase and at features currently under development, while 'User' is generally for questions on released
+ versions of HBase.
+ </para>
+ </para>
+ </section>
+ <section xml:id="trouble.resources.searchhadoop">
+ <title>search-hadoop.com</title>
+ <para>
+ <link xlink:href="http://search-hadoop.com">search-hadoop.com</link> indexes all the mailing lists and is great for historical searches.
+ </para>
+ </section>
+ <section xml:id="trouble.resources.jira">
+ <title>JIRA</title>
+ <para>
+ <link xlink:href="https://issues.apache.org/jira/browse/HBASE">JIRA</link> is also really helpful when looking for Hadoop/HBase-specific issues.
+ </para>
+ </section>
+ </section>
<section xml:id="trouble.tools">
<title>Tools</title>
<section xml:id="trouble.tools.builtin">
@@ -221,12 +243,6 @@ export HBASE_OPTS="-XX:NewSize=64m -XX:M
</section>
<section xml:id="trouble.tools.external">
<title>External Tools</title>
- <section xml:id="trouble.tools.searchhadoop">
- <title>search-hadoop.com</title>
- <para>
- <link xlink:href="http://search-hadoop.com">search-hadoop.com</link> indexes all the mailing lists and <link xlink:href="https://issues.apache.org/jira/browse/HBASE">JIRA</link>, it's really helpful when looking for Hadoop/HBase-specific issues.
- </para>
- </section>
<section xml:id="trouble.tools.tail">
<title>tail</title>
<para>