Posted to commits@hbase.apache.org by dm...@apache.org on 2011/10/16 16:16:31 UTC

svn commit: r1184830 - in /hbase/trunk/src/docbkx: book.xml performance.xml troubleshooting.xml

Author: dmeil
Date: Sun Oct 16 14:16:31 2011
New Revision: 1184830

URL: http://svn.apache.org/viewvc?rev=1184830&view=rev
Log:
HBASE-4598 book update (book.xml, perf.xml, trouble.xml)

Modified:
    hbase/trunk/src/docbkx/book.xml
    hbase/trunk/src/docbkx/performance.xml
    hbase/trunk/src/docbkx/troubleshooting.xml

Modified: hbase/trunk/src/docbkx/book.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/book.xml?rev=1184830&r1=1184829&r2=1184830&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/book.xml (original)
+++ hbase/trunk/src/docbkx/book.xml Sun Oct 16 14:16:31 2011
@@ -1316,7 +1316,7 @@ scan.setFilter(filter);
     <section xml:id="master"><title>Master</title>
        <para><code>HMaster</code> is the implementation of the Master Server.  The Master server
        is responsible for monitoring all RegionServer instances in the cluster, and is
-       the interface for all metadata changes.
+       the interface for all metadata changes.  In a distributed cluster, the Master typically runs on the <xref linkend="arch.hdfs.nn" />.
        </para>
        <section xml:id="master.startup"><title>Startup Behavior</title>
          <para>If run in a multi-Master environment, all Masters compete to run the cluster.  If the active
@@ -1352,7 +1352,8 @@ scan.setFilter(filter);
 
      </section>
      <section xml:id="regionserver.arch"><title>RegionServer</title>
-       <para><code>HRegionServer</code> is the RegionServer implementation.  It is responsible for serving and managing regions.  
+       <para><code>HRegionServer</code> is the RegionServer implementation.  It is responsible for serving and managing regions.
+       In a distributed cluster, a RegionServer runs on a <xref linkend="arch.hdfs.dn" />.  
        </para>
        <section xml:id="regionserver.arch.api"><title>Interface</title>
          <para>The methods exposed by <code>HRegionRegionInterface</code> contain both data-oriented and region-maintenance methods:
@@ -1711,6 +1712,27 @@ scan.setFilter(filter);
      </section>   <!--  bloom  -->  
      
     </section>
+    
+    <section xml:id="arch.hdfs"><title>HDFS</title>
+       <para>As HBase runs on HDFS (and each StoreFile is written as a file on HDFS),
+        it is important to have an understanding of the HDFS Architecture,
+         especially in terms of how it stores files, handles failovers, and replicates blocks.
+       </para>
+       <para>See the Hadoop documentation on <link xlink:href="http://hadoop.apache.org/common/docs/current/hdfs_design.html">HDFS Architecture</link>
+       for more information.
+       </para>
+       <section xml:id="arch.hdfs.nn"><title>NameNode</title>
+         <para>The NameNode is responsible for maintaining the filesystem metadata.  See the above HDFS Architecture link
+         for more information.
+         </para>
+       </section>
+       <section xml:id="arch.hdfs.dn"><title>DataNode</title>
+         <para>The DataNodes are responsible for storing HDFS blocks.  See the above HDFS Architecture link
+         for more information.
+         </para>
+       </section>
+    </section>       
+    
   </chapter>   <!--  architecture -->
   
   <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="external_apis.xml" />
@@ -1889,15 +1911,15 @@ hbase> describe 't1'</programlisting>
             </answer>
         </qandaentry>
     </qandadiv>
-    <qandadiv xml:id="ec2"><title>EC2</title>
+    <qandadiv xml:id="ec2"><title>Amazon EC2</title>
         <qandaentry>
             <question><para>
-            Why doesn't my remote java connection into my ec2 cluster work?
+            I am running HBase on Amazon EC2 and...
             </para></question>
             <answer>
                 <para>
-          See Andrew's answer here, up on the user list: <link xlink:href="http://search-hadoop.com/m/sPdqNFAwyg2">Remote Java client connection into EC2 instance</link>.
-                </para>
+ 	            See Troubleshooting <xref linkend="trouble.ec2" /> and Performance <xref linkend="perf.ec2" /> sections.                
+               </para>
             </answer>
         </qandaentry>
     </qandadiv>

Modified: hbase/trunk/src/docbkx/performance.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/performance.xml?rev=1184830&r1=1184829&r2=1184830&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/performance.xml (original)
+++ hbase/trunk/src/docbkx/performance.xml Sun Oct 16 14:16:31 2011
@@ -409,15 +409,35 @@ htable.close();</programlisting></para>
        </para>
      </section>
   </section>  <!--  deleting -->
+
+  <section xml:id="perf.hdfs"><title>HDFS</title>
+   <para>Because HBase runs on <xref linkend="arch.hdfs" />, it is important to understand how it works and how it affects
+   HBase.
+   </para>
+    <section xml:id="perf.hdfs.curr"><title>Current Issues With Low-Latency Reads</title>
+      <para>The original use-case for HDFS was batch processing.  As such, low-latency reads were historically not a priority.
+      With the increased adoption of HBase this is changing, and several improvements are already in development.
+      See the 
+      <link xlink:href="https://issues.apache.org/jira/browse/HDFS-1599">Umbrella Jira Ticket for HDFS Improvements for HBase</link>.
+      </para>
+    </section>
+    <section xml:id="perf.hdfs.comp"><title>Performance Comparisons of HBase vs. HDFS</title>
+     <para>A fairly common question on the dist-list is why HBase isn't as performant as HDFS files in a batch context (e.g., as 
+     a MapReduce source or sink).  The short answer is that HBase is doing a lot more than HDFS (e.g., reading the KeyValues, 
+     returning the most current row or specified timestamps, etc.), and as such HBase is 4-5 times slower than HDFS in this 
+     processing context.  Not that there isn't room for improvement (and this gap will, over time, be reduced), but HDFS
+      will always be faster in this use-case.
+     </para>
+    </section>
+  </section>
   
   <section xml:id="perf.ec2"><title>Amazon EC2</title>
-  <para>Performance questions are common on Amazon EC2 environments because it is is a shared environment.  You will
-  not see the same throughput as a dedicated server.  In terms of running tests on EC2, run them several times for the same
-  reason (i.e., it's a shared environment and you don't know what else is happening on the server).
-  </para>
-  <para>If you are running on EC2 and post performance questions on the dist-list, please state this fact up-front that
-   because EC2 issues are practically a separate class of performance issues.
-  
-  </para>
+   <para>Performance questions are common on Amazon EC2 environments because it is a shared environment.  You will
+   not see the same throughput as a dedicated server.  In terms of running tests on EC2, run them several times for the same
+   reason (i.e., it's a shared environment and you don't know what else is happening on the server).
+   </para>
+   <para>If you are running on EC2 and post performance questions on the dist-list, please state this fact up-front
+    because EC2 issues are practically a separate class of performance issues.
+   </para>
   </section>
 </chapter>

Modified: hbase/trunk/src/docbkx/troubleshooting.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/troubleshooting.xml?rev=1184830&r1=1184829&r2=1184830&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/troubleshooting.xml (original)
+++ hbase/trunk/src/docbkx/troubleshooting.xml Sun Oct 16 14:16:31 2011
@@ -793,6 +793,13 @@ ERROR org.apache.hadoop.hbase.regionserv
              <para>Questions on HBase and Amazon EC2 come up frequently on the HBase dist-list. Search for old threads using <link xlink:href="http://search-hadoop.com/">Search Hadoop</link>
              </para>
           </section>
+          <section xml:id="trouble.ec2.connection">
+             <title>Remote Java Connection into EC2 Cluster Not Working</title>
+             <para>
+             See Andrew's answer here, up on the user list: <link xlink:href="http://search-hadoop.com/m/sPdqNFAwyg2">Remote Java client connection into EC2 instance</link>.
+             </para>
+          </section>
+          
     </section>
     
   </chapter>