Posted to commits@hbase.apache.org by st...@apache.org on 2012/10/21 00:28:59 UTC
svn commit: r1400526 - /hbase/trunk/src/docbkx/configuration.xml
Author: stack
Date: Sat Oct 20 22:28:59 2012
New Revision: 1400526
URL: http://svn.apache.org/viewvc?rev=1400526&view=rev
Log:
Add in Andrew Purtell's BigTop pointer
Modified:
hbase/trunk/src/docbkx/configuration.xml
Modified: hbase/trunk/src/docbkx/configuration.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/configuration.xml?rev=1400526&r1=1400525&r2=1400526&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/configuration.xml (original)
+++ hbase/trunk/src/docbkx/configuration.xml Sat Oct 20 22:28:59 2012
@@ -30,10 +30,10 @@
<para>This chapter is the Not-So-Quick start guide to HBase configuration. It goes
over system requirements, Hadoop setup, the different HBase run modes, and the
various configurations in HBase. Please read this chapter carefully. At a minimum
- ensure that all <xref linkend="basic.prerequisites" /> have
+ ensure that all <xref linkend="basic.prerequisites" /> have
been satisfied. Failure to do so will cause you (and us) grief debugging strange errors
and/or data loss.</para>
-
+
<para>
HBase uses the same configuration system as Hadoop.
To configure a deploy, edit a file of environment variables
@@ -57,7 +57,7 @@ to ensure well-formedness of your docume
content of the <filename>conf</filename> directory to
all nodes of the cluster. HBase will not do this for you.
Use <command>rsync</command>.</para>
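Pushing the <filename>conf</filename> directory out by hand can be sketched as a short loop; the hostnames and install path below are hypothetical, so substitute your own:

```shell
# Hypothetical hosts and install path; adjust to your cluster.
HBASE_CONF=/opt/hbase/conf
for host in rs1.example.org rs2.example.org rs3.example.org; do
  # The trailing slash copies the directory contents, not the directory itself.
  rsync -az "$HBASE_CONF/" "$host:$HBASE_CONF/"
done
```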
-
+
<section xml:id="basic.prerequisites">
<title>Basic Prerequisites</title>
<para>This section lists required services and some required system configuration.
@@ -69,7 +69,7 @@ to ensure well-formedness of your docume
xlink:href="http://www.java.com/download/">Oracle</link>.</para>
</section>
<section xml:id="os">
- <title>Operating System</title>
+ <title>Operating System</title>
<section xml:id="ssh">
<title>ssh</title>
@@ -151,9 +151,9 @@ to ensure well-formedness of your docume
2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6935524980745310745_1391901
</programlisting> Do yourself a favor and change the upper bound on the
number of file descriptors. Set it to north of 10k. The math runs roughly as follows: per ColumnFamily
- there is at least one StoreFile and possibly up to 5 or 6 if the region is under load. Multiply the
+ there is at least one StoreFile and possibly up to 5 or 6 if the region is under load. Multiply the
average number of StoreFiles per ColumnFamily times the number of regions per RegionServer. For example, assuming
- that a schema had 3 ColumnFamilies per region with an average of 3 StoreFiles per ColumnFamily,
+ that a schema had 3 ColumnFamilies per region with an average of 3 StoreFiles per ColumnFamily,
and there are 100 regions per RegionServer, the JVM will open 3 * 3 * 100 = 900 file descriptors
(not counting open jar files, config files, etc.)
</para>
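The descriptor arithmetic above can be checked directly; the figures here are the chapter's own example numbers, not a recommendation:

```shell
# 3 ColumnFamilies per region, ~3 StoreFiles per family, 100 regions per server.
families=3
storefiles_per_family=3
regions=100
echo "Estimated StoreFile descriptors: $((families * storefiles_per_family * regions))"
# Compare the estimate against the current per-process open-file limit:
ulimit -n
```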
@@ -216,13 +216,13 @@ to ensure well-formedness of your docume
xlink:href="http://cygwin.com/">Cygwin</link> to have a *nix-like
environment for the shell scripts. The full details are explained in
the <link xlink:href="http://hbase.apache.org/cygwin.html">Windows
- Installation</link> guide. Also
+ Installation</link> guide. Also
<link xlink:href="http://search-hadoop.com/?q=hbase+windows&fc_project=HBase&fc_type=mail+_hash_+dev">search our user mailing list</link> to pick
up the latest fixes figured out by Windows users.</para>
</section>
</section> <!-- OS -->
-
+
<section xml:id="hadoop">
<title><link
xlink:href="http://hadoop.apache.org">Hadoop</link><indexterm>
@@ -289,7 +289,7 @@ to ensure well-formedness of your docume
<link xlink:href="http://www.cloudera.com/">Cloudera</link> or
<link xlink:href="http://www.mapr.com/">MapR</link> distributions.
Cloudera's <link xlink:href="http://archive.cloudera.com/docs/">CDH3</link>
- is Apache Hadoop 0.20.x plus patches including all of the
+ is Apache Hadoop 0.20.x plus patches including all of the
<link xlink:href="http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append/">branch-0.20-append</link>
additions needed to add a durable sync. Use the most recent released version of CDH3. In CDH, append
support is enabled by default so you do not need to make the above mentioned edits to
@@ -311,6 +311,16 @@ to ensure well-formedness of your docume
replace the jar in HBase everywhere on your cluster. Hadoop version
mismatch issues have various manifestations, but often everything simply looks like
it is hung up.</para>
+ <note xml:id="bigtop"><title>Packaging and Apache BigTop</title>
+ <para><link xlink:href="http://bigtop.apache.org">Apache Bigtop</link>
+ is an umbrella for packaging and tests of the Apache Hadoop
+ ecosystem, including Apache HBase. Bigtop performs testing at various
+ levels (packaging, platform, runtime, upgrade, etc.), developed by a
+ community with a focus on the system as a whole rather than individual
+ projects. We recommend installing Apache HBase packages as provided by a
+ Bigtop release rather than rolling your own piecemeal integration of
+ various component releases.</para>
+ </note>
<section xml:id="hadoop.security">
<title>HBase on Secure Hadoop</title>
@@ -320,7 +330,7 @@ to ensure well-formedness of your docume
with the secure version. If you want to read more about how to set up
Secure HBase, see <xref linkend="hbase.secure.configuration" />.</para>
</section>
-
+
<section xml:id="dfs.datanode.max.xcievers">
<title><varname>dfs.datanode.max.xcievers</varname><indexterm>
<primary>xcievers</primary>
@@ -354,7 +364,7 @@ to ensure well-formedness of your docume
<para>See also <xref linkend="casestudies.xceivers"/>
</para>
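A hedged sketch of raising this bound in <filename>hdfs-site.xml</filename> follows; the value 4096 is a commonly used starting point rather than a universal recommendation, and note that the property name really is spelled "xcievers":

```xml
<!-- hdfs-site.xml on every DataNode; restart HDFS after changing. -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
```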
</section>
-
+
</section> <!-- hadoop -->
</section>
@@ -418,7 +428,7 @@ to ensure well-formedness of your docume
HBase. Do not use this configuration for production nor for
evaluating HBase performance.</para>
- <para>First, setup your HDFS in <link xlink:href="http://hadoop.apache.org/docs/r1.0.3/single_node_setup.html">pseudo-distributed mode</link>.
+ <para>First, set up your HDFS in <link xlink:href="http://hadoop.apache.org/docs/r1.0.3/single_node_setup.html">pseudo-distributed mode</link>.
</para>
<para>Next, configure HBase. Below is an example <filename>conf/hbase-site.xml</filename>.
This is the file into
@@ -501,10 +511,10 @@ to ensure well-formedness of your docume
</programlisting>
</para>
</section>
-
+
</section>
- </section>
+ </section>
<section xml:id="fully_dist">
<title>Fully-distributed</title>
@@ -600,7 +610,7 @@ to ensure well-formedness of your docume
<section xml:id="confirm">
<title>Running and Confirming Your Installation</title>
-
+
<para>Make sure HDFS is running first. Start and stop the Hadoop HDFS
daemons by running <filename>bin/start-hdfs.sh</filename> over in the
@@ -610,31 +620,31 @@ to ensure well-formedness of your docume
not normally use the mapreduce daemons. These do not need to be
started.</para>
-
+
<para><emphasis>If</emphasis> you are managing your own ZooKeeper,
start it and confirm it is running; otherwise, HBase will start up ZooKeeper
for you as part of its start process.</para>
-
+
<para>Start HBase with the following command:</para>
-
+
<programlisting>bin/start-hbase.sh</programlisting>
- Run the above from the
+ Run the above from the
<varname>HBASE_HOME</varname>
- directory.
+ directory.
<para>You should now have a running HBase instance. HBase logs can be
found in the <filename>logs</filename> subdirectory. Check them out
especially if HBase had trouble starting.</para>
-
+
<para>HBase also puts up a UI listing vital attributes. By default it is
deployed on the Master host at port 60010 (HBase RegionServers listen
@@ -644,13 +654,13 @@ to ensure well-formedness of your docume
Master's homepage you'd point your browser at
<filename>http://master.example.org:60010</filename>.</para>
-
+
<para>Once HBase has started, see the <xref linkend="shell_exercises" /> for how to
create tables, add data, scan your insertions, and finally disable and
drop your tables.</para>
-
+
<para>To stop HBase after exiting the HBase shell enter
<programlisting>$ ./bin/stop-hbase.sh
@@ -660,15 +670,15 @@ stopping hbase...............</programli
until HBase has shut down completely before stopping the Hadoop
daemons.</para>
-
+
</section>
</section> <!-- run modes -->
-
-
-
- <section xml:id="config.files">
+
+
+
+ <section xml:id="config.files">
<title>Configuration Files</title>
-
+
<section xml:id="hbase.site">
<title><filename>hbase-site.xml</filename> and <filename>hbase-default.xml</filename></title>
<para>Just as in Hadoop where you add site-specific HDFS configuration
@@ -744,11 +754,11 @@ stopping hbase...............</programli
Minimally, a client of HBase needs several libraries in its <varname>CLASSPATH</varname> when connecting to a cluster, including:
<programlisting>
commons-configuration (commons-configuration-1.6.jar)
-commons-lang (commons-lang-2.5.jar)
-commons-logging (commons-logging-1.1.1.jar)
-hadoop-core (hadoop-core-1.0.0.jar)
+commons-lang (commons-lang-2.5.jar)
+commons-logging (commons-logging-1.1.1.jar)
+hadoop-core (hadoop-core-1.0.0.jar)
hbase (hbase-0.92.0.jar)
-log4j (log4j-1.2.16.jar)
+log4j (log4j-1.2.16.jar)
slf4j-api (slf4j-api-1.5.8.jar)
slf4j-log4j (slf4j-log4j12-1.5.8.jar)
zookeeper (zookeeper-3.4.2.jar)</programlisting>
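One hedged way to assemble such a <varname>CLASSPATH</varname> is to loop over a jar directory; the <filename>/opt/hbase/lib</filename> path below is hypothetical:

```shell
# Hypothetical jar directory; adjust to your installation layout.
CP=""
for jar in /opt/hbase/lib/*.jar; do
  CP="$CP:$jar"
done
export CLASSPATH="${CP#:}"   # strip the leading colon
echo "$CLASSPATH"
```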
@@ -769,7 +779,7 @@ zookeeper (zookeeper-3.4.2.jar)</program
</configuration>
]]></programlisting>
</para>
-
+
<section xml:id="java.client.config">
<title>Java client configuration</title>
<para>The configuration used by a Java client is kept
@@ -778,15 +788,15 @@ zookeeper (zookeeper-3.4.2.jar)</program
on invocation, will read in the content of the first <filename>hbase-site.xml</filename> found on
the client's <varname>CLASSPATH</varname>, if one is present
(Invocation will also factor in any <filename>hbase-default.xml</filename> found;
- an hbase-default.xml ships inside the <filename>hbase.X.X.X.jar</filename>).
+ an hbase-default.xml ships inside the <filename>hbase.X.X.X.jar</filename>).
It is also possible to specify configuration directly without having to read from a
<filename>hbase-site.xml</filename>. For example, to set the ZooKeeper
ensemble for the cluster programmatically do as follows:
<programlisting>Configuration config = HBaseConfiguration.create();
-config.set("hbase.zookeeper.quorum", "localhost"); // Here we are running zookeeper locally</programlisting>
+config.set("hbase.zookeeper.quorum", "localhost"); // Here we are running zookeeper locally</programlisting>
If multiple ZooKeeper instances make up your ZooKeeper ensemble,
they may be specified in a comma-separated list (just as in the <filename>hbase-site.xml</filename> file).
- This populated <classname>Configuration</classname> instance can then be passed to an
+ This populated <classname>Configuration</classname> instance can then be passed to an
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html">HTable</link>,
and so on.
</para>
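For reference, a comma-separated ensemble in <filename>hbase-site.xml</filename> looks like the following (the hostnames are hypothetical):

```xml
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>zk1.example.org,zk2.example.org,zk3.example.org</value>
</property>
```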
@@ -794,7 +804,7 @@ config.set("hbase.zookeeper.quorum", "lo
</section>
</section> <!-- config files -->
-
+
<section xml:id="example_config">
<title>Example Configurations</title>
@@ -886,7 +896,7 @@ config.set("hbase.zookeeper.quorum", "lo
1G.</para>
<programlisting>
-
+
$ git diff hbase-env.sh
diff --git a/conf/hbase-env.sh b/conf/hbase-env.sh
index e70ebc6..96f8c27 100644
@@ -894,11 +904,11 @@ index e70ebc6..96f8c27 100644
+++ b/conf/hbase-env.sh
@@ -31,7 +31,7 @@ export JAVA_HOME=/usr/lib//jvm/java-6-sun/
# export HBASE_CLASSPATH=
-
+
# The maximum amount of heap to use, in MB. Default is 1000.
-# export HBASE_HEAPSIZE=1000
+export HBASE_HEAPSIZE=4096
-
+
# Extra Java runtime options.
# Below are what we set by default. May only work with SUN JVM.
@@ -910,8 +920,8 @@ index e70ebc6..96f8c27 100644
</section>
</section>
</section> <!-- example config -->
-
-
+
+
<section xml:id="important_configurations">
<title>The Important Configurations</title>
<para>Below we list what the <emphasis>important</emphasis>
@@ -935,7 +945,7 @@ index e70ebc6..96f8c27 100644
configuration under control otherwise, a long garbage collection that lasts
beyond the ZooKeeper session timeout will take out
your RegionServer (You might be fine with this -- you probably want recovery to start
- on the server if a RegionServer has been in GC for a long period of time).</para>
+ on the server if a RegionServer has been in GC for a long period of time).</para>
<para>To change this configuration, edit <filename>hbase-site.xml</filename>,
copy the changed file around the cluster and restart.</para>
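As a hedged illustration, lengthening the session timeout in <filename>hbase-site.xml</filename> might look like this; the property name is <varname>zookeeper.session.timeout</varname> and the value shown is only an example to tune against your GC profile:

```xml
<property>
  <name>zookeeper.session.timeout</name>
  <!-- Milliseconds; example value only. -->
  <value>120000</value>
</property>
```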
@@ -1011,7 +1021,7 @@ index e70ebc6..96f8c27 100644
cluster (You can always later manually split the big Regions should one prove
hot and you want to spread the request load over the cluster). A lower number of regions is
preferred, generally in the range of 20 to low-hundreds
- per RegionServer. Adjust the regionsize as appropriate to achieve this number.
+ per RegionServer. Adjust the regionsize as appropriate to achieve this number.
</para>
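The region-count sizing above reduces to simple arithmetic; the numbers below are hypothetical:

```shell
# Hypothetical: 1000 GB of data per RegionServer, 10 GB regions.
data_gb=1000
region_gb=10
echo "Approximate regions per server: $((data_gb / region_gb))"
```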
<para>For the 0.90.x codebase, the upper-bound of regionsize is about 4Gb, with a default of 256Mb.
For 0.92.x codebase, due to the HFile v2 change much larger regionsizes can be supported (e.g., 20Gb).
@@ -1019,10 +1029,10 @@ index e70ebc6..96f8c27 100644
<para>You may need to experiment with this setting based on your hardware configuration and application needs.
</para>
<para>Adjust <code>hbase.hregion.max.filesize</code> in your <filename>hbase-site.xml</filename>.
- RegionSize can also be set on a per-table basis via
+ RegionSize can also be set on a per-table basis via
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html">HTableDescriptor</link>.
</para>
-
+
</section>
<section xml:id="disable.splitting">
<title>Managed Splitting</title>
@@ -1075,22 +1085,22 @@ of all regions.
</para>
</section>
<section xml:id="managed.compactions"><title>Managed Compactions</title>
- <para>A common administrative technique is to manage major compactions manually, rather than letting
+ <para>A common administrative technique is to manage major compactions manually, rather than letting
HBase do it. By default, <varname>HConstants.MAJOR_COMPACTION_PERIOD</varname> is one day and major compactions
may kick in when you least desire it - especially on a busy system. To turn off automatic major compactions set
- the value to <varname>0</varname>.
+ the value to <varname>0</varname>.
</para>
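In <filename>hbase-site.xml</filename> terms, disabling time-based major compactions is usually expressed through the <varname>hbase.hregion.majorcompaction</varname> period; the fragment below is a hedged sketch (the interval is in milliseconds):

```xml
<property>
  <name>hbase.hregion.majorcompaction</name>
  <!-- 0 disables time-based major compactions. -->
  <value>0</value>
</property>
```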
<para>It is important to stress that major compactions are absolutely necessary for StoreFile cleanup; the only variable is when
- they occur. They can be administered through the HBase shell, or via
+ they occur. They can be administered through the HBase shell, or via
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#majorCompact%28java.lang.String%29">HBaseAdmin</link>.
</para>
<para>For more information about compactions and the compaction file selection process, see <xref linkend="compaction"/></para>
</section>
-
+
<section xml:id="spec.ex"><title>Speculative Execution</title>
- <para>Speculative Execution of MapReduce tasks is on by default, and for HBase clusters it is generally advised to turn off
+ <para>Speculative Execution of MapReduce tasks is on by default, and for HBase clusters it is generally advised to turn off
Speculative Execution at a system-level unless you need it for a specific case, where it can be configured per-job.
- Set the properties <varname>mapred.map.tasks.speculative.execution</varname> and
+ Set the properties <varname>mapred.map.tasks.speculative.execution</varname> and
<varname>mapred.reduce.tasks.speculative.execution</varname> to false.
</para>
</section>
@@ -1118,9 +1128,9 @@ of all regions.
<link xlink:href="http://search-hadoop.com/m/pduLg2fydtE/Inconsistent+scan+performance+with+caching+set+&subj=Re+Inconsistent+scan+performance+with+caching+set+to+1">Inconsistent scan performance with caching set to 1</link>
and the issue cited therein where setting notcpdelay improved scan speeds.</para>
</section>
-
+
</section>
-
+
</section> <!-- important config -->
</chapter>