Posted to commits@hbase.apache.org by st...@apache.org on 2013/03/13 16:20:20 UTC

svn commit: r1455996 [3/7] - in /hbase/branches/0.94/src: docbkx/ site/ site/resources/css/ site/resources/images/ site/xdoc/

Modified: hbase/branches/0.94/src/docbkx/configuration.xml
URL: http://svn.apache.org/viewvc/hbase/branches/0.94/src/docbkx/configuration.xml?rev=1455996&r1=1455995&r2=1455996&view=diff
==============================================================================
--- hbase/branches/0.94/src/docbkx/configuration.xml (original)
+++ hbase/branches/0.94/src/docbkx/configuration.xml Wed Mar 13 15:20:19 2013
@@ -26,14 +26,16 @@
  * limitations under the License.
  */
 -->
-    <title>Configuration</title>
-    <para>This chapter is the Not-So-Quick start guide to HBase configuration.</para>
-    <para>Please read this chapter carefully and ensure that all requirements have 
+    <title>Apache HBase (TM) Configuration</title>
+    <para>This chapter is the Not-So-Quick start guide to Apache HBase (TM) configuration.  It goes
+    over system requirements, Hadoop setup, the different Apache HBase run modes, and the
+    various configurations in HBase.  Please read this chapter carefully.  At a minimum
+    ensure that all <xref linkend="basic.prerequisites" /> have
       been satisfied.  Failure to do so will cause you (and us) grief debugging strange errors
       and/or data loss.</para>
-    
+
     <para>
-        HBase uses the same configuration system as Hadoop.
+        Apache HBase uses the same configuration system as Apache Hadoop.
         To configure a deploy, edit a file of environment variables
         in <filename>conf/hbase-env.sh</filename> -- this configuration
         is used mostly by the launcher shell scripts getting the cluster
@@ -55,17 +57,20 @@ to ensure well-formedness of your docume
     content of the <filename>conf</filename> directory to
     all nodes of the cluster.  HBase will not do this for you.
     Use <command>rsync</command>.</para>
-    
+
+    <section xml:id="basic.prerequisites">
+    <title>Basic Prerequisites</title>
+    <para>This section lists required services and some required system configuration.
+    </para>
+
     <section xml:id="java">
         <title>Java</title>
-
-        <para>Just like Hadoop, HBase requires java 6 from <link
-        xlink:href="http://www.java.com/download/">Oracle</link>. Usually
-        you'll want to use the latest version available except the problematic
-        u18 (u24 is the latest version as of this writing).</para>
+        <para>Just like Hadoop, HBase requires at least Java 6 from
+        <link xlink:href="http://www.java.com/download/">Oracle</link>.</para>
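+        <para>A quick way to confirm which JVM is on the path is <command>java -version</command>;
+        the output below is only an example and will vary by vendor and patch level:
+<programlisting>$ java -version
+java version "1.6.0_31"
+Java(TM) SE Runtime Environment (build 1.6.0_31-b04)</programlisting>
+        </para>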
     </section>
+
     <section xml:id="os">
-        <title>Operating System</title>        
+        <title>Operating System</title>
       <section xml:id="ssh">
         <title>ssh</title>
 
@@ -73,14 +78,20 @@ to ensure well-formedness of your docume
         <command>sshd</command> must be running to use Hadoop's scripts to
         manage remote Hadoop and HBase daemons. You must be able to ssh to all
         nodes, including your local node, using passwordless login (Google
-        "ssh passwordless login").</para>
+        "ssh passwordless login").  If on Mac OS X, see the section,
+        <link xlink:href="http://wiki.apache.org/hadoop/Running_Hadoop_On_OS_X_10.5_64-bit_%28Single-Node_Cluster%29">SSH: Setting up Remote Desktop and Enabling Self-Login</link>
+        on the Hadoop wiki.</para>
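+        <para>As an illustration only (follow the wiki page above for the authoritative steps),
+        passwordless login is typically set up by generating a key pair and appending the public
+        key to <filename>~/.ssh/authorized_keys</filename> on each node:
+<programlisting>$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
+$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
+$ ssh localhost     # should now log in without a password prompt
+</programlisting>
+        </para>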
       </section>
 
       <section xml:id="dns">
         <title>DNS</title>
 
-        <para>HBase uses the local hostname to self-report it's IP address.
-        Both forward and reverse DNS resolving should work.</para>
+        <para>HBase uses the local hostname to self-report its IP address.
+        Both forward and reverse DNS resolving must work in versions of
+        HBase prior to 0.92.0
+        <footnote><para>The <link xlink:href="https://github.com/sujee/hadoop-dns-checker">hadoop-dns-checker</link> tool can be used to verify
+        DNS is working correctly on the cluster.  The project README file provides detailed instructions on usage.
+</para></footnote>.</para>
 
         <para>If your machine has multiple interfaces, HBase will use the
         interface that the primary hostname resolves to.</para>
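+        <para>If the primary interface is not the one you want HBase to use, the interface can
+        be named explicitly via the <varname>hbase.regionserver.dns.interface</varname> property
+        in <filename>hbase-site.xml</filename>; the interface name below is just an example:
+<programlisting>
+  &lt;property&gt;
+    &lt;name&gt;hbase.regionserver.dns.interface&lt;/name&gt;
+    &lt;value&gt;eth0&lt;/value&gt;
+  &lt;/property&gt;
+</programlisting>
+        </para>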
@@ -97,15 +108,7 @@ to ensure well-formedness of your docume
       </section>
       <section xml:id="loopback.ip">
         <title>Loopback IP</title>
-        <para>HBase expects the loopback IP address to be 127.0.0.1.  Ubuntu and some other distributions,
-        for example, will default to 127.0.1.1 and this will cause problems for you.
-        </para>
-        <para><filename>/etc/hosts</filename> should look something like this:
-<programlisting>
-            127.0.0.1 localhost
-            127.0.0.1 ubuntu.ubuntu-domain ubuntu
-</programlisting>
-        </para>
+        <para>HBase expects the loopback IP address to be 127.0.0.1.  See <xref linkend="loopback.ip"/></para>
        </section>
 
       <section xml:id="ntp">
@@ -132,7 +135,7 @@ to ensure well-formedness of your docume
             </indexterm>
         </title>
 
-        <para>HBase is a database.  It uses a lot of files all at the same time.
+        <para>Apache HBase is a database.  It uses a lot of files all at the same time.
         The default ulimit -n -- i.e. user file limit -- of 1024 on most *nix systems
         is insufficient (On mac os x its 256). Any significant amount of loading will
         lead you to <xref linkend="trouble.rs.runtime.filehandles"/>.
@@ -141,9 +144,9 @@ to ensure well-formedness of your docume
       2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6935524980745310745_1391901
       </programlisting> Do yourself a favor and change the upper bound on the
         number of file descriptors. Set it to north of 10k.  The math runs roughly as follows:  per ColumnFamily
-        there is at least one StoreFile and possibly up to 5 or 6 if the region is under load.  Multiply the 
+        there is at least one StoreFile and possibly up to 5 or 6 if the region is under load.  Multiply the
         average number of StoreFiles per ColumnFamily times the number of regions per RegionServer.  For example, assuming
-        that a schema had 3 ColumnFamilies per region with an average of 3 StoreFiles per ColumnFamily, 
+        that a schema had 3 ColumnFamilies per region with an average of 3 StoreFiles per ColumnFamily,
         and there are 100 regions per RegionServer, the JVM will open 3 * 3 * 100 = 900 file descriptors
         (not counting open jar files, config files, etc.)
         </para>
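+        <para>To see the limits currently in effect for the user that will run HBase, check
+        with <command>ulimit</command>; the values below are examples only:
+<programlisting>$ ulimit -n     # max open file descriptors
+1024
+$ ulimit -u     # max user processes (nproc)
+unlimited
+</programlisting>
+        A value of 1024 for open files is the insufficient default discussed above.</para>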
@@ -153,7 +156,7 @@ to ensure well-formedness of your docume
         <footnote><para>See Jack Levin's <link xlink:href="">major hdfs issues</link>
                 note up on the user list.</para></footnote>
         <footnote><para>The requirement that a database requires upping of system limits
-        is not peculiar to HBase.  See for example the section
+        is not peculiar to Apache HBase.  See for example the section
         <emphasis>Setting Shell Limits for the Oracle User</emphasis> in
         <link xlink:href="http://www.akadia.com/services/ora_linux_install_10g.html">
         Short Guide to install Oracle 10 on Linux</link>.</para></footnote>.
@@ -198,7 +201,7 @@ to ensure well-formedness of your docume
       <section xml:id="windows">
         <title>Windows</title>
 
-        <para>HBase has been little tested running on Windows. Running a
+        <para>Apache HBase has had little testing on Windows. Running a
         production install of HBase on top of Windows is not
         recommended.</para>
 
@@ -206,32 +209,61 @@ to ensure well-formedness of your docume
         xlink:href="http://cygwin.com/">Cygwin</link> to have a *nix-like
         environment for the shell scripts. The full details are explained in
         the <link xlink:href="http://hbase.apache.org/cygwin.html">Windows
-        Installation</link> guide. Also 
+        Installation</link> guide. Also
         <link xlink:href="http://search-hadoop.com/?q=hbase+windows&amp;fc_project=HBase&amp;fc_type=mail+_hash_+dev">search our user mailing list</link> to pick
         up latest fixes figured by Windows users.</para>
       </section>
 
     </section>   <!--  OS -->
-    
+
     <section xml:id="hadoop">
         <title><link
         xlink:href="http://hadoop.apache.org">Hadoop</link><indexterm>
             <primary>Hadoop</primary>
           </indexterm></title>
-         <note><title>Please read all of this section</title>
-         <para>Please read this section to the end.  Up front we
-         wade through the weeds of Hadoop versions.  Later we talk of what you must do in HBase
-         to make it work w/ a particular Hadoop version.</para>
-         </note>
-
-          <para>
-        HBase will lose data unless it is running on an HDFS that has a durable
-        <code>sync</code> implementation. Hadoop 0.20.2, Hadoop 0.20.203.0, and Hadoop 0.20.204.0
-	DO NOT have this attribute.
-        Currently only Hadoop versions 0.20.205.x or any release in excess of this
-        version -- this includes hadoop 1.0.0 -- have a working, durable sync
+         <para>Selecting a Hadoop version is critical for your HBase deployment. The table below shows which versions of Hadoop are supported by the various HBase versions. Based on the version of HBase, you should select the most appropriate version of Hadoop. We are not in the Hadoop distro selection business. You can use Hadoop distributions from Apache, or learn about vendor distributions of Hadoop at <link xlink:href="http://wiki.apache.org/hadoop/Distributions%20and%20Commercial%20Support"/>.</para>
+         <para>
+	     <table>
+		 <title>Hadoop version support matrix</title>
+		 <tgroup cols='4' align='left' colsep='1' rowsep='1'><colspec colname='c1' align='left'/><colspec colname='c2' align='center'/><colspec colname='c3' align='center'/><colspec colname='c4' align='center'/>
+         <thead>
+	     <row><entry>               </entry><entry>HBase-0.92.x</entry><entry>HBase-0.94.x</entry><entry>HBase-0.96</entry></row>
+	     </thead><tbody>
+         <row><entry>Hadoop-0.20.205</entry><entry>S</entry>          <entry>X</entry>           <entry>X</entry></row>
+         <row><entry>Hadoop-0.22.x  </entry><entry>S</entry>          <entry>X</entry>           <entry>X</entry></row>
+         <row><entry>Hadoop-1.0.x   </entry><entry>S</entry>          <entry>S</entry>           <entry>S</entry></row>
+         <row><entry>Hadoop-1.1.x   </entry><entry>NT</entry>         <entry>S</entry>           <entry>S</entry></row>
+         <row><entry>Hadoop-0.23.x  </entry><entry>X</entry>          <entry>S</entry>           <entry>NT</entry></row>
+         <row><entry>Hadoop-2.x     </entry><entry>X</entry>          <entry>S</entry>           <entry>S</entry></row>
+		 </tbody></tgroup></table>
+
+        Where
+		<simplelist type='vert' columns='1'>
+		<member>S = supported and tested,</member>
+		<member>X = not supported,</member>
+		<member>NT = should run, but not tested enough.</member>
+		</simplelist>
+        </para>
+        <para>
+	Because HBase depends on Hadoop, it bundles an instance of the Hadoop jar under its <filename>lib</filename> directory. The bundled jar is ONLY for use in standalone mode. In distributed mode, it is <emphasis>critical</emphasis> that the version of Hadoop that is out on your cluster match what is under HBase. Replace the hadoop jar found in the HBase lib directory with the hadoop jar you are running on your cluster to avoid version mismatch issues. Make sure you replace the jar in HBase everywhere on your cluster. Hadoop version mismatch issues have various manifestations, but often everything looks like it is hung up.
+    </para>
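+    <para>As a rough sketch of the jar swap described above (the version number and paths are
+    illustrative only and must match your actual Hadoop install):
+<programlisting>$ rm $HBASE_HOME/lib/hadoop-core-*.jar
+$ cp $HADOOP_HOME/hadoop-core-1.0.4.jar $HBASE_HOME/lib/
+# repeat on every node of the cluster, then restart HBase
+</programlisting>
+    </para>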
+    <section xml:id="hadoop.hbase-0.94">
+	<title>Apache HBase 0.92 and 0.94</title>
+	<para>HBase 0.92 and 0.94 can work with Hadoop versions 0.20.205, 0.22.x, 1.0.x, and 1.1.x. HBase 0.94 can additionally work with Hadoop 0.23.x and 2.x, but you may have to recompile the code using the specific Maven profile (see the top-level pom.xml).</para>
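+	<para>The profile name depends on the HBase version and is defined in the top-level pom.xml; as a rough sketch, a build of 0.94 against Hadoop 2 has looked something like the following (verify the profile and property in your checkout before relying on it):
+<programlisting>$ mvn clean install -DskipTests -Dhadoop.profile=2.0</programlisting>
+	</para>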
+   </section>
+
+    <section xml:id="hadoop.hbase-0.96">
+	<title>Apache HBase 0.96</title>
+	<para>Apache HBase 0.96.0 requires Apache Hadoop 1.x at a minimum, and it can run equally well on hadoop-2.0.
+	HBase 0.96.x will no longer run properly on older Hadoops such as 0.20.205 or branch-0.20-append. Do not move to Apache HBase 0.96.x if you cannot upgrade your Hadoop<footnote><para>See <link xlink:href="http://search-hadoop.com/m/7vFVx4EsUb2">HBase, mail # dev - DISCUSS: Have hbase require at least hadoop 1.0.0 in hbase 0.96.0?</link></para></footnote>.</para>
+   </section>
+
+    <section xml:id="hadoop.older.versions">
+	<title>Hadoop versions 0.20.x - 1.x</title>
+	<para>
+     HBase will lose data unless it is running on an HDFS that has a durable
+        <code>sync</code> implementation.  DO NOT use Hadoop 0.20.2, Hadoop 0.20.203.0, or Hadoop 0.20.204.0, which DO NOT have this attribute. Currently only Hadoop versions 0.20.205.x or any release in excess of this version -- this includes hadoop-1.0.0 -- have a working, durable sync
           <footnote>
-          <title>On Hadoop Versions</title>
           <para>The Cloudera blog post <link xlink:href="http://www.cloudera.com/blog/2012/01/an-update-on-apache-hadoop-1-0/">An update on Apache Hadoop 1.0</link>
           by Charles Zedlweski has a nice exposition on how all the Hadoop versions relate.
           Its worth checking out if you are having trouble making sense of the
@@ -250,57 +282,18 @@ to ensure well-formedness of your docume
         </programlisting>
         You will have to restart your cluster after making this edit.  Ignore the chicken-little
         comment you'll find in the <filename>hdfs-default.xml</filename> in the
-        description for the <varname>dfs.support.append</varname> configuration; it says it is not enabled because there
-        are <quote>... bugs in the 'append code' and is not supported in any production
-        cluster.</quote>. This comment is stale, from another era, and while I'm sure there
-        are bugs, the sync/append code has been running
-        in production at large scale deploys and is on
-        by default in the offerings of hadoop by commercial vendors
-        <footnote><para>Until recently only the
-        <link xlink:href="http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append/">branch-0.20-append</link>
-        branch had a working sync but no official release was ever made from this branch.
-        You had to build it yourself. Michael Noll wrote a detailed blog,
-        <link xlink:href="http://www.michael-noll.com/blog/2011/04/14/building-an-hadoop-0-20-x-version-for-hbase-0-90-2/">Building
-        an Hadoop 0.20.x version for HBase 0.90.2</link>, on how to build an
-    Hadoop from branch-0.20-append.  Recommended.</para></footnote>
-    <footnote><para>Praveen Kumar has written
-            a complimentary article,
-            <link xlink:href="http://praveen.kumar.in/2011/06/20/building-hadoop-and-hbase-for-hbase-maven-application-development/">Building Hadoop and HBase for HBase Maven application development</link>.
-</para></footnote><footnote>Cloudera have <varname>dfs.support.append</varname> set to true by default.</footnote>.</para>
-
-<para>Or use the
-    <link xlink:href="http://www.cloudera.com/">Cloudera</link> or
-    <link xlink:href="http://www.mapr.com/">MapR</link> distributions.
-    Cloudera' <link xlink:href="http://archive.cloudera.com/docs/">CDH3</link>
-    is Apache Hadoop 0.20.x plus patches including all of the 
-    <link xlink:href="http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append/">branch-0.20-append</link>
-    additions needed to add a durable sync. Use the released, most recent version of CDH3.</para>
-    <para>
-    <link xlink:href="http://www.mapr.com/">MapR</link>
-    includes a commercial, reimplementation of HDFS.
-    It has a durable sync as well as some other interesting features that are not
-    yet in Apache Hadoop.  Their <link xlink:href="http://www.mapr.com/products/mapr-editions/m3-edition">M3</link>
-    product is free to use and unlimited.
-    </para>
-
-        <para>Because HBase depends on Hadoop, it bundles an instance of the
-        Hadoop jar under its <filename>lib</filename> directory. The bundled jar is ONLY for use in standalone mode.
-        In distributed mode, it is <emphasis>critical</emphasis> that the version of Hadoop that is out
-        on your cluster match what is under HBase.  Replace the hadoop jar found in the HBase
-        <filename>lib</filename> directory with the hadoop jar you are running on
-        your cluster to avoid version mismatch issues. Make sure you
-        replace the jar in HBase everywhere on your cluster.  Hadoop version
-        mismatch issues have various manifestations but often all looks like
-        its hung up.</para>
-
+        description for the <varname>dfs.support.append</varname> configuration.
+     </para>
+     </section>
        <section xml:id="hadoop.security">
-          <title>Hadoop Security</title>
-          <para>HBase will run on any Hadoop 0.20.x that incorporates Hadoop
-          security features -- e.g. Y! 0.20S or CDH3B3 -- as long as you do as
+          <title>Apache HBase on Secure Hadoop</title>
+          <para>Apache HBase will run on any Hadoop 0.20.x that incorporates Hadoop
+          security features as long as you do as
           suggested above and replace the Hadoop jar that ships with HBase
-          with the secure version.</para>
+          with the secure version.  If you want to read more about how to set up
+          Secure HBase, see <xref linkend="hbase.secure.configuration" />.</para>
        </section>
-           
+
        <section xml:id="dfs.datanode.max.xcievers">
         <title><varname>dfs.datanode.max.xcievers</varname><indexterm>
             <primary>xcievers</primary>
@@ -331,9 +324,12 @@ to ensure well-formedness of your docume
         java.io.IOException: No live nodes contain current block. Will get new
         block locations from namenode and retry...</code>
         <footnote><para>See <link xlink:href="http://ccgtech.blogspot.com/2010/02/hadoop-hdfs-deceived-by-xciever.html">Hadoop HDFS: Deceived by Xciever</link> for an informative rant on xceivering.</para></footnote></para>
+       <para>See also <xref linkend="casestudies.xceivers"/>.
+       </para>
       </section>
-     
+
      </section>    <!--  hadoop -->
+     </section>
 
     <section xml:id="standalone_dist">
       <title>HBase run modes: Standalone and Distributed</title>
@@ -376,7 +372,7 @@ to ensure well-formedness of your docume
 
         <para>Distributed modes require an instance of the <emphasis>Hadoop
         Distributed File System</emphasis> (HDFS). See the Hadoop <link
-        xlink:href="http://hadoop.apache.org/common/docs/current/api/overview-summary.html#overview_description">
+        xlink:href="http://hadoop.apache.org/common/docs/r1.1.1/api/overview-summary.html#overview_description">
         requirements and instructions</link> for how to set up a HDFS. Before
         proceeding, ensure you have an appropriate, working HDFS.</para>
 
@@ -395,57 +391,92 @@ to ensure well-formedness of your docume
           HBase. Do not use this configuration for production nor for
           evaluating HBase performance.</para>
 
-          <para>Once you have confirmed your HDFS setup, edit
-          <filename>conf/hbase-site.xml</filename>. This is the file into
+	      <para>First, set up your HDFS in <link xlink:href="http://hadoop.apache.org/docs/r1.0.3/single_node_setup.html">pseudo-distributed mode</link>.
+   	      </para>
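+	      <para>For orientation only (the linked Hadoop guide is authoritative for your Hadoop
+	      version), a pseudo-distributed HDFS typically carries settings along these lines;
+	      the host and port are examples:
+<programlisting>
+&lt;!-- core-site.xml --&gt;
+&lt;property&gt;
+  &lt;name&gt;fs.default.name&lt;/name&gt;
+  &lt;value&gt;hdfs://localhost:8020/&lt;/value&gt;
+&lt;/property&gt;
+
+&lt;!-- hdfs-site.xml --&gt;
+&lt;property&gt;
+  &lt;name&gt;dfs.replication&lt;/name&gt;
+  &lt;value&gt;1&lt;/value&gt;
+&lt;/property&gt;
+</programlisting>
+	      </para>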
+	      <para>Next, configure HBase.  Below is an example <filename>conf/hbase-site.xml</filename>.
+          This is the file into
           which you add local customizations and overrides for
-          <xreg linkend="hbase_default_configurations" /> and <xref linkend="hdfs_client_conf" />. Point HBase at the running Hadoop HDFS
-          instance by setting the <varname>hbase.rootdir</varname> property.
-          This property points HBase at the Hadoop filesystem instance to use.
-          For example, adding the properties below to your
-          <filename>hbase-site.xml</filename> says that HBase should use the
-          <filename>/hbase</filename> directory in the HDFS whose namenode is
-          at port 8020 on your local machine, and that it should run with one
-          replica only (recommended for pseudo-distributed mode):</para>
+          <xref linkend="hbase_default_configurations" /> and <xref linkend="hdfs_client_conf" />.
+              Note that the <varname>hbase.rootdir</varname> property points to the
+              local HDFS instance.
+   		  </para>
 
-          <programlisting>
+          <para>Now skip to <xref linkend="confirm" /> for how to start and verify your
+          pseudo-distributed install. <footnote>
+              <para>See <xref linkend="pseudo.extras">Pseudo-distributed
+              mode extras</xref> for notes on how to start extra Masters and
+              RegionServers when running pseudo-distributed.</para>
+            </footnote></para>
+
+          <note>
+            <para>Let HBase create the <varname>hbase.rootdir</varname>
+            directory. If you don't, you'll get a warning saying HBase needs a
+            migration run because the directory is missing files expected by
+            HBase (it'll create them if you let it).</para>
+          </note>
+
+  		  <section xml:id="pseudo.config">
+  		  	<title>Pseudo-distributed Configuration File</title>
+			<para>Below is a sample pseudo-distributed <filename>hbase-site.xml</filename> for the node <varname>h-24-30.example.com</varname>.
+<programlisting>
 &lt;configuration&gt;
   ...
   &lt;property&gt;
     &lt;name&gt;hbase.rootdir&lt;/name&gt;
-    &lt;value&gt;hdfs://localhost:8020/hbase&lt;/value&gt;
-    &lt;description&gt;The directory shared by RegionServers.
-    &lt;/description&gt;
+    &lt;value&gt;hdfs://h-24-30.example.com:8020/hbase&lt;/value&gt;
   &lt;/property&gt;
   &lt;property&gt;
-    &lt;name&gt;dfs.replication&lt;/name&gt;
-    &lt;value&gt;1&lt;/value&gt;
-    &lt;description&gt;The replication count for HLog and HFile storage. Should not be greater than HDFS datanode count.
-    &lt;/description&gt;
+    &lt;name&gt;hbase.cluster.distributed&lt;/name&gt;
+    &lt;value&gt;true&lt;/value&gt;
+  &lt;/property&gt;
+  &lt;property&gt;
+    &lt;name&gt;hbase.zookeeper.quorum&lt;/name&gt;
+    &lt;value&gt;h-24-30.example.com&lt;/value&gt;
   &lt;/property&gt;
   ...
 &lt;/configuration&gt;
 </programlisting>
+</para>
 
-          <note>
-            <para>Let HBase create the <varname>hbase.rootdir</varname>
-            directory. If you don't, you'll get warning saying HBase needs a
-            migration run because the directory is missing files expected by
-            HBase (it'll create them if you let it).</para>
-          </note>
+  		  </section>
 
-          <note>
-            <para>Above we bind to <varname>localhost</varname>. This means
-            that a remote client cannot connect. Amend accordingly, if you
-            want to connect from a remote location.</para>
-          </note>
+		  <section xml:id="pseudo.extras">
+		    <title>Pseudo-distributed Extras</title>
+
+		  <section xml:id="pseudo.extras.start">
+		  	<title>Startup</title>
+		    	<para>To start up the initial HBase cluster...
+                   <programlisting>% bin/start-hbase.sh</programlisting>
+                </para>
+            	<para>To start up an extra backup master on the same server, run...
+                       <programlisting>% bin/local-master-backup.sh start 1</programlisting>
+                       ... the '1' means use ports 60001 &amp; 60011, and this backup master's logfile will be at <filename>logs/hbase-${USER}-1-master-${HOSTNAME}.log</filename>.
+                </para>
+                <para>To start up multiple backup masters, run... <programlisting>% bin/local-master-backup.sh start 2 3</programlisting> You can start up to 9 backup masters (10 total).
+ 				</para>
+				<para>To start up more regionservers...
+     			  <programlisting>% bin/local-regionservers.sh start 1</programlisting>
+     			where '1' means use ports 60201 &amp; 60301 and its logfile will be at <filename>logs/hbase-${USER}-1-regionserver-${HOSTNAME}.log</filename>.
+     			</para>
+     			<para>To add 4 more regionservers in addition to the one you just started, run... <programlisting>% bin/local-regionservers.sh start 2 3 4 5</programlisting>
+     			This supports up to 99 extra regionservers (100 total).
+				</para>
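+				<para>To confirm what is running, <command>jps</command> is handy; the process ids
+				below are examples, and the exact set of daemons depends on what you started:
+				<programlisting>% jps
+12345 HMaster
+12346 HQuorumPeer
+12347 HRegionServer</programlisting>
+				</para>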
+			</section>
+			<section xml:id="pseudo.options.stop">
+		  	<title>Stop</title>
+    			<para>Assuming you want to stop master backup # 1, run...
+            	<programlisting>% cat /tmp/hbase-${USER}-1-master.pid |xargs kill -9</programlisting>
+            	Note that <command>bin/local-master-backup.sh stop 1</command> will try to stop the cluster along with the master.
+            	</para>
+            	<para>To stop an individual regionserver, run...
+                	<programlisting>% bin/local-regionservers.sh stop 1
+	                </programlisting>
+				</para>
+			</section>
+
+		  </section>
 
-          <para>Now skip to <xref linkend="confirm" /> for how to start and verify your
-          pseudo-distributed install. <footnote>
-              <para>See <link
-              xlink:href="http://hbase.apache.org/pseudo-distributed.html">Pseudo-distributed
-              mode extras</link> for notes on how to start extra Masters and
-              RegionServers when running pseudo-distributed.</para>
-            </footnote></para>
         </section>
 
         <section xml:id="fully_dist">
@@ -542,7 +573,7 @@ to ensure well-formedness of your docume
       <section xml:id="confirm">
         <title>Running and Confirming Your Installation</title>
 
-         
+
 
         <para>Make sure HDFS is running first. Start and stop the Hadoop HDFS
         daemons by running <filename>bin/start-hdfs.sh</filename> over in the
@@ -552,31 +583,31 @@ to ensure well-formedness of your docume
         not normally use the mapreduce daemons. These do not need to be
         started.</para>
 
-         
+
 
         <para><emphasis>If</emphasis> you are managing your own ZooKeeper,
         start it and confirm its running else, HBase will start up ZooKeeper
         for you as part of its start process.</para>
 
-         
+
 
         <para>Start HBase with the following command:</para>
 
-         
+
 
         <programlisting>bin/start-hbase.sh</programlisting>
 
-         Run the above from the 
+         Run the above from the
 
         <varname>HBASE_HOME</varname>
 
-         directory. 
+         directory.
 
         <para>You should now have a running HBase instance. HBase logs can be
         found in the <filename>logs</filename> subdirectory. Check them out
         especially if HBase had trouble starting.</para>
 
-         
+
 
         <para>HBase also puts up a UI listing vital attributes. By default its
         deployed on the Master host at port 60010 (HBase RegionServers listen
@@ -586,13 +617,13 @@ to ensure well-formedness of your docume
         Master's homepage you'd point your browser at
         <filename>http://master.example.org:60010</filename>.</para>
 
-         
+
 
     <para>Once HBase has started, see the <xref linkend="shell_exercises" /> for how to
         create tables, add data, scan your insertions, and finally disable and
         drop your tables.</para>
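+    <para>As a minimal smoke test (the table and column family names are arbitrary examples),
+        the same flow can be run directly from the shell:
+        <programlisting>$ ./bin/hbase shell
+hbase> create 'test', 'cf'
+hbase> put 'test', 'row1', 'cf:a', 'value1'
+hbase> scan 'test'
+hbase> disable 'test'
+hbase> drop 'test'</programlisting>
+        </para>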
 
-         
+
 
         <para>To stop HBase after exiting the HBase shell enter
         <programlisting>$ ./bin/stop-hbase.sh
@@ -602,574 +633,15 @@ stopping hbase...............</programli
         until HBase has shut down completely before stopping the Hadoop
         daemons.</para>
 
-         
+
       </section>
      </section>    <!--  run modes -->
-    
-     <section xml:id="zookeeper">
-            <title>ZooKeeper<indexterm>
-                <primary>ZooKeeper</primary>
-              </indexterm></title>
-
-            <para>A distributed HBase depends on a running ZooKeeper cluster.
-            All participating nodes and clients need to be able to access the
-            running ZooKeeper ensemble. HBase by default manages a ZooKeeper
-            "cluster" for you. It will start and stop the ZooKeeper ensemble
-            as part of the HBase start/stop process. You can also manage the
-            ZooKeeper ensemble independent of HBase and just point HBase at
-            the cluster it should use. To toggle HBase management of
-            ZooKeeper, use the <varname>HBASE_MANAGES_ZK</varname> variable in
-            <filename>conf/hbase-env.sh</filename>. This variable, which
-            defaults to <varname>true</varname>, tells HBase whether to
-            start/stop the ZooKeeper ensemble servers as part of HBase
-            start/stop.</para>
-
-            <para>When HBase manages the ZooKeeper ensemble, you can specify
-            ZooKeeper configuration using its native
-            <filename>zoo.cfg</filename> file, or, the easier option is to
-            just specify ZooKeeper options directly in
-            <filename>conf/hbase-site.xml</filename>. A ZooKeeper
-            configuration option can be set as a property in the HBase
-            <filename>hbase-site.xml</filename> XML configuration file by
-            prefacing the ZooKeeper option name with
-            <varname>hbase.zookeeper.property</varname>. For example, the
-            <varname>clientPort</varname> setting in ZooKeeper can be changed
-            by setting the
-            <varname>hbase.zookeeper.property.clientPort</varname> property.
-            For all default values used by HBase, including ZooKeeper
-            configuration, see <xref linkend="hbase_default_configurations" />. Look for the
-            <varname>hbase.zookeeper.property</varname> prefix <footnote>
-                <para>For the full list of ZooKeeper configurations, see
-                ZooKeeper's <filename>zoo.cfg</filename>. HBase does not ship
-                with a <filename>zoo.cfg</filename> so you will need to browse
-                the <filename>conf</filename> directory in an appropriate
-                ZooKeeper download.</para>
-              </footnote></para>
-
-            <para>You must at least list the ensemble servers in
-            <filename>hbase-site.xml</filename> using the
-            <varname>hbase.zookeeper.quorum</varname> property. This property
-            defaults to a single ensemble member at
-            <varname>localhost</varname> which is not suitable for a fully
-            distributed HBase. (It binds to the local machine only and remote
-            clients will not be able to connect). <note xml:id="how_many_zks">
-                <title>How many ZooKeepers should I run?</title>
-
-                <para>You can run a ZooKeeper ensemble that comprises 1 node
-                only but in production it is recommended that you run a
-                ZooKeeper ensemble of 3, 5 or 7 machines; the more members an
-                ensemble has, the more tolerant the ensemble is of host
-                failures. Also, run an odd number of machines. In ZooKeeper, 
-                an even number of peers is supported, but it is normally not used 
-                because an even sized ensemble requires, proportionally, more peers 
-                to form a quorum than an odd sized ensemble requires. For example, an 
-                ensemble with 4 peers requires 3 to form a quorum, while an ensemble with 
-                5 also requires 3 to form a quorum. Thus, an ensemble of 5 allows 2 peers to 
-                fail, and thus is more fault tolerant than the ensemble of 4, which allows 
-                only 1 down peer.                 
-                </para>
-                <para>Give each ZooKeeper server around 1GB of RAM, and if possible, its own
-                dedicated disk (A dedicated disk is the best thing you can do
-                to ensure a performant ZooKeeper ensemble). For very heavily
-                loaded clusters, run ZooKeeper servers on separate machines
-                from RegionServers (DataNodes and TaskTrackers).</para>
-              </note></para>
-
-            <para>For example, to have HBase manage a ZooKeeper quorum on
-            nodes <emphasis>rs{1,2,3,4,5}.example.com</emphasis>, bound to
-            port 2222 (the default is 2181) ensure
-            <varname>HBASE_MANAGE_ZK</varname> is commented out or set to
-            <varname>true</varname> in <filename>conf/hbase-env.sh</filename>
-            and then edit <filename>conf/hbase-site.xml</filename> and set
-            <varname>hbase.zookeeper.property.clientPort</varname> and
-            <varname>hbase.zookeeper.quorum</varname>. You should also set
-            <varname>hbase.zookeeper.property.dataDir</varname> to other than
-            the default as the default has ZooKeeper persist data under
-            <filename>/tmp</filename> which is often cleared on system
-            restart. In the example below we have ZooKeeper persist to
-            <filename>/user/local/zookeeper</filename>. <programlisting>
-  &lt;configuration&gt;
-    ...
-    &lt;property&gt;
-      &lt;name&gt;hbase.zookeeper.property.clientPort&lt;/name&gt;
-      &lt;value&gt;2222&lt;/value&gt;
-      &lt;description&gt;Property from ZooKeeper's config zoo.cfg.
-      The port at which the clients will connect.
-      &lt;/description&gt;
-    &lt;/property&gt;
-    &lt;property&gt;
-      &lt;name&gt;hbase.zookeeper.quorum&lt;/name&gt;
-      &lt;value&gt;rs1.example.com,rs2.example.com,rs3.example.com,rs4.example.com,rs5.example.com&lt;/value&gt;
-      &lt;description&gt;Comma separated list of servers in the ZooKeeper Quorum.
-      For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
-      By default this is set to localhost for local and pseudo-distributed modes
-      of operation. For a fully-distributed setup, this should be set to a full
-      list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
-      this is the list of servers which we will start/stop ZooKeeper on.
-      &lt;/description&gt;
-    &lt;/property&gt;
-    &lt;property&gt;
-      &lt;name&gt;hbase.zookeeper.property.dataDir&lt;/name&gt;
-      &lt;value&gt;/usr/local/zookeeper&lt;/value&gt;
-      &lt;description&gt;Property from ZooKeeper's config zoo.cfg.
-      The directory where the snapshot is stored.
-      &lt;/description&gt;
-    &lt;/property&gt;
-    ...
-  &lt;/configuration&gt;</programlisting></para>
-
-            <section>
-              <title>Using existing ZooKeeper ensemble</title>
-
-              <para>To point HBase at an existing ZooKeeper cluster, one that
-              is not managed by HBase, set <varname>HBASE_MANAGES_ZK</varname>
-              in <filename>conf/hbase-env.sh</filename> to false
-              <programlisting>
-  ...
-  # Tell HBase whether it should manage it's own instance of Zookeeper or not.
-  export HBASE_MANAGES_ZK=false</programlisting> Next set ensemble locations
-              and client port, if non-standard, in
-              <filename>hbase-site.xml</filename>, or add a suitably
-              configured <filename>zoo.cfg</filename> to HBase's
-              <filename>CLASSPATH</filename>. HBase will prefer the
-              configuration found in <filename>zoo.cfg</filename> over any
-              settings in <filename>hbase-site.xml</filename>.</para>
-
-              <para>When HBase manages ZooKeeper, it will start/stop the
-              ZooKeeper servers as a part of the regular start/stop scripts.
-              If you would like to run ZooKeeper yourself, independent of
-              HBase start/stop, you would do the following</para>
 
-              <programlisting>
-${HBASE_HOME}/bin/hbase-daemons.sh {start,stop} zookeeper
-</programlisting>
 
-              <para>Note that you can use HBase in this manner to spin up a
-              ZooKeeper cluster, unrelated to HBase. Just make sure to set
-              <varname>HBASE_MANAGES_ZK</varname> to <varname>false</varname>
-              if you want it to stay up across HBase restarts so that when
-              HBase shuts down, it doesn't take ZooKeeper down with it.</para>
-
-              <para>For more information about running a distinct ZooKeeper
-              cluster, see the ZooKeeper <link
-              xlink:href="http://hadoop.apache.org/zookeeper/docs/current/zookeeperStarted.html">Getting
-              Started Guide</link>.  Additionally, see the <link xlink:href="http://wiki.apache.org/hadoop/ZooKeeper/FAQ#A7">ZooKeeper Wiki</link> or the 
-          <link xlink:href="http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#sc_zkMulitServerSetup">ZooKeeper documentation</link> 
-          for more information on ZooKeeper sizing.
-            </para>
-            </section>
-            
 
-            <section xml:id="zk.sasl.auth">
-              <title>SASL Authentication with ZooKeeper</title>
-              <para>Newer releases of HBase (&gt;= 0.92) will
-              support connecting to a ZooKeeper Quorum that supports
-              SASL authentication (which is available in Zookeeper
-              versions 3.4.0 or later).</para>
-              
-              <para>This describes how to set up HBase to mutually
-              authenticate with a ZooKeeper Quorum. ZooKeeper/HBase
-              mutual authentication (<link
-              xlink:href="https://issues.apache.org/jira/browse/HBASE-2418">HBASE-2418</link>)
-              is required as part of a complete secure HBase configuration
-              (<link
-              xlink:href="https://issues.apache.org/jira/browse/HBASE-3025">HBASE-3025</link>).
-
-              For simplicity of explication, this section ignores
-              additional configuration required (Secure HDFS and Coprocessor
-              configuration).  It's recommended to begin with an
-              HBase-managed Zookeeper configuration (as opposed to a
-              standalone Zookeeper quorum) for ease of learning.
-              </para>
-
-              <section><title>Operating System Prerequisites</title></section>
-
-              <para>
-                  You need to have a working Kerberos KDC setup. For
-                  each <code>$HOST</code> that will run a ZooKeeper
-                  server, you should have a principle
-                  <code>zookeeper/$HOST</code>.  For each such host,
-                  add a service key (using the <code>kadmin</code> or
-                  <code>kadmin.local</code> tool's <code>ktadd</code>
-                  command) for <code>zookeeper/$HOST</code> and copy
-                  this file to <code>$HOST</code>, and make it
-                  readable only to the user that will run zookeeper on
-                  <code>$HOST</code>. Note the location of this file,
-                  which we will use below as
-                  <filename>$PATH_TO_ZOOKEEPER_KEYTAB</filename>.
-              </para>
-
-              <para>
-                Similarly, for each <code>$HOST</code> that will run
-                an HBase server (master or regionserver), you should
-                have a principle: <code>hbase/$HOST</code>. For each
-                host, add a keytab file called
-                <filename>hbase.keytab</filename> containing a service
-                key for <code>hbase/$HOST</code>, copy this file to
-                <code>$HOST</code>, and make it readable only to the
-                user that will run an HBase service on
-                <code>$HOST</code>. Note the location of this file,
-                which we will use below as
-                <filename>$PATH_TO_HBASE_KEYTAB</filename>.
-              </para>
-
-              <para>
-                Each user who will be an HBase client should also be
-                given a Kerberos principal. This principal should
-                usually have a password assigned to it (as opposed to,
-                as with the HBase servers, a keytab file) which only
-                this user knows. The client's principal's
-                <code>maxrenewlife</code> should be set so that it can
-                be renewed enough so that the user can complete their
-                HBase client processes. For example, if a user runs a
-                long-running HBase client process that takes at most 3
-                days, we might create this user's principal within
-                <code>kadmin</code> with: <code>addprinc -maxrenewlife
-                3days</code>. The Zookeeper client and server
-                libraries manage their own ticket refreshment by
-                running threads that wake up periodically to do the
-                refreshment.
-              </para>
-
-                <para>On each host that will run an HBase client
-                (e.g. <code>hbase shell</code>), add the following
-                file to the HBase home directory's <filename>conf</filename>
-                directory:</para>
-
-                <programlisting>
-                  Client {
-                    com.sun.security.auth.module.Krb5LoginModule required
-                    useKeyTab=false
-                    useTicketCache=true;
-                  };
-                </programlisting>
-
-                <para>We'll refer to this JAAS configuration file as
-                <filename>$CLIENT_CONF</filename> below.</para>
-
-              <section>
-                <title>HBase-managed Zookeeper Configuration</title>
-
-                <para>On each node that will run a zookeeper, a
-                master, or a regionserver, create a <link
-                xlink:href="http://docs.oracle.com/javase/1.4.2/docs/guide/security/jgss/tutorials/LoginConfigFile.html">JAAS</link>
-                configuration file in the conf directory of the node's
-                <filename>HBASE_HOME</filename> directory that looks like the
-                following:</para>
-
-                <programlisting>
-                  Server {
-                    com.sun.security.auth.module.Krb5LoginModule required
-                    useKeyTab=true
-                    keyTab="$PATH_TO_ZOOKEEPER_KEYTAB"
-                    storeKey=true
-                    useTicketCache=false
-                    principal="zookeeper/$HOST";
-                  };
-                  Client {
-                    com.sun.security.auth.module.Krb5LoginModule required
-                    useKeyTab=true
-                    useTicketCache=false
-                    keyTab="$PATH_TO_HBASE_KEYTAB"
-                    principal="hbase/$HOST";
-                  };
-                </programlisting>
-                
-                where the <filename>$PATH_TO_HBASE_KEYTAB</filename> and
-                <filename>$PATH_TO_ZOOKEEPER_KEYTAB</filename> files are what
-                you created above, and <code>$HOST</code> is the hostname for that
-                node.
-
-                <para>The <code>Server</code> section will be used by
-                the Zookeeper quorum server, while the
-                <code>Client</code> section will be used by the HBase
-                master and regionservers. The path to this file should
-                be substituted for the text <filename>$HBASE_SERVER_CONF</filename>
-                in the <filename>hbase-env.sh</filename>
-                listing below.</para>
-
-                <para>
-                  The path to this file should be substituted for the
-                  text <filename>$CLIENT_CONF</filename> in the
-                  <filename>hbase-env.sh</filename> listing below.
-                </para>
-
-                <para>Modify your <filename>hbase-env.sh</filename> to include the
-                following:</para>
-
-                <programlisting>
-                  export HBASE_OPTS="-Djava.security.auth.login.config=$CLIENT_CONF"
-                  export HBASE_MANAGES_ZK=true
-                  export HBASE_ZOOKEEPER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
-                  export HBASE_MASTER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
-                  export HBASE_REGIONSERVER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
-                </programlisting>
-
-                where <filename>$HBASE_SERVER_CONF</filename> and
-                <filename>$CLIENT_CONF</filename> are the full paths to the
-                JAAS configuration files created above.
-
-                <para>Modify your <filename>hbase-site.xml</filename> on each node
-                that will run zookeeper, master or regionserver to contain:</para>
-
-                <programlisting><![CDATA[
-                  <configuration>
-                    <property>
-                      <name>hbase.zookeeper.quorum</name>
-                      <value>$ZK_NODES</value>
-                    </property>
-                    <property>
-                      <name>hbase.cluster.distributed</name>
-                      <value>true</value>
-                    </property>
-                    <property>
-                      <name>hbase.zookeeper.property.authProvider.1</name>
-                      <value>org.apache.zookeeper.server.auth.SASLAuthenticationProvider</value>
-                    </property>
-                    <property>
-                      <name>hbase.zookeeper.property.kerberos.removeHostFromPrincipal</name>
-                      <value>true</value>
-                    </property>
-                    <property>
-                      <name>hbase.zookeeper.property.kerberos.removeRealmFromPrincipal</name>
-                      <value>true</value>
-                    </property>
-                  </configuration>
-                  ]]></programlisting>
-
-                <para>where <code>$ZK_NODES</code> is the
-                comma-separated list of hostnames of the Zookeeper
-                Quorum hosts.</para>
-
-                <para>Start your hbase cluster by running one or more
-                of the following set of commands on the appropriate
-                hosts:
-                </para>
-
-                <programlisting>
-                  bin/hbase zookeeper start
-                  bin/hbase master start
-                  bin/hbase regionserver start
-                </programlisting>
-
-              </section>
-
-              <section><title>External Zookeeper Configuration</title>
-                <para>Add a JAAS configuration file that looks like:
-
-                <programlisting>
-                  Client {
-                    com.sun.security.auth.module.Krb5LoginModule required
-                    useKeyTab=true
-                    useTicketCache=false
-                    keyTab="$PATH_TO_HBASE_KEYTAB"
-                    principal="hbase/$HOST";
-                  };
-                </programlisting>
-
-                where the <filename>$PATH_TO_HBASE_KEYTAB</filename> is the keytab 
-                created above for HBase services to run on this host, and <code>$HOST</code> is the
-                hostname for that node. Put this in the HBase home's
-                configuration directory. We'll refer to this file's
-                full pathname as <filename>$HBASE_SERVER_CONF</filename> below.</para>
-
-                <para>Modify your hbase-env.sh to include the following:</para>
-
-                <programlisting>
-                  export HBASE_OPTS="-Djava.security.auth.login.config=$CLIENT_CONF"
-                  export HBASE_MANAGES_ZK=false
-                  export HBASE_MASTER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
-                  export HBASE_REGIONSERVER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
-                </programlisting>
-
-
-                <para>Modify your <filename>hbase-site.xml</filename> on each node
-                that will run a master or regionserver to contain:</para>
-
-                <programlisting><![CDATA[
-                  <configuration>
-                    <property>
-                      <name>hbase.zookeeper.quorum</name>
-                      <value>$ZK_NODES</value>
-                    </property>
-                    <property>
-                      <name>hbase.cluster.distributed</name>
-                      <value>true</value>
-                    </property>
-                  </configuration>
-                  ]]>
-                </programlisting>
-
-                <para>where <code>$ZK_NODES</code> is the
-                comma-separated list of hostnames of the Zookeeper
-                Quorum hosts.</para>
-
-                <para>
-                  Add a <filename>zoo.cfg</filename> for each Zookeeper Quorum host containing:
-                  <programlisting>
-                      authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
-                      kerberos.removeHostFromPrincipal=true
-                      kerberos.removeRealmFromPrincipal=true
-                  </programlisting>
-
-                  Also on each of these hosts, create a JAAS configuration file containing:
-
-                  <programlisting>
-                  Server {
-                    com.sun.security.auth.module.Krb5LoginModule required
-                    useKeyTab=true
-                    keyTab="$PATH_TO_ZOOKEEPER_KEYTAB"
-                    storeKey=true
-                    useTicketCache=false
-                    principal="zookeeper/$HOST";
-                  };
-                  </programlisting>
-
-                  where <code>$HOST</code> is the hostname of each
-                  Quorum host. We will refer to the full pathname of
-                  this file as <filename>$ZK_SERVER_CONF</filename> below.
-
-                </para>
-
-                <para>
-                  Start your Zookeepers on each Zookeeper Quorum host with:
-                  
-                  <programlisting>
-                    SERVER_JVMFLAGS="-Djava.security.auth.login.config=$ZK_SERVER_CONF" bin/zkServer start
-                  </programlisting>
-
-                </para>
-
-                <para>
-                  Start your HBase cluster by running one or more of the following set of commands on the appropriate nodes:
-                </para>
-
-                <programlisting>
-                  bin/hbase master start
-                  bin/hbase regionserver start
-                </programlisting>
-
-
-              </section>
-
-              <section>
-                <title>Zookeeper Server Authentication Log Output</title>
-                <para>If the configuration above is successful,
-                you should see something similar to the following in
-                your Zookeeper server logs:
-                <programlisting>
-11/12/05 22:43:39 INFO zookeeper.Login: successfully logged in.
-11/12/05 22:43:39 INFO server.NIOServerCnxnFactory: binding to port 0.0.0.0/0.0.0.0:2181
-11/12/05 22:43:39 INFO zookeeper.Login: TGT refresh thread started.
-11/12/05 22:43:39 INFO zookeeper.Login: TGT valid starting at:        Mon Dec 05 22:43:39 UTC 2011
-11/12/05 22:43:39 INFO zookeeper.Login: TGT expires:                  Tue Dec 06 22:43:39 UTC 2011
-11/12/05 22:43:39 INFO zookeeper.Login: TGT refresh sleeping until: Tue Dec 06 18:36:42 UTC 2011
-..
-11/12/05 22:43:59 INFO auth.SaslServerCallbackHandler: 
-  Successfully authenticated client: authenticationID=hbase/ip-10-166-175-249.us-west-1.compute.internal@HADOOP.LOCALDOMAIN; 
-  authorizationID=hbase/ip-10-166-175-249.us-west-1.compute.internal@HADOOP.LOCALDOMAIN.
-11/12/05 22:43:59 INFO auth.SaslServerCallbackHandler: Setting authorizedID: hbase
-11/12/05 22:43:59 INFO server.ZooKeeperServer: adding SASL authorization for authorizationID: hbase
-                </programlisting>
-                  
-                </para>
-
-              </section>
-
-              <section>
-                <title>Zookeeper Client Authentication Log Output</title>
-                <para>On the Zookeeper client side (HBase master or regionserver),
-                you should see something similar to the following:
-
-                <programlisting>
-11/12/05 22:43:59 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=ip-10-166-175-249.us-west-1.compute.internal:2181 sessionTimeout=180000 watcher=master:60000
-11/12/05 22:43:59 INFO zookeeper.ClientCnxn: Opening socket connection to server /10.166.175.249:2181
-11/12/05 22:43:59 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 14851@ip-10-166-175-249
-11/12/05 22:43:59 INFO zookeeper.Login: successfully logged in.
-11/12/05 22:43:59 INFO client.ZooKeeperSaslClient: Client will use GSSAPI as SASL mechanism.
-11/12/05 22:43:59 INFO zookeeper.Login: TGT refresh thread started.
-11/12/05 22:43:59 INFO zookeeper.ClientCnxn: Socket connection established to ip-10-166-175-249.us-west-1.compute.internal/10.166.175.249:2181, initiating session
-11/12/05 22:43:59 INFO zookeeper.Login: TGT valid starting at:        Mon Dec 05 22:43:59 UTC 2011
-11/12/05 22:43:59 INFO zookeeper.Login: TGT expires:                  Tue Dec 06 22:43:59 UTC 2011
-11/12/05 22:43:59 INFO zookeeper.Login: TGT refresh sleeping until: Tue Dec 06 18:30:37 UTC 2011
-11/12/05 22:43:59 INFO zookeeper.ClientCnxn: Session establishment complete on server ip-10-166-175-249.us-west-1.compute.internal/10.166.175.249:2181, sessionid = 0x134106594320000, negotiated timeout = 180000
-                </programlisting>
-                </para>
-              </section>
-
-              <section>
-                <title>Configuration from Scratch</title>
-
-                This has been tested on the current standard Amazon
-                Linux AMI.  First setup KDC and principals as
-                described above. Next checkout code and run a sanity
-                check.
-                
-                <programlisting>
-                git clone git://git.apache.org/hbase.git
-                cd hbase
-                mvn -Psecurity,localTests clean test -Dtest=TestZooKeeperACL
-                </programlisting>
-
-                Then configure HBase as described above.
-                Manually edit target/cached_classpath.txt (see below)..
-
-                <programlisting>
-                bin/hbase zookeeper &amp;
-                bin/hbase master &amp;
-                bin/hbase regionserver &amp;
-                </programlisting>
-              </section>
-
-
-              <section>
-                <title>Future improvements</title>
-
-                <section><title>Fix target/cached_classpath.txt</title>
-                <para>
-                You must override the standard hadoop-core jar file from the
-                <code>target/cached_classpath.txt</code>
-                file with the version containing the HADOOP-7070 fix. You can use the following script to do this:
-
-                <programlisting>
-                  echo `find ~/.m2 -name "*hadoop-core*7070*SNAPSHOT.jar"` ':' `cat target/cached_classpath.txt` | sed 's/ //g' > target/tmp.txt 
-                  mv target/tmp.txt target/cached_classpath.txt
-                </programlisting>
-
-                </para>
-
-                </section>
-
-                <section>
-                  <title>Set JAAS configuration
-                  programmatically</title> 
-
-
-                  This would avoid the need for a separate Hadoop jar
-                  that fixes <link xlink:href="https://issues.apache.org/jira/browse/HADOOP-7070">HADOOP-7070</link>.
-                </section>
-                
-                <section>
-                  <title>Elimination of 
-                  <code>kerberos.removeHostFromPrincipal</code> and 
-                  <code>kerberos.removeRealmFromPrincipal</code></title>
-                </section>
-                
-              </section>
-
-
-            </section> <!-- SASL Authentication with ZooKeeper -->
-
-
-
-
-
-     </section>     <!--  zookeeper -->        
-    
-    
-    <section xml:id="config.files">    
+    <section xml:id="config.files">
          <title>Configuration Files</title>
-         
+
     <section xml:id="hbase.site">
     <title><filename>hbase-site.xml</filename> and <filename>hbase-default.xml</filename></title>
     <para>Just as in Hadoop where you add site-specific HDFS configuration
@@ -1197,7 +669,7 @@ ${HBASE_HOME}/bin/hbase-daemons.sh {star
     The generated file is a docbook section with a glossary
     in it-->
     <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
-      href="../../target/site/hbase-default.xml" />
+        href="../../hbase-common/src/main/resources/hbase-default.xml" />
     </section>
 
       <section xml:id="hbase.env.sh">
@@ -1242,8 +714,17 @@ ${HBASE_HOME}/bin/hbase-daemons.sh {star
             used by tests).
       </para>
       <para>
-          Minimally, a client of HBase needs the hbase, hadoop, log4j, commons-logging, commons-lang,
-          and ZooKeeper jars in its <varname>CLASSPATH</varname> connecting to a cluster.
+          Minimally, a client of HBase needs several libraries in its <varname>CLASSPATH</varname> when connecting to a cluster, including:
+          <programlisting>
+commons-configuration (commons-configuration-1.6.jar)
+commons-lang (commons-lang-2.5.jar)
+commons-logging (commons-logging-1.1.1.jar)
+hadoop-core (hadoop-core-1.0.0.jar)
+hbase (hbase-0.92.0.jar)
+log4j (log4j-1.2.16.jar)
+slf4j-api (slf4j-api-1.5.8.jar)
+slf4j-log4j (slf4j-log4j12-1.5.8.jar)
+zookeeper (zookeeper-3.4.2.jar)</programlisting>
       </para>
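+      <para>
+          As a convenience -- this is a sketch only, assuming a local HBase install whose
+          <command>bin/hbase</command> script is available -- the classpath HBase itself computes can be
+          echoed and reused when launching a client (here <classname>MyClientApp</classname> is a
+          hypothetical client class):
+          <programlisting>$ java -cp `${HBASE_HOME}/bin/hbase classpath` MyClientApp</programlisting>
+          The exact jar versions pulled in will depend on your HBase and Hadoop builds.
+      </para>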
         <para>
           An example basic <filename>hbase-site.xml</filename> for client only
@@ -1261,7 +742,7 @@ ${HBASE_HOME}/bin/hbase-daemons.sh {star
 </configuration>
 ]]></programlisting>
         </para>
-        
+
         <section xml:id="java.client.config">
         <title>Java client configuration</title>
         <para>The configuration used by a Java client is kept
@@ -1270,15 +751,15 @@ ${HBASE_HOME}/bin/hbase-daemons.sh {star
         on invocation, will read in the content of the first <filename>hbase-site.xml</filename> found on
         the client's <varname>CLASSPATH</varname>, if one is present
         (Invocation will also factor in any <filename>hbase-default.xml</filename> found;
-        an hbase-default.xml ships inside the <filename>hbase.X.X.X.jar</filename>). 
+        an hbase-default.xml ships inside the <filename>hbase.X.X.X.jar</filename>).
         It is also possible to specify configuration directly without having to read from a
         <filename>hbase-site.xml</filename>.  For example, to set the ZooKeeper
         ensemble for the cluster programmatically do as follows:
         <programlisting>Configuration config = HBaseConfiguration.create();
-config.set("hbase.zookeeper.quorum", "localhost");  // Here we are running zookeeper locally</programlisting>    
+config.set("hbase.zookeeper.quorum", "localhost");  // Here we are running zookeeper locally</programlisting>
         If multiple ZooKeeper instances make up your ZooKeeper ensemble,
         they may be specified in a comma-separated list (just as in the <filename>hbase-site.xml</filename> file).
-        This populated <classname>Configuration</classname> instance can then be passed to an 
+        This populated <classname>Configuration</classname> instance can then be passed to an
         <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html">HTable</link>,
         and so on.
         </para>
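+        <para>As an illustrative sketch only -- it assumes a table named <code>myTable</code> with a
+        column family <code>cf</code> already exists, and that the quorum hosts are just examples --
+        the populated configuration might be used like this:
+        <programlisting>Configuration config = HBaseConfiguration.create();
+config.set("hbase.zookeeper.quorum", "zk1.example.org,zk2.example.org,zk3.example.org");
+HTable table = new HTable(config, "myTable");   // the table uses the passed-in configuration
+Put put = new Put(Bytes.toBytes("row1"));
+put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
+table.put(put);                                  // write a cell
+Result result = table.get(new Get(Bytes.toBytes("row1")));  // read it back
+table.close();</programlisting>
+        </para>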
@@ -1286,7 +767,7 @@ config.set("hbase.zookeeper.quorum", "lo
         </section>
 
       </section>  <!--  config files -->
-      
+
       <section xml:id="example_config">
       <title>Example Configurations</title>
 
@@ -1378,7 +859,7 @@ config.set("hbase.zookeeper.quorum", "lo
           1G.</para>
 
           <programlisting>
-    
+
 $ git diff hbase-env.sh
 diff --git a/conf/hbase-env.sh b/conf/hbase-env.sh
 index e70ebc6..96f8c27 100644
@@ -1386,11 +867,11 @@ index e70ebc6..96f8c27 100644
 +++ b/conf/hbase-env.sh
 @@ -31,7 +31,7 @@ export JAVA_HOME=/usr/lib//jvm/java-6-sun/
  # export HBASE_CLASSPATH=
- 
+
  # The maximum amount of heap to use, in MB. Default is 1000.
 -# export HBASE_HEAPSIZE=1000
 +export HBASE_HEAPSIZE=4096
- 
+
  # Extra Java runtime options.
  # Below are what we set by default.  May only work with SUN JVM.
 
@@ -1402,8 +883,8 @@ index e70ebc6..96f8c27 100644
         </section>
       </section>
      </section>       <!-- example config -->
-      
-      
+
+
       <section xml:id="important_configurations">
       <title>The Important Configurations</title>
       <para>Below we list what the <emphasis>important</emphasis>
@@ -1415,9 +896,23 @@ index e70ebc6..96f8c27 100644
       <section xml:id="required_configuration"><title>Required Configurations</title>
           <para>Review the <xref linkend="os" /> and <xref linkend="hadoop" /> sections.
       </para>
+      <section xml:id="big.cluster.config"><title>Big Cluster Configurations</title>
+        <para>If a cluster has a lot of regions, it is possible for an eager-beaver
+            regionserver that checks in soon after master start, while all the rest of the
+            cluster is lagging, to be assigned all of the regions.  With lots of regions,
+            this first server could buckle under the load.  To prevent this scenario,
+            raise <varname>hbase.master.wait.on.regionservers.mintostart</varname> from its
+            default value of 1.  See
+            <link xlink:href="https://issues.apache.org/jira/browse/HBASE-6389">HBASE-6389 Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments</link>
+            for more detail.
+        </para>
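+        <para>For example, a sketch of the corresponding <filename>hbase-site.xml</filename> entry
+            (the value of 3 here is illustrative only; pick a number suited to your cluster size):
+            <programlisting><![CDATA[
+<property>
+  <name>hbase.master.wait.on.regionservers.mintostart</name>
+  <value>3</value>
+</property>]]></programlisting>
+        </para>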
+      </section>
       </section>
 
       <section xml:id="recommended_configurations"><title>Recommended Configurations</title>
+          <section xml:id="recommended_configurations.zk">
+              <title>ZooKeeper Configuration</title>
           <section xml:id="zookeeper.session.timeout"><title><varname>zookeeper.session.timeout</varname></title>
           <para>The default timeout is three minutes (specified in milliseconds). This means
               that if a server crashes, it will be three minutes before the Master notices
@@ -1427,7 +922,7 @@ index e70ebc6..96f8c27 100644
               configuration under control otherwise, a long garbage collection that lasts
               beyond the ZooKeeper session timeout will take out
               your RegionServer (You might be fine with this -- you probably want recovery to start
-          on the server if a RegionServer has been in GC for a long period of time).</para> 
+          on the server if a RegionServer has been in GC for a long period of time).</para>
 
       <para>To change this configuration, edit <filename>hbase-site.xml</filename>,
           copy the changed file around the cluster and restart.</para>
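+      <para>For example, a sketch of an <filename>hbase-site.xml</filename> entry lowering the timeout
+          to one minute (60000ms is illustrative only; weigh it against your GC tuning as noted above):
+          <programlisting><![CDATA[
+<property>
+  <name>zookeeper.session.timeout</name>
+  <value>60000</value>
+</property>]]></programlisting>
+      </para>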
@@ -1444,6 +939,18 @@ index e70ebc6..96f8c27 100644
           <para>See <xref linkend="zookeeper"/>.
           </para>
       </section>
+      </section>
+      <section xml:id="recommended.configurations.hdfs">
+          <title>HDFS Configurations</title>
+          <section xml:id="dfs.datanode.failed.volumes.tolerated">
+              <title>dfs.datanode.failed.volumes.tolerated</title>
+              <para>This is the "...number of volumes that are allowed to fail before a datanode stops offering service. By default
+                  any volume failure will cause a datanode to shutdown" from the <filename>hdfs-default.xml</filename>
+                  description.  If you have more than three or four disks, you might want to set this to 1; if you have many disks,
+                  set it to two or more.
+              </para>
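+              <para>For example, a sketch of the corresponding <filename>hdfs-site.xml</filename> entry
+                  (the value of 1 is illustrative only; choose it based on how many disks each datanode has):
+                  <programlisting><![CDATA[
+<property>
+  <name>dfs.datanode.failed.volumes.tolerated</name>
+  <value>1</value>
+</property>]]></programlisting>
+              </para>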
+          </section>
+      </section>
           <section xml:id="hbase.regionserver.handler.count"><title><varname>hbase.regionserver.handler.count</varname></title>
           <para>
           This setting defines the number of threads that are kept open to answer
@@ -1503,7 +1010,7 @@ index e70ebc6..96f8c27 100644
       cluster (You can always later manually split the big Regions should one prove
       hot and you want to spread the request load over the cluster).  A lower number of regions is
        preferred, generally in the range of 20 to low-hundreds
-       per RegionServer.  Adjust the regionsize as appropriate to achieve this number. 
+       per RegionServer.  Adjust the regionsize as appropriate to achieve this number.
        </para>
        <para>For the 0.90.x codebase, the upper-bound of regionsize is about 4Gb, with a default of 256Mb.
        For 0.92.x codebase, due to the HFile v2 change much larger regionsizes can be supported (e.g., 20Gb).
@@ -1511,10 +1018,58 @@ index e70ebc6..96f8c27 100644
        <para>You may need to experiment with this setting based on your hardware configuration and application needs.
        </para>
        <para>Adjust <code>hbase.hregion.max.filesize</code> in your <filename>hbase-site.xml</filename>.
-       RegionSize can also be set on a per-table basis via 
+       RegionSize can also be set on a per-table basis via
        <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html">HTableDescriptor</link>.
       </para>
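+      <para>For example, a sketch of an <filename>hbase-site.xml</filename> entry raising the maximum
+          region size to 10GB (the value is illustrative only; see the discussion above for choosing a target):
+          <programlisting><![CDATA[
+<property>
+  <name>hbase.hregion.max.filesize</name>
+  <value>10737418240</value>
+</property>]]></programlisting>
+      </para>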
-      
+      <section xml:id="too_many_regions">
+          <title>How many regions per RegionServer?</title>
+          <para>
+              Typically you want to keep your region count low on HBase for numerous reasons.
+              Usually right around 100 regions per RegionServer has yielded the best results.
+              Here are some of the reasons for keeping the region count low:
+              <itemizedlist>
+                  <listitem><para>
+                          MSLAB requires 2MB per memstore (that's 2MB per family per region).
+                          1000 regions that have 2 families each use 3.9GB of heap, and that's before storing any data. NB: the 2MB value is configurable.
+                  </para></listitem>
+                  <listitem><para>If you fill all the regions at roughly the same rate, the global memory usage forces tiny
+                          flushes when you have too many regions, which in turn generates compactions.
+                          Rewriting the same data tens of times is the last thing you want.
+                          As an example, consider filling 1000 regions (with one family) equally, and take a lower bound for global memstore
+                          usage of 5GB (the region server would have a big heap).
+                          Once usage reaches 5GB it will force-flush the biggest region;
+                          at that point nearly all regions will have about 5MB of data, so
+                          that is the amount flushed.  With 5MB more inserted, it will flush another
+                          region that now has a bit over 5MB of data, and so on.
+                          A basic formula for the number of regions to have per region server is:
+                          Heap * upper global memstore limit = amount of heap devoted to memstore;
+                          then divide the heap devoted to memstore by (Number of regions per RS * CFs).
+                          This gives you the rough memstore size if everything is being written to.
+                          A more accurate formula divides the heap devoted to memstore
+                          by (Number of actively written regions per RS * CFs).
+                          This allows a higher region count from the write perspective if you know how many
+                          regions you will be writing to at one time (see the worked example following this list).
+                  </para></listitem>
+                  <listitem><para>The master, as currently written, is allergic to tons of regions, and will
+                          take a lot of time assigning them and moving them around in batches.
+                          The reason is that it's heavy on ZooKeeper usage, and it's not very async
+                          at the moment (this could really be improved -- and has been improved a bunch
+                          in 0.96 HBase).
+                  </para></listitem>
+                  <listitem><para>
+                          In older versions of HBase (pre-HFile v2, 0.90 and earlier), tons of regions
+                          on a few RSs can cause the store file index to grow, raising heap usage and potentially
+                          creating memory pressure or OOME on the RSs.
+                  </para></listitem>
+          </itemizedlist>
+      </para>
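+      <para>A worked sketch of the formula above, using illustrative numbers only
+          (a 16GB region-server heap and an upper global memstore limit of 0.4, a common default):
+          <programlisting>
+16384MB heap * 0.4 upper global memstore limit = ~6554MB of heap devoted to memstore
+6554MB / (200 regions * 1 column family)       = ~33MB per memstore if every region takes writes
+6554MB / (20 actively written regions * 1 CF)  = ~328MB per memstore if only 20 regions take writes
+          </programlisting>
+          In the first case the memstores would be forced into tiny flushes long before they fill;
+          in the second they have comfortable headroom.  Adjust the region count (or the number of
+          actively written regions) accordingly.
+      </para>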
+      <para>Another issue is the effect of the number of regions on mapreduce jobs.
+          Keeping 5 regions per RS would be too low for a job, whereas 1000 will generate too many maps.
+      </para>
+      </section>
+
       </section>
       <section xml:id="disable.splitting">
       <title>Managed Splitting</title>
@@ -1567,23 +1122,30 @@ of all regions.
 </para>
       </section>
       <section xml:id="managed.compactions"><title>Managed Compactions</title>
-      <para>A common administrative technique is to manage major compactions manually, rather than letting 
+      <para>A common administrative technique is to manage major compactions manually, rather than letting
       HBase do it.  By default, <varname>HConstants.MAJOR_COMPACTION_PERIOD</varname> is one day and major compactions
       may kick in when you least desire it - especially on a busy system.  To turn off automatic major compactions set
-      the value to <varname>0</varname>. 
+      the value to <varname>0</varname>.
       </para>
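+      <para>A sketch of the corresponding <filename>hbase-site.xml</filename> entry (the property name
+      assumed here is <varname>hbase.hregion.majorcompaction</varname>, the site-file counterpart of
+      <varname>HConstants.MAJOR_COMPACTION_PERIOD</varname>; verify it against your version's
+      <filename>hbase-default.xml</filename>):
+      <programlisting><![CDATA[
+<property>
+  <name>hbase.hregion.majorcompaction</name>
+  <value>0</value>
+</property>]]></programlisting>
+      </para>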
       <para>It is important to stress that major compactions are absolutely necessary for StoreFile cleanup, the only variant is when
-      they occur.  They can be administered through the HBase shell, or via 
+      they occur.  They can be administered through the HBase shell, or via
       <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#majorCompact%28java.lang.String%29">HBaseAdmin</link>.
       </para>
       <para>For more information about compactions and the compaction file selection process, see <xref linkend="compaction"/></para>
       </section>
-      
+
+      <section xml:id="spec.ex"><title>Speculative Execution</title>
+        <para>Speculative Execution of MapReduce tasks is on by default, and for HBase clusters it is generally advised to turn off
+        Speculative Execution at the system level unless you need it for a specific case, in which case it can be configured per-job.
+        Set the properties <varname>mapred.map.tasks.speculative.execution</varname> and
+        <varname>mapred.reduce.tasks.speculative.execution</varname> to false.
+        </para>
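+        <para>For example, a sketch of the system-level settings in <filename>mapred-site.xml</filename>
+        (the same keys can instead be set to <code>false</code> on an individual job's configuration):
+        <programlisting><![CDATA[
+<property>
+  <name>mapred.map.tasks.speculative.execution</name>
+  <value>false</value>
+</property>
+<property>
+  <name>mapred.reduce.tasks.speculative.execution</name>
+  <value>false</value>
+</property>]]></programlisting>
+        </para>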
+      </section>
       </section>
 
       <section xml:id="other_configuration"><title>Other Configurations</title>
          <section xml:id="balancer_config"><title>Balancer</title>
-           <para>The balancer is periodic operation run on the master to redistribute regions on the cluster.  It is configured via
+           <para>The balancer is a periodic operation which is run on the master to redistribute regions on the cluster.  It is configured via
            <varname>hbase.balancer.period</varname> and defaults to 300000 (5 minutes). </para>
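+           <para>For example, a sketch of an <filename>hbase-site.xml</filename> entry running the balancer
+           every fifteen minutes (900000ms; the value is illustrative only):
+           <programlisting><![CDATA[
+<property>
+  <name>hbase.balancer.period</name>
+  <value>900000</value>
+</property>]]></programlisting>
+           </para>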
            <para>See <xref linkend="master.processes.loadbalancer" /> for more information on the LoadBalancer.
            </para>
@@ -1596,38 +1158,18 @@ of all regions.
            on the size you need by surveying regionserver UIs; you'll see index block size accounted near the
            top of the webpage).</para>
          </section>
-      </section>
-      
-      </section> <!--  important config -->
-
-	  <section xml:id="config.bloom">
-	    <title>Bloom Filter Configuration</title>
-        <section>
-        <title><varname>io.hfile.bloom.enabled</varname> global kill
-        switch</title>
-
-        <para><code>io.hfile.bloom.enabled</code> in
-        <classname>Configuration</classname> serves as the kill switch in case
-        something goes wrong. Default = <varname>true</varname>.</para>
-        </section>
-
-        <section>
-        <title><varname>io.hfile.bloom.error.rate</varname></title>
+    <section xml:id="nagles">
+      <title><link xlink:href="http://en.wikipedia.org/wiki/Nagle's_algorithm">Nagle's</link> or the small package problem</title>
+      <para>If an occasional big delay of around 40ms is seen in operations against HBase,
+      try the Nagle's setting.  For example, see the user mailing list thread,
+      <link xlink:href="http://search-hadoop.com/m/pduLg2fydtE/Inconsistent+scan+performance+with+caching+set+&amp;subj=Re+Inconsistent+scan+performance+with+caching+set+to+1">Inconsistent scan performance with caching set to 1</link>
+      and the issue cited therein, where setting tcpnodelay improved scan speeds.  You might also
+      see the graphs at the tail of <link xlink:href="https://issues.apache.org/jira/browse/HBASE-7008">HBASE-7008 Set scanner caching to a better default</link>,
+      where our Lars Hofhansl tries various data sizes with Nagle's on and off, measuring the effect.</para>
+    </section>
 
-        <para><varname>io.hfile.bloom.error.rate</varname> = average false
-        positive rate. Default = 1%. Decrease rate by ½ (e.g. to .5%) == +1
-        bit per bloom entry.</para>
-        </section>
+      </section>
 
-        <section>
-        <title><varname>io.hfile.bloom.max.fold</varname></title>
+      </section> <!--  important config -->
 
-        <para><varname>io.hfile.bloom.max.fold</varname> = guaranteed minimum
-        fold rate. Most people should leave this alone. Default = 7, or can
-        collapse to at least 1/128th of original size. See the
-        <emphasis>Development Process</emphasis> section of the document <link
-        xlink:href="https://issues.apache.org/jira/secure/attachment/12444007/Bloom_Filters_in_HBase.pdf">BloomFilters
-        in HBase</link> for more on what this option means.</para>
-        </section>
-      </section>
   </chapter>

Modified: hbase/branches/0.94/src/docbkx/customization.xsl
URL: http://svn.apache.org/viewvc/hbase/branches/0.94/src/docbkx/customization.xsl?rev=1455996&r1=1455995&r2=1455996&view=diff
==============================================================================
--- hbase/branches/0.94/src/docbkx/customization.xsl (original)
+++ hbase/branches/0.94/src/docbkx/customization.xsl Wed Mar 13 15:20:19 2013
@@ -20,15 +20,29 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
-
-This stylesheet is used making an html version of hbase-default.xml.
 -->
   <xsl:import href="urn:docbkx:stylesheet"/>
+  <xsl:output method="html" encoding="UTF-8" indent="no"/>
 
   <xsl:template name="user.header.content">
   </xsl:template>
 
   <xsl:template name="user.footer.content">
+<div id="disqus_thread"></div>
+<script type="text/javascript">
+    var disqus_shortname = 'hbase'; // required: replace example with your forum shortname
+    var disqus_url = 'http://hbase.apache.org/book';
+    var disqus_identifier = '<xsl:value-of select="@xml:id" />';
+
+    /* * * DON'T EDIT BELOW THIS LINE * * */
+    (function() {
+        var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
+        dsq.src = 'http://' + disqus_shortname + '.disqus.com/embed.js';
+        (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
+    })();
+</script>
+<noscript>Please enable JavaScript to view the <a href="http://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript>
+<a href="http://disqus.com" class="dsq-brlink">comments powered by <span class="logo-disqus">Disqus</span></a>
   </xsl:template>
 
 </xsl:stylesheet>