Posted to commits@hbase.apache.org by st...@apache.org on 2012/09/24 00:01:17 UTC
svn commit: r1389153 - in /hbase/trunk/src/docbkx: book.xml
configuration.xml getting_started.xml performance.xml zookeeper.xml
Author: stack
Date: Sun Sep 23 22:01:16 2012
New Revision: 1389153
URL: http://svn.apache.org/viewvc?rev=1389153&view=rev
Log:
More edits: Moved ZK to its own chapter, put the bloom filter stuff together in one place, made the distributed setup more focused
Added:
hbase/trunk/src/docbkx/zookeeper.xml
Modified:
hbase/trunk/src/docbkx/book.xml
hbase/trunk/src/docbkx/configuration.xml
hbase/trunk/src/docbkx/getting_started.xml
hbase/trunk/src/docbkx/performance.xml
Modified: hbase/trunk/src/docbkx/book.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/book.xml?rev=1389153&r1=1389152&r2=1389153&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/book.xml (original)
+++ hbase/trunk/src/docbkx/book.xml Sun Sep 23 22:01:16 2012
@@ -2318,65 +2318,6 @@ myHtd.setValue(HTableDescriptor.SPLIT_PO
</section> <!-- compaction -->
</section> <!-- store -->
-
- <section xml:id="blooms">
- <title>Bloom Filters</title>
- <para><link xlink:href="http://en.wikipedia.org/wiki/Bloom_filter">Bloom filters</link> were developed over in <link
- xlink:href="https://issues.apache.org/jira/browse/HBASE-1200">HBase-1200
- Add bloomfilters</link>.<footnote>
- <para>For description of the development process -- why static blooms
- rather than dynamic -- and for an overview of the unique properties
- that pertain to blooms in HBase, as well as possible future
- directions, see the <emphasis>Development Process</emphasis> section
- of the document <link
- xlink:href="https://issues.apache.org/jira/secure/attachment/12444007/Bloom_Filters_in_HBase.pdf">BloomFilters
- in HBase</link> attached to <link
- xlink:href="https://issues.apache.org/jira/browse/HBASE-1200">HBase-1200</link>.</para>
- </footnote><footnote>
- <para>The bloom filters described here are actually version two of
- blooms in HBase. In versions up to 0.19.x, HBase had a dynamic bloom
- option based on work done by the <link
- xlink:href="http://www.one-lab.org">European Commission One-Lab
- Project 034819</link>. The core of the HBase bloom work was later
- pulled up into Hadoop to implement org.apache.hadoop.io.BloomMapFile.
- Version 1 of HBase blooms never worked that well. Version 2 is a
- rewrite from scratch though again it starts with the one-lab
- work.</para>
- </footnote></para>
- <para>See also <xref linkend="schema.bloom" /> and <xref linkend="config.bloom" />.
- </para>
-
- <section xml:id="bloom_footprint">
- <title>Bloom StoreFile footprint</title>
-
- <para>Bloom filters add an entry to the <classname>StoreFile</classname>
- general <classname>FileInfo</classname> data structure and then two
- extra entries to the <classname>StoreFile</classname> metadata
- section.</para>
-
- <section>
- <title>BloomFilter in the <classname>StoreFile</classname>
- <classname>FileInfo</classname> data structure</title>
-
- <para><classname>FileInfo</classname> has a
- <varname>BLOOM_FILTER_TYPE</varname> entry which is set to
- <varname>NONE</varname>, <varname>ROW</varname> or
- <varname>ROWCOL.</varname></para>
- </section>
-
- <section>
- <title>BloomFilter entries in <classname>StoreFile</classname>
- metadata</title>
-
- <para><varname>BLOOM_FILTER_META</varname> holds Bloom Size, Hash
- Function used, etc. Its small in size and is cached on
- <classname>StoreFile.Reader</classname> load</para>
- <para><varname>BLOOM_FILTER_DATA</varname> is the actual bloomfilter
- data. Obtained on-demand. Stored in the LRU cache, if it is enabled
- (Its enabled by default).</para>
- </section>
- </section>
- </section> <!-- bloom -->
</section> <!-- regions -->
@@ -2519,6 +2460,7 @@ myHtd.setValue(HTableDescriptor.SPLIT_PO
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="case_studies.xml" />
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ops_mgt.xml" />
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="developer.xml" />
+ <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="zookeeper.xml" />
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="community.xml" />
<appendix xml:id="faq">
Modified: hbase/trunk/src/docbkx/configuration.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/configuration.xml?rev=1389153&r1=1389152&r2=1389153&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/configuration.xml (original)
+++ hbase/trunk/src/docbkx/configuration.xml Sun Sep 23 22:01:16 2012
@@ -27,8 +27,10 @@
*/
-->
<title>Configuration</title>
- <para>This chapter is the Not-So-Quick start guide to HBase configuration.</para>
- <para>Please read this chapter carefully and ensure that all requirements have
+ <para>This chapter is the Not-So-Quick start guide to HBase configuration. It goes
+ over system requirements, Hadoop setup, the different HBase run modes, and the
+ various configurations in HBase. Please read this chapter carefully and ensure
+ that all <xref linkend="basic.requirements" /> have
been satisfied. Failure to do so will cause you (and us) grief debugging strange errors
and/or data loss.</para>
@@ -56,6 +58,10 @@ to ensure well-formedness of your docume
all nodes of the cluster. HBase will not do this for you.
Use <command>rsync</command>.</para>
+ <section xml:id="basic.requirements">
+ <title>Basic Requirements</title>
+ <para>This section lists required services and some required system configuration.
+ </para>
<section xml:id="java">
<title>Java</title>
@@ -237,7 +243,6 @@ to ensure well-formedness of your docume
Currently only Hadoop versions 0.20.205.x or any release in excess of this
version -- this includes hadoop 1.0.0 -- have a working, durable sync
<footnote>
- <title>On Hadoop Versions</title>
<para>The Cloudera blog post <link xlink:href="http://www.cloudera.com/blog/2012/01/an-update-on-apache-hadoop-1-0/">An update on Apache Hadoop 1.0</link>
by Charles Zedlweski has a nice exposition on how all the Hadoop versions relate.
Its worth checking out if you are having trouble making sense of the
@@ -352,6 +357,7 @@ to ensure well-formedness of your docume
</section>
</section> <!-- hadoop -->
+ </section>
<section xml:id="standalone_dist">
<title>HBase run modes: Standalone and Distributed</title>
@@ -686,565 +692,6 @@ stopping hbase...............</programli
</section>
</section> <!-- run modes -->
- <section xml:id="zookeeper">
- <title>ZooKeeper<indexterm>
- <primary>ZooKeeper</primary>
- </indexterm></title>
-
- <para>A distributed HBase depends on a running ZooKeeper cluster.
- All participating nodes and clients need to be able to access the
- running ZooKeeper ensemble. HBase by default manages a ZooKeeper
- "cluster" for you. It will start and stop the ZooKeeper ensemble
- as part of the HBase start/stop process. You can also manage the
- ZooKeeper ensemble independent of HBase and just point HBase at
- the cluster it should use. To toggle HBase management of
- ZooKeeper, use the <varname>HBASE_MANAGES_ZK</varname> variable in
- <filename>conf/hbase-env.sh</filename>. This variable, which
- defaults to <varname>true</varname>, tells HBase whether to
- start/stop the ZooKeeper ensemble servers as part of HBase
- start/stop.</para>
-
- <para>When HBase manages the ZooKeeper ensemble, you can specify
- ZooKeeper configuration using its native
- <filename>zoo.cfg</filename> file, or, the easier option is to
- just specify ZooKeeper options directly in
- <filename>conf/hbase-site.xml</filename>. A ZooKeeper
- configuration option can be set as a property in the HBase
- <filename>hbase-site.xml</filename> XML configuration file by
- prefacing the ZooKeeper option name with
- <varname>hbase.zookeeper.property</varname>. For example, the
- <varname>clientPort</varname> setting in ZooKeeper can be changed
- by setting the
- <varname>hbase.zookeeper.property.clientPort</varname> property.
- For all default values used by HBase, including ZooKeeper
- configuration, see <xref linkend="hbase_default_configurations" />. Look for the
- <varname>hbase.zookeeper.property</varname> prefix <footnote>
- <para>For the full list of ZooKeeper configurations, see
- ZooKeeper's <filename>zoo.cfg</filename>. HBase does not ship
- with a <filename>zoo.cfg</filename> so you will need to browse
- the <filename>conf</filename> directory in an appropriate
- ZooKeeper download.</para>
- </footnote></para>
-
- <para>You must at least list the ensemble servers in
- <filename>hbase-site.xml</filename> using the
- <varname>hbase.zookeeper.quorum</varname> property. This property
- defaults to a single ensemble member at
- <varname>localhost</varname> which is not suitable for a fully
- distributed HBase. (It binds to the local machine only and remote
- clients will not be able to connect). <note xml:id="how_many_zks">
- <title>How many ZooKeepers should I run?</title>
-
- <para>You can run a ZooKeeper ensemble that comprises 1 node
- only but in production it is recommended that you run a
- ZooKeeper ensemble of 3, 5 or 7 machines; the more members an
- ensemble has, the more tolerant the ensemble is of host
- failures. Also, run an odd number of machines. In ZooKeeper,
- an even number of peers is supported, but it is normally not used
- because an even sized ensemble requires, proportionally, more peers
- to form a quorum than an odd sized ensemble requires. For example, an
- ensemble with 4 peers requires 3 to form a quorum, while an ensemble with
- 5 also requires 3 to form a quorum. Thus, an ensemble of 5 allows 2 peers to
- fail, and thus is more fault tolerant than the ensemble of 4, which allows
- only 1 down peer.
- </para>
- <para>Give each ZooKeeper server around 1GB of RAM, and if possible, its own
- dedicated disk (A dedicated disk is the best thing you can do
- to ensure a performant ZooKeeper ensemble). For very heavily
- loaded clusters, run ZooKeeper servers on separate machines
- from RegionServers (DataNodes and TaskTrackers).</para>
- </note></para>
-
- <para>For example, to have HBase manage a ZooKeeper quorum on
- nodes <emphasis>rs{1,2,3,4,5}.example.com</emphasis>, bound to
- port 2222 (the default is 2181) ensure
- <varname>HBASE_MANAGE_ZK</varname> is commented out or set to
- <varname>true</varname> in <filename>conf/hbase-env.sh</filename>
- and then edit <filename>conf/hbase-site.xml</filename> and set
- <varname>hbase.zookeeper.property.clientPort</varname> and
- <varname>hbase.zookeeper.quorum</varname>. You should also set
- <varname>hbase.zookeeper.property.dataDir</varname> to other than
- the default as the default has ZooKeeper persist data under
- <filename>/tmp</filename> which is often cleared on system
- restart. In the example below we have ZooKeeper persist to
- <filename>/user/local/zookeeper</filename>. <programlisting>
- <configuration>
- ...
- <property>
- <name>hbase.zookeeper.property.clientPort</name>
- <value>2222</value>
- <description>Property from ZooKeeper's config zoo.cfg.
- The port at which the clients will connect.
- </description>
- </property>
- <property>
- <name>hbase.zookeeper.quorum</name>
- <value>rs1.example.com,rs2.example.com,rs3.example.com,rs4.example.com,rs5.example.com</value>
- <description>Comma separated list of servers in the ZooKeeper Quorum.
- For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
- By default this is set to localhost for local and pseudo-distributed modes
- of operation. For a fully-distributed setup, this should be set to a full
- list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
- this is the list of servers which we will start/stop ZooKeeper on.
- </description>
- </property>
- <property>
- <name>hbase.zookeeper.property.dataDir</name>
- <value>/usr/local/zookeeper</value>
- <description>Property from ZooKeeper's config zoo.cfg.
- The directory where the snapshot is stored.
- </description>
- </property>
- ...
- </configuration></programlisting></para>
-
- <section>
- <title>Using existing ZooKeeper ensemble</title>
-
- <para>To point HBase at an existing ZooKeeper cluster, one that
- is not managed by HBase, set <varname>HBASE_MANAGES_ZK</varname>
- in <filename>conf/hbase-env.sh</filename> to false
- <programlisting>
- ...
- # Tell HBase whether it should manage its own instance of Zookeeper or not.
- export HBASE_MANAGES_ZK=false</programlisting> Next set ensemble locations
- and client port, if non-standard, in
- <filename>hbase-site.xml</filename>, or add a suitably
- configured <filename>zoo.cfg</filename> to HBase's
- <filename>CLASSPATH</filename>. HBase will prefer the
- configuration found in <filename>zoo.cfg</filename> over any
- settings in <filename>hbase-site.xml</filename>.</para>
-
- <para>When HBase manages ZooKeeper, it will start/stop the
- ZooKeeper servers as a part of the regular start/stop scripts.
- If you would like to run ZooKeeper yourself, independent of
- HBase start/stop, you would do the following</para>
-
- <programlisting>
-${HBASE_HOME}/bin/hbase-daemons.sh {start,stop} zookeeper
-</programlisting>
-
- <para>Note that you can use HBase in this manner to spin up a
- ZooKeeper cluster, unrelated to HBase. Just make sure to set
- <varname>HBASE_MANAGES_ZK</varname> to <varname>false</varname>
- if you want it to stay up across HBase restarts so that when
- HBase shuts down, it doesn't take ZooKeeper down with it.</para>
-
- <para>For more information about running a distinct ZooKeeper
- cluster, see the ZooKeeper <link
- xlink:href="http://hadoop.apache.org/zookeeper/docs/current/zookeeperStarted.html">Getting
- Started Guide</link>. Additionally, see the <link xlink:href="http://wiki.apache.org/hadoop/ZooKeeper/FAQ#A7">ZooKeeper Wiki</link> or the
- <link xlink:href="http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#sc_zkMulitServerSetup">ZooKeeper documentation</link>
- for more information on ZooKeeper sizing.
- </para>
- </section>
-
-
- <section xml:id="zk.sasl.auth">
- <title>SASL Authentication with ZooKeeper</title>
- <para>Newer releases of HBase (>= 0.92) will
- support connecting to a ZooKeeper Quorum that supports
- SASL authentication (which is available in Zookeeper
- versions 3.4.0 or later).</para>
-
- <para>This describes how to set up HBase to mutually
- authenticate with a ZooKeeper Quorum. ZooKeeper/HBase
- mutual authentication (<link
- xlink:href="https://issues.apache.org/jira/browse/HBASE-2418">HBASE-2418</link>)
- is required as part of a complete secure HBase configuration
- (<link
- xlink:href="https://issues.apache.org/jira/browse/HBASE-3025">HBASE-3025</link>).
-
- For simplicity of explication, this section ignores
- additional configuration required (Secure HDFS and Coprocessor
- configuration). It's recommended to begin with an
- HBase-managed Zookeeper configuration (as opposed to a
- standalone Zookeeper quorum) for ease of learning.
- </para>
-
- <section><title>Operating System Prerequisites</title></section>
-
- <para>
- You need to have a working Kerberos KDC setup. For
- each <code>$HOST</code> that will run a ZooKeeper
- server, you should have a principle
- <code>zookeeper/$HOST</code>. For each such host,
- add a service key (using the <code>kadmin</code> or
- <code>kadmin.local</code> tool's <code>ktadd</code>
- command) for <code>zookeeper/$HOST</code> and copy
- this file to <code>$HOST</code>, and make it
- readable only to the user that will run zookeeper on
- <code>$HOST</code>. Note the location of this file,
- which we will use below as
- <filename>$PATH_TO_ZOOKEEPER_KEYTAB</filename>.
- </para>
-
- <para>
- Similarly, for each <code>$HOST</code> that will run
- an HBase server (master or regionserver), you should
- have a principle: <code>hbase/$HOST</code>. For each
- host, add a keytab file called
- <filename>hbase.keytab</filename> containing a service
- key for <code>hbase/$HOST</code>, copy this file to
- <code>$HOST</code>, and make it readable only to the
- user that will run an HBase service on
- <code>$HOST</code>. Note the location of this file,
- which we will use below as
- <filename>$PATH_TO_HBASE_KEYTAB</filename>.
- </para>
-
- <para>
- Each user who will be an HBase client should also be
- given a Kerberos principal. This principal should
- usually have a password assigned to it (as opposed to,
- as with the HBase servers, a keytab file) which only
- this user knows. The client's principal's
- <code>maxrenewlife</code> should be set so that it can
- be renewed enough so that the user can complete their
- HBase client processes. For example, if a user runs a
- long-running HBase client process that takes at most 3
- days, we might create this user's principal within
- <code>kadmin</code> with: <code>addprinc -maxrenewlife
- 3days</code>. The Zookeeper client and server
- libraries manage their own ticket refreshment by
- running threads that wake up periodically to do the
- refreshment.
- </para>
-
- <para>On each host that will run an HBase client
- (e.g. <code>hbase shell</code>), add the following
- file to the HBase home directory's <filename>conf</filename>
- directory:</para>
-
- <programlisting>
- Client {
- com.sun.security.auth.module.Krb5LoginModule required
- useKeyTab=false
- useTicketCache=true;
- };
- </programlisting>
-
- <para>We'll refer to this JAAS configuration file as
- <filename>$CLIENT_CONF</filename> below.</para>
-
- <section>
- <title>HBase-managed Zookeeper Configuration</title>
-
- <para>On each node that will run a zookeeper, a
- master, or a regionserver, create a <link
- xlink:href="http://docs.oracle.com/javase/1.4.2/docs/guide/security/jgss/tutorials/LoginConfigFile.html">JAAS</link>
- configuration file in the conf directory of the node's
- <filename>HBASE_HOME</filename> directory that looks like the
- following:</para>
-
- <programlisting>
- Server {
- com.sun.security.auth.module.Krb5LoginModule required
- useKeyTab=true
- keyTab="$PATH_TO_ZOOKEEPER_KEYTAB"
- storeKey=true
- useTicketCache=false
- principal="zookeeper/$HOST";
- };
- Client {
- com.sun.security.auth.module.Krb5LoginModule required
- useKeyTab=true
- useTicketCache=false
- keyTab="$PATH_TO_HBASE_KEYTAB"
- principal="hbase/$HOST";
- };
- </programlisting>
-
- where the <filename>$PATH_TO_HBASE_KEYTAB</filename> and
- <filename>$PATH_TO_ZOOKEEPER_KEYTAB</filename> files are what
- you created above, and <code>$HOST</code> is the hostname for that
- node.
-
- <para>The <code>Server</code> section will be used by
- the Zookeeper quorum server, while the
- <code>Client</code> section will be used by the HBase
- master and regionservers. The path to this file should
- be substituted for the text <filename>$HBASE_SERVER_CONF</filename>
- in the <filename>hbase-env.sh</filename>
- listing below.</para>
-
- <para>
- The path to this file should be substituted for the
- text <filename>$CLIENT_CONF</filename> in the
- <filename>hbase-env.sh</filename> listing below.
- </para>
-
- <para>Modify your <filename>hbase-env.sh</filename> to include the
- following:</para>
-
- <programlisting>
- export HBASE_OPTS="-Djava.security.auth.login.config=$CLIENT_CONF"
- export HBASE_MANAGES_ZK=true
- export HBASE_ZOOKEEPER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
- export HBASE_MASTER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
- export HBASE_REGIONSERVER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
- </programlisting>
-
- where <filename>$HBASE_SERVER_CONF</filename> and
- <filename>$CLIENT_CONF</filename> are the full paths to the
- JAAS configuration files created above.
-
- <para>Modify your <filename>hbase-site.xml</filename> on each node
- that will run zookeeper, master or regionserver to contain:</para>
-
- <programlisting><![CDATA[
- <configuration>
- <property>
- <name>hbase.zookeeper.quorum</name>
- <value>$ZK_NODES</value>
- </property>
- <property>
- <name>hbase.cluster.distributed</name>
- <value>true</value>
- </property>
- <property>
- <name>hbase.zookeeper.property.authProvider.1</name>
- <value>org.apache.zookeeper.server.auth.SASLAuthenticationProvider</value>
- </property>
- <property>
- <name>hbase.zookeeper.property.kerberos.removeHostFromPrincipal</name>
- <value>true</value>
- </property>
- <property>
- <name>hbase.zookeeper.property.kerberos.removeRealmFromPrincipal</name>
- <value>true</value>
- </property>
- </configuration>
- ]]></programlisting>
-
- <para>where <code>$ZK_NODES</code> is the
- comma-separated list of hostnames of the Zookeeper
- Quorum hosts.</para>
-
- <para>Start your hbase cluster by running one or more
- of the following set of commands on the appropriate
- hosts:
- </para>
-
- <programlisting>
- bin/hbase zookeeper start
- bin/hbase master start
- bin/hbase regionserver start
- </programlisting>
-
- </section>
-
- <section><title>External Zookeeper Configuration</title>
- <para>Add a JAAS configuration file that looks like:
-
- <programlisting>
- Client {
- com.sun.security.auth.module.Krb5LoginModule required
- useKeyTab=true
- useTicketCache=false
- keyTab="$PATH_TO_HBASE_KEYTAB"
- principal="hbase/$HOST";
- };
- </programlisting>
-
- where the <filename>$PATH_TO_HBASE_KEYTAB</filename> is the keytab
- created above for HBase services to run on this host, and <code>$HOST</code> is the
- hostname for that node. Put this in the HBase home's
- configuration directory. We'll refer to this file's
- full pathname as <filename>$HBASE_SERVER_CONF</filename> below.</para>
-
- <para>Modify your hbase-env.sh to include the following:</para>
-
- <programlisting>
- export HBASE_OPTS="-Djava.security.auth.login.config=$CLIENT_CONF"
- export HBASE_MANAGES_ZK=false
- export HBASE_MASTER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
- export HBASE_REGIONSERVER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
- </programlisting>
-
-
- <para>Modify your <filename>hbase-site.xml</filename> on each node
- that will run a master or regionserver to contain:</para>
-
- <programlisting><![CDATA[
- <configuration>
- <property>
- <name>hbase.zookeeper.quorum</name>
- <value>$ZK_NODES</value>
- </property>
- <property>
- <name>hbase.cluster.distributed</name>
- <value>true</value>
- </property>
- </configuration>
- ]]>
- </programlisting>
-
- <para>where <code>$ZK_NODES</code> is the
- comma-separated list of hostnames of the Zookeeper
- Quorum hosts.</para>
-
- <para>
- Add a <filename>zoo.cfg</filename> for each Zookeeper Quorum host containing:
- <programlisting>
- authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
- kerberos.removeHostFromPrincipal=true
- kerberos.removeRealmFromPrincipal=true
- </programlisting>
-
- Also on each of these hosts, create a JAAS configuration file containing:
-
- <programlisting>
- Server {
- com.sun.security.auth.module.Krb5LoginModule required
- useKeyTab=true
- keyTab="$PATH_TO_ZOOKEEPER_KEYTAB"
- storeKey=true
- useTicketCache=false
- principal="zookeeper/$HOST";
- };
- </programlisting>
-
- where <code>$HOST</code> is the hostname of each
- Quorum host. We will refer to the full pathname of
- this file as <filename>$ZK_SERVER_CONF</filename> below.
-
- </para>
-
- <para>
- Start your Zookeepers on each Zookeeper Quorum host with:
-
- <programlisting>
- SERVER_JVMFLAGS="-Djava.security.auth.login.config=$ZK_SERVER_CONF" bin/zkServer start
- </programlisting>
-
- </para>
-
- <para>
- Start your HBase cluster by running one or more of the following set of commands on the appropriate nodes:
- </para>
-
- <programlisting>
- bin/hbase master start
- bin/hbase regionserver start
- </programlisting>
-
-
- </section>
-
- <section>
- <title>Zookeeper Server Authentication Log Output</title>
- <para>If the configuration above is successful,
- you should see something similar to the following in
- your Zookeeper server logs:
- <programlisting>
-11/12/05 22:43:39 INFO zookeeper.Login: successfully logged in.
-11/12/05 22:43:39 INFO server.NIOServerCnxnFactory: binding to port 0.0.0.0/0.0.0.0:2181
-11/12/05 22:43:39 INFO zookeeper.Login: TGT refresh thread started.
-11/12/05 22:43:39 INFO zookeeper.Login: TGT valid starting at: Mon Dec 05 22:43:39 UTC 2011
-11/12/05 22:43:39 INFO zookeeper.Login: TGT expires: Tue Dec 06 22:43:39 UTC 2011
-11/12/05 22:43:39 INFO zookeeper.Login: TGT refresh sleeping until: Tue Dec 06 18:36:42 UTC 2011
-..
-11/12/05 22:43:59 INFO auth.SaslServerCallbackHandler:
- Successfully authenticated client: authenticationID=hbase/ip-10-166-175-249.us-west-1.compute.internal@HADOOP.LOCALDOMAIN;
- authorizationID=hbase/ip-10-166-175-249.us-west-1.compute.internal@HADOOP.LOCALDOMAIN.
-11/12/05 22:43:59 INFO auth.SaslServerCallbackHandler: Setting authorizedID: hbase
-11/12/05 22:43:59 INFO server.ZooKeeperServer: adding SASL authorization for authorizationID: hbase
- </programlisting>
-
- </para>
-
- </section>
-
- <section>
- <title>Zookeeper Client Authentication Log Output</title>
- <para>On the Zookeeper client side (HBase master or regionserver),
- you should see something similar to the following:
-
- <programlisting>
-11/12/05 22:43:59 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=ip-10-166-175-249.us-west-1.compute.internal:2181 sessionTimeout=180000 watcher=master:60000
-11/12/05 22:43:59 INFO zookeeper.ClientCnxn: Opening socket connection to server /10.166.175.249:2181
-11/12/05 22:43:59 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 14851@ip-10-166-175-249
-11/12/05 22:43:59 INFO zookeeper.Login: successfully logged in.
-11/12/05 22:43:59 INFO client.ZooKeeperSaslClient: Client will use GSSAPI as SASL mechanism.
-11/12/05 22:43:59 INFO zookeeper.Login: TGT refresh thread started.
-11/12/05 22:43:59 INFO zookeeper.ClientCnxn: Socket connection established to ip-10-166-175-249.us-west-1.compute.internal/10.166.175.249:2181, initiating session
-11/12/05 22:43:59 INFO zookeeper.Login: TGT valid starting at: Mon Dec 05 22:43:59 UTC 2011
-11/12/05 22:43:59 INFO zookeeper.Login: TGT expires: Tue Dec 06 22:43:59 UTC 2011
-11/12/05 22:43:59 INFO zookeeper.Login: TGT refresh sleeping until: Tue Dec 06 18:30:37 UTC 2011
-11/12/05 22:43:59 INFO zookeeper.ClientCnxn: Session establishment complete on server ip-10-166-175-249.us-west-1.compute.internal/10.166.175.249:2181, sessionid = 0x134106594320000, negotiated timeout = 180000
- </programlisting>
- </para>
- </section>
-
- <section>
- <title>Configuration from Scratch</title>
-
- This has been tested on the current standard Amazon
- Linux AMI. First setup KDC and principals as
- described above. Next checkout code and run a sanity
- check.
-
- <programlisting>
- git clone git://git.apache.org/hbase.git
- cd hbase
- mvn -PlocalTests clean test -Dtest=TestZooKeeperACL
- </programlisting>
-
- Then configure HBase as described above.
- Manually edit target/cached_classpath.txt (see below)..
-
- <programlisting>
- bin/hbase zookeeper &
- bin/hbase master &
- bin/hbase regionserver &
- </programlisting>
- </section>
-
-
- <section>
- <title>Future improvements</title>
-
- <section><title>Fix target/cached_classpath.txt</title>
- <para>
- You must override the standard hadoop-core jar file from the
- <code>target/cached_classpath.txt</code>
- file with the version containing the HADOOP-7070 fix. You can use the following script to do this:
-
- <programlisting>
- echo `find ~/.m2 -name "*hadoop-core*7070*SNAPSHOT.jar"` ':' `cat target/cached_classpath.txt` | sed 's/ //g' > target/tmp.txt
- mv target/tmp.txt target/cached_classpath.txt
- </programlisting>
-
- </para>
-
- </section>
-
- <section>
- <title>Set JAAS configuration
- programmatically</title>
-
-
- This would avoid the need for a separate Hadoop jar
- that fixes <link xlink:href="https://issues.apache.org/jira/browse/HADOOP-7070">HADOOP-7070</link>.
- </section>
-
- <section>
- <title>Elimination of
- <code>kerberos.removeHostFromPrincipal</code> and
- <code>kerberos.removeRealmFromPrincipal</code></title>
- </section>
-
- </section>
-
-
- </section> <!-- SASL Authentication with ZooKeeper -->
-
-
-
-
-
- </section> <!-- zookeeper -->
<section xml:id="config.files">
@@ -1704,34 +1151,4 @@ of all regions.
</section> <!-- important config -->
- <section xml:id="config.bloom">
- <title>Bloom Filter Configuration</title>
- <section>
- <title><varname>io.hfile.bloom.enabled</varname> global kill
- switch</title>
-
- <para><code>io.hfile.bloom.enabled</code> in
- <classname>Configuration</classname> serves as the kill switch in case
- something goes wrong. Default = <varname>true</varname>.</para>
- </section>
-
- <section>
- <title><varname>io.hfile.bloom.error.rate</varname></title>
-
- <para><varname>io.hfile.bloom.error.rate</varname> = average false
- positive rate. Default = 1%. Decrease rate by ½ (e.g. to .5%) == +1
- bit per bloom entry.</para>
- </section>
-
- <section>
- <title><varname>io.hfile.bloom.max.fold</varname></title>
-
- <para><varname>io.hfile.bloom.max.fold</varname> = guaranteed minimum
- fold rate. Most people should leave this alone. Default = 7, or can
- collapse to at least 1/128th of original size. See the
- <emphasis>Development Process</emphasis> section of the document <link
- xlink:href="https://issues.apache.org/jira/secure/attachment/12444007/Bloom_Filters_in_HBase.pdf">BloomFilters
- in HBase</link> for more on what this option means.</para>
- </section>
- </section>
</chapter>
Modified: hbase/trunk/src/docbkx/getting_started.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/getting_started.xml?rev=1389153&r1=1389152&r2=1389153&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/getting_started.xml (original)
+++ hbase/trunk/src/docbkx/getting_started.xml Sun Sep 23 22:01:16 2012
@@ -33,8 +33,9 @@
<para><xref linkend="quickstart" /> will get you up and
running on a single-node instance of HBase using the local filesystem.
- <xref linkend="configuration" /> describes setup
- of HBase in distributed mode running on top of HDFS.</para>
+ <xref linkend="configuration" /> describes basic system
+ requirements and configuration for running HBase in distributed mode
+ on top of HDFS.</para>
</section>
<section xml:id="quickstart">
@@ -51,7 +52,7 @@
<para>Choose a download site from this list of <link
xlink:href="http://www.apache.org/dyn/closer.cgi/hbase/">Apache Download
- Mirrors</link>. Click on suggested top link. This will take you to a
+ Mirrors</link>. Click on the suggested top link. This will take you to a
mirror of <emphasis>HBase Releases</emphasis>. Click on the folder named
<filename>stable</filename> and then download the file that ends in
<filename>.tar.gz</filename> to your local filesystem; e.g.
@@ -65,24 +66,21 @@ $ cd hbase-<?eval ${project.version}?>
</programlisting></para>
<para>At this point, you are ready to start HBase. But before starting
- it, you might want to edit <filename>conf/hbase-site.xml</filename> and
- set the directory you want HBase to write to,
- <varname>hbase.rootdir</varname>. <programlisting>
-
-<?xml version="1.0"?>
+ it, you might want to edit <filename>conf/hbase-site.xml</filename>, the
+ file into which you write your site-specific configurations, and
+ set <varname>hbase.rootdir</varname>, the directory HBase writes data to:
+<programlisting><?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hbase.rootdir</name>
<value>file:///DIRECTORY/hbase</value>
</property>
-</configuration>
-
-</programlisting> Replace <varname>DIRECTORY</varname> in the above with a
- path to a directory where you want HBase to store its data. By default,
+</configuration></programlisting> Replace <varname>DIRECTORY</varname> in the above with the
+ path to the directory where you want HBase to store its data. By default,
<varname>hbase.rootdir</varname> is set to
<filename>/tmp/hbase-${user.name}</filename> which means you'll lose all
- your data whenever your server reboots (Most operating systems clear
+ your data whenever your server reboots unless you change it (most operating systems clear
<filename>/tmp</filename> on restart).</para>
</section>
@@ -96,7 +94,7 @@ starting Master, logging to logs/hbase-u
standalone mode, HBase runs all daemons in the the one JVM; i.e. both
the HBase and ZooKeeper daemons. HBase logs can be found in the
<filename>logs</filename> subdirectory. Check them out especially if
- HBase had trouble starting.</para>
+ it seems HBase had trouble starting.</para>
<note>
<title>Is <application>java</application> installed?</title>
@@ -108,7 +106,7 @@ starting Master, logging to logs/hbase-u
options the java program takes (HBase requires java 6). If this is not
the case, HBase will not start. Install java, edit
<filename>conf/hbase-env.sh</filename>, uncommenting the
- <envar>JAVA_HOME</envar> line pointing it to your java install. Then,
+ <envar>JAVA_HOME</envar> line and pointing it to your java install, then
retry the steps above.</para>
</note>
</section>
@@ -154,9 +152,7 @@ hbase(main):006:0> put 'test', 'row3'
<varname>cf</varname> in this example -- followed by a colon and then a
column qualifier suffix (<varname>a</varname> in this case).</para>
- <para>Verify the data insert.</para>
-
- <para>Run a scan of the table by doing the following</para>
+ <para>Verify the data insert by running a scan of the table as follows</para>
<para><programlisting>hbase(main):007:0> scan 'test'
ROW COLUMN+CELL
@@ -165,7 +161,7 @@ row2 column=cf:b, timestamp=128838
row3 column=cf:c, timestamp=1288380747365, value=value3
3 row(s) in 0.0590 seconds</programlisting></para>
- <para>Get a single row as follows</para>
+ <para>Get a single row</para>
<para><programlisting>hbase(main):008:0> get 'test', 'row1'
COLUMN CELL
@@ -198,9 +194,9 @@ stopping hbase...............</programli
<title>Where to go next</title>
<para>The above described standalone setup is good for testing and
- experiments only. Next move on to <xref linkend="configuration" /> where we'll go into
- depth on the different HBase run modes, requirements and critical
- configurations needed setting up a distributed HBase deploy.</para>
+ experiments only. In the next chapter, <xref linkend="configuration" />,
+ we'll go into depth on the different HBase run modes, the system requirements
+ for running HBase, and the critical configurations for setting up a distributed HBase deploy.</para>
</section>
</section>
Modified: hbase/trunk/src/docbkx/performance.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/performance.xml?rev=1389153&r1=1389152&r2=1389153&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/performance.xml (original)
+++ hbase/trunk/src/docbkx/performance.xml Sun Sep 23 22:01:16 2012
@@ -526,6 +526,96 @@ htable.close();</programlisting></para>
too few regions then the reads could likely be served from too few nodes. </para>
<para>See <xref linkend="precreate.regions"/>, as well as <xref linkend="perf.configurations"/> </para>
</section>
+ <section xml:id="blooms">
+ <title>Bloom Filters</title>
+ <para>Enabling Bloom Filters can save you having to go to disk and
+ can help improve read latencies.</para>
+ <para><link xlink:href="http://en.wikipedia.org/wiki/Bloom_filter">Bloom filters</link> were developed over in <link
+ xlink:href="https://issues.apache.org/jira/browse/HBASE-1200">HBase-1200
+ Add bloomfilters</link>.<footnote>
+ <para>For a description of the development process -- why static blooms
+ rather than dynamic -- and for an overview of the unique properties
+ that pertain to blooms in HBase, as well as possible future
+ directions, see the <emphasis>Development Process</emphasis> section
+ of the document <link
+ xlink:href="https://issues.apache.org/jira/secure/attachment/12444007/Bloom_Filters_in_HBase.pdf">BloomFilters
+ in HBase</link> attached to <link
+ xlink:href="https://issues.apache.org/jira/browse/HBASE-1200">HBase-1200</link>.</para>
+ </footnote><footnote>
+ <para>The bloom filters described here are actually version two of
+ blooms in HBase. In versions up to 0.19.x, HBase had a dynamic bloom
+ option based on work done by the <link
+ xlink:href="http://www.one-lab.org">European Commission One-Lab
+ Project 034819</link>. The core of the HBase bloom work was later
+ pulled up into Hadoop to implement org.apache.hadoop.io.BloomMapFile.
+ Version 1 of HBase blooms never worked that well. Version 2 is a
+ rewrite from scratch, though again it starts from the One-Lab
+ work.</para>
+ </footnote></para>
+ <para>See also <xref linkend="schema.bloom" />.
+ </para>
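+ <para>Bloom filters are enabled per column family. As a minimal
+ sketch (the table and column family names here are just examples),
+ a row-keyed bloom can be turned on from the shell at table-creation
+ time:
+ <programlisting>hbase(main):001:0> create 'test', {NAME => 'cf', BLOOMFILTER => 'ROW'}</programlisting>
+ Use <varname>ROWCOL</varname> instead of <varname>ROW</varname> to bloom
+ on row plus column qualifier; <xref linkend="schema.bloom" /> discusses
+ which to choose.</para>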
+
+ <section xml:id="bloom_footprint">
+ <title>Bloom StoreFile footprint</title>
+
+ <para>Bloom filters add an entry to the <classname>StoreFile</classname>
+ general <classname>FileInfo</classname> data structure and then two
+ extra entries to the <classname>StoreFile</classname> metadata
+ section.</para>
+
+ <section>
+ <title>BloomFilter in the <classname>StoreFile</classname>
+ <classname>FileInfo</classname> data structure</title>
+
+ <para><classname>FileInfo</classname> has a
+ <varname>BLOOM_FILTER_TYPE</varname> entry which is set to
+ <varname>NONE</varname>, <varname>ROW</varname> or
+ <varname>ROWCOL</varname>.</para>
+ </section>
+
+ <section>
+ <title>BloomFilter entries in <classname>StoreFile</classname>
+ metadata</title>
+
+ <para><varname>BLOOM_FILTER_META</varname> holds Bloom Size, Hash
+ Function used, etc. It is small in size and is cached on
+ <classname>StoreFile.Reader</classname> load.</para>
+ <para><varname>BLOOM_FILTER_DATA</varname> is the actual bloomfilter
+ data. It is obtained on-demand and stored in the LRU cache, if the
+ cache is enabled (it is enabled by default).</para>
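+ <para>If you want to see what a given <classname>StoreFile</classname>
+ carries, the HFile tool will print its metadata; a sketch, where the
+ HDFS path to the hfile is hypothetical:
+ <programlisting>$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.io.hfile.HFile -m -f \
+     hdfs://namenode:8020/hbase/test/1418428042/cf/8e8ab58f0a2a4c2da04ce9dba8ba6a6c</programlisting>
+ Look for <varname>BLOOM_FILTER_TYPE</varname> in the printed
+ fileinfo.</para>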
+ </section>
+ </section>
+ <section xml:id="config.bloom">
+ <title>Bloom Filter Configuration</title>
+ <section>
+ <title><varname>io.hfile.bloom.enabled</varname> global kill
+ switch</title>
+
+ <para><code>io.hfile.bloom.enabled</code> in
+ <classname>Configuration</classname> serves as the kill switch in case
+ something goes wrong. Default = <varname>true</varname>.</para>
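+ <para>For example, to turn blooms off cluster-wide, a sketch for
+ <filename>hbase-site.xml</filename>:
+ <programlisting><property>
+   <name>io.hfile.bloom.enabled</name>
+   <value>false</value>
+ </property></programlisting></para>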
+ </section>
+
+ <section>
+ <title><varname>io.hfile.bloom.error.rate</varname></title>
+
+ <para><varname>io.hfile.bloom.error.rate</varname> = average false
+ positive rate. Default = 1%. Decreasing the rate by ½ (e.g. to .5%)
+ costs roughly one additional bit per bloom entry.</para>
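+ <para>For example, halving the rate to .5% might look like this in
+ <filename>hbase-site.xml</filename> (a sketch; the default is 0.01):
+ <programlisting><property>
+   <name>io.hfile.bloom.error.rate</name>
+   <value>0.005</value>
+ </property></programlisting></para>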
+ </section>
+
+ <section>
+ <title><varname>io.hfile.bloom.max.fold</varname></title>
+
+ <para><varname>io.hfile.bloom.max.fold</varname> = guaranteed minimum
+ fold rate. Most people should leave this alone. Default = 7, meaning the
+ bloom can collapse to at least 1/128th of its original size. See the
+ <emphasis>Development Process</emphasis> section of the document <link
+ xlink:href="https://issues.apache.org/jira/secure/attachment/12444007/Bloom_Filters_in_HBase.pdf">BloomFilters
+ in HBase</link> for more on what this option means.</para>
+ </section>
+ </section>
+ </section> <!-- bloom -->
</section> <!-- reading -->
Added: hbase/trunk/src/docbkx/zookeeper.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/zookeeper.xml?rev=1389153&view=auto
==============================================================================
--- hbase/trunk/src/docbkx/zookeeper.xml (added)
+++ hbase/trunk/src/docbkx/zookeeper.xml Sun Sep 23 22:01:16 2012
@@ -0,0 +1,586 @@
+<?xml version="1.0"?>
+ <chapter xml:id="zookeeper"
+ version="5.0" xmlns="http://docbook.org/ns/docbook"
+ xmlns:xlink="http://www.w3.org/1999/xlink"
+ xmlns:xi="http://www.w3.org/2001/XInclude"
+ xmlns:svg="http://www.w3.org/2000/svg"
+ xmlns:m="http://www.w3.org/1998/Math/MathML"
+ xmlns:html="http://www.w3.org/1999/xhtml"
+ xmlns:db="http://docbook.org/ns/docbook">
+<!--
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+-->
+
+ <title>ZooKeeper<indexterm>
+ <primary>ZooKeeper</primary>
+ </indexterm></title>
+
+ <para>A distributed HBase depends on a running ZooKeeper cluster.
+ All participating nodes and clients need to be able to access the
+ running ZooKeeper ensemble. HBase by default manages a ZooKeeper
+ "cluster" for you. It will start and stop the ZooKeeper ensemble
+ as part of the HBase start/stop process. You can also manage the
+ ZooKeeper ensemble independent of HBase and just point HBase at
+ the cluster it should use. To toggle HBase management of
+ ZooKeeper, use the <varname>HBASE_MANAGES_ZK</varname> variable in
+ <filename>conf/hbase-env.sh</filename>. This variable, which
+ defaults to <varname>true</varname>, tells HBase whether to
+ start/stop the ZooKeeper ensemble servers as part of HBase
+ start/stop.</para>
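+
+ <para>For example, the default, shown here explicitly in
+ <filename>conf/hbase-env.sh</filename>:
+ <programlisting>
+ # Tell HBase whether it should manage its own instance of Zookeeper or not.
+ export HBASE_MANAGES_ZK=true</programlisting></para>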
+
+ <para>When HBase manages the ZooKeeper ensemble, you can specify
+ ZooKeeper configuration using its native
+ <filename>zoo.cfg</filename> file, or, the easier option is to
+ just specify ZooKeeper options directly in
+ <filename>conf/hbase-site.xml</filename>. A ZooKeeper
+ configuration option can be set as a property in the HBase
+ <filename>hbase-site.xml</filename> XML configuration file by
+ prefacing the ZooKeeper option name with
+ <varname>hbase.zookeeper.property</varname>. For example, the
+ <varname>clientPort</varname> setting in ZooKeeper can be changed
+ by setting the
+ <varname>hbase.zookeeper.property.clientPort</varname> property.
+ For all default values used by HBase, including ZooKeeper
+ configuration, see <xref linkend="hbase_default_configurations" />. Look for the
+ <varname>hbase.zookeeper.property</varname> prefix <footnote>
+ <para>For the full list of ZooKeeper configurations, see
+ ZooKeeper's <filename>zoo.cfg</filename>. HBase does not ship
+ with a <filename>zoo.cfg</filename> so you will need to browse
+ the <filename>conf</filename> directory in an appropriate
+ ZooKeeper download.</para>
+ </footnote></para>
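+
+ <para>As a concrete sketch of the prefix rule, ZooKeeper's
+ <varname>maxClientCnxns</varname> option would be set in
+ <filename>hbase-site.xml</filename> like so (the value here is
+ illustrative only):
+ <programlisting><property>
+   <name>hbase.zookeeper.property.maxClientCnxns</name>
+   <value>300</value>
+ </property></programlisting></para>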
+
+ <para>You must at least list the ensemble servers in
+ <filename>hbase-site.xml</filename> using the
+ <varname>hbase.zookeeper.quorum</varname> property. This property
+ defaults to a single ensemble member at
+ <varname>localhost</varname> which is not suitable for a fully
+ distributed HBase. (It binds to the local machine only and remote
+ clients will not be able to connect). <note xml:id="how_many_zks">
+ <title>How many ZooKeepers should I run?</title>
+
+ <para>You can run a ZooKeeper ensemble that comprises 1 node
+ only but in production it is recommended that you run a
+ ZooKeeper ensemble of 3, 5 or 7 machines; the more members an
+ ensemble has, the more tolerant the ensemble is of host
+ failures. Also, run an odd number of machines. In ZooKeeper,
+ an even number of peers is supported, but it is normally not used
+ because an even sized ensemble requires, proportionally, more peers
+ to form a quorum than an odd sized ensemble requires. For example, an
+ ensemble with 4 peers requires 3 to form a quorum, while an ensemble with
+ 5 also requires 3 to form a quorum. Thus, an ensemble of 5 allows 2 peers to
+ fail, and thus is more fault tolerant than the ensemble of 4, which allows
+ only 1 down peer.
+ </para>
+ <para>Give each ZooKeeper server around 1GB of RAM, and if possible, its own
+ dedicated disk (A dedicated disk is the best thing you can do
+ to ensure a performant ZooKeeper ensemble). For very heavily
+ loaded clusters, run ZooKeeper servers on separate machines
+ from RegionServers (DataNodes and TaskTrackers).</para>
+ </note></para>
+
+ <para>For example, to have HBase manage a ZooKeeper quorum on
+ nodes <emphasis>rs{1,2,3,4,5}.example.com</emphasis>, bound to
+ port 2222 (the default is 2181), ensure
+ <varname>HBASE_MANAGES_ZK</varname> is commented out or set to
+ <varname>true</varname> in <filename>conf/hbase-env.sh</filename>
+ and then edit <filename>conf/hbase-site.xml</filename> and set
+ <varname>hbase.zookeeper.property.clientPort</varname> and
+ <varname>hbase.zookeeper.quorum</varname>. You should also set
+ <varname>hbase.zookeeper.property.dataDir</varname> to other than
+ the default as the default has ZooKeeper persist data under
+ <filename>/tmp</filename> which is often cleared on system
+ restart. In the example below we have ZooKeeper persist to
+ <filename>/usr/local/zookeeper</filename>. <programlisting>
+ <configuration>
+ ...
+ <property>
+ <name>hbase.zookeeper.property.clientPort</name>
+ <value>2222</value>
+ <description>Property from ZooKeeper's config zoo.cfg.
+ The port at which the clients will connect.
+ </description>
+ </property>
+ <property>
+ <name>hbase.zookeeper.quorum</name>
+ <value>rs1.example.com,rs2.example.com,rs3.example.com,rs4.example.com,rs5.example.com</value>
+ <description>Comma separated list of servers in the ZooKeeper Quorum.
+ For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
+ By default this is set to localhost for local and pseudo-distributed modes
+ of operation. For a fully-distributed setup, this should be set to a full
+ list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
+ this is the list of servers which we will start/stop ZooKeeper on.
+ </description>
+ </property>
+ <property>
+ <name>hbase.zookeeper.property.dataDir</name>
+ <value>/usr/local/zookeeper</value>
+ <description>Property from ZooKeeper's config zoo.cfg.
+ The directory where the snapshot is stored.
+ </description>
+ </property>
+ ...
+ </configuration></programlisting></para>
+
+ <section>
+ <title>Using existing ZooKeeper ensemble</title>
+
+ <para>To point HBase at an existing ZooKeeper cluster, one that
+ is not managed by HBase, set <varname>HBASE_MANAGES_ZK</varname>
+ in <filename>conf/hbase-env.sh</filename> to false
+ <programlisting>
+ ...
+ # Tell HBase whether it should manage its own instance of Zookeeper or not.
+ export HBASE_MANAGES_ZK=false</programlisting> Next set ensemble locations
+ and client port, if non-standard, in
+ <filename>hbase-site.xml</filename>, or add a suitably
+ configured <filename>zoo.cfg</filename> to HBase's
+ <filename>CLASSPATH</filename>. HBase will prefer the
+ configuration found in <filename>zoo.cfg</filename> over any
+ settings in <filename>hbase-site.xml</filename>.</para>
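+
+ <para>As a minimal sketch (the hostnames are examples), the
+ <filename>hbase-site.xml</filename> additions might look like:
+ <programlisting><property>
+   <name>hbase.zookeeper.quorum</name>
+   <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
+ </property>
+ <property>
+   <name>hbase.zookeeper.property.clientPort</name>
+   <value>2181</value>
+ </property></programlisting></para>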
+
+ <para>When HBase manages ZooKeeper, it will start/stop the
+ ZooKeeper servers as a part of the regular start/stop scripts.
+ If you would like to run ZooKeeper yourself, independent of
+ HBase start/stop, you would do the following</para>
+
+ <programlisting>
+${HBASE_HOME}/bin/hbase-daemons.sh {start,stop} zookeeper
+</programlisting>
+
+ <para>Note that you can use HBase in this manner to spin up a
+ ZooKeeper cluster, unrelated to HBase. Just make sure to set
+ <varname>HBASE_MANAGES_ZK</varname> to <varname>false</varname>
+ if you want it to stay up across HBase restarts so that when
+ HBase shuts down, it doesn't take ZooKeeper down with it.</para>
+
+ <para>For more information about running a distinct ZooKeeper
+ cluster, see the ZooKeeper <link
+ xlink:href="http://hadoop.apache.org/zookeeper/docs/current/zookeeperStarted.html">Getting
+ Started Guide</link>. Additionally, see the <link xlink:href="http://wiki.apache.org/hadoop/ZooKeeper/FAQ#A7">ZooKeeper Wiki</link> or the
+ <link xlink:href="http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#sc_zkMulitServerSetup">ZooKeeper documentation</link>
+ for more information on ZooKeeper sizing.
+ </para>
+ </section>
+
+
+ <section xml:id="zk.sasl.auth">
+ <title>SASL Authentication with ZooKeeper</title>
+ <para>Newer releases of HBase (>= 0.92)
+ support connecting to a ZooKeeper Quorum that supports
+ SASL authentication (which is available in ZooKeeper
+ versions 3.4.0 or later).</para>
+
+ <para>This describes how to set up HBase to mutually
+ authenticate with a ZooKeeper Quorum. ZooKeeper/HBase
+ mutual authentication (<link
+ xlink:href="https://issues.apache.org/jira/browse/HBASE-2418">HBASE-2418</link>)
+ is required as part of a complete secure HBase configuration
+ (<link
+ xlink:href="https://issues.apache.org/jira/browse/HBASE-3025">HBASE-3025</link>).
+
+ For simplicity of explication, this section ignores
+ additional configuration required (Secure HDFS and Coprocessor
+ configuration). It's recommended to begin with an
+ HBase-managed Zookeeper configuration (as opposed to a
+ standalone Zookeeper quorum) for ease of learning.
+ </para>
+
+ <section><title>Operating System Prerequisites</title>
+
+ <para>
+ You need to have a working Kerberos KDC setup. For
+ each <code>$HOST</code> that will run a ZooKeeper
+ server, you should have a principal,
+ <code>zookeeper/$HOST</code>. For each such host,
+ add a service key (using the <code>kadmin</code> or
+ <code>kadmin.local</code> tool's <code>ktadd</code>
+ command) for <code>zookeeper/$HOST</code>, copy
+ the resulting keytab file to <code>$HOST</code>, and make it
+ readable only to the user that will run ZooKeeper on
+ <code>$HOST</code>. Note the location of this file,
+ which we will use below as
+ <filename>$PATH_TO_ZOOKEEPER_KEYTAB</filename>.
+ </para>
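+
+ <para>A sketch of the principal and keytab creation using
+ <code>kadmin.local</code> (the hostname and keytab path are
+ examples):
+ <programlisting>kadmin.local: addprinc -randkey zookeeper/host1.example.com
+ kadmin.local: ktadd -k /etc/zookeeper/zookeeper.keytab zookeeper/host1.example.com</programlisting></para>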
+
+ <para>
+ Similarly, for each <code>$HOST</code> that will run
+ an HBase server (master or regionserver), you should
+ have a principal: <code>hbase/$HOST</code>. For each
+ host, add a keytab file called
+ <filename>hbase.keytab</filename> containing a service
+ key for <code>hbase/$HOST</code>, copy this file to
+ <code>$HOST</code>, and make it readable only to the
+ user that will run an HBase service on
+ <code>$HOST</code>. Note the location of this file,
+ which we will use below as
+ <filename>$PATH_TO_HBASE_KEYTAB</filename>.
+ </para>
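+
+ <para>Again as a sketch, with an example hostname:
+ <programlisting>kadmin.local: addprinc -randkey hbase/host1.example.com
+ kadmin.local: ktadd -k hbase.keytab hbase/host1.example.com</programlisting></para>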
+
+ <para>
+ Each user who will be an HBase client should also be
+ given a Kerberos principal. This principal should
+ usually have a password assigned to it (as opposed to,
+ as with the HBase servers, a keytab file) which only
+ this user knows. The client's principal's
+ <code>maxrenewlife</code> should be set high enough that the
+ ticket can be renewed for as long as the user's
+ HBase client processes need to run. For example, if a user runs a
+ long-running HBase client process that takes at most 3
+ days, we might create this user's principal within
+ <code>kadmin</code> with: <code>addprinc -maxrenewlife
+ 3days</code>. The Zookeeper client and server
+ libraries manage their own ticket refreshment by
+ running threads that wake up periodically to do the
+ refreshment.
+ </para>
+
+ <para>On each host that will run an HBase client
+ (e.g. <code>hbase shell</code>), add the following
+ file to the HBase home directory's <filename>conf</filename>
+ directory:</para>
+
+ <programlisting>
+ Client {
+ com.sun.security.auth.module.Krb5LoginModule required
+ useKeyTab=false
+ useTicketCache=true;
+ };
+ </programlisting>
+
+ <para>We'll refer to this JAAS configuration file as
+ <filename>$CLIENT_CONF</filename> below.</para>
+ </section>
+
+ <section>
+ <title>HBase-managed Zookeeper Configuration</title>
+
+ <para>On each node that will run a zookeeper, a
+ master, or a regionserver, create a <link
+ xlink:href="http://docs.oracle.com/javase/1.4.2/docs/guide/security/jgss/tutorials/LoginConfigFile.html">JAAS</link>
+ configuration file in the conf directory of the node's
+ <filename>HBASE_HOME</filename> directory that looks like the
+ following:</para>
+
+ <programlisting>
+ Server {
+ com.sun.security.auth.module.Krb5LoginModule required
+ useKeyTab=true
+ keyTab="$PATH_TO_ZOOKEEPER_KEYTAB"
+ storeKey=true
+ useTicketCache=false
+ principal="zookeeper/$HOST";
+ };
+ Client {
+ com.sun.security.auth.module.Krb5LoginModule required
+ useKeyTab=true
+ useTicketCache=false
+ keyTab="$PATH_TO_HBASE_KEYTAB"
+ principal="hbase/$HOST";
+ };
+ </programlisting>
+
+ <para>where the <filename>$PATH_TO_HBASE_KEYTAB</filename> and
+ <filename>$PATH_TO_ZOOKEEPER_KEYTAB</filename> files are what
+ you created above, and <code>$HOST</code> is the hostname for that
+ node.</para>
+
+ <para>The <code>Server</code> section will be used by
+ the ZooKeeper quorum server, while the
+ <code>Client</code> section will be used by the HBase
+ master and regionservers. The path to this file should
+ be substituted for both the text <filename>$HBASE_SERVER_CONF</filename>
+ and the text <filename>$CLIENT_CONF</filename> in the
+ <filename>hbase-env.sh</filename> listing below.</para>
+
+ <para>Modify your <filename>hbase-env.sh</filename> to include the
+ following:</para>
+
+ <programlisting>
+ export HBASE_OPTS="-Djava.security.auth.login.config=$CLIENT_CONF"
+ export HBASE_MANAGES_ZK=true
+ export HBASE_ZOOKEEPER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
+ export HBASE_MASTER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
+ export HBASE_REGIONSERVER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
+ </programlisting>
+
+ <para>where <filename>$HBASE_SERVER_CONF</filename> and
+ <filename>$CLIENT_CONF</filename> are the full paths to the
+ JAAS configuration files created above.</para>
+
+ <para>Modify your <filename>hbase-site.xml</filename> on each node
+ that will run zookeeper, master or regionserver to contain:</para>
+
+ <programlisting><![CDATA[
+ <configuration>
+ <property>
+ <name>hbase.zookeeper.quorum</name>
+ <value>$ZK_NODES</value>
+ </property>
+ <property>
+ <name>hbase.cluster.distributed</name>
+ <value>true</value>
+ </property>
+ <property>
+ <name>hbase.zookeeper.property.authProvider.1</name>
+ <value>org.apache.zookeeper.server.auth.SASLAuthenticationProvider</value>
+ </property>
+ <property>
+ <name>hbase.zookeeper.property.kerberos.removeHostFromPrincipal</name>
+ <value>true</value>
+ </property>
+ <property>
+ <name>hbase.zookeeper.property.kerberos.removeRealmFromPrincipal</name>
+ <value>true</value>
+ </property>
+ </configuration>
+ ]]></programlisting>
+
+ <para>where <code>$ZK_NODES</code> is the
+ comma-separated list of hostnames of the Zookeeper
+ Quorum hosts.</para>
+
+ <para>Start your HBase cluster by running one or more
+ of the following commands on the appropriate
+ hosts:
+ </para>
+
+ <programlisting>
+ bin/hbase zookeeper start
+ bin/hbase master start
+ bin/hbase regionserver start
+ </programlisting>
+
+ </section>
+
+ <section><title>External Zookeeper Configuration</title>
+ <para>Add a JAAS configuration file that looks like:
+
+ <programlisting>
+ Client {
+ com.sun.security.auth.module.Krb5LoginModule required
+ useKeyTab=true
+ useTicketCache=false
+ keyTab="$PATH_TO_HBASE_KEYTAB"
+ principal="hbase/$HOST";
+ };
+ </programlisting>
+
+ where the <filename>$PATH_TO_HBASE_KEYTAB</filename> is the keytab
+ created above for HBase services to run on this host, and <code>$HOST</code> is the
+ hostname for that node. Put this in the HBase home's
+ configuration directory. We'll refer to this file's
+ full pathname as <filename>$HBASE_SERVER_CONF</filename> below.</para>
+
+ <para>Modify your hbase-env.sh to include the following:</para>
+
+ <programlisting>
+ export HBASE_OPTS="-Djava.security.auth.login.config=$CLIENT_CONF"
+ export HBASE_MANAGES_ZK=false
+ export HBASE_MASTER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
+ export HBASE_REGIONSERVER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
+ </programlisting>
+
+
+ <para>Modify your <filename>hbase-site.xml</filename> on each node
+ that will run a master or regionserver to contain:</para>
+
+ <programlisting><![CDATA[
+ <configuration>
+ <property>
+ <name>hbase.zookeeper.quorum</name>
+ <value>$ZK_NODES</value>
+ </property>
+ <property>
+ <name>hbase.cluster.distributed</name>
+ <value>true</value>
+ </property>
+ </configuration>
+ ]]>
+ </programlisting>
+
+ <para>where <code>$ZK_NODES</code> is the
+ comma-separated list of hostnames of the Zookeeper
+ Quorum hosts.</para>
+
+ <para>
+ Add a <filename>zoo.cfg</filename> for each Zookeeper Quorum host containing:
+ <programlisting>
+ authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
+ kerberos.removeHostFromPrincipal=true
+ kerberos.removeRealmFromPrincipal=true
+ </programlisting>
+
+ Also on each of these hosts, create a JAAS configuration file containing:
+
+ <programlisting>
+ Server {
+ com.sun.security.auth.module.Krb5LoginModule required
+ useKeyTab=true
+ keyTab="$PATH_TO_ZOOKEEPER_KEYTAB"
+ storeKey=true
+ useTicketCache=false
+ principal="zookeeper/$HOST";
+ };
+ </programlisting>
+
+ where <code>$HOST</code> is the hostname of each
+ Quorum host. We will refer to the full pathname of
+ this file as <filename>$ZK_SERVER_CONF</filename> below.
+
+ </para>
+
+ <para>
+ Start your Zookeepers on each Zookeeper Quorum host with:
+
+ <programlisting>
+ SERVER_JVMFLAGS="-Djava.security.auth.login.config=$ZK_SERVER_CONF" bin/zkServer start
+ </programlisting>
+
+ </para>
+
+ <para>
+ Start your HBase cluster by running one or more of the following commands on the appropriate nodes:
+ </para>
+
+ <programlisting>
+ bin/hbase master start
+ bin/hbase regionserver start
+ </programlisting>
+
+
+ </section>
+
+ <section>
+ <title>Zookeeper Server Authentication Log Output</title>
+ <para>If the configuration above is successful,
+ you should see something similar to the following in
+ your Zookeeper server logs:
+ <programlisting>
+11/12/05 22:43:39 INFO zookeeper.Login: successfully logged in.
+11/12/05 22:43:39 INFO server.NIOServerCnxnFactory: binding to port 0.0.0.0/0.0.0.0:2181
+11/12/05 22:43:39 INFO zookeeper.Login: TGT refresh thread started.
+11/12/05 22:43:39 INFO zookeeper.Login: TGT valid starting at: Mon Dec 05 22:43:39 UTC 2011
+11/12/05 22:43:39 INFO zookeeper.Login: TGT expires: Tue Dec 06 22:43:39 UTC 2011
+11/12/05 22:43:39 INFO zookeeper.Login: TGT refresh sleeping until: Tue Dec 06 18:36:42 UTC 2011
+..
+11/12/05 22:43:59 INFO auth.SaslServerCallbackHandler:
+ Successfully authenticated client: authenticationID=hbase/ip-10-166-175-249.us-west-1.compute.internal@HADOOP.LOCALDOMAIN;
+ authorizationID=hbase/ip-10-166-175-249.us-west-1.compute.internal@HADOOP.LOCALDOMAIN.
+11/12/05 22:43:59 INFO auth.SaslServerCallbackHandler: Setting authorizedID: hbase
+11/12/05 22:43:59 INFO server.ZooKeeperServer: adding SASL authorization for authorizationID: hbase
+ </programlisting>
+
+ </para>
+
+ </section>
+
+ <section>
+ <title>Zookeeper Client Authentication Log Output</title>
+ <para>On the Zookeeper client side (HBase master or regionserver),
+ you should see something similar to the following:
+
+ <programlisting>
+11/12/05 22:43:59 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=ip-10-166-175-249.us-west-1.compute.internal:2181 sessionTimeout=180000 watcher=master:60000
+11/12/05 22:43:59 INFO zookeeper.ClientCnxn: Opening socket connection to server /10.166.175.249:2181
+11/12/05 22:43:59 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 14851@ip-10-166-175-249
+11/12/05 22:43:59 INFO zookeeper.Login: successfully logged in.
+11/12/05 22:43:59 INFO client.ZooKeeperSaslClient: Client will use GSSAPI as SASL mechanism.
+11/12/05 22:43:59 INFO zookeeper.Login: TGT refresh thread started.
+11/12/05 22:43:59 INFO zookeeper.ClientCnxn: Socket connection established to ip-10-166-175-249.us-west-1.compute.internal/10.166.175.249:2181, initiating session
+11/12/05 22:43:59 INFO zookeeper.Login: TGT valid starting at: Mon Dec 05 22:43:59 UTC 2011
+11/12/05 22:43:59 INFO zookeeper.Login: TGT expires: Tue Dec 06 22:43:59 UTC 2011
+11/12/05 22:43:59 INFO zookeeper.Login: TGT refresh sleeping until: Tue Dec 06 18:30:37 UTC 2011
+11/12/05 22:43:59 INFO zookeeper.ClientCnxn: Session establishment complete on server ip-10-166-175-249.us-west-1.compute.internal/10.166.175.249:2181, sessionid = 0x134106594320000, negotiated timeout = 180000
+ </programlisting>
+ </para>
+ </section>
+
+ <section>
+ <title>Configuration from Scratch</title>
+
+ <para>This has been tested on the current standard Amazon
+ Linux AMI. First set up the KDC and principals as
+ described above. Next check out the code and run a sanity
+ check.</para>
+
+ <programlisting>
+ git clone git://git.apache.org/hbase.git
+ cd hbase
+ mvn -PlocalTests clean test -Dtest=TestZooKeeperACL
+ </programlisting>
+
+ <para>Then configure HBase as described above.
+ Manually edit target/cached_classpath.txt (see below).</para>
+
+ <programlisting>
+ bin/hbase zookeeper &
+ bin/hbase master &
+ bin/hbase regionserver &
+ </programlisting>
+ </section>
+
+
+ <section>
+ <title>Future improvements</title>
+
+ <section><title>Fix target/cached_classpath.txt</title>
+ <para>
+ You must override the standard hadoop-core jar file from the
+ <code>target/cached_classpath.txt</code>
+ file with the version containing the HADOOP-7070 fix. You can use the following script to do this:
+
+ <programlisting>
+ echo `find ~/.m2 -name "*hadoop-core*7070*SNAPSHOT.jar"` ':' `cat target/cached_classpath.txt` | sed 's/ //g' > target/tmp.txt
+ mv target/tmp.txt target/cached_classpath.txt
+ </programlisting>
+
+ </para>
+
+ </section>
+
+ <section>
+ <title>Set JAAS configuration
+ programmatically</title>
+
+
+ <para>This would avoid the need for a separate Hadoop jar
+ that fixes <link xlink:href="https://issues.apache.org/jira/browse/HADOOP-7070">HADOOP-7070</link>.</para>
+ </section>
+
+ <section>
+ <title>Elimination of
+ <code>kerberos.removeHostFromPrincipal</code> and
+ <code>kerberos.removeRealmFromPrincipal</code></title>
+ </section>
+
+ </section>
+
+
+ </section> <!-- SASL Authentication with ZooKeeper -->
+
+
+
+
+ </chapter>