Posted to commits@hbase.apache.org by st...@apache.org on 2011/03/15 23:23:12 UTC
svn commit: r1081966 [2/2] - in /hbase/trunk/src/docbkx: book.xml
configuration.xml getting_started.xml performance.xml preface.xml shell.xml
upgrading.xml
Added: hbase/trunk/src/docbkx/getting_started.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/getting_started.xml?rev=1081966&view=auto
==============================================================================
--- hbase/trunk/src/docbkx/getting_started.xml (added)
+++ hbase/trunk/src/docbkx/getting_started.xml Tue Mar 15 22:23:12 2011
@@ -0,0 +1,853 @@
+<?xml version="1.0"?>
+ <chapter xml:id="getting_started"
+ version="5.0" xmlns="http://docbook.org/ns/docbook"
+ xmlns:xlink="http://www.w3.org/1999/xlink"
+ xmlns:xi="http://www.w3.org/2001/XInclude"
+ xmlns:svg="http://www.w3.org/2000/svg"
+ xmlns:m="http://www.w3.org/1998/Math/MathML"
+ xmlns:html="http://www.w3.org/1999/xhtml"
+ xmlns:db="http://docbook.org/ns/docbook">
+ <title>Getting Started</title>
+ <section >
+ <title>Introduction</title>
+ <para>
+ <link linkend="quickstart">Quick Start</link> will get you up and running
+ on a single-node instance of HBase using the local filesystem.
+ The <link linkend="notsoquick">Not-so-quick Start Guide</link>
+ describes setup of HBase in distributed mode running on top of HDFS.
+ </para>
+ </section>
+
+ <section xml:id="quickstart">
+ <title>Quick Start</title>
+
+ <para>This guide describes setup of a standalone HBase
+ instance that uses the local filesystem. It leads you
+ through creating a table, inserting rows via the
+ <link linkend="shell">HBase Shell</link>, and then cleaning up and shutting
+ down your standalone HBase instance.
+ This exercise should take no more than
+ ten minutes (not including download time).
+ </para>
+
+ <section>
+ <title>Download and unpack the latest stable release.</title>
+
+ <para>Choose a download site from this list of <link
+ xlink:href="http://www.apache.org/dyn/closer.cgi/hbase/">Apache
+ Download Mirrors</link>. Click on the suggested top link. This will take you to a
+ mirror of <emphasis>HBase Releases</emphasis>. Click on
+ the folder named <filename>stable</filename> and then download the
+ file that ends in <filename>.tar.gz</filename> to your local filesystem;
+ e.g. <filename>hbase-<?eval ${project.version}?>.tar.gz</filename>.</para>
+
+ <para>Decompress and untar your download and then change into the
+ unpacked directory.</para>
+
+ <para><programlisting>$ tar xfz hbase-<?eval ${project.version}?>.tar.gz
+$ cd hbase-<?eval ${project.version}?>
+</programlisting></para>
+
+<para>
+ At this point, you are ready to start HBase. But before starting it,
+ you might want to edit <filename>conf/hbase-site.xml</filename>
+ and set the directory you want HBase to write to,
+ <varname>hbase.rootdir</varname>.
+ <programlisting>
+<![CDATA[
+<?xml version="1.0"?>
+<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
+<configuration>
+ <property>
+ <name>hbase.rootdir</name>
+ <value>file:///DIRECTORY/hbase</value>
+ </property>
+</configuration>
+]]>
+</programlisting>
+Replace <varname>DIRECTORY</varname> in the above with a path to a directory where you want
+HBase to store its data. By default, <varname>hbase.rootdir</varname> is
+set to <filename>/tmp/hbase-${user.name}</filename>
+which means you'll lose all your data whenever your server reboots
+(Most operating systems clear <filename>/tmp</filename> on restart).
+</para>
+</section>
+<section xml:id="start_hbase">
+<title>Start HBase</title>
+
+ <para>Now start HBase:<programlisting>$ ./bin/start-hbase.sh
+starting Master, logging to logs/hbase-user-master-example.org.out</programlisting></para>
+
+ <para>You should
+ now have a running standalone HBase instance. In standalone mode, HBase runs
+ all daemons in a single JVM; i.e., both the HBase and ZooKeeper daemons.
+ HBase logs can be found in the <filename>logs</filename> subdirectory. Check them
+ out especially if HBase had trouble starting.</para>
+
+ <note>
+ <title>Is <application>java</application> installed?</title>
+ <para>All of the above presumes a 1.6 version of Oracle
+ <application>java</application> is installed on your
+ machine and available on your path; i.e. when you type
+ <application>java</application>, you see output that describes the options
+ the java program takes (HBase requires java 6). If this is
+ not the case, HBase will not start.
+ Install java, edit <filename>conf/hbase-env.sh</filename>, uncommenting the
+ <envar>JAVA_HOME</envar> line and pointing it at your java install. Then,
+ retry the steps above.</para>
+ </note>
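As a sketch of the <filename>hbase-env.sh</filename> edit the note describes, the <envar>JAVA_HOME</envar> line can be uncommented with sed; here we operate on a throwaway copy of the file, and the JDK path is an example, not an HBase default:

```shell
# Sketch of the conf/hbase-env.sh edit described above. We work on a
# temporary copy; /usr/lib/jvm/java-6-sun is an example path, not a default.
tmp=$(mktemp -d)
printf '# export JAVA_HOME=/usr/java/jdk1.6.0/\n' > "$tmp/hbase-env.sh"
# Uncomment the JAVA_HOME line and point it at the local java install.
sed 's|^# export JAVA_HOME=.*|export JAVA_HOME=/usr/lib/jvm/java-6-sun|' \
  "$tmp/hbase-env.sh" > "$tmp/hbase-env.sh.new"
cat "$tmp/hbase-env.sh.new"
```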
+ </section>
+
+
+ <section xml:id="shell_exercises">
+ <title>Shell Exercises</title>
+ <para>Connect to your running HBase via the
+ <link linkend="shell">HBase Shell</link>.</para>
+
+ <para><programlisting>$ ./bin/hbase shell
+HBase Shell; enter 'help&lt;RETURN&gt;' for list of supported commands.
+Type "exit&lt;RETURN&gt;" to leave the HBase Shell
+Version: 0.89.20100924, r1001068, Fri Sep 24 13:55:42 PDT 2010
+
+hbase(main):001:0> </programlisting></para>
+
+ <para>Type <command>help</command> and then <command>&lt;RETURN&gt;</command>
+ to see a listing of shell
+ commands and options. Browse at least the paragraphs at the end of
+ the help emission for the gist of how variables and command
+ arguments are entered into the
+ HBase shell; in particular note how table names, rows, and
+ columns, etc., must be quoted.</para>
+
+ <para>Create a table named <varname>test</varname> with a single
+ <link linkend="columnfamily">column family</link> named <varname>cf</varname>.
+ Verify its creation by listing all tables and then insert some
+ values.</para>
+ <para><programlisting>hbase(main):003:0> create 'test', 'cf'
+0 row(s) in 1.2200 seconds
+hbase(main):003:0> list
+test
+1 row(s) in 0.0550 seconds
+hbase(main):004:0> put 'test', 'row1', 'cf:a', 'value1'
+0 row(s) in 0.0560 seconds
+hbase(main):005:0> put 'test', 'row2', 'cf:b', 'value2'
+0 row(s) in 0.0370 seconds
+hbase(main):006:0> put 'test', 'row3', 'cf:c', 'value3'
+0 row(s) in 0.0450 seconds</programlisting></para>
+
+ <para>Above we inserted 3 values, one at a time. The first insert is at
+ <varname>row1</varname>, column <varname>cf:a</varname> with a value of
+ <varname>value1</varname>.
+ Columns in HBase are composed of a
+ <link linkend="columnfamily">column family</link> prefix
+ -- <varname>cf</varname> in this example -- followed by
+ a colon and then a column qualifier suffix (<varname>a</varname> in this case).
+ </para>
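The family:qualifier convention described above can be sketched with plain shell string operations (the cf:a name is taken from the example):

```shell
# Split an HBase column name such as "cf:a" into its column family
# prefix and column qualifier suffix, per the convention described above.
col='cf:a'
family=${col%%:*}    # text before the first colon: "cf"
qualifier=${col#*:}  # text after the first colon: "a"
echo "family=$family qualifier=$qualifier"
```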
+
+ <para>Verify the data insert.</para>
+
+ <para>Run a scan of the table by doing the following</para>
+
+ <para><programlisting>hbase(main):007:0> scan 'test'
+ROW COLUMN+CELL
+row1 column=cf:a, timestamp=1288380727188, value=value1
+row2 column=cf:b, timestamp=1288380738440, value=value2
+row3 column=cf:c, timestamp=1288380747365, value=value3
+3 row(s) in 0.0590 seconds</programlisting></para>
+
+ <para>Get a single row as follows</para>
+
+ <para><programlisting>hbase(main):008:0> get 'test', 'row1'
+COLUMN CELL
+cf:a timestamp=1288380727188, value=value1
+1 row(s) in 0.0400 seconds</programlisting></para>
+
+ <para>Now, disable and drop your table. This will clean up everything
+ done above.</para>
+
+ <para><programlisting>hbase(main):012:0> disable 'test'
+0 row(s) in 1.0930 seconds
+hbase(main):013:0> drop 'test'
+0 row(s) in 0.0770 seconds </programlisting></para>
+
+ <para>Exit the shell by typing <command>exit</command>.</para>
+
+ <para><programlisting>hbase(main):014:0> exit</programlisting></para>
+ </section>
+
+ <section xml:id="stopping">
+ <title>Stopping HBase</title>
+ <para>Stop your HBase instance by running the stop script.</para>
+
+ <para><programlisting>$ ./bin/stop-hbase.sh
+stopping hbase...............</programlisting></para>
+ </section>
+
+ <section><title>Where to go next
+ </title>
+ <para>The standalone setup described above is good for testing and experimentation only.
+ Move on to the next section, the <link linkend="notsoquick">Not-so-quick Start Guide</link>,
+ where we'll go into depth on the different HBase run modes, requirements, and the critical
+ configurations needed for setting up a distributed HBase deploy.
+ </para>
+ </section>
+ </section>
+
+ <section xml:id="notsoquick">
+ <title>Not-so-quick Start Guide</title>
+
+ <section xml:id="requirements"><title>Requirements</title>
+ <para>HBase has the following requirements. Please read the
+ section below carefully and ensure that all requirements have been
+ satisfied. Failure to do so will cause you (and us) grief debugging
+ strange errors and/or data loss.
+ </para>
+
+ <section xml:id="java"><title>java</title>
+<para>
+ Just like Hadoop, HBase requires java 6 from <link xlink:href="http://www.java.com/download/">Oracle</link>.
+Usually you'll want to use the latest version available, except the problematic u18 (u22 is the latest version as of this writing).</para>
+</section>
+
+ <section xml:id="hadoop"><title><link xlink:href="http://hadoop.apache.org">hadoop</link><indexterm><primary>Hadoop</primary></indexterm></title>
+<para>This version of HBase will only run on <link xlink:href="http://hadoop.apache.org/common/releases.html">Hadoop 0.20.x</link>.
+ It will not run on hadoop 0.21.x (nor 0.22.x) as of this writing.
+ HBase will lose data unless it is running on an HDFS that has a
+ durable <code>sync</code>. Currently only the
+ <link xlink:href="http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append/">branch-0.20-append</link>
+ branch has this attribute
+ <footnote>
+ <para>
+ See <link xlink:href="http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append/CHANGES.txt">CHANGES.txt</link>
+ in branch-0.20-append to see the list of patches involved in adding append on the Hadoop 0.20 branch.
+ </para>
+ </footnote>.
+ No official releases have been made from this branch up to now
+ so you will have to build your own Hadoop from the tip of this branch.
+ Scroll down in the Hadoop <link xlink:href="http://wiki.apache.org/hadoop/HowToRelease">How To Release</link> to the section
+ <emphasis>Build Requirements</emphasis> for instruction on how to build Hadoop.
+ </para>
+
+ <para>
+ Or rather than build your own, you could use
+ Cloudera's <link xlink:href="http://archive.cloudera.com/docs/">CDH3</link>.
+ CDH has the 0.20-append patches needed to add a durable sync (CDH3 is still in beta.
+ Either CDH3b2 or CDH3b3 will suffice).
+ </para>
+
+ <para>Because HBase depends on Hadoop, it bundles an instance of
+ the Hadoop jar under its <filename>lib</filename> directory.
+ The bundled Hadoop was made from the Apache branch-0.20-append branch
+ at the time of this HBase's release.
+ It is <emphasis>critical</emphasis> that the version of Hadoop running
+ on your cluster matches the version bundled with HBase. Replace the hadoop
+ jar found in the HBase <filename>lib</filename> directory with the
+ hadoop jar you are running out on your cluster to avoid version mismatch issues.
+ Make sure you replace the jar all over your cluster.
+ For example, versions of CDH do not have HDFS-724 whereas
+ Hadoop's branch-0.20-append branch does have HDFS-724. This
+ patch changes the RPC version because the protocol was changed.
+ Version mismatch issues have various manifestations, but often everything just looks hung up.
+ </para>
+
+ <note><title>Can I just replace the jar in Hadoop 0.20.2 tarball with the <emphasis>sync</emphasis>-supporting Hadoop jar found in HBase?</title>
+ <para>
+ You could do this. It works, going by a recent posting on the
+ <link xlink:href="http://www.apacheserver.net/Using-Hadoop-bundled-in-lib-directory-HBase-at1136240.htm">mailing list</link>.
+ </para>
+ </note>
+ <note><title>Hadoop Security</title>
+ <para>HBase will run on any Hadoop 0.20.x that incorporates Hadoop security features -- e.g. Y! 0.20S or CDH3B3 -- as long
+ as you do as suggested above and replace the Hadoop jar that ships with HBase with the secure version.
+ </para>
+ </note>
+
+ </section>
+<section xml:id="ssh"> <title>ssh</title>
+<para><command>ssh</command> must be installed and <command>sshd</command> must
+be running to use Hadoop's scripts to manage remote Hadoop and HBase daemons.
+ You must be able to ssh to all nodes, including your local node, using passwordless login (Google "ssh passwordless login").
+ </para>
+</section>
+ <section xml:id="dns"><title>DNS</title>
+ <para>HBase uses the local hostname to self-report its IP address. Both forward and reverse DNS resolving should work.</para>
+ <para>If your machine has multiple interfaces, HBase will use the interface that the primary hostname resolves to.</para>
+ <para>If this is insufficient, you can set <varname>hbase.regionserver.dns.interface</varname> to indicate the primary interface.
+ This only works if your cluster
+ configuration is consistent and every host has the same network interface configuration.</para>
+ <para>Another alternative is setting <varname>hbase.regionserver.dns.nameserver</varname> to choose a different nameserver than the
+ system wide default.</para>
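As a hedged sketch, the two properties above would be set in <filename>hbase-site.xml</filename> like so; the interface name and nameserver host are placeholders, not defaults:

```xml
<!-- Hypothetical hbase-site.xml fragment; eth1 and ns1.example.org are placeholders. -->
<property>
  <name>hbase.regionserver.dns.interface</name>
  <value>eth1</value>
</property>
<property>
  <name>hbase.regionserver.dns.nameserver</name>
  <value>ns1.example.org</value>
</property>
```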
+</section>
+ <section xml:id="ntp"><title>NTP</title>
+<para>
+ The clocks on cluster members should be in basic alignment. Some skew is tolerable, but
+ wild skew could generate odd behaviors. Run <link xlink:href="http://en.wikipedia.org/wiki/Network_Time_Protocol">NTP</link>
+ on your cluster, or an equivalent.
+ </para>
+ <para>If you are having problems querying data, or "weird" cluster operations, check system time!</para>
+</section>
+
+
+ <section xml:id="ulimit">
+ <title><varname>ulimit</varname><indexterm><primary>ulimit</primary></indexterm></title>
+ <para>HBase is a database; it uses a lot of files at the same time.
+ The default ulimit -n of 1024 on *nix systems is insufficient.
+ Any significant amount of loading will lead you to
+ <link xlink:href="http://wiki.apache.org/hadoop/Hbase/FAQ#A6">FAQ: Why do I see "java.io.IOException...(Too many open files)" in my logs?</link>.
+ You may also notice errors such as
+ <programlisting>
+ 2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
+ 2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6935524980745310745_1391901
+ </programlisting>
+ Do yourself a favor and change the upper bound on the number of file descriptors.
+ Set it to north of 10k. See the above referenced FAQ for how.</para>
+ <para>To be clear, upping the file descriptors for the user who is
+ running the HBase process is an operating system configuration, not an
+ HBase configuration. Also, a common mistake is that administrators
+ will up the file descriptors for a particular user but, for whatever reason,
+ HBase will be running as someone else. HBase prints the ulimit it is
+ seeing as the first line of its logs. Ensure it is correct.
+ <footnote>
+ <para>A useful read on setting configuration on your Hadoop cluster is Aaron Kimball's
+ <link xlink:href="http://www.cloudera.com/blog/2009/03/configuration-parameters-what-can-you-just-ignore/">Configuration Parameters: What can you just ignore?</link>
+ </para>
+ </footnote>
+ </para>
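A quick way to sanity-check the limit for the user that will run HBase (the 10240 floor reflects the "north of 10k" guidance above; it is not an HBase-enforced value):

```shell
# Check the current open-files limit against the ~10k floor suggested above.
current=$(ulimit -n)
if [ "$current" = "unlimited" ] || [ "$current" -ge 10240 ]; then
  echo "ulimit -n is $current; OK"
else
  echo "ulimit -n is $current; raise it to at least 10240 before loading data"
fi
```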
+ <section xml:id="ulimit_ubuntu">
+ <title><varname>ulimit</varname> on Ubuntu</title>
+ <para>
+ If you are on Ubuntu you will need to make the following changes:</para>
+ <para>
+ In the file <filename>/etc/security/limits.conf</filename> add a line like:
+ <programlisting>hadoop - nofile 32768</programlisting>
+ Replace <varname>hadoop</varname>
+ with whatever user is running Hadoop and HBase. If you have
+ separate users, you will need 2 entries, one for each user.
+ </para>
+ <para>
+ In the file <filename>/etc/pam.d/common-session</filename> add as the last line in the file:
+ <programlisting>session required pam_limits.so</programlisting>
+ Otherwise the changes in <filename>/etc/security/limits.conf</filename> won't be applied.
+ </para>
+ <para>
+ Don't forget to log out and back in again for the changes to take effect!
+ </para>
+ </section>
+ </section>
+
+ <section xml:id="dfs.datanode.max.xcievers">
+ <title><varname>dfs.datanode.max.xcievers</varname><indexterm><primary>xcievers</primary></indexterm></title>
+ <para>
+ A Hadoop HDFS datanode has an upper bound on the number of files
+ that it will serve at any one time.
+ The upper bound parameter is called
+ <varname>xcievers</varname> (yes, this is misspelled). Again, before
+ doing any loading, make sure you have configured
+ Hadoop's <filename>conf/hdfs-site.xml</filename>,
+ setting the <varname>xcievers</varname> value to at least the following:
+ <programlisting><![CDATA[
+ <property>
+ <name>dfs.datanode.max.xcievers</name>
+ <value>4096</value>
+ </property>
+ ]]></programlisting>
+ </para>
+ <para>Be sure to restart your HDFS after making the above
+ configuration.</para>
+ <para>Not having this configuration in place makes for strange-looking
+ failures. Eventually you'll see a complaint in the datanode logs
+ about the xcievers limit being exceeded, but on the run up to this,
+ one manifestation is complaints about missing blocks. For example:
+ <code>10/12/08 20:10:31 INFO hdfs.DFSClient: Could not obtain block blk_XXXXXXXXXXXXXXXXXXXXXX_YYYYYYYY from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry...</code>
+ </para>
+ </section>
+
+<section xml:id="windows">
+<title>Windows</title>
+<para>
+HBase has been little tested running on Windows.
+Running a production install of HBase on
+Windows is not recommended.
+</para>
+<para>
+If you are running HBase on Windows, you must install
+<link xlink:href="http://cygwin.com/">Cygwin</link>
+to have a *nix-like environment for the shell scripts. The full details
+are explained in the <link xlink:href="http://hbase.apache.org/cygwin.html">Windows Installation</link>
+guide.
+</para>
+</section>
+
+ </section>
+
+ <section xml:id="standalone_dist"><title>HBase run modes: Standalone and Distributed</title>
+ <para>HBase has two run modes: <link linkend="standalone">standalone</link>
+ and <link linkend="distributed">distributed</link>.
+ Out of the box, HBase runs in standalone mode. To set up a
+ distributed deploy, you will need to configure HBase by editing
+ files in the HBase <filename>conf</filename> directory.</para>
+
+<para>Whatever your mode, you will need to edit <code>conf/hbase-env.sh</code>
+to tell HBase which <command>java</command> to use. In this file
+you set HBase environment variables such as the heapsize and other options
+for the <application>JVM</application>, the preferred location for log files, etc.
+Set <varname>JAVA_HOME</varname> to point at the root of your
+<command>java</command> install.</para>
+
+ <section xml:id="standalone"><title>Standalone HBase</title>
+ <para>This is the default mode. Standalone mode is
+ what is described in the <link linkend="quickstart">quickstart</link>
+ section. In standalone mode, HBase does not use HDFS -- it uses the local
+ filesystem instead -- and it runs all HBase daemons and a local ZooKeeper
+ in the same JVM. ZooKeeper binds to a well-known port so clients may
+ talk to HBase.
+ </para>
+ </section>
+ <section xml:id="distributed"><title>Distributed</title>
+ <para>Distributed mode can be subdivided into distributed but all daemons run on a
+ single node -- a.k.a. <emphasis>pseudo-distributed</emphasis> -- and
+ <emphasis>fully-distributed</emphasis> where the daemons
+ are spread across all nodes in the cluster
+ <footnote><para>The pseudo-distributed vs fully-distributed nomenclature comes from Hadoop.</para></footnote>.</para>
+ <para>
+ Distributed modes require an instance of the
+ <emphasis>Hadoop Distributed File System</emphasis> (HDFS). See the
+ Hadoop <link xlink:href="http://hadoop.apache.org/common/docs/current/api/overview-summary.html#overview_description">
+ requirements and instructions</link> for how to set up an HDFS.
+ Before proceeding, ensure you have an appropriate, working HDFS.
+ </para>
+ <para>Below we describe the different distributed setups.
+ Starting, verification and exploration of your install, whether a
+ <emphasis>pseudo-distributed</emphasis> or <emphasis>fully-distributed</emphasis>
+ configuration is described in a section that follows,
+ <link linkend="confirm">Running and Confirming your Installation</link>.
+ The same verification script applies to both deploy types.</para>
+
+ <section xml:id="pseudo"><title>Pseudo-distributed</title>
+<para>A pseudo-distributed mode is simply a distributed mode run on a single host.
+Use this configuration for testing and prototyping on HBase. Do not use this configuration
+for production nor for evaluating HBase performance.
+</para>
+<para>Once you have confirmed your HDFS setup,
+edit <filename>conf/hbase-site.xml</filename>. This is the file
+into which you add local customizations and overrides for
+<link linkend="hbase_default_configurations">Default HBase Configurations</link>
+and <link linkend="hdfs_client_conf">HDFS Client Configurations</link>.
+Point HBase at the running Hadoop HDFS instance by setting the
+<varname>hbase.rootdir</varname> property.
+This property points HBase at the Hadoop filesystem instance to use.
+For example, adding the properties below to your
+<filename>hbase-site.xml</filename> says that HBase
+should use the <filename>/hbase</filename>
+directory in the HDFS whose namenode is at port 9000 on your local machine, and that
+it should run with one replica only (recommended for pseudo-distributed mode):</para>
+<programlisting><![CDATA[
+<configuration>
+ ...
+ <property>
+ <name>hbase.rootdir</name>
+ <value>hdfs://localhost:9000/hbase</value>
+ <description>The directory shared by region servers.
+ </description>
+ </property>
+ <property>
+ <name>dfs.replication</name>
+ <value>1</value>
+ <description>The replication count for HLog & HFile storage. Should not be greater than HDFS datanode count.
+ </description>
+ </property>
+ ...
+</configuration>
+]]></programlisting>
+
+<note>
+<para>Let HBase create the <varname>hbase.rootdir</varname>
+directory. If you don't, you'll get a warning saying HBase
+needs a migration run because the directory is missing files
+expected by HBase (it'll create them if you let it).</para>
+</note>
+
+<note>
+<para>Above we bind to <varname>localhost</varname>.
+This means that a remote client cannot
+connect. Amend accordingly, if you want to
+connect from a remote location.</para>
+</note>
+
+<para>Now skip to <link linkend="confirm">Running and Confirming your Installation</link>
+for how to start and verify your pseudo-distributed install.
+
+<footnote>
+ <para>See <link xlink:href="http://hbase.apache.org/pseudo-distributed.html">Pseudo-distributed mode extras</link>
+for notes on how to start extra Masters and regionservers when running
+ pseudo-distributed.</para>
+</footnote>
+</para>
+
+</section>
+
+ <section xml:id="fully_dist"><title>Fully-distributed</title>
+
+<para>For running a fully-distributed operation on more than one host, make
+the following configurations. In <filename>hbase-site.xml</filename>,
+add the property <varname>hbase.cluster.distributed</varname>
+and set it to <varname>true</varname> and point the HBase
+<varname>hbase.rootdir</varname> at the appropriate
+HDFS NameNode and location in HDFS where you would like
+HBase to write data. For example, if your namenode were running
+at namenode.example.org on port 9000 and you wanted to home
+your HBase in HDFS at <filename>/hbase</filename>,
+make the following configuration.</para>
+<programlisting><![CDATA[
+<configuration>
+ ...
+ <property>
+ <name>hbase.rootdir</name>
+ <value>hdfs://namenode.example.org:9000/hbase</value>
+ <description>The directory shared by region servers.
+ </description>
+ </property>
+ <property>
+ <name>hbase.cluster.distributed</name>
+ <value>true</value>
+ <description>The mode the cluster will be in. Possible values are
+ false: standalone and pseudo-distributed setups with managed Zookeeper
+ true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
+ </description>
+ </property>
+ ...
+</configuration>
+]]></programlisting>
+
+<section xml:id="regionserver"><title><filename>regionservers</filename></title>
+<para>In addition, a fully-distributed mode requires that you
+modify <filename>conf/regionservers</filename>.
+The <filename>regionservers</filename> file lists all hosts
+that you would have running <application>HRegionServer</application>s, one host per line
+(This file in HBase is like the Hadoop <filename>slaves</filename> file). All servers
+listed in this file will be started and stopped when the HBase cluster start or stop scripts are run.</para>
+</section>
+
+<section xml:id="zookeeper"><title>ZooKeeper<indexterm><primary>ZooKeeper</primary></indexterm></title>
+<para>A distributed HBase depends on a running ZooKeeper cluster.
+All participating nodes and clients
+need to be able to access the running ZooKeeper ensemble.
+HBase by default manages a ZooKeeper "cluster" for you.
+It will start and stop the ZooKeeper ensemble as part of
+the HBase start/stop process. You can also manage
+the ZooKeeper ensemble independent of HBase and
+just point HBase at the cluster it should use.
+To toggle HBase management of ZooKeeper,
+use the <varname>HBASE_MANAGES_ZK</varname> variable in
+<filename>conf/hbase-env.sh</filename>.
+This variable, which defaults to <varname>true</varname>, tells HBase whether to
+start/stop the ZooKeeper ensemble servers as part of HBase start/stop.</para>
+
+<para>When HBase manages the ZooKeeper ensemble, you can specify ZooKeeper configuration
+using its native <filename>zoo.cfg</filename> file, or, the easier option
+is to just specify ZooKeeper options directly in <filename>conf/hbase-site.xml</filename>.
+A ZooKeeper configuration option can be set as a property in the HBase
+<filename>hbase-site.xml</filename>
+XML configuration file by prefacing the ZooKeeper option name with
+<varname>hbase.zookeeper.property</varname>.
+For example, the <varname>clientPort</varname> setting in ZooKeeper can be changed by
+setting the <varname>hbase.zookeeper.property.clientPort</varname> property.
+
+For all default values used by HBase, including ZooKeeper configuration,
+see the section
+<link linkend="hbase_default_configurations">Default HBase Configurations</link>.
+Look for the <varname>hbase.zookeeper.property</varname> prefix.
+
+<footnote><para>For the full list of ZooKeeper configurations,
+see ZooKeeper's <filename>zoo.cfg</filename>.
+HBase does not ship with a <filename>zoo.cfg</filename> so you will need to
+browse the <filename>conf</filename> directory in an appropriate ZooKeeper download.
+</para>
+</footnote>
+</para>
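The renaming rule above is mechanical: prefix the zoo.cfg option name with hbase.zookeeper.property. A small sketch (the option names shown are standard zoo.cfg keys):

```shell
# Derive the hbase-site.xml property name for a few standard
# zoo.cfg options by applying the prefix described above.
for zk_opt in clientPort dataDir tickTime; do
  echo "hbase.zookeeper.property.${zk_opt}"
done
```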
+
+
+
+<para>You must at least list the ensemble servers in <filename>hbase-site.xml</filename>
+using the <varname>hbase.zookeeper.quorum</varname> property.
+This property defaults to a single ensemble member at
+<varname>localhost</varname> which is not suitable for a
+fully distributed HBase. (It binds to the local machine only and remote clients
+will not be able to connect).
+<note xml:id="how_many_zks">
+<title>How many ZooKeepers should I run?</title>
+<para>
+You can run a ZooKeeper ensemble that comprises 1 node only but
+in production it is recommended that you run a ZooKeeper ensemble of
+3, 5 or 7 machines; the more members an ensemble has, the more
+tolerant the ensemble is of host failures. Also, run an odd number of machines;
+an even number of members adds no failure tolerance over the next-lower odd number. Give each
+ZooKeeper server around 1GB of RAM, and if possible, its own dedicated disk
+(A dedicated disk is the best thing you can do to ensure a performant ZooKeeper
+ensemble). For very heavily loaded clusters, run ZooKeeper servers on separate machines from
+RegionServers (DataNodes and TaskTrackers).</para>
+</note>
+</para>
+
+
+<para>For example, to have HBase manage a ZooKeeper quorum on nodes
+<emphasis>rs{1,2,3,4,5}.example.com</emphasis>, bound to port 2222 (the default is 2181)
+ensure <varname>HBASE_MANAGES_ZK</varname> is commented out or set to
+<varname>true</varname> in <filename>conf/hbase-env.sh</filename> and
+then edit <filename>conf/hbase-site.xml</filename> and set
+<varname>hbase.zookeeper.property.clientPort</varname>
+and
+<varname>hbase.zookeeper.quorum</varname>. You should also
+set
+<varname>hbase.zookeeper.property.dataDir</varname>
+to other than the default as the default has ZooKeeper persist data under
+<filename>/tmp</filename> which is often cleared on system restart.
+In the example below we have ZooKeeper persist to <filename>/usr/local/zookeeper</filename>.
+<programlisting><![CDATA[
+ <configuration>
+ ...
+ <property>
+ <name>hbase.zookeeper.property.clientPort</name>
+ <value>2222</value>
+ <description>Property from ZooKeeper's config zoo.cfg.
+ The port at which the clients will connect.
+ </description>
+ </property>
+ <property>
+ <name>hbase.zookeeper.quorum</name>
+ <value>rs1.example.com,rs2.example.com,rs3.example.com,rs4.example.com,rs5.example.com</value>
+ <description>Comma separated list of servers in the ZooKeeper Quorum.
+ For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
+ By default this is set to localhost for local and pseudo-distributed modes
+ of operation. For a fully-distributed setup, this should be set to a full
+ list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
+ this is the list of servers which we will start/stop ZooKeeper on.
+ </description>
+ </property>
+ <property>
+ <name>hbase.zookeeper.property.dataDir</name>
+ <value>/usr/local/zookeeper</value>
+ <description>Property from ZooKeeper's config zoo.cfg.
+ The directory where the snapshot is stored.
+ </description>
+ </property>
+ ...
+ </configuration>
+]]></programlisting>
+</para>
+
+<section><title>Using existing ZooKeeper ensemble</title>
+<para>To point HBase at an existing ZooKeeper cluster,
+one that is not managed by HBase,
+set <varname>HBASE_MANAGES_ZK</varname> in
+<filename>conf/hbase-env.sh</filename> to false
+<programlisting>
+ ...
+ # Tell HBase whether it should manage its own instance of ZooKeeper or not.
+ export HBASE_MANAGES_ZK=false</programlisting>
+
+Next set ensemble locations and client port, if non-standard,
+in <filename>hbase-site.xml</filename>,
+or add a suitably configured <filename>zoo.cfg</filename> to HBase's <filename>CLASSPATH</filename>.
+HBase will prefer the configuration found in <filename>zoo.cfg</filename>
+over any settings in <filename>hbase-site.xml</filename>.
+</para>
+
+<para>When HBase manages ZooKeeper, it will start/stop the ZooKeeper servers as a part
+of the regular start/stop scripts. If you would like to run ZooKeeper yourself,
+independent of HBase start/stop, you would do the following</para>
+<programlisting>
+${HBASE_HOME}/bin/hbase-daemons.sh {start,stop} zookeeper
+</programlisting>
+
+<para>Note that you can use HBase in this manner to spin up a ZooKeeper cluster,
+unrelated to HBase. Just make sure to set <varname>HBASE_MANAGES_ZK</varname> to
+<varname>false</varname> if you want it to stay up across HBase restarts
+so that when HBase shuts down, it doesn't take ZooKeeper down with it.</para>
+
+<para>For more information about running a distinct ZooKeeper cluster, see
+the ZooKeeper <link xlink:href="http://hadoop.apache.org/zookeeper/docs/current/zookeeperStarted.html">Getting Started Guide</link>.
+</para>
+</section>
+</section>
+
+<section xml:id="hdfs_client_conf">
+<title>HDFS Client Configuration</title>
+<para>Of note, if you have made <emphasis>HDFS client configuration</emphasis> on your Hadoop cluster
+-- i.e. configuration you want HDFS clients to use as opposed to server-side configurations --
+HBase will not see this configuration unless you do one of the following:</para>
+<itemizedlist>
+ <listitem><para>Add a pointer to your <varname>HADOOP_CONF_DIR</varname>
+ to the <varname>HBASE_CLASSPATH</varname> environment variable
+ in <filename>hbase-env.sh</filename>.</para></listitem>
+  <listitem><para>Add a copy of <filename>hdfs-site.xml</filename>
+  (or <filename>hadoop-site.xml</filename>) -- or, better, symlinks --
+  under
+  <filename>${HBASE_HOME}/conf</filename>, or</para></listitem>
+  <listitem><para>if only a small set of HDFS client
+  configurations is needed, add them to <filename>hbase-site.xml</filename>.</para></listitem>
+</itemizedlist>
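For the first option, the <filename>hbase-env.sh</filename> edit might look like the following sketch (the path <filename>/etc/hadoop/conf</filename> is an assumption; substitute your actual <varname>HADOOP_CONF_DIR</varname>):

```sh
# conf/hbase-env.sh -- make Hadoop client-side configuration visible to HBase
export HBASE_CLASSPATH=${HBASE_CLASSPATH}:/etc/hadoop/conf
```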
+
+<para>An example of such an HDFS client configuration is <varname>dfs.replication</varname>. If, for example,
+you want to run with a replication factor of 5, HBase will create files with the default replication of 3 unless
+you do one of the above to make the configuration available to HBase.</para>
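Taking the third option, the replication-factor case would be a one-property addition to <filename>hbase-site.xml</filename> (a sketch; 5 is just the example value from the text):

```xml
<property>
  <name>dfs.replication</name>
  <value>5</value>
</property>
```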
+</section>
+ </section>
+ </section>
+
+<section xml:id="confirm"><title>Running and Confirming Your Installation</title>
+<para>Make sure HDFS is running first.
+Start and stop the Hadoop HDFS daemons by running <filename>bin/start-dfs.sh</filename>
+over in the <varname>HADOOP_HOME</varname> directory.
+You can ensure it started properly by testing the <command>put</command> and
+<command>get</command> of files into the Hadoop filesystem.
+HBase does not normally use the MapReduce daemons; these do not need to be started.</para>
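A put/get smoke test along those lines might look as follows (a sketch; it assumes a running HDFS, and the file names are illustrative):

```
$ ${HADOOP_HOME}/bin/hadoop fs -put conf/hdfs-site.xml /tmp/smoke-test.xml
$ ${HADOOP_HOME}/bin/hadoop fs -cat /tmp/smoke-test.xml
$ ${HADOOP_HOME}/bin/hadoop fs -rm /tmp/smoke-test.xml
```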
+
+<para><emphasis>If</emphasis> you are managing your own ZooKeeper, start it
+and confirm it is running; otherwise, HBase will start up ZooKeeper for you as part
+of its start process.</para>
+
+<para>Start HBase with the following command:</para>
+<programlisting>bin/start-hbase.sh</programlisting>
+Run the above from the <varname>HBASE_HOME</varname> directory.
+
+<para>You should now have a running HBase instance.
+HBase logs can be found in the <filename>logs</filename> subdirectory. Check them
+out especially if HBase had trouble starting.</para>
+
+<para>HBase also puts up a UI listing vital attributes. By default it is deployed on the Master host
+at port 60010 (HBase RegionServers listen on port 60020 by default and put up an informational
+http server at 60030). If the Master were running on a host named <varname>master.example.org</varname>
+on the default port, to see the Master's homepage you would point your browser at
+<filename>http://master.example.org:60010</filename>.</para>
+
+<para>Once HBase has started, see the
+<link linkend="shell_exercises">Shell Exercises</link> section for how to
+create tables, add data, scan your insertions, and finally disable and
+drop your tables.
+</para>
+
+<para>To stop HBase, after exiting the HBase shell, enter
+<programlisting>$ ./bin/stop-hbase.sh
+stopping hbase...............</programlisting>
+Shutdown can take a moment to complete. It can take longer if your cluster
+comprises many machines. If you are running a distributed operation,
+be sure to wait until HBase has shut down completely
+before stopping the Hadoop daemons.</para>
+
+
+
+</section>
+</section>
+
+
+
+
+
+
+ <section xml:id="example_config"><title>Example Configurations</title>
+ <section><title>Basic Distributed HBase Install</title>
+ <para>Here is an example basic configuration for a distributed ten node cluster.
+ The nodes are named <varname>example0</varname>, <varname>example1</varname>, etc., through
+node <varname>example9</varname> in this example. The HBase Master and the HDFS namenode
+are running on the node <varname>example0</varname>. RegionServers run on nodes
+<varname>example1</varname>-<varname>example9</varname>.
+A 3-node ZooKeeper ensemble runs on <varname>example1</varname>,
+<varname>example2</varname>, and <varname>example3</varname> on the
+default ports. ZooKeeper data is persisted to the directory
+<filename>/export/zookeeper</filename>.
+Below we show what the main configuration files
+-- <filename>hbase-site.xml</filename>, <filename>regionservers</filename>, and
+<filename>hbase-env.sh</filename> -- found in the HBase
+<filename>conf</filename> directory might look like.
+</para>
+ <section xml:id="hbase_site"><title><filename>hbase-site.xml</filename></title>
+ <programlisting>
+<![CDATA[
+<?xml version="1.0"?>
+<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
+<configuration>
+ <property>
+ <name>hbase.zookeeper.quorum</name>
+ <value>example1,example2,example3</value>
+ <description>The directory shared by region servers.
+ </description>
+ </property>
+ <property>
+ <name>hbase.zookeeper.property.dataDir</name>
+ <value>/export/zookeeper</value>
+ <description>Property from ZooKeeper's config zoo.cfg.
+ The directory where the snapshot is stored.
+ </description>
+ </property>
+ <property>
+ <name>hbase.rootdir</name>
+ <value>hdfs://example0:9000/hbase</value>
+ <description>The directory shared by region servers.
+ </description>
+ </property>
+ <property>
+ <name>hbase.cluster.distributed</name>
+ <value>true</value>
+ <description>The mode the cluster will be in. Possible values are
+ false: standalone and pseudo-distributed setups with managed Zookeeper
+ true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
+ </description>
+ </property>
+</configuration>
+]]>
+ </programlisting>
+ </section>
+
+ <section xml:id="regionservers"><title><filename>regionservers</filename></title>
+  <para>In this file you list the nodes that will run regionservers. In
+  our case we run regionservers on all nodes but the head node
+  <varname>example0</varname>, which is
+  carrying the HBase Master and the HDFS namenode.</para>
+  <programlisting>
+    example1
+    example2
+    example3
+    example4
+    example5
+    example6
+    example7
+    example8
+    example9
+  </programlisting>
+ </section>
+
+ <section xml:id="hbase_env"><title><filename>hbase-env.sh</filename></title>
+ <para>Below we use a <command>diff</command> to show the differences from
+ default in the <filename>hbase-env.sh</filename> file. Here we are setting
+the HBase heap to be 4G instead of the default 1G.
+ </para>
+ <programlisting>
+ <![CDATA[
+$ git diff hbase-env.sh
+diff --git a/conf/hbase-env.sh b/conf/hbase-env.sh
+index e70ebc6..96f8c27 100644
+--- a/conf/hbase-env.sh
++++ b/conf/hbase-env.sh
+@@ -31,7 +31,7 @@ export JAVA_HOME=/usr/lib//jvm/java-6-sun/
+ # export HBASE_CLASSPATH=
+
+ # The maximum amount of heap to use, in MB. Default is 1000.
+-# export HBASE_HEAPSIZE=1000
++export HBASE_HEAPSIZE=4096
+
+ # Extra Java runtime options.
+ # Below are what we set by default. May only work with SUN JVM.
+]]>
+ </programlisting>
+
+ <para>Use <command>rsync</command> to copy the content of
+ the <filename>conf</filename> directory to
+ all nodes of the cluster.
+ </para>
+ </section>
+
+ </section>
+
+ </section>
+ </section>
+
+ </chapter>
Added: hbase/trunk/src/docbkx/performance.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/performance.xml?rev=1081966&view=auto
==============================================================================
--- hbase/trunk/src/docbkx/performance.xml (added)
+++ hbase/trunk/src/docbkx/performance.xml Tue Mar 15 22:23:12 2011
@@ -0,0 +1,39 @@
+<?xml version="1.0"?>
+<chapter xml:id="performance"
+ version="5.0" xmlns="http://docbook.org/ns/docbook"
+ xmlns:xlink="http://www.w3.org/1999/xlink"
+ xmlns:xi="http://www.w3.org/2001/XInclude"
+ xmlns:svg="http://www.w3.org/2000/svg"
+ xmlns:m="http://www.w3.org/1998/Math/MathML"
+ xmlns:html="http://www.w3.org/1999/xhtml"
+ xmlns:db="http://docbook.org/ns/docbook">
+
+ <title>Performance Tuning</title>
+ <para>Start with the <link xlink:href="http://wiki.apache.org/hadoop/PerformanceTuning">wiki Performance Tuning</link> page.
+  It has a general discussion of the main factors involved: RAM, compression, JVM settings, etc.
+ Afterward, come back here for more pointers.
+ </para>
+ <section xml:id="jvm">
+ <title>Java</title>
+ <section xml:id="gc">
+      <title>The Garbage Collector and HBase</title>
+ <section xml:id="gcpause">
+ <title>Long GC pauses</title>
+      <para>
+      In his presentation,
+      <link xlink:href="http://www.slideshare.net/cloudera/hbase-hug-presentation">Avoiding Full GCs with MemStore-Local Allocation Buffers</link>,
+      Todd Lipcon describes two cases of stop-the-world garbage collections common in HBase, especially during loading:
+      CMS failure modes and old generation heap fragmentation brought on by the CMS collector. To address the first,
+      start the CMS earlier than default by adding <code>-XX:CMSInitiatingOccupancyFraction</code>
+      and setting it down from the default. Start at 60 or 70 percent (the lower you bring
+      the threshold, the more GCing is done and the more CPU is used). To address the second,
+      the fragmentation issue, Todd added an experimental facility that must be
+      explicitly enabled in HBase 0.90.x (it is on by default in HBase 0.92.x). Set
+      <code>hbase.hregion.memstore.mslab.enabled</code> to true in your
+      <classname>Configuration</classname>. See the cited slides for background and
+      detail.
+      </para>
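As a sketch, the MSLAB setting described above would be enabled in <filename>hbase-site.xml</filename> like so:

```xml
<property>
  <name>hbase.hregion.memstore.mslab.enabled</name>
  <value>true</value>
</property>
```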
+ </section>
+ </section>
+ </section>
+ </chapter>
Added: hbase/trunk/src/docbkx/preface.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/preface.xml?rev=1081966&view=auto
==============================================================================
--- hbase/trunk/src/docbkx/preface.xml (added)
+++ hbase/trunk/src/docbkx/preface.xml Tue Mar 15 22:23:12 2011
@@ -0,0 +1,27 @@
+<?xml version="1.0"?>
+ <preface xml:id="preface"
+ version="5.0" xmlns="http://docbook.org/ns/docbook"
+ xmlns:xlink="http://www.w3.org/1999/xlink"
+ xmlns:xi="http://www.w3.org/2001/XInclude"
+ xmlns:svg="http://www.w3.org/2000/svg"
+ xmlns:m="http://www.w3.org/1998/Math/MathML"
+ xmlns:html="http://www.w3.org/1999/xhtml"
+ xmlns:db="http://docbook.org/ns/docbook">
+ <title>Preface</title>
+
+ <para>This book aims to be the official guide for the <link
+ xlink:href="http://hbase.apache.org/">HBase</link> version it ships with.
+ This document describes HBase version <emphasis><?eval ${project.version}?></emphasis>.
+ Herein you will find either the definitive documentation on an HBase topic
+ as of its standing when the referenced HBase version shipped, or
+ this book will point to the location in <link
+ xlink:href="http://hbase.apache.org/docs/current/api/index.html">javadoc</link>,
+ <link xlink:href="https://issues.apache.org/jira/browse/HBASE">JIRA</link>
+ or <link xlink:href="http://wiki.apache.org/hadoop/Hbase">wiki</link>
+ where the pertinent information can be found.</para>
+
+  <para>This book is a work in progress. It is lacking in many areas but we
+  hope to fill in the holes with time. Feel free to add to this book
+  by attaching a patch to an issue in the HBase <link
+  xlink:href="https://issues.apache.org/jira/browse/HBASE">JIRA</link>.</para>
+ </preface>
Added: hbase/trunk/src/docbkx/shell.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/shell.xml?rev=1081966&view=auto
==============================================================================
--- hbase/trunk/src/docbkx/shell.xml (added)
+++ hbase/trunk/src/docbkx/shell.xml Tue Mar 15 22:23:12 2011
@@ -0,0 +1,89 @@
+<?xml version="1.0"?>
+ <chapter xml:id="shell"
+ version="5.0" xmlns="http://docbook.org/ns/docbook"
+ xmlns:xlink="http://www.w3.org/1999/xlink"
+ xmlns:xi="http://www.w3.org/2001/XInclude"
+ xmlns:svg="http://www.w3.org/2000/svg"
+ xmlns:m="http://www.w3.org/1998/Math/MathML"
+ xmlns:html="http://www.w3.org/1999/xhtml"
+ xmlns:db="http://docbook.org/ns/docbook">
+ <title>The HBase Shell</title>
+
+ <para>
+ The HBase Shell is <link xlink:href="http://jruby.org">(J)Ruby</link>'s
+ IRB with some HBase particular verbs added. Anything you can do in
+ IRB, you should be able to do in the HBase Shell.</para>
+ <para>To run the HBase shell,
+ do as follows:
+ <programlisting>$ ./bin/hbase shell</programlisting>
+ </para>
+  <para>Type <command>help</command> and then <command>&lt;RETURN&gt;</command>
+  to see a listing of shell
+  commands and options. Browse at least the paragraphs at the end of
+  the help output for the gist of how variables and command
+  arguments are entered into the
+  HBase shell; in particular note how table names, rows, and
+  columns, etc., must be quoted.</para>
+ <para>See <link linkend="shell_exercises">Shell Exercises</link>
+ for example basic shell operation.</para>
+
+ <section xml:id="scripting"><title>Scripting</title>
+    <para>For examples of scripting HBase, look in the
+    HBase <filename>bin</filename> directory. Look at the files
+ that end in <filename>*.rb</filename>. To run one of these
+ files, do as follows:
+ <programlisting>$ ./bin/hbase org.jruby.Main PATH_TO_SCRIPT</programlisting>
+ </para>
+ </section>
+
+ <section xml:id="shell_tricks"><title>Shell Tricks</title>
+ <section><title><filename>irbrc</filename></title>
+ <para>Create an <filename>.irbrc</filename> file for yourself in your
+ home directory. Add customizations. A useful one is
+    command history so commands are saved across Shell invocations:
+ <programlisting>
+ $ more .irbrc
+ require 'irb/ext/save-history'
+ IRB.conf[:SAVE_HISTORY] = 100
+ IRB.conf[:HISTORY_FILE] = "#{ENV['HOME']}/.irb-save-history"</programlisting>
+ See the <application>ruby</application> documentation of
+ <filename>.irbrc</filename> to learn about other possible
+    configurations.
+ </para>
+ </section>
+ <section><title>LOG data to timestamp</title>
+ <para>
+     To convert the date '08/08/16 20:56:29' from an HBase log into a timestamp, do:
+ <programlisting>
+ hbase(main):021:0> import java.text.SimpleDateFormat
+ hbase(main):022:0> import java.text.ParsePosition
+ hbase(main):023:0> SimpleDateFormat.new("yy/MM/dd HH:mm:ss").parse("08/08/16 20:56:29", ParsePosition.new(0)).getTime() => 1218920189000</programlisting>
+ </para>
+ <para>
+ To go the other direction:
+ <programlisting>
+ hbase(main):021:0> import java.util.Date
+ hbase(main):022:0> Date.new(1218920189000).toString() => "Sat Aug 16 20:56:29 UTC 2008"</programlisting>
+ </para>
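For comparison, the same round trip can be sketched in plain Ruby, without the Java classes (a sketch; `DateTime.strptime` assumes UTC when the input carries no zone, which matches the example above):

```ruby
require 'date'

# Parse an HBase log date ('yy/MM/dd HH:mm:ss') into epoch milliseconds.
# '%y/%m/%d %H:%M:%S' mirrors the log's 'yy/MM/dd HH:mm:ss' pattern.
millis = DateTime.strptime('08/08/16 20:56:29', '%y/%m/%d %H:%M:%S')
                 .to_time.to_i * 1000
puts millis                      # 1218920189000

# And back the other direction, from milliseconds to a printable date.
puts Time.at(millis / 1000).utc  # 2008-08-16 20:56:29 UTC
```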
+     <para>
+     Outputting in a format that exactly matches the HBase log format takes a little messing with
+     <link xlink:href="http://download.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html">SimpleDateFormat</link>.
+     </para>
+ </section>
+ <section><title>Debug</title>
+ <section><title>Shell debug switch</title>
+ <para>You can set a debug switch in the shell to see more output
+ -- e.g. more of the stack trace on exception --
+ when you run a command:
+        <programlisting>hbase> debug &lt;RETURN&gt;</programlisting>
+ </para>
+ </section>
+ <section><title>DEBUG log level</title>
+ <para>To enable DEBUG level logging in the shell,
+ launch it with the <command>-d</command> option.
+ <programlisting>$ ./bin/hbase shell -d</programlisting>
+ </para>
+ </section>
+ </section>
+ </section>
+ </chapter>
Added: hbase/trunk/src/docbkx/upgrading.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/upgrading.xml?rev=1081966&view=auto
==============================================================================
--- hbase/trunk/src/docbkx/upgrading.xml (added)
+++ hbase/trunk/src/docbkx/upgrading.xml Tue Mar 15 22:23:12 2011
@@ -0,0 +1,55 @@
+<?xml version="1.0"?>
+ <chapter xml:id="upgrading"
+ version="5.0" xmlns="http://docbook.org/ns/docbook"
+ xmlns:xlink="http://www.w3.org/1999/xlink"
+ xmlns:xi="http://www.w3.org/2001/XInclude"
+ xmlns:svg="http://www.w3.org/2000/svg"
+ xmlns:m="http://www.w3.org/1998/Math/MathML"
+ xmlns:html="http://www.w3.org/1999/xhtml"
+ xmlns:db="http://docbook.org/ns/docbook">
+ <title>Upgrading</title>
+ <para>
+ Review the <link linkend="requirements">requirements</link>
+ section above, in particular the section on Hadoop version.
+ </para>
+ <section xml:id="upgrade0.90">
+ <title>Upgrading to HBase 0.90.x from 0.20.x or 0.89.x</title>
+    <para>This version of 0.90.x HBase can be started on data written by
+    HBase 0.20.x or HBase 0.89.x. There is no need for a migration step.
+    HBase 0.89.x and 0.90.x write out the names of region directories
+    differently -- they name them with an MD5 hash of the region name rather
+    than a Jenkins hash -- which means that once started, there is no
+    going back to HBase 0.20.x.
+    </para>
+ <para>
+ Be sure to remove the <filename>hbase-default.xml</filename> from
+ your <filename>conf</filename>
+ directory on upgrade. A 0.20.x version of this file will have
+ sub-optimal configurations for 0.90.x HBase. The
+ <filename>hbase-default.xml</filename> file is now bundled into the
+ HBase jar and read from there. If you would like to review
+ the content of this file, see it in the src tree at
+ <filename>src/main/resources/hbase-default.xml</filename> or
+ see <link linkend="hbase_default_configurations">Default HBase Configurations</link>.
+ </para>
+ <para>
+ Finally, if upgrading from 0.20.x, check your
+ <varname>.META.</varname> schema in the shell. In the past we would
+ recommend that users run with a 16kb
+ <varname>MEMSTORE_FLUSHSIZE</varname>.
+ Run <code>hbase> scan '-ROOT-'</code> in the shell. This will output
+    the current <varname>.META.</varname> schema. Check the
+    <varname>MEMSTORE_FLUSHSIZE</varname>. Is it 16kb (16384)? If so, you will
+    need to change this (the 'normal'/default value is 64MB (67108864)).
+ Run the script <filename>bin/set_meta_memstore_size.rb</filename>.
+ This will make the necessary edit to your <varname>.META.</varname> schema.
+    Failure to run this change will make for a slow cluster<footnote>
+    <para>
+    See <link xlink:href="https://issues.apache.org/jira/browse/HBASE-3499">HBASE-3499 Users upgrading to 0.90.0 need to have their .META. table updated with the right MEMSTORE_SIZE</link>.
+    </para>
+    </footnote>.
+    </para>
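Following the script-running convention from the Shell chapter, the invocation would look something like this (a sketch; it assumes you run it from the HBase install directory against a running cluster):

```
$ ./bin/hbase org.jruby.Main bin/set_meta_memstore_size.rb
```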
+ </section>
+ </chapter>