Posted to commits@hbase.apache.org by en...@apache.org on 2014/12/03 06:53:30 UTC
[7/9] hbase git commit: Blanket update of src/main/docbkx from master
http://git-wip-us.apache.org/repos/asf/hbase/blob/48d9d27d/src/main/docbkx/book.xml
----------------------------------------------------------------------
diff --git a/src/main/docbkx/book.xml b/src/main/docbkx/book.xml
index d275984..f835dc7 100644
--- a/src/main/docbkx/book.xml
+++ b/src/main/docbkx/book.xml
@@ -39,10 +39,18 @@
<imageobject>
<imagedata
align="center"
- valign="middle"
+ valign="left"
fileref="hbase_logo.png" />
</imageobject>
</inlinemediaobject>
+ <inlinemediaobject>
+ <imageobject>
+ <imagedata
+ align="center"
+ valign="right"
+ fileref="jumping-orca_rotated_25percent.png" />
+ </imageobject>
+ </inlinemediaobject>
</link>
</subtitle>
<copyright>
@@ -438,25 +446,25 @@
<para> A namespace can be created, removed or altered. Namespace membership is determined
during table creation by specifying a fully-qualified table name of the form:</para>
- <programlisting><![CDATA[<table namespace>:<table qualifier>]]></programlisting>
+ <programlisting language="xml"><![CDATA[<table namespace>:<table qualifier>]]></programlisting>
<example>
<title>Examples</title>
- <programlisting>
+ <programlisting language="bourne">
#Create a namespace
create_namespace 'my_ns'
</programlisting>
- <programlisting>
+ <programlisting language="bourne">
#create my_table in my_ns namespace
create 'my_ns:my_table', 'fam'
</programlisting>
- <programlisting>
+ <programlisting language="bourne">
#drop namespace
drop_namespace 'my_ns'
</programlisting>
- <programlisting>
+ <programlisting language="bourne">
#alter namespace
alter_namespace 'my_ns', {METHOD => 'set', 'PROPERTY_NAME' => 'PROPERTY_VALUE'}
</programlisting>
@@ -478,7 +486,7 @@ alter_namespace 'my_ns', {METHOD => 'set', 'PROPERTY_NAME' => 'PROPERTY_VALUE'}
<example>
<title>Examples</title>
- <programlisting>
+ <programlisting language="bourne">
#namespace=foo and table qualifier=bar
create 'foo:bar', 'fam'
@@ -534,16 +542,17 @@ create 'bar', 'fam'
<title>Data Model Operations</title>
<para>The four primary data model operations are Get, Put, Scan, and Delete. Operations are
applied via <link
- xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html">HTable</link>
- instances. </para>
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html">Table</link>
+ instances.
+ </para>
<section
xml:id="get">
<title>Get</title>
<para><link
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html">Get</link>
returns attributes for a specified row. Gets are executed via <link
- xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#get%28org.apache.hadoop.hbase.client.Get%29">
- HTable.get</link>. </para>
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#get(org.apache.hadoop.hbase.client.Get)">
+ Table.get</link>. </para>
</section>
<section
xml:id="put">
@@ -552,10 +561,10 @@ create 'bar', 'fam'
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html">Put</link>
either adds new rows to a table (if the key is new) or can update existing rows (if the
key already exists). Puts are executed via <link
- xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#put%28org.apache.hadoop.hbase.client.Put%29">
- HTable.put</link> (writeBuffer) or <link
- xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#batch%28java.util.List%29">
- HTable.batch</link> (non-writeBuffer). </para>
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#put(org.apache.hadoop.hbase.client.Put)">
+ Table.put</link> (writeBuffer) or <link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#batch(java.util.List, java.lang.Object[])">
+ Table.batch</link> (non-writeBuffer). </para>
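+ <para>A minimal sketch of both paths described above (a hedged example; <code>table</code> is
+ assumed to be an open Table instance and CF/ATTR are byte[] constants as in the later
+ examples):</para>
+ <programlisting language="java">
+Put put = new Put(Bytes.toBytes("row1"));
+put.add(CF, ATTR, Bytes.toBytes("value"));
+table.put(put);                           // single Put (the writeBuffer path described above)
+
+List<Row> batch = new ArrayList<Row>();   // Put implements Row
+batch.add(put);
+Object[] results = new Object[batch.size()];
+table.batch(batch, results);              // non-writeBuffer path
+</programlisting>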
</section>
<section
xml:id="scan">
@@ -563,27 +572,26 @@ create 'bar', 'fam'
<para><link
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html">Scan</link>
allow iteration over multiple rows for specified attributes. </para>
- <para>The following is an example of a on an HTable table instance. Assume that a table is
+ <para>The following is an example of a Scan on a Table instance. Assume that a table is
populated with rows with keys "row1", "row2", "row3", and then another set of rows with
- the keys "abc1", "abc2", and "abc3". The following example shows how startRow and stopRow
- can be applied to a Scan instance to return the rows beginning with "row".</para>
- <programlisting>
+ the keys "abc1", "abc2", and "abc3". The following example shows how to set a Scan
+ instance to return the rows beginning with "row".</para>
+<programlisting language="java">
public static final byte[] CF = "cf".getBytes();
public static final byte[] ATTR = "attr".getBytes();
...
-HTable htable = ... // instantiate HTable
+Table table = ... // instantiate a Table instance
Scan scan = new Scan();
scan.addColumn(CF, ATTR);
scan.setRowPrefixFilter(Bytes.toBytes("row"));
-ResultScanner rs = htable.getScanner(scan);
+ResultScanner rs = table.getScanner(scan);
try {
  for (Result r = rs.next(); r != null; r = rs.next()) {
    // process result...
  }
} finally {
  rs.close();  // always close the ResultScanner!
}
</programlisting>
<para>Note that generally the easiest way to specify a specific stop point for a scan is by
using the <link
@@ -596,7 +604,7 @@ try {
<para><link
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Delete.html">Delete</link>
removes a row from a table. Deletes are executed via <link
- xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#delete%28org.apache.hadoop.hbase.client.Delete%29">
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#delete(org.apache.hadoop.hbase.client.Delete)">
Table.delete</link>. </para>
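+ <para>A minimal sketch of a Delete (a hedged example in the style of the other snippets here;
+ <code>table</code> is an already-instantiated Table and the row key is illustrative):</para>
+ <programlisting language="java">
+Table table = ...  // instantiate a Table instance
+Delete delete = new Delete(Bytes.toBytes("row1"));
+table.delete(delete);   // removes all cells of the row
+</programlisting>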
<para>HBase does not modify data in place, and so deletes are handled by creating new
markers called <emphasis>tombstones</emphasis>. These tombstones, along with the dead
@@ -724,12 +732,12 @@ try {
xml:id="default_get_example">
<title>Default Get Example</title>
<para>The following Get will only retrieve the current version of the row</para>
- <programlisting>
+ <programlisting language="java">
public static final byte[] CF = "cf".getBytes();
public static final byte[] ATTR = "attr".getBytes();
...
Get get = new Get(Bytes.toBytes("row1"));
-Result r = htable.get(get);
+Result r = table.get(get);
byte[] b = r.getValue(CF, ATTR); // returns current version of value
</programlisting>
</section>
@@ -737,13 +745,13 @@ byte[] b = r.getValue(CF, ATTR); // returns current version of value
xml:id="versioned_get_example">
<title>Versioned Get Example</title>
<para>The following Get will return the last 3 versions of the row.</para>
- <programlisting>
+ <programlisting language="java">
public static final byte[] CF = "cf".getBytes();
public static final byte[] ATTR = "attr".getBytes();
...
Get get = new Get(Bytes.toBytes("row1"));
get.setMaxVersions(3); // will return last 3 versions of row
-Result r = htable.get(get);
+Result r = table.get(get);
byte[] b = r.getValue(CF, ATTR); // returns current version of value
List<KeyValue> kv = r.getColumn(CF, ATTR); // returns all versions of this column
</programlisting>
@@ -765,27 +773,27 @@ List<KeyValue> kv = r.getColumn(CF, ATTR); // returns all versions of thi
<title>Implicit Version Example</title>
<para>The following Put will be implicitly versioned by HBase with the current
time.</para>
- <programlisting>
+ <programlisting language="java">
public static final byte[] CF = "cf".getBytes();
public static final byte[] ATTR = "attr".getBytes();
...
Put put = new Put(Bytes.toBytes(row));
put.add(CF, ATTR, Bytes.toBytes( data));
-htable.put(put);
+table.put(put);
</programlisting>
</section>
<section
xml:id="explicit_version_example">
<title>Explicit Version Example</title>
<para>The following Put has the version timestamp explicitly set.</para>
- <programlisting>
+ <programlisting language="java">
public static final byte[] CF = "cf".getBytes();
public static final byte[] ATTR = "attr".getBytes();
...
Put put = new Put( Bytes.toBytes(row));
long explicitTimeInMs = 555; // just an example
put.add(CF, ATTR, explicitTimeInMs, Bytes.toBytes(data));
-htable.put(put);
+table.put(put);
</programlisting>
<para>Caution: the version timestamp is used internally by HBase for things like time-to-live
calculations. It's usually best to avoid setting this timestamp yourself. Prefer using
@@ -981,7 +989,7 @@ htable.put(put);
Be sure to use the correct version of the HBase JAR for your system. The backticks
(<literal>`</literal> symbols) cause the shell to execute the sub-commands, setting the
CLASSPATH as part of the command. This example assumes you use a BASH-compatible shell. </para>
- <screen>$ <userinput>HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-server-VERSION.jar rowcounter usertable</userinput></screen>
+ <screen language="bourne">$ <userinput>HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-server-VERSION.jar rowcounter usertable</userinput></screen>
<para>When the command runs, internally, the HBase JAR finds the dependencies it needs for
zookeeper, guava, and its other dependencies on the passed <envar>HADOOP_CLASSPATH</envar>
and adds the JARs to the MapReduce job configuration. See the source at
@@ -992,7 +1000,7 @@ htable.put(put);
<screen>java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.RowCounter$RowCounterMapper</screen>
<para>If this occurs, try modifying the command as follows, so that it uses the HBase JARs
from the <filename>target/</filename> directory within the build environment.</para>
- <screen>$ <userinput>HADOOP_CLASSPATH=${HBASE_HOME}/target/hbase-server-VERSION.jar:`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/target/hbase-server-VERSIION.jar rowcounter usertable</userinput></screen>
+ <screen language="bourne">$ <userinput>HADOOP_CLASSPATH=${HBASE_HOME}/hbase-server/target/hbase-server-VERSION-SNAPSHOT.jar:`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-server/target/hbase-server-VERSION-SNAPSHOT.jar rowcounter usertable</userinput></screen>
</note>
<caution>
<title>Notice to Mapreduce users of HBase 0.96.1 and above</title>
@@ -1042,14 +1050,14 @@ Exception in thread "main" java.lang.IllegalAccessError: class
<code>HADOOP_CLASSPATH</code> environment variable at job submission time. When
launching jobs that package their dependencies, all three of the following job launching
commands satisfy this requirement:</para>
- <screen>
+ <screen language="bourne">
$ <userinput>HADOOP_CLASSPATH=/path/to/hbase-protocol.jar:/path/to/hbase/conf hadoop jar MyJob.jar MyJobMainClass</userinput>
$ <userinput>HADOOP_CLASSPATH=$(hbase mapredcp):/path/to/hbase/conf hadoop jar MyJob.jar MyJobMainClass</userinput>
$ <userinput>HADOOP_CLASSPATH=$(hbase classpath) hadoop jar MyJob.jar MyJobMainClass</userinput>
</screen>
<para>For jars that do not package their dependencies, the following command structure is
necessary:</para>
- <screen>
+ <screen language="bourne">
$ <userinput>HADOOP_CLASSPATH=$(hbase mapredcp):/etc/hbase/conf hadoop jar MyApp.jar MyJobMainClass -libjars $(hbase mapredcp | tr ':' ',')</userinput> ...
</screen>
<para>See also <link
@@ -1100,7 +1108,7 @@ $ <userinput>HADOOP_CLASSPATH=$(hbase mapredcp):/etc/hbase/conf hadoop jar MyApp
<para>The HBase JAR also serves as a Driver for some bundled mapreduce jobs. To learn about
the bundled MapReduce jobs, run the following command.</para>
- <screen>$ <userinput>${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-server-VERSION.jar</userinput>
+ <screen language="bourne">$ <userinput>${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-server-VERSION.jar</userinput>
<computeroutput>An example program must be given as the first argument.
Valid program names are:
copytable: Export a table from local cluster to peer cluster
@@ -1112,7 +1120,7 @@ Valid program names are:
</screen>
<para>Each of the valid program names represents a bundled MapReduce job. To run one of the jobs,
model your command after the following example.</para>
- <screen>$ <userinput>${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-server-VERSION.jar rowcounter myTable</userinput></screen>
+ <screen language="bourne">$ <userinput>${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-server-VERSION.jar rowcounter myTable</userinput></screen>
</section>
<section>
@@ -1174,7 +1182,7 @@ Valid program names are:
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/RowCounter.html">RowCounter</link>
MapReduce job uses <code>TableInputFormat</code> and does a count of all rows in the specified
table. To run it, use the following command: </para>
- <screen>$ <userinput>./bin/hadoop jar hbase-X.X.X.jar</userinput></screen>
+ <screen language="bourne">$ <userinput>./bin/hadoop jar hbase-X.X.X.jar</userinput></screen>
<para>This will
invoke the HBase MapReduce Driver class. Select <literal>rowcounter</literal> from the choice of jobs
offered. This will print rowcounter usage advice to standard output. Specify the tablename,
@@ -1213,7 +1221,7 @@ Valid program names are:
<para>The following is an example of using HBase as a MapReduce source in a read-only manner.
Specifically, there is a Mapper instance but no Reducer, and nothing is being emitted from
the Mapper. The job would be defined as follows...</para>
- <programlisting>
+ <programlisting language="java">
Configuration config = HBaseConfiguration.create();
Job job = new Job(config, "ExampleRead");
job.setJarByClass(MyReadJob.class); // class that contains mapper
@@ -1240,7 +1248,7 @@ if (!b) {
</programlisting>
<para>...and the mapper instance would extend <link
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableMapper.html">TableMapper</link>...</para>
- <programlisting>
+ <programlisting language="java">
public static class MyMapper extends TableMapper<Text, Text> {
public void map(ImmutableBytesWritable row, Result value, Context context) throws InterruptedException, IOException {
@@ -1254,7 +1262,7 @@ public static class MyMapper extends TableMapper<Text, Text> {
<title>HBase MapReduce Read/Write Example</title>
<para>The following is an example of using HBase both as a source and as a sink with
MapReduce. This example will simply copy data from one table to another.</para>
- <programlisting>
+ <programlisting language="java">
Configuration config = HBaseConfiguration.create();
Job job = new Job(config,"ExampleReadWrite");
job.setJarByClass(MyReadWriteJob.class); // class that contains mapper
@@ -1293,7 +1301,7 @@ if (!b) {
<para>The following is the example mapper, which will create a <classname>Put</classname>
matching the input <classname>Result</classname> and emit it. Note: this is what the
CopyTable utility does. </para>
- <programlisting>
+ <programlisting language="java">
public static class MyMapper extends TableMapper<ImmutableBytesWritable, Put> {
public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException {
@@ -1327,7 +1335,7 @@ public static class MyMapper extends TableMapper<ImmutableBytesWritable, Put&
<para>The following example uses HBase as a MapReduce source and sink with a summarization
step. This example will count the number of distinct instances of a value in a table and
write those summarized counts in another table.
- <programlisting>
+ <programlisting language="java">
Configuration config = HBaseConfiguration.create();
Job job = new Job(config,"ExampleSummary");
job.setJarByClass(MySummaryJob.class); // class that contains mapper and reducer
@@ -1358,7 +1366,7 @@ if (!b) {
In this example mapper a column with a String-value is chosen as the value to summarize
upon. This value is used as the key to emit from the mapper, and an
<classname>IntWritable</classname> represents an instance counter.
- <programlisting>
+ <programlisting language="java">
public static class MyMapper extends TableMapper<Text, IntWritable> {
public static final byte[] CF = "cf".getBytes();
public static final byte[] ATTR1 = "attr1".getBytes();
@@ -1376,7 +1384,7 @@ public static class MyMapper extends TableMapper<Text, IntWritable> {
</programlisting>
In the reducer, the "ones" are counted (just like any other MR example that does this),
and then a <classname>Put</classname> is emitted.
- <programlisting>
+ <programlisting language="java">
public static class MyTableReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {
public static final byte[] CF = "cf".getBytes();
public static final byte[] COUNT = "count".getBytes();
@@ -1401,7 +1409,7 @@ public static class MyTableReducer extends TableReducer<Text, IntWritable, Im
<para>This is very similar to the summary example above, with the exception that this example
uses HBase as a MapReduce source and HDFS as the sink. The differences are in the job setup and
in the reducer. The mapper remains the same. </para>
- <programlisting>
+ <programlisting language="java">
Configuration config = HBaseConfiguration.create();
Job job = new Job(config,"ExampleSummaryToFile");
job.setJarByClass(MySummaryFileJob.class); // class that contains mapper and reducer
@@ -1430,7 +1438,7 @@ if (!b) {
<para>As stated above, the previous Mapper can run unchanged with this example. As for the
Reducer, it is a "generic" Reducer instead of extending TableReducer and emitting
Puts.</para>
- <programlisting>
+ <programlisting language="java">
public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
@@ -1448,7 +1456,7 @@ if (!b) {
<title>HBase MapReduce Summary to HBase Without Reducer</title>
<para>It is also possible to perform summaries without a reducer - if you use HBase as the
reducer. </para>
- <para>An HBase target table would need to exist for the job summary. The HTable method
+ <para>An HBase target table would need to exist for the job summary. The Table method
<code>incrementColumnValue</code> would be used to atomically increment values. From a
performance perspective, it might make sense to keep a Map of values with their counts to
be incremented for each map-task, and make one update per key during the <code>
@@ -1470,7 +1478,7 @@ if (!b) {
reducers. Neither is right or wrong, it depends on your use-case. Recognize that the more
reducers that are assigned to the job, the more simultaneous connections to the RDBMS will
be created - this will scale, but only to a point. </para>
- <programlisting>
+ <programlisting language="java">
public static class MyRdbmsReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
private Connection c = null;
@@ -1500,12 +1508,14 @@ if (!b) {
<title>Accessing Other HBase Tables in a MapReduce Job</title>
<para>Although the framework currently allows one HBase table as input to a MapReduce job,
other HBase tables can be accessed as lookup tables, etc., in a MapReduce job via creating
- an HTable instance in the setup method of the Mapper.
- <programlisting>public class MyMapper extends TableMapper<Text, LongWritable> {
- private HTable myOtherTable;
+ a Table instance in the setup method of the Mapper.
+ <programlisting language="java">public class MyMapper extends TableMapper<Text, LongWritable> {
+ private Table myOtherTable;
public void setup(Context context) {
- myOtherTable = new HTable("myOtherTable");
+ // In here create a Connection to the cluster and save it or use the Connection
+ // from the existing table
+ myOtherTable = connection.getTable("myOtherTable");
}
public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException {
@@ -1693,9 +1703,7 @@ if (!b) {
<section
xml:id="client">
<title>Client</title>
- <para>The HBase client <link
- xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html">HTable</link>
- is responsible for finding RegionServers that are serving the particular row range of
+ <para>The HBase client finds the RegionServers that are serving the particular row range of
interest. It does this by querying the <code>hbase:meta</code> table. See <xref
linkend="arch.catalog.meta" /> for details. After locating the required region(s), the
client contacts the RegionServer serving that region, rather than going through the master,
@@ -1703,29 +1711,41 @@ if (!b) {
subsequent requests need not go through the lookup process. Should a region be reassigned
either by the master load balancer or because a RegionServer has died, the client will
requery the catalog tables to determine the new location of the user region. </para>
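+ <para>The region lookup described above can also be done explicitly from client code. The
+ following is a hedged sketch (HBase 1.0 API; the <code>connection</code> instance and the
+ table and row names are illustrative) asking a RegionLocator which RegionServer hosts a
+ given row:</para>
+ <programlisting language="java">
+try (RegionLocator locator = connection.getRegionLocator(TableName.valueOf("myTable"))) {
+  HRegionLocation location = locator.getRegionLocation(Bytes.toBytes("row1"));
+  // host:port of the RegionServer the client would contact directly for this row
+  System.out.println(location.getHostnamePort());
+}
+</programlisting>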
+
<para>See <xref
linkend="master.runtime" /> for more information about the impact of the Master on HBase
Client communication. </para>
- <para>Administrative functions are handled through <link
- xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html">HBaseAdmin</link>
+ <para>Administrative functions are done via an instance of <link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Admin.html">Admin</link>
</para>
+
<section
xml:id="client.connections">
- <title>Connections</title>
- <para>For connection configuration information, see <xref
- linkend="client_dependencies" />. </para>
+ <title>Cluster Connections</title>
+ <para>The API changed in HBase 1.0. It has been cleaned up and users are returned
+ Interfaces to work against rather than particular types. In HBase 1.0,
+ obtain a cluster Connection from ConnectionFactory and thereafter, get from it
+ instances of Table, Admin, and RegionLocator on an as-needed basis. When done, close
+ the obtained instances. Finally, be sure to clean up your Connection instance before
+ exiting. Connections are heavyweight objects. Create once and keep an instance around.
+ Table, Admin and RegionLocator instances are lightweight. Create as you go and then
+ let go as soon as you are done by closing them. See the
+ <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/package-summary.html">Client Package Javadoc Description</link> for example usage of the new HBase 1.0 API.</para>
+
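+ <para>A minimal sketch of the lifecycle just described (a hedged example; the table name
+ "myTable" is illustrative):</para>
+ <programlisting language="java">
+Configuration conf = HBaseConfiguration.create();
+// Connections are heavyweight: create one and share it for the life of the application.
+Connection connection = ConnectionFactory.createConnection(conf);
+try {
+  // Table, Admin and RegionLocator instances are lightweight: create and close as needed.
+  try (Table table = connection.getTable(TableName.valueOf("myTable"));
+       Admin admin = connection.getAdmin()) {
+    // ... use table and admin ...
+  }
+} finally {
+  connection.close();  // clean up the Connection before exiting
+}
+</programlisting>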
+ <para>For connection configuration information, see <xref linkend="client_dependencies" />. </para>
+
<para><emphasis><link
- xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html">HTable</link>
- instances are not thread-safe</emphasis>. Only one thread use an instance of HTable at
- any given time. When creating HTable instances, it is advisable to use the same <link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html">Table</link>
+ instances are not thread-safe</emphasis>. Only one thread can use an instance of Table at
+ any given time. When creating Table instances, it is advisable to use the same <link
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HBaseConfiguration">HBaseConfiguration</link>
instance. This will ensure sharing of ZooKeeper and socket instances to the RegionServers
which is usually what you want. For example, this is preferred:</para>
- <programlisting>HBaseConfiguration conf = HBaseConfiguration.create();
+ <programlisting language="java">HBaseConfiguration conf = HBaseConfiguration.create();
HTable table1 = new HTable(conf, "myTable");
HTable table2 = new HTable(conf, "myTable");</programlisting>
<para>as opposed to this:</para>
- <programlisting>HBaseConfiguration conf1 = HBaseConfiguration.create();
+ <programlisting language="java">HBaseConfiguration conf1 = HBaseConfiguration.create();
HTable table1 = new HTable(conf1, "myTable");
HBaseConfiguration conf2 = HBaseConfiguration.create();
HTable table2 = new HTable(conf2, "myTable");</programlisting>
@@ -1739,7 +1759,7 @@ HTable table2 = new HTable(conf2, "myTable");</programlisting>
the following example:</para>
<example>
<title>Pre-Creating a <code>HConnection</code></title>
- <programlisting>// Create a connection to the cluster.
+ <programlisting language="java">// Create a connection to the cluster.
HConnection connection = HConnectionManager.createConnection(Configuration);
HTableInterface table = connection.getTable("myTable");
// use table as needed, the table returned is lightweight
@@ -1796,7 +1816,7 @@ connection.close();</programlisting>
represents a list of Filters with a relationship of <code>FilterList.Operator.MUST_PASS_ALL</code> or
<code>FilterList.Operator.MUST_PASS_ONE</code> between the Filters. The following example shows an 'or' between two
Filters (checking for either 'my value' or 'my other value' on the same attribute).</para>
-<programlisting>
+<programlisting language="java">
FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ONE);
SingleColumnValueFilter filter1 = new SingleColumnValueFilter(
cf,
@@ -1829,7 +1849,7 @@ scan.setFilter(list);
</code>), inequality (<code>CompareOp.NOT_EQUAL</code>), or ranges (e.g.,
<code>CompareOp.GREATER</code>). The following is an example of testing equivalence of a
column to a String value "my value"...</para>
- <programlisting>
+ <programlisting language="java">
SingleColumnValueFilter filter = new SingleColumnValueFilter(
cf,
column,
@@ -1852,7 +1872,7 @@ scan.setFilter(filter);
<para><link
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/RegexStringComparator.html">RegexStringComparator</link>
supports regular expressions for value comparisons.</para>
- <programlisting>
+ <programlisting language="java">
RegexStringComparator comp = new RegexStringComparator("my."); // any value that starts with 'my'
SingleColumnValueFilter filter = new SingleColumnValueFilter(
cf,
@@ -1873,7 +1893,7 @@ scan.setFilter(filter);
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/SubstringComparator.html">SubstringComparator</link>
can be used to determine if a given substring exists in a value. The comparison is
case-insensitive. </para>
- <programlisting>
+ <programlisting language="java">
SubstringComparator comp = new SubstringComparator("y val"); // looking for 'my value'
SingleColumnValueFilter filter = new SingleColumnValueFilter(
cf,
@@ -1930,7 +1950,7 @@ scan.setFilter(filter);
<para>Note: The same column qualifier can be used in different column families. This
filter returns all matching columns. </para>
<para>Example: Find all columns in a row and family that start with "abc"</para>
- <programlisting>
+ <programlisting language="java">
HTableInterface t = ...;
byte[] row = ...;
byte[] family = ...;
@@ -1960,7 +1980,7 @@ rs.close();
prefixes. It can be used to efficiently get discontinuous sets of columns from very wide
rows. </para>
<para>Example: Find all columns in a row and family that start with "abc" or "xyz"</para>
- <programlisting>
+ <programlisting language="java">
HTableInterface t = ...;
byte[] row = ...;
byte[] family = ...;
@@ -1993,7 +2013,7 @@ rs.close();
filter returns all matching columns. </para>
<para>Example: Find all columns in a row and family between "bbbb" (inclusive) and "bbdd"
(inclusive)</para>
- <programlisting>
+ <programlisting language="java">
HTableInterface t = ...;
byte[] row = ...;
byte[] family = ...;
@@ -2145,46 +2165,56 @@ rs.close();
xml:id="block.cache">
<title>Block Cache</title>
- <para>HBase provides three different BlockCache implementations: the default onheap
- LruBlockCache, BucketCache, and SlabCache, which are both (usually) offheap. This section
+ <para>HBase provides two different BlockCache implementations: the default onheap
+ LruBlockCache and BucketCache, which is (usually) offheap. This section
discusses benefits and drawbacks of each implementation, how to choose the appropriate
option, and configuration options for each.</para>
+
+ <note><title>Block Cache Reporting: UI</title>
+ <para>See the RegionServer UI for detail on the caching deploy. Since HBase 0.98.4, the
+ Block Cache detail has been significantly extended showing configurations,
+ sizings, current usage, time-in-the-cache, and even detail on block counts and types.</para>
+ </note>
+
<section>
+
<title>Cache Choices</title>
<para><classname>LruBlockCache</classname> is the original implementation, and is
- entirely within the Java heap. <classname>SlabCache</classname> and
- <classname>BucketCache</classname> are mainly intended for keeping blockcache
- data offheap, although BucketCache can also keep data onheap and in files.</para>
- <para><emphasis>SlabCache is deprecated and will be removed in 1.0!</emphasis></para>
- <para>BucketCache has seen more production deploys and has more deploy options. Fetching
- will always be slower when fetching from BucketCache or SlabCache, as compared with the
- native onheap LruBlockCache. However, latencies tend to be less erratic over time,
- because there is less garbage collection.</para>
- <para>Anecdotal evidence indicates that BucketCache requires less garbage collection than
- SlabCache so should be even less erratic (than SlabCache or LruBlockCache).</para>
- <para>SlabCache tends to do more garbage collections, because blocks are always moved
- between L1 and L2, at least given the way <classname>DoubleBlockCache</classname>
- currently works. When you enable SlabCache, you are enabling a two tier caching
- system, an L1 cache which is implemented by an instance of LruBlockCache and
- an offheap L2 cache which is implemented by SlabCache. Management of these
- two tiers and how blocks move between them is done by <classname>DoubleBlockCache</classname>
- when you are using SlabCache. DoubleBlockCache works by caching all blocks in L1
- AND L2. When blocks are evicted from L1, they are moved to L2. See
- <xref linkend="offheap.blockcache.slabcache" /> for more detail on how DoubleBlockCache works.
+ entirely within the Java heap. <classname>BucketCache</classname> is mainly
+ intended for keeping blockcache data offheap, although BucketCache can also
+ keep data onheap and serve from a file-backed cache.
+ <note><title>BucketCache is production ready as of hbase-0.98.6</title>
+ <para>To run with BucketCache, you need HBASE-11678. This was included in
+ hbase-0.98.6.
+ </para>
+ </note>
</para>
- <para>The hosting class for BucketCache is <classname>CombinedBlockCache</classname>.
- It keeps all DATA blocks in the BucketCache and meta blocks -- INDEX and BLOOM blocks --
+
+ <para>Fetching will always be slower when fetching from BucketCache,
+ as compared to the native onheap LruBlockCache. However, latencies tend to be
+ less erratic across time, because there is less garbage collection when you use
+ BucketCache since it is managing BlockCache allocations, not the GC. If the
+ BucketCache is deployed in offheap mode, this memory is not managed by the
+ GC at all. This is why you would use BucketCache: your latencies are less erratic, and you mitigate GCs
+ and heap fragmentation. See Nick Dimiduk's <link
+ xlink:href="http://www.n10k.com/blog/blockcache-101/">BlockCache 101</link> for
+ comparisons running onheap vs offheap tests. Also see
+ <link xlink:href="http://people.apache.org/~stack/bc/">Comparing BlockCache Deploys</link>
+ which finds that if your dataset fits inside your LruBlockCache deploy, use it; otherwise,
+ if you are experiencing cache churn (or you want your cache to exist beyond the
+ vagaries of java GC), use BucketCache.
+ </para>
+
+ <para>When you enable BucketCache, you are enabling a two tier caching
+ system, an L1 cache which is implemented by an instance of LruBlockCache and
+ an offheap L2 cache which is implemented by BucketCache. Management of these
+ two tiers and the policy that dictates how blocks move between them is done by
+ <classname>CombinedBlockCache</classname>. It keeps all DATA blocks in the L2
+ BucketCache and meta blocks -- INDEX and BLOOM blocks --
onheap in the L1 <classname>LruBlockCache</classname>.
- </para>
- <para>Because the hosting class for each implementation
- (<classname>DoubleBlockCache</classname> vs <classname>CombinedBlockCache</classname>)
- works so differently, it is difficult to do a fair comparison between BucketCache and SlabCache.
- See Nick Dimiduk's <link
- xlink:href="http://www.n10k.com/blog/blockcache-101/">BlockCache 101</link> for some
- numbers.</para>
- <para>For more information about the off heap cache options, see <xref
- linkend="offheap.blockcache" />.</para>
+ See <xref linkend="offheap.blockcache" /> for more detail on going offheap.</para>
</section>
+
<section xml:id="cache.configurations">
<title>General Cache Configurations</title>
<para>Apart from the cache implementation itself, you can set some general configuration
@@ -2196,6 +2226,7 @@ rs.close();
introduced in <link xlink:href="https://issues.apache.org/jira/browse/HBASE-9857"
>HBASE-9857</link>.</para>
</section>
+
<section
xml:id="block.cache.design">
<title>LruBlockCache Design</title>
@@ -2219,7 +2250,7 @@ rs.close();
was accessed. Catalog tables are configured like this. This group is the last one
considered during evictions.</para>
<para>To mark a column family as in-memory, call
- <programlisting>HColumnDescriptor.setInMemory(true);</programlisting> if creating a table from java,
+ <programlisting language="java">HColumnDescriptor.setInMemory(true);</programlisting> if creating a table from java,
or set <command>IN_MEMORY => true</command> when creating or altering a table in
the shell: e.g. <programlisting>hbase(main):003:0> create 't', {NAME => 'f', IN_MEMORY => 'true'}</programlisting></para>
</listitem>
@@ -2334,58 +2365,58 @@ rs.close();
and would surely evict data that's currently in use. </para>
</listitem>
</itemizedlist>
+ <section xml:id="data.blocks.in.fscache">
+ <title>Caching META blocks only (DATA blocks in fscache)</title>
+ <para>An interesting setup is one where we cache META blocks only and we read DATA
+ blocks in on each access. If the DATA blocks fit inside fscache, this alternative
+ may make sense when access is completely random across a very large dataset.
+ To enable this setup, alter your table and for each column family
+ set <varname>BLOCKCACHE => 'false'</varname>. You are 'disabling' the
+ BlockCache for this column family only; you can never disable the caching of
+ META blocks. Since
+ <link xlink:href="https://issues.apache.org/jira/browse/HBASE-4683">HBASE-4683 Always cache index and bloom blocks</link>,
+ we will cache META blocks even if the BlockCache is disabled.
+ </para>
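+ <para>A hedged sketch of making this change from the Java API (it assumes an existing table
+ "mytable" with a family "cf" and an open Admin instance; an equivalent shell
+ <command>alter</command> works as well):</para>
+ <programlisting language="java">
+HTableDescriptor desc = admin.getTableDescriptor(TableName.valueOf("mytable"));
+HColumnDescriptor family = desc.getFamily(Bytes.toBytes("cf"));
+family.setBlockCacheEnabled(false);   // DATA blocks no longer cached; META blocks still are
+admin.modifyTable(TableName.valueOf("mytable"), desc);
+</programlisting>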
+ </section>
</section>
<section
xml:id="offheap.blockcache">
<title>Offheap Block Cache</title>
- <section xml:id="offheap.blockcache.slabcache">
- <title>Enable SlabCache</title>
- <para><emphasis>SlabCache is deprecated and will be removed in 1.0!</emphasis></para>
- <para> SlabCache is originally described in <link
- xlink:href="http://blog.cloudera.com/blog/2012/01/caching-in-hbase-slabcache/">Caching
- in Apache HBase: SlabCache</link>. Quoting from the API documentation for <link
- xlink:href="http://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/io/hfile/DoubleBlockCache.html">DoubleBlockCache</link>,
- the hosting class for SlabCache deploys,
- DoubleBlockCache is an abstraction layer that combines two caches, the smaller onHeapCache and the
- larger offHeapCache. CacheBlock attempts to cache the block in both caches, while
- readblock reads first from the faster on heap cache before looking for the block in
- the off heap cache. Metrics are the combined size and hits and misses of both
- caches.</para>
- <para>To enable SlabCache, set the float
- <varname>hbase.offheapcache.percentage</varname> to some value between 0 and 1 in
- the <filename>hbase-site.xml</filename> file on the RegionServer. The value will be multiplied by the
- setting for <varname>-XX:MaxDirectMemorySize</varname> in the RegionServer's
- <filename>hbase-env.sh</filename> configuration file and the result is used by
- SlabCache as its offheap store. The onheap store will be the value of the float
- <varname>HConstants.HFILE_BLOCK_CACHE_SIZE_KEY</varname> setting (some value between
- 0 and 1) multiplied by the size of the allocated Java heap.</para>
- <para>Restart (or rolling restart) your cluster for the configurations to take effect.
- Check logs for errors or unexpected behavior.</para>
- </section>
<section xml:id="enable.bucketcache">
- <title>Enable BucketCache</title>
- <para>The usual deploy of BucketCache is via a
- managing class that sets up two caching tiers: an L1 onheap cache
- implemented by LruBlockCache and a second L2 cache implemented
- with BucketCache. The managing class is <link
- xlink:href="http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/io/hfile/CombinedBlockCache.html">CombinedBlockCache</link> by default. The just-previous link describes the mechanism of CombinedBlockCache. In short, it works
+ <title>How to Enable BucketCache</title>
+ <para>The usual deploy of BucketCache is via a managing class that sets up two caching tiers: an L1 onheap cache
+ implemented by LruBlockCache and a second L2 cache implemented with BucketCache. The managing class is <link
+ xlink:href="http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/io/hfile/CombinedBlockCache.html">CombinedBlockCache</link> by default.
+ The just-previous link describes the caching 'policy' implemented by CombinedBlockCache. In short, it works
by keeping meta blocks -- INDEX and BLOOM in the L1, onheap LruBlockCache tier -- and DATA
blocks are kept in the L2, BucketCache tier. It is possible to amend this behavior in
HBase since version 1.0 and ask that a column family have both its meta and DATA blocks hosted onheap in the L1 tier by
- setting <varname>cacheDataInL1</varname> via <programlisting>(HColumnDescriptor.setCacheDataInL1(true)</programlisting>
+ setting <varname>cacheDataInL1</varname> via
+ <code>HColumnDescriptor.setCacheDataInL1(true)</code>
or in the shell, creating or amending column families setting <varname>CACHE_DATA_IN_L1</varname>
to true: e.g. <programlisting>hbase(main):003:0> create 't', {NAME => 't', CONFIGURATION => {CACHE_DATA_IN_L1 => 'true'}}</programlisting></para>
- <para>The BucketCache deploy can be onheap, offheap, or file based. You set which via the
- <varname>hbase.bucketcache.ioengine</varname> setting it to
- <varname>heap</varname> for BucketCache running as part of the java heap,
- <varname>offheap</varname> for BucketCache to make allocations offheap,
- and <varname>file:PATH_TO_FILE</varname> for BucketCache to use a file
- (Useful in particular if you have some fast i/o attached to the box such
+
+ <para>The BucketCache Block Cache can be deployed onheap, offheap, or file based.
+ You set which via the
+ <varname>hbase.bucketcache.ioengine</varname> setting. Setting it to
+ <varname>heap</varname> will have BucketCache deployed inside the
+ allocated java heap. Setting it to <varname>offheap</varname> will have
+ BucketCache make its allocations offheap,
+ and an ioengine setting of <varname>file:PATH_TO_FILE</varname> will direct
+ BucketCache to use file-based caching (useful in particular if you have some fast i/o attached to the box such
as SSDs).
</para>
- <para>To disable CombinedBlockCache, and use the BucketCache as a strict L2 cache to the L1
- LruBlockCache, set <varname>CacheConfig.BUCKET_CACHE_COMBINED_KEY</varname> to
- <literal>false</literal>. In this mode, on eviction from L1, blocks go to L2.</para>
+ <para xml:id="raw.l1.l2">It is possible to deploy an L1+L2 setup where we bypass the CombinedBlockCache
+ policy and have BucketCache working as a strict L2 cache to the L1
+ LruBlockCache. For such a setup, set <varname>CacheConfig.BUCKET_CACHE_COMBINED_KEY</varname> to
+ <literal>false</literal>. In this mode, on eviction from L1, blocks go to L2.
+ When a block is cached, it is cached first in L1. When we go to look for a cached block,
+ we look first in L1 and if none found, then search L2. Let us call this deploy format,
+ <emphasis><indexterm><primary>Raw L1+L2</primary></indexterm></emphasis>.</para>
+ <para>Other BucketCache configs include: specifying a location to persist cache to across
+ restarts, how many threads to use writing the cache, etc. See the
+ <link xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/io/hfile/CacheConfig.html">CacheConfig.html</link>
+ class for configuration options and descriptions.</para>
<procedure>
<title>BucketCache Example Configuration</title>
@@ -2408,7 +2439,7 @@ rs.close();
<step>
<para>Next, add the following configuration to the RegionServer's
<filename>hbase-site.xml</filename>.</para>
- <programlisting>
+ <programlisting language="xml">
<![CDATA[<property>
<name>hbase.bucketcache.ioengine</name>
<value>offheap</value>
@@ -2433,8 +2464,89 @@ rs.close();
In other words, you configure the L1 LruBlockCache as you would normally,
as you would when there is no L2 BucketCache present.
</para>
+ <para><link xlink:href="https://issues.apache.org/jira/browse/HBASE-10641"
+ >HBASE-10641</link> introduced the ability to configure multiple sizes for the
+ buckets of the bucketcache, in HBase 0.98 and newer. To configure multiple bucket
+ sizes, configure the new property <option>hfile.block.cache.sizes</option> (instead of
+ <option>hfile.block.cache.size</option>) to a comma-separated list of block sizes,
+ ordered from smallest to largest, with no spaces. The goal is to optimize the bucket
+ sizes based on your data access patterns. The following example configures buckets of
+ size 4096 and 8192.</para>
+ <screen language="xml"><![CDATA[
+<property>
+ <name>hfile.block.cache.sizes</name>
+ <value>4096,8192</value>
+</property>
+ ]]></screen>
+ <note xml:id="direct.memory">
+ <title>Direct Memory Usage In HBase</title>
+ <para>The default maximum direct memory varies by JVM. Traditionally it is 64M
+ or some relation to allocated heap size (-Xmx) or no limit at all (JDK7 apparently).
+ HBase servers use direct memory; in particular, with short-circuit reading enabled, the hosted DFSClient will
+ allocate direct memory buffers. If you do offheap block caching, you'll
+ be making use of direct memory. Starting your JVM, make sure
+ the <varname>-XX:MaxDirectMemorySize</varname> setting in
+ <filename>conf/hbase-env.sh</filename> is set to some value that is
+ higher than what you have allocated to your offheap blockcache
+ (<varname>hbase.bucketcache.size</varname>). It should be larger than your offheap block
+ cache and then some for DFSClient usage (How much the DFSClient uses is not
+ easy to quantify; it is the number of open hfiles * <varname>hbase.dfs.client.read.shortcircuit.buffer.size</varname>
+ where hbase.dfs.client.read.shortcircuit.buffer.size is set to 128k in HBase -- see <filename>hbase-default.xml</filename>
+ default configurations).
+ Direct memory, which is part of the Java process memory, is separate from the object
+ heap allocated by -Xmx. The value allocated by MaxDirectMemorySize must not exceed
+ physical RAM, and is likely to be less than the total available RAM due to other
+ memory requirements and system constraints.
+ </para>
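+ <para>For example (illustrative numbers only): with a 4G offheap blockcache and roughly
+ 1000 open hfiles, the DFSClient would need about 1000 * 128k, or roughly 128M, of direct
+ memory, so <varname>-XX:MaxDirectMemorySize</varname> would need to be comfortably above
+ 4G, say 5G.</para>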
+ <para>You can see how much memory -- onheap and offheap/direct -- a RegionServer is
+ configured to use and how much it is using at any one time by looking at the
+ <emphasis>Server Metrics: Memory</emphasis> tab in the UI. It can also be gotten
+ via JMX. In particular the direct memory currently used by the server can be found
+ on the <varname>java.nio.type=BufferPool,name=direct</varname> bean. Terracotta has
+ a <link
+ xlink:href="http://terracotta.org/documentation/4.0/bigmemorygo/configuration/storage-options"
+ >good write up</link> on using offheap memory in java. It is for their product
+ BigMemory but alot of the issues noted apply in general to any attempt at going
+ offheap. Check it out.</para>
+ </note>
+ <note xml:id="hbase.bucketcache.percentage.in.combinedcache"><title>hbase.bucketcache.percentage.in.combinedcache</title>
+ <para>This is a pre-HBase 1.0 configuration removed because it
+ was confusing. It was a float that you would set to some value
+ between 0.0 and 1.0. Its default was 0.9. If the deploy was using
+ CombinedBlockCache, then the LruBlockCache L1 size was calculated to
+ be (1 - <varname>hbase.bucketcache.percentage.in.combinedcache</varname>) * <varname>size-of-bucketcache</varname>
+ and the BucketCache size was <varname>hbase.bucketcache.percentage.in.combinedcache</varname> * size-of-bucket-cache,
+ where size-of-bucket-cache itself is EITHER the value of the configuration hbase.bucketcache.size
+ IF it was specified as megabytes OR <varname>hbase.bucketcache.size</varname> * <varname>-XX:MaxDirectMemorySize</varname> if
+ <varname>hbase.bucketcache.size</varname> is between 0 and 1.0.
+ </para>
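+ <para>For example (illustrative numbers only): with <varname>-XX:MaxDirectMemorySize=5g</varname>,
+ <varname>hbase.bucketcache.size</varname> = 0.8, and the old default
+ <varname>hbase.bucketcache.percentage.in.combinedcache</varname> = 0.9, size-of-bucket-cache
+ would be 0.8 * 5g = 4g, of which the BucketCache got 0.9 * 4g = 3.6g and the
+ LruBlockCache L1 got (1 - 0.9) * 4g = 0.4g.</para>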
+ <para>In 1.0, it should be more straightforward. The L1 LruBlockCache size
+ is set as a fraction of the java heap using the hfile.block.cache.size setting
+ (not the best name) and L2 is set as above, either in absolute
+ megabytes or as a fraction of allocated maximum direct memory.
+ </para>
+ </note>
</section>
</section>
+ <section>
+ <title>Compressed BlockCache</title>
+ <para><link xlink:href="https://issues.apache.org/jira/browse/HBASE-11331"
+ >HBASE-11331</link> introduced lazy blockcache decompression, more simply referred to
+ as compressed blockcache. When compressed blockcache is enabled, data and encoded data
+ blocks are cached in the blockcache in their on-disk format, rather than being
+ decompressed and decrypted before caching.</para>
+ <para>For a RegionServer
+ hosting more data than can fit into cache, enabling this feature with SNAPPY compression
+ has been shown to result in a 50% increase in throughput and a 30% improvement in mean
+ latency, while increasing garbage collection by 80% and increasing overall CPU load by
+ 2%. See HBASE-11331 for more details about how performance was measured and achieved.
+ For a RegionServer hosting data that can comfortably fit into cache, or if your workload
+ is sensitive to extra CPU or garbage-collection load, you may receive less
+ benefit.</para>
+ <para>Compressed blockcache is disabled by default. To enable it, set
+ <code>hbase.block.data.cachecompressed</code> to <code>true</code> in
+ <filename>hbase-site.xml</filename> on all RegionServers.</para>
+ </section>
</section>
<section
@@ -2618,7 +2730,7 @@ rs.close();
ZooKeeper splitlog node (<filename>/hbase/splitlog</filename>) as tasks. You can
view the contents of the splitlog by issuing the following
<command>zkcli</command> command. Example output is shown.</para>
- <screen>ls /hbase/splitlog
+ <screen language="bourne">ls /hbase/splitlog
[hdfs%3A%2F%2Fhost2.sample.com%3A56020%2Fhbase%2F.logs%2Fhost8.sample.com%2C57020%2C1340474893275-splitting%2Fhost8.sample.com%253A57020.1340474893900,
hdfs%3A%2F%2Fhost2.sample.com%3A56020%2Fhbase%2F.logs%2Fhost3.sample.com%2C57020%2C1340474893299-splitting%2Fhost3.sample.com%253A57020.1340474893931,
hdfs%3A%2F%2Fhost2.sample.com%3A56020%2Fhbase%2F.logs%2Fhost4.sample.com%2C57020%2C1340474893287-splitting%2Fhost4.sample.com%253A57020.1340474893946]
@@ -2969,6 +3081,205 @@ ctime = Sat Jun 23 11:13:40 PDT 2012
</para>
</section>
+ <section xml:id="regions.arch.states">
+ <title>Region State Transition</title>
+ <para> HBase maintains a state for each region and persists the state in META. The state
+ of the META region itself is persisted in ZooKeeper. You can see the states of regions
+ in transition in the Master web UI. Following is the list of possible region
+ states.</para>
+
+ <itemizedlist>
+ <title>Possible Region States</title>
+ <listitem>
+ <para>OFFLINE: the region is offline and not opening</para>
+ </listitem>
+ <listitem>
+ <para>OPENING: the region is in the process of being opened</para>
+ </listitem>
+ <listitem>
+ <para>OPEN: the region is open and the region server has notified the master</para>
+ </listitem>
+ <listitem>
+ <para>FAILED_OPEN: the region server failed to open the region</para>
+ </listitem>
+ <listitem>
+ <para>CLOSING: the region is in the process of being closed</para>
+ </listitem>
+ <listitem>
+ <para>CLOSED: the region server has closed the region and notified the master</para>
+ </listitem>
+ <listitem>
+ <para>FAILED_CLOSE: the region server failed to close the region</para>
+ </listitem>
+ <listitem>
+ <para>SPLITTING: the region server notified the master that the region is
+ splitting</para>
+ </listitem>
+ <listitem>
+ <para>SPLIT: the region server notified the master that the region has finished
+ splitting</para>
+ </listitem>
+ <listitem>
+ <para>SPLITTING_NEW: this region is being created by a split which is in
+ progress</para>
+ </listitem>
+ <listitem>
+ <para>MERGING: the region server notified the master that this region is being merged
+ with another region</para>
+ </listitem>
+ <listitem>
+ <para>MERGED: the region server notified the master that this region has been
+ merged</para>
+ </listitem>
+ <listitem>
+ <para>MERGING_NEW: this region is being created by a merge of two regions</para>
+ </listitem>
+ </itemizedlist>
+
+ <figure>
+ <title>Region State Transitions</title>
+ <mediaobject>
+ <imageobject>
+ <imagedata align="center" valign="middle" fileref="region_states.png"/>
+ </imageobject>
+ <caption>
+ <para>This graph shows all allowed transitions a region can undergo. In the graph,
+ each node is a state. A node has a color based on the state type, for readability.
+ A directed line in the graph is a possible state transition.</para>
+ </caption>
+ </mediaobject>
+ </figure>
+
+ <itemizedlist>
+ <title>Graph Legend</title>
+ <listitem>
+ <para>Brown: Offline state, a special state that can be transient (after closed before
+ opening), terminal (regions of disabled tables), or initial (regions of newly
+ created tables)</para></listitem>
+ <listitem>
+ <para>Palegreen: Online state that regions can serve requests</para></listitem>
+ <listitem>
+ <para>Lightblue: Transient states</para></listitem>
+ <listitem>
+ <para>Red: Failure states that need OPS attention</para></listitem>
+ <listitem>
+ <para>Gold: Terminal states of regions split/merged</para></listitem>
+ <listitem>
+ <para>Grey: Initial states of regions created through split/merge</para></listitem>
+ </itemizedlist>
+
+ <orderedlist>
+ <title>Region State Transitions Explained</title>
+ <listitem>
+ <para>The master moves a region from <literal>OFFLINE</literal> to
+ <literal>OPENING</literal> state and tries to assign the region to a region
+ server. The region server may or may not have received the open region request. The
+ master retries sending the open region request to the region server until the RPC
+ goes through or the master runs out of retries. After the region server receives the
+ open region request, the region server begins opening the region.</para>
+ </listitem>
+ <listitem>
+ <para>If the master runs out of retries, the master prevents the region server
+ from opening the region by moving the region to <literal>CLOSING</literal> state and
+ trying to close it, even if the region server is starting to open the region.</para>
+ </listitem>
+ <listitem>
+ <para>After the region server opens the region, it continues to try to notify the
+ master until the master moves the region to <literal>OPEN</literal> state and
+ notifies the region server. The region is now open.</para>
+ </listitem>
+ <listitem>
+ <para>If the region server cannot open the region, it notifies the master. The master
+ moves the region to <literal>CLOSED</literal> state and tries to open the region on
+ a different region server.</para>
+ </listitem>
+ <listitem>
+ <para>If the master cannot open the region on any of a certain number of region servers, it
+ moves the region to <literal>FAILED_OPEN</literal> state, and takes no further
+ action until an operator intervenes from the HBase shell, or the server is
+ dead.</para>
+ </listitem>
+ <listitem>
+ <para>The master moves a region from <literal>OPEN</literal> to
+ <literal>CLOSING</literal> state. The region server holding the region may or may
+ not have received the close region request. The master retries sending the close
+ request to the server until the RPC goes through or the master runs out of
+ retries.</para>
+ </listitem>
+ <listitem>
+ <para>If the region server is not online, or throws
+ <code>NotServingRegionException</code>, the master moves the region to
+ <literal>OFFLINE</literal> state and re-assigns it to a different region
+ server.</para>
+ </listitem>
+ <listitem>
+ <para>If the region server is online, but not reachable after the master runs out of
+ retries, the master moves the region to <literal>FAILED_CLOSE</literal> state and
+ takes no further action until an operator intervenes from the HBase shell, or the
+ server is dead.</para>
+ </listitem>
+ <listitem>
+ <para>If the region server gets the close region request, it closes the region and
+ notifies the master. The master moves the region to <literal>CLOSED</literal> state
+ and re-assigns it to a different region server.</para>
+ </listitem>
+ <listitem>
+ <para>Before assigning a region, the master moves the region to
+ <literal>OFFLINE</literal> state automatically if it is in
+ <literal>CLOSED</literal> state.</para>
+ </listitem>
+ <listitem>
+ <para>When a region server is about to split a region, it notifies the master. The
+ master moves the region to be split from <literal>OPEN</literal> to
+ <literal>SPLITTING</literal> state and add the two new regions to be created to
+ the region server. These two regions are in <literal>SPLITING_NEW</literal> state
+ initially.</para>
+ </listitem>
+ <listitem>
+ <para>After notifying the master, the region server starts to split the region. Once
+ past the point of no return, the region server notifies the master again so the
+ master can update the META. However, the master does not update the region states
+ until it is notified by the server that the split is done. If the split is
+ successful, the splitting region is moved from <literal>SPLITTING</literal> to
+ <literal>SPLIT</literal> state and the two new regions are moved from
+ <literal>SPLITTING_NEW</literal> to <literal>OPEN</literal> state.</para>
+ </listitem>
+ <listitem>
+ <para>If the split fails, the splitting region is moved from
+ <literal>SPLITTING</literal> back to <literal>OPEN</literal> state, and the two
+ new regions which were created are moved from <literal>SPLITTING_NEW</literal> to
+ <literal>OFFLINE</literal> state.</para>
+ </listitem>
+ <listitem>
+ <para>When a region server is about to merge two regions, it notifies the master
+ first. The master moves the two regions to be merged from <literal>OPEN</literal> to
+ <literal>MERGING</literal> state, and adds the new region which will hold the
+ contents of the merged regions to the region server. The new region is in
+ <literal>MERGING_NEW</literal> state initially.</para>
+ </listitem>
+ <listitem>
+ <para>After notifying the master, the region server starts to merge the two regions.
+ Once past the point of no return, the region server notifies the master again so the
+ master can update the META. However, the master does not update the region states
+ until it is notified by the region server that the merge has completed. If the merge
+ is successful, the two merging regions are moved from <literal>MERGING</literal> to
+ <literal>MERGED</literal> state and the new region is moved from
+ <literal>MERGING_NEW</literal> to <literal>OPEN</literal> state.</para>
+ </listitem>
+ <listitem>
+ <para>If the merge fails, the two merging regions are moved from
+ <literal>MERGING</literal> back to <literal>OPEN</literal> state, and the new
+ region which was created to hold the contents of the merged regions is moved from
+ <literal>MERGING_NEW</literal> to <literal>OFFLINE</literal> state.</para>
+ </listitem>
+ <listitem>
+ <para>For regions in <literal>FAILED_OPEN</literal> or <literal>FAILED_CLOSE</literal>
+ states , the master tries to close them again when they are reassigned by an
+ operator via HBase Shell. </para>
+ </listitem>
+ </orderedlist>
+ </section>
+
</section> <!-- assignment -->
<section xml:id="regions.arch.locality">
@@ -3018,7 +3329,7 @@ ctime = Sat Jun 23 11:13:40 PDT 2012
Typically a custom split policy should extend HBase's default split policy: <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/regionserver/ConstantSizeRegionSplitPolicy.html">ConstantSizeRegionSplitPolicy</link>.
</para>
<para>The policy can be set globally through the HBaseConfiguration used, or on a per-table basis:
-<programlisting>
+<programlisting language="java">
HTableDescriptor myHtd = ...;
myHtd.setValue(HTableDescriptor.SPLIT_POLICY, MyCustomSplitPolicy.class.getName());
</programlisting>
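+<para>To set the policy globally instead, one approach (a sketch, assuming the
+<varname>hbase.regionserver.region.split.policy</varname> property, which names the split
+policy class used by region servers) is an entry in <filename>hbase-site.xml</filename>:</para>
+<programlisting language="xml"><![CDATA[
+<property>
+  <name>hbase.regionserver.region.split.policy</name>
+  <value>org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy</value>
+</property>]]></programlisting>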
@@ -3125,7 +3436,7 @@ myHtd.setValue(HTableDescriptor.SPLIT_POLICY, MyCustomSplitPolicy.class.getName(
opens the merged region on the region server, and finally reports the merge to the Master.
</para>
<para>An example of region merges in the hbase shell
- <programlisting>$ hbase> merge_region 'ENCODED_REGIONNAME', 'ENCODED_REGIONNAME'
+ <programlisting language="bourne">$ hbase> merge_region 'ENCODED_REGIONNAME', 'ENCODED_REGIONNAME'
hbase> merge_region 'ENCODED_REGIONNAME', 'ENCODED_REGIONNAME', true
</programlisting>
This is an asynchronous operation, and the call returns immediately without waiting for the merge to complete.
@@ -3227,10 +3538,10 @@ myHtd.setValue(HTableDescriptor.SPLIT_POLICY, MyCustomSplitPolicy.class.getName(
<para>To view a textualized version of HFile content, you can use
the <classname>org.apache.hadoop.hbase.io.hfile.HFile
- </classname>tool. Type the following to see usage:<programlisting><code>$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.io.hfile.HFile </code> </programlisting>For
+ </classname> tool. Type the following to see usage:<programlisting language="bourne"><code>$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.io.hfile.HFile </code> </programlisting>For
example, to view the content of the file
<filename>hdfs://10.81.47.41:8020/hbase/TEST/1418428042/DSMP/4759508618286845475</filename>,
- type the following:<programlisting> <code>$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.io.hfile.HFile -v -f hdfs://10.81.47.41:8020/hbase/TEST/1418428042/DSMP/4759508618286845475 </code> </programlisting>If
+ type the following:<programlisting language="bourne"> <code>$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.io.hfile.HFile -v -f hdfs://10.81.47.41:8020/hbase/TEST/1418428042/DSMP/4759508618286845475 </code> </programlisting>If
you leave off the -v option, you see just a summary of the HFile. See
usage for other things to do with the <classname>HFile</classname>
tool.</para>
@@ -3315,307 +3626,514 @@ myHtd.setValue(HTableDescriptor.SPLIT_POLICY, MyCustomSplitPolicy.class.getName(
</section>
</section>
- <section
+ <section
xml:id="compaction">
<title>Compaction</title>
- <para><firstterm>Compaction</firstterm> is an operation which reduces the number of
- StoreFiles, by merging them together, in order to increase performance on read
- operations. Compactions can be resource-intensive to perform, and can either help or
- hinder performance depending on many factors. </para>
- <para>Compactions fall into two categories: minor and major.</para>
- <para><firstterm>Minor compactions</firstterm> usually pick up a small number of small,
- adjacent <systemitem>StoreFiles</systemitem> and rewrite them as a single
- <systemitem>StoreFile</systemitem>. Minor compactions do not drop deletes or expired
- cells. If a minor compaction picks up all the <systemitem>StoreFiles</systemitem> in a
- <systemitem>Store</systemitem>, it promotes itself from a minor to a major compaction.
- If there are a lot of small files to be compacted, the algorithm tends to favor minor
- compactions to "clean up" those small files.</para>
- <para>The goal of a <firstterm>major compaction</firstterm> is to end up with a single
- StoreFile per store. Major compactions also process delete markers and max versions.
- Attempting to process these during a minor compaction could cause side effects. </para>
-
- <formalpara>
+ <itemizedlist>
+ <title>Ambiguous Terminology</title>
+ <listitem><para>A <firstterm>StoreFile</firstterm> is a facade of HFile. In terms of compaction, use of
+ StoreFile seems to have prevailed in the past.</para></listitem>
+ <listitem><para>A <firstterm>Store</firstterm> is the same thing as a ColumnFamily.
+ StoreFiles are related to a Store, or ColumnFamily.</para></listitem>
+ <listitem>
+ <para>If you want to read more about StoreFiles versus HFiles and Stores versus
+ ColumnFamilies, see <link
+ xlink:href="https://issues.apache.org/jira/browse/HBASE-11316">HBASE-11316</link>.</para>
+ </listitem>
+ </itemizedlist>
+ <para>When the MemStore reaches a given size
+ (<code>hbase.hregion.memstore.flush.size</code>), it flushes its contents to a
+ StoreFile. The number of StoreFiles in a Store increases over time.
+ <firstterm>Compaction</firstterm> is an operation which reduces the number of
+ StoreFiles in a Store, by merging them together, in order to increase performance on
+ read operations. Compactions can be resource-intensive to perform, and can either help
+ or hinder performance depending on many factors. </para>
+ <para>Compactions fall into two categories: minor and major. Minor and major compactions
+ differ in the following ways.</para>
+ <para><firstterm>Minor compactions</firstterm> usually select a small number of small,
+ adjacent StoreFiles and rewrite them as a single StoreFile. Minor compactions do not
+ drop (filter out) deletes or expired versions, because of potential side effects. See <xref
+ linkend="compaction.and.deletes" /> and <xref
+ linkend="compaction.and.versions" /> for information on how deletes and versions are
+ handled in relation to compactions. The end result of a minor compaction is fewer,
+ larger StoreFiles for a given Store.</para>
+ <para>The end result of a <firstterm>major compaction</firstterm> is a single StoreFile
+ per Store. Major compactions also process delete markers and max versions. See <xref
+ linkend="compaction.and.deletes" /> and <xref
+ linkend="compaction.and.versions" /> for information on how deletes and versions are
+ handled in relation to compactions.</para>
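+ <para>Both kinds of compaction can also be requested manually. As a minimal sketch, assuming
+ a table named <code>my_table</code>, the HBase Shell <code>compact</code> command requests a
+ compaction of the regions of a table, and <code>major_compact</code> requests a major
+ compaction:</para>
+ <programlisting language="bourne">hbase> compact 'my_table'
+hbase> major_compact 'my_table'</programlisting>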
+
+ <formalpara
+ xml:id="compaction.and.deletes">
<title>Compaction and Deletions</title>
<para> When an explicit deletion occurs in HBase, the data is not actually deleted.
Instead, a <firstterm>tombstone</firstterm> marker is written. The tombstone marker
prevents the data from being returned with queries. During a major compaction, the
data is actually deleted, and the tombstone marker is removed from the StoreFile. If
- the deletion happens because of an expired TTL, no tombstone is created. Instead, the
- expired data is filtered out and is not written back to the compacted StoreFile.</para>
+ the deletion happens because of an expired TTL, no tombstone is created. Instead, the
+ expired data is filtered out and is not written back to the compacted
+ StoreFile.</para>
</formalpara>
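+ <para>As an illustration of the tombstone behavior, the following sketch (assuming an open
+ <code>Connection</code> and a hypothetical table <code>my_table</code> with Column Family
+ <code>fam</code>) issues an explicit delete. The delete only writes a tombstone; the
+ underlying cells are removed from disk at the next major compaction:</para>
+ <programlisting language="java">
+Table table = connection.getTable(TableName.valueOf("my_table"));
+Delete d = new Delete(Bytes.toBytes("row1"));
+d.addColumn(Bytes.toBytes("fam"), Bytes.toBytes("qual")); // marks the newest cell with a tombstone
+table.delete(d);
+// Reads no longer return the cell, but both the cell and the tombstone remain
+// in the StoreFiles until a major compaction rewrites them.
+table.close();
+</programlisting>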
-
- <formalpara>
+
+ <formalpara
+ xml:id="compaction.and.versions">
<title>Compaction and Versions</title>
- <para> When you create a column family, you can specify the maximum number of versions
+ <para> When you create a Column Family, you can specify the maximum number of versions
to keep, by calling <varname>HColumnDescriptor.setMaxVersions(int
versions)</varname>. The default value is <literal>3</literal>. If more versions
than the specified maximum exist, the excess versions are filtered out and not written
- back to the compacted StoreFile.</para>
+ back to the compacted StoreFile.</para>
</formalpara>
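+ <para>For example, the following sketch (assuming an <code>Admin</code> instance and the
+ hypothetical table and family names shown) creates a Column Family that retains only one
+ version per cell:</para>
+ <programlisting language="java">
+HTableDescriptor htd = new HTableDescriptor(TableName.valueOf("my_table"));
+HColumnDescriptor hcd = new HColumnDescriptor("fam");
+hcd.setMaxVersions(1); // versions beyond the newest are filtered out during major compaction
+htd.addFamily(hcd);
+admin.createTable(htd);
+</programlisting>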
-
+
<note>
<title>Major Compactions Can Impact Query Results</title>
- <para> In some situations, older versions can be inadvertently
- resurrected if a newer version is explicitly deleted. See <xref
- linkend="major.compactions.change.query.results" /> for a more in-depth explanation. This
- situation is only possible before the compaction finishes.
- </para>
+ <para> In some situations, older versions can be inadvertently resurrected if a newer
+ version is explicitly deleted. See <xref
+ linkend="major.compactions.change.query.results" /> for a more in-depth explanation.
+ This situation is only possible before the compaction finishes. </para>
</note>
-
+
<para>In theory, major compactions improve performance. However, on a highly loaded
system, major compactions can require an inappropriate number of resources and adversely
affect performance. In a default configuration, major compactions are scheduled
- automatically to run once in a 7-day period. This is usually inappropriate for systems
+ automatically to run once in a 7-day period. This is sometimes inappropriate for systems
in production. You can manage major compactions manually. See <xref
linkend="managed.compactions" />. </para>
<para>Compactions do not perform region merges. See <xref
- linkend="ops.regionmgt.merge" /> for more information on region merging. </para>
+ linkend="ops.regionmgt.merge" /> for more information on region merging. </para>
<section
xml:id="compaction.file.selection">
- <title>Algorithm for Compaction File Selection - HBase 0.96.x and newer</title>
- <para>The compaction algorithms used by HBase have evolved over time. HBase 0.96
- introduced new algorithms for compaction file selection. To find out about the old
- algorithms, see <xref
- linkend="compaction" />. The rest of this section describes the new algorithm. File
- selection happens in several phases and is controlled by several configurable
- parameters. These parameters will be explained in context, and then will be given in a
- table which shows their descriptions, defaults, and implications of changing
- them.</para>
-
- <formalpara xml:id="exploringcompaction.policy">
- <title>The<link
- xlink:href="https://issues.apache.org/jira/browse/HBASE-7842">ExploringCompaction Policy</link></title>
- <para><link
- xlink:href="https://issues.apache.org/jira/browse/HBASE-7842">HBASE-7842</link>
- was introduced in HBase 0.96 and represents a major change in the algorithms for
- file selection for compactions. Its goal is to do the most impactful compaction with
- the lowest cost, in situations where a lot of files need compaction. In such a
- situation, the list of all eligible files is "explored", and files are grouped by
- size before any ratio-based algorithms are run. This favors clean-up of large
- numbers of small files before larger files are considered. For more details, refer
- to the link to the JIRA. Most of the code for this change can be reviewed in
- <filename>hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/ExploringCompactionPolicy.java</filename>.</para>
- </formalpara>
-
- <variablelist>
- <title>Algorithms for Determining File List and Compaction Type</title>
- <varlistentry>
- <term>Create a list of all files which can possibly be compacted, ordered by
- sequence ID.</term>
- <listitem>
+ <title>Compaction Policy - HBase 0.96.x and newer</title>
+ <para>Compacting large StoreFiles, or too many StoreFiles at once, can cause more IO
+ load than your cluster is able to handle without causing performance problems. The
+ method by which HBase selects which StoreFiles to include in a compaction (and whether
+ the compaction is a minor or major compaction) is called the <firstterm>compaction
+ policy</firstterm>.</para>
+ <para>Prior to HBase 0.96.x, there was only one compaction policy. That original
+ compaction policy is still available as
+ <systemitem>RatioBasedCompactionPolicy</systemitem>. The new default compaction
+ policy, called <systemitem>ExploringCompactionPolicy</systemitem>, was subsequently
+ backported to HBase 0.94 and HBase 0.95, and is the default in HBase 0.96 and newer.
+ It was implemented in <link
+ xlink:href="https://issues.apache.org/jira/browse/HBASE-7842">HBASE-7842</link>. In
+ short, <systemitem>ExploringCompactionPolicy</systemitem> attempts to select the best
+ possible set of StoreFiles to compact with the least amount of work, while the
+ <systemitem>RatioBasedCompactionPolicy</systemitem> selects the first set that meets
+ the criteria.</para>
+ <para>Regardless of the compaction policy used, file selection is controlled by several
+ configurable parameters and happens in a multi-step approach. These parameters will be
+ explained in context, and then will be given in a table which shows their
+ descriptions, defaults, and implications of changing them.</para>
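+ <para>If you need to fall back to the older behavior, the compaction policy can be switched
+ per cluster. This is a sketch, assuming the
+ <varname>hbase.hstore.defaultengine.compactionpolicy.class</varname> property, which names
+ the compaction policy class used by the default store engine:</para>
+ <programlisting language="xml"><![CDATA[
+<property>
+  <name>hbase.hstore.defaultengine.compactionpolicy.class</name>
+  <value>org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy</value>
+</property>]]></programlisting>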
+
+ <section
+ xml:id="compaction.being.stuck">
+ <title>Being Stuck</title>
+ <para>When the MemStore gets too large, it needs to flush its contents to a StoreFile.
+ However, a Store can only have <varname>hbase.hstore.blockingStoreFiles</varname>
+ files, so the MemStore needs to wait for the number of StoreFiles to be reduced by
+ one or more compactions. If the MemStore grows larger than
+ <varname>hbase.hregion.memstore.flush.size</varname> while it waits, it still cannot
+ flush its contents to a StoreFile. If the MemStore is too large and the number of
+ StoreFiles is also too high, the algorithm is said to be "stuck". The compaction algorithm
+ checks for this "stuck" situation and provides mechanisms to alleviate it.</para>
+ </section>
+
+ <section
+ xml:id="exploringcompaction.policy">
+ <title>The ExploringCompactionPolicy Algorithm</title>
+ <para>The ExploringCompactionPolicy algorithm considers each possible set of
+ adjacent StoreFiles before choosing the set where compaction will have the most
+ benefit. </para>
+ <para>One situation where the ExploringCompactionPolicy works especially well is when
+ you are bulk-loading data and the bulk loads create larger StoreFiles than the
+ StoreFiles which are holding data older than the bulk-loaded data. This can "trick"
+ HBase into choosing to perform a major compaction each time a compaction is needed,
+ and cause a lot of extra overhead. With the ExploringCompactionPolicy, major
+ compactions happen much less frequently because minor compactions are more
+ efficient.</para>
+ <para>In general, ExploringCompactionPolicy is the right choice for most situations,
+ and thus is the default compaction policy. You can also use
+ ExploringCompactionPolicy along with <xref
+ linkend="ops.stripe" />.</para>
+ <para>The logic of this policy can be examined in
+ <filename>hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/ExploringCompactionPolicy.java</filename>.
+ The following is a walk-through of the logic of the
+ ExploringCompactionPolicy.</para>
+ <procedure>
+ <step>
+ <para>Make a list of all existing StoreFiles in the Store. The rest of the
+ algorithm filters this list to come up with the subset of HFiles which will be
+ chosen for compaction.</para>
+ </step>
+ <step>
+ <para>If this was a user-requested compaction, attempt to perform the requested
+ compaction type, regardless of what would normally be chosen. Note that even if
+ the user requests a major compaction, it may not be possible to perform a major
+ compaction. This may be because not all StoreFiles in the Column Family are
+ available to compact or because there are too many Stores in the Column
+ Family.</para>
+ </step>
+ <step>
+ <para>Some StoreFiles are automatically excluded from consideration. These
+ include:</para>
+ <itemizedlist>
+ <listitem>
+ <para>StoreFiles that are larger than
+ <varname>hbase.hstore.compaction.max.size</varname></para>
+ </listitem>
+ <listitem>
+ <para>StoreFiles that were created by a bulk-load operation which explicitly
+ excluded compaction. You may decide to exclude StoreFiles resulting from
+ bulk loads, from compaction. To do this, specify the
+ <varname>hbase.mapreduce.hfileoutputformat.compaction.exclude</varname>
+ parameter during the bulk load operation.</para>
+ </listitem>
+ </itemizedlist>
+ </step>
+ <step>
+ <para>Iterate through the list from step 1, and make a list of all potential sets
+ of StoreFiles to compact together. A potential set is a grouping of
+ <varname>hbase.hstore.compaction.min</varname> contiguous StoreFiles in the
+ list. For each set, perform some sanity-checking and figure out whether this is
+ the best compaction that could be done:</para>
+ <itemizedlist>
+ <listitem>
+ <para>If the number of StoreFiles in this set (not the size of the StoreFiles)
+ is fewer than <varname>hbase.hstore.compaction.min</varname> or more than
+ <varname>hbase.hstore.compaction.max</varname>, take it out of
+ consideration.</para>
+ </listitem>
+ <listitem>
+ <para>Compare the size of this set of StoreFiles with the size of the smallest
+ possible compaction that has been found in the list so far. If the size of
+ this set of StoreFiles represents the smallest compaction that could be
+ done, store it to be used as a fall-back if the algorithm is "stuck" and no
+ StoreFiles would otherwise be chosen. See <xref
+ linkend="compaction.being.stuck" />.</para>
+ </listitem>
+ <listitem>
+ <para>Do size-based sanity checks against each StoreFile in this set of
+ StoreFiles.</para>
+ <itemizedlist>
+ <listitem>
+ <para>If the size of this StoreFile is larger than
+ <varname>hbase.hstore.compaction.max.size</varname>, take it out of
+ consideration.</para>
+ </listitem>
+ <listitem>
+ <para>If the size is greater than or equal to
+ <varname>hbase.hstore.compaction.min.size</varname>, sanity-check it
+ against the file-based ratio to see whether it is too large to be
+ considered. The sanity-checking is successful if:</para>
+ <itemizedlist>
+ <listitem>
+ <para>There is only one StoreFile in this set, or</para>
+ </listitem>
+ <listitem>
+ <para>For each StoreFile, its size multiplied by
+ <varname>hbase.hstore.compaction.ratio</varname> (or
+ <varname>hbase.hstore.compaction.ratio.offpeak</varname> if
+ off-peak hours are configured and it is during off-peak hours) is
+ less than the sum of the sizes of the other HFiles in the
+ set.</para>
+ </listitem>
+ </itemizedlist>
+ </listitem>
+ </itemizedlist>
+ </listitem>
+ </itemizedlist>
+ </step>
+ <step>
+ <para>If this set of StoreFiles is still in consideration, compare it to the
+ previously-selected best compaction. If it is better, replace the
+ previously-selected best compaction with this one.</para>
+ </step>
+ <step>
+ <para>When the entire list of potential compactions has been processed, perform
+ the best compaction that was found. If no StoreFiles were selected for
+ compaction, but there are multiple StoreFiles, assume the algorithm is stuck
+ (see <xref
+ linkend="compaction.being.stuck" />) and if so, perform the smallest
+ compaction t
<TRUNCATED>