Posted to commits@hbase.apache.org by st...@apache.org on 2014/05/28 16:59:11 UTC
[13/14] HBASE-11199 One-time effort to pretty-print the Docbook XML,
to make further patch review easier (Misty Stanley-Jones)
http://git-wip-us.apache.org/repos/asf/hbase/blob/63e8304e/src/main/docbkx/book.xml
----------------------------------------------------------------------
diff --git a/src/main/docbkx/book.xml b/src/main/docbkx/book.xml
index 1fca2be..2ac9de3 100644
--- a/src/main/docbkx/book.xml
+++ b/src/main/docbkx/book.xml
@@ -19,38 +19,45 @@
* limitations under the License.
*/
-->
-<book version="5.0" xmlns="http://docbook.org/ns/docbook"
- xmlns:xlink="http://www.w3.org/1999/xlink"
- xmlns:xi="http://www.w3.org/2001/XInclude"
- xmlns:svg="http://www.w3.org/2000/svg"
- xmlns:m="http://www.w3.org/1998/Math/MathML"
- xmlns:html="http://www.w3.org/1999/xhtml"
- xmlns:db="http://docbook.org/ns/docbook" xml:id="book">
+<book
+ version="5.0"
+ xmlns="http://docbook.org/ns/docbook"
+ xmlns:xlink="http://www.w3.org/1999/xlink"
+ xmlns:xi="http://www.w3.org/2001/XInclude"
+ xmlns:svg="http://www.w3.org/2000/svg"
+ xmlns:m="http://www.w3.org/1998/Math/MathML"
+ xmlns:html="http://www.w3.org/1999/xhtml"
+ xmlns:db="http://docbook.org/ns/docbook"
+ xml:id="book">
<info>
- <title><link xlink:href="http://www.hbase.org">
- The Apache HBase™ Reference Guide
- </link></title>
- <subtitle><link xlink:href="http://www.hbase.org">
- <inlinemediaobject>
- <imageobject>
- <imagedata align="center" valign="middle" fileref="hbase_logo.png" />
- </imageobject>
- </inlinemediaobject>
- </link>
+ <title><link
+ xlink:href="http://www.hbase.org"> The Apache HBase™ Reference Guide </link></title>
+ <subtitle><link
+ xlink:href="http://www.hbase.org">
+ <inlinemediaobject>
+ <imageobject>
+ <imagedata
+ align="center"
+ valign="middle"
+ fileref="hbase_logo.png" />
+ </imageobject>
+ </inlinemediaobject>
+ </link>
</subtitle>
- <copyright><year>2014</year><holder>Apache Software Foundation.
- All Rights Reserved. Apache Hadoop, Hadoop, MapReduce, HDFS, Zookeeper, HBase, and the HBase project logo are trademarks of the Apache Software Foundation.
- </holder>
+ <copyright>
+ <year>2014</year>
+ <holder>Apache Software Foundation. All Rights Reserved. Apache Hadoop, Hadoop, MapReduce,
+ HDFS, ZooKeeper, HBase, and the HBase project logo are trademarks of the Apache Software
+ Foundation. </holder>
</copyright>
- <abstract>
- <para>This is the official reference guide of
- <link xlink:href="http://www.hbase.org">Apache HBase™</link>,
- a distributed, versioned, big data store built on top of
- <link xlink:href="http://hadoop.apache.org/">Apache Hadoop™</link> and
- <link xlink:href="http://zookeeper.apache.org/">Apache ZooKeeper™</link>.
- </para>
- </abstract>
+ <abstract>
+ <para>This is the official reference guide of <link
+ xlink:href="http://www.hbase.org">Apache HBase™</link>, a distributed, versioned, big
+ data store built on top of <link
+ xlink:href="http://hadoop.apache.org/">Apache Hadoop™</link> and <link
+ xlink:href="http://zookeeper.apache.org/">Apache ZooKeeper™</link>. </para>
+ </abstract>
<revhistory>
<revision>
@@ -65,151 +72,241 @@
</info>
<!--XInclude some chapters-->
- <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="preface.xml" />
- <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="getting_started.xml" />
- <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="configuration.xml" />
- <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="upgrading.xml"/>
- <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="shell.xml"/>
+ <xi:include
+ xmlns:xi="http://www.w3.org/2001/XInclude"
+ href="preface.xml" />
+ <xi:include
+ xmlns:xi="http://www.w3.org/2001/XInclude"
+ href="getting_started.xml" />
+ <xi:include
+ xmlns:xi="http://www.w3.org/2001/XInclude"
+ href="configuration.xml" />
+ <xi:include
+ xmlns:xi="http://www.w3.org/2001/XInclude"
+ href="upgrading.xml" />
+ <xi:include
+ xmlns:xi="http://www.w3.org/2001/XInclude"
+ href="shell.xml" />
- <chapter xml:id="datamodel">
+ <chapter
+ xml:id="datamodel">
<title>Data Model</title>
- <para>In short, applications store data into an HBase table.
- Tables are made of rows and columns.
- All columns in HBase belong to a particular column family.
- Table cells -- the intersection of row and column
- coordinates -- are versioned.
- A cell’s content is an uninterpreted array of bytes.
- </para>
- <para>Table row keys are also byte arrays so almost anything can
- serve as a row key from strings to binary representations of longs or
- even serialized data structures. Rows in HBase tables
- are sorted by row key. The sort is byte-ordered. All table accesses are
- via the table row key -- its primary key.
-</para>
+ <para>In short, applications store data in HBase tables. Tables are made of rows and
+ columns. All columns in HBase belong to a particular column family. Table cells -- the
+ intersection of row and column coordinates -- are versioned. A cell’s content is an
+ uninterpreted array of bytes.</para>
+ <para>Table row keys are also byte arrays, so almost anything can serve as a row key, from
+ strings to binary representations of longs or even serialized data structures. Rows in HBase
+ tables are sorted by row key. The sort is byte-ordered. All table accesses are via the table
+ row key -- its primary key.</para>
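Because the sort is a byte-level lexicographic comparison, numeric row keys encoded as strings do not sort numerically. A minimal Python sketch (purely illustrative, not HBase code) of this ordering pitfall and the usual zero-padding workaround:

```python
# Illustrative sketch: HBase sorts rows by lexicographic byte order,
# not numeric order, so "row10" sorts before "row2".
rows = [b"row1", b"row10", b"row2", b"row11"]
print(sorted(rows))  # b"row2" lands last

# Fixed-width (zero-padded) keys restore the intended numeric order.
padded = [b"row%03d" % n for n in (1, 10, 2, 11)]
print(sorted(padded))
```

This is why HBase schema-design advice generally recommends fixed-width or binary-encoded numeric components in row keys.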
- <section xml:id="conceptual.view"><title>Conceptual View</title>
- <para>
- The following example is a slightly modified form of the one on page
- 2 of the <link xlink:href="http://research.google.com/archive/bigtable.html">BigTable</link> paper.
- There is a table called <varname>webtable</varname> that contains two column families named
- <varname>contents</varname> and <varname>anchor</varname>.
- In this example, <varname>anchor</varname> contains two
- columns (<varname>anchor:cssnsi.com</varname>, <varname>anchor:my.look.ca</varname>)
- and <varname>contents</varname> contains one column (<varname>contents:html</varname>).
- <note>
- <title>Column Names</title>
- <para>
- By convention, a column name is made of its column family prefix and a
- <emphasis>qualifier</emphasis>. For example, the
- column
- <emphasis>contents:html</emphasis> is made up of the column family <varname>contents</varname>
- and <varname>html</varname> qualifier.
- The colon character (<literal>:</literal>) delimits the column family from the
- column family <emphasis>qualifier</emphasis>.
- </para>
- </note>
- <table frame='all'><title>Table <varname>webtable</varname></title>
- <tgroup cols='4' align='left' colsep='1' rowsep='1'>
- <colspec colname='c1'/>
- <colspec colname='c2'/>
- <colspec colname='c3'/>
- <colspec colname='c4'/>
- <thead>
- <row><entry>Row Key</entry><entry>Time Stamp</entry><entry>ColumnFamily <varname>contents</varname></entry><entry>ColumnFamily <varname>anchor</varname></entry></row>
- </thead>
- <tbody>
- <row><entry>"com.cnn.www"</entry><entry>t9</entry><entry></entry><entry><varname>anchor:cnnsi.com</varname> = "CNN"</entry></row>
- <row><entry>"com.cnn.www"</entry><entry>t8</entry><entry></entry><entry><varname>anchor:my.look.ca</varname> = "CNN.com"</entry></row>
- <row><entry>"com.cnn.www"</entry><entry>t6</entry><entry><varname>contents:html</varname> = "<html>..."</entry><entry></entry></row>
- <row><entry>"com.cnn.www"</entry><entry>t5</entry><entry><varname>contents:html</varname> = "<html>..."</entry><entry></entry></row>
- <row><entry>"com.cnn.www"</entry><entry>t3</entry><entry><varname>contents:html</varname> = "<html>..."</entry><entry></entry></row>
- </tbody>
- </tgroup>
- </table>
- </para>
- </section>
- <section xml:id="physical.view"><title>Physical View</title>
- <para>
- Although at a conceptual level tables may be viewed as a sparse set of rows.
- Physically they are stored on a per-column family basis. New columns
- (i.e., <varname>columnfamily:column</varname>) can be added to any
- column family without pre-announcing them.
- <table frame='all'><title>ColumnFamily <varname>anchor</varname></title>
- <tgroup cols='3' align='left' colsep='1' rowsep='1'>
- <colspec colname='c1'/>
- <colspec colname='c2'/>
- <colspec colname='c3'/>
- <thead>
- <row><entry>Row Key</entry><entry>Time Stamp</entry><entry>Column Family <varname>anchor</varname></entry></row>
- </thead>
- <tbody>
- <row><entry>"com.cnn.www"</entry><entry>t9</entry><entry><varname>anchor:cnnsi.com</varname> = "CNN"</entry></row>
- <row><entry>"com.cnn.www"</entry><entry>t8</entry><entry><varname>anchor:my.look.ca</varname> = "CNN.com"</entry></row>
- </tbody>
- </tgroup>
- </table>
- <table frame='all'><title>ColumnFamily <varname>contents</varname></title>
- <tgroup cols='3' align='left' colsep='1' rowsep='1'>
- <colspec colname='c1'/>
- <colspec colname='c2'/>
- <colspec colname='c3'/>
- <thead>
- <row><entry>Row Key</entry><entry>Time Stamp</entry><entry>ColumnFamily "contents:"</entry></row>
- </thead>
- <tbody>
- <row><entry>"com.cnn.www"</entry><entry>t6</entry><entry><varname>contents:html</varname> = "<html>..."</entry></row>
- <row><entry>"com.cnn.www"</entry><entry>t5</entry><entry><varname>contents:html</varname> = "<html>..."</entry></row>
- <row><entry>"com.cnn.www"</entry><entry>t3</entry><entry><varname>contents:html</varname> = "<html>..."</entry></row>
- </tbody>
- </tgroup>
- </table>
- It is important to note in the diagram above that the empty cells shown in the
- conceptual view are not stored since they need not be in a column-oriented
- storage format. Thus a request for the value of the <varname>contents:html</varname>
- column at time stamp <literal>t8</literal> would return no value. Similarly, a
- request for an <varname>anchor:my.look.ca</varname> value at time stamp
- <literal>t9</literal> would return no value. However, if no timestamp is
- supplied, the most recent value for a particular column would be returned
- and would also be the first one found since timestamps are stored in
- descending order. Thus a request for the values of all columns in the row
- <varname>com.cnn.www</varname> if no timestamp is specified would be:
- the value of <varname>contents:html</varname> from time stamp
- <literal>t6</literal>, the value of <varname>anchor:cnnsi.com</varname>
- from time stamp <literal>t9</literal>, the value of
- <varname>anchor:my.look.ca</varname> from time stamp <literal>t8</literal>.
- </para>
- <para>For more information about the internals of how Apache HBase stores data, see <xref linkend="regions.arch" />.
- </para>
- </section>
+ <section
+ xml:id="conceptual.view">
+ <title>Conceptual View</title>
+ <para> The following example is a slightly modified form of the one on page 2 of the <link
+ xlink:href="http://research.google.com/archive/bigtable.html">BigTable</link> paper. There
+ is a table called <varname>webtable</varname> that contains two column families named
+ <varname>contents</varname> and <varname>anchor</varname>. In this example,
+ <varname>anchor</varname> contains two columns (<varname>anchor:cnnsi.com</varname>,
+ <varname>anchor:my.look.ca</varname>) and <varname>contents</varname> contains one column
+ (<varname>contents:html</varname>). <note>
+ <title>Column Names</title>
+ <para> By convention, a column name is made of its column family prefix and a
+ <emphasis>qualifier</emphasis>. For example, the column
+ <emphasis>contents:html</emphasis> is made up of the column family
+ <varname>contents</varname> and <varname>html</varname> qualifier. The colon character
+ (<literal>:</literal>) delimits the column family from the column family
+ <emphasis>qualifier</emphasis>. </para>
+ </note>
+ <table
+ frame="all">
+ <title>Table <varname>webtable</varname></title>
+ <tgroup
+ cols="4"
+ align="left"
+ colsep="1"
+ rowsep="1">
+ <colspec
+ colname="c1" />
+ <colspec
+ colname="c2" />
+ <colspec
+ colname="c3" />
+ <colspec
+ colname="c4" />
+ <thead>
+ <row>
+ <entry>Row Key</entry>
+ <entry>Time Stamp</entry>
+ <entry>ColumnFamily <varname>contents</varname></entry>
+ <entry>ColumnFamily <varname>anchor</varname></entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>"com.cnn.www"</entry>
+ <entry>t9</entry>
+ <entry />
+ <entry><varname>anchor:cnnsi.com</varname> = "CNN"</entry>
+ </row>
+ <row>
+ <entry>"com.cnn.www"</entry>
+ <entry>t8</entry>
+ <entry />
+ <entry><varname>anchor:my.look.ca</varname> = "CNN.com"</entry>
+ </row>
+ <row>
+ <entry>"com.cnn.www"</entry>
+ <entry>t6</entry>
+ <entry><varname>contents:html</varname> = "<html>..."</entry>
+ <entry />
+ </row>
+ <row>
+ <entry>"com.cnn.www"</entry>
+ <entry>t5</entry>
+ <entry><varname>contents:html</varname> = "<html>..."</entry>
+ <entry />
+ </row>
+ <row>
+ <entry>"com.cnn.www"</entry>
+ <entry>t3</entry>
+ <entry><varname>contents:html</varname> = "<html>..."</entry>
+ <entry />
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ </para>
+ </section>
+ <section
+ xml:id="physical.view">
+ <title>Physical View</title>
+ <para> Although at a conceptual level tables may be viewed as a sparse set of rows,
+ physically they are stored on a per-column-family basis. New columns (i.e.,
+ <varname>columnfamily:column</varname>) can be added to any column family without
+ pre-announcing them. <table
+ frame="all">
+ <title>ColumnFamily <varname>anchor</varname></title>
+ <tgroup
+ cols="3"
+ align="left"
+ colsep="1"
+ rowsep="1">
+ <colspec
+ colname="c1" />
+ <colspec
+ colname="c2" />
+ <colspec
+ colname="c3" />
+ <thead>
+ <row>
+ <entry>Row Key</entry>
+ <entry>Time Stamp</entry>
+ <entry>Column Family <varname>anchor</varname></entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>"com.cnn.www"</entry>
+ <entry>t9</entry>
+ <entry><varname>anchor:cnnsi.com</varname> = "CNN"</entry>
+ </row>
+ <row>
+ <entry>"com.cnn.www"</entry>
+ <entry>t8</entry>
+ <entry><varname>anchor:my.look.ca</varname> = "CNN.com"</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ <table
+ frame="all">
+ <title>ColumnFamily <varname>contents</varname></title>
+ <tgroup
+ cols="3"
+ align="left"
+ colsep="1"
+ rowsep="1">
+ <colspec
+ colname="c1" />
+ <colspec
+ colname="c2" />
+ <colspec
+ colname="c3" />
+ <thead>
+ <row>
+ <entry>Row Key</entry>
+ <entry>Time Stamp</entry>
+ <entry>ColumnFamily "contents:"</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>"com.cnn.www"</entry>
+ <entry>t6</entry>
+ <entry><varname>contents:html</varname> = "<html>..."</entry>
+ </row>
+ <row>
+ <entry>"com.cnn.www"</entry>
+ <entry>t5</entry>
+ <entry><varname>contents:html</varname> = "<html>..."</entry>
+ </row>
+ <row>
+ <entry>"com.cnn.www"</entry>
+ <entry>t3</entry>
+ <entry><varname>contents:html</varname> = "<html>..."</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table> It is important to note in the tables above that the empty cells shown in the
+ conceptual view are not stored, since in a column-oriented storage format they need not
+ be. Thus a request for the value of the <varname>contents:html</varname> column at time
+ stamp <literal>t8</literal> would return no value. Similarly, a request for an
+ <varname>anchor:my.look.ca</varname> value at time stamp <literal>t9</literal> would
+ return no value. However, if no timestamp is supplied, the most recent value for a
+ particular column would be returned, and it would also be the first one found, since
+ timestamps are stored in descending order. Thus a request for the values of all columns
+ in the row <varname>com.cnn.www</varname>, if no timestamp is specified, would return:
+ the value of <varname>contents:html</varname> from time stamp <literal>t6</literal>, the
+ value of <varname>anchor:cnnsi.com</varname> from time stamp <literal>t9</literal>, and
+ the value of <varname>anchor:my.look.ca</varname> from time stamp <literal>t8</literal>. </para>
+ <para>For more information about the internals of how Apache HBase stores data, see <xref
+ linkend="regions.arch" />. </para>
+ </section>
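The read behavior described above (absent cells simply not stored; versions kept newest-first; an exact-timestamp request missing when nothing was written at that timestamp) can be sketched with a toy in-memory model. This is plain illustrative Python; the `ToyStore` name and its methods are invented here, not HBase API:

```python
# Toy model of versioned cells (illustrative only, not HBase code).
# Cells are keyed by (row, column); versions are kept in descending
# timestamp order, and absent cells are simply not stored.
from collections import defaultdict

class ToyStore:
    def __init__(self):
        self.cells = defaultdict(list)  # (row, column) -> [(ts, value), ...]

    def put(self, row, column, ts, value):
        self.cells[(row, column)].append((ts, value))
        self.cells[(row, column)].sort(reverse=True)  # newest first

    def get(self, row, column, ts=None):
        # With no timestamp, return the most recent value (first found,
        # because versions are stored in descending order).
        for cell_ts, value in self.cells.get((row, column), []):
            if ts is None or cell_ts == ts:
                return value
        return None

store = ToyStore()
store.put("com.cnn.www", "contents:html", 6, "<html>...")
store.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")
store.put("com.cnn.www", "anchor:my.look.ca", 8, "CNN.com")

print(store.get("com.cnn.www", "contents:html", ts=8))  # None: never stored at t8
print(store.get("com.cnn.www", "contents:html"))        # value from t6
```

The descending sort is the key design point: the newest version is always the first one a sequential read encounters.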
- <section xml:id="namespace">
+ <section
+ xml:id="namespace">
<title>Namespace</title>
- <para>
- A namespace is a logical grouping of tables analogous to a database in relation database
- systems. This abstraction lays the groundwork for upcoming multi-tenancy related features:
- <itemizedlist>
- <listitem><para>Quota Management (HBASE-8410) - Restrict the amount of resources (ie
- regions, tables) a namespace can consume.</para></listitem>
- <listitem><para>Namespace Security Administration (HBASE-9206) - provide another
- level of security administration for tenants.</para></listitem>
- <listitem><para>Region server groups (HBASE-6721) - A namespace/table can be
- pinned onto a subset of regionservers thus guaranteeing a course level of
- isolation.</para></listitem>
+ <para> A namespace is a logical grouping of tables, analogous to a database in relational
+ database systems. This abstraction lays the groundwork for upcoming multi-tenancy-related
+ features: <itemizedlist>
+ <listitem>
+ <para>Quota Management (HBASE-8410) - Restrict the amount of resources (i.e., regions,
+ tables) a namespace can consume.</para>
+ </listitem>
+ <listitem>
+ <para>Namespace Security Administration (HBASE-9206) - Provide another level of security
+ administration for tenants.</para>
+ </listitem>
+ <listitem>
+ <para>Region server groups (HBASE-6721) - A namespace/table can be pinned onto a subset
+ of regionservers, thus guaranteeing a coarse level of isolation.</para>
+ </listitem>
</itemizedlist>
</para>
- <section xml:id="namespace_creation">
+ <section
+ xml:id="namespace_creation">
<title>Namespace management</title>
- <para>
- A namespace can be created, removed or altered. Namespace membership is determined during
- table creation by specifying a fully-qualified table name of the form:</para>
-
- <programlisting><table namespace>:<table qualifier></programlisting>
-
+ <para> A namespace can be created, removed or altered. Namespace membership is determined
+ during table creation by specifying a fully-qualified table name of the form:</para>
+
+ <programlisting><![CDATA[<table namespace>:<table qualifier>]]></programlisting>
+
<example>
<title>Examples</title>
- <programlisting>
+ <programlisting>
#Create a namespace
create_namespace 'my_ns'
</programlisting>
@@ -227,20 +324,23 @@ alter_namespace 'my_ns', {METHOD => 'set', 'PROPERTY_NAME' => 'PROPERTY_VALUE'}
</programlisting>
</example>
</section>
- <section xml:id="namespace_special">
+ <section
+ xml:id="namespace_special">
<title>Predefined namespaces</title>
- <para>
- There are two predefined special namespaces:
- <itemizedlist>
- <listitem><para>hbase - system namespace, used to contain hbase internal tables</para></listitem>
- <listitem><para>default - tables with no explicit specified namespace will automatically
- fall into this namespace.</para></listitem>
- </itemizedlist>
- </para>
-<example>
- <title>Examples</title>
+ <para> There are two predefined special namespaces: </para>
+ <itemizedlist>
+ <listitem>
+ <para>hbase - the system namespace, used to contain HBase internal tables</para>
+ </listitem>
+ <listitem>
+ <para>default - tables with no explicitly specified namespace will automatically fall into
+ this namespace.</para>
+ </listitem>
+ </itemizedlist>
+ <example>
+ <title>Examples</title>
-<programlisting>
+ <programlisting>
#namespace=foo and table qualifier=bar
create 'foo:bar', 'fam'
@@ -251,85 +351,85 @@ create 'bar', 'fam'
</section>
</section>
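The fully-qualified naming convention above (a namespace, a colon, and a table qualifier, with unqualified names falling into the <varname>default</varname> namespace) can be sketched in a few lines. This is illustrative Python, not part of HBase; `split_table_name` is a name invented here:

```python
# Illustrative sketch of the <namespace>:<qualifier> convention.
# A table name without a colon falls into the "default" namespace.
def split_table_name(name):
    if ":" in name:
        namespace, qualifier = name.split(":", 1)
        return namespace, qualifier
    return "default", name

print(split_table_name("foo:bar"))  # namespace "foo", qualifier "bar"
print(split_table_name("bar"))      # default namespace
```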
- <section xml:id="table">
+ <section
+ xml:id="table">
<title>Table</title>
- <para>
- Tables are declared up front at schema definition time.
- </para>
+ <para> Tables are declared up front at schema definition time. </para>
</section>
- <section xml:id="row">
+ <section
+ xml:id="row">
<title>Row</title>
- <para>Row keys are uninterrpreted bytes. Rows are
- lexicographically sorted with the lowest order appearing first
- in a table. The empty byte array is used to denote both the
- start and end of a tables' namespace.</para>
+ <para>Row keys are uninterpreted bytes. Rows are lexicographically sorted, with the lowest
+ order appearing first in a table. The empty byte array is used to denote both the start and
+ end of a table's namespace.</para>
</section>
- <section xml:id="columnfamily">
+ <section
+ xml:id="columnfamily">
<title>Column Family<indexterm><primary>Column Family</primary></indexterm></title>
- <para>
- Columns in Apache HBase are grouped into <emphasis>column families</emphasis>.
- All column members of a column family have the same prefix. For example, the
- columns <emphasis>courses:history</emphasis> and
- <emphasis>courses:math</emphasis> are both members of the
- <emphasis>courses</emphasis> column family.
- The colon character (<literal
- >:</literal>) delimits the column family from the
- <indexterm><primary>column family qualifier</primary><secondary>Column Family Qualifier</secondary></indexterm>.
- The column family prefix must be composed of
- <emphasis>printable</emphasis> characters. The qualifying tail, the
- column family <emphasis>qualifier</emphasis>, can be made of any
- arbitrary bytes. Column families must be declared up front
- at schema definition time whereas columns do not need to be
- defined at schema time but can be conjured on the fly while
- the table is up an running.</para>
- <para>Physically, all column family members are stored together on the
- filesystem. Because tunings and
- storage specifications are done at the column family level, it is
- advised that all column family members have the same general access
- pattern and size characteristics.</para>
-
- <para></para>
+ <para> Columns in Apache HBase are grouped into <emphasis>column families</emphasis>. All
+ column members of a column family have the same prefix. For example, the columns
+ <emphasis>courses:history</emphasis> and <emphasis>courses:math</emphasis> are both
+ members of the <emphasis>courses</emphasis> column family. The colon character
+ (<literal>:</literal>) delimits the column family from the <indexterm><primary>column
+ family qualifier</primary><secondary>Column Family Qualifier</secondary></indexterm>.
+ The column family prefix must be composed of <emphasis>printable</emphasis> characters. The
+ qualifying tail, the column family <emphasis>qualifier</emphasis>, can be made of any
+ arbitrary bytes. Column families must be declared up front at schema definition time whereas
+ columns do not need to be defined at schema time but can be conjured on the fly while the
+ table is up and running.</para>
+ <para>Physically, all column family members are stored together on the filesystem. Because
+ tunings and storage specifications are done at the column family level, it is advised that
+ all column family members have the same general access pattern and size
+ characteristics.</para>
+
</section>
- <section xml:id="cells">
+ <section
+ xml:id="cells">
<title>Cells<indexterm><primary>Cells</primary></indexterm></title>
- <para>A <emphasis>{row, column, version} </emphasis>tuple exactly
- specifies a <literal>cell</literal> in HBase.
- Cell content is uninterrpreted bytes</para>
+ <para>A <emphasis>{row, column, version}</emphasis> tuple exactly specifies a
+ <literal>cell</literal> in HBase. Cell content is uninterpreted bytes.</para>
</section>
- <section xml:id="data_model_operations">
- <title>Data Model Operations</title>
- <para>The four primary data model operations are Get, Put, Scan, and Delete. Operations are applied via
- <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html">HTable</link> instances.
- </para>
- <section xml:id="get">
+ <section
+ xml:id="data_model_operations">
+ <title>Data Model Operations</title>
+ <para>The four primary data model operations are Get, Put, Scan, and Delete. Operations are
+ applied via <link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html">HTable</link>
+ instances. </para>
+ <section
+ xml:id="get">
<title>Get</title>
- <para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html">Get</link> returns
- attributes for a specified row. Gets are executed via
- <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#get%28org.apache.hadoop.hbase.client.Get%29">
- HTable.get</link>.
- </para>
+ <para><link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html">Get</link>
+ returns attributes for a specified row. Gets are executed via <link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#get%28org.apache.hadoop.hbase.client.Get%29">
+ HTable.get</link>. </para>
</section>
- <section xml:id="put">
+ <section
+ xml:id="put">
<title>Put</title>
- <para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html">Put</link> either
- adds new rows to a table (if the key is new) or can update existing rows (if the key already exists). Puts are executed via
- <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#put%28org.apache.hadoop.hbase.client.Put%29">
- HTable.put</link> (writeBuffer) or <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#batch%28java.util.List%29">
- HTable.batch</link> (non-writeBuffer).
- </para>
+ <para><link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html">Put</link>
+ either adds new rows to a table (if the key is new) or updates existing rows (if the
+ key already exists). Puts are executed via <link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#put%28org.apache.hadoop.hbase.client.Put%29">
+ HTable.put</link> (writeBuffer) or <link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#batch%28java.util.List%29">
+ HTable.batch</link> (non-writeBuffer). </para>
</section>
- <section xml:id="scan">
- <title>Scans</title>
- <para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html">Scan</link> allow
- iteration over multiple rows for specified attributes.
- </para>
- <para>The following is an example of a
- on an HTable table instance. Assume that a table is populated with rows with keys "row1", "row2", "row3",
- and then another set of rows with the keys "abc1", "abc2", and "abc3". The following example shows how startRow and stopRow
- can be applied to a Scan instance to return the rows beginning with "row".
-<programlisting>
+ <section
+ xml:id="scan">
+ <title>Scans</title>
+ <para><link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html">Scan</link>
+ allows iteration over multiple rows for specified attributes.</para>
+ <para>The following is an example of a Scan on an HTable instance. Assume that a table is
+ populated with rows with keys "row1", "row2", "row3", and then another set of rows with
+ the keys "abc1", "abc2", and "abc3". The example shows how startRow and stopRow can be
+ applied to a Scan instance to return only the rows beginning with "row".</para>
+ <programlisting>
public static final byte[] CF = "cf".getBytes();
public static final byte[] ATTR = "attr".getBytes();
...
@@ -348,122 +448,121 @@ try {
rs.close(); // always close the ResultScanner!
}
</programlisting>
- </para>
- <para>Note that generally the easiest way to specify a specific stop point for a scan is by using the <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/InclusiveStopFilter.html">InclusiveStopFilter</link> class.
- </para>
- </section>
- <section xml:id="delete">
+ <para>Note that generally the easiest way to specify a specific stop point for a scan is by
+ using the <link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/InclusiveStopFilter.html">InclusiveStopFilter</link>
+ class. </para>
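The startRow/stopRow semantics (start inclusive, stop exclusive over byte-ordered keys) can be modeled over a sorted key list. This is a toy Python sketch, not the HBase client; the stop key `b"rox"` is a hypothetical value chosen because it sorts just past every key beginning with "row":

```python
# Toy sketch of a scan over byte-ordered row keys: startRow is
# inclusive, stopRow is exclusive, as with the HBase Scan API.
def scan(rows, start_row, stop_row):
    return [r for r in sorted(rows) if start_row <= r < stop_row]

rows = [b"row1", b"row2", b"row3", b"abc1", b"abc2", b"abc3"]
print(scan(rows, b"row", b"rox"))  # only the keys beginning with "row"
```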
+ </section>
+ <section
+ xml:id="delete">
<title>Delete</title>
- <para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Delete.html">Delete</link> removes
- a row from a table. Deletes are executed via
- <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#delete%28org.apache.hadoop.hbase.client.Delete%29">
- HTable.delete</link>.
- </para>
- <para>HBase does not modify data in place, and so deletes are handled by creating new markers called <emphasis>tombstones</emphasis>.
- These tombstones, along with the dead values, are cleaned up on major compactions.
- </para>
- <para>See <xref linkend="version.delete"/> for more information on deleting versions of columns, and see
- <xref linkend="compaction"/> for more information on compactions.
- </para>
+ <para><link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Delete.html">Delete</link>
+ removes a row from a table. Deletes are executed via <link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#delete%28org.apache.hadoop.hbase.client.Delete%29">
+ HTable.delete</link>. </para>
+ <para>HBase does not modify data in place, and so deletes are handled by creating new
+ markers called <emphasis>tombstones</emphasis>. These tombstones, along with the dead
+ values, are cleaned up on major compactions. </para>
+ <para>See <xref
+ linkend="version.delete" /> for more information on deleting versions of columns, and
+ see <xref
+ linkend="compaction" /> for more information on compactions. </para>
</section>
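The tombstone mechanics described above can be sketched with a toy single-column model: a delete appends a marker rather than modifying data in place, reads honor the marker, and a "major compaction" drops the tombstone together with the dead values. Purely illustrative Python; `ToyColumn` and its methods are invented here, not HBase internals:

```python
# Toy model of delete tombstones (illustrative only, not HBase code).
TOMBSTONE = object()

class ToyColumn:
    def __init__(self):
        self.versions = []  # (ts, value), kept newest first

    def put(self, ts, value):
        self.versions.append((ts, value))
        self.versions.sort(key=lambda v: v[0], reverse=True)

    def delete(self, ts):
        self.put(ts, TOMBSTONE)  # a tombstone is just another cell

    def get(self):
        # Newest first: a tombstone on top masks the older values.
        for ts, value in self.versions:
            return None if value is TOMBSTONE else value
        return None

    def major_compact(self):
        # Drop the tombstone and every older ("dead") value it covers.
        kept, dead = [], False
        for ts, value in self.versions:
            if value is TOMBSTONE:
                dead = True
            elif not dead:
                kept.append((ts, value))
        self.versions = kept

col = ToyColumn()
col.put(1, "v1")
col.put(2, "v2")
col.delete(3)
print(col.get())     # None: the tombstone masks the older values
col.major_compact()
print(col.versions)  # []: tombstone and dead values cleaned up
```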
</section>
- <section xml:id="versions">
+ <section
+ xml:id="versions">
<title>Versions<indexterm><primary>Versions</primary></indexterm></title>
- <para>A <emphasis>{row, column, version} </emphasis>tuple exactly
- specifies a <literal>cell</literal> in HBase. It's possible to have an
- unbounded number of cells where the row and column are the same but the
- cell address differs only in its version dimension.</para>
-
- <para>While rows and column keys are expressed as bytes, the version is
- specified using a long integer. Typically this long contains time
- instances such as those returned by
- <code>java.util.Date.getTime()</code> or
- <code>System.currentTimeMillis()</code>, that is: <quote>the difference,
- measured in milliseconds, between the current time and midnight, January
- 1, 1970 UTC</quote>.</para>
-
- <para>The HBase version dimension is stored in decreasing order, so that
- when reading from a store file, the most recent values are found
- first.</para>
-
- <para>There is a lot of confusion over the semantics of
- <literal>cell</literal> versions, in HBase. In particular, a couple
- questions that often come up are:<itemizedlist>
- <listitem>
- <para>If multiple writes to a cell have the same version, are all
- versions maintained or just the last?<footnote>
- <para>Currently, only the last written is fetchable.</para>
- </footnote></para>
- </listitem>
-
- <listitem>
- <para>Is it OK to write cells in a non-increasing version
- order?<footnote>
- <para>Yes</para>
- </footnote></para>
- </listitem>
- </itemizedlist></para>
-
- <para>Below we describe how the version dimension in HBase currently
- works<footnote>
+ <para>A <emphasis>{row, column, version}</emphasis> tuple exactly specifies a
+ <literal>cell</literal> in HBase. It's possible to have an unbounded number of cells where
+ the row and column are the same but the cell address differs only in its version
+ dimension.</para>
+
+ <para>While rows and column keys are expressed as bytes, the version is specified using a long
+ integer. Typically this long contains time instances such as those returned by
+ <code>java.util.Date.getTime()</code> or <code>System.currentTimeMillis()</code>, that is:
+ <quote>the difference, measured in milliseconds, between the current time and midnight,
+ January 1, 1970 UTC</quote>.</para>
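In Python terms, the version long described above corresponds to milliseconds since the epoch; a small illustrative equivalent of Java's `System.currentTimeMillis()` (the helper name here is invented for the sketch):

```python
# Illustrative: an HBase-style version timestamp is a long holding
# milliseconds since midnight, January 1, 1970 UTC.
import time

def current_time_millis():
    # Parallels Java's System.currentTimeMillis().
    return int(time.time() * 1000)

print(current_time_millis())
```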
+
+ <para>The HBase version dimension is stored in decreasing order, so that when reading from a
+ store file, the most recent values are found first.</para>
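The decreasing-order layout described above can be sketched in plain Java. This is a toy model of the ordering semantics only (the class, map type, and sample values are invented for illustration), not HBase internals:

```java
import java.util.Collections;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of one cell address: version (timestamp) -> value, sorted in
// DECREASING order so the newest version is encountered first, mirroring
// how HBase stores the version dimension in a store file.
public class VersionOrder {

    public static NavigableMap<Long, String> sampleCell() {
        NavigableMap<Long, String> versions =
                new TreeMap<>(Collections.reverseOrder());
        versions.put(1000L, "v1");
        versions.put(2000L, "v2");
        versions.put(3000L, "v3");
        return versions;
    }

    public static String newestValue(NavigableMap<Long, String> versions) {
        // With a reverse-ordered map, the first entry has the largest
        // timestamp, i.e. the most recent version is found first.
        return versions.firstEntry().getValue();
    }

    public static void main(String[] args) {
        System.out.println(newestValue(sampleCell())); // prints "v3"
    }
}
```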
+
+ <para>There is a lot of confusion over the semantics of <literal>cell</literal> versions, in
+ HBase. In particular, a couple questions that often come up are:</para>
+ <itemizedlist>
+ <listitem>
+ <para>If multiple writes to a cell have the same version, are all versions maintained or
+ just the last?<footnote>
+ <para>Currently, only the last written is fetchable.</para>
+ </footnote></para>
+ </listitem>
+
+ <listitem>
+ <para>Is it OK to write cells in a non-increasing version order?<footnote>
+ <para>Yes</para>
+ </footnote></para>
+ </listitem>
+ </itemizedlist>
+
+ <para>Below we describe how the version dimension in HBase currently works<footnote>
<para>See <link
- xlink:href="https://issues.apache.org/jira/browse/HBASE-2406">HBASE-2406</link>
- for discussion of HBase versions. <link
- xlink:href="http://outerthought.org/blog/417-ot.html">Bending time
- in HBase</link> makes for a good read on the version, or time,
- dimension in HBase. It has more detail on versioning than is
- provided here. As of this writing, the limiitation
- <emphasis>Overwriting values at existing timestamps</emphasis>
- mentioned in the article no longer holds in HBase. This section is
- basically a synopsis of this article by Bruno Dumon.</para>
+ xlink:href="https://issues.apache.org/jira/browse/HBASE-2406">HBASE-2406</link> for
+ discussion of HBase versions. <link
+ xlink:href="http://outerthought.org/blog/417-ot.html">Bending time in HBase</link>
+ makes for a good read on the version, or time, dimension in HBase. It has more detail on
+ versioning than is provided here. As of this writing, the limitation
+ <emphasis>Overwriting values at existing timestamps</emphasis> mentioned in the
+ article no longer holds in HBase. This section is basically a synopsis of this article
+ by Bruno Dumon.</para>
</footnote>.</para>
- <section xml:id="versions.ops">
+ <section
+ xml:id="versions.ops">
<title>Versions and HBase Operations</title>
- <para>In this section we look at the behavior of the version dimension
- for each of the core HBase operations.</para>
+ <para>In this section we look at the behavior of the version dimension for each of the core
+ HBase operations.</para>
<section>
<title>Get/Scan</title>
- <para>Gets are implemented on top of Scans. The below discussion of
- <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html">Get</link> applies equally to <link
- xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html">Scans</link>.</para>
+ <para>Gets are implemented on top of Scans. The below discussion of <link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html">Get</link>
+ applies equally to <link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html">Scans</link>.</para>
- <para>By default, i.e. if you specify no explicit version, when
- doing a <literal>get</literal>, the cell whose version has the
- largest value is returned (which may or may not be the latest one
- written, see later). The default behavior can be modified in the
- following ways:</para>
+ <para>By default, i.e. if you specify no explicit version, when doing a
+ <literal>get</literal>, the cell whose version has the largest value is returned
+ (which may or may not be the latest one written, see later). The default behavior can be
+ modified in the following ways:</para>
<itemizedlist>
<listitem>
<para>to return more than one version, see <link
- xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html#setMaxVersions()">Get.setMaxVersions()</link></para>
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html#setMaxVersions()">Get.setMaxVersions()</link></para>
</listitem>
<listitem>
<para>to return versions other than the latest, see <link
- xlink:href="???">Get.setTimeRange()</link></para>
+ xlink:href="???">Get.setTimeRange()</link></para>
- <para>To retrieve the latest version that is less than or equal
- to a given value, thus giving the 'latest' state of the record
- at a certain point in time, just use a range from 0 to the
- desired version and set the max versions to 1.</para>
+ <para>To retrieve the latest version that is less than or equal to a given value, thus
+ giving the 'latest' state of the record at a certain point in time, just use a range
+ from 0 to the desired version and set the max versions to 1.</para>
</listitem>
</itemizedlist>
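The time-range trick above ("latest state at a point in time") can be modeled in plain Java with a `NavigableMap`. This is a sketch of the read semantics only (the class and sample data are invented), not HBase client code; in real client code the equivalent would be a time range of `[0, t+1)` with max versions set to 1:

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of "latest state as of time t": among all versions with
// timestamp <= t, return the value with the largest timestamp.
public class LatestAsOf {

    public static String latestAsOf(TreeMap<Long, String> versions, long t) {
        // floorEntry finds the greatest key <= t, i.e. the newest
        // version that existed at time t.
        Map.Entry<Long, String> e = versions.floorEntry(t);
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        TreeMap<Long, String> versions = new TreeMap<>();
        versions.put(1000L, "v1");
        versions.put(2000L, "v2");
        versions.put(3000L, "v3");
        System.out.println(latestAsOf(versions, 2500L)); // prints "v2"
        System.out.println(latestAsOf(versions, 500L));  // prints "null"
    }
}
```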
</section>
- <section xml:id="default_get_example">
- <title>Default Get Example</title>
- <para>The following Get will only retrieve the current version of the row
-<programlisting>
+ <section
+ xml:id="default_get_example">
+ <title>Default Get Example</title>
+ <para>The following Get will only retrieve the current version of the row</para>
+ <programlisting>
public static final byte[] CF = "cf".getBytes();
public static final byte[] ATTR = "attr".getBytes();
...
@@ -471,12 +570,12 @@ Get get = new Get(Bytes.toBytes("row1"));
Result r = htable.get(get);
byte[] b = r.getValue(CF, ATTR); // returns current version of value
</programlisting>
- </para>
</section>
- <section xml:id="versioned_get_example">
- <title>Versioned Get Example</title>
- <para>The following Get will return the last 3 versions of the row.
-<programlisting>
+ <section
+ xml:id="versioned_get_example">
+ <title>Versioned Get Example</title>
+ <para>The following Get will return the last 3 versions of the row.</para>
+ <programlisting>
public static final byte[] CF = "cf".getBytes();
public static final byte[] ATTR = "attr".getBytes();
...
@@ -486,26 +585,25 @@ Result r = htable.get(get);
byte[] b = r.getValue(CF, ATTR); // returns current version of value
List<KeyValue> kv = r.getColumn(CF, ATTR); // returns all versions of this column
</programlisting>
- </para>
</section>
<section>
<title>Put</title>
- <para>Doing a put always creates a new version of a
- <literal>cell</literal>, at a certain timestamp. By default the
- system uses the server's <literal>currentTimeMillis</literal>, but
- you can specify the version (= the long integer) yourself, on a
- per-column level. This means you could assign a time in the past or
- the future, or use the long value for non-time purposes.</para>
-
- <para>To overwrite an existing value, do a put at exactly the same
- row, column, and version as that of the cell you would
- overshadow.</para>
- <section xml:id="implicit_version_example">
- <title>Implicit Version Example</title>
- <para>The following Put will be implicitly versioned by HBase with the current time.
-<programlisting>
+ <para>Doing a put always creates a new version of a <literal>cell</literal>, at a certain
+ timestamp. By default the system uses the server's <literal>currentTimeMillis</literal>,
+ but you can specify the version (= the long integer) yourself, on a per-column level.
+ This means you could assign a time in the past or the future, or use the long value for
+ non-time purposes.</para>
+
+ <para>To overwrite an existing value, do a put at exactly the same row, column, and
+ version as that of the cell you would overshadow.</para>
+ <section
+ xml:id="implicit_version_example">
+ <title>Implicit Version Example</title>
+ <para>The following Put will be implicitly versioned by HBase with the current
+ time.</para>
+ <programlisting>
public static final byte[] CF = "cf".getBytes();
public static final byte[] ATTR = "attr".getBytes();
...
@@ -513,12 +611,12 @@ Put put = new Put(Bytes.toBytes(row));
put.add(CF, ATTR, Bytes.toBytes( data));
htable.put(put);
</programlisting>
- </para>
</section>
- <section xml:id="explicit_version_example">
- <title>Explicit Version Example</title>
- <para>The following Put has the version timestamp explicitly set.
-<programlisting>
+ <section
+ xml:id="explicit_version_example">
+ <title>Explicit Version Example</title>
+ <para>The following Put has the version timestamp explicitly set.</para>
+ <programlisting>
public static final byte[] CF = "cf".getBytes();
public static final byte[] ATTR = "attr".getBytes();
...
@@ -527,62 +625,63 @@ long explicitTimeInMs = 555; // just an example
put.add(CF, ATTR, explicitTimeInMs, Bytes.toBytes(data));
htable.put(put);
</programlisting>
- Caution: the version timestamp is internally by HBase for things like time-to-live calculations.
- It's usually best to avoid setting this timestamp yourself. Prefer using a separate
- timestamp attribute of the row, or have the timestamp a part of the rowkey, or both.
- </para>
+ <para>Caution: the version timestamp is used internally by HBase for things like time-to-live
+ calculations. It's usually best to avoid setting this timestamp yourself. Prefer using
+ a separate timestamp attribute of the row, or have the timestamp a part of the rowkey,
+ or both. </para>
</section>
</section>
- <section xml:id="version.delete">
+ <section
+ xml:id="version.delete">
<title>Delete</title>
- <para>There are three different types of internal delete markers
- <footnote><para>See Lars Hofhansl's blog for discussion of his attempt
- adding another, <link xlink:href="http://hadoop-hbase.blogspot.com/2012/01/scanning-in-hbase.html">Scanning in HBase: Prefix Delete Marker</link></para></footnote>:
- <itemizedlist>
- <listitem><para>Delete: for a specific version of a column.</para>
+ <para>There are three different types of internal delete markers <footnote>
+ <para>See Lars Hofhansl's blog for discussion of his attempt adding another, <link
+ xlink:href="http://hadoop-hbase.blogspot.com/2012/01/scanning-in-hbase.html">Scanning
+ in HBase: Prefix Delete Marker</link></para>
+ </footnote>: </para>
+ <itemizedlist>
+ <listitem>
+ <para>Delete: for a specific version of a column.</para>
</listitem>
- <listitem><para>Delete column: for all versions of a column.</para>
+ <listitem>
+ <para>Delete column: for all versions of a column.</para>
</listitem>
- <listitem><para>Delete family: for all columns of a particular ColumnFamily</para>
+ <listitem>
+ <para>Delete family: for all columns of a particular ColumnFamily</para>
</listitem>
</itemizedlist>
- When deleting an entire row, HBase will internally create a tombstone for each ColumnFamily (i.e., not each individual column).
- </para>
- <para>Deletes work by creating <emphasis>tombstone</emphasis>
- markers. For example, let's suppose we want to delete a row. For
- this you can specify a version, or else by default the
- <literal>currentTimeMillis</literal> is used. What this means is
- <quote>delete all cells where the version is less than or equal to
- this version</quote>. HBase never modifies data in place, so for
- example a delete will not immediately delete (or mark as deleted)
- the entries in the storage file that correspond to the delete
- condition. Rather, a so-called <emphasis>tombstone</emphasis> is
- written, which will mask the deleted values<footnote>
- <para>When HBase does a major compaction, the tombstones are
- processed to actually remove the dead values, together with the
- tombstones themselves.</para>
- </footnote>. If the version you specified when deleting a row is
- larger than the version of any value in the row, then you can
- consider the complete row to be deleted.</para>
- <para>For an informative discussion on how deletes and versioning interact, see
- the thread <link xlink:href="http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/28421">Put w/ timestamp -> Deleteall -> Put w/ timestamp fails</link>
- up on the user mailing list.</para>
- <para>Also see <xref linkend="keyvalue"/> for more information on the internal KeyValue format.
- </para>
- <para>Delete markers are purged during the major compaction of store,
- unless the KEEP_DELETED_CELLS is set in the column family. In some
- scenarios, users want to keep the deletes for a time and you can set the
- delete TTL: hbase.hstore.time.to.purge.deletes in the configuration.
- If this delete TTL is not set, or set to 0, all delete markers including those
- with future timestamp are purged during the later major compaction.
- Otherwise, a delete marker is kept until the major compaction after
- marker's timestamp + delete TTL.
- </para>
+ <para>When deleting an entire row, HBase will internally create a tombstone for each
+ ColumnFamily (i.e., not each individual column). </para>
+ <para>Deletes work by creating <emphasis>tombstone</emphasis> markers. For example, let's
+ suppose we want to delete a row. For this you can specify a version, or else by default
+ the <literal>currentTimeMillis</literal> is used. What this means is <quote>delete all
+ cells where the version is less than or equal to this version</quote>. HBase never
+ modifies data in place, so for example a delete will not immediately delete (or mark as
+ deleted) the entries in the storage file that correspond to the delete condition.
+ Rather, a so-called <emphasis>tombstone</emphasis> is written, which will mask the
+ deleted values<footnote>
+ <para>When HBase does a major compaction, the tombstones are processed to actually
+ remove the dead values, together with the tombstones themselves.</para>
+ </footnote>. If the version you specified when deleting a row is larger than the version
+ of any value in the row, then you can consider the complete row to be deleted.</para>
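The masking behavior of a tombstone can be sketched in plain Java. This models the read-time rule only (delete at version d masks every stored version at or below d); the class and data are invented for illustration, and real HBase keeps the tombstone in store files until a major compaction processes it:

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of tombstone semantics for a single cell address: a delete
// at version `tombstone` masks every stored version <= tombstone,
// without the stored data being rewritten.
public class TombstoneModel {

    public static String read(NavigableMap<Long, String> versions, Long tombstone) {
        // Scan newest-first; skip anything at or below the tombstone version.
        for (Map.Entry<Long, String> e : versions.descendingMap().entrySet()) {
            if (tombstone == null || e.getKey() > tombstone) {
                return e.getValue();
            }
        }
        return null; // every version is masked: the cell appears deleted
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> v = new TreeMap<>();
        v.put(1000L, "v1");
        v.put(2000L, "v2");
        System.out.println(read(v, null));  // prints "v2"
        System.out.println(read(v, 1500L)); // prints "v2" (only v1 is masked)
        System.out.println(read(v, 2000L)); // prints "null" (fully masked)
    }
}
```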
+ <para>For an informative discussion on how deletes and versioning interact, see the thread <link
+ xlink:href="http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/28421">Put w/
+ timestamp -> Deleteall -> Put w/ timestamp fails</link> up on the user mailing
+ list.</para>
+ <para>Also see <xref
+ linkend="keyvalue" /> for more information on the internal KeyValue format. </para>
+ <para>Delete markers are purged during the major compaction of a store, unless
+ KEEP_DELETED_CELLS is set on the column family. In some scenarios, users want to keep
+ deletes for a time, and you can set the delete TTL via
+ hbase.hstore.time.to.purge.deletes in the configuration. If this delete TTL is not set,
+ or is set to 0, all delete markers, including those with a future timestamp, are purged
+ during the next major compaction. Otherwise, a delete marker is kept until the first
+ major compaction that occurs after the marker's timestamp plus the delete TTL. </para>
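The delete TTL named above is set in `hbase-site.xml`. A sketch of the property (the value shown is an arbitrary example and is assumed to be in milliseconds; check the documentation for your HBase version):

```xml
<!-- hbase-site.xml: retain delete markers for roughly one day past their
     timestamp before a major compaction may purge them. The property name
     comes from the text above; the value is an arbitrary example, assumed
     to be in milliseconds. -->
<property>
  <name>hbase.hstore.time.to.purge.deletes</name>
  <value>86400000</value>
</property>
```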
</section>
- </section>
+ </section>
<section>
<title>Current Limitations</title>
@@ -608,18 +707,18 @@ htable.put(put);
within the same millisecond.</para>
</section>
- <section>
+ <section
+ xml:id="major.compactions.change.query.results">
<title>Major compactions change query results</title>
-
- <para><quote>...create three cell versions at t1, t2 and t3, with a
- maximum-versions setting of 2. So when getting all versions, only
- the values at t2 and t3 will be returned. But if you delete the
- version at t2 or t3, the one at t1 will appear again. Obviously,
- once a major compaction has run, such behavior will not be the case
- anymore...<footnote>
+
+ <para><quote>...create three cell versions at t1, t2 and t3, with a maximum-versions
+ setting of 2. So when getting all versions, only the values at t2 and t3 will be
+ returned. But if you delete the version at t2 or t3, the one at t1 will appear again.
+ Obviously, once a major compaction has run, such behavior will not be the case anymore...<footnote>
<para>See <emphasis>Garbage Collection</emphasis> in <link
- xlink:href="http://outerthought.org/blog/417-ot.html">Bending
- time in HBase</link> </para>
+ xlink:href="http://outerthought.org/blog/417-ot.html">Bending time in
+ HBase</link>
+ </para>
</footnote></quote></para>
</section>
</section>
@@ -1452,7 +1551,7 @@ connection.close();</programlisting>
<para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FilterList.html">FilterList</link>
represents a list of Filters with a relationship of <code>FilterList.Operator.MUST_PASS_ALL</code> or
<code>FilterList.Operator.MUST_PASS_ONE</code> between the Filters. The following example shows an 'or' between two
- Filters (checking for either 'my value' or 'my other value' on the same attribute).
+ Filters (checking for either 'my value' or 'my other value' on the same attribute).</para>
<programlisting>
FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ONE);
SingleColumnValueFilter filter1 = new SingleColumnValueFilter(
@@ -1471,21 +1570,22 @@ SingleColumnValueFilter filter2 = new SingleColumnValueFilter(
list.add(filter2);
scan.setFilter(list);
</programlisting>
- </para>
</section>
</section>
- <section xml:id="client.filter.cv"><title>Column Value</title>
- <section xml:id="client.filter.cv.scvf"><title>SingleColumnValueFilter</title>
+ <section
+ xml:id="client.filter.cv">
+ <title>Column Value</title>
+ <section
+ xml:id="client.filter.cv.scvf">
+ <title>SingleColumnValueFilter</title>
<para><link
- xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/SingleColumnValueFilter.html"
- >SingleColumnValueFilter</link> can be used to test column values for equivalence
- (<code><link
- xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/CompareFilter.CompareOp.html"
- >CompareOp.EQUAL</link>
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/SingleColumnValueFilter.html">SingleColumnValueFilter</link>
+ can be used to test column values for equivalence (<code><link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/CompareFilter.CompareOp.html">CompareOp.EQUAL</link>
</code>), inequality (<code>CompareOp.NOT_EQUAL</code>), or ranges (e.g.,
<code>CompareOp.GREATER</code>). The following is an example of testing equivalence of a
- column to a String value "my value"...
- <programlisting>
+ column to a String value "my value"...</para>
+ <programlisting>
SingleColumnValueFilter filter = new SingleColumnValueFilter(
cf,
column,
@@ -1494,17 +1594,21 @@ SingleColumnValueFilter filter = new SingleColumnValueFilter(
);
scan.setFilter(filter);
</programlisting>
- </para>
</section>
</section>
- <section xml:id="client.filter.cvp"><title>Column Value Comparators</title>
- <para>There are several Comparator classes in the Filter package that deserve special mention.
- These Comparators are used in concert with other Filters, such as <xref linkend="client.filter.cv.scvf" />.
- </para>
- <section xml:id="client.filter.cvp.rcs"><title>RegexStringComparator</title>
- <para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/RegexStringComparator.html">RegexStringComparator</link>
- supports regular expressions for value comparisons.
-<programlisting>
+ <section
+ xml:id="client.filter.cvp">
+ <title>Column Value Comparators</title>
+ <para>There are several Comparator classes in the Filter package that deserve special
+ mention. These Comparators are used in concert with other Filters, such as <xref
+ linkend="client.filter.cv.scvf" />. </para>
+ <section
+ xml:id="client.filter.cvp.rcs">
+ <title>RegexStringComparator</title>
+ <para><link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/RegexStringComparator.html">RegexStringComparator</link>
+ supports regular expressions for value comparisons.</para>
+ <programlisting>
RegexStringComparator comp = new RegexStringComparator("my."); // any value that starts with 'my'
SingleColumnValueFilter filter = new SingleColumnValueFilter(
cf,
@@ -1514,14 +1618,18 @@ SingleColumnValueFilter filter = new SingleColumnValueFilter(
);
scan.setFilter(filter);
</programlisting>
- See the Oracle JavaDoc for <link xlink:href="http://download.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html">supported RegEx patterns in Java</link>.
- </para>
+ <para>See the Oracle JavaDoc for <link
+ xlink:href="http://download.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html">supported
+ RegEx patterns in Java</link>. </para>
</section>
- <section xml:id="client.filter.cvp.SubStringComparator"><title>SubstringComparator</title>
- <para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/SubstringComparator.html">SubstringComparator</link>
- can be used to determine if a given substring exists in a value. The comparison is case-insensitive.
- </para>
-<programlisting>
+ <section
+ xml:id="client.filter.cvp.SubStringComparator">
+ <title>SubstringComparator</title>
+ <para><link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/SubstringComparator.html">SubstringComparator</link>
+ can be used to determine if a given substring exists in a value. The comparison is
+ case-insensitive. </para>
+ <programlisting>
SubstringComparator comp = new SubstringComparator("y val"); // looking for 'my value'
SingleColumnValueFilter filter = new SingleColumnValueFilter(
cf,
@@ -1532,37 +1640,53 @@ SingleColumnValueFilter filter = new SingleColumnValueFilter(
scan.setFilter(filter);
</programlisting>
</section>
- <section xml:id="client.filter.cvp.bfp"><title>BinaryPrefixComparator</title>
- <para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/BinaryPrefixComparator.html">BinaryPrefixComparator</link>.</para>
+ <section
+ xml:id="client.filter.cvp.bfp">
+ <title>BinaryPrefixComparator</title>
+ <para>See <link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/BinaryPrefixComparator.html">BinaryPrefixComparator</link>.</para>
</section>
- <section xml:id="client.filter.cvp.bc"><title>BinaryComparator</title>
- <para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/BinaryComparator.html">BinaryComparator</link>.</para>
+ <section
+ xml:id="client.filter.cvp.bc">
+ <title>BinaryComparator</title>
+ <para>See <link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/BinaryComparator.html">BinaryComparator</link>.</para>
</section>
</section>
- <section xml:id="client.filter.kvm"><title>KeyValue Metadata</title>
- <para>As HBase stores data internally as KeyValue pairs, KeyValue Metadata Filters evaluate the existence of keys (i.e., ColumnFamily:Column qualifiers)
- for a row, as opposed to values the previous section.
- </para>
- <section xml:id="client.filter.kvm.ff"><title>FamilyFilter</title>
- <para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FamilyFilter.html">FamilyFilter</link> can be used
- to filter on the ColumnFamily. It is generally a better idea to select ColumnFamilies in the Scan than to do it with a Filter.</para>
+ <section
+ xml:id="client.filter.kvm">
+ <title>KeyValue Metadata</title>
+ <para>As HBase stores data internally as KeyValue pairs, KeyValue Metadata Filters evaluate
+ the existence of keys (i.e., ColumnFamily:Column qualifiers) for a row, as opposed to
+ values, as in the previous section. </para>
+ <section
+ xml:id="client.filter.kvm.ff">
+ <title>FamilyFilter</title>
+ <para><link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FamilyFilter.html">FamilyFilter</link>
+ can be used to filter on the ColumnFamily. It is generally a better idea to select
+ ColumnFamilies in the Scan than to do it with a Filter.</para>
</section>
- <section xml:id="client.filter.kvm.qf"><title>QualifierFilter</title>
- <para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/QualifierFilter.html">QualifierFilter</link> can be used
- to filter based on Column (aka Qualifier) name.
- </para>
+ <section
+ xml:id="client.filter.kvm.qf">
+ <title>QualifierFilter</title>
+ <para><link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/QualifierFilter.html">QualifierFilter</link>
+ can be used to filter based on Column (aka Qualifier) name. </para>
</section>
- <section xml:id="client.filter.kvm.cpf"><title>ColumnPrefixFilter</title>
- <para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/ColumnPrefixFilter.html">ColumnPrefixFilter</link> can be used
- to filter based on the lead portion of Column (aka Qualifier) names.
- </para>
- <para>A ColumnPrefixFilter seeks ahead to the first column matching the prefix in each row and for each involved column family. It can be used to efficiently
- get a subset of the columns in very wide rows.
- </para>
- <para>Note: The same column qualifier can be used in different column families. This filter returns all matching columns.
- </para>
- <para>Example: Find all columns in a row and family that start with "abc"
-<programlisting>
+ <section
+ xml:id="client.filter.kvm.cpf">
+ <title>ColumnPrefixFilter</title>
+ <para><link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/ColumnPrefixFilter.html">ColumnPrefixFilter</link>
+ can be used to filter based on the lead portion of Column (aka Qualifier) names. </para>
+ <para>A ColumnPrefixFilter seeks ahead to the first column matching the prefix in each row
+ and for each involved column family. It can be used to efficiently get a subset of the
+ columns in very wide rows. </para>
+ <para>Note: The same column qualifier can be used in different column families. This
+ filter returns all matching columns. </para>
+ <para>Example: Find all columns in a row and family that start with "abc"</para>
+ <programlisting>
HTableInterface t = ...;
byte[] row = ...;
byte[] family = ...;
@@ -1580,17 +1704,19 @@ for (Result r = rs.next(); r != null; r = rs.next()) {
}
rs.close();
</programlisting>
-</para>
</section>
- <section xml:id="client.filter.kvm.mcpf"><title>MultipleColumnPrefixFilter</title>
- <para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/MultipleColumnPrefixFilter.html">MultipleColumnPrefixFilter</link> behaves like ColumnPrefixFilter
- but allows specifying multiple prefixes.
- </para>
- <para>Like ColumnPrefixFilter, MultipleColumnPrefixFilter efficiently seeks ahead to the first column matching the lowest prefix and also seeks past ranges of columns between prefixes.
- It can be used to efficiently get discontinuous sets of columns from very wide rows.
- </para>
- <para>Example: Find all columns in a row and family that start with "abc" or "xyz"
-<programlisting>
+ <section
+ xml:id="client.filter.kvm.mcpf">
+ <title>MultipleColumnPrefixFilter</title>
+ <para><link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/MultipleColumnPrefixFilter.html">MultipleColumnPrefixFilter</link>
+ behaves like ColumnPrefixFilter but allows specifying multiple prefixes. </para>
+ <para>Like ColumnPrefixFilter, MultipleColumnPrefixFilter efficiently seeks ahead to the
+ first column matching the lowest prefix and also seeks past ranges of columns between
+ prefixes. It can be used to efficiently get discontinuous sets of columns from very wide
+ rows. </para>
+ <para>Example: Find all columns in a row and family that start with "abc" or "xyz"</para>
+ <programlisting>
HTableInterface t = ...;
byte[] row = ...;
byte[] family = ...;
@@ -1608,19 +1734,22 @@ for (Result r = rs.next(); r != null; r = rs.next()) {
}
rs.close();
</programlisting>
-</para>
</section>
- <section xml:id="client.filter.kvm.crf "><title>ColumnRangeFilter</title>
- <para>A <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/ColumnRangeFilter.html">ColumnRangeFilter</link> allows efficient intra row scanning.
- </para>
- <para>A ColumnRangeFilter can seek ahead to the first matching column for each involved column family. It can be used to efficiently
- get a 'slice' of the columns of a very wide row.
- i.e. you have a million columns in a row but you only want to look at columns bbbb-bbdd.
- </para>
- <para>Note: The same column qualifier can be used in different column families. This filter returns all matching columns.
- </para>
- <para>Example: Find all columns in a row and family between "bbbb" (inclusive) and "bbdd" (inclusive)
-<programlisting>
+ <section
+ xml:id="client.filter.kvm.crf">
+ <title>ColumnRangeFilter</title>
+ <para>A <link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/ColumnRangeFilter.html">ColumnRangeFilter</link>
+ allows efficient intra-row scanning. </para>
+ <para>A ColumnRangeFilter can seek ahead to the first matching column for each involved
+ column family. It can be used to efficiently get a 'slice' of the columns of a very wide
+ row. For example, you might have a million columns in a row but only want to look at
+ columns bbbb-bbdd. </para>
+ <para>Note: The same column qualifier can be used in different column families. This
+ filter returns all matching columns. </para>
+ <para>Example: Find all columns in a row and family between "bbbb" (inclusive) and "bbdd"
+ (inclusive)</para>
+ <programlisting>
HTableInterface t = ...;
byte[] row = ...;
byte[] family = ...;
@@ -1639,7 +1768,6 @@ for (Result r = rs.next(); r != null; r = rs.next()) {
}
rs.close();
</programlisting>
-</para>
<para>Note: Introduced in HBase 0.92</para>
</section>
</section>
@@ -2279,18 +2407,297 @@ myHtd.setValue(HTableDescriptor.SPLIT_POLICY, MyCustomSplitPolicy.class.getName(
</section>
</section>
- <section xml:id="compaction">
- <title>Compaction</title>
- <para>There are two types of compactions: minor and major. Minor compactions will usually pick up a couple of the smaller adjacent
- StoreFiles and rewrite them as one. Minors do not drop deletes or expired cells, only major compactions do this. Sometimes a minor compaction
- will pick up all the StoreFiles in the Store and in this case it actually promotes itself to being a major compaction.
- </para>
- <para>After a major compaction runs there will be a single StoreFile per Store, and this will help performance usually. Caution: major compactions rewrite all of the Stores data and on a loaded system, this may not be tenable;
- major compactions will usually have to be done manually on large systems. See <xref linkend="managed.compactions" />.
- </para>
- <para>Compactions will <emphasis>not</emphasis> perform region merges. See <xref linkend="ops.regionmgt.merge"/> for more information on region merging.
- </para>
- <section xml:id="compaction.file.selection">
+ <section
+ xml:id="compaction">
+ <title>Compaction</title>
+ <para><firstterm>Compaction</firstterm> is an operation which reduces the number of
+ StoreFiles, by merging them together, in order to increase performance on read
+ operations. Compactions can be resource-intensive to perform, and can either help or
+ hinder performance depending on many factors. </para>
+ <para>Compactions fall into two categories: minor and major.</para>
+ <para><firstterm>Minor compactions</firstterm> usually pick up a small number of small,
+ adjacent <systemitem>StoreFiles</systemitem> and rewrite them as a single
+ <systemitem>StoreFile</systemitem>. Minor compactions do not drop deletes or expired
+ cells. If a minor compaction picks up all the <systemitem>StoreFiles</systemitem> in a
+ <systemitem>Store</systemitem>, it promotes itself from a minor to a major compaction.
+ If there are a lot of small files to be compacted, the algorithm tends to favor minor
+ compactions to "clean up" those small files.</para>
+ <para>The goal of a <firstterm>major compaction</firstterm> is to end up with a single
+ StoreFile per store. Major compactions also process delete markers and max versions.
+ Attempting to process these during a minor compaction could cause side effects. </para>
+
+ <formalpara>
+ <title>Compaction and Deletions</title>
+ <para> When an explicit deletion occurs in HBase, the data is not actually deleted.
+ Instead, a <firstterm>tombstone</firstterm> marker is written. The tombstone marker
+ prevents the data from being returned with queries. During a major compaction, the
+ data is actually deleted, and the tombstone marker is removed from the StoreFile. If
+ the deletion happens because of an expired TTL, no tombstone is created. Instead, the
+ expired data is filtered out and is not written back to the compacted StoreFile.</para>
+ </formalpara>
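The tombstone semantics above can be sketched in plain Java. This is a hypothetical illustration of the behavior, not HBase's actual compaction code; the `Cell` class and `majorCompact` method are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: during a major compaction, cells covered by a
// tombstone are dropped, and the tombstone itself is not written back.
public class TombstoneSweep {
    static class Cell {
        final long ts;
        final boolean tombstone; // a delete marker rather than data
        Cell(long ts, boolean tombstone) { this.ts = ts; this.tombstone = tombstone; }
    }

    // Returns the timestamps of data cells that survive a major compaction.
    // A tombstone covers every cell with a timestamp at or below its own.
    static List<Long> majorCompact(List<Cell> cells) {
        long deleteAt = Long.MIN_VALUE;
        for (Cell c : cells) {
            if (c.tombstone && c.ts > deleteAt) deleteAt = c.ts;
        }
        List<Long> kept = new ArrayList<>();
        for (Cell c : cells) {
            if (!c.tombstone && c.ts > deleteAt) kept.add(c.ts);
        }
        return kept; // the tombstone itself is gone from the output
    }
}
```

Before the major compaction runs, a query simply skips the covered cells; only the compaction physically removes them and the marker.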
+
+ <formalpara>
+ <title>Compaction and Versions</title>
+ <para> When you create a column family, you can specify the maximum number of versions
+ to keep, by calling <varname>HColumnDescriptor.setMaxVersions(int
+ versions)</varname>. The default value is <literal>3</literal>. If more versions
+ than the specified maximum exist, the excess versions are filtered out and not written
+ back to the compacted StoreFile.</para>
+ </formalpara>
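The version-pruning behavior can be sketched as follows. This is a minimal illustration of the semantics, not HBase's implementation; the class and method names are invented for the example.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of max-versions filtering during a major compaction.
// Each long is the timestamp of one version of a single cell.
public class VersionFilter {
    // Keep only the newest maxVersions timestamps; older versions are
    // filtered out and not written back to the compacted StoreFile.
    static List<Long> keepNewest(List<Long> timestamps, int maxVersions) {
        return timestamps.stream()
            .sorted((a, b) -> Long.compare(b, a)) // newest first
            .limit(maxVersions)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // With the default of 3 versions, the oldest version (100) is dropped.
        System.out.println(keepNewest(List.of(100L, 400L, 200L, 300L), 3));
    }
}
```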
+
+ <note>
+ <title>Major Compactions Can Impact Query Results</title>
+ <para> In some situations, older versions can be inadvertently
+ resurrected if a newer version is explicitly deleted. See <xref
+ linkend="major.compactions.change.query.results" /> for a more in-depth explanation. This
+ situation is only possible before the compaction finishes.
+ </para>
+ </note>
+
+ <para>In theory, major compactions improve performance. However, on a highly loaded
+ system, major compactions can require an excessive amount of resources and adversely
+ affect performance. In a default configuration, major compactions are scheduled
+ automatically to run once in a 7-day period. This is often unsuitable for systems
+ in production. You can manage major compactions manually. See <xref
+ linkend="managed.compactions" />. </para>
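For example, time-based automatic major compactions can be switched off by setting the period to <literal>0</literal> (this property is described in the parameter table later in this section; the fragment below is a sketch of an override placed in the cluster configuration):

```xml
<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>0</value>
</property>
```

With this setting, major compactions run only when requested manually or when forced by size-based conditions.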
+ <para>Compactions do not perform region merges. See <xref
+ linkend="ops.regionmgt.merge" /> for more information on region merging. </para>
+ <section
+ xml:id="compaction.file.selection">
+ <title>Algorithm for Compaction File Selection - HBase 0.96.x and newer</title>
+ <para>The compaction algorithms used by HBase have evolved over time. HBase 0.96
+ introduced new algorithms for compaction file selection. To find out about the old
+ algorithms, see <xref
+ linkend="compaction" />. The rest of this section describes the new algorithm. File
+ selection happens in several phases and is controlled by several configurable
+ parameters. These parameters will be explained in context, and then will be given in a
+ table which shows their descriptions, defaults, and implications of changing
+ them.</para>
+
+ <formalpara>
+ <title>The <link
+ xlink:href="https://issues.apache.org/jira/browse/HBASE-7842">Exploring Compaction Policy</link></title>
+ <para><link
+ xlink:href="https://issues.apache.org/jira/browse/HBASE-7842">HBASE-7842</link>
+ was introduced in HBase 0.96 and represents a major change in the algorithms for
+ file selection for compactions. Its goal is to do the most impactful compaction with
+ the lowest cost, in situations where a lot of files need compaction. In such a
+ situation, the list of all eligible files is "explored", and files are grouped by
+ size before any ratio-based algorithms are run. This favors clean-up of large
+ numbers of small files before larger files are considered. For more details, refer
+ to the link to the JIRA. Most of the code for this change can be reviewed in
+ <filename>hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/ExploringCompactionPolicy.java</filename>.</para>
+ </formalpara>
+
+ <variablelist>
+ <title>Algorithms for Determining File List and Compaction Type</title>
+ <varlistentry>
+ <term>Create a list of all files which can possibly be compacted, ordered by
+ sequence ID.</term>
+ <listitem>
+ <para>The first phase is to create a list of all candidates for compaction. A list
+ is created of all StoreFiles not already in the compaction queue, and all files
+ newer than the newest file that is currently being compacted. This list of files
+ is ordered by the sequence ID. The sequence ID is generated when a Put is
+ appended to the write-ahead log (WAL), and is stored in the metadata of the
+ StoreFile.</para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>Check to see if major compaction is required because there are too many
+ StoreFiles and the memstore is too large.</term>
+ <listitem>
+ <para>A store can have no more than
+ <varname>hbase.hstore.blockingStoreFiles</varname> files. If
+ the store has too many files, you cannot flush data. In addition, you cannot
+ perform an insert if the memstore is over
+ <varname>hbase.hregion.memstore.flush.size</varname>. Normally, minor
+ compactions will alleviate this situation. However, if the normal compaction
+ algorithm does not find any normally-eligible StoreFiles, a major compaction is
+ the only way to get out of this situation, and is forced. This is also called a
+ size-based or size-triggered major compaction.</para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>If this compaction was user-requested, do a major compaction.</term>
+ <listitem>
+ <para>Compactions can run on a schedule or can be initiated manually. If a
+ compaction is requested manually, it is always a major compaction. If the
+ compaction is user-requested, the major compaction still happens even if there are
+ more than <varname>hbase.hstore.compaction.max</varname> files that need
+ compaction.</para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>Exclude files which are too large.</term>
+ <listitem>
+ <para>The purpose of compaction is to merge small files together, and it is
+ counterproductive to compact files which are too large. Files larger than
+ <varname>hbase.hstore.compaction.max.size</varname> are excluded from
+ consideration.</para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>If configured, exclude bulk-loaded files.</term>
+ <listitem>
+ <para>You may decide to exclude bulk-loaded files from compaction, in the bulk
+ load operation, by specifying the
+ <varname>hbase.mapreduce.hfileoutputformat.compaction.exclude</varname>
+ parameter. If a bulk-loaded file was excluded, it is removed from
+ consideration at this point.</para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>If there are too many files to compact, do a minor compaction.</term>
+ <listitem>
+ <para>The maximum number of files allowed in a major compaction is controlled by
+ the <varname>hbase.hstore.compaction.max</varname> parameter. If the list
+ contains more than this number of files, a compaction that would otherwise be a
+ major compaction is downgraded to a minor compaction. However, a user-requested
+ major compaction still occurs even if there are more than
+ <varname>hbase.hstore.compaction.max</varname> files to compact.</para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>Only run the compaction if enough files need to be compacted.</term>
+ <listitem>
+ <para>If the list contains fewer than
+ <varname>hbase.hstore.compaction.min</varname> files to compact, compaction is
+ aborted.</para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>If this is a minor compaction, determine which files are eligible, based upon
+ the <varname>hbase.hstore.compaction.ratio</varname>.</term>
+ <listitem>
+ <para>The value of the <varname>hbase.hstore.compaction.ratio</varname> parameter
+ is multiplied by the sum of files smaller than a given file, to determine
+ whether that file is selected for compaction during a minor compaction. For
+ instance, if hbase.hstore.compaction.ratio is 1.2, FileX is 5 MB, FileY is 2 MB,
+ and FileZ is 3 MB:</para>
+ <screen>5 &lt;= 1.2 x (2 + 3) or 5 &lt;= 6</screen>
+ <para>In this scenario, FileX is eligible for minor compaction. If FileX were 7
+ MB, it would not be eligible for minor compaction. This ratio favors smaller
+ files. You can configure a different ratio for use in off-peak hours, using the
+ parameter <varname>hbase.hstore.compaction.ratio.offpeak</varname>, if you also
+ configure <varname>hbase.offpeak.start.hour</varname> and
+ <varname>hbase.offpeak.end.hour</varname>.</para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>If major compactions are not managed manually, and it has been too long since
+ the last major compaction, run a major compaction anyway.</term>
+ <listitem>
+ <para>If the last major compaction was too long ago and there is more than one
+ file to be compacted, a major compaction is run, even if it would otherwise have
+ been minor. By default, the maximum time between major compactions is 7 days,
+ plus or minus a 4.8 hour period, and determined randomly within those
+ parameters. Prior to HBase 0.96, the major compaction period was 24 hours. This
+ is also referred to as a time-based or time-triggered major compaction. See
+ <varname>hbase.hregion.majorcompaction</varname> in the table below to tune or
+ disable time-based major compactions.</para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
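The ratio check in the minor-compaction step above can be sketched as a small piece of Java. This is a hypothetical illustration of the arithmetic only, not the actual ExploringCompactionPolicy code; the class and method names are invented for the example.

```java
import java.util.List;

// Hypothetical sketch of the ratio-based eligibility check for minor
// compaction. Sizes are in MB for readability.
public class RatioCheck {
    // A file is eligible when its size is at most ratio times the sum of
    // the sizes of the other candidate files in the selection.
    static boolean eligible(double fileSize, List<Double> otherSizes, double ratio) {
        double sum = 0;
        for (double s : otherSizes) {
            sum += s;
        }
        return fileSize <= ratio * sum;
    }

    public static void main(String[] args) {
        List<Double> others = List.of(2.0, 3.0); // FileY (2 MB) and FileZ (3 MB)
        // FileX at 5 MB: 5 <= 1.2 x (2 + 3), so it is eligible.
        System.out.println(eligible(5.0, others, 1.2)); // true
        // FileX at 7 MB: 7 > 6, so it is not eligible.
        System.out.println(eligible(7.0, others, 1.2)); // false
    }
}
```

Raising the ratio (as the off-peak setting does) makes larger files pass this check, so compactions become more aggressive.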
+ <table>
+ <title>Parameters Used by Compaction Algorithm</title>
+ <textobject>
+ <para>This table contains the main configuration parameters for compaction. This
+ list is not exhaustive. To tune these parameters from the defaults, edit the
+ <filename>hbase-site.xml</filename> file. For a full list of all
+ configuration parameters available, see <xref
+ linkend="config.files" />.</para>
+ </textobject>
+ <tgroup
+ cols="3">
+ <thead>
+ <row>
+ <entry>Parameter</entry>
+ <entry>Description</entry>
+ <entry>Default</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>hbase.hstore.compaction.min</entry>
+ <entry>The minimum number of files which must be eligible for compaction before
+ compaction can run.</entry>
+ <entry>3</entry>
+ </row>
+ <row>
+ <entry>hbase.hstore.compaction.max</entry>
+ <entry>The maximum number of files which will be selected for a single minor
+ compaction, regardless of the number of eligible files.</entry>
+ <entry>10</entry>
+ </row>
+ <row>
+ <entry>hbase.hstore.compaction.min.size</entry>
+ <entry>A StoreFile smaller than this size (in bytes) will always be eligible for
+ minor compaction.</entry>
+ <entry>128 MB (the value of hbase.hregion.memstore.flush.size)</entry>
+ </row>
+ <row>
+ <entry>hbase.hstore.compaction.max.size</entry>
+ <entry>A StoreFile larger than this size (in bytes) will be excluded from minor
+ compaction.</entry>
+ <entry>Long.MAX_VALUE</entry>
+ </row>
+ <row>
+ <entry>hbase.hstore.compaction.ratio</entry>
+ <entry>For minor compaction, this ratio is used to determine whether a given
+ file is eligible for compaction. Its effect is to limit compaction of large
+ files. Expressed as a floating-point decimal.</entry>
+ <entry>1.2F</entry>
+ </row>
+ <row>
+ <entry>hbase.hstore.compaction.ratio.offpeak</entry>
+ <entry>The compaction ratio used during off-peak compactions, if off-peak is
+ enabled. Expressed as a floating-point decimal. This allows for more
+ aggressive compaction, because in theory, the cluster is under less load.
+ Ignored if off-peak is disabled (default).</entry>
+ <entry>5.0F</entry>
+ </row>
+ <row>
+ <entry>hbase.offpeak.start.hour</entry>
+ <entry>The start of off-peak hours, expressed as an integer between 0 and 23,
+ inclusive. Set to <literal>-1</literal> to disable off-peak.</entry>
+ <entry>-1 (disabled)</entry>
+ </row>
+ <row>
+ <entry>hbase.offpeak.end.hour</entry>
+ <entry>The end of off-peak hours, expressed as an integer between 0 and 23,
+ inclusive. Set to <literal>-1</literal> to disable off-peak.</entry>
+ <entry>-1 (disabled)</entry>
+ </row>
+ <row>
+ <entry>hbase.regionserver.thread.compaction.throttle</entry>
+ <entry>Throttles compaction if too much of a backlog of compaction work
+ exists.</entry>
+ <entry>2 x hbase.hstore.compaction.max x hbase.hregion.memstore.flush.size
+ (which defaults to 128 MB)</entry>
+ </row>
+ <row>
+ <entry>hbase.hregion.majorcompaction</entry>
+ <entry>Time between major compactions, expressed in milliseconds. Set to 0 to
+ disable time-based automatic major compactions. User-requested and size-based
<TRUNCATED>