Posted to commits@hbase.apache.org by dm...@apache.org on 2011/10/15 23:31:49 UTC
svn commit: r1183727 [2/2] - /hbase/trunk/src/docbkx/book.xml
Modified: hbase/trunk/src/docbkx/book.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/book.xml?rev=1183727&r1=1183726&r2=1183727&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/book.xml (original)
+++ hbase/trunk/src/docbkx/book.xml Sat Oct 15 21:31:49 2011
@@ -64,1002 +64,998 @@
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="upgrading.xml" />
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="shell.xml" />
+ <chapter xml:id="datamodel">
+ <title>Data Model</title>
+ <para>In short, applications store data into an HBase table.
+ Tables are made of rows and columns.
+ All columns in HBase belong to a particular column family.
+ Table cells -- the intersection of row and column
+ coordinates -- are versioned.
+ A cell's content is an uninterpreted array of bytes.
+ </para>
+ <para>Table row keys are also byte arrays, so almost anything can
+ serve as a row key, from strings to binary representations of longs or
+ even serialized data structures. Rows in HBase tables
+ are sorted by row key. The sort is byte-ordered. All table accesses are
+ via the table row key -- its primary key.
+</para>
+ <section xml:id="conceptual.view"><title>Conceptual View</title>
+ <para>
+ The following example is a slightly modified form of the one on page
+ 2 of the <link xlink:href="http://labs.google.com/papers/bigtable.html">BigTable</link> paper.
+ There is a table called <varname>webtable</varname> that contains two column families named
+ <varname>contents</varname> and <varname>anchor</varname>.
+ In this example, <varname>anchor</varname> contains two
+ columns (<varname>anchor:cnnsi.com</varname>, <varname>anchor:my.look.ca</varname>)
+ and <varname>contents</varname> contains one column (<varname>contents:html</varname>).
+ <note>
+ <title>Column Names</title>
+ <para>
+ By convention, a column name is made of its column family prefix and a
+ <emphasis>qualifier</emphasis>. For example, the
+ column
+ <emphasis>contents:html</emphasis> is of the column family <varname>contents</varname>.
+ The colon character (<literal
+ moreinfo="none">:</literal>) delimits the column family from the
+ column family <emphasis>qualifier</emphasis>.
+ </para>
+ </note>
+ <table frame='all'><title>Table <varname>webtable</varname></title>
+ <tgroup cols='4' align='left' colsep='1' rowsep='1'>
+ <colspec colname='c1'/>
+ <colspec colname='c2'/>
+ <colspec colname='c3'/>
+ <colspec colname='c4'/>
+ <thead>
+ <row><entry>Row Key</entry><entry>Time Stamp</entry><entry>ColumnFamily <varname>contents</varname></entry><entry>ColumnFamily <varname>anchor</varname></entry></row>
+ </thead>
+ <tbody>
+ <row><entry>"com.cnn.www"</entry><entry>t9</entry><entry></entry><entry><varname>anchor:cnnsi.com</varname> = "CNN"</entry></row>
+ <row><entry>"com.cnn.www"</entry><entry>t8</entry><entry></entry><entry><varname>anchor:my.look.ca</varname> = "CNN.com"</entry></row>
+ <row><entry>"com.cnn.www"</entry><entry>t6</entry><entry><varname>contents:html</varname> = "<html>..."</entry><entry></entry></row>
+ <row><entry>"com.cnn.www"</entry><entry>t5</entry><entry><varname>contents:html</varname> = "<html>..."</entry><entry></entry></row>
+ <row><entry>"com.cnn.www"</entry><entry>t3</entry><entry><varname>contents:html</varname> = "<html>..."</entry><entry></entry></row>
+ </tbody>
+ </tgroup>
+ </table>
+ </para>
+ </section>
+ <section xml:id="physical.view"><title>Physical View</title>
+ <para>
+ Although at a conceptual level tables may be viewed as a sparse set of rows,
+ physically they are stored on a per-column family basis. New columns
+ (i.e., <varname>columnfamily:column</varname>) can be added to any
+ column family without pre-announcing them.
+ <table frame='all'><title>ColumnFamily <varname>anchor</varname></title>
+ <tgroup cols='3' align='left' colsep='1' rowsep='1'>
+ <colspec colname='c1'/>
+ <colspec colname='c2'/>
+ <colspec colname='c3'/>
+ <thead>
+ <row><entry>Row Key</entry><entry>Time Stamp</entry><entry>Column Family <varname>anchor</varname></entry></row>
+ </thead>
+ <tbody>
+ <row><entry>"com.cnn.www"</entry><entry>t9</entry><entry><varname>anchor:cnnsi.com</varname> = "CNN"</entry></row>
+ <row><entry>"com.cnn.www"</entry><entry>t8</entry><entry><varname>anchor:my.look.ca</varname> = "CNN.com"</entry></row>
+ </tbody>
+ </tgroup>
+ </table>
+ <table frame='all'><title>ColumnFamily <varname>contents</varname></title>
+ <tgroup cols='3' align='left' colsep='1' rowsep='1'>
+ <colspec colname='c1'/>
+ <colspec colname='c2'/>
+ <colspec colname='c3'/>
+ <thead>
+ <row><entry>Row Key</entry><entry>Time Stamp</entry><entry>ColumnFamily <varname>contents</varname></entry></row>
+ </thead>
+ <tbody>
+ <row><entry>"com.cnn.www"</entry><entry>t6</entry><entry><varname>contents:html</varname> = "<html>..."</entry></row>
+ <row><entry>"com.cnn.www"</entry><entry>t5</entry><entry><varname>contents:html</varname> = "<html>..."</entry></row>
+ <row><entry>"com.cnn.www"</entry><entry>t3</entry><entry><varname>contents:html</varname> = "<html>..."</entry></row>
+ </tbody>
+ </tgroup>
+ </table>
+ It is important to note in the tables above that the empty cells shown in the
+ conceptual view are not stored, since they need not be in a column-oriented
+ storage format. Thus a request for the value of the <varname>contents:html</varname>
+ column at time stamp <literal>t8</literal> would return no value. Similarly, a
+ request for an <varname>anchor:my.look.ca</varname> value at time stamp
+ <literal>t9</literal> would return no value. However, if no timestamp is
+ supplied, the most recent value for a particular column would be returned
+ and would also be the first one found since timestamps are stored in
+ descending order. Thus a request for the values of all columns in the row
+ <varname>com.cnn.www</varname>, if no timestamp is specified, would return:
+ the value of <varname>contents:html</varname> from time stamp
+ <literal>t6</literal>, the value of <varname>anchor:cnnsi.com</varname>
+ from time stamp <literal>t9</literal>, the value of
+ <varname>anchor:my.look.ca</varname> from time stamp <literal>t8</literal>.
+ </para>
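+ <para>
+ The lookup rules described for the tables above can be sketched with a toy in-memory model.
+ This is a minimal illustration in plain Java, not HBase code: the class
+ <classname>VersionedRow</classname>, its methods, the numeric timestamps (3 standing for
+ <literal>t3</literal>, and so on) and the placeholder values such as "html@t6" are all
+ assumptions made for the example.
+ </para>

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of a single row; illustrative only, not how HBase stores data.
final class VersionedRow {
    // column name -> (timestamp -> value), with the newest timestamp first
    private final Map<String, NavigableMap<Long, String>> columns = new HashMap<>();

    void put(String column, long timestamp, String value) {
        columns.computeIfAbsent(column,
                c -> new TreeMap<Long, String>(Comparator.reverseOrder()))
               .put(timestamp, value);
    }

    // Lookup at an exact timestamp: null when nothing was stored there.
    String get(String column, long timestamp) {
        NavigableMap<Long, String> versions = columns.get(column);
        return versions == null ? null : versions.get(timestamp);
    }

    // Lookup without a timestamp: the first entry is the most recent one.
    String getLatest(String column) {
        NavigableMap<Long, String> versions = columns.get(column);
        return (versions == null || versions.isEmpty())
                ? null
                : versions.firstEntry().getValue();
    }

    // The "com.cnn.www" row from the webtable example (placeholder html values).
    static VersionedRow webtableRow() {
        VersionedRow row = new VersionedRow();
        row.put("anchor:cnnsi.com", 9L, "CNN");
        row.put("anchor:my.look.ca", 8L, "CNN.com");
        row.put("contents:html", 6L, "html@t6");
        row.put("contents:html", 5L, "html@t5");
        row.put("contents:html", 3L, "html@t3");
        return row;
    }

    public static void main(String[] args) {
        VersionedRow row = webtableRow();
        System.out.println(row.get("contents:html", 8L));       // null: empty cell, nothing stored
        System.out.println(row.getLatest("contents:html"));     // html@t6
        System.out.println(row.getLatest("anchor:cnnsi.com"));  // CNN
    }
}
```

+ <para>
+ Keeping each column's versions in descending timestamp order is what makes the
+ "no timestamp supplied" case cheap in this sketch: the most recent value is simply the first entry.
+ </para>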
+ </section>
- <chapter xml:id="mapreduce">
- <title>HBase and MapReduce</title>
- <para>See <link xlink:href="http://hbase.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#package_description">HBase and MapReduce</link> up in javadocs.
- Start there. Below is some additional help.</para>
- <section xml:id="splitter">
- <title>Map-Task Spitting</title>
- <section xml:id="splitter.default">
- <title>The Default HBase MapReduce Splitter</title>
- <para>When <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableInputFormat.html">TableInputFormat</link>
- is used to source an HBase table in a MapReduce job,
- its splitter will make a map task for each region of the table.
- Thus, if there are 100 regions in the table, there will be
- 100 map-tasks for the job - regardless of how many column families are selected in the Scan.</para>
+ <section xml:id="table">
+ <title>Table</title>
+ <para>
+ Tables are declared up front at schema definition time.
+ </para>
</section>
- <section xml:id="splitter.custom">
- <title>Custom Splitters</title>
- <para>For those interested in implementing custom splitters, see the method <code>getSplits</code> in
- <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.html">TableInputFormatBase</link>.
- That is where the logic for map-task assignment resides.
- </para>
+
+ <section xml:id="row">
+ <title>Row</title>
+ <para>Row keys are uninterpreted bytes. Rows are
+ lexicographically sorted with the lowest order appearing first
+ in a table. The empty byte array is used to denote both the
+ start and end of a table's namespace.</para>
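+ <para>
+ Byte-ordered sorting can give surprising results for keys that look numeric. The following is a
+ minimal sketch in plain Java (no HBase classes), assuming each byte is compared as an unsigned
+ value; <classname>RowKeyOrder</classname> and its helper methods are hypothetical names used only
+ for illustration.
+ </para>

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.TreeSet;

// Illustrative only: models byte-ordered row key comparison with the JDK.
final class RowKeyOrder {

    // Compare two keys byte by byte, treating each byte as unsigned (0..255).
    // Requires Java 9 or later for Arrays.compareUnsigned.
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Big-endian 8-byte representation of a long.
    static byte[] longKey(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    public static void main(String[] args) {
        TreeSet<byte[]> keys = new TreeSet<>(RowKeyOrder::compare);
        for (String s : new String[] {"row2", "row10", "row1", "abc1"}) {
            keys.add(utf8(s));
        }
        // Iteration order: abc1, row1, row10, row2
        for (byte[] k : keys) {
            System.out.println(new String(k, StandardCharsets.UTF_8));
        }
        // Non-negative longs keep their numeric order in big-endian form...
        System.out.println(compare(longKey(1L), longKey(256L)) < 0);  // true
        // ...but a negative long sorts after a positive one (sign bit is set).
        System.out.println(compare(longKey(-1L), longKey(1L)) > 0);   // true
    }
}
```

+ <para>
+ Note that "row10" sorts before "row2" under this ordering; zero-padding numeric suffixes or
+ using fixed-width binary encodings are common ways to keep the sort order aligned with the numeric order.
+ </para>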
</section>
- </section>
- <section xml:id="mapreduce.example">
- <title>HBase MapReduce Examples</title>
- <section xml:id="mapreduce.example.read">
- <title>HBase MapReduce Read Example</title>
- <para>The following is an example of using HBase as a MapReduce source in read-only manner. Specifically,
- there is a Mapper instance but no Reducer, and nothing is being emitted from the Mapper. There job would be defined
- as follows...
- <programlisting>
-Configuration config = HBaseConfiguration.create();
-Job job = new Job(config, "ExampleRead");
-job.setJarByClass(MyReadJob.class); // class that contains mapper
-
-Scan scan = new Scan();
-scan.setCaching(500); // 1 is the default in Scan, which will be bad for MapReduce jobs
-scan.setCacheBlocks(false); // don't set to true for MR jobs
-// set other scan attrs
-...
-
-TableMapReduceUtil.initTableMapperJob(
- tableName, // input HBase table name
- scan, // Scan instance to control CF and attribute selection
- MyMapper.class, // mapper
- null, // mapper output key
- null, // mapper output value
- job);
-job.setOutputFormatClass(NullOutputFormat.class); // because we aren't emitting anything from mapper
-
-boolean b = job.waitForCompletion(true);
-if (!b) {
- throw new IOException("error with job!");
-}
- </programlisting>
- ...and the mapper instance would extend <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableMapper.html">TableMapper</link>...
- <programlisting>
-public static class MyMapper extends TableMapper<Text, Text> {
- public void map(ImmutableBytesWritable row, Result value, Context context) throws InterruptedException, IOException {
- // process data for the row from the Result instance.
- }
-}
- </programlisting>
- </para>
- </section>
- <section xml:id="mapreduce.example.readwrite">
- <title>HBase MapReduce Read/Write Example</title>
- <para>The following is an example of using HBase both as a source and as a sink with MapReduce.
- This example will simply copy data from one table to another.
- <programlisting>
-Configuration config = HBaseConfiguration.create();
-Job job = new Job(config,"ExampleReadWrite");
-job.setJarByClass(MyReadWriteJob.class); // class that contains mapper
-
-Scan scan = new Scan();
-scan.setCaching(500); // 1 is the default in Scan, which will be bad for MapReduce jobs
-scan.setCacheBlocks(false); // don't set to true for MR jobs
-// set other scan attrs
-
-TableMapReduceUtil.initTableMapperJob(
- sourceTable, // input table
- scan, // Scan instance to control CF and attribute selection
- MyMapper.class, // mapper class
- null, // mapper output key
- null, // mapper output value
- job);
-TableMapReduceUtil.initTableReducerJob(
- targetTable, // output table
- null, // reducer class
- job);
-job.setNumReduceTasks(0);
-
-boolean b = job.waitForCompletion(true);
-if (!b) {
- throw new IOException("error with job!");
-}
- </programlisting>
- An explanation is required of what <classname>TableMapReduceUtil</classname> is doing, especially with the reducer.
- <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.html">TableOutputFormat</link> is being used
- as the outputFormat class, and several parameters are being set on the config (e.g., TableOutputFormat.OUTPUT_TABLE), as
- well as setting the reducer output key to <classname>ImmutableBytesWritable</classname> and reducer value to <classname>Writable</classname>.
- These could be set by the programmer on the job and conf, but <classname>TableMapReduceUtil</classname> tries to make things easier.
- <para>The following is the example mapper, which will create a <classname>Put</classname> and matching the input <classname>Result</classname>
- and emit it. Note: this is what the CopyTable utility does.
- </para>
- <programlisting>
-public static class MyMapper extends TableMapper<ImmutableBytesWritable, Put> {
+ <section xml:id="columnfamily">
+ <title>Column Family<indexterm><primary>Column Family</primary></indexterm></title>
+ <para>
+ Columns in HBase are grouped into <emphasis>column families</emphasis>.
+ All column members of a column family have the same prefix. For example, the
+ columns <emphasis>courses:history</emphasis> and
+ <emphasis>courses:math</emphasis> are both members of the
+ <emphasis>courses</emphasis> column family.
+ The colon character (<literal
+ moreinfo="none">:</literal>) delimits the column family from the
+ column family <emphasis>qualifier</emphasis><indexterm><primary>Column Family Qualifier</primary></indexterm>.
+ The column family prefix must be composed of
+ <emphasis>printable</emphasis> characters. The qualifying tail, the
+ column family <emphasis>qualifier</emphasis>, can be made of any
+ arbitrary bytes. Column families must be declared up front
+ at schema definition time whereas columns do not need to be
+ defined at schema time but can be conjured on the fly while
+ the table is up and running.</para>
+ <para>Physically, all column family members are stored together on the
+ filesystem. Because tunings and
+ storage specifications are done at the column family level, it is
+ advised that all column family members have the same general access
+ pattern and size characteristics.</para>
- public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException {
- // this example is just copying the data from the source table...
- context.write(row, resultToPut(row,value));
- }
-
- private static Put resultToPut(ImmutableBytesWritable key, Result result) throws IOException {
- Put put = new Put(key.get());
- for (KeyValue kv : result.raw()) {
- put.add(kv);
- }
- return put;
- }
-}
- </programlisting>
- <para>There isn't actually a reducer step, so <classname>TableOutputFormat</classname> takes care of sending the <classname>Put</classname>
- to the target table.
- </para>
- <para>This is just an example, developers could choose not to use <classname>TableOutputFormat</classname> and connect to the
- target table themselves.
- </para>
- </para>
+ <para></para>
</section>
- <section xml:id="mapreduce.example.summary">
- <title>HBase MapReduce Summary Example</title>
- <para>The following example uses HBase as a MapReduce source and sink with a summarization step. This example will
- count the number of distinct instances of a value in a table and write those summarized counts in another table.
- <programlisting>
-Configuration config = HBaseConfiguration.create();
-Job job = new Job(config,"ExampleSummary");
-job.setJarByClass(MySummaryJob.class); // class that contains mapper and reducer
-
+ <section xml:id="cells">
+ <title>Cells<indexterm><primary>Cells</primary></indexterm></title>
+ <para>A <emphasis>{row, column, version} </emphasis>tuple exactly
+ specifies a <literal>cell</literal> in HBase.
+ Cell content is uninterpreted bytes.</para>
+ </section>
+ <section xml:id="data_model_operations">
+ <title>Data Model Operations</title>
+ <para>The four primary data model operations are Get, Put, Scan, and Delete. Operations are applied via
+ <link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/HTable.html">HTable</link> instances.
+ </para>
+ <section xml:id="get">
+ <title>Get</title>
+ <para><link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/Get.html">Get</link> returns
+ attributes for a specified row. Gets are executed via
+ <link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/HTable.html#get%28org.apache.hadoop.hbase.client.Get%29">
+ HTable.get</link>.
+ </para>
+ </section>
+ <section xml:id="put">
+ <title>Put</title>
+ <para><link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/Put.html">Put</link> either
+ adds new rows to a table (if the key is new) or can update existing rows (if the key already exists). Puts are executed via
+ <link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/HTable.html#put%28org.apache.hadoop.hbase.client.Put%29">
+ HTable.put</link> (writeBuffer) or <link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/HTable.html#batch%28java.util.List%29">
+ HTable.batch</link> (non-writeBuffer).
+ </para>
+ </section>
+ <section xml:id="scan">
+ <title>Scans</title>
+ <para><link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/Scan.html">Scan</link> allows
+ iteration over multiple rows for specified attributes.
+ </para>
+ <para>The following is an example of a Scan
+ on an HTable instance. Assume that a table is populated with rows with keys "row1", "row2", "row3",
+ and then another set of rows with the keys "abc1", "abc2", and "abc3". The following example shows how startRow and stopRow
+ can be applied to a Scan instance to return the rows beginning with "row".
+<programlisting>
+HTable htable = ... // instantiate HTable
+
Scan scan = new Scan();
-scan.setCaching(500); // 1 is the default in Scan, which will be bad for MapReduce jobs
-scan.setCacheBlocks(false); // don't set to true for MR jobs
-// set other scan attrs
-
-TableMapReduceUtil.initTableMapperJob(
- sourceTable, // input table
- scan, // Scan instance to control CF and attribute selection
- MyMapper.class, // mapper class
- Text.class, // mapper output key
- IntWritable.class, // mapper output value
- job);
-TableMapReduceUtil.initTableReducerJob(
- targetTable, // output table
- MyReducer.class, // reducer class
- job);
-job.setNumReduceTasks(1); // at least one, adjust as required
-
-boolean b = job.waitForCompletion(true);
-if (!b) {
- throw new IOException("error with job!");
-}
- </programlisting>
- In this example mapper a column with a String-value is chosen as the value to summarize upon.
- This value is used as the key to emit from the mapper, and an <classname>IntWritable</classname> represents an instance counter.
- <programlisting>
-public static class MyMapper extends TableMapper<Text, IntWritable> {
-
- private final IntWritable ONE = new IntWritable(1);
- private Text text = new Text();
-
- public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException {
- String val = new String(value.getValue(Bytes.toBytes("cf"), Bytes.toBytes("attr1")));
- text.set(val); // we can only emit Writables...
-
- context.write(text, ONE);
- }
-}
- </programlisting>
- In the reducer, the "ones" are counted (just like any other MR example that does this), and then emits a <classname>Put</classname>.
- <programlisting>
-public static class MyReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {
-
- public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
- int i = 0;
- for (IntWritable val : values) {
- i += val.get();
- }
- Put put = new Put(Bytes.toBytes(key.toString()));
- put.add(Bytes.toBytes("cf"), Bytes.toBytes("count"), Bytes.toBytes(i));
-
- context.write(null, put);
- }
+scan.addColumn(Bytes.toBytes("cf"),Bytes.toBytes("attr"));
+scan.setStartRow( Bytes.toBytes("row"));
+scan.setStopRow( Bytes.toBytes("rox"));  // stop row is exclusive; "rox" is the first key after every key starting with "row"
+for(Result result : htable.getScanner(scan)) {
+ // process Result instance
}
- </programlisting>
-
- </para>
+</programlisting>
+ </para>
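+ <para>
+ Choosing a stop row for a prefix scan can be sketched without any HBase classes. The following
+ plain-Java example assumes the start row is inclusive, the stop row is exclusive, and keys are
+ compared as unsigned bytes; <classname>PrefixRange</classname> and its methods are hypothetical
+ helpers, not part of the HBase client API.
+ </para>

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only: computes an exclusive stop key for a row key prefix.
final class PrefixRange {

    // Smallest key that sorts after every key starting with the prefix.
    // Returns an empty array (meaning "no upper bound") when the prefix is
    // empty or consists solely of 0xFF bytes.
    static byte[] stopKeyForPrefix(byte[] prefix) {
        for (int i = prefix.length - 1; i >= 0; i--) {
            if (prefix[i] != (byte) 0xFF) {
                byte[] stop = Arrays.copyOf(prefix, i + 1); // drop trailing 0xFF bytes
                stop[i]++;                                  // bump the last remaining byte
                return stop;
            }
        }
        return new byte[0];
    }

    // True when start <= key < stop, comparing bytes as unsigned values (Java 9+).
    static boolean inRange(byte[] key, byte[] start, byte[] stop) {
        return Arrays.compareUnsigned(key, start) >= 0
                && (stop.length == 0 || Arrays.compareUnsigned(key, stop) < 0);
    }

    public static void main(String[] args) {
        byte[] start = "row".getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopKeyForPrefix(start);
        System.out.println(new String(stop, StandardCharsets.UTF_8)); // rox
        for (String k : new String[] {"abc1", "row1", "row3", "rox"}) {
            // Only row1 and row3 fall inside the range.
            System.out.println(k + " -> "
                    + inRange(k.getBytes(StandardCharsets.UTF_8), start, stop));
        }
    }
}
```

+ <para>
+ Incrementing the last byte of the prefix, rather than appending a byte to it, is what keeps
+ keys such as "row1" inside the range while excluding "abc1".
+ </para>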
+ </section>
+ <section xml:id="delete">
+ <title>Delete</title>
+ <para><link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/Delete.html">Delete</link> removes
+ a row from a table. Deletes are executed via
+ <link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/HTable.html#delete%28org.apache.hadoop.hbase.client.Delete%29">
+ HTable.delete</link>.
+ </para>
+ </section>
+
</section>
- </section>
- <section xml:id="mapreduce.htable.access">
- <title>Accessing Other HBase Tables in a MapReduce Job</title>
- <para>Although the framework currently allows one HBase table as input to a
- MapReduce job, other HBase tables can
- be accessed as lookup tables, etc., in a
- MapReduce job via creating an HTable instance in the setup method of the Mapper.
- <programlisting>public class MyMapper extends TableMapper<Text, LongWritable> {
- private HTable myOtherTable;
- public void setup(Context context) {
- myOtherTable = new HTable("myOtherTable");
- }
-
- public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException {
- // process Result...
- // use 'myOtherTable' for lookups
- }
-
- </programlisting>
- </para>
- </section>
- <section xml:id="mapreduce.specex">
- <title>Speculative Execution</title>
- <para>It is generally advisable to turn off speculative execution for
- MapReduce jobs that use HBase as a source. This can either be done on a
- per-Job basis through properties, on on the entire cluster. Especially
- for longer running jobs, speculative execution will create duplicate
- map-tasks which will double-write your data to HBase; this is probably
- not what you want.
- </para>
- </section>
- </chapter>
- <chapter xml:id="schema">
- <title>HBase and Schema Design</title>
- <para>A good general introduction on the strength and weaknesses modelling on
- the various non-rdbms datastores is Ian Varleys' Master thesis,
- <link xlink:href="http://ianvarley.com/UT/MR/Varley_MastersReport_Full_2009-08-07.pdf">No Relation: The Mixed Blessings of Non-Relational Databases</link>.
- Recommended. Also, read <xref linkend="keyvalue"/> for how HBase stores data internally.
- </para>
- <section xml:id="schema.creation">
- <title>
- Schema Creation
- </title>
- <para>HBase schemas can be created or updated with <xref linkend="shell" />
- or by using <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html">HBaseAdmin</link> in the Java API.
- </para>
- <para>Tables must be disabled when making ColumnFamily modifications, for example..
- <programlisting>
-Configuration config = HBaseConfiguration.create();
-HBaseAdmin admin = new HBaseAdmin(conf);
-String table = "myTable";
+ <section xml:id="versions">
+ <title>Versions<indexterm><primary>Versions</primary></indexterm></title>
-admin.disableTable(table);
+ <para>A <emphasis>{row, column, version} </emphasis>tuple exactly
+ specifies a <literal>cell</literal> in HBase. It's possible to have an
+ unbounded number of cells where the row and column are the same but the
+ cell address differs only in its version dimension.</para>
-HColumnDescriptor cf1 = ...;
-admin.addColumn(table, cf1 ); // adding new ColumnFamily
-HColumnDescriptor cf2 = ...;
-admin.modifyColumn(table, cf2 ); // modifying existing ColumnFamily
+ <para>While rows and column keys are expressed as bytes, the version is
+ specified using a long integer. Typically this long contains time
+ instances such as those returned by
+ <code>java.util.Date.getTime()</code> or
+ <code>System.currentTimeMillis()</code>, that is: <quote>the difference,
+ measured in milliseconds, between the current time and midnight, January
+ 1, 1970 UTC</quote>.</para>
-admin.enableTable(table);
- </programlisting>
- </para>See <xref linkend="client_dependencies"/> for more information about configuring client connections.
- <para>
- </para>
- </section>
- <section xml:id="number.of.cfs">
- <title>
- On the number of column families
- </title>
- <para>
- HBase currently does not do well with anything above two or three column families so keep the number
- of column families in your schema low. Currently, flushing and compactions are done on a per Region basis so
- if one column family is carrying the bulk of the data bringing on flushes, the adjacent families
- will also be flushed though the amount of data they carry is small. Compaction is currently triggered
- by the total number of files under a column family. Its not size based. When many column families the
- flushing and compaction interaction can make for a bunch of needless i/o loading (To be addressed by
- changing flushing and compaction to work on a per column family basis).
- </para>
- <para>Try to make do with one column family if you can in your schemas. Only introduce a
- second and third column family in the case where data access is usually column scoped;
- i.e. you query one column family or the other but usually not both at the one time.
- </para>
- </section>
- <section xml:id="rowkey.design"><title>Rowkey Design</title>
- <section xml:id="timeseries">
- <title>
- Monotonically Increasing Row Keys/Timeseries Data
- </title>
- <para>
- In the HBase chapter of Tom White's book <link xlink:url="http://oreilly.com/catalog/9780596521981">Hadoop: The Definitive Guide</link> (O'Reilly) there is a an optimization note on watching out for a phenomenon where an import process walks in lock-step with all clients in concert pounding one of the table's regions (and thus, a single node), then moving onto the next region, etc. With monotonically increasing row-keys (i.e., using a timestamp), this will happen. See this comic by IKai Lan on why monotonically increasing row keys are problematic in BigTable-like datastores:
- <link xlink:href="http://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/">monotonically increasing values are bad</link>. The pile-up on a single region brought on
- by monotonically increasing keys can be mitigated by randomizing the input records to not be in sorted order, but in general its best to avoid using a timestamp or a sequence (e.g. 1, 2, 3) as the row-key.
- </para>
+ <para>The HBase version dimension is stored in decreasing order, so that
+ when reading from a store file, the most recent values are found
+ first.</para>
+ <para>There is a lot of confusion over the semantics of
+ <literal>cell</literal> versions in HBase. In particular, a couple of
+ questions that often come up are:<itemizedlist>
+ <listitem>
+ <para>If multiple writes to a cell have the same version, are all
+ versions maintained or just the last?<footnote>
+ <para>Currently, only the last written is fetchable.</para>
+ </footnote></para>
+ </listitem>
- <para>If you do need to upload time series data into HBase, you should
- study <link xlink:href="http://opentsdb.net/">OpenTSDB</link> as a
- successful example. It has a page describing the <link xlink:href=" http://opentsdb.net/schema.html">schema</link> it uses in
- HBase. The key format in OpenTSDB is effectively [metric_type][event_timestamp], which would appear at first glance to contradict the previous advice about not using a timestamp as the key. However, the difference is that the timestamp is not in the <emphasis>lead</emphasis> position of the key, and the design assumption is that there are dozens or hundreds (or more) of different metric types. Thus, even with a continual stream of input data with a mix of metric types, the Puts are distributed across various points of regions in the table.
- </para>
- </section>
- <section xml:id="keysize">
- <title>Try to minimize row and column sizes</title>
- <subtitle>Or why are my StoreFile indices large?</subtitle>
- <para>In HBase, values are always freighted with their coordinates; as a
- cell value passes through the system, it'll be accompanied by its
- row, column name, and timestamp - always. If your rows and column names
- are large, especially compared to the size of the cell value, then
- you may run up against some interesting scenarios. One such is
- the case described by Marc Limotte at the tail of
- <link xlink:url="https://issues.apache.org/jira/browse/HBASE-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005272#comment-13005272">HBASE-3551</link>
- (recommended!).
- Therein, the indices that are kept on HBase storefiles (<xref linkend="hfile" />)
- to facilitate random access may end up occupyng large chunks of the HBase
- allotted RAM because the cell value coordinates are large.
- Mark in the above cited comment suggests upping the block size so
- entries in the store file index happen at a larger interval or
- modify the table schema so it makes for smaller rows and column
- names.
- Compression will also make for larger indices. See
- the thread <link xlink:href="http://search-hadoop.com/m/hemBv1LiN4Q1/a+question+storefileIndexSize&subj=a+question+storefileIndexSize">a question storefileIndexSize</link>
- up on the user mailing list.
- </para>
- <para>Most of the time small inefficiencies don't matter all that much. Unfortunately,
- this is a case where they do. Whatever patterns are selected for ColumnFamilies, attributes, and rowkeys they could be repeated
- several billion times in your data. See <xref linkend="keyvalue"/> for more information on HBase stores data internally.</para>
- <section xml:id="keysize.cf"><title>Column Families</title>
- <para>Try to keep the ColumnFamily names as small as possible, preferably one character (e.g. "d" for data/default).
- </para>
- </section>
- <section xml:id="keysize.atttributes"><title>Attributes</title>
- <para>Although verbose attribute names (e.g., "myVeryImportantAttribute") are easier to read, prefer shorter attribute names (e.g., "via")
- to store in HBase.
- </para>
- </section>
- <section xml:id="keysize.row"><title>Rowkey Length</title>
- <para>Keep them as short as is reasonable such that they can still be useful for required data access (e.g., Get vs. Scan).
- A short key that is useless for data access is not better than a longer key with better get/scan properties. Expect tradeoffs
- when designing rowkeys.
- </para>
- </section>
- </section>
- <section xml:id="reverse.timestamp"><title>Reverse Timestamps</title>
- <para>A common problem in database processing is quickly finding the most recent version of a value. A technique using reverse timestamps
- as a part of the key can help greatly with a special case of this problem. Also found in the HBase chapter of Tom White's book Hadoop: The Definitive Guide (O'Reilly),
- the technique involves appending (<code>Long.MAX_VALUE - timestamp</code>) to the end of any key, e.g., [key][reverse_timestamp].
- </para>
- <para>The most recent value for [key] in a table can be found by performing a Scan for [key] and obtaining the first record. Since HBase keys
- are in sorted order, this key sorts before any older row-keys for [key] and thus is first.
- </para>
- <para>This technique would be used instead of using <xref linkend="schema.versions">HBase Versioning</xref> where the intent is to hold onto all versions
- "forever" (or a very long time) and at the same time quickly obtain access to any other version by using the same Scan technique.
- </para>
- </section>
- <section xml:id="rowkey.scope">
- <title>Rowkeys and ColumnFamilies</title>
- <para>Rowkeys are scoped to ColumnFamilies. Thus, the same rowkey could exist in each ColumnFamily that exists in a table without collision.
- </para>
- </section>
- <section xml:id="changing.rowkeys"><title>Immutability of Rowkeys</title>
- <para>Rowkeys cannot be changed. The only way they can be "changed" in a table is if the row is deleted and then re-inserted.
- This is a fairly common question on the HBase dist-list so it pays to get the rowkeys right the first time (and/or before you've
- inserted a lot of data).
- </para>
- </section>
- </section> <!-- rowkey design -->
- <section xml:id="schema.versions">
- <title>
- Number of Versions
- </title>
- <section xml:id="schema.versions.max"><title>Maximum Number of Versions</title>
- <para>The maximum number of row versions to store is configured per column
- family via <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link>.
- The default for max versions is 3.
- This is an important parameter because as described in <xref linkend="datamodel" />
- section HBase does <emphasis>not</emphasis> overwrite row values, but rather
- stores different values per row by time (and qualifier). Excess versions are removed during major
- compactions. The number of max versions may need to be increased or decreased depending on application needs.
- </para>
- <para>Setting the number of max versions to an exceedingly high level (e.g., hundreds or more) is not recommended unless those old values are
- very dear to you because this will greatly increase StoreFile size.
- </para>
- </section>
- <section xml:id="schema.minversions">
- <title>
- Minimum Number of Versions
- </title>
- <para>Like number of max row versions, the minimum number of row versions to keep is configured per column
- family via <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link>.
- The default for min versions is 0, which means the feature is disabled.
- The minimum number of row versions parameter is used together with the time-to-live parameter and can be combined with the
- number of row versions parameter to allow configurations such as
- "keep the last T minutes worth of data, at most N versions, <emphasis>but keep at least M versions around</emphasis>"
- (where M is the value for minimum number of row versions, M<=N).
- This parameter should only be set when time-to-live is enabled for a column family and must be less or equal to the
- number of row versions.
- </para>
- </section>
- </section>
- <section xml:id="supported.datatypes">
- <title>
- Supported Datatypes
- </title>
- <para>HBase supports a "bytes-in/bytes-out" interface via <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html">Put</link> and
- <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html">Result</link>, so anything that can be
- converted to an array of bytes can be stored as a value. Input could be strings, numbers, complex objects, or even images as long as they can be rendered as bytes.
- </para>
- <para>There are practical limits to the size of values (e.g., storing 10-50MB objects in HBase would probably be too much to ask);
- search the mailing list for conversations on this topic. All rows in HBase conform to the <xref linkend="datamodel">datamodel</xref>, and
- that includes versioning. Take that into consideration when making your design, as well as block size for the ColumnFamily.
- </para>
- <section xml:id="counters">
- <title>Counters</title>
- <para>
- One supported datatype that deserves special mention is "counters" (i.e., the ability to do atomic increments of numbers). See
- <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#increment%28org.apache.hadoop.hbase.client.Increment%29">Increment</link> in HTable.
- </para>
- <para>Synchronization on counters is done on the RegionServer, not in the client.
- </para>
- </section>
- </section>
- <section xml:id="cf.in.memory">
- <title>
- In-Memory ColumnFamilies
- </title>
- <para>ColumnFamilies can optionally be defined as in-memory. Data is still persisted to disk, just like any other ColumnFamily.
- In-memory blocks have the highest priority in the <xref linkend="block.cache" />, but it is not a guarantee that the entire table
- will be in memory.
- </para>
- <para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link> for more information.
- </para>
- </section>
- <section xml:id="ttl">
- <title>Time To Live (TTL)</title>
- <para>ColumnFamilies can set a TTL length in seconds, and HBase will automatically delete rows once the expiration time is reached.
- This applies to <emphasis>all</emphasis> versions of a row - even the current one. The TTL time encoded in the HBase for the row is specified in UTC.
- </para>
- <para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link> for more information.
- </para>
- </section>
- <section xml:id="schema.bloom">
- <title>Bloom Filters</title>
- <para>Bloom Filters can be enabled per-ColumnFamily.
- Use <code>HColumnDescriptor.setBloomFilterType(NONE | ROW |
- ROWCOL)</code> to enable blooms per Column Family. Default =
- <varname>NONE</varname> for no bloom filters. If
- <varname>ROW</varname>, the hash of the row will be added to the bloom
- on each insert. If <varname>ROWCOL</varname>, the hash of the row +
- column family + column family qualifier will be added to the bloom on
- each key insert.</para>
- <para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link> and
- <xref linkend="blooms"/> for more information.
- </para>
- </section>
- <section xml:id="secondary.indexes">
- <title>
- Secondary Indexes and Alternate Query Paths
- </title>
- <para>This section could also be titled "what if my table rowkey looks like <emphasis>this</emphasis> but I also want to query my table like <emphasis>that</emphasis>."
- A common example on the dist-list is where a row-key is of the format "user-timestamp" but there are reporting requirements on activity across users for certain
- time ranges. Thus, selecting by user is easy because it is in the lead position of the key, but time is not.
- </para>
- <para>There is no single answer on the best way to handle this because it depends on...
- <itemizedlist>
- <listitem>Number of users</listitem>
- <listitem>Data size and data arrival rate</listitem>
- <listitem>Flexibility of reporting requirements (e.g., completely ad-hoc date selection vs. pre-configured ranges) </listitem>
- <listitem>Desired execution speed of query (e.g., 90 seconds may be reasonable to some for an ad-hoc report, whereas it may be too long for others) </listitem>
- </itemizedlist>
- ... and solutions are also influenced by the size of the cluster and how much processing power you have to throw at the solution.
- Common techniques are in sub-sections below. This is a comprehensive, but not exhaustive, list of approaches.
- </para>
- <para>It should not be a surprise that secondary indexes require additional cluster space and processing.
- This is precisely what happens in an RDBMS because the act of creating an alternate index requires both space and processing cycles to update. RDBMS products
- are more advanced in this regard to handle alternative index management out of the box. However, HBase scales better at larger data volumes, so this is a feature trade-off.
- </para>
- <para>Pay attention to <xref linkend="performance"/> when implementing any of these approaches.</para>
- <para>Additionally, see the David Butler response in this dist-list thread <link xlink:href="http://search-hadoop.com/m/nvbiBp2TDP/Stargate%252Bhbase&subj=Stargate+hbase">HBase, mail # user - Stargate+hbase</link>
- </para>
- <section xml:id="secondary.indexes.filter">
- <title>
- Filter Query
- </title>
- <para>Depending on the case, it may be appropriate to use <xref linkend="client.filter"/>. In this case, no secondary index is created.
- However, don't try a full-scan on a large table like this from an application (i.e., single-threaded client).
- </para>
- </section>
- <section xml:id="secondary.indexes.periodic">
- <title>
- Periodic-Update Secondary Index
- </title>
- <para>A secondary index could be created in another table which is periodically updated via a MapReduce job. The job could be executed intra-day, but depending on
- load-strategy it could still potentially be out of sync with the main data table.</para>
- <para>See <xref linkend="mapreduce.example.readwrite"/> for more information.</para>
- </section>
- <section xml:id="secondary.indexes.dualwrite">
- <title>
- Dual-Write Secondary Index
- </title>
- <para>Another strategy is to build the secondary index while publishing data to the cluster (e.g., write to data table, write to index table).
- If this approach is taken after a data table already exists, then bootstrapping will be needed for the secondary index with a MapReduce job (see <xref linkend="secondary.indexes.periodic"/>).</para>
- </section>
- <section xml:id="secondary.indexes.summary">
- <title>
- Summary Tables
- </title>
- <para>Where time-ranges are very wide (e.g., year-long report) and where the data is voluminous, summary tables are a common approach.
- These would be generated with MapReduce jobs into another table.</para>
- <para>See <xref linkend="mapreduce.example.summary"/> for more information.</para>
- </section>
- <section xml:id="secondary.indexes.coproc">
- <title>
- Coprocessor Secondary Index
- </title>
- <para>Coprocessors act like RDBMS triggers. These are currently on TRUNK.
- </para>
- </section>
- </section>
- <section xml:id="schema.smackdown"><title>Schema Design Smackdown</title>
- <para>This section will describe common schema design questions that appear on the dist-list. These are
- general guidelines and not laws - each application must consider its own needs.
- </para>
- <section xml:id="schema.smackdown.rowsversions"><title>Rows vs. Versions</title>
- <para>A common question is whether one should prefer rows or HBase's built-in-versioning. The context is typically where there are
- "a lot" of versions of a row to be retained (e.g., where it is significantly above the HBase default of 3 max versions). The
- rows-approach would require storing a timestamp in some portion of the rowkey so that they would not overwrite with each successive update.
- </para>
- <para>Preference: Rows (generally speaking).
- </para>
+ <listitem>
+ <para>Is it OK to write cells in a non-increasing version
+ order?<footnote>
+ <para>Yes</para>
+ </footnote></para>
+ </listitem>
+ </itemizedlist></para>
+
+ <para>Below we describe how the version dimension in HBase currently
+ works<footnote>
+ <para>See <link
+ xlink:href="https://issues.apache.org/jira/browse/HBASE-2406">HBASE-2406</link>
+ for discussion of HBase versions. <link
+ xlink:href="http://outerthought.org/blog/417-ot.html">Bending time
+ in HBase</link> makes for a good read on the version, or time,
+ dimension in HBase. It has more detail on versioning than is
+ provided here. As of this writing, the limitation
+ <emphasis>Overwriting values at existing timestamps</emphasis>
+ mentioned in the article no longer holds in HBase. This section is
+ basically a synopsis of this article by Bruno Dumon.</para>
+ </footnote>.</para>
+
+ <section xml:id="versions.ops">
+ <title>Versions and HBase Operations</title>
+
+ <para>In this section we look at the behavior of the version dimension
+ for each of the core HBase operations.</para>
+
+ <section>
+ <title>Get/Scan</title>
+
+ <para>Gets are implemented on top of Scans. The below discussion of
+ <link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/Get.html">Get</link> applies equally to <link
+ xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/Scan.html">Scans</link>.</para>
+
+ <para>By default, i.e. if you specify no explicit version, when
+ doing a <literal>get</literal>, the cell whose version has the
+ largest value is returned (which may or may not be the latest one
+ written, see later). The default behavior can be modified in the
+ following ways:</para>
+
+ <itemizedlist>
+ <listitem>
+ <para>to return more than one version, see <link
+ xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/Get.html#setMaxVersions()">Get.setMaxVersions()</link></para>
+ </listitem>
+
+ <listitem>
+ <para>to return versions other than the latest, see <link
+ xlink:href="???">Get.setTimeRange()</link></para>
+
+ <para>To retrieve the latest version that is less than or equal
+ to a given value, thus giving the 'latest' state of the record
+ at a certain point in time, just use a range from 0 to the
+ desired version and set the max versions to 1.</para>
+ </listitem>
+ </itemizedlist>
+
+ </section>
+ <section xml:id="default_get_example">
+ <title>Default Get Example</title>
+ <para>The following Get will only retrieve the current version of the row:
+<programlisting>
+Get get = new Get(Bytes.toBytes("row1"));
+Result r = htable.get(get);
+byte[] b = r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("attr")); // returns current version of value
+</programlisting>
+ </para>
+ </section>
+ <section xml:id="versioned_get_example">
+ <title>Versioned Get Example</title>
+ <para>The following Get will return the last 3 versions of the row.
+<programlisting>
+Get get = new Get(Bytes.toBytes("row1"));
+get.setMaxVersions(3); // will return last 3 versions of row
+Result r = htable.get(get);
+byte[] b = r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("attr")); // returns current version of value
+List<KeyValue> kv = r.getColumn(Bytes.toBytes("cf"), Bytes.toBytes("attr")); // returns all versions of this column
+</programlisting>
+ </para>
+ </section>
+
+ <section>
+ <title>Put</title>
+
+ <para>Doing a put always creates a new version of a
+ <literal>cell</literal>, at a certain timestamp. By default the
+ system uses the server's <literal>currentTimeMillis</literal>, but
+ you can specify the version (= the long integer) yourself, on a
+ per-column level. This means you could assign a time in the past or
+ the future, or use the long value for non-time purposes.</para>
+
+ <para>To overwrite an existing value, do a put at exactly the same
+ row, column, and version as that of the cell you would
+ overshadow.</para>
+ <section xml:id="implicit_version_example">
+ <title>Implicit Version Example</title>
+ <para>The following Put will be implicitly versioned by HBase with the current time.
+<programlisting>
+Put put = new Put(Bytes.toBytes(row));
+put.add(Bytes.toBytes("cf"), Bytes.toBytes("attr1"), Bytes.toBytes( data));
+htable.put(put);
+</programlisting>
+ </para>
+ </section>
+ <section xml:id="explicit_version_example">
+ <title>Explicit Version Example</title>
+ <para>The following Put has the version timestamp explicitly set.
+<programlisting>
+Put put = new Put( Bytes.toBytes(row));
+long explicitTimeInMs = 555; // just an example
+put.add(Bytes.toBytes("cf"), Bytes.toBytes("attr1"), explicitTimeInMs, Bytes.toBytes(data));
+htable.put(put);
+</programlisting>
+ Caution: the version timestamp is used internally by HBase for things like time-to-live calculations.
+ It's usually best to avoid setting the timestamp yourself.
+ </para>
+ </section>
+
+ </section>
+
+ <section>
+ <title>Delete</title>
+
+ <para>When performing a delete operation in HBase, there are two
+ ways to specify the versions to be deleted:</para>
+
+ <itemizedlist>
+ <listitem>
+ <para>Delete all versions older than a certain timestamp</para>
+ </listitem>
+
+ <listitem>
+ <para>Delete the version at a specific timestamp</para>
+ </listitem>
+ </itemizedlist>
+
+ <para>A delete can apply to a complete row, a complete column
+ family, or to just one column. It is only in the last case that you
+ can delete explicit versions. For the deletion of a row or all the
+ columns within a family, it always works by deleting all cells older
+ than a certain version.</para>
+
+ <para>Deletes work by creating <emphasis>tombstone</emphasis>
+ markers. For example, let's suppose we want to delete a row. For
+ this you can specify a version, or else by default the
+ <literal>currentTimeMillis</literal> is used. What this means is
+ <quote>delete all cells where the version is less than or equal to
+ this version</quote>. HBase never modifies data in place, so for
+ example a delete will not immediately delete (or mark as deleted)
+ the entries in the storage file that correspond to the delete
+ condition. Rather, a so-called <emphasis>tombstone</emphasis> is
+ written, which will mask the deleted values<footnote>
+ <para>When HBase does a major compaction, the tombstones are
+ processed to actually remove the dead values, together with the
+ tombstones themselves.</para>
+ </footnote>. If the version you specified when deleting a row is
+ larger than the version of any value in the row, then you can
+ consider the complete row to be deleted.</para>
+ </section>
+ </section>
+
+ <section>
+ <title>Current Limitations</title>
+
+ <para>There are still some bugs (or at least 'undecided behavior')
+ with the version dimension that will be addressed by later HBase
+ releases.</para>
+
+ <section>
+ <title>Deletes mask Puts</title>
+
+ <para>Deletes mask puts, even puts that happened after the delete
+ was entered<footnote>
+ <para><link
+ xlink:href="https://issues.apache.org/jira/browse/HBASE-2256">HBASE-2256</link></para>
+ </footnote>. Remember that a delete writes a tombstone, which only
+ disappears after the next major compaction has run. Suppose you do
+ a delete of everything <= T. After this you do a new put with a
+ timestamp <= T. This put, even if it happened after the delete,
+ will be masked by the delete tombstone. Performing the put will not
+ fail, but when you do a get you will notice the put had no
+ effect. It will start working again after the major compaction has
+ run. These issues should not be a problem if you use
+ always-increasing versions for new puts to a row. But they can occur
+ even if you do not care about time: just do delete and put
+ immediately after each other, and there is some chance they happen
+ within the same millisecond.</para>
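The masking behavior described above can be sketched with a small toy model. This is not HBase code: the class `TombstoneModel` and its methods are hypothetical names invented for illustration, and the model tracks a single cell with one row-level tombstone. It assumes only what the text states, namely that a delete records a marker covering every version less than or equal to its timestamp, and that a major compaction removes both the masked values and the marker.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of one cell; an illustration of the semantics, not the HBase implementation.
public class TombstoneModel {
    // version (timestamp) -> value for a single cell
    private final TreeMap<Long, String> versions = new TreeMap<>();
    // highest delete timestamp seen so far; null means no tombstone is present
    private Long tombstone = null;

    public void put(long version, String value) {
        versions.put(version, value);
    }

    // Models "delete everything <= version": nothing is removed, a marker is recorded.
    public void delete(long version) {
        if (tombstone == null || version > tombstone) {
            tombstone = version;
        }
    }

    // Returns the value with the largest version not masked by the tombstone, or null.
    public String get() {
        Map.Entry<Long, String> newest = versions.lastEntry();
        if (newest == null) return null;
        if (tombstone != null && newest.getKey() <= tombstone) return null;
        return newest.getValue();
    }

    // Models a major compaction: masked versions and the tombstone are both dropped.
    public void majorCompact() {
        if (tombstone != null) {
            versions.headMap(tombstone, true).clear();
            tombstone = null;
        }
    }

    public static void main(String[] args) {
        TombstoneModel cell = new TombstoneModel();
        cell.put(5L, "old");
        cell.delete(10L);
        cell.put(7L, "late");            // issued after the delete, but 7 <= 10
        System.out.println(cell.get());  // prints "null": the put is masked
        cell.majorCompact();
        cell.put(7L, "again");
        System.out.println(cell.get());  // prints "again"
    }
}
```

In this model, as in the text, using always-increasing versions for new puts avoids the problem, because a put at version 11 would not fall under a tombstone at version 10.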
+ </section>
+
+ <section>
+ <title>Major compactions change query results</title>
+
+ <para><quote>...create three cell versions at t1, t2 and t3, with a
+ maximum-versions setting of 2. So when getting all versions, only
+ the values at t2 and t3 will be returned. But if you delete the
+ version at t2 or t3, the one at t1 will appear again. Obviously,
+ once a major compaction has run, such behavior will not be the case
+ anymore...<footnote>
+ <para>See <emphasis>Garbage Collection</emphasis> in <link
+ xlink:href="http://outerthought.org/blog/417-ot.html">Bending
+ time in HBase</link> </para>
+ </footnote></quote></para>
+ </section>
+ </section>
</section>
- <section xml:id="schema.smackdown.rowscols"><title>Rows vs. Columns</title>
- <para>Another common question is whether one should prefer rows or columns. The context is typically in extreme cases of wide
- tables, such as having 1 row with 1 million attributes, or 1 million rows with 1 column apiece.
+ </chapter> <!-- data model -->
+
+ <chapter xml:id="schema">
+ <title>HBase and Schema Design</title>
+ <para>A good general introduction to the strengths and weaknesses of modelling on
+ the various non-RDBMS datastores is Ian Varley's Master's thesis,
+ <link xlink:href="http://ianvarley.com/UT/MR/Varley_MastersReport_Full_2009-08-07.pdf">No Relation: The Mixed Blessings of Non-Relational Databases</link>.
+ Recommended. Also, read <xref linkend="keyvalue"/> for how HBase stores data internally.
</para>
- <para>Preference: Rows (generally speaking). To be clear, this guideline applies in the context of extremely wide cases, not in the
- standard use-case where one needs to store a few dozen or hundred columns.
+ <section xml:id="schema.creation">
+ <title>
+ Schema Creation
+ </title>
+ <para>HBase schemas can be created or updated with <xref linkend="shell" />
+ or by using <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html">HBaseAdmin</link> in the Java API.
</para>
- </section>
- </section>
+ <para>Tables must be disabled when making ColumnFamily modifications, for example:
+ <programlisting>
+Configuration config = HBaseConfiguration.create();
+HBaseAdmin admin = new HBaseAdmin(config);
+String table = "myTable";
- </chapter>
+admin.disableTable(table);
-
- <chapter xml:id="datamodel">
- <title>Data Model</title>
- <para>In short, applications store data into an HBase table.
- Tables are made of rows and columns.
- All columns in HBase belong to a particular column family.
- Table cells -- the intersection of row and column
- coordinates -- are versioned.
- A cell's content is an uninterpreted array of bytes.
- </para>
- <para>Table row keys are also byte arrays so almost anything can
- serve as a row key from strings to binary representations of longs or
- even serialized data structures. Rows in HBase tables
- are sorted by row key. The sort is byte-ordered. All table accesses are
- via the table row key -- its primary key.
-</para>
+HColumnDescriptor cf1 = ...;
+admin.addColumn(table, cf1 ); // adding new ColumnFamily
+HColumnDescriptor cf2 = ...;
+admin.modifyColumn(table, cf2 ); // modifying existing ColumnFamily
- <section xml:id="conceptual.view"><title>Conceptual View</title>
- <para>
- The following example is a slightly modified form of the one on page
- 2 of the <link xlink:href="http://labs.google.com/papers/bigtable.html">BigTable</link> paper.
- There is a table called <varname>webtable</varname> that contains two column families named
- <varname>contents</varname> and <varname>anchor</varname>.
- In this example, <varname>anchor</varname> contains two
- columns (<varname>anchor:cssnsi.com</varname>, <varname>anchor:my.look.ca</varname>)
- and <varname>contents</varname> contains one column (<varname>contents:html</varname>).
- <note>
- <title>Column Names</title>
+admin.enableTable(table);
+ </programlisting>
+ </para><para>See <xref linkend="client_dependencies"/> for more information about configuring client connections.</para>
<para>
- By convention, a column name is made of its column family prefix and a
- <emphasis>qualifier</emphasis>. For example, the
- column
- <emphasis>contents:html</emphasis> is of the column family <varname>contents</varname>
- The colon character (<literal
- moreinfo="none">:</literal>) delimits the column family from the
- column family <emphasis>qualifier</emphasis>.
+ </para>
+ </section>
+ <section xml:id="number.of.cfs">
+ <title>
+ On the number of column families
+ </title>
+ <para>
+ HBase currently does not do well with anything above two or three column families so keep the number
+ of column families in your schema low. Currently, flushing and compactions are done on a per Region basis so
+ if one column family is carrying the bulk of the data bringing on flushes, the adjacent families
+ will also be flushed though the amount of data they carry is small. Compaction is currently triggered
+ by the total number of files under a column family. It is not size based. When there are many column families, the
+ flushing and compaction interaction can make for a bunch of needless I/O loading (to be addressed by
+ changing flushing and compaction to work on a per column family basis).
+ </para>
+ <para>Try to make do with one column family if you can in your schemas. Only introduce a
+ second and third column family in the case where data access is usually column scoped;
+ i.e. you query one column family or the other but usually not both at the one time.
+ </para>
+ </section>
+ <section xml:id="rowkey.design"><title>Rowkey Design</title>
+ <section xml:id="timeseries">
+ <title>
+ Monotonically Increasing Row Keys/Timeseries Data
+ </title>
+ <para>
+ In the HBase chapter of Tom White's book <link xlink:url="http://oreilly.com/catalog/9780596521981">Hadoop: The Definitive Guide</link> (O'Reilly) there is an optimization note on watching out for a phenomenon where an import process walks in lock-step with all clients in concert pounding one of the table's regions (and thus, a single node), then moving onto the next region, etc. With monotonically increasing row-keys (i.e., using a timestamp), this will happen. See this comic by Ikai Lan on why monotonically increasing row keys are problematic in BigTable-like datastores:
+ <link xlink:href="http://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/">monotonically increasing values are bad</link>. The pile-up on a single region brought on
+ by monotonically increasing keys can be mitigated by randomizing the input records to not be in sorted order, but in general it's best to avoid using a timestamp or a sequence (e.g. 1, 2, 3) as the row-key.
</para>
- </note>
- <table frame='all'><title>Table <varname>webtable</varname></title>
- <tgroup cols='4' align='left' colsep='1' rowsep='1'>
- <colspec colname='c1'/>
- <colspec colname='c2'/>
- <colspec colname='c3'/>
- <colspec colname='c4'/>
- <thead>
- <row><entry>Row Key</entry><entry>Time Stamp</entry><entry>ColumnFamily <varname>contents</varname></entry><entry>ColumnFamily <varname>anchor</varname></entry></row>
- </thead>
- <tbody>
- <row><entry>"com.cnn.www"</entry><entry>t9</entry><entry></entry><entry><varname>anchor:cnnsi.com</varname> = "CNN"</entry></row>
- <row><entry>"com.cnn.www"</entry><entry>t8</entry><entry></entry><entry><varname>anchor:my.look.ca</varname> = "CNN.com"</entry></row>
- <row><entry>"com.cnn.www"</entry><entry>t6</entry><entry><varname>contents:html</varname> = "<html>..."</entry><entry></entry></row>
- <row><entry>"com.cnn.www"</entry><entry>t5</entry><entry><varname>contents:html</varname> = "<html>..."</entry><entry></entry></row>
- <row><entry>"com.cnn.www"</entry><entry>t3</entry><entry><varname>contents:html</varname> = "<html>..."</entry><entry></entry></row>
- </tbody>
- </tgroup>
- </table>
- </para>
- </section>
- <section xml:id="physical.view"><title>Physical View</title>
- <para>
- Although at a conceptual level tables may be viewed as a sparse set of rows.
- Physically they are stored on a per-column family basis. New columns
- (i.e., <varname>columnfamily:column</varname>) can be added to any
- column family without pre-announcing them.
- <table frame='all'><title>ColumnFamily <varname>anchor</varname></title>
- <tgroup cols='3' align='left' colsep='1' rowsep='1'>
- <colspec colname='c1'/>
- <colspec colname='c2'/>
- <colspec colname='c3'/>
- <thead>
- <row><entry>Row Key</entry><entry>Time Stamp</entry><entry>Column Family <varname>anchor</varname></entry></row>
- </thead>
- <tbody>
- <row><entry>"com.cnn.www"</entry><entry>t9</entry><entry><varname>anchor:cnnsi.com</varname> = "CNN"</entry></row>
- <row><entry>"com.cnn.www"</entry><entry>t8</entry><entry><varname>anchor:my.look.ca</varname> = "CNN.com"</entry></row>
- </tbody>
- </tgroup>
- </table>
- <table frame='all'><title>ColumnFamily <varname>contents</varname></title>
- <tgroup cols='3' align='left' colsep='1' rowsep='1'>
- <colspec colname='c1'/>
- <colspec colname='c2'/>
- <colspec colname='c3'/>
- <thead>
- <row><entry>Row Key</entry><entry>Time Stamp</entry><entry>ColumnFamily "contents:"</entry></row>
- </thead>
- <tbody>
- <row><entry>"com.cnn.www"</entry><entry>t6</entry><entry><varname>contents:html</varname> = "<html>..."</entry></row>
- <row><entry>"com.cnn.www"</entry><entry>t5</entry><entry><varname>contents:html</varname> = "<html>..."</entry></row>
- <row><entry>"com.cnn.www"</entry><entry>t3</entry><entry><varname>contents:html</varname> = "<html>..."</entry></row>
- </tbody>
- </tgroup>
- </table>
- It is important to note in the diagram above that the empty cells shown in the
- conceptual view are not stored since they need not be in a column-oriented
- storage format. Thus a request for the value of the <varname>contents:html</varname>
- column at time stamp <literal>t8</literal> would return no value. Similarly, a
- request for an <varname>anchor:my.look.ca</varname> value at time stamp
- <literal>t9</literal> would return no value. However, if no timestamp is
- supplied, the most recent value for a particular column would be returned
- and would also be the first one found since timestamps are stored in
- descending order. Thus a request for the values of all columns in the row
- <varname>com.cnn.www</varname> if no timestamp is specified would be:
- the value of <varname>contents:html</varname> from time stamp
- <literal>t6</literal>, the value of <varname>anchor:cnnsi.com</varname>
- from time stamp <literal>t9</literal>, the value of
- <varname>anchor:my.look.ca</varname> from time stamp <literal>t8</literal>.
- </para>
- </section>
- <section xml:id="table">
- <title>Table</title>
- <para>
- Tables are declared up front at schema definition time.
- </para>
+
+ <para>If you do need to upload time series data into HBase, you should
+ study <link xlink:href="http://opentsdb.net/">OpenTSDB</link> as a
+ successful example. It has a page describing the <link xlink:href=" http://opentsdb.net/schema.html">schema</link> it uses in
+ HBase. The key format in OpenTSDB is effectively [metric_type][event_timestamp], which would appear at first glance to contradict the previous advice about not using a timestamp as the key. However, the difference is that the timestamp is not in the <emphasis>lead</emphasis> position of the key, and the design assumption is that there are dozens or hundreds (or more) of different metric types. Thus, even with a continual stream of input data with a mix of metric types, the Puts are distributed across various points of regions in the table.
+ </para>
+ </section>
+ <section xml:id="keysize">
+ <title>Try to minimize row and column sizes</title>
+ <subtitle>Or why are my StoreFile indices large?</subtitle>
+ <para>In HBase, values are always freighted with their coordinates; as a
+ cell value passes through the system, it'll be accompanied by its
+ row, column name, and timestamp - always. If your rows and column names
+ are large, especially compared to the size of the cell value, then
+ you may run up against some interesting scenarios. One such is
+ the case described by Marc Limotte at the tail of
+ <link xlink:url="https://issues.apache.org/jira/browse/HBASE-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005272#comment-13005272">HBASE-3551</link>
+ (recommended!).
+ Therein, the indices that are kept on HBase storefiles (<xref linkend="hfile" />)
+ to facilitate random access may end up occupying large chunks of the HBase
+ allotted RAM because the cell value coordinates are large.
+ Marc, in the above cited comment, suggests upping the block size so
+ entries in the store file index happen at a larger interval, or
+ modifying the table schema so it makes for smaller rows and column
+ names.
+ Compression will also make for larger indices. See
+ the thread <link xlink:href="http://search-hadoop.com/m/hemBv1LiN4Q1/a+question+storefileIndexSize&subj=a+question+storefileIndexSize">a question storefileIndexSize</link>
+ up on the user mailing list.
+ </para>
+ <para>Most of the time small inefficiencies don't matter all that much. Unfortunately,
+ this is a case where they do. Whatever patterns are selected for ColumnFamilies, attributes, and rowkeys they could be repeated
+ several billion times in your data. See <xref linkend="keyvalue"/> for more information on how HBase stores data internally.</para>
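To make the cost of repeated coordinates concrete, here is a rough back-of-the-envelope sketch. It assumes each stored cell carries a fixed overhead of length fields, an 8-byte timestamp and a 1-byte type, plus the row, family, and qualifier bytes alongside the value. The constant and the names `CellSizeEstimate` and `cellSize` are illustrative assumptions, not HBase API, and the estimate ignores compression and block-level overhead.

```java
// Rough per-cell size estimate; the layout constant below is an assumption for illustration.
public class CellSizeEstimate {
    // Assumed fixed per-cell overhead in bytes: key length (4) + value length (4)
    // + row length (2) + family length (1) + timestamp (8) + key type (1).
    static final int FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1;

    public static long cellSize(int rowLen, int familyLen, int qualifierLen, int valueLen) {
        return FIXED_OVERHEAD + rowLen + familyLen + qualifierLen + valueLen;
    }

    public static void main(String[] args) {
        int row = "com.cnn.www".length();   // ASCII, so characters equal bytes
        int value = 8;                      // e.g. one long
        long verbose = cellSize(row, "data".length(), "myVeryImportantAttribute".length(), value);
        long terse = cellSize(row, "d".length(), "via".length(), value);
        long cells = 1_000_000_000L;
        System.out.println("verbose bytes per cell: " + verbose);
        System.out.println("terse bytes per cell:   " + terse);
        System.out.println("saved over " + cells + " cells: " + (verbose - terse) * cells + " bytes");
    }
}
```

The point of the sketch is only that the difference between the two naming choices is multiplied by the number of cells, which is why short family and attribute names are recommended.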
+ <section xml:id="keysize.cf"><title>Column Families</title>
+ <para>Try to keep the ColumnFamily names as small as possible, preferably one character (e.g. "d" for data/default).
+ </para>
+ </section>
+ <section xml:id="keysize.atttributes"><title>Attributes</title>
+ <para>Although verbose attribute names (e.g., "myVeryImportantAttribute") are easier to read, prefer shorter attribute names (e.g., "via")
+ to store in HBase.
+ </para>
+ </section>
+ <section xml:id="keysize.row"><title>Rowkey Length</title>
+ <para>Keep them as short as is reasonable such that they can still be useful for required data access (e.g., Get vs. Scan).
+ A short key that is useless for data access is not better than a longer key with better get/scan properties. Expect tradeoffs
+ when designing rowkeys.
+ </para>
+ </section>
</section>
-
- <section xml:id="row">
- <title>Row</title>
- <para>Row keys are uninterpreted bytes. Rows are
- lexicographically sorted with the lowest order appearing first
- in a table. The empty byte array is used to denote both the
- start and end of a table's namespace.</para>
+ <section xml:id="reverse.timestamp"><title>Reverse Timestamps</title>
+ <para>A common problem in database processing is quickly finding the most recent version of a value. A technique using reverse timestamps
+ as a part of the key can help greatly with a special case of this problem. Also found in the HBase chapter of Tom White's book Hadoop: The Definitive Guide (O'Reilly),
+ the technique involves appending (<code>Long.MAX_VALUE - timestamp</code>) to the end of any key, e.g., [key][reverse_timestamp].
+ </para>
+ <para>The most recent value for [key] in a table can be found by performing a Scan for [key] and obtaining the first record. Since HBase keys
+ are in sorted order, this key sorts before any older row-keys for [key] and thus is first.
+ </para>
+ <para>This technique would be used instead of using <xref linkend="schema.versions">HBase Versioning</xref> where the intent is to hold onto all versions
+ "forever" (or a very long time) and at the same time quickly obtain access to any other version by using the same Scan technique.
+ </para>
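+ <para>A minimal Java sketch of the [key][reverse_timestamp] layout, using only the standard library. The class and method names are hypothetical,
+ the key is assumed to be a UTF-8 string, and timestamps are assumed to be non-negative milliseconds. The comparator mimics byte-ordered
+ sorting of row keys; no HBase client calls are made.</para>

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating [key][reverse_timestamp] row keys.
class ReverseTimestampKey {

    // Key bytes followed by 8 big-endian bytes of (Long.MAX_VALUE - timestamp).
    static byte[] build(String key, long timestamp) {
        byte[] prefix = key.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(prefix.length + Long.BYTES)
                .put(prefix)
                .putLong(Long.MAX_VALUE - timestamp)
                .array();
    }

    // Recovers the original timestamp from the trailing 8 bytes.
    static long timestampOf(byte[] rowKey) {
        long reversed = ByteBuffer.wrap(rowKey).getLong(rowKey.length - Long.BYTES);
        return Long.MAX_VALUE - reversed;
    }

    // Unsigned byte-by-byte comparison, i.e. a byte-ordered sort.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        List<byte[]> keys = new ArrayList<>();
        keys.add(build("user1", 1000L));
        keys.add(build("user1", 3000L));
        keys.add(build("user1", 2000L));
        keys.sort(ReverseTimestampKey::compare);
        // The first key in sorted order carries the most recent timestamp.
        System.out.println(timestampOf(keys.get(0))); // prints 3000
    }
}
```

+ <para>Because the reversed value shrinks as the timestamp grows, the newest entry for a key sorts first, which is what lets a Scan stop after one record.</para>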
</section>
-
- <section xml:id="columnfamily">
- <title>Column Family<indexterm><primary>Column Family</primary></indexterm></title>
- <para>
- Columns in HBase are grouped into <emphasis>column families</emphasis>.
- All column members of a column family have the same prefix. For example, the
- columns <emphasis>courses:history</emphasis> and
- <emphasis>courses:math</emphasis> are both members of the
- <emphasis>courses</emphasis> column family.
- The colon character (<literal
- moreinfo="none">:</literal>) delimits the column family from the
- <indexterm>column family <emphasis>qualifier</emphasis><primary>Column Family Qualifier</primary></indexterm>.
- The column family prefix must be composed of
- <emphasis>printable</emphasis> characters. The qualifying tail, the
- column family <emphasis>qualifier</emphasis>, can be made of any
- arbitrary bytes. Column families must be declared up front
- at schema definition time whereas columns do not need to be
- defined at schema time but can be conjured on the fly while
- the table is up an running.</para>
- <para>Physically, all column family members are stored together on the
- filesystem. Because tunings and
- storage specifications are done at the column family level, it is
- advised that all column family members have the same general access
- pattern and size characteristics.</para>
-
- <para></para>
+ <section xml:id="rowkey.scope">
+ <title>Rowkeys and ColumnFamilies</title>
+ <para>Rowkeys are scoped to ColumnFamilies. Thus, the same rowkey could exist in each ColumnFamily that exists in a table without collision.
+ </para>
</section>
- <section xml:id="cells">
- <title>Cells<indexterm><primary>Cells</primary></indexterm></title>
- <para>A <emphasis>{row, column, version} </emphasis>tuple exactly
- specifies a <literal>cell</literal> in HBase.
- Cell content is uninterrpreted bytes</para>
+ <section xml:id="changing.rowkeys"><title>Immutability of Rowkeys</title>
+ <para>Rowkeys cannot be changed. The only way they can be "changed" in a table is if the row is deleted and then re-inserted.
+ This is a fairly common question on the HBase dist-list so it pays to get the rowkeys right the first time (and/or before you've
+ inserted a lot of data).
+ </para>
</section>
- <section xml:id="data_model_operations">
- <title>Data Model Operations</title>
- <para>The four primary data model operations are Get, Put, Scan, and Delete. Operations are applied via
- <link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/HTable.html">HTable</link> instances.
- </para>
- <section xml:id="get">
- <title>Get</title>
- <para><link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/Get.html">Get</link> returns
- attributes for a specified row. Gets are executed via
- <link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/HTable.html#get%28org.apache.hadoop.hbase.client.Get%29">
- HTable.get</link>.
- </para>
- </section>
- <section xml:id="put">
- <title>Put</title>
- <para><link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/Put.html">Put</link> either
- adds new rows to a table (if the key is new) or can update existing rows (if the key already exists). Puts are executed via
- <link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/HTable.html#put%28org.apache.hadoop.hbase.client.Put%29">
- HTable.put</link> (writeBuffer) or <link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/HTable.html#batch%28java.util.List%29">
- HTable.batch</link> (non-writeBuffer).
- </para>
- </section>
- <section xml:id="scan">
- <title>Scans</title>
- <para><link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/Scan.html">Scan</link> allow
- iteration over multiple rows for specified attributes.
- </para>
- <para>The following is an example of a
- on an HTable table instance. Assume that a table is populated with rows with keys "row1", "row2", "row3",
- and then another set of rows with the keys "abc1", "abc2", and "abc3". The following example shows how startRow and stopRow
- can be applied to a Scan instance to return the rows beginning with "row".
-<programlisting>
-HTable htable = ... // instantiate HTable
-
-Scan scan = new Scan();
-scan.addColumn(Bytes.toBytes("cf"),Bytes.toBytes("attr"));
-scan.setStartRow( Bytes.toBytes("row"));
-scan.setStopRow( Bytes.toBytes("row" + new byte[] {0})); // note: stop key != start key
-for(Result result : htable.getScanner(scan)) {
- // process Result instance
-}
-</programlisting>
- </para>
- </section>
- <section xml:id="delete">
- <title>Delete</title>
- <para><link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/Delete.html">Delete</link> removes
- a row from a table. Deletes are executed via
- <link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/HTable.html#delete%28org.apache.hadoop.hbase.client.Delete%29">
- HTable.delete</link>.
- </para>
- </section>
-
+ </section> <!-- rowkey design -->
+ <section xml:id="schema.versions">
+ <title>
+ Number of Versions
+ </title>
+ <section xml:id="schema.versions.max"><title>Maximum Number of Versions</title>
+ <para>The maximum number of row versions to store is configured per column
+ family via <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link>.
+ The default for max versions is 3.
+ This is an important parameter because, as described in the <xref linkend="datamodel" />
+ section, HBase does <emphasis>not</emphasis> overwrite row values, but rather
+ stores different values per row by time (and qualifier). Excess versions are removed during major
+ compactions. The number of max versions may need to be increased or decreased depending on application needs.
+ </para>
+ <para>Setting the number of max versions to an exceedingly high level (e.g., hundreds or more) is not recommended unless those old values are
+ very dear to you because this will greatly increase StoreFile size.
+ </para>
+ </section>
+ <section xml:id="schema.minversions">
+ <title>
+ Minimum Number of Versions
+ </title>
+ <para>Like number of max row versions, the minimum number of row versions to keep is configured per column
+ family via <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link>.
+ The default for min versions is 0, which means the feature is disabled.
+ The minimum number of row versions parameter is used together with the time-to-live parameter and can be combined with the
+ number of row versions parameter to allow configurations such as
+ "keep the last T minutes worth of data, at most N versions, <emphasis>but keep at least M versions around</emphasis>"
+ (where M is the value for minimum number of row versions, M<=N).
+ This parameter should only be set when time-to-live is enabled for a column family and must be less than or equal to the
+ number of row versions.
+ </para>
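+ <para>The interaction of the three settings can be modeled in a few lines of plain Java. This is an illustrative model of the rule quoted above,
+ not HBase's implementation: the class, method, and parameter names are hypothetical, and time-to-live is expressed in milliseconds here
+ only to match the example timestamps.</para>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of "last T worth of data, at most N versions,
// but at least M versions". Not taken from the HBase code base.
class VersionRetention {

    // timestampsNewestFirst must be sorted from newest to oldest.
    static List<Long> retained(List<Long> timestampsNewestFirst, long nowMs, long ttlMs,
                               int maxVersions, int minVersions) {
        List<Long> kept = new ArrayList<>();
        for (long ts : timestampsNewestFirst) {
            if (kept.size() >= maxVersions) {
                break; // never keep more than N versions
            }
            boolean expired = nowMs - ts > ttlMs;
            // Keep unexpired versions, plus expired ones while fewer than M are held.
            if (!expired || kept.size() < minVersions) {
                kept.add(ts);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // All three versions are older than the TTL, yet M = 2 keeps the newest two.
        List<Long> versions = Arrays.asList(50_000L, 40_000L, 30_000L);
        System.out.println(retained(versions, 100_000L, 10_000L, 3, 2)); // prints [50000, 40000]
    }
}
```

+ <para>With M set to 0 the same input would retain nothing, which matches the description of 0 as "feature disabled".</para>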
+ </section>
+ </section>
+ <section xml:id="supported.datatypes">
+ <title>
+ Supported Datatypes
+ </title>
+ <para>HBase supports a "bytes-in/bytes-out" interface via <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html">Put</link> and
+ <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html">Result</link>, so anything that can be
+ converted to an array of bytes can be stored as a value. Input could be strings, numbers, complex objects, or even images as long as they can be rendered as bytes.
+ </para>
+ <para>There are practical limits to the size of values (e.g., storing 10-50MB objects in HBase would probably be too much to ask);
+ search the mailing list for conversations on this topic. All rows in HBase conform to the <xref linkend="datamodel">datamodel</xref>, and
+ that includes versioning. Take that into consideration when making your design, as well as block size for the ColumnFamily.
+ </para>
+ <section xml:id="counters">
+ <title>Counters</title>
+ <para>
+ One supported datatype that deserves special mention is "counters" (i.e., the ability to do atomic increments of numbers). See
+ <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#increment%28org.apache.hadoop.hbase.client.Increment%29">Increment</link> in HTable.
+ </para>
+ <para>Synchronization on counters is done on the RegionServer, not in the client.
+ </para>
+ </section>
+ </section>
+ <section xml:id="cf.in.memory">
+ <title>
+ In-Memory ColumnFamilies
+ </title>
+ <para>ColumnFamilies can optionally be defined as in-memory. Data is still persisted to disk, just like any other ColumnFamily.
+ In-memory blocks have the highest priority in the <xref linkend="block.cache" />, but it is not a guarantee that the entire table
+ will be in memory.
+ </para>
+ <para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link> for more information.
+ </para>
+ </section>
+ <section xml:id="ttl">
+ <title>Time To Live (TTL)</title>
+ <para>ColumnFamilies can set a TTL length in seconds, and HBase will automatically delete rows once the expiration time is reached.
+ This applies to <emphasis>all</emphasis> versions of a row - even the current one. The TTL time encoded in HBase for the row is specified in UTC.
+ </para>
+ <para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link> for more information.
+ </para>
+ </section>
+ <section xml:id="schema.bloom">
+ <title>Bloom Filters</title>
+ <para>Bloom Filters can be enabled per-ColumnFamily.
+ Use <code>HColumnDescriptor.setBloomFilterType(NONE | ROW |
+ ROWCOL)</code> to enable blooms per Column Family. Default =
+ <varname>NONE</varname> for no bloom filters. If
+ <varname>ROW</varname>, the hash of the row will be added to the bloom
+ on each insert. If <varname>ROWCOL</varname>, the hash of the row +
+ column family + column family qualifier will be added to the bloom on
+ each key insert.</para>
+ <para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link> and
+ <xref linkend="blooms"/> for more information.
+ </para>
+ </section>
+ <section xml:id="secondary.indexes">
+ <title>
+ Secondary Indexes and Alternate Query Paths
+ </title>
+ <para>This section could also be titled "what if my table rowkey looks like <emphasis>this</emphasis> but I also want to query my table like <emphasis>that</emphasis>."
+ A common example on the dist-list is where a row-key is of the format "user-timestamp" but there are reporting requirements on activity across users for certain
+ time ranges. Thus, selecting by user is easy because it is in the lead position of the key, but time is not.
+ </para>
+ <para>There is no single answer on the best way to handle this because it depends on...
+ <itemizedlist>
+ <listitem>Number of users</listitem>
+ <listitem>Data size and data arrival rate</listitem>
+ <listitem>Flexibility of reporting requirements (e.g., completely ad-hoc date selection vs. pre-configured ranges) </listitem>
+ <listitem>Desired execution speed of query (e.g., 90 seconds may be reasonable to some for an ad-hoc report, whereas it may be too long for others) </listitem>
+ </itemizedlist>
+ ... and solutions are also influenced by the size of the cluster and how much processing power you have to throw at the solution.
+ Common techniques are in sub-sections below. This is a comprehensive, but not exhaustive, list of approaches.
+ </para>
+ <para>It should not be a surprise that secondary indexes require additional cluster space and processing.
+ This is precisely what happens in an RDBMS because the act of creating an alternate index requires both space and processing cycles to update. RDBMS products
+ are more advanced in this regard to handle alternative index management out of the box. However, HBase scales better at larger data volumes, so this is a feature trade-off.
+ </para>
+ <para>Pay attention to <xref linkend="performance"/> when implementing any of these approaches.</para>
+ <para>Additionally, see the David Butler response in this dist-list thread <link xlink:href="http://search-hadoop.com/m/nvbiBp2TDP/Stargate%252Bhbase&subj=Stargate+hbase">HBase, mail # user - Stargate+hbase</link>
+ </para>
+ <section xml:id="secondary.indexes.filter">
+ <title>
+ Filter Query
+ </title>
+ <para>Depending on the case, it may be appropriate to use <xref linkend="client.filter"/>. In this case, no secondary index is created.
+ However, don't try a full-scan on a large table like this from an application (i.e., single-threaded client).
+ </para>
</section>
+ <section xml:id="secondary.indexes.periodic">
+ <title>
+ Periodic-Update Secondary Index
+ </title>
+ <para>A secondary index could be created in another table which is periodically updated via a MapReduce job. The job could be executed intra-day, but depending on
+ load-strategy it could still potentially be out of sync with the main data table.</para>
+ <para>See <xref linkend="mapreduce.example.readwrite"/> for more information.</para>
+ </section>
+ <section xml:id="secondary.indexes.dualwrite">
+ <title>
+ Dual-Write Secondary Index
+ </title>
+ <para>Another strategy is to build the secondary index while publishing data to the cluster (e.g., write to data table, write to index table).
+ If this approach is taken after a data table already exists, then bootstrapping will be needed for the secondary index with a MapReduce job (see <xref linkend="secondary.indexes.periodic"/>).</para>
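+ <para>The dual-write idea can be sketched without a cluster by letting two sorted maps stand in for the data table and the index table.
+ Everything here is an assumption made for illustration: the class and method names, the "user-timestamp" and "timestamp-user" key layouts,
+ and the zero-padded timestamps that keep string order aligned with numeric order. A real client would issue two separate writes, and keeping
+ them consistent on failure is outside this sketch.</para>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a data table plus a secondary index table.
class DualWriteIndex {
    // Sorted maps mimic tables whose rows are ordered by key (ASCII keys assumed).
    final TreeMap<String, String> dataTable = new TreeMap<>();
    final TreeMap<String, String> indexTable = new TreeMap<>();

    // Zero-pad so that lexicographic order equals numeric order.
    static String pad(long ts) {
        return String.format("%013d", ts);
    }

    // Write to the data table, then write the matching index entry.
    void write(String user, long ts, String value) {
        String dataKey = user + "-" + pad(ts);
        dataTable.put(dataKey, value);
        indexTable.put(pad(ts) + "-" + user, dataKey);
    }

    // Range query by time across all users: fromTs inclusive, toTs exclusive.
    List<String> valuesBetween(long fromTs, long toTs) {
        List<String> out = new ArrayList<>();
        for (String dataKey : indexTable.subMap(pad(fromTs), pad(toTs)).values()) {
            out.add(dataTable.get(dataKey));
        }
        return out;
    }

    public static void main(String[] args) {
        DualWriteIndex tables = new DualWriteIndex();
        tables.write("alice", 1000L, "a1");
        tables.write("bob", 2000L, "b1");
        tables.write("alice", 3000L, "a2");
        System.out.println(tables.valuesBetween(1500L, 3000L)); // prints [b1]
    }
}
```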
+ </section>
+ <section xml:id="secondary.indexes.summary">
+ <title>
+ Summary Tables
+ </title>
+ <para>Where time-ranges are very wide (e.g., year-long report) and where the data is voluminous, summary tables are a common approach.
+ These would be generated with MapReduce jobs into another table.</para>
+ <para>See <xref linkend="mapreduce.example.summary"/> for more information.</para>
+ </section>
+ <section xml:id="secondary.indexes.coproc">
+ <title>
+ Coprocessor Secondary Index
+ </title>
+ <para>Coprocessors act like RDBMS triggers. These are currently on TRUNK.
+ </para>
+ </section>
+ </section>
+ <section xml:id="schema.smackdown"><title>Schema Design Smackdown</title>
+ <para>This section will describe common schema design questions that appear on the dist-list. These are
+ general guidelines and not laws - each application must consider its own needs.
+ </para>
+ <section xml:id="schema.smackdown.rowsversions"><title>Rows vs. Versions</title>
+ <para>A common question is whether one should prefer rows or HBase's built-in-versioning. The context is typically where there are
+ "a lot" of versions of a row to be retained (e.g., where it is significantly above the HBase default of 3 max versions). The
+ rows-approach would require storing a timestamp in some portion of the rowkey so that they would not overwrite with each successive update.
+ </para>
+ <para>Preference: Rows (generally speaking).
+ </para>
+ </section>
+ <section xml:id="schema.smackdown.rowscols"><title>Rows vs. Columns</title>
+ <para>Another common question is whether one should prefer rows or columns. The context is typically in extreme cases of wide
+ tables, such as having 1 row with 1 million attributes, or 1 million rows with 1 column apiece.
+ </para>
+ <para>Preference: Rows (generally speaking). To be clear, this guideline applies to extremely wide cases, not to the
+ standard use-case where one needs to store a few dozen or hundred columns.
+ </para>
+ </section>
+ </section>
+ </chapter> <!-- schema design -->
- <section xml:id="versions">
- <title>Versions<indexterm><primary>Versions</primary></indexterm></title>
-
- <para>A <emphasis>{row, column, version} </emphasis>tuple exactly
- specifies a <literal>cell</literal> in HBase. Its possible to have an
- unbounded number of cells where the row and column are the same but the
- cell address differs only in its version dimension.</para>
-
- <para>While rows and column keys are expressed as bytes, the version is
- specified using a long integer. Typically this long contains time
- instances such as those returned by
- <code>java.util.Date.getTime()</code> or
- <code>System.currentTimeMillis()</code>, that is: <quote>the difference,
- measured in milliseconds, between the current time and midnight, January
- 1, 1970 UTC</quote>.</para>
-
- <para>The HBase version dimension is stored in decreasing order, so that
- when reading from a store file, the most recent values are found
- first.</para>
-
- <para>There is a lot of confusion over the semantics of
- <literal>cell</literal> versions, in HBase. In particular, a couple
- questions that often come up are:<itemizedlist>
- <listitem>
- <para>If multiple writes to a cell have the same version, are all
- versions maintained or just the last?<footnote>
- <para>Currently, only the last written is fetchable.</para>
- </footnote></para>
- </listitem>
-
- <listitem>
- <para>Is it OK to write cells in a non-increasing version
- order?<footnote>
- <para>Yes</para>
- </footnote></para>
- </listitem>
- </itemizedlist></para>
-
- <para>Below we describe how the version dimension in HBase currently
- works<footnote>
- <para>See <link
- xlink:href="https://issues.apache.org/jira/browse/HBASE-2406">HBASE-2406</link>
- for discussion of HBase versions. <link
- xlink:href="http://outerthought.org/blog/417-ot.html">Bending time
- in HBase</link> makes for a good read on the version, or time,
[... 440 lines stripped ...]