Posted to commits@hbase.apache.org by st...@apache.org on 2013/04/02 20:07:09 UTC
svn commit: r1463654 [2/2] - in /hbase/hbase.apache.org/trunk: ./ book/ schema_design/ upgrading/

Added: hbase/hbase.apache.org/trunk/schema_design/rowkey.design.html
URL: http://svn.apache.org/viewvc/hbase/hbase.apache.org/trunk/schema_design/rowkey.design.html?rev=1463654&view=auto
==============================================================================
--- hbase/hbase.apache.org/trunk/schema_design/rowkey.design.html (added)
+++ hbase/hbase.apache.org/trunk/schema_design/rowkey.design.html Tue Apr  2 18:07:08 2013
@@ -0,0 +1,148 @@
+<html><head>
+      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+   <title>1.3.&nbsp;Rowkey Design</title><link rel="stylesheet" type="text/css" href="../css/freebsd_docbook.css"><meta name="generator" content="DocBook XSL-NS Stylesheets V1.76.1"><link rel="home" href="schema_design.html" title="Chapter&nbsp;1.&nbsp;HBase and Schema Design"><link rel="up" href="schema_design.html" title="Chapter&nbsp;1.&nbsp;HBase and Schema Design"><link rel="prev" href="number.of.cfs.html" title="1.2.&nbsp; On the number of column families"><link rel="next" href="schema.versions.html" title="1.4.&nbsp; Number of Versions"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">1.3.&nbsp;Rowkey Design</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="number.of.cfs.html">Prev</a>&nbsp;</td><th width="60%" align="center">&nbsp;</th><td width="20%" align="right">&nbsp;<a accesskey="n" href="schema.versions.html">Next</a></td></tr></table><hr></div><div class="section" title="1.3.&nbsp;Rowkey Design"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="rowkey.design"></a>1.3.&nbsp;Rowkey Design</h2></div></div></div><div class="section" title="1.3.1.&nbsp; Monotonically Increasing Row Keys/Timeseries Data"><div class="titlepage"><div><div><h3 class="title"><a name="timeseries"></a>1.3.1.&nbsp;
+    Monotonically Increasing Row Keys/Timeseries Data
+    </h3></div></div></div><p>
+      In the HBase chapter of Tom White's book Hadoop: The Definitive Guide (O'Reilly) there is an optimization note on watching out for a phenomenon where an import process walks in lock-step with all clients in concert pounding one of the table's regions (and thus, a single node), then moving on to the next region, etc.  With monotonically increasing row-keys (e.g., using a timestamp), this will happen.  See this comic by Ikai Lan on why monotonically increasing row keys are problematic in BigTable-like datastores:
+      <a class="link" href="http://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/" target="_top">monotonically increasing values are bad</a>.  The pile-up on a single region brought on
+      by monotonically increasing keys can be mitigated by randomizing the input records to not be in sorted order, but in general it's best to avoid using a timestamp or a sequence (e.g. 1, 2, 3) as the row-key.
+    </p><p>If you do need to upload time series data into HBase, you should
+    study <a class="link" href="http://opentsdb.net/" target="_top">OpenTSDB</a> as a
+    successful example.  It has a page describing the <a class="link" href="http://opentsdb.net/schema.html" target="_top">schema</a> it uses in
+    HBase.  The key format in OpenTSDB is effectively [metric_type][event_timestamp], which would appear at first glance to contradict the previous advice about not using a timestamp as the key.  However, the difference is that the timestamp is not in the <span class="emphasis"><em>lead</em></span> position of the key, and the design assumption is that there are dozens or hundreds (or more) of different metric types.  Thus, even with a continual stream of input data with a mix of metric types, the Puts are distributed across various regions in the table.
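+   </p><p>The following is a minimal sketch of a [metric_type][event_timestamp] key, written to illustrate the idea rather than to reproduce OpenTSDB's actual encoding.  It assumes a hypothetical 4-byte numeric metric id followed by an 8-byte big-endian timestamp.  Keys are compared as unsigned bytes, which is the order HBase uses for rowkeys.  The class and method names are illustrative, and <code class="code">Arrays.compareUnsigned</code> requires Java 9 or later.
+</p><pre class="programlisting">
+import java.nio.ByteBuffer;
+import java.util.Arrays;
+
+public class MetricKeys {
+  // Hypothetical layout: [4-byte metric id][8-byte timestamp], big-endian.
+  public static byte[] metricKey(int metricId, long timestamp) {
+    return ByteBuffer.allocate(12).putInt(metricId).putLong(timestamp).array();
+  }
+
+  public static void main(String[] args) {
+    long now = 1364925000000L;
+    byte[] cpuNow = metricKey(1, now);
+    byte[] cpuLater = metricKey(1, now + 1000L);
+    byte[] memNow = metricKey(2, now);
+    // Within one metric, keys still sort by time...
+    System.out.println(Integer.signum(Arrays.compareUnsigned(cpuNow, cpuLater))); // -1
+    // ...but the lead bytes differ between metrics, so a stream of mixed
+    // metric types writes to different parts of the keyspace.
+    System.out.println(Integer.signum(Arrays.compareUnsigned(cpuLater, memNow))); // -1
+  }
+}
+</pre><p>A newer point for metric 1 still sorts before any point for metric 2.  With many metric types, concurrent writes therefore land at many points in the keyspace instead of all at its end.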
+   </p><p>See <a class="xref" href="schema.casestudies.html" title="1.11.&nbsp;Schema Design Case Studies">Section&nbsp;1.11, &#8220;Schema Design Case Studies&#8221;</a> for some rowkey design examples.
+   </p></div><div class="section" title="1.3.2.&nbsp;Try to minimize row and column sizes"><div class="titlepage"><div><div><h3 class="title"><a name="keysize"></a>1.3.2.&nbsp;Try to minimize row and column sizes</h3></div><div><h4 class="subtitle">Or why are my StoreFile indices large?</h4></div></div></div><p>In HBase, values are always freighted with their coordinates; as a
+          cell value passes through the system, it'll be accompanied by its
+          row, column name, and timestamp - always.  If your rows and column names
+          are large, especially compared to the size of the cell value, then
+          you may run up against some interesting scenarios.  One such is
+          the case described by Marc Limotte at the tail of
+          HBASE-3551
+          (recommended!).
+          Therein, the indices that are kept on HBase storefiles (<a class="xref" href="">???</a>)
+          to facilitate random access may end up occupying large chunks of the HBase
+          allotted RAM because the cell value coordinates are large.
+          Marc, in the above cited comment, suggests upping the block size so
+          entries in the store file index happen at a larger interval, or
+          modifying the table schema so it makes for smaller rows and column
+          names.
+          Compression will also make for larger indices.  See
+          the thread <a class="link" href="http://search-hadoop.com/m/hemBv1LiN4Q1/a+question+storefileIndexSize&amp;subj=a+question+storefileIndexSize" target="_top">a question storefileIndexSize</a>
+          on the user mailing list.
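+       </p><p>To put numbers on this, the sketch below estimates the serialized size of a single cell.  It assumes the 0.94-era KeyValue layout: key length (4 bytes), value length (4 bytes), then a key made of row length (2), row, family length (1), family, qualifier, timestamp (8) and key type (1), followed by the value.  Treat the byte counts as an approximation under that assumption.  The class and method names are hypothetical.
+</p><pre class="programlisting">
+public class CellSize {
+  // Assumed key layout: row length (2) + row + family length (1) + family
+  // + qualifier + timestamp (8) + key type (1).
+  public static int keyLength(int rowLen, int familyLen, int qualifierLen) {
+    return 2 + rowLen + 1 + familyLen + qualifierLen + 8 + 1;
+  }
+
+  // Assumed cell layout: key length (4) + value length (4) + key + value.
+  public static int cellLength(int rowLen, int familyLen, int qualifierLen, int valueLen) {
+    return 4 + 4 + keyLength(rowLen, familyLen, qualifierLen) + valueLen;
+  }
+
+  public static void main(String[] args) {
+    // An 8-byte value with a 40-byte row, 12-byte family and 24-byte qualifier.
+    System.out.println(cellLength(40, 12, 24, 8)); // 104
+    // The same 8-byte value with a 16-byte row, 1-byte family and 3-byte qualifier.
+    System.out.println(cellLength(16, 1, 3, 8));   // 48
+  }
+}
+</pre><p>In both cases the coordinates, not the 8-byte value, account for most of each cell's bytes.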
+       </p><p>Most of the time small inefficiencies don't matter all that much.  Unfortunately,
+         this is a case where they do.  Whatever patterns are selected for ColumnFamilies, attributes, and rowkeys, they could be repeated
+       several billion times in your data. </p><p>See <a class="xref" href="">???</a> for more information on how HBase stores data internally to see why this is important.</p><div class="section" title="1.3.2.1.&nbsp;Column Families"><div class="titlepage"><div><div><h4 class="title"><a name="keysize.cf"></a>1.3.2.1.&nbsp;Column Families</h4></div></div></div><p>Try to keep the ColumnFamily names as small as possible, preferably one character (e.g. "d" for data/default).
+         </p><p>See <a class="xref" href="">???</a> for more information on how HBase stores data internally to see why this is important.</p><div class="section" title="1.3.2.2.&nbsp;Attributes"><div class="titlepage"><div><div><h4 class="title"><a name="keysize.atttributes"></a>1.3.2.2.&nbsp;Attributes</h4></div></div></div><p>Although verbose attribute names (e.g., "myVeryImportantAttribute") are easier to read, prefer shorter attribute names (e.g., "via")
+         to store in HBase.
+         </p><p>See <a class="xref" href="">???</a> for more information on how HBase stores data internally to see why this is important.</p><div class="section" title="1.3.2.3.&nbsp;Rowkey Length"><div class="titlepage"><div><div><h4 class="title"><a name="keysize.row"></a>1.3.2.3.&nbsp;Rowkey Length</h4></div></div></div><p>Keep them as short as is reasonable such that they can still be useful for required data access (e.g., Get vs. Scan).
+         A short key that is useless for data access is not better than a longer key with better get/scan properties.  Expect tradeoffs
+         when designing rowkeys.
+         </p></div><div class="section" title="1.3.2.4.&nbsp;Byte Patterns"><div class="titlepage"><div><div><h4 class="title"><a name="keysize.patterns"></a>1.3.2.4.&nbsp;Byte Patterns</h4></div></div></div><p>A long is 8 bytes.  You can store an unsigned number up to 18,446,744,073,709,551,615 in those eight bytes.
+            If you stored this number as a String (presuming one byte per character), you would need 20 bytes, 2.5 times as many.
+         </p><p>Not convinced?  Below is some sample code that you can run on your own.
+</p><pre class="programlisting">
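+import java.math.BigInteger;
+import java.security.MessageDigest;
+import org.apache.hadoop.hbase.util.Bytes;
+
+public class ByteSizes {
+  public static void main(String[] args) throws Exception {
+    // long
+    //
+    long l = 1234567890L;
+    byte[] lb = Bytes.toBytes(l);
+    System.out.println("long bytes length: " + lb.length);   // returns 8
+
+    String s = "" + l;
+    byte[] sb = Bytes.toBytes(s);
+    System.out.println("long as string length: " + sb.length);    // returns 10
+
+    // hash
+    //
+    MessageDigest md = MessageDigest.getInstance("MD5");
+    byte[] digest = md.digest(Bytes.toBytes(s));
+    System.out.println("md5 digest bytes length: " + digest.length);    // returns 16
+
+    // Hex-encode the digest (two characters per byte).  Decoding the raw digest
+    // bytes with new String(digest) is charset-dependent and lossy.
+    String sDigest = String.format("%032x", new BigInteger(1, digest));
+    byte[] sbDigest = Bytes.toBytes(sDigest);
+    System.out.println("md5 digest as hex string length: " + sbDigest.length);    // returns 32
+  }
+}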
+</pre><p>
+         </p></div></div><div class="section" title="1.3.3.&nbsp;Reverse Timestamps"><div class="titlepage"><div><div><h3 class="title"><a name="reverse.timestamp"></a>1.3.3.&nbsp;Reverse Timestamps</h3></div></div></div><p>A common problem in database processing is quickly finding the most recent version of a value.  A technique using reverse timestamps
+    as a part of the key can help greatly with a special case of this problem.  Also found in the HBase chapter of Tom White's book Hadoop:  The Definitive Guide (O'Reilly),
+    the technique involves appending (<code class="code">Long.MAX_VALUE - timestamp</code>) to the end of any key, e.g., [key][reverse_timestamp].
+    </p><p>The most recent value for [key] in a table can be found by performing a Scan for [key] and obtaining the first record.  Since HBase keys
+    are in sorted order, this key sorts before any older row-keys for [key] and thus is first.
+    </p><p>This technique would be used instead of using <a class="xref" href="schema.versions.html" title="1.4.&nbsp; Number of Versions">Section&nbsp;1.4, &#8220;
+  Number of Versions
+  &#8221;</a> where the intent is to hold onto all versions
+    "forever" (or a very long time) and at the same time quickly obtain access to any other version by using the same Scan technique.
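+    </p><p>A minimal sketch of building [key][reverse_timestamp] with plain JDK classes follows.  The key value "sensor42" and the class name are hypothetical.  The sketch assumes non-negative timestamps (e.g., milliseconds since the epoch) and compares keys as unsigned bytes, which is the order HBase uses for rowkeys.  <code class="code">Arrays.compareUnsigned</code> requires Java 9 or later.
+</p><pre class="programlisting">
+import java.nio.ByteBuffer;
+import java.nio.charset.StandardCharsets;
+import java.util.Arrays;
+
+public class ReverseTimestampKeys {
+  // [key][Long.MAX_VALUE - timestamp]; ByteBuffer writes the long big-endian.
+  public static byte[] reverseTimestampKey(byte[] key, long timestamp) {
+    return ByteBuffer.allocate(key.length + 8)
+        .put(key)
+        .putLong(Long.MAX_VALUE - timestamp)
+        .array();
+  }
+
+  public static void main(String[] args) {
+    byte[] key = "sensor42".getBytes(StandardCharsets.UTF_8);
+    byte[] older = reverseTimestampKey(key, 1364925000000L);
+    byte[] newer = reverseTimestampKey(key, 1364925060000L);
+    // The newer entry sorts first, so the first row of a Scan starting at [key] is the latest.
+    System.out.println(Integer.signum(Arrays.compareUnsigned(newer, older))); // -1
+  }
+}
+</pre><p>In an HBase client, the same bytes could be built with <code class="code">Bytes.add(key, Bytes.toBytes(Long.MAX_VALUE - timestamp))</code>.  With variable-length keys, a Scan starting at "sensor42" would first reach the rows of a key such as "sensor420", because '0' (byte 48) sorts before the leading byte of the reversed timestamp.  Fixed-length keys avoid that ambiguity.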
+    </p></div><div class="section" title="1.3.4.&nbsp;Rowkeys and ColumnFamilies"><div class="titlepage"><div><div><h3 class="title"><a name="rowkey.scope"></a>1.3.4.&nbsp;Rowkeys and ColumnFamilies</h3></div></div></div><p>Rowkeys are scoped to ColumnFamilies.  Thus, the same rowkey could exist in each ColumnFamily that exists in a table without collision.
+    </p></div><div class="section" title="1.3.5.&nbsp;Immutability of Rowkeys"><div class="titlepage"><div><div><h3 class="title"><a name="changing.rowkeys"></a>1.3.5.&nbsp;Immutability of Rowkeys</h3></div></div></div><p>Rowkeys cannot be changed.  The only way they can be "changed" in a table is if the row is deleted and then re-inserted.
+    This is a fairly common question on the HBase dist-list, so it pays to get the rowkeys right the first time (and/or before you've
+    inserted a lot of data).
+    </p></div><div class="section" title="1.3.6.&nbsp;Relationship Between RowKeys and Region Splits"><div class="titlepage"><div><div><h3 class="title"><a name="rowkey.regionsplits"></a>1.3.6.&nbsp;Relationship Between RowKeys and Region Splits</h3></div></div></div><p>If you pre-split your table, it is <span class="emphasis"><em>critical</em></span> to understand how your rowkey will be distributed across
+    the region boundaries.  As an example of why this is important, consider the example of using displayable hex characters as the
+    lead position of the key (e.g., "0000000000000000" to "ffffffffffffffff").  Running those key ranges through <code class="code">Bytes.split</code>
+    (which is the split strategy used when creating regions in <code class="code">HBaseAdmin.createTable(byte[] startKey, byte[] endKey, numRegions)</code>)
+    for 10 regions will generate the following splits...
+    </p><p>
+    </p><pre class="programlisting">
+48 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48                                // 0
+54 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10                 // 6
+61 -67 -67 -67 -67 -67 -67 -67 -67 -67 -67 -67 -67 -67 -67 -68                 // =
+68 -124 -124 -124 -124 -124 -124 -124 -124 -124 -124 -124 -124 -124 -124 -126  // D
+75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 72                                // K
+82 18 18 18 18 18 18 18 18 18 18 18 18 18 18 14                                // R
+88 -40 -40 -40 -40 -40 -40 -40 -40 -40 -40 -40 -40 -40 -40 -44                 // X
+95 -97 -97 -97 -97 -97 -97 -97 -97 -97 -97 -97 -97 -97 -97 -102                // _
+102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102                // f
+    </pre><p>
+    ... (note:  the lead byte is listed to the right as a comment.)  Given that the first split is a '0' and the last split is an 'f',
+    everything is great, right?  Not so fast.
+    </p><p>The problem is that all the data is going to pile up in the first 2 regions and the last region, thus creating a "lumpy" (and
+    possibly "hot") region problem.  To understand why, refer to an <a class="link" href="http://www.asciitable.com" target="_top">ASCII Table</a>.
+    '0' is byte 48, and 'f' is byte 102, but there is a huge gap in byte values (bytes 58 to 96) that will <span class="emphasis"><em>never appear in this
+    keyspace</em></span> because the only values are [0-9] and [a-f].  Thus, the middle regions will
+    never be used.  To make pre-splitting work with this example keyspace, a custom definition of splits (i.e., not relying on the
+    built-in split method) is required.
+    </p><p>Lesson #1:  Pre-splitting tables is generally a best practice, but you need to pre-split them in such a way that all the
+    regions are accessible in the keyspace.  While this example demonstrated the problem with a hex-key keyspace, the same problem can happen
+     with <span class="emphasis"><em>any</em></span> keyspace.  Know your data.
+    </p><p>Lesson #2:  While generally not advisable, using hex-keys (and more generally, displayable data) can still work with pre-split
+    tables as long as all the created regions are accessible in the keyspace.
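+    </p><p>One way to check Lesson #2 against a given set of split points is sketched below; the class and method names are hypothetical.  The sketch treats only lowercase [0-9a-f] keys as valid.  It is a conservative test: a split point outside the keyspace does not always leave its region empty.  However, when every split point is itself a valid key, every region that begins at a split point holds at least that key.
+</p><pre class="programlisting">
+import java.nio.charset.StandardCharsets;
+import java.util.Arrays;
+
+public class HexSplitCheck {
+  // True if every byte of the key is a lowercase hex character [0-9a-f].
+  public static boolean isHexKey(byte[] key) {
+    for (byte b : key) {
+      if ("0123456789abcdef".indexOf(b) == -1) {
+        return false;
+      }
+    }
+    return true;
+  }
+
+  // Sufficient (not necessary) condition: if every split point is itself a
+  // valid hex key, each region that starts at a split point can receive data.
+  public static boolean allSplitsInKeyspace(byte[][] splits) {
+    for (byte[] split : splits) {
+      if (!isHexKey(split)) {
+        return false;
+      }
+    }
+    return true;
+  }
+
+  public static void main(String[] args) {
+    // The second Bytes.split point above: 54 ('6') followed by fifteen bytes of -10.
+    byte[] naive = new byte[16];
+    Arrays.fill(naive, (byte) -10);
+    naive[0] = 54;
+    byte[] custom = "8000000000000000".getBytes(StandardCharsets.US_ASCII);
+    System.out.println(allSplitsInKeyspace(new byte[][] {custom}));        // true
+    System.out.println(allSplitsInKeyspace(new byte[][] {custom, naive})); // false
+  }
+}
+</pre><p>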
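+    </p><p>To conclude this example, the following shows how appropriate splits can be pre-created for hex-keys:
+	    </p><pre class="programlisting">import java.io.IOException;
+import java.math.BigInteger;
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.hbase.HTableDescriptor;
+import org.apache.hadoop.hbase.TableExistsException;
+import org.apache.hadoop.hbase.client.HBaseAdmin;
+
+public class HexSplits {
+  private static final Log logger = LogFactory.getLog(HexSplits.class);
+
+  // Creates the table pre-split at the given split points; returns false if it already exists.
+  public static boolean createTable(HBaseAdmin admin, HTableDescriptor table, byte[][] splits)
+  throws IOException {
+    try {
+      admin.createTable( table, splits );
+      return true;
+    } catch (TableExistsException e) {
+      logger.info("table " + table.getNameAsString() + " already exists");
+      // the table already exists...
+      return false;
+    }
+  }
+
+  // Returns numRegions-1 split points spaced evenly between startKey and endKey,
+  // each formatted as a zero-padded, 16-character lowercase hex string.
+  public static byte[][] getHexSplits(String startKey, String endKey, int numRegions) {
+    byte[][] splits = new byte[numRegions-1][];
+    BigInteger lowestKey = new BigInteger(startKey, 16);
+    BigInteger highestKey = new BigInteger(endKey, 16);
+    BigInteger range = highestKey.subtract(lowestKey);
+    BigInteger regionIncrement = range.divide(BigInteger.valueOf(numRegions));
+    lowestKey = lowestKey.add(regionIncrement);
+    for(int i=0; i &lt; numRegions-1;i++) {
+      BigInteger key = lowestKey.add(regionIncrement.multiply(BigInteger.valueOf(i)));
+      byte[] b = String.format("%016x", key).getBytes();
+      splits[i] = b;
+    }
+    return splits;
+  }
+}</pre></div></div><div id="disqus_thread"></div><script type="text/javascript">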
+    var disqus_shortname = 'hbase'; // required: replace example with your forum shortname
+    var disqus_url = 'http://hbase.apache.org/book';
+    var disqus_identifier = 'rowkey.design';
+
+    /* * * DON'T EDIT BELOW THIS LINE * * */
+    (function() {
+        var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
+        dsq.src = 'http://' + disqus_shortname + '.disqus.com/embed.js';
+        (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
+    })();
+</script><noscript>Please enable JavaScript to view the <a href="http://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript><a href="http://disqus.com" class="dsq-brlink">comments powered by <span class="logo-disqus">Disqus</span></a><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="number.of.cfs.html">Prev</a>&nbsp;</td><td width="20%" align="center">&nbsp;</td><td width="40%" align="right">&nbsp;<a accesskey="n" href="schema.versions.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">1.2.&nbsp;
+      On the number of column families
+  &nbsp;</td><td width="20%" align="center"><a accesskey="h" href="schema_design.html">Home</a></td><td width="40%" align="right" valign="top">&nbsp;1.4.&nbsp;
+  Number of Versions
+  </td></tr></table></div></body></html>
\ No newline at end of file

Added: hbase/hbase.apache.org/trunk/schema_design/schema.casestudies.html
URL: http://svn.apache.org/viewvc/hbase/hbase.apache.org/trunk/schema_design/schema.casestudies.html?rev=1463654&view=auto
==============================================================================
--- hbase/hbase.apache.org/trunk/schema_design/schema.casestudies.html (added)
+++ hbase/hbase.apache.org/trunk/schema_design/schema.casestudies.html Tue Apr  2 18:07:08 2013
@@ -0,0 +1,119 @@
+<html><head>
+      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+   <title>1.11.&nbsp;Schema Design Case Studies</title><link rel="stylesheet" type="text/css" href="../css/freebsd_docbook.css"><meta name="generator" content="DocBook XSL-NS Stylesheets V1.76.1"><link rel="home" href="schema_design.html" title="Chapter&nbsp;1.&nbsp;HBase and Schema Design"><link rel="up" href="schema_design.html" title="Chapter&nbsp;1.&nbsp;HBase and Schema Design"><link rel="prev" href="constraints.html" title="1.10.&nbsp;Constraints"><link rel="next" href="schema.ops.html" title="1.12.&nbsp;Operational and Performance Configuration Options"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">1.11.&nbsp;Schema Design Case Studies</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="constraints.html">Prev</a>&nbsp;</td><th width="60%" align="center">&nbsp;</th><td width="20%" align="right">&nbsp;<a accesskey="n" href="schema.ops.html">Next</a></td></tr></table><hr></div><div class="section" title="1.11.&nbsp;Schema Design Case Studies"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="schema.casestudies"></a>1.11.&nbsp;Schema Design Case Studies</h2></div></div></div><p>The following will describe some typical data ingestion use-cases with HBase, and how the rowkey design and construction
+   can be approached.  Note:  this is just an illustration of potential approaches, not an exhaustive list. 
+   Know your data, and know your processing requirements.
+  </p><p>There are 3 case studies described:    
+      </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">Log Data / Timeseries Data</li><li class="listitem">Log Data / Timeseries on Steroids</li><li class="listitem">Customer/Sales</li></ul></div><p> 
+    ... and then a brief section on "Tall/Wide/Middle" in terms of schema design approaches.
+  </p><div class="section" title="1.11.1.&nbsp;Log Data and Timeseries Data Case Study"><div class="titlepage"><div><div><h3 class="title"><a name="schema.casestudies.log-timeseries"></a>1.11.1.&nbsp;Log Data and Timeseries Data Case Study</h3></div></div></div><p>Assume that the following data elements are being collected.
+        </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">Hostname</li><li class="listitem">Timestamp</li><li class="listitem">Log event</li><li class="listitem">Value/message</li></ul></div><p>
+        We can store them in an HBase table called LOG_DATA, but what will the rowkey be?  
+       From these attributes the rowkey will be some combination of hostname, timestamp, and log-event - but what specifically?        
+      </p><div class="section" title="1.11.1.1.&nbsp;Timestamp In The Rowkey Lead Position"><div class="titlepage"><div><div><h4 class="title"><a name="schema.casestudies.log-timeseries.tslead"></a>1.11.1.1.&nbsp;Timestamp In The Rowkey Lead Position</h4></div></div></div><p>The rowkey <code class="code">[timestamp][hostname][log-event]</code> suffers from the monotonically increasing rowkey problem 
+        described in <a class="xref" href="rowkey.design.html#timeseries" title="1.3.1.&nbsp; Monotonically Increasing Row Keys/Timeseries Data">Section&nbsp;1.3.1, &#8220;
+    Monotonically Increasing Row Keys/Timeseries Data
+    &#8221;</a>.
+        </p><p>There is another pattern frequently mentioned in the dist-lists about &#8220;bucketing&#8221; timestamps, by performing a mod operation 
+        on the timestamp.  If time-oriented scans are important, this could be a useful approach.  Attention must be paid to the number
+        of buckets, because this will require the same number of scans to return results.
+</p><pre class="programlisting">
+long bucket = timestamp % numBuckets;
+</pre><p>
+        &#8230; to construct:
+</p><pre class="programlisting">
+[bucket][timestamp][hostname][log-event]
+</pre><p>        
+          As stated above, to select data for a particular timerange, a Scan will need to be performed for each bucket.  100 buckets,
+          for example, will provide a wide distribution in the keyspace but it will require 100 Scans to obtain data for a single
+          timestamp, so there are trade-offs. 
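+          </p><p>A minimal sketch of the bucketed key, and of the per-bucket scan ranges it implies, follows.  It assumes a hypothetical layout with these parts:
+          a 1-byte bucket (so at most 256 buckets), an 8-byte big-endian timestamp, and the raw UTF-8 bytes of hostname and log-event concatenated without a separator.
+          The class and method names are illustrative only.
+</p><pre class="programlisting">
+import java.nio.ByteBuffer;
+import java.nio.charset.StandardCharsets;
+
+public class BucketedLogKeys {
+  // [bucket][timestamp][hostname][log-event], with bucket = timestamp % numBuckets.
+  public static byte[] rowKey(int numBuckets, long timestamp, String hostname, String logEvent) {
+    byte bucket = (byte) (timestamp % numBuckets);
+    byte[] host = hostname.getBytes(StandardCharsets.UTF_8);
+    byte[] event = logEvent.getBytes(StandardCharsets.UTF_8);
+    return ByteBuffer.allocate(1 + 8 + host.length + event.length)
+        .put(bucket).putLong(timestamp).put(host).put(event).array();
+  }
+
+  // One [startRow, stopRow) pair per bucket: a time range needs numBuckets Scans.
+  public static byte[][][] scanRanges(int numBuckets, long startTs, long stopTs) {
+    byte[][][] ranges = new byte[numBuckets][][];
+    for (int b = 0; b != numBuckets; b++) {
+      byte[] start = ByteBuffer.allocate(9).put((byte) b).putLong(startTs).array();
+      byte[] stop = ByteBuffer.allocate(9).put((byte) b).putLong(stopTs).array();
+      ranges[b] = new byte[][] {start, stop};
+    }
+    return ranges;
+  }
+
+  public static void main(String[] args) {
+    byte[] key = rowKey(100, 1364925000123L, "host1", "ERROR");
+    System.out.println(key[0]); // 23
+    System.out.println(scanRanges(100, 1364925000000L, 1364925060000L).length); // 100
+  }
+}
+</pre><p>Each pair could then be passed to an HBase Scan as its start and stop row, and the per-bucket results merged on the client.  This is where the cost of 100 buckets shows up as 100 Scans.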
+        </p></div><div class="section" title="1.11.1.2.&nbsp;Host In The Rowkey Lead Position"><div class="titlepage"><div><div><h4 class="title"><a name="schema.casestudies.log-timeseries.hostlead"></a>1.11.1.2.&nbsp;Host In The Rowkey Lead Position</h4></div></div></div><p>The rowkey <code class="code">[hostname][log-event][timestamp]</code> is a candidate if there is a large-ish number of hosts to spread
+        the writes and reads across the keyspace.  This approach would be useful if scanning by hostname was a priority.
+        </p></div><div class="section" title="1.11.1.3.&nbsp;Timestamp, or Reverse Timestamp?"><div class="titlepage"><div><div><h4 class="title"><a name="schema.casestudies.log-timeseries.revts"></a>1.11.1.3.&nbsp;Timestamp, or Reverse Timestamp?</h4></div></div></div><p>If the most important access path is to pull most recent events, then storing the timestamps as reverse-timestamps 
+        (e.g., <code class="code">timestamp = Long.MAX_VALUE - timestamp</code>) will create the property of being able to do a Scan on
+        <code class="code">[hostname][log-event]</code> to quickly obtain the most recently captured events.
+        </p><p>Neither approach is wrong, it just depends on what is most appropriate for the situation.
+        </p></div><div class="section" title="1.11.1.4.&nbsp;Variable Length or Fixed Length Rowkeys?"><div class="titlepage"><div><div><h4 class="title"><a name="schema.casestudies.log-timeseries.varkeys"></a>1.11.1.4.&nbsp;Variable Length or Fixed Length Rowkeys?</h4></div></div></div><p>It is critical to remember that rowkeys are stamped on every column in HBase.  If the hostname is &#8220;a&#8221; and the event type
+         is &#8220;e1&#8221; then the resulting rowkey would be quite small.  However, what if the ingested hostname is
+          &#8220;myserver1.mycompany.com&#8221; and the event type is &#8220;com.package1.subpackage2.subsubpackage3.ImportantService&#8221;?  
+         </p><p>It might make sense to use some substitution in the rowkey.  There are at least two approaches:  hashed and numeric.
+         In the Hostname In The Rowkey Lead Position example, it might look like this:
+        </p><p>Composite Rowkey With Hashes:  
+           </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">[MD5 hash of hostname] = 16 bytes</li><li class="listitem">[MD5 hash of event-type] = 16 bytes</li><li class="listitem">[timestamp] = 8 bytes</li></ul></div><p>
+        </p><p>Composite Rowkey With Numeric Substitution: 
+        </p><p>For this approach another lookup table would be needed in addition to LOG_DATA, called LOG_TYPES.  
+        The rowkey of LOG_TYPES would be:
+		  </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">[type]  (e.g., byte indicating hostname vs. event-type)</li><li class="listitem">[bytes]  variable length bytes for raw hostname or event-type.</li></ul></div><p>
+        A column for this rowkey could be a long with an assigned number, which could be obtained by using an 
+		<a class="link" href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#incrementColumnValue%28byte[],%20byte[],%20byte[],%20long%29" target="_top">HBase counter</a>.
+        </p><p>So the resulting composite rowkey would be:
+		</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">[substituted long for hostname] = 8 bytes</li><li class="listitem">[substituted long for event type] = 8 bytes</li><li class="listitem">[timestamp] = 8 bytes</li></ul></div><p>
+		In either the Hash or Numeric substitution approach, the raw values for hostname and event-type can be stored as columns.
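+        </p><p>The hashed variant can be sketched with the JDK's <code class="code">MessageDigest</code>.  The class and method names are hypothetical, and the timestamp is assumed to be stored as an 8-byte big-endian long.  Using the long hostname and event type from the example above shows the fixed-length payoff:
+</p><pre class="programlisting">
+import java.nio.ByteBuffer;
+import java.nio.charset.StandardCharsets;
+import java.security.MessageDigest;
+import java.security.NoSuchAlgorithmException;
+
+public class HashedLogKeys {
+  public static byte[] md5(String s) {
+    try {
+      return MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
+    } catch (NoSuchAlgorithmException e) {
+      // Every Java platform is required to support MD5.
+      throw new IllegalStateException(e);
+    }
+  }
+
+  // [MD5 of hostname] (16) + [MD5 of event-type] (16) + [timestamp] (8) = 40 bytes,
+  // regardless of how long the raw hostname and event-type are.
+  public static byte[] rowKey(String hostname, String eventType, long timestamp) {
+    return ByteBuffer.allocate(40)
+        .put(md5(hostname))
+        .put(md5(eventType))
+        .putLong(timestamp)
+        .array();
+  }
+
+  public static void main(String[] args) {
+    String host = "myserver1.mycompany.com";
+    String event = "com.package1.subpackage2.subsubpackage3.ImportantService";
+    // Raw composite key: hostname + event-type + 8-byte timestamp.
+    System.out.println(host.length() + event.length() + 8);         // 87
+    System.out.println(rowKey(host, event, 1364925000000L).length); // 40
+  }
+}
+</pre><p>The numeric substitution variant would replace each 16-byte digest with the 8-byte long assigned through LOG_TYPES, giving a 24-byte key.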
+        </p></div></div><div class="section" title="1.11.2.&nbsp;Log Data and Timeseries Data on Steroids Case Study"><div class="titlepage"><div><div><h3 class="title"><a name="schema.casestudies.log-timeseries.log-steroids"></a>1.11.2.&nbsp;Log Data and Timeseries Data on Steroids Case Study</h3></div></div></div><p>This effectively is the OpenTSDB approach.  What OpenTSDB does is re-write data and pack rows into columns for 
+        certain time-periods.  For a detailed explanation, see:  <a class="link" href="http://opentsdb.net/schema.html" target="_top">http://opentsdb.net/schema.html</a>.
+      </p><p>But this is how the general concept works:  data is ingested, for example, in this manner&#8230;
+</p><pre class="programlisting">
+[hostname][log-event][timestamp1]
+[hostname][log-event][timestamp2]
+[hostname][log-event][timestamp3]
+</pre><p>
+       &#8230; with separate rowkeys for each detailed event, but is re-written like this&#8230; 
+       </p><p><code class="code">[hostname][log-event][timerange]</code>
+       </p><p>&#8230; and each of the above events is converted into a column stored with a time-offset relative to the beginning of the timerange 
+       (e.g., every 5 minutes).  This is obviously a very advanced processing technique, but HBase makes this possible.
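+      </p><p>The re-write step can be sketched as a mapping from each event's timestamp to a row (the start of its timerange) and a column (the offset within that range).  This is a simplified illustration, not OpenTSDB's encoding.  It assumes the 5-minute width from the example above, offsets in milliseconds, and hypothetical class and method names.
+</p><pre class="programlisting">
+import java.nio.ByteBuffer;
+import java.nio.charset.StandardCharsets;
+import java.util.Arrays;
+
+public class TimeRangeRows {
+  // Hypothetical row width: the 5-minute timerange from the example above.
+  public static final long PERIOD_MS = 5 * 60 * 1000L;
+
+  // Start of the timerange that contains the given timestamp.
+  public static long rangeStart(long timestamp) {
+    return timestamp - (timestamp % PERIOD_MS);
+  }
+
+  // Row: [hostname][log-event][timerange start].
+  public static byte[] rowKey(String hostname, String logEvent, long timestamp) {
+    byte[] prefix = (hostname + logEvent).getBytes(StandardCharsets.UTF_8);
+    return ByteBuffer.allocate(prefix.length + 8)
+        .put(prefix)
+        .putLong(rangeStart(timestamp))
+        .array();
+  }
+
+  // Column qualifier: the event's offset (in ms) from the start of its timerange.
+  public static int columnOffset(long timestamp) {
+    return (int) (timestamp - rangeStart(timestamp));
+  }
+
+  public static void main(String[] args) {
+    long t1 = 1364925000000L;
+    long t2 = t1 + 42000L;
+    // Both events land in the same row, in different columns.
+    System.out.println(Arrays.equals(rowKey("host1", "ERROR", t1), rowKey("host1", "ERROR", t2))); // true
+    System.out.println(columnOffset(t1) + " " + columnOffset(t2)); // 0 42000
+  }
+}
+</pre><p>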
+      </p></div><div class="section" title="1.11.3.&nbsp;Customer / Sales Case Study"><div class="titlepage"><div><div><h3 class="title"><a name="schema.casestudies.log-timeseries.custsales"></a>1.11.3.&nbsp;Customer / Sales Case Study</h3></div></div></div><p>Assume that HBase is used to store customer and sales information.  There are two core record-types being ingested:  
+        a Customer record type, and Sales record type.
+      </p><p>The Customer record type would include all the things that you&#8217;d typically expect:
+        </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">Customer number</li><li class="listitem">Customer name</li><li class="listitem">Address (e.g., city, state, zip)</li><li class="listitem">Phone numbers, etc.</li></ul></div><p>
+     </p><p>The Sales record type would include things like:
+        </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">Customer number</li><li class="listitem">Sales/order number</li><li class="listitem">Sales date</li><li class="listitem">A series of nested objects for shipping locations and line-items (this itself is a design case study)</li></ul></div><p>
+    </p><p>Assuming that the combination of customer number and sales order uniquely identifies an order, these two attributes will compose
+ the rowkey, and specifically a composite key such as:
+    </p><p><code class="code">[customer number][sales number]</code>
+    </p><p>
+&#8230; for a SALES table.  However, there are more design decisions to make:  are the <span class="emphasis"><em>raw</em></span> values the best choices for rowkeys?
+    </p><p>The same design questions in the Log Data use-case confront us here.  What is the keyspace of the customer number, and what is the 
+format (e.g., numeric or alphanumeric)?  As it is advantageous to use fixed-length keys in HBase, as well as keys that can support a 
+reasonable spread in the keyspace, similar options appear:
+    </p><p>Composite Rowkey With Hashes:  
+      </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">[MD5 of customer number] = 16 bytes</li><li class="listitem">[MD5 of sales number] = 16 bytes</li></ul></div><p>
+    </p><p>Composite Numeric/Hash Combo Rowkey: 
+      </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">[substituted long for customer number] = 8 bytes</li><li class="listitem">[MD5 of sales number] = 16 bytes</li></ul></div><p>
+     </p><div class="section" title="1.11.3.1.&nbsp;Single Table? Multiple Tables?"><div class="titlepage"><div><div><h4 class="title"><a name="schema.casestudies.log-timeseries.custsales.tables"></a>1.11.3.1.&nbsp;Single Table?  Multiple Tables?</h4></div></div></div><p>A traditional design approach would have separate tables for CUSTOMER and SALES.  Another option is to pack multiple 
+            record types into a single table (e.g., CUSTOMER++).            
+            </p><p>Customer Record Type Rowkey:
+              </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">[customer-id]</li><li class="listitem">[type] = type indicating &#8216;1&#8217; for customer record type</li></ul></div><p>
+            </p><p>Sales Record Type Rowkey:
+              </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">[customer-id]</li><li class="listitem">[type] = type indicating &#8216;2&#8217; for sales record type</li><li class="listitem">[sales-order]</li></ul></div><p>
+            </p><p>The advantage of this particular CUSTOMER++ approach is that it organizes many different record-types by customer-id 
+            (e.g., a single scan could get you everything about that customer).  The disadvantage is that it&#8217;s not as easy to scan for
+            a particular record-type.
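+            </p><p>A minimal sketch of the two CUSTOMER++ rowkeys follows.  It assumes a hypothetical fixed-length layout: an 8-byte numeric customer-id, a 1-byte type, and an 8-byte sales-order.  It needs Java 9 or later for <code class="code">Arrays.mismatch</code> and <code class="code">Arrays.compareUnsigned</code>.  The class and method names are illustrative.
+</p><pre class="programlisting">
+import java.nio.ByteBuffer;
+import java.util.Arrays;
+
+public class CustomerPlusPlusKeys {
+  public static final byte CUSTOMER_TYPE = 1;
+  public static final byte SALES_TYPE = 2;
+
+  // Customer record: [customer-id][type = 1].
+  public static byte[] customerKey(long customerId) {
+    return ByteBuffer.allocate(9).putLong(customerId).put(CUSTOMER_TYPE).array();
+  }
+
+  // Sales record: [customer-id][type = 2][sales-order].
+  public static byte[] salesKey(long customerId, long salesOrder) {
+    return ByteBuffer.allocate(17).putLong(customerId).put(SALES_TYPE).putLong(salesOrder).array();
+  }
+
+  // True if key begins with prefix (Arrays.mismatch returns -1 for equal arrays,
+  // or the shorter length when one array is a prefix of the other).
+  public static boolean startsWith(byte[] key, byte[] prefix) {
+    int m = Arrays.mismatch(key, prefix);
+    return m == -1 || m == prefix.length;
+  }
+
+  public static void main(String[] args) {
+    byte[] customerPrefix = ByteBuffer.allocate(8).putLong(42L).array();
+    byte[] customer = customerKey(42L);
+    byte[] sale = salesKey(42L, 1001L);
+    // Every record type for customer 42 shares the same 8-byte prefix...
+    System.out.println(startsWith(customer, customerPrefix) + " " + startsWith(sale, customerPrefix)); // true true
+    // ...and the customer record sorts ahead of that customer's sales records.
+    System.out.println(Integer.signum(Arrays.compareUnsigned(customer, sale))); // -1
+  }
+}
+</pre><p>A Scan starting at the customer-id prefix, combined with a <code class="code">PrefixFilter</code> on the same bytes, would return the customer record followed by its sales records.  Scanning for only the sales record-type across all customers has no such contiguous range, which is the disadvantage noted above.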
+            </p></div></div><div class="section" title="1.11.4.&nbsp;&#34;Tall/Wide/Middle&#34; Schema Design Smackdown"><div class="titlepage"><div><div><h3 class="title"><a name="schema.smackdown"></a>1.11.4.&nbsp;"Tall/Wide/Middle" Schema Design Smackdown</h3></div></div></div><p>This section will describe additional schema design questions that appear on the dist-list, specifically about
+	  tall and wide tables.  These are general guidelines and not laws - each application must consider its own needs.
+	  </p><div class="section" title="1.11.4.1.&nbsp;Rows vs. Versions"><div class="titlepage"><div><div><h4 class="title"><a name="schema.smackdown.rowsversions"></a>1.11.4.1.&nbsp;Rows vs. Versions</h4></div></div></div><p>A common question is whether one should prefer rows or HBase's built-in-versioning.  The context is typically where there are
+	    "a lot" of versions of a row to be retained (e.g., where it is significantly above the HBase default of 3 max versions).  The
+	    rows-approach would require storing a timestamp in some portion of the rowkey so that they would not overwrite with each successive update.
+	    </p><p>Preference:  Rows (generally speaking).
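</p><p>A minimal sketch of the rows approach, assuming a hypothetical layout of an 8-byte item-id followed by an 8-byte timestamp (the class and method names are illustrative, not an HBase API). Each update lands in its own row, so nothing is overwritten and the max-versions setting does not cap the history; for non-negative timestamps one item's rows sort oldest-first, so a scan over the item-id prefix returns its history in order.
```java
import java.nio.ByteBuffer;

// Hypothetical "rows instead of versions" key: [item-id (8 bytes)][timestamp (8 bytes)].
// Every update gets a distinct row, so earlier values are never overwritten.
final class VersionedRowKeys {
    static byte[] rowKey(long itemId, long timestampMillis) {
        return ByteBuffer.allocate(16).putLong(itemId).putLong(timestampMillis).array();
    }

    // Read the item-id back out of the first 8 bytes.
    static long itemIdOf(byte[] rowKey) {
        return ByteBuffer.wrap(rowKey).getLong(0);
    }

    // Read the timestamp back out of bytes 8..15.
    static long timestampOf(byte[] rowKey) {
        return ByteBuffer.wrap(rowKey).getLong(8);
    }
}
```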
+	    </p></div><div class="section" title="1.11.4.2.&nbsp;Rows vs. Columns"><div class="titlepage"><div><div><h4 class="title"><a name="schema.smackdown.rowscols"></a>1.11.4.2.&nbsp;Rows vs. Columns</h4></div></div></div><p>Another common question is whether one should prefer rows or columns.  The context is typically in extreme cases of wide
+	    tables, such as having 1 row with 1 million attributes, or 1 million rows with 1 column apiece.
+	    </p><p>Preference:  Rows (generally speaking).  To be clear, this guideline applies to extremely wide cases, not to the
+	    standard use-case where one needs to store a few dozen or hundred columns.  But there is also a middle path between these two
+	    options, and that is "Rows as Columns."
+	    </p></div><div class="section" title="1.11.4.3.&nbsp;Rows as Columns"><div class="titlepage"><div><div><h4 class="title"><a name="schema.smackdown.rowsascols"></a>1.11.4.3.&nbsp;Rows as Columns</h4></div></div></div><p>The middle path between Rows vs. Columns is packing data that would be a separate row into columns, for certain rows.
+	    OpenTSDB is the best example of this case where a single row represents a defined time-range, and then discrete events are treated as
+	    columns.  This approach is often more complex, and may require the additional complexity of re-writing your data, but has the
+	    advantage of being I/O efficient.  For an overview of this approach, see
+	    <a class="link" href="http://www.cloudera.com/content/cloudera/en/resources/library/hbasecon/video-hbasecon-2012-lessons-learned-from-opentsdb.html" target="_top">Lessons Learned from OpenTSDB</a>
+	    from HBaseCon2012.
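</p><p>A minimal sketch of the "rows as columns" idea, loosely following the description above and not OpenTSDB's actual key format: one row per series per hour, with each event stored as a column whose qualifier is the event's offset in seconds from the start of that hour. The one-hour bucket, the 8-byte series-id, the 2-byte qualifier, and all names are hypothetical choices for illustration.
```java
import java.nio.ByteBuffer;

// Hypothetical "rows as columns" layout: row = [series-id (8 bytes)][hour start (8 bytes)],
// column qualifier = offset in seconds from the hour start (2 bytes).
final class HourlyBucketLayout {
    static final long BUCKET_SECONDS = 3600L;

    // Start of the hour containing the event (epoch seconds, assumed non-negative).
    static long bucketStart(long epochSeconds) {
        return epochSeconds - (epochSeconds % BUCKET_SECONDS);
    }

    // All events of one series within the same hour share this row.
    static byte[] rowKey(long seriesId, long epochSeconds) {
        return ByteBuffer.allocate(16).putLong(seriesId).putLong(bucketStart(epochSeconds)).array();
    }

    // The offset is always in [0, 3599], so it fits in a 2-byte short.
    static byte[] qualifier(long epochSeconds) {
        short offset = (short) (epochSeconds - bucketStart(epochSeconds));
        return ByteBuffer.allocate(2).putShort(offset).array();
    }

    // Reverse mapping: row key plus qualifier back to the event time.
    static long eventTime(byte[] rowKey, byte[] qualifier) {
        long base = ByteBuffer.wrap(rowKey).getLong(8);
        return base + ByteBuffer.wrap(qualifier).getShort(0);
    }
}
```
Packing an hour of events into one row means a time-range read touches far fewer rows, at the cost of the more complex write and rewrite logic the text mentions.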
+	    </p></div></div></div><div id="disqus_thread"></div><script type="text/javascript">
+    var disqus_shortname = 'hbase'; // required: replace example with your forum shortname
+    var disqus_url = 'http://hbase.apache.org/book';
+    var disqus_identifier = 'schema.casestudies';
+
+    /* * * DON'T EDIT BELOW THIS LINE * * */
+    (function() {
+        var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
+        dsq.src = 'http://' + disqus_shortname + '.disqus.com/embed.js';
+        (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
+    })();
+</script><noscript>Please enable JavaScript to view the <a href="http://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript><a href="http://disqus.com" class="dsq-brlink">comments powered by <span class="logo-disqus">Disqus</span></a><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="constraints.html">Prev</a>&nbsp;</td><td width="20%" align="center">&nbsp;</td><td width="40%" align="right">&nbsp;<a accesskey="n" href="schema.ops.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">1.10.&nbsp;Constraints&nbsp;</td><td width="20%" align="center"><a accesskey="h" href="schema_design.html">Home</a></td><td width="40%" align="right" valign="top">&nbsp;1.12.&nbsp;Operational and Performance Configuration Options</td></tr></table></div></body></html>
\ No newline at end of file

Added: hbase/hbase.apache.org/trunk/schema_design/schema.joins.html
URL: http://svn.apache.org/viewvc/hbase/hbase.apache.org/trunk/schema_design/schema.joins.html?rev=1463654&view=auto
==============================================================================
--- hbase/hbase.apache.org/trunk/schema_design/schema.joins.html (added)
+++ hbase/hbase.apache.org/trunk/schema_design/schema.joins.html Tue Apr  2 18:07:08 2013
@@ -0,0 +1,17 @@
+<html><head>
+      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+   <title>1.6.&nbsp;Joins</title><link rel="stylesheet" type="text/css" href="../css/freebsd_docbook.css"><meta name="generator" content="DocBook XSL-NS Stylesheets V1.76.1"><link rel="home" href="schema_design.html" title="Chapter&nbsp;1.&nbsp;HBase and Schema Design"><link rel="up" href="schema_design.html" title="Chapter&nbsp;1.&nbsp;HBase and Schema Design"><link rel="prev" href="supported.datatypes.html" title="1.5.&nbsp; Supported Datatypes"><link rel="next" href="ttl.html" title="1.7.&nbsp;Time To Live (TTL)"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">1.6.&nbsp;Joins</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="supported.datatypes.html">Prev</a>&nbsp;</td><th width="60%" align="center">&nbsp;</th><td width="20%" align="right">&nbsp;<a accesskey="n" href="ttl.html">Next</a></td></tr></table><hr></div
 ><div class="section" title="1.6.&nbsp;Joins"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="schema.joins"></a>1.6.&nbsp;Joins</h2></div></div></div><p>If you have multiple tables, don't forget to factor in the potential for <a class="xref" href="">???</a> into the schema design.
+    </p></div><div id="disqus_thread"></div><script type="text/javascript">
+    var disqus_shortname = 'hbase'; // required: replace example with your forum shortname
+    var disqus_url = 'http://hbase.apache.org/book';
+    var disqus_identifier = 'schema.joins';
+
+    /* * * DON'T EDIT BELOW THIS LINE * * */
+    (function() {
+        var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
+        dsq.src = 'http://' + disqus_shortname + '.disqus.com/embed.js';
+        (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
+    })();
+</script><noscript>Please enable JavaScript to view the <a href="http://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript><a href="http://disqus.com" class="dsq-brlink">comments powered by <span class="logo-disqus">Disqus</span></a><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="supported.datatypes.html">Prev</a>&nbsp;</td><td width="20%" align="center">&nbsp;</td><td width="40%" align="right">&nbsp;<a accesskey="n" href="ttl.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">1.5.&nbsp;
+  Supported Datatypes
+  &nbsp;</td><td width="20%" align="center"><a accesskey="h" href="schema_design.html">Home</a></td><td width="40%" align="right" valign="top">&nbsp;1.7.&nbsp;Time To Live (TTL)</td></tr></table></div></body></html>
\ No newline at end of file

Added: hbase/hbase.apache.org/trunk/schema_design/schema.ops.html
URL: http://svn.apache.org/viewvc/hbase/hbase.apache.org/trunk/schema_design/schema.ops.html?rev=1463654&view=auto
==============================================================================
--- hbase/hbase.apache.org/trunk/schema_design/schema.ops.html (added)
+++ hbase/hbase.apache.org/trunk/schema_design/schema.ops.html Tue Apr  2 18:07:08 2013
@@ -0,0 +1,16 @@
+<html><head>
+      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+   <title>1.12.&nbsp;Operational and Performance Configuration Options</title><link rel="stylesheet" type="text/css" href="../css/freebsd_docbook.css"><meta name="generator" content="DocBook XSL-NS Stylesheets V1.76.1"><link rel="home" href="schema_design.html" title="Chapter&nbsp;1.&nbsp;HBase and Schema Design"><link rel="up" href="schema_design.html" title="Chapter&nbsp;1.&nbsp;HBase and Schema Design"><link rel="prev" href="schema.casestudies.html" title="1.11.&nbsp;Schema Design Case Studies"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">1.12.&nbsp;Operational and Performance Configuration Options</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="schema.casestudies.html">Prev</a>&nbsp;</td><th width="60%" align="center">&nbsp;</th><td width="20%" align="right">&nbsp;</td></tr></table><hr></div><div class="section" title="1.12.&nbsp;Operational and Performance Configuration Options"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="schema.ops"></a>1.12.&nbsp;Operational and Performance Configuration Options</h2></div></div></div><p>See the Performance section <a class="xref" href="">???</a> for more information on operational and performance
+    schema design options, such as Bloom Filters, table-configured region sizes, compression, and block sizes.
+    </p></div><div id="disqus_thread"></div><script type="text/javascript">
+    var disqus_shortname = 'hbase'; // required: replace example with your forum shortname
+    var disqus_url = 'http://hbase.apache.org/book';
+    var disqus_identifier = 'schema.ops';
+
+    /* * * DON'T EDIT BELOW THIS LINE * * */
+    (function() {
+        var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
+        dsq.src = 'http://' + disqus_shortname + '.disqus.com/embed.js';
+        (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
+    })();
+</script><noscript>Please enable JavaScript to view the <a href="http://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript><a href="http://disqus.com" class="dsq-brlink">comments powered by <span class="logo-disqus">Disqus</span></a><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="schema.casestudies.html">Prev</a>&nbsp;</td><td width="20%" align="center">&nbsp;</td><td width="40%" align="right">&nbsp;</td></tr><tr><td width="40%" align="left" valign="top">1.11.&nbsp;Schema Design Case Studies&nbsp;</td><td width="20%" align="center"><a accesskey="h" href="schema_design.html">Home</a></td><td width="40%" align="right" valign="top">&nbsp;</td></tr></table></div></body></html>
\ No newline at end of file

Added: hbase/hbase.apache.org/trunk/schema_design/schema.versions.html
URL: http://svn.apache.org/viewvc/hbase/hbase.apache.org/trunk/schema_design/schema.versions.html?rev=1463654&view=auto
==============================================================================
--- hbase/hbase.apache.org/trunk/schema_design/schema.versions.html (added)
+++ hbase/hbase.apache.org/trunk/schema_design/schema.versions.html Tue Apr  2 18:07:08 2013
@@ -0,0 +1,40 @@
+<html><head>
+      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+   <title>1.4.&nbsp; Number of Versions</title><link rel="stylesheet" type="text/css" href="../css/freebsd_docbook.css"><meta name="generator" content="DocBook XSL-NS Stylesheets V1.76.1"><link rel="home" href="schema_design.html" title="Chapter&nbsp;1.&nbsp;HBase and Schema Design"><link rel="up" href="schema_design.html" title="Chapter&nbsp;1.&nbsp;HBase and Schema Design"><link rel="prev" href="rowkey.design.html" title="1.3.&nbsp;Rowkey Design"><link rel="next" href="supported.datatypes.html" title="1.5.&nbsp; Supported Datatypes"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">1.4.&nbsp;
+  Number of Versions
+  </th></tr><tr><td width="20%" align="left"><a accesskey="p" href="rowkey.design.html">Prev</a>&nbsp;</td><th width="60%" align="center">&nbsp;</th><td width="20%" align="right">&nbsp;<a accesskey="n" href="supported.datatypes.html">Next</a></td></tr></table><hr></div><div class="section" title="1.4.&nbsp; Number of Versions"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="schema.versions"></a>1.4.&nbsp;
+  Number of Versions
+  </h2></div></div></div><div class="section" title="1.4.1.&nbsp;Maximum Number of Versions"><div class="titlepage"><div><div><h3 class="title"><a name="schema.versions.max"></a>1.4.1.&nbsp;Maximum Number of Versions</h3></div></div></div><p>The maximum number of row versions to store is configured per column
+      family via <a class="link" href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html" target="_top">HColumnDescriptor</a>.
+      The default for max versions is 3.
+      This is an important parameter because, as described in the <a class="xref" href="">???</a>
+      section, HBase does <span class="emphasis"><em>not</em></span> overwrite row values, but rather
+      stores different values per row by time (and qualifier).  Excess versions are removed during major
+      compactions.  The number of max versions may need to be increased or decreased depending on application needs.
+      </p><p>Setting the number of max versions to an exceedingly high level (e.g., hundreds or more) is not recommended unless those old values are
+      very dear to you, because this will greatly increase StoreFile size.
+      </p></div><div class="section" title="1.4.2.&nbsp; Minimum Number of Versions"><div class="titlepage"><div><div><h3 class="title"><a name="schema.minversions"></a>1.4.2.&nbsp;
+    Minimum Number of Versions
+    </h3></div></div></div><p>Like the maximum number of row versions, the minimum number of row versions to keep is configured per column
+      family via <a class="link" href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html" target="_top">HColumnDescriptor</a>.
+      The default for min versions is 0, which means the feature is disabled.
+      The minimum number of row versions parameter is used together with the time-to-live parameter and can be combined with the
+      number of row versions parameter to allow configurations such as
+      "keep the last T minutes worth of data, at most N versions, <span class="emphasis"><em>but keep at least M versions around</em></span>"
+      (where M is the value for minimum number of row versions, M&lt;N).
+      This parameter should only be set when time-to-live is enabled for a column family and must be less than the
+      number of row versions.
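</p><p>To illustrate the "keep the last T minutes worth of data, at most N versions, but keep at least M versions around" rule, here is a simplified model of which versions of a single cell would survive. It is a hypothetical sketch of the described semantics, not HBase's actual compaction or read-path code; the class and parameter names are assumptions, and versions are passed newest first.
```java
// Simplified model (not HBase's implementation) of version retention for one cell
// with max versions N, min versions M and a TTL of T milliseconds.
// Timestamps are given newest first.
final class VersionRetentionModel {
    static boolean[] survivors(long[] timestampsNewestFirst, long nowMillis,
                               long ttlMillis, int maxVersions, int minVersions) {
        boolean[] keep = new boolean[timestampsNewestFirst.length];
        int i = 0;
        for (long ts : timestampsNewestFirst) {
            if (i >= maxVersions) {
                keep[i] = false;                        // beyond "at most N versions"
            } else if (minVersions > i) {
                keep[i] = true;                         // "keep at least M versions", even if expired
            } else {
                keep[i] = ttlMillis >= nowMillis - ts;  // within "the last T" window
            }
            i++;
        }
        return keep;
    }
}
```
For example, with versions at 1000, 900, 500 and 100 ms, now = 1000, a 200 ms TTL, N = 3 and M = 1, this model keeps the first two versions and drops the rest.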
+    </p></div></div><div id="disqus_thread"></div><script type="text/javascript">
+    var disqus_shortname = 'hbase'; // required: replace example with your forum shortname
+    var disqus_url = 'http://hbase.apache.org/book';
+    var disqus_identifier = 'schema.versions';
+
+    /* * * DON'T EDIT BELOW THIS LINE * * */
+    (function() {
+        var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
+        dsq.src = 'http://' + disqus_shortname + '.disqus.com/embed.js';
+        (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
+    })();
+</script><noscript>Please enable JavaScript to view the <a href="http://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript><a href="http://disqus.com" class="dsq-brlink">comments powered by <span class="logo-disqus">Disqus</span></a><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="rowkey.design.html">Prev</a>&nbsp;</td><td width="20%" align="center">&nbsp;</td><td width="40%" align="right">&nbsp;<a accesskey="n" href="supported.datatypes.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">1.3.&nbsp;Rowkey Design&nbsp;</td><td width="20%" align="center"><a accesskey="h" href="schema_design.html">Home</a></td><td width="40%" align="right" valign="top">&nbsp;1.5.&nbsp;
+  Supported Datatypes
+  </td></tr></table></div></body></html>
\ No newline at end of file

Added: hbase/hbase.apache.org/trunk/schema_design/schema_design.html
URL: http://svn.apache.org/viewvc/hbase/hbase.apache.org/trunk/schema_design/schema_design.html?rev=1463654&view=auto
==============================================================================
--- hbase/hbase.apache.org/trunk/schema_design/schema_design.html (added)
+++ hbase/hbase.apache.org/trunk/schema_design/schema_design.html Tue Apr  2 18:07:08 2013
@@ -0,0 +1,72 @@
+<html><head>
+      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+   <title>Chapter&nbsp;1.&nbsp;HBase and Schema Design</title><link rel="stylesheet" type="text/css" href="../css/freebsd_docbook.css"><meta name="generator" content="DocBook XSL-NS Stylesheets V1.76.1"><link rel="home" href="schema_design.html" title="Chapter&nbsp;1.&nbsp;HBase and Schema Design"><link rel="next" href="number.of.cfs.html" title="1.2.&nbsp; On the number of column families"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Chapter&nbsp;1.&nbsp;HBase and Schema Design</th></tr><tr><td width="20%" align="left">&nbsp;</td><th width="60%" align="center">&nbsp;</th><td width="20%" align="right">&nbsp;<a accesskey="n" href="number.of.cfs.html">Next</a></td></tr></table><hr></div><div class="chapter" title="Chapter&nbsp;1.&nbsp;HBase and Schema Design"><div class="titlepage"><div><div><h2 class="title"><a name="schema"></a
 >Chapter&nbsp;1.&nbsp;HBase and Schema Design</h2></div></div></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="section"><a href="schema_design.html#schema.creation">1.1. 
+      Schema Creation
+  </a></span></dt><dd><dl><dt><span class="section"><a href="schema_design.html#schema.updates">1.1.1. Schema Updates</a></span></dt></dl></dd><dt><span class="section"><a href="number.of.cfs.html">1.2. 
+      On the number of column families
+  </a></span></dt><dd><dl><dt><span class="section"><a href="number.of.cfs.html#number.of.cfs.card">1.2.1. Cardinality of ColumnFamilies</a></span></dt></dl></dd><dt><span class="section"><a href="rowkey.design.html">1.3. Rowkey Design</a></span></dt><dd><dl><dt><span class="section"><a href="rowkey.design.html#timeseries">1.3.1. 
+    Monotonically Increasing Row Keys/Timeseries Data
+    </a></span></dt><dt><span class="section"><a href="rowkey.design.html#keysize">1.3.2. Try to minimize row and column sizes</a></span></dt><dt><span class="section"><a href="rowkey.design.html#reverse.timestamp">1.3.3. Reverse Timestamps</a></span></dt><dt><span class="section"><a href="rowkey.design.html#rowkey.scope">1.3.4. Rowkeys and ColumnFamilies</a></span></dt><dt><span class="section"><a href="rowkey.design.html#changing.rowkeys">1.3.5. Immutability of Rowkeys</a></span></dt><dt><span class="section"><a href="rowkey.design.html#rowkey.regionsplits">1.3.6. Relationship Between RowKeys and Region Splits</a></span></dt></dl></dd><dt><span class="section"><a href="schema.versions.html">1.4. 
+  Number of Versions
+  </a></span></dt><dd><dl><dt><span class="section"><a href="schema.versions.html#schema.versions.max">1.4.1. Maximum Number of Versions</a></span></dt><dt><span class="section"><a href="schema.versions.html#schema.minversions">1.4.2. 
+    Minimum Number of Versions
+    </a></span></dt></dl></dd><dt><span class="section"><a href="supported.datatypes.html">1.5. 
+  Supported Datatypes
+  </a></span></dt><dd><dl><dt><span class="section"><a href="supported.datatypes.html#counters">1.5.1. Counters</a></span></dt></dl></dd><dt><span class="section"><a href="schema.joins.html">1.6. Joins</a></span></dt><dt><span class="section"><a href="ttl.html">1.7. Time To Live (TTL)</a></span></dt><dt><span class="section"><a href="cf.keep.deleted.html">1.8. 
+  Keeping Deleted Cells
+  </a></span></dt><dt><span class="section"><a href="secondary.indexes.html">1.9. 
+  Secondary Indexes and Alternate Query Paths
+  </a></span></dt><dd><dl><dt><span class="section"><a href="secondary.indexes.html#secondary.indexes.filter">1.9.1. 
+       Filter Query
+      </a></span></dt><dt><span class="section"><a href="secondary.indexes.html#secondary.indexes.periodic">1.9.2. 
+       Periodic-Update Secondary Index
+      </a></span></dt><dt><span class="section"><a href="secondary.indexes.html#secondary.indexes.dualwrite">1.9.3. 
+       Dual-Write Secondary Index
+      </a></span></dt><dt><span class="section"><a href="secondary.indexes.html#secondary.indexes.summary">1.9.4. 
+       Summary Tables
+      </a></span></dt><dt><span class="section"><a href="secondary.indexes.html#secondary.indexes.coproc">1.9.5. 
+       Coprocessor Secondary Index
+      </a></span></dt></dl></dd><dt><span class="section"><a href="constraints.html">1.10. Constraints</a></span></dt><dt><span class="section"><a href="schema.casestudies.html">1.11. Schema Design Case Studies</a></span></dt><dd><dl><dt><span class="section"><a href="schema.casestudies.html#schema.casestudies.log-timeseries">1.11.1. Log Data and Timeseries Data Case Study</a></span></dt><dt><span class="section"><a href="schema.casestudies.html#schema.casestudies.log-timeseries.log-steroids">1.11.2. Log Data and Timeseries Data on Steroids Case Study</a></span></dt><dt><span class="section"><a href="schema.casestudies.html#schema.casestudies.log-timeseries.custsales">1.11.3. Customer / Sales Case Study</a></span></dt><dt><span class="section"><a href="schema.casestudies.html#schema.smackdown">1.11.4. "Tall/Wide/Middle" Schema Design Smackdown</a></span></dt></dl></dd><dt><span class="section"><a href="schema.ops.html">1.12. Operational and Performance Configuration Options</a></span></dt></dl></div><p>A good general introduction to the strengths and weaknesses of modelling on
+          the various non-RDBMS datastores is Ian Varley's Master's thesis,
+          <a class="link" href="http://ianvarley.com/UT/MR/Varley_MastersReport_Full_2009-08-07.pdf" target="_top">No Relation: The Mixed Blessings of Non-Relational Databases</a>.
+          Recommended.  Also, read <a class="xref" href="">???</a> for how HBase stores data internally, and see
+          <a class="xref" href="schema.casestudies.html" title="1.11.&nbsp;Schema Design Case Studies">Section&nbsp;1.11, &#8220;Schema Design Case Studies&#8221;</a>.
+      </p><div class="section" title="1.1.&nbsp; Schema Creation"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="schema.creation"></a>1.1.&nbsp;
+      Schema Creation
+  </h2></div></div></div><p>HBase schemas can be created or updated with <a class="xref" href="">???</a>
+      or by using <a class="link" href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html" target="_top">HBaseAdmin</a> in the Java API.
+      </p><p>Tables must be disabled when making ColumnFamily modifications, for example:
+      </p><pre class="programlisting">
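+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.HBaseConfiguration;
+import org.apache.hadoop.hbase.HColumnDescriptor;
+import org.apache.hadoop.hbase.client.HBaseAdmin;
+
+Configuration conf = HBaseConfiguration.create();
+HBaseAdmin admin = new HBaseAdmin(conf);
+String table = "myTable";
+
+admin.disableTable(table);         // disable first; 0.90.x requires this for ColumnFamily changes
+
+HColumnDescriptor cf1 = ...;
+admin.addColumn(table, cf1);      // adding new ColumnFamily
+HColumnDescriptor cf2 = ...;
+admin.modifyColumn(table, cf2);    // modifying existing ColumnFamily
+
+admin.enableTable(table);          // make the table available again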
+      </pre><p>See <a class="xref" href="">???</a> for more information about configuring client connections.
+      </p>
+      <p>Note:  online schema changes are supported in the 0.92.x codebase, but the 0.90.x codebase requires the table
+      to be disabled.
+      </p><div class="section" title="1.1.1.&nbsp;Schema Updates"><div class="titlepage"><div><div><h3 class="title"><a name="schema.updates"></a>1.1.1.&nbsp;Schema Updates</h3></div></div></div><p>When changes are made to either Tables or ColumnFamilies (e.g., region size, block size), these changes
+      take effect the next time there is a major compaction and the StoreFiles get re-written.
+      </p><p>See <a class="xref" href="">???</a> for more information on StoreFiles.
+      </p></div></div></div><div id="disqus_thread"></div><script type="text/javascript">
+    var disqus_shortname = 'hbase'; // required: replace example with your forum shortname
+    var disqus_url = 'http://hbase.apache.org/book';
+    var disqus_identifier = 'schema';
+
+    /* * * DON'T EDIT BELOW THIS LINE * * */
+    (function() {
+        var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
+        dsq.src = 'http://' + disqus_shortname + '.disqus.com/embed.js';
+        (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
+    })();
+</script><noscript>Please enable JavaScript to view the <a href="http://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript><a href="http://disqus.com" class="dsq-brlink">comments powered by <span class="logo-disqus">Disqus</span></a><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left">&nbsp;</td><td width="20%" align="center">&nbsp;</td><td width="40%" align="right">&nbsp;<a accesskey="n" href="number.of.cfs.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">&nbsp;</td><td width="20%" align="center">&nbsp;</td><td width="40%" align="right" valign="top">&nbsp;1.2.&nbsp;
+      On the number of column families
+  </td></tr></table></div></body></html>
\ No newline at end of file

Added: hbase/hbase.apache.org/trunk/schema_design/secondary.indexes.html
URL: http://svn.apache.org/viewvc/hbase/hbase.apache.org/trunk/schema_design/secondary.indexes.html?rev=1463654&view=auto
==============================================================================
--- hbase/hbase.apache.org/trunk/schema_design/secondary.indexes.html (added)
+++ hbase/hbase.apache.org/trunk/schema_design/secondary.indexes.html Tue Apr  2 18:07:08 2013
@@ -0,0 +1,49 @@
+<html><head>
+      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+   <title>1.9.&nbsp; Secondary Indexes and Alternate Query Paths</title><link rel="stylesheet" type="text/css" href="../css/freebsd_docbook.css"><meta name="generator" content="DocBook XSL-NS Stylesheets V1.76.1"><link rel="home" href="schema_design.html" title="Chapter&nbsp;1.&nbsp;HBase and Schema Design"><link rel="up" href="schema_design.html" title="Chapter&nbsp;1.&nbsp;HBase and Schema Design"><link rel="prev" href="cf.keep.deleted.html" title="1.8.&nbsp; Keeping Deleted Cells"><link rel="next" href="constraints.html" title="1.10.&nbsp;Constraints"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">1.9.&nbsp;
+  Secondary Indexes and Alternate Query Paths
+  </th></tr><tr><td width="20%" align="left"><a accesskey="p" href="cf.keep.deleted.html">Prev</a>&nbsp;</td><th width="60%" align="center">&nbsp;</th><td width="20%" align="right">&nbsp;<a accesskey="n" href="constraints.html">Next</a></td></tr></table><hr></div><div class="section" title="1.9.&nbsp; Secondary Indexes and Alternate Query Paths"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="secondary.indexes"></a>1.9.&nbsp;
+  Secondary Indexes and Alternate Query Paths
+  </h2></div></div></div><p>This section could also be titled "what if my table rowkey looks like <span class="emphasis"><em>this</em></span> but I also want to query my table like <span class="emphasis"><em>that</em></span>."
+  A common example on the dist-list is where a row-key is of the format "user-timestamp" but there are reporting requirements on activity across users for certain
+  time ranges.  Thus, selecting by user is easy because it is in the lead position of the key, but time is not.
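</p><p>To make the "user-timestamp" example concrete, a minimal sketch assuming hypothetical fixed-width keys: the data table keyed [user-id][timestamp] and an alternate index table keyed [timestamp][user-id]. In the data table one user's rows are contiguous, but a single time range is scattered across every user's key range; in the index table the same range becomes one contiguous scan. All names are illustrative, and keeping the two tables in sync is the work the approaches below trade off.
```java
import java.nio.ByteBuffer;

// Hypothetical fixed-width keys for the "user-timestamp" example.
// Data table:  [user-id (8 bytes)][timestamp (8 bytes)]  - fast per-user scans.
// Index table: [timestamp (8 bytes)][user-id (8 bytes)]  - fast time-range scans.
// Non-negative values are assumed so big-endian bytes sort numerically.
final class UserActivityKeys {
    static byte[] dataKey(long userId, long timestamp) {
        return ByteBuffer.allocate(16).putLong(userId).putLong(timestamp).array();
    }

    static byte[] indexKey(long timestamp, long userId) {
        return ByteBuffer.allocate(16).putLong(timestamp).putLong(userId).array();
    }

    // Scan bounds over the index table covering [startTs, endTs) for all users.
    static byte[] indexStartRow(long startTs) {
        return indexKey(startTs, 0L);
    }

    static byte[] indexStopRow(long endTs) {
        return indexKey(endTs, 0L);
    }

    // Recover the data-table row that an index row points at.
    static byte[] dataKeyFromIndexKey(byte[] indexKey) {
        ByteBuffer b = ByteBuffer.wrap(indexKey);
        long ts = b.getLong(0);
        long user = b.getLong(8);
        return dataKey(user, ts);
    }
}
```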
+  </p><p>There is no single answer on the best way to handle this because it depends on...
+   </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">Number of users</li><li class="listitem">Data size and data arrival rate</li><li class="listitem">Flexibility of reporting requirements (e.g., completely ad-hoc date selection vs. pre-configured ranges) </li><li class="listitem">Desired execution speed of query (e.g., 90 seconds may be reasonable to some for an ad-hoc report, whereas it may be too long for others) </li></ul></div><p>
+   ... and solutions are also influenced by the size of the cluster and how much processing power you have to throw at the solution.
+   Common techniques are in sub-sections below.  This is a comprehensive, but not exhaustive, list of approaches.
+  </p><p>It should not be a surprise that secondary indexes require additional cluster space and processing.
+  This is precisely what happens in an RDBMS, because the act of creating an alternate index requires both space and processing cycles to update.  RDBMS products
+  are more advanced in this regard, handling alternate index management out of the box.  However, HBase scales better at larger data volumes, so this is a feature trade-off.
+  </p><p>Pay attention to <a class="xref" href="">???</a> when implementing any of these approaches.</p><p>Additionally, see the David Butler response in this dist-list thread: <a class="link" href="http://search-hadoop.com/m/nvbiBp2TDP/Stargate%252Bhbase&amp;subj=Stargate+hbase" target="_top">HBase, mail # user - Stargate+hbase</a>.
+   </p><div class="section" title="1.9.1.&nbsp; Filter Query"><div class="titlepage"><div><div><h3 class="title"><a name="secondary.indexes.filter"></a>1.9.1.&nbsp;
+       Filter Query
+      </h3></div></div></div><p>Depending on the case, it may be appropriate to use <a class="xref" href="">???</a>.  In this case, no secondary index is created.
+      However, don't try a full-scan on a large table like this from an application (i.e., single-threaded client).
+      </p></div><div class="section" title="1.9.2.&nbsp; Periodic-Update Secondary Index"><div class="titlepage"><div><div><h3 class="title"><a name="secondary.indexes.periodic"></a>1.9.2.&nbsp;
+       Periodic-Update Secondary Index
+      </h3></div></div></div><p>A secondary index could be created in another table which is periodically updated via a MapReduce job.  The job could be executed intra-day, but depending on
+      load-strategy it could still potentially be out of sync with the main data table.</p><p>See <a class="xref" href="">???</a> for more information.</p></div><div class="section" title="1.9.3.&nbsp; Dual-Write Secondary Index"><div class="titlepage"><div><div><h3 class="title"><a name="secondary.indexes.dualwrite"></a>1.9.3.&nbsp;
+       Dual-Write Secondary Index
+      </h3></div></div></div><p>Another strategy is to build the secondary index while publishing data to the cluster (e.g., write to data table, write to index table).
+      If this approach is taken after a data table already exists, then the secondary index will need to be bootstrapped with a MapReduce job (see <a class="xref" href="secondary.indexes.html#secondary.indexes.periodic" title="1.9.2.&nbsp; Periodic-Update Secondary Index">Section&nbsp;1.9.2, &#8220;
+       Periodic-Update Secondary Index
+      &#8221;</a>).</p></div><div class="section" title="1.9.4.&nbsp; Summary Tables"><div class="titlepage"><div><div><h3 class="title"><a name="secondary.indexes.summary"></a>1.9.4.&nbsp;
+       Summary Tables
+      </h3></div></div></div><p>Where time-ranges are very wide (e.g., year-long report) and where the data is voluminous, summary tables are a common approach.
+      These would be generated with MapReduce jobs into another table.</p><p>See <a class="xref" href="">???</a> for more information.</p></div><div class="section" title="1.9.5.&nbsp; Coprocessor Secondary Index"><div class="titlepage"><div><div><h3 class="title"><a name="secondary.indexes.coproc"></a>1.9.5.&nbsp;
+       Coprocessor Secondary Index
+      </h3></div></div></div><p>Coprocessors act like RDBMS triggers.  These were added in 0.92.  For more information, see <a class="xref" href="">???</a>.
+      </p></div></div><div id="disqus_thread"></div><script type="text/javascript">
+    var disqus_shortname = 'hbase'; // required: replace example with your forum shortname
+    var disqus_url = 'http://hbase.apache.org/book';
+    var disqus_identifier = 'secondary.indexes';
+
+    /* * * DON'T EDIT BELOW THIS LINE * * */
+    (function() {
+        var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
+        dsq.src = 'http://' + disqus_shortname + '.disqus.com/embed.js';
+        (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
+    })();
+</script><noscript>Please enable JavaScript to view the <a href="http://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript><a href="http://disqus.com" class="dsq-brlink">comments powered by <span class="logo-disqus">Disqus</span></a><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="cf.keep.deleted.html">Prev</a>&nbsp;</td><td width="20%" align="center">&nbsp;</td><td width="40%" align="right">&nbsp;<a accesskey="n" href="constraints.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">1.8.&nbsp;
+  Keeping Deleted Cells
+  &nbsp;</td><td width="20%" align="center"><a accesskey="h" href="schema_design.html">Home</a></td><td width="40%" align="right" valign="top">&nbsp;1.10.&nbsp;Constraints</td></tr></table></div></body></html>
\ No newline at end of file
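The dual-write strategy from secondary.indexes.html (section 1.9.3) can be sketched in plain Java. This is a minimal in-memory model under stated assumptions, not HBase client code: the class DualWriteIndexSketch and its methods are hypothetical, two java.util.TreeMap instances stand in for the data table and the index table (HBase also keeps rows sorted by key), and indexed values are assumed never to contain the '|' separator. Against a real cluster each write would be one Put per table; HBase does not make the two Puts atomic, so a failure between them can still leave the index out of sync.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical in-memory model of a dual-write secondary index (not HBase API).
// Sorted maps stand in for the data table and the index table.
final class DualWriteIndexSketch {
    // Data table: row key -> value of the indexed column.
    private final TreeMap<String, String> dataTable = new TreeMap<>();
    // Index table: "value|rowKey" -> rowKey. Embedding the row key in the index
    // key keeps one entry per data row even when many rows share a value.
    // Assumption: indexed values never contain the '|' separator.
    private final TreeMap<String, String> indexTable = new TreeMap<>();

    // Dual write: update the data table first, then the index table.
    void put(String rowKey, String value) {
        String previous = dataTable.put(rowKey, value);
        if (previous != null) {
            // Drop the stale index entry left behind by the old value.
            indexTable.remove(previous + "|" + rowKey);
        }
        indexTable.put(value + "|" + rowKey, rowKey);
    }

    // Find row keys by value with a range scan over the index table.
    // '}' sorts immediately after '|' in ASCII, so [value|, value}) covers the prefix.
    List<String> findByValue(String value) {
        return new ArrayList<>(indexTable.subMap(value + "|", value + "}").values());
    }

    public static void main(String[] args) {
        DualWriteIndexSketch idx = new DualWriteIndexSketch();
        idx.put("row1", "red");
        idx.put("row2", "blue");
        idx.put("row3", "red");
        System.out.println(idx.findByValue("red"));  // [row1, row3]
        idx.put("row1", "blue");
        System.out.println(idx.findByValue("red"));  // [row3]
        System.out.println(idx.findByValue("blue")); // [row1, row2]
    }
}
```

Because the row key is part of the index key, each index update touches exactly one index row, and a lookup by value becomes a short range scan rather than a full scan of the data table.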

Added: hbase/hbase.apache.org/trunk/schema_design/supported.datatypes.html
URL: http://svn.apache.org/viewvc/hbase/hbase.apache.org/trunk/schema_design/supported.datatypes.html?rev=1463654&view=auto
==============================================================================
--- hbase/hbase.apache.org/trunk/schema_design/supported.datatypes.html (added)
+++ hbase/hbase.apache.org/trunk/schema_design/supported.datatypes.html Tue Apr  2 18:07:08 2013
@@ -0,0 +1,30 @@
+<html><head>
+      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+   <title>1.5.&nbsp; Supported Datatypes</title><link rel="stylesheet" type="text/css" href="../css/freebsd_docbook.css"><meta name="generator" content="DocBook XSL-NS Stylesheets V1.76.1"><link rel="home" href="schema_design.html" title="Chapter&nbsp;1.&nbsp;HBase and Schema Design"><link rel="up" href="schema_design.html" title="Chapter&nbsp;1.&nbsp;HBase and Schema Design"><link rel="prev" href="schema.versions.html" title="1.4.&nbsp; Number of Versions"><link rel="next" href="schema.joins.html" title="1.6.&nbsp;Joins"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">1.5.&nbsp;
+  Supported Datatypes
+  </th></tr><tr><td width="20%" align="left"><a accesskey="p" href="schema.versions.html">Prev</a>&nbsp;</td><th width="60%" align="center">&nbsp;</th><td width="20%" align="right">&nbsp;<a accesskey="n" href="schema.joins.html">Next</a></td></tr></table><hr></div><div class="section" title="1.5.&nbsp; Supported Datatypes"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="supported.datatypes"></a>1.5.&nbsp;
+  Supported Datatypes
+  </h2></div></div></div><p>HBase supports a "bytes-in/bytes-out" interface via <a class="link" href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html" target="_top">Put</a> and
+  <a class="link" href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html" target="_top">Result</a>, so anything that can be
+  converted to an array of bytes can be stored as a value.  Input could be strings, numbers, complex objects, or even images, as long as they can be rendered as bytes.
+  </p><p>There are practical limits to the size of values (e.g., storing 10-50MB objects in HBase would probably be too much to ask);
+  search the mailing list for conversations on this topic. All rows in HBase conform to the <a class="xref" href="">???</a>, and
+  that includes versioning.  Take that into consideration when making your design, along with the block size of the ColumnFamily.
+  </p><div class="section" title="1.5.1.&nbsp;Counters"><div class="titlepage"><div><div><h3 class="title"><a name="counters"></a>1.5.1.&nbsp;Counters</h3></div></div></div><p>
+      One supported datatype that deserves special mention is "counters" (i.e., the ability to atomically increment numbers).  See
+      <a class="link" href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#increment%28org.apache.hadoop.hbase.client.Increment%29" target="_top">Increment</a> in HTable.
+      </p><p>Synchronization on counters is done on the RegionServer, not in the client.
+      </p></div></div><div id="disqus_thread"></div><script type="text/javascript">
+    var disqus_shortname = 'hbase'; // required: replace example with your forum shortname
+    var disqus_url = 'http://hbase.apache.org/book';
+    var disqus_identifier = 'supported.datatypes';
+
+    /* * * DON'T EDIT BELOW THIS LINE * * */
+    (function() {
+        var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
+        dsq.src = 'http://' + disqus_shortname + '.disqus.com/embed.js';
+        (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
+    })();
+</script><noscript>Please enable JavaScript to view the <a href="http://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript><a href="http://disqus.com" class="dsq-brlink">comments powered by <span class="logo-disqus">Disqus</span></a><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="schema.versions.html">Prev</a>&nbsp;</td><td width="20%" align="center">&nbsp;</td><td width="40%" align="right">&nbsp;<a accesskey="n" href="schema.joins.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">1.4.&nbsp;
+  Number of Versions
+  &nbsp;</td><td width="20%" align="center"><a accesskey="h" href="schema_design.html">Home</a></td><td width="40%" align="right" valign="top">&nbsp;1.6.&nbsp;Joins</td></tr></table></div></body></html>
\ No newline at end of file
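Because Put and Result deal only in byte arrays (supported.datatypes.html, section 1.5), the client chooses how values are encoded. The sketch below uses only the JDK: java.nio.ByteBuffer writes a long as 8 big-endian bytes, the same layout produced by HBase's Bytes.toBytes(long) and the 64-bit width HBase expects for counter cells. The class CounterCellSketch is hypothetical, and its synchronized increment method merely stands in for the RegionServer applying an Increment; it illustrates why performing the read-modify-write in one place on the server avoids the lost updates that independent client-side read-modify-write cycles can cause.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one counter cell stored as raw bytes (not HBase API).
final class CounterCellSketch {
    // Encode a long as 8 big-endian bytes (ByteBuffer's default byte order).
    static byte[] toBytes(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // Strings are stored as their UTF-8 bytes.
    static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode an 8-byte big-endian value; counter cells must be exactly 64 bits wide.
    static long toLong(byte[] b) {
        if (b.length != Long.BYTES) {
            throw new IllegalArgumentException("counter cell must be 8 bytes, got " + b.length);
        }
        return ByteBuffer.wrap(b).getLong();
    }

    // The "server-side" cell value, starting at zero.
    private byte[] cell = toBytes(0L);

    // Stand-in for the RegionServer applying an Increment: the read, add and
    // write happen under one lock, so concurrent callers cannot lose updates.
    synchronized long increment(long amount) {
        long next = toLong(cell) + amount;
        cell = toBytes(next);
        return next;
    }

    synchronized long get() {
        return toLong(cell);
    }

    public static void main(String[] args) throws InterruptedException {
        CounterCellSketch counter = new CounterCellSketch();
        List<Thread> clients = new ArrayList<>();
        for (int t = 0; t < 4; t++) {
            Thread client = new Thread(() -> {
                for (int i = 0; i < 1000; i++) {
                    counter.increment(1);
                }
            });
            clients.add(client);
            client.start();
        }
        for (Thread client : clients) {
            client.join();
        }
        System.out.println(counter.get()); // 4000
    }
}
```

Running main starts four threads that each increment 1000 times and prints 4000. If each thread instead read the value, added one and wrote it back on its own, concurrent cycles could overwrite each other and the final count could come out lower.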

Added: hbase/hbase.apache.org/trunk/schema_design/ttl.html
URL: http://svn.apache.org/viewvc/hbase/hbase.apache.org/trunk/schema_design/ttl.html?rev=1463654&view=auto
==============================================================================
--- hbase/hbase.apache.org/trunk/schema_design/ttl.html (added)
+++ hbase/hbase.apache.org/trunk/schema_design/ttl.html Tue Apr  2 18:07:08 2013
@@ -0,0 +1,19 @@
+<html><head>
+      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+   <title>1.7.&nbsp;Time To Live (TTL)</title><link rel="stylesheet" type="text/css" href="../css/freebsd_docbook.css"><meta name="generator" content="DocBook XSL-NS Stylesheets V1.76.1"><link rel="home" href="schema_design.html" title="Chapter&nbsp;1.&nbsp;HBase and Schema Design"><link rel="up" href="schema_design.html" title="Chapter&nbsp;1.&nbsp;HBase and Schema Design"><link rel="prev" href="schema.joins.html" title="1.6.&nbsp;Joins"><link rel="next" href="cf.keep.deleted.html" title="1.8.&nbsp; Keeping Deleted Cells"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">1.7.&nbsp;Time To Live (TTL)</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="schema.joins.html">Prev</a>&nbsp;</td><th width="60%" align="center">&nbsp;</th><td width="20%" align="right">&nbsp;<a accesskey="n" href="cf.keep.deleted.html">Next</a></td></tr></table><hr></div><div class="section" title="1.7.&nbsp;Time To Live (TTL)"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="ttl"></a>1.7.&nbsp;Time To Live (TTL)</h2></div></div></div><p>ColumnFamilies can set a TTL length in seconds, and HBase will automatically delete cells once their expiration time is reached.
+  This applies to <span class="emphasis"><em>all</em></span> versions of a row, even the current one.  The timestamps HBase uses to compute the expiration time are in UTC.
+  </p><p>See <a class="link" href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html" target="_top">HColumnDescriptor</a> for more information.
+  </p></div><div id="disqus_thread"></div><script type="text/javascript">
+    var disqus_shortname = 'hbase'; // required: replace example with your forum shortname
+    var disqus_url = 'http://hbase.apache.org/book';
+    var disqus_identifier = 'ttl';
+
+    /* * * DON'T EDIT BELOW THIS LINE * * */
+    (function() {
+        var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
+        dsq.src = 'http://' + disqus_shortname + '.disqus.com/embed.js';
+        (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
+    })();
+</script><noscript>Please enable JavaScript to view the <a href="http://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript><a href="http://disqus.com" class="dsq-brlink">comments powered by <span class="logo-disqus">Disqus</span></a><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="schema.joins.html">Prev</a>&nbsp;</td><td width="20%" align="center">&nbsp;</td><td width="40%" align="right">&nbsp;<a accesskey="n" href="cf.keep.deleted.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">1.6.&nbsp;Joins&nbsp;</td><td width="20%" align="center"><a accesskey="h" href="schema_design.html">Home</a></td><td width="40%" align="right" valign="top">&nbsp;1.8.&nbsp;
+  Keeping Deleted Cells
+  </td></tr></table></div></body></html>
\ No newline at end of file
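As a rough model of the TTL rule in ttl.html (section 1.7), a cell can be treated as expired once more than the ColumnFamily TTL has elapsed since its timestamp. The sketch assumes cell timestamps are milliseconds since the Unix epoch (HBase's default, taken from the RegionServer clock when the client does not supply one) and that the TTL is configured in seconds, as with HColumnDescriptor.setTimeToLive(int). The class TtlSketch and its isExpired method are hypothetical illustrations, not HBase's own code; in HBase, expired cells are filtered out of reads and physically removed when store files are compacted.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper modelling the ColumnFamily TTL rule; not HBase's implementation.
final class TtlSketch {
    // True if a cell written at cellTimestampMs (ms since the Unix epoch)
    // has outlived a TTL of ttlSeconds at time nowMs.
    static boolean isExpired(long cellTimestampMs, int ttlSeconds, long nowMs) {
        long ttlMs = TimeUnit.SECONDS.toMillis(ttlSeconds);
        // Comparing the elapsed time avoids overflowing cellTimestampMs + ttlMs.
        return nowMs - cellTimestampMs > ttlMs;
    }

    public static void main(String[] args) {
        // Epoch milliseconds are UTC-based and independent of the local time zone.
        long now = System.currentTimeMillis();
        int oneDaySeconds = 24 * 60 * 60;
        System.out.println(isExpired(now - TimeUnit.HOURS.toMillis(1), oneDaySeconds, now));  // false
        System.out.println(isExpired(now - TimeUnit.HOURS.toMillis(25), oneDaySeconds, now)); // true
    }
}
```

Because the rule is evaluated per cell timestamp, older versions of a row expire before newer ones, and the current version expires too once its own timestamp falls outside the TTL.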

Added: hbase/hbase.apache.org/trunk/upgrading/upgrade0.96.html
URL: http://svn.apache.org/viewvc/hbase/hbase.apache.org/trunk/upgrading/upgrade0.96.html?rev=1463654&view=auto
==============================================================================
--- hbase/hbase.apache.org/trunk/upgrading/upgrade0.96.html (added)
+++ hbase/hbase.apache.org/trunk/upgrading/upgrade0.96.html Tue Apr  2 18:07:08 2013
@@ -0,0 +1,19 @@
+<html><head>
+      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+   <title>1.2.&nbsp;Upgrading from 0.94.x to 0.96.x</title><link rel="stylesheet" type="text/css" href="../css/freebsd_docbook.css"><meta name="generator" content="DocBook XSL-NS Stylesheets V1.76.1"><link rel="home" href="upgrading.html" title="Chapter&nbsp;1.&nbsp;Upgrading"><link rel="up" href="upgrading.html" title="Chapter&nbsp;1.&nbsp;Upgrading"><link rel="prev" href="upgrading.html" title="Chapter&nbsp;1.&nbsp;Upgrading"><link rel="next" href="upgrade0.94.html" title="1.3.&nbsp;Upgrading from 0.92.x to 0.94.x"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">1.2.&nbsp;Upgrading from 0.94.x to 0.96.x</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="upgrading.html">Prev</a>&nbsp;</td><th width="60%" align="center">&nbsp;</th><td width="20%" align="right">&nbsp;<a accesskey="n" href="upgrade0.94.html">Next</a></td></tr></table><hr></div><div class="section" title="1.2.&nbsp;Upgrading from 0.94.x to 0.96.x"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="upgrade0.96"></a>1.2.&nbsp;Upgrading from 0.94.x to 0.96.x</h2></div><div><h3 class="subtitle">The Singularity</h3></div></div></div><p>You will have to stop your old 0.94 cluster completely to upgrade.  If you are replicating
+     between clusters, both clusters will have to go down to upgrade.  Make sure it is a clean shutdown
+     so there are no WAL files lying around (TODO: Can 0.96 read 0.94 WAL files?).  Make sure
+     ZooKeeper is cleared of state.  All clients must be upgraded to 0.96 too.
+ </p><p>The API has changed in a few areas, in particular in how you use coprocessors (TODO: MapReduce too?).
+ </p><p>TODO: Write about 3.4 zk ensemble and multi support</p></div><div id="disqus_thread"></div><script type="text/javascript">
+    var disqus_shortname = 'hbase'; // required: replace example with your forum shortname
+    var disqus_url = 'http://hbase.apache.org/book';
+    var disqus_identifier = 'upgrade0.96';
+
+    /* * * DON'T EDIT BELOW THIS LINE * * */
+    (function() {
+        var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
+        dsq.src = 'http://' + disqus_shortname + '.disqus.com/embed.js';
+        (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
+    })();
+</script><noscript>Please enable JavaScript to view the <a href="http://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript><a href="http://disqus.com" class="dsq-brlink">comments powered by <span class="logo-disqus">Disqus</span></a><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="upgrading.html">Prev</a>&nbsp;</td><td width="20%" align="center">&nbsp;</td><td width="40%" align="right">&nbsp;<a accesskey="n" href="upgrade0.94.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Chapter&nbsp;1.&nbsp;Upgrading&nbsp;</td><td width="20%" align="center"><a accesskey="h" href="upgrading.html">Home</a></td><td width="40%" align="right" valign="top">&nbsp;1.3.&nbsp;Upgrading from 0.92.x to 0.94.x</td></tr></table></div></body></html>
\ No newline at end of file