Posted to commits@hbase.apache.org by st...@apache.org on 2013/04/06 08:08:58 UTC
svn commit: r1465200 [7/21] - in /hbase/hbase.apache.org/trunk: ./ book/ case_studies/ community/ configuration/ css/ developer/ getting_started/ images/ ops_mgt/ performance/ rpc/ schema_design/ security/ shell/ troubleshooting/ upgrading/

Modified: hbase/hbase.apache.org/trunk/book/schema.casestudies.html
URL: http://svn.apache.org/viewvc/hbase/hbase.apache.org/trunk/book/schema.casestudies.html?rev=1465200&r1=1465199&r2=1465200&view=diff
==============================================================================
--- hbase/hbase.apache.org/trunk/book/schema.casestudies.html (original)
+++ hbase/hbase.apache.org/trunk/book/schema.casestudies.html Sat Apr  6 06:08:56 2013
@@ -3,10 +3,11 @@
    <title>6.11.&nbsp;Schema Design Case Studies</title><link rel="stylesheet" type="text/css" href="../css/freebsd_docbook.css"><meta name="generator" content="DocBook XSL-NS Stylesheets V1.76.1"><link rel="home" href="book.html" title="The Apache HBase&#153; Reference Guide"><link rel="up" href="schema.html" title="Chapter&nbsp;6.&nbsp;HBase and Schema Design"><link rel="prev" href="constraints.html" title="6.10.&nbsp;Constraints"><link rel="next" href="schema.ops.html" title="6.12.&nbsp;Operational and Performance Configuration Options"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">6.11.&nbsp;Schema Design Case Studies</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="constraints.html">Prev</a>&nbsp;</td><th width="60%" align="center">Chapter&nbsp;6.&nbsp;HBase and Schema Design</th><td width="20%" align="right"
 >&nbsp;<a accesskey="n" href="schema.ops.html">Next</a></td></tr></table><hr></div><div class="section" title="6.11.&nbsp;Schema Design Case Studies"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="schema.casestudies"></a>6.11.&nbsp;Schema Design Case Studies</h2></div></div></div><p>The following will describe some typical data ingestion use-cases with HBase, and how the rowkey design and construction
    can be approached.  Note:  this is just an illustration of potential approaches, not an exhaustive list. 
    Know your data, and know your processing requirements.
-  </p><p>There are 3 case studies described:    
-      </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">Log Data / Timeseries Data</li><li class="listitem">Log Data / Timeseries on Steroids</li><li class="listitem">Customer/Sales</li></ul></div><p> 
-    ... and then a brief section on "Tall/Wide/Middle" in terms of schema design approaches.
-  </p><div class="section" title="6.11.1.&nbsp;Log Data and Timeseries Data Case Study"><div class="titlepage"><div><div><h3 class="title"><a name="schema.casestudies.log-timeseries"></a>6.11.1.&nbsp;Log Data and Timeseries Data Case Study</h3></div></div></div><p>Assume that the following data elements are being collected.
+  </p><p>It is highly recommended that you read the rest of the <a class="xref" href="schema.html" title="Chapter&nbsp;6.&nbsp;HBase and Schema Design">Chapter&nbsp;6, <i>HBase and Schema Design</i></a> first, before reading
+  these case studies.
+  </p><p>The following case studies are described:
+      </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">Log Data / Timeseries Data</li><li class="listitem">Log Data / Timeseries on Steroids</li><li class="listitem">Customer/Order</li><li class="listitem">Tall/Wide/Middle Schema Design</li><li class="listitem">List Data</li></ul></div><p> 
+  </p><div class="section" title="6.11.1.&nbsp;Case Study - Log Data and Timeseries Data"><div class="titlepage"><div><div><h3 class="title"><a name="schema.casestudies.log-timeseries"></a>6.11.1.&nbsp;Case Study - Log Data and Timeseries Data</h3></div></div></div><p>Assume that the following data elements are being collected.
         </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">Hostname</li><li class="listitem">Timestamp</li><li class="listitem">Log event</li><li class="listitem">Value/message</li></ul></div><p>
         We can store them in an HBase table called LOG_DATA, but what will the rowkey be?  
        From these attributes the rowkey will be some combination of hostname, timestamp, and log-event - but what specifically?        
@@ -49,8 +50,10 @@ long bucket = timestamp % numBuckets;
         </p><p>So the resulting composite rowkey would be:
 		</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">[substituted long for hostname] = 8 bytes</li><li class="listitem">[substituted long for event type] = 8 bytes</li><li class="listitem">[timestamp] = 8 bytes</li></ul></div><p>
 		In either the Hash or Numeric substitution approach, the raw values for hostname and event-type can be stored as columns.
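</p><p>A minimal sketch of the numeric-substitution rowkey listed above, assuming the substituted longs for hostname and event type have already been looked up elsewhere. The class and parameter names are hypothetical and only illustrate the 24-byte layout.</p>

```java
import java.nio.ByteBuffer;

final class LogRowKey {
    // hostId and eventTypeId are the (hypothetical) substituted longs
    // for the raw hostname and event type.
    static byte[] build(long hostId, long eventTypeId, long timestamp) {
        // ByteBuffer writes big-endian by default, so each field is 8 bytes
        // and the fields appear in the key in the order they are written.
        return ByteBuffer.allocate(3 * Long.BYTES)
                .putLong(hostId)
                .putLong(eventTypeId)
                .putLong(timestamp)
                .array();
    }
}
```

<p>For non-negative values, the big-endian encoding keeps the byte order of the key consistent with numeric order, which is what a scan over a host and event type relies on.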
-        </p></div></div><div class="section" title="6.11.2.&nbsp;Log Data and Timeseries Data on Steroids Case Study"><div class="titlepage"><div><div><h3 class="title"><a name="schema.casestudies.log-timeseries.log-steroids"></a>6.11.2.&nbsp;Log Data and Timeseries Data on Steroids Case Study</h3></div></div></div><p>This effectively is the OpenTSDB approach.  What OpenTSDB does is re-write data and pack rows into columns for 
-        certain time-periods.  For a detailed explanation, see:  <a class="link" href="http://opentsdb.net/schema.html" target="_top">http://opentsdb.net/schema.html</a>.
+        </p></div></div><div class="section" title="6.11.2.&nbsp;Case Study - Log Data and Timeseries Data on Steroids"><div class="titlepage"><div><div><h3 class="title"><a name="schema.casestudies.log-steroids"></a>6.11.2.&nbsp;Case Study - Log Data and Timeseries Data on Steroids</h3></div></div></div><p>This effectively is the OpenTSDB approach.  What OpenTSDB does is re-write data and pack rows into columns for 
+        certain time-periods.  For a detailed explanation, see:  <a class="link" href="http://opentsdb.net/schema.html" target="_top">http://opentsdb.net/schema.html</a>, 
+        and <a class="link" href="http://www.cloudera.com/content/cloudera/en/resources/library/hbasecon/video-hbasecon-2012-lessons-learned-from-opentsdb.html" target="_top">Lessons Learned from OpenTSDB</a>
+	    from HBaseCon2012.
       </p><p>But this is how the general concept works:  data is ingested, for example, in this manner&#8230;
 </p><pre class="programlisting">
 [hostname][log-event][timestamp1]
@@ -61,34 +64,79 @@ long bucket = timestamp % numBuckets;
        </p><p><code class="code">[hostname][log-event][timerange]</code>
        </p><p>&#8230; and each of the above events is converted into a column stored with a time-offset relative to the beginning of the timerange 
        (e.g., every 5 minutes).  This is obviously a very advanced processing technique, but HBase makes this possible.
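</p><p>The timerange and time-offset arithmetic can be sketched as follows. This is an illustration only, not OpenTSDB's actual code: it assumes timestamps are non-negative epoch seconds and a 5-minute timerange, and the names are hypothetical.</p>

```java
final class TimeRangeBucket {
    // Assumed timerange width: 5 minutes, expressed in seconds.
    static final long RANGE_SECONDS = 300L;

    // Start of the timerange; this value would go into the rowkey
    // in place of the raw timestamp.
    static long rangeStart(long epochSeconds) {
        return epochSeconds - (epochSeconds % RANGE_SECONDS);
    }

    // Offset of the event within its timerange; this value would
    // identify the column inside the row.
    static int offsetInRange(long epochSeconds) {
        return (int) (epochSeconds % RANGE_SECONDS);
    }
}
```

<p>All events for the same hostname and log-event that fall inside one timerange share a rowkey and differ only in the offset.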
-      </p></div><div class="section" title="6.11.3.&nbsp;Customer / Sales Case Study"><div class="titlepage"><div><div><h3 class="title"><a name="schema.casestudies.log-timeseries.custsales"></a>6.11.3.&nbsp;Customer / Sales Case Study</h3></div></div></div><p>Assume that HBase is used to store customer and sales information.  There are two core record-types being ingested:  
-        a Customer record type, and Sales record type.
+      </p></div><div class="section" title="6.11.3.&nbsp;Case Study - Customer/Order"><div class="titlepage"><div><div><h3 class="title"><a name="schema.casestudies.custorder"></a>6.11.3.&nbsp;Case Study - Customer/Order</h3></div></div></div><p>Assume that HBase is used to store customer and order information.  There are two core record-types being ingested:  
+        a Customer record type, and Order record type.
       </p><p>The Customer record type would include all the things that you&#8217;d typically expect:
         </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">Customer number</li><li class="listitem">Customer name</li><li class="listitem">Address (e.g., city, state, zip)</li><li class="listitem">Phone numbers, etc.</li></ul></div><p>
-     </p><p>The Sales record type would include things like:
-        </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">Customer number</li><li class="listitem">Sales/order number</li><li class="listitem">Sales date</li><li class="listitem">A series of nested objects for shipping locations and line-items (this itself is a design case study)</li></ul></div><p>
+     </p><p>The Order record type would include things like:
+        </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">Customer number</li><li class="listitem">Order number</li><li class="listitem">Sales date</li><li class="listitem">A series of nested objects for shipping locations and line-items (see <a class="xref" href="schema.casestudies.html#schema.casestudies.custorder.obj" title="6.11.3.2.&nbsp;Order Object Design">Section&nbsp;6.11.3.2, &#8220;Order Object Design&#8221;</a>
+           for details)</li></ul></div><p>
     </p><p>Assuming that the combination of customer number and order number uniquely identifies an order, these two attributes will compose
  the rowkey, and specifically a composite key such as:
-    </p><p><code class="code">[customer number][sales number]</code>
-    </p><p>
-&#8230; for a SALES table.  However, there are more design decisions to make:  are the <span class="emphasis"><em>raw</em></span> values the best choices for rowkeys?
+    </p><p><code class="code">[customer number][order number]</code>
+    </p><p>&#8230; for an ORDER table.  However, there are more design decisions to make:  are the <span class="emphasis"><em>raw</em></span> values the best choices for rowkeys?
     </p><p>The same design questions in the Log Data use-case confront us here.  What is the keyspace of the customer number, and what is the 
 format (e.g., numeric?  alphanumeric?) As it is advantageous to use fixed-length keys in HBase, as well as keys that can support a 
 reasonable spread in the keyspace, similar options appear:
     </p><p>Composite Rowkey With Hashes:  
-      </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">[MD5 of customer number] = 16 bytes</li><li class="listitem">[MD5 of sales number] = 16 bytes</li></ul></div><p>
+      </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">[MD5 of customer number] = 16 bytes</li><li class="listitem">[MD5 of order number] = 16 bytes</li></ul></div><p>
     </p><p>Composite Numeric/Hash Combo Rowkey: 
-      </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">[substituted long for customer number] = 8 bytes</li><li class="listitem">[MD5 of sales number] = 16 bytes</li></ul></div><p>
-     </p><div class="section" title="6.11.3.1.&nbsp;Single Table? Multiple Tables?"><div class="titlepage"><div><div><h4 class="title"><a name="schema.casestudies.log-timeseries.custsales.tables"></a>6.11.3.1.&nbsp;Single Table?  Multiple Tables?</h4></div></div></div><p>A traditional design approach would have separate tables for CUSTOMER and SALES.  Another option is to pack multiple 
+      </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">[substituted long for customer number] = 8 bytes</li><li class="listitem">[MD5 of order number] = 16 bytes</li></ul></div><p>
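</p><p>A minimal sketch of the Composite Rowkey With Hashes option, assuming the customer and order numbers arrive as strings and are hashed from their UTF-8 bytes. The class and method names are hypothetical.</p>

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class OrderRowKey {
    // MD5 always produces 16 bytes, which gives the fixed-length key parts.
    static byte[] md5(String s) {
        try {
            return MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // [MD5 of customer number][MD5 of order number] = 32 bytes
    static byte[] build(String customerNumber, String orderNumber) {
        byte[] key = new byte[32];
        System.arraycopy(md5(customerNumber), 0, key, 0, 16);
        System.arraycopy(md5(orderNumber), 0, key, 16, 16);
        return key;
    }
}
```

<p>Orders for the same customer share the first 16 bytes of the key, so they remain adjacent, while the hash spreads different customers across the keyspace.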
+     </p><div class="section" title="6.11.3.1.&nbsp;Single Table? Multiple Tables?"><div class="titlepage"><div><div><h4 class="title"><a name="schema.casestudies.custorder.tables"></a>6.11.3.1.&nbsp;Single Table?  Multiple Tables?</h4></div></div></div><p>A traditional design approach would have separate tables for CUSTOMER and ORDER.  Another option is to pack multiple 
             record types into a single table (e.g., CUSTOMER++).            
             </p><p>Customer Record Type Rowkey:
               </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">[customer-id]</li><li class="listitem">[type] = type indicating &#8216;1&#8217; for customer record type</li></ul></div><p>
-            </p><p>Sales Record Type Rowkey:
-              </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">[customer-id]</li><li class="listitem">[type] = type indicating &#8216;2&#8217; for sales record type</li><li class="listitem">[sales-order]</li></ul></div><p>
+            </p><p>Order Record Type Rowkey:
+              </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">[customer-id]</li><li class="listitem">[type] = type indicating &#8216;2&#8217; for order record type</li><li class="listitem">[order]</li></ul></div><p>
             </p><p>The advantage of this particular CUSTOMER++ approach is that it organizes many different record-types by customer-id 
             (e.g., a single scan could get you everything about that customer).  The disadvantage is that it&#8217;s not as easy to scan for
             a particular record-type.
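</p><p>The grouping effect can be sketched with plain byte arrays. This is an illustration under assumptions: customer ids are fixed-width ASCII strings, the type is a single byte, and all names are hypothetical. The comparison mimics the unsigned, byte-wise ordering of rowkeys.</p>

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class CustomerPlusPlusKeys {
    static final byte TYPE_CUSTOMER = 1;
    static final byte TYPE_ORDER = 2;

    // [customer-id][type = 1]
    static byte[] customerKey(String customerId) {
        byte[] id = customerId.getBytes(StandardCharsets.UTF_8);
        byte[] key = Arrays.copyOf(id, id.length + 1);
        key[id.length] = TYPE_CUSTOMER;
        return key;
    }

    // [customer-id][type = 2][order]
    static byte[] orderKey(String customerId, String order) {
        byte[] id = customerId.getBytes(StandardCharsets.UTF_8);
        byte[] ord = order.getBytes(StandardCharsets.UTF_8);
        byte[] key = new byte[id.length + 1 + ord.length];
        System.arraycopy(id, 0, key, 0, id.length);
        key[id.length] = TYPE_ORDER;
        System.arraycopy(ord, 0, key, id.length + 1, ord.length);
        return key;
    }

    // Lexicographic comparison treating each byte as unsigned.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }
}
```

<p>With this layout the customer record sorts first, followed by that customer's orders, and only then the next customer's rows.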
-            </p></div></div><div class="section" title="6.11.4.&nbsp;&#34;Tall/Wide/Middle&#34; Schema Design Smackdown"><div class="titlepage"><div><div><h3 class="title"><a name="schema.smackdown"></a>6.11.4.&nbsp;"Tall/Wide/Middle" Schema Design Smackdown</h3></div></div></div><p>This section will describe additional schema design questions that appear on the dist-list, specifically about
+            </p></div><div class="section" title="6.11.3.2.&nbsp;Order Object Design"><div class="titlepage"><div><div><h4 class="title"><a name="schema.casestudies.custorder.obj"></a>6.11.3.2.&nbsp;Order Object Design</h4></div></div></div><p>Now we need to address how to model the Order object.  Assume that the class structure is as follows:
+</p><pre class="programlisting">
+<code class="filename">Order</code>
+     <code class="filename">ShippingLocation</code>     (an Order can have multiple ShippingLocations)
+          <code class="filename">LineItem</code>               (a ShippingLocation can have multiple LineItems)
+</pre><p>
+	       ... there are multiple options on storing this data.
+	      </p><div class="section" title="6.11.3.2.1.&nbsp;Completely Normalized"><div class="titlepage"><div><div><h5 class="title"><a name="schema.casestudies.custorder.obj.norm"></a>6.11.3.2.1.&nbsp;Completely Normalized</h5></div></div></div><p>With this approach, there would be separate tables for ORDER, SHIPPING_LOCATION, and LINE_ITEM.          
+	        </p><p>The ORDER table's rowkey was described above: <a class="xref" href="schema.casestudies.html#schema.casestudies.custorder" title="6.11.3.&nbsp;Case Study - Customer/Order">Section&nbsp;6.11.3, &#8220;Case Study - Customer/Order&#8221;</a>
+	        </p><p>The SHIPPING_LOCATION's composite rowkey would be something like this:
+	          </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">[order-rowkey]</li><li class="listitem">[shipping location number] (e.g., 1st location, 2nd, etc.)</li></ul></div><p>
+	        </p><p>The LINE_ITEM table's composite rowkey would be something like this:
+	          </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">[order-rowkey]</li><li class="listitem">[shipping location number] (e.g., 1st location, 2nd, etc.)</li><li class="listitem">[line item number] (e.g., 1st lineitem, 2nd, etc.)</li></ul></div><p>
+	        </p><p>Such a normalized model is likely to be the approach with an RDBMS, but that's not your only option with HBase.
+	        The drawback of such an approach is that retrieving information about any Order requires:
+	          </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">Get on the ORDER table for the Order</li><li class="listitem">Scan on the SHIPPING_LOCATION table for that order to get the ShippingLocation instances</li><li class="listitem">Scan on the LINE_ITEM for each ShippingLocation</li></ul></div><p>
+	          ... granted, this is what an RDBMS would do under the covers anyway, but since there are no joins in HBase
+	          you're just more aware of this fact.
+	        </p></div><div class="section" title="6.11.3.2.2.&nbsp;Single Table With Record Types"><div class="titlepage"><div><div><h5 class="title"><a name="schema.casestudies.custorder.obj.rectype"></a>6.11.3.2.2.&nbsp;Single Table With Record Types</h5></div></div></div><p>With this approach, there would exist a single table ORDER that would contain the Order, ShippingLocation, and LineItem record types.
+	        </p><p>The Order rowkey was described above: <a class="xref" href="schema.casestudies.html#schema.casestudies.custorder" title="6.11.3.&nbsp;Case Study - Customer/Order">Section&nbsp;6.11.3, &#8220;Case Study - Customer/Order&#8221;</a>
+	          </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">[order-rowkey]</li><li class="listitem">[ORDER record type]</li></ul></div><p>
+	        </p><p>The ShippingLocation composite rowkey would be something like this:
+	          </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">[order-rowkey]</li><li class="listitem">[SHIPPING record type]</li><li class="listitem">[shipping location number] (e.g., 1st location, 2nd, etc.)</li></ul></div><p>
+	        </p><p>The LineItem composite rowkey would be something like this:
+	          </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">[order-rowkey]</li><li class="listitem">[LINE record type]</li><li class="listitem">[shipping location number] (e.g., 1st location, 2nd, etc.)</li><li class="listitem">[line item number] (e.g., 1st lineitem, 2nd, etc.)</li></ul></div><p>
+	        </p></div><div class="section" title="6.11.3.2.3.&nbsp;Denormalized"><div class="titlepage"><div><div><h5 class="title"><a name="schema.casestudies.custorder.obj.denorm"></a>6.11.3.2.3.&nbsp;Denormalized</h5></div></div></div><p>A variant of the Single Table With Record Types approach is to denormalize and flatten some of the object 
+	        hierarchy, such as collapsing the ShippingLocation attributes onto each LineItem instance.
+	        </p><p>The LineItem composite rowkey would be something like this:
+	          </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">[order-rowkey]</li><li class="listitem">[LINE record type]</li><li class="listitem">[line item number] (e.g., 1st lineitem, 2nd, etc. - care must be taken that these are unique across the entire order)</li></ul></div><p>
+	        </p><p>... and the LineItem columns would be something like this:
+	          </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">itemNumber</li><li class="listitem">quantity</li><li class="listitem">price</li><li class="listitem">shipToLine1 (denormalized from ShippingLocation)</li><li class="listitem">shipToLine2 (denormalized from ShippingLocation)</li><li class="listitem">shipToCity (denormalized from ShippingLocation)</li><li class="listitem">shipToState (denormalized from ShippingLocation)</li><li class="listitem">shipToZip (denormalized from ShippingLocation)</li></ul></div><p>
+	        </p><p>The pros of this approach include a less complex object hierarchy, but one of the cons is that updating gets more 
+	        complicated in case any of this information changes.
+	        </p></div><div class="section" title="6.11.3.2.4.&nbsp;Object BLOB"><div class="titlepage"><div><div><h5 class="title"><a name="schema.casestudies.custorder.obj.singleobj"></a>6.11.3.2.4.&nbsp;Object BLOB</h5></div></div></div><p>With this approach, the entire Order object graph is treated, in one way or another, as a BLOB.  For example, the 
+	        ORDER table's rowkey was described above: <a class="xref" href="schema.casestudies.html#schema.casestudies.custorder" title="6.11.3.&nbsp;Case Study - Customer/Order">Section&nbsp;6.11.3, &#8220;Case Study - Customer/Order&#8221;</a>, and a 
+	        single column called "order" would contain a serialized object that, once deserialized, holds the container Order, its
+	        ShippingLocations, and LineItems.
+	        </p><p>There are many options here:  JSON, XML, Java Serialization, Avro, Hadoop Writables, etc.  All of them are variants
+	        of the same approach:  encode the object graph to a byte-array.  Care should be taken with this approach to ensure backward 
+	        compatibility in case the object model changes such that older persisted structures can still be read back out of HBase.
+	        </p><p>Pros are being able to manage complex object graphs with minimal I/O (e.g., a single HBase Get per
+	        Order in this example), but the cons include the aforementioned warning about backward compatibility of serialization,
+	        language dependencies of serialization (e.g., Java Serialization only works with Java clients), the fact that
+	        you have to deserialize the entire object to get any piece of information inside the BLOB, and the difficulty in 
+	        getting frameworks like Hive to work with custom objects like this.
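</p><p>As a rough sketch of the BLOB approach, the example below encodes a small Order graph to a byte-array with Java Serialization and reads it back. The field names are hypothetical, and Java Serialization is only one of the options listed above, chosen here because it needs no extra libraries. The resulting byte-array is what would be stored in the single "order" column.</p>

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class LineItem implements Serializable {
    private static final long serialVersionUID = 1L;
    final String itemNumber;
    final int quantity;
    LineItem(String itemNumber, int quantity) {
        this.itemNumber = itemNumber;
        this.quantity = quantity;
    }
}

final class ShippingLocation implements Serializable {
    private static final long serialVersionUID = 1L;
    final String city;
    final List<LineItem> lineItems = new ArrayList<>();
    ShippingLocation(String city) { this.city = city; }
}

final class Order implements Serializable {
    private static final long serialVersionUID = 1L;
    final String orderNumber;
    final List<ShippingLocation> shippingLocations = new ArrayList<>();
    Order(String orderNumber) { this.orderNumber = orderNumber; }
}

final class OrderBlobCodec {
    // Encode the whole object graph to one byte-array.
    static byte[] encode(Order order) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(order);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Decode the whole graph; there is no way to read just one field.
    static Order decode(byte[] blob) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(blob))) {
            return (Order) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

<p>Note that decoding always materializes the full Order, which is the deserialization cost mentioned above.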
+	        </p></div></div></div><div class="section" title="6.11.4.&nbsp;Case Study - &#34;Tall/Wide/Middle&#34; Schema Design Smackdown"><div class="titlepage"><div><div><h3 class="title"><a name="schema.smackdown"></a>6.11.4.&nbsp;Case Study - "Tall/Wide/Middle" Schema Design Smackdown</h3></div></div></div><p>This section will describe additional schema design questions that appear on the dist-list, specifically about
 	  tall and wide tables.  These are general guidelines and not laws - each application must consider its own needs.
 	  </p><div class="section" title="6.11.4.1.&nbsp;Rows vs. Versions"><div class="titlepage"><div><div><h4 class="title"><a name="schema.smackdown.rowsversions"></a>6.11.4.1.&nbsp;Rows vs. Versions</h4></div></div></div><p>A common question is whether one should prefer rows or HBase's built-in-versioning.  The context is typically where there are
 	    "a lot" of versions of a row to be retained (e.g., where it is significantly above the HBase default of 3 max versions).  The
@@ -103,9 +151,116 @@ reasonable spread in the keyspace, simil
 	    OpenTSDB is the best example of this case where a single row represents a defined time-range, and then discrete events are treated as
 	    columns.  This approach is often more complex, and may require the additional complexity of re-writing your data, but has the
 	    advantage of being I/O efficient.  For an overview of this approach, see
-	    <a class="link" href="http://www.cloudera.com/content/cloudera/en/resources/library/hbasecon/video-hbasecon-2012-lessons-learned-from-opentsdb.html" target="_top">Lessons Learned from OpenTSDB</a>
-	    from HBaseCon2012.
-	    </p></div></div></div><div id="disqus_thread"></div><script type="text/javascript">
+	    <a class="xref" href="">???</a>.
+	    </p></div></div><div class="section" title="6.11.5.&nbsp;Case Study - List Data"><div class="titlepage"><div><div><h3 class="title"><a name="casestudies.schema.listdata"></a>6.11.5.&nbsp;Case Study - List Data</h3></div></div></div><p>The following is an exchange from the user dist-list regarding a fairly common question:  
+    		how to handle per-user list data in Apache HBase. 
+    		</p><p>*** QUESTION ***</p><p>
+    		We're looking at how to store a large amount of (per-user) list data in
+HBase, and we were trying to figure out what kind of access pattern made
+the most sense.  One option is to store the majority of the data in a key, so
+we could have something like:
+    		</p><pre class="programlisting">
+&lt;FixedWidthUserName&gt;&lt;FixedWidthValueId1&gt;:"" (no value)
+&lt;FixedWidthUserName&gt;&lt;FixedWidthValueId2&gt;:"" (no value)
+&lt;FixedWidthUserName&gt;&lt;FixedWidthValueId3&gt;:"" (no value)
+			</pre>
+
+The other option we had was to do this entirely using:
+    		<pre class="programlisting">
+&lt;FixedWidthUserName&gt;&lt;FixedWidthPageNum0&gt;:&lt;FixedWidthLength&gt;&lt;FixedIdNextPageNum&gt;&lt;ValueId1&gt;&lt;ValueId2&gt;&lt;ValueId3&gt;...
+&lt;FixedWidthUserName&gt;&lt;FixedWidthPageNum1&gt;:&lt;FixedWidthLength&gt;&lt;FixedIdNextPageNum&gt;&lt;ValueId1&gt;&lt;ValueId2&gt;&lt;ValueId3&gt;...
+    		</pre><p>
+where each row would contain multiple values.
+So in one case reading the first thirty values would be:
+			</p><pre class="programlisting">
+scan { STARTROW =&gt; 'FixedWidthUserName', LIMIT =&gt; 30}
+    		</pre>
+And in the second case it would be
+    		<pre class="programlisting">
+get 'FixedWidthUserName\x00\x00\x00\x00'
+    		</pre><p>
+The general usage pattern would be to read only the first 30 values of
+these lists, with infrequent access reading deeper into the lists.  Some
+users would have &lt;= 30 total values in these lists, and some users would
+have millions (i.e. power-law distribution)
+			</p><p>
+ The single-value format seems like it would take up more space on HBase,
+but would offer some improved retrieval / pagination flexibility.  Would
+there be any significant performance advantages to be able to paginate via
+gets vs paginating with scans?
+			</p><p>
+  My initial understanding was that doing a scan should be faster if our
+paging size is unknown (and caching is set appropriately), but that gets
+should be faster if we'll always need the same page size.  I've ended up
+hearing different people tell me opposite things about performance.  I
+assume the page sizes would be relatively consistent, so for most use cases
+we could guarantee that we only wanted one page of data in the
+fixed-page-length case.  I would also assume that we would have infrequent
+updates, but may have inserts into the middle of these lists (meaning we'd
+need to update all subsequent rows).
+			</p><p>
+Thanks for help / suggestions / follow-up questions.
+			</p><p>*** ANSWER ***</p><p>
+If I understand you correctly, you're ultimately trying to store
+triples in the form "user, valueid, value", right? E.g., something
+like:
+			</p><pre class="programlisting">
+"user123, firstname, Paul",
+"user234, lastname, Smith"
+			</pre><p>
+(But the usernames are fixed width, and the valueids are fixed width).
+			</p><p>
+And, your access pattern is along the lines of: "for user X, list the
+next 30 values, starting with valueid Y". Is that right? And these
+values should be returned sorted by valueid?
+			</p><p>
+The tl;dr version is that you should probably go with one row per
+user+value, and not build a complicated intra-row pagination scheme on
+your own unless you're really sure it is needed.
+			</p><p>
+Your two options mirror a common question people have when designing
+HBase schemas: should I go "tall" or "wide"? Your first schema is
+"tall": each row represents one value for one user, and so there are
+many rows in the table for each user; the row key is user + valueid,
+and there would be (presumably) a single column qualifier that means
+"the value". This is great if you want to scan over rows in sorted
+order by row key (thus my question above, about whether these ids are
+sorted correctly). You can start a scan at any user+valueid, read the
+next 30, and be done. What you're giving up is the ability to have
+transactional guarantees around all the rows for one user, but it
+doesn't sound like you need that. Doing it this way is generally
+recommended (see
+here <a class="link" href="http://hbase.apache.org/book.html#schema.smackdown" target="_top">http://hbase.apache.org/book.html#schema.smackdown</a>).
+			</p><p>
+Your second option is "wide": you store a bunch of values in one row,
+using different qualifiers (where the qualifier is the valueid). The
+simple way to do that would be to just store ALL values for one user
+in a single row. I'm guessing you jumped to the "paginated" version
+because you're assuming that storing millions of columns in a single
+row would be bad for performance, which may or may not be true; as
+long as you're not trying to do too much in a single request, or do
+things like scanning over and returning all of the cells in the row,
+it shouldn't be fundamentally worse. The client has methods that allow
+you to get specific slices of columns.
+			</p><p>
+Note that neither case fundamentally uses more disk space than the
+other; you're just "shifting" part of the identifying information for
+a value either to the left (into the row key, in option one) or to the
+right (into the column qualifiers in option 2). Under the covers,
+every key/value still stores the whole row key, and column family
+name. (If this is a bit confusing, take an hour and watch Lars
+George's excellent video about understanding HBase schema design:
+<a class="link" href="http://www.youtube.com/watch?v=_HLoH_PgrLk" target="_top">http://www.youtube.com/watch?v=_HLoH_PgrLk</a>).
+			</p><p>
+A manually paginated version has lots more complexities, as you note,
+like having to keep track of how many things are in each page,
+re-shuffling if new values are inserted, etc. That seems significantly
+more complex. It might have some slight speed advantages (or
+disadvantages!) at extremely high throughput, and the only way to
+really know that would be to try it out. If you don't have time to
+build it both ways and compare, my advice would be to start with the
+simplest option (one row per user+value). Start simple and iterate! :)
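</p><p>The "tall" access pattern from the answer above can be sketched without a cluster. In this illustration a sorted in-memory map stands in for the table, user names and valueids are assumed to be fixed-width ASCII strings, and all names are hypothetical. It is not HBase client code.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class TallListTable {
    // Sorted map standing in for the table: key = user + valueid.
    private final TreeMap<String, String> rows = new TreeMap<>();

    void put(String user, String valueId, String value) {
        rows.put(user + valueId, value);
    }

    // Return up to `limit` row keys for `user`, starting at
    // startValueId (inclusive), in sorted key order.
    List<String> page(String user, String startValueId, int limit) {
        List<String> out = new ArrayList<>();
        for (String key : rows.tailMap(user + startValueId, true).keySet()) {
            // Stop at the page size or when the next user's rows begin.
            if (out.size() == limit || !key.startsWith(user)) {
                break;
            }
            out.add(key);
        }
        return out;
    }
}
```

<p>Reading deeper into a list is the same call with a later starting valueid, so no page bookkeeping is stored in the rows themselves.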
+			</p></div></div><div id="disqus_thread"></div><script type="text/javascript">
     var disqus_shortname = 'hbase'; // required: replace example with your forum shortname
     var disqus_url = 'http://hbase.apache.org/book';
     var disqus_identifier = 'schema.casestudies';

Modified: hbase/hbase.apache.org/trunk/book/schema.html
URL: http://svn.apache.org/viewvc/hbase/hbase.apache.org/trunk/book/schema.html?rev=1465200&r1=1465199&r2=1465200&view=diff
==============================================================================
--- hbase/hbase.apache.org/trunk/book/schema.html (original)
+++ hbase/hbase.apache.org/trunk/book/schema.html Sat Apr  6 06:08:56 2013
@@ -26,7 +26,7 @@
        Summary Tables
       </a></span></dt><dt><span class="section"><a href="secondary.indexes.html#secondary.indexes.coproc">6.9.5. 
        Coprocessor Secondary Index
-      </a></span></dt></dl></dd><dt><span class="section"><a href="constraints.html">6.10. Constraints</a></span></dt><dt><span class="section"><a href="schema.casestudies.html">6.11. Schema Design Case Studies</a></span></dt><dd><dl><dt><span class="section"><a href="schema.casestudies.html#schema.casestudies.log-timeseries">6.11.1. Log Data and Timeseries Data Case Study</a></span></dt><dt><span class="section"><a href="schema.casestudies.html#schema.casestudies.log-timeseries.log-steroids">6.11.2. Log Data and Timeseries Data on Steroids Case Study</a></span></dt><dt><span class="section"><a href="schema.casestudies.html#schema.casestudies.log-timeseries.custsales">6.11.3. Customer / Sales Case Study</a></span></dt><dt><span class="section"><a href="schema.casestudies.html#schema.smackdown">6.11.4. "Tall/Wide/Middle" Schema Design Smackdown</a></span></dt></dl></dd><dt><span class="section"><a href="schema.ops.html">6.12. Operational and Performance Configuration Options<
 /a></span></dt></dl></div><p>A good general introduction on the strength and weaknesses modelling on
+      </a></span></dt></dl></dd><dt><span class="section"><a href="constraints.html">6.10. Constraints</a></span></dt><dt><span class="section"><a href="schema.casestudies.html">6.11. Schema Design Case Studies</a></span></dt><dd><dl><dt><span class="section"><a href="schema.casestudies.html#schema.casestudies.log-timeseries">6.11.1. Case Study - Log Data and Timeseries Data</a></span></dt><dt><span class="section"><a href="schema.casestudies.html#schema.casestudies.log-steroids">6.11.2. Case Study - Log Data and Timeseries Data on Steroids</a></span></dt><dt><span class="section"><a href="schema.casestudies.html#schema.casestudies.custorder">6.11.3. Case Study - Customer/Order</a></span></dt><dt><span class="section"><a href="schema.casestudies.html#schema.smackdown">6.11.4. Case Study - "Tall/Wide/Middle" Schema Design Smackdown</a></span></dt><dt><span class="section"><a href="schema.casestudies.html#casestudies.schema.listdata">6.11.5. Case Study - List Data</a></span></
 dt></dl></dd><dt><span class="section"><a href="schema.ops.html">6.12. Operational and Performance Configuration Options</a></span></dt></dl></div><p>A good general introduction on the strength and weaknesses modelling on
           the various non-rdbms datastores is Ian Varley's Master thesis,
           <a class="link" href="http://ianvarley.com/UT/MR/Varley_MastersReport_Full_2009-08-07.pdf" target="_top">No Relation: The Mixed Blessings of Non-Relational Databases</a>.
           Recommended.  Also, read <a class="xref" href="regions.arch.html#keyvalue" title="9.7.6.4.&nbsp;KeyValue">Section&nbsp;9.7.6.4, &#8220;KeyValue&#8221;</a> for how HBase stores data internally, and the section on 

Modified: hbase/hbase.apache.org/trunk/book/security.html
URL: http://svn.apache.org/viewvc/hbase/hbase.apache.org/trunk/book/security.html?rev=1465200&r1=1465199&r2=1465200&view=diff
==============================================================================
--- hbase/hbase.apache.org/trunk/book/security.html (original)
+++ hbase/hbase.apache.org/trunk/book/security.html Sat Apr  6 06:08:56 2013
@@ -1,6 +1,6 @@
 <html><head>
       <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
-   <title>Chapter&nbsp;8.&nbsp;Secure Apache HBase (TM)</title><link rel="stylesheet" type="text/css" href="../css/freebsd_docbook.css"><meta name="generator" content="DocBook XSL-NS Stylesheets V1.76.1"><link rel="home" href="book.html" title="The Apache HBase&#153; Reference Guide"><link rel="up" href="book.html" title="The Apache HBase&#153; Reference Guide"><link rel="prev" href="mapreduce.specex.html" title="7.4.&nbsp;Speculative Execution"><link rel="next" href="hbase.accesscontrol.configuration.html" title="8.2.&nbsp;Access Control"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Chapter&nbsp;8.&nbsp;Secure Apache HBase (TM)</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="mapreduce.specex.html">Prev</a>&nbsp;</td><th width="60%" align="center">&nbsp;</th><td width="20%" align="right">&nbsp;<a accesskey="n" 
 href="hbase.accesscontrol.configuration.html">Next</a></td></tr></table><hr></div><div class="chapter" title="Chapter&nbsp;8.&nbsp;Secure Apache HBase (TM)"><div class="titlepage"><div><div><h2 class="title"><a name="security"></a>Chapter&nbsp;8.&nbsp;Secure Apache HBase (TM)</h2></div></div></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="section"><a href="security.html#hbase.secure.configuration">8.1. Secure Client Access to Apache HBase</a></span></dt><dd><dl><dt><span class="section"><a href="security.html#d2475e4468">8.1.1. Prerequisites</a></span></dt><dt><span class="section"><a href="security.html#d2475e4509">8.1.2. Server-side Configuration for Secure Operation</a></span></dt><dt><span class="section"><a href="security.html#d2475e4521">8.1.3. Client-side Configuration for Secure Operation</a></span></dt><dt><span class="section"><a href="security.html#d2475e4560">8.1.4. Client-side Configuration for Secure Operation - Thrift Gateway</a></sp
 an></dt><dt><span class="section"><a href="security.html#d2475e4575">8.1.5. Client-side Configuration for Secure Operation - REST Gateway</a></span></dt></dl></dd><dt><span class="section"><a href="hbase.accesscontrol.configuration.html">8.2. Access Control</a></span></dt><dd><dl><dt><span class="section"><a href="hbase.accesscontrol.configuration.html#d2475e4600">8.2.1. Prerequisites</a></span></dt><dt><span class="section"><a href="hbase.accesscontrol.configuration.html#d2475e4607">8.2.2. Overview</a></span></dt><dt><span class="section"><a href="hbase.accesscontrol.configuration.html#d2475e4764">8.2.3. Server-side Configuration for Access Control</a></span></dt><dt><span class="section"><a href="hbase.accesscontrol.configuration.html#d2475e4776">8.2.4. Shell Enhancements for Access Control</a></span></dt></dl></dd><dt><span class="section"><a href="hbase.secure.bulkload.html">8.3. Secure Bulk Load</a></span></dt></dl></div><div class="section" title="8.1.&nbsp;Secure Clie
 nt Access to Apache HBase"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="hbase.secure.configuration"></a>8.1.&nbsp;Secure Client Access to Apache HBase</h2></div></div></div><p>Newer releases of Apache HBase (TM) (&gt;= 0.92) support optional SASL authentication of clients<sup>[<a name="d2475e4459" href="#ftn.d2475e4459" class="footnote">20</a>]</sup>.</p><p>This describes how to set up Apache HBase and clients for connection to secure HBase resources.</p><div class="section" title="8.1.1.&nbsp;Prerequisites"><div class="titlepage"><div><div><h3 class="title"><a name="d2475e4468"></a>8.1.1.&nbsp;Prerequisites</h3></div></div></div><p>
+   <title>Chapter&nbsp;8.&nbsp;Secure Apache HBase (TM)</title><link rel="stylesheet" type="text/css" href="../css/freebsd_docbook.css"><meta name="generator" content="DocBook XSL-NS Stylesheets V1.76.1"><link rel="home" href="book.html" title="The Apache HBase&#153; Reference Guide"><link rel="up" href="book.html" title="The Apache HBase&#153; Reference Guide"><link rel="prev" href="mapreduce.specex.html" title="7.4.&nbsp;Speculative Execution"><link rel="next" href="hbase.accesscontrol.configuration.html" title="8.2.&nbsp;Access Control"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Chapter&nbsp;8.&nbsp;Secure Apache HBase (TM)</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="mapreduce.specex.html">Prev</a>&nbsp;</td><th width="60%" align="center">&nbsp;</th><td width="20%" align="right">&nbsp;<a accesskey="n" 
 href="hbase.accesscontrol.configuration.html">Next</a></td></tr></table><hr></div><div class="chapter" title="Chapter&nbsp;8.&nbsp;Secure Apache HBase (TM)"><div class="titlepage"><div><div><h2 class="title"><a name="security"></a>Chapter&nbsp;8.&nbsp;Secure Apache HBase (TM)</h2></div></div></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="section"><a href="security.html#hbase.secure.configuration">8.1. Secure Client Access to Apache HBase</a></span></dt><dd><dl><dt><span class="section"><a href="security.html#d2519e4701">8.1.1. Prerequisites</a></span></dt><dt><span class="section"><a href="security.html#d2519e4742">8.1.2. Server-side Configuration for Secure Operation</a></span></dt><dt><span class="section"><a href="security.html#d2519e4754">8.1.3. Client-side Configuration for Secure Operation</a></span></dt><dt><span class="section"><a href="security.html#d2519e4793">8.1.4. Client-side Configuration for Secure Operation - Thrift Gateway</a></sp
 an></dt><dt><span class="section"><a href="security.html#d2519e4808">8.1.5. Client-side Configuration for Secure Operation - REST Gateway</a></span></dt></dl></dd><dt><span class="section"><a href="hbase.accesscontrol.configuration.html">8.2. Access Control</a></span></dt><dd><dl><dt><span class="section"><a href="hbase.accesscontrol.configuration.html#d2519e4833">8.2.1. Prerequisites</a></span></dt><dt><span class="section"><a href="hbase.accesscontrol.configuration.html#d2519e4840">8.2.2. Overview</a></span></dt><dt><span class="section"><a href="hbase.accesscontrol.configuration.html#d2519e4997">8.2.3. Server-side Configuration for Access Control</a></span></dt><dt><span class="section"><a href="hbase.accesscontrol.configuration.html#d2519e5009">8.2.4. Shell Enhancements for Access Control</a></span></dt></dl></dd><dt><span class="section"><a href="hbase.secure.bulkload.html">8.3. Secure Bulk Load</a></span></dt></dl></div><div class="section" title="8.1.&nbsp;Secure Clie
 nt Access to Apache HBase"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="hbase.secure.configuration"></a>8.1.&nbsp;Secure Client Access to Apache HBase</h2></div></div></div><p>Newer releases of Apache HBase (TM) (&gt;= 0.92) support optional SASL authentication of clients<sup>[<a name="d2519e4692" href="#ftn.d2519e4692" class="footnote">21</a>]</sup>.</p><p>This describes how to set up Apache HBase and clients for connection to secure HBase resources.</p><div class="section" title="8.1.1.&nbsp;Prerequisites"><div class="titlepage"><div><div><h3 class="title"><a name="d2519e4701"></a>8.1.1.&nbsp;Prerequisites</h3></div></div></div><p>
         You need to have a working Kerberos KDC.
     </p><p>
         A HBase configured for secure client access is expected to be running
@@ -57,7 +57,7 @@
         keytabs for Hadoop. Those steps are omitted here. Copy the resulting
         keytab files to where the client daemon will execute and make them
         readable only to the user account under which the daemon will run.
-    </p></div><div class="section" title="8.1.2.&nbsp;Server-side Configuration for Secure Operation"><div class="titlepage"><div><div><h3 class="title"><a name="d2475e4509"></a>8.1.2.&nbsp;Server-side Configuration for Secure Operation</h3></div></div></div><p>
+    </p></div><div class="section" title="8.1.2.&nbsp;Server-side Configuration for Secure Operation"><div class="titlepage"><div><div><h3 class="title"><a name="d2519e4742"></a>8.1.2.&nbsp;Server-side Configuration for Secure Operation</h3></div></div></div><p>
         Add the following to the <code class="code">hbase-site.xml</code> file on every server machine in the cluster:
     </p><pre class="programlisting">
       &lt;property&gt;
@@ -75,7 +75,7 @@
     </pre><p>
        A full shutdown and restart of HBase service is required when deploying
        these configuration changes.
-    </p></div><div class="section" title="8.1.3.&nbsp;Client-side Configuration for Secure Operation"><div class="titlepage"><div><div><h3 class="title"><a name="d2475e4521"></a>8.1.3.&nbsp;Client-side Configuration for Secure Operation</h3></div></div></div><p>
+    </p></div><div class="section" title="8.1.3.&nbsp;Client-side Configuration for Secure Operation"><div class="titlepage"><div><div><h3 class="title"><a name="d2519e4754"></a>8.1.3.&nbsp;Client-side Configuration for Secure Operation</h3></div></div></div><p>
         Add the following to the <code class="code">hbase-site.xml</code> file on every client:
     </p><pre class="programlisting">
       &lt;property&gt;
@@ -109,7 +109,7 @@
       HTable table = new HTable(conf, tablename);
     </pre><p>
         Expect a ~10% performance penalty for encrypted communication.
-    </p></div><div class="section" title="8.1.4.&nbsp;Client-side Configuration for Secure Operation - Thrift Gateway"><div class="titlepage"><div><div><h3 class="title"><a name="d2475e4560"></a>8.1.4.&nbsp;Client-side Configuration for Secure Operation - Thrift Gateway</h3></div></div></div><p>
+    </p></div><div class="section" title="8.1.4.&nbsp;Client-side Configuration for Secure Operation - Thrift Gateway"><div class="titlepage"><div><div><h3 class="title"><a name="d2519e4793"></a>8.1.4.&nbsp;Client-side Configuration for Secure Operation - Thrift Gateway</h3></div></div></div><p>
         Add the following to the <code class="code">hbase-site.xml</code> file for every Thrift gateway:
     </p><pre class="programlisting">
     &lt;property&gt;
@@ -129,7 +129,7 @@
         credential. No authentication will be performed by the Thrift gateway
         itself. All client access via the Thrift gateway will use the Thrift
         gateway's credential and have its privilege.
-    </p></div><div class="section" title="8.1.5.&nbsp;Client-side Configuration for Secure Operation - REST Gateway"><div class="titlepage"><div><div><h3 class="title"><a name="d2475e4575"></a>8.1.5.&nbsp;Client-side Configuration for Secure Operation - REST Gateway</h3></div></div></div><p>
+    </p></div><div class="section" title="8.1.5.&nbsp;Client-side Configuration for Secure Operation - REST Gateway"><div class="titlepage"><div><div><h3 class="title"><a name="d2519e4808"></a>8.1.5.&nbsp;Client-side Configuration for Secure Operation - REST Gateway</h3></div></div></div><p>
         Add the following to the <code class="code">hbase-site.xml</code> file for every REST gateway:
     </p><pre class="programlisting">
     &lt;property&gt;
@@ -153,7 +153,7 @@
         It should be possible for clients to authenticate with the HBase
         cluster through the REST gateway in a pass-through manner via SPNEGO
         HTTP authentication. This is future work.
-    </p></div></div><div class="footnotes"><br><hr width="100" align="left"><div class="footnote"><p><sup>[<a id="ftn.d2475e4459" href="#d2475e4459" class="para">20</a>] </sup>See
+    </p></div></div><div class="footnotes"><br><hr width="100" align="left"><div class="footnote"><p><sup>[<a id="ftn.d2519e4692" href="#d2519e4692" class="para">21</a>] </sup>See
     also Matteo Bertozzi's article on <a class="link" href="http://www.cloudera.com/blog/2012/09/understanding-user-authentication-and-authorization-in-apache-hbase/" target="_top">Understanding User Authentication and Authorization in Apache HBase</a>.</p></div></div></div><div id="disqus_thread"></div><script type="text/javascript">
     var disqus_shortname = 'hbase'; // required: replace example with your forum shortname
     var disqus_url = 'http://hbase.apache.org/book';

Modified: hbase/hbase.apache.org/trunk/book/shell.html
URL: http://svn.apache.org/viewvc/hbase/hbase.apache.org/trunk/book/shell.html?rev=1465200&r1=1465199&r2=1465200&view=diff
==============================================================================
--- hbase/hbase.apache.org/trunk/book/shell.html (original)
+++ hbase/hbase.apache.org/trunk/book/shell.html Sat Apr  6 06:08:56 2013
@@ -1,6 +1,6 @@
 <html><head>
       <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
-   <title>Chapter&nbsp;4.&nbsp;The Apache HBase Shell</title><link rel="stylesheet" type="text/css" href="../css/freebsd_docbook.css"><meta name="generator" content="DocBook XSL-NS Stylesheets V1.76.1"><link rel="home" href="book.html" title="The Apache HBase&#153; Reference Guide"><link rel="up" href="book.html" title="The Apache HBase&#153; Reference Guide"><link rel="prev" href="upgrade0.90.html" title="3.5.&nbsp;Upgrading to HBase 0.90.x from 0.20.x or 0.89.x"><link rel="next" href="shell_tricks.html" title="4.2.&nbsp;Shell Tricks"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Chapter&nbsp;4.&nbsp;The Apache HBase Shell</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="upgrade0.90.html">Prev</a>&nbsp;</td><th width="60%" align="center">&nbsp;</th><td width="20%" align="right">&nbsp;<a accesskey="n" href="shell
 _tricks.html">Next</a></td></tr></table><hr></div><div class="chapter" title="Chapter&nbsp;4.&nbsp;The Apache HBase Shell"><div class="titlepage"><div><div><h2 class="title"><a name="shell"></a>Chapter&nbsp;4.&nbsp;The Apache HBase Shell</h2></div></div></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="section"><a href="shell.html#scripting">4.1. Scripting</a></span></dt><dt><span class="section"><a href="shell_tricks.html">4.2. Shell Tricks</a></span></dt><dd><dl><dt><span class="section"><a href="shell_tricks.html#d2475e2917">4.2.1. <code class="filename">irbrc</code></a></span></dt><dt><span class="section"><a href="shell_tricks.html#d2475e2935">4.2.2. LOG data to timestamp</a></span></dt><dt><span class="section"><a href="shell_tricks.html#d2475e2953">4.2.3. Debug</a></span></dt><dt><span class="section"><a href="shell_tricks.html#d2475e2975">4.2.4. Commands</a></span></dt></dl></dd></dl></div><p>
+   <title>Chapter&nbsp;4.&nbsp;The Apache HBase Shell</title><link rel="stylesheet" type="text/css" href="../css/freebsd_docbook.css"><meta name="generator" content="DocBook XSL-NS Stylesheets V1.76.1"><link rel="home" href="book.html" title="The Apache HBase&#153; Reference Guide"><link rel="up" href="book.html" title="The Apache HBase&#153; Reference Guide"><link rel="prev" href="upgrade0.90.html" title="3.5.&nbsp;Upgrading to HBase 0.90.x from 0.20.x or 0.89.x"><link rel="next" href="shell_tricks.html" title="4.2.&nbsp;Shell Tricks"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Chapter&nbsp;4.&nbsp;The Apache HBase Shell</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="upgrade0.90.html">Prev</a>&nbsp;</td><th width="60%" align="center">&nbsp;</th><td width="20%" align="right">&nbsp;<a accesskey="n" href="shell
 _tricks.html">Next</a></td></tr></table><hr></div><div class="chapter" title="Chapter&nbsp;4.&nbsp;The Apache HBase Shell"><div class="titlepage"><div><div><h2 class="title"><a name="shell"></a>Chapter&nbsp;4.&nbsp;The Apache HBase Shell</h2></div></div></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="section"><a href="shell.html#scripting">4.1. Scripting</a></span></dt><dt><span class="section"><a href="shell_tricks.html">4.2. Shell Tricks</a></span></dt><dd><dl><dt><span class="section"><a href="shell_tricks.html#d2519e2940">4.2.1. <code class="filename">irbrc</code></a></span></dt><dt><span class="section"><a href="shell_tricks.html#d2519e2958">4.2.2. LOG data to timestamp</a></span></dt><dt><span class="section"><a href="shell_tricks.html#d2519e2976">4.2.3. Debug</a></span></dt><dt><span class="section"><a href="shell_tricks.html#d2519e2998">4.2.4. Commands</a></span></dt></dl></dd></dl></div><p>
         The Apache HBase (TM) Shell is <a class="link" href="http://jruby.org" target="_top">(J)Ruby</a>'s
         IRB with some HBase-specific commands added.  Anything you can do in
         IRB, you should be able to do in the HBase Shell.</p><p>To run the HBase shell,

Modified: hbase/hbase.apache.org/trunk/book/shell_tricks.html
URL: http://svn.apache.org/viewvc/hbase/hbase.apache.org/trunk/book/shell_tricks.html?rev=1465200&r1=1465199&r2=1465200&view=diff
==============================================================================
--- hbase/hbase.apache.org/trunk/book/shell_tricks.html (original)
+++ hbase/hbase.apache.org/trunk/book/shell_tricks.html Sat Apr  6 06:08:56 2013
@@ -1,6 +1,6 @@
 <html><head>
       <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
-   <title>4.2.&nbsp;Shell Tricks</title><link rel="stylesheet" type="text/css" href="../css/freebsd_docbook.css"><meta name="generator" content="DocBook XSL-NS Stylesheets V1.76.1"><link rel="home" href="book.html" title="The Apache HBase&#153; Reference Guide"><link rel="up" href="shell.html" title="Chapter&nbsp;4.&nbsp;The Apache HBase Shell"><link rel="prev" href="shell.html" title="Chapter&nbsp;4.&nbsp;The Apache HBase Shell"><link rel="next" href="datamodel.html" title="Chapter&nbsp;5.&nbsp;Data Model"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">4.2.&nbsp;Shell Tricks</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="shell.html">Prev</a>&nbsp;</td><th width="60%" align="center">Chapter&nbsp;4.&nbsp;The Apache HBase Shell</th><td width="20%" align="right">&nbsp;<a accesskey="n" href="datamodel.html">Next</a>
 </td></tr></table><hr></div><div class="section" title="4.2.&nbsp;Shell Tricks"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="shell_tricks"></a>4.2.&nbsp;Shell Tricks</h2></div></div></div><div class="section" title="4.2.1.&nbsp;irbrc"><div class="titlepage"><div><div><h3 class="title"><a name="d2475e2917"></a>4.2.1.&nbsp;<code class="filename">irbrc</code></h3></div></div></div><p>Create an <code class="filename">.irbrc</code> file for yourself in your
+   <title>4.2.&nbsp;Shell Tricks</title><link rel="stylesheet" type="text/css" href="../css/freebsd_docbook.css"><meta name="generator" content="DocBook XSL-NS Stylesheets V1.76.1"><link rel="home" href="book.html" title="The Apache HBase&#153; Reference Guide"><link rel="up" href="shell.html" title="Chapter&nbsp;4.&nbsp;The Apache HBase Shell"><link rel="prev" href="shell.html" title="Chapter&nbsp;4.&nbsp;The Apache HBase Shell"><link rel="next" href="datamodel.html" title="Chapter&nbsp;5.&nbsp;Data Model"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">4.2.&nbsp;Shell Tricks</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="shell.html">Prev</a>&nbsp;</td><th width="60%" align="center">Chapter&nbsp;4.&nbsp;The Apache HBase Shell</th><td width="20%" align="right">&nbsp;<a accesskey="n" href="datamodel.html">Next</a>
 </td></tr></table><hr></div><div class="section" title="4.2.&nbsp;Shell Tricks"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="shell_tricks"></a>4.2.&nbsp;Shell Tricks</h2></div></div></div><div class="section" title="4.2.1.&nbsp;irbrc"><div class="titlepage"><div><div><h3 class="title"><a name="d2519e2940"></a>4.2.1.&nbsp;<code class="filename">irbrc</code></h3></div></div></div><p>Create an <code class="filename">.irbrc</code> file for yourself in your
                     home directory. Add customizations. A useful one is
                     command history so commands are saved across Shell invocations:
                     </p><pre class="programlisting">
@@ -11,7 +11,7 @@
                 See the <span class="application">ruby</span> documentation of
                 <code class="filename">.irbrc</code> to learn about other possible
                 configurations.
-                </p></div><div class="section" title="4.2.2.&nbsp;LOG data to timestamp"><div class="titlepage"><div><div><h3 class="title"><a name="d2475e2935"></a>4.2.2.&nbsp;LOG data to timestamp</h3></div></div></div><p>
+                </p></div><div class="section" title="4.2.2.&nbsp;LOG data to timestamp"><div class="titlepage"><div><div><h3 class="title"><a name="d2519e2958"></a>4.2.2.&nbsp;LOG data to timestamp</h3></div></div></div><p>
                 To convert the date '08/08/16 20:56:29' from an hbase log into a timestamp, do:
                 </p><pre class="programlisting">
                     hbase(main):021:0&gt; import java.text.SimpleDateFormat
@@ -25,14 +25,14 @@
             </p><p>
                 To output in a format that is exactly like that of the HBase log format will take a little messing with
                 <a class="link" href="http://download.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html" target="_top">SimpleDateFormat</a>.
-            </p></div><div class="section" title="4.2.3.&nbsp;Debug"><div class="titlepage"><div><div><h3 class="title"><a name="d2475e2953"></a>4.2.3.&nbsp;Debug</h3></div></div></div><div class="section" title="4.2.3.1.&nbsp;Shell debug switch"><div class="titlepage"><div><div><h4 class="title"><a name="d2475e2956"></a>4.2.3.1.&nbsp;Shell debug switch</h4></div></div></div><p>You can set a debug switch in the shell to see more output
+            </p></div><div class="section" title="4.2.3.&nbsp;Debug"><div class="titlepage"><div><div><h3 class="title"><a name="d2519e2976"></a>4.2.3.&nbsp;Debug</h3></div></div></div><div class="section" title="4.2.3.1.&nbsp;Shell debug switch"><div class="titlepage"><div><div><h4 class="title"><a name="d2519e2979"></a>4.2.3.1.&nbsp;Shell debug switch</h4></div></div></div><p>You can set a debug switch in the shell to see more output
                     -- e.g. more of the stack trace on exception --
                     when you run a command:
                     </p><pre class="programlisting">hbase&gt; debug &lt;RETURN&gt;</pre><p>
-                 </p></div><div class="section" title="4.2.3.2.&nbsp;DEBUG log level"><div class="titlepage"><div><div><h4 class="title"><a name="d2475e2964"></a>4.2.3.2.&nbsp;DEBUG log level</h4></div></div></div><p>To enable DEBUG level logging in the shell,
+                 </p></div><div class="section" title="4.2.3.2.&nbsp;DEBUG log level"><div class="titlepage"><div><div><h4 class="title"><a name="d2519e2987"></a>4.2.3.2.&nbsp;DEBUG log level</h4></div></div></div><p>To enable DEBUG level logging in the shell,
                     launch it with the <span class="command"><strong>-d</strong></span> option.
                     </p><pre class="programlisting">$ ./bin/hbase shell -d</pre><p>
-               </p></div></div><div class="section" title="4.2.4.&nbsp;Commands"><div class="titlepage"><div><div><h3 class="title"><a name="d2475e2975"></a>4.2.4.&nbsp;Commands</h3></div></div></div><div class="section" title="4.2.4.1.&nbsp;count"><div class="titlepage"><div><div><h4 class="title"><a name="d2475e2978"></a>4.2.4.1.&nbsp;count</h4></div></div></div><p>Count command returns the number of rows in a table.
+               </p></div></div><div class="section" title="4.2.4.&nbsp;Commands"><div class="titlepage"><div><div><h3 class="title"><a name="d2519e2998"></a>4.2.4.&nbsp;Commands</h3></div></div></div><div class="section" title="4.2.4.1.&nbsp;count"><div class="titlepage"><div><div><h4 class="title"><a name="d2519e3001"></a>4.2.4.1.&nbsp;count</h4></div></div></div><p>Count command returns the number of rows in a table.
 		    It's quite fast when configured with the right CACHE
             </p><pre class="programlisting">hbase&gt; count '&lt;tablename&gt;', CACHE =&gt; 1000</pre><p>
             The above count fetches 1000 rows at a time.  Set CACHE lower if your rows are big.
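
The "LOG data to timestamp" trick in shell_tricks.html above can also be sketched in plain Java, outside the shell. This is a minimal sketch, not the book's own listing: the class name LogTimestamp, the helper toEpochMillis, and the pattern "yy/MM/dd HH:mm:ss" (year/month/day order, assumed from the sample date) are illustrative assumptions, and the time zone is passed explicitly because the log line itself carries none.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

public class LogTimestamp {
    // Hypothetical helper: parse a log-style date such as "08/08/16 20:56:29"
    // and return milliseconds since the epoch.
    // Assumption: the log date is laid out as year/month/day hour:minute:second.
    static long toEpochMillis(String logDate, String zoneId) throws ParseException {
        SimpleDateFormat fmt = new SimpleDateFormat("yy/MM/dd HH:mm:ss");
        fmt.setLenient(false);                          // reject impossible dates
        fmt.setTimeZone(TimeZone.getTimeZone(zoneId));  // zone the log was written in
        return fmt.parse(logDate).getTime();
    }

    public static void main(String[] args) throws ParseException {
        // Interprets the sample date as UTC; pass the server's zone instead if it differs.
        System.out.println(toEpochMillis("08/08/16 20:56:29", "UTC"));
    }
}
```

The result depends on the zone supplied, so the same log line yields a different timestamp if the server wrote its logs in a zone other than UTC.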

Modified: hbase/hbase.apache.org/trunk/book/snappy.compression.html
URL: http://svn.apache.org/viewvc/hbase/hbase.apache.org/trunk/book/snappy.compression.html?rev=1465200&r1=1465199&r2=1465200&view=diff
==============================================================================
--- hbase/hbase.apache.org/trunk/book/snappy.compression.html (original)
+++ hbase/hbase.apache.org/trunk/book/snappy.compression.html Sat Apr  6 06:08:56 2013
@@ -7,7 +7,7 @@
     </h2></div></div></div><p>
         If snappy is installed, HBase can make use of it (courtesy of
         <a class="link" href="http://code.google.com/p/hadoop-snappy/" target="_top">hadoop-snappy</a>
-        <sup>[<a name="d2475e12185" href="#ftn.d2475e12185" class="footnote">32</a>]</sup>).
+        <sup>[<a name="d2519e12391" href="#ftn.d2519e12391" class="footnote">33</a>]</sup>).
 
         </p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p>
                     Build and install <a class="link" href="http://code.google.com/p/snappy/" target="_top">snappy</a> on all nodes
@@ -27,15 +27,18 @@ hbase&gt; describe 't1'</pre><p>
     </h3></div></div></div><p>
         You will find the snappy library file under the .libs directory from your Snappy build (For example
         /home/hbase/snappy-1.0.5/.libs/). The file is called libsnappy.so.1.x.x where 1.x.x is the version of the snappy
-        code you are building. You can either copy this file into your hbase directory under libsnappy.so name, or simply
-        create a symbolic link to it.
+        code you are building. You can either copy this file into your hbase lib directory -- under lib/native/PLATFORM --
+        naming the file as libsnappy.so,
+        or simply create a symbolic link to it (see ./bin/hbase for how it sets up the library path for native libs).
     </p><p>
         The second file you need is the hadoop native library. You will find this file in your hadoop installation directory
         under lib/native/Linux-amd64-64/ or lib/native/Linux-i386-32/. The file you are looking for is libhadoop.so.1.x.x.
-        Again, you can simply copy this file or link to it, under the name libhadoop.so.
+        Again, you can simply copy this file or link to it from under hbase in lib/native/PLATFORM (e.g. Linux-amd64-64, etc.),
+        using the name libhadoop.so.
     </p><p>
         At the end of the installation, you should have both libsnappy.so and libhadoop.so links or files present in
-        lib/native/Linux-amd64-64 or into lib/native/Linux-i386-32
+        lib/native/Linux-amd64-64 or in lib/native/Linux-i386-32 (where the last part of the directory path is the
+        PLATFORM you built and are running the native lib on)
     </p><p>To point hbase at snappy support, in hbase-env.sh set
         </p><pre class="programlisting">export HBASE_LIBRARY_PATH=/pathtoyourhadoop/lib/native/Linux-amd64-64</pre><p>
         In <code class="filename">/pathtoyourhadoop/lib/native/Linux-amd64-64</code> you should have something like:
@@ -45,7 +48,7 @@ hbase&gt; describe 't1'</pre><p>
         libsnappy.so.1
         libsnappy.so.1.1.2
     </pre><p>
-    </p></div><div class="footnotes"><br><hr width="100" align="left"><div class="footnote"><p><sup>[<a id="ftn.d2475e12185" href="#d2475e12185" class="para">32</a>] </sup>See <a class="link" href="http://search-hadoop.com/m/Ds8d51c263B1/%2522Hadoop-Snappy+in+synch+with+Hadoop+trunk%2522&amp;subj=Hadoop+Snappy+in+synch+with+Hadoop+trunk" target="_top">Alejandro's note</a> up on the list on difference between Snappy in Hadoop
+    </p></div><div class="footnotes"><br><hr width="100" align="left"><div class="footnote"><p><sup>[<a id="ftn.d2519e12391" href="#d2519e12391" class="para">33</a>] </sup>See <a class="link" href="http://search-hadoop.com/m/Ds8d51c263B1/%2522Hadoop-Snappy+in+synch+with+Hadoop+trunk%2522&amp;subj=Hadoop+Snappy+in+synch+with+Hadoop+trunk" target="_top">Alejandro's note</a> up on the list on difference between Snappy in Hadoop
         and Snappy in HBase</p></div></div></div><div id="disqus_thread"></div><script type="text/javascript">
     var disqus_shortname = 'hbase'; // required: replace example with your forum shortname
     var disqus_url = 'http://hbase.apache.org/book';

Modified: hbase/hbase.apache.org/trunk/book/standalone_dist.html
URL: http://svn.apache.org/viewvc/hbase/hbase.apache.org/trunk/book/standalone_dist.html?rev=1465200&r1=1465199&r2=1465200&view=diff
==============================================================================
--- hbase/hbase.apache.org/trunk/book/standalone_dist.html (original)
+++ hbase/hbase.apache.org/trunk/book/standalone_dist.html Sat Apr  6 06:08:56 2013
@@ -18,7 +18,7 @@
         daemons run on a single node -- a.k.a
         <span class="emphasis"><em>pseudo-distributed</em></span>-- and
         <span class="emphasis"><em>fully-distributed</em></span> where the daemons are spread
-        across all nodes in the cluster <sup>[<a name="d2475e717" href="#ftn.d2475e717" class="footnote">9</a>]</sup>.</p><p>Distributed modes require an instance of the <span class="emphasis"><em>Hadoop
+        across all nodes in the cluster <sup>[<a name="d2519e717" href="#ftn.d2519e717" class="footnote">9</a>]</sup>.</p><p>Distributed modes require an instance of the <span class="emphasis"><em>Hadoop
         Distributed File System</em></span> (HDFS). See the Hadoop <a class="link" href="http://hadoop.apache.org/common/docs/r1.1.1/api/overview-summary.html#overview_description" target="_top">
         requirements and instructions</a> for how to set up a HDFS. Before
         proceeding, ensure you have an appropriate, working HDFS.</p><p>Below we describe the different distributed setups. Starting,
@@ -37,7 +37,7 @@
               Note that the <code class="varname">hbase.rootdir</code> property points to the
               local HDFS instance.
    		  </p><p>Now skip to <a class="xref" href="standalone_dist.html#confirm" title="2.2.3.&nbsp;Running and Confirming Your Installation">Section&nbsp;2.2.3, &#8220;Running and Confirming Your Installation&#8221;</a> for how to start and verify your
-          pseudo-distributed install. <sup>[<a name="d2475e765" href="#ftn.d2475e765" class="footnote">10</a>]</sup></p><div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>Let HBase create the <code class="varname">hbase.rootdir</code>
+          pseudo-distributed install. <sup>[<a name="d2519e765" href="#ftn.d2519e765" class="footnote">10</a>]</sup></p><div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>Let HBase create the <code class="varname">hbase.rootdir</code>
             directory. If you don't, you'll get a warning saying HBase needs a
             migration run because the directory is missing files expected by
             HBase (it'll create them if you let it).</p></div><div class="section" title="2.2.2.1.1.&nbsp;Pseudo-distributed Configuration File"><div class="titlepage"><div><div><h5 class="title"><a name="pseudo.config"></a>2.2.2.1.1.&nbsp;Pseudo-distributed Configuration File</h5></div></div></div><p>Below is a sample pseudo-distributed file for the node <code class="varname">h-24-30.example.com</code>.
@@ -158,8 +158,8 @@ stopping hbase...............</pre><p> S
         complete. It can take longer if your cluster is comprised of many
         machines. If you are running a distributed operation, be sure to wait
         until HBase has shut down completely before stopping the Hadoop
-        daemons.</p></div><div class="footnotes"><br><hr width="100" align="left"><div class="footnote"><p><sup>[<a id="ftn.d2475e717" href="#d2475e717" class="para">9</a>] </sup>The pseudo-distributed vs fully-distributed nomenclature
-            comes from Hadoop.</p></div><div class="footnote"><p><sup>[<a id="ftn.d2475e765" href="#d2475e765" class="para">10</a>] </sup>See <a class="xref" href="standalone_dist.html#pseudo.extras" title="2.2.2.1.2.&nbsp;Pseudo-distributed Extras">Section&nbsp;2.2.2.1.2, &#8220;Pseudo-distributed Extras&#8221;</a> for notes on how to start extra Masters and
+        daemons.</p></div><div class="footnotes"><br><hr width="100" align="left"><div class="footnote"><p><sup>[<a id="ftn.d2519e717" href="#d2519e717" class="para">9</a>] </sup>The pseudo-distributed vs fully-distributed nomenclature
+            comes from Hadoop.</p></div><div class="footnote"><p><sup>[<a id="ftn.d2519e765" href="#d2519e765" class="para">10</a>] </sup>See <a class="xref" href="standalone_dist.html#pseudo.extras" title="2.2.2.1.2.&nbsp;Pseudo-distributed Extras">Section&nbsp;2.2.2.1.2, &#8220;Pseudo-distributed Extras&#8221;</a> for notes on how to start extra Masters and
               RegionServers when running pseudo-distributed.</p></div></div></div><div id="disqus_thread"></div><script type="text/javascript">
     var disqus_shortname = 'hbase'; // required: replace example with your forum shortname
     var disqus_url = 'http://hbase.apache.org/book';

Modified: hbase/hbase.apache.org/trunk/book/trouble.resources.html
URL: http://svn.apache.org/viewvc/hbase/hbase.apache.org/trunk/book/trouble.resources.html?rev=1465200&r1=1465199&r2=1465200&view=diff
==============================================================================
--- hbase/hbase.apache.org/trunk/book/trouble.resources.html (original)
+++ hbase/hbase.apache.org/trunk/book/trouble.resources.html Sat Apr  6 06:08:56 2013
@@ -8,12 +8,12 @@
         is generally used for questions on released versions of Apache HBase.  Before going to the mailing list, make sure your
         question has not already been answered by searching the mailing list archives first.  Use
         <a class="xref" href="trouble.resources.html#trouble.resources.searchhadoop" title="12.3.1.&nbsp;search-hadoop.com">Section&nbsp;12.3.1, &#8220;search-hadoop.com&#8221;</a>.
-        Take some time crafting your question<sup>[<a name="d2475e8017" href="#ftn.d2475e8017" class="footnote">28</a>]</sup>; a quality question that includes all context and
+        Take some time crafting your question<sup>[<a name="d2519e8250" href="#ftn.d2519e8250" class="footnote">29</a>]</sup>; a quality question that includes all context and
         exhibits evidence the author has tried to find answers in the manual and out on lists
         is more likely to get a prompt response.
         </p></div><div class="section" title="12.3.3.&nbsp;IRC"><div class="titlepage"><div><div><h3 class="title"><a name="trouble.resources.irc"></a>12.3.3.&nbsp;IRC</h3></div></div></div><p>#hbase on irc.freenode.net</p></div><div class="section" title="12.3.4.&nbsp;JIRA"><div class="titlepage"><div><div><h3 class="title"><a name="trouble.resources.jira"></a>12.3.4.&nbsp;JIRA</h3></div></div></div><p>
         <a class="link" href="https://issues.apache.org/jira/browse/HBASE" target="_top">JIRA</a> is also really helpful when looking for Hadoop/HBase-specific issues.
-        </p></div><div class="footnotes"><br><hr width="100" align="left"><div class="footnote"><p><sup>[<a id="ftn.d2475e8017" href="#d2475e8017" class="para">28</a>] </sup>See Getting Answers</p></div></div></div><div id="disqus_thread"></div><script type="text/javascript">
+        </p></div><div class="footnotes"><br><hr width="100" align="left"><div class="footnote"><p><sup>[<a id="ftn.d2519e8250" href="#d2519e8250" class="para">29</a>] </sup>See Getting Answers</p></div></div></div><div id="disqus_thread"></div><script type="text/javascript">
     var disqus_shortname = 'hbase'; // required: replace example with your forum shortname
     var disqus_url = 'http://hbase.apache.org/book';
     var disqus_identifier = 'trouble.resources';

Modified: hbase/hbase.apache.org/trunk/book/upgrade0.90.html
URL: http://svn.apache.org/viewvc/hbase/hbase.apache.org/trunk/book/upgrade0.90.html?rev=1465200&r1=1465199&r2=1465200&view=diff
==============================================================================
--- hbase/hbase.apache.org/trunk/book/upgrade0.90.html (original)
+++ hbase/hbase.apache.org/trunk/book/upgrade0.90.html Sat Apr  6 06:08:56 2013
@@ -27,10 +27,10 @@
             need to change this (The 'normal'/default value is 64MB (67108864)).
             Run the script <code class="filename">bin/set_meta_memstore_size.rb</code>.
             This will make the necessary edit to your <code class="varname">.META.</code> schema.
-            Failure to run this change will make for a slow cluster <sup>[<a name="d2475e2867" href="#ftn.d2475e2867" class="footnote">12</a>]</sup>
+            Failure to run this change will make for a slow cluster <sup>[<a name="d2519e2890" href="#ftn.d2519e2890" class="footnote">13</a>]</sup>
             .
 
-          </p><div class="footnotes"><br><hr width="100" align="left"><div class="footnote"><p><sup>[<a id="ftn.d2475e2867" href="#d2475e2867" class="para">12</a>] </sup>
+          </p><div class="footnotes"><br><hr width="100" align="left"><div class="footnote"><p><sup>[<a id="ftn.d2519e2890" href="#d2519e2890" class="para">13</a>] </sup>
             See <a class="link" href="https://issues.apache.org/jira/browse/HBASE-3499" target="_top">HBASE-3499 Users upgrading to 0.90.0 need to have their .META. table updated with the right MEMSTORE_SIZE</a>
             </p></div></div></div><div id="disqus_thread"></div><script type="text/javascript">
     var disqus_shortname = 'hbase'; // required: replace example with your forum shortname

Modified: hbase/hbase.apache.org/trunk/book/upgrade0.92.html
URL: http://svn.apache.org/viewvc/hbase/hbase.apache.org/trunk/book/upgrade0.92.html?rev=1465200&r1=1465199&r2=1465200&view=diff
==============================================================================
--- hbase/hbase.apache.org/trunk/book/upgrade0.92.html (original)
+++ hbase/hbase.apache.org/trunk/book/upgrade0.92.html Sat Apr  6 06:08:56 2013
@@ -14,46 +14,46 @@ There&#8217;s a separate tarball for sec
 If -XX:MaxDirectMemorySize is set in your hbase-env.sh, it&#8217;s going to enable the experimental off-heap cache (You may not want this).
 </li></ol></div><p>
 </p></div><p>
-</p><div class="section" title="3.4.1.&nbsp;You can&#8217;t go back!"><div class="titlepage"><div><div><h3 class="title"><a name="d2475e2714"></a>3.4.1.&nbsp;You can&#8217;t go back!
+</p><div class="section" title="3.4.1.&nbsp;You can&#8217;t go back!"><div class="titlepage"><div><div><h3 class="title"><a name="d2519e2737"></a>3.4.1.&nbsp;You can&#8217;t go back!
 </h3></div></div></div><p>To move to 0.92.0, all you need to do is shut down your cluster, replace your hbase 0.90.x with hbase 0.92.0 binaries (be sure you clear out all 0.90.x instances) and restart (You cannot do a rolling restart from 0.90.x to 0.92.x -- you must restart).
 On startup, the <code class="varname">.META.</code> table content is rewritten removing the table schema from the <code class="varname">info:regioninfo</code> column.
 Also, any flushes done post first startup will write out data in the new 0.92.0 file format, <a class="link" href="http://hbase.apache.org/book.html#hfilev2" target="_top">HFile V2</a>.
 This means you cannot go back to 0.90.x once you&#8217;ve started HBase 0.92.0 over your HBase data directory.
-</p></div><div class="section" title="3.4.2.&nbsp;MSLAB is ON by default"><div class="titlepage"><div><div><h3 class="title"><a name="d2475e2728"></a>3.4.2.&nbsp;MSLAB is ON by default
+</p></div><div class="section" title="3.4.2.&nbsp;MSLAB is ON by default"><div class="titlepage"><div><div><h3 class="title"><a name="d2519e2751"></a>3.4.2.&nbsp;MSLAB is ON by default
 </h3></div></div></div><p>In 0.92.0, the <a class="link" href="http://hbase.apache.org/book.html#hbase.hregion.memstore.mslab.enabled" target="_top">hbase.hregion.memstore.mslab.enabled</a> flag is set to true
 (See <a class="xref" href="jvm.html#mslab">Section&nbsp;11.3.1.1, &#8220;Long GC pauses&#8221;</a>).  In 0.90.x it was <code class="constant">false</code>.  When it is enabled, memstores will step allocate memory in MSLAB 2MB chunks even if the
 memstore has zero or just a few small elements.  This is fine usually but if you had lots of regions per regionserver in a 0.90.x cluster (and MSLAB was off),
 you may find yourself OOME'ing on upgrade because the <span class="mathphrase">thousands of regions * number of column families * 2MB MSLAB (at a minimum)</span>
 puts your heap over the top.  Set <code class="varname">hbase.hregion.memstore.mslab.enabled</code> to
 <code class="constant">false</code> or set the MSLAB size down from 2MB by setting <code class="varname">hbase.hregion.memstore.mslab.chunksize</code> to something less.
-</p></div><div class="section" title="3.4.3.&nbsp;Distributed splitting is on by default"><div class="titlepage"><div><div><h3 class="title"><a name="d2475e2753"></a>3.4.3.&nbsp;Distributed splitting is on by default
+</p></div><div class="section" title="3.4.3.&nbsp;Distributed splitting is on by default"><div class="titlepage"><div><div><h3 class="title"><a name="d2519e2776"></a>3.4.3.&nbsp;Distributed splitting is on by default
 </h3></div></div></div><p>Previously, WAL logs were split by the Master alone after a crash.  In 0.92.0, log splitting is done by the cluster (See &#8220;HBASE-1364 [performance] Distributed splitting of regionserver commit logs&#8221;).  This should significantly cut the time it takes to split logs and get regions back online.
-</p></div><div class="section" title="3.4.4.&nbsp;Memory accounting is different now"><div class="titlepage"><div><div><h3 class="title"><a name="d2475e2758"></a>3.4.4.&nbsp;Memory accounting is different now
+</p></div><div class="section" title="3.4.4.&nbsp;Memory accounting is different now"><div class="titlepage"><div><div><h3 class="title"><a name="d2519e2781"></a>3.4.4.&nbsp;Memory accounting is different now
 </h3></div></div></div><p>In 0.92.0, <a class="xref" href="hfilev2.html" title="Appendix&nbsp;E.&nbsp;HFile format version 2">Appendix&nbsp;E, <i>HFile format version 2</i></a> indices and bloom filters take up residence in the same LRU used for caching blocks that come from the filesystem.
 In 0.90.x, the HFile v1 indices lived outside of the LRU so they took up space even if the index was on a &#8216;cold&#8217; file, one that wasn&#8217;t being actively used.  With the indices now in the LRU, you may find you
 have less space for block caching.  Adjust your block cache accordingly.  See the <a class="xref" href="regionserver.arch.html#block.cache" title="9.6.4.&nbsp;Block Cache">Section&nbsp;9.6.4, &#8220;Block Cache&#8221;</a> for more detail.
 The block cache default size has been changed in 0.92.0 from 0.2 (20 percent of heap) to 0.25 (25 percent of heap).
-</p></div><div class="section" title="3.4.5.&nbsp;On the Hadoop version to use"><div class="titlepage"><div><div><h3 class="title"><a name="d2475e2767"></a>3.4.5.&nbsp;On the Hadoop version to use
+</p></div><div class="section" title="3.4.5.&nbsp;On the Hadoop version to use"><div class="titlepage"><div><div><h3 class="title"><a name="d2519e2790"></a>3.4.5.&nbsp;On the Hadoop version to use
 </h3></div></div></div><p>Run 0.92.0 on Hadoop 1.0.x (or CDH3u3 when it ships).  The performance benefits are worth making the move.  Otherwise, our Hadoop prescription is as it has been; you need a Hadoop that supports a working sync.  See <a class="xref" href="configuration.html#hadoop" title="2.1.3.&nbsp;Hadoop">Section&nbsp;2.1.3, &#8220;Hadoop&#8221;</a>.
 </p><p>If running on Hadoop 1.0.x (or CDH3u3), enable local read.  See <a class="link" href="http://files.meetup.com/1350427/hug_ebay_jdcryans.pdf" target="_top">Practical Caching</a> presentation for ruminations on the performance benefits of &#8216;going local&#8217; (and for how to enable local reads).
-</p></div><div class="section" title="3.4.6.&nbsp;HBase 0.92.0 ships with ZooKeeper 3.4.2"><div class="titlepage"><div><div><h3 class="title"><a name="d2475e2779"></a>3.4.6.&nbsp;HBase 0.92.0 ships with ZooKeeper 3.4.2
+</p></div><div class="section" title="3.4.6.&nbsp;HBase 0.92.0 ships with ZooKeeper 3.4.2"><div class="titlepage"><div><div><h3 class="title"><a name="d2519e2802"></a>3.4.6.&nbsp;HBase 0.92.0 ships with ZooKeeper 3.4.2
 </h3></div></div></div><p>If you can, upgrade your zookeeper.  If you can&#8217;t, 3.4.2 clients should work against 3.3.X ensembles (HBase makes use of 3.4.2 API).
-</p></div><div class="section" title="3.4.7.&nbsp;Online alter is off by default"><div class="titlepage"><div><div><h3 class="title"><a name="d2475e2784"></a>3.4.7.&nbsp;Online alter is off by default
+</p></div><div class="section" title="3.4.7.&nbsp;Online alter is off by default"><div class="titlepage"><div><div><h3 class="title"><a name="d2519e2807"></a>3.4.7.&nbsp;Online alter is off by default
 </h3></div></div></div><p>In 0.92.0, we&#8217;ve added an experimental online schema alter facility  (See <a class="xref" href="config.files.html#hbase.online.schema.update.enable" title="hbase.online.schema.update.enable"><code class="varname">hbase.online.schema.update.enable</code></a>).  It is off by default.  Enable it at your own risk.  Online alter and splitting tables do not play well together, so be sure your cluster is quiescent when using this feature (for now).
-</p></div><div class="section" title="3.4.8.&nbsp;WebUI"><div class="titlepage"><div><div><h3 class="title"><a name="d2475e2791"></a>3.4.8.&nbsp;WebUI
+</p></div><div class="section" title="3.4.8.&nbsp;WebUI"><div class="titlepage"><div><div><h3 class="title"><a name="d2519e2814"></a>3.4.8.&nbsp;WebUI
 </h3></div></div></div><p>The webui has had a few additions made in 0.92.0.  It now shows a list of the regions currently transitioning, recent compactions/flushes, and a process list of running processes (usually empty if all is well and requests are being handled promptly).  Other additions include requests by region, a debugging servlet dump, etc.
-</p></div><div class="section" title="3.4.9.&nbsp;Security tarball"><div class="titlepage"><div><div><h3 class="title"><a name="d2475e2796"></a>3.4.9.&nbsp;Security tarball
+</p></div><div class="section" title="3.4.9.&nbsp;Security tarball"><div class="titlepage"><div><div><h3 class="title"><a name="d2519e2819"></a>3.4.9.&nbsp;Security tarball
 </h3></div></div></div><p>We now ship with two tarballs; secure and insecure HBase.  Documentation on how to set up a secure HBase is on the way.
-</p></div><div class="section" title="3.4.10.&nbsp;Experimental off-heap cache"><div class="titlepage"><div><div><h3 class="title"><a name="d2475e2801"></a>3.4.10.&nbsp;Experimental off-heap cache
+</p></div><div class="section" title="3.4.10.&nbsp;Experimental off-heap cache"><div class="titlepage"><div><div><h3 class="title"><a name="d2519e2824"></a>3.4.10.&nbsp;Experimental off-heap cache
 </h3></div></div></div><p>
 A new cache was contributed to 0.92.0 to act as a solution between using the &#8220;on-heap&#8221; cache which is the current LRU cache the region servers have and the operating system cache which is out of our control.
 To enable, set &#8220;-XX:MaxDirectMemorySize&#8221; in hbase-env.sh to the value for maximum direct memory size and specify hbase.offheapcache.percentage in hbase-site.xml with the percentage that you want to dedicate to off-heap cache. This should only be set for servers and not for clients. Use at your own risk.
 See this blog post for additional information on this new experimental feature: http://www.cloudera.com/blog/2012/01/caching-in-hbase-slabcache/
-</p></div><div class="section" title="3.4.11.&nbsp;Changes in HBase replication"><div class="titlepage"><div><div><h3 class="title"><a name="d2475e2806"></a>3.4.11.&nbsp;Changes in HBase replication
+</p></div><div class="section" title="3.4.11.&nbsp;Changes in HBase replication"><div class="titlepage"><div><div><h3 class="title"><a name="d2519e2829"></a>3.4.11.&nbsp;Changes in HBase replication
 </h3></div></div></div><p>0.92.0 adds two new features: multi-slave and multi-master replication. The way to enable this is the same as adding a new peer, so in order to have multi-master you would just run add_peer for each cluster that acts as a master to the other slave clusters. Collisions are handled at the timestamp level, which may or may not be what you want; this needs to be evaluated on a per use case basis. Replication is still experimental in 0.92 and is disabled by default; run it at your own risk.
-</p></div><div class="section" title="3.4.12.&nbsp;RegionServer now aborts if OOME"><div class="titlepage"><div><div><h3 class="title"><a name="d2475e2811"></a>3.4.12.&nbsp;RegionServer now aborts if OOME
+</p></div><div class="section" title="3.4.12.&nbsp;RegionServer now aborts if OOME"><div class="titlepage"><div><div><h3 class="title"><a name="d2519e2834"></a>3.4.12.&nbsp;RegionServer now aborts if OOME
 </h3></div></div></div><p>On an OOME, we now have the JVM kill -9 the regionserver process so it goes down fast.  Previously, a RegionServer might stick around after incurring an OOME, limping along in some wounded state.  To disable this facility (we recommend you leave it in place), you&#8217;d need to edit the bin/hbase file.  Look for the addition of the -XX:OnOutOfMemoryError="kill -9 %p" arguments (See [HBASE-4769] - &#8216;Abort RegionServer Immediately on OOME&#8217;)
-</p></div><div class="section" title="3.4.13.&nbsp;HFile V2 and the &#8220;Bigger, Fewer&#8221; Tendency"><div class="titlepage"><div><div><h3 class="title"><a name="d2475e2816"></a>3.4.13.&nbsp;HFile V2 and the &#8220;Bigger, Fewer&#8221; Tendency
+</p></div><div class="section" title="3.4.13.&nbsp;HFile V2 and the &#8220;Bigger, Fewer&#8221; Tendency"><div class="titlepage"><div><div><h3 class="title"><a name="d2519e2839"></a>3.4.13.&nbsp;HFile V2 and the &#8220;Bigger, Fewer&#8221; Tendency
 </h3></div></div></div><p>0.92.0 stores data in a new format, <a class="xref" href="hfilev2.html" title="Appendix&nbsp;E.&nbsp;HFile format version 2">Appendix&nbsp;E, <i>HFile format version 2</i></a>.   As HBase runs, it will move all your data from HFile v1 to HFile v2 format.  This auto-migration will run in the background as flushes and compactions run.
 HFile V2 allows HBase to run with larger regions/files.  In fact, we encourage that all HBasers going forward tend toward Facebook axiom #1, run with larger, fewer regions.
 If you have lots of regions now -- more than 100s per host -- you should look into setting your region size up after you move to 0.92.0 (In 0.92.0, default size is now 1G, up from 256M), and then running the online merge tool (See &#8220;HBASE-1621 merge tool should work on online cluster, but disabled table&#8221;).

Modified: hbase/hbase.apache.org/trunk/book/upgrade0.94.html
URL: http://svn.apache.org/viewvc/hbase/hbase.apache.org/trunk/book/upgrade0.94.html?rev=1465200&r1=1465199&r2=1465200&view=diff
==============================================================================
--- hbase/hbase.apache.org/trunk/book/upgrade0.94.html (original)
+++ hbase/hbase.apache.org/trunk/book/upgrade0.94.html Sat Apr  6 06:08:56 2013
@@ -1,6 +1,11 @@
 <html><head>
       <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
-   <title>3.3.&nbsp;Upgrading from 0.92.x to 0.94.x</title><link rel="stylesheet" type="text/css" href="../css/freebsd_docbook.css"><meta name="generator" content="DocBook XSL-NS Stylesheets V1.76.1"><link rel="home" href="book.html" title="The Apache HBase&#153; Reference Guide"><link rel="up" href="upgrading.html" title="Chapter&nbsp;3.&nbsp;Upgrading"><link rel="prev" href="upgrade0.96.html" title="3.2.&nbsp;Upgrading from 0.94.x to 0.96.x"><link rel="next" href="upgrade0.92.html" title="3.4.&nbsp;Upgrading from 0.90.x to 0.92.x"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">3.3.&nbsp;Upgrading from 0.92.x to 0.94.x</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="upgrade0.96.html">Prev</a>&nbsp;</td><th width="60%" align="center">Chapter&nbsp;3.&nbsp;Upgrading</th><td width="20%" align="right">&nbsp;<a access
 key="n" href="upgrade0.92.html">Next</a></td></tr></table><hr></div><div class="section" title="3.3.&nbsp;Upgrading from 0.92.x to 0.94.x"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="upgrade0.94"></a>3.3.&nbsp;Upgrading from 0.92.x to 0.94.x</h2></div></div></div><p>0.92 and 0.94 are interface compatible.  You can do a rolling upgrade between these versions.
+   <title>3.3.&nbsp;Upgrading from 0.92.x to 0.94.x</title><link rel="stylesheet" type="text/css" href="../css/freebsd_docbook.css"><meta name="generator" content="DocBook XSL-NS Stylesheets V1.76.1"><link rel="home" href="book.html" title="The Apache HBase&#153; Reference Guide"><link rel="up" href="upgrading.html" title="Chapter&nbsp;3.&nbsp;Upgrading"><link rel="prev" href="upgrade0.96.html" title="3.2.&nbsp;Upgrading from 0.94.x to 0.96.x"><link rel="next" href="upgrade0.92.html" title="3.4.&nbsp;Upgrading from 0.90.x to 0.92.x"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">3.3.&nbsp;Upgrading from 0.92.x to 0.94.x</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="upgrade0.96.html">Prev</a>&nbsp;</td><th width="60%" align="center">Chapter&nbsp;3.&nbsp;Upgrading</th><td width="20%" align="right">&nbsp;<a access
 key="n" href="upgrade0.92.html">Next</a></td></tr></table><hr></div><div class="section" title="3.3.&nbsp;Upgrading from 0.92.x to 0.94.x"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="upgrade0.94"></a>3.3.&nbsp;Upgrading from 0.92.x to 0.94.x</h2></div></div></div><p>We used to think that 0.92 and 0.94 were interface compatible and that you could do a
+          rolling upgrade between these versions, but we then found that
+          <a class="link" href="https://issues.apache.org/jira/browse/HBASE-5357" target="_top">HBASE-5357 Use builder pattern in HColumnDescriptor</a>
+          changed method signatures so that, rather than returning void, they now return HColumnDescriptor.  Code
+          compiled against 0.92 will throw </p><pre class="programlisting">java.lang.NoSuchMethodError: org.apache.hadoop.hbase.HColumnDescriptor.setMaxVersions(I)V</pre><p>
+          when run against 0.94, so 0.92 and 0.94 are NOT compatible.  You cannot do a rolling upgrade between them.
     </p></div><div id="disqus_thread"></div><script type="text/javascript">
     var disqus_shortname = 'hbase'; // required: replace example with your forum shortname
     var disqus_url = 'http://hbase.apache.org/book';