Posted to commits@hbase.apache.org by dm...@apache.org on 2012/01/23 03:27:55 UTC

svn commit: r1234674 - /hbase/trunk/src/docbkx/book.xml

Author: dmeil
Date: Mon Jan 23 02:27:54 2012
New Revision: 1234674

URL: http://svn.apache.org/viewvc?rev=1234674&view=rev
Log:
hbase-5252.  book.xml, added section in Data Model about joins

Modified:
    hbase/trunk/src/docbkx/book.xml

Modified: hbase/trunk/src/docbkx/book.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/book.xml?rev=1234674&r1=1234673&r2=1234674&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/book.xml (original)
+++ hbase/trunk/src/docbkx/book.xml Mon Jan 23 02:27:54 2012
@@ -522,6 +522,19 @@ htable.put(put);
         </section>
       </section>
     </section>
+    <section xml:id="joins"><title>Joins</title>
+      <para>Whether HBase supports joins is a common question on the dist-list, and there is a simple answer:  it doesn't,
+      at least not in the way that RDBMS' support them (e.g., with equi-joins or outer-joins in SQL).  As has been illustrated
+      in this chapter, the read data model operations in HBase are Get and Scan.       
+      </para>
+      <para>However, that doesn't mean that equivalent join functionality can't be supported in your application, but
+      you have to do it yourself.  The two primary strategies are either denormalizing the data upon writing to HBase,
+      or keeping lookup tables and doing the join between HBase tables in your application or MapReduce code (and as RDBMS' 
+      demonstrate, there are several strategies for this depending on the size of the tables, e.g., nested loops vs.
+      hash-joins).  So which is the best approach?  It depends on what you are trying to do, and as such there isn't a single
+      answer that works for every use case.
+      </para>
+    </section>
   </chapter>  <!-- data model -->
 
  <chapter xml:id="schema">
@@ -756,6 +769,10 @@ System.out.println("md5 digest as string
       </para>
     </section> 
   </section>
+  <section xml:id="schema.joins"><title>Joins</title>
+    <para>If you have multiple tables, don't forget to factor the potential for <xref linkend="joins"/> into the schema design. 
+    </para>
+  </section>
   <section xml:id="ttl">
   <title>Time To Live (TTL)</title>
   <para>ColumnFamilies can set a TTL length in seconds, and HBase will automatically delete rows once the expiration time is reached.