Posted to commits@hbase.apache.org by dm...@apache.org on 2012/01/23 03:27:55 UTC
svn commit: r1234674 - /hbase/trunk/src/docbkx/book.xml
Author: dmeil
Date: Mon Jan 23 02:27:54 2012
New Revision: 1234674
URL: http://svn.apache.org/viewvc?rev=1234674&view=rev
Log:
hbase-5252. book.xml, added section in Data Model about joins
Modified:
hbase/trunk/src/docbkx/book.xml
Modified: hbase/trunk/src/docbkx/book.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/book.xml?rev=1234674&r1=1234673&r2=1234674&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/book.xml (original)
+++ hbase/trunk/src/docbkx/book.xml Mon Jan 23 02:27:54 2012
@@ -522,6 +522,19 @@ htable.put(put);
</section>
</section>
</section>
+ <section xml:id="joins"><title>Joins</title>
+ <para>Whether HBase supports joins is a common question on the mailing list, and there is a simple answer: it doesn't,
+ at least not in the way that RDBMSs support them (e.g., with equi-joins or outer-joins in SQL). As illustrated
+ in this chapter, HBase's data model offers two read operations: Get and Scan.
+ </para>
+ <para>However, that doesn't mean equivalent join functionality can't be supported in your application; you just
+ have to implement it yourself. The two primary strategies are to denormalize the data when writing to HBase,
+ or to keep lookup tables and perform the join between HBase tables in your application or MapReduce code (and as RDBMSs
+ demonstrate, there are several strategies for this depending on the size of the tables, e.g., nested loops vs.
+ hash-joins). So which is the best approach? It depends on what you are trying to do, so there is no single
+ answer that works for every use case.
+ </para>
+ </section>
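The application-side strategies described in the Joins section can be sketched as follows. This is a minimal illustration under stated assumptions, not an HBase API example. The HBase tables are modeled as in-memory sorted maps. The `Table` class, the `orders`/`customers` tables and the `customerId`/`name`/`customerName` qualifiers are all hypothetical. In real code, the map iteration would be a Scan and the point lookup would be a Get through the HBase client, and neither is shown here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Sketch only: each hypothetical "table" is an in-memory map sorted by row key,
// standing in for HBase rows (row key -> column qualifier -> value).
// No HBase client calls are made.
public class AppSideJoins {

    public static final class Table {
        final NavigableMap<String, Map<String, String>> rows = new TreeMap<>();

        public void put(String row, String qualifier, String value) {
            rows.computeIfAbsent(row, k -> new HashMap<>()).put(qualifier, value);
        }

        // Point lookup on one row key (the role a Get would play); null if absent.
        public Map<String, String> get(String row) {
            return rows.get(row);
        }
    }

    // Nested-loop join with one lookup per outer row: iterate the outer table
    // (the role of a Scan) and look up the matching inner row for each.
    public static List<String> nestedLoopJoin(Table orders, Table customers) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> order : orders.rows.entrySet()) {
            String customerId = order.getValue().get("customerId");
            if (customerId == null) {
                continue; // no join key on this row
            }
            Map<String, String> customer = customers.get(customerId);
            if (customer != null) { // inner-join semantics: drop unmatched orders
                out.add(order.getKey() + "=" + customer.get("name"));
            }
        }
        return out;
    }

    // Hash join: read the (assumed smaller) customers table once into a hash map,
    // then probe it while iterating orders. Trades memory for fewer lookups.
    public static List<String> hashJoin(Table orders, Table customers) {
        Map<String, Map<String, String>> byId = new HashMap<>(customers.rows);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> order : orders.rows.entrySet()) {
            Map<String, String> customer = byId.get(order.getValue().get("customerId"));
            if (customer != null) {
                out.add(order.getKey() + "=" + customer.get("name"));
            }
        }
        return out;
    }

    // Denormalization alternative: copy the joined value into the order row at
    // write time so reads need only the orders table. The cost is that every copy
    // must be rewritten if the customer's name changes.
    public static void writeOrderDenormalized(Table orders, Table customers,
                                              String orderId, String customerId) {
        orders.put(orderId, "customerId", customerId);
        Map<String, String> customer = customers.get(customerId);
        if (customer != null) {
            orders.put(orderId, "customerName", customer.get("name"));
        }
    }
}
```

Both join methods return the same result. They differ in where the cost falls. The nested-loop variant issues one lookup per outer row. The hash join reads the inner table once but must hold it in memory. This mirrors the "depends on the size of the tables" trade-off mentioned in the text.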
</chapter> <!-- data model -->
<chapter xml:id="schema">
@@ -756,6 +769,10 @@ System.out.println("md5 digest as string
</para>
</section>
</section>
+ <section xml:id="schema.joins"><title>Joins</title>
+ <para>If you have multiple tables, don't forget to factor the potential for <xref linkend="joins"/> into the schema design.
+ </para>
+ </section>
<section xml:id="ttl">
<title>Time To Live (TTL)</title>
<para>ColumnFamilies can set a TTL length in seconds, and HBase will automatically delete rows once the expiration time is reached.