You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@phoenix.apache.org by ma...@apache.org on 2014/10/26 04:18:24 UTC

svn commit: r1634270 - in /phoenix/site: publish/joins.html source/src/site/markdown/joins.md

Author: maryannxue
Date: Sun Oct 26 03:18:24 2014
New Revision: 1634270

URL: http://svn.apache.org/r1634270
Log:
PHOENIX-1326 Update documentation for sub-query support and join optimizations

Modified:
    phoenix/site/publish/joins.html
    phoenix/site/source/src/site/markdown/joins.md

Modified: phoenix/site/publish/joins.html
URL: http://svn.apache.org/viewvc/phoenix/site/publish/joins.html?rev=1634270&r1=1634269&r2=1634270&view=diff
==============================================================================
--- phoenix/site/publish/joins.html (original)
+++ phoenix/site/publish/joins.html Sun Oct 26 03:18:24 2014
@@ -1,7 +1,7 @@
 
 <!DOCTYPE html>
 <!--
- Generated by Apache Maven Doxia at 2014-10-24
+ Generated by Apache Maven Doxia at 2014-10-25
  Rendered using Reflow Maven Skin 1.1.0 (http://andriusvelykis.github.io/reflow-maven-skin)
 -->
 <html  xml:lang="en" lang="en">
@@ -127,200 +127,190 @@
 </div> 
 <p>The standard SQL join syntax (with some limitations) is now supported by Phoenix to combine records from two or more tables based on their fields having common values.</p> 
 <p>For example, we have the following tables to store our order records, our customer information and the item information we sell in those orders.</p> 
-<div class="section"> 
- <div class="section"> 
-  <div class="section"> 
-   <h4 id="The_Orders_table:">The “Orders” table:</h4> 
-   <table border="0" class="bodyTable table table-striped table-hover"> 
-    <thead> 
-     <tr class="a"> 
-      <th>OrderID </th> 
-      <th>CustomerID </th> 
-      <th>ItemID </th> 
-      <th>Quantity </th> 
-      <th>Date</th> 
-     </tr> 
-    </thead> 
-    <tbody> 
-     <tr class="b"> 
-      <td>1630781 </td> 
-      <td>C004 </td> 
-      <td>I001 </td> 
-      <td>650 </td> 
-      <td>09-01-2013</td> 
-     </tr> 
-     <tr class="a"> 
-      <td>1630782 </td> 
-      <td>C003 </td> 
-      <td>I006 </td> 
-      <td>2500 </td> 
-      <td>09-02-2013</td> 
-     </tr> 
-     <tr class="b"> 
-      <td>1630783 </td> 
-      <td>C002 </td> 
-      <td>I002 </td> 
-      <td>340 </td> 
-      <td>09-03-2013</td> 
-     </tr> 
-     <tr class="a"> 
-      <td>1630784 </td> 
-      <td>C004 </td> 
-      <td>I006 </td> 
-      <td>1260 </td> 
-      <td>09-04-2013</td> 
-     </tr> 
-     <tr class="b"> 
-      <td>1630785 </td> 
-      <td>C005 </td> 
-      <td>I003 </td> 
-      <td>1500 </td> 
-      <td>09-05-2013</td> 
-     </tr> 
-    </tbody> 
-   </table> 
-  </div> 
-  <div class="section"> 
-   <h4 id="The_Customers_table:">The “Customers” table:</h4> 
-   <table border="0" class="bodyTable table table-striped table-hover"> 
-    <thead> 
-     <tr class="a"> 
-      <th>CustomerID </th> 
-      <th>CustomerName </th> 
-      <th>Country</th> 
-     </tr> 
-    </thead> 
-    <tbody> 
-     <tr class="b"> 
-      <td>C001 </td> 
-      <td>Telefunken </td> 
-      <td>Germany</td> 
-     </tr> 
-     <tr class="a"> 
-      <td>C002 </td> 
-      <td>Logica </td> 
-      <td>Belgium</td> 
-     </tr> 
-     <tr class="b"> 
-      <td>C003 </td> 
-      <td>Salora Oy </td> 
-      <td>Finland</td> 
-     </tr> 
-     <tr class="a"> 
-      <td>C004 </td> 
-      <td>Alps Nordic AB </td> 
-      <td>Sweden</td> 
-     </tr> 
-     <tr class="b"> 
-      <td>C005 </td> 
-      <td>Deister Electronics </td> 
-      <td>Germany</td> 
-     </tr> 
-     <tr class="a"> 
-      <td>C006 </td> 
-      <td>Thales Nederland </td> 
-      <td>Netherlands</td> 
-     </tr> 
-    </tbody> 
-   </table> 
-  </div> 
-  <div class="section"> 
-   <h4 id="The_Items_table:">The “Items” table:</h4> 
-   <table border="0" class="bodyTable table table-striped table-hover"> 
-    <thead> 
-     <tr class="a"> 
-      <th>ItemID </th> 
-      <th>ItemName </th> 
-      <th>Price</th> 
-     </tr> 
-    </thead> 
-    <tbody> 
-     <tr class="b"> 
-      <td>I001 </td> 
-      <td>BX016 </td> 
-      <td>15.96</td> 
-     </tr> 
-     <tr class="a"> 
-      <td>I002 </td> 
-      <td>MU947 </td> 
-      <td>20.35</td> 
-     </tr> 
-     <tr class="b"> 
-      <td>I003 </td> 
-      <td>MU3508 </td> 
-      <td>9.60</td> 
-     </tr> 
-     <tr class="a"> 
-      <td>I004 </td> 
-      <td>XC7732 </td> 
-      <td>55.24</td> 
-     </tr> 
-     <tr class="b"> 
-      <td>I005 </td> 
-      <td>XT0019 </td> 
-      <td>12.65</td> 
-     </tr> 
-     <tr class="a"> 
-      <td>I006 </td> 
-      <td>XT2217 </td> 
-      <td>12.35</td> 
-     </tr> 
-    </tbody> 
-   </table> 
-   <p>You may get a combined view of the “Orders” table and the “Customers” table by running the following join query:</p> 
-   <div class="source"> 
-    <pre>SELECT O.OrderID, C.CustomerName, C.Country, O.Date
+<p>The “<b>Orders</b>” table:</p> 
+<table border="0" class="bodyTable table table-striped table-hover"> 
+ <thead> 
+  <tr class="a"> 
+   <th>OrderID </th> 
+   <th>CustomerID </th> 
+   <th>ItemID </th> 
+   <th>Quantity </th> 
+   <th>Date</th> 
+  </tr> 
+ </thead> 
+ <tbody> 
+  <tr class="b"> 
+   <td>1630781 </td> 
+   <td>C004 </td> 
+   <td>I001 </td> 
+   <td>650 </td> 
+   <td>09-01-2013</td> 
+  </tr> 
+  <tr class="a"> 
+   <td>1630782 </td> 
+   <td>C003 </td> 
+   <td>I006 </td> 
+   <td>2500 </td> 
+   <td>09-02-2013</td> 
+  </tr> 
+  <tr class="b"> 
+   <td>1630783 </td> 
+   <td>C002 </td> 
+   <td>I002 </td> 
+   <td>340 </td> 
+   <td>09-03-2013</td> 
+  </tr> 
+  <tr class="a"> 
+   <td>1630784 </td> 
+   <td>C004 </td> 
+   <td>I006 </td> 
+   <td>1260 </td> 
+   <td>09-04-2013</td> 
+  </tr> 
+  <tr class="b"> 
+   <td>1630785 </td> 
+   <td>C005 </td> 
+   <td>I003 </td> 
+   <td>1500 </td> 
+   <td>09-05-2013</td> 
+  </tr> 
+ </tbody> 
+</table> 
+<p>The “<b>Customers</b>” table:</p> 
+<table border="0" class="bodyTable table table-striped table-hover"> 
+ <thead> 
+  <tr class="a"> 
+   <th>CustomerID </th> 
+   <th>CustomerName </th> 
+   <th>Country</th> 
+  </tr> 
+ </thead> 
+ <tbody> 
+  <tr class="b"> 
+   <td>C001 </td> 
+   <td>Telefunken </td> 
+   <td>Germany</td> 
+  </tr> 
+  <tr class="a"> 
+   <td>C002 </td> 
+   <td>Logica </td> 
+   <td>Belgium</td> 
+  </tr> 
+  <tr class="b"> 
+   <td>C003 </td> 
+   <td>Salora Oy </td> 
+   <td>Finland</td> 
+  </tr> 
+  <tr class="a"> 
+   <td>C004 </td> 
+   <td>Alps Nordic AB </td> 
+   <td>Sweden</td> 
+  </tr> 
+  <tr class="b"> 
+   <td>C005 </td> 
+   <td>Deister Electronics </td> 
+   <td>Germany</td> 
+  </tr> 
+  <tr class="a"> 
+   <td>C006 </td> 
+   <td>Thales Nederland </td> 
+   <td>Netherlands</td> 
+  </tr> 
+ </tbody> 
+</table> 
+<p>The “<b>Items</b>” table:</p> 
+<table border="0" class="bodyTable table table-striped table-hover"> 
+ <thead> 
+  <tr class="a"> 
+   <th>ItemID </th> 
+   <th>ItemName </th> 
+   <th>Price</th> 
+  </tr> 
+ </thead> 
+ <tbody> 
+  <tr class="b"> 
+   <td>I001 </td> 
+   <td>BX016 </td> 
+   <td>15.96</td> 
+  </tr> 
+  <tr class="a"> 
+   <td>I002 </td> 
+   <td>MU947 </td> 
+   <td>20.35</td> 
+  </tr> 
+  <tr class="b"> 
+   <td>I003 </td> 
+   <td>MU3508 </td> 
+   <td>9.60</td> 
+  </tr> 
+  <tr class="a"> 
+   <td>I004 </td> 
+   <td>XC7732 </td> 
+   <td>55.24</td> 
+  </tr> 
+  <tr class="b"> 
+   <td>I005 </td> 
+   <td>XT0019 </td> 
+   <td>12.65</td> 
+  </tr> 
+  <tr class="a"> 
+   <td>I006 </td> 
+   <td>XT2217 </td> 
+   <td>12.35</td> 
+  </tr> 
+ </tbody> 
+</table> 
+<p>You may get a combined view of the “Orders” table and the “Customers” table by running the following join query:</p> 
+<div class="source"> 
+ <pre>SELECT O.OrderID, C.CustomerName, C.Country, O.Date
 FROM Orders AS O
 INNER JOIN Customers AS C
 ON O.CustomerID = C.CustomerID;
 </pre> 
-   </div> 
-   <p>This will produce results like:</p> 
-   <table border="0" class="bodyTable table table-striped table-hover"> 
-    <thead> 
-     <tr class="a"> 
-      <th>O.OrderID </th> 
-      <th>C.CustomerName </th> 
-      <th>C.Country </th> 
-      <th>O.Date</th> 
-     </tr> 
-    </thead> 
-    <tbody> 
-     <tr class="b"> 
-      <td>1630781 </td> 
-      <td>Alps Nordic AB </td> 
-      <td>Sweden </td> 
-      <td>09-01-2013</td> 
-     </tr> 
-     <tr class="a"> 
-      <td>1630782 </td> 
-      <td>Salora Oy </td> 
-      <td>Finland </td> 
-      <td>09-02-2013</td> 
-     </tr> 
-     <tr class="b"> 
-      <td>1630783 </td> 
-      <td>Logica </td> 
-      <td>Belgium </td> 
-      <td>09-03-2013</td> 
-     </tr> 
-     <tr class="a"> 
-      <td>1630784 </td> 
-      <td>Alps Nordic AB </td> 
-      <td>Sweden </td> 
-      <td>09-04-2013</td> 
-     </tr> 
-     <tr class="b"> 
-      <td>1630785 </td> 
-      <td>Deister Electronics </td> 
-      <td>Germany </td> 
-      <td>09-05-2013</td> 
-     </tr> 
-    </tbody> 
-   </table> 
-  </div> 
- </div> 
 </div> 
+<p>This will produce results like:</p> 
+<table border="0" class="bodyTable table table-striped table-hover"> 
+ <thead> 
+  <tr class="a"> 
+   <th>O.OrderID </th> 
+   <th>C.CustomerName </th> 
+   <th>C.Country </th> 
+   <th>O.Date</th> 
+  </tr> 
+ </thead> 
+ <tbody> 
+  <tr class="b"> 
+   <td>1630781 </td> 
+   <td>Alps Nordic AB </td> 
+   <td>Sweden </td> 
+   <td>09-01-2013</td> 
+  </tr> 
+  <tr class="a"> 
+   <td>1630782 </td> 
+   <td>Salora Oy </td> 
+   <td>Finland </td> 
+   <td>09-02-2013</td> 
+  </tr> 
+  <tr class="b"> 
+   <td>1630783 </td> 
+   <td>Logica </td> 
+   <td>Belgium </td> 
+   <td>09-03-2013</td> 
+  </tr> 
+  <tr class="a"> 
+   <td>1630784 </td> 
+   <td>Alps Nordic AB </td> 
+   <td>Sweden </td> 
+   <td>09-04-2013</td> 
+  </tr> 
+  <tr class="b"> 
+   <td>1630785 </td> 
+   <td>Deister Electronics </td> 
+   <td>Germany </td> 
+   <td>09-05-2013</td> 
+  </tr> 
+ </tbody> 
+</table> 
 <div class="section"> 
  <h2 id="Joining_Tables_with_Indices">Joining Tables with Indices</h2> 
  <p>Secondary indices will be automatically utilized when running join queries. For example, if we create indices on the “Orders” table and the “Items” table respectively, which are defined as follows:</p> 
@@ -330,7 +320,7 @@ CREATE INDEX i2Orders ON Orders (Custome
 CREATE INDEX iItems ON Items (ItemName) INCLUDE (Price);
 </pre> 
  </div> 
- <p>We can find out each item’s total sales value by joining the “Items” table and the “Orders” table and then grouping the joined result with “ItemName” (and also adding some filtering conditions):</p> 
+ <p><a name="ex1"></a>We can find out each item’s total sales value by joining the “Items” table and the “Orders” table and then grouping the joined result with “ItemName” (and also adding some filtering conditions):</p> 
  <div class="source"> 
   <pre>SELECT ItemName, sum(Price * Quantity) AS OrderValue
 FROM Items
@@ -375,8 +365,99 @@ CLIENT MERGE SORT 
  <p>In this case, the index table “iItems” is used in place of the data table “Items” since the index table “iItems” is indexed on column “ItemName” and will hence benefit the GROUP-BY clause in this query. Meanwhile, the index table “i2Orders” is favored over the data table “Orders” and another index table “iOrders” because a range scan instead of a full scan can be applied as a result of the WHERE clause.</p> 
 </div> 
 <div class="section"> 
+ <h2 id="Grouped_Joins_and_Derived_Tables">Grouped Joins and Derived Tables</h2> 
+ <p>Phoenix also supports complex join syntax such as grouped joins (or sub joins) and joins with derived-tables. You can group joins by using parenthesis to prioritize certain joins before other joins are executed. You can also replace any one (or more) of your join tables with a sub-query (derived table), which could be yet another join query.</p> 
+ <p>For grouped joins, you can write something like:</p> 
+ <div class="source"> 
+  <pre>SELECT O.OrderID, I.ItemName, S.SupplierName
+FROM Orders AS O
+LEFT JOIN
+    (Items AS I
+     INNER JOIN Suppliers AS S
+     ON I.SupplierID = S.SupplierID)
+ON O.ItemID = I.ItemID;
+</pre> 
+ </div> 
+ <p>By replacing the sub join with a sub-query (derived table), we get an equivalent query as:</p> 
+ <div class="source"> 
+  <pre>SELECT O.OrderID, J.ItemName, J.SupplierName
+FROM Orders AS O
+LEFT JOIN
+    (SELECT ItemID, ItemName, SupplierName
+     FROM Items AS I
+     INNER JOIN Suppliers AS S
+     ON I.SupplierID = S.SupplierID) AS J
+ON O.ItemID = J.ItemID;
+</pre> 
+ </div> 
+ <p>As an alternative to the <a href="#ex1">earlier example</a> where we try to find out each item’s sales figures, instead of using group-by after joining the two tables, we can join the “Items” table with the grouped result from the “Orders” table:</p> 
+ <div class="source"> 
+  <pre>SELECT ItemName, O.OrderValue
+FROM Items
+JOIN
+    (SELECT ItemID, sum(Price * Quantity) AS OrderValue
+     FROM Orders
+     WHERE CustomerID &gt; 'C002'
+     GROUP BY ItemID) AS O
+ON Items.ItemID = O.ItemID;
+</pre> 
+ </div> 
+</div> 
+<div class="section"> 
+ <h2 id="Foreign_Key_to_Primary_Key_Join_Optimization">Foreign Key to Primary Key Join Optimization</h2> 
+ <p>Oftentimes a join will occur from a child table to a parent table, mapping the foreign key of the child table to the primary key of the parent. So instead of doing a full scan on the parent table, Phoenix will drive a skip-scan or a range-scan based on the foreign key values it got from the child table result.</p> 
+ <p>Phoenix will extract and sort multiple key parts from the join keys so that it can get the most accurate key hints/ranges possible for the parent table scan.</p> 
+ <p>For example, we have parent table “Employee” and child table “Patent” defined as:</p> 
+ <div class="source"> 
+  <pre>CREATE TABLE Employee (
+    Region VARCHAR NOT NULL,
+    LocalID VARCHAR NOT NULL,
+    Name VARCHAR NOT NULL,
+    StartDate DATE NOT NULL,
+    CONSTRAINT pk PRIMARY KEY (Region, LocalID));
+
+CREATE TABLE Patent (
+    PatentID VARCHAR NOT NULL,
+    DeptID VARCHAR NOT NULL,
+    LocalID VARCHAR NOT NULL,
+    Title VARCHAR NOT NULL,
+    Category VARCHAR NOT NULL,
+    FileDate DATE NOT NULL,
+    CONSTRAINT pk PRIMARY KEY (PatentID));
+</pre> 
+ </div> 
+ <p>Now we’d like to find out all those employees who filed patents after January 2000 and list their names according to their patent count:</p> 
+ <div class="source"> 
+  <pre>SELECT E.Name, E.Region, P.PCount
+FROM Employee AS E
+JOIN
+    (SELECT Region, LocalID, count(*) AS PCount
+     FROM Patent
+     WHERE P.FileDate &gt;= to_date('2000/01/01')
+     GROUP BY Region, LocalID) AS P
+ON E.Region = P.Region AND E.LocalID = P.LocalID
+</pre> 
+ </div> 
+ <p>The above statement will do a skip-scan over the “Employee” table and will use both join key “Region” and “LocalID” for runtime key hint calculation. Below is the execution time of this query with and without this optimization on an “Employee” table of about <i>5000000</i> records and a “Patent” table of about <i>1000</i> records: </p> 
+ <table border="0" class="bodyTable table table-striped table-hover"> 
+  <thead> 
+   <tr class="a"> 
+    <th>W/O Optimization </th> 
+    <th>W/ Optimization</th> 
+   </tr> 
+  </thead> 
+  <tbody> 
+   <tr class="b"> 
+    <td>8.1s </td> 
+    <td>0.4s</td> 
+   </tr> 
+  </tbody> 
+ </table> 
+ <p>However, there are times when the foreign key values from the child table account for a complete primary key space in the parent table, thus using skip-scans would only be slower not faster. In order to avoid such situations, Phoenix currently does a range-scan by default and only chooses to do a skip-scan when there is a child table filter in the WHERE clause or the ON clause, as in the above example. Table statistics will come to help making smarter choices between the two schemes in future. Yet you can always use hints “SKIP_SCAN_HASH_JOIN” or “RANGE_SCAN_HASH_JOIN” to change the default behavior.</p> 
+</div> 
+<div class="section"> 
  <h2 id="Configuration">Configuration</h2> 
- <p>The join functionality is now implemented through hash joins, which means one side of the join operator has to be small enough to fit into memory in order to be broadcast over all servers that have the data of concern from the other side of join.</p> 
+ <p>The join functionality is now implemented through hash joins, which means one side of the join operator has to be small enough to fit into memory in order to be broadcast over all servers that have the data of concern from the other side of join. This limitation will be eliminated once <a class="externalLink" href="https://issues.apache.org/jira/browse/PHOENIX-1179">PHOENIX-1179</a> is implemented.</p> 
  <p>The servers-side caches are used to hold the hashed sub-query results. The size and the living time of the caches are controlled by the following parameters.</p> 
  <ol style="list-style-type: decimal"> 
   <li>phoenix.query.maxServerCacheBytes 
@@ -410,7 +491,7 @@ CLIENT MERGE SORT 
   <li> <p><i>lhs</i> LEFT OUTER JOIN <i>rhs</i></p> <p><i>rhs</i> will be built as hash map in server cache.</p></li> 
   <li> <p><i>lhs</i> RIGHT OUTER JOIN <i>rhs</i></p> <p><i>lhs</i> will be built as hash map in server cache.</p></li> 
  </ol> 
- <p>The join order is more complicated with multiple-join queries. You can try running “EXPLAIN <i>join_query</i>” to look at the actual execution plan. For multiple-inner-join queries, we apply star-join optimization by default, which means the leading (left-hand-side) table will be scanned only once joining all right-hand-side tables at the same time. You can turn off this optimization by specifying the hint “NO_STAR_JOIN” in your query if the overall size of all right-hand-side tables would exceed the memory size limit.</p> 
+ <p>The join order is more complicated with multiple-join queries. You can try running “EXPLAIN <i>join_query</i>” to look at the actual execution plan. For multiple-inner-join queries, Phoenix applies star-join optimization by default, which means the leading (left-hand-side) table will be scanned only once joining all right-hand-side tables at the same time. You can turn off this optimization by specifying the hint “NO_STAR_JOIN” in your query if the overall size of all right-hand-side tables would exceed the memory size limit.</p> 
  <p>Let’s take the previous query for example:</p> 
  <div class="source"> 
   <pre>SELECT O.OrderID, C.CustomerName, I.ItemName, I.Price, O.Quantity
@@ -453,6 +534,7 @@ ON O.ItemID = I.ItemID;
  <ol style="list-style-type: decimal"> 
   <li>FULL OUTER JOIN and CROSS JOIN are not supported.</li> 
   <li>Equi-joins: Only equality (=) comparison is supported in joining conditions (conditions that specify the connecting rules between the two sides of the join operator). However there is no restriction on other predicates in the ON clause concerning only one side of the join operator.</li> 
+  <li><a class="externalLink" href="https://issues.apache.org/jira/browse/PHOENIX-1179">PHOENIX-1179</a>: Joins between two large tables that can neither fit into memory.</li> 
  </ol> 
  <p>Continuous efforts are being made to enhance Phoenix with more complete join functionalities. Please refer to our <a href="roadmap.html">Roadmap</a> for more information.</p> 
 </div>

Modified: phoenix/site/source/src/site/markdown/joins.md
URL: http://svn.apache.org/viewvc/phoenix/site/source/src/site/markdown/joins.md?rev=1634270&r1=1634269&r2=1634270&view=diff
==============================================================================
--- phoenix/site/source/src/site/markdown/joins.md (original)
+++ phoenix/site/source/src/site/markdown/joins.md Sun Oct 26 03:18:24 2014
@@ -4,7 +4,7 @@ The standard SQL join syntax (with some 
 
 For example, we have the following tables to store our order records, our customer information and the item information we sell in those orders.
 
-####The "Orders" table:
+The "**Orders**" table:
 
 OrderID         |CustomerID      |ItemID          |Quantity        |Date
 ----------------|----------------|----------------|----------------|----
@@ -14,7 +14,7 @@ OrderID         |CustomerID      |ItemID
 1630784         |C004            |I006            |1260            |09-04-2013
 1630785         |C005            |I003            |1500            |09-05-2013
 
-####The "Customers" table:
+The "**Customers**" table:
 
 CustomerID      |CustomerName        |Country
 ----------------|--------------------|------- 
@@ -25,7 +25,7 @@ C004            |Alps Nordic AB      |Sw
 C005            |Deister Electronics |Germany
 C006            |Thales Nederland    |Netherlands
 
-####The "Items" table:
+The "**Items**" table:
 
 ItemID          |ItemName            |Price
 ----------------|--------------------|-----
@@ -61,7 +61,7 @@ Secondary indices will be automatically 
     CREATE INDEX i2Orders ON Orders (CustomerID) INCLUDE (ItemID, Quantity);
     CREATE INDEX iItems ON Items (ItemName) INCLUDE (Price);
 
-We can find out each item's total sales value by joining the "Items" table and the "Orders" table and then grouping the joined result with "ItemName" (and also adding some filtering conditions):
+<a name="ex1"></a>We can find out each item's total sales value by joining the "Items" table and the "Orders" table and then grouping the joined result with "ItemName" (and also adding some filtering conditions):
 
     SELECT ItemName, sum(Price * Quantity) AS OrderValue
     FROM Items
@@ -88,9 +88,88 @@ The execution plan for this query (by ru
 
 In this case, the index table "iItems" is used in place of the data table "Items" since the index table "iItems" is indexed on column "ItemName" and will hence benefit the GROUP-BY clause in this query. Meanwhile, the index table "i2Orders" is favored over the data table "Orders" and another index table "iOrders" because a range scan instead of a full scan can be applied as a result of the WHERE clause.
 
+## Grouped Joins and Derived Tables
+
+Phoenix also supports complex join syntax such as grouped joins (or sub joins) and joins with derived-tables. You can group joins by using parenthesis to prioritize certain joins before other joins are executed. You can also replace any one (or more) of your join tables with a sub-query (derived table), which could be yet another join query.
+
+For grouped joins, you can write something like:
+
+    SELECT O.OrderID, I.ItemName, S.SupplierName
+    FROM Orders AS O
+    LEFT JOIN
+        (Items AS I
+         INNER JOIN Suppliers AS S
+         ON I.SupplierID = S.SupplierID)
+    ON O.ItemID = I.ItemID;
+
+By replacing the sub join with a sub-query (derived table), we get an equivalent query as:
+
+    SELECT O.OrderID, J.ItemName, J.SupplierName
+    FROM Orders AS O
+    LEFT JOIN
+        (SELECT ItemID, ItemName, SupplierName
+         FROM Items AS I
+         INNER JOIN Suppliers AS S
+         ON I.SupplierID = S.SupplierID) AS J
+    ON O.ItemID = J.ItemID;
+
+As an alternative to the [earlier example](#ex1) where we try to find out each item's sales figures, instead of using group-by after joining the two tables, we can join the "Items" table with the grouped result from the "Orders" table:
+
+    SELECT ItemName, O.OrderValue
+    FROM Items
+    JOIN
+        (SELECT ItemID, sum(Price * Quantity) AS OrderValue
+         FROM Orders
+         WHERE CustomerID > 'C002'
+         GROUP BY ItemID) AS O
+    ON Items.ItemID = O.ItemID;
+
+## Foreign Key to Primary Key Join Optimization
+
+Oftentimes a join will occur from a child table to a parent table, mapping the foreign key of the child table to the primary key of the parent. So instead of doing a full scan on the parent table, Phoenix will drive a skip-scan or a range-scan based on the foreign key values it got from the child table result.
+
+Phoenix will extract and sort multiple key parts from the join keys so that it can get the most accurate key hints/ranges possible for the parent table scan.
+
+For example, we have parent table "Employee" and child table "Patent" defined as:
+
+    CREATE TABLE Employee (
+        Region VARCHAR NOT NULL,
+        LocalID VARCHAR NOT NULL,
+        Name VARCHAR NOT NULL,
+        StartDate DATE NOT NULL,
+        CONSTRAINT pk PRIMARY KEY (Region, LocalID));
+
+    CREATE TABLE Patent (
+        PatentID VARCHAR NOT NULL,
+        DeptID VARCHAR NOT NULL,
+        LocalID VARCHAR NOT NULL,
+        Title VARCHAR NOT NULL,
+        Category VARCHAR NOT NULL,
+        FileDate DATE NOT NULL,
+        CONSTRAINT pk PRIMARY KEY (PatentID));
+
+Now we’d like to find out all those employees who filed patents after January 2000 and list their names according to their patent count:
+
+    SELECT E.Name, E.Region, P.PCount
+    FROM Employee AS E
+    JOIN
+        (SELECT Region, LocalID, count(*) AS PCount
+         FROM Patent
+         WHERE P.FileDate >= to_date('2000/01/01')
+         GROUP BY Region, LocalID) AS P
+    ON E.Region = P.Region AND E.LocalID = P.LocalID
+
+The above statement will do a skip-scan over the "Employee" table and will use both join key "Region" and "LocalID" for runtime key hint calculation. Below is the execution time of this query with and without this optimization on an "Employee" table of about _5000000_ records and a "Patent" table of about _1000_ records: 
+
+W/O Optimization    |W/ Optimization
+--------------------|---------------
+8.1s                |0.4s
+
+However, there are times when the foreign key values from the child table account for a complete primary key space in the parent table, thus using skip-scans would only be slower not faster. In order to avoid such situations, Phoenix currently does a range-scan by default and only chooses to do a skip-scan when there is a child table filter in the WHERE clause or the ON clause, as in the above example. Table statistics will come to help making smarter choices between the two schemes in future. Yet you can always use hints "SKIP_SCAN_HASH_JOIN" or "RANGE_SCAN_HASH_JOIN" to change the default behavior.
+
 ## Configuration
 
-The join functionality is now implemented through hash joins, which means one side of the join operator has to be small enough to fit into memory in order to be broadcast over all servers that have the data of concern from the other side of join.
+The join functionality is now implemented through hash joins, which means one side of the join operator has to be small enough to fit into memory in order to be broadcast over all servers that have the data of concern from the other side of join. This limitation will be eliminated once [PHOENIX-1179](https://issues.apache.org/jira/browse/PHOENIX-1179) is implemented.
 
 The servers-side caches are used to hold the hashed sub-query results. The size and the living time of the caches are controlled by the following parameters.
 
@@ -129,7 +208,7 @@ Below is a description of the default jo
 
     _lhs_ will be built as hash map in server cache.
 
-The join order is more complicated with multiple-join queries. You can try running "EXPLAIN _join\_query_" to look at the actual execution plan. For multiple-inner-join queries, we apply star-join optimization by default, which means the leading (left-hand-side) table will be scanned only once joining all right-hand-side tables at the same time. You can turn off this optimization by specifying the hint "NO_STAR_JOIN" in your query if the overall size of all right-hand-side tables would exceed the memory size limit.
+The join order is more complicated with multiple-join queries. You can try running "EXPLAIN _join\_query_" to look at the actual execution plan. For multiple-inner-join queries, Phoenix applies star-join optimization by default, which means the leading (left-hand-side) table will be scanned only once joining all right-hand-side tables at the same time. You can turn off this optimization by specifying the hint "NO_STAR_JOIN" in your query if the overall size of all right-hand-side tables would exceed the memory size limit.
 
 Let's take the previous query for example:
 
@@ -169,6 +248,7 @@ In our Phoenix 3.2 and 4.2 releases, joi
 
 1. FULL OUTER JOIN and CROSS JOIN are not supported.
 2. Equi-joins: Only equality (=) comparison is supported in joining conditions (conditions that specify the connecting rules between the two sides of the join operator). However there is no restriction on other predicates in the ON clause concerning only one side of the join operator.
+3. [PHOENIX-1179](https://issues.apache.org/jira/browse/PHOENIX-1179): Joins between two large tables that can neither fit into memory.
 
 Continuous efforts are being made to enhance Phoenix with more complete join functionalities. Please refer to our [Roadmap](roadmap.html) for more information.