Posted to commits@phoenix.apache.org by ma...@apache.org on 2015/02/01 00:07:21 UTC

svn commit: r1656256 - in /phoenix/site: publish/joins.html publish/recent.html publish/roadmap.html source/src/site/markdown/joins.md source/src/site/markdown/recent.md source/src/site/markdown/roadmap.md

Author: maryannxue
Date: Sat Jan 31 23:07:21 2015
New Revision: 1656256

URL: http://svn.apache.org/r1656256
Log:
PHOENIX-1554 Update join documentation based on many-to-many support

Modified:
    phoenix/site/publish/joins.html
    phoenix/site/publish/recent.html
    phoenix/site/publish/roadmap.html
    phoenix/site/source/src/site/markdown/joins.md
    phoenix/site/source/src/site/markdown/recent.md
    phoenix/site/source/src/site/markdown/roadmap.md

Modified: phoenix/site/publish/joins.html
URL: http://svn.apache.org/viewvc/phoenix/site/publish/joins.html?rev=1656256&r1=1656255&r2=1656256&view=diff
==============================================================================
--- phoenix/site/publish/joins.html (original)
+++ phoenix/site/publish/joins.html Sat Jan 31 23:07:21 2015
@@ -1,7 +1,7 @@
 
 <!DOCTYPE html>
 <!--
- Generated by Apache Maven Doxia at 2015-01-27
+ Generated by Apache Maven Doxia at 2015-01-31
  Rendered using Reflow Maven Skin 1.1.0 (http://andriusvelykis.github.io/reflow-maven-skin)
 -->
 <html  xml:lang="en" lang="en">
@@ -411,6 +411,11 @@ ON Items.ItemID = O.ItemID;
  </div> 
 </div> 
 <div class="section"> 
+ <h2 id="Hash_Join_vs._Sort-Merge_Join">Hash Join vs. Sort-Merge Join</h2> 
+ <p>Basic hash join usually outperforms other types of join algorithms, but it has its limitations too, the most significant of which is the assumption that one of the relations must be small enough to fit into memory. Thus Phoenix now implements both hash join and sort-merge join, to support fast join operations as well as joins between two large tables.</p> 
+ <p>Phoenix currently uses the hash join algorithm whenever possible since it is usually much faster. However, the hint “USE_SORT_MERGE_JOIN” can be used to force a sort-merge join in a query. In the future, the choice between these two join algorithms, together with detecting the smaller relation for hash join, will be made automatically under the guidance of table statistics.</p> 
+</div> 
+<div class="section"> 
  <h2 id="foreign-key-to-primary-key-join-optimization">Foreign Key to Primary Key Join Optimization<a name="Foreign_Key_to_Primary_Key_Join_Optimization"></a></h2> 
  <p>Oftentimes a join will occur from a child table to a parent table, mapping the foreign key of the child table to the primary key of the parent. So instead of doing a full scan on the parent table, Phoenix will drive a skip-scan or a range-scan based on the foreign key values it got from the child table result.</p> 
  <p>Phoenix will extract and sort multiple key parts from the join keys so that it can get the most accurate key hints/ranges possible for the parent table scan.</p> 
@@ -460,17 +465,17 @@ ON E.Region = P.Region AND E.LocalID = P
    </tr> 
   </tbody> 
  </table> 
- <p>However, there are times when the foreign key values from the child table account for a complete primary key space in the parent table, thus using skip-scans would only be slower not faster. In order to avoid such situations, Phoenix currently does a range-scan by default and only chooses to do a skip-scan when there is a child table filter in the WHERE clause or the ON clause, as in the above example. Table statistics will come to help making smarter choices between the two schemes in future. Yet you can always use hints “SKIP_SCAN_HASH_JOIN” or “RANGE_SCAN_HASH_JOIN” to change the default behavior.</p> 
+ <p>However, there are times when the foreign key values from the child table account for a complete primary key space in the parent table, so using skip-scans would only be slower, not faster. In such cases you can always turn off the optimization by specifying the hint “NO_CHILD_PARENT_OPTIMIZATION”. Furthermore, table statistics will soon help make smarter choices between the two schemes.</p> 
 </div> 
 <div class="section"> 
  <h2 id="Configuration">Configuration</h2> 
- <p>The join functionality is now implemented through hash joins, which means one side of the join operator has to be small enough to fit into memory in order to be broadcast over all servers that have the data of concern from the other side of join. This limitation will be eliminated once <a class="externalLink" href="https://issues.apache.org/jira/browse/PHOENIX-1179">PHOENIX-1179</a> is implemented.</p> 
- <p>The servers-side caches are used to hold the hashed join-table results. The size and the living time of the caches are controlled by the following parameters. Note that a join-table can be a physical table, a view, a subquery, or a joined result of other join-tables in a multi-join query.</p> 
+ <p>As mentioned earlier, the prerequisite for using the hash join approach is that one of the relations is small enough to fit into memory, so that it can be broadcast to all servers that hold the relevant data of the other relation. Aside from making sure that the region server heap size is big enough to hold the smaller relation, we might also need to pay attention to a few configuration parameters that are crucial to running hash joins.</p> 
+ <p>The server-side caches are used to hold the hash table built upon the smaller relation. The size and the lifetime of the caches are controlled by the following parameters. Note that a relation can be a physical table, a view, a subquery, or a joined result of other relations in a multiple-join query.</p> 
  <ol style="list-style-type: decimal"> 
   <li>phoenix.query.maxServerCacheBytes 
    <ul> 
-    <li>Maximum size (in bytes) of a join-table result before compression and conversion to a hash map.</li> 
-    <li>Attempting to hash a join-table result of a size bigger than this setting will result in a MaxServerCacheSizeExceededException.</li> 
+    <li>Maximum size (in bytes) of the raw results of a relation before being compressed and sent over to the region servers.</li> 
+    <li>Attempting to serialize the raw results of a relation with a size bigger than this setting will result in a MaxServerCacheSizeExceededException.</li> 
     <li><b>Default: 104,857,600</b></li> 
    </ul></li> 
   <li>phoenix.query.maxGlobalMemoryPercentage 
@@ -487,16 +492,16 @@ ON E.Region = P.Region AND E.LocalID = P
    </ul></li> 
  </ol> 
  <p>See our <a href="tuning.html">Configuration and Tuning Guide</a> for more details.</p> 
- <p>Although changing parameters can sometimes be a solution to getting rid of the exceptions mentioned above, it is highly recommended that you first consider optimizing the join queries according to the information provided in the following chapter.</p> 
+ <p>Although changing parameters can sometimes be a solution to getting rid of the exceptions mentioned above, it is highly recommended that you first consider optimizing the join queries according to the information provided in the following section.</p> 
 </div> 
 <div class="section"> 
  <h2 id="Optimizing_Your_Query">Optimizing Your Query</h2> 
- <p>As mentioned in the previous chapter, it is most crucial to make sure that there will be enough memory for the join query execution. But other than rush to change the configuration immediately, sometimes all you need to do is to know a bit of the interiors and adjust the sequence of the tables that appear in your join query.</p> 
- <p>Below is a description of the default join order (without the presence of table statistics) and of which side of the query will be executed as an inner query and put into server cache:</p> 
+ <p>When using hash join, it is crucial to make sure that there will be enough memory for the query execution. But rather than rushing to change the configuration, sometimes all you need to do is to understand a bit of the internals and adjust the order of the tables that appear in your join query.</p> 
+ <p>Below is a description of the default join order (without the presence of table statistics) and of which side of the query will be taken as the “smaller” relation and be put into server cache:</p> 
  <ol style="list-style-type: decimal"> 
-  <li> <p><i>lhs</i> INNER JOIN <i>rhs</i></p> <p><i>rhs</i> will be built as hash map in server cache.</p></li> 
-  <li> <p><i>lhs</i> LEFT OUTER JOIN <i>rhs</i></p> <p><i>rhs</i> will be built as hash map in server cache.</p></li> 
-  <li> <p><i>lhs</i> RIGHT OUTER JOIN <i>rhs</i></p> <p><i>lhs</i> will be built as hash map in server cache.</p></li> 
+  <li> <p><i>lhs</i> INNER JOIN <i>rhs</i></p> <p><i>rhs</i> will be built as a hash table in server cache.</p></li> 
+  <li> <p><i>lhs</i> LEFT OUTER JOIN <i>rhs</i></p> <p><i>rhs</i> will be built as a hash table in server cache.</p></li> 
+  <li> <p><i>lhs</i> RIGHT OUTER JOIN <i>rhs</i></p> <p><i>lhs</i> will be built as a hash table in server cache.</p></li> 
  </ol> 
  <p>The join order is more complicated with multiple-join queries. You can try running “EXPLAIN <i>join_query</i>” to look at the actual execution plan. For multiple-inner-join queries, Phoenix applies star-join optimization by default, which means the leading (left-hand-side) table will be scanned only once joining all right-hand-side tables at the same time. You can turn off this optimization by specifying the hint “NO_STAR_JOIN” in your query if the overall size of all right-hand-side tables would exceed the memory size limit.</p> 
  <p>Let’s take the previous query for example:</p> 
@@ -533,17 +538,16 @@ ON O.ItemID = I.ItemID;
 3. SCAN Items JOIN HASH[1] --&gt; Final Resultset
 </pre> 
  </div> 
- <p>It is also worth mentioning that not the entire dataset of the table should be counted into the memory consumption. Instead, only those columns used by the query, and of only the records that satisfy the predicates will be built into the server hash map.</p> 
+ <p>It is also worth mentioning that the entire dataset of the table does not count toward the memory consumption. Only the columns used by the query, and only for the records that satisfy the predicates, will be built into the server hash table.</p> 
 </div> 
 <div class="section"> 
  <h2 id="Limitations">Limitations</h2> 
- <p>In our Phoenix 3.2 and 4.2 releases, joins have the following restrictions:</p> 
+ <p>In our Phoenix 3.3.0 and 4.3.0 releases, joins have the following restrictions and pending improvements:</p> 
  <ol style="list-style-type: decimal"> 
-  <li>FULL OUTER JOIN and CROSS JOIN are not supported.</li> 
-  <li>Equi-joins: Only equality (=) comparison is supported in joining conditions (conditions that specify the connecting rules between the two sides of the join operator). However there is no restriction on other predicates in the ON clause concerning only one side of the join operator.</li> 
-  <li><a class="externalLink" href="https://issues.apache.org/jira/browse/PHOENIX-1179">PHOENIX-1179</a>: Joins between two large tables that can neither fit into memory.</li> 
+  <li><a class="externalLink" href="https://issues.apache.org/jira/browse/PHOENIX-1555">PHOENIX-1555</a>: Fallback to many-to-many join if hash join fails due to insufficient memory.</li> 
+  <li><a class="externalLink" href="https://issues.apache.org/jira/browse/PHOENIX-1556">PHOENIX-1556</a>: Base hash join versus many-to-many decision on how many guideposts will be traversed for RHS table(s).</li> 
  </ol> 
- <p>Continuous efforts are being made to enhance Phoenix with more complete join functionalities. Please refer to our <a href="roadmap.html">Roadmap</a> for more information.</p> 
+ <p>Continuous efforts are being made to bring more performance enhancements to join queries based on table statistics. Please refer to our <a href="roadmap.html">Roadmap</a> for more information.</p> 
 </div>
 			</div>
 		</div>

Modified: phoenix/site/publish/recent.html
URL: http://svn.apache.org/viewvc/phoenix/site/publish/recent.html?rev=1656256&r1=1656255&r2=1656256&view=diff
==============================================================================
--- phoenix/site/publish/recent.html (original)
+++ phoenix/site/publish/recent.html Sat Jan 31 23:07:21 2015
@@ -1,7 +1,7 @@
 
 <!DOCTYPE html>
 <!--
- Generated by Apache Maven Doxia at 2015-01-27
+ Generated by Apache Maven Doxia at 2015-01-31
  Rendered using Reflow Maven Skin 1.1.0 (http://andriusvelykis.github.io/reflow-maven-skin)
 -->
 <html  xml:lang="en" lang="en">
@@ -137,6 +137,7 @@
  <li><b><a href="update_statistics.html">Statistics Collection</a></b>. Collects the statistics for a table to improve query parallelization. <b>Available in our 3.2/4.2 release</b></li> 
  <li><b><a href="joins.html">Join Improvements</a></b>. Improve existing hash join implementation. 
   <ul> 
+   <li><b><a class="externalLink" href="https://issues.apache.org/jira/browse/PHOENIX-1179">Many-to-many joins</a></b>. Support joins where both sides are too large to fit into memory. <b>Available in our 3.3/4.3 release</b></li> 
    <li><b><a class="externalLink" href="https://issues.apache.org/jira/browse/PHOENIX-852">Optimize foreign key joins</a></b>. Optimize foreign key joins by leveraging our skip scan filter. <b>Available in our 3.2/4.2 release</b></li> 
    <li><b><a class="externalLink" href="https://issues.apache.org/jira/browse/PHOENIX-167">Semi/anti joins</a></b>. Support semi/anti subqueries through the standard [NOT] IN and [NOT] EXISTS keywords. <b>Available in our 3.2/4.2 release</b></li> 
   </ul></li> 

Modified: phoenix/site/publish/roadmap.html
URL: http://svn.apache.org/viewvc/phoenix/site/publish/roadmap.html?rev=1656256&r1=1656255&r2=1656256&view=diff
==============================================================================
--- phoenix/site/publish/roadmap.html (original)
+++ phoenix/site/publish/roadmap.html Sat Jan 31 23:07:21 2015
@@ -1,7 +1,7 @@
 
 <!DOCTYPE html>
 <!--
- Generated by Apache Maven Doxia at 2015-01-27
+ Generated by Apache Maven Doxia at 2015-01-31
  Rendered using Reflow Maven Skin 1.1.0 (http://andriusvelykis.github.io/reflow-maven-skin)
 -->
 <html  xml:lang="en" lang="en">
@@ -137,7 +137,7 @@
  <li><b><a class="externalLink" href="https://issues.apache.org/jira/browse/PHOENIX-400">Transaction Support</a></b>. Support transactions by integrating with an open source solution like <a class="externalLink" href="https://github.com/continuuity/tephra">Tephra</a>, <a class="externalLink" href="https://github.com/XiaoMi/themis">Themis</a>, or some other similar option.</li> 
  <li><b><a class="externalLink" href="https://issues.apache.org/jira/browse/PHOENIX-1167">Join Improvements</a></b>. Enhance our join capabilities in a variety of ways:<br /> 
   <ul> 
-   <li><b><a class="externalLink" href="https://issues.apache.org/jira/browse/PHOENIX-1179">Many-to-many joins</a></b>. Support joins where both sides are too large to fit into memory.</li> 
+   <li><b><a class="externalLink" href="https://issues.apache.org/jira/browse/PHOENIX-1556">Table-stats-guided choice between hash join and sort-merge join</a></b>. Base hash join versus many-to-many decision on how many guideposts will be traversed for RHS table(s).</li> 
    <li><b><a class="externalLink" href="https://issues.apache.org/jira/browse/PHOENIX-150">Inlined parent/child joins</a></b>. Optimize parent/child joins by storing child rows inside of a parent row, forming the column qualifier through a known prefix plus the child row primary key.</li> 
   </ul></li> 
  <li><b><a href="subqueries.html">Subquery</a> Enhancement</b>, which includes support for <b><a class="externalLink" href="https://issues.apache.org/jira/browse/PHOENIX-1388">correlated subqueries in the HAVING clause</a></b> and <b><a class="externalLink" href="https://issues.apache.org/jira/browse/PHOENIX-1392">using subqueries as expressions</a></b>.</li> 

Modified: phoenix/site/source/src/site/markdown/joins.md
URL: http://svn.apache.org/viewvc/phoenix/site/source/src/site/markdown/joins.md?rev=1656256&r1=1656255&r2=1656256&view=diff
==============================================================================
--- phoenix/site/source/src/site/markdown/joins.md (original)
+++ phoenix/site/source/src/site/markdown/joins.md Sat Jan 31 23:07:21 2015
@@ -124,6 +124,12 @@ As an alternative to the [earlier exampl
          GROUP BY ItemID) AS O
     ON Items.ItemID = O.ItemID;
 
+## Hash Join vs. Sort-Merge Join
+
+Basic hash join usually outperforms other types of join algorithms, but it has its limitations too, the most significant of which is the assumption that one of the relations must be small enough to fit into memory. Thus Phoenix now implements both hash join and sort-merge join, to support fast join operations as well as joins between two large tables.
+
+Phoenix currently uses the hash join algorithm whenever possible since it is usually much faster. However, the hint "USE_SORT_MERGE_JOIN" can be used to force a sort-merge join in a query. In the future, the choice between these two join algorithms, together with detecting the smaller relation for hash join, will be made automatically under the guidance of table statistics.
+
 ## Foreign Key to Primary Key Join Optimization<a name="foreign-key-to-primary-key-join-optimization"></a>
 
 Oftentimes a join will occur from a child table to a parent table, mapping the foreign key of the child table to the primary key of the parent. So instead of doing a full scan on the parent table, Phoenix will drive a skip-scan or a range-scan based on the foreign key values it got from the child table result.
@@ -165,17 +171,17 @@ W/O Optimization    |W/ Optimization
 --------------------|---------------
 8.1s                |0.4s
 
-However, there are times when the foreign key values from the child table account for a complete primary key space in the parent table, thus using skip-scans would only be slower not faster. In order to avoid such situations, Phoenix currently does a range-scan by default and only chooses to do a skip-scan when there is a child table filter in the WHERE clause or the ON clause, as in the above example. Table statistics will come to help making smarter choices between the two schemes in future. Yet you can always use hints "SKIP_SCAN_HASH_JOIN" or "RANGE_SCAN_HASH_JOIN" to change the default behavior.
+However, there are times when the foreign key values from the child table account for a complete primary key space in the parent table, so using skip-scans would only be slower, not faster. In such cases you can always turn off the optimization by specifying the hint "NO_CHILD_PARENT_OPTIMIZATION". Furthermore, table statistics will soon help make smarter choices between the two schemes.
 
 ## Configuration
 
-The join functionality is now implemented through hash joins, which means one side of the join operator has to be small enough to fit into memory in order to be broadcast over all servers that have the data of concern from the other side of join. This limitation will be eliminated once [PHOENIX-1179](https://issues.apache.org/jira/browse/PHOENIX-1179) is implemented.
+As mentioned earlier, the prerequisite for using the hash join approach is that one of the relations is small enough to fit into memory, so that it can be broadcast to all servers that hold the relevant data of the other relation. Aside from making sure that the region server heap size is big enough to hold the smaller relation, we might also need to pay attention to a few configuration parameters that are crucial to running hash joins.
 
-The servers-side caches are used to hold the hashed join-table results. The size and the living time of the caches are controlled by the following parameters. Note that a join-table can be a physical table, a view, a subquery, or a joined result of other join-tables in a multi-join query.
+The server-side caches are used to hold the hash table built upon the smaller relation. The size and the lifetime of the caches are controlled by the following parameters. Note that a relation can be a physical table, a view, a subquery, or a joined result of other relations in a multiple-join query.
 
 1. phoenix.query.maxServerCacheBytes
-    * Maximum size (in bytes) of a join-table result before compression and conversion to a hash map.
-    * Attempting to hash a join-table result of a size bigger than this setting will result in a MaxServerCacheSizeExceededException.
+    * Maximum size (in bytes) of the raw results of a relation before being compressed and sent over to the region servers.
+    * Attempting to serialize the raw results of a relation with a size bigger than this setting will result in a MaxServerCacheSizeExceededException.
     * **Default: 104,857,600**
 2. phoenix.query.maxGlobalMemoryPercentage
     * Percentage of total heap memory (i.e. Runtime.getRuntime().maxMemory()) that all threads may use.
@@ -188,25 +194,25 @@ The servers-side caches are used to hold
 
 See our [Configuration and Tuning Guide](tuning.html) for more details.
 
-Although changing parameters can sometimes be a solution to getting rid of the exceptions mentioned above, it is highly recommended that you first consider optimizing the join queries according to the information provided in the following chapter.
+Although changing parameters can sometimes be a solution to getting rid of the exceptions mentioned above, it is highly recommended that you first consider optimizing the join queries according to the information provided in the following section.
 
 ## Optimizing Your Query
 
-As mentioned in the previous chapter, it is most crucial to make sure that there will be enough memory for the join query execution. But other than rush to change the configuration immediately, sometimes all you need to do is to know a bit of the interiors and adjust the sequence of the tables that appear in your join query.
+When using hash join, it is crucial to make sure that there will be enough memory for the query execution. But rather than rushing to change the configuration, sometimes all you need to do is to understand a bit of the internals and adjust the order of the tables that appear in your join query.
 
-Below is a description of the default join order (without the presence of table statistics) and of which side of the query will be executed as an inner query and put into server cache:
+Below is a description of the default join order (without the presence of table statistics) and of which side of the query will be taken as the "smaller" relation and be put into server cache:
 
 1. _lhs_ INNER JOIN _rhs_
 
-    _rhs_ will be built as hash map in server cache.
+    _rhs_ will be built as a hash table in server cache.
 
 2. _lhs_ LEFT OUTER JOIN _rhs_
 
-    _rhs_ will be built as hash map in server cache.
+    _rhs_ will be built as a hash table in server cache.
 
 3. _lhs_ RIGHT OUTER JOIN _rhs_
 
-    _lhs_ will be built as hash map in server cache.
+    _lhs_ will be built as a hash table in server cache.
 
 The join order is more complicated with multiple-join queries. You can try running "EXPLAIN _join\_query_" to look at the actual execution plan. For multiple-inner-join queries, Phoenix applies star-join optimization by default, which means the leading (left-hand-side) table will be scanned only once joining all right-hand-side tables at the same time. You can turn off this optimization by specifying the hint "NO_STAR_JOIN" in your query if the overall size of all right-hand-side tables would exceed the memory size limit.
 
@@ -240,15 +246,14 @@ The join order will be:
     2. SCAN Orders JOIN HASH[0]; CLOSE HASH[0] --> BUILD HASH[1]
     3. SCAN Items JOIN HASH[1] --> Final Resultset
 
-It is also worth mentioning that not the entire dataset of the table should be counted into the memory consumption. Instead, only those columns used by the query, and of only the records that satisfy the predicates will be built into the server hash map.
+It is also worth mentioning that the entire dataset of the table does not count toward the memory consumption. Only the columns used by the query, and only for the records that satisfy the predicates, will be built into the server hash table.
 
 ## Limitations
 
-In our Phoenix 3.2 and 4.2 releases, joins have the following restrictions:
+In our Phoenix 3.3.0 and 4.3.0 releases, joins have the following restrictions and pending improvements:
 
-1. FULL OUTER JOIN and CROSS JOIN are not supported.
-2. Equi-joins: Only equality (=) comparison is supported in joining conditions (conditions that specify the connecting rules between the two sides of the join operator). However there is no restriction on other predicates in the ON clause concerning only one side of the join operator.
-3. [PHOENIX-1179](https://issues.apache.org/jira/browse/PHOENIX-1179): Joins between two large tables that can neither fit into memory.
+1. [PHOENIX-1555](https://issues.apache.org/jira/browse/PHOENIX-1555): Fallback to many-to-many join if hash join fails due to insufficient memory.
+2. [PHOENIX-1556](https://issues.apache.org/jira/browse/PHOENIX-1556): Base hash join versus many-to-many decision on how many guideposts will be traversed for RHS table(s).
 
-Continuous efforts are being made to enhance Phoenix with more complete join functionalities. Please refer to our [Roadmap](roadmap.html) for more information.
+Continuous efforts are being made to bring more performance enhancements to join queries based on table statistics. Please refer to our [Roadmap](roadmap.html) for more information.
 

Modified: phoenix/site/source/src/site/markdown/recent.md
URL: http://svn.apache.org/viewvc/phoenix/site/source/src/site/markdown/recent.md?rev=1656256&r1=1656255&r2=1656256&view=diff
==============================================================================
--- phoenix/site/source/src/site/markdown/recent.md (original)
+++ phoenix/site/source/src/site/markdown/recent.md Sat Jan 31 23:07:21 2015
@@ -4,6 +4,7 @@ As items are implemented from our road m
 
 1. **[Statistics Collection](update_statistics.html)**. Collects the statistics for a table to improve query parallelization. **Available in our 3.2/4.2 release**
 2. **[Join Improvements](joins.html)**. Improve existing hash join implementation.
+    * **[Many-to-many joins](https://issues.apache.org/jira/browse/PHOENIX-1179)**. Support joins where both sides are too large to fit into memory. **Available in our 3.3/4.3 release**
     * **[Optimize foreign key joins](https://issues.apache.org/jira/browse/PHOENIX-852)**. Optimize foreign key joins by leveraging our skip scan filter. **Available in our 3.2/4.2 release**
     * **[Semi/anti joins](https://issues.apache.org/jira/browse/PHOENIX-167)**. Support semi/anti subqueries through the standard [NOT] IN and [NOT] EXISTS keywords. **Available in our 3.2/4.2 release**
 3. **[Subqueries](subqueries.html)** Support independent subqueries and correlated subqueries in the WHERE clause as well as subqueries in the FROM clause. **Available in our 3.2/4.2 release**

Modified: phoenix/site/source/src/site/markdown/roadmap.md
URL: http://svn.apache.org/viewvc/phoenix/site/source/src/site/markdown/roadmap.md?rev=1656256&r1=1656255&r2=1656256&view=diff
==============================================================================
--- phoenix/site/source/src/site/markdown/roadmap.md (original)
+++ phoenix/site/source/src/site/markdown/roadmap.md Sat Jan 31 23:07:21 2015
@@ -4,7 +4,7 @@ Our roadmap is driven by our user commun
 
 1. **[Transaction Support](https://issues.apache.org/jira/browse/PHOENIX-400)**. Support transactions by integrating with an open source solution like [Tephra](https://github.com/continuuity/tephra), [Themis](https://github.com/XiaoMi/themis), or some other similar option.
 1. **[Join Improvements](https://issues.apache.org/jira/browse/PHOENIX-1167)**. Enhance our join capabilities in a variety of ways:<br/>
-    *  **[Many-to-many joins](https://issues.apache.org/jira/browse/PHOENIX-1179)**. Support joins where both sides are too large to fit into memory. 
+    *  **[Table-stats-guided choice between hash join and sort-merge join](https://issues.apache.org/jira/browse/PHOENIX-1556)**. Base hash join versus many-to-many decision on how many guideposts will be traversed for RHS table(s).
     *  **[Inlined parent/child joins](https://issues.apache.org/jira/browse/PHOENIX-150)**. Optimize parent/child joins by storing child rows inside of a parent row, forming the column qualifier through a known prefix plus the child row primary key.
 2. **[Subquery](subqueries.html) Enhancement**, which includes support for **[correlated subqueries in the HAVING clause](https://issues.apache.org/jira/browse/PHOENIX-1388)** and **[using subqueries as expressions](https://issues.apache.org/jira/browse/PHOENIX-1392)**.
 15. **[Cost-based Query Optimization]((https://issues.apache.org/jira/browse/PHOENIX-1177))**. Enhance existing [statistics collection](update_statistics.html) by enabling further query optmizations based on the size and cardinality of the data.