Posted to commits@phoenix.apache.org by sa...@apache.org on 2017/10/25 18:16:49 UTC

svn commit: r1813331 [4/4] - in /phoenix/site: publish/ publish/language/ source/src/site/markdown/

Modified: phoenix/site/publish/update_statistics.html
URL: http://svn.apache.org/viewvc/phoenix/site/publish/update_statistics.html?rev=1813331&r1=1813330&r2=1813331&view=diff
==============================================================================
--- phoenix/site/publish/update_statistics.html (original)
+++ phoenix/site/publish/update_statistics.html Wed Oct 25 18:16:48 2017
@@ -1,7 +1,7 @@
 
 <!DOCTYPE html>
 <!--
- Generated by Apache Maven Doxia at 2017-10-12
+ Generated by Apache Maven Doxia at 2017-10-25
  Rendered using Reflow Maven Skin 1.1.0 (http://andriusvelykis.github.io/reflow-maven-skin)
 -->
 <html  xml:lang="en" lang="en">
@@ -162,8 +162,9 @@
 <div class="page-header">
  <h1>Statistics Collection</h1>
 </div> 
-<p>The UPDATE STATISTICS command updates the statistics collected on a table, to improve query performance. This command collects a set of keys per region per column family that are equal byte distanced from each other. These collected keys are called <i>guideposts</i> and they act as <i>hints/guides</i> to improve the parallelization of queries on a given target region.</p> 
+<p>The UPDATE STATISTICS command updates the statistics collected on a table. This command collects a set of keys per region per column family that are equal byte distanced from each other. These collected keys are called <i>guideposts</i> and they act as <i>hints/guides</i> to improve the parallelization of queries on a given target region.</p> 
 <p>Statistics are also automatically collected during major compactions and region splits so manually running this command may not be necessary.</p> 
+<p>In 4.12, we have added a new configuration <tt>phoenix.use.stats.parallelization</tt> which controls whether statistical information on the data should be used to drive query parallelization (as described below). The default value of the configuration is true.</p> 
 <div class="section"> 
  <h2 id="Parallelization">Parallelization</h2> 
  <p>Phoenix breaks up queries into multiple scans and runs them in parallel to reduce latency. Parallelization in Phoenix is driven by the statistics related configuration parameters. Each chunk of data between guideposts will be run in parallel in a separate scan to improve query performance. The chunk size is determined by the GUIDE_POSTS_WIDTH table property (Phoenix 4.9 or above) or the global server-side <tt>phoenix.stats.guidepost.width</tt> parameter if the table property is not set. As the size of the chunks decrease, you’ll want to increase <tt>phoenix.query.queueSize</tt> as more work will be queued in that case. Note that at a minimum, separate scans will be run for each table region. Statistics in Phoenix provides a means of gaining intraregion parallelization. In addition to the guidepost width specification, the client-side <tt>phoenix.query.threadPoolSize</tt> and <tt>phoenix.query.queueSize</tt> parameters and the server-side <tt>hbase.regionserver.handler.count
 </tt> parameter have an impact on the amount of parallelization.</p> 
@@ -469,7 +470,7 @@
 		<div class="row">
 			<div class="span12">
 				<p class="pull-right"><a href="#">Back to top</a></p>
-				<p class="copyright">Copyright &copy;2017 <a href="http://www.apache.org">Apache Software Foundation</a>. All Rights Reserved.</p>
+				<p class="copyright">Copyright &copy;2013-2017 <a href="http://www.apache.org">Apache Software Foundation</a>. All Rights Reserved.</p>
 			</div>
 		</div>
 	</div>

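A minimal Java sketch of toggling the new phoenix.use.stats.parallelization setting from a JDBC client. It assumes the property is honored when passed as a JDBC connection property; setting it in the client-side hbase-site.xml is the other obvious route. The class and method names are hypothetical, and the ZooKeeper quorum passed to open() is a placeholder.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

// Hypothetical helper, not part of Phoenix: builds connection properties
// that toggle statistics-driven parallelization (Phoenix 4.12+).
public class StatsParallelizationExample {

    // Property name as introduced in the 4.12 documentation change above.
    static final String USE_STATS_PARALLELIZATION = "phoenix.use.stats.parallelization";

    // Returns JDBC properties with the setting explicitly enabled or disabled.
    static Properties connectionProps(boolean useStats) {
        Properties props = new Properties();
        props.setProperty(USE_STATS_PARALLELIZATION, Boolean.toString(useStats));
        return props;
    }

    // Opens a Phoenix connection. Assumes the Phoenix client jar is on the
    // classpath and that this property is read from connection properties.
    static Connection open(String zkQuorum, boolean useStats) throws SQLException {
        return DriverManager.getConnection("jdbc:phoenix:" + zkQuorum, connectionProps(useStats));
    }

    public static void main(String[] args) {
        // Prints the properties only; no cluster connection is attempted here.
        System.out.println(connectionProps(false));
    }
}
```

Leaving the property at its documented default of true keeps statistics-driven parallelization on; passing false only changes how queries are split into scans, not whether statistics are collected.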
Modified: phoenix/site/source/src/site/markdown/columnencoding.md
URL: http://svn.apache.org/viewvc/phoenix/site/source/src/site/markdown/columnencoding.md?rev=1813331&r1=1813330&r2=1813331&view=diff
==============================================================================
--- phoenix/site/source/src/site/markdown/columnencoding.md (original)
+++ phoenix/site/source/src/site/markdown/columnencoding.md Wed Oct 25 18:16:48 2017
@@ -79,6 +79,5 @@ COLUMN_ENCODED_BYTES = 1;
 
 When using SINGLE_CELL_ARRAY_WITH_OFFSETS encoding, one has to use a number based column mapping scheme. An attempt to use SINGLE_CELL_ARRAY_WITH_OFFSETS with COLUMN_ENCODED_BYTES = NONE will throw an error.
 
-
- 
-
+###How to disable column mapping?
+To disable column mapping for all new tables, set <code>phoenix.default.column.encoded.bytes.attrib</code> to 0. Alternatively, leave it enabled globally and disable it for an individual table by specifying the COLUMN_ENCODED_BYTES = 0 property in the CREATE TABLE statement.

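The two ways of turning off column mapping described in the columnencoding.md change can be sketched in Java as follows. The helper names and the METRICS table are hypothetical. Whether phoenix.default.column.encoded.bytes.attrib is picked up from per-connection properties is an assumption here; setting it in the client's hbase-site.xml is the safer route for a cluster-wide default.

```java
import java.util.Properties;

// Hypothetical helpers illustrating the global and per-table switches
// for column mapping; not a Phoenix API.
public class ColumnMappingExample {

    // Global switch: 0 means new tables are created without column mapping
    // (assumes the client reads this property when it creates tables).
    static Properties disableColumnMappingGlobally() {
        Properties props = new Properties();
        props.setProperty("phoenix.default.column.encoded.bytes.attrib", "0");
        return props;
    }

    // Per-table switch: puts COLUMN_ENCODED_BYTES = 0 after the column list
    // of a CREATE TABLE statement.
    static String createTableWithoutColumnMapping(String table, String columns) {
        return "CREATE TABLE " + table + " (" + columns + ") COLUMN_ENCODED_BYTES = 0";
    }

    public static void main(String[] args) {
        System.out.println(createTableWithoutColumnMapping(
            "METRICS", "ID BIGINT NOT NULL PRIMARY KEY, VAL VARCHAR"));
    }
}
```

The per-table form is handy when mapping stays on globally and only a few tables, for example ones read by external tools that expect column names as qualifiers, need to opt out.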
Modified: phoenix/site/source/src/site/markdown/tuning_guide.md
URL: http://svn.apache.org/viewvc/phoenix/site/source/src/site/markdown/tuning_guide.md?rev=1813331&r1=1813330&r2=1813331&view=diff
==============================================================================
--- phoenix/site/source/src/site/markdown/tuning_guide.md (original)
+++ phoenix/site/source/src/site/markdown/tuning_guide.md Wed Oct 25 18:16:48 2017
@@ -275,6 +275,8 @@ http://phoenix.apache.org/language/index
 You can improve parallelization with the [UPDATE STATISTICS](https://phoenix.apache.org/update_statistics.html) command. This command subdivides each region by determining keys called *guideposts* that are equidistant from each other, then uses these guideposts to break up queries into multiple parallel scans.
 Statistics are turned on by default. With Phoenix 4.9, the user can set guidepost width for each table. Optimal guidepost width depends on a number of factors such as cluster size, cluster usage, number of cores per node, table size, and disk I/O.
 
+In Phoenix 4.12, we have added a new configuration <code>phoenix.use.stats.parallelization</code> that controls whether statistics should be used for driving parallelization. Note that statistics can still be collected when this is set to false: the collected information is used to surface estimates of the number of bytes and rows a query will scan when an EXPLAIN is generated for it.
+
 # Further Tuning
 
 For advice about tuning the underlying HBase and JVM layers, see [Operational and Performance Configuration Options](https://hbase.apache.org/book.html#schema.ops) in the Apache HBase™ Reference Guide.

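The tuning_guide.md text mentions per-table guidepost width (Phoenix 4.9+) and EXPLAIN estimates. A minimal Java sketch of that workflow: set a width on one table, recollect statistics, then inspect the plan. The table name, the 10,000,000-byte width, and the helper names are illustrative assumptions; run() simply prints the first column of any result rows (the plan text for EXPLAIN) and does not rely on any particular estimate column names.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: per-table guidepost width, statistics refresh,
// then EXPLAIN for a query against the same table.
public class GuidepostTuningExample {

    // Builds the statement sequence; widthBytes is an illustrative value.
    static List<String> tuningStatements(String table, long widthBytes, String query) {
        return Arrays.asList(
            "ALTER TABLE " + table + " SET GUIDE_POSTS_WIDTH = " + widthBytes,
            "UPDATE STATISTICS " + table,
            "EXPLAIN " + query);
    }

    // Executes each statement in order and prints the first column of any
    // rows returned (for EXPLAIN, the plan lines).
    static void run(Connection conn, List<String> statements) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            for (String sql : statements) {
                if (stmt.execute(sql)) {
                    try (ResultSet rs = stmt.getResultSet()) {
                        while (rs.next()) {
                            System.out.println(rs.getString(1));
                        }
                    }
                }
            }
        }
    }

    public static void main(String[] args) {
        // Prints the statements only; pass a Phoenix Connection to run() to execute them.
        tuningStatements("METRICS", 10_000_000L, "SELECT COUNT(*) FROM METRICS")
            .forEach(System.out::println);
    }
}
```

Smaller widths produce more chunks and therefore more parallel scans, which is why the update_statistics page suggests raising phoenix.query.queueSize as the chunk size decreases.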
Modified: phoenix/site/source/src/site/markdown/update_statistics.md
URL: http://svn.apache.org/viewvc/phoenix/site/source/src/site/markdown/update_statistics.md?rev=1813331&r1=1813330&r2=1813331&view=diff
==============================================================================
--- phoenix/site/source/src/site/markdown/update_statistics.md (original)
+++ phoenix/site/source/src/site/markdown/update_statistics.md Wed Oct 25 18:16:48 2017
@@ -1,7 +1,7 @@
 # Statistics Collection
 
-The UPDATE STATISTICS command updates the statistics collected on a table, to improve
-query performance. This command collects a set of keys per region per column family that
+The UPDATE STATISTICS command updates the statistics collected on a table. 
+This command collects a set of keys per region per column family that
 are equal byte distanced from each other. These collected keys are called *guideposts*
 and they act as *hints/guides* to improve the parallelization of queries on a given
 target region.
@@ -9,6 +9,11 @@ target region.
 Statistics are also automatically collected during major compactions and region splits so
 manually running this command may not be necessary.
 
+In 4.12, we have added a new configuration <code>phoenix.use.stats.parallelization</code>
+which controls whether statistical information on the data should be used to drive
+query parallelization (as described below). The default value of the configuration is true.
+
+
 ##Parallelization
 Phoenix breaks up queries into multiple scans and runs them in parallel to reduce latency.
 Parallelization in Phoenix is driven by the statistics related configuration parameters.