Posted to commits@impala.apache.org by ta...@apache.org on 2019/07/30 04:03:38 UTC

[impala] 01/02: IMPALA-8807: fix OPTIMIZE_PARTITION_KEY_SCANS docs

This is an automated email from the ASF dual-hosted git repository.

tarmstrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit b6b45c06656276edc90928c0bbb95c93e4a04f6f
Author: Tim Armstrong <ta...@cloudera.com>
AuthorDate: Mon Jul 29 17:29:35 2019 -0700

    IMPALA-8807: fix OPTIMIZE_PARTITION_KEY_SCANS docs
    
    The docs were inaccurate about the cases in which the optimisation
    applied. Happily, it actually works in a much wider set of cases.
    
    Change-Id: I8909b23bfe2b90470fc559fbc01f1e3aa3caa85d
    Reviewed-on: http://gerrit.cloudera.org:8080/13949
    Reviewed-by: Alex Rodoni <ar...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 .../topics/impala_optimize_partition_key_scans.xml | 28 +++++++++++++++++-----
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/docs/topics/impala_optimize_partition_key_scans.xml b/docs/topics/impala_optimize_partition_key_scans.xml
index 070f359..a70f3b2 100644
--- a/docs/topics/impala_optimize_partition_key_scans.xml
+++ b/docs/topics/impala_optimize_partition_key_scans.xml
@@ -52,15 +52,31 @@ under the License.
     <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
 
     <p>
-      This optimization speeds up common <q>introspection</q> operations when using queries
-      to calculate the cardinality and range for partition key columns.
+      This optimization speeds up common <q>introspection</q> operations
+      over partition key columns, for example determining the distinct values
+      of partition keys.
     </p>
 
     <p>
-      This optimization does not apply if the queries contain any <codeph>WHERE</codeph>,
-      <codeph>GROUP BY</codeph>, or <codeph>HAVING</codeph> clause. The relevant queries
-      should only compute the minimum, maximum, or number of distinct values for the
-      partition key columns across the whole table.
+      This optimization does not apply to <codeph>SELECT</codeph> statements
+      that reference columns that are not partition keys. It also only applies
+      when all the partition key columns in the <codeph>SELECT</codeph> statement
+      are referenced in one of the following contexts:
+      <ul>
+        <li>
+          <p>
+            Within a <codeph>MIN()</codeph> or <codeph>MAX()</codeph>
+            aggregate function or as the argument of any aggregate function with
+            the <codeph>DISTINCT</codeph> keyword applied.
+          </p>
+        </li>
+        <li>
+          <p>
+            Within a <codeph>WHERE</codeph>, <codeph>GROUP BY</codeph>
+            or <codeph>HAVING</codeph> clause.
+          </p>
+        </li>
+      </ul>
     </p>
 
     <p>