Posted to commits@impala.apache.org by ta...@apache.org on 2019/07/30 04:03:37 UTC

[impala] branch master updated (8099911 -> 88da6fd)

This is an automated email from the ASF dual-hosted git repository.

tarmstrong pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git.


    from 8099911  IMPALA-8802: Switch to pgrep for graceful shutdown helper
     new b6b45c0  IMPALA-8807: fix OPTIMIZE_PARTITION_KEY_SCANS docs
     new 88da6fd  IMPALA-8534: data cache for dockerised tests

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 bin/jenkins/dockerized-impala-run-tests.sh         |  1 +
 bin/start-impala-cluster.py                        | 29 +++++++++++++++++++---
 .../topics/impala_optimize_partition_key_scans.xml | 28 ++++++++++++++++-----
 3 files changed, 48 insertions(+), 10 deletions(-)


[impala] 01/02: IMPALA-8807: fix OPTIMIZE_PARTITION_KEY_SCANS docs

Posted by ta...@apache.org.

tarmstrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit b6b45c06656276edc90928c0bbb95c93e4a04f6f
Author: Tim Armstrong <ta...@cloudera.com>
AuthorDate: Mon Jul 29 17:29:35 2019 -0700

    IMPALA-8807: fix OPTIMIZE_PARTITION_KEY_SCANS docs
    
    The docs were inaccurate about the cases in which the optimisation
    applied. Happily, it actually works in a much wider set of cases.
    
    Change-Id: I8909b23bfe2b90470fc559fbc01f1e3aa3caa85d
    Reviewed-on: http://gerrit.cloudera.org:8080/13949
    Reviewed-by: Alex Rodoni <ar...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 .../topics/impala_optimize_partition_key_scans.xml | 28 +++++++++++++++++-----
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/docs/topics/impala_optimize_partition_key_scans.xml b/docs/topics/impala_optimize_partition_key_scans.xml
index 070f359..a70f3b2 100644
--- a/docs/topics/impala_optimize_partition_key_scans.xml
+++ b/docs/topics/impala_optimize_partition_key_scans.xml
@@ -52,15 +52,31 @@ under the License.
     <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
 
     <p>
-      This optimization speeds up common <q>introspection</q> operations when using queries
-      to calculate the cardinality and range for partition key columns.
+      This optimization speeds up common <q>introspection</q> operations
+      over partition key columns, for example determining the distinct values
+      of partition keys.
     </p>
 
     <p>
-      This optimization does not apply if the queries contain any <codeph>WHERE</codeph>,
-      <codeph>GROUP BY</codeph>, or <codeph>HAVING</codeph> clause. The relevant queries
-      should only compute the minimum, maximum, or number of distinct values for the
-      partition key columns across the whole table.
+      This optimization does not apply to <codeph>SELECT</codeph> statements
+      that reference columns that are not partition keys. It also only applies
+      when all the partition key columns in the <codeph>SELECT</codeph> statement
+      are referenced in one of the following contexts:
+      <ul>
+        <li>
+          <p>
+            Within a <codeph>MIN()</codeph> or <codeph>MAX()</codeph>
+            aggregate function or as the argument of any aggregate function with
+            the <codeph>DISTINCT</codeph> keyword applied.
+          </p>
+        </li>
+        <li>
+          <p>
+            Within a <codeph>WHERE</codeph>, <codeph>GROUP BY</codeph>
+            or <codeph>HAVING</codeph> clause.
+          </p>
+        </li>
+      </ul>
     </p>
 
     <p>

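To illustrate the contexts the revised docs enumerate, here is a small sketch of queries that would and would not qualify. The table <codeph>sales</codeph>, its partition key <codeph>year</codeph>, and the column <codeph>amount</codeph> are hypothetical names chosen for the example, not anything from the patch; the impala-shell invocation assumes the standard <codeph>-q</codeph> flag with a <codeph>SET</codeph> statement prepended to the query.

```python
# Sketch only: 'sales', 'year', and 'amount' are hypothetical names.

# Partition-key references fall into the contexts the revised docs list
# (MIN()/MAX() aggregates, or a DISTINCT aggregate argument).
QUALIFYING = [
    "SELECT MIN(year), MAX(year) FROM sales",
    "SELECT COUNT(DISTINCT year) FROM sales",
]

# References a non-partition column, so the optimization does not apply.
NOT_QUALIFYING = [
    "SELECT MAX(amount) FROM sales",
]


def impala_shell_cmd(query):
    """Build an impala-shell command line that enables the query option
    for a single query by prepending a SET statement."""
    return ["impala-shell", "-q",
            "SET OPTIMIZE_PARTITION_KEY_SCANS=1; " + query]
```

Whether the optimization actually fired can be checked in the query's EXPLAIN output, where the scan is replaced by a union of partition-key values.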

[impala] 02/02: IMPALA-8534: data cache for dockerised tests

Posted by ta...@apache.org.

tarmstrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 88da6fd421a9449d372de77aae61a33197f4d3c2
Author: Tim Armstrong <ta...@cloudera.com>
AuthorDate: Fri Jul 26 18:39:36 2019 -0700

    IMPALA-8534: data cache for dockerised tests
    
    This adds support for the data cache in dockerised clusters in
    start-impala-cluster.py. It is handled similarly to the
    log directories - we ensure that a separate data cache
    directory is created for each container, then mount
    it at /opt/impala/cache inside the container.
    
    This is then enabled by default for the dockerised tests.
    
    Testing:
    Did a dockerised test run.
    
    Change-Id: I2c75d4a5c1eea7a540d051bb175537163dec0e29
    Reviewed-on: http://gerrit.cloudera.org:8080/13934
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 bin/jenkins/dockerized-impala-run-tests.sh |  1 +
 bin/start-impala-cluster.py                | 29 +++++++++++++++++++++++++----
 2 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/bin/jenkins/dockerized-impala-run-tests.sh b/bin/jenkins/dockerized-impala-run-tests.sh
index d82ff98..67a86b8 100755
--- a/bin/jenkins/dockerized-impala-run-tests.sh
+++ b/bin/jenkins/dockerized-impala-run-tests.sh
@@ -77,6 +77,7 @@ make -j ${IMPALA_BUILD_THREADS} docker_images parquet-reader
 source_impala_config
 
 export TEST_START_CLUSTER_ARGS="--docker_network=${DOCKER_NETWORK}"
+TEST_START_CLUSTER_ARGS+=" --data_cache_dir=/tmp --data_cache_size=500m"
 export MAX_PYTEST_FAILURES=0
 export NUM_CONCURRENT_TESTS=$(nproc)
 # Frontend tests fail because of localhost hardcoded everywhere
diff --git a/bin/start-impala-cluster.py b/bin/start-impala-cluster.py
index 910483d..f6ddb76 100755
--- a/bin/start-impala-cluster.py
+++ b/bin/start-impala-cluster.py
@@ -138,6 +138,8 @@ IMPALA_HOME = os.environ["IMPALA_HOME"]
 CORE_SITE_PATH = os.path.join(IMPALA_HOME, "fe/src/test/resources/core-site.xml")
 KNOWN_BUILD_TYPES = ["debug", "release", "latest"]
 IMPALA_LZO = os.environ["IMPALA_LZO"]
+# The location in the container where the cache is always mounted.
+DATA_CACHE_CONTAINER_PATH = "/opt/impala/cache"
 
 # Kills have a timeout to prevent automated scripts from hanging indefinitely.
 # It is set to a high value to avoid failing if processes are slow to shut down.
@@ -348,8 +350,15 @@ def build_impalad_arg_lists(cluster_size, num_coordinators, use_exclusive_coordi
       # Try creating the directory if it doesn't exist already. May raise exception.
       if not os.path.exists(data_cache_path):
         os.mkdir(data_cache_path)
+      if options.docker_network is None:
+        data_cache_path_arg = data_cache_path
+      else:
+        # The data cache directory will always be mounted at the same path inside the
+        # container.
+        data_cache_path_arg = DATA_CACHE_CONTAINER_PATH
+
       args = "-data_cache={dir}:{quota} {args}".format(
-          dir=data_cache_path, quota=options.data_cache_size, args=args)
+          dir=data_cache_path_arg, quota=options.data_cache_size, args=args)
 
     # Appended at the end so they can override previous args.
     if i < len(per_impalad_args):
@@ -526,7 +535,7 @@ class DockerMiniClusterOperations(object):
                   DEFAULT_HS2_HTTP_PORT: chosen_ports['hs2_http_port'],
                   DEFAULT_IMPALAD_WEBSERVER_PORT: chosen_ports['webserver_port']}
       self.__run_container__("impalad_coord_exec", impalad_arg_lists[i], port_map, i,
-          mem_limit=mem_limit)
+          mem_limit=mem_limit, supports_data_cache=True)
 
   def __gen_container_name__(self, daemon, instance=None):
     """Generate the name for the container, which should be unique among containers
@@ -541,7 +550,8 @@ class DockerMiniClusterOperations(object):
       return daemon
     return "{0}-{1}".format(daemon, instance)
 
-  def __run_container__(self, daemon, args, port_map, instance=None, mem_limit=None):
+  def __run_container__(self, daemon, args, port_map, instance=None, mem_limit=None,
+      supports_data_cache=False):
     """Launch a container with the daemon - impalad, catalogd, or statestored. If there
     are multiple impalads in the cluster, a unique instance number must be specified.
     'args' are command-line arguments to be appended to the end of the daemon command
@@ -549,7 +559,9 @@ class DockerMiniClusterOperations(object):
     --docker_auto_ports was set on the command line, 'port_map' is ignored and Docker
     will automatically choose the mapping. If there is an existing running or stopped
     container with the same name, it will be destroyed. If provided, mem_limit is
-    passed to "docker run" as a string to set the memory limit for the container."""
+    passed to "docker run" as a string to set the memory limit for the container.
+    If 'supports_data_cache' is true and the data cache is enabled via --data_cache_dir,
+    mount the data cache inside the container."""
     self.__destroy_container__(daemon, instance)
     if options.docker_auto_ports:
       port_args = ["-P"]
@@ -578,6 +590,15 @@ class DockerMiniClusterOperations(object):
       os.makedirs(log_dir)
     mount_args += ["--mount", "type=bind,src={0},dst=/opt/impala/logs".format(log_dir)]
 
+    # Create a data cache subdirectory for each daemon and mount at /opt/impala/cache
+    # in the container.
+    if options.data_cache_dir and supports_data_cache:
+      data_cache_dir = os.path.join(options.data_cache_dir, host_name + "_cache")
+      if not os.path.isdir(data_cache_dir):
+        os.makedirs(data_cache_dir)
+      mount_args += ["--mount", "type=bind,src={0},dst={1}".format(
+                     data_cache_dir, DATA_CACHE_CONTAINER_PATH)]
+
     # Run the container as the current user.
     user_args = ["--user", "{0}:{1}".format(os.getuid(), os.getgid())]
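The per-container cache handling above can be summarized as a self-contained sketch: the directory naming (`<host_name>_cache`) and the fixed mount point follow the patch, while the standalone function signature is an illustration rather than the actual start-impala-cluster.py API.

```python
import os

# Fixed in-container mount point, matching DATA_CACHE_CONTAINER_PATH
# in the patch above.
DATA_CACHE_CONTAINER_PATH = "/opt/impala/cache"


def data_cache_mount_args(data_cache_dir, host_name):
    """Create a per-daemon cache subdirectory under 'data_cache_dir' and
    return the "docker run" --mount arguments that bind-mount it at the
    fixed in-container path. Mirrors the logic this commit adds."""
    cache_dir = os.path.join(data_cache_dir, host_name + "_cache")
    if not os.path.isdir(cache_dir):
        os.makedirs(cache_dir)
    return ["--mount",
            "type=bind,src={0},dst={1}".format(cache_dir,
                                               DATA_CACHE_CONTAINER_PATH)]
```

For example, a daemon whose container hostname is `impalad_coord_exec-1` gets a host directory `<data_cache_dir>/impalad_coord_exec-1_cache` mounted at `/opt/impala/cache`, so every container can be passed the same `-data_cache=/opt/impala/cache:<quota>` flag while still writing to its own host-side directory.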