Posted to commits@impala.apache.org by ta...@apache.org on 2017/04/14 23:48:05 UTC

[1/2] incubator-impala git commit: IMPALA-3040: Fix test_caching_ddl test

Repository: incubator-impala
Updated Branches:
  refs/heads/master 491154c8e -> 8bdfe0320


IMPALA-3040: Fix test_caching_ddl test

This commit adds a 30sec timeout on the validation step of
test_caching_ddl test. This test has been flaky and we suspect a race
between the submission of a cache directive removal and the reported
cached directives from the 'hdfs cacheadmin' utility command.

Change-Id: I3ec4ba5dfae6e90a2bb76e22c93909b05bd78fa4
Reviewed-on: http://gerrit.cloudera.org:8080/6603
Reviewed-by: Alex Behm <al...@cloudera.com>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/cb1e4f65
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/cb1e4f65
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/cb1e4f65

Branch: refs/heads/master
Commit: cb1e4f659f6fd42469ef0813aa56fa36cb43fc22
Parents: 491154c
Author: Dimitris Tsirogiannis <dt...@cloudera.com>
Authored: Mon Apr 10 15:43:35 2017 -0700
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Fri Apr 14 22:34:44 2017 +0000

----------------------------------------------------------------------
 tests/query_test/test_hdfs_caching.py | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/cb1e4f65/tests/query_test/test_hdfs_caching.py
----------------------------------------------------------------------
diff --git a/tests/query_test/test_hdfs_caching.py b/tests/query_test/test_hdfs_caching.py
index a837ce3..f446913 100644
--- a/tests/query_test/test_hdfs_caching.py
+++ b/tests/query_test/test_hdfs_caching.py
@@ -189,7 +189,6 @@ class TestHdfsCachingDdl(ImpalaTestSuite):
 
   @pytest.mark.execute_serially
   def test_caching_ddl(self, vector):
-
     # Get the number of cache requests before starting the test
     num_entries_pre = get_num_cache_requests()
     self.run_test_case('QueryTest/hdfs-caching', vector)
@@ -204,7 +203,7 @@ class TestHdfsCachingDdl(ImpalaTestSuite):
     self.client.execute("drop table cachedb.cached_tbl_local")
 
     # Dropping the tables should cleanup cache entries leaving us with the same
-    # total number of entries
+    # total number of entries.
     assert num_entries_pre == get_num_cache_requests()
 
   @pytest.mark.execute_serially
@@ -300,7 +299,24 @@ def change_cache_directive_repl_for_path(path, repl):
       "Error modifying cache directive for path %s (%s, %s)" % (path, stdout, stderr)
 
 def get_num_cache_requests():
-  """Returns the number of outstanding cache requests"""
-  rc, stdout, stderr = exec_process("hdfs cacheadmin -listDirectives -stats")
-  assert rc == 0, 'Error executing hdfs cacheadmin: %s %s' % (stdout, stderr)
-  return len(stdout.split('\n'))
+  """Returns the number of outstanding cache requests. Due to race conditions in the
+    way cache requests are added/dropped/reported (see IMPALA-3040), this function tries
+    to return a stable result by making several attempts to stabilize it within a
+    reasonable timeout."""
+  def get_num_cache_requests_util():
+    rc, stdout, stderr = exec_process("hdfs cacheadmin -listDirectives -stats")
+    assert rc == 0, 'Error executing hdfs cacheadmin: %s %s' % (stdout, stderr)
+    return len(stdout.split('\n'))
+
+  wait_time_in_sec = 5
+  num_stabilization_attempts = 0
+  max_num_stabilization_attempts = 10
+  new_requests = None
+  num_requests = None
+  while num_stabilization_attempts < max_num_stabilization_attempts:
+    new_requests = get_num_cache_requests_util()
+    if new_requests == num_requests: break
+    num_requests = new_requests
+    num_stabilization_attempts = num_stabilization_attempts + 1
+    time.sleep(wait_time_in_sec)
+  return num_requests
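
The loop added to get_num_cache_requests() is a poll-until-stable pattern: keep sampling a value until two consecutive readings agree, or give up after a fixed number of attempts. A minimal standalone sketch of that pattern follows. The name wait_until_stable and the injectable sample/sleep parameters are hypothetical and are not part of the Impala test suite; they exist only so the loop can be exercised without an HDFS cluster. With the values in the diff (5 second wait, 10 attempts) the loop sleeps for at most 10 * 5 = 50 seconds, which is longer than the 30sec figure quoted in the commit message.

```python
import time


def wait_until_stable(sample, wait_time_in_sec=5, max_attempts=10, sleep=time.sleep):
    """Poll sample() until two consecutive readings match or attempts run out.

    Hypothetical helper mirroring the loop in the diff above. Returns the last
    reading taken, whether or not it stabilized.
    """
    previous = None
    for _ in range(max_attempts):
        current = sample()
        if current == previous:
            # Two consecutive readings agree, so treat the value as stable.
            break
        previous = current
        sleep(wait_time_in_sec)
    return previous


if __name__ == "__main__":
    # Simulated readings standing in for the 'hdfs cacheadmin' line count.
    readings = iter([7, 6, 5, 5])
    print(wait_until_stable(lambda: next(readings), wait_time_in_sec=0))
```

As in the diff, the first reading can never match the initial None, so the loop always sleeps at least once, and when the attempts are exhausted the caller gets the most recent reading rather than an error.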


[2/2] incubator-impala git commit: IMPALA-2924: [DOCS] Add docs for HDFS cache-related hints

Posted by ta...@apache.org.
IMPALA-2924: [DOCS] Add docs for HDFS cache-related hints

The JIRA discusses a RANDOM_REPLICA query option but Impala only
has a SCHEDULE_RANDOM_REPLICA option. So I stated that the
RANDOM_REPLICA hint is the same as specifying
SCHEDULE_RANDOM_REPLICA=true. Please confirm.

Change-Id: I7284dd45c8173eef104ebd32789429e8c16c7bf2
Reviewed-on: http://gerrit.cloudera.org:8080/6631
Reviewed-by: Lars Volker <lv...@cloudera.com>
Reviewed-by: John Russell <jr...@cloudera.com>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/8bdfe032
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/8bdfe032
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/8bdfe032

Branch: refs/heads/master
Commit: 8bdfe032012e0b52550bc6784dc972b9dcfb5f7b
Parents: cb1e4f6
Author: John Russell <jr...@cloudera.com>
Authored: Thu Apr 13 14:10:07 2017 -0700
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Fri Apr 14 22:37:34 2017 +0000

----------------------------------------------------------------------
 docs/topics/impala_hints.xml | 42 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 40 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/8bdfe032/docs/topics/impala_hints.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_hints.xml b/docs/topics/impala_hints.xml
index 7d833f6..4524c14 100644
--- a/docs/topics/impala_hints.xml
+++ b/docs/topics/impala_hints.xml
@@ -80,7 +80,8 @@ INSERT <varname>insert_clauses</varname>
     <p rev="2.0.0">
       In <keyword keyref="impala20_full"/> and higher, you can also specify the hints inside comments that use
       either the <codeph>/* */</codeph> or <codeph>--</codeph> notation. Specify a <codeph>+</codeph> symbol
-      immediately before the hint name.
+      immediately before the hint name. Recently added hints are only available using the <codeph>/* */</codeph>
+      and <codeph>--</codeph> notation.
     </p>
 
 <codeblock rev="2.0.0">SELECT STRAIGHT_JOIN <varname>select_list</varname> FROM
@@ -102,6 +103,12 @@ INSERT <varname>insert_clauses</varname>
 INSERT <varname>insert_clauses</varname>
   -- +SHUFFLE|NOSHUFFLE
   SELECT <varname>remainder_of_query</varname>;
+
+<ph rev="IMPALA-2924">SELECT <varname>select_list</varname> FROM
+<varname>table_ref</varname>
+  /* +{SCHEDULE_CACHE_LOCAL | SCHEDULE_DISK_LOCAL | SCHEDULE_REMOTE}
+    [,RANDOM_REPLICA] */
+<varname>remainder_of_query</varname>;</ph>
 </codeblock>
 
     <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
@@ -109,7 +116,7 @@ INSERT <varname>insert_clauses</varname>
     <p>
       With both forms of hint syntax, include the <codeph>STRAIGHT_JOIN</codeph>
       keyword immediately after the <codeph>SELECT</codeph> keyword to prevent Impala from
-      reordering the tables in a way that makes the hint ineffective.
+      reordering the tables in a way that makes the join-related hints ineffective.
     </p>
 
     <p>
@@ -163,6 +170,37 @@ INSERT <varname>insert_clauses</varname>
 
     <p conref="../shared/impala_common.xml#common/insert_hints"/>
 
+    <p rev="IMPALA-2924">
+      <b>Hints for scheduling of HDFS blocks:</b>
+    </p>
+
+    <p rev="IMPALA-2924">
+      The hints <codeph>/* +SCHEDULE_CACHE_LOCAL */</codeph>,
+      <codeph>/* +SCHEDULE_DISK_LOCAL */</codeph>, and
+      <codeph>/* +SCHEDULE_REMOTE */</codeph> have the same effect
+      as specifying the <codeph>REPLICA_PREFERENCE</codeph> query
+      option with the respective option settings of <codeph>CACHE_LOCAL</codeph>,
+      <codeph>DISK_LOCAL</codeph>, or <codeph>REMOTE</codeph>.
+      The hint <codeph>/* +RANDOM_REPLICA */</codeph> is the same as
+      enabling the <codeph>SCHEDULE_RANDOM_REPLICA</codeph> query option.
+    </p>
+
+    <p rev="IMPALA-2924">
+      You can use these hints in combination by separating them with commas,
+      for example, <codeph>/* +SCHEDULE_CACHE_LOCAL,RANDOM_REPLICA */</codeph>.
+      See <xref keyref="replica_preference"/> and
+      <xref keyref="schedule_random_replica"/> for information about how
+      these settings influence the way Impala processes HDFS data blocks.
+    </p>
+
+    <p rev="IMPALA-2924">
+      Specifying the replica preference as a query hint always overrides the
+      query option setting. Specifying either the <codeph>SCHEDULE_RANDOM_REPLICA</codeph>
+      query option or the corresponding <codeph>RANDOM_REPLICA</codeph> query hint
+      enables the random tie-breaking behavior when processing data blocks
+      during the query.
+    </p>
+
     <p>
       <b>Suggestions versus directives:</b>
     </p>
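
The scheduling hints documented in this diff combine into a single comment, with hint names separated by commas after a leading + symbol. The sketch below assembles such a comment from Python, for instance when generating test queries. The function scheduling_hint and the constant REPLICA_HINTS are hypothetical names used for illustration and are not part of Impala; only the hint keywords and the comment layout are taken from the documentation text above.

```python
# Replica-preference hints named in the documentation change above.
REPLICA_HINTS = ("SCHEDULE_CACHE_LOCAL", "SCHEDULE_DISK_LOCAL", "SCHEDULE_REMOTE")


def scheduling_hint(replica_preference=None, random_replica=False):
    """Return a '/* +... */' hint comment, or '' when no hint is requested.

    Hypothetical helper: it only formats text and does not talk to Impala.
    """
    hints = []
    if replica_preference is not None:
        if replica_preference not in REPLICA_HINTS:
            raise ValueError("unknown replica hint: %s" % replica_preference)
        hints.append(replica_preference)
    if random_replica:
        hints.append("RANDOM_REPLICA")
    if not hints:
        return ""
    return "/* +%s */" % ",".join(hints)


if __name__ == "__main__":
    # The table name t1 is a placeholder chosen for this example.
    hint = scheduling_hint("SCHEDULE_CACHE_LOCAL", random_replica=True)
    print("SELECT * FROM t1 %s" % hint)
```

The hint is placed after the table reference, following the syntax block in the diff. Whether Impala accepts a given combination is decided by the server, so this helper should be read as string formatting only.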