Posted to commits@impala.apache.org by ta...@apache.org on 2018/05/08 22:29:41 UTC

[1/5] impala git commit: IMPALA-5969: [DOCS] Adds --auth_creds_ok_in_clear to shell options

Repository: impala
Updated Branches:
  refs/heads/master e2e7c103a -> 96c9dac28


IMPALA-5969: [DOCS] Adds --auth_creds_ok_in_clear to shell options

This patch adds --auth_creds_ok_in_clear to the impala_shell_options
documentation XML.

Change-Id: I19450ebd839b84a85598d283c04a77662fa5e44e
Reviewed-on: http://gerrit.cloudera.org:8080/10236
Reviewed-by: Jim Apple <jb...@apache.org>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/d8d8ddf8
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/d8d8ddf8
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/d8d8ddf8

Branch: refs/heads/master
Commit: d8d8ddf8af5984bcd87970c6c978bafc47b39d50
Parents: e2e7c10
Author: shashanknaikdev <sh...@gmail.com>
Authored: Sun Apr 29 23:11:40 2018 -0400
Committer: Impala Public Jenkins <im...@cloudera.com>
Committed: Tue May 8 17:34:36 2018 +0000

----------------------------------------------------------------------
 docs/topics/impala_shell_options.xml | 9 +++++++++
 1 file changed, 9 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/d8d8ddf8/docs/topics/impala_shell_options.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_shell_options.xml b/docs/topics/impala_shell_options.xml
index 73e2711..755a800 100644
--- a/docs/topics/impala_shell_options.xml
+++ b/docs/topics/impala_shell_options.xml
@@ -557,6 +557,15 @@ under the License.
                 This feature is available in <keyword keyref="impala25_full"/> and higher.
               </entry>
             </row>
+            <row rev="2.3.0 IMPALA-2143">
+              <entry>--auth_creds_ok_in_clear</entry>
+              <entry>N/A</entry>
+              <entry>
+                Allows LDAP authentication to be used with an insecure connection to the shell.
+                WARNING: This will allow authentication credentials to be sent unencrypted,
+                and hence may be vulnerable to an attack.
+              </entry>
+            </row>
           </tbody>
         </tgroup>
       </table>


[3/5] impala git commit: IMPALA-4850: [DOCS] COMMENT should come after PARTITIONED BY

IMPALA-4850: [DOCS] COMMENT should come after PARTITIONED BY

Change-Id: I03fd4a308981955bb52ca79772fe2f7c01b5894f
Reviewed-on: http://gerrit.cloudera.org:8080/10316
Reviewed-by: Joe McDonnell <jo...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/4cf4c90a
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/4cf4c90a
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/4cf4c90a

Branch: refs/heads/master
Commit: 4cf4c90a2040e3f0bb2311a6cad8561f8db0a10e
Parents: 4a24618
Author: Alex Rodoni <ar...@cloudera.com>
Authored: Fri May 4 16:37:03 2018 -0700
Committer: Impala Public Jenkins <im...@cloudera.com>
Committed: Tue May 8 21:31:45 2018 +0000

----------------------------------------------------------------------
 docs/topics/impala_create_table.xml | 23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/4cf4c90a/docs/topics/impala_create_table.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_create_table.xml b/docs/topics/impala_create_table.xml
index ad69c55..415b7be 100644
--- a/docs/topics/impala_create_table.xml
+++ b/docs/topics/impala_create_table.xml
@@ -92,13 +92,12 @@ under the License.
   [PARTITIONED BY (<varname>col_name</varname> <varname>data_type</varname> [COMMENT '<varname>col_comment</varname>'], ...)]
   <ph rev="2.9.0 IMPALA-4166">[SORT BY ([<varname>column</varname> [, <varname>column</varname> ...]])]</ph>
   [COMMENT '<varname>table_comment</varname>']
+  [ROW FORMAT <varname>row_format</varname>]
   [WITH SERDEPROPERTIES ('<varname>key1</varname>'='<varname>value1</varname>', '<varname>key2</varname>'='<varname>value2</varname>', ...)]
-  [
-   [ROW FORMAT <varname>row_format</varname>] [STORED AS <varname>file_format</varname>]
-  ]
+  [STORED AS <varname>file_format</varname>]
   [LOCATION '<varname>hdfs_path</varname>']
+<ph rev="1.4.0">  CACHED IN '<varname>pool_name</varname>'</ph> <ph rev="2.2.0">[WITH REPLICATION = <varname>integer</varname>]</ph> | UNCACHED]
   [TBLPROPERTIES ('<varname>key1</varname>'='<varname>value1</varname>', '<varname>key2</varname>'='<varname>value2</varname>', ...)]
-<ph rev="1.4.0">  [CACHED IN '<varname>pool_name</varname>'</ph> <ph rev="2.2.0">[WITH REPLICATION = <varname>integer</varname>]</ph> | UNCACHED]
 </codeblock>
 
     <p>
@@ -109,13 +108,12 @@ under the License.
   <ph rev="2.5.0">[PARTITIONED BY (<varname>col_name</varname>[, ...])]</ph>
   <ph rev="2.9.0 IMPALA-4166">[SORT BY ([<varname>column</varname> [, <varname>column</varname> ...]])]</ph>
   [COMMENT '<varname>table_comment</varname>']
+  [ROW FORMAT <varname>row_format</varname>]
   [WITH SERDEPROPERTIES ('<varname>key1</varname>'='<varname>value1</varname>', '<varname>key2</varname>'='<varname>value2</varname>', ...)]
-  [
-   [ROW FORMAT <varname>row_format</varname>] <ph rev="">[STORED AS <varname>ctas_file_format</varname>]</ph>
-  ]
+  <ph rev="">[STORED AS <varname>ctas_file_format</varname>]</ph>
   [LOCATION '<varname>hdfs_path</varname>']
+  <ph rev="1.4.0">  [CACHED IN '<varname>pool_name</varname>'</ph> <ph rev="2.2.0">[WITH REPLICATION = <varname>integer</varname>]</ph> | UNCACHED]
   [TBLPROPERTIES ('<varname>key1</varname>'='<varname>value1</varname>', '<varname>key2</varname>'='<varname>value2</varname>', ...)]
-<ph rev="1.4.0">  [CACHED IN '<varname>pool_name</varname>'</ph> <ph rev="2.2.0">[WITH REPLICATION = <varname>integer</varname>]</ph> | UNCACHED]
 AS
   <varname>select_statement</varname></codeblock>
 
@@ -166,16 +164,15 @@ file_format:
 
 <codeblock>CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [<varname>db_name</varname>.]<varname>table_name</varname>
   LIKE PARQUET '<varname>hdfs_path_of_parquet_file</varname>'
+  [PARTITIONED BY (<varname>col_name</varname> <varname>data_type</varname> [COMMENT '<varname>col_comment</varname>'], ...)]
   <ph rev="2.9.0 IMPALA-4166">[SORT BY ([<varname>column</varname> [, <varname>column</varname> ...]])]</ph>
   [COMMENT '<varname>table_comment</varname>']
-  [PARTITIONED BY (<varname>col_name</varname> <varname>data_type</varname> [COMMENT '<varname>col_comment</varname>'], ...)]
+  [ROW FORMAT <varname>row_format</varname>]
   [WITH SERDEPROPERTIES ('<varname>key1</varname>'='<varname>value1</varname>', '<varname>key2</varname>'='<varname>value2</varname>', ...)]
-  [
-   [ROW FORMAT <varname>row_format</varname>] [STORED AS <varname>file_format</varname>]
-  ]
+  [STORED AS <varname>file_format</varname>]
   [LOCATION '<varname>hdfs_path</varname>']
-  [TBLPROPERTIES ('<varname>key1</varname>'='<varname>value1</varname>', '<varname>key2</varname>'='<varname>value2</varname>', ...)]
 <ph rev="1.4.0">  [CACHED IN '<varname>pool_name</varname>'</ph> <ph rev="2.2.0">[WITH REPLICATION = <varname>integer</varname>]</ph> | UNCACHED]
+  [TBLPROPERTIES ('<varname>key1</varname>'='<varname>value1</varname>', '<varname>key2</varname>'='<varname>value2</varname>', ...)]
 data_type:
     <varname>primitive_type</varname>
   | array_type


[4/5] impala git commit: IMPALA-6227: reduce window of metric inconsistency

IMPALA-6227: reduce window of metric inconsistency

The admission controller test fetches multiple metrics relating to the
admission controller. Before this patch it fetched the whole metrics
list for each metric, meaning there was a substantial window for
the metrics to be inconsistent for a single backend. Now the metrics are
only fetched once. Metric updates are not transactional so there is
still a small window for raciness if an admission decision is made
exactly when the metrics are fetched.
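
As a minimal sketch of the single-fetch idea described above (not the
patch itself): every requested metric is read from one parsed snapshot, so
all values come from the same point in time. The flat name-to-value JSON
layout and the sample metric values are assumptions for illustration; the
real helper reads the snapshot from the Impala debug webpage.

```python
import json


def get_metric_values(snapshot_json, metric_names, default_values=None):
    """Look up several metrics in ONE snapshot instead of one fetch per metric.

    snapshot_json is assumed to be a flat JSON object mapping metric name to
    value (an illustrative stand-in for the debug webpage response).
    """
    if default_values is None:
        default_values = [None] * len(metric_names)
    assert len(metric_names) == len(default_values)
    metrics = json.loads(snapshot_json)  # parsed once for all lookups
    return [metrics.get(name, default)
            for name, default in zip(metric_names, default_values)]


# Hypothetical snapshot; the numbers are made up for the example.
snapshot = json.dumps({"catalog.num-databases": 3, "catalog.num-tables": 42})
num_dbs, num_tbls, missing = get_metric_values(
    snapshot,
    ["catalog.num-databases", "catalog.num-tables", "no.such.metric"],
    [0, 0, -1])
print(num_dbs, num_tbls, missing)
```

Reading all values from one snapshot narrows the inconsistency window but
does not close it, because the server can still update metrics while the
snapshot is being produced.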

Also try to detect the specific race between updating "dequeued"
and "admitted" that we saw in practice, since the race is still
possible with a smaller window. In that case we retry getting
the metrics.
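
The retry described above can be sketched roughly as follows, assuming a
caller-supplied fetch() callable that returns a dict of summed counters.
The fake snapshots are invented for illustration, and the sketch uses
Python 3 rather than the Python 2 of the test suite.

```python
def get_consistent_metrics(fetch, num_submitted, attempts=5):
    """Call fetch() until the counters agree with each other.

    Consistency check: queries admitted immediately plus queries dequeued
    must equal the total admitted count.
    """
    metrics = None
    for _ in range(attempts):
        metrics = fetch()
        admitted_immediately = (
            num_submitted - metrics["queued"] - metrics["rejected"])
        if admitted_immediately + metrics["dequeued"] == metrics["admitted"]:
            return metrics
    raise AssertionError(
        "No consistent metrics after {0} attempts: {1}".format(attempts, metrics))


# Hypothetical snapshots: the first shows a dequeue that is not yet counted
# as admitted, the second is consistent.
snapshots = [
    {"queued": 4, "rejected": 1, "dequeued": 2, "admitted": 6},
    {"queued": 4, "rejected": 1, "dequeued": 2, "admitted": 7},
]
calls = []


def fake_fetch():
    calls.append(1)
    return snapshots[len(calls) - 1]


result = get_consistent_metrics(fake_fetch, num_submitted=10)
print(result["admitted"], len(calls))
```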

Change-Id: I2f16edbec53e49446c4c37ef5f926eedb5604319
Reviewed-on: http://gerrit.cloudera.org:8080/10330
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/ab2fc5c8
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/ab2fc5c8
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/ab2fc5c8

Branch: refs/heads/master
Commit: ab2fc5c8b894ef7332d9e4307eacc7842d986aae
Parents: 4cf4c90
Author: Tim Armstrong <ta...@cloudera.com>
Authored: Fri May 4 17:17:20 2018 -0700
Committer: Impala Public Jenkins <im...@cloudera.com>
Committed: Tue May 8 22:22:47 2018 +0000

----------------------------------------------------------------------
 bin/start-impala-cluster.py                     |  4 +--
 tests/common/impala_service.py                  | 12 ++++++++-
 .../custom_cluster/test_admission_controller.py | 26 ++++++++++++++++----
 3 files changed, 34 insertions(+), 8 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/ab2fc5c8/bin/start-impala-cluster.py
----------------------------------------------------------------------
diff --git a/bin/start-impala-cluster.py b/bin/start-impala-cluster.py
index e870852..cd926da 100755
--- a/bin/start-impala-cluster.py
+++ b/bin/start-impala-cluster.py
@@ -337,8 +337,8 @@ def wait_for_catalog(impalad, timeout_in_seconds=CLUSTER_WAIT_TIMEOUT_IN_SECONDS
   num_tbls = 0
   while (time() - start_time < timeout_in_seconds):
     try:
-      num_dbs = impalad.service.get_metric_value('catalog.num-databases')
-      num_tbls = impalad.service.get_metric_value('catalog.num-tables')
+      num_dbs, num_tbls = impalad.service.get_metric_values(
+          ['catalog.num-databases', 'catalog.num-tables'])
       client_beeswax = impalad.service.create_beeswax_client()
       client_hs2 = impalad.service.create_hs2_client()
       break

http://git-wip-us.apache.org/repos/asf/impala/blob/ab2fc5c8/tests/common/impala_service.py
----------------------------------------------------------------------
diff --git a/tests/common/impala_service.py b/tests/common/impala_service.py
index 3ad0e84..e86528b 100644
--- a/tests/common/impala_service.py
+++ b/tests/common/impala_service.py
@@ -93,8 +93,18 @@ class BaseImpalaService(object):
 
   def get_metric_value(self, metric_name, default_value=None):
     """Returns the value of the the given metric name from the Impala debug webpage"""
+    return self.get_metric_values([metric_name], [default_value])[0]
+
+  def get_metric_values(self, metric_names, default_values=None):
+    """Returns the value of the given metrics from the Impala debug webpage. If
+    default_values is provided and a metric is not present, the default value
+    is returned instead."""
+    if default_values is None:
+      default_values = [None for m in metric_names]
+    assert len(metric_names) == len(default_values)
     metrics = json.loads(self.read_debug_webpage('jsonmetrics?json'))
-    return metrics.get(metric_name, default_value)
+    return [metrics.get(metric_name, default_value)
+            for metric_name, default_value in zip(metric_names, default_values)]
 
   def wait_for_metric_value(self, metric_name, expected_value, timeout=10, interval=1):
     start_time = time()

http://git-wip-us.apache.org/repos/asf/impala/blob/ab2fc5c8/tests/custom_cluster/test_admission_controller.py
----------------------------------------------------------------------
diff --git a/tests/custom_cluster/test_admission_controller.py b/tests/custom_cluster/test_admission_controller.py
index ff6ca96..d1d9dd8 100644
--- a/tests/custom_cluster/test_admission_controller.py
+++ b/tests/custom_cluster/test_admission_controller.py
@@ -524,11 +524,27 @@ class TestAdmissionControllerStress(TestAdmissionControllerBase):
     metrics = {'admitted': 0, 'queued': 0, 'dequeued': 0, 'rejected' : 0,
         'released': 0, 'timed-out': 0}
     for impalad in self.impalads:
-      for short_name in metrics.keys():
-        metrics[short_name] += impalad.service.get_metric_value(\
-            metric_key(self.pool_name, 'total-%s' % short_name), 0)
+      keys = [metric_key(self.pool_name, 'total-%s' % short_name)
+              for short_name in metrics.keys()]
+      values = impalad.service.get_metric_values(keys, [0] * len(keys))
+      for short_name, value in zip(metrics.keys(), values):
+        metrics[short_name] += value
     return metrics
 
+  def get_consistent_admission_metrics(self, num_submitted):
+    """Same as get_admission_metrics() except retries until it gets consistent metrics for
+    num_submitted queries. See IMPALA-6227 for an example of problems with inconsistent
+    metrics where a dequeued query is reflected in dequeued but not admitted."""
+    ATTEMPTS = 5
+    for i in xrange(ATTEMPTS):
+      metrics = self.get_admission_metrics()
+      admitted_immediately = num_submitted - metrics['queued'] - metrics['rejected']
+      if admitted_immediately + metrics['dequeued'] == metrics['admitted']:
+        return metrics
+      LOG.info("Got inconsistent metrics {0}".format(metrics))
+    assert False, "Could not get consistent metrics for {0} queries after {1} attempts: "\
+        "{2}".format(num_submitted, ATTEMPTS, metrics)
+
   def wait_for_metric_changes(self, metric_names, initial, expected_delta):
     """
     Waits for the sum of metrics in metric_names to change by at least expected_delta.
@@ -844,7 +860,7 @@ class TestAdmissionControllerStress(TestAdmissionControllerBase):
     # Admit queries in waves until all queries are done. A new wave of admission
     # is started by killing some of the running queries.
     while len(self.executing_threads) > 0:
-      curr_metrics = self.get_admission_metrics();
+      curr_metrics = self.get_consistent_admission_metrics(num_queries);
       log_metrics("Main loop, curr_metrics: ", curr_metrics);
       num_to_end = len(self.executing_threads)
       LOG.info("Main loop, will request %s queries to end", num_to_end)
@@ -866,7 +882,7 @@ class TestAdmissionControllerStress(TestAdmissionControllerBase):
       # state or we may find an impalad dequeue more requests after we capture metrics.
       self.wait_for_statestore_updates(10)
 
-    final_metrics = self.get_admission_metrics();
+    final_metrics = self.get_consistent_admission_metrics(num_queries);
     log_metrics("Final metrics: ", final_metrics);
     metric_deltas = compute_metric_deltas(final_metrics, initial_metrics)
     assert metric_deltas['timed-out'] == 0


[2/5] impala git commit: [DOCS] Removed the references to YARN as Impala does not support YARN

[DOCS] Removed the references to YARN as Impala does not support YARN

Change-Id: Ifcea49b5859a2afbbbe99197e7818c30c7ba6d67
Reviewed-on: http://gerrit.cloudera.org:8080/10346
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/4a24618f
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/4a24618f
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/4a24618f

Branch: refs/heads/master
Commit: 4a24618fcd715092341b51970e823b895bad96d7
Parents: d8d8ddf
Author: Alex Rodoni <ar...@cloudera.com>
Authored: Tue May 8 11:49:49 2018 -0700
Committer: Impala Public Jenkins <im...@cloudera.com>
Committed: Tue May 8 20:42:44 2018 +0000

----------------------------------------------------------------------
 docs/topics/impala_mem_limit.xml | 21 +++------------------
 1 file changed, 3 insertions(+), 18 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/4a24618f/docs/topics/impala_mem_limit.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_mem_limit.xml b/docs/topics/impala_mem_limit.xml
index 2abfaf7..5e0ca65 100644
--- a/docs/topics/impala_mem_limit.xml
+++ b/docs/topics/impala_mem_limit.xml
@@ -37,9 +37,9 @@ under the License.
   <conbody>
 
     <p>
-      <indexterm audience="hidden">MEM_LIMIT query option</indexterm>
-      When resource management is not enabled, defines the maximum amount of memory a query can allocate on each node.
-      Therefore, the total memory that can be used by a query is the <codeph>MEM_LIMIT</codeph> times the number of nodes.
+      The MEM_LIMIT query option defines the maximum amount of memory a query
+      can allocate on each node. The total memory that can be used by a query is
+      the <codeph>MEM_LIMIT</codeph> times the number of nodes.
     </p>
 
     <p rev="">
@@ -61,21 +61,6 @@ under the License.
     </p>
 
     <p>
-      When resource management is enabled, the mechanism for this option changes. If set, it overrides the
-      automatic memory estimate from Impala. Impala requests this amount of memory from YARN on each node, and the
-      query does not proceed until that much memory is available. The actual memory used by the query could be
-      lower, since some queries use much less memory than others. With resource management, the
-      <codeph>MEM_LIMIT</codeph> setting acts both as a hard limit on the amount of memory a query can use on any
-      node (enforced by YARN) and a guarantee that that much memory will be available on each node while the query
-      is being executed. When resource management is enabled but no <codeph>MEM_LIMIT</codeph> setting is
-      specified, Impala estimates the amount of memory needed on each node for each query, requests that much
-      memory from YARN before starting the query, and then internally sets the <codeph>MEM_LIMIT</codeph> on each
-      node to the requested amount of memory during the query. Thus, if the query takes more memory than was
-      originally estimated, Impala detects that the <codeph>MEM_LIMIT</codeph> is exceeded and cancels the query
-      itself.
-    </p>
-
-    <p>
       <b>Type:</b> numeric
     </p>
 


[5/5] impala git commit: IMPALA-6974: Use CMAKE_POSITION_INDEPENDENT_CODE in backend

IMPALA-6974: Use CMAKE_POSITION_INDEPENDENT_CODE in backend

Compilation of individual C++ files is only slightly
different between static and shared compilation. First,
CMake adds -D${LIBRARY_NAME}_EXPORTS to each shared
compilation. Second, CMake sets
CMAKE_POSITION_INDEPENDENT_CODE, which adds an
-fPIC/-fPIE flag automatically. The extra define
is not used by our code, so preprocessing results in
identical code. However, we currently add a global -fPIC
to every compilation, whether static or shared. As a
result, shared compilation gets a second -fPIC flag where
static compilation has only one. This prevents a ccache
hit, even after preprocessing.

Switching a global -fPIC to CMAKE_POSITION_INDEPENDENT_CODE
eliminates the difference between shared and static
compilation (apart from the added define). This allows
a ccache hit after preprocessing.

There is a slight difference in some of the compile
commands. CMAKE_POSITION_INDEPENDENT_CODE will add
an -fPIC or a -fPIE depending on whether the C++ file
is going to be an executable. For example,
daemon-main.cc gets -fPIE whereas hdfs-scan-node.cc
gets -fPIC. Previously, everything had an -fPIC.

This saves about an hour on all-build-options-ub1604
due to a higher ccache hit rate.

Before:
cache hit (direct)                  1523
cache hit (preprocessed)              61
cache miss                         12690

After:
cache hit (direct)                  1513
cache hit (preprocessed)            5575
cache miss                          7186
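
As a rough worked check of the counts above, the overall hit rate can be
computed from the three counters. This assumes the listed counters are the
only lookup outcomes that matter; the hit_rate helper is illustrative and
not part of ccache.

```python
# Counters copied from the Before/After listings above.
before = {"direct": 1523, "preprocessed": 61, "miss": 12690}
after = {"direct": 1513, "preprocessed": 5575, "miss": 7186}


def hit_rate(stats):
    """Fraction of compilations served from the cache (direct + preprocessed)."""
    hits = stats["direct"] + stats["preprocessed"]
    total = hits + stats["miss"]
    return hits / total


for label, stats in (("Before", before), ("After", after)):
    print("{0}: {1:.1f}% hit rate over {2} compilations".format(
        label, 100 * hit_rate(stats), sum(stats.values())))
```

Both runs cover the same number of compilations (14274), so the change
shows up almost entirely as misses turning into preprocessed hits.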

Change-Id: Id37bb5afa6a9b7909bb4efe1390a67f7d1469544
Reviewed-on: http://gerrit.cloudera.org:8080/10267
Reviewed-by: Dan Hecht <dh...@cloudera.com>
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/96c9dac2
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/96c9dac2
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/96c9dac2

Branch: refs/heads/master
Commit: 96c9dac287f5b1db305f4ce9b77a92a4f5f68eff
Parents: ab2fc5c
Author: Joe McDonnell <jo...@cloudera.com>
Authored: Tue May 1 11:51:21 2018 -0700
Committer: Impala Public Jenkins <im...@cloudera.com>
Committed: Tue May 8 22:23:13 2018 +0000

----------------------------------------------------------------------
 be/CMakeLists.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/96c9dac2/be/CMakeLists.txt
----------------------------------------------------------------------
diff --git a/be/CMakeLists.txt b/be/CMakeLists.txt
index 8e4f8bd..cc5a597 100644
--- a/be/CMakeLists.txt
+++ b/be/CMakeLists.txt
@@ -311,7 +311,7 @@ set(CLANG_INCLUDE_FLAGS
 )
 
 # allow linking of static libs into dynamic lib
-add_definitions(-fPIC)
+set(CMAKE_POSITION_INDEPENDENT_CODE ON)
 
 # set compile output directory
 if ("${CMAKE_BUILD_TYPE}" STREQUAL "DEBUG" OR