Posted to commits@impala.apache.org by bh...@apache.org on 2019/02/15 04:59:42 UTC

[impala] branch master updated (55b9c89 -> 5ed6c66)

This is an automated email from the ASF dual-hosted git repository.

bharathv pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git.


    from 55b9c89  IMPALA-8187: UDF samples hide symbols by default
     new 7cee01d  IMPALA-8194: wait longer to detect JVM pause in TestPauseMonitor.
     new f8c9ef4  Update toolchain to support ubuntu 18.04
     new 345d012  IMPALA-8195: [DOCS] Impala supports Cartesian joins
     new 6a8bc7f  IMPALA-8189: Disable flaky scanner test on S3
     new f0a47ab  IMPALA-8199: Fix stress test: 'No module named RuntimeProfile.ttypes'
     new 5ed6c66  IMPALA-7961: Avoid adding unmodified objects to DDL response

The 6 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 bin/bootstrap_toolchain.py                         |  3 +-
 bin/impala-config.sh                               | 14 ++--
 docs/topics/impala_joins.xml                       | 31 ++-------
 .../impala/catalog/CatalogServiceCatalog.java      | 11 ++-
 .../org/apache/impala/catalog/FeCatalogUtils.java  | 15 ++++
 .../org/apache/impala/catalog/TopicUpdateLog.java  | 16 ++---
 .../apache/impala/service/CatalogOpExecutor.java   | 81 +++++++++++++++++-----
 tests/custom_cluster/test_pause_monitor.py         |  2 +-
 tests/query_test/test_scanners.py                  |  1 +
 tests/util/parse_util.py                           |  4 +-
 10 files changed, 111 insertions(+), 67 deletions(-)


[impala] 04/06: IMPALA-8189: Disable flaky scanner test on S3

Posted by bh...@apache.org.

bharathv pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 6a8bc7f742c6ee1085d8be5ac374bcf6c7bfa10f
Author: poojanilangekar <po...@cloudera.com>
AuthorDate: Wed Feb 13 14:53:49 2019 -0800

    IMPALA-8189: Disable flaky scanner test on S3
    
    TestParquet::test_resolution_by_name in test_scanners.py is flaky
    due to eventual consistency semantics of S3. It makes sense to
    disable this test temporarily since it is not S3 related. In the
    long run, the eventual consistency issue would have to be fixed
    either by using S3Guard or by modifying the S3Client.
    
    Change-Id: I0771db7c72952f8889a1979c94b02b5f16e4e0ab
    Reviewed-on: http://gerrit.cloudera.org:8080/12478
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
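
The skip marker added in the diff below can be pictured with a small standard-library sketch. This is not how SkipIfS3.eventually_consistent is implemented in the Impala test framework. The helper names skip_if_s3 and run_example, and the idea of passing the target filesystem as a plain string, are assumptions made for illustration only.

```python
import io
import unittest


def skip_if_s3(target_filesystem):
    """Return a decorator that skips a test when the target filesystem is S3.

    Hypothetical stand-in for a marker such as SkipIfS3.eventually_consistent.
    """
    return unittest.skipIf(
        target_filesystem == "s3",
        "flaky on S3 because of eventual consistency")


def run_example(target_filesystem):
    """Run a one-test suite and return (tests_run, tests_skipped)."""

    class ExampleTest(unittest.TestCase):
        @skip_if_s3(target_filesystem)
        def test_resolution_by_name(self):
            self.assertTrue(True)

    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest)
    # Send the runner's report to an in-memory stream to keep the output quiet.
    result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
    return result.testsRun, len(result.skipped)
```

Calling run_example("s3") reports the test as skipped, while any other value lets it run, which mirrors the intent of disabling the test only on S3.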
---
 tests/query_test/test_scanners.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/query_test/test_scanners.py b/tests/query_test/test_scanners.py
index 0562201..ae9c9ab 100644
--- a/tests/query_test/test_scanners.py
+++ b/tests/query_test/test_scanners.py
@@ -698,6 +698,7 @@ class TestParquet(ImpalaTestSuite):
     assert c_schema_elt.converted_type == ConvertedType.UTF8
     assert d_schema_elt.converted_type == None
 
+  @SkipIfS3.eventually_consistent
   def test_resolution_by_name(self, vector, unique_database):
     self.run_test_case('QueryTest/parquet-resolution-by-name', vector,
                        use_db=unique_database)


[impala] 03/06: IMPALA-8195: [DOCS] Impala supports Cartesian joins

Posted by bh...@apache.org.

bharathv pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 345d012fff81b0d2c261ab15ec29b9e1ead66192
Author: Alex Rodoni <ar...@cloudera.com>
AuthorDate: Thu Feb 14 12:43:02 2019 -0800

    IMPALA-8195: [DOCS] Impala supports Cartesian joins
    
    - Removed the note that Impala does not support Cartesian joins.
    
    Change-Id: I8734a4c2cb63b1a3229660e01d45a578a42efc29
    Reviewed-on: http://gerrit.cloudera.org:8080/12487
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
    Reviewed-by: Tim Armstrong <ta...@cloudera.com>
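
For readers unfamiliar with the term, a Cartesian join pairs every row of one table with every row of the other. The following is a minimal Python sketch of that semantics, not Impala code; the tables t1 and t2 and the helper cross_join are made up for illustration.

```python
from itertools import product


def cross_join(left_rows, right_rows):
    """Return every combination of a left row with a right row."""
    return [left + right for left, right in product(left_rows, right_rows)]


# Two tiny hypothetical tables, each row represented as a tuple.
t1 = [(1, "a"), (2, "b")]
t2 = [("x",), ("y",), ("z",)]

# The result has len(t1) * len(t2) rows.
rows = cross_join(t1, t2)

# A WHERE-style filter can still be applied to the combined rows afterwards.
filtered = [row for row in rows if row[0] == 2]
```

The row count grows as the product of the two table sizes, which is why such joins can produce very large result sets.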
---
 docs/topics/impala_joins.xml | 31 +++++--------------------------
 1 file changed, 5 insertions(+), 26 deletions(-)

diff --git a/docs/topics/impala_joins.xml b/docs/topics/impala_joins.xml
index ef0a67a..2c8cc94 100644
--- a/docs/topics/impala_joins.xml
+++ b/docs/topics/impala_joins.xml
@@ -34,13 +34,11 @@ under the License.
 
   <conbody>
 
-    <p>
-      <indexterm audience="hidden">joins</indexterm>
-      A join query is a <codeph>SELECT</codeph> statement that combines data from two or more tables,
-      and returns a result set containing items from some or all of those tables. It is a way to
-      cross-reference and correlate related data that is organized into multiple tables, typically
-      using identifiers that are repeated in each of the joined tables.
-    </p>
+    <p> A join query is a <codeph>SELECT</codeph> statement that combines data
+      from two or more tables, and returns a result set containing items from
+      some or all of those tables. It is a way to cross-reference and correlate
+      related data that is organized into multiple tables, typically using
+      identifiers that are repeated in each of the joined tables. </p>
 
     <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
 
@@ -121,25 +119,6 @@ SELECT t1.c1, t2.c2 FROM <b>t1 JOIN t2</b>
 SELECT lhs.id, rhs.parent, lhs.c1, rhs.c2 FROM tree_data lhs, tree_data rhs WHERE lhs.id = rhs.parent;</codeblock>
 
     <p>
-      <b>Cartesian joins:</b>
-    </p>
-
-    <p>
-      To avoid producing huge result sets by mistake, Impala does not allow Cartesian joins of the form:
-<codeblock>SELECT ... FROM t1 JOIN t2;
-SELECT ... FROM t1, t2;</codeblock>
-      If you intend to join the tables based on common values, add <codeph>ON</codeph> or <codeph>WHERE</codeph>
-      clauses to compare columns across the tables. If you truly intend to do a Cartesian join, use the
-      <codeph>CROSS JOIN</codeph> keyword as the join operator. The <codeph>CROSS JOIN</codeph> form does not use
-      any <codeph>ON</codeph> clause, because it produces a result set with all combinations of rows from the
-      left-hand and right-hand tables. The result set can still be filtered by subsequent <codeph>WHERE</codeph>
-      clauses. For example:
-    </p>
-
-<codeblock>SELECT ... FROM t1 CROSS JOIN t2;
-SELECT ... FROM t1 CROSS JOIN t2 WHERE <varname>tests_on_non_join_columns</varname>;</codeblock>
-
-    <p>
       <b>Inner and outer joins:</b>
     </p>
 


[impala] 01/06: IMPALA-8194: wait longer to detect JVM pause in TestPauseMonitor.

Posted by bh...@apache.org.

bharathv pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 7cee01d1ba7ab6018bf6d7e4d90f4758fd458ce0
Author: Andrew Sherman <as...@cloudera.com>
AuthorDate: Tue Feb 12 16:17:13 2019 -0800

    IMPALA-8194: wait longer to detect JVM pause in TestPauseMonitor.
    
    The test 'test_jvm_pause_monitor_logs_entries' stops and starts an
    impalad, and confirms that the JVM pause monitor detects the pause
    by looking for a specific message in the log. In a test run the test
    failed to find the correct message after sleeping for 1.2 seconds.
    Because the test notes the last message that it sees in the log, we can
    observe that the test would have found the correct message if it had
    waited for just a few more milliseconds.
    
    This change increases the time that the test waits to 2 seconds.
    
    TESTING:
     Ran end-to-end tests cleanly and checked that
     test_jvm_pause_monitor_logs_entries ran OK.
    
    Change-Id: I735c0c0ecfd3a9099c9cef332c5e79854bec7b8d
    Reviewed-on: http://gerrit.cloudera.org:8080/12475
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
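
The commit itself only lengthens the fixed sleep from 1.2 to 2 seconds. A different technique, which this commit does not use, is to poll for the condition until a deadline, so that the test waits no longer than necessary. A minimal sketch follows, assuming a hypothetical helper named wait_for and a caller-supplied condition function.

```python
import time


def wait_for(condition, timeout_s=2.0, interval_s=0.05):
    """Poll condition() until it returns True or timeout_s elapses.

    Returns True if the condition was met before the deadline, else False.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A test could pass a function that checks the log for the expected message as the condition, and assert on the return value.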
---
 tests/custom_cluster/test_pause_monitor.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/custom_cluster/test_pause_monitor.py b/tests/custom_cluster/test_pause_monitor.py
index 166316f..bf47190 100644
--- a/tests/custom_cluster/test_pause_monitor.py
+++ b/tests/custom_cluster/test_pause_monitor.py
@@ -34,7 +34,7 @@ class TestPauseMonitor(CustomClusterTestSuite):
     time.sleep(5)
     impalad.kill(signal.SIGCONT)
     # Wait for over a second for the cache metrics to expire.
-    time.sleep(1.2)
+    time.sleep(2)
     # Check that the pause is detected.
     self.assert_impalad_log_contains('INFO', "Detected pause in JVM or host machine")
     # Check that the metrics we have for this updated as well


[impala] 06/06: IMPALA-7961: Avoid adding unmodified objects to DDL response

Posted by bh...@apache.org.

bharathv pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 5ed6c665d190dbe5303e241afbc50e0eacb0a6af
Author: Bharath Vissapragada <bh...@cloudera.com>
AuthorDate: Sun Feb 10 17:03:19 2019 -0800

    IMPALA-7961: Avoid adding unmodified objects to DDL response
    
    When a DDL is processed, we typically add the affected (added/removed)
    objects to the response TCatalogUpdateResult struct. This response
    is processed on the coordinator and the changes are applied locally.
    When SYNC_DDL is enabled, the Catalog server also includes a topic
    version number that should include all the affected objects so that the
    coordinator can wait for that minimum topic version to be applied on
    all other coordinators before returning the control back to the user.
    This covering topic version is calculated by looking at the topic
    update log, which contains all the in-flight updates (and to an extent
    past updates) that are periodically GC'ed.
    
    Bug: In certain cases like CREATE TABLE IF NOT EXISTS, we could end up
    adding objects to the DDL response which haven't been modified in a
    while (> TOPIC_UPDATE_LOG_GC_FREQUENCY) and hence could potentially have
    been GC'ed from the TopicUpdateLog. This means that the Catalog server
    wouldn't be able to find a covering topic update version and eventually
    gives up, throwing an error as described in the JIRA.
    
    Fix: Bumps the version of any object that already exists when IF NOT EXISTS
    is used in conjunction with SYNC_DDL. This makes sure that the object is
    included in the upcoming topic updates and waitForSyncDdlVersion() can find
    a covering topic update that includes this object. This is a hack and could
    cause false-positive invalidations, but definitely better than breaking
    SYNC_DDL semantics.
    
    Also added some additional diagnostic logging that could've simplified
    debugging an issue like this.
    
    Testing: Since this is a racy bug, I could only repro it by forcing
    frequent topic update log GCs along with a specific sequence of
    actions. Couldn't reproduce it with the patch.
    
    Change-Id: If3e914b70ba796c9b224e9dea559b8c40aa25d83
    Reviewed-on: http://gerrit.cloudera.org:8080/12428
    Reviewed-by: Bharath Vissapragada <bh...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
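
The bug and the fix described above can be illustrated with a toy model. This is a simplified Python sketch, not the Java implementation: the class ToyCatalog, its methods, and the explicit GC call are all hypothetical, and the real TopicUpdateLog garbage-collects on its own schedule.

```python
class ToyCatalog:
    """Toy model of catalog versions, topic updates and the topic update log."""

    def __init__(self):
        self.catalog_version = 0
        self.objects = {}                  # key -> catalog version of last change
        self.last_sent_catalog_version = 0
        self.topic_version = 0
        self.topic_update_log = {}         # key -> topic version that carried it

    def bump_version(self, key):
        """Mark an object as changed by giving it a fresh catalog version."""
        self.catalog_version += 1
        self.objects[key] = self.catalog_version
        return self.catalog_version

    def send_topic_update(self):
        """Send all objects changed since the last update and log them."""
        self.topic_version += 1
        for key, version in self.objects.items():
            if version > self.last_sent_catalog_version:
                self.topic_update_log[key] = self.topic_version
        self.last_sent_catalog_version = self.catalog_version
        return self.topic_version

    def gc_topic_update_log(self, oldest_topic_version):
        """Drop log entries with topic versions <= oldest_topic_version."""
        self.topic_update_log = {
            key: version for key, version in self.topic_update_log.items()
            if version > oldest_topic_version}

    def covering_topic_version(self, keys):
        """Return a topic version covering all keys, or None if any is missing."""
        versions = [self.topic_update_log.get(key) for key in keys]
        if any(version is None for version in versions):
            return None
        return max(versions, default=None)
```

In this model, an object whose log entry has been garbage-collected has no covering topic version until bump_version() is called on it and the next topic update is sent. That is the effect the fix achieves for an already existing table under SYNC_DDL.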
---
 .../impala/catalog/CatalogServiceCatalog.java      | 11 ++-
 .../org/apache/impala/catalog/FeCatalogUtils.java  | 15 ++++
 .../org/apache/impala/catalog/TopicUpdateLog.java  | 16 ++---
 .../apache/impala/service/CatalogOpExecutor.java   | 81 +++++++++++++++++-----
 4 files changed, 92 insertions(+), 31 deletions(-)

diff --git a/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java b/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
index 274ca35..0904714 100644
--- a/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
+++ b/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
@@ -80,7 +80,7 @@ import org.apache.impala.thrift.TUpdateTableUsageRequest;
 import org.apache.impala.util.FunctionUtils;
 import org.apache.impala.util.PatternMatcher;
 import org.apache.impala.util.SentryProxy;
-import org.apache.log4j.Logger;
+import org.slf4j.Logger;
 import org.apache.thrift.TException;
 import org.apache.thrift.TSerializer;
 import org.apache.thrift.protocol.TBinaryProtocol;
@@ -91,6 +91,7 @@ import com.google.common.base.Preconditions;
 import com.google.common.collect.ImmutableList;
 import com.google.common.collect.Lists;
 import com.google.common.collect.Sets;
+import org.slf4j.LoggerFactory;
 
 
 /**
@@ -171,7 +172,7 @@ import com.google.common.collect.Sets;
  * loading thread pool.
  */
 public class CatalogServiceCatalog extends Catalog {
-  public static final Logger LOG = Logger.getLogger(CatalogServiceCatalog.class);
+  public static final Logger LOG = LoggerFactory.getLogger(CatalogServiceCatalog.class);
 
   private static final int INITIAL_META_STORE_CLIENT_POOL_SIZE = 10;
   private static final int MAX_NUM_SKIPPED_TOPIC_UPDATES = 2;
@@ -1279,7 +1280,7 @@ public class CatalogServiceCatalog extends Catalog {
         tableLoadingMgr_.backgroundLoad(tblName);
       }
     } catch (Exception e) {
-      LOG.error(e);
+      LOG.error("Error initializing Catalog", e);
       throw new CatalogException("Error initializing Catalog. Catalog may be empty.", e);
     } finally {
       versionLock_.writeLock().unlock();
@@ -2237,6 +2238,10 @@ public class CatalogServiceCatalog extends Catalog {
         if (lastSentTopicUpdate != currentTopicUpdate) {
           ++numAttempts;
           if (numAttempts > maxNumAttempts) {
+            LOG.error(String.format("Couldn't retrieve the covering topic version for "
+                + "catalog objects. Updated objects: %s, deleted objects: %s",
+                FeCatalogUtils.debugString(result.updated_catalog_objects),
+                FeCatalogUtils.debugString(result.removed_catalog_objects)));
             throw new CatalogException("Couldn't retrieve the catalog topic version " +
                 "for the SYNC_DDL operation after " + maxNumAttempts + " attempts." +
                 "The operation has been successfully executed but its effects may have " +
diff --git a/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java b/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
index d10dadc..7e55d73 100644
--- a/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
+++ b/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
@@ -40,6 +40,7 @@ import org.apache.impala.catalog.local.CatalogdMetaProvider;
 import org.apache.impala.catalog.local.LocalCatalog;
 import org.apache.impala.catalog.local.MetaProvider;
 import org.apache.impala.service.BackendConfig;
+import org.apache.impala.thrift.TCatalogObject;
 import org.apache.impala.thrift.TColumnDescriptor;
 import org.apache.impala.thrift.TGetCatalogMetricsResult;
 import org.apache.impala.thrift.THdfsPartition;
@@ -387,4 +388,18 @@ public abstract class FeCatalogUtils {
   }
 
 
+  /**
+   * Returns a debug string for a given list of TCatalogObjects. Includes the unique key
+   * and version number for each object.
+   */
+  public static String debugString(List<TCatalogObject> objects) {
+    if (objects == null || objects.size() == 0) return "[]";
+    List<String> catalogObjs = new ArrayList<>();
+    for (TCatalogObject object: objects) {
+      catalogObjs.add(String.format("%s version: %d",
+          Catalog.toCatalogObjectKey(object), object.catalog_version));
+    }
+    return "[" + Joiner.on(",").join(catalogObjs) + "]";
+  }
+
 }
diff --git a/fe/src/main/java/org/apache/impala/catalog/TopicUpdateLog.java b/fe/src/main/java/org/apache/impala/catalog/TopicUpdateLog.java
index 779d8f7..be80b3b 100644
--- a/fe/src/main/java/org/apache/impala/catalog/TopicUpdateLog.java
+++ b/fe/src/main/java/org/apache/impala/catalog/TopicUpdateLog.java
@@ -20,7 +20,8 @@ package org.apache.impala.catalog;
 import java.util.Map;
 import java.util.concurrent.ConcurrentHashMap;
 
-import org.apache.log4j.Logger;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 import com.google.common.base.Preconditions;
 import com.google.common.base.Strings;
@@ -37,7 +38,7 @@ import com.google.common.base.Strings;
 // by the catalog for at least TOPIC_UPDATE_LOG_GC_FREQUENCY updates to be removed from
 // the log.
 public class TopicUpdateLog {
-  private static final Logger LOG = Logger.getLogger(TopicUpdateLog.class);
+  private static final Logger LOG = LoggerFactory.getLogger(TopicUpdateLog.class);
   // Frequency at which the entries of the topic update log are garbage collected.
   // An entry may survive for (2 * TOPIC_UPDATE_LOG_GC_FREQUENCY) - 1 topic updates.
   private final static int TOPIC_UPDATE_LOG_GC_FREQUENCY = 1000;
@@ -104,9 +105,8 @@ public class TopicUpdateLog {
       return;
     }
     if (numTopicUpdatesToGc_ == 0) {
-      if (LOG.isTraceEnabled()) {
-        LOG.trace("Topic update log GC started.");
-      }
+      LOG.info("Topic update log GC started. GC-ing topics with versions " +
+          "<= {}", oldestTopicUpdateToGc_);
       Preconditions.checkState(oldestTopicUpdateToGc_ > 0);
       int numEntriesRemoved = 0;
       for (Map.Entry<String, Entry> entry:
@@ -120,10 +120,8 @@ public class TopicUpdateLog {
       }
       numTopicUpdatesToGc_ = TOPIC_UPDATE_LOG_GC_FREQUENCY;
       oldestTopicUpdateToGc_ = lastTopicUpdateVersion;
-      if (LOG.isTraceEnabled()) {
-        LOG.trace("Topic update log GC finished. Removed " + numEntriesRemoved +
-            " entries.");
-      }
+      LOG.info("Topic update log GC finished. Removed {} entries.",
+          numEntriesRemoved);
     } else {
       --numTopicUpdatesToGc_;
     }
diff --git a/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java b/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
index 137e682..19021b0 100644
--- a/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
+++ b/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
@@ -169,7 +169,7 @@ import org.apache.impala.util.CompressionUtil;
 import org.apache.impala.util.FunctionUtils;
 import org.apache.impala.util.HdfsCachingUtil;
 import org.apache.impala.util.MetaStoreUtil;
-import org.apache.log4j.Logger;
+import org.slf4j.Logger;
 import org.apache.thrift.TException;
 
 import com.codahale.metrics.Timer;
@@ -179,6 +179,7 @@ import com.google.common.base.Preconditions;
 import com.google.common.collect.Lists;
 import com.google.common.collect.Maps;
 import com.google.common.collect.Sets;
+import org.slf4j.LoggerFactory;
 
 /**
  * Class used to execute Catalog Operations, including DDL and refresh/invalidate
@@ -249,7 +250,7 @@ import com.google.common.collect.Sets;
  * metastore out of this class.
  */
 public class CatalogOpExecutor {
-  private static final Logger LOG = Logger.getLogger(CatalogOpExecutor.class);
+  private static final Logger LOG = LoggerFactory.getLogger(CatalogOpExecutor.class);
   // Format string for exceptions returned by Hive Metastore RPCs.
   private final static String HMS_RPC_ERROR_FORMAT_STR =
       "Error making '%s' RPC to Hive Metastore: ";
@@ -280,6 +281,7 @@ public class CatalogOpExecutor {
       requestingUser = new User(ddlRequest.getHeader().getRequesting_user());
     }
 
+    boolean syncDdl = ddlRequest.isSync_ddl();
     switch (ddlRequest.ddl_type) {
       case ALTER_TABLE:
         alterTable(ddlRequest.getAlter_table_params(), response);
@@ -291,14 +293,14 @@ public class CatalogOpExecutor {
         createDatabase(ddlRequest.getCreate_db_params(), response);
         break;
       case CREATE_TABLE_AS_SELECT:
-        response.setNew_table_created(
-            createTable(ddlRequest.getCreate_table_params(), response));
+        response.setNew_table_created(createTable(
+            ddlRequest.getCreate_table_params(), response, syncDdl));
         break;
       case CREATE_TABLE:
-        createTable(ddlRequest.getCreate_table_params(), response);
+        createTable(ddlRequest.getCreate_table_params(), response, syncDdl);
         break;
       case CREATE_TABLE_LIKE:
-        createTableLike(ddlRequest.getCreate_table_like_params(), response);
+        createTableLike(ddlRequest.getCreate_table_like_params(), response, syncDdl);
         break;
       case CREATE_VIEW:
         createView(ddlRequest.getCreate_view_params(), response);
@@ -363,7 +365,7 @@ public class CatalogOpExecutor {
     // operation. The version of this catalog update is returned to the requesting
     // impalad which will wait until this catalog update has been broadcast to all the
     // coordinators.
-    if (ddlRequest.isSync_ddl()) {
+    if (syncDdl) {
       response.getResult().setVersion(
           catalog_.waitForSyncDdlVersion(response.getResult()));
     }
@@ -1762,9 +1764,12 @@ public class CatalogOpExecutor {
    * lazily load the new metadata on the next access. If this is a managed Kudu table,
    * the table is also created in the Kudu storage engine. Re-throws any HMS or Kudu
    * exceptions encountered during the create.
+   * @param  syncDdl tells if SYNC_DDL option is enabled on this DDL request.
+   * @return true if a new table has been created with the given params, false
+   * otherwise.
    */
-  private boolean createTable(TCreateTableParams params, TDdlExecResponse response)
-      throws ImpalaException {
+  private boolean createTable(TCreateTableParams params, TDdlExecResponse response,
+      boolean syncDdl) throws ImpalaException {
     Preconditions.checkNotNull(params);
     TableName tableName = TableName.fromThrift(params.getTable_name());
     Preconditions.checkState(tableName != null && tableName.isFullyQualified());
@@ -1774,18 +1779,36 @@ public class CatalogOpExecutor {
     Table existingTbl = catalog_.getTableNoThrow(tableName.getDb(), tableName.getTbl());
     if (params.if_not_exists && existingTbl != null) {
       addSummary(response, "Table already exists.");
-      LOG.trace(String.format("Skipping table creation because %s already exists and " +
-          "IF NOT EXISTS was specified.", tableName));
-      existingTbl.getLock().lock();
+      LOG.trace("Skipping table creation because {} already exists and " +
+          "IF NOT EXISTS was specified.", tableName);
+      tryLock(existingTbl);
       try {
-        addTableToCatalogUpdate(existingTbl, response.getResult());
-        return false;
+        if (syncDdl) {
+          // When SYNC_DDL is enabled and the table already exists, we force a version
+          // bump on it so that it is added to the next statestore update. Without this
+          // we could potentially be referring to a table object that has already been
+          // GC'ed from the TopicUpdateLog and waitForSyncDdlVersion() cannot find a
+          // covering topic version (IMPALA-7961).
+          //
+          // This is a conservative hack to not break the SYNC_DDL semantics and could
+          // possibly result in false-positive invalidates on this table. However, that is
+          // better than breaking the SYNC_DDL semantics and the subsequent queries
+          // referring to this table failing with "table not found" errors.
+          long newVersion = catalog_.incrementAndGetCatalogVersion();
+          existingTbl.setCatalogVersion(newVersion);
+          LOG.trace("Table {} version bumped to {} because SYNC_DDL is enabled.",
+              tableName, newVersion);
+        }
+        addTableToCatalogUpdate(existingTbl, response.result);
       } finally {
+        // Release the locks held in tryLock().
+        catalog_.getLock().writeLock().unlock();
         existingTbl.getLock().unlock();
       }
+      return false;
     }
     org.apache.hadoop.hive.metastore.api.Table tbl = createMetaStoreTable(params);
-    LOG.trace(String.format("Creating table %s", tableName));
+    LOG.trace("Creating table {}", tableName);
     if (KuduTable.isKuduTable(tbl)) return createKuduTable(tbl, params, response);
     Preconditions.checkState(params.getColumns().size() > 0,
         "Empty column list given as argument to Catalog.createTable");
@@ -2015,8 +2038,10 @@ public class CatalogOpExecutor {
    * No data is copied as part of this process, it is a metadata only operation. If the
    * creation succeeds, an entry is added to the metadata cache to lazily load the new
    * table's metadata on the next access.
+   * @param  syncDdl tells is SYNC_DDL is enabled for this DDL request.
    */
-  private void createTableLike(TCreateTableLikeParams params, TDdlExecResponse response)
+  private void createTableLike(TCreateTableLikeParams params, TDdlExecResponse response
+      , boolean syncDdl)
       throws ImpalaException {
     Preconditions.checkNotNull(params);
 
@@ -2033,13 +2058,31 @@ public class CatalogOpExecutor {
       addSummary(response, "Table already exists.");
       LOG.trace(String.format("Skipping table creation because %s already exists and " +
           "IF NOT EXISTS was specified.", tblName));
-      existingTbl.getLock().lock();
+      tryLock(existingTbl);
       try {
-        addTableToCatalogUpdate(existingTbl, response.getResult());
-        return;
+        if (syncDdl) {
+          // When SYNC_DDL is enabled and the table already exists, we force a version
+          // bump on it so that it is added to the next statestore update. Without this
+          // we could potentially be referring to a table object that has already been
+          // GC'ed from the TopicUpdateLog and waitForSyncDdlVersion() cannot find a
+          // covering topic version (IMPALA-7961).
+          //
+          // This is a conservative hack to not break the SYNC_DDL semantics and could
+          // possibly result in false-positive invalidates on this table. However, that is
+          // better than breaking the SYNC_DDL semantics and the subsequent queries
+          // referring to this table failing with "table not found" errors.
+          long newVersion = catalog_.incrementAndGetCatalogVersion();
+          existingTbl.setCatalogVersion(newVersion);
+          LOG.trace("Table {} version bumped to {} because SYNC_DDL is enabled.",
+              existingTbl.getFullName(), newVersion);
+        }
+        addTableToCatalogUpdate(existingTbl, response.result);
       } finally {
+        // Release the locks held in tryLock().
+        catalog_.getLock().writeLock().unlock();
         existingTbl.getLock().unlock();
       }
+      return;
     }
     Table srcTable = getExistingTable(srcTblName.getDb(), srcTblName.getTbl());
     org.apache.hadoop.hive.metastore.api.Table tbl =


[impala] 02/06: Update toolchain to support ubuntu 18.04

Posted by bh...@apache.org.

bharathv pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit f8c9ef48419966840b6026a7d81cfa11a3540a17
Author: Hector Acosta <he...@cloudera.com>
AuthorDate: Fri Feb 8 14:50:38 2019 -0800

    Update toolchain to support ubuntu 18.04
    
    Openldap was bumped because it gained openssl 1.1 support, which is what
    ubuntu 18 uses.
    
    Change-Id: Ie25c8cb129c6817a2e116f31853ae64c5a8acfe9
    Reviewed-on: http://gerrit.cloudera.org:8080/12421
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
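
The bin/impala-config.sh hunk below normalizes the distro string (remove spaces, trim minor versions, convert to lowercase) before matching it against the supported list. A Python rendering of that normalization is sketched here for illustration. The function names are hypothetical, and the set contains only the labels visible in the updated case statement.

```python
# Labels copied from the updated case statement in bin/impala-config.sh.
CDH_KUDU_DISTROS = {
    "centos6", "centos7", "debian8", "suselinux12", "suse12",
    "ubuntu16", "ubuntu18",
}


def normalize_distro(distro_version):
    """Remove spaces and newlines, trim minor versions, and lowercase."""
    compact = distro_version.replace(" ", "").replace("\n", "")
    return compact.split(".")[0].lower()


def uses_cdh_kudu(distro_version):
    """Return True if the normalized distro matches one of the listed labels."""
    return normalize_distro(distro_version) in CDH_KUDU_DISTROS
```

With this change "Ubuntu 18.04" normalizes to "ubuntu18", which is the label the commit adds to the list.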
---
 bin/bootstrap_toolchain.py |  3 ++-
 bin/impala-config.sh       | 14 +++++++-------
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/bin/bootstrap_toolchain.py b/bin/bootstrap_toolchain.py
index 286aaaa..72e8096 100755
--- a/bin/bootstrap_toolchain.py
+++ b/bin/bootstrap_toolchain.py
@@ -75,7 +75,8 @@ OS_MAPPING = [
   OsMapping("ubuntu14.04", "ec2-package-ubuntu-14-04", None),
   OsMapping("ubuntu15.04", "ec2-package-ubuntu-14-04", None),
   OsMapping("ubuntu15.10", "ec2-package-ubuntu-14-04", None),
-  OsMapping('ubuntu16.04', "ec2-package-ubuntu-16-04", "ubuntu1604")
+  OsMapping('ubuntu16.04', "ec2-package-ubuntu-16-04", "ubuntu1604"),
+  OsMapping('ubuntu18.04', "ec2-package-ubuntu-18-04", "ubuntu1804")
 ]
 
 class Package(object):
diff --git a/bin/impala-config.sh b/bin/impala-config.sh
index 4301dd7..96d9a6b 100755
--- a/bin/impala-config.sh
+++ b/bin/impala-config.sh
@@ -68,7 +68,7 @@ fi
 # moving to a different build of the toolchain, e.g. when a version is bumped or a
 # compile option is changed. The build id can be found in the output of the toolchain
 # build jobs, it is constructed from the build number and toolchain git hash prefix.
-export IMPALA_TOOLCHAIN_BUILD_ID=241-259ecff082
+export IMPALA_TOOLCHAIN_BUILD_ID=43961c5c-7ece-489c-a6e3-ec4cbe7ef4b5-b23c19a002
 # Versions of toolchain dependencies.
 # -----------------------------------
 export IMPALA_AVRO_VERSION=1.7.4-p4
@@ -93,7 +93,7 @@ export IMPALA_FLATBUFFERS_VERSION=1.6.0
 unset IMPALA_FLATBUFFERS_URL
 export IMPALA_GCC_VERSION=4.9.2
 unset IMPALA_GCC_URL
-export IMPALA_GDB_VERSION=7.9.1
+export IMPALA_GDB_VERSION=7.9.1-p1
 unset IMPALA_GDB_URL
 export IMPALA_GFLAGS_VERSION=2.2.0-p2
 unset IMPALA_GFLAGS_URL
@@ -118,11 +118,11 @@ export IMPALA_LLVM_DEBUG_VERSION=5.0.1-asserts-p1
 unset IMPALA_LLVM_DEBUG_URL
 export IMPALA_LZ4_VERSION=1.7.5
 unset IMPALA_LZ4_URL
-export IMPALA_OPENLDAP_VERSION=2.4.25
+export IMPALA_OPENLDAP_VERSION=2.4.47
 unset IMPALA_OPENLDAP_URL
 export IMPALA_OPENSSL_VERSION=1.0.2l
 unset IMPALA_OPENSSL_URL
-export IMPALA_ORC_VERSION=1.4.3-p2
+export IMPALA_ORC_VERSION=1.4.3-p3
 unset IMPALA_ORC_URL
 export IMPALA_PROTOBUF_VERSION=3.5.1
 unset IMPALA_PROTOBUF_URL
@@ -141,7 +141,7 @@ export IMPALA_TPC_DS_VERSION=2.1.0
 unset IMPALA_TPC_DS_URL
 export IMPALA_TPC_H_VERSION=2.17.0
 unset IMPALA_TPC_H_URL
-export IMPALA_THRIFT_VERSION=0.9.3-p4
+export IMPALA_THRIFT_VERSION=0.9.3-p5
 unset IMPALA_THRIFT_URL
 export IMPALA_THRIFT11_VERSION=0.11.0-p2
 unset IMPALA_THRIFT11_URL
@@ -595,7 +595,7 @@ if [[ -z "${KUDU_IS_SUPPORTED-}" ]]; then
       # Remove spaces, trim minor versions, and convert to lowercase.
       DISTRO_VERSION="$(tr -d ' \n' <<< "$DISTRO_VERSION" | cut -d. -f1 | tr "A-Z" "a-z")"
       case "$DISTRO_VERSION" in
-        centos6 | centos7 | debian8 | suselinux12 | suse12 | ubuntu16 )
+        centos6 | centos7 | debian8 | suselinux12 | suse12 | ubuntu16 | ubuntu18)
           USE_CDH_KUDU=true
           KUDU_IS_SUPPORTED=true;;
         ubuntu14 )
@@ -611,7 +611,7 @@ if $USE_CDH_KUDU; then
   export IMPALA_KUDU_VERSION=${IMPALA_KUDU_VERSION-"1.9.0-cdh6.x-SNAPSHOT"}
   export IMPALA_KUDU_HOME=${CDH_COMPONENTS_HOME}/kudu-$IMPALA_KUDU_VERSION
 else
-  export IMPALA_KUDU_VERSION=${IMPALA_KUDU_VERSION-"4ec2598"}
+  export IMPALA_KUDU_VERSION=${IMPALA_KUDU_VERSION-"5211897"}
   export IMPALA_KUDU_HOME=${IMPALA_TOOLCHAIN}/kudu-$IMPALA_KUDU_VERSION
 fi
 


[impala] 05/06: IMPALA-8199: Fix stress test: 'No module named RuntimeProfile.ttypes'

Posted by bh...@apache.org.

bharathv pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit f0a47ab2ca6e5c19f74c55af67927f446785f23c
Author: Thomas Tauber-Marshall <tm...@cloudera.com>
AuthorDate: Tue Feb 12 21:15:09 2019 -0800

    IMPALA-8199: Fix stress test: 'No module named RuntimeProfile.ttypes'
    
    A recent commit (IMPALA-6964) broke the stress test because it added
    an import of a generated thrift value to a python file that is
    included by the stress test. The stress test is intended to be able to
    be run without doing a full build of Impala, but in this case the
    generated thrift isn't available, leading to an import error.
    
    The solution is to only import the thrift value in the function where
    it is used, which is not called by the stress test.
    
    Testing:
    - Ran the stress test manually without doing a full build and
      confirmed that it works now.
    
    Change-Id: I7a3bd26d743ef6603fabf92f904feb4677001da5
    Reviewed-on: http://gerrit.cloudera.org:8080/12472
    Reviewed-by: Thomas Marshall <th...@cmu.edu>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
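
The function-scoped import described above can be sketched in isolation. In this minimal example the module name generated_thrift_stub and both functions are hypothetical. The point is only that a module-level import fails at import time, while a function-level import fails only if that function is called.

```python
import re


def parse_count(text):
    """Needs only the standard library, so any importer can call it."""
    match = re.search(r"(\d+)", text)
    return int(match.group(1)) if match else None


def build_counter(total):
    """Needs a generated module; the import runs only when this is called."""
    # 'generated_thrift_stub' is a hypothetical module that exists only
    # after a code-generation step. Importing it here rather than at the
    # top of the file keeps the rest of the file usable without it.
    from generated_thrift_stub.ttypes import Counter
    return Counter(total=total)
```

A tool that only needs parse_count() can import this file without the generated code being present. Calling build_counter() without it raises ImportError.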
---
 tests/util/parse_util.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tests/util/parse_util.py b/tests/util/parse_util.py
index 5dbaff0..031cbdb 100644
--- a/tests/util/parse_util.py
+++ b/tests/util/parse_util.py
@@ -17,7 +17,6 @@
 
 import re
 from datetime import datetime
-from RuntimeProfile.ttypes import TSummaryStatsCounter
 
 # IMPALA-6715: Every so often the stress test or the TPC workload directories get
 # changed, and the stress test loses the ability to run the full set of queries. Set
@@ -148,6 +147,9 @@ def get_bytes_summary_stats_counter(counter_name, runtime_profile):
               summary_stats[0].max_value == 8192 and \
               summary_stats[0].total_num_values == 1
   """
+  # This requires the Thrift definitions to be generated. We limit the scope of the import
+  # to allow tools like the stress test to import this file without building Impala.
+  from RuntimeProfile.ttypes import TSummaryStatsCounter
 
   regex_summary_stat = re.compile(r"""\(
     Avg:[^\(]*\((?P<avg>[0-9]+)\)\s;\s # Matches Avg: [?].[?] [?]B (?)