Posted to commits@impala.apache.org by ta...@apache.org on 2018/03/12 22:01:06 UTC

[1/6] impala git commit: IMPALA-6394: Restart HDFS when blocks are under replicated

Repository: impala
Updated Branches:
  refs/heads/master bbf929a1e -> 532a0d7af


IMPALA-6394: Restart HDFS when blocks are under replicated

HDFS sometimes fails to fully replicate all the blocks within 30 seconds
and makes no further progress. This patch restarts HDFS several times
before aborting the data loading.
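
For reference, a rough standalone Python sketch of the same check-and-restart
loop (a hypothetical helper, not part of the patch; it assumes the `hdfs` CLI
is on the PATH and that $IMPALA_HOME points at an Impala checkout, as in the
shell change below):

import os
import re
import subprocess
import sys
import time

MAX_RETRIES = 6

def wait_hdfs_replication():
    for restart_count in range(MAX_RETRIES + 1):
        time.sleep(restart_count * 10)
        # Ask HDFS for a block report of the test warehouse.
        fsck_output = subprocess.check_output(
            ["hdfs", "fsck", "/test-warehouse"], text=True)
        print(fsck_output)
        if re.search(r"Under-replicated blocks:\s*0", fsck_output):
            return  # All blocks are fully replicated; data loading can continue.
        if restart_count == MAX_RETRIES:
            sys.exit("Some HDFS blocks are still under-replicated after "
                     "%d restarts." % MAX_RETRIES)
        # Restart the minicluster HDFS and re-check on the next iteration.
        subprocess.check_call(
            [os.path.join(os.environ["IMPALA_HOME"],
                          "testdata/bin/run-mini-dfs.sh")])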

Change-Id: Iefd4c2fc6c287f054e385de52bdc42b0bdbd7915
Reviewed-on: http://gerrit.cloudera.org:8080/9469
Reviewed-by: Alex Behm <al...@cloudera.com>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/c7a58b8a
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/c7a58b8a
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/c7a58b8a

Branch: refs/heads/master
Commit: c7a58b8a7374b7cb1aacf6024deb861c79fe5a14
Parents: bbf929a
Author: Tianyi Wang <tw...@cloudera.com>
Authored: Fri Mar 2 14:13:49 2018 -0800
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Fri Mar 9 22:54:47 2018 +0000

----------------------------------------------------------------------
 testdata/bin/create-load-data.sh | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/c7a58b8a/testdata/bin/create-load-data.sh
----------------------------------------------------------------------
diff --git a/testdata/bin/create-load-data.sh b/testdata/bin/create-load-data.sh
index 404bdfe..787baca 100755
--- a/testdata/bin/create-load-data.sh
+++ b/testdata/bin/create-load-data.sh
@@ -450,20 +450,26 @@ function copy-and-load-ext-data-source {
 }
 
 function wait-hdfs-replication {
-  FAIL_COUNT=0
-  while [[ "$FAIL_COUNT" -ne "6" ]] ; do
+  MAX_RETRIES=6
+  for ((RESTART_COUNT = 0; RESTART_COUNT <= MAX_RETRIES; ++RESTART_COUNT)); do
+    sleep "$((RESTART_COUNT * 10))"
     FSCK_OUTPUT="$(hdfs fsck /test-warehouse)"
     echo "$FSCK_OUTPUT"
     if grep "Under-replicated blocks:[[:space:]]*0" <<< "$FSCK_OUTPUT"; then
+      # All the blocks are fully-replicated. The data loading can continue.
       return
     fi
-    let FAIL_COUNT="$FAIL_COUNT"+1
-    sleep 5
+    if [[ "$RESTART_COUNT" -eq "$MAX_RETRIES" ]] ; then
+      echo "Some HDFS blocks are still under-replicated after restarting HDFS"\
+          "$MAX_RETRIES times."
+      echo "Some tests cannot pass without fully-replicated blocks (IMPALA-3887)."
+      echo "Failing the data loading."
+      exit 1
+    fi
+    echo "There are under-replicated blocks in HDFS. Attempting to restart HDFS to"\
+        "resolve this issue."
+    ${IMPALA_HOME}/testdata/bin/run-mini-dfs.sh
   done
-  echo "Some HDFS blocks are still under replicated after 30s."
-  echo "Some tests cannot pass without fully replicated blocks (IMPALA-3887)."
-  echo "Failing the data loading."
-  exit 1
 }
 
 # For kerberized clusters, use kerberos


[3/6] impala git commit: IMPALA-6629: Clean up catalog update logging

Posted by ta...@apache.org.
IMPALA-6629: Clean up catalog update logging

IMPALA-5990 introduced some redundant and unclear logging in the process
of assembling and sending catalog updates. This patch removes the
duplication, rewords some logs, and adds a log message when a catalog
update is fully assembled.

Change-Id: Iaa096b8c84304f28b37ac5e6794d688ba0a949a7
Reviewed-on: http://gerrit.cloudera.org:8080/9566
Reviewed-by: Tianyi Wang <tw...@cloudera.com>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/4e12ba6b
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/4e12ba6b
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/4e12ba6b

Branch: refs/heads/master
Commit: 4e12ba6ba563510addad3e2766aa32188b1e5ea9
Parents: e096233
Author: Tianyi Wang <tw...@cloudera.com>
Authored: Thu Mar 8 16:38:57 2018 -0800
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Sat Mar 10 00:42:36 2018 +0000

----------------------------------------------------------------------
 be/src/catalog/catalog-server.cc                | 21 +++++++++++---------
 be/src/catalog/catalog-server.h                 |  4 ++--
 be/src/service/fe-support.cc                    | 10 +++++-----
 .../java/org/apache/impala/catalog/Catalog.java |  4 +++-
 .../impala/catalog/CatalogServiceCatalog.java   |  8 ++------
 .../org/apache/impala/service/FeSupport.java    |  2 +-
 6 files changed, 25 insertions(+), 24 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/4e12ba6b/be/src/catalog/catalog-server.cc
----------------------------------------------------------------------
diff --git a/be/src/catalog/catalog-server.cc b/be/src/catalog/catalog-server.cc
index 2f4bcbe..8a91c25 100644
--- a/be/src/catalog/catalog-server.cc
+++ b/be/src/catalog/catalog-server.cc
@@ -233,16 +233,18 @@ void CatalogServer::UpdateCatalogTopicCallback(
   // to reload the full catalog.
   if (delta.from_version == 0 && catalog_objects_min_version_ != 0) {
     last_sent_catalog_version_ = 0L;
-  } else {
+  } else if (!pending_topic_updates_.empty()) {
     // Process the pending topic update.
-    LOG_EVERY_N(INFO, 300) << "Catalog Version: " << catalog_objects_max_version_
-                           << " Last Catalog Version: " << last_sent_catalog_version_;
-
     subscriber_topic_updates->emplace_back();
     TTopicDelta& update = subscriber_topic_updates->back();
     update.topic_name = IMPALA_CATALOG_TOPIC;
     update.topic_entries = std::move(pending_topic_updates_);
 
+    VLOG(1) << "A catalog update with " << update.topic_entries.size()
+            << " entries is assembled. Catalog version: "
+            << catalog_objects_max_version_ << " Last sent catalog version: "
+            << last_sent_catalog_version_;
+
     // Update the new catalog version and the set of known catalog objects.
     last_sent_catalog_version_ = catalog_objects_max_version_;
   }
@@ -457,8 +459,8 @@ void CatalogServer::TableMetricsUrlCallback(const Webserver::ArgumentMap& args,
   }
 }
 
-bool CatalogServer::AddPendingTopicItem(std::string key, const uint8_t* item_data,
-    uint32_t size, bool deleted) {
+bool CatalogServer::AddPendingTopicItem(std::string key, int64_t version,
+    const uint8_t* item_data, uint32_t size, bool deleted) {
   pending_topic_updates_.emplace_back();
   TTopicItem& item = pending_topic_updates_.back();
   if (FLAGS_compact_catalog_topic) {
@@ -474,8 +476,9 @@ bool CatalogServer::AddPendingTopicItem(std::string key, const uint8_t* item_dat
   }
   item.key = std::move(key);
   item.deleted = deleted;
-  VLOG(1) << "Publishing " << (deleted ? "deletion: " : "update: ") << item.key <<
-      " original size: " << size << (FLAGS_compact_catalog_topic ?
-      Substitute(" compressed size: $0", item.value.size()) : string());
+  VLOG(1) << "Collected " << (deleted ? "deletion: " : "update: ") << item.key
+          << ", version=" << version << ", original size=" << size
+          << (FLAGS_compact_catalog_topic ?
+              Substitute(", compressed size=$0", item.value.size()) : string());
   return true;
 }

http://git-wip-us.apache.org/repos/asf/impala/blob/4e12ba6b/be/src/catalog/catalog-server.h
----------------------------------------------------------------------
diff --git a/be/src/catalog/catalog-server.h b/be/src/catalog/catalog-server.h
index a6a0c3f..2fa8ce7 100644
--- a/be/src/catalog/catalog-server.h
+++ b/be/src/catalog/catalog-server.h
@@ -76,8 +76,8 @@ class CatalogServer {
 
   /// Add a topic item to pending_topic_updates_. Caller must hold catalog_lock_.
   /// The return value is true if the operation succeeds and false otherwise.
-  bool AddPendingTopicItem(std::string key, const uint8_t* item_data, uint32_t size,
-      bool deleted);
+  bool AddPendingTopicItem(std::string key, int64_t version, const uint8_t* item_data,
+      uint32_t size, bool deleted);
 
  private:
   /// Thrift API implementation which proxies requests onto this CatalogService.

http://git-wip-us.apache.org/repos/asf/impala/blob/4e12ba6b/be/src/service/fe-support.cc
----------------------------------------------------------------------
diff --git a/be/src/service/fe-support.cc b/be/src/service/fe-support.cc
index 2d48d73..9d59883 100644
--- a/be/src/service/fe-support.cc
+++ b/be/src/service/fe-support.cc
@@ -428,7 +428,7 @@ Java_org_apache_impala_service_FeSupport_NativeLookupSymbol(
 extern "C"
 JNIEXPORT jboolean JNICALL
 Java_org_apache_impala_service_FeSupport_NativeAddPendingTopicItem(JNIEnv* env,
-    jclass caller_class, jlong native_catalog_server_ptr, jstring key,
+    jclass caller_class, jlong native_catalog_server_ptr, jstring key, jlong version,
     jbyteArray serialized_object, jboolean deleted) {
   std::string key_string;
   {
@@ -442,9 +442,9 @@ Java_org_apache_impala_service_FeSupport_NativeAddPendingTopicItem(JNIEnv* env,
   if (!JniScopedArrayCritical::Create(env, serialized_object, &obj_buf)) {
     return static_cast<jboolean>(false);
   }
-  reinterpret_cast<CatalogServer*>(native_catalog_server_ptr)->AddPendingTopicItem(
-      std::move(key_string), obj_buf.get(), static_cast<uint32_t>(obj_buf.size()),
-      deleted);
+  reinterpret_cast<CatalogServer*>(native_catalog_server_ptr)->
+      AddPendingTopicItem(std::move(key_string), version, obj_buf.get(),
+      static_cast<uint32_t>(obj_buf.size()), deleted);
   return static_cast<jboolean>(true);
 }
 
@@ -575,7 +575,7 @@ static JNINativeMethod native_methods[] = {
       (void*)::Java_org_apache_impala_service_FeSupport_NativeParseQueryOptions
   },
   {
-      (char*)"NativeAddPendingTopicItem", (char*)"(JLjava/lang/String;[BZ)Z",
+      (char*)"NativeAddPendingTopicItem", (char*)"(JLjava/lang/String;J[BZ)Z",
       (void*)::Java_org_apache_impala_service_FeSupport_NativeAddPendingTopicItem
   },
   {

http://git-wip-us.apache.org/repos/asf/impala/blob/4e12ba6b/fe/src/main/java/org/apache/impala/catalog/Catalog.java
----------------------------------------------------------------------
diff --git a/fe/src/main/java/org/apache/impala/catalog/Catalog.java b/fe/src/main/java/org/apache/impala/catalog/Catalog.java
index 0cd1eda..bee5be5 100644
--- a/fe/src/main/java/org/apache/impala/catalog/Catalog.java
+++ b/fe/src/main/java/org/apache/impala/catalog/Catalog.java
@@ -35,6 +35,7 @@ import org.apache.impala.util.PatternMatcher;
 import com.google.common.base.Joiner;
 import com.google.common.base.Preconditions;
 import com.google.common.collect.Lists;
+import org.apache.impala.util.TUniqueIdUtil;
 
 /**
  * Thread safe interface for reading and updating metadata stored in the Hive MetaStore.
@@ -574,7 +575,8 @@ public abstract class Catalog {
       case DATA_SOURCE:
         return "DATA_SOURCE:" + catalogObject.getData_source().getName().toLowerCase();
       case CATALOG:
-        return "CATALOG:" + catalogObject.getCatalog().catalog_service_id;
+        return "CATALOG:" +
+            TUniqueIdUtil.PrintId(catalogObject.getCatalog().catalog_service_id);
       default:
         throw new IllegalStateException(
             "Unsupported catalog object type: " + catalogObject.getType());

http://git-wip-us.apache.org/repos/asf/impala/blob/4e12ba6b/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
----------------------------------------------------------------------
diff --git a/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java b/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
index 7ea821c..5bef242 100644
--- a/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
+++ b/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
@@ -422,12 +422,8 @@ public class CatalogServiceCatalog extends Catalog {
       // TODO: TSerializer.serialize() returns a copy of the internal byte array, which
       // could be elided.
       byte[] data = serializer.serialize(obj);
-      if (LOG.isDebugEnabled()) {
-        LOG.debug("Collected catalog " + (delete ? "deletion: " : "update: ") + key +
-            " version: " + obj.catalog_version);
-      }
-      if (!FeSupport.NativeAddPendingTopicItem(nativeCatalogServerPtr, key, data, delete))
-      {
+      if (!FeSupport.NativeAddPendingTopicItem(nativeCatalogServerPtr, key,
+          obj.catalog_version, data, delete)) {
         LOG.error("NativeAddPendingTopicItem failed in BE. key=" + key + ", delete="
             + delete + ", data_size=" + data.length);
       }

http://git-wip-us.apache.org/repos/asf/impala/blob/4e12ba6b/fe/src/main/java/org/apache/impala/service/FeSupport.java
----------------------------------------------------------------------
diff --git a/fe/src/main/java/org/apache/impala/service/FeSupport.java b/fe/src/main/java/org/apache/impala/service/FeSupport.java
index d4311ff..16fdd22 100644
--- a/fe/src/main/java/org/apache/impala/service/FeSupport.java
+++ b/fe/src/main/java/org/apache/impala/service/FeSupport.java
@@ -86,7 +86,7 @@ public class FeSupport {
   // 'serializationBuffer' is a serialized TCatalogObject.
   // The return value is true if the operation succeeds and false otherwise.
   public native static boolean NativeAddPendingTopicItem(long nativeCatalogServerPtr,
-      String key, byte[] serializationBuffer, boolean deleted);
+      String key, long version, byte[] serializationBuffer, boolean deleted);
 
   // Get a catalog object update from the backend. A pair of isDeletion flag and
   // serialized TCatalogObject is returned.


[2/6] impala git commit: IMPALA-6240: [DOCS] Document PARQUET_ARRAY_RESOLUTION query option

Posted by ta...@apache.org.
IMPALA-6240: [DOCS] Document PARQUET_ARRAY_RESOLUTION query option

Cherry-picks: not for 2.x
Change-Id: I12696b609609ea16c05d8b7e84b2bae0be6d6cb5
Reviewed-on: http://gerrit.cloudera.org:8080/9534
Reviewed-by: Alex Behm <al...@cloudera.com>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/e096233a
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/e096233a
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/e096233a

Branch: refs/heads/master
Commit: e096233a25d65c1078a153630547f9551f1cf15e
Parents: c7a58b8
Author: Alex Rodoni <ar...@cloudera.com>
Authored: Wed Mar 7 12:08:44 2018 -0800
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Fri Mar 9 23:37:57 2018 +0000

----------------------------------------------------------------------
 docs/impala.ditamap                             |   1 +
 docs/impala_keydefs.ditamap                     |   1 +
 docs/topics/impala_parquet_array_resolution.xml | 206 +++++++++++++++++++
 3 files changed, 208 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/e096233a/docs/impala.ditamap
----------------------------------------------------------------------
diff --git a/docs/impala.ditamap b/docs/impala.ditamap
index d9a42d7..0f010a2 100644
--- a/docs/impala.ditamap
+++ b/docs/impala.ditamap
@@ -207,6 +207,7 @@ under the License.
           <topicref rev="2.5.0" href="topics/impala_optimize_partition_key_scans.xml"/>
           <topicref href="topics/impala_parquet_compression_codec.xml"/>
           <topicref rev="2.6.0 IMPALA-2069" href="topics/impala_parquet_annotate_strings_utf8.xml"/>
+          <topicref rev="2.9.0 IMPALA-4725" href="topics/impala_parquet_array_resolution.xml"/>
           <topicref rev="2.6.0 IMPALA-2835" href="topics/impala_parquet_fallback_schema_resolution.xml"/>
           <topicref href="topics/impala_parquet_file_size.xml"/>
           <topicref rev="2.6.0 IMPALA-3286" href="topics/impala_prefetch_mode.xml"/>

http://git-wip-us.apache.org/repos/asf/impala/blob/e096233a/docs/impala_keydefs.ditamap
----------------------------------------------------------------------
diff --git a/docs/impala_keydefs.ditamap b/docs/impala_keydefs.ditamap
index 81881a8..35dd0b8 100644
--- a/docs/impala_keydefs.ditamap
+++ b/docs/impala_keydefs.ditamap
@@ -10798,6 +10798,7 @@ under the License.
   <keydef href="topics/impala_optimize_partition_key_scans.xml" keys="optimize_partition_key_scans"/>
   <keydef href="topics/impala_parquet_compression_codec.xml" keys="parquet_compression_codec"/>
   <keydef href="topics/impala_parquet_annotate_strings_utf8.xml" keys="parquet_annotate_strings_utf8"/>
+  <keydef href="topics/impala_parquet_array_resolution.xml" keys="parquet_array_resolution"/>
   <keydef href="topics/impala_parquet_fallback_schema_resolution.xml" keys="parquet_fallback_schema_resolution"/>
   <keydef href="topics/impala_parquet_file_size.xml" keys="parquet_file_size"/>
   <keydef href="topics/impala_prefetch_mode.xml" keys="prefetch_mode"/>

http://git-wip-us.apache.org/repos/asf/impala/blob/e096233a/docs/topics/impala_parquet_array_resolution.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_parquet_array_resolution.xml b/docs/topics/impala_parquet_array_resolution.xml
new file mode 100644
index 0000000..62c78d2
--- /dev/null
+++ b/docs/topics/impala_parquet_array_resolution.xml
@@ -0,0 +1,206 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="parquet_array_resolution" rev="2.9.0 IMPALA-4725">
+
+  <title>
+    PARQUET_ARRAY_RESOLUTION Query Option (<keyword keyref="impala29"/> or higher only)
+  </title>
+
+  <titlealts audience="PDF">
+
+    <navtitle>PARQUET_ARRAY_RESOLUTION</navtitle>
+
+  </titlealts>
+
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Impala Query Options"/>
+      <data name="Category" value="Parquet"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Data Analysts"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p rev="parquet_array_resolution">
+      The <codeph>PARQUET_ARRAY_RESOLUTION</codeph> query option controls the
+      behavior of the index-based resolution for nested arrays in Parquet.
+    </p>
+
+    <p>
+      In Parquet, you can represent an array using a 2-level or 3-level
+      representation. The modern, standard representation is 3-level. The legacy
+      2-level scheme is supported for compatibility with older Parquet files.
+      However, there is no reliable metadata within Parquet files to indicate
+      which encoding was used. It is even possible to have mixed encodings within
+      the same file if there are multiple arrays. The
+      <codeph>PARQUET_ARRAY_RESOLUTION</codeph> option controls the resolution
+      process that matches every column/field reference from a query to a
+      column in the Parquet file.</p>
+
+    <p>
+      The supported values for the query option are:
+    </p>
+
+    <ul>
+      <li>
+        <codeph>THREE_LEVEL</codeph>: Assumes arrays are encoded with the 3-level
+        representation, and does not attempt the 2-level resolution.
+      </li>
+
+      <li>
+        <codeph>TWO_LEVEL</codeph>: Assumes arrays are encoded with the 2-level
+        representation, and does not attempt the 3-level resolution.
+      </li>
+
+      <li>
+        <codeph>TWO_LEVEL_THEN_THREE_LEVEL</codeph>: First tries to resolve
+        assuming a 2-level representation, and if unsuccessful, tries a 3-level
+        representation.
+      </li>
+    </ul>
+
+    <p>
+      All of the above options resolve arrays encoded with a single level.
+    </p>
+
+    <p>
+      A failure to resolve a column/field reference in a query with a given array
+      resolution policy does not necessarily result in a warning or error returned
+      by the query. A mismatch might be treated like a missing column (returns
+      NULL values), and it is not possible to reliably distinguish the 'bad
+      resolution' and 'legitimately missing column' cases.
+    </p>
+
+    <p>
+      The name-based policy generally does not have the problem of ambiguous
+      array representations. You can select the name-based policy by setting
+      the <codeph>PARQUET_FALLBACK_SCHEMA_RESOLUTION</codeph> query option to
+      <codeph>NAME</codeph>.
+    </p>
+
+    <p>
+      <b>Type:</b> Enum of <codeph>THREE_LEVEL</codeph>, <codeph>TWO_LEVEL</codeph>,
+      <codeph>TWO_LEVEL_THEN_THREE_LEVEL</codeph>
+    </p>
+
+    <p>
+      <b>Default:</b> <codeph>THREE_LEVEL</codeph>
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/added_in_290"/>
+
+    <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+    <p>
+      EXAMPLE A: The following Parquet file schema can be interpreted as either
+      a 2-level or a 3-level representation:
+    </p>
+
+<codeblock>
+ParquetSchemaExampleA {
+  optional group single_element_groups (LIST) {
+    repeated group single_element_group {
+      required int64 count;
+    }
+  }
+}
+</codeblock>
+
+    <p>
+      The following table schema corresponds to a 2-level interpretation:
+    </p>
+
+<codeblock>
+CREATE TABLE t (col1 array&lt;struct&lt;f1: bigint>>) STORED AS PARQUET;
+</codeblock>
+
+    <p>
+      Successful query with a 2-level interpretation:
+    </p>
+
+<codeblock>
+SET PARQUET_ARRAY_RESOLUTION=TWO_LEVEL;
+SELECT ITEM.f1 FROM t.col1;
+</codeblock>
+
+    <p>
+      The following table schema corresponds to a 3-level interpretation:
+    </p>
+
+<codeblock>
+CREATE TABLE t (col1 array&lt;bigint>) STORED AS PARQUET;
+</codeblock>
+
+    <p>
+      Successful query with a 3-level interpretation:
+    </p>
+
+<codeblock>
+SET PARQUET_ARRAY_RESOLUTION=THREE_LEVEL;
+SELECT ITEM FROM t.col1
+</codeblock>
+
+    <p>
+      EXAMPLE B: The following Parquet file schema can only be successfully
+      interpreted as a 2-level representation:
+    </p>
+
+<codeblock>
+ParquetSchemaExampleB {
+  required group list_of_ints (LIST) {
+    repeated int32 list_of_ints_tuple;
+  }
+}
+</codeblock>
+
+    <p>
+      The following table schema corresponds to a 2-level interpretation:
+    </p>
+
+<codeblock>
+CREATE TABLE t (col1 array&lt;int>) STORED AS PARQUET;
+</codeblock>
+
+    <p>
+      Successful query with a 2-level interpretation:
+    </p>
+
+<codeblock>
+SET PARQUET_ARRAY_RESOLUTION=TWO_LEVEL;
+SELECT ITEM FROM t.col1
+</codeblock>
+
+    <p>
+      Unsuccessful query with a 3-level interpretation. The query returns
+      <codeph>NULL</codeph>s as if the column was missing in the file:
+    </p>
+
+<codeblock>
+SET PARQUET_ARRAY_RESOLUTION=THREE_LEVEL;
+SELECT ITEM FROM t.col1
+</codeblock>
+
+  </conbody>
+
+</concept>


[6/6] impala git commit: IMPALA-3040: add logging to test_caching_ddl

Posted by ta...@apache.org.
IMPALA-3040: add logging to test_caching_ddl

We don't currently have enough information to reconstruct why the test
failed, so let's add logging with timestamps to understand which timeout
we're actually hitting.
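
For context on those timeouts, a generic sketch of the stabilization loop that
get_num_cache_requests() implements (wait_until_stable and read_value are
illustrative names, not part of the test suite):

import logging
import time

LOG = logging.getLogger(__name__)

def wait_until_stable(read_value, max_attempts=10, wait_time_in_sec=1):
    """Polls read_value() until two consecutive reads agree, logging progress."""
    previous = None
    for _ in range(max_attempts):
        current = read_value()
        if current == previous:
            break
        LOG.info("%s Waiting to stabilise: previous=%s current=%s",
                 time.time(), previous, current)
        previous = current
        time.sleep(wait_time_in_sec)
    LOG.info("%s Final value: %s", time.time(), previous)
    return previous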

Change-Id: Iabc30445440e0fb358856da407d833f5ae975213
Reviewed-on: http://gerrit.cloudera.org:8080/9579
Reviewed-by: Thomas Tauber-Marshall <tm...@cloudera.com>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/532a0d7a
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/532a0d7a
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/532a0d7a

Branch: refs/heads/master
Commit: 532a0d7afcb56217d2aae45c1c97890d6788251b
Parents: d3c115a
Author: Tim Armstrong <ta...@cloudera.com>
Authored: Mon Mar 12 11:03:31 2018 -0700
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Mon Mar 12 21:53:10 2018 +0000

----------------------------------------------------------------------
 tests/query_test/test_hdfs_caching.py | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/532a0d7a/tests/query_test/test_hdfs_caching.py
----------------------------------------------------------------------
diff --git a/tests/query_test/test_hdfs_caching.py b/tests/query_test/test_hdfs_caching.py
index 6094142..c013ed4 100644
--- a/tests/query_test/test_hdfs_caching.py
+++ b/tests/query_test/test_hdfs_caching.py
@@ -24,7 +24,7 @@ from subprocess import check_call
 
 from tests.common.environ import specific_build_type_timeout
 from tests.common.impala_cluster import ImpalaCluster
-from tests.common.impala_test_suite import ImpalaTestSuite
+from tests.common.impala_test_suite import ImpalaTestSuite, LOG
 from tests.common.skip import SkipIfS3, SkipIfADLS, SkipIfIsilon, SkipIfLocal
 from tests.common.test_dimensions import create_single_exec_option_dimension
 from tests.util.filesystem_utils import get_fs_path
@@ -319,10 +319,14 @@ def get_num_cache_requests():
   max_num_stabilization_attempts = 10
   new_requests = None
   num_requests = None
+  LOG.info("{0} Entered get_num_cache_requests()".format(time.time()))
   while num_stabilization_attempts < max_num_stabilization_attempts:
     new_requests = get_num_cache_requests_util()
     if new_requests == num_requests: break
+    LOG.info("{0} Waiting to stabilise: num_requests={1} new_requests={2}".format(
+        time.time(), num_requests, new_requests))
     num_requests = new_requests
     num_stabilization_attempts = num_stabilization_attempts + 1
     time.sleep(wait_time_in_sec)
+  LOG.info("{0} Final num requests: {1}".format(time.time(), num_requests))
   return num_requests


[5/6] impala git commit: Fix test dimensions in test_errorlog.py

Posted by ta...@apache.org.
Fix test dimensions in test_errorlog.py

Bug: The test dimensions were set up such that
the test ran for all text-format variants
(including compressed text).

The test is file-format independent and slow (>40s),
so we should only run it once.
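
For illustration, a minimal sketch of the dimension setup for a file-format
independent test (TestRunsOnce is a hypothetical class; the helper names are
the ones used in the diff below):

from tests.common.impala_test_suite import ImpalaTestSuite
from tests.common.test_dimensions import (
    create_single_exec_option_dimension,
    create_uncompressed_text_dimension)

class TestRunsOnce(ImpalaTestSuite):
  @classmethod
  def add_test_dimensions(cls):
    super(TestRunsOnce, cls).add_test_dimensions()
    # One exec option combination and only uncompressed text, so the test
    # runs exactly once instead of once per table format.
    cls.ImpalaTestMatrix.add_dimension(create_single_exec_option_dimension())
    cls.ImpalaTestMatrix.add_dimension(
        create_uncompressed_text_dimension(cls.get_workload()))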

Change-Id: Icac90d308337f2cfb51e7de5bd23d410da073a73
Reviewed-on: http://gerrit.cloudera.org:8080/9546
Reviewed-by: Alex Behm <al...@cloudera.com>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/d3c115ae
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/d3c115ae
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/d3c115ae

Branch: refs/heads/master
Commit: d3c115ae9aefac6317a2b9b96745702c43bce804
Parents: 6c264d8
Author: Alex Behm <al...@cloudera.com>
Authored: Wed Mar 7 16:27:32 2018 -0800
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Sat Mar 10 01:33:03 2018 +0000

----------------------------------------------------------------------
 tests/query_test/test_errorlog.py | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/d3c115ae/tests/query_test/test_errorlog.py
----------------------------------------------------------------------
diff --git a/tests/query_test/test_errorlog.py b/tests/query_test/test_errorlog.py
index 0580764..258dba0 100644
--- a/tests/query_test/test_errorlog.py
+++ b/tests/query_test/test_errorlog.py
@@ -20,7 +20,9 @@
 #
 from tests.beeswax.impala_beeswax import ImpalaBeeswaxException
 from tests.common.impala_test_suite import ImpalaTestSuite
-from tests.common.test_dimensions import create_exec_option_dimension
+from tests.common.test_dimensions import (
+    create_single_exec_option_dimension,
+    create_uncompressed_text_dimension)
 from time import sleep
 
 # Test injecting error logs in prepare phase and status::OK(). This tests one of race
@@ -33,10 +35,9 @@ class TestErrorLogs(ImpalaTestSuite):
   @classmethod
   def add_test_dimensions(cls):
     super(TestErrorLogs, cls).add_test_dimensions()
-    cls.ImpalaTestMatrix.add_constraint(lambda v:\
-        v.get_value('table_format').file_format == 'text')
-    cls.ImpalaTestMatrix.add_dimension(create_exec_option_dimension(
-        cluster_sizes=[0], disable_codegen_options=[False], batch_sizes=[0]))
+    cls.ImpalaTestMatrix.add_dimension(create_single_exec_option_dimension())
+    cls.ImpalaTestMatrix.add_dimension(
+        create_uncompressed_text_dimension(cls.get_workload()))
 
   def test_errorlog(self, vector):
     query = 'select count(*) from tpch.lineitem;'


[4/6] impala git commit: IMPALA-6627: [DOCS] Hive incompatibility with serialization.null.format

Posted by ta...@apache.org.
IMPALA-6627: [DOCS] Hive incompatibility with serialization.null.format

Change-Id: Ic412c2fe2eba03d4493defee497656d6c74936f3
Reviewed-on: http://gerrit.cloudera.org:8080/9563
Reviewed-by: Alex Behm <al...@cloudera.com>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/6c264d8e
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/6c264d8e
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/6c264d8e

Branch: refs/heads/master
Commit: 6c264d8eb8b0973bc50fe32c0a18f6c011a00028
Parents: 4e12ba6
Author: Alex Rodoni <ar...@cloudera.com>
Authored: Thu Mar 8 19:11:50 2018 -0800
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Sat Mar 10 01:31:13 2018 +0000

----------------------------------------------------------------------
 docs/topics/impala_langref_unsupported.xml | 8 ++++++++
 1 file changed, 8 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/6c264d8e/docs/topics/impala_langref_unsupported.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_langref_unsupported.xml b/docs/topics/impala_langref_unsupported.xml
index e1bcc68..effb1a8 100644
--- a/docs/topics/impala_langref_unsupported.xml
+++ b/docs/topics/impala_langref_unsupported.xml
@@ -185,6 +185,14 @@ under the License.
           with an Impala table.
         </li>
       </ul>
+      <p>
+        Impala respects the <codeph>serialization.null.format</codeph> table
+        property only for TEXT tables and ignores the property for Parquet and
+        other formats. Hive respects the <codeph>serialization.null.format</codeph>
+        property for Parquet and other formats and converts matching values
+        to NULL during the scan. See <xref keyref="text_data_files"/> for
+        details on using the table property in Impala.
+      </p>
     </conbody>
   </concept>