You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@impala.apache.org by bo...@apache.org on 2022/02/02 09:12:43 UTC

[impala] branch master updated (b96439f -> 27a1b4c)

This is an automated email from the ASF dual-hosted git repository.

boroknagyz pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git.


    from b96439f  IMPALA-11078 Add simple CSP header to webui.
     new 5cd2b2b  IMPALA-11091: Update documentation of event polling
     new 504e0d0  IMPALA-11056: Create option to fail query on Java UDF exceptions
     new 5fd859e  IMPALA-11076: Reuse FDs loaded by HdfsTable during IcebergTable load
     new 27a1b4c  IMPALA-11101: Change visibility on HdfsTable.setAvroSchema()

The 4 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 be/src/exprs/hive-udf-call-ir.cc                   |  8 +++--
 be/src/exprs/hive-udf-call.cc                      |  8 +++--
 be/src/runtime/runtime-state.h                     |  3 ++
 be/src/service/query-options.cc                    |  4 +++
 be/src/service/query-options.h                     |  4 ++-
 common/thrift/ImpalaService.thrift                 |  4 +++
 common/thrift/Query.thrift                         |  4 +++
 docs/topics/impala_metadata.xml                    |  7 ++--
 .../org/apache/impala/catalog/FeIcebergTable.java  | 39 ++++++++++++++++------
 .../java/org/apache/impala/catalog/HdfsTable.java  |  2 +-
 .../queries/QueryTest/java-udf.test                |  6 ++++
 11 files changed, 70 insertions(+), 19 deletions(-)

[impala] 03/04: IMPALA-11076: Reuse FDs loaded by HdfsTable during IcebergTable load

Posted by bo...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

boroknagyz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 5fd859ee0caefc9c9321cbdd669d9ed8ab89d43b
Author: Tamas Mate <tm...@cloudera.com>
AuthorDate: Fri Jan 28 16:05:55 2022 +0100

    IMPALA-11076: Reuse FDs loaded by HdfsTable during IcebergTable load
    
    Impala used the FileSystem.getFileStatus() on every Iceberg DataFile to
    create the FileDescriptors. This operation is redundant because there is
    an internal HdfsTable inside the IcebergTable which loads the
    FileDescriptors recursively as well earlier.
    
    This commit updates the loadAllPartition operation to reuse the
    HdfsTable's FileDescriptors.
    
    Testing:
     - Executed the Iceberg tests.
    
    Change-Id: Id49daec60237dd6c8cce92c00a09fd9a6e6479b4
    Reviewed-on: http://gerrit.cloudera.org:8080/18160
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 .../org/apache/impala/catalog/FeIcebergTable.java  | 39 ++++++++++++++++------
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java b/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
index 1cb632b..e52e91f 100644
--- a/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
+++ b/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
@@ -501,20 +501,39 @@ public interface FeIcebergTable extends FeFsTable {
     }
 
     /**
-     * Get all FileDescriptor from iceberg table without any predicates.
+     * Returns the FileDescriptors loaded by the internal HdfsTable. To avoid returning
+     * the metadata files the resulset is limited to the files that are tracked by
+     * Iceberg. Both the HdfsBaseDir and the DataFile path can contain the scheme in their
+     * path, using org.apache.hadoop.fs.Path to normalize the paths.
      */
     public static Map<String, HdfsPartition.FileDescriptor> loadAllPartition(
-        FeIcebergTable table) throws IOException, TableLoadingException {
-      // Empty predicates
+        IcebergTable table) throws IOException, TableLoadingException {
+      Map<String, HdfsPartition.FileDescriptor> hdfsFileDescMap = new HashMap<>();
+      Collection<HdfsPartition> partitions =
+          ((HdfsTable)table.getFeFsTable()).partitionMap_.values();
+      for (HdfsPartition partition : partitions) {
+        for (FileDescriptor fileDesc : partition.getFileDescriptors()) {
+            Path path = new Path(table.getHdfsBaseDir() + Path.SEPARATOR +
+                fileDesc.getRelativePath());
+            hdfsFileDescMap.put(path.toUri().getPath(), fileDesc);
+        }
+      }
+      Map<String, HdfsPartition.FileDescriptor> fileDescMap = new HashMap<>();
       List<DataFile> dataFileList = IcebergUtil.getIcebergDataFiles(table,
           new ArrayList<>(), /*timeTravelSpecl=*/null);
-
-      Map<String, HdfsPartition.FileDescriptor> fileDescMap = new HashMap<>();
-      for (DataFile file : dataFileList) {
-        HdfsPartition.FileDescriptor fileDesc = getFileDescriptor(
-            new Path(file.path().toString()),
-            new Path(table.getIcebergTableLocation()), table.getHostIndex());
-        fileDescMap.put(IcebergUtil.getDataFilePathHash(file), fileDesc);
+      for (DataFile dataFile : dataFileList) {
+          Path path = new Path(dataFile.path().toString());
+          if (hdfsFileDescMap.containsKey(path.toUri().getPath())) {
+            String pathHash = IcebergUtil.getDataFilePathHash(dataFile);
+            fileDescMap.put(pathHash, hdfsFileDescMap.get(path.toUri().getPath()));
+          } else {
+            LOG.warn("Iceberg DataFile '{}' cannot be found in the HDFS recursive file "
+                + "listing results.", path.toString());
+            HdfsPartition.FileDescriptor fileDesc = getFileDescriptor(
+                new Path(dataFile.path().toString()),
+                new Path(table.getIcebergTableLocation()), table.getHostIndex());
+            fileDescMap.put(IcebergUtil.getDataFilePathHash(dataFile), fileDesc);
+          }
       }
       return fileDescMap;
     }

[impala] 01/04: IMPALA-11091: Update documentation of event polling

Posted by bo...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

boroknagyz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 5cd2b2b35527e3d13c0f5dab8f526245f6b6ff15
Author: Vihang Karajgaonkar <vi...@apache.org>
AuthorDate: Wed Jan 26 11:35:54 2022 -0800

    IMPALA-11091: Update documentation of event polling
    
    IMPALA-8795 turns on event polling by default but the
    documentation still says that it is a preview feature.
    This change updates the documentation to say that the
    feature is GA and enabled by default since Impala 4.1
    
    Change-Id: Ife34b92cc1fdf4839071a888e389db69c0b4924f
    Reviewed-on: http://gerrit.cloudera.org:8080/18173
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
    Reviewed-by: Shajini Thayasingh <st...@cloudera.com>
    Reviewed-by: Vihang Karajgaonkar <vi...@cloudera.com>
---
 docs/topics/impala_metadata.xml | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/docs/topics/impala_metadata.xml b/docs/topics/impala_metadata.xml
index 4ded6fa..181da6e 100644
--- a/docs/topics/impala_metadata.xml
+++ b/docs/topics/impala_metadata.xml
@@ -236,8 +236,8 @@ under the License.
       </p>
 
       <note>
-        This is a preview feature in <keyword keyref="impala33_full"/> and not generally
-        available.
+	      This is a preview feature in <keyword keyref="impala33_full"/> and <keyword keyref="impala40"/>
+	      It is generally available and enabled by default from <keyword keyref="impala41"/> onwards.
       </note>
 
       <ul>
@@ -356,7 +356,8 @@ under the License.
 <codeblock> &lt;property>
     &lt;name>hive.metastore.transactional.event.listeners&lt;/name>
     &lt;value>org.apache.hive.hcatalog.listener.DbNotificationListener&lt;/value>
-
+  &lt;/property>
+  &lt;property>
     &lt;name>hive.metastore.dml.events&lt;/name>
     &lt;value>true&lt;/true>
   &lt;/property></codeblock>

[impala] 02/04: IMPALA-11056: Create option to fail query on Java UDF exceptions

Posted by bo...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

boroknagyz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 504e0d00121bb11b54bbd754d52c1f3f66d29d20
Author: Steve Carlin <sc...@cloudera.com>
AuthorDate: Wed Dec 8 11:37:53 2021 -0800

    IMPALA-11056: Create option to fail query on Java UDF exceptions
    
    This commit will create a new query option,
    "abort_java_udf_on_exception".  The current and default behavior
    is that when the Java UDF throws an exception, a warning is logged
    and the function returns NULL. If the query option is set to
    true, the query will fail.
    
    Change-Id: Ifece20cf16a6575f1c498238f754440e870e2ce9
    Reviewed-on: http://gerrit.cloudera.org:8080/18080
    Reviewed-by: Kurt Deschler <kd...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
    Reviewed-by: Aman Sinha <am...@cloudera.com>
---
 be/src/exprs/hive-udf-call-ir.cc                                  | 8 ++++++--
 be/src/exprs/hive-udf-call.cc                                     | 8 ++++++--
 be/src/runtime/runtime-state.h                                    | 3 +++
 be/src/service/query-options.cc                                   | 4 ++++
 be/src/service/query-options.h                                    | 4 +++-
 common/thrift/ImpalaService.thrift                                | 4 ++++
 common/thrift/Query.thrift                                        | 4 ++++
 .../workloads/functional-query/queries/QueryTest/java-udf.test    | 6 ++++++
 8 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/be/src/exprs/hive-udf-call-ir.cc b/be/src/exprs/hive-udf-call-ir.cc
index b4e88b7..99f1a2a 100644
--- a/be/src/exprs/hive-udf-call-ir.cc
+++ b/be/src/exprs/hive-udf-call-ir.cc
@@ -64,8 +64,12 @@ AnyVal* HiveUdfCall::CallJavaAndStoreResult(const ColumnType* type,
       std::stringstream ss;
       ss << "Hive UDF path=" << jni_ctx->hdfs_location << " class="
           << jni_ctx->scalar_fn_symbol << " failed due to: " << status.GetDetail();
-      fn_ctx->AddWarning(ss.str().c_str());
-      jni_ctx->warning_logged = true;
+      if (fn_ctx->impl()->state()->abort_java_udf_on_exception()) {
+        fn_ctx->SetError(ss.str().c_str());
+      } else {
+        fn_ctx->AddWarning(ss.str().c_str());
+        jni_ctx->warning_logged = true;
+      }
     }
     jni_ctx->output_anyval->is_null = true;
     return jni_ctx->output_anyval;
diff --git a/be/src/exprs/hive-udf-call.cc b/be/src/exprs/hive-udf-call.cc
index a200d37..b5e21c8 100644
--- a/be/src/exprs/hive-udf-call.cc
+++ b/be/src/exprs/hive-udf-call.cc
@@ -115,8 +115,12 @@ AnyVal* HiveUdfCall::Evaluate(ScalarExprEvaluator* eval, const TupleRow* row) co
       stringstream ss;
       ss << "Hive UDF path=" << fn_.hdfs_location << " class=" << fn_.scalar_fn.symbol
         << " failed due to: " << status.GetDetail();
-      fn_ctx->AddWarning(ss.str().c_str());
-      jni_ctx->warning_logged = true;
+      if (fn_ctx->impl()->state()->abort_java_udf_on_exception()) {
+        fn_ctx->SetError(ss.str().c_str());
+      } else {
+        fn_ctx->AddWarning(ss.str().c_str());
+        jni_ctx->warning_logged = true;
+      }
     }
     jni_ctx->output_anyval->is_null = true;
     return jni_ctx->output_anyval;
diff --git a/be/src/runtime/runtime-state.h b/be/src/runtime/runtime-state.h
index 948fb52..7d2d57b 100644
--- a/be/src/runtime/runtime-state.h
+++ b/be/src/runtime/runtime-state.h
@@ -104,6 +104,9 @@ class RuntimeState {
   bool strict_mode() const { return query_options().strict_mode; }
   bool utf8_mode() const { return query_options().utf8_mode; }
   bool decimal_v2() const { return query_options().decimal_v2; }
+  bool abort_java_udf_on_exception() const {
+     return query_options().abort_java_udf_on_exception;
+  }
   const TQueryCtx& query_ctx() const;
   const TPlanFragment& fragment() const { return *fragment_; }
   const TPlanFragmentInstanceCtx& instance_ctx() const { return *instance_ctx_; }
diff --git a/be/src/service/query-options.cc b/be/src/service/query-options.cc
index 3dab0d8..cc06645 100644
--- a/be/src/service/query-options.cc
+++ b/be/src/service/query-options.cc
@@ -1152,6 +1152,10 @@ Status impala::SetQueryOption(const string& key, const string& value,
         query_options->__set_enable_async_load_data_execution(IsTrue(value));
         break;
       }
+      case TImpalaQueryOptions::ABORT_JAVA_UDF_ON_EXCEPTION: {
+        query_options->__set_abort_java_udf_on_exception(IsTrue(value));
+        break;
+      }
       default:
         if (IsRemovedQueryOption(key)) {
           LOG(WARNING) << "Ignoring attempt to set removed query option '" << key << "'";
diff --git a/be/src/service/query-options.h b/be/src/service/query-options.h
index 14bd3ee..4c471c3 100644
--- a/be/src/service/query-options.h
+++ b/be/src/service/query-options.h
@@ -47,7 +47,7 @@ typedef std::unordered_map<string, beeswax::TQueryOptionLevel::type>
 // time we add or remove a query option to/from the enum TImpalaQueryOptions.
 #define QUERY_OPTS_TABLE\
   DCHECK_EQ(_TImpalaQueryOptions_VALUES_TO_NAMES.size(),\
-      TImpalaQueryOptions::PARQUET_DICTIONARY_RUNTIME_FILTER_ENTRY_LIMIT+ 1);\
+      TImpalaQueryOptions::ABORT_JAVA_UDF_ON_EXCEPTION + 1);\
   REMOVED_QUERY_OPT_FN(abort_on_default_limit_exceeded, ABORT_ON_DEFAULT_LIMIT_EXCEEDED)\
   QUERY_OPT_FN(abort_on_error, ABORT_ON_ERROR, TQueryOptionLevel::REGULAR)\
   REMOVED_QUERY_OPT_FN(allow_unsupported_formats, ALLOW_UNSUPPORTED_FORMATS)\
@@ -269,6 +269,8 @@ typedef std::unordered_map<string, beeswax::TQueryOptionLevel::type>
       PARQUET_LATE_MATERIALIZATION_THRESHOLD, TQueryOptionLevel::ADVANCED)\
   QUERY_OPT_FN(parquet_dictionary_runtime_filter_entry_limit,\
       PARQUET_DICTIONARY_RUNTIME_FILTER_ENTRY_LIMIT, TQueryOptionLevel::ADVANCED)\
+  QUERY_OPT_FN(abort_java_udf_on_exception,\
+      ABORT_JAVA_UDF_ON_EXCEPTION, TQueryOptionLevel::ADVANCED)\
   ;
 
 /// Enforce practical limits on some query options to avoid undesired query state.
diff --git a/common/thrift/ImpalaService.thrift b/common/thrift/ImpalaService.thrift
index d16edc2..e2a0705 100644
--- a/common/thrift/ImpalaService.thrift
+++ b/common/thrift/ImpalaService.thrift
@@ -714,6 +714,10 @@ enum TImpalaQueryOptions {
   // enable runtime filtering on the row group. For example, 2 means that runtime filter
   // will be evaluated when the dictionary size is smaller or equal to 2.
   PARQUET_DICTIONARY_RUNTIME_FILTER_ENTRY_LIMIT = 139;
+
+  // Abort the Java UDF if an exception is thrown. Default is that only a
+  // warning will be logged if the Java UDF throws an exception.
+  ABORT_JAVA_UDF_ON_EXCEPTION = 140;
 }
 
 // The summary of a DML statement.
diff --git a/common/thrift/Query.thrift b/common/thrift/Query.thrift
index fa2f187..0313b85 100644
--- a/common/thrift/Query.thrift
+++ b/common/thrift/Query.thrift
@@ -567,6 +567,10 @@ struct TQueryOptions {
   // enable runtime filtering on the row group. For example, 2 means that runtime filter
   // will be evaluated when the dictionary size is smaller or equal to 2.
   140: optional i32 parquet_dictionary_runtime_filter_entry_limit = 1024;
+
+  // Abort the Java UDF if an exception is thrown. Default is that only a
+  // warning will be logged if the Java UDF throws an exception.
+  141: optional bool abort_java_udf_on_exception = false;
 }
 
 // Impala currently has three types of sessions: Beeswax, HiveServer2 and external
diff --git a/testdata/workloads/functional-query/queries/QueryTest/java-udf.test b/testdata/workloads/functional-query/queries/QueryTest/java-udf.test
index 152ba79..82a7fe9 100644
--- a/testdata/workloads/functional-query/queries/QueryTest/java-udf.test
+++ b/testdata/workloads/functional-query/queries/QueryTest/java-udf.test
@@ -104,6 +104,12 @@ boolean
 NULL
 ====
 ---- QUERY
+set abort_java_udf_on_exception=true;
+select throws_exception() from functional.alltypestiny;
+---- CATCH
+Test exception
+====
+---- QUERY
 select throws_exception() from functional.alltypestiny;
 ---- TYPES
 boolean

[impala] 04/04: IMPALA-11101: Change visibility on HdfsTable.setAvroSchema()

Posted by bo...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

boroknagyz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 27a1b4c1203fd1fc7929d23659eed0861703e9e1
Author: Steve Carlin <sc...@cloudera.com>
AuthorDate: Tue Feb 1 11:12:34 2022 -0800

    IMPALA-11101: Change visibility on HdfsTable.setAvroSchema()
    
    This changes the visibility of HdfsTable.setAvroSchema for use by
    a derived class in an external frontend.
    
    Tested manually by querying an Avro table.
    
    Change-Id: I2a8a87c4c3ab9240d768ec18be316dab4b23ebde
    Reviewed-on: http://gerrit.cloudera.org:8080/18189
    Reviewed-by: Aman Sinha <am...@cloudera.com>
    Tested-by: Aman Sinha <am...@cloudera.com>
---
 fe/src/main/java/org/apache/impala/catalog/HdfsTable.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java b/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
index 35c8ce5..a9d07d4 100644
--- a/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
+++ b/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
@@ -1730,7 +1730,7 @@ public class HdfsTable extends Table implements FeFsTable {
    * as Avro. Additionally, this method also reconciles the schema if the column
    * definitions from the metastore differ from the Avro schema.
    */
-  private void setAvroSchema(IMetaStoreClient client,
+  protected void setAvroSchema(IMetaStoreClient client,
       org.apache.hadoop.hive.metastore.api.Table msTbl) throws Exception {
     Preconditions.checkState(isSchemaLoaded_);
     String inputFormat = msTbl.getSd().getInputFormat();