Posted to commits@impala.apache.org by bo...@apache.org on 2022/02/02 09:12:43 UTC
[impala] branch master updated (b96439f -> 27a1b4c)
This is an automated email from the ASF dual-hosted git repository.
boroknagyz pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git.
from b96439f IMPALA-11078 Add simple CSP header to webui.
new 5cd2b2b IMPALA-11091: Update documentation of event polling
new 504e0d0 IMPALA-11056: Create option to fail query on Java UDF exceptions
new 5fd859e IMPALA-11076: Reuse FDs loaded by HdfsTable during IcebergTable load
new 27a1b4c IMPALA-11101: Change visibility on HdfsTable.setAvroSchema()
The 4 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
Summary of changes:
be/src/exprs/hive-udf-call-ir.cc | 8 +++--
be/src/exprs/hive-udf-call.cc | 8 +++--
be/src/runtime/runtime-state.h | 3 ++
be/src/service/query-options.cc | 4 +++
be/src/service/query-options.h | 4 ++-
common/thrift/ImpalaService.thrift | 4 +++
common/thrift/Query.thrift | 4 +++
docs/topics/impala_metadata.xml | 7 ++--
.../org/apache/impala/catalog/FeIcebergTable.java | 39 ++++++++++++++++------
.../java/org/apache/impala/catalog/HdfsTable.java | 2 +-
.../queries/QueryTest/java-udf.test | 6 ++++
11 files changed, 70 insertions(+), 19 deletions(-)
[impala] 03/04: IMPALA-11076: Reuse FDs loaded by HdfsTable during IcebergTable load
Posted by bo...@apache.org.
boroknagyz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git
commit 5fd859ee0caefc9c9321cbdd669d9ed8ab89d43b
Author: Tamas Mate <tm...@cloudera.com>
AuthorDate: Fri Jan 28 16:05:55 2022 +0100
IMPALA-11076: Reuse FDs loaded by HdfsTable during IcebergTable load
Impala called FileSystem.getFileStatus() on every Iceberg DataFile to
create its FileDescriptor. This is redundant because the IcebergTable
contains an internal HdfsTable that has already loaded the
FileDescriptors recursively.
This commit updates the loadAllPartition operation to reuse the
HdfsTable's FileDescriptors.
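The lookup described above can be sketched in isolation. This is a simplified illustration, not the code from the commit: FileDescriptorReuseSketch, normalize() and resolve() are hypothetical names, descriptors are plain strings, java.net.URI stands in for org.apache.hadoop.fs.Path, and the result is keyed by the data file path rather than by the path hash that Impala uses.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the reuse logic; none of these names exist in Impala.
final class FileDescriptorReuseSketch {

  // Drops scheme and authority so that "hdfs://nn:8020/wh/t/a" and "/wh/t/a"
  // compare equal. Assumes the location is a valid URI (no unescaped spaces).
  static String normalize(String location) {
    return URI.create(location).getPath();
  }

  // listedByRelativePath: descriptors already loaded by the recursive listing,
  // keyed by path relative to baseDir.
  // dataFiles: file locations tracked by the table format.
  // misses: collects data files that were absent from the listing.
  static Map<String, String> resolve(String baseDir,
      Map<String, String> listedByRelativePath, List<String> dataFiles,
      List<String> misses) {
    // Index the already-loaded descriptors by normalized absolute path.
    Map<String, String> byPath = new HashMap<>();
    for (Map.Entry<String, String> e : listedByRelativePath.entrySet()) {
      byPath.put(normalize(baseDir + "/" + e.getKey()), e.getValue());
    }
    Map<String, String> result = new HashMap<>();
    for (String dataFile : dataFiles) {
      String key = normalize(dataFile);
      String desc = byPath.get(key);
      if (desc == null) {
        // Fallback: stands in for the per-file status call the commit avoids.
        misses.add(dataFile);
        desc = "stat:" + key;
      }
      result.put(dataFile, desc);
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> listed = new HashMap<>();
    listed.put("data/a.parq", "fd-a");
    List<String> files = new ArrayList<>();
    files.add("/wh/t/data/a.parq");
    files.add("hdfs://nn:8020/wh/t/data/b.parq");
    List<String> misses = new ArrayList<>();
    System.out.println(resolve("hdfs://nn:8020/wh/t", listed, files, misses));
    System.out.println("misses: " + misses);
  }
}
```

Files that only exist in the listing (for example metadata files) never reach the result, because the loop is driven by the tracked data files.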
Testing:
- Executed the Iceberg tests.
Change-Id: Id49daec60237dd6c8cce92c00a09fd9a6e6479b4
Reviewed-on: http://gerrit.cloudera.org:8080/18160
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
.../org/apache/impala/catalog/FeIcebergTable.java | 39 ++++++++++++++++------
1 file changed, 29 insertions(+), 10 deletions(-)
diff --git a/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java b/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
index 1cb632b..e52e91f 100644
--- a/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
+++ b/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
@@ -501,20 +501,39 @@ public interface FeIcebergTable extends FeFsTable {
}
/**
- * Get all FileDescriptor from iceberg table without any predicates.
+ * Returns the FileDescriptors loaded by the internal HdfsTable. To avoid returning
+ * the metadata files the resulset is limited to the files that are tracked by
+ * Iceberg. Both the HdfsBaseDir and the DataFile path can contain the scheme in their
+ * path, using org.apache.hadoop.fs.Path to normalize the paths.
*/
public static Map<String, HdfsPartition.FileDescriptor> loadAllPartition(
- FeIcebergTable table) throws IOException, TableLoadingException {
- // Empty predicates
+ IcebergTable table) throws IOException, TableLoadingException {
+ Map<String, HdfsPartition.FileDescriptor> hdfsFileDescMap = new HashMap<>();
+ Collection<HdfsPartition> partitions =
+ ((HdfsTable)table.getFeFsTable()).partitionMap_.values();
+ for (HdfsPartition partition : partitions) {
+ for (FileDescriptor fileDesc : partition.getFileDescriptors()) {
+ Path path = new Path(table.getHdfsBaseDir() + Path.SEPARATOR +
+ fileDesc.getRelativePath());
+ hdfsFileDescMap.put(path.toUri().getPath(), fileDesc);
+ }
+ }
+ Map<String, HdfsPartition.FileDescriptor> fileDescMap = new HashMap<>();
List<DataFile> dataFileList = IcebergUtil.getIcebergDataFiles(table,
new ArrayList<>(), /*timeTravelSpecl=*/null);
-
- Map<String, HdfsPartition.FileDescriptor> fileDescMap = new HashMap<>();
- for (DataFile file : dataFileList) {
- HdfsPartition.FileDescriptor fileDesc = getFileDescriptor(
- new Path(file.path().toString()),
- new Path(table.getIcebergTableLocation()), table.getHostIndex());
- fileDescMap.put(IcebergUtil.getDataFilePathHash(file), fileDesc);
+ for (DataFile dataFile : dataFileList) {
+ Path path = new Path(dataFile.path().toString());
+ if (hdfsFileDescMap.containsKey(path.toUri().getPath())) {
+ String pathHash = IcebergUtil.getDataFilePathHash(dataFile);
+ fileDescMap.put(pathHash, hdfsFileDescMap.get(path.toUri().getPath()));
+ } else {
+ LOG.warn("Iceberg DataFile '{}' cannot be found in the HDFS recursive file "
+ + "listing results.", path.toString());
+ HdfsPartition.FileDescriptor fileDesc = getFileDescriptor(
+ new Path(dataFile.path().toString()),
+ new Path(table.getIcebergTableLocation()), table.getHostIndex());
+ fileDescMap.put(IcebergUtil.getDataFilePathHash(dataFile), fileDesc);
+ }
}
return fileDescMap;
}
[impala] 01/04: IMPALA-11091: Update documentation of event polling
Posted by bo...@apache.org.
boroknagyz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git
commit 5cd2b2b35527e3d13c0f5dab8f526245f6b6ff15
Author: Vihang Karajgaonkar <vi...@apache.org>
AuthorDate: Wed Jan 26 11:35:54 2022 -0800
IMPALA-11091: Update documentation of event polling
IMPALA-8795 turns on event polling by default but the
documentation still says that it is a preview feature.
This change updates the documentation to say that the
feature is GA and enabled by default since Impala 4.1.
Change-Id: Ife34b92cc1fdf4839071a888e389db69c0b4924f
Reviewed-on: http://gerrit.cloudera.org:8080/18173
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Shajini Thayasingh <st...@cloudera.com>
Reviewed-by: Vihang Karajgaonkar <vi...@cloudera.com>
---
docs/topics/impala_metadata.xml | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/docs/topics/impala_metadata.xml b/docs/topics/impala_metadata.xml
index 4ded6fa..181da6e 100644
--- a/docs/topics/impala_metadata.xml
+++ b/docs/topics/impala_metadata.xml
@@ -236,8 +236,8 @@ under the License.
</p>
<note>
- This is a preview feature in <keyword keyref="impala33_full"/> and not generally
- available.
+ This is a preview feature in <keyword keyref="impala33_full"/> and <keyword keyref="impala40"/>
+ It is generally available and enabled by default from <keyword keyref="impala41"/> onwards.
</note>
<ul>
@@ -356,7 +356,8 @@ under the License.
<codeblock> <property>
<name>hive.metastore.transactional.event.listeners</name>
<value>org.apache.hive.hcatalog.listener.DbNotificationListener</value>
-
+ </property>
+ <property>
<name>hive.metastore.dml.events</name>
<value>true</true>
</property></codeblock>
[impala] 02/04: IMPALA-11056: Create option to fail query on Java UDF exceptions
Posted by bo...@apache.org.
boroknagyz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git
commit 504e0d00121bb11b54bbd754d52c1f3f66d29d20
Author: Steve Carlin <sc...@cloudera.com>
AuthorDate: Wed Dec 8 11:37:53 2021 -0800
IMPALA-11056: Create option to fail query on Java UDF exceptions
This commit adds a new query option,
"abort_java_udf_on_exception". The existing behavior, which remains
the default, is that when a Java UDF throws an exception, a warning
is logged and the function returns NULL. If the query option is set
to true, the query fails instead.
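The two behaviors can be illustrated with a small model. This is only a sketch: the backend change itself is in C++, and UdfExceptionPolicySketch, evaluate() and the use of IllegalStateException to represent a failed query are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical model of the option's semantics; not Impala code.
final class UdfExceptionPolicySketch {
  // Warnings collected when the option is off (the default).
  final List<String> warnings = new ArrayList<>();

  // Runs the UDF. On an exception, either fails (option on) or
  // records a warning and returns null (option off).
  Boolean evaluate(Supplier<Boolean> udf, boolean abortOnException) {
    try {
      return udf.get();
    } catch (RuntimeException e) {
      String msg = "Hive UDF failed due to: " + e.getMessage();
      if (abortOnException) {
        // Stands in for marking the query as failed.
        throw new IllegalStateException(msg, e);
      }
      warnings.add(msg);
      return null;
    }
  }

  public static void main(String[] args) {
    UdfExceptionPolicySketch sketch = new UdfExceptionPolicySketch();
    Supplier<Boolean> throwing = () -> {
      throw new RuntimeException("Test exception");
    };
    System.out.println(sketch.evaluate(throwing, false));
    System.out.println(sketch.warnings);
    try {
      sketch.evaluate(throwing, true);
    } catch (IllegalStateException e) {
      System.out.println("query failed: " + e.getMessage());
    }
  }
}
```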
Change-Id: Ifece20cf16a6575f1c498238f754440e870e2ce9
Reviewed-on: http://gerrit.cloudera.org:8080/18080
Reviewed-by: Kurt Deschler <kd...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Aman Sinha <am...@cloudera.com>
---
be/src/exprs/hive-udf-call-ir.cc | 8 ++++++--
be/src/exprs/hive-udf-call.cc | 8 ++++++--
be/src/runtime/runtime-state.h | 3 +++
be/src/service/query-options.cc | 4 ++++
be/src/service/query-options.h | 4 +++-
common/thrift/ImpalaService.thrift | 4 ++++
common/thrift/Query.thrift | 4 ++++
.../workloads/functional-query/queries/QueryTest/java-udf.test | 6 ++++++
8 files changed, 36 insertions(+), 5 deletions(-)
diff --git a/be/src/exprs/hive-udf-call-ir.cc b/be/src/exprs/hive-udf-call-ir.cc
index b4e88b7..99f1a2a 100644
--- a/be/src/exprs/hive-udf-call-ir.cc
+++ b/be/src/exprs/hive-udf-call-ir.cc
@@ -64,8 +64,12 @@ AnyVal* HiveUdfCall::CallJavaAndStoreResult(const ColumnType* type,
std::stringstream ss;
ss << "Hive UDF path=" << jni_ctx->hdfs_location << " class="
<< jni_ctx->scalar_fn_symbol << " failed due to: " << status.GetDetail();
- fn_ctx->AddWarning(ss.str().c_str());
- jni_ctx->warning_logged = true;
+ if (fn_ctx->impl()->state()->abort_java_udf_on_exception()) {
+ fn_ctx->SetError(ss.str().c_str());
+ } else {
+ fn_ctx->AddWarning(ss.str().c_str());
+ jni_ctx->warning_logged = true;
+ }
}
jni_ctx->output_anyval->is_null = true;
return jni_ctx->output_anyval;
diff --git a/be/src/exprs/hive-udf-call.cc b/be/src/exprs/hive-udf-call.cc
index a200d37..b5e21c8 100644
--- a/be/src/exprs/hive-udf-call.cc
+++ b/be/src/exprs/hive-udf-call.cc
@@ -115,8 +115,12 @@ AnyVal* HiveUdfCall::Evaluate(ScalarExprEvaluator* eval, const TupleRow* row) co
stringstream ss;
ss << "Hive UDF path=" << fn_.hdfs_location << " class=" << fn_.scalar_fn.symbol
<< " failed due to: " << status.GetDetail();
- fn_ctx->AddWarning(ss.str().c_str());
- jni_ctx->warning_logged = true;
+ if (fn_ctx->impl()->state()->abort_java_udf_on_exception()) {
+ fn_ctx->SetError(ss.str().c_str());
+ } else {
+ fn_ctx->AddWarning(ss.str().c_str());
+ jni_ctx->warning_logged = true;
+ }
}
jni_ctx->output_anyval->is_null = true;
return jni_ctx->output_anyval;
diff --git a/be/src/runtime/runtime-state.h b/be/src/runtime/runtime-state.h
index 948fb52..7d2d57b 100644
--- a/be/src/runtime/runtime-state.h
+++ b/be/src/runtime/runtime-state.h
@@ -104,6 +104,9 @@ class RuntimeState {
bool strict_mode() const { return query_options().strict_mode; }
bool utf8_mode() const { return query_options().utf8_mode; }
bool decimal_v2() const { return query_options().decimal_v2; }
+ bool abort_java_udf_on_exception() const {
+ return query_options().abort_java_udf_on_exception;
+ }
const TQueryCtx& query_ctx() const;
const TPlanFragment& fragment() const { return *fragment_; }
const TPlanFragmentInstanceCtx& instance_ctx() const { return *instance_ctx_; }
diff --git a/be/src/service/query-options.cc b/be/src/service/query-options.cc
index 3dab0d8..cc06645 100644
--- a/be/src/service/query-options.cc
+++ b/be/src/service/query-options.cc
@@ -1152,6 +1152,10 @@ Status impala::SetQueryOption(const string& key, const string& value,
query_options->__set_enable_async_load_data_execution(IsTrue(value));
break;
}
+ case TImpalaQueryOptions::ABORT_JAVA_UDF_ON_EXCEPTION: {
+ query_options->__set_abort_java_udf_on_exception(IsTrue(value));
+ break;
+ }
default:
if (IsRemovedQueryOption(key)) {
LOG(WARNING) << "Ignoring attempt to set removed query option '" << key << "'";
diff --git a/be/src/service/query-options.h b/be/src/service/query-options.h
index 14bd3ee..4c471c3 100644
--- a/be/src/service/query-options.h
+++ b/be/src/service/query-options.h
@@ -47,7 +47,7 @@ typedef std::unordered_map<string, beeswax::TQueryOptionLevel::type>
// time we add or remove a query option to/from the enum TImpalaQueryOptions.
#define QUERY_OPTS_TABLE\
DCHECK_EQ(_TImpalaQueryOptions_VALUES_TO_NAMES.size(),\
- TImpalaQueryOptions::PARQUET_DICTIONARY_RUNTIME_FILTER_ENTRY_LIMIT+ 1);\
+ TImpalaQueryOptions::ABORT_JAVA_UDF_ON_EXCEPTION + 1);\
REMOVED_QUERY_OPT_FN(abort_on_default_limit_exceeded, ABORT_ON_DEFAULT_LIMIT_EXCEEDED)\
QUERY_OPT_FN(abort_on_error, ABORT_ON_ERROR, TQueryOptionLevel::REGULAR)\
REMOVED_QUERY_OPT_FN(allow_unsupported_formats, ALLOW_UNSUPPORTED_FORMATS)\
@@ -269,6 +269,8 @@ typedef std::unordered_map<string, beeswax::TQueryOptionLevel::type>
PARQUET_LATE_MATERIALIZATION_THRESHOLD, TQueryOptionLevel::ADVANCED)\
QUERY_OPT_FN(parquet_dictionary_runtime_filter_entry_limit,\
PARQUET_DICTIONARY_RUNTIME_FILTER_ENTRY_LIMIT, TQueryOptionLevel::ADVANCED)\
+ QUERY_OPT_FN(abort_java_udf_on_exception,\
+ ABORT_JAVA_UDF_ON_EXCEPTION, TQueryOptionLevel::ADVANCED)\
;
/// Enforce practical limits on some query options to avoid undesired query state.
diff --git a/common/thrift/ImpalaService.thrift b/common/thrift/ImpalaService.thrift
index d16edc2..e2a0705 100644
--- a/common/thrift/ImpalaService.thrift
+++ b/common/thrift/ImpalaService.thrift
@@ -714,6 +714,10 @@ enum TImpalaQueryOptions {
// enable runtime filtering on the row group. For example, 2 means that runtime filter
// will be evaluated when the dictionary size is smaller or equal to 2.
PARQUET_DICTIONARY_RUNTIME_FILTER_ENTRY_LIMIT = 139;
+
+ // Abort the Java UDF if an exception is thrown. Default is that only a
+ // warning will be logged if the Java UDF throws an exception.
+ ABORT_JAVA_UDF_ON_EXCEPTION = 140;
}
// The summary of a DML statement.
diff --git a/common/thrift/Query.thrift b/common/thrift/Query.thrift
index fa2f187..0313b85 100644
--- a/common/thrift/Query.thrift
+++ b/common/thrift/Query.thrift
@@ -567,6 +567,10 @@ struct TQueryOptions {
// enable runtime filtering on the row group. For example, 2 means that runtime filter
// will be evaluated when the dictionary size is smaller or equal to 2.
140: optional i32 parquet_dictionary_runtime_filter_entry_limit = 1024;
+
+ // Abort the Java UDF if an exception is thrown. Default is that only a
+ // warning will be logged if the Java UDF throws an exception.
+ 141: optional bool abort_java_udf_on_exception = false;
}
// Impala currently has three types of sessions: Beeswax, HiveServer2 and external
diff --git a/testdata/workloads/functional-query/queries/QueryTest/java-udf.test b/testdata/workloads/functional-query/queries/QueryTest/java-udf.test
index 152ba79..82a7fe9 100644
--- a/testdata/workloads/functional-query/queries/QueryTest/java-udf.test
+++ b/testdata/workloads/functional-query/queries/QueryTest/java-udf.test
@@ -104,6 +104,12 @@ boolean
NULL
====
---- QUERY
+set abort_java_udf_on_exception=true;
+select throws_exception() from functional.alltypestiny;
+---- CATCH
+Test exception
+====
+---- QUERY
select throws_exception() from functional.alltypestiny;
---- TYPES
boolean
[impala] 04/04: IMPALA-11101: Change visibility on HdfsTable.setAvroSchema()
Posted by bo...@apache.org.
boroknagyz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git
commit 27a1b4c1203fd1fc7929d23659eed0861703e9e1
Author: Steve Carlin <sc...@cloudera.com>
AuthorDate: Tue Feb 1 11:12:34 2022 -0800
IMPALA-11101: Change visibility on HdfsTable.setAvroSchema()
This changes the visibility of HdfsTable.setAvroSchema for use by
a derived class in an external frontend.
Tested manually by querying an Avro table.
Change-Id: I2a8a87c4c3ab9240d768ec18be316dab4b23ebde
Reviewed-on: http://gerrit.cloudera.org:8080/18189
Reviewed-by: Aman Sinha <am...@cloudera.com>
Tested-by: Aman Sinha <am...@cloudera.com>
---
fe/src/main/java/org/apache/impala/catalog/HdfsTable.java | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java b/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
index 35c8ce5..a9d07d4 100644
--- a/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
+++ b/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
@@ -1730,7 +1730,7 @@ public class HdfsTable extends Table implements FeFsTable {
* as Avro. Additionally, this method also reconciles the schema if the column
* definitions from the metastore differ from the Avro schema.
*/
- private void setAvroSchema(IMetaStoreClient client,
+ protected void setAvroSchema(IMetaStoreClient client,
org.apache.hadoop.hive.metastore.api.Table msTbl) throws Exception {
Preconditions.checkState(isSchemaLoaded_);
String inputFormat = msTbl.getSd().getInputFormat();