Posted to commits@impala.apache.org by st...@apache.org on 2021/03/30 01:30:44 UTC

[impala] branch master updated (8f8668a -> 103774a)

This is an automated email from the ASF dual-hosted git repository.

stigahuang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git.


    from 8f8668a  IMPALA-10593: Conditionally skip runtime filter for outer joins
     new 77d6acd  IMPALA-10581: Implement ds_theta_intersect_f() function
     new dbc2fc1  IMPALA-10597: Enable setting 'iceberg.file_format'
     new 7198f07  IMPALA-10605: Deflake test_refresh_native
     new 103774a  Update Python requests package to 2.20.0

The 4 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 be/src/exprs/datasketches-common.cc                | 27 +++++++
 be/src/exprs/datasketches-common.h                 | 15 ++++
 be/src/exprs/datasketches-functions-ir.cc          | 33 +++++----
 be/src/exprs/datasketches-functions.h              |  8 ++
 common/function-registry/impala_functions.py       |  2 +
 .../analysis/AlterTableSetTblProperties.java       | 37 +++++++++-
 .../apache/impala/analysis/CreateTableStmt.java    |  3 +
 .../org/apache/impala/catalog/IcebergTable.java    |  1 +
 infra/python/deps/requirements.txt                 |  3 -
 infra/python/deps/stage2-requirements.txt          |  8 +-
 .../queries/QueryTest/datasketches-theta.test      | 85 ++++++++++++++++++++++
 .../queries/QueryTest/iceberg-alter.test           | 38 ++++++++++
 .../queries/QueryTest/iceberg-negative.test        | 19 +++--
 tests/custom_cluster/test_permanent_udfs.py        |  2 +-
 14 files changed, 256 insertions(+), 25 deletions(-)

[impala] 02/04: IMPALA-10597: Enable setting 'iceberg.file_format'

Posted by st...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

stigahuang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit dbc2fc14d86f84b25670ee9af7cedbbc3cd18b9f
Author: Zoltan Borok-Nagy <bo...@cloudera.com>
AuthorDate: Fri Mar 19 18:57:48 2021 +0100

    IMPALA-10597: Enable setting 'iceberg.file_format'
    
    Currently we prohibit setting the following properties:
    
    * iceberg.catalog
    * iceberg.catalog_location
    * iceberg.file_format
    * iceberg.table_identifier
    
    This patch enables setting 'iceberg.file_format', so that if a
    table was created by another engine but uses HiveCatalog, we can
    set the data file format to the proper value and make the table
    readable by Impala. Setting the other properties is not needed
    for HiveCatalog tables.
    
    If the table wasn't created via HiveCatalog, then we cannot load
    the table, and therefore cannot invoke any ALTER TABLE statement
    at all. In that case we need to create an external table instead.
    
    If the table already contains data files, then Impala checks if
    all of them have the proper file format. If not, the ALTER TABLE
    statement fails.
    
    Before this patch a CREATE TABLE statement accepted any string
    for 'iceberg.file_format', and for invalid file formats the
    frontend silently used Parquet. This patch also adds a check to
    allow only valid file formats.
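
    For illustration, a sketch of the new checks, mirroring the e2e
    tests added below (table names ice_t and ice_t2 are
    hypothetical):

    -- Invalid format strings are now rejected at analysis time:
    CREATE TABLE ice_t (i int)
    STORED AS ICEBERG
    TBLPROPERTIES('iceberg.file_format'='or');
    -- AnalysisException: Invalid fileformat for Iceberg table: or

    -- Changing the format succeeds while it matches the existing
    -- data files (or the table has none), and fails otherwise:
    ALTER TABLE ice_t2 SET TBLPROPERTIES('iceberg.file_format'='ORC');
    -- AnalysisException: Attempt to set Iceberg data file format to
    -- ORC, but found data file ... with file format PARQUET.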
    
    Testing:
     * added e2e test
    
    Change-Id: I4b3506be4562a1ace3e6435867aadb3bdde7a8e2
    Reviewed-on: http://gerrit.cloudera.org:8080/17207
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 .../analysis/AlterTableSetTblProperties.java       | 37 ++++++++++++++++++++-
 .../apache/impala/analysis/CreateTableStmt.java    |  3 ++
 .../org/apache/impala/catalog/IcebergTable.java    |  1 +
 .../queries/QueryTest/iceberg-alter.test           | 38 ++++++++++++++++++++++
 .../queries/QueryTest/iceberg-negative.test        | 19 ++++++++---
 5 files changed, 92 insertions(+), 6 deletions(-)

diff --git a/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java b/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
index b934b88..3091188 100644
--- a/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
+++ b/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
@@ -24,6 +24,7 @@ import java.util.Map;
 import org.apache.avro.SchemaParseException;
 import org.apache.hadoop.hive.metastore.api.hive_metastoreConstants;
 import org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils;
+import org.apache.iceberg.DataFile;
 import org.apache.impala.authorization.AuthorizationConfig;
 import org.apache.impala.catalog.FeFsTable;
 import org.apache.impala.catalog.FeHBaseTable;
@@ -32,6 +33,7 @@ import org.apache.impala.catalog.FeKuduTable;
 import org.apache.impala.catalog.FeTable;
 import org.apache.impala.catalog.IcebergTable;
 import org.apache.impala.catalog.KuduTable;
+import org.apache.impala.catalog.TableLoadingException;
 import org.apache.impala.common.AnalysisException;
 import org.apache.impala.common.Pair;
 import org.apache.impala.thrift.TAlterTableParams;
@@ -41,6 +43,7 @@ import org.apache.impala.thrift.TSortingOrder;
 import org.apache.impala.thrift.TTablePropertyType;
 import org.apache.impala.util.AvroSchemaParser;
 import org.apache.impala.util.AvroSchemaUtils;
+import org.apache.impala.util.IcebergUtil;
 import org.apache.impala.util.MetaStoreUtil;
 
 import com.google.common.base.Preconditions;
@@ -147,11 +150,13 @@ public class AlterTableSetTblProperties extends AlterTableSetStmt {
 
   private void analyzeIcebergTable(Analyzer analyzer) throws AnalysisException {
     //Cannot set these properties related to metadata
-    icebergPropertyCheck(IcebergTable.ICEBERG_FILE_FORMAT);
     icebergPropertyCheck(IcebergTable.ICEBERG_CATALOG);
     icebergPropertyCheck(IcebergTable.ICEBERG_CATALOG_LOCATION);
     icebergPropertyCheck(IcebergTable.ICEBERG_TABLE_IDENTIFIER);
     icebergPropertyCheck(IcebergTable.METADATA_LOCATION);
+    if (tblProperties_.containsKey(IcebergTable.ICEBERG_FILE_FORMAT)) {
+      icebergTableFormatCheck(tblProperties_.get(IcebergTable.ICEBERG_FILE_FORMAT));
+    }
   }
 
   private void icebergPropertyCheck(String property) throws AnalysisException {
@@ -161,6 +166,36 @@ public class AlterTableSetTblProperties extends AlterTableSetStmt {
     }
   }
 
+  private void icebergTableFormatCheck(String fileformat) throws AnalysisException {
+    Preconditions.checkState(getTargetTable() instanceof FeIcebergTable);
+    Preconditions.checkState(fileformat != null);
+    if (IcebergUtil.getIcebergFileFormat(fileformat) == null) {
+      throw new AnalysisException("Invalid fileformat for Iceberg table: " + fileformat);
+    }
+    try {
+      FeIcebergTable iceTable = (FeIcebergTable)getTargetTable();
+      List<DataFile> dataFiles = IcebergUtil.getIcebergDataFiles(iceTable,
+          new ArrayList<>());
+      if (dataFiles.isEmpty()) return;
+      DataFile firstFile = dataFiles.get(0);
+      String errorMsg = "Attempt to set Iceberg data file format to %s, but found data " +
+          "file %s with file format %s.";
+      if (!firstFile.format().name().equalsIgnoreCase(fileformat)) {
+        throw new AnalysisException(String.format(errorMsg, fileformat, firstFile.path(),
+        firstFile.format().name()));
+      }
+      //TODO(IMPALA-10610): Iceberg tables with mixed file formats are not readable.
+      for (DataFile df : dataFiles) {
+        if (df.format() != firstFile.format()) {
+          throw new AnalysisException(String.format(errorMsg, fileformat, df.path(),
+              df.format().name()));
+        }
+      }
+    } catch (TableLoadingException e) {
+      throw new AnalysisException(e);
+    }
+  }
+
   /**
    * Check that Avro schema provided in avro.schema.url or avro.schema.literal is valid
    * Json and contains only supported Impala types. If both properties are set, then
diff --git a/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java b/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
index 1477536..752a226 100644
--- a/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
+++ b/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
@@ -605,6 +605,9 @@ public class CreateTableStmt extends StatementBase {
         IcebergTable.ICEBERG_STORAGE_HANDLER);
 
     String fileformat = getTblProperties().get(IcebergTable.ICEBERG_FILE_FORMAT);
+    if (fileformat != null && IcebergUtil.getIcebergFileFormat(fileformat) == null) {
+      throw new AnalysisException("Invalid fileformat for Iceberg table: " + fileformat);
+    }
     if (fileformat == null || fileformat.isEmpty()) {
       putGeneratedKuduProperty(IcebergTable.ICEBERG_FILE_FORMAT, "parquet");
     }
diff --git a/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java b/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
index 0cfce43..9f3cc95 100644
--- a/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
+++ b/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
@@ -262,6 +262,7 @@ public class IcebergTable extends Table implements FeIcebergTable {
         loadSchemaFromIceberg(metadata);
         // Loading hdfs table after loaded schema from Iceberg,
         // in case we create external Iceberg table skipping column info in sql.
+        icebergFileFormat_ = Utils.getIcebergFileFormat(msTbl);
         hdfsTable_.load(false, msClient, msTable_, true, true, false, null, null, reason);
         pathHashToFileDescMap_ = Utils.loadAllPartition(this);
         loadAllColumnStats(msClient);
diff --git a/testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test b/testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
index cfa980c..938e11d 100644
--- a/testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
+++ b/testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
@@ -111,3 +111,41 @@ SELECT * FROM iceberg_rename;
 ---- CATCH
 Could not resolve table reference: 'iceberg_rename'
 ====
+---- QUERY
+CREATE TABLE iceberg_changing_fileformats (i int)
+STORED AS ICEBERG
+TBLPROPERTIES('iceberg.file_format'='orc');
+DESCRIBE FORMATTED iceberg_changing_fileformats;
+---- RESULTS: VERIFY_IS_SUBSET
+'','iceberg.file_format ','orc                 '
+---- TYPES
+string, string, string
+====
+---- QUERY
+ALTER TABLE iceberg_changing_fileformats set TBLPROPERTIES('iceberg.file_format'='parquet');
+DESCRIBE FORMATTED iceberg_changing_fileformats;
+---- RESULTS: VERIFY_IS_SUBSET
+'','iceberg.file_format ','parquet             '
+---- TYPES
+string, string, string
+====
+---- QUERY
+INSERT INTO iceberg_changing_fileformats values (123);
+SELECT * FROM iceberg_changing_fileformats;
+---- RESULTS
+123
+---- TYPES
+INT
+====
+---- QUERY
+ALTER TABLE iceberg_changing_fileformats set TBLPROPERTIES('iceberg.file_format'='ORC');
+---- CATCH
+Attempt to set Iceberg data file format to ORC
+====
+---- QUERY
+DESCRIBE FORMATTED iceberg_changing_fileformats;
+---- RESULTS: VERIFY_IS_SUBSET
+'','iceberg.file_format ','parquet             '
+---- TYPES
+string, string, string
+====
diff --git a/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test b/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
index 38a61a6..d6f0969 100644
--- a/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
+++ b/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
@@ -188,11 +188,6 @@ ALTER TABLE iceberg_table_hadoop_catalog RENAME TO iceberg_table_hadoop_catalog_
 UnsupportedOperationException: Cannot rename Iceberg tables that use 'hadoop.catalog' as catalog.
 ====
 ---- QUERY
-ALTER TABLE iceberg_table_hadoop_catalog set TBLPROPERTIES('iceberg.file_format'='orc');
----- CATCH
-AnalysisException: Changing the 'iceberg.file_format' table property is not supported for Iceberg table.
-====
----- QUERY
 ALTER TABLE iceberg_table_hadoop_catalog set TBLPROPERTIES('iceberg.catalog'='hadoop.tables');
 ---- CATCH
 AnalysisException: Changing the 'iceberg.catalog' table property is not supported for Iceberg table.
@@ -328,3 +323,17 @@ STORED AS ICEBERG;
 ---- CATCH
 Syntax error in line
 ====
+---- QUERY
+CREATE TABLE iceberg_wrong_fileformat (i int)
+STORED AS ICEBERG
+TBLPROPERTIES('iceberg.file_format'='or');
+---- CATCH
+Invalid fileformat for Iceberg table: or
+====
+---- QUERY
+CREATE TABLE iceberg_set_wrong_fileformat (i int)
+STORED AS ICEBERG;
+ALTER TABLE iceberg_set_wrong_fileformat SET TBLPROPERTIES ('iceberg.file_format'='parq');
+---- CATCH
+Invalid fileformat for Iceberg table: parq
+====

[impala] 01/04: IMPALA-10581: Implement ds_theta_intersect_f() function

Posted by st...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

stigahuang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 77d6acd032333d54063b43db25d35b5f29fc2c38
Author: Fucun Chu <ch...@hotmail.com>
AuthorDate: Fri Mar 12 16:59:05 2021 +0800

    IMPALA-10581: Implement ds_theta_intersect_f() function
    
    This function receives two strings that are serialized Apache
    DataSketches Theta sketches, computes the intersection of the two
    sketches (which may come from the same or from different
    columns), and returns the resulting sketch of the intersection.
    
    Example:
    select ds_theta_estimate(ds_theta_intersect_f(sketch1, sketch2))
    from sketch_tbl;
    +-----------------------------------------------------------+
    | ds_theta_estimate(ds_theta_intersect_f(sketch1, sketch2)) |
    +-----------------------------------------------------------+
    | 5                                                         |
    +-----------------------------------------------------------+
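
    A note on NULL handling, based on the implementation and the
    null-handling tests below: if either input is NULL, the function
    returns NULL, and ds_theta_estimate() of a NULL sketch yields 0.
    Using the table from the example above:

    select ds_theta_estimate(ds_theta_intersect_f(sketch1, null))
    from sketch_tbl;
    -- returns 0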
    
    Change-Id: I335eada00730036d5433775cfe673e0e4babaa01
    Reviewed-on: http://gerrit.cloudera.org:8080/17186
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 be/src/exprs/datasketches-common.cc                | 27 +++++++
 be/src/exprs/datasketches-common.h                 | 15 ++++
 be/src/exprs/datasketches-functions-ir.cc          | 33 +++++----
 be/src/exprs/datasketches-functions.h              |  8 ++
 common/function-registry/impala_functions.py       |  2 +
 .../queries/QueryTest/datasketches-theta.test      | 85 ++++++++++++++++++++++
 6 files changed, 157 insertions(+), 13 deletions(-)

diff --git a/be/src/exprs/datasketches-common.cc b/be/src/exprs/datasketches-common.cc
index c80bd5b..e76ad83 100644
--- a/be/src/exprs/datasketches-common.cc
+++ b/be/src/exprs/datasketches-common.cc
@@ -78,6 +78,33 @@ StringVal StringStreamToStringVal(FunctionContext* ctx, const stringstream& str_
   return dst;
 }
 
+bool update_sketch_to_theta_union(FunctionContext* ctx,
+    const StringVal& serialized_sketch, datasketches::theta_union& sketch) {
+  if (!serialized_sketch.is_null && serialized_sketch.len > 0) {
+    datasketches::theta_sketch::unique_ptr sketch_ptr;
+    if (!DeserializeDsSketch(serialized_sketch, &sketch_ptr)) {
+      LogSketchDeserializationError(ctx);
+      return false;
+    }
+    sketch.update(*sketch_ptr);
+  }
+  return true;
+}
+
+bool update_sketch_to_theta_intersection(FunctionContext* ctx,
+    const StringVal& serialized_sketch, datasketches::theta_intersection& sketch) {
+  if (!serialized_sketch.is_null && serialized_sketch.len > 0) {
+    datasketches::theta_sketch::unique_ptr sketch_ptr;
+    if (!DeserializeDsSketch(serialized_sketch, &sketch_ptr)) {
+      LogSketchDeserializationError(ctx);
+      return false;
+    }
+    sketch.update(*sketch_ptr);
+    return true;
+  }
+  return false;
+}
+
 template<class T>
 StringVal DsKllVectorResultToStringVal(FunctionContext* ctx,
     const vector<T>& kll_result) {
diff --git a/be/src/exprs/datasketches-common.h b/be/src/exprs/datasketches-common.h
index 2f2b4d5..20a7863 100644
--- a/be/src/exprs/datasketches-common.h
+++ b/be/src/exprs/datasketches-common.h
@@ -22,6 +22,8 @@
 
 #include "common/status.h"
 #include "thirdparty/datasketches/hll.hpp"
+#include "thirdparty/datasketches/theta_union.hpp"
+#include "thirdparty/datasketches/theta_intersection.hpp"
 #include "udf/udf.h"
 
 namespace impala {
@@ -59,6 +61,19 @@ bool DeserializeDsSketch(const StringVal& serialized_sketch, T* sketch)
 StringVal StringStreamToStringVal(FunctionContext* ctx,
     const std::stringstream& str_stream);
 
+/// Helper function that receives a serialized DataSketches Theta sketch in
+/// 'serialized_sketch', deserializes it and updates 'sketch' with the
+/// deserialized sketch. Returns false if the deserialization fails (an error
+/// is logged), true otherwise.
+bool update_sketch_to_theta_union(FunctionContext* ctx,
+    const StringVal& serialized_sketch, datasketches::theta_union& sketch);
+
+/// Helper function that receives a serialized DataSketches Theta sketch in
+/// 'serialized_sketch', deserializes it and updates 'sketch' with the
+/// deserialized sketch. Returns false if 'serialized_sketch' is empty or
+/// deserialization fails (an error is logged), true otherwise.
+bool update_sketch_to_theta_intersection(FunctionContext* ctx,
+    const StringVal& serialized_sketch, datasketches::theta_intersection& sketch);
 
 /// Helper function that receives a vector and returns a comma separated StringVal that
 /// holds the items from the vector keeping the order.
diff --git a/be/src/exprs/datasketches-functions-ir.cc b/be/src/exprs/datasketches-functions-ir.cc
index 3a3fa4a..a7e7b74 100644
--- a/be/src/exprs/datasketches-functions-ir.cc
+++ b/be/src/exprs/datasketches-functions-ir.cc
@@ -22,6 +22,7 @@
 #include "thirdparty/datasketches/hll.hpp"
 #include "thirdparty/datasketches/theta_sketch.hpp"
 #include "thirdparty/datasketches/theta_union.hpp"
+#include "thirdparty/datasketches/theta_intersection.hpp"
 #include "thirdparty/datasketches/theta_a_not_b.hpp"
 #include "thirdparty/datasketches/kll_sketch.hpp"
 #include "udf/udf-internal.h"
@@ -160,19 +161,6 @@ StringVal DataSketchesFunctions::DsThetaExclude(FunctionContext* ctx,
   return StringVal::null();
 }
 
-bool update_sketch_to_theta_union(FunctionContext* ctx,
-    const StringVal& serialized_sketch, datasketches::theta_union& union_sketch) {
-  if (!serialized_sketch.is_null && serialized_sketch.len > 0) {
-    datasketches::theta_sketch::unique_ptr sketch_ptr;
-    if (!DeserializeDsSketch(serialized_sketch, &sketch_ptr)) {
-      LogSketchDeserializationError(ctx);
-      return false;
-    }
-    union_sketch.update(*sketch_ptr);
-  }
-  return true;
-}
-
 StringVal DataSketchesFunctions::DsThetaUnionF(FunctionContext* ctx,
     const StringVal& first_serialized_sketch, const StringVal& second_serialized_sketch) {
   datasketches::theta_union union_sketch = datasketches::theta_union::builder().build();
@@ -190,6 +178,25 @@ StringVal DataSketchesFunctions::DsThetaUnionF(FunctionContext* ctx,
   return StringStreamToStringVal(ctx, serialized_input);
 }
 
+StringVal DataSketchesFunctions::DsThetaIntersectF(FunctionContext* ctx,
+    const StringVal& first_serialized_sketch, const StringVal& second_serialized_sketch) {
+  datasketches::theta_intersection intersection_sketch;
+  // Update the theta_intersection with both input sketches.
+  // Note that if one of the sketches is null, null is returned.
+  if (!update_sketch_to_theta_intersection(
+          ctx, first_serialized_sketch, intersection_sketch)) {
+    return StringVal::null();
+  }
+  if (!update_sketch_to_theta_intersection(
+          ctx, second_serialized_sketch, intersection_sketch)) {
+    return StringVal::null();
+  }
+  datasketches::compact_theta_sketch sketch = intersection_sketch.get_result();
+  std::stringstream serialized_input;
+  sketch.serialize(serialized_input);
+  return StringStreamToStringVal(ctx, serialized_input);
+}
+
 FloatVal DataSketchesFunctions::DsKllQuantile(FunctionContext* ctx,
     const StringVal& serialized_sketch, const DoubleVal& rank) {
   if (serialized_sketch.is_null || serialized_sketch.len == 0) return FloatVal::null();
diff --git a/be/src/exprs/datasketches-functions.h b/be/src/exprs/datasketches-functions.h
index 15d9774..7013945 100644
--- a/be/src/exprs/datasketches-functions.h
+++ b/be/src/exprs/datasketches-functions.h
@@ -85,6 +85,14 @@ public:
       const StringVal& first_serialized_sketch,
       const StringVal& second_serialized_sketch);
 
+  /// 'first_serialized_sketch' and 'second_serialized_sketch' are both expected as
+  /// serialized Apache DataSketches Theta sketches. If they are not, then the query
+  /// fails. Intersect two sketches and return the resulting sketch after the
+  /// intersection.
+  static StringVal DsThetaIntersectF(FunctionContext* ctx,
+      const StringVal& first_serialized_sketch,
+      const StringVal& second_serialized_sketch);
+
   /// 'serialized_sketch' is expected as a serialized Apache DataSketches KLL sketch. If
   /// it is not, then the query fails. 'rank' is used to identify which item (estimate)
   /// to return from the sketched dataset. E.g. 0.1 means the item where 10% of the
diff --git a/common/function-registry/impala_functions.py b/common/function-registry/impala_functions.py
index e46ef89..fde4c71 100644
--- a/common/function-registry/impala_functions.py
+++ b/common/function-registry/impala_functions.py
@@ -1009,6 +1009,8 @@ visible_functions = [
      '_ZN6impala21DataSketchesFunctions14DsThetaExcludeEPN10impala_udf15FunctionContextERKNS1_9StringValES6_'],
   [['ds_theta_union_f'], 'STRING', ['STRING', 'STRING'],
      '_ZN6impala21DataSketchesFunctions13DsThetaUnionFEPN10impala_udf15FunctionContextERKNS1_9StringValES6_'],
+  [['ds_theta_intersect_f'], 'STRING', ['STRING', 'STRING'],
+     '_ZN6impala21DataSketchesFunctions17DsThetaIntersectFEPN10impala_udf15FunctionContextERKNS1_9StringValES6_'],
   [['ds_kll_quantile'], 'FLOAT', ['STRING', 'DOUBLE'],
       '_ZN6impala21DataSketchesFunctions13DsKllQuantileEPN10impala_udf15FunctionContextERKNS1_9StringValERKNS1_9DoubleValE'],
   [['ds_kll_n'], 'BIGINT', ['STRING'],
diff --git a/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test b/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
index 27a0c06..cc03b22 100644
--- a/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
+++ b/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
@@ -481,4 +481,89 @@ select ds_theta_estimate(ds_theta_union_f(sketch1, sketch2)) from sketch_interme
 BIGINT
 ---- RESULTS
 15
+====
+---- QUERY
+# Check if the intersection of a valid sketch with a null value returns a null sketch.
+select
+    ds_theta_estimate(ds_theta_intersect_f(date_sketch, null)),
+    ds_theta_estimate(ds_theta_intersect_f(null, float_sketch))
+from sketch_store;
+---- TYPES
+BIGINT,BIGINT
+---- RESULTS
+0,0
+0,0
+0,0
+0,0
+====
+---- QUERY
+# Check that ds_theta_intersect_f() returns an empty sketch for 2 empty sketches.
+select ds_theta_estimate(ds_theta_intersect_f(
+ds_theta_sketch(cast(f2 as float)), ds_theta_sketch(cast(f2 as float))))
+from functional_parquet.emptytable;
+---- TYPES
+BIGINT
+---- RESULTS
+0
+====
+---- QUERY
+# Check that ds_theta_intersect_f() returns a null sketch for 2 null inputs.
+select ds_theta_estimate(ds_theta_intersect_f(null_str, some_nulls)) from
+functional_parquet.nullrows where id='b';
+---- TYPES
+BIGINT
+---- RESULTS
+0
+====
+---- QUERY
+# ds_theta_intersect_f() returns an error if it receives an invalid serialized sketch.
+select ds_theta_intersect_f(date_string_col, null) from functional_parquet.alltypestiny
+where id=1;
+---- CATCH
+UDF ERROR: Unable to deserialize sketch.
+====
+---- QUERY
+# ds_theta_intersect_f() returns an error if it receives an invalid serialized sketch.
+select ds_theta_intersect_f(sketch1, sketch2) from (
+select ds_theta_sketch(float_col) sketch1, max(date_string_col) sketch2 from
+functional_parquet.alltypestiny
+) t
+---- CATCH
+UDF ERROR: Unable to deserialize sketch.
+====
+---- QUERY
+# Intersect the sketches from theta_sketches_impala_hive2 and check that the intersect
+# produces the same result as if these sketches were used separately to get the estimates.
+select
+    ds_theta_estimate(ds_theta_intersect_f(i_ti, h_ti)) as ti,
+    ds_theta_estimate(ds_theta_intersect_f(i_i, h_i)) as i,
+    ds_theta_estimate(ds_theta_intersect_f(i_bi, h_bi)) as bi,
+    ds_theta_estimate(ds_theta_intersect_f(i_f, h_f)) as f,
+    ds_theta_estimate(ds_theta_intersect_f(i_d, h_d)) as d,
+    ds_theta_estimate(ds_theta_intersect_f(i_s, h_s)) as s,
+    ds_theta_estimate(ds_theta_intersect_f(i_c, h_c)) as c,
+    ds_theta_estimate(ds_theta_intersect_f(i_v, h_v)) as v,
+    ds_theta_estimate(ds_theta_intersect_f(i_nc, h_nc)) as nc
+from theta_sketches_impala_hive2;
+---- TYPES
+BIGINT,BIGINT,BIGINT,BIGINT,BIGINT,BIGINT,BIGINT,BIGINT,BIGINT
+---- RESULTS
+5,7,6,6,7,4,4,3,0
+====
+---- QUERY
+# Check that the non-empty input sketches are distinct and the result is empty.
+select ds_theta_estimate(ds_theta_intersect_f(date_sketch, float_sketch))
+from sketch_store where year=2009 and month=1;
+---- TYPES
+BIGINT
+---- RESULTS
+0
+====
+---- QUERY
+# Check when the inputs aren't the same but have some (but not all) items in common.
+select ds_theta_estimate(ds_theta_intersect_f(sketch1, sketch2)) from sketch_intermediate;
+---- TYPES
+BIGINT
+---- RESULTS
+5
 ====
\ No newline at end of file

[impala] 04/04: Update Python requests package to 2.20.0

Posted by st...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

stigahuang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 103774a8e517e5c8532e313df29e67008bc56617
Author: Jim Apple <jb...@apache.org>
AuthorDate: Sun Mar 21 17:16:32 2021 -0700

    Update Python requests package to 2.20.0
    
    See https://2.python-requests.org/en/master/community/updates/#id8.
    This is currently only used in the tests, but it's best to fix
    this now.
    
    While here, remove the now-false note about required support for
    Python 2.6.
    
    Change-Id: I092a641a12f38cdb45b0062c31ffb51c0c664800
    Reviewed-on: http://gerrit.cloudera.org:8080/17215
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
    Reviewed-by: Zoltan Borok-Nagy <bo...@cloudera.com>
    Reviewed-by: Joe McDonnell <jo...@cloudera.com>
---
 infra/python/deps/requirements.txt        | 3 ---
 infra/python/deps/stage2-requirements.txt | 8 ++++++--
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/infra/python/deps/requirements.txt b/infra/python/deps/requirements.txt
index 11cdaea..ab1d758 100644
--- a/infra/python/deps/requirements.txt
+++ b/infra/python/deps/requirements.txt
@@ -17,8 +17,6 @@
 
 # To download requirements use the download_requirements script in this dir.
 
-# Remember, all modules below need to support python 2.6.
-
 # Dependents are indented. Dependents that have multiple parents are not listed
 # multiple times (though maybe they could be).
 
@@ -50,7 +48,6 @@ python-magic == 0.4.11
 # attempting to install pywebhdfs (https://github.com/pywebhdfs/pywebhdfs/issues/52).
 # pywebhdfs itself will be installed in stage 2.
   pbr == 3.1.1
-requests == 2.7.0
 # Newer versions of setuptools don't support Python 2.6
 setuptools == 36.8.0
 setuptools-scm == 1.15.4
diff --git a/infra/python/deps/stage2-requirements.txt b/infra/python/deps/stage2-requirements.txt
index 907f6f9..b5bc44b 100644
--- a/infra/python/deps/stage2-requirements.txt
+++ b/infra/python/deps/stage2-requirements.txt
@@ -18,8 +18,6 @@
 # This file contains packages that have dependencies in requirements.txt and that have to
 # be installed in a separate invocation of pip.
 
-# Remember, all modules below need to support python 2.6.
-
 # Requires setuptools-scm
 pytest == 2.9.2
   py == 1.4.32
@@ -34,3 +32,9 @@ hdfs == 2.0.2
 
 # Requires pbr
 pywebhdfs == 0.3.2
+
+requests == 2.20.0
+   chardet == 3.0.4
+   idna == 2.8
+   urllib3 == 1.21.1
+   certifi == 2020.12.5
\ No newline at end of file

[impala] 03/04: IMPALA-10605: Deflake test_refresh_native

Posted by st...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

stigahuang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 7198f07c9237e9b46b0442a256a704c216f22235
Author: Vihang Karajgaonkar <vi...@apache.org>
AuthorDate: Wed Mar 24 09:33:50 2021 -0700

    IMPALA-10605: Deflake test_refresh_native
    
    This test uses a regex to parse the output of DESCRIBE DATABASE
    and extract the db properties. The regex assumed there would be
    only one key-value pair, which breaks when the events processor
    is running and adds extra db properties. The fix modifies the
    regex so that it extracts only the relevant function name prefix
    and its value.
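
    For illustration, a minimal sketch of the difference (the sample
    DESCRIBE DATABASE output below is hypothetical):

    import re
    out = "{repl.last.id=123, impala_registered_function_foo=select 1}"
    # Old pattern captured the first key-value pair, whichever it was:
    re.search(r"{(.*?)=(.*?)}", out).group(1)
    # -> 'repl.last.id' (not the registered function)
    # New pattern anchors on the function-name prefix:
    re.search(r"{.*(impala_registered_function.*?)=(.*?)[,}]", out).groups()
    # -> ('impala_registered_function_foo', 'select 1')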
    
    Testing:
    1. The test fails when the events processor is enabled. After the
    patch, the test works as expected.
    
    Change-Id: I1df35b9c5f2b21cc7172f03ff8611d46070d64c2
    Reviewed-on: http://gerrit.cloudera.org:8080/17227
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 tests/custom_cluster/test_permanent_udfs.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/custom_cluster/test_permanent_udfs.py b/tests/custom_cluster/test_permanent_udfs.py
index 104252e..f75e86e 100644
--- a/tests/custom_cluster/test_permanent_udfs.py
+++ b/tests/custom_cluster/test_permanent_udfs.py
@@ -275,7 +275,7 @@ class TestUdfPersistence(CustomClusterTestSuite):
     describe_db_hive = "DESCRIBE DATABASE EXTENDED {database}".format(
         database=self.HIVE_IMPALA_INTEGRATION_DB)
     result = self.run_stmt_in_hive(describe_db_hive)
-    regex = r"{(.*?)=(.*?)}"
+    regex = r"{.*(impala_registered_function.*?)=(.*?)[,}]"
     match = re.search(regex, result)
     func_name = match.group(1)
     func_contents = match.group(2)