You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by mo...@apache.org on 2023/01/10 16:06:32 UTC

[doris] 01/03: [improvement](multi-catalog) support unsupported column type (#15660)

This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch branch-1.2-lts
in repository https://gitbox.apache.org/repos/asf/doris.git

commit dfe1e70e87d01bd160f43d88d7f1d671e29d6aac
Author: Mingyu Chen <mo...@163.com>
AuthorDate: Sun Jan 8 10:07:10 2023 +0800

    [improvement](multi-catalog) support unsupported column type (#15660)
    
    When creating an external catalog, Doris will automatically sync the schema of table from external catalog.
    But some of column type are not supported by Doris now, such as struct, map, etc.
    
    In previous, when meeting these unsupported column, Doris will throw an exception, and the corresponding
    table can not be synced. But user may just want to query other supported columns.
    
    In this PR, I add a new column type: UNSUPPORTED. And now it is just used for external table schema sync.
    When meeting unsupported column, it will be synced as column with UNSUPPORTED type.
    
    When query this table, there are serval situation:
    
    select * from table: throw error Unsupported type 'UNSUPPORTED_TYPE' xxx
    select k1 from table: k1 is with supported type. query OK.
    select * except(k2): k2 is with unsupported type. query OK
---
 .../hive/scripts/create_preinstalled_table.hql     | 12 ++++++++++
 docs/en/docs/advanced/variables.md                 |  4 +++-
 .../docs/ecosystem/external-table/multi-catalog.md | 28 ++++++++++++++++++++--
 .../docs/ecosystem/external-table/multi-catalog.md | 28 ++++++++++++++++++++--
 .../doris/catalog/HiveMetaStoreClientHelper.java   |  2 +-
 .../org/apache/doris/catalog/PrimitiveType.java    |  1 +
 .../java/org/apache/doris/catalog/ScalarType.java  |  1 +
 .../main/java/org/apache/doris/catalog/Type.java   |  6 ++---
 .../doris/external/elasticsearch/EsUtil.java       |  3 ++-
 .../org/apache/doris/external/jdbc/JdbcClient.java |  6 ++---
 gensrc/thrift/Types.thrift                         |  1 +
 .../external_catalog_p0/hive/test_hive_other.out   | 12 ++++++++++
 .../hive/test_hive_other.groovy                    | 20 +++++++++++++++-
 13 files changed, 109 insertions(+), 15 deletions(-)

diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_table.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_table.hql
index dfab9b0d1f..8bb9f456b8 100644
--- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_table.hql
+++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_table.hql
@@ -274,4 +274,16 @@ TBLPROPERTIES (
 
 msck repair table table_with_x01;
 
+CREATE TABLE `unsupported_type_table`(
+  k1 int,
+  k2 string,
+  k3 double,
+  k4 map<string,int>,
+  k5 STRUCT<
+            houseno:    STRING
+           ,streetname: STRING
+           >,
+  k6 int
+);
+
 show tables;
diff --git a/docs/en/docs/advanced/variables.md b/docs/en/docs/advanced/variables.md
index be384315e5..adbaddc51e 100644
--- a/docs/en/docs/advanced/variables.md
+++ b/docs/en/docs/advanced/variables.md
@@ -555,4 +555,6 @@ Translated with www.DeepL.com/Translator (free version)
 	Password strength verification policy. Defaults to `NONE` or `0`, i.e. no verification. Can be set to `STRONG` or `2`. When set to `STRONG` or `2`, when setting a password via the `ALTER USER` or `SET PASSWORD` commands, the password must contain any of "uppercase letters", "lowercase letters", "numbers" and "special characters". 3 items, and the length must be greater than or equal to 8. Special characters include: `~!@#$%^&*()_+|<>,.?/:;'[]{}"`.
 
 * `group_concat_max_len`
-    For compatible purpose. This variable has no effect, just enable some BI tools can query or set this session variable sucessfully.
\ No newline at end of file
+
+    For compatible purpose. This variable has no effect, just enable some BI tools can query or set this session variable sucessfully.
+
diff --git a/docs/en/docs/ecosystem/external-table/multi-catalog.md b/docs/en/docs/ecosystem/external-table/multi-catalog.md
index dc93e96312..426a3dd95a 100644
--- a/docs/en/docs/ecosystem/external-table/multi-catalog.md
+++ b/docs/en/docs/ecosystem/external-table/multi-catalog.md
@@ -24,10 +24,11 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-<version since="1.2.0">
 
 # Multi-Catalog
 
+<version since="1.2.0">
+
 Multi-Catalog is a feature introduced in Doris 1.2.0, which aims to make it easier to interface with external data sources to enhance Doris' data lake analysis and federated data query capabilities.
 
 In previous versions of Doris, there were only two levels of user data: Database and Table. When we need to connect to an external data source, we can only connect at the Database or Table level. For example, create a mapping of a table in an external data source through `create external table`, or map a Database in an external data source through `create external database`. If there are too many Databases or Tables in the external data source, users need to manually map them one by one, [...]
@@ -39,6 +40,8 @@ The new Multi-Catalog function adds a new layer of Catalog to the original metad
 
 This function will be used as a supplement and enhancement to the previous external table connection method (External Table) to help users perform fast multi-catalog federated queries.
 
+</version>
+
 ## Basic Concepts
 
 1. Internal Catalog
@@ -594,6 +597,28 @@ MySQL [db1]> select * from tbl1;
 
 After the user creates the catalog, Doris will automatically synchronize the database and tables of the data catalog. For different data catalog and data table formats, Doris will perform the following mapping relationships.
 
+<version since="dev">
+
+For types that cannot currently be mapped to Doris column types, such as map, struct, etc. Doris will map the column type to UNSUPPORTED type. For queries of type UNSUPPORTED, an example is as follows:
+
+Suppose the table schema after synchronization is:
+
+```
+k1 INT,
+k2 INT,
+k3 UNSUPPORTED,
+k4 INT
+```
+
+```
+select * from table;                // Error: Unsupported type 'UNSUPPORTED_TYPE' in '`k3`
+select * except(k3) from table;     // Query OK.
+select k1, k3 from table;           // Error: Unsupported type 'UNSUPPORTED_TYPE' in '`k3`
+select k1, k4 from table;           // Query OK.
+```
+
+</version>
+
 ### Hive MetaStore
 
 For Hive/Iceberge/Hudi
@@ -703,4 +728,3 @@ Currently, users need to manually refresh metadata via the [REFRESH CATALOG](../
 
 Automatic synchronization of metadata will be supported soon.
 
-</version>
diff --git a/docs/zh-CN/docs/ecosystem/external-table/multi-catalog.md b/docs/zh-CN/docs/ecosystem/external-table/multi-catalog.md
index 8abd2a5586..1eac90f64d 100644
--- a/docs/zh-CN/docs/ecosystem/external-table/multi-catalog.md
+++ b/docs/zh-CN/docs/ecosystem/external-table/multi-catalog.md
@@ -24,10 +24,11 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-<version since="1.2.0">
 
 # 多源数据目录
 
+<version since="1.2.0">
+
 多源数据目录(Multi-Catalog)是 Doris 1.2.0 版本中推出的功能,旨在能够更方便对接外部数据目录,以增强Doris的数据湖分析和联邦数据查询能力。
 
 在之前的 Doris 版本中,用户数据只有两个层级:Database 和 Table。当我们需要连接一个外部数据目录时,我们只能在Database 或 Table 层级进行对接。比如通过 `create external table` 的方式创建一个外部数据目录中的表的映射,或通过 `create external database` 的方式映射一个外部数据目录中的 Database。 如果外部数据目录中的 Database 或 Table 非常多,则需要用户手动进行一一映射,使用体验不佳。
@@ -39,6 +40,8 @@ under the License.
 
 该功能将作为之前外表连接方式(External Table)的补充和增强,帮助用户进行快速的多数据目录联邦查询。
 
+</version>
+
 ## 基础概念
 
 1. Internal Catalog
@@ -593,6 +596,28 @@ MySQL [db1]> select * from tbl1;
 
 用户创建 Catalog 后,Doris 会自动同步数据目录的数据库和表,针对不同的数据目录和数据表格式,Doris 会进行以下列映射关系。
 
+<version since="dev">
+
+对于当前无法映射到 Doris 列类型的外表类型,如 map,struct 等。Doris 会将列类型映射为 UNSUPPORTED 类型。对于 UNSUPPORTED 类型的查询,示例如下:
+
+假设同步后的表 schema 为:
+
+```
+k1 INT,
+k2 INT,
+k3 UNSUPPORTED,
+k4 INT
+```
+
+```
+select * from table;                // Error: Unsupported type 'UNSUPPORTED_TYPE' in '`k3`
+select * except(k3) from table;     // Query OK.
+select k1, k3 from table;           // Error: Unsupported type 'UNSUPPORTED_TYPE' in '`k3`
+select k1, k4 from table;           // Query OK.
+```
+
+</version>
+
 ### Hive MetaStore
 
 适用于 Hive/Iceberge/Hudi
@@ -702,4 +727,3 @@ Doris 的权限管理功能提供了对 Cataloig 层级的扩展,具体可参
 
 后续会支持元数据的自动同步。
 
-</version>
diff --git a/fe/fe-core/src/main/java/org/apache/doris/catalog/HiveMetaStoreClientHelper.java b/fe/fe-core/src/main/java/org/apache/doris/catalog/HiveMetaStoreClientHelper.java
index e43cecb0c7..0e88ec9e0d 100644
--- a/fe/fe-core/src/main/java/org/apache/doris/catalog/HiveMetaStoreClientHelper.java
+++ b/fe/fe-core/src/main/java/org/apache/doris/catalog/HiveMetaStoreClientHelper.java
@@ -727,7 +727,7 @@ public class HiveMetaStoreClientHelper {
             }
             return ScalarType.createDecimalType(precision, scale);
         }
-        return Type.INVALID;
+        return Type.UNSUPPORTED;
     }
 
     public static String showCreateTable(org.apache.hadoop.hive.metastore.api.Table remoteTable) {
diff --git a/fe/fe-core/src/main/java/org/apache/doris/catalog/PrimitiveType.java b/fe/fe-core/src/main/java/org/apache/doris/catalog/PrimitiveType.java
index c951a481f5..6dcdf14b70 100644
--- a/fe/fe-core/src/main/java/org/apache/doris/catalog/PrimitiveType.java
+++ b/fe/fe-core/src/main/java/org/apache/doris/catalog/PrimitiveType.java
@@ -31,6 +31,7 @@ import java.util.List;
 
 public enum PrimitiveType {
     INVALID_TYPE("INVALID_TYPE", -1, TPrimitiveType.INVALID_TYPE),
+    UNSUPPORTED("UNSUPPORTED_TYPE", -1, TPrimitiveType.UNSUPPORTED),
     // NULL_TYPE - used only in LiteralPredicate and NullLiteral to make NULLs compatible
     // with all other types.
     NULL_TYPE("NULL_TYPE", 1, TPrimitiveType.NULL_TYPE),
diff --git a/fe/fe-core/src/main/java/org/apache/doris/catalog/ScalarType.java b/fe/fe-core/src/main/java/org/apache/doris/catalog/ScalarType.java
index 446c73f58e..4a58b2457d 100644
--- a/fe/fe-core/src/main/java/org/apache/doris/catalog/ScalarType.java
+++ b/fe/fe-core/src/main/java/org/apache/doris/catalog/ScalarType.java
@@ -764,6 +764,7 @@ public class ScalarType extends Type {
     public boolean isSupported() {
         switch (type) {
             case BINARY:
+            case UNSUPPORTED:
                 return false;
             default:
                 return true;
diff --git a/fe/fe-core/src/main/java/org/apache/doris/catalog/Type.java b/fe/fe-core/src/main/java/org/apache/doris/catalog/Type.java
index 774ead88c9..34b9cb49b8 100644
--- a/fe/fe-core/src/main/java/org/apache/doris/catalog/Type.java
+++ b/fe/fe-core/src/main/java/org/apache/doris/catalog/Type.java
@@ -51,6 +51,7 @@ public abstract class Type {
 
     // Static constant types for scalar types that don't require additional information.
     public static final ScalarType INVALID = new ScalarType(PrimitiveType.INVALID_TYPE);
+    public static final ScalarType UNSUPPORTED = new ScalarType(PrimitiveType.UNSUPPORTED);
     public static final ScalarType NULL = new ScalarType(PrimitiveType.NULL_TYPE);
     public static final ScalarType BOOLEAN = new ScalarType(PrimitiveType.BOOLEAN);
     public static final ScalarType TINYINT = new ScalarType(PrimitiveType.TINYINT);
@@ -1409,7 +1410,6 @@ public abstract class Type {
         compatibilityMatrix[DECIMAL128.ordinal()][DECIMAL32.ordinal()] = PrimitiveType.DECIMAL128;
         compatibilityMatrix[DECIMAL128.ordinal()][DECIMAL64.ordinal()] = PrimitiveType.DECIMAL128;
 
-
         // HLL
         compatibilityMatrix[HLL.ordinal()][TIME.ordinal()] = PrimitiveType.INVALID_TYPE;
         compatibilityMatrix[HLL.ordinal()][TIMEV2.ordinal()] = PrimitiveType.INVALID_TYPE;
@@ -1445,7 +1445,6 @@ public abstract class Type {
         compatibilityMatrix[QUANTILE_STATE.ordinal()][DECIMAL64.ordinal()] = PrimitiveType.INVALID_TYPE;
         compatibilityMatrix[QUANTILE_STATE.ordinal()][DECIMAL128.ordinal()] = PrimitiveType.INVALID_TYPE;
 
-
         // TIME why here not???
         compatibilityMatrix[TIME.ordinal()][TIME.ordinal()] = PrimitiveType.INVALID_TYPE;
         compatibilityMatrix[TIME.ordinal()][TIMEV2.ordinal()] = PrimitiveType.INVALID_TYPE;
@@ -1475,7 +1474,8 @@ public abstract class Type {
                         || t1 == PrimitiveType.TIME || t2 == PrimitiveType.TIME
                         || t1 == PrimitiveType.TIMEV2 || t2 == PrimitiveType.TIMEV2
                         || t1 == PrimitiveType.MAP || t2 == PrimitiveType.MAP
-                        || t1 == PrimitiveType.STRUCT || t2 == PrimitiveType.STRUCT) {
+                        || t1 == PrimitiveType.STRUCT || t2 == PrimitiveType.STRUCT
+                        || t1 == PrimitiveType.UNSUPPORTED || t2 == PrimitiveType.UNSUPPORTED) {
                     continue;
                 }
                 Preconditions.checkNotNull(compatibilityMatrix[i][j]);
diff --git a/fe/fe-core/src/main/java/org/apache/doris/external/elasticsearch/EsUtil.java b/fe/fe-core/src/main/java/org/apache/doris/external/elasticsearch/EsUtil.java
index 7cf22cfda3..0cde72067d 100644
--- a/fe/fe-core/src/main/java/org/apache/doris/external/elasticsearch/EsUtil.java
+++ b/fe/fe-core/src/main/java/org/apache/doris/external/elasticsearch/EsUtil.java
@@ -440,8 +440,9 @@ public class EsUtil {
             case "ip":
             case "nested":
             case "object":
-            default:
                 return ScalarType.createStringType();
+            default:
+                return Type.UNSUPPORTED;
         }
     }
 
diff --git a/fe/fe-core/src/main/java/org/apache/doris/external/jdbc/JdbcClient.java b/fe/fe-core/src/main/java/org/apache/doris/external/jdbc/JdbcClient.java
index 9b1bd31975..2da7cd38ff 100644
--- a/fe/fe-core/src/main/java/org/apache/doris/external/jdbc/JdbcClient.java
+++ b/fe/fe-core/src/main/java/org/apache/doris/external/jdbc/JdbcClient.java
@@ -440,8 +440,7 @@ public class JdbcClient {
             case "ENUM":
                 return ScalarType.createStringType();
             default:
-                throw new JdbcClientException("Can not convert mysql data type to doris data type for type ["
-                        + mysqlType + "]");
+                return Type.UNSUPPORTED;
         }
     }
 
@@ -505,8 +504,7 @@ public class JdbcClient {
             case "uuid":
                 return ScalarType.createStringType();
             default:
-                throw new JdbcClientException("Can not convert postgresql data type to doris data type for type ["
-                        + pgType + "]");
+                return Type.UNSUPPORTED;
         }
     }
 
diff --git a/gensrc/thrift/Types.thrift b/gensrc/thrift/Types.thrift
index 08e8e69bf1..589b95f77c 100644
--- a/gensrc/thrift/Types.thrift
+++ b/gensrc/thrift/Types.thrift
@@ -91,6 +91,7 @@ enum TPrimitiveType {
   DECIMAL64,
   DECIMAL128I,
   JSONB,
+  UNSUPPORTED
 }
 
 enum TTypeNodeType {
diff --git a/regression-test/data/external_catalog_p0/hive/test_hive_other.out b/regression-test/data/external_catalog_p0/hive/test_hive_other.out
index 05c25b0e19..0ce6861e34 100644
--- a/regression-test/data/external_catalog_p0/hive/test_hive_other.out
+++ b/regression-test/data/external_catalog_p0/hive/test_hive_other.out
@@ -224,6 +224,10 @@ zyLjAtVdXV	GrJRf8WvRR
 2022-11-10	2022-11-10	zvbIuF1cg8	33ZkOc8Fl7	2022-11-10
 2022-11-10	2022-11-10	zyLjAtVdXV	GrJRf8WvRR	2022-11-10
 
+-- !q51 --
+
+-- !q52 --
+
 -- !q24 --
 zhangsan	1
 lisi	1
@@ -444,6 +448,10 @@ zyLjAtVdXV	GrJRf8WvRR
 2022-11-10	2022-11-10	zvbIuF1cg8	33ZkOc8Fl7	2022-11-10
 2022-11-10	2022-11-10	zyLjAtVdXV	GrJRf8WvRR	2022-11-10
 
+-- !q51 --
+
+-- !q52 --
+
 -- !q24 --
 zhangsan	1
 lisi	1
@@ -664,6 +672,10 @@ zyLjAtVdXV	GrJRf8WvRR
 2022-11-10	2022-11-10	zvbIuF1cg8	33ZkOc8Fl7	2022-11-10
 2022-11-10	2022-11-10	zyLjAtVdXV	GrJRf8WvRR	2022-11-10
 
+-- !q51 --
+
+-- !q52 --
+
 -- !after_refresh --
 2022-11-24	2022-11-24	zVhHON3m5L	ecmOZhvo5C	2022-11-24
 2022-11-24	2022-11-24	zceLX7IpL5	JZtsmIjKGQ	2022-11-24
diff --git a/regression-test/suites/external_catalog_p0/hive/test_hive_other.groovy b/regression-test/suites/external_catalog_p0/hive/test_hive_other.groovy
index 2d8e85e461..df2d489e0e 100644
--- a/regression-test/suites/external_catalog_p0/hive/test_hive_other.groovy
+++ b/regression-test/suites/external_catalog_p0/hive/test_hive_other.groovy
@@ -47,8 +47,26 @@ suite("test_hive_other", "p0") {
         order_qt_q48 """ select k2, k5 from table_with_x01 where dt in ('2022-11-25') order by k2 desc limit 10;"""
         order_qt_q49 """ select k2, k5 from table_with_x01 where dt in ('2022-11-10', '2022-11-10') order by k2 desc limit 10;"""
         order_qt_q50 """ select dt, dt, k2, k5, dt from table_with_x01 where dt in ('2022-11-10') or dt in ('2022-11-10') order by k2 desc limit 10;"""
-    }
 
+        test {
+            sql """select * from unsupported_type_table"""
+            exception """Unsupported type 'UNSUPPORTED_TYPE'"""
+        }
+
+        qt_q51 """select * except(k4,k5) from unsupported_type_table;"""
+
+        test {
+            sql """select k1,k4 from unsupported_type_table"""
+            exception """Unsupported type 'UNSUPPORTED_TYPE'"""
+        }
+
+        test {
+            sql """select k1 from unsupported_type_table where k4 is null;"""
+            exception """Unsupported type 'UNSUPPORTED_TYPE'"""
+        }
+
+        qt_q52 """select k1,k3,k6 from unsupported_type_table"""
+    }
 
     String enabled = context.config.otherConfigs.get("enableHiveTest")
     if (enabled != null && enabled.equalsIgnoreCase("true")) {


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org