Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/05/13 15:15:28 UTC

[GitHub] [incubator-doris] dujl opened a new pull request, #9559: [feature-wip](hudi) Step1: Support create hudi external table

dujl opened a new pull request, #9559:
URL: https://github.com/apache/incubator-doris/pull/9559

   # Proposed changes
   
   
   Issue Number: close https://github.com/apache/incubator-doris/issues/9557
   
   Support creating Hudi external tables in Doris.
   
   This is the first PR toward Hudi external table support.
   
   ## Feature Summary:
   
   ### The purpose of this PR is to:
   - support creating Hudi tables
   - support `SHOW CREATE TABLE` for Hudi tables
   
   ### Design
   1. Create a Hudi table without a schema (recommended)
   ```
       CREATE [EXTERNAL] TABLE table_name
       ENGINE = HUDI
       [COMMENT "comment"]
       PROPERTIES (
       "hudi.database" = "hudi_db_in_hive_metastore",
       "hudi.table" = "hudi_table_in_hive_metastore",
       "hudi.hive.metastore.uris" = "thrift://127.0.0.1:9083"
       );
   ```
   
   2. Create a Hudi table with a schema
   ```
       CREATE [EXTERNAL] TABLE table_name
       [(column_definition1[, column_definition2, ...])]
       ENGINE = HUDI
       [COMMENT "comment"]
       PROPERTIES (
       "hudi.database" = "hudi_db_in_hive_metastore",
       "hudi.table" = "hudi_table_in_hive_metastore",
       "hudi.hive.metastore.uris" = "thrift://127.0.0.1:9083"
       );
   ```
   When a Hudi table is created with a schema, every declared column must exist in the corresponding table in the Hive metastore.
   
   3. Show the create statement of a Hudi table (`SHOW CREATE TABLE`)
   
   ```
   
   MySQL [db_demo]> show create table t_hudi;
   +--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | Table  | Create Table                                                                                                                                                                                                                                                                  |
   +--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | t_hudi | CREATE TABLE `t_hudi` (
   
   ) ENGINE=HUDI
   COMMENT "HUDI"
   PROPERTIES (
   "hudi.database" = "hudi_db",
   "hudi.table" = "hudi_ctas_cow_pt_tbl",
   "hudi.hive.metastore.uris"  =  "thrift://10.248.178.76:9083"
   ) |
   +--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   1 row in set (0.01 sec)
   ```
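The `SHOW CREATE TABLE` output above, for a table created without columns, can be approximated with the minimal sketch below. It only illustrates the layout (empty column list, `ENGINE=HUDI`, comment, ordered properties); the class `ShowCreateHudiSketch` and its method are hypothetical names, not Doris APIs, and spacing around `=` is normalized to one space.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: renders a SHOW CREATE TABLE statement for a Hudi
 * external table with no declared columns. Not the Doris implementation.
 */
public class ShowCreateHudiSketch {

    static String renderCreateTable(String tableName, String comment,
                                    Map<String, String> properties) {
        StringBuilder sb = new StringBuilder();
        // No columns are declared, so the column list is left empty.
        sb.append("CREATE TABLE `").append(tableName).append("` (\n\n");
        sb.append(") ENGINE=HUDI\n");
        sb.append("COMMENT \"").append(comment).append("\"\n");
        sb.append("PROPERTIES (\n");
        int i = 0;
        for (Map.Entry<String, String> e : properties.entrySet()) {
            sb.append('"').append(e.getKey()).append("\" = \"")
              .append(e.getValue()).append('"');
            // Separate entries with commas, except after the last one.
            sb.append(++i < properties.size() ? ",\n" : "\n");
        }
        sb.append(")");
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output is deterministic.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("hudi.database", "hudi_db");
        props.put("hudi.table", "hudi_ctas_cow_pt_tbl");
        props.put("hudi.hive.metastore.uris", "thrift://127.0.0.1:9083");
        System.out.println(renderCreateTable("t_hudi", "HUDI", props));
    }
}
```

An insertion-ordered map is used so the properties print in the order they were declared; a `HashMap` would make the order unpredictable.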
   
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: (No)
   2. Have unit tests been added: (Yes)
   3. Has documentation been added or modified: (Yes)
   4. Does it need to update dependencies: (Yes)
   5. Are there any changes that cannot be rolled back: (No)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] Jibing-Li commented on a diff in pull request #9559: [feature-wip](hudi) Step1: Support create hudi external table

Posted by GitBox <gi...@apache.org>.
Jibing-Li commented on code in PR #9559:
URL: https://github.com/apache/incubator-doris/pull/9559#discussion_r873182981


##########
docs/en/ecosystem/external-table/hudi-external-table.md:
##########
@@ -0,0 +1,137 @@
+---
+{
+    "title": "Doris Hudi external table",
+    "language": "en"
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Hudi External Table of Doris
+
+Hudi External Table of Doris lets Doris access Hudi external tables directly, which removes the need for cumbersome data import and brings Doris' own OLAP capabilities to Hudi table data analysis.
+
+ 1. Support Hudi data sources in Doris
+ 2. Support joint queries between Doris and Hudi data source tables for more complex analysis
+
+This document introduces how to use this feature and the considerations.
+
+## Glossary
+
+### Terms in Doris
+
+* FE: Frontend, the front-end node of Doris, responsible for metadata management and request access
+* BE: Backend, the backend node of Doris, responsible for query execution and data storage
+
+## How to use
+
+### Create Hudi External Table 
+
+Hudi tables can be created in Doris in two ways. You do not need to declare the column definitions of the table when creating an external table; Doris can automatically convert them from the column definitions of the table in the Hive metastore.
+
+1. Create a separate external table to mount the Hudi table.  
+   The syntax can be viewed in `HELP CREATE TABLE`.

Review Comment:
   What is the second way of creating a Hudi table? I don't see it in the doc.





[GitHub] [incubator-doris] qidaye commented on a diff in pull request #9559: [feature-wip](hudi) Step1: Support create hudi external table

Posted by GitBox <gi...@apache.org>.
qidaye commented on code in PR #9559:
URL: https://github.com/apache/incubator-doris/pull/9559#discussion_r874419025


##########
fe/fe-core/src/main/java/org/apache/doris/catalog/Catalog.java:
##########
@@ -4105,6 +4111,44 @@ private void createHiveTable(Database db, CreateTableStmt stmt) throws DdlExcept
         LOG.info("successfully create table[{}-{}]", tableName, tableId);
     }
 
+    private void createHudiTable(Database db, CreateTableStmt stmt) throws DdlException {
+        String tableName = stmt.getTableName();
+        List<Column> columns = stmt.getColumns();
+        long tableId = getNextId();
+        HudiTable hudiTable = new HudiTable(tableId, tableName, columns, stmt.getProperties());

Review Comment:
   Where does the table schema come from if no columns are specified when creating the table?



##########
fe/fe-core/src/test/java/org/apache/doris/analysis/CreateTableStmtTest.java:
##########
@@ -274,4 +274,21 @@ public void testCreateIcebergTable() throws UserException {
                 "\"iceberg.hive.metastore.uris\"  =  \"thrift://127.0.0.1:9087\",\n" +
                 "\"iceberg.table\"  =  \"test\")", stmt.toString());
     }
+
+    @Test
+    public void testCreateHudiTable() throws UserException {

Review Comment:
   It would be better to add another test case with column definitions.





[GitHub] [incubator-doris] github-actions[bot] commented on pull request #9559: [feature-wip](hudi) Step1: Support create hudi external table

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #9559:
URL: https://github.com/apache/incubator-doris/pull/9559#issuecomment-1126838549

   PR approved by anyone and no changes requested.




[GitHub] [incubator-doris] morningman merged pull request #9559: [feature-wip](hudi) Step1: Support create hudi external table

Posted by GitBox <gi...@apache.org>.
morningman merged PR #9559:
URL: https://github.com/apache/incubator-doris/pull/9559




[GitHub] [incubator-doris] github-actions[bot] commented on pull request #9559: [feature-wip](hudi) Step1: Support create hudi external table

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #9559:
URL: https://github.com/apache/incubator-doris/pull/9559#issuecomment-1128342925

   PR approved by at least one committer and no changes requested.




[GitHub] [incubator-doris] dujl commented on a diff in pull request #9559: [feature-wip](hudi) Step1: Support create hudi external table

Posted by GitBox <gi...@apache.org>.
dujl commented on code in PR #9559:
URL: https://github.com/apache/incubator-doris/pull/9559#discussion_r880271438


##########
fe/fe-core/src/test/java/org/apache/doris/analysis/CreateTableStmtTest.java:
##########
@@ -274,4 +274,21 @@ public void testCreateIcebergTable() throws UserException {
                 "\"iceberg.hive.metastore.uris\"  =  \"thrift://127.0.0.1:9087\",\n" +
                 "\"iceberg.table\"  =  \"test\")", stmt.toString());
     }
+
+    @Test
+    public void testCreateHudiTable() throws UserException {

Review Comment:
   Added a test case with column definitions in https://github.com/apache/incubator-doris/pull/9752





[GitHub] [incubator-doris] morningman commented on a diff in pull request #9559: [feature-wip](hudi) Step1: Support create hudi external table

Posted by GitBox <gi...@apache.org>.
morningman commented on code in PR #9559:
URL: https://github.com/apache/incubator-doris/pull/9559#discussion_r872528984


##########
fe/fe-core/src/main/java/org/apache/doris/external/hudi/HudiUtils.java:
##########
@@ -0,0 +1,112 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.external.hudi;
+
+import org.apache.doris.common.DdlException;
+
+import com.google.common.base.Strings;
+import com.google.common.collect.Maps;
+import org.apache.hudi.hadoop.HoodieParquetInputFormat;
+import org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat;
+
+import java.util.Map;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+public class HudiUtils {
+
+    private static final String PROPERTY_MISSING_MSG =
+            "Hudi table %s is null. Please add properties('%s'='xxx') when create table";
+
+    /**
+     * check hudi table properties
+     */
+    public static void validateCreatTable(HudiTable table) throws DdlException {

Review Comment:
   ```suggestion
       public static void validateCreateTable(HudiTable table) throws DdlException {
   ```
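For context on what a create-time check like `validateCreateTable` guards against, here is a minimal standalone sketch that rejects a Hudi table definition missing any of the three properties used in the PR's `CREATE TABLE` examples. It is not the code under review: it takes a plain `Map` instead of a `HudiTable`, throws `IllegalArgumentException` instead of Doris' `DdlException` so it runs on its own, and the class name `HudiPropertyCheckSketch` is hypothetical.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of create-time property validation for a Hudi table.
 * Required keys are taken from the PR's CREATE TABLE examples.
 */
public class HudiPropertyCheckSketch {

    static final List<String> REQUIRED_KEYS = List.of(
            "hudi.database", "hudi.table", "hudi.hive.metastore.uris");

    // Message loosely modeled on PROPERTY_MISSING_MSG in the diff.
    static final String PROPERTY_MISSING_MSG =
            "Hudi table property %s is missing. Please add properties('%s'='xxx') when creating the table";

    /** Throws if any required property is absent or empty. */
    static void validateCreateTable(Map<String, String> properties) {
        for (String key : REQUIRED_KEYS) {
            String value = properties.get(key);
            if (value == null || value.isEmpty()) {
                throw new IllegalArgumentException(String.format(PROPERTY_MISSING_MSG, key, key));
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> ok = Map.of(
                "hudi.database", "hudi_db_in_hive_metastore",
                "hudi.table", "hudi_table_in_hive_metastore",
                "hudi.hive.metastore.uris", "thrift://127.0.0.1:9083");
        validateCreateTable(ok); // passes silently
        try {
            validateCreateTable(Map.of("hudi.database", "hudi_db_in_hive_metastore"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Checking the keys in a fixed order means the error always names the first missing property, which keeps the message stable for tests.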



##########
gensrc/thrift/Descriptors.thrift:
##########
@@ -250,6 +250,12 @@ struct TIcebergTable {
   3: required map<string, string> properties
 }
 
+struct THudiTable {
+  1: required string db_name
+  2: required string table_name
+  3: required map<string, string> properties

Review Comment:
   Use `optional` for all fields.





[GitHub] [incubator-doris] dujl commented on a diff in pull request #9559: [feature-wip](hudi) Step1: Support create hudi external table

Posted by GitBox <gi...@apache.org>.
dujl commented on code in PR #9559:
URL: https://github.com/apache/incubator-doris/pull/9559#discussion_r873190382


##########
docs/en/ecosystem/external-table/hudi-external-table.md:
##########

Review Comment:
   1. Create a Hudi external table without a schema
   2. Create a Hudi external table with a schema
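The second way carries the rule from the PR description that every declared column must exist in the Hive metastore table. A minimal sketch of that check follows; the class and method names are hypothetical, and comparing names case-insensitively is an assumption based on the Hive metastore storing column names in lower case. An empty declared list (the schema-less way) never reports missing columns.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch: every column declared in CREATE TABLE must exist
 * in the Hive metastore table. Not the Doris implementation.
 */
public class HudiColumnCheckSketch {

    /** Returns the declared columns that are absent from the metastore schema. */
    static List<String> findMissingColumns(List<String> declared, List<String> metastoreColumns) {
        // Assumption: names are compared case-insensitively.
        Set<String> known = new HashSet<>();
        for (String c : metastoreColumns) {
            known.add(c.toLowerCase(Locale.ROOT));
        }
        List<String> missing = new ArrayList<>();
        for (String c : declared) {
            if (!known.contains(c.toLowerCase(Locale.ROOT))) {
                missing.add(c);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> metastore = List.of("id", "name", "ts", "dt");
        System.out.println(findMissingColumns(List.of("id", "Name"), metastore));  // []
        System.out.println(findMissingColumns(List.of("id", "price"), metastore)); // [price]
    }
}
```

Returning every missing column, rather than failing on the first one, lets a caller report all bad columns in a single error message.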





[GitHub] [incubator-doris] dujl commented on pull request #9559: [feature-wip](hudi) Step1: Support create hudi external table

Posted by GitBox <gi...@apache.org>.
dujl commented on PR #9559:
URL: https://github.com/apache/incubator-doris/pull/9559#issuecomment-1126178942

   
   > If you wrote `close #9557`, the issue will be closed automatically after this PR is merged. So in this PR you can remove the "close" keyword.
   
   done




[GitHub] [incubator-doris] github-actions[bot] commented on pull request #9559: [feature-wip](hudi) Step1: Support create hudi external table

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #9559:
URL: https://github.com/apache/incubator-doris/pull/9559#issuecomment-1126838545

   PR approved by at least one committer and no changes requested.




[GitHub] [incubator-doris] morningman commented on pull request #9559: [feature-wip](hudi) Step1: Support create hudi external table

Posted by GitBox <gi...@apache.org>.
morningman commented on PR #9559:
URL: https://github.com/apache/incubator-doris/pull/9559#issuecomment-1126175431

   If you wrote `close #9557`, the issue will be closed automatically after this PR is merged.
   So in this PR you can remove the "close" keyword.




[GitHub] [incubator-doris] dujl commented on a diff in pull request #9559: [feature-wip](hudi) Step1: Support create hudi external table

Posted by GitBox <gi...@apache.org>.
dujl commented on code in PR #9559:
URL: https://github.com/apache/incubator-doris/pull/9559#discussion_r880124516


##########
fe/fe-core/src/main/java/org/apache/doris/catalog/Catalog.java:
##########
@@ -4105,6 +4111,44 @@ private void createHiveTable(Database db, CreateTableStmt stmt) throws DdlExcept
         LOG.info("successfully create table[{}-{}]", tableName, tableId);
     }
 
+    private void createHudiTable(Database db, CreateTableStmt stmt) throws DdlException {
+        String tableName = stmt.getTableName();
+        List<Column> columns = stmt.getColumns();
+        long tableId = getNextId();
+        HudiTable hudiTable = new HudiTable(tableId, tableName, columns, stmt.getProperties());

Review Comment:
   The table schema is not fetched when the table is created; it is fetched when the Hudi table is queried.
   Please refer to https://github.com/apache/incubator-doris/pull/9752
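The deferred approach described in this reply (no schema lookup at create time, lookup at query time) can be sketched as follows. This is an illustration under assumptions, not the code from PR #9752: the `Supplier` stands in for a Hive metastore lookup, the class name `LazySchemaTableSketch` is hypothetical, and caching the result after the first lookup is an assumption.

```java
import java.util.List;
import java.util.function.Supplier;

/**
 * Hypothetical sketch of deferred schema resolution: a table created without
 * columns resolves its schema on first use (e.g. at query time) and caches it.
 */
public class LazySchemaTableSketch {
    private final List<String> declaredColumns;
    private final Supplier<List<String>> metastoreLookup;
    private List<String> cachedSchema;

    LazySchemaTableSketch(List<String> declaredColumns, Supplier<List<String>> metastoreLookup) {
        this.declaredColumns = declaredColumns;
        this.metastoreLookup = metastoreLookup;
    }

    /** Returns declared columns if any; otherwise resolves them once via the lookup. */
    synchronized List<String> getSchema() {
        if (!declaredColumns.isEmpty()) {
            return declaredColumns;
        }
        if (cachedSchema == null) {
            cachedSchema = metastoreLookup.get();
        }
        return cachedSchema;
    }

    public static void main(String[] args) {
        int[] lookups = {0};
        LazySchemaTableSketch t = new LazySchemaTableSketch(List.of(), () -> {
            lookups[0]++;
            return List.of("id", "name", "ts");
        });
        // Creating the table did not trigger a lookup; the first query does.
        System.out.println(lookups[0]);    // 0
        System.out.println(t.getSchema()); // [id, name, ts]
        t.getSchema();
        System.out.println(lookups[0]);    // 1
    }
}
```

Caching avoids a metastore round trip on every query, but a cached schema will not reflect later changes to the Hive table; an implementation that must track schema evolution would need a refresh or invalidation step.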


