Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2020/06/10 15:37:57 UTC

[GitHub] [incubator-doris] morningman commented on a change in pull request #3819: [Spark load][Fe 4/6] Add hive external table and update hive table syntax in loadstmt

morningman commented on a change in pull request #3819:
URL: https://github.com/apache/incubator-doris/pull/3819#discussion_r438202036



##########
File path: docs/zh-CN/sql-reference/sql-statements/Data Definition/CREATE TABLE.md
##########
@@ -152,6 +152,18 @@ under the License.
         If "path" contains multiple files, separate them with a comma [,]. If a file name contains a comma, use %2c instead; if it contains %, use %25 instead.
         Currently the file content supports the CSV format, and the GZ, BZ2, LZ4, and LZO (LZOP) compression formats.
 
+    3) For hive, the following information must be provided in properties:
+    ```
+    PROPERTIES (
+        "database" = "hive_db_name",
+        "table" = "hive_table_name",
+        "hive.metastore.uris" = "thrift://127.0.0.1:9083"
+    )
+
+    ```
+    Here, database is the name of the database the hive table belongs to, table is the name of the hive table, and hive.metastore.uris is the address of the hive metastore service.
+    Note: currently hive external tables are only used for Spark Load.

Review comment:
       ```suggestion
       Note: currently hive external tables are only for Spark Load
   ```
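For context, a complete statement using these properties might look like the following. This is a hedged sketch: the table name, column list, and `ENGINE = hive` clause are illustrative assumptions; only the PROPERTIES keys come from the patch above.

```sql
-- Hypothetical example of creating a hive external table for Spark Load.
-- Column names and types are illustrative; only the PROPERTIES keys
-- ("database", "table", "hive.metastore.uris") come from the patch above.
CREATE EXTERNAL TABLE example_db.example_hive_table (
    k1 INT,
    k2 VARCHAR(64),
    v1 BIGINT
)
ENGINE = hive
PROPERTIES (
    "database" = "hive_db_name",
    "table" = "hive_table_name",
    "hive.metastore.uris" = "thrift://127.0.0.1:9083"
);
```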

##########
File path: fe/src/main/java/org/apache/doris/load/BrokerFileGroup.java
##########
@@ -165,6 +170,33 @@ public void parse(Database db, DataDescription dataDescription) throws DdlExcept
 
         // FilePath
         filePaths = dataDescription.getFilePaths();
+
+        if (dataDescription.isLoadFromTable()) {
+            String srcTableName = dataDescription.getSrcTableName();
+            // src table should be hive table
+            Table srcTable = db.getTable(srcTableName);
+            if (srcTable == null) {
+                throw new DdlException("Unknown table " + srcTableName + " in database " + db.getFullName());
+            }
+            if (!(srcTable instanceof HiveTable)) {
+                throw new DdlException("Source table " + srcTableName + " is not HiveTable");
+            }
+            // src table columns should include all columns of loaded table

Review comment:
       Is this necessary?
   I think we could allow some of the olap table's columns to be absent from the HIVE table and fill them with default values or null.
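The behavior the reviewer proposes, letting olap columns that are missing from the hive table fall back to a default value or null, could be sketched roughly as below. All class and method names here are hypothetical illustrations, not the actual Doris APIs:

```java
import java.util.*;

// Hypothetical sketch: map each olap column to its hive source column,
// falling back to the column's default value (or null) when the hive
// table does not contain it. Not the real Doris classes.
public class ColumnFill {
    public static Map<String, String> mapColumns(
            List<String> olapColumns,          // columns of the target olap table
            Set<String> hiveColumns,           // columns available in the hive table
            Map<String, String> defaults) {    // per-column default expressions
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String col : olapColumns) {
            if (hiveColumns.contains(col)) {
                mapping.put(col, col);               // load directly from hive
            } else if (defaults.containsKey(col)) {
                mapping.put(col, defaults.get(col)); // fill with default value
            } else {
                mapping.put(col, "NULL");            // otherwise fill with null
            }
        }
        return mapping;
    }

    public static void main(String[] args) {
        Map<String, String> m = mapColumns(
                Arrays.asList("k1", "k2", "v1"),
                new HashSet<>(Arrays.asList("k1", "k2")),
                Collections.singletonMap("v1", "0"));
        System.out.println(m); // k1 and k2 come from hive, v1 from its default
    }
}
```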

##########
File path: fe/src/main/cup/sql_parser.cup
##########
@@ -1244,6 +1244,15 @@ data_desc ::=
         RESULT = new DataDescription(tableName, partitionNames, files, colList, colSep, fileFormat,
         columnsFromPath, isNeg, colMappingList, whereExpr);
     :}
+    | KW_DATA KW_FROM KW_TABLE ident:srcTableName
+    opt_negative:isNeg
+    KW_INTO KW_TABLE ident:tableName
+    opt_partition_names:partitionNames
+    opt_col_mapping_list:colMappingList

Review comment:
       How are the hive table's columns mapped to the olap table's columns?
   What if a column has the same name in both tables?
   
   How about following the [DeltaLake](https://docs.microsoft.com/en-us/azure/databricks/spark/latest/spark-sql/language-manual/copy-into) `COPY INTO` statement and using a `SELECT` statement?
   
   ```
   DATA AS (SELECT xxx FROM hive_table WHERE xxx)
   INTO TABLE olap_table
   PARTITION(p1, p2, ...)
   (k1, k2, k3, v1, v2)   /* indicate the columns of olap table which will be loaded */
   ```
   
   SQL is more flexible, and can easily be used by Spark to read from a hive table.
   
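The proposed form makes the column mapping explicit: the i-th expression in the SELECT list loads into the i-th column of the target list. That positional pairing could be sketched as follows (all names are hypothetical, not Doris code):

```java
import java.util.*;

// Hypothetical sketch of the positional pairing implied by the proposed
//   DATA AS (SELECT ...) INTO TABLE ... (k1, k2, ...) syntax:
// the i-th SELECT expression loads into the i-th target column.
public class SelectIntoMapping {
    public static Map<String, String> pair(List<String> selectExprs,
                                           List<String> targetColumns) {
        if (selectExprs.size() != targetColumns.size()) {
            throw new IllegalArgumentException(
                    "SELECT list and target column list must have the same size");
        }
        Map<String, String> mapping = new LinkedHashMap<>();
        for (int i = 0; i < targetColumns.size(); i++) {
            mapping.put(targetColumns.get(i), selectExprs.get(i));
        }
        return mapping;
    }
}
```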

##########
File path: fe/src/main/java/org/apache/doris/catalog/Catalog.java
##########
@@ -3908,6 +3911,18 @@ private void createBrokerTable(Database db, CreateTableStmt stmt) throws DdlExce
         return;
     }
 
+    private void createHiveTable(Database db, CreateTableStmt stmt) throws DdlException {
+        String tableName = stmt.getTableName();
+        List<Column> columns = stmt.getColumns();
+        long tableId = Catalog.getCurrentCatalog().getNextId();

Review comment:
       ```suggestion
           long tableId = getNextId();
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org