Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/05/07 11:44:57 UTC

[GitHub] [incubator-doris] deardeng commented on a diff in pull request #9358: fix #9351 can't load parquet file with column name case sensitive with Doris column

deardeng commented on code in PR #9358:
URL: https://github.com/apache/incubator-doris/pull/9358#discussion_r867343045


##########
fe/fe-core/src/main/java/org/apache/doris/load/Load.java:
##########
@@ -1045,12 +1045,23 @@ private static void initColumns(Table tbl, List<ImportColumnDesc> columnExprs,
             return;
         }
 
+        Set<String> tmpSet = Sets.newHashSet();
+        for (ImportColumnDesc importColumnDesc : copiedColumnExprs) {
+            if (importColumnDesc.getExpr() == null) {
+                tmpSet.add(importColumnDesc.getColumnName());
+            }
+        }
+
         // init slot desc add expr map, also transform hadoop functions
         for (ImportColumnDesc importColumnDesc : copiedColumnExprs) {
             // make column name case match with real column name
             String columnName = importColumnDesc.getColumnName();
-            String realColName = tbl.getColumn(columnName) == null ? columnName
-                    : tbl.getColumn(columnName).getName();
+            String realColName;
+            if (tbl.getColumn(columnName) == null || tmpSet.contains(columnName) ){

Review Comment:
   Consider this case:
   
   CREATE TABLE `record` (
    `id` varchar(50) NOT NULL ,
    `SS` varchar(3) NULL 
   ) ENGINE=OLAP
   UNIQUE KEY(`id`)
   DISTRIBUTED BY HASH(`id`) BUCKETS 10;
   
   LOAD LABEL test.record
   ( 
    DATA INFILE ("hdfs://172.0.0.9/tmp/part0.parq")
    INTO TABLE record format as parquet
    (id, ss) 
    SET
    (
       id = id,
       SS = ss
    )
   ) WITH BROKER 'Broker_Doris' ( "username" = "hadoop" );
   
   copiedColumnExprs has entries ("id", "ss", "id = id", "SS = ss"),
   exprMap has entries ("id = id", "SS = ss"), and
   tmpSet has keys ("id", "ss").
   
   1. Without the tmpSet.contains(columnName) check, realColName for the file column ss becomes SS (the table's spelling). SS is sent to the BE, which uses it to match a column named SS in the parquet file. This fails with an error, because the parquet file's metadata contains only a column named ss.
   2. With the tmpSet.contains(columnName) check, realColName stays ss, which matches the column name in the parquet file.
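   The two scenarios can be reproduced with a small standalone model of the resolution loop. This is a sketch under stated assumptions, not Doris code: ColumnDesc, resolve, recordTable and loadColumns are hypothetical names; a case-insensitive TreeMap stands in for tbl.getColumn(...) (scenario 1 implies that lookup ignores case); and the BE is reduced to a case-sensitive check of file-sourced names against the parquet file's column names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class ColumnCaseResolution {

    /** Hypothetical stand-in for ImportColumnDesc: a column name plus an optional SET expression. */
    static final class ColumnDesc {
        final String columnName;
        final String expr; // null when the column is read directly from the file

        ColumnDesc(String columnName, String expr) {
            this.columnName = columnName;
            this.expr = expr;
        }
    }

    /**
     * Resolves a name for each descriptor, modelled on the loop in the diff.
     * tableColumns maps names case-insensitively to the table's real column name
     * (assumed stand-in for tbl.getColumn(name).getName()).
     */
    static List<String> resolve(List<ColumnDesc> descs, Map<String, String> tableColumns,
                                boolean checkFileColumns) {
        // Equivalent of tmpSet: names of columns read straight from the file (no expression).
        Set<String> fileColumns = new HashSet<>();
        for (ColumnDesc d : descs) {
            if (d.expr == null) {
                fileColumns.add(d.columnName);
            }
        }

        List<String> resolved = new ArrayList<>();
        for (ColumnDesc d : descs) {
            String tableName = tableColumns.get(d.columnName);
            if (tableName == null || (checkFileColumns && fileColumns.contains(d.columnName))) {
                // Keep the name exactly as written in the column list (the file's spelling).
                resolved.add(d.columnName);
            } else {
                // Adopt the table's spelling, e.g. "SS" for the SET target SS.
                resolved.add(tableName);
            }
        }
        return resolved;
    }

    /** The `record` table from the review: columns `id` and `SS`, looked up case-insensitively. */
    static Map<String, String> recordTable() {
        Map<String, String> cols = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        cols.put("id", "id");
        cols.put("SS", "SS");
        return cols;
    }

    /** The column list from the LOAD statement: (id, ss) SET (id = id, SS = ss). */
    static List<ColumnDesc> loadColumns() {
        return Arrays.asList(
                new ColumnDesc("id", null),
                new ColumnDesc("ss", null),
                new ColumnDesc("id", "id"),
                new ColumnDesc("SS", "ss"));
    }

    public static void main(String[] args) {
        // Column names stored in the parquet file's metadata (compared case-sensitively).
        Set<String> parquetColumns = new HashSet<>(Arrays.asList("id", "ss"));
        List<ColumnDesc> descs = loadColumns();

        for (boolean check : new boolean[] {false, true}) {
            List<String> names = resolve(descs, recordTable(), check);
            // Simplified BE step: every file-sourced column must exist in the parquet file.
            boolean found = true;
            for (int i = 0; i < descs.size(); i++) {
                if (descs.get(i).expr == null && !parquetColumns.contains(names.get(i))) {
                    found = false;
                }
            }
            System.out.println("check=" + check + " -> " + names + ", file columns found: " + found);
        }
    }
}
```

   Without the check the resolved names are [id, SS, id, SS] and the file-column lookup fails; with it they are [id, ss, id, SS] and the lookup succeeds. The SET target SS still takes the table's spelling because tmpSet is a case-sensitive HashSet that contains only ss.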



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


For additional commands, e-mail: commits-help@doris.apache.org