Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2019/10/31 01:22:38 UTC

[GitHub] [incubator-hudi] umehrot2 commented on a change in pull request #972: [HUDI-313] Fix select count star error when querying a realtime table

URL: https://github.com/apache/incubator-hudi/pull/972#discussion_r340926632
 
 

 ##########
 File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieParquetRealtimeInputFormat.java
 ##########
 @@ -197,10 +197,27 @@ private static synchronized Configuration addRequiredProjectionFields(Configurat
     return configuration;
   }
 
+  /**
+   * Hive will append read columns' ids to old columns' ids during getRecordReader. In some cases, e.g. SELECT COUNT(*),
+   * the read columns' id is an empty string and Hive will combine it with Hoodie required projection ids and becomes
+   * e.g. ",2,0,3" and will cause an error. This method is used to avoid this situation.
+   */
 
 Review comment:
   As we also discussed internally, this appears to be a bug in `Hive`. It manifests because `Hudi` needs to append its minimum set of projection columns, i.e. its metadata columns, even in the case of a `count` query.
   
   But ideally this needs to be fixed in Hive so it does not happen in the first place. https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java#L119
   
   Can we file a Jira with Hive and add a link to it in the comment here?
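
For anyone reading along, here is a rough sketch of the failure mode as I understand it from the Javadoc above. It is a simplified model, not Hive's or Hudi's actual code: the class `ReadColumnIdsSketch` and the methods `combine`, `stripLeadingCommas` and `parseIds` are hypothetical names made up for illustration. It assumes the new ids are joined to the existing ids with a comma without checking for an empty string, and that a consumer later parses every token as an integer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Hive or Hudi.
final class ReadColumnIdsSketch {

    // Assumed model of the problem: the ids are joined with a comma
    // without checking whether the new ids string is empty.
    static String combine(String newIds, String existingIds) {
        return newIds + "," + existingIds;
    }

    // One possible workaround: drop any leading commas before the value is parsed.
    static String stripLeadingCommas(String ids) {
        int start = 0;
        while (start < ids.length() && ids.charAt(start) == ',') {
            start++;
        }
        return ids.substring(start);
    }

    // A strict parser that treats every comma-separated token as an integer.
    // An empty token makes Integer.parseInt throw NumberFormatException.
    static List<Integer> parseIds(String ids) {
        List<Integer> result = new ArrayList<>();
        for (String token : ids.split(",")) {
            result.add(Integer.parseInt(token));
        }
        return result;
    }

    public static void main(String[] args) {
        // SELECT COUNT(*) reads no columns, so the new ids string is empty.
        String combined = combine("", "2,0,3");
        System.out.println("combined: " + combined);

        String cleaned = stripLeadingCommas(combined);
        System.out.println("parsed after cleanup: " + parseIds(cleaned));
    }
}
```

A cleanup like `stripLeadingCommas` only works around the symptom on the Hudi side. Guarding against the empty string where the ids are combined would avoid producing the malformed value at all, which is why I think the proper fix belongs in Hive.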
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services