Posted to commits@hudi.apache.org by "Vinoth Chandar (Jira)" <ji...@apache.org> on 2021/01/21 06:02:00 UTC

[jira] [Updated] (HUDI-1376) Drop Hudi metadata columns before Spark datasource writing

     [ https://issues.apache.org/jira/browse/HUDI-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Chandar updated HUDI-1376:
---------------------------------
    Status: Open  (was: New)

> Drop Hudi metadata columns before Spark datasource writing 
> -----------------------------------------------------------
>
>                 Key: HUDI-1376
>                 URL: https://issues.apache.org/jira/browse/HUDI-1376
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Wenning Ding
>            Assignee: Wenning Ding
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.7.0
>
>
> When a Hudi table is updated through the Spark datasource, the schema of the input dataframe is used as the schema stored in the commit files. As a result, when the upserted rows contain Hudi metadata columns, the upsert commit file also stores the schema of those metadata columns, which is unnecessary in common cases. This also causes an issue for bootstrap tables.
> Since the metadata columns are not used during the Spark datasource writing process, we can drop them at the beginning of the write.
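
A minimal sketch of the idea described above, not the actual patch for this issue. It assumes the five standard Hudi metadata column names and uses two hypothetical helpers, `data_columns` and `drop_hoodie_meta_columns`. The second one relies only on `df.columns` and `df.select(...)`, so it should apply to a Spark dataframe before the write is issued.

```python
# The five record-level metadata columns that Hudi adds to each row.
HOODIE_META_COLUMNS = (
    "_hoodie_commit_time",
    "_hoodie_commit_seqno",
    "_hoodie_record_key",
    "_hoodie_partition_path",
    "_hoodie_file_name",
)


def data_columns(columns):
    """Return the column names with Hudi metadata columns removed, order preserved."""
    meta = set(HOODIE_META_COLUMNS)
    return [c for c in columns if c not in meta]


def drop_hoodie_meta_columns(df):
    """Hypothetical helper: project a dataframe onto its non-metadata columns.

    Uses only `df.columns` and `df.select(*names)`; columns that are not
    Hudi metadata columns are left untouched.
    """
    return df.select(*data_columns(df.columns))
```

With this in place, rows read back from a Hudi table could be passed through `drop_hoodie_meta_columns` before being upserted, so that the schema recorded in the commit file contains only the data columns.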



--
This message was sent by Atlassian Jira
(v8.3.4#803005)