Posted to commits@hudi.apache.org by "Wenning Ding (Jira)" <ji...@apache.org> on 2020/11/08 04:16:00 UTC

[jira] [Updated] (HUDI-1376) Drop metadata columns before Spark datasource processing

     [ https://issues.apache.org/jira/browse/HUDI-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenning Ding updated HUDI-1376:
-------------------------------
    Summary: Drop metadata columns before Spark datasource processing   (was: Remove the schema of metadata columns in the commit files)

> Drop metadata columns before Spark datasource processing 
> ---------------------------------------------------------
>
>                 Key: HUDI-1376
>                 URL: https://issues.apache.org/jira/browse/HUDI-1376
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Wenning Ding
>            Assignee: Wenning Ding
>            Priority: Major
>              Labels: pull-request-available
>
> When a Hudi table is updated through the Spark datasource, the schema of the input dataframe is used as the schema stored in the commit files. As a result, when the upserted rows contain metadata columns, the upsert commit file also stores the schema of those metadata columns, which is unnecessary in common cases. This also causes an issue for bootstrap tables.
> Since the schema of the metadata columns is always the same, we should remove it from the commit file for any insert/upsert/... action.
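
A minimal sketch of the idea in the updated summary, assuming the input is a PySpark dataframe read back from an existing Hudi table. The helper names (non_metadata_columns, drop_metadata_columns) are hypothetical and are not taken from the Hudi code base or from the linked pull request. Only the five _hoodie_* column names are Hudi's own.

```python
# The five metadata columns that Hudi adds to every record.
HOODIE_META_COLUMNS = (
    "_hoodie_commit_time",
    "_hoodie_commit_seqno",
    "_hoodie_record_key",
    "_hoodie_partition_path",
    "_hoodie_file_name",
)


def non_metadata_columns(columns):
    """Return the given column names in order, without Hudi metadata columns.

    Hypothetical helper for illustration only.
    """
    meta = set(HOODIE_META_COLUMNS)
    return [name for name in columns if name not in meta]


def drop_metadata_columns(df):
    """Drop Hudi metadata columns from a PySpark DataFrame before writing.

    Hypothetical helper. DataFrame.drop ignores names that are not present,
    so this is also safe for input that never had the metadata columns.
    """
    return df.drop(*HOODIE_META_COLUMNS)
```

With the metadata columns dropped before the write, the schema derived from the input dataframe, and therefore the schema recorded in the commit file, would contain only the user's columns.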



--
This message was sent by Atlassian Jira
(v8.3.4#803005)