Posted to commits@hudi.apache.org by "Alexey Kudinkin (Jira)" <ji...@apache.org> on 2022/09/02 22:44:00 UTC

[jira] [Created] (HUDI-4772) Revisit dropped Partition Columns handling

Alexey Kudinkin created HUDI-4772:
-------------------------------------

             Summary: Revisit dropped Partition Columns handling
                 Key: HUDI-4772
                 URL: https://issues.apache.org/jira/browse/HUDI-4772
             Project: Apache Hudi
          Issue Type: Bug
          Components: writer-core
    Affects Versions: 0.13.0
            Reporter: Alexey Kudinkin
            Assignee: Alexey Kudinkin


Currently, dropping partition columns (controlled by "hoodie.datasource.write.drop.partition.columns") is handled in a piecemeal fashion, which unfortunately may lead to very subtle and hard-to-troubleshoot issues when used.

For example, in HoodieSparkSqlWriter this currently affects what is persisted as the writer's schema: when partition columns are dropped from the data files, we persist the "reduced" schema as the one that was used by the Writer. This is invalid, since the Writer was using the full schema; the partition columns simply weren't persisted in the data files (i.e., they were dropped because they're already encoded into the partition path).
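
To illustrate the mismatch, here is a minimal sketch in plain Python. It is not Hudi code: schemas are modelled as lists of field names, and the function name, the example fields, and the sample partition column are all hypothetical. Only the config key is taken from the description above.

```python
# Hypothetical model: a "schema" is just a list of field names,
# not a real Avro/Hudi schema object.
DROP_PARTITION_COLUMNS_KEY = "hoodie.datasource.write.drop.partition.columns"


def data_file_schema(writer_schema, partition_columns, options):
    """Return the fields physically stored in the data files."""
    if options.get(DROP_PARTITION_COLUMNS_KEY, "false") == "true":
        # Partition columns are omitted from the files because their
        # values are already encoded in the partition path.
        return [f for f in writer_schema if f not in partition_columns]
    return list(writer_schema)


# Illustrative values only.
writer_schema = ["id", "ts", "value", "region"]
partition_columns = ["region"]
options = {DROP_PARTITION_COLUMNS_KEY: "true"}

stored = data_file_schema(writer_schema, partition_columns, options)

print("writer schema:   ", writer_schema)
print("data-file schema:", stored)
```

In this model the two schemas differ whenever the option is enabled. The issue described above is that the reduced (data-file) schema ends up recorded as if it were the writer's schema, even though the Writer operated on the full one.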



--
This message was sent by Atlassian Jira
(v8.20.10#820010)