Posted to commits@hudi.apache.org by "Yann Byron (Jira)" <ji...@apache.org> on 2022/02/10 04:04:00 UTC

[jira] [Updated] (HUDI-3338) Use custom relation instead of HadoopFsRelation

     [ https://issues.apache.org/jira/browse/HUDI-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yann Byron updated HUDI-3338:
-----------------------------
    Status: Patch Available  (was: In Progress)

> Use custom relation instead of HadoopFsRelation
> -----------------------------------------------
>
>                 Key: HUDI-3338
>                 URL: https://issues.apache.org/jira/browse/HUDI-3338
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: spark, spark-sql
>            Reporter: Yann Byron
>            Assignee: Yann Byron
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 0.11.0
>
>
> For HUDI-3204, a COW table, and a MOR table in read_optimized query mode, should return the origin `data_date` in its 'yyyy-MM-dd' format, not 'yyyy/MM/dd'.
> The reason is that Hudi uses HadoopFsRelation for the snapshot query mode of COW and the read_optimized query mode of MOR.
> Spark's HadoopFsRelation appends the partition value parsed from the physical partition path. However, unlike a normal table, Hudi persists the partition value in the parquet file itself, so we just need to read the partition value from the parquet file rather than leave it to Spark.
> So we should no longer use `HadoopFsRelation`, and instead implement Hudi's own `Relation` to handle this.
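To make the mismatch concrete, here is a minimal, Hudi-free sketch in plain Python (the paths, column value, and helper names are illustrative assumptions, not Hudi code): a path-based relation infers the partition column's value from the directory layout, while a file-based read returns the value persisted inside the parquet file, so the two disagree when the partitioning reshapes the column (e.g. 'yyyy-MM-dd' written out as 'yyyy/MM/dd' directories).

```python
# Simplified model of the discrepancy described above (illustrative only).
# A record's original value: data_date = "2022-02-10" (yyyy-MM-dd).
# Hudi writes it under a slash-partitioned path: .../2022/02/10/file.parquet

RECORD_VALUE_IN_PARQUET = "2022-02-10"  # value Hudi persists inside the file
PARTITION_PATH = "s3://bucket/tbl/2022/02/10/part-0001.parquet"

def value_from_path(path: str) -> str:
    """Mimic HadoopFsRelation: derive the partition value from the directory path."""
    parts = path.split("/")
    # The three directory segments before the file name form the partition path.
    return "/".join(parts[-4:-1])

def value_from_file() -> str:
    """Mimic a custom relation: return the value persisted in the parquet file."""
    return RECORD_VALUE_IN_PARQUET

print(value_from_path(PARTITION_PATH))  # "2022/02/10" -- the wrong, path-derived value
print(value_from_file())                # "2022-02-10" -- the expected origin value
```

The fix the issue proposes amounts to always taking the `value_from_file` branch: since Hudi already stores the column in the data files, a Hudi-owned `Relation` can skip Spark's path-based partition-value appending entirely.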



--
This message was sent by Atlassian Jira
(v8.20.1#820001)