Posted to dev@gobblin.apache.org by "Sudarshan Vasudevan (Jira)" <ji...@apache.org> on 2020/08/03 19:10:05 UTC

[jira] [Updated] (GOBBLIN-877) Add column metadata for partition for inline hive registration

     [ https://issues.apache.org/jira/browse/GOBBLIN-877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sudarshan Vasudevan updated GOBBLIN-877:
----------------------------------------
    Fix Version/s: 0.15.0

> Add column metadata for partition for inline hive registration
> --------------------------------------------------------------
>
>                 Key: GOBBLIN-877
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-877
>             Project: Apache Gobblin
>          Issue Type: Task
>            Reporter: Zihan Li
>            Priority: Major
>             Fix For: 0.15.0
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Previously, we removed schema.literal from partitions because Avro schemas should _only_ be defined at the table level. Hive overrides table properties if the same property is defined on the partition, so defining them at the partition level may lead to partitions with inconsistent schemas. Because column metadata is calculated from schema.literal, we removed the column metadata as well.
> We then encountered a problem: Presto cannot read data from ORC files, because ORC (and other Hive serdes) need column metadata in the partitions so that coercion can be done between the partition schema and the table schema.
> So we need to treat Avro and other formats separately, so that Hive registration works correctly and users can read the right data from Presto.
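
The format-dependent handling described above could be sketched roughly as follows. This is only an illustration, not Gobblin's actual implementation: the class PartitionMetadataSketch and its methods are hypothetical, columns are modeled as plain strings, and it assumes the schema.literal mentioned in the description corresponds to Hive's avro.schema.literal property.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from the Gobblin code base.
final class PartitionMetadataSketch {
    // Hive's Avro SerDe class name and its schema-literal property (assumed to be
    // what the issue calls "schema.literal").
    static final String AVRO_SERDE = "org.apache.hadoop.hive.serde2.avro.AvroSerDe";
    static final String SCHEMA_LITERAL = "avro.schema.literal";

    static boolean isAvro(String serdeClass) {
        return AVRO_SERDE.equals(serdeClass);
    }

    // Avro: drop the schema literal so only the table-level schema applies.
    // Other formats: copy the properties unchanged.
    static Map<String, String> partitionProps(String serdeClass, Map<String, String> tableProps) {
        Map<String, String> props = new HashMap<>(tableProps);
        if (isAvro(serdeClass)) {
            props.remove(SCHEMA_LITERAL);
        }
        return props;
    }

    // Avro: no partition-level columns, since they derive from the schema literal.
    // Other formats (e.g. ORC): keep the columns so readers can coerce
    // the partition schema to the table schema.
    static List<String> partitionColumns(String serdeClass, List<String> tableColumns) {
        return isAvro(serdeClass) ? new ArrayList<>() : new ArrayList<>(tableColumns);
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("id", "name");
        System.out.println(partitionColumns(AVRO_SERDE, columns));
        System.out.println(partitionColumns("org.apache.hadoop.hive.ql.io.orc.OrcSerde", columns));
    }
}
```

The point of the sketch is the branch on the serde: Avro partitions carry neither the schema literal nor columns, while every other format keeps its column metadata at the partition level.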



--
This message was sent by Atlassian Jira
(v8.3.4#803005)