Posted to commits@hudi.apache.org by "Vinoth Chandar (Jira)" <ji...@apache.org> on 2021/11/03 15:21:00 UTC
[jira] [Reopened] (HUDI-1441) HoodieAvroUtils - rewrite() is not handling evolution of a nested record field.
[ https://issues.apache.org/jira/browse/HUDI-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinoth Chandar reopened HUDI-1441:
----------------------------------
Assignee: Vinoth Chandar (was: Sagar Sumit)
> HoodieAvroUtils - rewrite() is not handling evolution of a nested record field.
> -------------------------------------------------------------------------------
>
> Key: HUDI-1441
> URL: https://issues.apache.org/jira/browse/HUDI-1441
> Project: Apache Hudi
> Issue Type: Bug
> Components: Common Core
> Reporter: Balajee Nagasubramaniam
> Assignee: Vinoth Chandar
> Priority: Critical
> Labels: pull-request-available, sev:critical
>
> When a schema has a nested record field and one of the fields of that nested record evolves, rewrite() fails with a SchemaCompatibilityException (or an ArrayIndexOutOfBoundsException).
> {{/*
> * OldRecord:                    NewRecord:
> *   field1 : String               field1 : String
> *   field2 : record               field2 : record
> *     field_21 : string             field_21 : string
> *     field_22 : Integer            field_22 : Integer
> *   field3 : Integer                field_23 : String
> *                                   field_24 : Integer
> *                                 field3 : Integer
> *
> * When a nested record has changed/evolved, newRecord.put(field2, oldRecord.get(field2)) is not sufficient:
> * the evolved field must be deep-copied/rewritten against its new schema.
> */}}
> Note 1: When reading the Parquet file using the writer schema, this should not be a problem, as new fields are substituted with null. The issue manifests when the Parquet file is read using the reader schema and the records are written to a new file using the writer schema.
> Note 2: The upsertNode in the Hudi test suite exercises this path (fixed with a workaround in a separate task).
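A dependency-free Java sketch can make the two cases in the description concrete: the shallow copy the comment warns about, and the deep rewrite it asks for. It is a simplified model under stated assumptions, not Hudi's HoodieAvroUtils code or the Avro API. NestedRewriteSketch, Field, Rec, shallowRewrite(), deepRewrite() and write() are hypothetical names. As one plausible mechanism for the ArrayIndexOutOfBoundsException, it assumes that a record stores its values in an array sized by its own schema (as Avro's GenericData.Record does) and that the writer reads nested values by the field positions of the writer schema.

```java
import java.util.Arrays;
import java.util.List;

// Dependency-free model of the bug. Field, Rec, shallowRewrite(), deepRewrite()
// and write() are hypothetical stand-ins for Avro's Schema, GenericData.Record,
// Hudi's rewrite() and a schema-driven writer; they are not the Avro or Hudi API.
class NestedRewriteSketch {

    static final class Field {
        final String name;
        final List<Field> nested; // null for primitive fields
        Field(String name, List<Field> nested) { this.name = name; this.nested = nested; }
    }

    static Field primitive(String name) { return new Field(name, null); }
    static Field recordField(String name, Field... fields) { return new Field(name, Arrays.asList(fields)); }

    // Positional record: the value array is sized by the record's own schema.
    static final class Rec {
        final List<Field> schema;
        final Object[] values;
        Rec(List<Field> schema) { this.schema = schema; this.values = new Object[schema.size()]; }

        int indexOf(String name) {
            for (int i = 0; i < schema.size(); i++) {
                if (schema.get(i).name.equals(name)) return i;
            }
            return -1;
        }

        Object get(String name) { int i = indexOf(name); return i < 0 ? null : values[i]; }
        void put(String name, Object value) { values[indexOf(name)] = value; }
    }

    // Shallow rewrite: values copied by name, nested records by reference.
    static Rec shallowRewrite(Rec oldRec, List<Field> newSchema) {
        Rec out = new Rec(newSchema);
        for (Field f : newSchema) out.put(f.name, oldRec.get(f.name));
        return out;
    }

    // Deep rewrite: nested records are rebuilt against their new nested schema,
    // and fields missing from the old record are left null.
    static Rec deepRewrite(Rec oldRec, List<Field> newSchema) {
        Rec out = new Rec(newSchema);
        for (Field f : newSchema) {
            Object value = oldRec.get(f.name);
            if (f.nested != null && value instanceof Rec) value = deepRewrite((Rec) value, f.nested);
            out.put(f.name, value);
        }
        return out;
    }

    // Writer driven by the writer schema: reads each value by field position.
    static String write(Rec rec, List<Field> writerSchema) {
        StringBuilder sb = new StringBuilder("{");
        for (int pos = 0; pos < writerSchema.size(); pos++) {
            Field f = writerSchema.get(pos);
            Object v = rec.values[pos]; // out of bounds if rec has fewer slots than the schema
            sb.append(f.nested != null && v instanceof Rec ? write((Rec) v, f.nested) : String.valueOf(v));
            sb.append(pos + 1 < writerSchema.size() ? "," : "}");
        }
        return sb.toString();
    }

    // OldRecord and NewRecord schemas from the issue description.
    static final List<Field> OLD_SCHEMA = Arrays.asList(primitive("field1"),
            recordField("field2", primitive("field_21"), primitive("field_22")), primitive("field3"));
    static final List<Field> NEW_SCHEMA = Arrays.asList(primitive("field1"),
            recordField("field2", primitive("field_21"), primitive("field_22"),
                    primitive("field_23"), primitive("field_24")), primitive("field3"));

    static Rec sampleOldRecord() {
        Rec nested = new Rec(OLD_SCHEMA.get(1).nested);
        nested.put("field_21", "a");
        nested.put("field_22", 22);
        Rec old = new Rec(OLD_SCHEMA);
        old.put("field1", "x");
        old.put("field2", nested);
        old.put("field3", 3);
        return old;
    }

    public static void main(String[] args) {
        try {
            write(shallowRewrite(sampleOldRecord(), NEW_SCHEMA), NEW_SCHEMA);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("shallow: " + e.getClass().getSimpleName());
        }
        // prints "deep: {x,{a,22,null,null},3}"
        System.out.println("deep: " + write(deepRewrite(sampleOldRecord(), NEW_SCHEMA), NEW_SCHEMA));
    }
}
```

Running main() prints "shallow: ArrayIndexOutOfBoundsException" and then "deep: {x,{a,22,null,null},3}". In this model the shallow copy carries the old two-slot field2 record into a record whose writer schema expects four nested fields. The deep rewrite avoids this because it copies values by name and rebuilds each nested record from its new schema.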
--
This message was sent by Atlassian Jira
(v8.3.4#803005)