Posted to commits@hudi.apache.org by "Vinoth Chandar (Jira)" <ji...@apache.org> on 2021/11/03 15:21:00 UTC

[jira] [Reopened] (HUDI-1441) HoodieAvroUtils - rewrite() is not handling evolution of a nested record field.

     [ https://issues.apache.org/jira/browse/HUDI-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Chandar reopened HUDI-1441:
----------------------------------
    Assignee: Vinoth Chandar  (was: Sagar Sumit)

> HoodieAvroUtils - rewrite() is not handling evolution of a nested record field.
> -------------------------------------------------------------------------------
>
>                 Key: HUDI-1441
>                 URL: https://issues.apache.org/jira/browse/HUDI-1441
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Common Core
>            Reporter: Balajee Nagasubramaniam
>            Assignee: Vinoth Chandar
>            Priority: Critical
>              Labels: pull-request-available, sev:critical
>
> When a schema has a nested record field and one of the fields of that nested record evolves, rewrite() fails with a SchemaCompatibilityException (or an ArrayIndexOutOfBoundsException).
> {{/*
>    *  OldRecord:                     NewRecord:
>    *      field1 : String                field1 : String
>    *      field2 : record                field2 : record
>    *         field_21 : String              field_21 : String
>    *         field_22 : Integer             field_22 : Integer
>    *      field3 : Integer                  field_23 : String
>    *                                        field_24 : Integer
>    *                                     field3 : Integer
>    *
>    *  When a nested record has evolved, newRecord.put(field2, oldRecord.get(field2)) is not sufficient;
>    *  the evolved field requires a deep copy/rewrite.
>    */}}
> Note 1:  Reading the Parquet file with the writer schema should not be a problem, because new fields are filled in with null.  The issue manifests when the Parquet file is read with the reader schema and the records are then written to a new file with the writer schema.
> Note 2:  The Hudi test suite's upsertNode exercises this path (fixed with a workaround in a separate task).
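
The deep-copy/rewrite described above can be sketched roughly as follows. This is only an illustration of the idea, not the actual HoodieAvroUtils code and not the Avro API: schemas and records are modeled as plain java.util.Map instances, the class name NestedRewriteSketch and its methods are hypothetical, and fields missing from the old record are assumed to be nullable and simply filled with null.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in, NOT the Avro or Hudi API.
// A "schema" maps field name -> type name (String), or -> a nested schema (Map)
// when the field is itself a record. A "record" maps field name -> value.
class NestedRewriteSketch {

    // Rebuild oldRecord against newSchema, recursing into nested records so that
    // fields added inside a nested record are materialized (as null) as well.
    @SuppressWarnings("unchecked")
    static Map<String, Object> rewrite(Map<String, Object> oldRecord,
                                       Map<String, Object> newSchema) {
        Map<String, Object> newRecord = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : newSchema.entrySet()) {
            String name = field.getKey();
            Object fieldSchema = field.getValue();
            Object oldValue = oldRecord.get(name); // null if the field is new
            if (fieldSchema instanceof Map && oldValue instanceof Map) {
                // Nested record: a shallow put(name, oldValue) would keep the old
                // shape, so rewrite the nested value against the nested schema.
                newRecord.put(name, rewrite((Map<String, Object>) oldValue,
                                            (Map<String, Object>) fieldSchema));
            } else {
                // Primitive field, or a field absent from the old record.
                // Assumption: new fields are nullable and default to null.
                newRecord.put(name, oldValue);
            }
        }
        return newRecord;
    }

    // Builds the NewRecord schema from the issue description.
    static Map<String, Object> newSchema() {
        Map<String, Object> nested = new LinkedHashMap<>();
        nested.put("field_21", "String");
        nested.put("field_22", "Integer");
        nested.put("field_23", "String");
        nested.put("field_24", "Integer");
        Map<String, Object> schema = new LinkedHashMap<>();
        schema.put("field1", "String");
        schema.put("field2", nested);
        schema.put("field3", "Integer");
        return schema;
    }

    // Builds a sample record that follows the OldRecord layout.
    static Map<String, Object> oldRecord() {
        Map<String, Object> nested = new LinkedHashMap<>();
        nested.put("field_21", "b");
        nested.put("field_22", 1);
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("field1", "a");
        record.put("field2", nested);
        record.put("field3", 2);
        return record;
    }

    public static void main(String[] args) {
        System.out.println(rewrite(oldRecord(), newSchema()));
    }
}
```

With the sample data above, the rewritten record carries field_23 and field_24 inside field2 (both null), whereas a shallow put of the old field2 value would leave the nested record with only its two original fields. A real fix would additionally need to honor Avro default values and handle unions, arrays and maps, which this sketch deliberately ignores.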



--
This message was sent by Atlassian Jira
(v8.3.4#803005)