Posted to commits@hudi.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/09/07 21:37:00 UTC

[jira] [Commented] (HUDI-1441) HoodieAvroUtils - rewrite() is not handling evolution of a nested record field.

    [ https://issues.apache.org/jira/browse/HUDI-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17411528#comment-17411528 ] 

ASF GitHub Bot commented on HUDI-1441:
--------------------------------------

hudi-bot commented on pull request #2309:
URL: https://github.com/apache/hudi/pull/2309#issuecomment-914645226


   ## CI report:
   
   * 9abc305fbd4cf4d4ca9799b7f791e8e03e3ff0d3 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> HoodieAvroUtils - rewrite() is not handling evolution of a nested record field.
> -------------------------------------------------------------------------------
>
>                 Key: HUDI-1441
>                 URL: https://issues.apache.org/jira/browse/HUDI-1441
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Common Core
>            Reporter: Balajee Nagasubramaniam
>            Priority: Critical
>              Labels: pull-request-available, sev:critical
>
> When a schema has a nested record field and one of the fields of that nested record evolves, rewrite() fails with a SchemaCompatibilityException (or an ArrayIndexOutOfBoundsException).
> {{/*
>    *  OldRecord:                     NewRecord:
>    *      field1 : String                field1 : String
>    *      field2 : record                field2 : record
>    *         field_21 : string              field_21 : string
>    *         field_22 : Integer             field_22 : Integer
>    *      field3 : Integer                  field_23 : String
>    *                                        field_24 : Integer
>    *                                     field3 : Integer
>    *
>    *  When a nested record has evolved, newRecord.put(field2, oldRecord.get(field2)) is not sufficient.
>    *  The evolved field requires a deep copy/rewrite.
>    */}}
> Note 1:  When reading the parquet file using the writer schema, this should not be a problem, as new fields are substituted with null.  The issue manifests when reading the parquet file using the reader schema and writing to a new file using the writer schema.
> Note 2:  Hudi test suite - upsertNode exercises this path.  (Fixed as a workaround in a separate task.)
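
The deep copy/rewrite described in the issue can be illustrated with a minimal sketch. This is not the HoodieAvroUtils implementation and does not use the Avro API: as an assumption made purely for illustration, a schema is modeled as a map from field name to either a type name or a nested schema, and a record as a map from field name to a value or a nested record. The class NestedRewriteSketch and its rewrite method are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model, NOT the Avro or Hudi API:
//   schema = field name -> type name (String) or nested schema (Map)
//   record = field name -> value or nested record (Map)
final class NestedRewriteSketch {

    // Rebuilds oldRecord field by field against newSchema. Nested records are
    // rewritten recursively instead of being copied by reference, and fields
    // that the old record does not have are set to null.
    @SuppressWarnings("unchecked")
    static Map<String, Object> rewrite(Map<String, Object> oldRecord,
                                       Map<String, Object> newSchema) {
        Map<String, Object> newRecord = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : newSchema.entrySet()) {
            String name = field.getKey();
            Object oldValue = oldRecord.get(name);
            if (field.getValue() instanceof Map && oldValue instanceof Map) {
                // Nested record: recurse so newly added nested fields get a slot.
                newRecord.put(name, rewrite((Map<String, Object>) oldValue,
                                            (Map<String, Object>) field.getValue()));
            } else {
                // Primitive field, or a field missing from the old record (null).
                newRecord.put(name, oldValue);
            }
        }
        return newRecord;
    }

    public static void main(String[] args) {
        // Old record: field2 has only field_21 and field_22.
        Map<String, Object> oldNested = new LinkedHashMap<>();
        oldNested.put("field_21", "b");
        oldNested.put("field_22", 2);
        Map<String, Object> oldRecord = new LinkedHashMap<>();
        oldRecord.put("field1", "a");
        oldRecord.put("field2", oldNested);
        oldRecord.put("field3", 3);

        // New schema: field2 gains field_23 and field_24.
        Map<String, Object> nestedSchema = new LinkedHashMap<>();
        nestedSchema.put("field_21", "string");
        nestedSchema.put("field_22", "int");
        nestedSchema.put("field_23", "string");
        nestedSchema.put("field_24", "int");
        Map<String, Object> newSchema = new LinkedHashMap<>();
        newSchema.put("field1", "string");
        newSchema.put("field2", nestedSchema);
        newSchema.put("field3", "int");

        System.out.println(rewrite(oldRecord, newSchema));
    }
}
```

In this model, a shallow newRecord.put(field2, oldRecord.get(field2)) would leave the nested record with only two of the four fields the new schema expects, which is the shape mismatch the issue reports; the recursive call gives field_23 and field_24 explicit null values instead.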



--
This message was sent by Atlassian Jira
(v8.3.4#803005)