Posted to commits@hudi.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/09/07 21:37:00 UTC
[jira] [Commented] (HUDI-1441) HoodieAvroUtils - rewrite() is not handling evolution of a nested record field.
[ https://issues.apache.org/jira/browse/HUDI-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17411528#comment-17411528 ]
ASF GitHub Bot commented on HUDI-1441:
--------------------------------------
hudi-bot commented on pull request #2309:
URL: https://github.com/apache/hudi/pull/2309#issuecomment-914645226
## CI report:
* 9abc305fbd4cf4d4ca9799b7f791e8e03e3ff0d3 UNKNOWN
@hudi-bot supports the following bot commands:
- `@hudi-bot run travis`: re-run the last Travis build
- `@hudi-bot run azure`: re-run the last Azure build
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
> HoodieAvroUtils - rewrite() is not handling evolution of a nested record field.
> -------------------------------------------------------------------------------
>
> Key: HUDI-1441
> URL: https://issues.apache.org/jira/browse/HUDI-1441
> Project: Apache Hudi
> Issue Type: Bug
> Components: Common Core
> Reporter: Balajee Nagasubramaniam
> Priority: Critical
> Labels: pull-request-available, sev:critical
>
> When a schema has a nested record field and one of the fields of that nested record evolves, rewrite() throws a SchemaCompatibilityException (or an ArrayIndexOutOfBoundsException).
> {{/*
>  * OldRecord:                  NewRecord:
>  * field1 : String             field1 : String
>  * field2 : record             field2 : record
>  *     field_21 : string           field_21 : string
>  *     field_22 : Integer          field_22 : Integer
>  * field3 : Integer                field_23 : String
>  *                                 field_24 : Integer
>  *                             field3 : Integer
>  *
>  * When a nested record has changed/evolved, newRecord.put(field2, oldRecord.get(field2)) is not sufficient.
>  * It requires a deep copy/rewrite of the evolved field.
>  */}}
> Note 1: Reading the Parquet file with the writer schema should not be a problem, because new fields are filled with null. The issue manifests when the Parquet file is read with the reader schema and then written to a new file with the writer schema.
> Note 2: The Hudi test suite's upsertNode exercises this path (fixed with a workaround in a separate task).
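Why a shallow copy of an evolved nested record breaks, and what a deep rewrite does instead, can be illustrated with a small, dependency-free Java sketch. It deliberately does not use the Avro or Hudi APIs: `RecordSchema`, `Record`, `rewrite` and `writeAll` are hypothetical stand-ins that only mimic how an Avro generic record stores field values by position. The sketch assumes that fields missing from the old record should become null in the new one, and it does not model unions, defaults, or type promotion.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;

// Hypothetical, dependency-free model; NOT the Avro or Hudi API.
// It only mimics the positional field storage of an Avro generic record.
class NestedRewriteSketch {

    // Ordered field names mapped to a primitive type name (String) or a nested RecordSchema.
    static final class RecordSchema {
        final LinkedHashMap<String, Object> fields = new LinkedHashMap<>();
        RecordSchema field(String name, Object type) { fields.put(name, type); return this; }
        // Position of a field, or -1 if this schema does not declare it.
        int indexOf(String name) { return new ArrayList<>(fields.keySet()).indexOf(name); }
    }

    // Values are stored by position, sized for the schema the record was built with.
    static final class Record {
        final RecordSchema schema;
        final Object[] values;

        Record(RecordSchema schema) {
            this.schema = schema;
            this.values = new Object[schema.fields.size()];
        }

        void put(String name, Object value) { values[schema.indexOf(name)] = value; }

        Object get(String name) {
            int i = schema.indexOf(name);
            return i < 0 ? null : values[i];
        }
    }

    // Deep rewrite: copy fields by name, leave new fields null, and rebuild each
    // nested record against its NEW nested schema instead of reusing the old object.
    static Record rewrite(Record oldRecord, RecordSchema newSchema) {
        Record out = new Record(newSchema);
        for (String name : newSchema.fields.keySet()) {
            Object value = oldRecord.get(name);
            Object type = newSchema.fields.get(name);
            if (value instanceof Record && type instanceof RecordSchema) {
                value = rewrite((Record) value, (RecordSchema) type);
            }
            out.put(name, value);
        }
        return out;
    }

    // Serializer stand-in: walks the given schema and reads values by position,
    // so a nested record built for an older, shorter schema overflows.
    // Assumes nested record values are non-null.
    static void writeAll(Record r, RecordSchema schema, List<Object> sink) {
        int pos = 0;
        for (Object type : schema.fields.values()) {
            Object v = r.values[pos++];
            if (type instanceof RecordSchema) {
                writeAll((Record) v, (RecordSchema) type, sink);
            } else {
                sink.add(v);
            }
        }
    }

    public static void main(String[] args) {
        RecordSchema oldNested = new RecordSchema().field("field_21", "string").field("field_22", "int");
        RecordSchema newNested = new RecordSchema().field("field_21", "string").field("field_22", "int")
                .field("field_23", "string").field("field_24", "int");
        RecordSchema oldSchema = new RecordSchema()
                .field("field1", "string").field("field2", oldNested).field("field3", "int");
        RecordSchema newSchema = new RecordSchema()
                .field("field1", "string").field("field2", newNested).field("field3", "int");

        Record nested = new Record(oldNested);
        nested.put("field_21", "a");
        nested.put("field_22", 1);
        Record oldRecord = new Record(oldSchema);
        oldRecord.put("field1", "x");
        oldRecord.put("field2", nested);
        oldRecord.put("field3", 3);

        // Shallow copy, i.e. newRecord.put(field2, oldRecord.get(field2)) for every field.
        Record shallow = new Record(newSchema);
        for (String name : newSchema.fields.keySet()) {
            shallow.put(name, oldRecord.get(name));
        }
        try {
            writeAll(shallow, newSchema, new ArrayList<>());
        } catch (ArrayIndexOutOfBoundsException ex) {
            System.out.println("shallow copy fails: " + ex.getClass().getSimpleName());
        }

        List<Object> written = new ArrayList<>();
        writeAll(rewrite(oldRecord, newSchema), newSchema, written);
        System.out.println("deep rewrite writes: " + written);
    }
}
```

Running `main` should print `shallow copy fails: ArrayIndexOutOfBoundsException` followed by `deep rewrite writes: [x, a, 1, null, null, 3]`. In this model, the shallow copy hands the writer a nested record with two positional slots while the evolved nested schema expects four; rebuilding each nested record against its new schema, as `rewrite` does, gives every slot a value (null for the added fields).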
--
This message was sent by Atlassian Jira
(v8.3.4#803005)