Posted to commits@hudi.apache.org by "Well Tang (Jira)" <ji...@apache.org> on 2021/05/23 10:07:00 UTC
[jira] [Assigned] (HUDI-1919) Fix column misalignment occurs when
reading the copy_on_write type of hudi table through Flink
[ https://issues.apache.org/jira/browse/HUDI-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Well Tang reassigned HUDI-1919:
-------------------------------
Assignee: Well Tang
> Fix column misalignment occurs when reading the copy_on_write type of hudi table through Flink
> ----------------------------------------------------------------------------------------------
>
> Key: HUDI-1919
> URL: https://issues.apache.org/jira/browse/HUDI-1919
> Project: Apache Hudi
> Issue Type: Bug
> Components: Flink Integration
> Environment: Hudi version : 0.9.0-SNAPSHOT
> Flink version : 1.12.2
> Hadoop version : 2.9.2
> Storage (HDFS/S3/GCS..) : HDFS
> Reporter: Well Tang
> Assignee: Well Tang
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.9.0
>
> Attachments: image-2021-05-22-00-02-03-762.png, image-2021-05-22-00-02-41-706.png
>
> Original Estimate: 96h
> Remaining Estimate: 96h
>
> The exception occurs when the specified partition column is not the last field in the sequence of fields written to the hudi table.
> For example, suppose the fields (including partition columns) are written to the hudi table in the order col1, col2, col3. If the partition column is col1, the exception is thrown. If the partition column is col3, the read works.
>
> The exception stack is as follows:
> !image-2021-05-22-00-02-03-762.png!
> The local debugging is as follows:
> !image-2021-05-22-00-02-41-706.png!
> The location_type field is a partition field, and because it is not at the end of the field order, the field names and field data types become misaligned in subsequent processing.
>
> *Initial diagnosis reason*:
> When reading the hudi table through Flink, org.apache.hudi.table.format.cow.ParquetSplitReaderUtil#genPartColumnarRowReader is called. In the *ParquetColumnarRowSplitReader* object that this method returns, the *selectedTypes* and *selectedFieldNames* arrays are misaligned.
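The kind of misalignment described above can be illustrated with a small standalone sketch. This is not Hudi's code. The class and helper names (FieldAlignmentSketch, typesWithPartitionLast, firstMisalignedIndex) and the column types are hypothetical, and the sketch assumes one possible cause: the type array is built with partition columns moved to the end while the name array keeps the written order. The issue itself does not state the exact mechanism.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Hudi or Flink.
class FieldAlignmentSketch {

    // Assumed behavior: data column types come first, partition column types are appended last.
    static String[] typesWithPartitionLast(String[] names, String[] types, String partitionField) {
        List<String> dataTypes = new ArrayList<>();
        List<String> partitionTypes = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            if (names[i].equals(partitionField)) {
                partitionTypes.add(types[i]);
            } else {
                dataTypes.add(types[i]);
            }
        }
        dataTypes.addAll(partitionTypes);
        return dataTypes.toArray(new String[0]);
    }

    // Returns the first index where the two type arrays disagree, or -1 if they line up.
    static int firstMisalignedIndex(String[] expectedTypes, String[] selectedTypes) {
        for (int i = 0; i < expectedTypes.length; i++) {
            if (!expectedTypes[i].equals(selectedTypes[i])) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Field names stay in written order: col1, col2, col3 (types are made up for the example).
        String[] names = {"col1", "col2", "col3"};
        String[] types = {"STRING", "INT", "DOUBLE"};

        // Partition column col1 (not last): the types no longer match the names position by position.
        System.out.println(firstMisalignedIndex(types, typesWithPartitionLast(names, types, "col1")));

        // Partition column col3 (last): reordering changes nothing, so the arrays stay aligned.
        System.out.println(firstMisalignedIndex(types, typesWithPartitionLast(names, types, "col3")));
    }
}
```

Under this assumption, a check like firstMisalignedIndex reports a mismatch only when the partition column is not already last, which matches the col1 versus col3 behavior reported in the description.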
--
This message was sent by Atlassian Jira
(v8.3.4#803005)