
Posted to commits@hudi.apache.org by "tangchenhao (Jira)" <ji...@apache.org> on 2021/05/21 16:15:00 UTC

[jira] [Created] (HUDI-1919) Column misalignment occurs when reading the COPY_ON_WRITE type of hudi table through Flink

tangchenhao created HUDI-1919:
---------------------------------

             Summary: Column misalignment occurs when reading the COPY_ON_WRITE type of hudi table through Flink
                 Key: HUDI-1919
                 URL: https://issues.apache.org/jira/browse/HUDI-1919
             Project: Apache Hudi
          Issue Type: Bug
          Components: Flink Integration
         Environment: Hudi version : 0.9.0-SNAPSHOT
Flink version : 1.12.2
Hadoop version : 2.9.2
Storage (HDFS/S3/GCS..) : HDFS
            Reporter: tangchenhao
             Fix For: 0.9.0
         Attachments: image-2021-05-22-00-02-03-762.png, image-2021-05-22-00-02-41-706.png

The exception occurs when the specified partition column is not the last field in the sequence of fields written to the hudi table.

For example, suppose the fields (including partition columns) are written to the hudi table in the order col1, col2, col3. If the partition column is col1, the exception is thrown. If the partition column is col3, the read works normally.

 

The exception stack is as follows:

!image-2021-05-22-00-02-03-762.png!

The local debugging is as follows:

!image-2021-05-22-00-02-41-706.png!

The location_type field is a partition field.

*Initial diagnosis*:

When a hudi table is read through Flink, org.apache.hudi.table.format.cow.ParquetSplitReaderUtil#genPartColumnarRowReader is called. In the ParquetColumnarRowSplitReader object that this method returns, the selectedTypes and selectedFieldNames arrays are misaligned.
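
To illustrate the kind of misalignment described above, here is a minimal, self-contained sketch. It is not the Hudi code: the class PartitionColumnAlignmentSketch, its methods, and the type names are hypothetical, and the assumption that the type array is built with the partition column moved to the end is only one possible way such a mismatch could arise. It shows why a names array kept in schema order and a types array built in a different order would agree only when the partition column is already last.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Hudi or Flink.
public class PartitionColumnAlignmentSketch {

    // Assumed behaviour: data column types first, partition column type appended last.
    static String[] typesWithPartitionLast(String[] fieldNames, String[] fieldTypes, String partitionField) {
        List<String> ordered = new ArrayList<>();
        String partitionType = null;
        for (int i = 0; i < fieldNames.length; i++) {
            if (fieldNames[i].equals(partitionField)) {
                partitionType = fieldTypes[i];
            } else {
                ordered.add(fieldTypes[i]);
            }
        }
        if (partitionType != null) {
            ordered.add(partitionType);
        }
        return ordered.toArray(new String[0]);
    }

    // The field names stay in schema order, so the two arrays line up only if
    // reordering the types left them unchanged.
    static boolean isAligned(String[] fieldNames, String[] fieldTypes, String partitionField) {
        String[] reorderedTypes = typesWithPartitionLast(fieldNames, fieldTypes, partitionField);
        return Arrays.equals(reorderedTypes, fieldTypes);
    }

    public static void main(String[] args) {
        String[] names = {"col1", "col2", "col3"};
        String[] types = {"STRING", "INT", "DOUBLE"};
        // Partition column first: types become INT, DOUBLE, STRING and no longer match the names.
        System.out.println("partition=col1 aligned: " + isAligned(names, types, "col1"));
        // Partition column last: the order is unchanged, so names and types still match.
        System.out.println("partition=col3 aligned: " + isAligned(names, types, "col3"));
    }
}
```

Under these assumptions the sketch reports a mismatch for col1 and a match for col3, which is consistent with the behaviour reported in this issue.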



--
This message was sent by Atlassian Jira
(v8.3.4#803005)