Posted to commits@hudi.apache.org by "Danny Chen (Jira)" <ji...@apache.org> on 2023/01/12 08:05:00 UTC

[jira] [Assigned] (HUDI-5381) Class cast exception with Flink 1.15 source when reading table written using bulk insert

     [ https://issues.apache.org/jira/browse/HUDI-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chen reassigned HUDI-5381:
--------------------------------

    Assignee: Danny Chen

> Class cast exception with Flink 1.15 source when reading table written using bulk insert
> ----------------------------------------------------------------------------------------
>
>                 Key: HUDI-5381
>                 URL: https://issues.apache.org/jira/browse/HUDI-5381
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: flink
>            Reporter: Kenneth William Krugler
>            Assignee: Danny Chen
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.13.0
>
>
> When running a unit test that reads records that were written by a Flink workflow using MOR and bulk insert, the following exception occurs:
> {{22/12/13 08:22:04 WARN taskmanager.Task:1104 - Source: Hudi Source -> Conver Row Data To Enriched Netflow (1/1)#5 (75289b6b98a35bc5d4522caaed3753a1) switched from RUNNING to FAILED with failure cause: java.lang.NoSuchMethodError: org.apache.hudi.table.format.cow.vector.reader.ParquetColumnarRowSplitReader.nextRecord()Lorg/apache/flink/table/data/ColumnarRowData;
> 	at org.apache.hudi.table.format.mor.MergeOnReadInputFormat$BaseFileOnlyFilteringIterator.reachedEnd(MergeOnReadInputFormat.java:510)
> 	at org.apache.hudi.table.format.mor.MergeOnReadInputFormat.reachedEnd(MergeOnReadInputFormat.java:245)
> 	at org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction.run(InputFormatSourceFunction.java:89)
> 	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:110)
> 	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:67)
> 	at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:332)
> }}
> MergeOnReadInputFormat (which is part of hudi-flink) uses ParquetColumnarRowSplitReader to get the next record, and each Flink-specific jar (hudi-flink1.13.x, hudi-flink1.14.x, hudi-flink1.15.x) has its own version of ParquetColumnarRowSplitReader.
> Unfortunately the nextRecord() method in this class returns Flink's ColumnarRowData, which changed packages between Flink 1.14 and 1.15. So when you use MergeOnReadInputFormat with Flink 1.15, you get the NoSuchMethodError above, because (I assume) MergeOnReadInputFormat was compiled against 1.13 or 1.14, where the result type from nextRecord() is org.apache.flink.table.data.ColumnarRowData, not org.apache.flink.table.data.columnar.ColumnarRowData.
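> For readers puzzled that a package move surfaces as NoSuchMethodError rather than a compile error: the JVM links every call site by the full method descriptor, which includes the package of the return type. A minimal, self-contained sketch of the mechanism; the nested classes below are illustrative stand-ins, not Hudi or Flink source:
> {code:java}
> // Sketch of the linkage failure. V14ColumnarRowData / V15ColumnarRowData are
> // stand-ins for org.apache.flink.table.data.ColumnarRowData (1.14) and
> // org.apache.flink.table.data.columnar.ColumnarRowData (1.15).
> public class DescriptorMismatchSketch {
>
>     static class V14ColumnarRowData {}  // where the class lived through 1.14
>     static class V15ColumnarRowData {}  // where it moved in 1.15
>
>     // Imagine this is the reader shipped in hudi-flink1.15.x:
>     static class Reader {
>         public V15ColumnarRowData nextRecord() {
>             return new V15ColumnarRowData();
>         }
>     }
>
>     public static void main(String[] args) throws Exception {
>         // The JVM resolves nextRecord() by its full descriptor:
>         //   built vs 1.14: nextRecord()Lorg/apache/flink/table/data/ColumnarRowData;
>         //   built vs 1.15: nextRecord()Lorg/apache/flink/table/data/columnar/ColumnarRowData;
>         // A caller linked against the first descriptor cannot resolve the
>         // second at run time, hence the NoSuchMethodError in the trace above.
>         System.out.println("nextRecord() returns "
>             + Reader.class.getMethod("nextRecord").getReturnType().getName());
>     }
> }
> {code}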
> This error didn't happen previously when the Flink workflow was using insert (versus bulk insert) to write the table...though I haven't dug into how this is related. Presumably it's because bulk insert produces parquet base files with no log files, so reads take the BaseFileOnlyFilteringIterator path shown in the trace, whereas plain insert on a MOR table writes log files that are read by an iterator that never touches ParquetColumnarRowSplitReader; see the sketch below.
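> A hedged sketch of that routing, with hypothetical class and method names (not verified against the Hudi source):
> {code:java}
> import java.util.List;
>
> // Hypothetical simplification of MOR split routing; class and method names
> // are illustrative, not the actual Hudi implementation.
> public class MorSplitRoutingSketch {
>
>     record Split(String basePath, List<String> logPaths) {}
>
>     static String chooseIterator(Split split) {
>         if (split.logPaths().isEmpty()) {
>             // bulk insert writes parquet base files and no log files, so the
>             // split takes the vectorized parquet path, i.e. the
>             // BaseFileOnlyFilteringIterator frame in the trace above
>             return "base-file-only iterator (ParquetColumnarRowSplitReader)";
>         }
>         // plain insert on a MOR table lands in log files, which are read by a
>         // log-scanning/merging iterator that never calls the parquet reader
>         return "log-scanning / merging iterator";
>     }
>
>     public static void main(String[] args) {
>         System.out.println(chooseIterator(new Split("f.parquet", List.of())));
>         System.out.println(chooseIterator(new Split("f.parquet", List.of("f.log.1"))));
>     }
> }
> {code}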



--
This message was sent by Atlassian Jira
(v8.20.10#820010)