Posted to commits@hudi.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/04/19 09:45:00 UTC
[jira] [Updated] (HUDI-6103) Validate that fieldNames are valid for streaming reads
[ https://issues.apache.org/jira/browse/HUDI-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HUDI-6103:
---------------------------------
Labels: pull-request-available (was: )
> Validate that fieldNames are valid for streaming reads
> ------------------------------------------------------
>
> Key: HUDI-6103
> URL: https://issues.apache.org/jira/browse/HUDI-6103
> Project: Apache Hudi
> Issue Type: Improvement
> Components: flink, flink-sql
> Reporter: voon
> Assignee: voon
> Priority: Major
> Labels: pull-request-available
>
> The current error message that is thrown when an invalid fieldName is provided in the FlinkSQL table source DDL is ambiguous and not helpful.
>
> Example:
>
> {code:java}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
> at org.apache.hudi.table.format.cow.ParquetSplitReaderUtil.lambda$genPartColumnarRowReader$0(ParquetSplitReaderUtil.java:119)
> at java.util.stream.IntPipeline$4$1.accept(IntPipeline.java:250)
> at java.util.Spliterators$IntArraySpliterator.forEachRemaining(Spliterators.java:1032)
> at java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:693)
> at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
> at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
> at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
> at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
> at org.apache.hudi.table.format.cow.ParquetSplitReaderUtil.genPartColumnarRowReader(ParquetSplitReaderUtil.java:121)
> at org.apache.hudi.table.format.RecordIterators.getParquetRecordIterator(RecordIterators.java:56)
> at org.apache.hudi.table.format.mor.MergeOnReadInputFormat.getBaseFileIterator(MergeOnReadInputFormat.java:341)
> at org.apache.hudi.table.format.mor.MergeOnReadInputFormat.getBaseFileIterator(MergeOnReadInputFormat.java:316)
> at org.apache.hudi.table.format.mor.MergeOnReadInputFormat.initIterator(MergeOnReadInputFormat.java:200)
> at org.apache.hudi.table.format.mor.MergeOnReadInputFormat.open(MergeOnReadInputFormat.java:185)
> at org.apache.hudi.table.format.mor.MergeOnReadInputFormat.open(MergeOnReadInputFormat.java:91)
> at org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction.run(InputFormatSourceFunction.java:84)
> at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:110)
> at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:67)
> at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:333) {code}
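> The root cause can be sketched as follows. This is a minimal illustration, not the actual Hudi code: when mapping each selected field name to its position in the table schema, a lookup such as `indexOf` returns -1 for a name that does not exist, and a later array access with that index fails with `ArrayIndexOutOfBoundsException: -1` instead of naming the bad field.

```java
import java.util.Arrays;
import java.util.List;

// Minimal sketch (class and method names are illustrative, not Hudi's API).
public class FieldIndexSketch {

    // Look up a requested field's position in the schema.
    // Returns -1 when the field name is not present.
    public static int fieldIndex(List<String> schemaFields, String requested) {
        return schemaFields.indexOf(requested);
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("id", "user_id", "name", "partition_col");
        int[] fieldTypes = new int[schema.size()];

        // A typo in the DDL yields -1 ...
        int idx = fieldIndex(schema, "user_id_with_typo123");
        System.out.println("index = " + idx);

        // ... and the later array access blows up with an index of -1,
        // which is the opaque error the ticket wants to replace.
        try {
            int t = fieldTypes[idx];
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("ArrayIndexOutOfBoundsException raised");
        }
    }
}
```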
>
>
> Such user errors are easy to fix, but the error message that is thrown does not make the cause obvious.
>
> This Jira ticket aims to add a better error message so that such errors are easier to diagnose.
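> A hedged sketch of the kind of validation this ticket proposes (the class, method, and message wording below are illustrative assumptions, not the actual Hudi implementation): check the requested field names against the table schema up front, and fail fast with a message that names the offending column.

```java
import java.util.List;

// Illustrative sketch only; names and message text are assumptions.
public class FieldNameValidator {

    // Throws IllegalArgumentException naming the first requested field
    // that is not present in the table schema.
    public static void validateFieldNames(List<String> schemaFields, List<String> requestedFields) {
        for (String field : requestedFields) {
            if (!schemaFields.contains(field)) {
                throw new IllegalArgumentException(
                    "Field '" + field + "' not found in table schema " + schemaFields
                        + ". Please check the field names in the table DDL.");
            }
        }
    }
}
```

> With a check like this, the typo in the example below would surface as a clear message naming `user_id_with_typo123` rather than an `ArrayIndexOutOfBoundsException: -1` deep inside the Parquet reader.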
>
>
> *How to trigger the error*
> For a source table defined with the following schema:
>
> {code:java}
> CREATE TABLE `table_with_correct_schema` (
> `id` INT,
> `user_id` INT,
> `name` STRING,
> `partition_col` STRING
> ) PARTITIONED BY (`partition_col`)
> WITH (
> 'connector' = 'hudi',
> ...
> ){code}
>
> Change a column to an incorrect name in the DDL when reading, for example:
> {code:java}
> CREATE TABLE `table_with_correct_schema` (
> `id` INT,
> `user_id_with_typo123` INT,
> `name` STRING,
> `partition_col` STRING
> ) PARTITIONED BY (`partition_col`)
> WITH (
> 'connector' = 'hudi',
> ...
> ){code}
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)