Posted to commits@hudi.apache.org by "sivabalan narayanan (Jira)" <ji...@apache.org> on 2021/02/02 14:51:00 UTC

[jira] [Updated] (HUDI-1453) Throw Exception when input data schema is not equal to the hoodie table schema

     [ https://issues.apache.org/jira/browse/HUDI-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-1453:
--------------------------------------
    Labels: pull-request-available user-support-issues  (was: pull-request-available)

> Throw Exception when input data schema is not equal to the hoodie table schema
> ------------------------------------------------------------------------------
>
>                 Key: HUDI-1453
>                 URL: https://issues.apache.org/jira/browse/HUDI-1453
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Writer Core
>            Reporter: pengzhiwei
>            Assignee: pengzhiwei
>            Priority: Major
>              Labels: pull-request-available, user-support-issues
>             Fix For: 0.8.0
>
>
> The schema of the hoodie table *h0* is:
> {code:java}
> (id long, price double){code}
> When I write a *dataframe* to *h0* with the following schema:
> {code:java}
> (id long, price int){code}
> the following exception is thrown:
> {code:java}
>     at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:136)
>     at org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:49)
>     at org.apache.hudi.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:45)
>     at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:102)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     ... 4 more
> Caused by: java.lang.UnsupportedOperationException: org.apache.parquet.avro.AvroConverters$FieldIntegerConverter
>     at org.apache.parquet.io.api.PrimitiveConverter.addDouble(PrimitiveConverter.java:84)
>     at org.apache.parquet.column.impl.ColumnReaderImpl$2$2.writeValue(ColumnReaderImpl.java:228)
>     at org.apache.parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:367)
>     at org.apache.parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:406)
>     at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:226)
>     ... 11 more
> {code}
> I have enabled *AVRO_SCHEMA_VALIDATE*, and the write *passes the schema validation in HoodieTable#validateUpsertSchema*, so Hudi considers it valid to write "int" data to the "double" field.
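
Editor's note: the gap described above can be illustrated with a small sketch. It assumes the validation behaves like an Avro-style compatibility check, where an "int" value may be promoted to "double", whereas the issue title asks for a strict equality check. The class SchemaPromotionSketch and its methods are hypothetical and are not part of Hudi or Avro; types are modelled as plain strings, and the promotion table follows the primitive promotions listed in the Avro specification.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; not a Hudi or Avro API.
final class SchemaPromotionSketch {

    // Writer type -> reader types it may be promoted to
    // (primitive promotions from the Avro schema resolution rules).
    private static final Map<String, Set<String>> PROMOTIONS = Map.of(
        "int", Set.of("long", "float", "double"),
        "long", Set.of("float", "double"),
        "float", Set.of("double"),
        "string", Set.of("bytes"),
        "bytes", Set.of("string"));

    // Compatibility-style check: identical types, or a permitted promotion.
    static boolean isReadableAs(String writerType, String readerType) {
        return writerType.equals(readerType)
            || PROMOTIONS.getOrDefault(writerType, Set.of()).contains(readerType);
    }

    // Strict check, as the issue title proposes: types must be identical.
    static boolean isSameType(String writerType, String readerType) {
        return writerType.equals(readerType);
    }

    public static void main(String[] args) {
        // Incoming "int" data against the table's "double" field.
        System.out.println(isReadableAs("int", "double")); // prints "true"
        // Existing "double" data read back with an "int" schema.
        System.out.println(isReadableAs("double", "int")); // prints "false"
        // A strict equality check rejects the write up front.
        System.out.println(isSameType("int", "double"));   // prints "false"
    }
}
```

Under these assumptions the incoming schema passes a compatibility check in one direction, while the stack trace suggests the failure occurs in the other direction: existing "double" values are handed to a FieldIntegerConverter, which cannot accept them.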



--
This message was sent by Atlassian Jira
(v8.3.4#803005)