Posted to commits@hudi.apache.org by "Raymond Xu (Jira)" <ji...@apache.org> on 2022/04/12 06:00:00 UTC

[jira] [Assigned] (HUDI-3096) Fix the bug that a COW table (containing DecimalType) written by Flink cannot be read by Spark

     [ https://issues.apache.org/jira/browse/HUDI-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu reassigned HUDI-3096:
--------------------------------

    Assignee: Tao Meng

> Fix the bug that a COW table (containing DecimalType) written by Flink cannot be read by Spark
> ----------------------------------------------------------------------------------------------
>
>                 Key: HUDI-3096
>                 URL: https://issues.apache.org/jira/browse/HUDI-3096
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: flink
>    Affects Versions: 0.10.0
>         Environment: flink  1.13.1
> spark 3.1.1
>            Reporter: Tao Meng
>            Assignee: Tao Meng
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.11.0
>
>
> Currently, Flink writes DecimalType values into Parquet as byte[] (BINARY).
> When Spark reads such a decimal column and sees that its precision is small, it expects the values to be stored as int/long, which causes the following error:
>  
> Caused by: org.apache.spark.sql.execution.QueryExecutionException: Parquet column cannot be converted in file hdfs://xxxxx/tmp/hudi/hudi_xxxxx/46d44c57-aa43-41e2-a8aa-76dcc9dac7e4_0-4-0_20211221201230.parquet. Column: [c7], Expected: decimal(10,4), Found: BINARY
>   at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:179)
>   at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:93)
>   at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:517)
>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
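>
> A minimal read-side sketch (not from this ticket): one known workaround, assuming a Hudi COW table at the redacted base path from the error above with the decimal(10,4) column "c7", is to disable Spark's vectorized Parquet reader, which is the component that expects small-precision decimals as INT32/INT64. The object and app names below are illustrative; the actual fix is on the Flink writer side.
>
>   import org.apache.spark.sql.SparkSession
>
>   object ReadFlinkWrittenCowTable {
>     def main(args: Array[String]): Unit = {
>       val spark = SparkSession.builder()
>         .appName("hudi-3096-read-workaround-sketch")
>         // Workaround only: fall back to the non-vectorized (parquet-mr) read path,
>         // which can decode BINARY-backed decimals.
>         .config("spark.sql.parquet.enableVectorizedReader", "false")
>         .getOrCreate()
>
>       // Base path is the redacted one from the error message above.
>       val df = spark.read.format("hudi").load("hdfs://xxxxx/tmp/hudi/hudi_xxxxx")
>       df.select("c7").show(false)
>     }
>   }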



--
This message was sent by Atlassian Jira
(v8.20.1#820001)