Posted to issues@spark.apache.org by "mike (Jira)" <ji...@apache.org> on 2021/10/12 13:45:00 UTC
[jira] [Updated] (SPARK-36983) ignoreCorruptFiles does not work when schema change from int to string
[ https://issues.apache.org/jira/browse/SPARK-36983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
mike updated SPARK-36983:
-------------------------
Summary: ignoreCorruptFiles does not work when schema change from int to string (was: ignoreCorruptFiles does work when schema change from int to string)
> ignoreCorruptFiles does not work when schema change from int to string
> ----------------------------------------------------------------------
>
> Key: SPARK-36983
> URL: https://issues.apache.org/jira/browse/SPARK-36983
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.4.8, 3.1.2
> Reporter: mike
> Priority: Major
>
> Precondition:
> Folder A contains two parquet files:
> * File 1: has several columns, one of which is column X with data type Int
> * File 2: same schema as File 1, except column X has data type String
> Read file 1 to get the schema of file 1.
> Read folder A with the schema of file 1.
> Expected: The read succeeds; file 2 is ignored because the data type of column X changed to string.
> Actual: File 2 is not ignored and the read fails with:
> `WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2) (192.168.1.78 executor driver): java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary at org.apache.parquet.column.Dictionary.decodeToInt(Dictionary.java:45)`
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org