Posted to issues@spark.apache.org by "mike (Jira)" <ji...@apache.org> on 2021/10/12 13:45:00 UTC

[jira] [Updated] (SPARK-36983) ignoreCorruptFiles does not work when schema change from int to string

     [ https://issues.apache.org/jira/browse/SPARK-36983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mike updated SPARK-36983:
-------------------------
    Summary: ignoreCorruptFiles does not work when schema change from int to string  (was: ignoreCorruptFiles does work when schema change from int to string)

> ignoreCorruptFiles does not work when schema change from int to string
> ----------------------------------------------------------------------
>
>                 Key: SPARK-36983
>                 URL: https://issues.apache.org/jira/browse/SPARK-36983
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.8, 3.1.2
>            Reporter: mike
>            Priority: Major
>
> Precondition:
> Folder A contains two Parquet files:
>  * File 1: has several columns, one of which is column X with data type Int
>  * File 2: same schema as File 1, except that column X has data type String
> Read File 1 to get its schema.
> Read folder A using the schema of File 1, with ignoreCorruptFiles enabled.
> Expected: The read succeeds and File 2 is ignored, because the data type of column X changed to String.
> Actual: File 2 does not appear to be ignored, and the read fails with this error:
>  `WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2) (192.168.1.78 executor driver): java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary at org.apache.parquet.column.Dictionary.decodeToInt(Dictionary.java:45)`
>  
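
One possible workaround for the behavior reported above is to compare each file's column types against File 1 and read only the files that match. This is a minimal sketch, not part of the report. It assumes PySpark and an existing SparkSession passed in as `spark`. The helper names `matching_files` and `read_matching`, the column name `id`, and the file paths are hypothetical and chosen only for illustration.

```python
def matching_files(reference_dtypes, dtypes_by_file):
    """Return, in sorted order, the paths whose columns and types equal the reference.

    Each schema is a list of (column name, type string) pairs, the shape
    produced by PySpark's DataFrame.dtypes. Column order is ignored.
    """
    reference = dict(reference_dtypes)
    return sorted(
        path
        for path, dtypes in dtypes_by_file.items()
        if dict(dtypes) == reference
    )


def read_matching(spark, reference_file, files):
    """Read only the files whose schema matches that of reference_file.

    `spark` is assumed to be an existing SparkSession. Every file is opened
    once to inspect its schema, so this costs one extra footer read per file.
    """
    reference_df = spark.read.parquet(reference_file)
    dtypes_by_file = {path: spark.read.parquet(path).dtypes for path in files}
    keep = matching_files(reference_df.dtypes, dtypes_by_file)
    return spark.read.schema(reference_df.schema).parquet(*keep)


# Plain-Python illustration of the two files from the report.
# The column "id" and the paths below are made up for this example.
file1 = [("id", "bigint"), ("X", "int")]
file2 = [("id", "bigint"), ("X", "string")]
kept = matching_files(file1, {"A/file1.parquet": file1, "A/file2.parquet": file2})
print(kept)  # ['A/file1.parquet']
```

The comparison is deliberately strict: any difference in column names or types excludes a file. That matches the expectation in the report that File 2 is skipped, but it may be stricter than needed where a type change is safe to read.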


