Posted to jira@arrow.apache.org by "Ian Cook (Jira)" <ji...@apache.org> on 2021/01/27 22:27:00 UTC
[jira] [Updated] (ARROW-11409) [Integration] Enable Arrow to read Parquet files from Spark 2.x with illegal nulls
[ https://issues.apache.org/jira/browse/ARROW-11409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ian Cook updated ARROW-11409:
-----------------------------
Attachment: spark_2.0.0_illegal_null.parquet
> [Integration] Enable Arrow to read Parquet files from Spark 2.x with illegal nulls
> ----------------------------------------------------------------------------------
>
> Key: ARROW-11409
> URL: https://issues.apache.org/jira/browse/ARROW-11409
> Project: Apache Arrow
> Issue Type: Bug
> Components: Integration
> Affects Versions: 3.0.0
> Reporter: Ian Cook
> Priority: Minor
> Attachments: spark_2.0.0_illegal_null.parquet
>
>
> While running integration tests with Arrow and Spark, I observed that Spark 2.x can in some circumstances write Parquet files with illegal nulls in non-nullable columns. (This appears to have been fixed in Spark 3.0.) Arrow throws an {{Unexpected end of stream}} error when attempting to read illegal Parquet files like this.
> The attached Parquet file, written by Spark 2.0.0, can be used to reproduce this behavior. It contains a single column: a non-nullable integer named {{x}} with three records:
> {code}
> +-----+
> | x|
> +-----+
> | 1|
> | null|
> | 3|
> +-----+
> {code}
> This issue is for awareness only. I expect this should be closed as "won't fix".
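
For illustration, here is a minimal Python sketch of why a null in a Parquet required column can surface as an "Unexpected end of stream" error. This is not Arrow's actual decoder; the function below is hypothetical. The idea: a required (non-nullable) column stores no definition levels, so a reader expects exactly one physical value per row, and a writer that silently dropped the null leaves the data page short.

```python
# Hypothetical sketch, NOT Arrow's actual decoder. A Parquet "required"
# (non-nullable) column carries no definition levels, so a reader consumes
# exactly one stored value per row. If a writer dropped a null instead of
# rejecting it, the page holds fewer values than rows and the reader runs
# off the end of the stream.

def decode_required_column(physical_values, num_rows):
    """Simulate decoding a required column: one stored value per row."""
    if len(physical_values) < num_rows:
        raise EOFError("Unexpected end of stream")
    return physical_values[:num_rows]

# A legal file: 3 rows, 3 stored values.
print(decode_required_column([1, 2, 3], num_rows=3))   # prints [1, 2, 3]

# The illegal Spark 2.x case: 3 rows, but the null row was never stored.
try:
    decode_required_column([1, 3], num_rows=3)
except EOFError as exc:
    print(exc)                                         # Unexpected end of stream
```

Under this (simplified) model, Spark 3.x's fix amounts to either writing the column as optional or refusing to write the null, so that stored values and row counts agree.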
--
This message was sent by Atlassian Jira
(v8.3.4#803005)