Posted to jira@arrow.apache.org by "Ian Cook (Jira)" <ji...@apache.org> on 2021/01/27 22:27:00 UTC

[jira] [Updated] (ARROW-11409) [Integration] Enable Arrow to read Parquet files from Spark 2.x with illegal nulls

     [ https://issues.apache.org/jira/browse/ARROW-11409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ian Cook updated ARROW-11409:
-----------------------------
    Attachment: spark_2.0.0_illegal_null.parquet

> [Integration] Enable Arrow to read Parquet files from Spark 2.x with illegal nulls
> ----------------------------------------------------------------------------------
>
>                 Key: ARROW-11409
>                 URL: https://issues.apache.org/jira/browse/ARROW-11409
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Integration
>    Affects Versions: 3.0.0
>            Reporter: Ian Cook
>            Priority: Minor
>         Attachments: spark_2.0.0_illegal_null.parquet
>
>
> While running integration tests with Arrow and Spark, I observed that Spark 2.x can in some circumstances write Parquet files with illegal nulls in non-nullable columns. (This appears to have been fixed in Spark 3.0.) Arrow throws an {{Unexpected end of stream}} error when attempting to read such illegal Parquet files.
> The attached Parquet file, written by Spark 2.0.0, can be used to reproduce this behavior. It contains a single column, a non-nullable integer named {{x}}, with three records:
> {code}
> +-----+
> |    x|
> +-----+
> |    1|
> | null|
> |    3|
> +-----+ 
> {code}
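> Reading the attached file with pyarrow reproduces the error. Below is a minimal sketch, assuming the attached file is in the working directory; the exact error text may vary by Arrow version. A likely explanation for the end-of-stream error: Parquet stores no definition levels for required (non-nullable) columns, so a null written into one leaves fewer encoded values than the row count implies, and the reader runs off the end of the data stream.
> {code:python}
> import pyarrow.parquet as pq
>
> # The file-level schema should show the column x as non-nullable,
> # even though the data beneath it contains a null.
> print(pq.read_schema("spark_2.0.0_illegal_null.parquet"))
>
> # Reading the data raises an error (reported above as
> # "Unexpected end of stream") because a null appears in the
> # required column.
> table = pq.read_table("spark_2.0.0_illegal_null.parquet")
> {code}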
> This issue is for awareness only. I expect it will be closed as "won't fix".


