Posted to jira@arrow.apache.org by "Andrew Lamb (Jira)" <ji...@apache.org> on 2021/01/17 11:03:00 UTC

[jira] [Resolved] (ARROW-11271) [Rust] [Parquet] List schema to Arrow parser misinterpreting child nullability

     [ https://issues.apache.org/jira/browse/ARROW-11271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Lamb resolved ARROW-11271.
---------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 9216
[https://github.com/apache/arrow/pull/9216]

> [Rust] [Parquet] List schema to Arrow parser misinterpreting child nullability
> ------------------------------------------------------------------------------
>
>                 Key: ARROW-11271
>                 URL: https://issues.apache.org/jira/browse/ARROW-11271
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Rust
>    Affects Versions: 2.0.0
>            Reporter: Neville Dipale
>            Assignee: Neville Dipale
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.0.0
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We currently do not propagate child nullability correctly when reading Parquet files written by Spark 3.0.1 (parquet-mr 1.10.1).
> For example, the following schema, taken from [https://github.com/apache/parquet-format/blob/master/LogicalTypes.md], is currently interpreted incorrectly:
>  
> {code:java}
> // List<String> (list nullable, elements non-null) 
> optional group my_list (LIST) {
>     repeated group list { 
>         required binary element (UTF8); 
>     } 
> }{code}
> The Arrow type should be:
> {code:java}
> Field::new(
>     "my_list",
>     DataType::List(
>         // element field: non-nullable
>         Box::new(Field::new("element", DataType::Utf8, false)),
>     ),
>     true, // list field: nullable
> ){code}
> but we currently end up with
> {code:java}
> Field::new(
>     "my_list",
>     DataType::List(
>         // wrong child name ("list") and wrong nullability (true)
>         Box::new(Field::new("list", DataType::Utf8, true)),
>     ),
>     true, // list field: nullable
> )
> {code}
> This doesn't seem to be an issue with the master branch as of opening this issue, so it might not be severe enough to try to force into the 3.0.0 release.
> I tested null and non-null Spark files, and was able to read them correctly. This becomes an issue with nested lists, which I'm working on.
>  
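The mapping described above can be illustrated with a small, self-contained sketch. It is not the code from pull request 9216 and does not use the real `parquet` or `arrow` crates: `Repetition`, `ParquetType`, `DataType`, `Field`, `only_child`, `to_arrow`, and `example_schema` are all hypothetical stand-ins, and every leaf is assumed to be a UTF8 string. The sketch only shows the intended rule: the Arrow child field takes its name and nullability from the innermost element, not from the intermediate repeated group.

```rust
// Hypothetical, simplified stand-ins for Parquet and Arrow schema types.
// These are NOT the real `parquet` or `arrow` crate APIs.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Repetition {
    Required,
    Optional,
    Repeated,
}

#[derive(Debug, Clone, PartialEq)]
enum ParquetType {
    // A leaf column; this sketch treats every leaf as UTF8 binary.
    Primitive { name: String, repetition: Repetition },
    // A group; `is_list` marks the LIST logical type annotation.
    Group {
        name: String,
        repetition: Repetition,
        is_list: bool,
        children: Vec<ParquetType>,
    },
}

#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Utf8,
    List(Box<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

// Returns the only child of a group, or None if there is not exactly one.
fn only_child(children: &[ParquetType]) -> Option<&ParquetType> {
    match children {
        [only] => Some(only),
        _ => None,
    }
}

// Converts a leaf or a three-level LIST group into a simplified Arrow field.
// Returns None for shapes this sketch does not handle.
fn to_arrow(ty: &ParquetType) -> Option<Field> {
    match ty {
        ParquetType::Primitive { name, repetition } => Some(Field {
            name: name.clone(),
            data_type: DataType::Utf8,
            nullable: *repetition == Repetition::Optional,
        }),
        ParquetType::Group { name, repetition, is_list: true, children } => {
            // Skip the intermediate repeated group and descend to the element.
            let element = match only_child(children)? {
                ParquetType::Group {
                    repetition: Repetition::Repeated,
                    is_list: false,
                    children: inner,
                    ..
                } => only_child(inner)?,
                _ => return None,
            };
            // Name and nullability of the child come from the element itself.
            let item = to_arrow(element)?;
            Some(Field {
                name: name.clone(),
                data_type: DataType::List(Box::new(item)),
                nullable: *repetition == Repetition::Optional,
            })
        }
        ParquetType::Group { .. } => None,
    }
}

// Builds the `optional group my_list (LIST)` schema quoted in the issue.
fn example_schema() -> ParquetType {
    ParquetType::Group {
        name: "my_list".to_string(),
        repetition: Repetition::Optional,
        is_list: true,
        children: vec![ParquetType::Group {
            name: "list".to_string(),
            repetition: Repetition::Repeated,
            is_list: false,
            children: vec![ParquetType::Primitive {
                name: "element".to_string(),
                repetition: Repetition::Required,
            }],
        }],
    }
}

fn main() {
    let field = to_arrow(&example_schema()).expect("well-formed three-level list");
    println!("{:?}", field);
}
```

Because `to_arrow` recurses on the element, a list whose element is itself a LIST group goes through the same rule, which is the nested-list case the report mentions.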



--
This message was sent by Atlassian Jira
(v8.3.4#803005)