Posted to jira@arrow.apache.org by "Ben Sully (Jira)" <ji...@apache.org> on 2020/12/30 16:38:00 UTC

[jira] [Updated] (ARROW-11077) ParquetFileArrowReader panics when trying to read nested list

     [ https://issues.apache.org/jira/browse/ARROW-11077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Sully updated ARROW-11077:
------------------------------
    Description: 
I think this is documented in the code, but I can't be 100% sure.

When trying to execute a DataFusion query over a Parquet file where one field is a struct with a nested list, the thread panics due to unwrapping an `Option::None` [at this point|https://github.com/apache/arrow/blob/36d80e37373ab49454eb47b2a89c10215ca1b67e/rust/parquet/src/arrow/array_reader.rs#L1334-L1337]. This `None` is returned by [`visit_primitive`|https://github.com/apache/arrow/blob/master/rust/parquet/src/arrow/array_reader.rs#L1243-L1245], but I can't quite make sense of _why_ it returns `None` rather than an error.
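To make the failure mode concrete, here is a minimal sketch of the pattern described above. It does not use the parquet crate: `Reader`, `visit_primitive`, and `visit_list` are hypothetical stand-ins, and the assumption is only that the inner visitor can return `Ok(None)` while the caller unwraps the `Option` unconditionally.

```rust
use std::panic;

// Hypothetical stand-in for an array reader; not the parquet crate's type.
#[derive(Debug)]
struct Reader(&'static str);

// Hypothetical visitor: returns Ok(None) when it builds no reader,
// which is distinct from returning Err.
fn visit_primitive(build: bool) -> Result<Option<Reader>, String> {
    if build {
        Ok(Some(Reader("item")))
    } else {
        Ok(None)
    }
}

// Hypothetical list visitor: `?` propagates the Err case, but the
// unwrap() on the inner Option panics when the child is None.
fn visit_list(build_child: bool) -> Result<Reader, String> {
    let child = visit_primitive(build_child)?.unwrap();
    Ok(child)
}

fn main() {
    // A child reader exists, so the list visitor succeeds.
    assert_eq!(visit_list(true).unwrap().0, "item");

    // No child reader: the call panics instead of returning Err.
    let outcome = panic::catch_unwind(|| visit_list(false));
    assert!(outcome.is_err());
}
```

The point of the sketch is that the `Result` layer never sees the problem: the caller gets a panic from the `Option` rather than an error it could handle.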

I added a couple of `dbg!` calls to see what the `item_type` and `list_type` are:
{code}
[/home/ben/repos/rust/arrow/rust/parquet/src/arrow/array_reader.rs:1339] &item_type = PrimitiveType {
    basic_info: BasicTypeInfo {
        name: "item",
        repetition: Some(
            OPTIONAL,
        ),
        logical_type: UTF8,
        id: None,
    },
    physical_type: BYTE_ARRAY,
    type_length: -1,
    scale: -1,
    precision: -1,
}
[/home/ben/repos/rust/arrow/rust/parquet/src/arrow/array_reader.rs:1340] &list_type = GroupType {
    basic_info: BasicTypeInfo {
        name: "tags",
        repetition: Some(
            OPTIONAL,
        ),
        logical_type: LIST,
        id: None,
    },
    fields: [
        GroupType {
            basic_info: BasicTypeInfo {
                name: "list",
                repetition: Some(
                    REPEATED,
                ),
                logical_type: NONE,
                id: None,
            },
            fields: [
                PrimitiveType {
                    basic_info: BasicTypeInfo {
                        name: "item",
                        repetition: Some(
                            OPTIONAL,
                        ),
                        logical_type: UTF8,
                        id: None,
                    },
                    physical_type: BYTE_ARRAY,
                    type_length: -1,
                    scale: -1,
                    precision: -1,
                },
            ],
        },
    ],
}{code}
I guess we should at least use `.expect` here instead of `.unwrap` so it's clearer why this is happening!
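As a rough illustration of that suggestion, the sketch below compares two options on hypothetical stand-ins (`Reader`, `visit_primitive`, and the two `visit_list_*` functions are not the parquet crate's API, and the message text is made up): `.expect` still panics but says why, while `ok_or_else` turns the missing reader into an ordinary `Err`.

```rust
// Hypothetical stand-in for an array reader; not the parquet crate's type.
#[derive(Debug, PartialEq)]
struct Reader(&'static str);

// Hypothetical visitor that may legitimately return Ok(None).
fn visit_primitive(build: bool) -> Result<Option<Reader>, String> {
    if build {
        Ok(Some(Reader("item")))
    } else {
        Ok(None)
    }
}

// Option 1: still panics on None, but the panic message explains the cause.
fn visit_list_expect(build_child: bool) -> Result<Reader, String> {
    let child = visit_primitive(build_child)?
        .expect("list item type produced no array reader");
    Ok(child)
}

// Option 2: convert the None into an Err so the caller can handle it.
fn visit_list_checked(build_child: bool) -> Result<Reader, String> {
    visit_primitive(build_child)?
        .ok_or_else(|| "list item type produced no array reader".to_string())
}

fn main() {
    assert!(visit_list_expect(true).is_ok());
    assert_eq!(visit_list_checked(true), Ok(Reader("item")));
    assert_eq!(
        visit_list_checked(false),
        Err("list item type produced no array reader".to_string())
    );
}
```

Whether a missing child reader should be an error at all depends on what `None` is meant to signal in the real code, which the report leaves open.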


> ParquetFileArrowReader panics when trying to read nested list
> -------------------------------------------------------------
>
>                 Key: ARROW-11077
>                 URL: https://issues.apache.org/jira/browse/ARROW-11077
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Rust
>            Reporter: Ben Sully
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)