Posted to jira@arrow.apache.org by "Ben Sully (Jira)" <ji...@apache.org> on 2020/12/30 16:38:00 UTC
[jira] [Updated] (ARROW-11077) ParquetFileArrowReader panics when
trying to read nested list
[ https://issues.apache.org/jira/browse/ARROW-11077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ben Sully updated ARROW-11077:
------------------------------
Description:
I think this is documented in the code, but I can't be 100% sure.
When trying to execute a DataFusion query over a Parquet file where one field is a struct with a nested list, the thread panics due to unwrapping on an `Option::None` [at this point|https://github.com/apache/arrow/blob/36d80e37373ab49454eb47b2a89c10215ca1b67e/rust/parquet/src/arrow/array_reader.rs#L1334-L1337]. This `None` is returned by [`visit_primitive`|https://github.com/apache/arrow/blob/master/rust/parquet/src/arrow/array_reader.rs#L1243-L1245], but I can't quite make sense of _why_ it returns `None` rather than an error.
I added a couple of `dbg!` calls to see what the `item_type` and `list_type` are:
{code}
[/home/ben/repos/rust/arrow/rust/parquet/src/arrow/array_reader.rs:1339] &item_type = PrimitiveType {
    basic_info: BasicTypeInfo {
        name: "item",
        repetition: Some(
            OPTIONAL,
        ),
        logical_type: UTF8,
        id: None,
    },
    physical_type: BYTE_ARRAY,
    type_length: -1,
    scale: -1,
    precision: -1,
}
[/home/ben/repos/rust/arrow/rust/parquet/src/arrow/array_reader.rs:1340] &list_type = GroupType {
    basic_info: BasicTypeInfo {
        name: "tags",
        repetition: Some(
            OPTIONAL,
        ),
        logical_type: LIST,
        id: None,
    },
    fields: [
        GroupType {
            basic_info: BasicTypeInfo {
                name: "list",
                repetition: Some(
                    REPEATED,
                ),
                logical_type: NONE,
                id: None,
            },
            fields: [
                PrimitiveType {
                    basic_info: BasicTypeInfo {
                        name: "item",
                        repetition: Some(
                            OPTIONAL,
                        ),
                        logical_type: UTF8,
                        id: None,
                    },
                    physical_type: BYTE_ARRAY,
                    type_length: -1,
                    scale: -1,
                    precision: -1,
                },
            ],
        },
    ],
}
{code}
I guess we should at least use `.expect` here instead of `.unwrap` so it's clearer why this is happening!
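To make the `.expect` suggestion concrete, here is a minimal sketch of the two options: panic with a descriptive message, or turn the `None` into an error. All names here (`ReaderError`, `visit_item`, `build_with_expect`, `build_with_result`) are hypothetical stand-ins for illustration, not the parquet crate's actual types or functions, and the condition under which the stand-in returns `None` is an assumption made only so the example has something to branch on.

```rust
use std::fmt;

/// Hypothetical error type; the real reader has its own error enum.
#[derive(Debug, PartialEq)]
pub enum ReaderError {
    Unsupported(String),
}

impl fmt::Display for ReaderError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ReaderError::Unsupported(msg) => write!(f, "unsupported: {}", msg),
        }
    }
}

impl std::error::Error for ReaderError {}

/// Hypothetical stand-in for a visitor that yields `None` for a shape
/// it does not handle (assumed here to be "nested inside a struct").
pub fn visit_item(nested_in_struct: bool) -> Option<&'static str> {
    if nested_in_struct {
        None
    } else {
        Some("item reader")
    }
}

/// Option 1: still panics on `None`, but the message says what went wrong.
pub fn build_with_expect(nested_in_struct: bool) -> &'static str {
    visit_item(nested_in_struct)
        .expect("no array reader was built for the list item type")
}

/// Option 2: converts `None` into an error the caller can handle.
pub fn build_with_result(nested_in_struct: bool) -> Result<&'static str, ReaderError> {
    visit_item(nested_in_struct).ok_or_else(|| {
        ReaderError::Unsupported("no array reader was built for the list item type".to_string())
    })
}
```

With the second form, a caller could propagate the failure with `?` instead of bringing down the thread, which seems closer to what a query engine would want.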
> ParquetFileArrowReader panics when trying to read nested list
> -------------------------------------------------------------
>
> Key: ARROW-11077
> URL: https://issues.apache.org/jira/browse/ARROW-11077
> Project: Apache Arrow
> Issue Type: Bug
> Components: Rust
> Reporter: Ben Sully
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)