Posted to jira@arrow.apache.org by "&res (Jira)" <ji...@apache.org> on 2022/12/16 10:36:00 UTC
[jira] [Created] (ARROW-18439) Misleading message when loading parquet data with invalid null data
&res created ARROW-18439:
----------------------------
Summary: Misleading message when loading parquet data with invalid null data
Key: ARROW-18439
URL: https://issues.apache.org/jira/browse/ARROW-18439
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Affects Versions: 10.0.1
Reporter: &res
I'm saving an Arrow table to Parquet. One column is a list of structs whose elements are marked as non-nullable. But the data isn't valid, because I've put a null in one of the nested fields.
When I save this data to Parquet and try to load it back, I get a very misleading message:
{code:java}
Length spanned by list offsets (2) larger than values array (length 1){code}
I would rather Arrow complained when creating the table or when saving it to Parquet.
Here's how to reproduce the issue:
{code:java}
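import io

import pyarrow as pa
import pyarrow.parquet as pq

# Struct type whose only field is declared non-nullable.
struct = pa.struct(
    [
        pa.field("nested_string", pa.string(), nullable=False),
    ]
)
schema = pa.schema(
    [pa.field("list_column", pa.list_(pa.field("item", struct, nullable=False)))]
)
# The second struct holds a null in the non-nullable field, so the data is invalid.
table = pa.table(
    {"list_column": [[{"nested_string": ""}, {"nested_string": None}]]}, schema=schema
)
with io.BytesIO() as file:
    pq.write_table(table, file)
    file.seek(0)
    pq.read_table(file)  # Raises pa.ArrowInvalid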
{code}