Posted to issues@arrow.apache.org by "Jorge Leitão (Jira)" <ji...@apache.org> on 2022/04/24 07:10:00 UTC
[jira] [Created] (ARROW-16299) [Parquet] Null count in list arrays incorrect?
Jorge Leitão created ARROW-16299:
------------------------------------
Summary: [Parquet] Null count in list arrays incorrect?
Key: ARROW-16299
URL: https://issues.apache.org/jira/browse/ARROW-16299
Project: Apache Arrow
Issue Type: Bug
Components: Parquet
Reporter: Jorge Leitão
The minimal example below reproduces the issue:
{code:python}
import pyarrow as pa  # pyarrow==7
import pyarrow.parquet

path = "bla.parquet"
# Two null lists, one empty list, and one null element inside a list.
data = [[0, 1], None, [2, None, 3], [4, 5, 6], [], [7, 8, 9], None, [10]]
t = pa.table(
    [pa.array(data)],
    schema=pa.schema([pa.field("int64", pa.list_(pa.int64()), nullable=True)]),
)
pyarrow.parquet.write_table(t, path)

# Read back the null count recorded in the column chunk statistics.
parquet_file = pyarrow.parquet.ParquetFile(path)
nulls = parquet_file.metadata.row_group(0).column(0).statistics.null_count
assert nulls == 1, nulls
{code}
The null count should be equal to 1, because the inner array has only one null value (right?)
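To make the expectation concrete, here is a small sketch in plain Python (deliberately without pyarrow) that tallies the different kinds of "missing" entries in the same data. The variable names are illustrative only. The last count is a hypothetical alternative counting rule, included for comparison; it is an assumption and not a statement about what the Parquet writer actually does.

```python
# Same nested data as in the example above.
data = [[0, 1], None, [2, None, 3], [4, 5, 6], [], [7, 8, 9], None, [10]]

# Lists that are themselves null (outer level).
null_lists = sum(1 for item in data if item is None)

# Lists that are present but contain no elements.
empty_lists = sum(1 for item in data if item is not None and len(item) == 0)

# Null values inside non-null lists (inner level); this is the count
# the report expects the statistics to contain.
null_items = sum(
    1 for item in data if item is not None for value in item if value is None
)

# Hypothetical alternative rule (an assumption, not confirmed here):
# count every slot that carries no leaf value at all.
slots_without_value = null_lists + empty_lists + null_items

print(null_lists, empty_lists, null_items, slots_without_value)
```

If the statistics value differs from the inner-level count, comparing it against these tallies may help narrow down which entries are being counted.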
--
This message was sent by Atlassian Jira
(v8.20.7#820007)