Posted to issues@arrow.apache.org by "Jorge Leitão (Jira)" <ji...@apache.org> on 2022/04/24 07:10:00 UTC

[jira] [Created] (ARROW-16299) [Parquet] Null count in list arrays incorrect?

Jorge Leitão created ARROW-16299:
------------------------------------

             Summary: [Parquet] Null count in list arrays incorrect?
                 Key: ARROW-16299
                 URL: https://issues.apache.org/jira/browse/ARROW-16299
             Project: Apache Arrow
          Issue Type: Bug
          Components: Parquet
            Reporter: Jorge Leitão


The minimal example below reproduces the issue:

{code:python}
import pyarrow as pa  # pyarrow==7
import pyarrow.parquet

path = "bla.parquet"
# 8 rows: 2 null lists, 1 empty list, and 1 null element inside a list
data = [[0, 1], None, [2, None, 3], [4, 5, 6], [], [7, 8, 9], None, [10]]

t = pa.table(
    [pa.array(data)],
    schema=pa.schema([pa.field("int64", pa.list_(pa.int64()), nullable=True)]),
)

pyarrow.parquet.write_table(t, path)

# Read back the null count stored in the column chunk statistics
parquet_file = pyarrow.parquet.ParquetFile(path)
nulls = parquet_file.metadata.row_group(0).column(0).statistics.null_count
assert nulls == 1, nulls
{code}

I would expect the null count to be 1, because the inner (values) array contains only one null value. Is that right?
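
To make the expectation concrete, here is a small sketch in plain Python (no pyarrow) that tallies the three kinds of "missing" entries in the same input: null lists, empty lists, and null elements inside non-null lists. The variable names are only illustrative, and the sketch makes no claim about which of these the Parquet writer actually includes in null_count.

```python
# Same input as in the reproduction above.
data = [[0, 1], None, [2, None, 3], [4, 5, 6], [], [7, 8, 9], None, [10]]

# Rows where the list itself is null.
null_lists = sum(1 for row in data if row is None)

# Rows that are present but contain no elements.
empty_lists = sum(1 for row in data if row is not None and len(row) == 0)

# Null values among the elements of the non-null lists.
null_items = sum(
    1 for row in data if row is not None for item in row if item is None
)

print(null_lists, empty_lists, null_items)
```

For this input the tally is 2 null lists, 1 empty list and 1 null element, so the expected value of 1 corresponds to counting only the null elements.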


