Posted to dev@parquet.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/09/30 19:02:00 UTC

[jira] [Updated] (PARQUET-2067) [C++] null_count and num_nulls incorrect for repeated columns

     [ https://issues.apache.org/jira/browse/PARQUET-2067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated PARQUET-2067:
------------------------------------
    Labels: pull-request-available  (was: )

> [C++]  null_count and num_nulls incorrect for repeated columns
> --------------------------------------------------------------
>
>                 Key: PARQUET-2067
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2067
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-cpp
>            Reporter: Micah Kornfield
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently only nulls at the leaf are accounted for in the null count statistics. For nested lists this is incorrect, because a null list has zero elements and therefore never shows up at the leaf.
>  
> Example from the mailing list discussion:
>  
> [[0, 1], None, [2, None, 3]]
>  
> should have a null count of 2 (it currently reports as 1).
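
The difference between the two counts can be illustrated with a small sketch. This is plain Python over the example column, not parquet-cpp code; the function names leaf_null_count and null_count_including_null_lists are hypothetical and chosen only for this illustration. It assumes a single level of list nesting, as in the example.

```python
def leaf_null_count(column):
    """Count only null leaf values; null lists contribute nothing.

    This mirrors the behaviour the issue describes as current.
    """
    return sum(
        1
        for row in column
        if row is not None  # a null list has no leaves to inspect
        for value in row
        if value is None
    )


def null_count_including_null_lists(column):
    """Count null leaf values plus null lists (the count the issue expects)."""
    null_lists = sum(1 for row in column if row is None)
    return leaf_null_count(column) + null_lists


column = [[0, 1], None, [2, None, 3]]
print(leaf_null_count(column))                  # 1
print(null_count_including_null_lists(column))  # 2
```

The first function sees only the null inside the third list; the second also counts the null list in the second position, which gives the expected total of 2.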



--
This message was sent by Atlassian Jira
(v8.3.4#803005)