Posted to dev@parquet.apache.org by "Wes McKinney (Jira)" <ji...@apache.org> on 2019/09/16 20:40:01 UTC

[jira] [Closed] (PARQUET-1652) [C++] ColumnWriter writes incorrect "num_values" metadata for nested types

     [ https://issues.apache.org/jira/browse/PARQUET-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney closed PARQUET-1652.
---------------------------------
    Assignee:     (was: Wes McKinney)

> [C++] ColumnWriter writes incorrect "num_values" metadata for nested types
> --------------------------------------------------------------------------
>
>                 Key: PARQUET-1652
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1652
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-cpp
>            Reporter: Wes McKinney
>            Priority: Major
>
> While investigating ARROW-5630, I discovered that we are writing incorrect "num_values" metadata in {{DataPageHeader}} when writing nested types. Instead of writing the "Number of values, including NULLs, in this data page", as the specification in parquet.thrift says, we are writing the number of definition levels. For flat types, the number of definition levels and the number of values including nulls are the same, but for nested types the number of values including nulls will generally be smaller.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)