Posted to issues@impala.apache.org by "Attila Jeges (Jira)" <ji...@apache.org> on 2021/09/07 19:44:00 UTC

[jira] [Resolved] (IMPALA-10879) Add parquet stats to iceberg manifest

     [ https://issues.apache.org/jira/browse/IMPALA-10879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Attila Jeges resolved IMPALA-10879.
-----------------------------------
    Resolution: Implemented

> Add parquet stats to iceberg manifest
> -------------------------------------
>
>                 Key: IMPALA-10879
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10879
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend, Frontend
>    Affects Versions: Impala 4.0.0
>            Reporter: Attila Jeges
>            Assignee: Attila Jeges
>            Priority: Major
>              Labels: impala-iceberg
>
> Parquet stats should be written to the Iceberg manifest as per-data-file metrics.
> This task is specifically about the following metrics:
> - column_sizes : Map from column id to the total size on disk of all regions that store the column. Does not include bytes necessary to read other columns, like footers. Leave null for row-oriented formats.
> - null_value_counts : Map from column id to number of null values in the column.
> - lower_bounds : Map from column id to lower bound in the column serialized as binary. Each value must be less than or equal to all non-null, non-NaN values in the column for the file.
> - upper_bounds : Map from column id to upper bound in the column serialized as binary. Each value must be greater than or equal to all non-null, non-NaN values in the column for the file.
> Iceberg manifest doc:
> https://iceberg.apache.org/spec/#manifests
> lower_bounds and upper_bounds values should be serialized to binary using the single-value serialization:
> https://iceberg.apache.org/spec/#appendix-d-single-value-serialization
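
As an illustration of the metrics described above, here is a minimal Python sketch. It is not Impala's implementation. The function names (serialize_single_value, file_metrics) and the in-memory column layout are hypothetical, only a few primitive types are covered, and column_sizes is left out because it has to come from the Parquet file metadata rather than from the values. The byte layouts (little-endian for numeric types, raw UTF-8 for strings) follow the single-value serialization appendix linked above.

```python
import math
import struct


def serialize_single_value(type_name, value):
    """Serialize one bound value to bytes (subset of Iceberg primitive types)."""
    if type_name == "boolean":
        return b"\x01" if value else b"\x00"
    if type_name == "int":
        return struct.pack("<i", value)   # 4-byte little-endian
    if type_name == "long":
        return struct.pack("<q", value)   # 8-byte little-endian
    if type_name == "float":
        return struct.pack("<f", value)   # 4-byte little-endian IEEE 754
    if type_name == "double":
        return struct.pack("<d", value)   # 8-byte little-endian IEEE 754
    if type_name == "string":
        return value.encode("utf-8")      # UTF-8 bytes, no length prefix
    raise ValueError(f"type not covered by this sketch: {type_name}")


def file_metrics(columns):
    """Build per-data-file metrics keyed by column id.

    `columns` is a hypothetical layout: {column_id: (type_name, values)},
    where None in `values` stands for a null.
    """
    null_value_counts, lower_bounds, upper_bounds = {}, {}, {}
    for column_id, (type_name, values) in columns.items():
        null_value_counts[column_id] = sum(v is None for v in values)
        # Bounds only consider non-null, non-NaN values.
        usable = [
            v for v in values
            if v is not None and not (isinstance(v, float) and math.isnan(v))
        ]
        if usable:
            lower_bounds[column_id] = serialize_single_value(type_name, min(usable))
            upper_bounds[column_id] = serialize_single_value(type_name, max(usable))
    return {
        "null_value_counts": null_value_counts,
        "lower_bounds": lower_bounds,
        "upper_bounds": upper_bounds,
    }


metrics = file_metrics({
    1: ("int", [3, None, -1, 7]),
    2: ("double", [1.5, float("nan"), None]),
    3: ("string", ["b", "a"]),
})
```

Columns with no usable values get no entry in lower_bounds or upper_bounds in this sketch, so a reader of the manifest cannot mistake a placeholder for a real bound.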



--
This message was sent by Atlassian Jira
(v8.3.4#803005)