Posted to dev@parquet.apache.org by "Julien Le Dem (JIRA)" <ji...@apache.org> on 2017/06/07 17:36:18 UTC

[jira] [Resolved] (PARQUET-839) Min-max should be computed based on logical type

     [ https://issues.apache.org/jira/browse/PARQUET-839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Le Dem resolved PARQUET-839.
-----------------------------------
    Resolution: Duplicate

> Min-max should be computed based on logical type
> ------------------------------------------------
>
>                 Key: PARQUET-839
>                 URL: https://issues.apache.org/jira/browse/PARQUET-839
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-format
>    Affects Versions: format-2.3.1
>            Reporter: Tim Armstrong
>
> The min/max stats are currently underspecified: the spec does not make the expected ordering clear in any case.
> There are some related issues, like PARQUET-686 to fix specific problems, but there seems to be a general assumption that the min/max should be defined based on the primitive type instead of the logical type.
> However, this makes the stats nearly useless for some logical types. For example, consider a DECIMAL encoded into a (variable-length) BINARY. The min-max of the underlying binary type is based on the lexical order of the byte string, but that does not correspond to any reasonable ordering of the decimal values. E.g. 256 (0x01 0x00) will be ordered between 1 (0x01) and 2 (0x02). This makes min-max filtering a lot less effective and would force query engines using Parquet to implement workarounds (e.g. custom comparators) to produce correct results.
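
The mismatch described in the issue can be reproduced with a small sketch. It assumes the unscaled DECIMAL value is stored as big-endian two's-complement bytes, as the Parquet logical type definitions describe for BINARY-backed decimals. The helper names (encode_unscaled, decode_unscaled) are hypothetical and are not part of any Parquet library; the encoding is compact but not guaranteed to be the shortest possible for every negative value.

```python
def encode_unscaled(value: int) -> bytes:
    """Hypothetical helper: big-endian two's-complement bytes of an unscaled decimal."""
    # +8 leaves room for the sign bit, so positive values never look negative.
    length = max(1, (value.bit_length() + 8) // 8)
    return value.to_bytes(length, byteorder="big", signed=True)


def decode_unscaled(raw: bytes) -> int:
    """Hypothetical helper: inverse of encode_unscaled."""
    return int.from_bytes(raw, byteorder="big", signed=True)


values = [1, 2, 256, -1]

# Ordering by the physical type: Python compares bytes lexicographically,
# byte by byte, treating each byte as unsigned.
by_bytes = sorted(values, key=encode_unscaled)   # [1, 256, 2, -1]

# Ordering by the logical type: plain numeric comparison.
by_value = sorted(values)                        # [-1, 1, 2, 256]

# Min/max as a writer would record them under each ordering.
physical_min = decode_unscaled(min(encode_unscaled(v) for v in values))  # 1
physical_max = decode_unscaled(max(encode_unscaled(v) for v in values))  # -1
logical_min, logical_max = min(values), max(values)                      # -1, 256
```

Under the byte-wise ordering the recorded "min" (1) is numerically greater than the recorded "max" (-1), so a reader that prunes on a numeric predicate such as value = 256 could not rely on these stats without a custom comparator.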



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)