Posted to dev@parquet.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/02/26 08:37:00 UTC

[jira] [Updated] (PARQUET-1655) [C++] Decimal comparisons used for min/max statistics are not correct

     [ https://issues.apache.org/jira/browse/PARQUET-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated PARQUET-1655:
------------------------------------
    Labels: pull-request-available  (was: )

> [C++] Decimal comparisons used for min/max statistics are not correct
> ---------------------------------------------------------------------
>
>                 Key: PARQUET-1655
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1655
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-cpp
>            Reporter: Philip Felton
>            Assignee: Micah Kornfield
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The [Parquet Format specification|https://github.com/apache/parquet-format/blob/master/LogicalTypes.md] says:
> bq. If the column uses int32 or int64 physical types, then signed comparison of the integer values produces the correct ordering. If the physical type is fixed, then the correct ordering can be produced by flipping the most-significant bit in the first byte and then using unsigned byte-wise comparison.
> However, the C++ Parquet code does not follow this. 16-byte decimal comparison is implemented as a lexicographical comparison of signed chars.
> This appears to be because the function [https://github.com/apache/arrow/blob/master/cpp/src/parquet/statistics.cc#L183] selects the comparator from the sort_order (signed) and physical_type (FIXED_LENGTH_BYTE_ARRAY) alone; there is no override for decimal.
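
For illustration, the ordering rule quoted from the specification can be sketched as below. This is a minimal sketch, not the Arrow/Parquet implementation: the function names DecimalFlbaLess and SignedCharLess are hypothetical, and it assumes both values are big-endian two's-complement byte arrays of the same length. SignedCharLess mimics the lexicographical signed-char comparison described in the report so the two orderings can be contrasted.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper, not part of Arrow/Parquet.
// Returns true if a < b, where a and b are big-endian two's-complement
// values of the same length `len` (the FIXED_LEN_BYTE_ARRAY decimal layout).
bool DecimalFlbaLess(const uint8_t* a, const uint8_t* b, std::size_t len) {
  if (len == 0) return false;
  // Flip the most-significant bit of the first byte so that negative values
  // sort below non-negative ones under unsigned comparison.
  const uint8_t a0 = static_cast<uint8_t>(a[0] ^ 0x80);
  const uint8_t b0 = static_cast<uint8_t>(b[0] ^ 0x80);
  if (a0 != b0) return a0 < b0;
  // The remaining bytes carry no sign information; std::memcmp compares
  // them as unsigned char, which is the byte-wise order the spec asks for.
  return std::memcmp(a + 1, b + 1, len - 1) < 0;
}

// Hypothetical illustration of the reported behaviour: every byte is
// compared as a signed char, so any non-leading byte >= 0x80 is misordered.
bool SignedCharLess(const uint8_t* a, const uint8_t* b, std::size_t len) {
  const int8_t* sa = reinterpret_cast<const int8_t*>(a);
  const int8_t* sb = reinterpret_cast<const int8_t*>(b);
  return std::lexicographical_compare(sa, sa + len, sb, sb + len);
}
```

As a usage example, take the 2-byte values {0x00, 0x80} (unscaled value 128) and {0x00, 0x01} (unscaled value 1). SignedCharLess treats 0x80 as -128 and reports 128 < 1, while DecimalFlbaLess orders them correctly. Min/max statistics computed with the signed-char ordering could therefore be wrong for any decimal whose non-leading bytes have the high bit set.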


