Posted to commits@parquet.apache.org by zi...@apache.org on 2018/03/26 13:00:10 UTC

[parquet-format] branch master updated: PARQUET-1251: Clarify ambiguous min/max stats for FLOAT/DOUBLE (#88)

This is an automated email from the ASF dual-hosted git repository.

zivanfi pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-format.git


The following commit(s) were added to refs/heads/master by this push:
     new 952c263  PARQUET-1251: Clarify ambiguous min/max stats for FLOAT/DOUBLE (#88)
952c263 is described below

commit 952c26375eb15c6a27a770f26a6292264b1b7328
Author: Gabor Szadovszky <ga...@apache.org>
AuthorDate: Mon Mar 26 15:00:04 2018 +0200

    PARQUET-1251: Clarify ambiguous min/max stats for FLOAT/DOUBLE (#88)
    
    Describe handling of the ambiguous min/max statistics for FLOAT/DOUBLE.
---
 src/main/thrift/parquet.thrift | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/src/main/thrift/parquet.thrift b/src/main/thrift/parquet.thrift
index d0c4c31..f3aac25 100644
--- a/src/main/thrift/parquet.thrift
+++ b/src/main/thrift/parquet.thrift
@@ -751,10 +751,19 @@ union ColumnOrder {
    *   INT32 - signed comparison
    *   INT64 - signed comparison
    *   INT96 (only used for legacy timestamps) - undefined
-   *   FLOAT - signed comparison of the represented value
-   *   DOUBLE - signed comparison of the represented value
+   *   FLOAT - signed comparison of the represented value (*)
+   *   DOUBLE - signed comparison of the represented value (*)
    *   BYTE_ARRAY - unsigned byte-wise comparison
    *   FIXED_LEN_BYTE_ARRAY - unsigned byte-wise comparison
+   *
+   * (*) Because the sorting order is not specified properly for floating
+   *     point values (relations vs. total ordering) the following
+   *     compatibility rules should be applied when reading statistics:
+   *     - If the min is a NaN, it should be ignored.
+   *     - If the max is a NaN, it should be ignored.
+   *     - If the min is +0, the row group may contain -0 values as well.
+   *     - If the max is -0, the row group may contain +0 values as well.
+   *     - When looking for NaN values, min and max should be ignored.
    */
   1: TypeDefinedOrder TYPE_ORDER;
 }
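
The five compatibility rules added in the comment above can be sketched as a reader-side check. This is a minimal illustration, not code from parquet-format or any Parquet library: the function names `usable_float_bounds` and `may_contain` are hypothetical, statistics are assumed to be already decoded into Python floats, and `None` is assumed to mean "no usable bound".

```python
import math
from typing import Optional, Tuple


def usable_float_bounds(
    stat_min: Optional[float], stat_max: Optional[float]
) -> Tuple[Optional[float], Optional[float]]:
    """Return (lower, upper) bounds that are safe to prune with.

    Hypothetical helper: None means the bound is unknown or unusable.
    """
    lower, upper = stat_min, stat_max
    # Rules 1 and 2: a NaN min or max carries no information, so ignore it.
    if lower is not None and math.isnan(lower):
        lower = None
    if upper is not None and math.isnan(upper):
        upper = None
    # Rule 3: a min of +0 may hide -0 values, so widen the bound to -0.0.
    # (0.0 == -0.0 in IEEE 754 comparison, so this test matches both zeros.)
    if lower is not None and lower == 0.0:
        lower = -0.0
    # Rule 4: a max of -0 may hide +0 values, so widen the bound to +0.0.
    if upper is not None and upper == 0.0:
        upper = 0.0
    return lower, upper


def may_contain(
    value: float, stat_min: Optional[float], stat_max: Optional[float]
) -> bool:
    """Conservative check: False only if the row group provably lacks value."""
    # Rule 5: when looking for NaN, min and max must be ignored.
    if math.isnan(value):
        return True
    lower, upper = usable_float_bounds(stat_min, stat_max)
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True
```

With plain IEEE 754 comparison the two zeros already compare equal, so the sign widening in rules 3 and 4 mainly matters for readers that compare by total order or by bit pattern; it is kept explicit here so the returned bounds are safe under either comparison.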
