Posted to commits@parquet.apache.org by zi...@apache.org on 2018/03/26 13:00:10 UTC
[parquet-format] branch master updated: PARQUET-1251: Clarify ambiguous min/max stats for FLOAT/DOUBLE (#88)
This is an automated email from the ASF dual-hosted git repository.
zivanfi pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-format.git
The following commit(s) were added to refs/heads/master by this push:
new 952c263 PARQUET-1251: Clarify ambiguous min/max stats for FLOAT/DOUBLE (#88)
952c263 is described below
commit 952c26375eb15c6a27a770f26a6292264b1b7328
Author: Gabor Szadovszky <ga...@apache.org>
AuthorDate: Mon Mar 26 15:00:04 2018 +0200
PARQUET-1251: Clarify ambiguous min/max stats for FLOAT/DOUBLE (#88)
Describe handling of the ambiguous min/max statistics for FLOAT/DOUBLE.
---
src/main/thrift/parquet.thrift | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/src/main/thrift/parquet.thrift b/src/main/thrift/parquet.thrift
index d0c4c31..f3aac25 100644
--- a/src/main/thrift/parquet.thrift
+++ b/src/main/thrift/parquet.thrift
@@ -751,10 +751,19 @@ union ColumnOrder {
* INT32 - signed comparison
* INT64 - signed comparison
* INT96 (only used for legacy timestamps) - undefined
- * FLOAT - signed comparison of the represented value
- * DOUBLE - signed comparison of the represented value
+ * FLOAT - signed comparison of the represented value (*)
+ * DOUBLE - signed comparison of the represented value (*)
* BYTE_ARRAY - unsigned byte-wise comparison
* FIXED_LEN_BYTE_ARRAY - unsigned byte-wise comparison
+ *
+ * (*) Because the sorting order is not specified properly for floating
+ * point values (relations vs. total ordering) the following
+ * compatibility rules should be applied when reading statistics:
+ * - If the min is a NaN, it should be ignored.
+ * - If the max is a NaN, it should be ignored.
+ * - If the min is +0, the row group may contain -0 values as well.
+ * - If the max is -0, the row group may contain +0 values as well.
+ * - When looking for NaN values, min and max should be ignored.
*/
1: TypeDefinedOrder TYPE_ORDER;
}
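The reading rules added in the comment above can be illustrated with a minimal Python sketch. It is not code from parquet-mr or any other Parquet library. The helper names `total_key`, `usable_bounds` and `may_contain` are hypothetical, and the sketch assumes a reader that compares doubles with an IEEE 754 totalOrder-style comparator (similar to Java's `Double.compare`, where -0.0 sorts before +0.0). Under such a comparator, a max of -0 written by a writer that treated -0 and +0 as equal would wrongly exclude +0 unless the bound is widened:

```python
import math
import struct
from typing import Optional, Tuple


def total_key(x: float) -> int:
    """Hypothetical helper: map a double to an int whose order matches
    IEEE 754 totalOrder (-0.0 before +0.0, positive NaN above +inf)."""
    bits = struct.unpack(">q", struct.pack(">d", x))[0]
    # For negative doubles, flip the 63 magnitude bits so larger magnitudes sort lower.
    return bits ^ 0x7FFFFFFFFFFFFFFF if bits < 0 else bits


def usable_bounds(min_v: Optional[float],
                  max_v: Optional[float]) -> Tuple[Optional[float], Optional[float]]:
    """Hypothetical helper applying the compatibility rules; None means no usable bound."""
    lo, hi = min_v, max_v
    # A NaN min or max carries no ordering information, so ignore it.
    if lo is not None and math.isnan(lo):
        lo = None
    if hi is not None and math.isnan(hi):
        hi = None
    # A min of +0 may hide -0 values, so widen it to -0.0.
    if lo is not None and lo == 0.0 and math.copysign(1.0, lo) > 0:
        lo = -0.0
    # A max of -0 may hide +0 values, so widen it to +0.0.
    if hi is not None and hi == 0.0 and math.copysign(1.0, hi) < 0:
        hi = 0.0
    return lo, hi


def may_contain(min_v: Optional[float], max_v: Optional[float], value: float) -> bool:
    """Hypothetical helper: False only when the stats prove `value` cannot occur."""
    # When looking for NaN values, min and max are ignored entirely.
    if math.isnan(value):
        return True
    lo, hi = usable_bounds(min_v, max_v)
    if lo is not None and total_key(value) < total_key(lo):
        return False
    if hi is not None and total_key(value) > total_key(hi):
        return False
    return True
```

For example, `may_contain(-5.0, -0.0, 0.0)` returns True only because the -0 max is widened to +0; without that rule the total-order comparison would skip a row group that may hold +0 values. Ignoring NaN bounds makes the check conservative: a row group can still be skipped using the other, non-NaN bound.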