Posted to dev@parquet.apache.org by "JFinis (via GitHub)" <gi...@apache.org> on 2023/06/12 10:49:40 UTC

[GitHub] [parquet-format] JFinis commented on a diff in pull request #196: PARQUET-2249: Add nan_count to handle NaNs in statistics

JFinis commented on code in PR #196:
URL: https://github.com/apache/parquet-format/pull/196#discussion_r1226466358


##########
README.md:
##########
@@ -163,18 +163,25 @@ following rules:
       [Thrift definition](src/main/thrift/parquet.thrift) in the
       `ColumnOrder` union. They are summarized here but the Thrift definition
       is considered authoritative:
-      * NaNs should not be written to min or max statistics fields.
-      * If the computed max value is zero (whether negative or positive),
-        `+0.0` should be written into the max statistics field.
-      * If the computed min value is zero (whether negative or positive),
-        `-0.0` should be written into the min statistics field.
-
-      For backwards compatibility when reading files:
-      * If the min is a NaN, it should be ignored.
-      * If the max is a NaN, it should be ignored.
-      * If the min is +0, the row group may contain -0 values as well.
-      * If the max is -0, the row group may contain +0 values as well.
-      * When looking for NaN values, min and max should be ignored.
+      * The following compatibility rules should be applied when reading statistics:

Review Comment:
   I have removed the duplicate explanation.
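
For readers following along, the reader-side bullets removed in this hunk (ignore NaN bounds, widen zero bounds, ignore min and max when searching for NaN) can be sketched roughly as below. This is only an illustration in Python: `usable_bounds` and `may_contain` are hypothetical names, not part of any Parquet library, and the statistics are assumed to be already decoded to floats.

```python
import math

def usable_bounds(min_value, max_value):
    """Apply the reader-side compatibility rules to decoded float statistics.

    Returns (lo, hi); None means "no usable bound".
    """
    # A NaN min or max carries no information and is ignored.
    lo = None if min_value is None or math.isnan(min_value) else min_value
    hi = None if max_value is None or math.isnan(max_value) else max_value
    # A min of +0 may hide -0 values; a max of -0 may hide +0 values.
    # Normalizing the sign matters for readers that compare in a total order.
    if lo is not None and lo == 0.0:
        lo = -0.0
    if hi is not None and hi == 0.0:
        hi = 0.0
    return lo, hi

def may_contain(value, min_value, max_value):
    """Return False only if the statistics prove that value is absent."""
    if math.isnan(value):
        # When looking for NaN values, min and max are ignored.
        return True
    lo, hi = usable_bounds(min_value, max_value)
    if lo is not None and value < lo:
        return False
    if hi is not None and value > hi:
        return False
    return True
```

For example, `may_contain(-0.0, 0.0, 1.0)` stays true even though the stored min is `+0.0`, and a NaN search is never pruned on min or max alone.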



##########
src/main/thrift/parquet.thrift:
##########
@@ -223,6 +223,8 @@ struct Statistics {
     */
    5: optional binary max_value;
    6: optional binary min_value;
+   /** count of NaN values in the column; only present if type is FLOAT or DOUBLE */

Review Comment:
   Done.
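
As a rough illustration of what a writer would compute for this field, here is a minimal Python sketch. It assumes `nan_count` is simply the number of NaN values in the chunk and combines it with the writer-side bullets quoted in the README hunk above (NaNs kept out of min and max, zero bounds normalized). `float_statistics` is a hypothetical helper, not an existing API, and the handling of an all-NaN column is an assumption because this excerpt does not specify it.

```python
import math

def float_statistics(values):
    """Compute (min, max, nan_count) for a sequence of floats.

    min and max are None when there is no non-NaN value (an assumption;
    the excerpt does not say how an all-NaN column should be written).
    """
    nan_count = 0
    lo = hi = None
    for v in values:
        if math.isnan(v):
            # NaNs are counted, never written to min or max.
            nan_count += 1
            continue
        if lo is None or v < lo:
            lo = v
        if hi is None or v > hi:
            hi = v
    # A zero min is written as -0.0 and a zero max as +0.0.
    if lo is not None and lo == 0.0:
        lo = -0.0
    if hi is not None and hi == 0.0:
        hi = 0.0
    return lo, hi, nan_count
```

With input `[1.0, nan, 0.0, nan]` this yields a min of `-0.0`, a max of `1.0`, and a `nan_count` of 2.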


