Posted to dev@parquet.apache.org by "Antoine Pitrou (Jira)" <ji...@apache.org> on 2021/04/04 09:43:00 UTC
[jira] [Commented] (PARQUET-1222) Specify a well-defined sorting
order for float and double types
[ https://issues.apache.org/jira/browse/PARQUET-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17314456#comment-17314456 ]
Antoine Pitrou commented on PARQUET-1222:
-----------------------------------------
I'll note that Parquet C++ now has the following behaviour:
* signed zeros are properly ordered (ARROW-5562)
* NaNs are ignored when computing min/max (PARQUET-1225); if a page or column chunk only has NaNs, the statistics are unset
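For illustration, the behaviour described in the two points above can be sketched as follows. This is a minimal Python sketch, not the Parquet C++ code: the function name `float_min_max` is hypothetical, and treating -0.0 as smaller than +0.0 is an assumption made here to give signed zeros a deterministic order.

```python
import math
from typing import Iterable, Optional, Tuple


def float_min_max(values: Iterable[float]) -> Optional[Tuple[float, float]]:
    """Return (min, max) over the non-NaN values, or None if there are none.

    Hypothetical helper: None stands in for "statistics are unset".
    """
    lo = hi = None
    for v in values:
        if math.isnan(v):
            continue  # NaNs do not take part in the statistics
        if lo is None:
            lo = hi = v
            continue
        # Assumption: break ties between equal values by sign,
        # so that -0.0 sorts before +0.0.
        sign_v = math.copysign(1.0, v)
        if v < lo or (v == lo and sign_v < math.copysign(1.0, lo)):
            lo = v
        if v > hi or (v == hi and sign_v > math.copysign(1.0, hi)):
            hi = v
    if lo is None:
        return None  # empty input or only NaNs
    return lo, hi
```

With this sketch, a chunk containing only NaNs yields None, and a chunk containing both zeros yields -0.0 as the minimum and +0.0 as the maximum regardless of input order.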
> Specify a well-defined sorting order for float and double types
> ---------------------------------------------------------------
>
> Key: PARQUET-1222
> URL: https://issues.apache.org/jira/browse/PARQUET-1222
> Project: Parquet
> Issue Type: Bug
> Components: parquet-format
> Reporter: Zoltan Ivanfi
> Priority: Critical
>
> Currently parquet-format specifies the sort order for floating point numbers as follows:
> {code:java}
> * FLOAT - signed comparison of the represented value
> * DOUBLE - signed comparison of the represented value
> {code}
> The problem is that the comparison of floating point numbers is only a partial ordering with strange behaviour in specific corner cases. For example, according to IEEE 754, -0 is neither less nor more than +0, and any <, <=, >, >= or == comparison involving NaN returns false. This ordering is not suitable for statistics. Additionally, the Java implementation already uses a different (total) ordering that handles these cases correctly but differently than the C++ implementations, which leads to interoperability problems.
> TypeDefinedOrder for doubles and floats should be deprecated and a new TotalFloatingPointOrder should be introduced. The default for writing doubles and floats would be the new TotalFloatingPointOrder. This ordering should be effective and easy to implement in all programming languages.
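The issue text does not say how TotalFloatingPointOrder would be defined. As a hedged illustration only, one well-known way to get a total order over doubles is to map each value's IEEE 754 bit pattern to an unsigned integer whose natural order places -0 before +0 and puts NaNs at the two ends. The function name `total_order_key` is hypothetical and this is not taken from parquet-format.

```python
import struct

SIGN_BIT = 1 << 63
ALL_BITS = (1 << 64) - 1


def total_order_key(x: float) -> int:
    """Map a double to an unsigned 64-bit integer that sorts in a total order.

    Assumed ordering for this sketch (not specified by the issue):
    negative NaN < -inf < ... < -0.0 < +0.0 < ... < +inf < positive NaN.
    """
    # Reinterpret the double's 8 bytes as an unsigned 64-bit integer.
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    if bits & SIGN_BIT:
        # Negative values: flip every bit so larger magnitudes sort first.
        return bits ^ ALL_BITS
    # Non-negative values: set the sign bit so they sort after all negatives.
    return bits | SIGN_BIT


# Plain comparison is only a partial order:
nan = float("nan")
print(nan < 1.0, nan > 1.0, -0.0 < 0.0)  # all three are False

# The key gives every pair of values a definite order:
print(total_order_key(-0.0) < total_order_key(0.0))  # True
```

Because the key is a plain integer, it can be compared with the same unsigned comparison in any language, which is in line with the goal of an ordering that is easy to implement everywhere.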