You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@parquet.apache.org by "Micah Kornfield (Jira)" <ji...@apache.org> on 2022/10/09 03:49:00 UTC
[jira] [Commented] (PARQUET-1222) Specify a well-defined sorting order for float and double types
[ https://issues.apache.org/jira/browse/PARQUET-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614581#comment-17614581 ]
Micah Kornfield commented on PARQUET-1222:
------------------------------------------
Elevating the specification level seems fine. I was under the impression the thrift file was the specification? Where do we need to do the PR to elevate them?
> Specify a well-defined sorting order for float and double types
> ---------------------------------------------------------------
>
> Key: PARQUET-1222
> URL: https://issues.apache.org/jira/browse/PARQUET-1222
> Project: Parquet
> Issue Type: Bug
> Components: parquet-format
> Reporter: Zoltan Ivanfi
> Priority: Critical
>
> Currently parquet-format specifies the sort order for floating point numbers as follows:
> {code:java}
> * FLOAT - signed comparison of the represented value
> * DOUBLE - signed comparison of the represented value
> {code}
> The problem is that the comparison of floating point numbers is only a partial ordering with strange behaviour in specific corner cases. For example, according to IEEE 754, -0 is neither less nor more than \+0 and comparing NaN to anything always returns false. This ordering is not suitable for statistics. Additionally, the Java implementation already uses a different (total) ordering that handles these cases correctly but differently than the C\+\+ implementations, which leads to interoperability problems.
> TypeDefinedOrder for doubles and floats should be deprecated and a new TotalFloatingPointOrder should be introduced. The default for writing doubles and floats would be the new TotalFloatingPointOrder. This ordering should be effective and easy to implement in all programming languages.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)