Posted to dev@parquet.apache.org by "Micah Kornfield (Jira)" <ji...@apache.org> on 2022/10/09 03:49:00 UTC

[jira] [Commented] (PARQUET-1222) Specify a well-defined sorting order for float and double types

    [ https://issues.apache.org/jira/browse/PARQUET-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614581#comment-17614581 ] 

Micah Kornfield commented on PARQUET-1222:
------------------------------------------

Elevating the specification level seems fine. I was under the impression that the Thrift file was the specification? Where would the PR to elevate them need to be made?

> Specify a well-defined sorting order for float and double types
> ---------------------------------------------------------------
>
>                 Key: PARQUET-1222
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1222
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-format
>            Reporter: Zoltan Ivanfi
>            Priority: Critical
>
> Currently parquet-format specifies the sort order for floating point numbers as follows:
> {code:java}
>    *   FLOAT - signed comparison of the represented value
>    *   DOUBLE - signed comparison of the represented value
> {code}
> The problem is that the comparison of floating point numbers is only a partial ordering with strange behaviour in specific corner cases. For example, according to IEEE 754, -0 is neither less than nor greater than +0, and comparing NaN to anything with <, > or == always returns false. This ordering is not suitable for statistics. Additionally, the Java implementation already uses a different (total) ordering that handles these cases correctly but differently than the C++ implementations, which leads to interoperability problems.
> TypeDefinedOrder for doubles and floats should be deprecated and a new TotalFloatingPointOrder should be introduced. The default for writing doubles and floats would be the new TotalFloatingPointOrder. This ordering should be effective and easy to implement in all programming languages.
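The corner cases described in the issue, and one way to obtain a total order, can be sketched in Java. This is a minimal sketch under stated assumptions: the class and method names (FloatOrderDemo, totalOrderKey, totalCompare, naiveMin) are hypothetical, and the ordering shown follows the IEEE 754-2008 totalOrder predicate, which is only one candidate for the proposed TotalFloatingPointOrder. The issue text does not say which total order would be chosen.

```java
import java.util.Arrays;

// Hypothetical demo class: names are illustrative, not from parquet-mr or parquet-format.
final class FloatOrderDemo {

    // Maps a double to a long key whose signed order follows the IEEE 754-2008
    // totalOrder predicate (assumed here as one candidate total order):
    // -NaN < -Infinity < ... < -0.0 < +0.0 < ... < +Infinity < +NaN.
    static long totalOrderKey(double d) {
        long bits = Double.doubleToRawLongBits(d); // exact bit pattern, NaN payloads included
        // If the sign bit is set, flip the 63 magnitude bits so that larger
        // negative magnitudes map to smaller keys; values with a clear sign bit are unchanged.
        return bits ^ ((bits >> 63) & Long.MAX_VALUE);
    }

    // Total comparison: every pair of doubles is comparable.
    static int totalCompare(double a, double b) {
        return Long.compare(totalOrderKey(a), totalOrderKey(b));
    }

    // Naive minimum using the IEEE 754 '<' operator: when NaN is present,
    // the result depends on where NaN appears in the input.
    static double naiveMin(double[] values) {
        double min = values[0];
        for (double v : values) {
            if (v < min) {
                min = v;
            }
        }
        return min;
    }

    public static void main(String[] args) {
        double nan = Double.NaN;

        // IEEE 754 operators form only a partial order.
        System.out.println(nan < 1.0);   // false
        System.out.println(nan > 1.0);   // false
        System.out.println(nan == nan);  // false
        System.out.println(-0.0 == 0.0); // true: distinct values compare equal
        System.out.println(-0.0 < 0.0);  // false

        // Order-dependent statistics from the naive comparison.
        System.out.println(naiveMin(new double[] {nan, 1.0, 2.0})); // NaN
        System.out.println(naiveMin(new double[] {1.0, nan, 2.0})); // 1.0

        // Sorting with the total order gives one well-defined result.
        Double[] values = {1.0, nan, -0.0, Double.NEGATIVE_INFINITY, 0.0, -1.0};
        Arrays.sort(values, (a, b) -> totalCompare(a, b));
        System.out.println(Arrays.toString(values));
        // [-Infinity, -1.0, -0.0, 0.0, 1.0, NaN]
    }
}
```

For comparison, the JDK's own Double.compare is also a total order: it treats -0.0 as less than 0.0 and treats every NaN as equal to every other NaN and greater than +Infinity. Unlike the sketch, it therefore does not distinguish NaN bit patterns or place negative NaNs first, which illustrates why the specification would need to pin down one ordering explicitly for implementations to agree.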



--
This message was sent by Atlassian Jira
(v8.20.10#820010)