Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/07/30 07:16:52 UTC

[GitHub] [iceberg] findepi commented on pull request #2891: Fix -NaN ordering in spec

findepi commented on pull request #2891:
URL: https://github.com/apache/iceberg/pull/2891#issuecomment-889685056


   > Is this specific to Java? Are -NaN values ordered in other languages?
   
   @rdblue I do not know. The spec points at Java sorting as the 'reference', so I focused on that.
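Since the spec points at Java sorting as the reference, one quick way to see where -NaN lands is to ask the JDK directly. The sketch below is illustrative and not from the PR. The class name and sample values are made up, and `NEG_NAN` is simply a quiet NaN built with the sign bit set. It relies on the documented behavior of `Double.compare` and `Arrays.sort(double[])`, which treat every NaN as equal to every other NaN and greater than `Double.POSITIVE_INFINITY`, so a negative NaN sorts last, not first.

```java
import java.util.Arrays;

public class NegativeNaNOrdering {
    // "-NaN" (illustrative): a quiet NaN whose sign bit is set, raw bits 0xfff8000000000000
    static final double NEG_NAN = Double.longBitsToDouble(0xfff8000000000000L);

    // Sorts a sample with Arrays.sort(double[]), which uses the Double.compareTo total order
    static double[] sortedSample() {
        double[] values = {1.0, NEG_NAN, Double.POSITIVE_INFINITY, -0.0,
                           Double.NaN, 0.0, Double.NEGATIVE_INFINITY, -1.0};
        Arrays.sort(values);
        return values;
    }

    public static void main(String[] args) {
        // prints [-Infinity, -1.0, -0.0, 0.0, 1.0, Infinity, NaN, NaN]
        System.out.println(Arrays.toString(sortedSample()));
        // Double.compare collapses every NaN to one canonical bit pattern before comparing
        System.out.println(Double.compare(NEG_NAN, Double.NaN));               // 0
        System.out.println(Double.compare(NEG_NAN, Double.POSITIVE_INFINITY)); // positive
    }
}
```

In other words, the JDK does not give -NaN a position of its own: it is ordered exactly like NaN.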
   
   > This brings up the question of how different NaN representations should be handled in Iceberg. Should writers canonicalize them? 
   
   @electrum this is a good question, and I have been thinking about it too.
   It seems that, from the Trino perspective, it doesn't matter much, because we treat all NaN values as indistinguishable. Canonicalization is applied at comparison time, in the engine, so storage is not required to canonicalize. Of course, it would be better to have writers canonicalize, but I am concerned that we will never be able to assume that at read time, because of pre-existing data.
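A minimal sketch of the two options, assuming Java as the engine language: `engineEquals` canonicalizes at comparison time (this is what `Double.doubleToLongBits` does for every NaN), while `canonicalize` is a writer-side step. Both helper names are hypothetical and are not Trino or Iceberg APIs. Because the read-side check collapses NaN encodings on its own, it stays correct for pre-existing files that were written without canonicalization.

```java
public class NaNCanonicalization {
    // Hypothetical writer-side step: map every NaN encoding to the canonical Double.NaN
    static double canonicalize(double v) {
        return Double.isNaN(v) ? Double.NaN : v;
    }

    // Hypothetical read-side check: doubleToLongBits maps every NaN to the single pattern
    // 0x7ff8000000000000L, so all NaNs are indistinguishable regardless of how they were stored
    static boolean engineEquals(double a, double b) {
        return Double.doubleToLongBits(a) == Double.doubleToLongBits(b);
    }

    public static void main(String[] args) {
        double negNaN = Double.longBitsToDouble(0xfff8000000000000L);     // sign bit set
        double payloadNaN = Double.longBitsToDouble(0x7ff8000000000001L); // nonzero payload
        System.out.println(engineEquals(negNaN, payloadNaN));               // true
        System.out.println(engineEquals(canonicalize(negNaN), Double.NaN)); // true
        System.out.println(engineEquals(0.0, -0.0)); // false: zeros keep their sign
    }
}
```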
   
   However, even if we follow this path, we may still want to define how NaNs interact with `distinct_counts` in the manifest.
   Alternatively, we could ignore `distinct_counts` whenever `nan_value_counts > 0`.
   (I don't know yet whether this is important. We may or may not use `distinct_counts`.)
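To make the `distinct_counts` concern concrete, this sketch counts the distinct values in one column in two ways. Counting by raw bit pattern may count differently encoded NaNs separately, while counting by `Double.doubleToLongBits` collapses all NaNs into one value. `usableDistinctCount` is a hypothetical helper showing the fallback rule of ignoring the distinct count when `nan_value_counts > 0`. It is not an Iceberg API and takes plain numbers, not real manifest entries.

```java
import java.util.HashSet;
import java.util.OptionalLong;
import java.util.Set;

public class NaNDistinctCounts {
    // Distinct count keyed on raw bits: each NaN encoding can count as its own value
    static int distinctByRawBits(double[] values) {
        Set<Long> seen = new HashSet<>();
        for (double v : values) seen.add(Double.doubleToRawLongBits(v));
        return seen.size();
    }

    // Distinct count keyed on canonical bits: all NaNs collapse into one value
    static int distinctCanonical(double[] values) {
        Set<Long> seen = new HashSet<>();
        for (double v : values) seen.add(Double.doubleToLongBits(v));
        return seen.size();
    }

    // Hypothetical reader-side rule: do not trust a column's distinct count
    // when any NaN values were reported for that column
    static OptionalLong usableDistinctCount(long distinctCount, long nanValueCount) {
        return nanValueCount > 0 ? OptionalLong.empty() : OptionalLong.of(distinctCount);
    }

    public static void main(String[] args) {
        double negNaN = Double.longBitsToDouble(0xfff8000000000000L);
        double[] column = {1.0, 2.0, Double.NaN, negNaN};
        System.out.println(distinctByRawBits(column)); // depends on the stored NaN encodings
        System.out.println(distinctCanonical(column)); // 3
        System.out.println(usableDistinctCount(3, 2)); // OptionalLong.empty
    }
}
```

Which count a writer records is exactly what the spec would need to pin down if `distinct_counts` is to be trusted for columns containing NaN.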
   
   > What do ORC and Parquet do for non-canonical values?
   
   @electrum, do you mean the reference writer implementations? I don't know.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
For additional commands, e-mail: issues-help@iceberg.apache.org