Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/07/29 22:21:06 UTC

[GitHub] [iceberg] yyanyy commented on a change in pull request #2055: Spec: add sort order to spec

yyanyy commented on a change in pull request #2055:
URL: https://github.com/apache/iceberg/pull/2055#discussion_r679522667



##########
File path: site/docs/spec.md
##########
@@ -254,6 +254,24 @@ Notes:
 2. The width, `W`, used to truncate decimal values is applied using the scale of the decimal column to avoid additional (and potentially conflicting) parameters.
 
 
+### Sorting
+
+Users can sort their data within partitions by columns to gain performance. The information on how the data is sorted can be declared per data or delete file, by a **sort order**.
+
+A sort order is defined by an sort order id and a list of sort fields. The order of the sort fields within the list defines the order in which the sort is applied to the data. Each sort field consists of:
+
+*   A **source column id** from the table's schema
+*   A **transform** that is used to produce values to be sorted on from the source column. This is the same transform as described in [partition transforms](#partition-transforms).
+*   A **sort direction**, that can only be either `asc` or `desc`
+*   A **null order** that describes the order of null values when sorted. Can only be either `nulls-first` or `nulls-last`
+
+Order id `0` is reserved for the unsorted order. 
+
+Sorting floating-point numbers should produce the following behavior: `-NaN` < `-Infinity` < `-value` < `-0` < `0` < `value` < `Infinity` < `NaN`. This aligns with the implementation of Java floating-point types comparisons. 

Review comment:
       Thanks for the input! You are right that in Java `-Double.NaN` compares as equal to `Double.NaN` (for example, `Double.compare(-Double.NaN, Double.NaN)` returns `0`), so I think we should drop `-NaN` from the spec completely. As for how to interpret a NaN with the sign bit set, I did some quick reading and it seems NaN can be represented in many ways: [this answer](https://stackoverflow.com/a/2154512) says that any bit pattern of the form `x111 1111 1axx xxxx xxxx xxxx xxxx xxxx`, where `x` can be either bit value, is a NaN, and the sign bit of a NaN seems to have [no meaning](https://en.wikipedia.org/wiki/IEEE_754).
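
       To illustrate the point, here is a minimal sketch using only standard JDK calls. The class and helper names (`NanOrderingSketch`, `negativeNaN`, `sortedCopy`) are hypothetical and not part of Iceberg. It builds a NaN with the sign bit set from raw bits and shows that Java's total ordering still treats it like any other NaN, so it sorts after `Infinity` rather than before `-Infinity`.

```java
import java.util.Arrays;

// Hypothetical demo class, not Iceberg code.
final class NanOrderingSketch {

    // Builds a quiet NaN whose sign bit is set, directly from its bit pattern.
    static double negativeNaN() {
        return Double.longBitsToDouble(0xfff8000000000000L);
    }

    // Returns a sorted copy. Arrays.sort(double[]) uses the same total order
    // as Double.compareTo: -0.0 < 0.0, and every NaN is greater than any
    // other value and equal to every other NaN.
    static double[] sortedCopy(double... values) {
        double[] copy = values.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        double negNaN = negativeNaN();

        // The sign bit does not stop the value from being a NaN.
        System.out.println(Double.isNaN(negNaN));

        // Double.compare canonicalizes NaN bit patterns before comparing,
        // so this NaN compares as equal to Double.NaN.
        System.out.println(Double.compare(negNaN, Double.NaN));

        // Both NaNs end up at the tail of the array, after Infinity.
        double[] sorted = sortedCopy(
            Double.NaN, 1.0, 0.0, -0.0, -1.0,
            Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY, negNaN);
        System.out.println(Arrays.toString(sorted));
    }
}
```

       With this ordering, `-Infinity` < `-value` < `-0` < `0` < `value` < `Infinity` < `NaN` holds, and there is no separate `-NaN` position.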




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
