Posted to dev@iceberg.apache.org by Anton Okolnychyi <ao...@apple.com.INVALID> on 2019/07/01 11:18:30 UTC

Sort Spec

Hey folks,

Iceberg users are advised not only to partition their data but also to sort within partitions by columns in predicates in order to get the best performance. Right now, this process is mostly manual and performed by users before writing.
I am wondering if we should extend Iceberg metadata so that query engines can do this automatically in the future. We already have `sortColumns` in DataFile but they are not used.
Do we need a notion of sort columns in TableMetadata?
Spark’s sort spec is tightly coupled with bucketing and cannot be used alone. However, it seems reasonable to have partitioned and sorted tables without bucketing. How do we see this in Iceberg?
If we decide to have sort spec in the metadata, do we want to make it part of PartitionSpec or have it separately?
Thanks,
Anton
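
For context, the manual workflow described above typically looks something like the following Java/Spark sketch; the table name, column names, and write options are illustrative assumptions rather than a fixed recipe:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class ManualSortBeforeWrite {
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("manual-sort").getOrCreate();

        // Users currently cluster and sort data themselves before handing it to Iceberg.
        Dataset<Row> df = spark.table("db.events_staging");

        df.repartition(df.col("event_date"))                  // cluster by the partition column
          .sortWithinPartitions("event_date", "device_id")    // sort by predicate columns within each task
          .write()
          .format("iceberg")
          .mode("append")
          .save("db.events");                                 // assumed Iceberg table identifier
      }
    }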


Re: Sort Spec

Posted by Anton Okolnychyi <ao...@apple.com.INVALID>.
I think we are all on the same page. By that statement, I meant that we should not assume the current sort order is always applied to all files in the table, as that would require rewriting data immediately when we change the sort order. Also, different parts of the table can be ordered differently (e.g. we can use multi-dimensional clustering for partitions that do not change and use a less expensive sort order for partitions that we are writing to now).

> On 18 Jul 2019, at 20:11, Ryan Blue <rb...@netflix.com.INVALID> wrote:
> 
> Yes, I agree. My point is that we have to support cases where data is not yet optimized. That's why I suggest we match up sort order of deltas with sort order of data files. Most of the time, this should be fine but we can't assume that it always will be.
> 
> On Thu, Jul 18, 2019 at 10:09 AM Owen O'Malley <owen.omalley@gmail.com> wrote:
> I agree that we need to manage changes to the sort order, just like we need to handle changes to the schema. Neither one should require rewriting data immediately, but when data is compacted or restated, it could be sorted to the new order.
> 
> .. Owen
> 
> On Thu, Jul 18, 2019 at 10:01 AM Ryan Blue <rblue@netflix.com> wrote:
> > This one seems really problematic. Too many important optimizations depend on the file sort order. Can we have the writer verify the sort order as the files are written
> 
> Even if we did, when the desired sort order changes we can't just rewrite all of the data in the table. I think that this will lead to cases where performance degrades without regular maintenance, but I don't see a way to handle that other than degrading performance. Forcing a rewrite to sort data when the desired order changes just isn't possible, right?
> 
> On Thu, Jul 18, 2019 at 7:33 AM Owen O'Malley <owen.omalley@gmail.com> wrote:
> 
> 
> On Thu, Jul 18, 2019 at 5:30 AM Anton Okolnychyi <ao...@apple.com.invalid> wrote:
> Let me summarize what we discussed here and follow up with a PR.
> 
> - Iceberg should allow users to define a sort order in its metadata that applies to partitions.
> - We should never assume the sort order is actually applied to all files in the table.
> 
> This one seems really problematic. Too many important optimizations depend on the file sort order. Can we have the writer verify the sort order as the files are written?
> 
> - Sort orders might evolve and change over time. When this happens, existing files will not be rewritten. Query engines should follow the updated sort order during subsequent writes. As a result, files within a table or partition can be sorted differently at a given point in time.
> - We should be able to define a sort order even for unpartitioned tables, as opposed to current Spark tables that allow a sort order only for bucketed tables.
> - SortOrder is separate from PartitionSpec.
> - SortOrder will rely on transformations to define complex sort orders.
> - Files will be annotated with sort_order_id instead of sort_columns. We keep the question of file_ordinal open for now.
> - To begin with, we will support asc/desc natural sort orders (UTF8 ordering for Strings).
> 
> Thanks,
> Anton
> 
>> On 16 Jul 2019, at 23:56, Ryan Blue <rblue@netflix.com.INVALID> wrote:
>> 
>> I agree that Iceberg metadata should include a way to configure a desired sort order. But I want to note that I don’t think that we can ever assume that it has been applied. Table configuration will evolve as use changes. We don’t want to require rewrites when a configuration gets updated, so an assumption should be that data files might not be sorted.
>> 
>> Files that are sorted should indicate how they are sorted, so that optimizations are applied if the file’s metadata indicates it can be safely applied. For example, if both deletes and data rows are sorted the same way, you can merge the two streams instead of using a hash set to check whether a record has been deleted. I think this should rely on the delete file’s sort order matching the data file it is applied to.
>> 
>> Should Iceberg allow users to define a sort spec only if the table is bucketed?
>> 
>> No. In Iceberg, bucketing is just another partition transform.
>> 
>> However, I think we need to consider what a sort order will mean. Here are a few observations:
>> 
>> Each file can have a sort order for its rows (Spark’s sortWithinPartitions, which sorts each task’s data)
>> Sorting is also used to cluster values across files so it makes sense for a table sort order to be applied within partitions (ORDER BY)
>> Multiple writes to the same partition are not expected to rewrite existing data, so a partition may only be partially sorted or may have multiple sorted file sets
>> Partitioning is independent from sorting. Even when partitioning is orthogonal to a sort order (i.e., bucketing), partitioning must still take precedence.
>> My conclusion is that a configured sort order applies to partitions, not data across partitions. Again, bucketing is just another type of partition.
>> 
>> How should Iceberg encode sort specs?
>> 
>> I don’t think this should be in table properties. The sort order should reference columns by ID so it doesn’t need to be changed when columns are renamed. I think this should be implemented like PartitionSpec.
>> 
>> If sorting is applied within partitions, then I would make PartitionSpec and SortOrder separate. I would still use transforms to produce more complex sort orders. I think that’s a great idea, but we don’t need to mix partitioning and sorting to reuse transforms. Like partition specs, I think a table should be able to define multiple sort orders and each should be identified by ID. Then each data file can encode which sort order it was written with, just like manifests and partition specs.
>> 
>> I think we should add sort-orders like partition-specs, and a default-sort-order-id like default-spec-id. This would also require removing sort_columns from data files in the spec <http://iceberg.apache.org/spec/#manifests> and adding sort_order_id. We can keep file_ordinal, but probably want to add some context to know the group of files where it is valid. We could also remove it.
>> 
>> Which sort orders should Iceberg support?
>> 
>> I agree with what’s already been said: we should use a natural order for each type <https://github.com/apache/incubator-iceberg/blob/master/api/src/main/java/org/apache/iceberg/types/Comparators.java#L31-L45>, ascending and descending. To start, Strings must use UTF-8’s natural ordering and we can expand from there.
>> 
>> Here’s what a sort order might look like:
>> 
>> "sort-orders": [
>>     { "id": 1, "ordering": [ { "source-id": 4, "ascending": true } ] },
>>     { "id": 2, "ordering": [ { "source-id": 4, "ascending": true }, { "source-id": 5, "ascending": false } ] },
>>     { "id": 3, "ordering": [ { "transform": "zorder", "ascending": false, "source-ids": [4, 5] } ] },
>>   ]
>> 
>> On Thu, Jul 4, 2019 at 9:46 AM Anton Okolnychyi <aokolnychyi@apple.com.invalid> wrote:
>> In order to begin prototyping, I would start with the following questions.
>> 
>> 1) Does Iceberg need a sort spec?
>>  	- I would say yes
>> 2) Should Iceberg allow users to define a sort spec only if the table is bucketed?
>> 	- I would say no, as it seems valid to have partitioned and sorted tables.
>> 3) How should Iceberg encode sort specs?
>> 	- Option #1 is to rely on table properties, which will allow us to use ALTER TABLE ... SET TBLPROPERTIES to configure sorting specs. However, I am not sure it would be easy to encode non-trivial sort specs and track sort spec evolution (if needed).
>> 	- Option #2 is to extend PartitionSpec to cover sorting as well. This option will allow us to use transformations to encode non-trivial sorts and won't require many changes to the codebase.
>>   	- Option #3 is to store SortSpec separately from PartitionSpec. This will require more changes compared to Option #2 but can also give us extra flexibility.
>> 
>> Each option has its own trade-offs, but I tend to think #2 is reasonable.
>> 
>> 4) Which sort orders should Iceberg support?
>> 	- I think we have to be flexible and support adding more sort orders later. In addition to what Owen said, we can add sorting based on multi-dimensional space-filling curves in the future.
>> 
>> 
>> What do you think?
>> 
>> Thanks,
>> Anton
>> 
>>> On 1 Jul 2019, at 18:06, Owen O'Malley <owen.omalley@gmail.com> wrote:
>>> 
>>> My thought is just like Iceberg has to define partitioning and bucketing, it has to define a canonical sort order. In particular, we can’t afford to have Spark, Presto, and Hive writing files in different orders. I believe the right approach is to define a sort order as a series of columns where each column is either ascending or descending and defining the natural sort order for each type.
>>> 
>>> The hard bit will be if we need to support non-natural sorts of strings. For example, if we need to support case-insensitive sorts or the different collations that databases support, I’d hope that we could start with the default of utf-8 byte ordering and expand as needed. If you are curious what the different collations look like - https://dba.stackexchange.com/questions/94887/what-is-the-impact-of-lc-ctype-on-a-postgresql-database .
>>> 
>>> .. Owen
>>> 
>>>> On Jul 1, 2019, at 4:18 AM, Anton Okolnychyi <aokolnychyi@apple.com.INVALID> wrote:
>>>> 
>>>> Hey folks,
>>>> 
>>>> Iceberg users are advised not only to partition their data but also to sort within partitions by columns in predicates in order to get the best performance. Right now, this process is mostly manual and performed by users before writing.
>>>> I am wondering if we should extend Iceberg metadata so that query engines can do this automatically in the future. We already have `sortColumns` in DataFile but they are not used.
>>>> Do we need a notion of sort columns in TableMetadata?
>>>> Spark’s sort spec is tightly coupled with bucketing and cannot be used alone. However, it seems reasonable to have partitioned and sorted tables without bucketing. How do we see this in Iceberg?
>>>> If we decide to have sort spec in the metadata, do we want to make it part of PartitionSpec or have it separately?
>>>> Thanks,
>>>> Anton
>>>> 
>>> 
>> 
>> 
>> 
>> -- 
>> Ryan Blue
>> Software Engineer
>> Netflix
> 
> 
> 
> -- 
> Ryan Blue
> Software Engineer
> Netflix
> 
> 
> -- 
> Ryan Blue
> Software Engineer
> Netflix


Re: Sort Spec

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
Yes, I agree. My point is that we have to support cases where data is not
yet optimized. That's why I suggest we match up sort order of deltas with
sort order of data files. Most of the time, this should be fine but we
can't assume that it always will be.
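
To make the "match up sort orders" point concrete, a reader could pick how to apply deletes roughly as sketched below; the types are hypothetical placeholders, not Iceberg classes:

    final class DeleteApplyStrategy {
      // Minimal stand-in for file metadata that records the sort order it was written with.
      record FileMeta(String path, Integer sortOrderId) {}

      enum Strategy { SORTED_MERGE, HASH_SET }

      static Strategy choose(FileMeta dataFile, FileMeta deleteFile) {
        // Only merge when both files were written with the same, known sort order;
        // otherwise fall back to the hash-set path, since ordering cannot be assumed.
        boolean sameOrder = dataFile.sortOrderId() != null
            && dataFile.sortOrderId().equals(deleteFile.sortOrderId());
        return sameOrder ? Strategy.SORTED_MERGE : Strategy.HASH_SET;
      }
    }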

On Thu, Jul 18, 2019 at 10:09 AM Owen O'Malley <ow...@gmail.com>
wrote:

> I agree that we need to manage changes to the sort order, just like we
> need to handle changes to the schema. Neither one should require rewriting
> data immediately, but when data is compacted or restated, it could be
> sorted to the new order.
>
> .. Owen
>
> On Thu, Jul 18, 2019 at 10:01 AM Ryan Blue <rb...@netflix.com> wrote:
>
>> > This one seems really problematic. Too many important optimizations
>> depend on the file sort order. Can we have the writer verify the sort order
>> as the files are written
>>
>> Even if we did, when the desired sort order changes we can't just rewrite
>> all of the data in the table. I think that this will lead to cases where
>> performance degrades without regular maintenance, but I don't see a way to
>> handle that other than degrading performance. Forcing a rewrite to sort
>> data when the desired order changes just isn't possible, right?
>>
>> On Thu, Jul 18, 2019 at 7:33 AM Owen O'Malley <ow...@gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Thu, Jul 18, 2019 at 5:30 AM Anton Okolnychyi
>>> <ao...@apple.com.invalid> wrote:
>>>
>>>> Let me summarize what we discussed here and follow up with a PR.
>>>>
>>>> - Iceberg should allow users to define a sort order in its metadata that
>>>> applies to partitions.
>>>> - We should never assume the sort order is actually applied to all
>>>> files in the table.
>>>>
>>>
>>> This one seems really problematic. Too many important optimizations
>>> depend on the file sort order. Can we have the writer verify the sort order
>>> as the files are written?
>>>
>>> - Sort orders might evolve and change over time. When this happens,
>>>> existing files will not be rewritten. Query engines should follow the
>>>> updated sort order during subsequent writes. As a result, files within a
>>>> table or partition can be sorted differently at a given point in time.
>>>> - We should be able to define a sort order even for unpartitioned
>>>> tables, as opposed to current Spark tables that allow a sort order only for
>>>> bucketed tables.
>>>> - SortOrder is separate from PartitionSpec.
>>>> - SortOrder will rely on transformations to define complex sort orders.
>>>> - Files will be annotated with sort_order_id instead of sort_columns.
>>>> We keep the question of file_ordinal open for now.
>>>> - To begin with, we will support asc/desc natural sort orders (UTF8
>>>> ordering for Strings).
>>>>
>>>> Thanks,
>>>> Anton
>>>>
>>>> On 16 Jul 2019, at 23:56, Ryan Blue <rb...@netflix.com.INVALID> wrote:
>>>>
>>>> I agree that Iceberg metadata should include a way to configure a
>>>> desired sort order. But I want to note that I don’t think that we can ever
>>>> assume that it has been applied. Table configuration will evolve as use
>>>> changes. We don’t want to require rewrites when a configuration gets
>>>> updated, so an assumption should be that data files might not be sorted.
>>>>
>>>> Files that are sorted should indicate how they are sorted, so that
>>>> optimizations are applied if the file’s metadata indicates it can be safely
>>>> applied. For example, if both deletes and data rows are sorted the same
>>>> way, you can merge the two streams instead of using a hash set to check
>>>> whether a record has been deleted. I think this should rely on the delete
>>>> file’s sort order matching the data file it is applied to.
>>>>
>>>> Should Iceberg allow users to define a sort spec only if the table is
>>>> bucketed?
>>>>
>>>> No. In Iceberg, bucketing is just another partition transform.
>>>>
>>>> However, I think we need to consider what a sort order will mean. Here
>>>> are a few observations:
>>>>
>>>>    - Each file can have a sort order for its rows (Spark’s
>>>>    sortWithinPartitions, which sorts each task’s data)
>>>>    - Sorting is also used to cluster values across files so it makes
>>>>    sense for a table sort order to be applied within partitions (ORDER
>>>>    BY)
>>>>    - Multiple writes to the same partition are not expected to rewrite
>>>>    existing data, so a partition may only be partially sorted or may have
>>>>    multiple sorted file sets
>>>>    - Partitioning is independent from sorting. Even when partitioning
>>>>    is orthogonal to a sort order (i.e., bucketing), partitioning must still
>>>>    take precedence.
>>>>
>>>> My conclusion is that a configured sort order applies to partitions,
>>>> not data across partitions. Again, bucketing is just another type of
>>>> partition.
>>>>
>>>> How should Iceberg encode sort specs?
>>>>
>>>> I don’t think this should be in table properties. The sort order should
>>>> reference columns by ID so it doesn’t need to be changed when columns are
>>>> renamed. I think this should be implemented like PartitionSpec.
>>>>
>>>> If sorting is applied within partitions, then I would make
>>>> PartitionSpec and SortOrder separate. I would still use transforms to
>>>> produce more complex sort orders. I think that’s a great idea, but we don’t
>>>> need to mix partitioning and sorting to reuse transforms. Like partition
>>>> specs, I think a table should be able to define multiple sort orders and
>>>> each should be identified by ID. Then each data file can encode which sort
>>>> order it was written with, just like manifests and partition specs.
>>>>
>>>> I think we should add sort-orders like partition-specs, and a
>>>> default-sort-order-id like default-spec-id. This would also require
>>>> removing sort_columns from data files in the spec
>>>> <http://iceberg.apache.org/spec/#manifests> and adding sort_order_id.
>>>> We can keep file_ordinal, but probably want to add some context to
>>>> know the group of files where it is valid. We could also remove it.
>>>>
>>>> Which sort orders should Iceberg support?
>>>>
>>>> I agree with what’s already been said: we should use a natural order
>>>> for each type
>>>> <https://github.com/apache/incubator-iceberg/blob/master/api/src/main/java/org/apache/iceberg/types/Comparators.java#L31-L45>,
>>>> ascending and descending. To start, Strings must use UTF-8’s natural
>>>> ordering and we can expand from there.
>>>>
>>>> Here’s what a sort order might look like:
>>>>
>>>> "sort-orders": [
>>>>     { "id": 1, "ordering": [ { "source-id": 4, "ascending": true } ] },
>>>>     { "id": 2, "ordering": [ { "source-id": 4, "ascending": true }, { "source-id": 5, "ascending": false } ] },
>>>>     { "id": 3, "ordering": [ { "transform": "zorder", "ascending": false, "source-ids": [4, 5] } ] },
>>>>   ]
>>>>
>>>>
>>>> On Thu, Jul 4, 2019 at 9:46 AM Anton Okolnychyi <
>>>> aokolnychyi@apple.com.invalid> wrote:
>>>>
>>>>> In order to begin prototyping, I would start with the following
>>>>> questions.
>>>>>
>>>>> 1) Does Iceberg need a sort spec?
>>>>>   - I would say yes
>>>>> 2) Should Iceberg allow users to define a sort spec only if the table
>>>>> is bucketed?
>>>>> - I would say no, as it seems valid to have partitioned and sorted
>>>>> tables.
>>>>> 3) How should Iceberg encode sort specs?
>>>>> - Option #1 is to rely on table properties, which will allow us to use
>>>>> ALTER TABLE ... SET TBLPROPERTIES to configure sorting specs. However, I am
>>>>> not sure it would be easy to encode non-trivial sort specs and track sort
>>>>> spec evolution (if needed).
>>>>> - Option #2 is to extend PartitionSpec to cover sorting as well. This
>>>>> option will allow us to use transformations to encode non-trivial sorts and
>>>>> won't require many changes to the codebase.
>>>>>   - Option #3 is to store SortSpec separately from PartitionSpec.
>>>>> This will require more changes compared to Option #2 but can also give us
>>>>> extra flexibility.
>>>>>
>>>>> Each option has its own trade-offs, but I tend to think #2 is
>>>>> reasonable.
>>>>>
>>>>> 4) Which sort orders should Iceberg support?
>>>>> - I think we have to be flexible and support adding more sort orders
>>>>> later. In addition to what Owen said, we can add sorting based on
>>>>> multi-dimensional space-filling curves in the future.
>>>>>
>>>>>
>>>>> What do you think?
>>>>>
>>>>> Thanks,
>>>>> Anton
>>>>>
>>>>> On 1 Jul 2019, at 18:06, Owen O'Malley <ow...@gmail.com> wrote:
>>>>>
>>>>> My thought is just like Iceberg has to define partitioning and
>>>>> bucketing, it has to define a canonical sort order. In particular, we can’t
>>>>> afford to have Spark, Presto, and Hive writing files in different orders. I
>>>>> believe the right approach is to define a sort order as a series of columns
>>>>> where each column is either ascending or descending and defining the
>>>>> natural sort order for each type.
>>>>>
>>>>> The hard bit will be if we need to support non-natural sorts of
>>>>> strings. For example, if we need to support case-insensitive sorts or the
>>>>> different collations that databases support, I’d hope that we could start
>>>>> with the default of utf-8 byte ordering and expand as needed. If you are
>>>>> curious what the different collations look like -
>>>>> https://dba.stackexchange.com/questions/94887/what-is-the-impact-of-lc-ctype-on-a-postgresql-database
>>>>>  .
>>>>>
>>>>> .. Owen
>>>>>
>>>>> On Jul 1, 2019, at 4:18 AM, Anton Okolnychyi <
>>>>> aokolnychyi@apple.com.INVALID> wrote:
>>>>>
>>>>> Hey folks,
>>>>>
>>>>> Iceberg users are advised not only to partition their data but also to
>>>>> sort within partitions by columns in predicates in order to get the best
>>>>> performance. Right now, this process is mostly manual and performed by
>>>>> users before writing.
>>>>> I am wondering if we should extend Iceberg metadata so that query
>>>>> engines can do this automatically in the future. We already have
>>>>> `sortColumns` in DataFile but they are not used.
>>>>>
>>>>>    - Do we need a notion of sort columns in TableMetadata?
>>>>>    - Spark’s sort spec is tightly coupled with bucketing and cannot
>>>>>    be used alone. However, it seems reasonable to have partitioned and sorted
>>>>>    tables without bucketing. How do we see this in Iceberg?
>>>>>    - If we decide to have sort spec in the metadata, do we want to
>>>>>    make it part of PartitionSpec or have it separately?
>>>>>
>>>>> Thanks,
>>>>> Anton
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Software Engineer
>>>> Netflix
>>>>
>>>>
>>>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>

-- 
Ryan Blue
Software Engineer
Netflix

Re: Sort Spec

Posted by Owen O'Malley <ow...@gmail.com>.
I agree that we need to manage changes to the sort order, just like we need
to handle changes to the schema. Neither one should require rewriting data
immediately, but when data is compacted or restated, it could be sorted to
the new order.

.. Owen
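
The "re-sort on rewrite" policy can be sketched in plain Java as below; the row shape and file representation are made up for illustration, and the only point is that rewrites are the natural place to apply the current default order:

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    final class LazyResort {
      record Row(long id, String category) {}

      // Compaction or restatement merges existing files and sorts them by the
      // table's current default order; files that are never rewritten keep
      // whatever order they were originally written with.
      static List<Row> compact(List<List<Row>> oldFiles, Comparator<Row> currentDefaultOrder) {
        List<Row> merged = new ArrayList<>();
        oldFiles.forEach(merged::addAll);
        merged.sort(currentDefaultOrder);
        return merged;  // would be split into new data files on write
      }
    }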

On Thu, Jul 18, 2019 at 10:01 AM Ryan Blue <rb...@netflix.com> wrote:

> > This one seems really problematic. Too many important optimizations
> depend on the file sort order. Can we have the writer verify the sort order
> as the files are written
>
> Even if we did, when the desired sort order changes we can't just rewrite
> all of the data in the table. I think that this will lead to cases where
> performance degrades without regular maintenance, but I don't see a way to
> handle that other than degrading performance. Forcing a rewrite to sort
> data when the desired order changes just isn't possible, right?
>
> On Thu, Jul 18, 2019 at 7:33 AM Owen O'Malley <ow...@gmail.com>
> wrote:
>
>>
>>
>> On Thu, Jul 18, 2019 at 5:30 AM Anton Okolnychyi
>> <ao...@apple.com.invalid> wrote:
>>
>>> Let me summarize what we discussed here and follow up with a PR.
>>>
>>> - Iceberg should allow users to define a sort order in its metadata that
>>> applies to partitions.
>>> - We should never assume the sort order is actually applied to all files
>>> in the table.
>>>
>>
>> This one seems really problematic. Too many important optimizations
>> depend on the file sort order. Can we have the writer verify the sort order
>> as the files are written?
>>
>> - Sort orders might evolve and change over time. When this happens,
>>> existing files will not be rewritten. Query engines should follow the
>>> updated sort order during subsequent writes. As a result, files within a
>>> table or partition can be sorted differently at a given point in time.
>>> - We should be able to define a sort order even for unpartitioned
>>> tables, as opposed to current Spark tables that allow a sort order only for
>>> bucketed tables.
>>> - SortOrder is separate from PartitionSpec.
>>> - SortOrder will rely on transformations to define complex sort orders.
>>> - Files will be annotated with sort_order_id instead of sort_columns. We
>>> keep the question of file_ordinal open for now.
>>> - To begin with, we will support asc/desc natural sort orders (UTF8
>>> ordering for Strings).
>>>
>>> Thanks,
>>> Anton
>>>
>>> On 16 Jul 2019, at 23:56, Ryan Blue <rb...@netflix.com.INVALID> wrote:
>>>
>>> I agree that Iceberg metadata should include a way to configure a
>>> desired sort order. But I want to note that I don’t think that we can ever
>>> assume that it has been applied. Table configuration will evolve as use
>>> changes. We don’t want to require rewrites when a configuration gets
>>> updated, so an assumption should be that data files might not be sorted.
>>>
>>> Files that are sorted should indicate how they are sorted, so that
>>> optimizations are applied if the file’s metadata indicates it can be safely
>>> applied. For example, if both deletes and data rows are sorted the same
>>> way, you can merge the two streams instead of using a hash set to check
>>> whether a record has been deleted. I think this should rely on the delete
>>> file’s sort order matching the data file it is applied to.
>>>
>>> Should Iceberg allow users to define a sort spec only if the table is
>>> bucketed?
>>>
>>> No. In Iceberg, bucketing is just another partition transform.
>>>
>>> However, I think we need to consider what a sort order will mean. Here
>>> are a few observations:
>>>
>>>    - Each file can have a sort order for its rows (Spark’s
>>>    sortWithinPartitions, which sorts each task’s data)
>>>    - Sorting is also used to cluster values across files so it makes
>>>    sense for a table sort order to be applied within partitions (ORDER
>>>    BY)
>>>    - Multiple writes to the same partition are not expected to rewrite
>>>    existing data, so a partition may only be partially sorted or may have
>>>    multiple sorted file sets
>>>    - Partitioning is independent from sorting. Even when partitioning
>>>    is orthogonal to a sort order (i.e., bucketing), partitioning must still
>>>    take precedence.
>>>
>>> My conclusion is that a configured sort order applies to partitions, not
>>> data across partitions. Again, bucketing is just another type of partition.
>>>
>>> How should Iceberg encode sort specs?
>>>
>>> I don’t think this should be in table properties. The sort order should
>>> reference columns by ID so it doesn’t need to be changed when columns are
>>> renamed. I think this should be implemented like PartitionSpec.
>>>
>>> If sorting is applied within partitions, then I would make PartitionSpec
>>> and SortOrder separate. I would still use transforms to produce more
>>> complex sort orders. I think that’s a great idea, but we don’t need to mix
>>> partitioning and sorting to reuse transforms. Like partition specs, I think
>>> a table should be able to define multiple sort orders and each should be
>>> identified by ID. Then each data file can encode which sort order it was
>>> written with, just like manifests and partition specs.
>>>
>>> I think we should add sort-orders like partition-specs, and a
>>> default-sort-order-id like default-spec-id. This would also require
>>> removing sort_columns from data files in the spec
>>> <http://iceberg.apache.org/spec/#manifests> and adding sort_order_id.
>>> We can keep file_ordinal, but probably want to add some context to know
>>> the group of files where it is valid. We could also remove it.
>>>
>>> Which sort orders should Iceberg support?
>>>
>>> I agree with what’s already been said: we should use a natural order
>>> for each type
>>> <https://github.com/apache/incubator-iceberg/blob/master/api/src/main/java/org/apache/iceberg/types/Comparators.java#L31-L45>,
>>> ascending and descending. To start, Strings must use UTF-8’s natural
>>> ordering and we can expand from there.
>>>
>>> Here’s what a sort order might look like:
>>>
>>> "sort-orders": [
>>>     { "id": 1, "ordering": [ { "source-id": 4, "ascending": true } ] },
>>>     { "id": 2, "ordering": [ { "source-id": 4, "ascending": true }, { "source-id": 5, "ascending": false } ] },
>>>     { "id": 3, "ordering": [ { "transform": "zorder", "ascending": false, "source-ids": [4, 5] } ] },
>>>   ]
>>>
>>>
>>> On Thu, Jul 4, 2019 at 9:46 AM Anton Okolnychyi <
>>> aokolnychyi@apple.com.invalid> wrote:
>>>
>>>> In order to begin prototyping, I would start with the following
>>>> questions.
>>>>
>>>> 1) Does Iceberg need a sort spec?
>>>>   - I would say yes
>>>> 2) Should Iceberg allow users to define a sort spec only if the table
>>>> is bucketed?
>>>> - I would say no, as it seems valid to have partitioned and sorted
>>>> tables.
>>>> 3) How should Iceberg encode sort specs?
>>>> - Option #1 is to rely on table properties, which will allow us to use
>>>> ALTER TABLE ... SET TBLPROPERTIES to configure sorting specs. However, I am
>>>> not sure it would be easy to encode non-trivial sort specs and track sort
>>>> spec evolution (if needed).
>>>> - Option #2 is to extend PartitionSpec to cover sorting as well. This
>>>> option will allow us to use transformations to encode non-trivial sorts and
>>>> won't require many changes to the codebase.
>>>>   - Option #3 is to store SortSpec separately from PartitionSpec. This
>>>> will require more changes compared to Option #2 but can also give us extra
>>>> flexibility.
>>>>
>>>> Each option has its own trade-offs, but I tend to think #2 is
>>>> reasonable.
>>>>
>>>> 4) Which sort orders should Iceberg support?
>>>> - I think we have to be flexible and support adding more sort orders
>>>> later. In addition to what Owen said, we can add sorting based on
>>>> multi-dimensional space-filling curves in the future.
>>>>
>>>>
>>>> What do you think?
>>>>
>>>> Thanks,
>>>> Anton
>>>>
>>>> On 1 Jul 2019, at 18:06, Owen O'Malley <ow...@gmail.com> wrote:
>>>>
>>>> My thought is just like Iceberg has to define partitioning and
>>>> bucketing, it has to define a canonical sort order. In particular, we can’t
>>>> afford to have Spark, Presto, and Hive writing files in different orders. I
>>>> believe the right approach is to define a sort order as a series of columns
>>>> where each column is either ascending or descending and defining the
>>>> natural sort order for each type.
>>>>
>>>> The hard bit will be if we need to support non-natural sorts of
>>>> strings. For example, if we need to support case-insensitive sorts or the
>>>> different collations that databases support, I’d hope that we could start
>>>> with the default of utf-8 byte ordering and expand as needed. If you are
>>>> curious what the different collations look like -
>>>> https://dba.stackexchange.com/questions/94887/what-is-the-impact-of-lc-ctype-on-a-postgresql-database
>>>>  .
>>>>
>>>> .. Owen
>>>>
>>>> On Jul 1, 2019, at 4:18 AM, Anton Okolnychyi <
>>>> aokolnychyi@apple.com.INVALID> wrote:
>>>>
>>>> Hey folks,
>>>>
>>>> Iceberg users are advised not only to partition their data but also to
>>>> sort within partitions by columns in predicates in order to get the best
>>>> performance. Right now, this process is mostly manual and performed by
>>>> users before writing.
>>>> I am wondering if we should extend Iceberg metadata so that query
>>>> engines can do this automatically in the future. We already have
>>>> `sortColumns` in DataFile but they are not used.
>>>>
>>>>    - Do we need a notion of sort columns in TableMetadata?
>>>>    - Spark’s sort spec is tightly coupled with bucketing and cannot be
>>>>    used alone. However, it seems reasonable to have partitioned and sorted
>>>>    tables without bucketing. How do we see this in Iceberg?
>>>>    - If we decide to have sort spec in the metadata, do we want to
>>>>    make it part of PartitionSpec or have it separately?
>>>>
>>>> Thanks,
>>>> Anton
>>>>
>>>>
>>>>
>>>>
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>>>
>>>
>>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>

Re: Sort Spec

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
> This one seems really problematic. Too many important optimizations
depend on the file sort order. Can we have the writer verify the sort order
as the files are written

Even if we did, when the desired sort order changes we can't just rewrite
all of the data in the table. I think that this will lead to cases where
performance degrades without regular maintenance, but I don't see a way to
handle that other than degrading performance. Forcing a rewrite to sort
data when the desired order changes just isn't possible, right?

On Thu, Jul 18, 2019 at 7:33 AM Owen O'Malley <ow...@gmail.com>
wrote:

>
>
> On Thu, Jul 18, 2019 at 5:30 AM Anton Okolnychyi
> <ao...@apple.com.invalid> wrote:
>
>> Let me summarize what we discussed here and follow up with a PR.
>>
>> - Iceberg should allow users to define a sort order in its metadata that
>> applies to partitions.
>> - We should never assume the sort order is actually applied to all files
>> in the table.
>>
>
> This one seems really problematic. Too many important optimizations depend
> on the file sort order. Can we have the writer verify the sort order as the
> files are written?
>
> - Sort orders might evolve and change over time. When this happens,
>> existing files will not be rewritten. Query engines should follow the
>> updated sort order during subsequent writes. As a result, files within a
>> table or partition can be sorted differently at a given point in time.
>> - We should be able to define a sort order even for unpartitioned tables,
>> as opposed to current Spark tables that allow a sort order only for
>> bucketed tables.
>> - SortOrder is separate from PartitionSpec.
>> - SortOrder will rely on transformations to define complex sort orders.
>> - Files will be annotated with sort_order_id instead of sort_columns. We
>> keep the question of file_ordinal open for now.
>> - To begin with, we will support asc/desc natural sort orders (UTF8
>> ordering for Strings).
>>
>> Thanks,
>> Anton
>>
>> On 16 Jul 2019, at 23:56, Ryan Blue <rb...@netflix.com.INVALID> wrote:
>>
>> I agree that Iceberg metadata should include a way to configure a desired
>> sort order. But I want to note that I don’t think that we can ever assume
>> that it has been applied. Table configuration will evolve as use changes.
>> We don’t want to require rewrites when a configuration gets updated, so an
>> assumption should be that data files might not be sorted.
>>
>> Files that are sorted should indicate how they are sorted, so that
>> optimizations are applied if the file’s metadata indicates it can be safely
>> applied. For example, if both deletes and data rows are sorted the same
>> way, you can merge the two streams instead of using a hash set to check
>> whether a record has been deleted. I think this should rely on the delete
>> file’s sort order matching the data file it is applied to.
>>
>> Should Iceberg allow users to define a sort spec only if the table is
>> bucketed?
>>
>> No. In Iceberg, bucketing is just another partition transform.
>>
>> However, I think we need to consider what a sort order will mean. Here
>> are a few observations:
>>
>>    - Each file can have a sort order for its rows (Spark’s
>>    sortWithinPartitions, which sorts each task’s data)
>>    - Sorting is also used to cluster values across files so it makes
>>    sense for a table sort order to be applied within partitions (ORDER BY
>>    )
>>    - Multiple writes to the same partition are not expected to rewrite
>>    existing data, so a partition may only be partially sorted or may have
>>    multiple sorted file sets
>>    - Partitioning is independent from sorting. Even when partitioning is
>>    orthogonal to a sort order (i.e., bucketing), partitioning must still take
>>    precedence.
>>
>> My conclusion is that a configured sort order applies to partitions, not
>> data across partitions. Again, bucketing is just another type of partition.
>>
>> How should Iceberg encode sort specs?
>>
>> I don’t think this should be in table properties. The sort order should
>> reference columns by ID so it doesn’t need to be changed when columns are
>> renamed. I think this should be implemented like PartitionSpec.
>>
>> If sorting is applied within partitions, then I would make PartitionSpec
>> and SortOrder separate. I would still use transforms to produce more
>> complex sort orders. I think that’s a great idea, but we don’t need to mix
>> partitioning and sorting to reuse transforms. Like partition specs, I think
>> a table should be able to define multiple sort orders and each should be
>> identified by ID. Then each data file can encode which sort order it was
>> written with, just like manifests and partition specs.
>>
>> I think we should add sort-orders like partition-specs, and a
>> default-sort-order-id like default-spec-id. This would also require
>> removing sort_columns from data files in the spec
>> <http://iceberg.apache.org/spec/#manifests> and adding sort_order_id. We
>> can keep file_ordinal, but probably want to add some context to know the
>> group of files where it is valid. We could also remove it.
>>
>> Which sort orders should Iceberg support?
>>
>> I agree with what’s already been said: we should use a natural order for
>> each type
>> <https://github.com/apache/incubator-iceberg/blob/master/api/src/main/java/org/apache/iceberg/types/Comparators.java#L31-L45>,
>> ascending and descending. To start, Strings must use UTF-8’s natural
>> ordering and we can expand from there.
>>
>> Here’s what a sort order might look like:
>>
>> "sort-orders": [
>>     { "id": 1, "ordering": [ { "source-id": 4, "ascending": true } ] },
>>     { "id": 2, "ordering": [ { "source-id": 4, "ascending": true }, { "source-id": 5, "ascending": false } ] },
>>     { "id": 3, "ordering": [ { "transform": "zorder", "ascending": false, "source-ids": [4, 5] } ] },
>>   ]
>>
>>
>> On Thu, Jul 4, 2019 at 9:46 AM Anton Okolnychyi <
>> aokolnychyi@apple.com.invalid> wrote:
>>
>>> In order to begin prototyping, I would start with the following
>>> questions.
>>>
>>> 1) Does Iceberg need a sort spec?
>>>   - I would say yes
>>> 2) Should Iceberg allow users to define a sort spec only if the table is
>>> bucketed?
>>> - I would say no, as it seems valid to have partitioned and sorted
>>> tables.
>>> 3) How should Iceberg encode sort specs?
>>> - Option #1 is to rely on table properties, which will allow us to use
>>> ALTER TABLE ... SET TBLPROPERTIES to configure sorting specs. However, I am
>>> not sure it would be easy to encode non-trivial sort specs and track sort
>>> spec evolution (if needed).
>>> - Option #2 is to extend PartitionSpec to cover sorting as well. This
>>> option will allow us to use transformations to encode non-trivial sorts and
>>> won't require many changes to the codebase.
>>>   - Option #3 is to store SortSpec separately from PartitionSpec. This
>>> will require more changes compared to Option #2 but can also give us extra
>>> flexibility.
>>>
>>> Each option has its own trade-offs, but I tend to think #2 is reasonable.
>>>
>>> 4) Which sort orders should Iceberg support?
>>> - I think we have to be flexible and support adding more sort orders
>>> later. In addition to what Owen said, we can add sorting based on
>>> multi-dimensional space-filling curves in the future.
>>>
>>>
>>> What do you think?
>>>
>>> Thanks,
>>> Anton
>>>
>>> On 1 Jul 2019, at 18:06, Owen O'Malley <ow...@gmail.com> wrote:
>>>
>>> My thought is just like Iceberg has to define partitioning and
>>> bucketing, it has to define a canonical sort order. In particular, we can’t
>>> afford to have Spark, Presto, and Hive writing files in different orders. I
>>> believe the right approach is to define a sort order as a series of columns
>>> where each column is either ascending or descending and defining the
>>> natural sort order for each type.
>>>
>>> The hard bit will be if we need to support non-natural sorts of strings.
>>> For example, if we need to support case-insensitive sorts or the different
>>> collations that databases support, I’d hope that we could start with the
>>> default of utf-8 byte ordering and expand as needed. If you are curious
>>> what the different collations look like -
>>> https://dba.stackexchange.com/questions/94887/what-is-the-impact-of-lc-ctype-on-a-postgresql-database
>>>  .
>>>
>>> .. Owen
>>>
>>> On Jul 1, 2019, at 4:18 AM, Anton Okolnychyi <
>>> aokolnychyi@apple.com.INVALID> wrote:
>>>
>>> Hey folks,
>>>
>>> Iceberg users are advised not only to partition their data but also to
>>> sort within partitions by columns in predicates in order to get the best
>>> performance. Right now, this process is mostly manual and performed by
>>> users before writing.
>>> I am wondering if we should extend Iceberg metadata so that query
>>> engines can do this automatically in the future. We already have
>>> `sortColumns` in DataFile but they are not used.
>>>
>>>    - Do we need a notion of sort columns in TableMetadata?
>>>    - Spark’s sort spec is tightly coupled with bucketing and cannot be
>>>    used alone. However, it seems reasonable to have partitioned and sorted
>>>    tables without bucketing. How do we see this in Iceberg?
>>>    - If we decide to have sort spec in the metadata, do we want to make
>>>    it part of PartitionSpec or have it separately?
>>>
>>> Thanks,
>>> Anton
>>>
>>>
>>>
>>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>>
>>

-- 
Ryan Blue
Software Engineer
Netflix

Re: Sort Spec

Posted by Owen O'Malley <ow...@gmail.com>.
On Thu, Jul 18, 2019 at 5:30 AM Anton Okolnychyi
<ao...@apple.com.invalid> wrote:

> Let me summarize what we discussed here and follow up with a PR.
>
> - Iceberg should allow users to define a sort order in its metadata that
> applies to partitions.
> - We should never assume the sort order is actually applied to all files
> in the table.
>

This one seems really problematic. Too many important optimizations depend
on the file sort order. Can we have the writer verify the sort order as the
files are written?
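
One way to implement that check is a thin wrapper around the file writer that compares each row against the previous one under the declared order; this is only a sketch with placeholder names, not an Iceberg API:

    import java.util.Comparator;

    final class SortVerifyingWriter<T> {
      private final Comparator<T> declaredOrder;  // derived from the table's sort order metadata
      private T previous;

      SortVerifyingWriter(Comparator<T> declaredOrder) {
        this.declaredOrder = declaredOrder;
      }

      void write(T row) {
        if (previous != null && declaredOrder.compare(previous, row) > 0) {
          throw new IllegalStateException("rows are not sorted by the declared sort order");
        }
        previous = row;
        // ... delegate to the real file writer here
      }
    }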

- Sort orders might evolve and change over time. When this happens,
> existing files will not be rewritten. Query engines should follow the
> updated sort order during subsequent writes. As a result, files within a
> table or partition can be sorted differently at a given point in time.
> - We should be able to define a sort order even for unpartitioned tables,
> as opposed to current Spark tables that allow a sort order only for
> bucketed tables.
> - SortOrder is separate from PartitionSpec.
> - SortOrder will rely on transformations to define complex sort orders.
> - Files will be annotated with sort_order_id instead of sort_columns. We
> keep the question of file_ordinal open for now.
> - To begin with, we will support asc/desc natural sort orders (UTF8
> ordering for Strings).
>
> Thanks,
> Anton
>
> On 16 Jul 2019, at 23:56, Ryan Blue <rb...@netflix.com.INVALID> wrote:
>
> I agree that Iceberg metadata should include a way to configure a desired
> sort order. But I want to note that I don’t think that we can ever assume
> that it has been applied. Table configuration will evolve as use changes.
> We don’t want to require rewrites when a configuration gets updated, so an
> assumption should be that data files might not be sorted.
>
> Files that are sorted should indicate how they are sorted, so that
> optimizations are applied if the file’s metadata indicates it can be safely
> applied. For example, if both deletes and data rows are sorted the same
> way, you can merge the two streams instead of using a hash set to check
> whether a record has been deleted. I think this should rely on the delete
> file’s sort order matching the data file it is applied to.
>
> Should Iceberg allow users to define a sort spec only if the table is
> bucketed?
>
> No. In Iceberg, bucketing is just another partition transform.
>
> However, I think we need to consider what a sort order will mean. Here are
> a few observations:
>
>    - Each file can have a sort order for its rows (Spark’s
>    sortWithinPartitions, which sorts each task’s data)
>    - Sorting is also used to cluster values across files so it makes
>    sense for a table sort order to be applied within partitions (ORDER BY)
>    - Multiple writes to the same partition are not expected to rewrite
>    existing data, so a partition may only be partially sorted or may have
>    multiple sorted file sets
>    - Partitioning is independent from sorting. Even when partitioning is
>    orthogonal to a sort order (i.e., bucketing), partitioning must still take
>    precedence.
>
> My conclusion is that a configured sort order applies to partitions, not
> data across partitions. Again, bucketing is just another type of partition.
>
> How should Iceberg encode sort specs?
>
> I don’t think this should be in table properties. The sort order should
> reference columns by ID so it doesn’t need to be changed when columns are
> renamed. I think this should be implemented like PartitionSpec.
>
> If sorting is applied within partitions, then I would make PartitionSpec
> and SortOrder separate. I would still use transforms to produce more
> complex sort orders. I think that’s a great idea, but we don’t need to mix
> partitioning and sorting to reuse transforms. Like partition specs, I think
> a table should be able to define multiple sort orders and each should be
> identified by ID. Then each data file can encode which sort order it was
> written with, just like manifests and partition specs.
>
> I think we should add sort-orders like partition-specs, and a
> default-sort-order-id like default-spec-id. This would also require
> removing sort_columns from data files in the spec
> <http://iceberg.apache.org/spec/#manifests> and adding sort_order_id. We
> can keep file_ordinal, but probably want to add some context to know the
> group of files where it is valid. We could also remove it.
>
> Which sort orders should Iceberg support?
>
> I agree with what’s already been said: we should use a natural order for
> each type
> <https://github.com/apache/incubator-iceberg/blob/master/api/src/main/java/org/apache/iceberg/types/Comparators.java#L31-L45>,
> ascending and descending. To start, Strings must use UTF-8’s natural
> ordering and we can expand from there.
>
> Here’s what a sort order might look like:
>
> "sort-orders": [
>     { "id": 1, "ordering": [ { "source-id": 4, "ascending": true } ] },
>     { "id": 2, "ordering": [ { "source-id": 4, "ascending": true }, { "source-id": 5, "ascending": false } ] },
>     { "id": 3, "ordering": [ { "transform": "zorder", "ascending": false, "source-ids": [4, 5] } ] },
>   ]
>
>
> On Thu, Jul 4, 2019 at 9:46 AM Anton Okolnychyi <
> aokolnychyi@apple.com.invalid> wrote:
>
>> In order to begin prototyping, I would start with the following questions.
>>
>> 1) Does Iceberg need a sort spec?
>>   - I would say yes
>> 2) Should Iceberg allow users to define a sort spec only if the table is
>> bucketed?
>> - I would say no, as it seems valid to have partitioned and sorted tables.
>> 3) How should Iceberg encode sort specs?
>> - Option #1 is to rely on table properties, which will allow us to use
>> ALTER TABLE ... SET TBLPROPERTIES to configure sorting specs. However, I am
>> not sure it would be easy to encode non-trivial sort specs and track sort
>> spec evolution (if needed).
>> - Option #2 is to extend PartitionSpec to cover sorting as well. This
>> option will allow us to use transformations to encode non-trivial sorts and
>> won't require many changes to the codebase.
>>   - Option #3 is to store SortSpec separately from PartitionSpec. This
>> will require more changes compared to Option #2 but can also give us extra
>> flexibility.
>>
>> Each option has its own trade-offs, but I tend to think #2 is reasonable.
>>
>> 4) Which sort orders should Iceberg support?
>> - I think we have to be flexible and support adding more sort orders
>> later. In addition to what Owen said, we can add sorting based on
>> multi-dimensional space-filling curves in the future.
>>
>>
>> What do you think?
>>
>> Thanks,
>> Anton
>>
>> On 1 Jul 2019, at 18:06, Owen O'Malley <ow...@gmail.com> wrote:
>>
>> My thought is just like Iceberg has to define partitioning and bucketing,
>> it has to define a canonical sort order. In particular, we can’t afford to
>> have Spark, Presto, and Hive writing files in different orders. I believe
>> the right approach is to define a sort order as a series of columns where
>> each column is either ascending or descending and defining the natural sort
>> order for each type.
>>
>> The hard bit will be if we need to support non-natural sorts of strings.
>> For example, if we need to support case-insensitive sorts or the different
>> collations that databases support, I’d hope that we could start with the
>> default of utf-8 byte ordering and expand as needed. If you are curious
>> what the different collations look like -
>> https://dba.stackexchange.com/questions/94887/what-is-the-impact-of-lc-ctype-on-a-postgresql-database
>>  .
>>
>> .. Owen
>>
>> On Jul 1, 2019, at 4:18 AM, Anton Okolnychyi <
>> aokolnychyi@apple.com.INVALID> wrote:
>>
>> Hey folks,
>>
>> Iceberg users are advised not only to partition their data but also to
>> sort within partitions by columns in predicates in order to get the best
>> performance. Right now, this process is mostly manual and performed by
>> users before writing.
>> I am wondering if we should extend Iceberg metadata so that query engines
>> can do this automatically in the future. We already have `sortColumns` in
>> DataFile but they are not used.
>>
>>    - Do we need a notion of sort columns in TableMetadata?
>>    - Spark’s sort spec is tightly coupled with bucketing and cannot be
>>    used alone. However, it seems reasonable to have partitioned and sorted
>>    tables without bucketing. How do we see this in Iceberg?
>>    - If we decide to have sort spec in the metadata, do we want to make
>>    it part of PartitionSpec or have it separately?
>>
>> Thanks,
>> Anton
>>
>>
>>
>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
>
>

Re: Sort Spec

Posted by Anton Okolnychyi <ao...@apple.com.INVALID>.
Let me summarize what we discussed here and follow up with a PR.

- Iceberg should allow users to define a sort order in its metadata that applies to partitions.
- We should never assume the sort order is actually applied to all files in the table.
- Sort orders might evolve and change over time. When this happens, existing files will not be rewritten. Query engines should follow the updated sort order during subsequent writes. As a result, files within a table or partition can be sorted differently at a given point in time.
- We should be able to define a sort order even for unpartitioned tables, as opposed to current Spark tables that allow a sort order only for bucketed tables.
- SortOrder is separate from PartitionSpec.
- SortOrder will rely on transformations to define complex sort orders.
- Files will be annotated with sort_order_id instead of sort_columns. We keep the question of file_ordinal open for now.
- To begin with, we will support asc/desc natural sort orders (UTF8 ordering for Strings).

Thanks,
Anton
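
A minimal sketch of how this summary could map onto an API, with SortOrder kept separate from PartitionSpec and built from transforms; all names here are hypothetical and the eventual PR may look different:

    import java.util.List;

    final class SortOrderSketch {
      enum Direction { ASC, DESC }

      // One entry per sort key: a source column id, a transform
      // (e.g. identity now, possibly zorder later), and a direction.
      record SortField(int sourceColumnId, String transform, Direction direction) {}

      // Identified by id so data files can record sort_order_id instead of
      // sort_columns, mirroring partition specs and default-spec-id.
      record SortOrder(int orderId, List<SortField> fields) {}

      static final SortOrder EXAMPLE = new SortOrder(1, List.of(
          new SortField(4, "identity", Direction.ASC),
          new SortField(5, "identity", Direction.DESC)));
    }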

> On 16 Jul 2019, at 23:56, Ryan Blue <rb...@netflix.com.INVALID> wrote:
> 
> I agree that Iceberg metadata should include a way to configure a desired sort order. But I want to note that I don’t think that we can ever assume that it has been applied. Table configuration will evolve as use changes. We don’t want to require rewrites when a configuration gets updated, so an assumption should be that data files might not be sorted.
> 
> Files that are sorted should indicate how they are sorted, so that optimizations are applied if the file’s metadata indicates it can be safely applied. For example, if both deletes and data rows are sorted the same way, you can merge the two streams instead of using a hash set to check whether a record has been deleted. I think this should rely on the delete file’s sort order matching the data file it is applied to.
> 
> Should Iceberg allow users to define a sort spec only if the table is bucketed?
> 
> No. In Iceberg, bucketing is just another partition transform.
> 
> However, I think we need to consider what a sort order will mean. Here are a few observations:
> 
> Each file can have a sort order for its rows (Spark’s sortWithinPartitions, which sorts each task’s data)
> Sorting is also used to cluster values across files so it makes sense for a table sort order to be applied within partitions (ORDER BY)
> Multiple writes to the same partition are not expected to rewrite existing data, so a partition may only be partially sorted or may have multiple sorted file sets
> Partitioning is independent from sorting. Even when partitioning is orthogonal to a sort order (i.e., bucketing), partitioning must still take precedence.
> My conclusion is that a configured sort order applies to partitions, not data across partitions. Again, bucketing is just another type of partition.
> 
> How should Iceberg encode sort specs?
> 
> I don’t think this should be in table properties. The sort order should reference columns by ID so it doesn’t need to be changed when columns are renamed. I think this should be implemented like PartitionSpec.
> 
> If sorting is applied within partitions, then I would make PartitionSpec and SortOrder separate. I would still use transforms to produce more complex sort orders. I think that’s a great idea, but we don’t need to mix partitioning and sorting to reuse transforms. Like partition specs, I think a table should be able to define multiple sort orders and each should be identified by ID. Then each data file can encode which sort order it was written with, just like manifests and partition specs.
> 
> I think we should add sort-orders like partition-specs, and a default-sort-order-id like default-spec-id. This would also require removing sort_columns from data files in the spec <http://iceberg.apache.org/spec/#manifests> and adding sort_order_id. We can keep file_ordinal, but probably want to add some context to know the group of files where it is valid. We could also remove it.
> 
> Which sort orders should Iceberg support?
> 
> I agree with what’s already been said: we should use a natural order for each type <https://github.com/apache/incubator-iceberg/blob/master/api/src/main/java/org/apache/iceberg/types/Comparators.java#L31-L45>, ascending and descending. To start, Strings must use UTF-8’s natural ordering and we can expand from there.
> 
> Here’s what a sort order might look like:
> 
> "sort-orders": [
>     { "id": 1, "ordering": [ { "source-id": 4, "ascending": true } ] },
>     { "id": 2, "ordering": [ { "source-id": 4, "ascending": true }, { "source-id": 5, "ascending": false } ] },
>     { "id": 3, "ordering": [ { "transform": "zorder", "ascending": false, "source-ids": [4, 5] } ] },
>   ]
> 
> On Thu, Jul 4, 2019 at 9:46 AM Anton Okolnychyi <ao...@apple.com.invalid> wrote:
> In order to begin prototyping, I would start with the following questions.
> 
> 1) Does Iceberg need a sort spec?
>  	- I would say yes
> 2) Should Iceberg allow users to define a sort spec only if the table is bucketed?
> 	- I would say no, as it seems valid to have partitioned and sorted tables.
> 3) How should Iceberg encode sort specs?
> 	- Option #1 is to rely on table properties, which will allow us to use ALTER TABLE ... SET TBLPROPERTIES to configure sorting specs. However, I am not sure it would be easy to encode non-trivial sort specs and track sort spec evolution (if needed).
> 	- Option #2 is to extend PartitionSpec to cover sorting as well. This option will allow us to use transformations to encode non-trivial sorts and won't require many changes to the codebase.
>   	- Option #3 is to store SortSpec separately from PartitionSpec. This will require more changes compared to Option #2 but can also give us extra flexibility.
> 
> Each option has its own trade-offs, but I tend to think #2 is reasonable.
> 
> 4) Which sort orders should Iceberg support?
> 	- I think we have to be flexible and support adding more sort orders later. In addition to what Owen said, we can add sorting based on multi-dimensional space-filling curves in the future.
> 
> 
> What do you think?
> 
> Thanks,
> Anton
> 
>> On 1 Jul 2019, at 18:06, Owen O'Malley <owen.omalley@gmail.com <ma...@gmail.com>> wrote:
>> 
>> My thought is just like Iceberg has to define partitioning and bucketing, it has to define a canonical sort order. In particular, we can’t afford to have Spark, Presto, and Hive writing files in different orders. I believe the right approach is to define a sort order as a series of columns where each column is either ascending or descending and defining the natural sort order for each type.
>> 
>> The hard bit will be if we need to support non-natural sorts of strings. For example, if we need to support case-insensitive sorts or the different collations that databases support, I’d hope that we could start with the default of utf-8 byte ordering and expand as needed. If you are curious what the different collations look like - https://dba.stackexchange.com/questions/94887/what-is-the-impact-of-lc-ctype-on-a-postgresql-database <https://dba.stackexchange.com/questions/94887/what-is-the-impact-of-lc-ctype-on-a-postgresql-database> .
>> 
>> .. Owen
>> 
>>> On Jul 1, 2019, at 4:18 AM, Anton Okolnychyi <aokolnychyi@apple.com.INVALID <ma...@apple.com.INVALID>> wrote:
>>> 
>>> Hey folks,
>>> 
>>> Iceberg users are advised not only to partition their data but also to sort within partitions by columns in predicates in order to get the best performance. Right now, this process is mostly manual and performed by users before writing.
>>> I am wondering if we should extend Iceberg metadata so that query engines can do this automatically in the future. We already have `sortColumns` in DataFile but they are not used.
>>> Do we need a notion of sort columns in TableMetadata?
>>> Spark’s sort spec is tightly coupled with bucketing and cannot be used alone. However, it seems reasonable to have partitioned and sorted tables without bucketing. How do we see this in Iceberg?
>>> If we decide to have sort spec in the metadata, do we want to make it part of PartitionSpec or have it separately?
>>> Thanks,
>>> Anton
>>> 
>> 
> 
> 
> 
> -- 
> Ryan Blue
> Software Engineer
> Netflix


Re: Sort Spec

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
I agree that Iceberg metadata should include a way to configure a desired
sort order. But I want to note that I don’t think that we can ever assume
that it has been applied. Table configuration will evolve as use changes.
We don’t want to require rewrites when a configuration gets updated, so an
assumption should be that data files might not be sorted.

Files that are sorted should indicate how they are sorted, so that
optimizations are applied if the file’s metadata indicates it can be safely
applied. For example, if both deletes and data rows are sorted the same
way, you can merge the two streams instead of using a hash set to check
whether a record has been deleted. I think this should rely on the delete
file’s sort order matching the data file it is applied to.
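
For illustration, merging a sorted delete stream with a sorted data stream
could look roughly like the standalone sketch below (plain Java with made-up
names, not Iceberg's actual delete handling):

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch: apply a sorted stream of deleted keys to a sorted stream of data keys
// by merging the two streams instead of loading the deletes into a hash set.
public class SortedDeleteMerge {
  // Both iterators must be sorted ascending on the same key.
  static List<Long> applyDeletes(Iterator<Long> data, Iterator<Long> deletes) {
    List<Long> remaining = new ArrayList<>();
    Long nextDelete = deletes.hasNext() ? deletes.next() : null;
    while (data.hasNext()) {
      long key = data.next();
      // advance the delete stream until it reaches or passes the current key
      while (nextDelete != null && nextDelete < key) {
        nextDelete = deletes.hasNext() ? deletes.next() : null;
      }
      if (nextDelete == null || nextDelete != key) {
        remaining.add(key); // key was not deleted
      }
    }
    return remaining;
  }

  public static void main(String[] args) {
    List<Long> data = List.of(1L, 2L, 3L, 5L, 8L);
    List<Long> deletes = List.of(2L, 8L);
    System.out.println(applyDeletes(data.iterator(), deletes.iterator())); // [1, 3, 5]
  }
}

Both sides are read once in order, so no set of deleted keys has to be held
in memory.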

Should Iceberg allow users to define a sort spec only if the table is
bucketed?

No. In Iceberg, bucketing is just another partition transform.

However, I think we need to consider what a sort order will mean. Here are
a few observations:

   - Each file can have a sort order for its rows (Spark’s
   sortWithinPartitions, which sorts each task’s data; see the sketch after
   this list)
   - Sorting is also used to cluster values across files so it makes sense
   for a table sort order to be applied within partitions (ORDER BY)
   - Multiple writes to the same partition are not expected to rewrite
   existing data, so a partition may only be partially sorted or may have
   multiple sorted file sets
   - Partitioning is independent from sorting. Even when partitioning is
   orthogonal to a sort order (i.e., bucketing), partitioning must still take
   precedence.
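
As a concrete example of the first point, Spark’s sortWithinPartitions sorts
each task’s data without a global shuffle. This is only a sketch; the paths
and column names are placeholders, not part of any proposal:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Sketch: cluster rows by a partition column, then sort within each task.
public class SortWithinPartitionsExample {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("sort-within-partitions")
        .master("local[*]")
        .getOrCreate();

    // placeholder input path and column names
    Dataset<Row> df = spark.read().parquet("/tmp/events");

    df.repartition(df.col("event_date"))     // group rows that share a partition value
      .sortWithinPartitions("event_time")    // sort each partition’s rows locally
      .write()
      .parquet("/tmp/events_sorted");

    spark.stop();
  }
}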

My conclusion is that a configured sort order applies to partitions, not
data across partitions. Again, bucketing is just another type of partition.

How should Iceberg encode sort specs?

I don’t think this should be in table properties. The sort order should
reference columns by ID so it doesn’t need to be changed when columns are
renamed. I think this should be implemented like PartitionSpec.

If sorting is applied within partitions, then I would make PartitionSpec
and SortOrder separate. I would still use transforms to produce more
complex sort orders. I think that’s a great idea, but we don’t need to mix
partitioning and sorting to reuse transforms. Like partition specs, I think
a table should be able to define multiple sort orders and each should be
identified by ID. Then each data file can encode which sort order it was
written with, just like manifests and partition specs.

I think we should add sort-orders like partition-specs, and a
default-sort-order-id like default-spec-id. This would also require
removing sort_columns from data files in the spec
<http://iceberg.apache.org/spec/#manifests> and adding sort_order_id. We
can keep file_ordinal, but probably want to add some context to know the
group of files where it is valid. We could also remove it.
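
To make that concrete, here is a standalone sketch of what such a sort order
could carry; SortField and SortOrder here are illustrative classes, not
existing Iceberg types:

import java.util.Arrays;
import java.util.List;

public class SortOrderSketch {
  // A single sort field references its column by source ID, so renames
  // do not invalidate the order; a transform could be reused here as well.
  static class SortField {
    final int sourceId;
    final String transform;   // e.g. "identity"
    final boolean ascending;

    SortField(int sourceId, String transform, boolean ascending) {
      this.sourceId = sourceId;
      this.transform = transform;
      this.ascending = ascending;
    }
  }

  // A sort order has its own ID, which data files would reference through a
  // sort_order_id field, just as they reference partition specs by spec ID.
  static class SortOrder {
    final int orderId;
    final List<SortField> fields;

    SortOrder(int orderId, List<SortField> fields) {
      this.orderId = orderId;
      this.fields = fields;
    }
  }

  public static void main(String[] args) {
    // corresponds to order id 2 in the JSON example below: 4 asc, 5 desc
    SortOrder order = new SortOrder(2, Arrays.asList(
        new SortField(4, "identity", true),
        new SortField(5, "identity", false)));
    System.out.println("sort order " + order.orderId + " with "
        + order.fields.size() + " fields");
  }
}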

Which sort orders should Iceberg support?

I agree with what’s already been said: we should use a natural order for
each type
<https://github.com/apache/incubator-iceberg/blob/master/api/src/main/java/org/apache/iceberg/types/Comparators.java#L31-L45>,
ascending and descending. To start, Strings must use UTF-8’s natural
ordering and we can expand from there.

Here’s what a sort order might look like:

"sort-orders": [
    { "id": 1, "ordering": [ { "source-id": 4, "ascending": true } ] },
    { "id": 2, "ordering": [ { "source-id": 4, "ascending": true }, {
"source-id": 5, "ascending": false } ] },
    { "id": 3, "ordering": [ { "transform": "zorder", "ascending":
false, "source-ids": [4, 5] } ] },
  ]


On Thu, Jul 4, 2019 at 9:46 AM Anton Okolnychyi
<ao...@apple.com.invalid> wrote:

> In order to begin prototyping, I would start with the following questions.
>
> 1) Does Iceberg need a sort spec?
>   - I would say yes
> 2) Should Iceberg allow users to define a sort spec only if the table is
> bucketed?
> - I would say no, as it seems valid to have partitioned and sorted tables.
> 3) How should Iceberg encode sort specs?
> - Option #1 is to rely on table properties, which will allow us to use
> ALTER TABLE ... SET TBLPROPERTIES to configure sorting specs. However, I am
> not sure it would be easy to encode non-trivial sort specs and track sort
> spec evolution (if needed).
> - Option #2 is to extend PartitionSpec to cover sorting as well. This
> option will allow us to use transformations to encode non-trivial sorts and
> won't require many changes to the codebase.
>   - Option #3 is to store SortSpec separately from PartitionSpec. This
> will require more changes compared to Option #2 but can also give us extra
> flexibility.
>
> Each option has its own trade-offs, but I tend to think #2 is reasonable.
>
> 4) Which sort orders should Iceberg support?
> - I think we have to be flexible and support adding more sort orders
> later. In addition to what Owen said, we can add sorting based on
> multi-dimensional space-filling curves in the future.
>
>
> What do you think?
>
> Thanks,
> Anton
>
> On 1 Jul 2019, at 18:06, Owen O'Malley <ow...@gmail.com> wrote:
>
> My thought is just like Iceberg has to define partitioning and bucketing,
> it has to define a canonical sort order. In particular, we can’t afford to
> have Spark, Presto, and Hive writing files in different orders. I believe
> the right approach is to define a sort order as a series of columns where
> each column is either ascending or descending and defining the natural sort
> order for each type.
>
> The hard bit will be if we need to support non-natural sorts of strings.
> For example, if we need to support case-insensitive sorts or the different
> collations that databases support, I’d hope that we could start with the
> default of utf-8 byte ordering and expand as needed. If you are curious
> what the different collations look like -
> https://dba.stackexchange.com/questions/94887/what-is-the-impact-of-lc-ctype-on-a-postgresql-database
>  .
>
> .. Owen
>
> On Jul 1, 2019, at 4:18 AM, Anton Okolnychyi <
> aokolnychyi@apple.com.INVALID> wrote:
>
> Hey folks,
>
> Iceberg users are advised not only to partition their data but also to
> sort within partitions by columns in predicates in order to get the best
> performance. Right now, this process is mostly manual and performed by
> users before writing.
> I am wondering if we should extend Iceberg metadata so that query engines
> can do this automatically in the future. We already have `sortColumns` in
> DataFile but they are not used.
>
>    - Do we need a notion of sort columns in TableMetadata?
>    - Spark’s sort spec is tightly coupled with bucketing and cannot be
>    used alone. However, it seems reasonable to have partitioned and sorted
>    tables without bucketing. How do we see this in Iceberg?
>    - If we decide to have sort spec in the metadata, do we want to make
>    it part of PartitionSpec or have it separately?
>
> Thanks,
> Anton
>
>
>
>

-- 
Ryan Blue
Software Engineer
Netflix

Re: Sort Spec

Posted by Anton Okolnychyi <ao...@apple.com.INVALID>.
In order to begin prototyping, I would start with the following questions.

1) Does Iceberg need a sort spec?
 	- I would say yes
2) Should Iceberg allow users to define a sort spec only if the table is bucketed?
	- I would say no, as it seems valid to have partitioned and sorted tables.
3) How should Iceberg encode sort specs?
	- Option #1 is to rely on table properties, which will allow us to use ALTER TABLE ... SET TBLPROPERTIES to configure sorting specs. However, I am not sure it would be easy to encode non-trivial sort specs and track sort spec evolution (if needed).
	- Option #2 is to extend PartitionSpec to cover sorting as well. This option will allow us to use transformations to encode non-trivial sorts and won't require many changes to the codebase.
  	- Option #3 is to store SortSpec separately from PartitionSpec. This will require more changes compared to Option #2 but can also give us extra flexibility.

Each option has its own trade-offs, but I tend to think #2 is reasonable.

4) Which sort orders should Iceberg support?
	- I think we have to be flexible and support adding more sort orders later. In addition to what Owen said, we can add sorting based on multi-dimensional space-filling curves in the future.
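
For example, one multi-dimensional option is a z-order sort key that
interleaves the bits of two column values, so sorting by the interleaved key
clusters data in both dimensions. A rough standalone sketch of the idea (not
a proposed implementation):

public class ZOrderSketch {
  // Interleave the lower 32 bits of x and y into a 64-bit Morton code.
  static long interleave(int x, int y) {
    long result = 0L;
    for (int i = 0; i < 32; i++) {
      result |= ((long) (x >>> i) & 1L) << (2 * i);        // x bits at even positions
      result |= ((long) (y >>> i) & 1L) << (2 * i + 1);    // y bits at odd positions
    }
    return result;
  }

  public static void main(String[] args) {
    // points close in (x, y) tend to stay close in z-order
    System.out.println(Long.toBinaryString(interleave(3, 5))); // prints 100111
  }
}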


What do you think?

Thanks,
Anton

> On 1 Jul 2019, at 18:06, Owen O'Malley <ow...@gmail.com> wrote:
> 
> My thought is just like Iceberg has to define partitioning and bucketing, it has to define a canonical sort order. In particular, we can’t afford to have Spark, Presto, and Hive writing files in different orders. I believe the right approach is to define a sort order as a series of columns where each column is either ascending or descending and defining the natural sort order for each type.
> 
> The hard bit will be if we need to support non-natural sorts of strings. For example, if we need to support case-insensitive sorts or the different collations that databases support, I’d hope that we could start with the default of utf-8 byte ordering and expand as needed. If you are curious what the different collations look like - https://dba.stackexchange.com/questions/94887/what-is-the-impact-of-lc-ctype-on-a-postgresql-database <https://dba.stackexchange.com/questions/94887/what-is-the-impact-of-lc-ctype-on-a-postgresql-database> .
> 
> .. Owen
> 
>> On Jul 1, 2019, at 4:18 AM, Anton Okolnychyi <aokolnychyi@apple.com.INVALID <ma...@apple.com.INVALID>> wrote:
>> 
>> Hey folks,
>> 
>> Iceberg users are advised not only to partition their data but also to sort within partitions by columns in predicates in order to get the best performance. Right now, this process is mostly manual and performed by users before writing.
>> I am wondering if we should extend Iceberg metadata so that query engines can do this automatically in the future. We already have `sortColumns` in DataFile but they are not used.
>> Do we need a notion of sort columns in TableMetadata?
>> Spark’s sort spec is tightly coupled with bucketing and cannot be used alone. However, it seems reasonable to have partitioned and sorted tables without bucketing. How do we see this in Iceberg?
>> If we decide to have sort spec in the metadata, do we want to make it part of PartitionSpec or have it separately?
>> Thanks,
>> Anton
>> 
> 


Re: Sort Spec

Posted by Owen O'Malley <ow...@gmail.com>.
My thought is just like Iceberg has to define partitioning and bucketing, it has to define a canonical sort order. In particular, we can’t afford to have Spark, Presto, and Hive writing files in different orders. I believe the right approach is to define a sort order as a series of columns where each column is either ascending or descending and defining the natural sort order for each type.

The hard bit will be if we need to support non-natural sorts of strings. For example, if we need to support case-insensitive sorts or the different collations that databases support, I’d hope that we could start with the default of utf-8 byte ordering and expand as needed. If you are curious what the different collations look like - https://dba.stackexchange.com/questions/94887/what-is-the-impact-of-lc-ctype-on-a-postgresql-database <https://dba.stackexchange.com/questions/94887/what-is-the-impact-of-lc-ctype-on-a-postgresql-database> .
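
To make the difference concrete, here is a small standalone sketch (not tied
to any Iceberg or engine API) comparing UTF-8 byte order with a locale-aware
collation:

import java.nio.charset.StandardCharsets;
import java.text.Collator;
import java.util.Locale;

public class Utf8Ordering {
  // Compare two strings by their UTF-8 bytes, treated as unsigned values.
  static int compareUtf8(String a, String b) {
    byte[] x = a.getBytes(StandardCharsets.UTF_8);
    byte[] y = b.getBytes(StandardCharsets.UTF_8);
    int len = Math.min(x.length, y.length);
    for (int i = 0; i < len; i++) {
      int cmp = Integer.compare(x[i] & 0xff, y[i] & 0xff);
      if (cmp != 0) {
        return cmp;
      }
    }
    return Integer.compare(x.length, y.length);
  }

  public static void main(String[] args) {
    // byte ordering is case-sensitive and locale-independent
    System.out.println(compareUtf8("Zebra", "apple") < 0);   // true: 'Z' (0x5A) < 'a' (0x61)
    // an en_US collator orders the same strings the other way
    System.out.println(Collator.getInstance(Locale.US).compare("Zebra", "apple") > 0); // true
  }
}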

.. Owen

> On Jul 1, 2019, at 4:18 AM, Anton Okolnychyi <ao...@apple.com.INVALID> wrote:
> 
> Hey folks,
> 
> Iceberg users are advised not only to partition their data but also to sort within partitions by columns in predicates in order to get the best performance. Right now, this process is mostly manual and performed by users before writing.
> I am wondering if we should extend Iceberg metadata so that query engines can do this automatically in the future. We already have `sortColumns` in DataFile but they are not used.
> Do we need a notion of sort columns in TableMetadata?
> Spark’s sort spec is tightly coupled with bucketing and cannot be used alone. However, it seems reasonable to have partitioned and sorted tables without bucketing. How do we see this in Iceberg?
> If we decide to have sort spec in the metadata, do we want to make it part of PartitionSpec or have it separately?
> Thanks,
> Anton
>