Posted to dev@iceberg.apache.org by Bhavyam Kamal <bh...@dremio.com> on 2021/07/21 11:49:48 UTC

Proposal: Z-Ordering in Iceberg

Hi Everyone,

I would like to discuss and get feedback on the following proposal for
Z-Ordering in the Iceberg Sync today:

https://docs.google.com/document/d/1UfGxaB7qlrGzzMk9pBm03oKPOkm-jk-NQVQQvHP-0Bc/edit?usp=sharing

Please let me know if you have any thoughts or suggestions by adding
comments in the doc.

Thanks and regards,
Bhavyam

Re: Proposal: Z-Ordering in Iceberg

Posted by Russell Spitzer <ru...@gmail.com>.
Yep! We discussed this yesterday.

The general plan going forward is:

Phase 1:
Merge sort-based compaction.
Allow compaction/rewrite of data files using a sort based on a space-filling curve. No planning with, or persisting of, metrics in this phase.

Phase 2:
Support for transforms with multiple arguments and, possibly, parameterization.
Store metrics for curve values in the data file metrics, along with the transform used when writing the file.
Query planning using these metrics.
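To make Phase 1 and the metrics half of Phase 2 concrete, here is a minimal Python sketch. It is not Iceberg code and not the design in the linked doc. It assumes two columns that have already been mapped to unsigned 16-bit integers; real columns (strings, signed integers, floats) would first need an order-preserving mapping to unsigned bits, which is omitted. The names `z_value`, `DataFileMetrics`, and `rewrite_with_zorder` are hypothetical. Rows are sorted by their Z-order value (bit interleaving), chunked into "files", and each file records its min/max curve value plus the id of the sort order that produced it.

```python
from dataclasses import dataclass

BITS = 16  # assumed fixed bit width per column (hypothetical)


def z_value(x: int, y: int, bits: int = BITS) -> int:
    """Morton / Z-order code: interleave the bits of x and y.

    Bit i of x goes to position 2*i + 1 and bit i of y to position 2*i,
    so x supplies the more significant bit of each pair.
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i + 1)
        z |= ((y >> i) & 1) << (2 * i)
    return z


@dataclass
class DataFileMetrics:
    """Hypothetical per-file metrics, mirroring the sketch in this thread."""
    z_min: int
    z_max: int
    sort_order_id: int


def rewrite_with_zorder(rows, rows_per_file, sort_order_id=2):
    """Sort (x, y) rows by Z-order value, split into files, record z bounds."""
    ordered = sorted(rows, key=lambda r: z_value(r[0], r[1]))
    files = []
    for start in range(0, len(ordered), rows_per_file):
        chunk = ordered[start:start + rows_per_file]
        zs = [z_value(x, y) for x, y in chunk]
        files.append((chunk, DataFileMetrics(min(zs), max(zs), sort_order_id)))
    return files
```

Because the chunks come from one global sort, their [z_min, z_max] ranges do not overlap (apart from possibly sharing a boundary value when equal z-values straddle a split), which is what makes per-file bounds useful for pruning later.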


In my mind, the final picture looks like this:

DataFileMetrics { zMax = ?, zMin = ?, sortOrder = 1 }

Table Metadata {
  SortOrder 1 = "HilbertCurve(x, y, z) + Options { }"
  SortOrder 2 = "ZOrder(x,y) + Options(y using 128 bytes)" 
}

Or something like that. This way, for any given data file, we can generate filters based on the ordering function used to write that file, and we can update our definitions of these functions over time, etc.
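The filter-generation idea can be sketched in Python under the same assumptions (two unsigned 16-bit columns, Z-order by bit interleaving). The `SORT_ORDERS` registry stands in for the sort-order definitions in table metadata; it is hypothetical, as are `may_contain` and `plan`. Only order 2 (the Z-order from the example above) is implemented, so files written with order 1 (the Hilbert curve) cannot be pruned and are always scanned. For a box predicate on x and y, bit interleaving is monotone in each coordinate, so the curve values of the box's lower and upper corners bound every curve value inside the box; a file whose [zMin, zMax] range misses that interval can be skipped.

```python
from dataclasses import dataclass

BITS = 16  # assumed fixed bit width per column (hypothetical)


def z_value(x: int, y: int, bits: int = BITS) -> int:
    """Z-order code: bit i of x -> position 2*i + 1, bit i of y -> 2*i."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i + 1)
        z |= ((y >> i) & 1) << (2 * i)
    return z


@dataclass
class DataFileMetrics:
    """Hypothetical per-file metrics: curve bounds plus the sort order used."""
    path: str
    z_min: int
    z_max: int
    sort_order_id: int


# Hypothetical stand-in for table metadata: sort order id -> curve function.
# Order 1 (Hilbert) is deliberately not implemented in this sketch.
SORT_ORDERS = {2: z_value}


def may_contain(f: DataFileMetrics, x_lo: int, x_hi: int, y_lo: int, y_hi: int) -> bool:
    """Return False only when the file provably has no rows in the box."""
    curve = SORT_ORDERS.get(f.sort_order_id)
    if curve is None:
        return True  # unknown ordering function: cannot prune, must scan
    # Monotone in each coordinate, so the corners bound all z-values in the box.
    box_min = curve(x_lo, y_lo)
    box_max = curve(x_hi, y_hi)
    return not (f.z_max < box_min or f.z_min > box_max)


def plan(files, x_lo, x_hi, y_lo, y_hi):
    """Keep only the files that may contain rows matching the box predicate."""
    return [f.path for f in files if may_contain(f, x_lo, x_hi, y_lo, y_hi)]
```

The check is conservative: a file whose z range overlaps the corner interval may still hold no matching rows, because a contiguous Z-order range also covers points outside the box. It never skips a file that could match, and keying the curve function by sort order id is what lets the definitions change over time without invalidating older files.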

I think the main spec change here is figuring out how to store these transforms with more information (and multiple arguments).

> On Jul 22, 2021, at 8:37 AM, Piotr Findeisen <pi...@starburstdata.com> wrote:
> 
> Hi Bhavyam,
> 
> Has this been discussed on the sync?
> Ryan, will it be making it into the table metadata spec?
> 
> Best,
> PF


Re: Proposal: Z-Ordering in Iceberg

Posted by Piotr Findeisen <pi...@starburstdata.com>.
Hi Bhavyam,

Has this been discussed on the sync?
Ryan, will it be making it into the table metadata spec?

Best,
PF