Posted to issues@impala.apache.org by "Norbert Luksa (Jira)" <ji...@apache.org> on 2020/06/30 09:11:00 UTC

[jira] [Resolved] (IMPALA-8755) Implement Z-ordering for Impala

     [ https://issues.apache.org/jira/browse/IMPALA-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Norbert Luksa resolved IMPALA-8755.
-----------------------------------
    Target Version: Impala 4.0
        Resolution: Implemented

> Implement Z-ordering for Impala
> -------------------------------
>
>                 Key: IMPALA-8755
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8755
>             Project: IMPALA
>          Issue Type: New Feature
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Norbert Luksa
>            Priority: Major
>
> Implement Z-ordering for Impala: [https://en.wikipedia.org/wiki/Z-order_curve]
> A Z-order curve defines an ordering on multi-dimensional data. Data sorted that way can be filtered efficiently using min/max statistics on the columns participating in the ordering.
> Impala currently only supports lexicographic ordering via the SORT BY clause. This strongly favors the first column: given the clause "SORT BY A, B, C", A will be totally ordered (hence filtering on A will be very efficient), but the values of B and C will be scattered throughout the data set (hence filtering on B or C will bring little benefit).
> We could add a new clause, e.g. a "ZSORT BY" clause to Impala that writes the data in Z-order.
> "ZSORT BY A, B, C" would cluster the rows so that filtering on A, B, or C would be equally efficient.
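
The difference between the two orderings can be illustrated with a small, self-contained sketch. It is not Impala code and says nothing about how the feature is implemented: the helper names (interleave_bits, blocks_to_scan), the 8x8 grid of (A, B) values, and the block size of 16 rows are all assumptions chosen for illustration. The sketch builds a Z-order key by interleaving the bits of each column, sorts the rows both ways, and counts how many blocks a min/max filter would still have to read for an equality predicate on each column.

```python
from itertools import product


def interleave_bits(coords, bits=8):
    """Build a Z-order (Morton) key by interleaving the low `bits` bits
    of each non-negative integer in `coords` (hypothetical helper)."""
    key = 0
    n = len(coords)
    for bit in range(bits):
        for dim, value in enumerate(coords):
            # Bit `bit` of dimension `dim` lands at position bit * n + dim.
            key |= ((value >> bit) & 1) << (bit * n + dim)
    return key


def blocks_to_scan(rows, col, value, block_size=16):
    """Count the blocks whose [min, max] range for column `col` contains
    `value`, i.e. the blocks a min/max filter cannot skip."""
    hits = 0
    for start in range(0, len(rows), block_size):
        block = rows[start:start + block_size]
        lo = min(r[col] for r in block)
        hi = max(r[col] for r in block)
        if lo <= value <= hi:
            hits += 1
    return hits


# Assumed toy data set: every (A, B) pair with A, B in 0..7.
rows = list(product(range(8), repeat=2))

lex_sorted = sorted(rows)  # like SORT BY A, B
z_sorted = sorted(rows, key=lambda r: interleave_bits(r, bits=3))

for name, data in (("lexicographic", lex_sorted), ("z-order", z_sorted)):
    # Blocks read for the predicates A = 5 and B = 5, respectively.
    print(name, blocks_to_scan(data, 0, 5), blocks_to_scan(data, 1, 5))
# prints:
# lexicographic 1 4
# z-order 2 2
```

In this toy setup the lexicographic order reads 1 of 4 blocks when filtering on A but all 4 when filtering on B, while the Z-order reads 2 of 4 blocks for either column. That is the trade-off described above: somewhat less selective on the first column, but equally selective on every participating column.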



--
This message was sent by Atlassian Jira
(v8.3.4#803005)