Posted to commits@hudi.apache.org by "Alexey Kudinkin (Jira)" <ji...@apache.org> on 2022/09/19 20:30:00 UTC

[jira] [Closed] (HUDI-2814) Address issues w/ Z-order Layout Optimization

     [ https://issues.apache.org/jira/browse/HUDI-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Kudinkin closed HUDI-2814.
---------------------------------
    Resolution: Fixed

> Address issues w/ Z-order Layout Optimization
> ---------------------------------------------
>
>                 Key: HUDI-2814
>                 URL: https://issues.apache.org/jira/browse/HUDI-2814
>             Project: Apache Hudi
>          Issue Type: Task
>          Components: index
>            Reporter: Alexey Kudinkin
>            Assignee: Alexey Kudinkin
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.10.0
>
>
> During extensive testing, the following issues were discovered, which we plan to address in the upcoming PR:
>  * The data-skipping seq incorrectly handles queries that reference columns that are not Z-sorted (it simply ignores them, whereas it should abandon pruning altogether [1]).
>  * An exception within the file-pruning seq should not affect the overall query (in the worst case, it should fall back to a full scan).
>  * The merging seq prefers records from the old Z-index table, whereas it should prefer those from the new one.
>  * After the clustering columns change, the Z-index should simply overwrite the index (currently it does the opposite: it skips updating the index when the old and new tables diverge in schema).
>  * Incorrect type conversions (for example, Decimal is converted to Double).
> Additionally, we plan to strengthen the test suite of the current Z-index implementation, making sure that all critical Z-indexing flows have appropriate coverage.
> [1] Actually, with more advanced analysis we could still prune the search space, but this would require substantially more sophisticated analysis, which is beyond our current focus.
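
The first two items in the list above describe a conservative rule for data skipping. As a rough illustration only, it could be sketched as follows. This is not Hudi's implementation: `FileStats`, `candidate_files`, and the modeling of query filters as inclusive per-column ranges are hypothetical names and simplifications introduced here, and the sketch assumes the index holds min/max statistics for every file in the table.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Range = Tuple[int, int]  # inclusive (low, high)


@dataclass(frozen=True)
class FileStats:
    """Hypothetical index entry: per-file min/max for each indexed column."""
    path: str
    ranges: Dict[str, Range]


def candidate_files(
    all_files: List[str],
    index: List[FileStats],
    indexed_columns: Set[str],
    query_ranges: Dict[str, Range],
) -> List[str]:
    """Return the files a query has to read.

    query_ranges maps each filtered column to an inclusive (low, high) range.
    """
    # Conservative rule from the issue: if the query filters on any column
    # that is not covered by the index, abandon pruning altogether.
    if not set(query_ranges) <= indexed_columns:
        return list(all_files)

    try:
        # Keep a file only if every filtered range overlaps its min/max range.
        return [
            stats.path
            for stats in index
            if all(
                stats.ranges[col][0] <= high and low <= stats.ranges[col][1]
                for col, (low, high) in query_ranges.items()
            )
        ]
    except Exception:
        # A failure while pruning must not fail the query: fall back to a
        # full scan in the worst case.
        return list(all_files)


files = ["f1", "f2", "f3"]
index = [
    FileStats("f1", {"a": (0, 10)}),
    FileStats("f2", {"a": (11, 20)}),
    FileStats("f3", {"a": (21, 30)}),
]

pruned = candidate_files(files, index, {"a"}, {"a": (5, 12)})
unindexed = candidate_files(files, index, {"a"}, {"a": (5, 12), "c": (1, 2)})
# The index claims to cover "b" but has no statistics for it, so pruning
# raises internally and the function falls back to a full scan.
broken = candidate_files(files, index, {"a", "b"}, {"b": (1, 2)})
```

Here `pruned` keeps only the files whose range for `a` overlaps the filter, while `unindexed` and `broken` both return every file. Returning the full file list in those two cases trades performance for correctness, which matches the fallback behavior the issue asks for.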



--
This message was sent by Atlassian Jira
(v8.20.10#820010)