Posted to commits@hudi.apache.org by "Alexey Kudinkin (Jira)" <ji...@apache.org> on 2022/09/19 20:30:00 UTC
[jira] [Closed] (HUDI-2814) Address issues w/ Z-order Layout Optimization
[ https://issues.apache.org/jira/browse/HUDI-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexey Kudinkin closed HUDI-2814.
---------------------------------
Resolution: Fixed
> Address issues w/ Z-order Layout Optimization
> ---------------------------------------------
>
> Key: HUDI-2814
> URL: https://issues.apache.org/jira/browse/HUDI-2814
> Project: Apache Hudi
> Issue Type: Task
> Components: index
> Reporter: Alexey Kudinkin
> Assignee: Alexey Kudinkin
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.10.0
>
>
> During extensive testing, the following issues were discovered, which we're planning to address in the upcoming PR:
> * Data-skipping seq incorrectly handles cases when columns that are not Z-sorted are present in the query (it simply ignores this fact, while it should abandon pruning altogether[1])
> * An exception w/in the file-pruning seq should not affect the overall query (in the worst case it should fall back to a full scan)
> * Merging seq prefers records from the old Z-index table, while it should prefer those from the new one.
> * After the clustering columns change, Z-index should simply overwrite the index (currently it does the opposite: it skips updating the index when the old and new tables diverge in schema)
> * Incorrect type conversions (for example, Decimal is converted to Double)
> Additionally, we're planning to beef up the test suite of the current Z-index implementation, making sure that all critical flows of Z-indexing have appropriate coverage.
> [1] Actually, with more advanced analysis we could still prune the search space, but this would require a substantially more sophisticated analysis, which is beyond our current focus
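
The first two items in the description (abandon pruning when the query references a column that is not Z-sorted, and fall back to a full scan when pruning itself fails) can be illustrated with a minimal Python sketch. This is not Hudi code: `FileStats`, `prune_files`, and the flat "column -> inclusive range" predicate shape are hypothetical names and simplifications introduced here, and the sketch follows the conservative rule stated in the description rather than the more advanced analysis mentioned in [1].

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class FileStats:
    """Hypothetical per-file min/max statistics for the indexed columns."""
    path: str
    min_max: Dict[str, Tuple[int, int]]  # column -> (min, max)


def prune_files(
    files: List[FileStats],
    predicates: Dict[str, Tuple[int, int]],  # column -> inclusive (lo, hi), AND-ed
    z_ordered_columns: Set[str],
) -> List[str]:
    """Return the paths of files that may contain matching rows."""
    all_paths = [f.path for f in files]

    # Conservative rule from the description: if the query filters on any
    # column that is not Z-sorted, abandon pruning altogether.
    if not set(predicates) <= z_ordered_columns:
        return all_paths

    try:
        kept = []
        for f in files:
            # Keep the file only if every predicate range overlaps the
            # file's [min, max] range for that column.
            if all(
                f.min_max[col][0] <= hi and lo <= f.min_max[col][1]
                for col, (lo, hi) in predicates.items()
            ):
                kept.append(f.path)
        return kept
    except Exception:
        # A failure inside pruning (here, e.g., missing stats for a column)
        # must not fail the query: fall back to scanning every file.
        return all_paths
```

With two files whose `x` ranges are (0, 10) and (20, 30), a filter on `x` in (5, 8) keeps only the first file; adding a filter on a non-indexed column `y`, or hitting a file with no stats for `x`, returns both paths unchanged.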
--
This message was sent by Atlassian Jira
(v8.20.10#820010)