Posted to commits@hudi.apache.org by "Alexey Kudinkin (Jira)" <ji...@apache.org> on 2022/04/07 18:34:00 UTC

[jira] [Updated] (HUDI-3812) Make sure Data Skipping respects Metadata Table config

     [ https://issues.apache.org/jira/browse/HUDI-3812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Kudinkin updated HUDI-3812:
----------------------------------
    Summary: Make sure Data Skipping respects Metadata Table config  (was: Metadata is not enabled by default on the Read Path)

> Make sure Data Skipping respects Metadata Table config
> ------------------------------------------------------
>
>                 Key: HUDI-3812
>                 URL: https://issues.apache.org/jira/browse/HUDI-3812
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Alexey Kudinkin
>            Assignee: Alexey Kudinkin
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.11.0
>
>
> While the Metadata Table is enabled by default on the Write Path (in HoodieMetadataConfig), it's disabled by default on the Read Path (at least in Spark).
>  
> Now, with Data Skipping enabled by default (actually as of 0.10), Data Skipping fails, because it now relies solely on the MT and Column Stats to function.
>  
> We need to revisit the current default configs to make sure they make sense, so that we either:
>  # Switch off Data Skipping by default as well (if we want to go ultra-conservative), or
>  # Switch on the Metadata Table by default.
>  
> Frankly, I can hardly imagine why we'd enable the MT by default on the Write Path but not on the Read Path: that brings its cost into everyone's flows with none of its benefits. Out of the box, people would have to discover that it's switched off and switch it on themselves, which everyone is likely to end up doing anyway.
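To make the default mismatch concrete, here is a minimal sketch in plain Python (no Spark, no Hudi) of the two checks the issue implies: whether a set of read-path defaults is self-consistent, and how a read could fall back to a full scan instead of failing when the MT is off. The key strings mirror Hudi's `hoodie.metadata.enable` and `hoodie.enable.data.skipping` options, but the default values are taken from the description above, not from any release. `defaults_consistent`, `files_to_scan` and the `prune` callable are hypothetical helpers for illustration only.

```python
# Key strings mirror Hudi's config keys. The default values below are the
# Read Path defaults as described in this issue (hypothetical snapshot).
METADATA_ENABLE = "hoodie.metadata.enable"
DATA_SKIPPING_ENABLE = "hoodie.enable.data.skipping"

CURRENT_DEFAULTS = {METADATA_ENABLE: False, DATA_SKIPPING_ENABLE: True}
OPTION_1_DEFAULTS = {METADATA_ENABLE: False, DATA_SKIPPING_ENABLE: False}  # switch off Data Skipping
OPTION_2_DEFAULTS = {METADATA_ENABLE: True, DATA_SKIPPING_ENABLE: True}    # switch on the MT


def defaults_consistent(defaults):
    """Hypothetical check: Data Skipping may default to on only if the MT does too."""
    return (not defaults[DATA_SKIPPING_ENABLE]) or defaults[METADATA_ENABLE]


def files_to_scan(all_files, user_opts, prune, defaults=CURRENT_DEFAULTS):
    """Hypothetical helper returning the files a query has to read.

    `prune` stands in for the Column Stats lookup in the Metadata Table.
    It is called only when both Data Skipping and the MT are enabled;
    otherwise the read falls back to a full scan instead of failing.
    """
    opts = {**defaults, **user_opts}  # user options override the defaults
    if opts[DATA_SKIPPING_ENABLE] and opts[METADATA_ENABLE]:
        return prune(all_files)
    return list(all_files)


if __name__ == "__main__":
    files = ["f1.parquet", "f2.parquet", "f3.parquet"]
    keep_first = lambda fs: fs[:1]  # toy pruning: pretend stats exclude f2, f3

    print(defaults_consistent(CURRENT_DEFAULTS))   # False
    print(defaults_consistent(OPTION_1_DEFAULTS))  # True
    print(defaults_consistent(OPTION_2_DEFAULTS))  # True

    # MT is off by default on the Read Path, so nothing is pruned.
    print(files_to_scan(files, {}, keep_first))
    # Switching the MT on explicitly lets Data Skipping prune.
    print(files_to_scan(files, {METADATA_ENABLE: True}, keep_first))
```

Both proposed options pass the consistency check; only option 2 lets pruning happen without users having to discover and flip the MT flag themselves.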


