Posted to commits@hudi.apache.org by "Alexey Kudinkin (Jira)" <ji...@apache.org> on 2022/04/06 22:31:00 UTC

[jira] [Created] (HUDI-3812) Metadata is not enabled by default on the Read Path

Alexey Kudinkin created HUDI-3812:
-------------------------------------

             Summary: Metadata is not enabled by default on the Read Path
                 Key: HUDI-3812
                 URL: https://issues.apache.org/jira/browse/HUDI-3812
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: Alexey Kudinkin
            Assignee: Alexey Kudinkin


While Metadata Table is enabled by default on the Write Path (in HoodieMetadataConfig), it's disabled by default on the Read Path (at least in Spark).

 

Now that Data Skipping is enabled by default (as of 0.10, actually), it fails out of the box, because Data Skipping now relies solely on the Metadata Table (MT) and Column Stats to function.

 

We need to revisit the current default configs to make sure they make sense, so that we either:
 1. Switch off Data Skipping by default as well (if we want to go ultra-conservative), or
 2. Switch on the Metadata Table by default on the Read Path.

 

Frankly, I can hardly imagine why we'd enable MT on the Write Path by default but not on the Read Path: this brings its cost into everyone's flows with none of the benefits out of the box. People will have to discover that it's switched off and switch it on themselves, which seems like something everyone is likely to do regardless.
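
Until the defaults are reconciled, a reader can set both options explicitly rather than rely on what a given release ships. Below is a minimal sketch of that idea. It assumes the config keys are `hoodie.metadata.enable` and `hoodie.enable.data.skipping` (please verify them against the Hudi version in use), and `hudi_read_options` is a hypothetical helper, not part of Hudi.

```python
# Hypothetical helper, not part of Hudi: builds explicit read options so the
# reader does not depend on per-release defaults.
# Assumed config keys; verify against the Hudi version in use.
METADATA_ENABLE = "hoodie.metadata.enable"            # Metadata Table on/off
DATA_SKIPPING_ENABLE = "hoodie.enable.data.skipping"  # Data Skipping on/off


def hudi_read_options(enable_metadata=True, enable_data_skipping=True):
    """Return read options, rejecting the combination described in this issue."""
    if enable_data_skipping and not enable_metadata:
        # Data Skipping relies on the Metadata Table and Column Stats.
        raise ValueError(
            "Data Skipping requires the Metadata Table; enable both or neither"
        )
    return {
        METADATA_ENABLE: str(enable_metadata).lower(),
        DATA_SKIPPING_ENABLE: str(enable_data_skipping).lower(),
    }
```

With PySpark, the resulting dict could be passed as `spark.read.format("hudi").options(**hudi_read_options()).load(base_path)`, where `base_path` stands for the table location.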


