Posted to commits@hudi.apache.org by "sivabalan narayanan (Jira)" <ji...@apache.org> on 2022/01/21 06:21:00 UTC

[jira] [Updated] (HUDI-3091) Make simple index as the default hoodie.index.type

     [ https://issues.apache.org/jira/browse/HUDI-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-3091:
--------------------------------------
    Fix Version/s: 0.11.0

> Make simple index as the default hoodie.index.type
> --------------------------------------------------
>
>                 Key: HUDI-3091
>                 URL: https://issues.apache.org/jira/browse/HUDI-3091
>             Project: Apache Hudi
>          Issue Type: New Feature
>          Components: index
>            Reporter: Vinoth Govindarajan
>            Assignee: sivabalan narayanan
>            Priority: Blocker
>             Fix For: 0.11.0
>
>
> When performing upserts on derived datasets, we often run into OOM issues with the bloom filter, so we changed the index type of all those datasets to simple to resolve the issue.
>  
> Some of the tables were non-partitioned, and the bloom index is not the right choice for such tables.
> I'm proposing to make the simple index the default value; on a case-by-case basis, folks can choose the bloom index for the additional performance gains offered by bloom filters.
>  
> I agree that the performance will not be optimal, but for regular use cases the simple index would not break. It may give sub-optimal read/write performance, but it won't break any ingestion/derived jobs.
>  
>  
>  
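For illustration, a minimal sketch of how a job might pin the index type explicitly rather than rely on the default, as the description above suggests doing case by case. The helper name, table name, and field names are hypothetical; the option keys are standard Hudi write configs, and the set of accepted index types below is deliberately limited to the ones discussed here.

```python
# Sketch only: hudi_upsert_options, "derived_table", "id" and "updated_at"
# are hypothetical names chosen for this example.

# Only the index types discussed in this issue (plus their global variants).
SUPPORTED_INDEX_TYPES = {"SIMPLE", "GLOBAL_SIMPLE", "BLOOM", "GLOBAL_BLOOM"}


def hudi_upsert_options(table_name, record_key, precombine_field,
                        index_type="SIMPLE"):
    """Build Hudi writer options for an upsert with an explicit index type."""
    index_type = index_type.upper()
    if index_type not in SUPPORTED_INDEX_TYPES:
        raise ValueError(f"unsupported index type: {index_type}")
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        # Set explicitly so the job does not depend on the release default.
        "hoodie.index.type": index_type,
    }


options = hudi_upsert_options("derived_table", "id", "updated_at")

# Assuming a SparkSession with the Hudi bundle available, the options
# would be passed to the writer roughly like this:
# df.write.format("hudi").options(**options).mode("append").save(base_path)
```

A table that benefits from bloom filters could opt in by passing index_type="BLOOM" to the same helper.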



--
This message was sent by Atlassian Jira
(v8.20.1#820001)