Posted to commits@hudi.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/07/26 10:10:00 UTC

[jira] [Updated] (HUDI-4210) Create custom hbase index to solve data skew issue on hbase regions

     [ https://issues.apache.org/jira/browse/HUDI-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-4210:
---------------------------------
    Labels: pull-request-available  (was: )

> Create custom hbase index to solve data skew issue on hbase regions
> -------------------------------------------------------------------
>
>                 Key: HUDI-4210
>                 URL: https://issues.apache.org/jira/browse/HUDI-4210
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: index
>            Reporter: Jian Feng
>            Assignee: Jian Feng
>            Priority: Major
>              Labels: pull-request-available
>
> In our production environment, many tables use an auto-increment id as their key. With the HBase index, these sequential keys cause data skew across HBase regions. It would be better to add random prefixes to the keys while keeping the original ordering in Hudi itself.
> A small modification to the HBase index can achieve this: add the prefix at the point where the index queries and updates HBase. The
> pk in HBase will then differ from the one in Hudi, but this
> logic is transparent to business logic. I have already adopted this method in
> our prod environment. The custom index can be specified with the withIndexClass config in IndexConfig.
>  
> Another effort, driven by Uber engineers [https://github.com/apache/hudi/pull/3508], could
> technically solve the issue by reading HFiles directly, but it is still in progress, whereas the approach proposed here should resolve the issue immediately.
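
The prefixing idea described in the issue can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the pull request: the class SaltedRowKey, its methods, and the bucket count are hypothetical names chosen here, and the prefix is derived from a hash of the key (rather than being truly random) so that the same prefix can be recomputed when the index later looks the key up in HBase.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/**
 * Hypothetical helper (not part of Hudi or HBase): maps a Hudi record key
 * to a prefixed HBase row key and back.
 */
final class SaltedRowKey {
    // Two digits plus the '_' separator.
    private static final int PREFIX_LENGTH = 3;

    private SaltedRowKey() {}

    /**
     * Prepends a two-digit bucket prefix computed from the key itself.
     * The prefix is deterministic, so a lookup can rebuild the same row key.
     */
    static String salt(String recordKey, int buckets) {
        if (buckets < 1 || buckets > 100) {
            throw new IllegalArgumentException("buckets must be in [1, 100]");
        }
        int hash = 0;
        for (byte b : recordKey.getBytes(StandardCharsets.UTF_8)) {
            hash = 31 * hash + (b & 0xff);
        }
        // floorMod keeps the bucket non-negative even if the hash overflows.
        int bucket = Math.floorMod(hash, buckets);
        return String.format(Locale.ROOT, "%02d_%s", bucket, recordKey);
    }

    /** Strips the prefix to recover the key as Hudi sees it. */
    static String unsalt(String rowKey) {
        return rowKey.substring(PREFIX_LENGTH);
    }
}
```

In a custom index, salt would be applied just before the key is sent to HBase for a get or put, and unsalt to any row key read back, so the rest of the write path only ever sees the original key. With this particular hash, neighbouring auto-increment ids such as "1000" and "1001" fall into different buckets, which is what spreads the load across regions.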



--
This message was sent by Atlassian Jira
(v8.20.10#820010)