Posted to commits@hudi.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/07/26 10:10:00 UTC
[jira] [Updated] (HUDI-4210) Create custom hbase index to solve data skew issue on hbase regions
[ https://issues.apache.org/jira/browse/HUDI-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HUDI-4210:
---------------------------------
Labels: pull-request-available (was: )
> Create custom hbase index to solve data skew issue on hbase regions
> -------------------------------------------------------------------
>
> Key: HUDI-4210
> URL: https://issues.apache.org/jira/browse/HUDI-4210
> Project: Apache Hudi
> Issue Type: Improvement
> Components: index
> Reporter: Jian Feng
> Assignee: Jian Feng
> Priority: Major
> Labels: pull-request-available
>
> In our production environment, many tables use auto-increment ids. When the HBase index is used, this causes data skew across HBase regions, because monotonically increasing keys keep landing in the same region. It would be better to add random prefixes to the keys in HBase while keeping the original ordering in Hudi itself.
> We can make a small modification to the HBase index: add the prefix whenever the index queries or updates HBase. In
> this way, the key stored in HBase differs from the record key in Hudi, but this
> logic stays transparent to the business logic. I have adopted this method in
> our production environment. The custom index can be specified with the
> withIndexClass config in IndexConfig.
>
> Another effort, driven by Uber engineers [https://github.com/apache/hudi/pull/3508], could
> technically solve the issue by reading HFiles directly, but it is still in progress. The approach proposed here should resolve the issue immediately.
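The following Java sketch illustrates the prefixing idea described above. It assumes the prefix is derived deterministically from a CRC32 hash of the Hudi record key rather than drawn at random. With a deterministic prefix, the same key maps to the same HBase row both when the index queries HBase and when it updates it. The class name `SaltedRowKey`, the methods `salt`/`unsalt`, the bucket count `NUM_BUCKETS`, and the `NN_` prefix format are hypothetical illustrations. They are not part of Hudi's HBase index or the reporter's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper illustrating row-key salting; not a Hudi API.
public class SaltedRowKey {
    // Hypothetical number of prefix buckets, roughly the number of regions to spread writes over.
    public static final int NUM_BUCKETS = 16;

    // Prepend a two-digit bucket number derived from a CRC32 hash of the record key.
    // The hash makes the prefix deterministic, so lookups and updates compute the same HBase row key.
    public static String salt(String recordKey) {
        CRC32 crc = new CRC32();
        crc.update(recordKey.getBytes(StandardCharsets.UTF_8));
        long bucket = crc.getValue() % NUM_BUCKETS;
        return String.format("%02d_%s", bucket, recordKey);
    }

    // Remove the "NN_" prefix to recover the original Hudi record key.
    // The first '_' is always the separator, so record keys may themselves contain '_'.
    public static String unsalt(String hbaseRowKey) {
        return hbaseRowKey.substring(hbaseRowKey.indexOf('_') + 1);
    }

    public static void main(String[] args) {
        // Consecutive auto-increment ids get scattered prefixes in HBase,
        // while the Hudi-side key is unchanged.
        for (long id = 1000; id < 1005; id++) {
            String key = Long.toString(id);
            String rowKey = salt(key);
            System.out.println(key + " -> " + rowKey + " -> " + unsalt(rowKey));
        }
    }
}
```

Hashing instead of using a truly random prefix keeps the mapping recomputable, so no extra lookup table is needed. The trade-off is that consecutive ids are no longer adjacent in HBase. This should matter little if the index only reads and writes individual keys. In the design described in the issue, logic like this would live inside a custom index class plugged in through the withIndexClass option.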
--
This message was sent by Atlassian Jira
(v8.20.10#820010)