Posted to commits@hudi.apache.org by "Jian Feng (Jira)" <ji...@apache.org> on 2022/06/08 15:49:00 UTC
[jira] [Created] (HUDI-4210) Create custom hbase index to solve data skew issue on hbase regions
Jian Feng created HUDI-4210:
-------------------------------
Summary: Create custom hbase index to solve data skew issue on hbase regions
Key: HUDI-4210
URL: https://issues.apache.org/jira/browse/HUDI-4210
Project: Apache Hudi
Issue Type: Improvement
Components: index
Reporter: Jian Feng
Assignee: Jian Feng
In our production environment, many tables use auto-increment ids, so using the HBase index causes data skew across HBase regions. It would be better to have a way, within Hudi itself, to add random prefixes to the keys while still keeping their ordering.
We could make a small modification to the HBase index: add the prefix
whenever the index queries or updates HBase. This way, the primary key in
HBase will differ from the one in Hudi, but that difference will be
transparent to business logic. I have adopted this method in our production
environment. The custom index can be specified with the withIndexClass
config in IndexConfig.
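
A minimal sketch of the prefixing idea described above, not the code running in production. It assumes the "random" prefix is in practice derived deterministically from a hash of the record key, so that lookups can recompute the same HBase row key. The class name SaltedRowKey, the bucket count of 16, and the "_" separator are all hypothetical choices made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper: maps a Hudi record key to a prefixed HBase row key and back.
final class SaltedRowKey {
    // Assumed number of prefix buckets; in practice this would be tuned
    // to the number of HBase regions.
    static final int NUM_BUCKETS = 16;

    // Build the HBase row key by prepending a two-digit bucket derived from
    // a hash of the record key. The same record key always gets the same prefix.
    static String toHBaseRowKey(String hudiRecordKey) {
        int hash = 0;
        for (byte b : hudiRecordKey.getBytes(StandardCharsets.UTF_8)) {
            hash = 31 * hash + (b & 0xff);
        }
        int bucket = Math.floorMod(hash, NUM_BUCKETS);
        return String.format(Locale.ROOT, "%02d_%s", bucket, hudiRecordKey);
    }

    // Strip the prefix so the caller only ever sees the original Hudi key.
    static String toHudiRecordKey(String hbaseRowKey) {
        return hbaseRowKey.substring(hbaseRowKey.indexOf('_') + 1);
    }
}
```

With a scheme like this, consecutive auto-increment ids tend to land in different buckets, while keys that share a bucket still sort by their original value. A custom index would apply toHBaseRowKey before every get or put and toHudiRecordKey on the way back, which is what keeps the prefix invisible to business logic.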
Another effort, driven by Uber engineers [https://github.com/apache/hudi/pull/3508], could
technically solve the issue by reading HFiles directly, but it is still in progress; the approach proposed here should resolve the issue immediately.
--
This message was sent by Atlassian Jira
(v8.20.7#820007)