Posted to commits@hudi.apache.org by "Jian Feng (Jira)" <ji...@apache.org> on 2022/06/08 15:49:00 UTC

[jira] [Created] (HUDI-4210) Create custom hbase index to solve data skew issue on hbase regions

Jian Feng created HUDI-4210:
-------------------------------

             Summary: Create custom hbase index to solve data skew issue on hbase regions
                 Key: HUDI-4210
                 URL: https://issues.apache.org/jira/browse/HUDI-4210
             Project: Apache Hudi
          Issue Type: Improvement
          Components: index
            Reporter: Jian Feng
            Assignee: Jian Feng


In our production environment, many tables use an auto-increment id as the key. With the HBase index, this causes data skew across HBase regions. It would be better to find a way to add random prefixes while still keeping the ordering in Hudi itself.

We could make a small modification to the HBase index: add the prefix
whenever the index queries or updates HBase. This way, the pk in HBase will
differ from the one in Hudi, but this stays transparent to business logic.
I have adopted this method in our prod environment. The custom index can be
specified with the withIndexClass config in IndexConfig.
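
A minimal sketch of the prefixing idea, not the actual implementation used in production. It assumes the prefix is derived deterministically from a hash of the record key (rather than being truly random), so that the same HBase row key can be recomputed at lookup time. The class name, the method names, the bucket count and the "NN_" prefix format are all hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Hudi: maps a Hudi record key to a
// prefixed HBase row key and back.
final class SaltedKeySketch {
    private SaltedKeySketch() {}

    // Assumption: the prefix is a bucket number computed from a hash of the
    // record key, so consecutive auto-increment ids land in different buckets
    // while the mapping stays repeatable for later lookups.
    public static String addPrefix(String recordKey, int numBuckets) {
        int hash = 0;
        for (byte b : recordKey.getBytes(StandardCharsets.UTF_8)) {
            hash = 31 * hash + (b & 0xff);
        }
        int bucket = Math.floorMod(hash, numBuckets);
        // Zero-padded to two digits, so this sketch assumes numBuckets <= 100.
        return String.format("%02d_%s", bucket, recordKey);
    }

    // Recovers the original record key. The prefix contains only digits, so
    // the first underscore is always the separator.
    public static String stripPrefix(String hbaseRowKey) {
        return hbaseRowKey.substring(hbaseRowKey.indexOf('_') + 1);
    }

    public static void main(String[] args) {
        for (long id = 1000; id < 1005; id++) {
            String key = Long.toString(id);
            String rowKey = addPrefix(key, 16);
            System.out.println(key + " -> " + rowKey + " -> " + stripPrefix(rowKey));
        }
    }
}
```

In this sketch the prefix is applied only on the HBase side (puts and gets), and Hudi keeps the unprefixed key, which matches the description of the pk in HBase differing from the one in Hudi.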

 

Another effort, driven by Uber engineers [https://github.com/apache/hudi/pull/3508], could
technically solve the issue by reading HFiles directly, but it is still in progress. The approach proposed here should resolve the issue immediately.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)