Posted to issues@hbase.apache.org by "zhuobin zheng (Jira)" <ji...@apache.org> on 2021/01/15 04:13:00 UTC

[jira] [Updated] (HBASE-25510) Optimize TableName.valueOf from O(n) to O(1). We can get benefits when the number of tables in the cluster is greater than dozens

     [ https://issues.apache.org/jira/browse/HBASE-25510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhuobin zheng updated HBASE-25510:
----------------------------------
    Description: 
Currently, TableName.valueOf looks up the TableName object in its cache with a linear scan (code shown below), so it is too slow when a cluster has thousands of tables.
{code:java}
// Current lookup: compare every cached TableName until both qualifier and namespace match
for (TableName tn : tableCache) {
  if (Bytes.equals(tn.getQualifier(), qns) && Bytes.equals(tn.getNamespace(), bns)) {
    return tn;
  }
}{code}
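A minimal, self-contained sketch of the linear lookup above, to show why its cost grows with the number of cached tables. `LinearNameCache` and its nested `Name` class are hypothetical stand-ins for TableName and its cache, not HBase code; `Arrays.equals` takes the place of `Bytes.equals`, and the `comparisons` counter exists only to make the O(n) cost visible.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LinearNameCache {
  // Hypothetical stand-in for TableName: namespace and qualifier kept as raw bytes.
  static final class Name {
    final byte[] namespace;
    final byte[] qualifier;

    Name(byte[] namespace, byte[] qualifier) {
      this.namespace = namespace;
      this.qualifier = qualifier;
    }
  }

  private final List<Name> cache = new ArrayList<>();
  long comparisons = 0; // cached entries examined, for illustration only

  // O(n) lookup: scan every cached entry until qualifier and namespace both match.
  // synchronized keeps the sketch simple; it also serializes all callers.
  synchronized Name valueOf(byte[] namespace, byte[] qualifier) {
    for (Name n : cache) {
      comparisons++;
      if (Arrays.equals(n.qualifier, qualifier) && Arrays.equals(n.namespace, namespace)) {
        return n;
      }
    }
    Name created = new Name(namespace, qualifier);
    cache.add(created);
    return created;
  }

  static byte[] bytes(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    LinearNameCache names = new LinearNameCache();
    int tables = 10_000;
    for (int i = 0; i < tables; i++) {
      names.valueOf(bytes("default"), bytes("t" + i));
    }
    names.comparisons = 0;
    // The most recently added name sits at the end, so every entry is examined.
    names.valueOf(bytes("default"), bytes("t" + (tables - 1)));
    System.out.println(names.comparisons); // 10000
  }
}
```

With 10,000 cached names, one lookup of the last-added name examines all 10,000 entries, and that cost is paid again on every call.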
I propose storing the objects in a hash table keyed by the full table name string, so a lookup takes O(1) on average instead of a full scan. For example:
{code:java}
// Proposed lookup: a single hash-table get keyed by the table name string
TableName oldTable = tableCache.get(nameAsStr);{code}
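A matching sketch of the hash-based approach, again with hypothetical class names rather than the actual HBase patch. It assumes the key is the `namespace:qualifier` string (HBase does not allow ':' inside a namespace or qualifier, so the key is unambiguous) and uses `ConcurrentHashMap.computeIfAbsent`, which applies the mapping function at most once per key, so concurrent lookups of the same name still share one instance.

```java
import java.util.concurrent.ConcurrentHashMap;

public class HashedNameCache {
  // Hypothetical stand-in for TableName.
  static final class Name {
    final String namespace;
    final String qualifier;

    Name(String namespace, String qualifier) {
      this.namespace = namespace;
      this.qualifier = qualifier;
    }
  }

  private final ConcurrentHashMap<String, Name> cache = new ConcurrentHashMap<>();

  // Expected O(1) lookup: hash the full name once instead of scanning every entry.
  // computeIfAbsent creates at most one Name per key, even under concurrent callers.
  Name valueOf(String namespace, String qualifier) {
    String key = namespace + ":" + qualifier;
    return cache.computeIfAbsent(key, k -> new Name(namespace, qualifier));
  }

  int size() {
    return cache.size();
  }

  public static void main(String[] args) {
    HashedNameCache names = new HashedNameCache();
    for (int i = 0; i < 10_000; i++) {
      names.valueOf("default", "t" + i);
    }
    Name a = names.valueOf("default", "t42");
    Name b = names.valueOf("default", "t42");
    System.out.println(a == b);         // true: the cached instance is reused
    System.out.println(names.size());   // 10000
  }
}
```

The remaining per-call cost is building and hashing the key string, which depends on the length of the name, not on how many tables the cluster has.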
 

Our cluster has tens of thousands of tables (most of them KYLIN tables).
We found that in the following two cases, TableName.valueOf severely limits our performance.

Common premise: tens of thousands of tables in the cluster.
Cause: TableName.valueOf performs poorly because it traverses the whole cache linearly.
  
Case 1. Replication
Premise 1: one table receives a high QPS of small, non-batched writes, which produces a very large number of WAL entries.

Premise 2: deserializing each WAL entry calls TableName.valueOf.

Result: replication gets stuck and WAL files pile up.

 

Case 2. Active master startup

NamespaceStateManager initialization has to initialize every RegionInfo, and each RegionInfo initialization calls TableName.valueOf, so a slow TableName.valueOf adds noticeably to master startup time.
  

  was:
There are tens of thousands of tables in our cluster (most of them KYLIN tables).
We found that in the following two cases, TableName.valueOf severely limits our performance.

Common premise: tens of thousands of tables in the cluster.
Cause: TableName.valueOf performs poorly because it traverses the whole cache linearly.

Case 1. Replication
Premise: one table receives a high QPS of small, non-batched writes.
Cause: the WAL contains a very large number of entries, so we must deserialize many WAL entries, each of which calls TableName.valueOf to instantiate the TableName object.


Case 2. Active master startup
 
 


> Optimize TableName.valueOf from O(n) to O(1).  We can get benefits when the number of tables in the cluster is greater than dozens
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-25510
>                 URL: https://issues.apache.org/jira/browse/HBASE-25510
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, Replication
>    Affects Versions: 1.2.12, 1.4.13, 2.4.1
>            Reporter: zhuobin zheng
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)