Posted to issues@hbase.apache.org by "Mohammad Arshad (Jira)" <ji...@apache.org> on 2020/08/03 05:43:00 UTC

[jira] [Updated] (HBASE-24211) Create table is slow in large cluster when AccessController is enabled.

     [ https://issues.apache.org/jira/browse/HBASE-24211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mohammad Arshad updated HBASE-24211:
------------------------------------
    Component/s: Performance

> Create table is slow in large cluster when AccessController is enabled.
> -----------------------------------------------------------------------
>
>                 Key: HBASE-24211
>                 URL: https://issues.apache.org/jira/browse/HBASE-24211
>             Project: HBase
>          Issue Type: Bug
>          Components: Performance
>    Affects Versions: 1.3.6, master, 2.2.4
>            Reporter: Mohammad Arshad
>            Assignee: Mohammad Arshad
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 2.3.0, 1.7.0
>
>
> *Problem:*
> In a large HBase 1.3.x performance-test cluster (100 RS, 60k tables, 600k regions), a simple table creation takes around 150 seconds. The exact time varies, but it is consistently long.
> *Analysis:*
> 1. When HBase creates a table, it calls AssignmentManager#assign(final ServerName destination, final List<HRegionInfo> regions).
>  AssignmentManager#assign calls asyncSetOfflineInZooKeeper(state, cb, destination) and then waits in the loop below for 2 minutes.
> {code:java}
>  if (useZKForAssignment) {
>           // Wait until all unassigned nodes have been put up and watchers set.
>           int total = states.size();
>           for (int oldCounter = 0; !server.isStopped();) {
>             int count = counter.get();
>             if (oldCounter != count) {
>               LOG.debug(destination.toString() + " unassigned znodes=" + count +
>                 " of total=" + total + "; oldCounter=" + oldCounter);
>               oldCounter = count;
>             }
>             if (count >= total) break;
>             Thread.sleep(5);
>           }
>         }
> {code}
> 2. asyncSetOfflineInZooKeeper creates a znode under /hbase/region-in-transition/ and calls exists to ensure that the znode is created. This is a simple operation and should not take much time. So where is the time spent?
> 3. The ZooKeeper client API processes watcher notifications and async API responses through a queue, one by one.
>  If the client (in this case HBase) is slow to process any watcher notification or response, all subsequent response processing is delayed. The API call then appears to have taken longer than it actually did.
>  That is what happens in this issue.
> Watcher processing for znode creation under /hbase/acl took most of the time and delayed processing of the /hbase/region-in-transition/region znode creation. This is why the wait in the loop was so long.
> 4. Watcher processing for znode creation under /hbase/acl calls ZKPermissionWatcher#nodeChildrenChanged, which internally calls ZKUtil.getChildDataAndWatchForNewChildren,
>  *which in this use case calls ZooKeeper's getData API 60k times; these calls account for most of the time.*
> *Solution:*
>  Move the getChildDataAndWatchForNewChildren call into the async code block in ZKPermissionWatcher#nodeChildrenChanged.
>  
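
The effect described in points 3 and 4, and the idea behind the proposed fix, can be illustrated with a small stand-alone sketch. This is not HBase or ZooKeeper code. The class name EventThreadSketch, the run method, the labels "acl-refresh" and "rit-response", and the 200 ms sleep are all hypothetical stand-ins: a single-threaded executor plays the role of the client's event queue, and the sleep stands in for the 60k getData calls.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; none of these names exist in HBase or ZooKeeper.
final class EventThreadSketch {

  /** Runs two queued events and returns the order in which their work finished. */
  static List<String> run(boolean offloadSlowWork) {
    List<String> finished = new CopyOnWriteArrayList<>();
    // Stand-in for the client's single event queue: events are handled one by one.
    ExecutorService eventThread = Executors.newSingleThreadExecutor();
    // Separate thread that may take over the slow work (assumed fix).
    ExecutorService worker = Executors.newSingleThreadExecutor();

    // Stand-in for the slow ACL refresh triggered by the watcher.
    Runnable aclRefresh = () -> {
      try {
        Thread.sleep(200);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      finished.add("acl-refresh");
    };

    // Event 1: watcher notification for a child change under the ACL znode.
    eventThread.execute(() -> {
      if (offloadSlowWork) {
        worker.execute(aclRefresh); // event thread returns immediately
      } else {
        aclRefresh.run();           // event thread is blocked until the refresh ends
      }
    });
    // Event 2: async response for the region-in-transition znode.
    eventThread.execute(() -> finished.add("rit-response"));

    try {
      eventThread.shutdown();
      eventThread.awaitTermination(10, TimeUnit.SECONDS);
      worker.shutdown();
      worker.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting", e);
    }
    return finished;
  }

  public static void main(String[] args) {
    System.out.println("inline:    " + run(false));
    System.out.println("offloaded: " + run(true));
  }
}
```

With the slow work done inline, the response for the second event cannot be handled until the refresh completes. With the work handed to another thread, the response is normally handled first, which is the behaviour the proposed change aims for. The actual patch may differ in its details.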



--
This message was sent by Atlassian Jira
(v8.3.4#803005)