Posted to dev@hbase.apache.org by "Mohammad Arshad (Jira)" <ji...@apache.org> on 2020/04/17 22:58:00 UTC
[jira] [Created] (HBASE-24211) Create table is slow in large cluster when AccessController is enabled.

Mohammad Arshad created HBASE-24211:
---------------------------------------

             Summary: Create table is slow in large cluster when AccessController is enabled.
                 Key: HBASE-24211
                 URL: https://issues.apache.org/jira/browse/HBASE-24211
             Project: HBase
          Issue Type: Bug
    Affects Versions: 2.2.4, 1.3.6, master
            Reporter: Mohammad Arshad
            Assignee: Mohammad Arshad


*Problem:*

In a large HBase 1.3.x performance-test cluster (100 RS, 60k tables, 600k regions), a simple table creation takes around 150 seconds. The exact time varies, but it is consistently slow.

*Analysis:*

1. When HBase creates a table, it calls AssignmentManager#assign(final ServerName destination, final List<HRegionInfo> regions).
 AssignmentManager#assign calls asyncSetOfflineInZooKeeper(state, cb, destination) and then waits in the loop below for about 2 minutes.
{code:java}
if (useZKForAssignment) {
  // Wait until all unassigned nodes have been put up and watchers set.
  int total = states.size();
  for (int oldCounter = 0; !server.isStopped();) {
    int count = counter.get();
    if (oldCounter != count) {
      LOG.debug(destination.toString() + " unassigned znodes=" + count +
        " of total=" + total + "; oldCounter=" + oldCounter);
      oldCounter = count;
    }
    if (count >= total) break;
    Thread.sleep(5);
  }
}
{code}
2. asyncSetOfflineInZooKeeper creates a znode under /hbase/region-in-transition/ and calls exists to ensure that the znode was created. This is a simple operation and should not take much time. So where is the time being spent?

3. The ZooKeeper client API processes watcher notifications and async API responses one by one through a single queue.
 If the client (here, HBase) is slow to process any watcher or response, all subsequent responses are delayed, so an API call appears to have taken longer than it actually did.
 That is what happens in this issue.

Watcher processing for znode creation under /hbase/acl took most of the time and delayed processing of the /hbase/region-in-transition/ znode creation response. This is why the wait loop ran for so long.
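
The head-of-line blocking described in point 3 can be reproduced without ZooKeeper at all. The following is a minimal sketch, assuming a single-threaded executor is a fair stand-in for the client's event thread; the class and method names (EventThreadDemo, callbackDelayMillis) are hypothetical and do not come from HBase or ZooKeeper.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

final class EventThreadDemo {

    /**
     * Queues a slow watcher followed by a cheap callback on one event thread
     * and returns how long (in ms) the cheap callback waited before it ran.
     */
    static long callbackDelayMillis(long slowWatcherMillis) {
        // Assumption: one thread drains the queue in order, like the client's event thread.
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        AtomicLong delayNanos = new AtomicLong(-1);
        long enqueuedAt = System.nanoTime();

        // Stand-in for the /hbase/acl watcher that re-reads every child znode.
        eventThread.execute(() -> sleepQuietly(slowWatcherMillis));
        // Stand-in for the region-in-transition callback; cheap on its own.
        eventThread.execute(() -> delayNanos.set(System.nanoTime() - enqueuedAt));

        eventThread.shutdown();
        try {
            eventThread.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return TimeUnit.NANOSECONDS.toMillis(delayNanos.get());
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("cheap callback waited ~" + callbackDelayMillis(200) + " ms");
    }
}
```

The cheap callback does almost no work, yet its observed latency is roughly the duration of the slow watcher queued ahead of it.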

4. Watcher processing for znode creation under /hbase/acl calls ZKPermissionWatcher#nodeChildrenChanged, which internally calls ZKUtil.getChildDataAndWatchForNewChildren,
 *which in turn calls ZooKeeper's getData API 60k times in this use case; this is where most of the time goes.*
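
To make the cost in point 4 concrete, here is a small in-memory model that counts reads when one child is added to a parent that already has many children. It is only a sketch: AclRefreshCost and its methods are hypothetical names, the map stands in for the children of /hbase/acl, and no real ZooKeeper call is made.

```java
import java.util.Map;
import java.util.TreeMap;

final class AclRefreshCost {
    // Stand-in for the child znodes of /hbase/acl (one per table in this model).
    private final Map<String, byte[]> children = new TreeMap<>();
    private long getDataCalls = 0;

    void createChild(String name) {
        children.put(name, new byte[0]);
    }

    /** Hypothetical analogue of re-reading every child after a children-changed event. */
    void refreshAllChildren() {
        for (String name : children.keySet()) {
            byte[] data = children.get(name); // would be one getData round trip per child
            getDataCalls++;
        }
    }

    long getDataCalls() {
        return getDataCalls;
    }

    /** Number of reads triggered by creating one table when existingTables already exist. */
    static long readsForOneNewTable(int existingTables) {
        AclRefreshCost acl = new AclRefreshCost();
        for (int i = 0; i < existingTables; i++) {
            acl.createChild("table" + i);
        }
        acl.createChild("newTable"); // fires a single children-changed notification
        acl.refreshAllChildren();    // the handler then reads every child, old and new
        return acl.getDataCalls();
    }

    public static void main(String[] args) {
        System.out.println("reads for one new table: " + readsForOneNewTable(60_000));
    }
}
```

The read count grows linearly with the number of existing tables. If each read cost on the order of a millisecond (an assumption for illustration, not a figure from this report), 60k reads would add up to about a minute per table creation.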

*Solution:*
 Move the getChildDataAndWatchForNewChildren call into the async code block in ZKPermissionWatcher#nodeChildrenChanged, so that it no longer runs on the ZooKeeper event thread.
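
The idea behind the proposed fix can be sketched in isolation as follows. This is not the actual patch: AsyncWatcherDemo, refreshExecutor and callbackDelayMillis are hypothetical names, and the sleep stands in for the 60k getData calls. The point is only that the watcher hands the expensive refresh to a separate executor and returns immediately.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

final class AsyncWatcherDemo {

    /**
     * Runs a watcher that offloads its expensive refresh, followed by a cheap
     * callback, and returns how long (in ms) the cheap callback waited.
     */
    static long callbackDelayMillis(long refreshMillis) {
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        // Assumption: a dedicated single thread keeps refreshes ordered among themselves.
        ExecutorService refreshExecutor = Executors.newSingleThreadExecutor();
        CountDownLatch refreshDone = new CountDownLatch(1);
        AtomicLong delayNanos = new AtomicLong(-1);
        long enqueuedAt = System.nanoTime();

        // Watcher: only schedules the refresh, so the event thread is freed at once.
        eventThread.execute(() -> refreshExecutor.execute(() -> {
            sleepQuietly(refreshMillis); // stand-in for reading every child znode
            refreshDone.countDown();
        }));
        // Cheap callback queued behind the watcher on the same event thread.
        eventThread.execute(() -> delayNanos.set(System.nanoTime() - enqueuedAt));

        try {
            eventThread.shutdown();
            eventThread.awaitTermination(30, TimeUnit.SECONDS);
            if (!refreshDone.await(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("refresh did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            refreshExecutor.shutdown();
        }
        return TimeUnit.NANOSECONDS.toMillis(delayNanos.get());
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("cheap callback waited ~" + callbackDelayMillis(300) + " ms");
    }
}
```

With the refresh offloaded, the cheap callback runs almost immediately while the refresh still completes in the background. The total refresh work is unchanged; it just no longer blocks unrelated responses.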

 


