Posted to dev@hbase.apache.org by "Lars Hofhansl (JIRA)" <ji...@apache.org> on 2014/04/19 01:28:17 UTC

[jira] [Created] (HBASE-11037) Race condition in TestZKBasedOpenCloseRegion

Lars Hofhansl created HBASE-11037:
-------------------------------------

             Summary: Race condition in TestZKBasedOpenCloseRegion
                 Key: HBASE-11037
                 URL: https://issues.apache.org/jira/browse/HBASE-11037
             Project: HBase
          Issue Type: Bug
            Reporter: Lars Hofhansl
             Fix For: 0.94.19


In the failing run, testCloseRegion is called before testReOpenRegion.

Here's the sequence of events:
{code}
2014-04-18 20:58:05,645 INFO  [Thread-380] master.TestZKBasedOpenCloseRegion(313): Running testCloseRegion
2014-04-18 20:58:05,645 INFO  [Thread-380] master.TestZKBasedOpenCloseRegion(315): Number of region servers = 2
2014-04-18 20:58:05,645 INFO  [Thread-380] master.TestZKBasedOpenCloseRegion(164): -ROOT-,,0.70236052
2014-04-18 20:58:05,646 DEBUG [Thread-380] master.TestZKBasedOpenCloseRegion(320): Asking RS to close region -ROOT-,,0.70236052
...
2014-04-18 20:58:06,237 INFO  [RS_CLOSE_ROOT-hemera.apache.org,46533,1397854669633-0] regionserver.HRegion(1148): Closed -ROOT-,,0.70236052
...
2014-04-18 20:58:06,404 INFO  [Thread-380] master.TestZKBasedOpenCloseRegion(333): Done with testCloseRegion
{code}
Then testReOpenRegion starts:
{code}
2014-04-18 20:58:06,431 INFO  [pool-1-thread-1] hbase.ResourceChecker(157): before master.TestZKBasedOpenCloseRegion#testReOpenRegion: 234 threads, 388 file descriptors 4 connections, 
...
2014-04-18 20:58:06,466 DEBUG [MASTER_OPEN_REGION-hemera.apache.org,52650,1397854669138-3] zookeeper.ZKUtil(1597): master:52650-0x14576a1835d0000 Retrieved 62 byte(s) of data from znode /hbase/unassigned/70236052; data=region=-ROOT-,,0, origin=hemera.apache.org,46533,1397854669633, state=RS_ZK_REGION_OPENED
2014-04-18 20:58:06,473 DEBUG [pool-1-thread-1] client.ClientScanner(191): Finished with scanning at {NAME => '.META.,,1', STARTKEY => '', ENDKEY => '', ENCODED => 1028785192,}
2014-04-18 20:58:06,473 INFO  [Thread-396] master.TestZKBasedOpenCloseRegion(123): Number of region servers = 2
2014-04-18 20:58:06,474 INFO  [Thread-396] master.TestZKBasedOpenCloseRegion(164): -ROOT-,,0.70236052
2014-04-18 20:58:06,474 DEBUG [Thread-396] master.TestZKBasedOpenCloseRegion(130): Asking RS to close region -ROOT-,,0.70236052
2014-04-18 20:58:06,474 INFO  [Thread-396] master.TestZKBasedOpenCloseRegion(147): Unassign -ROOT-,,0.70236052
2014-04-18 20:58:06,474 DEBUG [Thread-396] master.AssignmentManager(2126): Starting unassignment of region -ROOT-,,0.70236052 (offlining)
2014-04-18 20:58:06,475 DEBUG [Thread-396] master.AssignmentManager(2132): Attempted to unassign region -ROOT-,,0.70236052 but it is not currently assigned anywhere
2014-04-18 20:58:06,478 DEBUG [pool-1-thread-1-EventThread] zookeeper.ZooKeeperWatcher(294): master:52650-0x14576a1835d0000 Received ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/hbase/unassigned/70236052
2014-04-18 20:58:06,478 DEBUG [pool-1-thread-1-EventThread] master.AssignmentManager(1176): The znode of region -ROOT-,,0.70236052 has been deleted.
2014-04-18 20:58:06,478 INFO  [pool-1-thread-1-EventThread] master.AssignmentManager(1188): The master has opened the region -ROOT-,,0.70236052 that was online on hemera.apache.org,46533,1397854669633
2014-04-18 20:58:06,478 DEBUG [pool-1-thread-1-EventThread] zookeeper.ZooKeeperWatcher(294): master:52650-0x14576a1835d0000 Received ZooKeeper Event, type=NodeChildrenChanged, state=SyncConnected, path=/hbase/unassigned
{code}
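After that, nothing happens. testCloseRegion unassigned the -ROOT- region, and testReOpenRegion starts before -ROOT- has been reassigned. Its unassign request therefore finds the region not assigned anywhere and does nothing, so the close event that testReOpenRegion waits for never arrives and the test hangs forever.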

The key log line is: "master.AssignmentManager(2132): Attempted to unassign region -ROOT-,,0.70236052 but it is not currently assigned anywhere"
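A minimal Java sketch of this race, under heavy simplifying assumptions. FakeAssignmentManager, racyReOpen and guardedReOpen are hypothetical names, not HBase classes, and a plain map stands in for the master's in-memory record of which server hosts each region. In the racy order, unassign runs before the master has recorded the reopened region, so it is a no-op and the close event never fires. A 200 ms timeout stands in for the real test's unbounded wait. The guarded order, which waits until the region is assigned again before unassigning it, is shown only to illustrate the timing; it is not the fix proposed in this report.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class UnassignRaceSketch {
    /** Hypothetical, drastically simplified stand-in for the master's assignment state. */
    static class FakeAssignmentManager {
        private final Map<String, String> regionToServer = new ConcurrentHashMap<>();
        final CountDownLatch closeEvent = new CountDownLatch(1);

        /** Models the master finally recording the region as online on a server. */
        void regionOpened(String region, String server) {
            regionToServer.put(region, server);
        }

        boolean isAssigned(String region) {
            return regionToServer.containsKey(region);
        }

        /** Returns false and fires no close event if the region is not assigned anywhere. */
        boolean unassign(String region) {
            String server = regionToServer.remove(region);
            if (server == null) {
                System.out.println("Attempted to unassign region " + region
                        + " but it is not currently assigned anywhere");
                return false;
            }
            closeEvent.countDown(); // stands in for the RS closing the region and the test seeing it
            return true;
        }
    }

    /** Racy order: unassign before the master has processed the reopen. */
    static boolean racyReOpen(FakeAssignmentManager am, String region) throws InterruptedException {
        am.unassign(region);            // no-op: region is not recorded as assigned yet
        am.regionOpened(region, "rs1"); // the reopen bookkeeping lands too late
        return am.closeEvent.await(200, TimeUnit.MILLISECONDS); // bounded stand-in for "forever"
    }

    /** Guarded order: wait until the region is assigned again, then unassign it. */
    static boolean guardedReOpen(FakeAssignmentManager am, String region) throws InterruptedException {
        Thread opener = new Thread(() -> {
            try {
                Thread.sleep(50); // the reopen takes a little while to be processed
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            am.regionOpened(region, "rs1");
        });
        opener.start();
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (!am.isAssigned(region) && System.nanoTime() < deadline) {
            Thread.sleep(10); // poll until the master has the region as assigned
        }
        opener.join(); // make sure the helper thread has finished
        am.unassign(region);
        return am.closeEvent.await(200, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        String root = "-ROOT-,,0.70236052";
        System.out.println("racy close event seen: " + racyReOpen(new FakeAssignmentManager(), root));
        System.out.println("guarded close event seen: " + guardedReOpen(new FakeAssignmentManager(), root));
    }
}
```

Running main, the racy variant reports no close event (after printing the "not currently assigned anywhere" message), while the guarded variant sees the close event.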

The easiest fix is to run testCloseRegion last again (as it was before we switched JUnit versions).
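If the switch was to JUnit 4.11 or later (an assumption; the report does not name the version), method order can be pinned with @FixMethodOrder(MethodSorters.NAME_ASCENDING). Before 4.11, JUnit ran methods in the order reflection returned them, and Class.getDeclaredMethods() does not specify any order. The sketch below imitates NAME_ASCENDING on a hypothetical stand-in class that has only the two method names from this report. It shows that lexicographic order puts testCloseRegion first ('C' sorts before 'R'), so pinning name order alone would not make it run last; the method would also need a name that sorts after the other test methods.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class MethodOrderSketch {
    /** Hypothetical stand-in for the real test class; only the method names matter. */
    static class FakeOpenCloseTest {
        public void testReOpenRegion() {}
        public void testCloseRegion() {}
    }

    /** Imitates a NAME_ASCENDING sort: test methods ordered lexicographically by name. */
    static List<String> nameAscending(Class<?> testClass) {
        List<String> names = new ArrayList<>();
        // getDeclaredMethods() returns methods in no particular order, so sort explicitly.
        for (Method m : testClass.getDeclaredMethods()) {
            if (m.getName().startsWith("test")) {
                names.add(m.getName());
            }
        }
        names.sort(Comparator.naturalOrder());
        return names;
    }

    public static void main(String[] args) {
        // prints [testCloseRegion, testReOpenRegion]
        System.out.println(nameAscending(FakeOpenCloseTest.class));
    }
}
```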



--
This message was sent by Atlassian JIRA
(v6.2#6252)