You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2013/07/10 21:23:48 UTC

[jira] [Commented] (HBASE-8920) TestZKBasedOpenCloseRegion.testReOpenRegion fails occasionally

    [ https://issues.apache.org/jira/browse/HBASE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704942#comment-13704942 ] 

stack commented on HBASE-8920:
------------------------------

So, here is the testClose finishing up:

{code}
2013-07-10 11:58:46,979 INFO  [Thread-387] master.TestZKBasedOpenCloseRegion(317): Done with testCloseRegion
2013-07-10 11:58:46,981 INFO  [StoreOpener-953c5884dc661334f302052cfe8d547b-1] compactions.CompactionConfiguration(85): size [134217728, 9223372036854775807); files [3, 10); ratio 1.200000; off-peak ratio 5.000000; throttle point 2684354560; delete expired; major period 604800000, major jitter 0.500000
2013-07-10 11:58:46,993 DEBUG [StoreOpener-953c5884dc661334f302052cfe8d547b-1] regionserver.HStore(443): loaded hdfs://localhost:56405/user/jenkins/hbase/TestZKBasedOpenCloseRegion/953c5884dc661334f302052cfe8d547b/a/fb400fbb145949dcb3563bd7b5a3c9ab, isReference=false, isBulkLoadResult=false, seqid=2, majorCompaction=false
2013-07-10 11:58:46,994 INFO  [StoreOpener-953c5884dc661334f302052cfe8d547b-1] compactions.CompactionConfiguration(85): size [134217728, 9223372036854775807); files [3, 10); ratio 1.200000; off-peak ratio 5.000000; throttle point 2684354560; delete expired; major period 604800000, major jitter 0.500000
2013-07-10 11:58:46,996 INFO  [StoreOpener-953c5884dc661334f302052cfe8d547b-1] compactions.CompactionConfiguration(85): size [134217728, 9223372036854775807); files [3, 10); ratio 1.200000; off-peak ratio 5.000000; throttle point 2684354560; delete expired; major period 604800000, major jitter 0.500000
2013-07-10 11:58:47,000 INFO  [RS_OPEN_REGION-juno:39691-0] regionserver.HRegion(619): Onlined 953c5884dc661334f302052cfe8d547b; next sequenceid=3
2013-07-10 11:58:47,000 DEBUG [RS_OPEN_REGION-juno:39691-0] zookeeper.ZKAssign(598): regionserver:39691-0x13fc8727d2e0002 Attempting to retransition the opening state of node 953c5884dc661334f302052cfe8d547b
2013-07-10 11:58:47,001 INFO  [pool-1-thread-1] hbase.ResourceChecker(171): after: master.TestZKBasedOpenCloseRegion#testCloseRegion Thread=281 (was 278)
Potentially hanging thread: MASTER_CLOSE_REGION-juno:45449-0
	sun.misc.Unsafe.park(Native Method)
	java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
	java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
	java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
	java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:957)
	java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:917)
	java.lang.Thread.run(Thread.java:662)

Potentially hanging thread: RS_CLOSE_REGION-juno:39691-0
	sun.misc.Unsafe.park(Native Method)
	java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
	java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
	java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
	java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:957)
	java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:917)
	java.lang.Thread.run(Thread.java:662)

Potentially hanging thread: StoreOpener-953c5884dc661334f302052cfe8d547b-1
	java.util.regex.Pattern$GroupTail.match(Pattern.java:4231)
	java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
	java.util.regex.Pattern$Curly.match(Pattern.java:3744)
	java.util.regex.Pattern$CharProperty.match(Pattern.java:3345)
	java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
	java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
	java.util.regex.Pattern$Begin.match(Pattern.java:3120)
	java.util.regex.Matcher.match(Matcher.java:1127)
	java.util.regex.Matcher.matches(Matcher.java:502)
	org.apache.hadoop.hbase.io.HFileLink.isHFileLink(HFileLink.java:140)
	org.apache.hadoop.hbase.regionserver.StoreFileInfo.validateStoreFileName(StoreFileInfo.java:297)
	org.apache.hadoop.hbase.regionserver.StoreFileInfo.isValid(StoreFileInfo.java:322)
	org.apache.hadoop.hbase.regionserver.HRegionFileSystem.getStoreFiles(HRegionFileSystem.java:190)
	org.apache.hadoop.hbase.regionserver.HStore.loadStoreFiles(HStore.java:407)
	org.apache.hadoop.hbase.regionserver.HStore.<init>(HStore.java:242)
	org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:3104)
	org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:655)
	org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:653)
	java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	java.util.concurrent.FutureTask.run(FutureTask.java:138)
	java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
	java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	java.util.concurrent.FutureTask.run(FutureTask.java:138)
	java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
	java.lang.Thread.run(Thread.java:662)
 - Thread LEAK? -, OpenFileDescriptor=445 (was 442) - OpenFileDescriptor LEAK? -, MaxFileDescriptor=40000 (was 40000), SystemLoadAverage=324 (was 324), ProcessCount=141 (was 141), AvailableMemoryMB=1280 (was 1283), ConnectionCount=6 (was 6)
2013-07-10 11:58:47,002 INFO  [PostOpenDeployTasks:953c5884dc661334f302052cfe8d547b] regionserver.HRegionServer(1698): Post open deploy tasks for region=TestZKBasedOpenCloseRegion,bbb,1373457519402.953c5884dc661334f302052cfe8d547b.
2013-07-10 11:58:47,009 INFO  [PostOpenDeployTasks:953c5884dc661334f302052cfe8d547b] catalog.MetaEditor(432): Updated row TestZKBasedOpenCloseRegion,bbb,1373457519402.953c5884dc661334f302052cfe8d547b. with server=juno.apache.org,39691,1373457514592
2013-07-10 11:58:47,009 INFO  [PostOpenDeployTasks:953c5884dc661334f302052cfe8d547b] regionserver.HRegionServer(1723): Done with post open deploy task for region=TestZKBasedOpenCloseRegion,bbb,1373457519402.953c5884dc661334f302052cfe8d547b.
2013-07-10 11:58:47,009 DEBUG [RS_OPEN_REGION-juno:39691-0] zookeeper.ZKAssign(786): regionserver:39691-0x13fc8727d2e0002 Attempting to transition node 953c5884dc661334f302052cfe8d547b from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENED
2013-07-10 11:58:47,020 DEBUG [pool-1-thread-1-EventThread] zookeeper.ZooKeeperWatcher(307): master:45449-0x13fc8727d2e0000 Received ZooKeeper Event, type=NodeDataChanged, state=SyncConnected, path=/hbase/region-in-transition/953c5884dc661334f302052cfe8d547b
2013-07-10 11:58:47,020 DEBUG [RS_OPEN_REGION-juno:39691-0] zookeeper.ZKAssign(862): regionserver:39691-0x13fc8727d2e0002 Successfully transitioned node 953c5884dc661334f302052cfe8d547b from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENED
2013-07-10 11:58:47,020 DEBUG [RS_OPEN_REGION-juno:39691-0] handler.OpenRegionHandler(373): region transitioned to opened in zookeeper: {ENCODED => 953c5884dc661334f302052cfe8d547b, NAME => 'TestZKBasedOpenCloseRegion,bbb,1373457519402.953c5884dc661334f302052cfe8d547b.', STARTKEY => 'bbb', ENDKEY => 'ccc'}, server: juno.apache.org,39691,1373457514592
2013-07-10 11:58:47,020 DEBUG [RS_OPEN_REGION-juno:39691-0] handler.OpenRegionHandler(186): Opened TestZKBasedOpenCloseRegion,bbb,1373457519402.953c5884dc661334f302052cfe8d547b. on server:juno.apache.org,39691,1373457514592
2013-07-10 11:58:47,021 DEBUG [hbase-am-zkevent-worker-pool-2-thread-15] master.AssignmentManager(768): Handling transition=RS_ZK_REGION_OPENED, server=juno.apache.org,39691,1373457514592, region=953c5884dc661334f302052cfe8d547b, current state from region state map ={953c5884dc661334f302052cfe8d547b state=OPENING, ts=1373457526962, server=juno.apache.org,39691,1373457514592}

....
{code}

The region reopen is going ahead after the close succeeds.

Now, we fall into the next test, the testReOpen, the one that is failing:

{code}
...
2013-07-10 11:58:47,021 INFO  [pool-1-thread-1] hbase.ResourceChecker(147): before: master.TestZKBasedOpenCloseRegion#testReOpenRegion Thread=281, OpenFileDescriptor=448, MaxFileDescriptor=40000, SystemLoadAverage=324, ProcessCount=141, AvailableMemoryMB=1280, ConnectionCount=6
2013-07-10 11:58:47,021 INFO  [hbase-am-zkevent-worker-pool-2-thread-15] master.RegionStates(265): Region transitioned from {953c5884dc661334f302052cfe8d547b state=OPENING, ts=1373457526962, server=juno.apache.org,39691,1373457514592} to {953c5884dc661334f302052cfe8d547b state=OPEN, ts=1373457527021, server=juno.apache.org,39691,1373457514592}
2013-07-10 11:58:47,022 DEBUG [MASTER_OPEN_REGION-juno:45449-2] handler.OpenedRegionHandler(145): Handling OPENED event for 953c5884dc661334f302052cfe8d547b from juno.apache.org,39691,1373457514592; deleting unassigned node
2013-07-10 11:58:47,022 DEBUG [MASTER_OPEN_REGION-juno:45449-2] zookeeper.ZKAssign(405): master:45449-0x13fc8727d2e0000 Deleting existing unassigned node for 953c5884dc661334f302052cfe8d547b that is in expected state RS_ZK_REGION_OPENED
2013-07-10 11:58:47,032 DEBUG [pool-1-thread-1] client.ClientScanner(195): Finished region={ENCODED => 1028785192, NAME => '.META.,,1', STARTKEY => '', ENDKEY => ''}
2013-07-10 11:58:47,032 INFO  [Thread-394] master.TestZKBasedOpenCloseRegion(109): Number of region servers = 2
2013-07-10 11:58:47,033 INFO  [Thread-394] master.TestZKBasedOpenCloseRegion(150): TestZKBasedOpenCloseRegion,bbb,1373457519402.953c5884dc661334f302052cfe8d547b.
2013-07-10 11:58:47,033 DEBUG [Thread-394] master.TestZKBasedOpenCloseRegion(116): Asking RS to close region TestZKBasedOpenCloseRegion,bbb,1373457519402.953c5884dc661334f302052cfe8d547b.
2013-07-10 11:58:47,034 INFO  [Thread-394] master.TestZKBasedOpenCloseRegion(133): Unassign TestZKBasedOpenCloseRegion,bbb,1373457519402.953c5884dc661334f302052cfe8d547b.
2013-07-10 11:58:47,034 DEBUG [Thread-394] master.AssignmentManager(2206): Starting unassignment of region TestZKBasedOpenCloseRegion,bbb,1373457519402.953c5884dc661334f302052cfe8d547b. (offlining)
2013-07-10 11:58:47,035 DEBUG [Thread-394] master.AssignmentManager(2279): Attempting to unassign region TestZKBasedOpenCloseRegion,bbb,1373457519402.953c5884dc661334f302052cfe8d547b. but it is already in transition (OPEN, force=false)
2013-07-10 11:58:47,036 DEBUG [pool-1-thread-1-EventThread] zookeeper.ZooKeeperWatcher(307): master:45449-0x13fc8727d2e0000 Received ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/hbase/region-in-transition/953c5884dc661334f302052cfe8d547b
2013-07-10 11:58:47,036 DEBUG [hbase-am-zkevent-worker-pool-2-thread-13] master.AssignmentManager$4(1219): The znode of region TestZKBasedOpenCloseRegion,bbb,1373457519402.953c5884dc661334f302052cfe8d547b. has been deleted, region state: {953c5884dc661334f302052cfe8d547b state=OPEN, ts=1373457527021, server=juno.apache.org,39691,1373457514592}
2013-07-10 11:58:47,036 DEBUG [pool-1-thread-1-EventThread] zookeeper.ZooKeeperWatcher(307): master:45449-0x13fc8727d2e0000 Received ZooKeeper Event, type=NodeChildrenChanged, state=SyncConnected, path=/hbase/region-in-transition
2013-07-10 11:58:47,036 INFO  [hbase-am-zkevent-worker-pool-2-thread-13] master.AssignmentManager$4(1224): The master has opened the region TestZKBasedOpenCloseRegion,bbb,1373457519402.953c5884dc661334f302052cfe8d547b. that was online on juno.apache.org,39691,1373457514592
2013-07-10 11:58:47,036 DEBUG [MASTER_OPEN_REGION-juno:45449-2] zookeeper.ZKAssign(434): master:45449-0x13fc8727d2e0000 Successfully deleted unassigned node for region 953c5884dc661334f302052cfe8d547b in expected state RS_ZK_REGION_OPENED
...
{code}

We are dropping the close event.  It is stomped on by the successful open.

I could ensure nothing is in transition before starting the test but this seems like a conflict we could see out on running cluster.
                
> TestZKBasedOpenCloseRegion.testReOpenRegion fails occasionally
> --------------------------------------------------------------
>
>                 Key: HBASE-8920
>                 URL: https://issues.apache.org/jira/browse/HBASE-8920
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>
> Failed here:
> https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/607/testReport/org.apache.hadoop.hbase.master/TestZKBasedOpenCloseRegion/testReOpenRegion/
> and
> https://builds.apache.org/job/HBase-TRUNK/4232/testReport/org.apache.hadoop.hbase.master/TestZKBasedOpenCloseRegion/testReOpenRegion/
> In both cases, we fail for the same reason.  The previous test closes a region.  The failing test starts but we are reopening the region we'd just closed as we are wont to do.  This is the region that gets chosen by the failing test to practise reopen on only  it is not yet fully open so in comes the close and it gets stomped on by the ongoing open.
> Let me post some log.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira