Posted to user@hbase.apache.org by Zhoushuaifeng <zh...@huawei.com> on 2011/04/28 06:08:26 UTC

found one deadlock on hbase?

Logs are below. Is this a deadlock in HBase? How does it happen, and how can it be avoided?

Found one Java-level deadlock:
=============================
"IPC Server handler 9 on 60020":
  waiting to lock monitor 0x00000000409f3908 (object 0x00007fe7cbacbd48, a org.apache.hadoop.hbase.regionserver.MemStoreFlusher),
  which is held by "IPC Server handler 7 on 60020"
"IPC Server handler 7 on 60020":
  waiting for ownable synchronizer 0x00007fe7cbb06228, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
  which is held by "regionserver60020.cacheFlusher"
"regionserver60020.cacheFlusher":
  waiting to lock monitor 0x00000000409f3908 (object 0x00007fe7cbacbd48, a org.apache.hadoop.hbase.regionserver.MemStoreFlusher),
  which is held by "IPC Server handler 7 on 60020"

Java stack information for the threads listed above:
===================================================
"IPC Server handler 9 on 60020":
                at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java)
                - waiting to lock <0x00007fe7cbacbd48> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
                at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2558)
                at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                at java.lang.reflect.Method.invoke(Method.java:597)
                at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
                at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
"IPC Server handler 7 on 60020":
                at sun.misc.Unsafe.$$YJP$$park(Native Method)
                - parking to wait for  <0x00007fe7cbb06228> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
                at sun.misc.Unsafe.park(Unsafe.java)
                at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
                at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
                at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
                at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
                at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
                at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
                at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java:429)
                - locked <0x00007fe7cbacbd48> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
                at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2558)
                at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                at java.lang.reflect.Method.invoke(Method.java:597)
                at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
                at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
"regionserver60020.cacheFlusher":
                at java.util.ResourceBundle.endLoading(ResourceBundle.java:1506)
                - waiting to lock <0x00007fe7cbacbd48> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
                at java.util.ResourceBundle.findBundle(ResourceBundle.java:1379)
                at java.util.ResourceBundle.findBundle(ResourceBundle.java:1292)
                at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1234)
                at java.util.ResourceBundle.getBundle(ResourceBundle.java:832)
                at sun.util.resources.LocaleData$1.run(LocaleData.java:127)
                at java.security.AccessController.$$YJP$$doPrivileged(Native Method)
                at java.security.AccessController.doPrivileged(AccessController.java)
                at sun.util.resources.LocaleData.getBundle(LocaleData.java:125)
                at sun.util.resources.LocaleData.getTimeZoneNames(LocaleData.java:97)
                at sun.util.TimeZoneNameUtility.getBundle(TimeZoneNameUtility.java:115)
                at sun.util.TimeZoneNameUtility.retrieveDisplayNames(TimeZoneNameUtility.java:80)
                at java.util.TimeZone.getDisplayNames(TimeZone.java:399)
                at java.util.TimeZone.getDisplayName(TimeZone.java:350)
                at java.util.Date.toString(Date.java:1025)
                at java.lang.String.valueOf(String.java:2826)
                at java.lang.StringBuilder.append(StringBuilder.java:115)
                at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue$CompactionRequest.toString(PriorityCompactionQueue.java:114)
                at java.lang.String.valueOf(String.java:2826)
                at java.lang.StringBuilder.append(StringBuilder.java:115)
                at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue.addToRegionsInQueue(PriorityCompactionQueue.java:145)
                - locked <0x00007fe7ccabd258> (a java.util.HashMap)
                at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue.add(PriorityCompactionQueue.java:188)
                at org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestCompaction(CompactSplitThread.java:140)
                - locked <0x00007fe7cbaf08c8> (a org.apache.hadoop.hbase.regionserver.CompactSplitThread)
                at org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestCompaction(CompactSplitThread.java:118)
                - locked <0x00007fe7cbaf08c8> (a org.apache.hadoop.hbase.regionserver.CompactSplitThread)
                at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:387)
                at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:194)
                at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:214)

Found 1 deadlock.

Zhou Shuaifeng(Frank)





Re: About parameter

Posted by Jean-Daniel Cryans <jd...@apache.org>.
The issue you saw could have been mitigated by:
https://issues.apache.org/jira/browse/HBASE-3741

Also take into account this bug:
https://issues.apache.org/jira/browse/HBASE-3669

J-D

On Thu, Apr 28, 2011 at 3:00 AM, Gaojinchao <ga...@huawei.com> wrote:
> In my test cluster (one HMaster and two region servers), the .META. table can't be assigned.
> I find that the assigned meta region timed out and was reopened.
>
> I think we should set the default value of hbase.master.assignment.timeoutmonitor.timeout higher.
>
> 2011-04-14 11:48:19,240 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: .META.,,1.1028785192
> 2011-04-14 11:48:19,252 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing open of .META.,,1.1028785192
> 2011-04-14 11:48:19,257 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x12f52766dcd0014 Attempting to transition node 1028785192/.META. from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING
> 2011-04-14 11:48:19,291 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x12f52766dcd0014 Successfully transitioned node 1028785192 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING
> 2011-04-14 11:48:19,311 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Opening region: REGION => {NAME => '.META.,,1', STARTKEY => '', ENDKEY => '', ENCODED => 1028785192, TABLE => {{NAME => '.META.', IS_META => 'true', FAMILIES => [{NAME => 'info', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '10', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '8192', IN_MEMORY => 'true', BLOCKCACHE => 'true'}]}}
> 2011-04-14 11:48:19,814 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Instantiated .META.,,1.1028785192
> 2011-04-14 11:48:21,297 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://10.18.52.108:9000/hbase/.META./1028785192/info/3950786077714265980, isReference=false, isBulkLoadResult=false, seqid=204, majorCompaction=false
> 2011-04-14 11:48:21,796 INFO org.apache.hadoop.hbase.regionserver.HRegion: Replaying edits from hdfs://10.18.52.108:9000/hbase/.META./1028785192/recovered.edits/0000000000000000207; minSequenceid=204
> 2011-04-14 11:48:59,243 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: .META.,,1.1028785192
> 2011-04-14 11:49:22,391 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x12f52766dcd0014 Attempting to transition node 1028785192/.META. from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING
> 2011-04-14 11:49:22,396 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x12f52766dcd0014 Attempt to transition the unassigned node for 1028785192 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING failed, the node existed but was version 102 not the expected version 101
> 2011-04-14 11:49:22,396 WARN org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed refreshing OPENING; region=1028785192, context=open_region_progress
> 2011-04-14 11:49:22,396 WARN org.apache.hadoop.hbase.regionserver.HRegion: Progressable reporter failed, stopping replay
> 2011-04-14 11:49:22,447 ERROR org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed open of region=.META.,,1.1028785192
> java.io.IOException: Progressable reporter failed, stopping replay
>        at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:1903)
>        at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:1828)
>        at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:353)
>        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2546)
>        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2532)
>        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:262)
>        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:94)
>        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
> 2011-04-14 11:49:22,452 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing open of .META.,,1.1028785192
> 2011-04-14 11:49:22,452 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x12f52766dcd0014 Attempting to transition node 1028785192/.META. from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING
> 2011-04-14 11:49:22,523 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x12f52766dcd0014 Successfully transitioned node 1028785192 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING
> 2011-04-14 11:49:22,524 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Opening region: REGION => {NAME => '.META.,,1', STARTKEY => '', ENDKEY => '', ENCODED => 1028785192, TABLE => {{NAME => '.META.', IS_META => 'true', FAMILIES => [{NAME => 'info', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '10', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '8192', IN_MEMORY => 'true', BLOCKCACHE => 'true'}]}}
> 2011-04-14 11:49:22,830 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Instantiated .META.,,1.1028785192
> 2011-04-14 11:49:24,194 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://10.18.52.108:9000/hbase/.META./1028785192/info/3950786077714265980, isReference=false, isBulkLoadResult=false, seqid=204, majorCompaction=false
> 2011-04-14 11:49:24,520 INFO org.apache.hadoop.hbase.regionserver.HRegion: Replaying edits from hdfs://10.18.52.108:9000/hbase/.META./1028785192/recovered.edits/0000000000000000207; minSequenceid=204
> 2011-04-14 11:49:59,234 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: .META.,,1.1028785192
> 2011-04-14 11:50:19,620 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x12f52766dcd0014 Attempting to transition node 1028785192/.META. from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING
> 2011-04-14 11:50:19,624 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x12f52766dcd0014 Attempt to transition the unassigned node for 1028785192 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING failed, the node existed but was version 104 not the expected version 103
> 2011-04-14 11:50:19,624 WARN org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed refreshing OPENING; region=1028785192, context=open_region_progress
> 2011-04-14 11:50:19,624 WARN org.apache.hadoop.hbase.regionserver.HRegion: Progressable reporter failed, stopping replay
> 2011-04-14 11:50:19,662 ERROR org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed open of region=.META.,,1.1028785192
> java.io.IOException: Progressable reporter failed, stopping replay
>        at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:1903)
>        at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:1828)
>        at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:353)
>        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2546)
>        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2532)
>        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:262)
>        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:94)
>        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
> 2011-04-14 11:50:19,663 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing open of .META.,,1.1028785192
>
>

About parameter

Posted by Gaojinchao <ga...@huawei.com>.
In my test cluster (one HMaster and two region servers), the .META. table can't be assigned.
I find that the assigned meta region timed out and was reopened.

I think we should set the default value of hbase.master.assignment.timeoutmonitor.timeout higher.
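
For illustration only, here is a sketch of bumping that timeout
programmatically; the 180000 ms value is an assumption, not a tested
recommendation, and in practice the property would normally be set in
hbase-site.xml instead:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class AssignmentTimeoutExample {
  public static void main(String[] args) {
    // Start from the default HBase configuration (picks up hbase-site.xml if present).
    Configuration conf = HBaseConfiguration.create();
    // Give the master's timeout monitor more time before it decides a region
    // open has timed out and re-assigns it. 180000 ms (3 minutes) is illustrative only.
    conf.setLong("hbase.master.assignment.timeoutmonitor.timeout", 180000L);
    System.out.println("assignment timeout = "
        + conf.getLong("hbase.master.assignment.timeoutmonitor.timeout", -1L) + " ms");
  }
}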

2011-04-14 11:48:19,240 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: .META.,,1.1028785192
2011-04-14 11:48:19,252 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing open of .META.,,1.1028785192
2011-04-14 11:48:19,257 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x12f52766dcd0014 Attempting to transition node 1028785192/.META. from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING
2011-04-14 11:48:19,291 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x12f52766dcd0014 Successfully transitioned node 1028785192 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING
2011-04-14 11:48:19,311 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Opening region: REGION => {NAME => '.META.,,1', STARTKEY => '', ENDKEY => '', ENCODED => 1028785192, TABLE => {{NAME => '.META.', IS_META => 'true', FAMILIES => [{NAME => 'info', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '10', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '8192', IN_MEMORY => 'true', BLOCKCACHE => 'true'}]}}
2011-04-14 11:48:19,814 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Instantiated .META.,,1.1028785192
2011-04-14 11:48:21,297 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://10.18.52.108:9000/hbase/.META./1028785192/info/3950786077714265980, isReference=false, isBulkLoadResult=false, seqid=204, majorCompaction=false
2011-04-14 11:48:21,796 INFO org.apache.hadoop.hbase.regionserver.HRegion: Replaying edits from hdfs://10.18.52.108:9000/hbase/.META./1028785192/recovered.edits/0000000000000000207; minSequenceid=204
2011-04-14 11:48:59,243 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: .META.,,1.1028785192
2011-04-14 11:49:22,391 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x12f52766dcd0014 Attempting to transition node 1028785192/.META. from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING
2011-04-14 11:49:22,396 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x12f52766dcd0014 Attempt to transition the unassigned node for 1028785192 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING failed, the node existed but was version 102 not the expected version 101
2011-04-14 11:49:22,396 WARN org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed refreshing OPENING; region=1028785192, context=open_region_progress
2011-04-14 11:49:22,396 WARN org.apache.hadoop.hbase.regionserver.HRegion: Progressable reporter failed, stopping replay
2011-04-14 11:49:22,447 ERROR org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed open of region=.META.,,1.1028785192
java.io.IOException: Progressable reporter failed, stopping replay
	at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:1903)
	at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:1828)
	at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:353)
	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2546)
	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2532)
	at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:262)
	at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:94)
	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:619)
2011-04-14 11:49:22,452 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing open of .META.,,1.1028785192
2011-04-14 11:49:22,452 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x12f52766dcd0014 Attempting to transition node 1028785192/.META. from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING
2011-04-14 11:49:22,523 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x12f52766dcd0014 Successfully transitioned node 1028785192 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING
2011-04-14 11:49:22,524 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Opening region: REGION => {NAME => '.META.,,1', STARTKEY => '', ENDKEY => '', ENCODED => 1028785192, TABLE => {{NAME => '.META.', IS_META => 'true', FAMILIES => [{NAME => 'info', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '10', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '8192', IN_MEMORY => 'true', BLOCKCACHE => 'true'}]}}
2011-04-14 11:49:22,830 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Instantiated .META.,,1.1028785192
2011-04-14 11:49:24,194 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://10.18.52.108:9000/hbase/.META./1028785192/info/3950786077714265980, isReference=false, isBulkLoadResult=false, seqid=204, majorCompaction=false
2011-04-14 11:49:24,520 INFO org.apache.hadoop.hbase.regionserver.HRegion: Replaying edits from hdfs://10.18.52.108:9000/hbase/.META./1028785192/recovered.edits/0000000000000000207; minSequenceid=204
2011-04-14 11:49:59,234 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: .META.,,1.1028785192
2011-04-14 11:50:19,620 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x12f52766dcd0014 Attempting to transition node 1028785192/.META. from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING
2011-04-14 11:50:19,624 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x12f52766dcd0014 Attempt to transition the unassigned node for 1028785192 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING failed, the node existed but was version 104 not the expected version 103
2011-04-14 11:50:19,624 WARN org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed refreshing OPENING; region=1028785192, context=open_region_progress
2011-04-14 11:50:19,624 WARN org.apache.hadoop.hbase.regionserver.HRegion: Progressable reporter failed, stopping replay
2011-04-14 11:50:19,662 ERROR org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed open of region=.META.,,1.1028785192
java.io.IOException: Progressable reporter failed, stopping replay
	at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:1903)
	at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:1828)
	at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:353)
	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2546)
	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2532)
	at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:262)
	at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:94)
	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:619)
2011-04-14 11:50:19,663 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing open of .META.,,1.1028785192


Re: found one deadlock on hbase?

Posted by Zhoushuaifeng <zh...@huawei.com>.
I have opened an issue in the HBase JIRA.
Thanks for paying attention to it.
https://issues.apache.org/jira/browse/HBASE-3830


Zhou Shuaifeng(Frank)


-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
Sent: April 29, 2011 13:19
To: user@hbase.apache.org
Cc: Yanlijun; Chenjian
Subject: Re: found one deadlock on hbase?

I'd say file an issue with the info that is in this thread.  We've not
seen this before, to our knowledge; maybe there is something to it,
but as is, the thread dump is not telling a clean story.

"IPC Server handler 9 on 60020" waiting on "IPC Server handler 7 on
60020" is fine but looking at code, I can't see how
"regionserver60020.cacheFlusher" would be in on the mix.  I see how it
takes out the reentrant lock to add a compaction request to a queue.
In your stack trace, it looks like it's the Log.debug String
construction that somehow ends up wanting to call
reclaimMemStoreMemory but can't get in because of "IPC Server handler
7 on 60020".

Thanks for digging in on this,
St.Ack


Re: found one deadlock on hbase?

Posted by Stack <st...@duboce.net>.
I'd say file an issue with the info that is in this thread.  We've not
seen this before, to our knowledge; maybe there is something to it,
but as is, the thread dump is not telling a clean story.

"IPC Server handler 9 on 60020" waiting on "IPC Server handler 7 on
60020" is fine but looking at code, I can't see how
"regionserver60020.cacheFlusher" would be in on the mix.  I see how it
takes out the reentrant lock to add a compaction request to a queue.
In your stack trace, it looks like it's the Log.debug String
construction that somehow ends up wanting to call
reclaimMemStoreMemory but can't get in because of "IPC Server handler
7 on 60020".
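
(Not the actual HBase code, just a sketch of the pattern at play: when the
debug message is expensive to build (here a CompactionRequest.toString()
that drags Date.toString() and ResourceBundle/time-zone lookups into the
queue-insert path), guarding it with isDebugEnabled() keeps that work out
of the locked section. Class and method names below are hypothetical.)

import java.util.Date;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

class CompactionQueueSketch {
  private static final Log LOG = LogFactory.getLog(CompactionQueueSketch.class);

  void addToQueue(Object request) {
    // Building the message eagerly would call request.toString() and
    // new Date().toString() (and, through them, locale/time-zone lookups)
    // even when DEBUG logging is off.
    if (LOG.isDebugEnabled()) {
      LOG.debug("Queued compaction request " + request + " at " + new Date());
    }
  }
}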

Thanks for digging in on this,
St.Ack

2011/4/28 Zhoushuaifeng <zh...@huawei.com>:
> Yes, the profiler is enabled; maybe this is the problem.
>
> Zhou Shuaifeng(Frank)
>
>
> -----Original Message-----
> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
> Sent: April 29, 2011 12:38
> To: user@hbase.apache.org
> Cc: Yanlijun; Chenjian
> Subject: Re: found one deadlock on hbase?
>
> Hmm.
>
> The profiler is enabled when you see this?
>
> Something is way off with the last of the threads showing in your thread dump:
>
>
> "regionserver60020.cacheFlusher":
>               at java.util.ResourceBundle.endLoading(ResourceBundle.java:1506)
>               - waiting to lock <0x00007fe7cbacbd48> (a
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
>               at java.util.ResourceBundle.findBundle(ResourceBundle.java:1379)
>               at java.util.ResourceBundle.findBundle(ResourceBundle.java:1292)
>               at
> java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1234)
>               at java.util.ResourceBundle.getBundle(ResourceBundle.java:832)
> ....
>
>
> How is it that we are trying to get into a synchronized hbase method,
> MemStoreFlusher, in the depths of an i18n call; we're trying to append
> a locale-appropriate date to a String.
>
> Something is way off?
>
> St.Ack
>
>

Re: found one deadlock on hbase?

Posted by Zhoushuaifeng <zh...@huawei.com>.
Yes, the profiler is enabled; maybe this is the problem.

Zhou Shuaifeng(Frank)


-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
Sent: April 29, 2011 12:38
To: user@hbase.apache.org
Cc: Yanlijun; Chenjian
Subject: Re: found one deadlock on hbase?

Hmm.

The profiler is enabled when you see this?

Something is way off with the last of the threads showing in your thread dump:


"regionserver60020.cacheFlusher":
               at java.util.ResourceBundle.endLoading(ResourceBundle.java:1506)
               - waiting to lock <0x00007fe7cbacbd48> (a
org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
               at java.util.ResourceBundle.findBundle(ResourceBundle.java:1379)
               at java.util.ResourceBundle.findBundle(ResourceBundle.java:1292)
               at
java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1234)
               at java.util.ResourceBundle.getBundle(ResourceBundle.java:832)
....


How is it that we are trying to get into a synchronized hbase method,
MemStoreFlusher, in the depths of an i18n call; we're trying to append
a locale-appropriate date to a String.

Something is way off?

St.Ack


Re: found one deadlock on hbase?

Posted by Stack <st...@duboce.net>.
Hmm.

The profiler is enabled when you see this?

Something is way off with the last of the threads showing in your thread dump:


"regionserver60020.cacheFlusher":
               at java.util.ResourceBundle.endLoading(ResourceBundle.java:1506)
               - waiting to lock <0x00007fe7cbacbd48> (a
org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
               at java.util.ResourceBundle.findBundle(ResourceBundle.java:1379)
               at java.util.ResourceBundle.findBundle(ResourceBundle.java:1292)
               at
java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1234)
               at java.util.ResourceBundle.getBundle(ResourceBundle.java:832)
....


How is it that we are trying to get into a synchronized hbase method,
MemStoreFlusher, in the depths of an i18n call; we're trying to append
a locale-appropriate date to a String.

Something is way off?

St.Ack

2011/4/28 Zhoushuaifeng <zh...@huawei.com>:
> I rechecked this, and maybe it's not so bad.
> I read the javadoc of the lock, which says:
> Acquires the lock if it is not held by another thread and returns immediately, setting the lock hold count to one.
>
> If the current thread already holds the lock then the hold count is incremented by one and the method returns immediately.
>
> If the lock is held by another thread then the current thread becomes disabled for thread scheduling purposes and lies dormant until the lock has been acquired, at which time the lock hold count is set to one.
>
> Specified by: lock() in Lock
>
> The put op ends up calling lock() like this:
>        this.cacheFlusher.reclaimMemStoreMemory();
> So it's still locked by the cacheFlusher. If so, it's locked by the same thread (cacheFlusher) and can be locked at the same time as flushRegion.
> I'm not so familiar with ReentrantLock, so please check whether I'm right. If not, this is a critical priority issue.
>
>
> Zhou Shuaifeng(Frank)
>
>
> -----Original Message-----
> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
> Sent: April 29, 2011 11:48
> To: user@hbase.apache.org
> Cc: Yanlijun; Chenjian
> Subject: Re: found one deadlock on hbase?
>
> Yes.  The below looks viable (though strange we have not seen it up to
> this).  The profiler may have slowed things to bring on the deadlock
> -- or the run up to the high water mark -- but it's still a deadlock.
> Please file a critical priority issue.
>
> If you have a patch, that'd be excellent.
>
> Thanks for digging in on this,
> St.Ack
>
>
> 2011/4/28 Zhoushuaifeng <zh...@huawei.com>:
>> Thanks, I will do more tests.
>> Maybe the deadlock happened like this? Please point it out if it's wrong.
>>
>> 1. One handler is handling a put op and calls reclaimMemStoreMemory, but memory is above the high water mark, so this handler holds the MemStoreFlusher lock until global memory drops:
>>
>> public synchronized void reclaimMemStoreMemory() {
>>    if (isAboveHighWaterMark()) {
>>      lock.lock();
>>      try {
>>        while (isAboveHighWaterMark() && !server.isStopped()) {
>>          wakeupFlushThread();
>>          try {
>>            // we should be able to wait forever, but we've seen a bug where
>>            // we miss a notify, so put a 5 second bound on it at least.
>>            flushOccurred.await(5, TimeUnit.SECONDS);
>>          } catch (InterruptedException ie) {
>>            Thread.currentThread().interrupt();
>>          }
>>        }
>>      } finally {
>>        lock.unlock();
>>
>> 2. flushOneForGlobalPressure is triggered, but to flush the memstore it needs to lock the MemStoreFlusher:
>>
>>  private boolean flushRegion(final HRegion region, final boolean emergencyFlush) {
>>    synchronized (this.regionsInQueue) {
>>      FlushRegionEntry fqe = this.regionsInQueue.remove(region);
>>      if (fqe != null && emergencyFlush) {
>>        // Need to remove from region from delay queue.  When NOT an
>>        // emergencyFlush, then item was removed via a flushQueue.poll.
>>        flushQueue.remove(fqe);
>>     }
>>     lock.lock();
>>    }
>>
>> 3. Because the lock is held by the IPC handler of the put op, flushRegion will never get the lock and the flush will never happen.
>> 4. With no flush, memory stays above the high water mark and the lock is never released, so the deadlock happens.
>>
>> Is it right?
>>
>> Zhou Shuaifeng(Frank)
>>
>

Re: found one deadlock on hbase?

Posted by Zhoushuaifeng <zh...@huawei.com>.
I rechecked this, and maybe it's not so bad.
I read the javadoc of the lock, which says:
Acquires the lock if it is not held by another thread and returns immediately, setting the lock hold count to one. 

If the current thread already holds the lock then the hold count is incremented by one and the method returns immediately. 

If the lock is held by another thread then the current thread becomes disabled for thread scheduling purposes and lies dormant until the lock has been acquired, at which time the lock hold count is set to one.

Specified by: lock() in Lock

The put op ends up calling lock() like this:
        this.cacheFlusher.reclaimMemStoreMemory();
So it's still locked by the cacheFlusher. If so, it's locked by the same thread (cacheFlusher) and can be locked at the same time as flushRegion.
I'm not so familiar with ReentrantLock, so please check whether I'm right. If not, this is a critical priority issue.
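
A minimal standalone sketch (not HBase code) of what that javadoc means:
the same thread can re-acquire a ReentrantLock, but a second thread blocks
until it is released. Note that in the dump above the put path runs on an
IPC handler thread while flushRegion runs on the cacheFlusher thread, so
reentrancy does not apply across them:

import java.util.concurrent.locks.ReentrantLock;

public class ReentrancySketch {
  public static void main(String[] args) throws InterruptedException {
    final ReentrantLock lock = new ReentrantLock();

    lock.lock();
    lock.lock();   // same thread: re-entry succeeds, hold count becomes 2
    System.out.println("hold count in main thread: " + lock.getHoldCount());
    lock.unlock();
    lock.unlock(); // must unlock once per lock() call

    Thread other = new Thread(new Runnable() {
      public void run() {
        lock.lock(); // a different thread blocks here until the lock is free
        try {
          System.out.println("other thread acquired the lock");
        } finally {
          lock.unlock();
        }
      }
    });
    other.start();
    other.join();
  }
}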


Zhou Shuaifeng(Frank)


-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
Sent: April 29, 2011 11:48
To: user@hbase.apache.org
Cc: Yanlijun; Chenjian
Subject: Re: found one deadlock on hbase?

Yes.  The below looks viable (though strange we have not seen it up to
this).  The profiler may have slowed things to bring on the deadlock
-- or the run up to the high water mark -- but it's still a deadlock.
Please file a critical priority issue.

If you have a patch, that'd be excellent.

Thanks for digging in on this,
St.Ack


2011/4/28 Zhoushuaifeng <zh...@huawei.com>:
> Thanks, I will do more tests.
> Maybe the deadlock happened like this? Please point it out if it's wrong.
>
> 1. One handler is handling a put op and calls reclaimMemStoreMemory, but memory is above the high water mark, so this handler holds the MemStoreFlusher lock until global memory drops:
>
> public synchronized void reclaimMemStoreMemory() {
>    if (isAboveHighWaterMark()) {
>      lock.lock();
>      try {
>        while (isAboveHighWaterMark() && !server.isStopped()) {
>          wakeupFlushThread();
>          try {
>            // we should be able to wait forever, but we've seen a bug where
>            // we miss a notify, so put a 5 second bound on it at least.
>            flushOccurred.await(5, TimeUnit.SECONDS);
>          } catch (InterruptedException ie) {
>            Thread.currentThread().interrupt();
>          }
>        }
>      } finally {
>        lock.unlock();
>
> 2. flushOneForGlobalPressure is triggered, but to flush the memstore it needs to lock the MemStoreFlusher:
>
>  private boolean flushRegion(final HRegion region, final boolean emergencyFlush) {
>    synchronized (this.regionsInQueue) {
>      FlushRegionEntry fqe = this.regionsInQueue.remove(region);
>      if (fqe != null && emergencyFlush) {
>        // Need to remove from region from delay queue.  When NOT an
>        // emergencyFlush, then item was removed via a flushQueue.poll.
>        flushQueue.remove(fqe);
>     }
>     lock.lock();
>    }
>
> 3. Because the lock is held by the IPC handler of the put op, flushRegion will never get the lock and the flush will never happen.
> 4. With no flush, memory stays above the high water mark and the lock is never released, so the deadlock happens.
>
> Is it right?
>
> Zhou Shuaifeng(Frank)
>

Re: found one deadlock on hbase?

Posted by Stack <st...@duboce.net>.
Yes.  The below looks viable (though strange we have not seen it up to
this).  The profiler may have slowed things to bring on the deadlock
-- or the run up to the high water mark -- but it's still a deadlock.
Please file a critical priority issue.

If you have a patch, that'd be excellent.

Thanks for digging in on this,
St.Ack


2011/4/28 Zhoushuaifeng <zh...@huawei.com>:
> Thanks, I will do more tests.
> Maybe the deadlock happened like this? Please point it out if it's wrong.
>
> 1. One handler is handling a put op and calls reclaimMemStoreMemory, but memory is above the high water mark, so this handler holds the MemStoreFlusher lock until global memory drops:
>
> public synchronized void reclaimMemStoreMemory() {
>    if (isAboveHighWaterMark()) {
>      lock.lock();
>      try {
>        while (isAboveHighWaterMark() && !server.isStopped()) {
>          wakeupFlushThread();
>          try {
>            // we should be able to wait forever, but we've seen a bug where
>            // we miss a notify, so put a 5 second bound on it at least.
>            flushOccurred.await(5, TimeUnit.SECONDS);
>          } catch (InterruptedException ie) {
>            Thread.currentThread().interrupt();
>          }
>        }
>      } finally {
>        lock.unlock();
>
> 2. flushOneForGlobalPressure is triggered, but to flush the memstore it needs to lock the MemStoreFlusher:
>
>  private boolean flushRegion(final HRegion region, final boolean emergencyFlush) {
>    synchronized (this.regionsInQueue) {
>      FlushRegionEntry fqe = this.regionsInQueue.remove(region);
>      if (fqe != null && emergencyFlush) {
>        // Need to remove from region from delay queue.  When NOT an
>        // emergencyFlush, then item was removed via a flushQueue.poll.
>        flushQueue.remove(fqe);
>     }
>     lock.lock();
>    }
>
> 3. Because the lock is held by the IPC handler of the put op, flushRegion will never get the lock and the flush will never happen.
> 4. With no flush, memory stays above the high water mark and the lock is never released, so the deadlock happens.
>
> Is it right?
>
> Zhou Shuaifeng(Frank)
>
>
> -----Original Message-----
> From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of Jean-Daniel Cryans
> Sent: April 29, 2011 3:09
> To: user@hbase.apache.org
> Subject: Re: found one deadlock on hbase?
>
> Like I said in the previous thread you made about this issue, it seems
> that the YourKit profiler is doing something unexpected from the HBase
> POV. Can you try running without it and see if it still happens?
>
> J-D
>
> 2011/4/28 Zhoushuaifeng <zh...@huawei.com>:
>> Thanks, version is 0.90.1
>>
>> Zhou Shuaifeng(Frank)
>>
>> -----Original Message-----
>> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
>> Sent: April 28, 2011 13:10
>> To: user@hbase.apache.org
>> Cc: Yanlijun
>> Subject: Re: found one deadlock on hbase?
>>
>> Must be a deadlock if the dumb JVM can figure it out.  What version of
>> hbase please so I can dig into source code?
>> Thanks,
>> St.Ack
>>
>>
>

Re: found one deadlock on hbase?

Posted by Zhoushuaifeng <zh...@huawei.com>.
Thanks, I will do more tests.
Maybe the deadlock happened like this? Please point it out if it's wrong.

1. One handler is handling a put op and calls reclaimMemStoreMemory, but memory is above the high water mark, so this handler holds the MemStoreFlusher lock until global memory drops:

public synchronized void reclaimMemStoreMemory() {
    if (isAboveHighWaterMark()) {
      lock.lock();
      try {
        while (isAboveHighWaterMark() && !server.isStopped()) {
          wakeupFlushThread();
          try {
            // we should be able to wait forever, but we've seen a bug where
            // we miss a notify, so put a 5 second bound on it at least.
            flushOccurred.await(5, TimeUnit.SECONDS);
          } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
          }
        }
      } finally {
        lock.unlock();
        
2. flushOneForGlobalPressure is triggered, but to flush the memstore it needs to lock the MemStoreFlusher:

  private boolean flushRegion(final HRegion region, final boolean emergencyFlush) {
    synchronized (this.regionsInQueue) {
      FlushRegionEntry fqe = this.regionsInQueue.remove(region);
      if (fqe != null && emergencyFlush) {
        // Need to remove from region from delay queue.  When NOT an
        // emergencyFlush, then item was removed via a flushQueue.poll.
        flushQueue.remove(fqe);
     }
     lock.lock();
    }
    
3. Because the lock is held by the IPC handler of the put op, flushRegion will never get the lock and the flush will never happen.
4. With no flush, memory stays above the high water mark and the lock is never released, so the deadlock happens.

Is it right?
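
As an aside, here is a stripped-down sketch (not HBase code) of the
monitor-versus-ReentrantLock cross-wait reported in the jstack output at
the top of this thread; it simplifies away the flush and condition-wait
details, and the thread names are only stand-ins for the roles in the dump.
Run it and jstack will normally report a Java-level deadlock of the same
shape:

import java.util.concurrent.locks.ReentrantLock;

public class CrossLockDeadlockSketch {
  // The monitor stands in for the synchronized MemStoreFlusher instance,
  // the ReentrantLock for its internal lock.
  private static final Object monitor = new Object();
  private static final ReentrantLock lock = new ReentrantLock();

  public static void main(String[] args) {
    Thread handler = new Thread(new Runnable() { // plays the IPC handler role
      public void run() {
        synchronized (monitor) {
          sleep(100);  // give the other thread time to grab the ReentrantLock
          lock.lock(); // blocks: the flusher thread holds the lock
          try { } finally { lock.unlock(); }
        }
      }
    }, "handler");

    Thread flusher = new Thread(new Runnable() { // plays the cacheFlusher role
      public void run() {
        lock.lock();
        try {
          sleep(100);                // give the other thread time to enter the monitor
          synchronized (monitor) { } // blocks: the handler thread holds the monitor
        } finally {
          lock.unlock();
        }
      }
    }, "flusher");

    handler.start();
    flusher.start();
  }

  private static void sleep(long ms) {
    try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
  }
}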

Zhou Shuaifeng(Frank)


-----Original Message-----
From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of Jean-Daniel Cryans
Sent: April 29, 2011 3:09
To: user@hbase.apache.org
Subject: Re: found one deadlock on hbase?

Like I said in the previous thread you made about this issue, it seems
that the YourKit profiler is doing something unexpected from the HBase
POV. Can you try running without it and see if it still happens?

J-D

2011/4/28 Zhoushuaifeng <zh...@huawei.com>:
> Thanks, version is 0.90.1
>
> Zhou Shuaifeng(Frank)
>
> -----Original Message-----
> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
> Sent: April 28, 2011 13:10
> To: user@hbase.apache.org
> Cc: Yanlijun
> Subject: Re: found one deadlock on hbase?
>
> Must be a deadlock if the dumb JVM can figure it out.  What version of
> hbase please so I can dig into source code?
> Thanks,
> St.Ack
>
>

Re: found one deadlock on hbase?

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Like I said in the previous thread you made about this issue, it seems
that the YourKit profiler is doing something unexpected from the HBase
POV. Can you try running without it and see if it still happens?

J-D

2011/4/28 Zhoushuaifeng <zh...@huawei.com>:
> Thanks, version is 0.90.1
>
> Zhou Shuaifeng(Frank)
>
> -----Original Message-----
> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
> Sent: April 28, 2011 13:10
> To: user@hbase.apache.org
> Cc: Yanlijun
> Subject: Re: found one deadlock on hbase?
>
> Must be a deadlock if the dumb JVM can figure it out.  What version of
> hbase please so I can dig into source code?
> Thanks,
> St.Ack
>
>

Re: found one deadlock on hbase?

Posted by Zhoushuaifeng <zh...@huawei.com>.
Thanks, version is 0.90.1

Zhou Shuaifeng(Frank)

-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
Sent: April 28, 2011 13:10
To: user@hbase.apache.org
Cc: Yanlijun
Subject: Re: found one deadlock on hbase?

Must be a deadlock if the dumb JVM can figure it out.  What version of
hbase please so I can dig into source code?
Thanks,
St.Ack


Re: found one deadlock on hbase?

Posted by Stack <st...@duboce.net>.
Must be a deadlock if the dumb JVM can figure it out.  What version of
hbase please so I can dig into source code?
Thanks,
St.Ack

On Wed, Apr 27, 2011 at 9:08 PM, Zhoushuaifeng <zh...@huawei.com> wrote:
> Logs are below. Is this a deadlock in HBase? How does it happen, and how can it be avoided?
>
> Found one Java-level deadlock:
> =============================
> "IPC Server handler 9 on 60020":
>  waiting to lock monitor 0x00000000409f3908 (object 0x00007fe7cbacbd48, a org.apache.hadoop.hbase.regionserver.MemStoreFlusher),
>  which is held by "IPC Server handler 7 on 60020"
> "IPC Server handler 7 on 60020":
>  waiting for ownable synchronizer 0x00007fe7cbb06228, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
>  which is held by "regionserver60020.cacheFlusher"
> "regionserver60020.cacheFlusher":
>  waiting to lock monitor 0x00000000409f3908 (object 0x00007fe7cbacbd48, a org.apache.hadoop.hbase.regionserver.MemStoreFlusher),
>  which is held by "IPC Server handler 7 on 60020"
>
> Java stack information for the threads listed above:
> ===================================================
> "IPC Server handler 9 on 60020":
>                at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java)
>                - waiting to lock <0x00007fe7cbacbd48> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
>                at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2558)
>                at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
>                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>                at java.lang.reflect.Method.invoke(Method.java:597)
>                at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
>                at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
> "IPC Server handler 7 on 60020":
>                at sun.misc.Unsafe.$$YJP$$park(Native Method)
>                - parking to wait for  <0x00007fe7cbb06228> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
>                at sun.misc.Unsafe.park(Unsafe.java)
>                at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>                at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
>                at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
>                at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
>                at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
>                at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
>                at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java:429)
>                - locked <0x00007fe7cbacbd48> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
>                at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2558)
>                at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
>                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>                at java.lang.reflect.Method.invoke(Method.java:597)
>                at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
>                at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
> "regionserver60020.cacheFlusher":
>                at java.util.ResourceBundle.endLoading(ResourceBundle.java:1506)
>                - waiting to lock <0x00007fe7cbacbd48> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
>                at java.util.ResourceBundle.findBundle(ResourceBundle.java:1379)
>                at java.util.ResourceBundle.findBundle(ResourceBundle.java:1292)
>                at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1234)
>                at java.util.ResourceBundle.getBundle(ResourceBundle.java:832)
>                at sun.util.resources.LocaleData$1.run(LocaleData.java:127)
>                at java.security.AccessController.$$YJP$$doPrivileged(Native Method)
>                at java.security.AccessController.doPrivileged(AccessController.java)
>                at sun.util.resources.LocaleData.getBundle(LocaleData.java:125)
>                at sun.util.resources.LocaleData.getTimeZoneNames(LocaleData.java:97)
>                at sun.util.TimeZoneNameUtility.getBundle(TimeZoneNameUtility.java:115)
>                at sun.util.TimeZoneNameUtility.retrieveDisplayNames(TimeZoneNameUtility.java:80)
>                at java.util.TimeZone.getDisplayNames(TimeZone.java:399)
>                at java.util.TimeZone.getDisplayName(TimeZone.java:350)
>                at java.util.Date.toString(Date.java:1025)
>                at java.lang.String.valueOf(String.java:2826)
>                at java.lang.StringBuilder.append(StringBuilder.java:115)
>                at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue$CompactionRequest.toString(PriorityCompactionQueue.java:114)
>                at java.lang.String.valueOf(String.java:2826)
>                at java.lang.StringBuilder.append(StringBuilder.java:115)
>                at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue.addToRegionsInQueue(PriorityCompactionQueue.java:145)
>                - locked <0x00007fe7ccabd258> (a java.util.HashMap)
>                at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue.add(PriorityCompactionQueue.java:188)
>                at org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestCompaction(CompactSplitThread.java:140)
>                - locked <0x00007fe7cbaf08c8> (a org.apache.hadoop.hbase.regionserver.CompactSplitThread)
>                at org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestCompaction(CompactSplitThread.java:118)
>                - locked <0x00007fe7cbaf08c8> (a org.apache.hadoop.hbase.regionserver.CompactSplitThread)
>                at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:387)
>                at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:194)
>                at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:214)
>
> Found 1 deadlock.
>
> Zhou Shuaifeng(Frank)
>
>
>
>
>