You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@phoenix.apache.org by "Rajeshbabu Chintaguntla (JIRA)" <ji...@apache.org> on 2017/08/29 07:25:00 UTC

[jira] [Commented] (PHOENIX-4131) UngroupedAggregateRegionObserver.preClose() and doPostScannerOpen() can deadlock

    [ https://issues.apache.org/jira/browse/PHOENIX-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144862#comment-16144862 ] 

Rajeshbabu Chintaguntla commented on PHOENIX-4131:
--------------------------------------------------

[~samarthjain] are you working on it or you want me to take  a look?

> UngroupedAggregateRegionObserver.preClose() and doPostScannerOpen() can deadlock
> --------------------------------------------------------------------------------
>
>                 Key: PHOENIX-4131
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4131
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Samarth Jain
>            Assignee: Samarth Jain
>
> On my local test run I saw that the tests were not completing because the mini cluster couldn't shut down. So I took a jstack and discovered the following deadlock:
> {code}
> "RS:0;samarthjai-wsm4:59006" #16265 prio=5 os_prio=31 tid=0x00007fafa6327000 nid=0x37b3f runnable [0x00007000115f5000]
>    java.lang.Thread.State: RUNNABLE
> 	at java.lang.Object.wait(Native Method)
> 	at org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.preClose(UngroupedAggregateRegionObserver.java:1201)
> 	- locked <0x000000072bc406b8> (a java.lang.Object)
> 	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$4.call(RegionCoprocessorHost.java:494)
> 	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1673)
> 	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1749)
> 	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preClose(RegionCoprocessorHost.java:490)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2843)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegionIgnoreErrors(HRegionServer.java:2805)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.closeUserRegions(HRegionServer.java:2423)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1052)
> 	at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:157)
> 	at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:110)
> 	at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:141)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:360)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
> 	at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:334)
> 	at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:139)
> 	at java.lang.Thread.run(Thread.java:748)
> {code}
> {code}
> "RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=59006" #16246 daemon prio=5 os_prio=31 tid=0x00007fafae856000 nid=0x1abdb waiting for monitor entry [0x00007000102bc000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
> 	at org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.doPostScannerOpen(UngroupedAggregateRegionObserver.java:734)
> 	- waiting to lock <0x000000072bc406b8> (a java.lang.Object)
> 	at org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder.overrideDelegate(BaseScannerRegionObserver.java:236)
> 	at org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder.nextRaw(BaseScannerRegionObserver.java:281)
> 	at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2629)
> 	- locked <0x000000072b625a90> (a org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder)
> 	at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2833)
> 	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:34950)
> 	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2339)
> 	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
> 	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
> 	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
> {code}
> preClose() has the object monitor and is waiting for scanReferencesCount to go down to 0. doPostScannerOpen() is trying to acquire the same lock so that it can reduce the scanReferencesCount to 0.
> I think this bug was introduced in PHOENIX-3111 to solve other deadlocks. FYI, [~rajeshbabu], [~sergey.soldatov], [~enis], [~lhofhansl].



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)