Posted to notifications@asterixdb.apache.org by "Murtadha Hubail (JIRA)" <ji...@apache.org> on 2015/11/12 21:59:10 UTC

[jira] [Resolved] (ASTERIXDB-1170) Deadlock in shutdown with DatasetLifecycleManager

     [ https://issues.apache.org/jira/browse/ASTERIXDB-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Murtadha Hubail resolved ASTERIXDB-1170.
----------------------------------------
    Resolution: Fixed

This issue should now be fixed by the above commit, which broke the dependency between DatasetLifecycleManager and PrimaryIndexOperationTracker, and by commit fa7963b5ad4545ad0df1ce9ae6553253acbff524, which fixed the shutdown sequence.
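
For readers unfamiliar with the failure mode: in the quoted trace, one task thread holds the PrimaryIndexOperationTracker monitor and waits for the DatasetLifecycleManager monitor (in allocateDatasetMemory), while another holds the DatasetLifecycleManager monitor (in open) and waits for the tracker. The shutdown thread then queues up behind the manager monitor in flushAllDatasets. The sketch below is not the actual AsterixDB change. It is a minimal illustration, under my own assumptions, of one way to remove such a dependency: the tracker never calls into the manager while holding its own monitor, so the only nested acquisition left is manager then tracker. The names LockOrderSketch, Manager, Tracker, touch and runDemo are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class LockOrderSketch {

    /** Hypothetical stand-in for the lifecycle manager. */
    static final class Manager {
        private long allocated;

        synchronized void allocate(long bytes) {
            allocated += bytes;
        }

        /** Nested acquisition: manager monitor first, then tracker monitor. */
        synchronized void open(Tracker tracker) {
            tracker.touch();
        }

        synchronized long allocated() {
            return allocated;
        }
    }

    /** Hypothetical stand-in for the operation tracker. */
    static final class Tracker {
        private final Manager manager;
        private int activeOps;

        Tracker(Manager manager) {
            this.manager = manager;
        }

        void beforeOperation() {
            // Call the manager BEFORE taking our own monitor, so this thread
            // never holds the tracker while waiting for the manager.
            manager.allocate(1);
            synchronized (this) {
                activeOps++;
            }
        }

        synchronized void touch() {
            // Placeholder for work done under the tracker monitor.
        }

        synchronized int activeOps() {
            return activeOps;
        }
    }

    /** Runs both code paths concurrently; returns true if both finish in time. */
    static boolean runDemo(int iterations) {
        Manager manager = new Manager();
        Tracker tracker = new Tracker(manager);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(() -> {
            for (int i = 0; i < iterations; i++) {
                tracker.beforeOperation();
            }
        });
        pool.submit(() -> {
            for (int i = 0; i < iterations; i++) {
                manager.open(tracker);
            }
        });
        pool.shutdown();
        try {
            boolean finished = pool.awaitTermination(10, TimeUnit.SECONDS);
            if (!finished) {
                pool.shutdownNow();
            }
            return finished
                    && tracker.activeOps() == iterations
                    && manager.allocated() == iterations;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            pool.shutdownNow();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("completed without deadlock: " + runDemo(100_000));
    }
}
```

If beforeOperation were instead declared synchronized and called manager.allocate inside it, the two loops would acquire the same pair of monitors in opposite orders, which is the pattern visible in the trace.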

> Deadlock in shutdown with DatasetLifecycleManager
> -------------------------------------------------
>
>                 Key: ASTERIXDB-1170
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1170
>             Project: Apache AsterixDB
>          Issue Type: Bug
>            Reporter: Ian Maxon
>            Assignee: Murtadha Hubail
>         Attachments: trace.txt
>
>
> While cancelling a test run, I observed this deadlock in the DatasetLifecycleManager. It looks like the checkpoint thread is holding the optracker but needs the monitor on the DatasetLifecycleManager, and the DatasetLifecycleManager needs the converse. This, in turn, prevents a clean shutdown.
> "org.apache.hyracks.api.rewriter.runtime.SuperActivity:TAID:TID:ANID:ODID:0:0:2:0:0@5996" daemon prio=5 tid=0x74 nid=NA waiting for monitor entry
>   java.lang.Thread.State: BLOCKED
> 	 blocks org.apache.hyracks.api.rewriter.runtime.SuperActivity:TAID:TID:ANID:ODID:0:0:3:0:0@5995
> 	 waiting for org.apache.hyracks.api.rewriter.runtime.SuperActivity:TAID:TID:ANID:ODID:0:0:3:0:0@5995 to release lock on <0x17dc> (a org.apache.asterix.common.context.DatasetLifecycleManager)
> 	  at org.apache.asterix.common.context.DatasetLifecycleManager.allocateDatasetMemory(DatasetLifecycleManager.java:639)
> 	  at org.apache.asterix.common.context.PrimaryIndexOperationTracker.beforeOperation(PrimaryIndexOperationTracker.java:64)
> 	  at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.enterComponents(LSMHarness.java:180)
> 	  at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.getAndEnterComponents(LSMHarness.java:115)
> 	  - locked <0x17dd> (a org.apache.asterix.common.context.PrimaryIndexOperationTracker)
> 	  at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.modify(LSMHarness.java:333)
> 	  at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.modify(LSMHarness.java:327)
> 	  at org.apache.hyracks.storage.am.lsm.common.impls.LSMTreeIndexAccessor.insert(LSMTreeIndexAccessor.java:50)
> 	  at org.apache.asterix.common.dataflow.AsterixLSMInsertDeleteOperatorNodePushable.nextFrame(AsterixLSMInsertDeleteOperatorNodePushable.java:102)
> 	  at org.apache.hyracks.control.nc.Task.pushFrames(Task.java:342)
> 	  at org.apache.hyracks.control.nc.Task.run(Task.java:290)
> 	  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	  at java.lang.Thread.run(Thread.java:745)
> "org.apache.hyracks.api.rewriter.runtime.SuperActivity:TAID:TID:ANID:ODID:0:0:3:0:0@5995" daemon prio=5 tid=0x77 nid=NA waiting for monitor entry
>   java.lang.Thread.State: BLOCKED
> 	 blocks Thread-55@5983
> 	 blocks org.apache.hyracks.api.rewriter.runtime.SuperActivity:TAID:TID:ANID:ODID:0:0:2:0:0@5996
> 	 waiting for org.apache.hyracks.api.rewriter.runtime.SuperActivity:TAID:TID:ANID:ODID:0:0:2:0:0@5996 to release lock on <0x17dd> (a org.apache.asterix.common.context.PrimaryIndexOperationTracker)
> 	  at org.apache.asterix.common.context.DatasetLifecycleManager.open(DatasetLifecycleManager.java:205)
> 	  - locked <0x17dc> (a org.apache.asterix.common.context.DatasetLifecycleManager)
> 	  at org.apache.hyracks.storage.am.common.dataflow.IndexDataflowHelper.open(IndexDataflowHelper.java:116)
> 	  at org.apache.asterix.common.dataflow.AsterixLSMInsertDeleteOperatorNodePushable.open(AsterixLSMInsertDeleteOperatorNodePushable.java:61)
> 	  at org.apache.hyracks.control.nc.Task.pushFrames(Task.java:334)
> 	  at org.apache.hyracks.control.nc.Task.run(Task.java:290)
> 	  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	  at java.lang.Thread.run(Thread.java:745)
> "Thread-55@5983" prio=5 tid=0x5a nid=NA waiting for monitor entry
>   java.lang.Thread.State: BLOCKED
> 	 waiting for org.apache.hyracks.api.rewriter.runtime.SuperActivity:TAID:TID:ANID:ODID:0:0:3:0:0@5995 to release lock on <0x17dc> (a org.apache.asterix.common.context.DatasetLifecycleManager)
> 	  at org.apache.asterix.common.context.DatasetLifecycleManager.flushAllDatasets(DatasetLifecycleManager.java:474)
> 	  at org.apache.asterix.transaction.management.service.recovery.RecoveryManager.checkpoint(RecoveryManager.java:406)
> 	  - locked <0x17f5> (a org.apache.asterix.transaction.management.service.recovery.RecoveryManager)
> 	  at org.apache.asterix.hyracks.bootstrap.NCApplicationEntryPoint.stop(NCApplicationEntryPoint.java:132)
> 	  at org.apache.hyracks.control.nc.NodeControllerService.stop(NodeControllerService.java:347)
> 	  - locked <0x17f7> (a org.apache.hyracks.control.nc.NodeControllerService)
> 	  at org.apache.hyracks.control.nc.NodeControllerService$JVMShutdownHook.run(NodeControllerService.java:588)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)