Posted to dev@brooklyn.apache.org by "Sam Corbett (JIRA)" <ji...@apache.org> on 2016/08/16 17:56:20 UTC

[jira] [Created] (BROOKLYN-332) Blocked task holding mutex lives beyond application lifetime and blocks tasks in subsequent applications

Sam Corbett created BROOKLYN-332:
------------------------------------

             Summary: Blocked task holding mutex lives beyond application lifetime and blocks tasks in subsequent applications
                 Key: BROOKLYN-332
                 URL: https://issues.apache.org/jira/browse/BROOKLYN-332
             Project: Brooklyn
          Issue Type: Bug
            Reporter: Sam Corbett


Andrea deployed a VanillaJavaApp whose start got stuck on a task that was never going to complete. The stuck task had obtained a mutex on an SshMachineLocation in ArchiveUtils.deploy. When the application was stopped, the task was not stopped with it, so the mutex was never released. This stack trace is from a thread dump taken after stopping the app:

{code}
"brooklyn-execmanager-EiOrzrfj-9" #54 daemon prio=5 os_prio=31 tid=0x00007fa7f191a800 nid=0x8e03 in Object.wait() [0x0000700003151000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:502)
	at org.apache.brooklyn.util.core.task.BasicTask.blockUntilStarted(BasicTask.java:389)
	- locked <0x0000000784e96ac8> (a org.apache.brooklyn.util.core.task.BasicTask)
	at org.apache.brooklyn.util.core.task.BasicTask.blockUntilStarted(BasicTask.java:378)
	- locked <0x0000000784e96ac8> (a org.apache.brooklyn.util.core.task.BasicTask)
	at org.apache.brooklyn.util.core.task.BasicTask.get(BasicTask.java:360)
	at org.apache.brooklyn.util.core.task.BasicTask.getUnchecked(BasicTask.java:370)
	at org.apache.brooklyn.util.core.task.system.ProcessTaskWrapper.get(ProcessTaskWrapper.java:153)
	at org.apache.brooklyn.util.core.file.ArchiveUtils.deploy(ArchiveUtils.java:277)
	at org.apache.brooklyn.util.core.file.ArchiveUtils.deploy(ArchiveUtils.java:237)
	at org.apache.brooklyn.entity.java.VanillaJavaAppSshDriver.customize(VanillaJavaAppSshDriver.java:99)
	at org.apache.brooklyn.entity.software.base.AbstractSoftwareProcessDriver$3$2.run(AbstractSoftwareProcessDriver.java:175)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at org.apache.brooklyn.util.core.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:359)
	at org.apache.brooklyn.util.core.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:519)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
{code}

The app was then redeployed to the same location (Andrea to clarify whether it was localhost or BYON). ArchiveUtils' attempt to obtain the machine mutex blocked indefinitely because the mutex was still owned by the zombie task:

{code}
"brooklyn-execmanager-EiOrzrfj-0" #45 daemon prio=5 os_prio=31 tid=0x00007fa7f0fb3800 nid=0x7c03 waiting on condition [0x0000700002835000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000007838ff1d8> (a java.util.concurrent.Semaphore$FairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
	at java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
	at org.apache.brooklyn.util.core.mutex.SemaphoreWithOwners.acquire(SemaphoreWithOwners.java:51)
	at org.apache.brooklyn.util.core.mutex.MutexSupport.acquireMutex(MutexSupport.java:77)
	at org.apache.brooklyn.location.ssh.SshMachineLocation.acquireMutex(SshMachineLocation.java:1078)
	at org.apache.brooklyn.util.core.file.ArchiveUtils.deploy(ArchiveUtils.java:266)
	at org.apache.brooklyn.util.core.file.ArchiveUtils.deploy(ArchiveUtils.java:237)
	at org.apache.brooklyn.entity.java.VanillaJavaAppSshDriver.customize(VanillaJavaAppSshDriver.java:99)
	at org.apache.brooklyn.entity.software.base.AbstractSoftwareProcessDriver$3$2.run(AbstractSoftwareProcessDriver.java:175)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at org.apache.brooklyn.util.core.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:359)
	at org.apache.brooklyn.util.core.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:519)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
{code}

This manifested itself as an app that was forever "installing archive" and could only really be understood with a thread dump.
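
For illustration only, here is a minimal sketch of the same failure mode using plain java.util.concurrent rather than Brooklyn's own classes. Everything in it is hypothetical: the class name ZombieMutexSketch, the demo() method and the single-permit Semaphore standing in for the machine mutex are assumptions, not Brooklyn code. It shows a task that takes the mutex and then waits on something that never happens, a second caller that cannot acquire the mutex while that task is alive, and the mutex becoming available once the task is cancelled with interruption.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

final class ZombieMutexSketch {

    /** Returns {acquired while the stuck task was alive, acquired after cancelling it}. */
    static boolean[] demo() {
        // Hypothetical stand-in for the per-machine mutex: one fair permit.
        Semaphore machineMutex = new Semaphore(1, true);
        ExecutorService executor = Executors.newCachedThreadPool();
        CountDownLatch holding = new CountDownLatch(1);
        CountDownLatch released = new CountDownLatch(1);
        try {
            // "First application": takes the mutex, then blocks forever.
            Future<?> firstAppTask = executor.submit(() -> {
                try {
                    machineMutex.acquire();
                    try {
                        holding.countDown();
                        // Never counted down, so this waits until interrupted.
                        new CountDownLatch(1).await();
                    } finally {
                        machineMutex.release();
                        released.countDown();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            holding.await();

            // "Second application": a bounded wait makes the problem visible
            // instead of parking the caller forever.
            boolean whileZombie = machineMutex.tryAcquire(200, TimeUnit.MILLISECONDS);

            // Stopping the first application should also cancel its tasks.
            firstAppTask.cancel(true);
            released.await(5, TimeUnit.SECONDS);
            boolean afterCancel = machineMutex.tryAcquire(5, TimeUnit.SECONDS);

            return new boolean[] { whileZombie, afterCancel };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("demo interrupted", e);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("acquired while stuck task alive: " + result[0]);
        System.out.println("acquired after cancelling task: " + result[1]);
    }
}
```

The sketch depends on two things: the blocked wait must respond to interruption, and the release must sit in a finally block. Whether either holds on the ArchiveUtils.deploy code path is not established by this report.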



