Posted to commits@nifi.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/12/23 21:43:47 UTC

[jira] [Commented] (NIFI-1333) FlowController fails to shut down gracefully even though there is nothing going on in the flow

    [ https://issues.apache.org/jira/browse/NIFI-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15070172#comment-15070172 ] 

ASF GitHub Bot commented on NIFI-1333:
--------------------------------------

GitHub user olegz opened a pull request:

    https://github.com/apache/nifi/pull/148

    NIFI-1333 fixed FlowController shutdown deadlock

    The relevant test is available here: https://github.com/olegz/nifi/blob/int-test/nifi-integration-tests/src/test/java/org/apache/nifi/test/flowcontroll/FlowControllerTests.java#L50 
    
    Unfortunately this is one of those multi-module situations.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/olegz/nifi NIFI-1333

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/148.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #148
    
----
commit 302dafbf59e1eab48bece7be8a3e98682c7fc14b
Author: Oleg Zhurakousky <ol...@suitcase.io>
Date:   2015-12-23T20:41:54Z

    NIFI-1333 fixed FlowController shutdown deadlock

----


> FlowController fails to shut down gracefully even though there is nothing going on in the flow
> ----------------------------------------------------------------------------------------------
>
>                 Key: NIFI-1333
>                 URL: https://issues.apache.org/jira/browse/NIFI-1333
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 0.4.1
>            Reporter: Oleg Zhurakousky
>            Assignee: Oleg Zhurakousky
>            Priority: Trivial
>             Fix For: 0.5.0
>
>
> Basically the following test fails: https://github.com/olegz/nifi/blob/int-test/nifi-integration-tests/src/test/java/org/apache/nifi/test/flowcontroll/FlowControllerTests.java#L50 even though there is no compelling reason for it to fail based on what's in the flow.
> Also, the message in the logs is confusing:
> {code}
> Initiated graceful shutdown of flow controller...waiting up to 10 seconds
> 2015-12-23 15:19:11,977 WARN [main] o.apache.nifi.controller.FlowController Controller hasn't terminated properly.  There exists an uninterruptable thread that will take an indeterminate amount of time to stop.  Might need to kill the program manually.
> {code}
> What actually happens is a deadlock during shutdown.
> Below is the relevant jstack output:
> {code}
> java.lang.Thread.State: TIMED_WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x00000007aeb20988> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
> 	at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1468)
> 	at org.apache.nifi.controller.FlowController.shutdown(FlowController.java:1124)
> 	at org.apache.nifi.test.s2s.SiteToSiteTests.bar(SiteToSiteTests.java:75)
> . . .
> "Framework Task Thread Thread-1" prio=5 tid=0x00007fc8a2064800 nid=0x6a03 waiting on condition [0x0000700001ded000]
>    java.lang.Thread.State: WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x00000007aeb20288> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
> 	at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
> 	at org.apache.nifi.controller.FlowController.getRootGroupId(FlowController.java:1262)
> 	at org.apache.nifi.controller.tasks.ExpireFlowFiles.run(ExpireFlowFiles.java:54)
> . . .
> "Timer-Driven Process Thread-1" prio=5 tid=0x00007fc8a3146800 nid=0x6c03 waiting on condition [0x0000700001ef0000]
>    java.lang.Thread.State: WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x00000007aeb20288> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
> 	at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
> 	at org.apache.nifi.controller.FlowController.isClustered(FlowController.java:2984)
> 	at org.apache.nifi.controller.FlowController.heartbeat(FlowController.java:3444)
> {code}
> The issue, as I see it, is that FlowController's _shutdown_ routine is synchronized under the same lock as most of the FlowController callbacks made by other threads. Those threads block waiting for that lock, so they can never finish, and the shutdown routine (which holds the lock) times out waiting for them to terminate.
> I don't think there is any reason to synchronize the shutdown routine, since all we are trying to do is shut down the very same threads that are blocking. Removing the synchronization resolves the issue.
> Will submit a patch shortly.
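
The locking pattern described above can be sketched in isolation. This is a minimal illustration, not NiFi's actual code: the class `ShutdownDeadlockSketch`, its `holdWriteLock` flag, and the latch are hypothetical, and it assumes that the shutdown routine holds the write side of the same `ReentrantReadWriteLock` whose read side the worker threads in the jstack output are parked on.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for FlowController; not NiFi's actual code. */
public class ShutdownDeadlockSketch {
    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
    private final ExecutorService pool = Executors.newSingleThreadExecutor();
    // Makes the worker wait until shutdown() has (optionally) taken the write lock.
    private final CountDownLatch shutdownEntered = new CountDownLatch(1);

    public ShutdownDeadlockSketch() {
        // Plays the role of a framework task that calls back into the controller.
        pool.submit(() -> {
            try {
                shutdownEntered.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            isClustered();
        });
    }

    /** Callback guarded by the read lock; ReadLock.lock() is not interruptible. */
    public boolean isClustered() {
        rwLock.readLock().lock();
        try {
            return false;
        } finally {
            rwLock.readLock().unlock();
        }
    }

    /**
     * Returns true if the pool terminated within the timeout.
     * holdWriteLock=true mimics the assumed original behaviour;
     * holdWriteLock=false mimics shutdown with the synchronization removed.
     */
    public boolean shutdown(boolean holdWriteLock, long timeoutMillis) {
        if (holdWriteLock) {
            rwLock.writeLock().lock();
        }
        try {
            shutdownEntered.countDown();
            pool.shutdown();
            // While the write lock is held, the worker cannot get the read lock,
            // so this wait can only end by timing out.
            return pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            if (holdWriteLock) {
                rwLock.writeLock().unlock();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("locked shutdown terminated:   "
                + new ShutdownDeadlockSketch().shutdown(true, 300));
        System.out.println("unlocked shutdown terminated: "
                + new ShutdownDeadlockSketch().shutdown(false, 5000));
    }
}
```

In this sketch the locked variant reports `false` after its timeout, because the worker stays parked on the read lock until `shutdown` returns. The unlocked variant reports `true`. That matches the "waiting up to 10 seconds" warning in the log above, which is a timeout rather than a genuinely uninterruptible task.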



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)