Posted to dev@flume.apache.org by "Yan Jian (JIRA)" <ji...@apache.org> on 2016/11/02 09:50:58 UTC

[jira] [Updated] (FLUME-2786) It will enter a deadlock state when modify the conf file before I stop flume-ng

     [ https://issues.apache.org/jira/browse/FLUME-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yan Jian updated FLUME-2786:
----------------------------
    Attachment: flume-2786-v1.6.0.patch

This bug also occurred in our production environment.
It can lead to a nested monitor lockout between the threads _agent-shutdown-hook_ and _conf-file-poller_. The details are as follows:
# _agent-shutdown-hook_ acquires the {{application}} lock and tries to stop the {{executeService}} (a {{ThreadPoolExecutor}} instance).
# _conf-file-poller_ is already scheduled to run in the {{executeService}}'s pool, which prevents the {{executeService}} from terminating.
# _conf-file-poller_ waits for the {{application}} lock, which is held by _agent-shutdown-hook_.
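
The three steps above can be reproduced outside Flume with a small sketch. This is not Flume code: {{LockoutDemo}}, {{startPoller}}, {{awaitPool}} and the latch are hypothetical names, and the wait is bounded so that the demo ends instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the application object; not the Flume class. */
class LockoutDemo {
    private final ExecutorService executorService = Executors.newSingleThreadExecutor();
    // Lets the poller task start blocking only once stop() holds the monitor.
    private final CountDownLatch stopHoldsMonitor = new CountDownLatch(1);

    /** Plays the role of the conf-file-poller task (steps 2 and 3). */
    void startPoller() {
        executorService.execute(() -> {
            try {
                stopHoldsMonitor.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            handleConfigurationEvent(); // blocks while stop() holds the monitor
        });
    }

    synchronized void handleConfigurationEvent() {
        // A real handler would reload components here.
    }

    /** Plays the role of agent-shutdown-hook (step 1). Returns whether the pool terminated in time. */
    synchronized boolean stop(long timeoutMillis) throws InterruptedException {
        stopHoldsMonitor.countDown();
        executorService.shutdown();
        // The pool cannot terminate: its only task is waiting for the monitor we hold.
        return executorService.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    /** Waits for the pool without holding the monitor. */
    boolean awaitPool(long timeoutMillis) throws InterruptedException {
        return executorService.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        LockoutDemo demo = new LockoutDemo();
        demo.startPoller();
        System.out.println("terminated while monitor held: " + demo.stop(500));
        System.out.println("terminated after release: " + demo.awaitPool(2000));
    }
}
```

In this sketch the first wait is expected to time out and the second to succeed. A shutdown path that retries the wait without ever leaving the {{synchronized}} method would keep looping, which matches the {{TIMED_WAITING}} state of _agent-shutdown-hook_ in the reported jstack output.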

In our solution, {{synchronized}} is replaced with a {{ReentrantLock}}, and _conf-file-poller_ checks the {{beingStopped}} condition at 500 ms intervals while trying to acquire the {{application}} lock, so it can give up once shutdown has started.
Our solution, based on 1.6.0, is attached as +flume-2786-v1.6.0.patch+.
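
The idea can be sketched roughly as below. This is a minimal illustration of the approach described above, not the contents of the patch: the class {{LockSketch}}, the field {{lifecycleLock}}, the {{reloads}} counter and the {{Runnable}} parameter of {{stop}} are all assumptions made for the example.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for the application object; names are illustrative only. */
class LockSketch {
    private final ReentrantLock lifecycleLock = new ReentrantLock();
    private volatile boolean beingStopped = false;
    private int reloads = 0; // guarded by lifecycleLock

    /** Called from the poller thread. Returns true if the new configuration was applied. */
    boolean handleConfigurationEvent() throws InterruptedException {
        // Try for the lock in 500 ms slices instead of blocking indefinitely.
        while (!beingStopped) {
            if (lifecycleLock.tryLock(500, TimeUnit.MILLISECONDS)) {
                try {
                    if (beingStopped) {
                        return false;
                    }
                    reloads++; // a real handler would restart components here
                    return true;
                } finally {
                    lifecycleLock.unlock();
                }
            }
        }
        return false; // shutdown in progress: give up so the pool can terminate
    }

    /** Called from the shutdown hook; shutdownWork stands for stopping the executor. */
    void stop(Runnable shutdownWork) {
        beingStopped = true; // published before blocking so a waiting poller can bail out
        lifecycleLock.lock();
        try {
            shutdownWork.run();
        } finally {
            lifecycleLock.unlock();
        }
    }

    int reloads() {
        lifecycleLock.lock();
        try {
            return reloads;
        } finally {
            lifecycleLock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LockSketch app = new LockSketch();
        System.out.println("applied before stop: " + app.handleConfigurationEvent());
        app.stop(() -> { });
        System.out.println("applied after stop: " + app.handleConfigurationEvent());
    }
}
```

With a timed {{tryLock}}, a poller that is waiting when shutdown begins notices {{beingStopped}} within one interval and returns, so the executor's task completes and the pool can terminate.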

>  It will enter a deadlock state when modify the conf file before I stop flume-ng
> --------------------------------------------------------------------------------
>
>                 Key: FLUME-2786
>                 URL: https://issues.apache.org/jira/browse/FLUME-2786
>             Project: Flume
>          Issue Type: Bug
>          Components: Master
>    Affects Versions: v1.6.0
>            Reporter: godfrey he
>            Priority: Blocker
>         Attachments: flume-2786-v1.6.0.patch
>
>
> When I modify the conf file and then stop flume-ng, it enters a deadlock state.
> jstack result:
> "agent-shutdown-hook" prio=10 tid=0x00007f2e26419800 nid=0x333ae waiting on condition [0x0000000042c16000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000000eaff3df8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
>         at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1468)
>         at java.util.concurrent.Executors$DelegatedExecutorService.awaitTermination(Executors.java:635)
>         at org.apache.flume.node.PollingPropertiesFileConfigurationProvider.stop(PollingPropertiesFileConfigurationProvider.java:87)
>         at org.apache.flume.lifecycle.LifecycleSupervisor.stop(LifecycleSupervisor.java:106)
>         - locked <0x00000000eaf2daa0> (a org.apache.flume.lifecycle.LifecycleSupervisor)
>         at org.apache.flume.node.Application.stop(Application.java:93)
>         - locked <0x00000000eaf3c580> (a org.apache.flume.node.Application)
>         at org.apache.flume.node.Application$1.run(Application.java:348)
> "conf-file-poller-0" prio=10 tid=0x00007f2e2e8cd000 nid=0x21819 waiting for monitor entry [0x0000000041e3f000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.flume.node.Application.handleConfigurationEvent(Application.java:88)
>         - waiting to lock <0x00000000eaf3c580> (a org.apache.flume.node.Application)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)