Posted to dev@flume.apache.org by "Yan Jian (JIRA)" <ji...@apache.org> on 2016/11/02 09:50:58 UTC
[jira] [Updated] (FLUME-2786) It will enter a deadlock state when
modify the conf file before I stop flume-ng
[ https://issues.apache.org/jira/browse/FLUME-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yan Jian updated FLUME-2786:
----------------------------
Attachment: flume-2786-v1.6.0.patch
This bug also occurred in our production environment.
It can lead to a nested monitor lockout between the threads _agent-shutdown-hook_ and _conf-file-poller_, as detailed below:
# _agent-shutdown-hook_ acquires the {{application}} lock and tries to stop the {{executeService}} (a {{ThreadPoolExecutor}} instance).
# _conf-file-poller_ is scheduled to run in the {{executeService}}'s pool, which prevents the {{executeService}} from being stopped.
# _conf-file-poller_ waits for the {{application}} lock, which is held by _agent-shutdown-hook_.
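The three steps above can be reproduced in isolation. The following is only a minimal sketch, not Flume code: the class name {{LockoutSketch}}, the plain {{Object}} standing in for the {{Application}} monitor, and the 300 ms / 5 s timeouts are all assumptions made for illustration. The lock is released after one bounded wait so that the example terminates instead of hanging.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class LockoutSketch {
    // Hypothetical stand-in for the Application instance whose monitor both threads want.
    private final Object application = new Object();
    // Hypothetical stand-in for the pool that runs the conf-file-poller.
    private final ExecutorService executorService = Executors.newSingleThreadExecutor();

    /** Returns {terminatedWhileLockHeld, terminatedAfterRelease}. */
    public boolean[] run() throws InterruptedException {
        boolean whileHeld;
        synchronized (application) {            // step 1: the shutdown side holds the lock
            executorService.submit(() -> {      // step 2: the poller runs in the pool
                synchronized (application) {    // step 3: it blocks on the same lock
                    // a configuration event would be handled here
                }
            });
            executorService.shutdown();
            // The pool cannot terminate while its only task is blocked on our lock.
            whileHeld = executorService.awaitTermination(300, TimeUnit.MILLISECONDS);
        }
        // Once the lock is released, the task finishes and the pool terminates.
        boolean afterRelease = executorService.awaitTermination(5, TimeUnit.SECONDS);
        return new boolean[] {whileHeld, afterRelease};
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] r = new LockoutSketch().run();
        System.out.println("terminated while lock held: " + r[0]);
        System.out.println("terminated after release:   " + r[1]);
    }
}
```

The first value should be false and the second true. If the waiting side never releases the lock and keeps waiting for termination, neither thread can make progress, which matches the jstack output quoted below.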
In our solution, {{synchronized}} is replaced with a {{ReentrantLock}}, and _conf-file-poller_ checks the {{beingStopped}} condition at 500 ms intervals while trying to acquire the {{application}} lock, so that it can stop waiting once shutdown has begun.
Our solution based on 1.6.0 is shared as +flume-2786-v1.6.0.patch+.
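For readers who do not want to open the patch, the idea can be sketched roughly as follows. This is not the content of the attached patch, only an illustration of the approach described above: the class name {{TryLockSketch}}, the field names {{applicationLock}} and {{eventHandled}}, the {{demo()}} method and the 5 s shutdown timeout are assumptions; only {{beingStopped}}, {{ReentrantLock}} and the 500 ms interval come from the description.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class TryLockSketch {
    private final ReentrantLock applicationLock = new ReentrantLock(); // hypothetical name
    private final ExecutorService executorService = Executors.newSingleThreadExecutor();
    private volatile boolean beingStopped = false;
    private volatile boolean eventHandled = false; // hypothetical, used only by the demo

    /** Poller side: retry a timed tryLock and give up once shutdown has begun. */
    void handleConfigurationEvent() throws InterruptedException {
        while (!beingStopped) {
            if (applicationLock.tryLock(500, TimeUnit.MILLISECONDS)) {
                try {
                    eventHandled = true; // the new configuration would be applied here
                    return;
                } finally {
                    applicationLock.unlock();
                }
            }
        }
    }

    /** Shutdown side: returns true if the pool terminated while the lock was held. */
    boolean stop() throws InterruptedException {
        applicationLock.lock();
        try {
            beingStopped = true;
            executorService.shutdown();
            return executorService.awaitTermination(5, TimeUnit.SECONDS);
        } finally {
            applicationLock.unlock();
        }
    }

    /** Returns {poolTerminated, eventHandled} for a shutdown that races a config event. */
    boolean[] demo() throws InterruptedException {
        applicationLock.lock(); // hold the lock before the poller task starts
        try {
            executorService.submit(() -> { handleConfigurationEvent(); return null; });
            boolean terminated = stop(); // re-enters the lock held by this thread
            return new boolean[] {terminated, eventHandled};
        } finally {
            applicationLock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] r = new TryLockSketch().demo();
        System.out.println("pool terminated: " + r[0] + ", event handled: " + r[1]);
    }
}
```

In this sketch the poller never blocks indefinitely: each {{tryLock}} attempt times out after 500 ms, the loop then sees {{beingStopped}} and returns, and the pool can terminate while the shutdown side still holds the lock. The trade-off is that a configuration change arriving during shutdown is dropped.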
> It will enter a deadlock state when modify the conf file before I stop flume-ng
> --------------------------------------------------------------------------------
>
> Key: FLUME-2786
> URL: https://issues.apache.org/jira/browse/FLUME-2786
> Project: Flume
> Issue Type: Bug
> Components: Master
> Affects Versions: v1.6.0
> Reporter: godfrey he
> Priority: Blocker
> Attachments: flume-2786-v1.6.0.patch
>
>
> When I modify the conf file, and then stop flume-ng, it enters a deadlock state.
> jstack result:
> "agent-shutdown-hook" prio=10 tid=0x00007f2e26419800 nid=0x333ae waiting on condition [0x0000000042c16000]
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00000000eaff3df8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
> at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1468)
> at java.util.concurrent.Executors$DelegatedExecutorService.awaitTermination(Executors.java:635)
> at org.apache.flume.node.PollingPropertiesFileConfigurationProvider.stop(PollingPropertiesFileConfigurationProvider.java:87)
> at org.apache.flume.lifecycle.LifecycleSupervisor.stop(LifecycleSupervisor.java:106)
> - locked <0x00000000eaf2daa0> (a org.apache.flume.lifecycle.LifecycleSupervisor)
> at org.apache.flume.node.Application.stop(Application.java:93)
> - locked <0x00000000eaf3c580> (a org.apache.flume.node.Application)
> at org.apache.flume.node.Application$1.run(Application.java:348)
> "conf-file-poller-0" prio=10 tid=0x00007f2e2e8cd000 nid=0x21819 waiting for monitor entry [0x0000000041e3f000]
> java.lang.Thread.State: BLOCKED (on object monitor)
> at org.apache.flume.node.Application.handleConfigurationEvent(Application.java:88)
> - waiting to lock <0x00000000eaf3c580> (a org.apache.flume.node.Application)