You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@camel.apache.org by "Claus Ibsen (JIRA)" <ji...@apache.org> on 2019/03/23 07:33:00 UTC
[jira] [Resolved] (CAMEL-13358) Sometimes SedaConsumer does not
start and remains stuck
[ https://issues.apache.org/jira/browse/CAMEL-13358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Claus Ibsen resolved CAMEL-13358.
---------------------------------
Resolution: Fixed
Assignee: Claus Ibsen
Fix Version/s: 3.0.0
Thanks for the report and the detailed explanation. Yeah its improved/fixed in 3.0.
> Sometimes SedaConsumer does not start and remains stuck
> -------------------------------------------------------
>
> Key: CAMEL-13358
> URL: https://issues.apache.org/jira/browse/CAMEL-13358
> Project: Camel
> Issue Type: Bug
> Components: camel-core
> Affects Versions: 2.19.0
> Reporter: Graeme Moss
> Assignee: Claus Ibsen
> Priority: Major
> Fix For: 3.0.0
>
>
> Sometimes we notice that some of our routes get "stuck" and do not process anything. Looking at the stacktrace for such a route we see something like this:
> {code:java}
> "Camel (SomeName) thread #648 - seda://someName" #1287 daemon prio=5 os_prio=0 tid=0x00007f23880cf000 nid=0x25006 waiting on condition [0x00007f1f23530000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00000006e4beb8e0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> whereas we would instead normally expect to see something like this:
> {code:java}
> "Camel (SomeName) thread #840 - seda://someName" #1577 daemon prio=5 os_prio=0 tid=0x00007f227c02b000 nid=0x256fe waiting on condition [0x00007f1f0ca20000]
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00000006eb69c1f8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
> at org.apache.camel.component.seda.SedaConsumer.doRun(SedaConsumer.java:198)
> at org.apache.camel.component.seda.SedaConsumer.run(SedaConsumer.java:154)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> Note that whereas this latter example has SedaConsumer in the stacktrace polling for an exchange, the former example has no such SedaConsumer in the stacktrace and is stuck waiting for a new "task" to be given to the ThreadPoolExecutor.
> After adding debug logging, I see that the issue is as follows:
> The code in ServiceSupport here writes to the starting and started flags using one thread (call it Thread W):
> {code}
> public void start() throws Exception {
> if (isStarting() || isStarted()) {
> // only start service if not already started
> LOG.trace("Service already started");
> return;
> }
> if (starting.compareAndSet(false, true)) {
> LOG.trace("Starting service");
> try {
> doStart();
> started.set(true);
> starting.set(false);
> {code}
> and on a different thread (call it thread R) this code in ServiceSupport reads the starting and started flags:
> {code}
> public boolean isRunAllowed() {
> // if we have not yet initialized, then all options is false
> boolean unused1 = !started.get() && !starting.get() && !stopping.get() && !stopped.get();
> boolean unused2 = !suspending.get() && !suspended.get() && !shutdown.get() && !shuttingdown.get();
> if (unused1 && unused2) {
> return false;
> }
> return !isStoppingOrStopped();
> }
> {code}
> So the order of events is:
> # Thread W: calls ServiceSupport.start()
> # Thread W: compareAndSet starting from false to true
> # Thread W: calls ServiceSupport.doStart() which eventually creates and starts Thread R
> # Thread R: calls SedaConsumer.doRun() which calls isRunAllowed()
> # Thread R: in isRunAllowed() reads started as false
> # Thread W: set started to true
> # Thread W: set starting to false
> # Thread R: in isRunAllowed() reads starting as false
> # Thread R: concludes that isRunAllowed is false
> # Thread R: the while loop in SedaConsumer.doRun is not executed and the thread will forever remain stuck waiting for a new "task" (on ThreadPoolExecutor.java:1067)
> One fix would be to swap reading of starting and started so that starting is read first, then started.
> Looking at the latest code in Camel 3.0.x, this code has been refactored to use a single "status" field instead of multiple different AtomicBoolean fields. That should also fix this bug. However, for completeness, I thought it best to log this bug anyway.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)