You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@camel.apache.org by "Claus Ibsen (JIRA)" <ji...@apache.org> on 2019/03/23 07:33:00 UTC

[jira] [Resolved] (CAMEL-13358) Sometimes SedaConsumer does not start and remains stuck

     [ https://issues.apache.org/jira/browse/CAMEL-13358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Claus Ibsen resolved CAMEL-13358.
---------------------------------
       Resolution: Fixed
         Assignee: Claus Ibsen
    Fix Version/s: 3.0.0

Thanks for the report and the detailed explanation. Yeah its improved/fixed in 3.0.

> Sometimes SedaConsumer does not start and remains stuck
> -------------------------------------------------------
>
>                 Key: CAMEL-13358
>                 URL: https://issues.apache.org/jira/browse/CAMEL-13358
>             Project: Camel
>          Issue Type: Bug
>          Components: camel-core
>    Affects Versions: 2.19.0
>            Reporter: Graeme Moss
>            Assignee: Claus Ibsen
>            Priority: Major
>             Fix For: 3.0.0
>
>
> Sometimes we notice that some of our routes get "stuck" and do not process anything.  Looking at the stacktrace for such a route we see something like this:
> {code:java}
> "Camel (SomeName) thread #648 - seda://someName" #1287 daemon prio=5 os_prio=0 tid=0x00007f23880cf000 nid=0x25006 waiting on condition [0x00007f1f23530000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000006e4beb8e0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>         at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>         at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:748)
> {code}
> whereas we would instead normally expect to see something like this:
> {code:java}
> "Camel (SomeName) thread #840 - seda://someName" #1577 daemon prio=5 os_prio=0 tid=0x00007f227c02b000 nid=0x256fe waiting on condition [0x00007f1f0ca20000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000006eb69c1f8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
>         at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
>         at org.apache.camel.component.seda.SedaConsumer.doRun(SedaConsumer.java:198)
>         at org.apache.camel.component.seda.SedaConsumer.run(SedaConsumer.java:154)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:748)
> {code}
> Note that whereas this latter example has SedaConsumer in the stacktrace polling for an exchange, the former example has no such SedaConsumer in the stacktrace and is stuck waiting for a new "task" to be given to the ThreadPoolExecutor.
> After adding debug logging, I see that the issue is as follows:
> The code in ServiceSupport here writes to the starting and started flags using one thread (call it Thread W):
> {code}
>     public void start() throws Exception {
>         if (isStarting() || isStarted()) {
>             // only start service if not already started
>             LOG.trace("Service already started");
>             return;
>         }
>         if (starting.compareAndSet(false, true)) {
>             LOG.trace("Starting service");
>             try {
>                 doStart();
>                 started.set(true);
>                 starting.set(false);
> {code}
> and on a different thread (call it thread R) this code in ServiceSupport reads the starting and started flags:
> {code}
>     public boolean isRunAllowed() {
>         // if we have not yet initialized, then all options is false
>         boolean unused1 = !started.get() && !starting.get() && !stopping.get() && !stopped.get();
>         boolean unused2 = !suspending.get() && !suspended.get() && !shutdown.get() && !shuttingdown.get();
>         if (unused1 && unused2) {
>             return false;
>         }
>         return !isStoppingOrStopped();
>     }
> {code}
> So the order of events is:
> # Thread W: calls ServiceSupport.start()
> # Thread W: compareAndSet starting from false to true
> # Thread W: calls ServiceSupport.doStart() which eventually creates and starts Thread R
> # Thread R: calls SedaConsumer.doRun() which calls isRunAllowed()
> # Thread R: in isRunAllowed() reads started as false
> # Thread W: set started to true
> # Thread W: set starting to false
> # Thread R: in isRunAllowed() reads starting as false
> # Thread R: concludes that isRunAllowed is false
> # Thread R: the while loop in SedaConsumer.doRun is not executed and the thread will forever remain stuck waiting for a new "task" (on ThreadPoolExecutor.java:1067)
> One fix would be to swap reading of starting and started so that starting is read first, then started.
> Looking at the latest code in Camel 3.0.x, this code has been refactored to use a single "status" field instead of multiple different AtomicBoolean fields.  That should also fix this bug.  However, for completeness, I thought it best to log this bug anyway.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)