You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@activemq.apache.org by Tim Bain <tb...@alumni.duke.edu> on 2015/01/22 01:00:27 UTC

Broker logs show repeated "java.lang.IllegalStateException: Timer already cancelled." lines

The other day one of our brokers running 5.10.0 spewed over 1 million of
the following exceptions to the log over a 2-hour period:

Transport Connection to: tcp://127.0.0.1:xxxxx failed: java.io.IOException:
Unexpected error occurred: java.lang.IllegalStateException: Timer already
cancelled.

Clients were observed to hang on startup (which we believe means they were
unable to connect to the broker) until it was rebooted, after which we
haven't seen the exception again.

Once the exceptions started, there were no stack traces or other log lines
that would indicate anything else about the cause, just those messages
repeating.  The problems started immediately (a few milliseconds) after
an EOFException in the broker logs; we see those EOFExceptions pretty often
and they've never before resulted in "Timer already cancelled"
exceptions, so that might indicate what got us into a bad state but then
again it might be entirely unrelated.

I searched JIRA and the mailing list archives for similar issues, and
although there are a lot of incidences of "Timer already cancelled"
exceptions, none of them exactly match our situation.

   -
   http://activemq.2283324.n4.nabble.com/Producer-connections-keep-breaking-td4671152.html
   describes repeated copies of the line in the logs and is the closest
   parallel I've found, but it sounded like messages were still getting
   passed, albeit more slowly than normal, whereas the developer on my team
   who hit this said he didn't think any messages were getting sent.  But it's
   the closest match of the group.
   - https://issues.apache.org/jira/browse/AMQ-5508 has a detailed
   investigation into the root cause of the problem that Pero Atanasov saw,
   but his scenario occurred only on broker shutdown, whereas our broker was
   not shutting down at the time, and it wasn't clear that the log line
   repeated for him.
   -
   http://activemq.2283324.n4.nabble.com/Timer-already-cancelled-and-KahaDB-Recovering-checkpoint-thread-after-death-td4676684.html
   has repeated messages like I'm seeing, but appears to be specific to KahaDB
   which I'm not using (we use non-persistent messages only).
   - https://issues.apache.org/jira/browse/AMQ-4805 has the same
   inner exception but not the full log line, and it shows a full stack trace
   whereas I only see the one line without the stack trace.
   -
   http://activemq.2283324.n4.nabble.com/can-t-send-message-Timer-already-cancelled-td4680175.html
   appears to see the exception on the producer when sending rather than on
   the broker.

The only thing that we observed that might be related was that earlier in
the broker's uptime, the developer who was running it ran up against his
maxProc limit and wasn't able to create new native threads.  That didn't
result in any issues in the broker logs at the time (we only hit these
issues several hours later), so I'm skeptical that it's related given that
there are lots of other things that cause "Timer already cancelled"
exceptions as evidenced by the partial list above, but it could be.

Has anyone seen the issue we just experienced?  I'd love to help figure out
what's causing the bug so we can get a fix, but since we saw it only once
and can't reproduce it on demand, I'm at a loss for how to make progress
towards identifying the root cause.

Tim

Re: Broker logs show repeated "java.lang.IllegalStateException: Timer already cancelled." lines

Posted by Tim Bain <tb...@alumni.duke.edu>.
Since no one has seen behavior similar to what we observed, I created
https://issues.apache.org/jira/browse/AMQ-5543 to capture it.  Since we've
only seen this once and don't know how to reliably reproduce it, if anyone
sees similar behavior in the future, please comment on that JIRA to add
your observations.

On Wed, Jan 21, 2015 at 5:00 PM, Tim Bain <tb...@alumni.duke.edu> wrote:

> The other day one of our brokers running 5.10.0 spewed over 1 million of
> the following exceptions to the log over a 2-hour period:
>
> Transport Connection to: tcp://127.0.0.1:xxxxx failed:
> java.io.IOException: Unexpected error occurred:
> java.lang.IllegalStateException: Timer already cancelled.
>
> Clients were observed to hang on startup (which we believe means they were
> unable to connect to the broker) until it was rebooted, after which we
> haven't seen the exception again.
>
> Once the exceptions started, there were no stack traces or other log lines
> that would indicate anything else about the cause, just those messages
> repeating.  The problems started immediately (a few milliseconds) after
> an EOFException in the broker logs; we see those EOFExceptions pretty often
> and they've never before resulted in "Timer already cancelled"
> exceptions, so that might indicate what got us into a bad state but then
> again it might be entirely unrelated.
>
> I searched JIRA and the mailing list archives for similar issues, and
> although there are a lot of incidences of "Timer already cancelled"
> exceptions, none of them exactly match our situation.
>
>    -
>    http://activemq.2283324.n4.nabble.com/Producer-connections-keep-breaking-td4671152.html
>    describes repeated copies of the line in the logs and is the closest
>    parallel I've found, but it sounded like messages were still getting
>    passed, albeit more slowly than normal, whereas the developer on my team
>    who hit this said he didn't think any messages were getting sent.  But it's
>    the closest match of the group.
>    - https://issues.apache.org/jira/browse/AMQ-5508 has a detailed
>    investigation into the root cause of the problem that Pero Atanasov saw,
>    but his scenario occurred only on broker shutdown, whereas our broker was
>    not shutting down at the time, and it wasn't clear that the log line
>    repeated for him.
>    -
>    http://activemq.2283324.n4.nabble.com/Timer-already-cancelled-and-KahaDB-Recovering-checkpoint-thread-after-death-td4676684.html
>    has repeated messages like I'm seeing, but appears to be specific to KahaDB
>    which I'm not using (we use non-persistent messages only).
>    - https://issues.apache.org/jira/browse/AMQ-4805 has the same
>    inner exception but not the full log line, and it shows a full stack trace
>    whereas I only see the one line without the stack trace.
>    -
>    http://activemq.2283324.n4.nabble.com/can-t-send-message-Timer-already-cancelled-td4680175.html
>    appears to see the exception on the producer when sending rather than on
>    the broker.
>
> The only thing that we observed that might be related was that earlier in
> the broker's uptime, the developer who was running it ran up against his
> maxProc limit and wasn't able to create new native threads.  That didn't
> result in any issues in the broker logs at the time (we only hit these
> issues several hours later), so I'm skeptical that it's related given that
> there are lots of other things that cause "Timer already cancelled"
> exceptions as evidenced by the partial list above, but it could be.
>
> Has anyone seen the issue we just experienced?  I'd love to help figure
> out what's causing the bug so we can get a fix, but since we saw it only
> once and can't reproduce it on demand, I'm at a loss for how to make
> progress towards identifying the root cause.
>
> Tim
>