You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@activemq.apache.org by "Piotr Klimczak (JIRA)" <ji...@apache.org> on 2015/09/10 23:31:45 UTC

[jira] [Comment Edited] (AMQ-5543) Broker logs show repeated "java.lang.IllegalSt​ateException: Timer already cancelled." lines

    [ https://issues.apache.org/jira/browse/AMQ-5543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739631#comment-14739631 ] 

Piotr Klimczak edited comment on AMQ-5543 at 9/10/15 9:31 PM:
--------------------------------------------------------------

In general increasing Thread Limit is a bad idea.
In big deployments most of threads can be taken by connections, when you use CACHE_NONE, which sometimes is default value for AMQ connection cache.
Just try to use CACHE_CONSUMER (non-XA) or CACHE_CONNECTION (XA) altogether with SingleConneectionPool.
So now you can have 10 concurrent consumers with 10 destinations, using max 1 connection instead of max 100 + publishers.
In case of XA environment, please make sure your transaction manager supports CACHE_CONNECTION.

Apache Aries (open source) and Atomikos Extreme Transactions (commercial) both support them very well.
But don't even try to use CACHE_CONNECTION with Atomikos Essentials (open source)- it will fail ugly.

Thanks to above we have reduced amount of connections from about 500 to about 25.
Number of started threads/day dropped from tens of millions a day to tens of thousands.

Everybody who are experiencing problems as in this issue, should try above approach 1st before increasing os limits.
There is nothing good in having so many threads.

Cheers


was (Author: nannou9):
In general increasing Thread Limit is a bad idea.
In big deployments most of threads are taken by connections.
Just try to use CACHE_CONSUMER (non-XA) or CACHE_CONNECTION (XA) altogether with SingleConneectionPool.
So now you can have 10 concurrent consumers with 10 destinations, using max 1 connection instead of max 100 + publishers.
In case of XA environment, please make sure your transaction manager supports CACHE_CONNECTION.

Apache Aries (open source) and Atomikos Extreme Transactions (commercial) both support them very well.
But don't even try to use CACHE_CONNECTION with Atomikos Essentials (open source)- it will fail ugly.

Thanks to above we have reduced amount of connections from about 500 to about 25.
Number of started threads/day dropped from tens of millions a day to tens of thousands.

Everybody who are experiencing problems as in this issue, should try above approach 1st before increasing os limits.
There is nothing good in having so many threads.

Cheers

> Broker logs show repeated "java.lang.IllegalSt​ateException: Timer already cancelled." lines
> --------------------------------------------------------------------------------------------
>
>                 Key: AMQ-5543
>                 URL: https://issues.apache.org/jira/browse/AMQ-5543
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Transport
>    Affects Versions: 5.10.0
>            Reporter: Tim Bain
>
> One of our brokers running 5.10.0 spewed over 1 million of the following exceptions to the log over a 2-hour period:
> Transport Connection to: tcp://127.0.0.1:xxxxx failed: java.io.IOException: Unexpected error occurred: java.lang.IllegalStateException: Timer already cancelled.
> Clients were observed to hang on startup (which we believe means they were unable to connect to the broker) until it was rebooted, after which we haven't seen the exception again.
> Once the exceptions started, there were no stack traces or other log lines that would indicate anything else about the cause, just those messages repeating.  The problems started immediately (a few milliseconds) after an EOFException in the broker logs; we see those EOFExceptions pretty often and they've never before resulted in "Timer already cancelled" exceptions, so that might indicate what got us into a bad state but then again it might be entirely unrelated.
> I searched JIRA and the mailing list archives for similar issues, and although there are a lot of incidences of "Timer already cancelled" exceptions, none of them exactly match our situation.
> * http://activemq.2283324.n4.nabble.com/Producer-connections-keep-breaking-td4671152.html describes repeated copies of the line in the logs and is the closest parallel I've found, but it sounded like messages were still getting passed, albeit more slowly than normal, whereas the developer on my team who hit this said he didn't think any messages were getting sent.  But it's the closest match of the group.
> * AMQ-5508 has a detailed investigation into the root cause of the problem that Pero Atanasov saw, but his scenario occurred only on broker shutdown, whereas our broker was not shutting down at the time, and it wasn't clear that the log line repeated for him.
> * http://activemq.2283324.n4.nabble.com/Timer-already-cancelled-and-KahaDB-Recovering-checkpoint-thread-after-death-td4676684.html has repeated messages like I'm seeing, but appears to be specific to KahaDB which I'm not using (we use non-persistent messages only).
> * AMQ-4805 has the same inner exception but not the full log line, and it shows a full stack trace whereas I only see the one line without the stack trace.
> * http://activemq.2283324.n4.nabble.com/can-t-send-message-Timer-already-cancelled-td4680175.html appears to see the exception on the producer when sending rather than on the broker.
> The only thing that we observed that might be related was that earlier in the broker's uptime, the developer who was running it ran up against his maxProc limit and wasn't able to create new native threads.  That didn't result in any issues in the broker logs at the time (we only hit these issues several hours later), so I'm skeptical that it's related given that there are lots of other things that cause "Timer already cancelled" exceptions as evidenced by the partial list above, but it could be.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)