Posted to issues@activemq.apache.org by "Justin Bertram (Jira)" <ji...@apache.org> on 2020/03/27 14:15:00 UTC

[jira] [Comment Edited] (ARTEMIS-2677) Artemis 2.11.0 RejectedExecutionException after successful failover

    [ https://issues.apache.org/jira/browse/ARTEMIS-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17065455#comment-17065455 ] 

Justin Bertram edited comment on ARTEMIS-2677 at 3/27/20, 2:14 PM:
-------------------------------------------------------------------

Attached broker.xml and broker-slave.xml.
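
For context, the HA portion of the two files follows the usual shared-store master/slave layout, roughly like the simplified sketch below (a generic illustration only; see the attachments for the actual configuration):

{code:xml}
<!-- broker.xml (master), inside <core> -->
<ha-policy>
   <shared-store>
      <master>
         <failover-on-shutdown>true</failover-on-shutdown>
      </master>
   </shared-store>
</ha-policy>

<!-- broker-slave.xml (slave), inside <core> -->
<ha-policy>
   <shared-store>
      <slave>
         <allow-failback>true</allow-failback>
      </slave>
   </shared-store>
</ha-policy>
{code}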

I am running the master on 3 GB of RAM and the slave on 2 GB of RAM. In general, failover works.

The load we process on Artemis averages 5-7M messages (text and binary) per day, with a single text message around 7 KB. Maximum message size is around 1 GB.

I tried to reproduce the issue again with the same configuration in AWS, using another master/slave pair, by shutting down the master. Failover worked fine: all consumers were successfully moved to the slave, producers received the updated topology, Artemis kept receiving messages, and there was no {{RejectedExecutionException}}.

I will try reproducing it in isolation on one machine. Could you please explain in which situations Artemis throws {{RejectedExecutionException}}? For example, is it related to resources (memory, CPU, or slow NFS) on the machine at the point in time when this happened?

Please also let me know if there are any configuration tunings or resource changes that should be applied to help reduce the possibility of such an exception.
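
My understanding so far (please correct me if wrong): the stack trace below shows the task being rejected by a {{java.util.concurrent.ThreadPoolExecutor}} that is already in the {{Terminated}} state, and its default {{AbortPolicy}} refuses any task submitted after shutdown, regardless of memory, CPU, or NFS load. A minimal plain-JDK sketch of that behaviour (not Artemis code):

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

public class RejectedAfterShutdown {
    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);

        // Shut the pool down; once terminated it matches the broker log:
        // [Terminated, pool size = 0, active threads = 0, queued tasks = 0, ...]
        pool.shutdown();

        try {
            // Any task submitted now is refused by the default AbortPolicy,
            // producing the same RejectedExecutionException seen in this issue.
            pool.execute(() -> System.out.println("never runs"));
        } catch (RejectedExecutionException e) {
            System.out.println("Rejected: " + e);
        }
    }
}
{code}
If that reading is right, the open question is why the slave's pool was already terminated right after it became live.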



> Artemis 2.11.0 RejectedExecutionException after successful failover
> -------------------------------------------------------------------
>
>                 Key: ARTEMIS-2677
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-2677
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>         Environment: Artemis 2.11.0 (Master-Slave)
>  SharedStore, FilePing. AWS EFS/NFS.
>            Reporter: Jigar Shah
>            Priority: Major
>         Attachments: broker - slave.xml, broker.xml
>
>
> Observed an issue where, on master shutdown, the slave became active. Right after the slave became active it started printing "RejectedExecutionException". Also, consumers were not transferred from the master to the slave, and the client application stopped processing messages.
> Note: the RejectedExecutionException showed [Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 59059].
> Following are the logs during failover from master to slave:
> Master1:
> {noformat}
> 2020-03-18 12:34:39,911 WARN [org.apache.activemq.artemis.core.server] AMQ222107: Cleared up resources for session 26fb51af-690c-11ea-959a-12f8371a8293
> 2020-03-18 12:34:39,912 INFO [org.apache.activemq.artemis.core.server] AMQ221029: stopped bridge $.artemis.internal.sf.my-cluster.2c85cd9e-5f75-11ea-9be1-0af9119190b9
> 2020-03-18 12:34:39,950 WARN [org.apache.activemq.artemis.core.server] AMQ222061: Client connection failed, clearing up resources for session 2655265e-690c-11ea-9d7c-12f7d16c4f81
> 2020-03-18 12:34:39,951 WARN [org.apache.activemq.artemis.core.server] AMQ222107: Cleared up resources for session 2655265e-690c-11ea-9d7c-12f7d16c4f81
> 2020-03-18 12:34:39,951 WARN [org.apache.activemq.artemis.core.server] AMQ222061: Client connection failed, clearing up resources for session 2662e209-690c-11ea-9d7c-12f7d16c4f81
> 2020-03-18 12:34:39,951 WARN [org.apache.activemq.artemis.core.server] AMQ222107: Cleared up resources for session 2662e209-690c-11ea-9d7c-12f7d16c4f81
> 2020-03-18 12:34:39,951 WARN [org.apache.activemq.artemis.core.server] AMQ222061: Client connection failed, clearing up resources for session 2669bfdd-690c-11ea-9d7c-12f7d16c4f81
> 2020-03-18 12:34:39,951 WARN [org.apache.activemq.artemis.core.server] AMQ222107: Cleared up resources for session 2669bfdd-690c-11ea-9d7c-12f7d16c4f81
> 2020-03-18 12:34:39,993 WARN [org.apache.activemq.artemis.core.server] AMQ222061: Client connection failed, clearing up resources for session 26615b64-690c-11ea-9d7c-12f7d16c4f81
> 2020-03-18 12:34:39,994 WARN [org.apache.activemq.artemis.core.server] AMQ222107: Cleared up resources for session 26615b64-690c-11ea-9d7c-12f7d16c4f81
> 2020-03-18 12:34:39,994 WARN [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure to artemis1.sl.idsk.com/10.110.0.17:61616 has been detected: AMQ219015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
> 2020-03-18 12:34:40,123 FINE [org.jgroups.protocols.FILE_PING] remove persistence-fs
> 2020-03-18 12:34:40,137 FINE [org.jgroups.protocols.FD_SOCK] ip-10-110-0-17-18496: socket to ip-10-110-0-88-23224 was closed gracefully
> 2020-03-18 12:34:40,138 FINE [org.jgroups.protocols.TCP] closing sockets and stopping threads
> 2020-03-18 12:34:40,169 WARN [org.apache.activemq.artemis.core.client] AMQ212004: Failed to connect to server.
> 2020-03-18 12:34:41,336 INFO [io.hawt.web.AuthenticationFilter] Destroying hawtio authentication filter
> 2020-03-18 12:34:41,341 INFO [io.hawt.HawtioContextListener] Destroying hawtio services
> 2020-03-18 12:34:41,464 INFO [org.apache.activemq.hawtio.plugin.PluginContextListener] Destroyed artemis-plugin plugin
> 2020-03-18 12:34:41,472 INFO [org.apache.activemq.hawtio.branding.PluginContextListener] Destroyed activemq-branding plugin
> 2020-03-18 12:34:41,509 INFO [org.apache.activemq.artemis.core.server] AMQ221002: Apache ActiveMQ Artemis Message Broker version 2.11.0 [19ade3bb-5f75-11ea-b327-1216d251b187] stopped, uptime 5 hours 2 minutes
> {noformat}
> Slave1:
> {noformat}
> 2020-03-18 12:35:01,613 INFO [org.apache.activemq.artemis.core.server] AMQ221010: Backup Server is now live
> 2020-03-18 12:35:01,626 INFO [org.apache.activemq.artemis.core.server] AMQ221027: Bridge ClusterConnectionBridge@5ced3917 [name=$.artemis.internal.sf.my-cluster.2c85cd9e-5f75-11ea-9be1-0af9119190b9, queue=QueueImpl[name=$.artemis.internal.sf.my-cluster.2c85cd9e-5f75-11ea-9be1-0af9119190b9, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=19ade3bb-5f75-11ea-b327-1216d251b187], temp=false]@58d2a652 targetConnector=ServerLocatorImpl (identity=(Cluster-connection-bridge::ClusterConnectionBridge@5ced3917 [name=$.artemis.internal.sf.my-cluster.2c85cd9e-5f75-11ea-9be1-0af9119190b9, queue=QueueImpl[name=$.artemis.internal.sf.my-cluster.2c85cd9e-5f75-11ea-9be1-0af9119190b9, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=19ade3bb-5f75-11ea-b327-1216d251b187], temp=false]@58d2a652 targetConnector=ServerLocatorImpl [initialConnectors=[TransportConfiguration(name=netty-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=61616&host=artemis2-sl-idsk-com], discoveryGroupConfiguration=null]]::ClusterConnectionImpl@1788190971[nodeUUID=19ade3bb-5f75-11ea-b327-1216d251b187, connector=TransportConfiguration(name=netty-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=61617&host=artemis2-sl-idsk-com, address=, server=ActiveMQServerImpl::serverUUID=19ade3bb-5f75-11ea-b327-1216d251b187])) [initialConnectors=[TransportConfiguration(name=netty-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=61616&host=artemis2-sl-idsk-com], discoveryGroupConfiguration=null]] is connected
> 2020-03-18 12:35:20,621 WARN [org.apache.activemq.artemis.core.server] AMQ222054: Error on executing IOCallback: java.util.concurrent.RejectedExecutionException: Task org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$25/191327468@479377d0 rejected from org.apache.activemq.artemis.utils.ActiveMQThreadPoolExecutor@291e146[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 59059]
>  at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063) [rt.jar:1.8.0_212]
>  at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830) [rt.jar:1.8.0_212]
>  at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379) [rt.jar:1.8.0_212]
>  at org.apache.activemq.artemis.utils.actors.ProcessorBase.onAddedTaskIfNotRunning(ProcessorBase.java:205) [artemis-commons-2.11.0.jar:2.11.0]
>  at org.apache.activemq.artemis.utils.actors.ProcessorBase.task(ProcessorBase.java:193) [artemis-commons-2.11.0.jar:2.11.0]
>  at org.apache.activemq.artemis.utils.actors.OrderedExecutor.execute(OrderedExecutor.java:54) [artemis-commons-2.11.0.jar:2.11.0]
>  at org.apache.activemq.artemis.core.persistence.impl.journal.OperationContextImpl.execute(OperationContextImpl.java:238) [artemis-server-2.11.0.jar:2.11.0]
>  at org.apache.activemq.artemis.core.persistence.impl.journal.OperationContextImpl.checkTasks(OperationContextImpl.java:221) [artemis-server-2.11.0.jar:2.11.0]
>  at org.apache.activemq.artemis.core.persistence.impl.journal.OperationContextImpl.done(OperationContextImpl.java:197) [artemis-server-2.11.0.jar:2.11.0]
>  at org.apache.activemq.artemis.core.io.IOCallback.lambda$done$0(IOCallback.java:43) [artemis-journal-2.11.0.jar:2.11.0]
>  at java.util.ArrayList.forEach(ArrayList.java:1257) [rt.jar:1.8.0_212]
>  at org.apache.activemq.artemis.core.io.IOCallback.done(IOCallback.java:41) [artemis-journal-2.11.0.jar:2.11.0]
>  at org.apache.activemq.artemis.core.io.DelegateCallback.done(DelegateCallback.java:41) [artemis-journal-2.11.0.jar:2.11.0]
>  at org.apache.activemq.artemis.core.io.nio.NIOSequentialFile.doInternalWrite(NIOSequentialFile.java:395) [artemis-journal-2.11.0.jar:2.11.0]
>  at org.apache.activemq.artemis.core.io.nio.NIOSequentialFile.internalWrite(NIOSequentialFile.java:359) [artemis-journal-2.11.0.jar:2.11.0]
>  at org.apache.activemq.artemis.core.io.nio.NIOSequentialFile.access$100(NIOSequentialFile.java:43) [artemis-journal-2.11.0.jar:2.11.0]
>  at org.apache.activemq.artemis.core.io.nio.NIOSequentialFile$SyncLocalBufferObserver.flushBuffer(NIOSequentialFile.java:434) [artemis-journal-2.11.0.jar:2.11.0]
>  at org.apache.activemq.artemis.core.io.buffer.TimedBuffer.flushBatch(TimedBuffer.java:361) [artemis-journal-2.11.0.jar:2.11.0]
>  at org.apache.activemq.artemis.core.io.buffer.TimedBuffer$CheckTimer.run(TimedBuffer.java:457) [artemis-journal-2.11.0.jar:2.11.0]
>  at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_212]
> {noformat}


