Posted to issues@activemq.apache.org by "Suman Moorthy (Jira)" <ji...@apache.org> on 2021/01/19 20:08:00 UTC

[jira] [Updated] (ARTEMIS-3076) Artemis Master node not starting after failover to Slave

     [ https://issues.apache.org/jira/browse/ARTEMIS-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suman Moorthy updated ARTEMIS-3076:
-----------------------------------
    Description: 
I have Artemis (version 2.11.0) configured for HA (Master and Slave).

The Master node went down for an unknown reason, and the log entry below is printed continuously.
*AMQ222154: Error checking DLQ: ActiveMQShutdownException[errorType=SHUTDOWN_ERROR message=Journal must be in state=LOADED, was [STOPPED]]*
{code:java}
2021-01-15 23:02:05,414 WARN  [org.apache.activemq.artemis.core.server] AMQ222154: Error checking DLQ: ActiveMQShutdownException[errorType=SHUTDOWN_ERROR message=Journal must be in state=LOADED, was [STOPPED]] 
at org.apache.activemq.artemis.core.journal.impl.JournalImpl.checkJournalIsLoaded(JournalImpl.java:1087) [artemis-journal-2.11.0.jar:2.11.0] 
at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendUpdateRecord(JournalImpl.java:886) [artemis-journal-2.11.0.jar:2.11.0] 
at org.apache.activemq.artemis.core.journal.Journal.appendUpdateRecord(Journal.java:98) [artemis-journal-2.11.0.jar:2.11.0] 
at org.apache.activemq.artemis.core.persistence.impl.journal.AbstractJournalStorageManager.updateDeliveryCount(AbstractJournalStorageManager.java:756) [artemis-server-2.11.0.jar:2.11.0] 
at org.apache.activemq.artemis.core.server.impl.QueueImpl.checkRedelivery(QueueImpl.java:3052) [artemis-server-2.11.0.jar:2.11.0] 
at org.apache.activemq.artemis.core.server.impl.RefsOperation.rollbackRedelivery(RefsOperation.java:166) [artemis-server-2.11.0.jar:2.11.0] 
at org.apache.activemq.artemis.core.server.impl.RefsOperation.afterRollback(RefsOperation.java:113) [artemis-server-2.11.0.jar:2.11.0] 
at org.apache.activemq.artemis.core.transaction.impl.TransactionImpl.afterRollback(TransactionImpl.java:589) [artemis-server-2.11.0.jar:2.11.0] 
at org.apache.activemq.artemis.core.transaction.impl.TransactionImpl.access$200(TransactionImpl.java:40) [artemis-server-2.11.0.jar:2.11.0] 
at org.apache.activemq.artemis.core.transaction.impl.TransactionImpl$4.done(TransactionImpl.java:442) [artemis-server-2.11.0.jar:2.11.0] 
at org.apache.activemq.artemis.core.persistence.impl.journal.OperationContextImpl$1.run(OperationContextImpl.java:244) [artemis-server-2.11.0.jar:2.11.0] 
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) [artemis-commons-2.11.0.jar:2.11.0] 
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) [artemis-commons-2.11.0.jar:2.11.0] 
at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66) [artemis-commons-2.11.0.jar:2.11.0] 
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_275] 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_275] 
at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) [artemis-commons-2.11.0.jar:2.11.0]
{code}
 

The Slave comes up as expected, but throws an NPE:
{noformat}
2021-01-15 23:02:27,529 INFO  [org.apache.activemq.artemis.core.server] AMQ221010: Backup Server is now live
2021-01-15 23:02:27,545 ERROR [org.apache.activemq.artemis.core.server] AMQ224000: Failure in initialisation: java.lang.NullPointerException 
at org.apache.activemq.artemis.core.server.impl.SharedStoreBackupActivation$FailbackChecker.<init>(SharedStoreBackupActivation.java:193) [artemis-server-2.11.0.jar:2.11.0] 
at org.apache.activemq.artemis.core.server.impl.SharedStoreBackupActivation.startFailbackChecker(SharedStoreBackupActivation.java:185) [artemis-server-2.11.0.jar:2.11.0] 
at org.apache.activemq.artemis.core.server.impl.SharedStoreBackupActivation.run(SharedStoreBackupActivation.java:118) [artemis-server-2.11.0.jar:2.11.0] 
at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$ActivationThread.run(ActiveMQServerImpl.java:3863) [artemis-server-2.11.0.jar:2.11.0]{noformat}
The Master attempts to start, but it doesn't progress beyond *"AMQ221034: Waiting indefinitely to obtain live lock"*
The logs remain stuck at this point even after multiple restarts.
{noformat}
2021-01-15 23:03:56,238 INFO  [org.apache.activemq.artemis.core.server] AMQ221006: Waiting to obtain live lock
2021-01-15 23:03:56,300 INFO  [org.apache.activemq.artemis.core.server] AMQ221013: Using NIO Journal
2021-01-15 23:03:56,581 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-server]. Adding protocol support for: CORE
2021-01-15 23:03:56,581 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-amqp-protocol]. Adding protocol support for: AMQP
2021-01-15 23:03:56,581 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-hornetq-protocol]. Adding protocol support for: HORNETQ
2021-01-15 23:03:56,581 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-mqtt-protocol]. Adding protocol support for: MQTT
2021-01-15 23:03:56,581 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-openwire-protocol]. Adding protocol support for: OPENWIRE
2021-01-15 23:03:56,581 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-stomp-protocol]. Adding protocol support for: STOMP
2021-01-15 23:03:56,644 WARN  [org.apache.activemq.artemis.core.server] AMQ222035: Directory \\test\data\paging\cd776bae-1a55-11eb-985d-0050569136c8 did not have an identification file address.txt
2021-01-15 23:03:56,644 WARN  [org.apache.activemq.artemis.core.server] AMQ222035: Directory \\test\data\paging\a84f1e4f-1f1a-11eb-a37f-0050569136c8 did not have an identification file address.txt
2021-01-15 23:03:56,644 WARN  [org.apache.activemq.artemis.core.server] AMQ222035: Directory \\test\data\paging\a87edff5-1f1a-11eb-a37f-0050569136c8 did not have an identification file address.txt
2021-01-15 23:03:56,988 INFO  [org.apache.activemq.artemis.core.server] AMQ221034: Waiting indefinitely to obtain live lock{noformat}
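
As a rough aid for reading excerpts like the ones above, the following Python sketch extracts the timestamp, level, logger and AMQ code from each log line and reports the last coded entry, which is where start-up appears to stall. It is not part of Artemis. The names LOG_LINE, LogEntry, parse_line and last_entry are made up for this illustration, and the pattern assumes only the line layout visible in this report (six-digit AMQ codes, comma-separated milliseconds).

```python
import re
from typing import Iterable, NamedTuple, Optional

# Assumed layout, taken from the excerpts in this report:
# "<date> <time>,<ms> <LEVEL>  [<logger>] AMQnnnnnn: <message>"
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"(?P<code>AMQ\d{6}):\s*(?P<message>.*)$"
)


class LogEntry(NamedTuple):
    timestamp: str
    level: str
    logger: str
    code: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Return a LogEntry for a coded log line, or None for anything else
    (for example the "at ..." frames of a stack trace)."""
    match = LOG_LINE.match(line.strip())
    return LogEntry(**match.groupdict()) if match else None


def last_entry(lines: Iterable[str]) -> Optional[LogEntry]:
    """Return the last line that parses as a coded log entry, if any."""
    last = None
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            last = entry
    return last


if __name__ == "__main__":
    sample = [
        "2021-01-15 23:03:56,238 INFO  [org.apache.activemq.artemis.core.server] "
        "AMQ221006: Waiting to obtain live lock",
        "2021-01-15 23:03:56,988 INFO  [org.apache.activemq.artemis.core.server] "
        "AMQ221034: Waiting indefinitely to obtain live lock",
    ]
    entry = last_entry(sample)
    if entry is not None:
        print(entry.timestamp, entry.code, entry.message)
```

Running last_entry over the Master's log after each restart would show whether AMQ221034 is still the final coded entry.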
 

Can you please advise on the issue here and the steps to recover?

Does the NPE during Slave start-up have any effect on the queues or on the broker's functioning?

Do I need to stop the Slave manually to get the Master to start successfully?

  was:
I have Artemis (version 2.11.0) configured for HA (Master and Slave).

The Master node went down for an unknown reason, and the log entry below is printed continuously.
{code:java}
2021-01-15 23:02:05,414 WARN  [org.apache.activemq.artemis.core.server] AMQ222154: Error checking DLQ: ActiveMQShutdownException[errorType=SHUTDOWN_ERROR message=Journal must be in state=LOADED, was [STOPPED]]
2021-01-15 23:02:05,414 WARN  [org.apache.activemq.artemis.core.server] AMQ222154: Error checking DLQ: ActiveMQShutdownException[errorType=SHUTDOWN_ERROR message=Journal must be in state=LOADED, was [STOPPED]] 
at org.apache.activemq.artemis.core.journal.impl.JournalImpl.checkJournalIsLoaded(JournalImpl.java:1087) [artemis-journal-2.11.0.jar:2.11.0] 
at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendUpdateRecord(JournalImpl.java:886) [artemis-journal-2.11.0.jar:2.11.0] 
at org.apache.activemq.artemis.core.journal.Journal.appendUpdateRecord(Journal.java:98) [artemis-journal-2.11.0.jar:2.11.0] 
at org.apache.activemq.artemis.core.persistence.impl.journal.AbstractJournalStorageManager.updateDeliveryCount(AbstractJournalStorageManager.java:756) [artemis-server-2.11.0.jar:2.11.0] 
at org.apache.activemq.artemis.core.server.impl.QueueImpl.checkRedelivery(QueueImpl.java:3052) [artemis-server-2.11.0.jar:2.11.0] 
at org.apache.activemq.artemis.core.server.impl.RefsOperation.rollbackRedelivery(RefsOperation.java:166) [artemis-server-2.11.0.jar:2.11.0] 
at org.apache.activemq.artemis.core.server.impl.RefsOperation.afterRollback(RefsOperation.java:113) [artemis-server-2.11.0.jar:2.11.0] 
at org.apache.activemq.artemis.core.transaction.impl.TransactionImpl.afterRollback(TransactionImpl.java:589) [artemis-server-2.11.0.jar:2.11.0] 
at org.apache.activemq.artemis.core.transaction.impl.TransactionImpl.access$200(TransactionImpl.java:40) [artemis-server-2.11.0.jar:2.11.0] 
at org.apache.activemq.artemis.core.transaction.impl.TransactionImpl$4.done(TransactionImpl.java:442) [artemis-server-2.11.0.jar:2.11.0] 
at org.apache.activemq.artemis.core.persistence.impl.journal.OperationContextImpl$1.run(OperationContextImpl.java:244) [artemis-server-2.11.0.jar:2.11.0] 
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) [artemis-commons-2.11.0.jar:2.11.0] 
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) [artemis-commons-2.11.0.jar:2.11.0] 
at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66) [artemis-commons-2.11.0.jar:2.11.0] 
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_275] 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_275] 
at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) [artemis-commons-2.11.0.jar:2.11.0]
{code}
 

The Slave comes up as expected, but throws an NPE:
{noformat}

2021-01-15 23:02:27,529 INFO  [org.apache.activemq.artemis.core.server] AMQ221010: Backup Server is now live
2021-01-15 23:02:27,545 ERROR [org.apache.activemq.artemis.core.server] AMQ224000: Failure in initialisation: java.lang.NullPointerException 
at org.apache.activemq.artemis.core.server.impl.SharedStoreBackupActivation$FailbackChecker.<init>(SharedStoreBackupActivation.java:193) [artemis-server-2.11.0.jar:2.11.0] 
at org.apache.activemq.artemis.core.server.impl.SharedStoreBackupActivation.startFailbackChecker(SharedStoreBackupActivation.java:185) [artemis-server-2.11.0.jar:2.11.0] 
at org.apache.activemq.artemis.core.server.impl.SharedStoreBackupActivation.run(SharedStoreBackupActivation.java:118) [artemis-server-2.11.0.jar:2.11.0] 
at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$ActivationThread.run(ActiveMQServerImpl.java:3863) [artemis-server-2.11.0.jar:2.11.0]{noformat}
The Master attempts to start, but it doesn't progress beyond *"AMQ221034: Waiting indefinitely to obtain live lock"*
The logs remain stuck at this point even after multiple restarts.
{noformat}
2021-01-15 23:03:56,238 INFO  [org.apache.activemq.artemis.core.server] AMQ221006: Waiting to obtain live lock
2021-01-15 23:03:56,300 INFO  [org.apache.activemq.artemis.core.server] AMQ221013: Using NIO Journal
2021-01-15 23:03:56,581 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-server]. Adding protocol support for: CORE
2021-01-15 23:03:56,581 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-amqp-protocol]. Adding protocol support for: AMQP
2021-01-15 23:03:56,581 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-hornetq-protocol]. Adding protocol support for: HORNETQ
2021-01-15 23:03:56,581 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-mqtt-protocol]. Adding protocol support for: MQTT
2021-01-15 23:03:56,581 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-openwire-protocol]. Adding protocol support for: OPENWIRE
2021-01-15 23:03:56,581 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-stomp-protocol]. Adding protocol support for: STOMP
2021-01-15 23:03:56,644 WARN  [org.apache.activemq.artemis.core.server] AMQ222035: Directory \\test\data\paging\cd776bae-1a55-11eb-985d-0050569136c8 did not have an identification file address.txt
2021-01-15 23:03:56,644 WARN  [org.apache.activemq.artemis.core.server] AMQ222035: Directory \\test\data\paging\a84f1e4f-1f1a-11eb-a37f-0050569136c8 did not have an identification file address.txt
2021-01-15 23:03:56,644 WARN  [org.apache.activemq.artemis.core.server] AMQ222035: Directory \\test\data\paging\a87edff5-1f1a-11eb-a37f-0050569136c8 did not have an identification file address.txt
2021-01-15 23:03:56,988 INFO  [org.apache.activemq.artemis.core.server] AMQ221034: Waiting indefinitely to obtain live lock{noformat}
 

Can you please advise on the issue here and the steps to recover?

Does the NPE during Slave start-up have any effect on the queues or on the broker's functioning?

Do I need to stop the Slave manually to get the Master to start successfully?


> Artemis Master node not starting after failover to Slave
> --------------------------------------------------------
>
>                 Key: ARTEMIS-3076
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-3076
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>    Affects Versions: 2.11.0
>            Reporter: Suman Moorthy
>            Priority: Major


