Posted to issues@activemq.apache.org by "Derek Wilhelm (JIRA)" <ji...@apache.org> on 2018/10/24 20:06:00 UTC
[jira] [Created] (ARTEMIS-2147) Fail over and Fail back race condition with dynamic queues

Derek Wilhelm created ARTEMIS-2147:
--------------------------------------

             Summary: Fail over and Fail back race condition with dynamic queues
                 Key: ARTEMIS-2147
                 URL: https://issues.apache.org/jira/browse/ARTEMIS-2147
             Project: ActiveMQ Artemis
          Issue Type: Bug
          Components: Broker
    Affects Versions: 2.6.3, 2.5.0, 2.4.0
            Reporter: Derek Wilhelm


There appears to be a race condition when dynamically created queues are used with replication-based failover and failback and the Core JMS client. When a failover and/or failback occurs, the server may log an exception:

`ERROR [org.apache.activemq.artemis.core.server] AMQ224016: Caught exception: ActiveMQNonExistentQueueException[errorType=QUEUE_DOES_NOT_EXIST message=AMQ119017: Queue test.queue does not exist]`

The client never sees an exception (after the initial connection failure) and appears to believe that the reconnection succeeded. However, the client no longer receives messages that are sent to the queue. If you step through the failover code in a debugger at the point where the consumer is being created, the problem does not occur unless the breakpoint is set after the address lookup, in which case it fails occasionally. This sensitivity to timing is why it is believed to be a race condition.

 

Steps to reproduce:

1. Create master server with replication, check-for-live-server=true

2. Create backup server with replication, allow-failback=true, failback-delay=5000

3. Start master server

4. Start backup server

5. Create a consumer on a dynamically defined, named queue (e.g. test.queue) using the Artemis Core JMS client

6. Create a producer from another connection on the same queue and start sending periodic messages

7. Stop the master server

 - Failover to the backup will take place.  The client will log the connection failure

 - The error may occur at this point: the backup server logs the exception shown above. If it does, the consumer stops receiving new messages

8. Start the master server

 - Fail back to the master server will take place once it has started

 - The client will log the connection failure once the master takes over

 - The error may occur at this point: the master server logs the exception shown above. If it does, the consumer stops receiving new messages

9. If the ActiveMQNonExistentQueueException does not occur, repeat steps 7 and 8.
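
Because the client sees no exception when the problem occurs, the only client-side symptom while repeating steps 7 and 8 is that the consumer goes quiet. A minimal sketch of a helper that flags this is shown below. It is a hypothetical test aid, not part of Artemis or of the original reproduction: the class name `StallDetector`, its methods, and the silence threshold are all assumptions. Timestamps are passed in explicitly so that the logic can be checked without a broker.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical helper (not an Artemis API): tracks when the consumer last
 * received a message and reports whether it has been silent for too long.
 */
final class StallDetector {
    // Longest silence, in milliseconds, that is still considered normal.
    private final long maxSilenceMillis;
    // Timestamp of the most recent message; updated from the consumer thread.
    private final AtomicLong lastMessageAt;

    StallDetector(long maxSilenceMillis, long nowMillis) {
        this.maxSilenceMillis = maxSilenceMillis;
        this.lastMessageAt = new AtomicLong(nowMillis);
    }

    /** Record that a message arrived at the given time. */
    void onMessage(long nowMillis) {
        lastMessageAt.set(nowMillis);
    }

    /** True when no message has arrived for longer than the allowed silence. */
    boolean isStalled(long nowMillis) {
        return nowMillis - lastMessageAt.get() > maxSilenceMillis;
    }
}
```

In a reproduction client, `onMessage(System.currentTimeMillis())` would be called from the consumer's message callback and `isStalled(System.currentTimeMillis())` polled from a timer. The threshold should presumably be several times the producer's send interval and longer than the time a normal failover or failback takes, otherwise an ordinary reconnect would be reported as a stall.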

 

The exception occurs most often during failback to the master server, often within only 1 or 2 failback attempts. It has been seen on 2.4.0, 2.5.0, 2.6.3, and 2.7.0-SNAPSHOT.


