Posted to dev@activemq.apache.org by "Josh Carlson (JIRA)" <ji...@apache.org> on 2010/02/24 20:16:40 UTC

[jira] Updated: (AMQ-2627) Failover causes duplicate messages

     [ https://issues.apache.org/activemq/browse/AMQ-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Carlson updated AMQ-2627:
------------------------------

    Attachment: broken_failover.tar.bz2

Attached reproducer described in this bug report.

> Failover causes duplicate messages
> ----------------------------------
>
>                 Key: AMQ-2627
>                 URL: https://issues.apache.org/activemq/browse/AMQ-2627
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 5.3.0
>         Environment: Server: 2 RHEL 5.3 x86-64 machines. Kernel version 2.6.18-128.0.0.0.2.el5.
> Client: Same as above. Also tested with same results on Fedora Core 11
>            Reporter: Josh Carlson
>            Priority: Blocker
>         Attachments: broken_failover.tar.bz2
>
>
> When using a shared file system master/slave ActiveMQ configuration and client acknowledgements, we run into a problem when
> our clients fail over to a new server. The problem is that the new server does not appear to have any knowledge of pending
> messages that the old server had dispatched to clients. Consequently all of these pending messages get dispatched a second
> time even though the clients had acknowledged them.
> Please confirm my suspicion that this is a server-side bug, and let me know if there are any suggestions for working around this issue. I have put this at Priority 'Blocker' because it blocks our progress towards deploying an ActiveMQ solution to our infrastructure.
> If you look at the log file from the new broker you can see that the acks for those messages do not get matched:
>    2010-02-24 12:46:49,759 | WARN  | Async error occurred: javax.jms.JMSException: Unmatched acknowledege:
> I do not know whether this gets bubbled up to the client or not. If it does, it must be under the hood in activemq-cpp,
> because from the application layer I do not see any errors. In our in-house Perl Stomp client we wind up getting an ERROR
> frame, which the client did not know what to do with. This is where I initially ran into this problem. Today is my first day using
> CMS, to verify whether the bug is independent of the client and to provide a reproducer using a client everyone
> should have ready access to.
> The attached tar file will contain the following details for reproducing this problem.
> Contents:
>    README.txt                   - This File
>    activemq_1.xml               - ActiveMQ config for the server that was master at the time I started the consumer
>    activemq_2.xml               - ActiveMQ config for the broker which became the master after the original master failed
>    activemq_1.log               - Log file from the first server
>    activemq_2.log               - Log for the second server
>    producers/SimpleProducer.cpp - Modified version of program shipped in activemq-cpp-library-3.1.0 to
>                                   send only 2 messages and provide two broker hosts on the command line.
>    consumers/SimpleConsumer.cpp - New file ... but really just a modified version of SimpleAsyncConsumer shipped with
>                                   activemq-cpp-library-3.1.0. Modified as follows:
>                                      - Retrieves messages synchronously and in one thread (so we can see what is going on)
>                                      - Takes two command line options to name broker hosts to use in broker URI
>                                      - Uses client acknowledgements.
>                                      - After retrieving a message it blocks waiting for standard input (so one has time to go kill the server)
>    Makefile.am                  - Modified version of the makefile to build the new SimpleConsumer program.
>     
>     
> Note that these files must be built from inside an activemq-cpp build tree. So the first step to reproduce this problem is to copy producers/SimpleProducer.cpp, consumers/SimpleConsumer.cpp and Makefile.am to your src/examples directory. Then run a top-level configure and make. I ran this using activemq-cpp-library version 3.1.0.
>     
> This reproducer expects that you only have 2 activemq brokers and that they be configured using a shared file system master/slave configuration. It also expects an openwire transport connector listening on port 61616 on those two machines. (Note: you'll see my activemq configs using the transport uri: uri="tcp://q1masterhost:61616", q1masterhost goes to the ethernet 0 interface on each of the hosts.)
> Once you have those two brokers set up and running, go ahead and run the simple_producer code, passing the hostnames of your two brokers on the command line:
>         [jcarlson@rocky examples]$ ./simple_producer mmq1 mmq2
>         =====================================================
>         Starting the example:
>         -----------------------------------------------------
>         Sent message #1 from thread 139817389041504
>         Sent message #2 from thread 139817389041504
>         -----------------------------------------------------
>         Finished with the example.
>         =====================================================
> Now do the same for the simple_consumer:
>         [jcarlson@rocky examples]$ ./simple_consumer mmq1 mmq2
>         =====================================================
>         Starting the example:
>         -----------------------------------------------------
>         Message #1 Received: Hello world! from thread 139817389041504
>         Waiting for stdin to acknoledge
> The app has retrieved one message but has not ack'ed it yet. Now go identify
> which host has the master broker and kill the process. The master broker will
> be the one which is *not* printing 'Database [lockfile] is locked' messages.
> In my case the broker was on mmq1 so I did this in another terminal:
>         ssh -t mmq1 sudo pkill java
> Immediately I see this in the console where I started the consumer:
>   The Connection's Transport has been Interrupted.
> and then a few seconds later I see:
>   The Connection's Transport has been Restored.
> At this point I hit enter in the terminal so that the message I received on
> the other broker gets acknowledged and the consumer tries to get another message:
>   Message #2 Received: Hello world! from thread 139817389041504
>   Waiting for stdin to acknoledge
> Ok at this point, since I have only put two messages on the queue I don't
> expect any more so when I hit enter and go back to get another message I
> expect it to just sit and wait for another message to come in. This is not
> what happens. A third message is retrieved:
>   Message #3 Received: Hello world! from thread 139817389041504
>   Waiting for stdin to acknoledge
> At this point when I hit enter again the app blocks and I kill it with Ctrl-C.
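
The reproducer takes two broker hostnames on the command line and expects an OpenWire connector on port 61616 on each. A minimal sketch of how such a pair could be turned into a failover broker URI is shown here. The helper name buildFailoverUri is hypothetical, and the attached SimpleProducer.cpp and SimpleConsumer.cpp may build their URI differently.

```cpp
#include <string>

// Hypothetical helper, not taken from the attached reproducer.
// Builds a failover URI listing two brokers, defaulting to the
// OpenWire port 61616 mentioned in the report.
std::string buildFailoverUri(const std::string& hostA,
                             const std::string& hostB,
                             int port = 61616) {
    const std::string p = std::to_string(port);
    return "failover:(tcp://" + hostA + ":" + p +
           ",tcp://" + hostB + ":" + p + ")";
}
```

With the hostnames used above, buildFailoverUri("mmq1", "mmq2") yields failover:(tcp://mmq1:61616,tcp://mmq2:61616).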

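The report asks for a possible workaround. One common client-side approach, sketched here as an assumption and not as a fix for the broker behaviour, is to remember the IDs of messages already processed and skip any repeat. The class name DuplicateFilter is hypothetical. The sketch assumes a redelivered copy carries the same message ID as the original, which the report does not confirm, and it keeps state in memory only, so it does not survive a client restart.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <unordered_set>

// Hypothetical helper: remembers up to `capacity` recently seen
// message IDs so that a repeated delivery can be detected.
class DuplicateFilter {
public:
    explicit DuplicateFilter(std::size_t capacity) : capacity_(capacity) {}

    // Returns true the first time an ID is seen, false for a repeat
    // that is still within the remembered window.
    bool firstTime(const std::string& messageId) {
        if (seen_.count(messageId) != 0) {
            return false;
        }
        seen_.insert(messageId);
        order_.push_back(messageId);
        // Evict the oldest ID once the window is full.
        if (order_.size() > capacity_) {
            seen_.erase(order_.front());
            order_.pop_front();
        }
        return true;
    }

private:
    std::size_t capacity_;
    std::deque<std::string> order_;
    std::unordered_set<std::string> seen_;
};
```

In a CMS consumer the key would presumably be the value returned by cms::Message::getCMSMessageID(). A message for which firstTime returns false could still be acknowledged but not processed a second time.
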
-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.