You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@activemq.apache.org by JamesDeery <ja...@twistedfish.com> on 2018/04/27 09:37:45 UTC

Messages Getting Stuck in Durable Subscriptions

We are having a problem in multiple installations of ActiveMQ where messages
are getting "stuck" in durable subscription queues. By "stuck" I mean that
the pending queue size for the subscription is greater than zero, but the
consumer never receives anything. New messages to the topic are delivered to
the consumer immediately, but the stuck ones are never delivered. Restarting
both the consumer and the broker doesn't change anything.

If this was just a case of a counter being wrong we wouldn't have a problem,
but there are KahaDB `db-XXXX.log` files which are not being collected
because they contain these stuck messages. Over time more messages get stuck
and the `.log` files accumulate. The only way we've been able to clean up
the situation is to delete the durable subscription while the entire
application is stopped (in order to avoid data loss through missed messages
before the subscription is recreated). This is not something we want to have
to do on a production system, and we are losing messages without even
knowing what they were or when they were originally sent.

This has been happening for several years on many separate installations
using ActiveMQ versions from 5.11 (IIRC) to 5.15. We have never been able to
pinpoint when this starts happening or deliberately reproduce it in any way,
but I have been able to capture a fairly minimal example of a KahaDB
database which has these stuck messages in it.

Is there a way that we can see the stuck messages? Is it possible to
identify whether the KahaDB files are corrupt/inconsistent in some way?
We've turned on detailed logging for KahaDB and we're not getting any errors
reported. Is there anything else we can do to debug? Is it possible that
we're doing something wrong that is causing this to happen, or is it a bug
in ActiveMQ?

I've put together a list of steps that illustrates the problem we are
having. If someone wants to take a look themselves the KahaDB files can be
downloaded from
https://drive.google.com/uc?id=123YD8s_Mp7cPWMHcvvpHesvzbV9P6hL4&export=download
(2.5MB, MD5 277b0c4ebf0b6cfe12e1202330afa682).

Environment: Windows 10, JRE 8u171

1. Download ActiveMQ 5.15.3 and extract
2. Edit `conf/log4j.properties` and append:

    log4j.appender.kahadb=org.apache.log4j.RollingFileAppender
    log4j.appender.kahadb.file=${activemq.base}/data/kahadb.log
    log4j.appender.kahadb.maxFileSize=1024KB
    log4j.appender.kahadb.maxBackupIndex=5
    log4j.appender.kahadb.append=true
    log4j.appender.kahadb.layout=org.apache.log4j.PatternLayout
    log4j.appender.kahadb.layout.ConversionPattern=%d [%t] %-5p %c - %m%n
    log4j.logger.org.apache.activemq.store.kahadb.MessageDatabase=TRACE,
kahadb

3. Extract the KahaDB files into the `data/` directory
4. From within `bin/` run `activemq start`
5. Browse to subscriber information in the web console,
http://localhost:8161/admin/subscribers.jsp
6. We can see that there is one offline durable subscriber with a pending
queue size of 36
7. Attempt to view the pending messages through the Jolokia API,
http://localhost:8161/api/jolokia/exec/org.apache.activemq:type=Broker,brokerName=localhost,destinationType=Topic,destinationName=Data.Changed,endpoint=Consumer,clientId=session-server-1,consumerId=Durable(session-server-1_Mobile.DataListener)/browse()
8. We expect to see the details of all 36 pending messages, but instead we
get back an empty list
9. Create a consumer to receive the pending messages for the durable
subscription, eg. with the following C# using the `ActiveMQ.NMS.ActiveMQ`
1.7.2 NuGet package:

    class ActiveMQTest {
        static void Main(string[] args) {
            var connectionFactory = new
Apache.NMS.ActiveMQ.ConnectionFactory("tcp://localhost:61616",
"session-server-1");
            var connection = connectionFactory.CreateConnection();
            connection.Start();
    
            var session = connection.CreateSession();
            var consumer = session.CreateDurableConsumer(
                session.GetTopic("Data.Changed"),
                "Mobile.DataListener",
                "EchoType=128 OR EchoType=22 OR EchoType=21 OR EchoType=2 OR
EchoType=90 OR EchoType=177 OR EchoType=178 OR EchoType=442 OR EchoType=28
OR EchoType=29 OR EchoType=72 OR EchoType=26 OR EchoType=20 OR EchoType=25
OR EchoType=24 OR EchoType=155 OR EchoType=4 OR EchoType=89 OR EchoType=91
OR EchoType=210 OR EchoType=198 OR EchoType=188 OR EchoType=181 OR
EchoType=182 OR EchoType=179 OR EchoType=180 OR EchoType=234 OR EchoType=33
OR EchoType=32 OR EchoType=79 OR EchoType=15 OR EchoType=110 OR EchoType=7
OR EchoType=477",
                false);
    
            while (true) {
                System.Console.WriteLine("Waiting for next message...");
                var message = consumer.Receive();
                System.Console.WriteLine("Received message {0}",
message.NMSMessageId);
            }
        }
    }

10. The durable subscription is now online, but the call to
`consumer.Receive()` never returns and the pending queue size still shows as
36
11. Send a message to the `Data.Changed` topic with an `EchoType` property
with an integer value of 128
12. The consumer immediately receives this message
13. Stop the consumer
14. Send another message as in step 11
15. The pending queue size now shows as 37
16. Looking through the Jolokia API again, we can see the details of the one
message we just sent, but not the 36 others
17. Looking at the messages in `data/kahadb.log` we can see the following
messages repeated:

    Last update: 1024:7412231, full gc candidates set: [731, 1024]
    gc candidates after producerSequenceIdTrackerLocation:1024:7401196,
[731]
    gc candidates after ackMessageFileMapLocation:1024:7412488, [731]
    gc candidates after in progress tx range:[null, null], [731]
    gc candidates after dest:1:Data.Changed, []
    gc candidates: []
    ackMessageFileMap: {1024=[1024]}

18. Delete the durable subscription
19. After waiting for a KahaDB GC cycle we get the following messages in
`kahadb.log`:

    Last update: 1024:7416948, full gc candidates set: [731, 1024]
    gc candidates after producerSequenceIdTrackerLocation:1024:7401196,
[731]
    gc candidates after ackMessageFileMapLocation:1024:7417205, [731]
    gc candidates after in progress tx range:[null, null], [731]
    gc candidates after dest:1:Data.Changed, [731]
    gc candidates: [731]
    ackMessageFileMap: {1024=[1024]}
    Cleanup removing the data files: [731]

We can see that it is the durable subscription to the `Data.Changed` topic
that is holding on to references to the messages in the `db-731.log` file.
Since the consumer is not receiving the messages, the only thing that will
make the references go away appears to be to delete the subscription.



--
Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html