Posted to users@activemq.apache.org by Shobhana <sh...@quickride.in> on 2016/05/17 12:35:56 UTC

AMQ 5.13.2 : Kaha DB logs cleanup issue

We use AMQ to exchange MQTT messages. I observed that the log files in Kaha
DB don't get deleted even after many days.

I used the steps mentioned in
http://activemq.apache.org/why-do-kahadb-log-files-remain-after-cleanup.html
to locate which destination contains unacked messages or a slow consumer.
Following is the trace log output:

2016-05-17 18:05:32,077 | TRACE | Last update: 7:1109948, full gc candidates
set: [1, 2, 3, 4, 5, 6, 7] |
org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal
Checkpoint Worker
2016-05-17 18:05:32,078 | TRACE | gc candidates after
producerSequenceIdTrackerLocation:7, [1, 2, 3, 4, 5, 6] |
org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal
Checkpoint Worker
2016-05-17 18:05:32,078 | TRACE | gc candidates after
ackMessageFileMapLocation:7, [1, 2, 3, 4, 5, 6] |
org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal
Checkpoint Worker
2016-05-17 18:05:32,079 | TRACE | gc candidates after tx range:[null, null],
[1, 2, 3, 4, 5, 6] | org.apache.activemq.store.kahadb.MessageDatabase |
ActiveMQ Journal Checkpoint Worker
2016-05-17 18:05:32,079 | TRACE | gc candidates after
dest:1:riderride.chat.200, [1, 3, 4, 5, 6] |
org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal
Checkpoint Worker
2016-05-17 18:05:32,081 | TRACE | gc candidates after
dest:1:riderride.chat.203, [1, 3, 4, 5, 6] |
org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal
Checkpoint Worker
2016-05-17 18:05:32,081 | TRACE | gc candidates after
dest:1:riderride.chat.206, [1, 3, 5, 6] |
org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal
Checkpoint Worker
2016-05-17 18:05:32,083 | TRACE | gc candidates after
dest:1:riderride.chat.209, [1, 3, 5, 6] |
org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal
Checkpoint Worker
2016-05-17 18:05:32,083 | TRACE | gc candidates after
dest:1:passengerride.user.1234567890, [1, 3, 5] |
org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal
Checkpoint Worker
2016-05-17 18:05:32,083 | TRACE | gc candidates after
dest:1:passengerride.user.1234567891, [1, 3, 5] |
org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal
Checkpoint Worker
2016-05-17 18:05:32,085 | TRACE | gc candidates after
dest:1:passengerride.status.160, [1, 3, 5] |
org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal
Checkpoint Worker
2016-05-17 18:05:32,089 | TRACE | gc candidates after
dest:1:riderride.chat.218, [1, 3, 5] |
org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal
Checkpoint Worker
2016-05-17 18:05:32,090 | TRACE | gc candidates after
dest:1:invitationStatus.1234567891, [1, 3, 5] |
org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal
Checkpoint Worker
2016-05-17 18:05:32,090 | TRACE | gc candidates after
dest:1:chat.9206705762, [3, 5] |
org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal
Checkpoint Worker
2016-05-17 18:05:32,090 | TRACE | gc candidates after
dest:1:invitationStatus.1234567890, [3] |
org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal
Checkpoint Worker
2016-05-17 18:05:32,091 | TRACE | gc candidates after
dest:1:riderride.chat.220, [3] |
org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal
Checkpoint Worker

As per the description given in the link above, there must be some unacked
messages or a slow consumer on topic riderride.chat.209 which is
preventing log file #6 from being GCed. However, when I check the admin
console to see how many messages were exchanged on this topic, I see that
both the enqueued and dequeued counts are 0, so there is no chance of any
unacked messages. There is 1 inactive subscriber though. Will this
inactive subscriber prevent this log file from being GCed?

The same issue is observed for all the other log files too. Any idea
what's going wrong here?



--
View this message in context: http://activemq.2283324.n4.nabble.com/AMQ-5-13-2-Kaha-DB-logs-cleanup-issue-tp4712046.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.

Re: AMQ 5.13.2 : Kaha DB logs cleanup issue

Posted by RuralHunter <ru...@gmail.com>.
Based on your log, I think it is passengerride.user.1234567890 that
prevents the deletion of #6.




Re: AMQ 5.13.2 : Kaha DB logs cleanup issue

Posted by Shobhana <sh...@quickride.in>.
Hi Tim,

I tried 5.13.3 and every later version up to 5.14.0, but the problem is
still there. I still see a lot of KahaDB log files consuming huge amounts
of disk space, and eventually the broker stops functioning!

I made some more changes to check if I could get rid of this problem:
a) I enabled the following plug-ins to set an expiry of 1 day for every
message (we use only MQTT messages):
<plugins>
  <discardingDLQBrokerPlugin dropAll="true" dropTemporaryTopics="true" dropTemporaryQueues="true" />
  <timeStampingBrokerPlugin ttlCeiling="86400000" zeroExpirationOverride="86400000"/>
</plugins>

b) I configured the dead-letter strategy so that expired messages are
dropped rather than sent to the DLQ, for all topics and queues:
    <deadLetterStrategy>
        <sharedDeadLetterStrategy processExpired="false" />
    </deadLetterStrategy>

<discardingDLQBrokerPlugin dropAll="true" dropTemporaryTopics="true" dropTemporaryQueues="true" />

c) I configured offline durable subscribers to time out after 1 day:
offlineDurableSubscriberTimeout="86400000"
offlineDurableSubscriberTaskSchedule="3600000"
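As a quick sanity check, the millisecond values above work out to exactly
one day and one hour:

```python
# Sanity-check the millisecond values used in the settings above.
DAY_MS = 24 * 60 * 60 * 1000   # ttlCeiling / offlineDurableSubscriberTimeout
HOUR_MS = 60 * 60 * 1000       # offlineDurableSubscriberTaskSchedule
print(DAY_MS, HOUR_MS)  # 86400000 3600000
```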

After these changes, a few log files did get deleted, but the majority
remain in the KahaDB folder and consume a lot of disk space.

I observed this on our production server, where I cannot enable trace
logging or JMX. Is there any other way to identify what is causing this
issue?
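(For reference, the Jolokia REST endpoint mentioned earlier in this thread
can be scripted without JMX access. The base URL, MBean object name, and
attribute below are assumptions to adapt, not a recipe; check your own
broker's /api/jolokia/list output for the real names. A minimal sketch:)

```python
# Hypothetical sketch: read a broker statistic through the Jolokia REST
# endpoint (shipped with the ActiveMQ web console) instead of JMX/JConsole.
# The base URL, MBean object name, and attribute are assumptions to verify.
import json
from urllib.parse import quote

def jolokia_read_url(base, mbean, attribute):
    # Build a Jolokia 'read' URL. NOTE: Jolokia has its own '!' escaping
    # rules for '/' and '!' inside object names; this simplified version
    # only percent-encodes, which is enough for names without slashes.
    return "%s/read/%s/%s" % (base, quote(mbean, safe=",=:"), attribute)

def read_value(jolokia_json_text):
    # Extract the attribute value from a Jolokia JSON response body.
    reply = json.loads(jolokia_json_text)
    if reply.get("status") != 200:
        raise RuntimeError("Jolokia error: %s" % reply.get("error"))
    return reply["value"]

# Against a live broker you would fetch the URL, e.g.:
#   urllib.request.urlopen(jolokia_read_url(...)).read()
```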

TIA,
Shobhana

Re: AMQ 5.13.2 : Kaha DB logs cleanup issue

Posted by Tim Bain <tb...@alumni.duke.edu>.
Sorry, my brain's not working.  Definitely try 5.13.3 to see if the fix for
the bug I referenced and you found (yes, that's the one) allows deletion of
the old files.

Re: AMQ 5.13.2 : Kaha DB logs cleanup issue

Posted by Tim Bain <tb...@alumni.duke.edu>.
We try not to break backwards compatibility between versions, but since no
one formally tests cross-version compatibility, it's not guaranteed.   But
between 5.13.2 and 5.13.3, we should not have knowingly made a breaking
change.

Why are you trying 5.13.3?  Is there a bug in the release notes that you
think changes this behavior?  If not, why bother?

Re: AMQ 5.13.2 : Kaha DB logs cleanup issue

Posted by Shobhana <sh...@quickride.in>.
Tim, as you mentioned, I could see one message in many KahaDB log files. I
couldn't get JConsole to work with the broker installed on EC2, and the
Jolokia REST APIs didn't provide these stats either. So I tested by
deleting some older KahaDB log files (after taking a backup) and restarted
the broker with ignoreMissingJournalfiles="true". When the broker started,
I could see an INFO message indicating that "X" messages were dropped from
the index, where "X" was equal to the number of older files I had deleted.
So I think each file contained just one message.

I'll now test by upgrading to version 5.13.3. If I upgrade the broker,
should I upgrade the jars on my client side as well? Or is it optional? I
think I should upgrade.




Re: AMQ 5.13.2 : Kaha DB logs cleanup issue

Posted by Shobhana <sh...@quickride.in>.
When checked using the Jolokia REST APIs, I get the same result: the
enqueued message count on these topics shows 0. So I guess the Jolokia
REST APIs also give only stats since the broker was last started.




Re: AMQ 5.13.2 : Kaha DB logs cleanup issue

Posted by Shobhana <sh...@quickride.in>.
Sure, I'll check when I have some time.

Btw, I tried to use JConsole to view the current status of consumers. I
tried hard to get JConsole to connect to my broker, which runs on an EC2
instance, but couldn't succeed. Can I use the Jolokia REST APIs instead?
Do these APIs give the same result as a JMX viewer (cumulative stats), or
only stats from the time the broker was restarted?

Are you talking about AMQ-6203
<https://issues.apache.org/jira/browse/AMQ-6203>? Currently we use
ActiveMQ 5.13.2 and still see this problem. If the enhancement you are
talking about is AMQ-6203, we will upgrade to 5.13.3, in which it is
implemented.




Re: AMQ 5.13.2 : Kaha DB logs cleanup issue

Posted by Tim Bain <tb...@alumni.duke.edu>.
You're right, your interpretation is consistent with the documented
example, whereas mine is consistent with the text of the log line.  In that
situation, I always assume the documentation is wrong, but that's just an
assumption.  If you or someone else has time to look at the algorithm and
confirm how it actually works, we can update the wiki.  If not, I'll get to
it when I have time.  In the meantime, you should investigate the
conclusions from both algorithms, since you can't be sure which is correct.

I'd be interested to hear what you see via JMX.

Also every durable subscriber on a topic has a subscription message that
can't be GC'ed, so you're never going to get to zero files.  I think I saw
a recent JIRA enhancement to allow those messages to be replayed (into the
newest file) when there are no messages on the topic.  You might want to
search for that enhancement and confirm whether it's in the version you're
using, and upgrade if not.

Tim

Re: AMQ 5.13.2 : Kaha DB logs cleanup issue

Posted by Shobhana <sh...@quickride.in>.
Thanks Tim, I did not know that the stats are reset when the broker is
restarted. I'll check how to use JConsole to view the current status of
consumers.

Yes, all topics have durable subscribers.

I used the logic mentioned in the example given at
http://activemq.apache.org/why-do-kahadb-log-files-remain-after-cleanup.html
to identify the topic that prevents a log file from being GCed. I too was
confused by the example: I expected "dest:0:J" to be holding messages in
#86, but since the example said "dest:0:I", I thought the trace log was
meant for "before" and not "after". Maybe this example can be corrected on
the AMQ page.
However, since I was unsure whether the trace log was wrong or the example
was wrong, I cross-checked for any pending messages in both
riderride.chat.209 and passengerride.user.1234567890 and found that both
were 0. Hence I raised this question.




Re: AMQ 5.13.2 : Kaha DB logs cleanup issue

Posted by Tim Bain <tb...@alumni.duke.edu>.
As far as I know, the stats cover only the time since the broker was last
restarted, so seeing zeros is no guarantee that there are no pending
messages in the persistence store.  I would use a JMX viewer such as
JConsole to look at what the broker thinks is the current status of the
consumer(s) on the topics in question.

Are the subscribers to your topics using durable or non-durable
subscriptions?  I assume durable...

I don't understand the logic that led you to conclude that
riderride.chat.209 is preventing log file #6 from being GC'ed.  As I read
your logs, what I see is that:

   - riderride.chat.200 prevents #2 from being GC'ed
   - riderride.chat.206 prevents #4 from being GC'ed
   - passengerride.user.1234567890 prevents #6 from being GC'ed
   - chat.9206705762 prevents #1 from being GC'ed
   - invitationStatus.1234567890 prevents #5 from being GC'ed

Log #6 is blocked by passengerride.user.1234567890, and riderride.chat.209
doesn't block anything.  What am I missing?
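That per-line diff can be mechanized. Here is a rough, hypothetical Python
sketch (not part of the broker) that walks the checkpoint TRACE output and
reports, for each cleanup step, which journal files that step pinned. It
assumes each logical log line has been unwrapped into a single string:

```python
# Hypothetical sketch: for each "gc candidates after X" TRACE line, the
# step X pins exactly the file numbers that disappeared from the candidate
# set relative to the previous line. Assumes unwrapped logical log lines.
import re

FULL_SET = re.compile(r"full gc candidates set: \[([\d, ]*)\]")
STEP = re.compile(r"gc candidates after (.+?), \[([\d, ]*)\]")

def parse_set(text):
    """'1, 2, 3' -> {1, 2, 3}; '' -> empty set."""
    return {int(n) for n in text.split(",") if n.strip()}

def pinned_files(lines):
    """Map each cleanup step to the sorted list of journal files it pinned."""
    candidates, pins = None, {}
    for line in lines:
        m = FULL_SET.search(line)
        if m:
            candidates = parse_set(m.group(1))  # start of a checkpoint pass
            continue
        m = STEP.search(line)
        if m and candidates is not None:
            remaining = parse_set(m.group(2))
            removed = candidates - remaining    # files this step pinned
            if removed:
                pins[m.group(1)] = sorted(removed)
            candidates = remaining
    return pins
```

Running it over the log in this thread shows, for example, that
dest:1:passengerride.user.1234567890 pins file #6 while
dest:1:riderride.chat.209 pins nothing, matching the list above.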

Tim
