Posted to users@activemq.apache.org by JoeC <jo...@gmail.com> on 2011/07/26 13:25:28 UTC

Re: KahaDB corruption

I'm currently on 5.5.0 and ran into a different, unrecoverable KahaDB
case.
I ran the system out of disk space and, not unreasonably, ActiveMQ didn't like
it.
After freeing up some space I ran into database corruption, as follows:
2011-07-26 10:00:23,316 | INFO  | Corrupt journal records found in
'/opt/ivb/apache-activemq-5.5.0/data/kahadb/db-326.log' between offsets:
19460423-21031378 | org.apache.kahadb.journal.Journal | main
...
2011-07-26 10:00:23,826 | INFO  | Recovering from the journal ... |
org.apache.activemq.store.kahadb.MessageDatabase | main
2011-07-26 10:00:23,953 | ERROR | Failed to start ActiveMQ JMS Message
Broker. Reason: org.apache.activemq.protobuf.InvalidProtocolBufferException:
Protocol message contained an invalid tag (zero). |
org.apache.activemq.broker.BrokerService | main
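
For anyone who wants to pull the affected file and offsets out of a broker log automatically, here is a minimal Python sketch. It assumes the message format is exactly the one quoted above (as logged by 5.5.0); the function name find_corrupt_ranges is made up for illustration and is not part of ActiveMQ.

```python
import re

# Matches the Journal message quoted above, even when a mail client has
# wrapped it across several lines (hence \s+ between the tokens).
CORRUPT_RE = re.compile(
    r"Corrupt journal records found in\s+'(?P<path>[^']+)'\s+"
    r"between offsets:\s+(?P<start>\d+)-(?P<end>\d+)"
)


def find_corrupt_ranges(log_text):
    """Return (journal_file, start_offset, end_offset) for each match."""
    return [
        (m.group("path"), int(m.group("start")), int(m.group("end")))
        for m in CORRUPT_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    sample = (
        "2011-07-26 10:00:23,316 | INFO  | Corrupt journal records found in\n"
        "'/opt/ivb/apache-activemq-5.5.0/data/kahadb/db-326.log' between offsets:\n"
        "19460423-21031378 | org.apache.kahadb.journal.Journal | main"
    )
    for path, start, end in find_corrupt_ranges(sample):
        print(f"{path}: {end - start} bytes flagged as corrupt")
```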

Removing db.data made no difference.
I then removed the db-326.log file and restarted twice.
The first time it complained about not finding db-326.log.
The second time it used a newly created db-1.log.

Fortunately this was not a production environment, so the data doesn't
matter. However, I would like a way of recovering the data. This could even be
an offline process, i.e. I quickly reset the database to restore service and
then push in the older messages later.
My application domain is somewhat tolerant of that approach, but it is not
tolerant of extended outages.
I'd rather (temporarily) lose some data than have a long outage, so a
fully automated recovery is what I'd ideally like, irrespective of
corruption.

Cheers
Joe

JoeC wrote:
> 
> I've upgraded to 5.4.2 and will let you know how it goes.
> I didn't rebuild the index as I've already restarted the process.
> In normal operation the queues should be empty for our application so
> that was not an issue for me.
> 
> Thanks
> Joe
> 
> On 23 February 2011 18:06, Gary Tully <gary.tully@gmail.com> wrote:
>> 5.4.2 is better w.r.t. abortive shutdown, but for this case, rebuilding
>> the index should work.
>> Remove kahadb/db.data and restart; it will parse the journal to
>> rebuild the index.
>>
> 

--
View this message in context: http://activemq.2283324.n4.nabble.com/KahaDB-corruption-tp3321382p3695392.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.

Re: KahaDB corruption

Posted by Gary Tully <ga...@gmail.com>.

If the filesystem is corrupted, there is not much one can do.
ignoreMissingJournalfiles should really be called ignoreCorruptJournalRecords.

A journal record is the unit of data written to the journal in one
sequential write. If a record cannot be read (the values read don't
match their checksum), it can be ignored when
ignoreMissingJournalfiles=true, and recovery will continue with some
missing messages.

Running with ignoreMissingJournalfiles=true means that you will only
lose a subset of messages: the ones that fall into corrupt records.
So there is no need to remove the entire data file.

There are some tests that spit random data into journal files and
validate recovery, but we could always do with more of these for
specific scenarios.
see: org.apache.activemq.store.kahadb.KahaDBTest

With sync send or a transacted producer/consumer, and fsync support from
the underlying filesystem, persistence is guaranteed.
When there is a failed read/write in the index, we can recreate the index
from the journal. When there is something wrong in the journal, we are
into the realm of missing messages, and we try to reduce the scope
with the journal record checksum.
Reducing the journal write batch size could ensure that a journal
record has a minimum of messages in it, but this is a trade-off
between failure recovery and throughput. In essence, AMQ delegates to
the file system for reliable storage, so the expectation
is that what is written can be read.
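
To illustrate the idea of limiting loss to individual records, here is a toy Python sketch of a checksummed journal. It is not KahaDB's on-disk format: the 8-byte header layout, the use of Adler-32, and the names write_record and recover are all assumptions made for the example. It also assumes the length field itself survives the corruption.

```python
import struct
import zlib

# Hypothetical record header: payload length, then checksum of the payload.
HEADER = struct.Struct(">II")


def write_record(payload: bytes) -> bytes:
    """Frame one payload as header + payload."""
    return HEADER.pack(len(payload), zlib.adler32(payload)) + payload


def recover(journal: bytes):
    """Return (good_payloads, skipped_count), skipping records whose
    stored checksum does not match the payload that was read back."""
    good, skipped, pos = [], 0, 0
    while pos + HEADER.size <= len(journal):
        length, checksum = HEADER.unpack_from(journal, pos)
        start = pos + HEADER.size
        payload = journal[start:start + length]
        if len(payload) < length:
            skipped += 1  # truncated final record
            break
        if zlib.adler32(payload) == checksum:
            good.append(payload)
        else:
            skipped += 1  # corrupt record: drop it, keep going
        pos = start + length
    return good, skipped


if __name__ == "__main__":
    journal = bytearray(b"".join(write_record(m) for m in (b"msg-1", b"msg-2", b"msg-3")))
    journal[21] ^= 0xFF  # damage the first payload byte of the second record
    print(recover(bytes(journal)))
```

The smaller each record, the fewer messages go missing when one checksum fails, which is the recovery side of the batch-size trade-off described above.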

It would be interesting to understand more detail about the particular
failure you are experiencing to see if we can do better in that case.

Ideally we can try and replicate in a unit test and investigate a way
to improve. Patches are always welcome.

On 11 June 2013 19:21, pollotek <cl...@gmail.com> wrote:
> So your proposed fix is to remove the corrupted log file and restart the
> brokers?

-- 
http://redhat.com
http://blog.garytully.com

Re: KahaDB corruption

Posted by pollotek <cl...@gmail.com>.

So your proposed fix is to remove the corrupted log file and restart the
brokers?

I would lose the messages in those files if I did that. These files contain
messages from different queues that are handled by the same broker (I
wouldn't build a new broker master/slave pair per queue type). Message
ordering would also be lost, and it would be next to impossible for my app
to identify and re-create the messages that were lost and re-inject them
into the queue. Even the effort of writing such logic would be
absolutely not cost-efficient.

I don't think your solution is something I'm comfortable with at all. If I
was ok with losing messages, I'd rather make my broker non-persistent and
forget about this whole issue.



--
View this message in context: http://activemq.2283324.n4.nabble.com/KahaDB-corruption-tp3321382p4668100.html

Re: KahaDB corruption

Posted by victorhdamian <vi...@gmail.com>.

Try this:
ActiveMQ v5.5.1 Corrupt data log found recovery:

Symptom:
The ActiveMQ slave process died and will not restart.

Root Cause:
Corrupt data log found

Root Cause verification:
Search the affected ActiveMQ log file for the following entries in sequence:
Corrupt journal records found
Failed to discard data file
Failed to start ActiveMQ JMS Message Broker
shutting down
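
The check above can be sketched in Python as follows. This is only an illustration: the marker strings are copied from the list above, the helper name markers_in_sequence is made up, and whitespace is normalised so that wrapped log lines still match.

```python
MARKERS = [
    "Corrupt journal records found",
    "Failed to discard data file",
    "Failed to start ActiveMQ JMS Message Broker",
    "shutting down",
]


def markers_in_sequence(log_text, markers=MARKERS):
    """True if every marker occurs in log_text, each after the previous one."""
    text = " ".join(log_text.split())  # collapse newlines and repeated spaces
    pos = 0
    for marker in markers:
        idx = text.find(marker, pos)
        if idx == -1:
            return False
        pos = idx + len(marker)
    return True
```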

Recovery:
Shut down the ActiveMQ master instance.
Rename the KahaDB storage file.
Restart the ActiveMQ master and slave instances.

Note: the journal data affected by the corruption will be lost. The affected
journal data will need to be identified and resent to the appropriate
ActiveMQ queue.
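
A rough Python sketch of the rename step, assuming "storage file" means the whole KahaDB data directory and that the broker is already stopped. The helper name set_aside and the ".corrupt-<timestamp>" suffix are invented for the example; keeping the old directory rather than deleting it leaves the journal available for later inspection.

```python
import time
from pathlib import Path


def set_aside(store_dir, suffix=None):
    """Rename the store directory out of the way and return the new path.

    Run this only while the broker is stopped.
    """
    store = Path(store_dir)
    suffix = suffix or time.strftime("%Y%m%d-%H%M%S")
    backup = store.with_name(f"{store.name}.corrupt-{suffix}")
    if backup.exists():
        raise FileExistsError(backup)
    store.rename(backup)
    return backup
```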



--
View this message in context: http://activemq.2283324.n4.nabble.com/KahaDB-corruption-tp3321382p4668099.html

Re: KahaDB corruption

Posted by pollotek <cl...@gmail.com>.

I have the same issue with ActiveMQ 5.6.0.

I started to get the following messages in the log after bouncing the ActiveMQ
standalone broker:

2012-09-20 20:09:58,748 | ERROR |
org.apache.activemq.broker.region.cursors.QueueStorePrefetch@18287811:com.xxxx.queue.timedData,batchResetNeeded=false,storeHasMessages=true,size=22772,cacheEnabled=false
- Failed to fill batch |
org.apache.activemq.broker.region.cursors.AbstractStoreCursor |
Queue:com.xxxxt.queue.timedData
java.lang.RuntimeException:
org.apache.activemq.protobuf.InvalidProtocolBufferException: Protocol
message contained an invalid tag (zero).
	at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.fillBatch(AbstractStoreCursor.java:280)
	at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.reset(AbstractStoreCursor.java:113)
	at org.apache.activemq.broker.region.cursors.StoreQueueCursor.reset(StoreQueueCursor.java:157)
	at org.apache.activemq.broker.region.Queue.doPageInForDispatch(Queue.java:1766)
	at org.apache.activemq.broker.region.Queue.pageInMessages(Queue.java:1995)
	at org.apache.activemq.broker.region.Queue.iterate(Queue.java:1488)
	at org.apache.activemq.thread.DedicatedTaskRunner.runTask(DedicatedTaskRunner.java:98)
	at org.apache.activemq.thread.DedicatedTaskRunner$1.run(DedicatedTaskRunner.java:36)
Caused by: org.apache.activemq.protobuf.InvalidProtocolBufferException:
Protocol message contained an invalid tag (zero).
	at org.apache.activemq.protobuf.InvalidProtocolBufferException.invalidTag(InvalidProtocolBufferException.java:48)
	at org.apache.activemq.protobuf.CodedInputStream.readTag(CodedInputStream.java:75)
	at org.apache.activemq.store.kahadb.data.KahaAddMessageCommand.mergeUnframed(KahaAddMessageCommand.java:110)
	at org.apache.activemq.store.kahadb.data.KahaAddMessageCommand.mergeUnframed(KahaAddMessageCommand.java:7)
	at org.apache.activemq.protobuf.BaseMessage.mergeUnframed(BaseMessage.java:184)
	at org.apache.activemq.protobuf.BaseMessage.mergeUnframed(BaseMessage.java:213)
	at org.apache.activemq.protobuf.BaseMessage.mergeFramed(BaseMessage.java:237)
	at org.apache.activemq.store.kahadb.MessageDatabase.load(MessageDatabase.java:938)
	at org.apache.activemq.store.kahadb.KahaDBStore.loadMessage(KahaDBStore.java:1015)
	at org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore$4.execute(KahaDBStore.java:556)
	at org.apache.kahadb.page.Transaction.execute(Transaction.java:769)
	at org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore.recoverNextMessages(KahaDBStore.java:545)
	at org.apache.activemq.store.ProxyMessageStore.recoverNextMessages(ProxyMessageStore.java:106)
	at org.apache.activemq.broker.region.cursors.QueueStorePrefetch.doFillBatch(QueueStorePrefetch.java:97)
	at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.fillBatch(AbstractStoreCursor.java:277)
	... 7 more
2012-09-20 20:09:58,748 | ERROR | Failed to page in more queue messages  |
org.apache.activemq.broker.region.Queue | Queue:com.xxxxt.queue.timedData
java.lang.RuntimeException: java.lang.RuntimeException:
org.apache.activemq.protobuf.InvalidProtocolBufferException: Protocol
message contained an invalid tag (zero).
	at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.reset(AbstractStoreCursor.java:116)
	at org.apache.activemq.broker.region.cursors.StoreQueueCursor.reset(StoreQueueCursor.java:157)
	at org.apache.activemq.broker.region.Queue.doPageInForDispatch(Queue.java:1766)
	at org.apache.activemq.broker.region.Queue.pageInMessages(Queue.java:1995)
	at org.apache.activemq.broker.region.Queue.iterate(Queue.java:1488)
	at org.apache.activemq.thread.DedicatedTaskRunner.runTask(DedicatedTaskRunner.java:98)
	at org.apache.activemq.thread.DedicatedTaskRunner$1.run(DedicatedTaskRunner.java:36)
Caused by: java.lang.RuntimeException:
org.apache.activemq.protobuf.InvalidProtocolBufferException: Protocol
message contained an invalid tag (zero).
	at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.fillBatch(AbstractStoreCursor.java:280)
	at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.reset(AbstractStoreCursor.java:113)
	... 6 more
Caused by: org.apache.activemq.protobuf.InvalidProtocolBufferException:
Protocol message contained an invalid tag (zero).
	at org.apache.activemq.protobuf.InvalidProtocolBufferException.invalidTag(InvalidProtocolBufferException.java:48)
	at org.apache.activemq.protobuf.CodedInputStream.readTag(CodedInputStream.java:75)
	at org.apache.activemq.store.kahadb.data.KahaAddMessageCommand.mergeUnframed(KahaAddMessageCommand.java:110)
	at org.apache.activemq.store.kahadb.data.KahaAddMessageCommand.mergeUnframed(KahaAddMessageCommand.java:7)
	at org.apache.activemq.protobuf.BaseMessage.mergeUnframed(BaseMessage.java:184)
	at org.apache.activemq.protobuf.BaseMessage.mergeUnframed(BaseMessage.java:213)
	at org.apache.activemq.protobuf.BaseMessage.mergeFramed(BaseMessage.java:237)
	at org.apache.activemq.store.kahadb.MessageDatabase.load(MessageDatabase.java:938)
	at org.apache.activemq.store.kahadb.KahaDBStore.loadMessage(KahaDBStore.java:1015)
	at org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore$4.execute(KahaDBStore.java:556)
	at org.apache.kahadb.page.Transaction.execute(Transaction.java:769)
	at org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore.recoverNextMessages(KahaDBStore.java:545)
	at org.apache.activemq.store.ProxyMessageStore.recoverNextMessages(ProxyMessageStore.java:106)
	at org.apache.activemq.broker.region.cursors.QueueStorePrefetch.doFillBatch(QueueStorePrefetch.java:97)
	at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.fillBatch(AbstractStoreCursor.java:277)
	... 7 more

This is my kahadb configuration:

        <persistenceAdapter>
            <kahaDB directory="/var/gluster/activemq/data/kahadb"
                    ignoreMissingJournalfiles="true"
                    checkForCorruptJournalFiles="true"
                    checksumJournalFiles="true"/>
        </persistenceAdapter>





--
View this message in context: http://activemq.2283324.n4.nabble.com/KahaDB-corruption-tp3321382p4656773.html

Re: KahaDB corruption

Posted by Joe Carter <jo...@gmail.com>.

Thanks Gary - I'd moved from embedded to a separate broker and had
missed this completely.
For reference I added the following to conf/activemq.xml
                    ignoreMissingJournalfiles="true"
                    checkForCorruptJournalFiles="true"
                    checksumJournalFiles="true" />
to give
        <persistenceAdapter>
            <kahaDB directory="${activemq.base}/data/kahadb"
                    ignoreMissingJournalfiles="true"
                    checkForCorruptJournalFiles="true"
                    checksumJournalFiles="true" />
        </persistenceAdapter>
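
Since these attributes are easy to lose when moving from one broker configuration to another, a small Python check along these lines might help. It is a sketch only: the helper name missing_recovery_flags is made up, and it simply looks for the first kahaDB element (ignoring any XML namespace) and reports which of the three flags are not set to "true".

```python
import xml.etree.ElementTree as ET

FLAGS = (
    "ignoreMissingJournalfiles",
    "checkForCorruptJournalFiles",
    "checksumJournalFiles",
)


def missing_recovery_flags(xml_text):
    """Return the recovery flags not set to "true" on the first <kahaDB>."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Strip any namespace prefix: '{ns}kahaDB' -> 'kahaDB'
        if elem.tag.rsplit("}", 1)[-1] == "kahaDB":
            return [flag for flag in FLAGS if elem.get(flag) != "true"]
    raise ValueError("no <kahaDB> element found")


if __name__ == "__main__":
    snippet = """
    <persistenceAdapter>
        <kahaDB directory="${activemq.base}/data/kahadb"
                ignoreMissingJournalfiles="true"
                checkForCorruptJournalFiles="true"
                checksumJournalFiles="true" />
    </persistenceAdapter>
    """
    print(missing_recovery_flags(snippet))
```

For a real conf/activemq.xml, read the file contents and pass them to the function in place of the snippet.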

I pushed the corrupted database back in place and it recovered correctly.
2011-07-27 09:31:19,644 | INFO  | Recovery replayed 2 operations from
the journal in 0.036 seconds. |
org.apache.activemq.store.kahadb.MessageDatabase | main
2011-07-27 09:31:19,649 | INFO  | Detected missing/corrupt journal
files.  Dropped 5 messages from the index in 0.0030 seconds. |
org.apache.activemq.store.kahadb.MessageDatabase | main

So I can confirm the recovery code is working for me.

Thanks.
Joe

On 26 July 2011 12:59, Gary Tully <ga...@gmail.com> wrote:
> The flags checksumJournalFiles, checkForCorruptJournalFiles and
> ignoreMissingJournalfiles are designed for this use case. Do you have
> those enabled?
>
> http://activemq.apache.org/kahadb.html

Re: KahaDB corruption

Posted by Gary Tully <ga...@gmail.com>.

The flags checksumJournalFiles, checkForCorruptJournalFiles and
ignoreMissingJournalfiles are designed for this use case. Do you have
those enabled?

http://activemq.apache.org/kahadb.html

-- 
http://fusesource.com
http://blog.garytully.com