Posted to users@activemq.apache.org by victorhdamian <vi...@gmail.com> on 2013/06/11 19:24:34 UTC

Re: KahaDB corruption

Try this:
ActiveMQ v5.5.1 "Corrupt data log found" recovery:

Symptom:
The ActiveMQ slave process died and will not restart.

Root Cause:
Corrupt data log found

Root Cause verification:
Search the affected ActiveMQ log file for the following entries, in
sequence (a small scanning sketch follows the list):
Corrupt journal records found
Failed to discard data file
Failed to start ActiveMQ JMS Message Broker
shutting down
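
A minimal sketch of that check in Java (the log path "data/activemq.log"
is an assumption; adjust it to your installation):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class VerifyCorruptionSignature {
    public static void main(String[] args) throws IOException {
        // The four log entries, in the order they should appear.
        List<String> markers = Arrays.asList(
                "Corrupt journal records found",
                "Failed to discard data file",
                "Failed to start ActiveMQ JMS Message Broker",
                "shutting down");
        int next = 0;
        for (String line : Files.readAllLines(Paths.get("data/activemq.log"))) {
            if (next < markers.size() && line.contains(markers.get(next))) {
                next++; // matched this marker, now look for the following one
            }
        }
        System.out.println(next == markers.size()
                ? "Corruption signature found in sequence"
                : "Matched only " + next + " of " + markers.size() + " markers");
    }
}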

Recovery:
Shut down the ActiveMQ master instance.
Rename the KahaDB storage directory.
Restart the ActiveMQ master and slave instances.

Note: the journal data affected by the corruption will be lost. The
affected journal data will need to be identified and resent to the
appropriate ActiveMQ queue.
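
A minimal sketch of the rename step (Java NIO; the default store
location "data/kahadb" is an assumption):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class MoveKahaDbAside {
    public static void main(String[] args) throws IOException {
        Path store = Paths.get("data/kahadb");
        Path backup = Paths.get("data/kahadb.corrupt-" + System.currentTimeMillis());
        // Rename rather than delete: the old journal is the only copy of
        // the affected messages and is needed to identify what was lost.
        Files.move(store, backup);
        System.out.println("Moved " + store + " -> " + backup);
    }
}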




Re: KahaDB corruption

Posted by Gary Tully <ga...@gmail.com>.
If the filesystem is corrupted, there is not much one can do.
ignoreMissingJournalfiles should really be called ignoreCorruptJournalRecords.

A journal record is the unit of data written to the journal in one
sequential write. If a unit cannot be read (the read values don't
match their checksum) it can be ignored when
ignoreMissingJournalfiles=true, and recovery will continue with some
missing messages.

Running with ignoreMissingJournalfiles=true means that you will only
lose a subset of messages, the ones that fall into corrupt records.
So there is no need to remove the entire data file.
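
For reference, a minimal embedded-broker sketch of that setting (Java
API; for a standalone broker the same attributes go on the
<kahaDB .../> element in activemq.xml):

import java.io.File;

import org.apache.activemq.broker.BrokerService;
import org.apache.activemq.store.kahadb.KahaDBPersistenceAdapter;

public class TolerantBrokerSketch {
    public static void main(String[] args) throws Exception {
        BrokerService broker = new BrokerService();
        KahaDBPersistenceAdapter kaha = new KahaDBPersistenceAdapter();
        kaha.setDirectory(new File("data/kahadb"));
        // Despite the name, this also lets recovery skip records whose
        // checksum does not match, continuing with some missing messages.
        kaha.setIgnoreMissingJournalfiles(true);
        // Scan for and checksum-verify journal records on start-up.
        kaha.setCheckForCorruptJournalFiles(true);
        kaha.setChecksumJournalFiles(true);
        broker.setPersistenceAdapter(kaha);
        broker.start();
        broker.waitUntilStarted();
    }
}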

There are some tests that spit random data into journal files and
validate recovery, but we could always do with more of these for
specific scenarios.
See: org.apache.activemq.store.kahadb.KahaDBTest
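
A rough sketch of such a test (the journal file name "db-1.log" and
the elided message production are assumptions):

import java.io.File;
import java.io.RandomAccessFile;
import java.util.Random;

import org.apache.activemq.broker.BrokerService;
import org.apache.activemq.store.kahadb.KahaDBPersistenceAdapter;

public class CorruptJournalRecoverySketch {
    public static void main(String[] args) throws Exception {
        File dataDir = new File("target/kahadb-test");

        // 1. Start a broker, send some persistent messages (elided), stop cleanly.
        BrokerService broker = createBroker(dataDir, false);
        broker.start();
        broker.stop();
        broker.waitUntilStopped();

        // 2. Spit random bytes into the middle of a journal file so some
        //    record checksums no longer match.
        try (RandomAccessFile raf = new RandomAccessFile(new File(dataDir, "db-1.log"), "rw")) {
            byte[] garbage = new byte[128];
            new Random().nextBytes(garbage);
            raf.seek(raf.length() / 2);
            raf.write(garbage);
        }

        // 3. Restart with ignoreMissingJournalfiles=true and verify the broker
        //    comes up, minus the messages in the corrupt records.
        BrokerService recovered = createBroker(dataDir, true);
        recovered.start();
        recovered.stop();
        recovered.waitUntilStopped();
    }

    static BrokerService createBroker(File dataDir, boolean ignoreCorrupt) throws Exception {
        BrokerService broker = new BrokerService();
        broker.setUseJmx(false);
        KahaDBPersistenceAdapter kaha = new KahaDBPersistenceAdapter();
        kaha.setDirectory(dataDir);
        kaha.setIgnoreMissingJournalfiles(ignoreCorrupt);
        kaha.setCheckForCorruptJournalFiles(true);
        broker.setPersistenceAdapter(kaha);
        return broker;
    }
}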

With sync send, or a transacted producer/consumer, and fsync support in
the underlying filesystem, persistence is guaranteed.
When there is a failed read/write in the index we can recreate the index
from the journal. When there is something wrong in the journal we are
into the realm of missing messages, and we try to reduce the scope
with the journal record checksum.
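
A minimal sketch of such a producer (broker URL and queue name are
placeholders):

import javax.jms.Connection;
import javax.jms.DeliveryMode;
import javax.jms.MessageProducer;
import javax.jms.Session;

import org.apache.activemq.ActiveMQConnectionFactory;

public class GuaranteedSendSketch {
    public static void main(String[] args) throws Exception {
        ActiveMQConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://localhost:61616");
        factory.setUseAsyncSend(false); // block until the broker has the message

        Connection connection = factory.createConnection();
        connection.start();

        // Transacted session: commit() returns only after the broker has
        // written (and, with disk syncs enabled, fsynced) the journal batch.
        Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
        MessageProducer producer = session.createProducer(session.createQueue("TEST.QUEUE"));
        producer.setDeliveryMode(DeliveryMode.PERSISTENT);

        producer.send(session.createTextMessage("payload"));
        session.commit();

        connection.close();
    }
}
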
Reducing the journal write batch size can ensure that a journal
record contains a minimum of messages, but this is a trade-off
between failure recovery and throughput. In essence, AMQ delegates
reliable storage to the file system, so the expectation
is that what is written can be read.
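
A rough sketch of that trade-off in the embedded-broker API (the 4 KB
figure is purely illustrative, not a recommendation):

import org.apache.activemq.broker.BrokerService;
import org.apache.activemq.store.kahadb.KahaDBPersistenceAdapter;

public class SmallBatchSketch {
    public static void main(String[] args) throws Exception {
        BrokerService broker = new BrokerService();
        KahaDBPersistenceAdapter kaha = new KahaDBPersistenceAdapter();
        // Smaller batches mean fewer messages lost per corrupt record,
        // at the cost of more sequential writes (lower throughput).
        kaha.setJournalMaxWriteBatchSize(4 * 1024);
        kaha.setEnableJournalDiskSyncs(true); // fsync each batch
        broker.setPersistenceAdapter(kaha);
        broker.start();
        broker.waitUntilStarted();
    }
}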

It would be interesting to understand more detail about the particular
failure you are experiencing to see if we can do better in that case.

Ideally we can try and replicate in a unit test and investigate a way
to improve. Patches are always welcome.


On 11 June 2013 19:21, pollotek <cl...@gmail.com> wrote:
> So your proposed fix is to remove the corrupted log file and restart the
> brokers?
>
> I would lose the messages in those files if I did that. These files contain
> messages from different queues that are handled on the same broker (I
> wouldn't build a new broker master/slave pair per queue type). Message
> ordering would also be lost, and it would be next to impossible for my app
> to identify and re-create the messages that were lost and re-inject them
> into the queue. And even the effort of writing such logic would not be
> cost-efficient at all.
>
> I don't think your solution is something I'm comfortable with at all. If I
> was ok with losing messages, I'd rather make my broker non-persistent and
> forget about this whole issue.



-- 
http://redhat.com
http://blog.garytully.com

Re: KahaDB corruption

Posted by pollotek <cl...@gmail.com>.
So your proposed fix is to remove the corrupted log file and restart the
brokers? 

I would lose the messages in those files if I did that. These files contain
messages from different queues that are handled on the same broker (I
wouldn't build a new broker master/slave pair per queue type). Message
ordering would also be lost, and it would be next to impossible for my app
to identify and re-create the messages that were lost and re-inject them
into the queue. And even the effort of writing such logic would not be
cost-efficient at all.

I don't think your solution is something I'm comfortable with at all. If I
was ok with losing messages, I'd rather make my broker non-persistent and
forget about this whole issue.


