Posted to users@activemq.apache.org by pwalker <pw...@navinet.net> on 2013/12/09 17:14:40 UTC

Any way to protect from corruption being replicated in LevelDB?

Hey,

Quite new to ActiveMQ. We noticed an issue with replicated LevelDB where a
corrupted LevelDB store was being replicated between machines.

I had hoped that I would be able to use the verifyChecksums or paranoidChecks
configuration parameters to help identify this scenario. Am I
missing something here?

To test this, we did the following in our cluster of 3 brokers:

Cleared the LevelDB directory on 2 brokers, leaving them clean.
Left the corrupt DB on 1 broker (corrupt in that it causes the exception
below when trying to expire messages).

I set both config parameters above to true, but the broker still fails over
with an IOException, as below:

INFO   | jvm 1    | 2013/12/06 05:12:20 |  INFO | Stopping BrokerService[localhost] due to exception, java.io.IOException
INFO   | jvm 1    | 2013/12/06 05:12:20 | java.io.IOException
INFO   | jvm 1    | 2013/12/06 05:12:20 |       at org.apache.activemq.util.IOExceptionSupport.create(IOExceptionSupport.java:39)
INFO   | jvm 1    | 2013/12/06 05:12:20 |       at org.apache.activemq.leveldb.LevelDBClient.might_fail(LevelDBClient.scala:543)
INFO   | jvm 1    | 2013/12/06 05:12:20 |       at org.apache.activemq.leveldb.LevelDBClient.might_fail_using_index(LevelDBClient.scala:974)
INFO   | jvm 1    | 2013/12/06 05:12:20 |       at org.apache.activemq.leveldb.LevelDBClient.collectionCursor(LevelDBClient.scala:1270)
INFO   | jvm 1    | 2013/12/06 05:12:20 |       at org.apache.activemq.leveldb.LevelDBClient.queueCursor(LevelDBClient.scala:1194)
INFO   | jvm 1    | 2013/12/06 05:12:20 |       at org.apache.activemq.leveldb.DBManager.cursorMessages(DBManager.scala:708)
INFO   | jvm 1    | 2013/12/06 05:12:20 |       at org.apache.activemq.leveldb.LevelDBStore$LevelDBMessageStore.recoverNextMessages(LevelDBStore.scala:741)

This happens on both of the cleared brokers as well, since the corrupted DB
has been replicated over to them.

From a quick look, it doesn't seem like ReadOptions are used for this
function, so no verifyChecksums flag is passed in, right?

I was hoping that, on initialization of the masterLevelDBClient, we would be
able to validate the datastore and fail over if it was invalid.

Is that the expected behavior of those parameters, or are they completely
distinct from the replication process, since they are used in the
non-replicated LevelDB adapter as well?



--
View this message in context: http://activemq.2283324.n4.nabble.com/Any-way-to-protect-from-corruption-being-replicated-in-LevelDB-tp4675294.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.

Re: Any way to protect from corruption being replicated in LevelDB?

Posted by Hiram Chirino <hi...@hiramchirino.com>.
On Mon, Dec 9, 2013 at 11:14 AM, pwalker <pw...@navinet.net> wrote:
> Having a quick look it doesn't seem like ReadOptions are used for this
> function so no verifyChecksums flag is passed in right?
>

Yep. Perhaps we should.

> Was hoping that on initialization of the masterLevelDBClient we would be
> able to validate the datastore at that point and if it was invalid fall
> over?

It might be impractical to check for all corruption at startup, since
a full check could significantly delay the startup process.

> Is that the expected behavior of those parameters or is it completely
> distinct from the replication process as they are used in the non-replicated
> leveldb adapter as well?

The replication bits added to LevelDB replicate files at the block
level and don't really check the integrity of the files they're
transferring.
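
One way to picture that gap: the sketch below (plain Java, standard library only) assumes a hypothetical block format in which each block carries the CRC32 computed when it was first written, and has the replica recompute that checksum before accepting the block instead of copying it blindly. The names (Block, Replica, CorruptBlockException) are illustrative only and are not part of ActiveMQ's LevelDB replication code. LevelDB's own table files carry per-block checksums, which is roughly what verifyChecksums checks on reads; this sketch does not use that format.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

public class VerifiedBlockReplication {

    /** Hypothetical block: payload plus the checksum recorded when the block was first written. */
    static final class Block {
        final long index;
        final byte[] payload;
        final long storedCrc;

        Block(long index, byte[] payload, long storedCrc) {
            this.index = index;
            this.payload = payload;
            this.storedCrc = storedCrc;
        }
    }

    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /** Writer side: the checksum is computed once, at write time, and stored with the data. */
    static Block write(long index, byte[] payload) {
        byte[] copy = payload.clone();
        return new Block(index, copy, crc32(copy));
    }

    /** Raised when an incoming block no longer matches its recorded checksum. */
    static final class CorruptBlockException extends Exception {
        CorruptBlockException(long index) {
            super("block " + index + " failed checksum verification");
        }
    }

    /** Replica side: verify each block before accepting it instead of copying bytes blindly. */
    static final class Replica {
        final List<Block> accepted = new ArrayList<>();

        void apply(Block b) throws CorruptBlockException {
            if (crc32(b.payload) != b.storedCrc) {
                throw new CorruptBlockException(b.index);
            }
            accepted.add(b);
        }
    }

    public static void main(String[] args) {
        Block good = write(0, "message-1".getBytes(StandardCharsets.UTF_8));
        Block bad = write(1, "message-2".getBytes(StandardCharsets.UTF_8));
        bad.payload[0] ^= 0x01; // simulate bit rot on the master after the block was written

        Replica replica = new Replica();
        for (Block b : List.of(good, bad)) {
            try {
                replica.apply(b);
                System.out.println("accepted block " + b.index);
            } catch (CorruptBlockException e) {
                System.out.println("rejected: " + e.getMessage());
            }
        }
    }
}
```

Running main accepts block 0 and rejects block 1. A check like this only catches bytes that no longer match their recorded checksum; if the master wrote logically bad data with a valid checksum, that data would still be replicated. Running the same check over every block at startup is also why a full validation costs time proportional to the size of the store, which is the startup-delay concern raised earlier in this reply.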

Perhaps if we do detect a consistency problem with a master, we should
just stop replication and mark its data files as inconsistent
so that it does not try to become a master in the future. That would
still require that one of the slaves' data files be consistent in order
to recover from the failure.
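
That proposal can be sketched as an election rule that skips any node whose data has been flagged inconsistent. This is a minimal illustration assuming a hypothetical Node type with an in-memory consistent flag and a lastAppliedPosition counter; a real implementation would need to persist the flag and hook into ActiveMQ's actual master election, none of which is shown here.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class ConsistencyAwareElection {

    /** Hypothetical view of one broker's store as seen by the election logic. */
    static final class Node {
        final String name;
        final long lastAppliedPosition; // how far this node's replicated log has progressed
        boolean consistent = true;      // a real store would have to persist this flag on disk

        Node(String name, long lastAppliedPosition) {
            this.name = name;
            this.lastAppliedPosition = lastAppliedPosition;
        }

        /** Called when a consistency check fails: this node must never be elected master again. */
        void markInconsistent() {
            consistent = false;
        }
    }

    /**
     * Choose the most up-to-date node whose data is still marked consistent.
     * An empty result means no clean copy survives and recovery needs manual intervention.
     */
    static Optional<Node> electMaster(List<Node> nodes) {
        return nodes.stream()
                .filter(n -> n.consistent)
                .max(Comparator.comparingLong((Node n) -> n.lastAppliedPosition));
    }

    public static void main(String[] args) {
        Node a = new Node("broker-a", 120);
        Node b = new Node("broker-b", 118);
        Node c = new Node("broker-c", 115);
        List<Node> cluster = List.of(a, b, c);

        // The current master detects corruption, stops replicating, and flags itself.
        a.markInconsistent();
        System.out.println(electMaster(cluster).map(n -> n.name).orElse("none"));

        // If the slaves had already received the corrupt blocks, they would be flagged too.
        b.markInconsistent();
        c.markInconsistent();
        System.out.println(electMaster(cluster).map(n -> n.name).orElse("none"));
    }
}
```

With broker-a flagged, main elects broker-b; once every node is flagged it prints none, which corresponds to the situation in the original report, where the corruption had already reached the slaves and no clean copy remained to recover from.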



-- 
Hiram Chirino
Engineering | Red Hat, Inc.
hchirino@redhat.com | fusesource.com | redhat.com
skype: hiramchirino | twitter: @hiramchirino