Posted to issues@activemq.apache.org by "Jason Gantner (Jira)" <ji...@apache.org> on 2021/04/14 09:48:00 UTC

[jira] [Commented] (AMQ-5540) KahaDB can't fail over to the slave if the master is unable to write to disk

    [ https://issues.apache.org/jira/browse/AMQ-5540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17320860#comment-17320860 ] 

Jason Gantner commented on AMQ-5540:
------------------------------------

This issue is still present as of version 5.15.13 with the same behaviour.
A single I/O failure triggers a shutdown, but the shutdown routine deadlocks (or otherwise never finishes) because KahaDB is missing a PageFile as a result of the earlier I/O error.
We end up with a "frozen" master that still holds the lock on the DB and a slave waiting for that lock to be released.
A manual `activemq restart` solves the problem, but we lose the quick reaction time offered by the HA mode.
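To make the failure mode concrete, here is a minimal, hypothetical sketch of a shutdown that cannot strand the lock. It assumes the store lock is a plain `java.nio` `FileLock` on a lock file. `BoundedShutdown` and `shutdown` are illustrative names, not ActiveMQ's locker or KahaDB API. The store close runs on a separate thread with a time limit. Whether that close succeeds, throws (e.g. "Input/output error"), or hangs like the deadlock above, the lock's channel is closed afterwards, which releases the lock so a waiting slave can acquire it.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch, not ActiveMQ code: bound the store close, always free the lock. */
public class BoundedShutdown {

    /**
     * Runs closeStore for at most timeoutMs, then releases the lock no matter what.
     * Returns true only if the close finished without error inside the time limit.
     */
    static boolean shutdown(Callable<Void> closeStore, FileLock lock, long timeoutMs) throws IOException {
        ExecutorService exec = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "store-close");
            t.setDaemon(true); // a hung close must not keep the JVM alive
            return t;
        });
        try {
            Future<Void> pending = exec.submit(closeStore);
            pending.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException | ExecutionException e) {
            return false; // close hung or failed (e.g. an I/O error)
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            exec.shutdownNow(); // interrupt the close thread if it is still running
            // Closing the channel releases every lock acquired through it.
            lock.channel().close();
        }
    }

    public static void main(String[] args) throws Exception {
        Path lockFile = Files.createTempFile("store", ".lock");
        FileChannel master = FileChannel.open(lockFile, StandardOpenOption.WRITE);
        FileLock lock = master.lock();

        // Simulate a store close that never finishes, as in the reported deadlock.
        Callable<Void> hangingClose = () -> { new CountDownLatch(1).await(); return null; };
        boolean clean = shutdown(hangingClose, lock, 200);
        System.out.println("clean close: " + clean);

        // A standby (here a second channel in the same JVM) can now take the lock.
        try (FileChannel standby = FileChannel.open(lockFile, StandardOpenOption.WRITE)) {
            FileLock taken = standby.tryLock();
            System.out.println("standby acquired lock: " + (taken != null));
        }
    }
}
```

Running `main` should print `clean close: false` followed by `standby acquired lock: true`. The timeout only bounds the close attempt. On a shared NFS store, file-lock semantics also depend on the NFS client and server, so this sketch does not cover every way a lock can stay held.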

> KahaDB can't fail over to the slave if the master is unable to write to disk
> ----------------------------------------------------------------------------
>
>                 Key: AMQ-5540
>                 URL: https://issues.apache.org/jira/browse/AMQ-5540
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Broker, Message Store
>    Affects Versions: 5.10.0
>         Environment: Using Master-slave topology with shared kahadb. 
> Using KahaDB on NFS. 
>            Reporter: Anuj Khandelwal
>            Priority: Major
>         Attachments: ActiveMQ_config.xml, Logs.txt
>
>
> This comes from http://activemq.2283324.n4.nabble.com/kahadb-corruption-quot-Checkpoint-failed-java-io-IOException-Input-output-error-quot-td4690378.html#a4690442.
> Scenario: A failure on the filer left the applications (ActiveMQ) unable to read from or write to KahaDB. I have attached the logs with the details. The master broker was not completely killed: it stopped its transport connectors and plugins but did not release its lock on KahaDB. The "ps" command showed that the master broker was still running. Because the master did not release the lock on KahaDB, the slave broker was unable to acquire it.
> The master broker should shut down properly in such cases and let the slave take over the persistence store.
> Thanks,
> Anuj



--
This message was sent by Atlassian Jira
(v8.3.4#803005)