Posted to issues@activemq.apache.org by "Timothy Bish (JIRA)" <ji...@apache.org> on 2017/01/16 14:53:26 UTC

[jira] [Commented] (AMQ-6564) HA: Slow Failover with AMQ + mKahaDb in Master/Slave setup with shared filesystem

    [ https://issues.apache.org/jira/browse/AMQ-6564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15824107#comment-15824107 ] 

Timothy Bish commented on AMQ-6564:
-----------------------------------

Questions like this will get better support on the users mailing list; JIRA is where bugs are reported.

> HA: Slow Failover with AMQ + mKahaDb in Master/Slave setup with shared filesystem
> ---------------------------------------------------------------------------------
>
>                 Key: AMQ-6564
>                 URL: https://issues.apache.org/jira/browse/AMQ-6564
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: KahaDB
>    Affects Versions: 5.14.3
>            Reporter: Johannes F. Knauf
>
> Consider the following scenario:
> * AMQ Host A and Host B are configured exactly the same
> * Host A and Host B share a common filesystem storage for their (m)kahadb in order to create HA as described in http://activemq.apache.org/shared-file-system-master-slave.html 
> * High-traffic scenario in which each queue still holds a substantial backlog of messages at any point in time
> Expected:
> Given Host A is the current master and Host B is polling for the lock every 10 seconds (default),
> when Host A goes down,
> then Host B should be able to serve producer enqueue requests after at most 10 seconds plus some microseconds.
> Reality:
> Host B needs to replay all of the journals before it is available to accept new messages again. This can take a long time, especially if consistency checks are required. This means Master/Slave with a shared FS does not really provide HA.
> It is perfectly understandable that failover takes that long for consumers. They can only continue receiving messages once all journals have been read; otherwise message order would be lost.
> For producers this is not the case, as AMQ could just create a fresh journal file and start appending immediately. Am I wrong?
> It also seems that each KahaDB in an mKahaDB is checked in sequence, so that in the worst case even lightly filled queues are unavailable until everything has been checked.
> Long unavailability for producers is unacceptable in most scenarios. It means that all producing clients have to invest serious effort to protect against these scenarios in order not to lose messages (buffering, etc.). Or is there a best-practice workaround?
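
The client-side buffering mentioned in the quoted description could look roughly like the sketch below. It is only an illustration under stated assumptions, not an ActiveMQ feature or a recommended fix: MessageSink and BufferingSender are hypothetical names, MessageSink stands in for whatever actually sends to the broker (for example a JMS producer), and the buffer lives in memory only.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical stand-in for the real send path to the broker (e.g. a JMS producer). */
interface MessageSink {
    void send(String message) throws Exception;
}

/** Buffers messages while the sink is failing and replays them in order once it recovers. */
class BufferingSender {
    private final MessageSink sink;
    private final int capacity;
    private final Deque<String> pending = new ArrayDeque<>();

    BufferingSender(MessageSink sink, int capacity) {
        this.sink = sink;
        this.capacity = capacity;
    }

    /**
     * Queues the message behind anything already pending, then tries to drain.
     * Returns false if the buffer is full and the message was rejected.
     */
    synchronized boolean send(String message) {
        // If the buffer is full, try to make room before giving up.
        if (pending.size() >= capacity && flush() >= capacity) {
            return false;
        }
        pending.addLast(message);
        flush();
        return true;
    }

    /** Sends pending messages oldest-first and stops at the first failure. Returns the number still buffered. */
    synchronized int flush() {
        while (!pending.isEmpty()) {
            try {
                sink.send(pending.peekFirst());
            } catch (Exception e) {
                break; // sink still unavailable; keep the message for the next attempt
            }
            pending.removeFirst();
        }
        return pending.size();
    }
}
```

A caller would route every message through send() and call flush() periodically while the broker is unreachable. Because the buffer is bounded and in memory, messages are still lost if the producer itself crashes or the outage outlasts the capacity, and a send that fails after the broker has already accepted the message would be delivered twice on retry.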


