Posted to dev@qpid.apache.org by "Justin Ross (JIRA)" <ji...@apache.org> on 2014/06/12 23:59:03 UTC
[jira] [Updated] (QPID-5719) HA becomes unresponsive once any of the brokers are SIGSTOPed
[ https://issues.apache.org/jira/browse/QPID-5719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Justin Ross updated QPID-5719:
------------------------------
Fix Version/s: 0.29
> HA becomes unresponsive once any of the brokers are SIGSTOPed
> -------------------------------------------------------------
>
> Key: QPID-5719
> URL: https://issues.apache.org/jira/browse/QPID-5719
> Project: Qpid
> Issue Type: Bug
> Components: C++ Clustering
> Affects Versions: 0.28
> Reporter: Alan Conway
> Assignee: Alan Conway
> Fix For: 0.29
>
> Attachments: ha-heartbeat.diff
>
>
> See also: https://bugzilla.redhat.com/show_bug.cgi?id=1086638
> Description of problem:
> Qpid HA becomes unresponsive once any of the brokers is SIGSTOPed.
> There are three different cases:
> a] stopped ALL brokers
> b] stopped the primary
> c] stopped a backup
> In each of the cases listed above, the following observations were made:
> a-c] RHCS clustat looks fine and reports that everything is OK
> a-c] qpid-ha (status --all) hangs
> a,b,c*] any other clients are blocked indefinitely
> a-b] in these cases, right from the beginning
> c] in this case, at the end; the client is able to recover after a minute or so,
> due to the connection timeout
> In fact, this defect also shows that qpid-ha can be out of sync with clustat, as tracked by the BZ.
> The expectations are:
> * a] quorum is lost and HA is down (same as kill -9 on all nodes);
> no clients are able to communicate
> * b] a new primary is promoted; there has to be a mechanism to get rid of the stopped process;
> clients should be able to communicate after recovery
> * c] the unresponsive backup should get restarted;
> clients should be able to communicate after the time it takes to detect the backup as unresponsive
> * Generally, better integration between the Qpid HA environment and RHCS is needed,
> i.e. SIGSTOP detection
> * A heartbeat between the primary and the backups is probably needed
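
The heartbeat idea in the last expectation can be illustrated with a small sketch. A SIGSTOPed process keeps its sockets open, so its peers see no connection failure; only the absence of periodic traffic reveals that it has stalled. The Python below is an illustration under stated assumptions, not the content of ha-heartbeat.diff and not the Qpid broker's API: the class HeartbeatMonitor, its methods, the peer names, and the 2-second timeout are all hypothetical.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Hypothetical helper: records when each peer was last heard from."""

    def __init__(self, timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout  # seconds of silence tolerated (assumed value)
        self.clock = clock      # injectable so the logic can be tested
        self.last_seen: Dict[str, float] = {}

    def add_peer(self, peer: str) -> None:
        # Start the timer when the peer joins, so a peer that never
        # sends a single heartbeat is still detected.
        self.last_seen[peer] = self.clock()

    def heartbeat(self, peer: str) -> None:
        # Called whenever a heartbeat arrives from the peer.
        self.last_seen[peer] = self.clock()

    def unresponsive(self) -> List[str]:
        # Peers silent for longer than the timeout. A stopped process
        # lands here even though its connection is still open.
        now = self.clock()
        return sorted(peer for peer, seen in self.last_seen.items()
                      if now - seen > self.timeout)


if __name__ == "__main__":
    fake_now = [0.0]  # simulated clock, so the example runs instantly
    monitor = HeartbeatMonitor(timeout=2.0, clock=lambda: fake_now[0])
    monitor.add_peer("backup-1")
    monitor.add_peer("backup-2")  # imagine this one was SIGSTOPed

    fake_now[0] = 1.5
    monitor.heartbeat("backup-1")  # backup-1 is still sending heartbeats

    fake_now[0] = 3.0
    print(monitor.unresponsive())  # prints ['backup-2']
```

In a real deployment the list returned by unresponsive() would feed whatever recovery action applies: promoting a new primary in case b], or restarting the backup in case c]. How that action is wired into RHCS is outside the scope of this sketch.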