Posted to users@activemq.apache.org by Oleg Kiorsak <ki...@gmail.com> on 2010/06/18 15:10:10 UTC

what about Slave failing in "Pure-Master Slave" setup?


One of the benefits of Pure Master Slave is that it supposedly provides
some "HA":
namely, books, tutorials, and the wiki site describe that when the MASTER
fails, the SLAVE becomes a MASTER
and clients are smoothly reconnected to it by virtue of the "failover
transport"...
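
For reference, a minimal sketch of how a client is typically pointed at both brokers through the failover transport, here as a Spring bean. The host names and port 61616 are placeholders and not taken from this setup; randomize=false is an assumption so that the client tries the master first.

```xml
<!-- Sketch only: master-host, slave-host and the port are hypothetical. -->
<bean id="connectionFactory" class="org.apache.activemq.ActiveMQConnectionFactory">
  <!-- The failover transport reconnects to the next URI in the list when the
       current connection drops; randomize=false keeps the listed order. -->
  <property name="brokerURL"
            value="failover:(tcp://master-host:61616,tcp://slave-host:61616)?randomize=false"/>
</bean>
```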

That is all nice and good, and TRUE (I tested it)

But the question arises: what if it's the SLAVE that fails first (it is a
50/50 chance)...

In my testing, when I "kill -KILL" the SLAVE's process, the end result is
that the MASTER just stops accepting any connections and the queues even
disappear from JMX jConsole...

Only a restart of both restores the "status quo", but a restart is something
that has to be done manually...


So as far as "HA" goes, the solution seems to be asymmetrical: it is only "HA"
when it is the MASTER that fails first...

Am I missing something?

???

Is there maybe some way to configure it so that the MASTER continues to work
alone (just as the SLAVE would if the MASTER failed)?


cheers,
O.K.

-- 
View this message in context: http://old.nabble.com/what-about-Slave-failing-in-%22Pure-Master-Slave%22-setup--tp28925866p28925866.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.

Re: what about Slave failing in "Pure-Master Slave" setup?

Posted by Oleg Kiorsak <ki...@gmail.com>.


Thanks Gary

RE:
>should just ignore the slave if any replication attempt fails.

exactly!


The way it currently works, it provides "continuity" only when the MASTER
in the pair fails, but not when the
SLAVE fails (from what you describe, this is the case when SYNC (ACKed) send
is used, but that is the most common one).

I will look into creating a JIRA...

(my "use case" would be that the SLAVE can just as equally be the first one
to fail as the MASTER! ;)


As for the more sophisticated clustering with "shared storage" vs "pure"
(aka shared nothing):

I encountered a problem just yesterday that, upon googling, turned out to be
a known bug
https://issues.apache.org/activemq/browse/AMQ-2672

and the concern it raises is whether the "storage" can get corrupted (is
that what is causing AMQ-2672?),
so I thought a benefit of "pure" M/S would be that there is redundancy
(two copies of the persisted data)
??


cheers,
O.K.


Re: what about Slave failing in "Pure-Master Slave" setup?

Posted by Oleg Kiorsak <ki...@gmail.com>.


just discovered another issue with "pure Master-Slave"

In 4 out of 5 tests it all worked "by the book", i.e.
when I kill the MASTER, the SLAVE accepts connections and becomes a master...

But I tried it one more time and got this:

[root@ip-10-195-225-236 bin]# ERROR MasterConnector                - Network connection between vm://SLAVE#0 and tcp:///10.252.219.112:10001 shutdown: Channel was inactive for too long: /10.252.219.112:10001
org.apache.activemq.transport.InactivityIOException: Channel was inactive for too long: /10.252.219.112:10001
	at org.apache.activemq.transport.InactivityMonitor$4.run(InactivityMonitor.java:168)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:619)

 
So it looks like the SLAVE is not going to become a MASTER and the whole
thing is defunct.
???
What am I doing wrong?
My conf in the SLAVE is:

...
<masterConnector remoteURI="tcp://10.252.219.112:10001"/>
...

Do I need to specify some "timeout" parameters or keepalive or something?
(I would have thought that, by the nature of the SLAVE<->MASTER relationship,
all proper settings
would be inherent and implied in the link between them... ???)
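
For reference, a sketch of where such settings would go on the slave side. It assumes that remoteURI accepts ordinary transport options, uses the OpenWire option wireFormat.maxInactivityDuration (milliseconds, shown at its default of 30000), and sets the broker attribute shutdownOnMasterFailure from the Pure Master Slave documentation. The brokerName is taken from the vm://SLAVE URI in the log; the surrounding broker element is otherwise an assumption, and nothing here is confirmed to change the behaviour shown in the log.

```xml
<!-- Slave broker sketch; everything except the masterConnector address is assumed. -->
<broker xmlns="http://activemq.apache.org/schema/core" brokerName="SLAVE"
        shutdownOnMasterFailure="false">
  <services>
    <!-- Transport options are appended to the URI as query parameters.
         30000 ms is the default inactivity limit, written out explicitly. -->
    <masterConnector
        remoteURI="tcp://10.252.219.112:10001?wireFormat.maxInactivityDuration=30000"/>
  </services>
</broker>
```

With shutdownOnMasterFailure="false" the slave is expected to take over rather than stop when the link to the master is lost, so an inactivity error on that link may simply be how the slave notices that the master has gone.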



thanks&cheers
O.K.


Re: what about Slave failing in "Pure-Master Slave" setup?

Posted by Gary Tully <ga...@gmail.com>.

This is true: pure master slave is asymmetrical, and it requires manual
intervention to restore the pairing once the master fails. There is an option
on the broker to have the master shut down if the slave fails, but this is
off by default. The problem is that there is currently no way to have a
slave connect and play catch-up with an already running and active master.
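
As an illustration of the option mentioned above, a sketch of a master broker with the setting written out explicitly. It assumes the shutdownOnSlaveFailure broker attribute described in the Pure Master Slave documentation; the brokerName is hypothetical and the port is the one from the masterConnector URI quoted in this thread.

```xml
<!-- Master broker sketch; brokerName is a placeholder. -->
<broker xmlns="http://activemq.apache.org/schema/core" brokerName="MASTER"
        shutdownOnSlaveFailure="false">
  <transportConnectors>
    <!-- The slave's masterConnector and the clients connect to this address. -->
    <transportConnector uri="tcp://0.0.0.0:10001"/>
  </transportConnectors>
</broker>
```

Setting the attribute to "true" would make the master stop when the slave detaches, so that clients never talk to an unreplicated master.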

Most folk use the shared data store fault-tolerant strategy, where N brokers
can share a data store (shared file system or JDBC) and one broker gains an
exclusive lock and becomes active; the rest become passive slaves.
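
A minimal sketch of the shared file system variant, assuming KahaDB as the store. The directory path, brokerName, and port are placeholders. Every broker in the group would use the same directory on a shared file system that supports file locking, and whichever broker gets the lock becomes active.

```xml
<!-- Shared-storage sketch; path, name and port are hypothetical. -->
<broker xmlns="http://activemq.apache.org/schema/core" brokerName="brokerA">
  <persistenceAdapter>
    <!-- All brokers point at the same directory; only the lock holder is active. -->
    <kahaDB directory="/sharedFileSystem/sharedBrokerData"/>
  </persistenceAdapter>
  <transportConnectors>
    <transportConnector uri="tcp://0.0.0.0:61616"/>
  </transportConnectors>
</broker>
```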

(That said, from looking at the code, the master will carry on (without the
slave) if it fails to replicate an async command, but does not stop
replication on the failure of a sync command, which is a little bogus. It
should just ignore the slave if any replication attempt fails. If you have a
use case for pure master slave, please open a JIRA issue so we can ensure a
master can carry on in the event of a slave failure.)

-- 
http://blog.garytully.com

Open Source Integration
http://fusesource.com