Posted to users@activemq.apache.org by "smays@edmunds.com" <sm...@edmunds.com> on 2010/04/10 00:14:14 UTC

Failover and Fail BACK

Hello there! We are using ActiveMQ 5.3.1 deployed as an HA failover cluster via
an NFS mount.

We are able to kill a broker (which we are calling the HOT broker) and have a
second broker (the COLD broker) take over with only a few messages lost in
transition (353 out of 1.3 million), which we are chalking up to "in transit,
in memory, on the way to the NFS server". However, if we restart the HOT
broker it just waits for the exclusive lock, and if we then kill the COLD
broker (which is by then the live broker) we get what looks like either KahaDB
or persistent-store corruption, and we are unable to continue until we stop
both brokers, rm -rf the store/data directory, and restart the HOT broker.
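
For reference, both brokers point their persistence adapter at the same KahaDB
directory on the NFS mount, along these lines (paths simplified here, not our
exact config):

<broker xmlns="http://activemq.apache.org/schema/core" dataDirectory="/messages/activemq-data">
  <persistenceAdapter>
    <!-- Both brokers use the same directory on the shared NFS mount.
         Whichever broker grabs the exclusive file lock first becomes the
         live (HOT) broker; the other blocks on the lock as the COLD standby. -->
    <kahaDB directory="/messages/activemq-data/kahadb"/>
  </persistenceAdapter>
</broker>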

Has anyone else seen this issue? Anyone know what we could do to make it
fail BACK to HOT without failure?

Next up for us will be HA failover with a network of brokers, and we'll post
how we got it working with an NFS mount once we do!

Thank you all!

Steve Mays
Edmunds.com
-- 
View this message in context: http://old.nabble.com/Failover-and-Fail-BACK-tp28198179p28198179.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.


Re: Failover and Fail BACK

Posted by "smays@edmunds.com" <sm...@edmunds.com>.
No, NFSv4 is not required. See the explanation in my previous reply!


Gary Tully wrote:
> 
> NFSv4?
> 

-- 
View this message in context: http://old.nabble.com/Failover-and-Fail-BACK-tp28198179p28232430.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.


Re: Failover and Fail BACK

Posted by Gary Tully <ga...@gmail.com>.
NFSv4?


-- 
http://blog.garytully.com

Open Source Integration
http://fusesource.com

Re: Failover and Fail BACK

Posted by "smays@edmunds.com" <sm...@edmunds.com>.
Hello there,
NFS v4 is NOT required. The problem that has been identified is a failure in
lock management, because SOME implementations of NFS v3 only do "soft"
locking. If your kernel supports real exclusive locks ("I'm serious, I'm
locking you, Mr. File") AND you put the correct settings on both server and
client so that you never end up with hung locks, it will all work fine.

Settings example:
in /etc/exports on server:
/messages *(secure,rw,sync,no_root_squash)

in /etc/fstab on clients:
nas:/messages    /messages    nfs    rsize=8192,wsize=8192,soft,bg,intr,nfsvers=3,tcp,timeo=14
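
If you want to sanity check the client settings before committing them to
/etc/fstab, the same entry can be expressed as a one-off mount command (the
options are copied from the fstab line above; swap in your own host and paths):

# one-off test mount; options identical to the fstab entry above
mount -t nfs -o rsize=8192,wsize=8192,soft,bg,intr,nfsvers=3,tcp,timeo=14 nas:/messages /messages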




ttmgary wrote:
> 
> Are you using NFS 4.x? I understand that is required.
> 
> Gary
> 

-- 
View this message in context: http://old.nabble.com/Failover-and-Fail-BACK-tp28198179p28219192.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.


Re: Failover and Fail BACK

Posted by ttmgary <ga...@ttmsolutions.com>.
Are you using NFS 4.x? I understand that is required.

Gary



-- 
View this message in context: http://old.nabble.com/Failover-and-Fail-BACK-tp28198179p28219183.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.