Posted to users@activemq.apache.org by ejosterberg <eo...@l1id.com> on 2010/07/02 17:03:11 UTC

Noob Questions - Fail-over / Redundancy Help.

I'm having a hell of a time setting up fail-over / redundancy with
ActiveMQ 5.3.0.

I've tried using Oracle as a shared backend, and fail-over works fine if I
shut down the ActiveMQ process on the active (hot) host. But if I yank the
power cord from the active host, a database lock remains, blocking any of
the slave (backup) hosts from taking over as the primary.

I've tried NFSv4 on Linux (Red Hat 5.5) and the timeouts are taking forever.

I'm using the following mount options:
nfs4
rw,vers=4,rsize=32768,wsize=32768,hard,intr,proto=tcp,timeo=150,retrans=3,sec=sys

NFS seems to be the best option so far. However, when I was testing
yesterday, killing the host power again, I didn't see the failover after
waiting 25 minutes. I gave up hope and left the building for lunch, only to
return an hour later and find we were back up and processing on a
slave/backup instance. I assume I need to tune either TCP or NFS to get
faster results.

Any advice?

What's the best option for prompt fail-over for my mission-critical
application? The expectation to meet is downtime not exceeding 6 minutes a
year.
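
For scale, the six-minute budget can be turned into an availability
percentage. This is only a back-of-the-envelope sketch, assuming a 365-day
year; the function name is made up for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def availability_percent(downtime_minutes_per_year: float) -> float:
    """Return the availability (in percent) implied by a yearly downtime budget."""
    return 100.0 * (1.0 - downtime_minutes_per_year / MINUTES_PER_YEAR)


if __name__ == "__main__":
    # 6 minutes a year is the target stated here.
    print(f"{availability_percent(6):.4f}%")  # prints 99.9989%
```

That is slightly looser than "five nines" (99.999%), which allows about
5.26 minutes a year, so a single fail-over that takes an hour uses up
roughly ten years of budget.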

Thanks!
-- 
View this message in context: http://old.nabble.com/Noob-Questions---Fail-over---Redundancy-Help.-tp29057308p29057308.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.


Re: Noob Questions - Fail-over / Redundancy Help.

Posted by ejosterberg <eo...@l1id.com>.
In the last case, the network was disconnected from the active node until
Oracle timed out the connection and released the lock. Once the network was
reconnected, the failed node began processing in parallel with the new node.
I'm only reporting what was shared with me; I'm not certain of exactly what
was seen.

I'm looking for advice from anyone who is confident they have a working HA
solution, and asking what method they used.



Re: Noob Questions - Fail-over / Redundancy Help.

Posted by cobrien <cl...@ttmsolutions.com>.
Hi,
It seems to me you have tested three scenarios.
From your first posting:
1) You killed the ActiveMQ process for the master. Result: everything
worked fine.
2) You pulled the plug on the machine running ActiveMQ. Result: the
database did not detect the closed connection in a timely fashion, so the
lock on the table was not released. A solution to this problem is to make
the database detect closed connections sooner (TCP configuration, etc.).


Now what about the last scenario? How was ActiveMQ shut down? How was it
restarted? How did you test that the restarted instance was actually
processing messages?
   
-Clark 

www.ttmsolutions.com 
ActiveMQ reference guide at 
http://bit.ly/AMQRefGuide 


Re: Noob Questions - Fail-over / Redundancy Help.

Posted by ejosterberg <eo...@l1id.com>.
Our problem with using Oracle is this: if the active (hot) instance becomes
disconnected, the changes we made to Oracle time out the connection and
release the lock on the database, and a secondary (standby) instance begins
processing. All is well until the previous instance returns to the network.
What we are finding is that it then creates a new session with Oracle and
begins processing in parallel, without attempting to regain the lock on the
DB. Now we have the problem of two instances of ActiveMQ running at once.
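
The sequence described above can be reproduced in a toy model. This is a
sketch only, not ActiveMQ code, and it makes no claim about what ActiveMQ
5.3.0 does internally: the SharedLock and Broker classes and the
recheck_lock flag are hypothetical names invented for illustration. It
shows how a node that trusts a stale "I am master" flag after reconnecting
ends up running alongside the new master, and how re-checking lock
ownership on reconnect would avoid that.

```python
class SharedLock:
    """Stand-in for the database lock; at most one owner at a time."""

    def __init__(self):
        self.owner = None

    def try_acquire(self, broker_name):
        # Take the lock if it is free, then report whether the caller owns it.
        if self.owner is None:
            self.owner = broker_name
        return self.owner == broker_name

    def expire(self):
        # Models the database timing out a dead connection and dropping its lock.
        self.owner = None


class Broker:
    def __init__(self, name, lock, recheck_lock):
        self.name = name
        self.lock = lock
        self.recheck_lock = recheck_lock  # hypothetical safeguard
        self.is_master = False

    def start(self):
        self.is_master = self.lock.try_acquire(self.name)

    def on_reconnect(self):
        # Without the re-check, the broker keeps its stale is_master flag.
        if self.recheck_lock:
            self.is_master = self.lock.try_acquire(self.name)


def simulate(recheck_lock):
    """Run the disconnect/reconnect sequence and return the names of all masters."""
    lock = SharedLock()
    a = Broker("A", lock, recheck_lock)
    b = Broker("B", lock, recheck_lock)
    a.start()         # A becomes master
    b.start()         # B cannot get the lock and stays passive
    lock.expire()     # A is cut off; the database releases its lock
    b.start()         # B takes over
    a.on_reconnect()  # A's network comes back
    return [broker.name for broker in (a, b) if broker.is_master]


if __name__ == "__main__":
    print("without re-check:", simulate(recheck_lock=False))  # ['A', 'B']
    print("with re-check:   ", simulate(recheck_lock=True))   # ['B']
```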

Any advice on the best method? 

I see there have been some problems with persistence store corruption with
NFS as well.
http://old.nabble.com/Failover-and-Fail-BACK-td28198179.html#a28222719

Is ActiveMQ not ready for production enterprise networks or is there a
better method of implementing H.A.?




Re: Noob Questions - Fail-over / Redundancy Help.

Posted by cobrien <cl...@ttmsolutions.com>.
For Oracle, the master instance of ActiveMQ obtains a lock on the database
using a "select for update" SQL statement.
It appears that when you pull the plug, the data store does not detect the
stale connection in a timely enough fashion for your requirements.
You can shorten the time needed to detect the stale connection by tuning the
TCP keepalive parameters (OS-specific) to meet your uptime requirements.
When using Oracle, setting 'ENABLE=BROKEN' in tnsnames.ora will enable
use of the keepalive packets.
Oracle also allows you to ping the client at regular intervals set by
sqlnet.expire_time (in minutes!).
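
To get a feel for why a pulled plug can go unnoticed for so long, here is a
rough sketch of how long TCP keepalive takes to give up on a silent peer on
Linux. It only applies to sockets that have keepalive enabled. The default
values are the stock Linux settings for net.ipv4.tcp_keepalive_time,
tcp_keepalive_intvl and tcp_keepalive_probes; the function name and the
"tuned" values are assumptions for illustration, not recommendations.

```python
def keepalive_detection_seconds(idle: int, interval: int, probes: int) -> int:
    """Approximate worst-case seconds before an idle, dead TCP peer is detected.

    idle     -- seconds of silence before the first probe (tcp_keepalive_time)
    interval -- seconds between unanswered probes (tcp_keepalive_intvl)
    probes   -- unanswered probes before the connection is dropped
                (tcp_keepalive_probes)
    """
    return idle + interval * probes


if __name__ == "__main__":
    default = keepalive_detection_seconds(idle=7200, interval=75, probes=9)
    # Hypothetical tuned values, chosen only to show the effect.
    tuned = keepalive_detection_seconds(idle=60, interval=10, probes=3)
    print(f"Linux defaults: {default} s (~{default / 60:.0f} min)")
    print(f"Tuned example:  {tuned} s")
```

With the defaults, detection takes over two hours, which is why the
keepalive settings have to be lowered before they help with fail-over.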


As always, do your testing in an environment that mimics your production
environment first. You may have to use trial and error to find the right
settings for your OS and data store.


 
Happy Coding
Clark 
