Posted to users@activemq.apache.org by "R.I.Pienaar" <ri...@devco.net> on 2011/09/30 12:54:02 UTC

MySQL active/passive cluster not recovering from master power failure

hello,

I have an active/passive setup using a MySQL datastore:

    <bean id="mysql-ds" class="org.apache.commons.dbcp.BasicDataSource" destroy-method="close">
      <property name="driverClassName" value="com.mysql.jdbc.Driver"/>
      <property name="url" value="jdbc:mysql://jmsdb1/activemq?relaxAutoCommit=true"/>
      <property name="username" value="activemq"/>
      <property name="password" value="xx"/>
      <property name="poolPreparedStatements" value="true"/>
    </bean>

This works fine: one broker of the pair is the master and the other is the slave,
based on contention for the lock on the table.
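For context, a data source like this is normally referenced from each broker's persistence adapter. Both brokers point at the same database, and whichever one obtains the lock on the table becomes the master. A minimal sketch of that wiring in activemq.xml, assuming the rest of the broker configuration is unchanged (the brokerName value is hypothetical):

```xml
<!-- Sketch: each broker of the pair uses the shared mysql-ds data source.
     brokerName="broker1" is a hypothetical value. -->
<broker xmlns="http://activemq.apache.org/schema/core" brokerName="broker1">
  <persistenceAdapter>
    <!-- "#mysql-ds" refers to the Spring bean declared with id="mysql-ds" -->
    <jdbcPersistenceAdapter dataSource="#mysql-ds"/>
  </persistenceAdapter>
</broker>
```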

When the master fails, for example through a power failure or a kernel panic,
the lock does not get released. It stays held even while the server is
restarting. If we cleanly shut down the master, for example for maintenance,
then everything works as expected.

When the server eventually came back up after the restart, it re-acquired the
lock and was the master again.

Failover never happened. Is there a tunable setting, or any advice you can give,
to make this setup more resilient to failures of this nature?
-- 
R.I.Pienaar

Re: MySQL active/passive cluster not recovering from master power failure

Posted by "R.I.Pienaar" <ri...@devco.net>.

----- Original Message -----
> not simulated, the box died :P
> 
> I am working on the assumption that it is the default MySQL wait_timeout of
> 8 hours that caused the server not to notice the machine go away, and so it
> didn't release the lock.
> 
> Busy testing how ActiveMQ behaves if I drop this to a low number.

Indeed, this was the problem: setting wait_timeout=60 in the my.cnf of the MySQL
server solves it.

I tested this with two virtual machines, using the VM suspend feature to stop
the master dead in its tracks. Without setting wait_timeout, failover doesn't
happen; with it set to 60, failover happens in about 60 seconds.
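For reference, the change above is a server-wide setting (wait_timeout=60 in the [mysqld] section of my.cnf). If lowering it for every client is not acceptable, one possible alternative, not tested in this thread, is to lower it only for the broker's own connections through MySQL Connector/J's sessionVariables URL parameter. A short wait_timeout also means MySQL will close pooled connections that sit idle for longer than that, so the sketch below also has commons-dbcp validate connections. The validation query and the 30-second eviction interval are illustrative assumptions, not tuned values:

```xml
<!-- Untested sketch of a variant of the mysql-ds bean.
     sessionVariables makes Connector/J set the session variable
     wait_timeout=60 on each new connection, so only the broker's connections
     get the short timeout. "&" must be written as "&amp;" inside XML. -->
<bean id="mysql-ds" class="org.apache.commons.dbcp.BasicDataSource" destroy-method="close">
  <property name="driverClassName" value="com.mysql.jdbc.Driver"/>
  <property name="url" value="jdbc:mysql://jmsdb1/activemq?relaxAutoCommit=true&amp;sessionVariables=wait_timeout=60"/>
  <property name="username" value="activemq"/>
  <property name="password" value="xx"/>
  <property name="poolPreparedStatements" value="true"/>
  <!-- Illustrative values: check connections on borrow and while idle so the
       pool discards ones MySQL has already closed for exceeding wait_timeout.
       testWhileIdle only takes effect when the evictor runs, hence the
       non-zero timeBetweenEvictionRunsMillis. -->
  <property name="validationQuery" value="SELECT 1"/>
  <property name="testOnBorrow" value="true"/>
  <property name="testWhileIdle" value="true"/>
  <property name="timeBetweenEvictionRunsMillis" value="30000"/>
</bean>
```

Either way, the trade-off is the same one observed above: a shorter wait_timeout lets MySQL notice a dead master and release its lock sooner, at the cost of dropping any connection that is legitimately idle for longer than the timeout.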

hth
-- 
R.I.Pienaar

Re: MySQL active/passive cluster not recovering from master power failure

Posted by "R.I.Pienaar" <ri...@devco.net>.

----- Original Message -----
> How did you simulate such error?
> I have tested JDBC master/slave in the past and killed -9 the master.
> The lock on the database was released immediately and so the slave
> was able to take over.
> 
> Have never simulated a kernel panic though.
> Do you use a default MySQL configuration?

not simulated, the box died :P

I am working on the assumption that it is the default MySQL wait_timeout of
8 hours that caused the server not to notice the machine go away, and so it
didn't release the lock.

Busy testing how ActiveMQ behaves if I drop this to a low number.

Re: MySQL active/passive cluster not recovering from master power failure

Posted by Torsten Mielke <to...@fusesource.com>.
How did you simulate such an error?
I have tested JDBC master/slave in the past by killing the master with kill -9. The lock on the database was released immediately, so the slave was able to take over.

I have never simulated a kernel panic, though.
Are you using the default MySQL configuration?


Torsten Mielke
torsten@fusesource.com
tmielke@blogspot.com

 

On Sep 30, 2011, at 12:54 PM, R.I.Pienaar wrote:
