Posted to dev@activemq.apache.org by "Gary Tully (JIRA)" <ji...@apache.org> on 2013/02/25 12:58:12 UTC

[jira] [Comment Edited] (AMQ-4122) Lease Database Locker failover broken

    [ https://issues.apache.org/jira/browse/AMQ-4122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584596#comment-13584596 ] 

Gary Tully edited comment on AMQ-4122 at 2/25/13 11:57 AM:
-----------------------------------------------------------

@st.h - thanks for the sql log.
From looking at https://issues.apache.org/jira/secure/attachment/12570319/mysql.log, it looks like a configuration problem.
node-h03-ap21 is obtaining a 5s lease that it renews every 10s, so there is a 5s period when the lease is available to others.

It needs to obtain a 10 second lease and update it every 5 seconds, so that a second (slave) broker always sees time > now when it attempts an update as part of an acquire.

You need:
{code}
<jdbcPersistenceAdapter ... lockKeepAlivePeriod="5000">
   ..
   <locker>
     <lease-database-locker lockAcquireSleepInterval="10000"/>
   </locker>
{code}
lockAcquireSleepInterval is the lease duration; lockKeepAlivePeriod is the lease renew period. On a renew, the lease is extended by the lockAcquireSleepInterval (lease duration), so a master is always at least (lockAcquireSleepInterval - lockKeepAlivePeriod) ahead with its lease.
In short, ensure: lockAcquireSleepInterval > lockKeepAlivePeriod.
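
As a rough sketch of the timing only (this is not the locker's code; the class, method and parameter names are made up for illustration, and the acquire rule "stored expiry < now" is simply taken from the description above), the two settings interact like this:

```java
// Illustrative model of the lease timing; NOT ActiveMQ code.
// leaseDurationMs stands in for lockAcquireSleepInterval,
// renewPeriodMs stands in for lockKeepAlivePeriod.
final class LeaseTimeline {

    // Longest stretch per renew cycle in which the stored expiry is already
    // in the past, i.e. the lease is available to another broker.
    static long unownedWindowMillis(long leaseDurationMs, long renewPeriodMs) {
        return Math.max(0, renewPeriodMs - leaseDurationMs);
    }

    // The master acquires at t=0 and renews every renewPeriodMs, each time
    // moving the expiry to now + leaseDurationMs. A slave polls every
    // slavePollMs (hypothetical parameter) and takes the lease as soon as it
    // sees expiry < now. Returns the time of the first takeover, or -1 if
    // none happens within horizonMs.
    static long firstTakeoverMillis(long leaseDurationMs, long renewPeriodMs,
                                    long slavePollMs, long horizonMs) {
        long expiry = leaseDurationMs;
        for (long now = 1; now <= horizonMs; now++) {
            if (now % renewPeriodMs == 0) {
                expiry = now + leaseDurationMs; // master renew
            }
            if (now % slavePollMs == 0 && expiry < now) {
                return now; // slave's acquire succeeds while master is alive
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Reported setup: 5s lease renewed every 10s.
        System.out.println(unownedWindowMillis(5000, 10000));
        System.out.println(firstTakeoverMillis(5000, 10000, 1000, 60000));
        // Suggested setup: 10s lease renewed every 5s.
        System.out.println(unownedWindowMillis(10000, 5000));
        System.out.println(firstTakeoverMillis(10000, 5000, 1000, 60000));
    }
}
```

With the reported values the model leaves a 5s gap in every cycle and the slave takes over while the master is still running; with the suggested values there is no gap and no takeover.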

Can you verify this?
I think it may also make sense to add lease-related attributes to this locker, leaseDuration and leaseRenewPeriod, so that it is a little more intuitive and obvious that leaseDuration must be greater than leaseRenewPeriod.

                
      was (Author: gtully):
    @SouNayi - thanks for the sql log.
From looking at https://issues.apache.org/jira/secure/attachment/12570319/mysql.log, it looks like a configuration problem.
node-h03-ap21 is obtaining a 5s lease that it renews every 10s, so there is a 5s period when the lease is available to others.

It needs to obtain a 10 second lease and update it every 5 seconds, so that a second (slave) broker always sees time > now when it attempts an update as part of an acquire.

You need:
{code}
<jdbcPersistenceAdapter ... lockKeepAlivePeriod="5000">
   ..
   <locker>
     <lease-database-locker lockAcquireSleepInterval="10000"/>
   </locker>
{code}
lockAcquireSleepInterval is the lease duration; lockKeepAlivePeriod is the lease renew period. On a renew, the lease is extended by the lockAcquireSleepInterval (lease duration), so a master is always at least (lockAcquireSleepInterval - lockKeepAlivePeriod) ahead with its lease.
In short, ensure: lockAcquireSleepInterval > lockKeepAlivePeriod.

Can you verify this?
I think it may also make sense to add lease-related attributes to this locker, leaseDuration and leaseRenewPeriod, so that it is a little more intuitive and obvious that leaseDuration must be greater than leaseRenewPeriod.

                  
> Lease Database Locker failover broken
> -------------------------------------
>
>                 Key: AMQ-4122
>                 URL: https://issues.apache.org/jira/browse/AMQ-4122
>             Project: ActiveMQ
>          Issue Type: Bug
>    Affects Versions: 5.7.0
>         Environment: Java 7u9, SUSE 11, Mysql
>            Reporter: st.h
>            Assignee: Gary Tully
>             Fix For: 5.8.0
>
>         Attachments: activemq-kyle.xml, activemq.xml, activemq.xml, AMQ4122.patch, mysql.log
>
>
> We are using ActiveMQ 5.7.0 together with a mysql database and could not observe correct failover behavior with lease database locker.
> It seems that there is a race condition, which prevents the correct failover procedure.
> We noticed that when starting up two instances, both instances become master.
> We ran several tests, including the following, and could not observe the intended functionality:
> - shut down all instances
> - manipulate the database lock so that one node holds the lock, and set the expiry time in the distant future
> - start up both instances. Both instances are unable to acquire the lock, as it hasn't expired, which should be correct behavior.
> - update the expiry time in database, so that the lock is expired.
> - first instance notices expired lock and becomes master
> - when second instance checks for lock, it also updates the database and becomes master.
> To my understanding the second instance should not be able to update the lock, as it is held by the first instance and should not be able to become master.
