
Posted to users@activemq.apache.org by Sebastian Dellwig <se...@iqser.com> on 2016/06/21 12:01:23 UTC

Artemis: Automatic failback does not work

Hello,
I'm trying to use Artemis with automatic failover.
I have two servers: one is the master, the other is the slave.
I'm using them in "replication" mode, not shared file.

Master has also "check-for-live-server" enabled.
Slave has "allow-failback" enabled.
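
For context, a minimal sketch of how these two settings are usually
expressed in broker.xml under replication. The actual files were not posted
to the list, so this only illustrates the two options named above and
assumes everything else is left at its defaults:

```xml
<!-- master broker.xml (fragment) -->
<ha-policy>
   <replication>
      <master>
         <!-- on restart, look for a live server with the same node ID
              before becoming live -->
         <check-for-live-server>true</check-for-live-server>
      </master>
   </replication>
</ha-policy>

<!-- slave broker.xml (fragment) -->
<ha-policy>
   <replication>
      <slave>
         <!-- hand the live role back once the original master returns -->
         <allow-failback>true</allow-failback>
      </slave>
   </replication>
</ha-policy>
```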

Failover works fine, but when the master is available again, both servers
start complaining that there are other servers with the same ID. (That's OK
so far.)
But the master does not synchronize with the slave, and so the slave does
not fail back.

Does someone know what I could be doing wrong?
Or is this a bug?

-Sebastian





--
View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-Automatic-failback-does-not-work-tp4713190.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.

Re: Artemis: Automatic failback does not work

Posted by Martyn Taylor <mt...@redhat.com>.

Hi Sebastian,

Good to hear this resolved your problem.  The whole connector/acceptor
concept in Artemis is a little bit confusing.  In short, an acceptor in
the config is how you configure the broker to allow clients to connect.
It opens a port, configures the protocols, etc.

A connector is essentially information that is passed to clients (or
other brokers) telling them how to connect to the broker.  It gets
broadcast to other nodes (provided broadcast is configured).  This allows
cluster nodes to discover each other: they take the connector information
and use it to connect to an acceptor on another node.
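
A minimal sketch of how these pieces typically fit together in broker.xml.
The host, ports, multicast group address, and all names below are
placeholders chosen for illustration, not values taken from this thread:

```xml
<!-- acceptor: the port this broker listens on -->
<acceptors>
   <acceptor name="netty-acceptor">tcp://10.0.0.1:61616</acceptor>
</acceptors>

<!-- connector: how others should reach this broker; it has to point at
     this broker's own acceptor -->
<connectors>
   <connector name="netty-connector">tcp://10.0.0.1:61616</connector>
</connectors>

<!-- broadcast the connector so other nodes can discover this broker -->
<broadcast-groups>
   <broadcast-group name="bg-group1">
      <group-address>231.7.7.7</group-address>
      <group-port>9876</group-port>
      <connector-ref>netty-connector</connector-ref>
   </broadcast-group>
</broadcast-groups>

<!-- listen for connectors broadcast by other nodes -->
<discovery-groups>
   <discovery-group name="dg-group1">
      <group-address>231.7.7.7</group-address>
      <group-port>9876</group-port>
   </discovery-group>
</discovery-groups>

<cluster-connections>
   <cluster-connection name="my-cluster">
      <connector-ref>netty-connector</connector-ref>
      <discovery-group-ref discovery-group-name="dg-group1"/>
   </cluster-connection>
</cluster-connections>
```

The point to notice is that the connector carries the same host and port as
the acceptor on the same broker; if they differ, other nodes are told to
connect to an address nothing is listening on.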

In your case, you had the connector/acceptor configured properly on the
master node, so the slave was able to discover and connect to it.  But
when failback happened, the master node was unable to connect to the slave
(the current live) since the slave's connector information was wrong.

There's a section on transports in the user manual that covers acceptors
and connectors.  It's worth a read because the acceptor/connector stuff is
not obvious.
https://activemq.apache.org/artemis/docs/1.3.0/configuring-transports.html

Cheers
Martyn


Re: Artemis: Automatic failback does not work

Posted by Sebastian Dellwig <se...@iqser.com>.

Yes, that's it.
It works now. I did not know that there is a correlation between acceptor
and connector.

Thank you very much
-Sebastian




Re: Artemis: Automatic failback does not work

Posted by Martyn Taylor <mt...@redhat.com>.

Sebastian,

Looks like there's a mistake in your slave.xml.  If you change your
acceptor port to match the configured connector (or vice versa), I think
you'll be good to go.

Please let me know if this resolves your problem.

Regards
Martyn

---

<connectors>
   <connector name="netty-connector">tcp://192.168.50.1:61617</connector>
</connectors>

<acceptors>
   <!-- Should be tcp://192.168.50.1:61617 -->
   <acceptor name="netty-acceptor">tcp://192.168.50.1:61616</acceptor>
</acceptors>
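
With that change applied, the relevant part of slave.xml would look roughly
like this. This is a sketch that assumes 61617 is the port to keep; per the
"or vice versa" above, changing the connector to 61616 instead should also
work, as long as the two match:

```xml
<connectors>
   <connector name="netty-connector">tcp://192.168.50.1:61617</connector>
</connectors>

<acceptors>
   <!-- same host and port as the connector above -->
   <acceptor name="netty-acceptor">tcp://192.168.50.1:61617</acceptor>
</acceptors>
```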


Re: Artemis: Automatic failback does not work

Posted by Martyn Taylor <mt...@redhat.com>.

How many messages do you have in the broker when this happens?  There is a
sync period which may take a short while, particularly if there's a lot of
data in the broker or the connection between the live and backup is slow.

Essentially, this is what happens during this period:

1. Master starts up and checks to see if there is a broker running with the
same ID.
2. Same ID found, so master starts as slave.
3. Master starts replicating the live node until both nodes are in sync.
4. Live is then shut down.
5. Master takes over as live.
6. Slave starts as backup.  Sync happens again for the backup (in case any
messages were sent during its shutdown).

You will experience some delay during this sync period; the length depends
on how many messages you have in the broker.

That said, I have seen issues where the master hangs on failback.  So, I'll
take a look here.

Could you also let me know if you have many persisted messages?  And could
you try again and just wait a little longer, to make sure it's not the
expected delay described above?

Thanks


Re: Artemis: Automatic failback does not work

Posted by Sebastian Dellwig <se...@iqser.com>.

Hi Martyn,
I tried it now with version 1.3.0. 
Same behavior.
I waited a couple of minutes, but the failback does not happen.
Instead both permanently post: AMQ212034: There are more than one servers on
the network broadcasting the same node id.

I'll send you my broker.xml files.

Thanks in advance.

-Sebastian




Re: Artemis: Automatic failback does not work

Posted by Martyn Taylor <mt...@redhat.com>.

Hi Sebastian,

When using replication, there is a period where both servers will be up at
the same time, so you might see the "more than one server with the same ID"
message for a while.  This is normal.

Which version of Apache Artemis are you using?  There were several fixes
around replication and HA that are in the latest release (1.3.0).  I'd
recommend trying with this version if you haven't already.

If it's still failing with 1.3.0 could you please send me your broker.xml
for each server and I'll take a look.

Thanks
Martyn
