Posted to users@activemq.apache.org by martk <12...@web.de> on 2017/05/08 07:33:58 UTC

Artemis HA cluster with replication

Hello,

I am using ActiveMQ Artemis 1.5.4 and have configured a highly available
cluster (master/slave brokers) with replication (using static connectors; see
the main configuration parts below).

Under normal failure conditions (the network connection fails or the process is
shut down/killed) the switch from master to slave and back (intended to be done
by hand) works nearly all the time (although sometimes the backup server is not
in sync even though both have been running in parallel for quite a while).

Simulating a busy master server results in two active master brokers
(both processing messages, but with no replication any more). To test/reproduce
this I performed the following steps:

1. Master and slave properly started (master is live and slave is backup).
2. Master stopped by sending the SIGSTOP signal to the process. After some
time the slave recognizes the problem and becomes live.
3. Sending the SIGCONT signal to the master process results in both master
and slave running live. This can then only be resolved by a manual shutdown
of both, probably with a loss of messages.
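The stop/continue sequence above can be sketched against a stand-in process (a plain `sleep` here, since the broker's PID is environment-specific; in the real test the PID would be the master broker's JVM):

```shell
# Stand-in for the master broker process.
sleep 300 &
pid=$!

# Step 2: freeze the process; from the network's point of view the host is
# still up and answering pings, but the broker makes no progress.
kill -STOP "$pid"
state_stopped=$(ps -o stat= -p "$pid" | tr -d ' ')
echo "after SIGSTOP: $state_stopped"    # 'T' = stopped

# Step 3: resume it; the process carries on as if nothing happened, which is
# why both brokers can end up live.
kill -CONT "$pid"
state_resumed=$(ps -o stat= -p "$pid" | tr -d ' ')
echo "after SIGCONT: $state_resumed"

kill "$pid"    # clean up the stand-in
```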

I would like to ensure that only one broker is live at a time, with the other
acting as the backup (a shared store is not possible).
Maybe this can be resolved by configuration; otherwise I think it is a bug,
because both servers should continuously check which one is live.


-------------------- master-broker.xml
<?xml version='1.0'?>
<configuration xmlns="urn:activemq"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:schemaLocation="urn:activemq
/schema/artemis-configuration.xsd">
   <core xmlns="urn:activemq:core">

      <name>master</name>
     
      <persistence-enabled>true</persistence-enabled>

      <ha-policy>
         <replication>
            <master>
               <check-for-live-server>true</check-for-live-server>
            </master>
         </replication>
      </ha-policy>

      <connectors>
         <connector name="netty-connector">tcp://MASTERIP:61616</connector>
         <connector
name="netty-backup-connector-slave">tcp://SLAVEIP:61616</connector>
      </connectors>

      <acceptors>
         <acceptor name="netty-acceptor">tcp://MASTERIP:61616</acceptor>
      </acceptors>

      <cluster-connections>
         <cluster-connection name="cluster">
            <address>jms</address>
            <connector-ref>netty-connector</connector-ref>
            <retry-interval>500</retry-interval>
            <use-duplicate-detection>true</use-duplicate-detection>
            <static-connectors>
               <connector-ref>netty-backup-connector-slave</connector-ref>
            </static-connectors>
         </cluster-connection>
      </cluster-connections>

   </core>
</configuration>


-------------------- slave-broker.xml
<?xml version='1.0'?>
<configuration xmlns="urn:activemq"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:schemaLocation="urn:activemq
/schema/artemis-configuration.xsd">
   <core xmlns="urn:activemq:core">

      <name>slave1</name>
     
      <persistence-enabled>true</persistence-enabled>

      <ha-policy>
         <replication>
            <slave>
               <allow-failback>false</allow-failback>
            </slave>
         </replication>
      </ha-policy>

      <connectors>
         <connector
name="netty-live-connector">tcp://MASTERIP:61616</connector>
         <connector name="netty-connector">tcp://SLAVEIP:61616</connector>
      </connectors>

      <acceptors>
         <acceptor name="netty-acceptor">tcp://SLAVEIP:61616</acceptor>
      </acceptors>

      <cluster-connections>
         <cluster-connection name="cluster">
            <address>jms</address>
            <connector-ref>netty-connector</connector-ref>
            <retry-interval>500</retry-interval>
            <use-duplicate-detection>true</use-duplicate-detection>
            <static-connectors>
               <connector-ref>netty-live-connector</connector-ref>
            </static-connectors>
         </cluster-connection>
      </cluster-connections>

   </core>
</configuration>


--------------------
Regards,
Martin



--
View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-HA-cluster-with-replication-tp4725734.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.

Re: Artemis HA cluster with replication

Posted by Justin Bertram <jb...@apache.org>.
I recommend you start a new thread about your issue rather than using a
thread which has been dormant for more than a year.


Justin

On Wed, Jul 4, 2018 at 9:03 AM, rajkaur152 <mi...@gmail.com> wrote:

> [quoted message snipped]

Re: Artemis HA cluster with replication

Posted by rajkaur152 <mi...@gmail.com>.
Hi Justin,

I have a question: I want to configure a cluster with the "colocated"
<ha-policy>. I have 3 servers in the cluster; can you suggest how this is
achievable? I cannot specify the group-name, as suggested in the official
documentation, to allow servers in a particular group to act as backups.
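For context, a colocated replication policy is declared per broker roughly as follows (values are illustrative, based on the HA chapter of the Artemis documentation; each node is live and also requests a backup on another node):

```xml
<ha-policy>
   <replication>
      <colocated>
         <request-backup>true</request-backup>
         <max-backups>1</max-backups>
         <backup-request-retries>-1</backup-request-retries>
         <backup-request-retry-interval>5000</backup-request-retry-interval>
         <master/>
         <slave/>
      </colocated>
   </replication>
</ha-policy>
```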



--
Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html

Re: Artemis HA cluster with replication

Posted by Justin Bertram <jb...@apache.org>.
Just to tie a bow on this...

Replication is designed to be used in production.  However, whether or not
it is ready for your specific use-case is something only you can validate
with testing.


Justin

On Wed, Jun 7, 2017 at 9:37 AM, praneethg <pr...@concur.com>
wrote:

> [quoted message snipped]

Re: Artemis HA cluster with replication

Posted by praneethg <pr...@concur.com>.
So can we go into production using replication? I want to make sure Artemis
is production-ready. Is anyone using this in production?




Re: Artemis HA cluster with replication

Posted by Clebert Suconic <cl...@gmail.com>.
The performance cost of replication is only due to network latency.


There is a wish for it at the moment, but I am not aware of any planning yet.
It will happen, but I don't know when.



On Wed, Jun 7, 2017 at 9:51 AM praneethg <pr...@concur.com>
wrote:

> [quoted message snipped]
-- 
Clebert Suconic

Re: Artemis HA cluster with replication

Posted by praneethg <pr...@concur.com>.
Are you planning to include ZooKeeper integration pretty soon? Any timeline?
And would that improve performance over replication?




Re: Artemis HA cluster with replication

Posted by Clebert Suconic <cl...@gmail.com>.
Just create 3 pairs, connected in a cluster.

The other option would be to develop a manager with ZooKeeper;
that's not currently supported, but we could work together if you're
interested. You would still need 3 ZooKeeper nodes to manage your
brokers.

On Tue, May 16, 2017 at 10:51 AM, martk <12...@web.de> wrote:
> [quoted message snipped]



-- 
Clebert Suconic

Re: Artemis HA cluster with replication

Posted by martk <12...@web.de>.
How do I have to set up the e.g. 4 nodes? One live (master) and 3 backups
(slaves)? And would this solve the problem of two live brokers (the steps to
reproduce are in the first post)?


clebertsuconic wrote
> [quoted message snipped]






Re: Artemis HA cluster with replication

Posted by Clebert Suconic <cl...@gmail.com>.
There isn't much we can do with a single live/backup pair.

In this case you will need 3 pairs...

Or maybe you could embed ZooKeeper somehow. We have thought about using
ZooKeeper (which would require more than 3 nodes to elect a quorum
anyway).


We can only guarantee that a single backup/live pair is not isolated
on the network. Now, if you want to break the system, sure, you can do
it.

If you had more than 3 nodes, the quorum would decide to take the isolated node down.

On Tue, May 16, 2017 at 9:50 AM, martk <12...@web.de> wrote:
> [quoted message snipped]



-- 
Clebert Suconic

Re: Artemis HA cluster with replication

Posted by martk <12...@web.de>.
No, I would like to simulate a busy/overloaded server (e.g. the broker
process is not working correctly). The network is available the whole time and
the server is also answering pings.


jbertram wrote
> [quoted text snipped]





Re: Artemis HA cluster with replication

Posted by Justin Bertram <jb...@apache.org>.
Now that 2.1 is released, you should be using that.

You need to provide a network address:port or URL via the <network-check-list>. I don't believe it is sufficient to provide the address:port of the backup on the live and vice versa. The cluster nodes already have those addresses and ports by virtue of their clustering together. The whole point of the <network-check-list> is to provide a reference that is *outside* of the existing cluster to act as a kind of sentinel so the nodes can hopefully get a clearer understanding of whether or not they are isolated.
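As a sketch of that advice, the check list should name a sentinel outside the pair (the gateway address below is purely illustrative):

```xml
<!-- Illustrative values: the listed address must be *outside* the
     live/backup pair, e.g. a gateway or other always-reachable host. -->
<network-check-period>10000</network-check-period>
<network-check-timeout>1000</network-check-timeout>
<network-check-list>10.0.0.1</network-check-list>
```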


Justin

----- Original Message -----
From: "martk" <12...@web.de>
To: users@activemq.apache.org
Sent: Monday, May 15, 2017 11:49:45 PM
Subject: Re: Artemis HA cluster with replication

[quoted message snipped]

Re: Artemis HA cluster with replication

Posted by martk <12...@web.de>.
By master, are you talking about the 2.2.0-SNAPSHOT?
How do I have to provide the address? Is the (current) entry in the
broker.xml (IP:PORT) enough?

Shared storage is unfortunately not an option (that is why I am trying to use
Artemis).

On Mon, May 15, 2017, Clebert Suconic wrote: 
> That's how it should work...
> 
> Try master (the current version now).. it should work..
> 
> You have to provide an address that's unique for both servers, and you can
> use it to validate whether the local network is out. That is, you need to
> solve your network issues as well; if you split the network, there's no way
> around it.




Re: Artemis HA cluster with replication

Posted by martk <12...@web.de>.
Which script are you talking about?

I am configuring and starting EmbeddedJMS from a simple main class.
Maybe such a check could be triggered at that point?




Re: Artemis HA cluster with replication

Posted by Clebert Suconic <cl...@gmail.com>.
You could also contribute a change to the script to reboot the server
in case of a JVM crash; that way client reconnects would work, and you
would only need the network pinger in case of a real box crash.
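A minimal sketch of such a wrapper (the retry cap and final `echo` exist only to keep the sketch self-contained; a real script would loop indefinitely around the broker command, e.g. `supervise ./artemis run`, where the path is hypothetical):

```shell
# Hypothetical supervisor: rerun a command while it exits abnormally
# (JVM crash), stop as soon as it exits cleanly (deliberate shutdown).
supervise() {
  restarts=0
  while true; do
    if "$@"; then
      break                          # clean exit: operator stopped it
    fi
    restarts=$((restarts + 1))
    if [ "$restarts" -ge 3 ]; then   # cap retries to keep the sketch finite
      break
    fi
    sleep 1                          # brief back-off before rebooting
  done
  echo "restarts: $restarts"
}

supervise sh -c 'exit 0'             # prints "restarts: 0"
```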

On Mon, May 8, 2017 at 10:32 AM, Clebert Suconic
<cl...@gmail.com> wrote:
> [quoted message snipped]



-- 
Clebert Suconic

Re: Artemis HA cluster with replication

Posted by Clebert Suconic <cl...@gmail.com>.
You need a network of brokers with at least 3 pairs to have the quorum working properly.

On Mon, May 8, 2017 at 8:47 AM, martk <12...@web.de> wrote:
> [quoted message snipped]



-- 
Clebert Suconic

Re: Artemis HA cluster with replication

Posted by martk <12...@web.de>.
Hi clebertsuconic,

I have tested the network check by adding the following (on the master with
the SLAVEIP and on the slave with the MASTERIP):

<network-check-period>10000</network-check-period>
<network-check-timeout>1000</network-check-timeout>
<network-check-list>MASTERIP/SLAVEIP</network-check-list>

Using the described signals to test this does not work (which makes sense,
because the ping keeps working).

How many servers would I need for a group? What is needed to configure it?
Can you provide an example, please?




Re: Artemis HA cluster with replication

Posted by Clebert Suconic <cl...@gmail.com>.
You need either a group of servers to establish a quorum, or you need the
network pinger.


On Mon, May 8, 2017 at 3:46 AM martk <12...@web.de> wrote:

> [original message snipped]
-- 
Clebert Suconic

Re: Artemis HA cluster with replication

Posted by Justin Bertram <jb...@apache.org>.
I realize you're attempting to simulate a network outage, but from what I understand, using SIGSTOP isn't necessarily an accurate way to do it. It was explained to me a while back by a colleague who had done quite a bit of work in this area that SIGSTOP behaves differently at the socket level from something like pulling a network cable out of a NIC or even killing the process. See more here [1]. I mention this because you might want to develop an alternate testing mechanism to more accurately simulate the network outage use-case.


Justin

[1] https://unix.stackexchange.com/questions/202104/what-happens-to-requests-to-a-service-that-is-stopped-with-sigstop

----- Original Message -----
From: "martk" <12...@web.de>
To: users@activemq.apache.org
Sent: Monday, May 8, 2017 2:33:58 AM
Subject: Artemis HA cluster with replication

[original message snipped]