
Posted to users@activemq.apache.org by red3 <al...@aeso.ca> on 2006/06/07 00:20:56 UTC

[Un]reliable:// network of AMQ brokers with Lingo

We are using Lingo 1.1 and ActiveMQ 3.1 to achieve RPC through Spring and JMS
(Message-Driven POJOs).

Things were fine until we set up two ActiveMQ brokers in a
reliable/failover configuration.
We use the reliable: protocol with two fixed port addresses.
E.g. 
    <bean id="jmsBrokerUrl" abstract="true">
        <property name="brokerURL"
                  value="reliable:(tcp://localhost:61616%3FsoTimeout=5000,tcp://localhost:61617%3FsoTimeout=5000)?maximumRetries=0&amp;establishConnectionTimeout=21000&amp;keepAliveTimeout=300000"/>
    </bean>

(in our production environment the brokers are at separate IP
addresses.)
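
As a rough illustration of how a composite URL like the one above is put together, here is a minimal Java sketch. The class BrokerUrls and its composite method are hypothetical helpers, not part of ActiveMQ or Lingo; the scheme and option names are simply the ones quoted in this post.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper (not part of ActiveMQ or Lingo) that assembles a composite broker URL. */
final class BrokerUrls {

    /**
     * Builds scheme:(uri1%3FperBroker,uri2%3FperBroker)?outer1=v1&outer2=v2.
     * The '?' of each nested URI is written as %3F, as in the post above,
     * so that it is not mistaken for the start of the outer query string.
     */
    static String composite(String scheme, List<String> brokers,
                            String perBrokerOptions, Map<String, String> outerOptions) {
        StringJoiner inner = new StringJoiner(",", scheme + ":(", ")");
        for (String broker : brokers) {
            inner.add(perBrokerOptions.isEmpty() ? broker : broker + "%3F" + perBrokerOptions);
        }
        if (outerOptions.isEmpty()) {
            return inner.toString();
        }
        StringJoiner query = new StringJoiner("&", "?", "");
        outerOptions.forEach((key, value) -> query.add(key + "=" + value));
        return inner.toString() + query.toString();
    }

    public static void main(String[] args) {
        Map<String, String> outer = new LinkedHashMap<>(); // keeps insertion order
        outer.put("maximumRetries", "0");
        outer.put("establishConnectionTimeout", "21000");
        outer.put("keepAliveTimeout", "300000");
        System.out.println(composite("reliable",
                List.of("tcp://localhost:61616", "tcp://localhost:61617"),
                "soTimeout=5000", outer));
    }
}
```

When the resulting string is pasted into a Spring XML attribute, each & has to be written as &amp;, which is why the bean definition looks slightly different from the raw URL.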

In this scenario we have experienced random, but frequent, timeouts at the
client. It seems that the outgoing message gets to the server fine
(resulting in a database operation), but the response to the client is
either lost or delayed.

We have also experienced this error at the client: 
	org.logicblaze.lingo.jms.impl.MultiplexingRequestor: Response received for
unknown request

And in some scenarios we have seen the same request being processed more
than once on the server.

Suffice it to say, for now we are running just one broker in production!

For the record, our client is Swing, and we are forced in our organization
to use the Oracle OC4J container to host the JMS in production. However,
thanks to Spring/Lingo we are able to run independent of the container in
development environments and unit tests.

I have written unit tests to simulate sending many client requests on five
different threads. Some requests are on Topics and others on temporary
Queues.

We have many theories but no conclusions as yet. We experience different
problems in various different scenarios. We have lost connections and
experienced timeouts when taking down a broker and then starting it back up.
We have experienced different problems when the container has to be
restarted.
We have run our tests successfully several times with just one broker in
place.

We have several theories we are evaluating - your input as to which you feel
is most likely would be much appreciated:
1) That our configuration is somehow flawed.
2) That Lingo has not been thoroughly tested in the scenario described above
and either has bugs to be resolved or is not designed for this scenario.
3) That ActiveMQ is somehow dropping connections, or not correlating requests
correctly between the brokers.
4) That there is a problem with the container intercepting/blocking the
requests. - We deem this unlikely since we are connecting directly to the
JMS brokers through the port addresses defined in Lingo.

So without going into too much detail at this point, can you confirm
that:
1) Lingo and a network of ActiveMQ brokers are a feasible combination for
reliable failover in an enterprise environment.
2) That temporary queues and topics are not lost or dropped in the
dual-broker scenario.
3) That the container is not intercepting or blocking requests in the
dual-broker scenario. (JTA conflicts?)
	(Note that OC4J is not JMS 1.1 compliant, but we are overriding its JMS
with embedded ActiveMQ in production.)

A dump of the exceptions is included below:

Concurrent timeout exception:

Cannot access JMS invoker remote service at [null]; nested exception is
javax.jms.JMSException: EDU.oswego.cs.dl.util.concurrent.TimeoutException
org.springframework.remoting.RemoteAccessException: Cannot access JMS
invoker remote service at [null]; nested exception is
javax.jms.JMSException: EDU.oswego.cs.dl.util.concurrent.TimeoutException
javax.jms.JMSException: EDU.oswego.cs.dl.util.concurrent.TimeoutException
	at
org.logicblaze.lingo.jms.impl.MultiplexingRequestor.createJMSException(MultiplexingRequestor.java:156)
...etc

Response received for unknown request:

 WARN: org.logicblaze.lingo.jms.impl.MultiplexingRequestor: Response
received for unknown request: ACTIVEMQ_OBJECT_MESSAGE: id = 0
ActiveMQMessage{ , jmsMessageID = ID:AD050003-3535-1149547683183-64:35,
bodyAsBytes = org.activemq.io.util.ByteArray@1eef2c, readOnlyMessage = true,
jmsClientID = 'ID:AD050003-1862-1149537087845-6:0' , jmsCorrelationID = '52'
, jmsDestination =
TemporaryQueue-{TD{ID:AD050003-3564-1149547697636-89:0}TD}ID:AD050003-3564-1149547697636-97:0,
jmsReplyTo = null, jmsDeliveryMode = 1,
jmsRedelivered = false, jmsType = 'null' , jmsExpiration = 0, jmsPriority =
4, jmsTimestamp = 1149548125466, properties = null, readOnlyProperties =
true, entryBrokerName = 'broker1' , entryClusterName = 'default' ,
consumerNos = [0], transactionId = 'null'
, xaTransacted = false, consumerIdentifer =
'ID:AD050003-3564-1149547697636-89:0.2.1' , messageConsumed = false,
transientConsumed = false, sequenceNumber = 97, deliveryCount = 1,
dispatchedFromDLQ = false, messageAcknowledge =
org.activemq.ActiveMQSession@1bf7b23, jmsMessageIdentity = null, producerKey
= ID:AD050003-1862-1149537087845-627: } ActiveMQObjectMessage{ object =
org.springframework.remoting.support.RemoteInvocationResult@d99277 }
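
One way a warning like this can arise in any request/reply scheme is that the reply arrives after the caller has already given up waiting, or arrives twice. The sketch below is an assumption about the general mechanism, not Lingo's actual MultiplexingRequestor code; the PendingReplies class and its method names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch of request/reply correlation; NOT Lingo's actual code. */
final class PendingReplies {
    private final Map<String, CompletableFuture<Object>> pending = new ConcurrentHashMap<>();

    /** Called before sending: remember that we are waiting for this correlation ID. */
    CompletableFuture<Object> register(String correlationId) {
        CompletableFuture<Object> reply = new CompletableFuture<>();
        pending.put(correlationId, reply);
        return reply;
    }

    /** Called when the caller gives up waiting (client-side timeout). */
    void timeOut(String correlationId) {
        CompletableFuture<Object> reply = pending.remove(correlationId);
        if (reply != null) {
            reply.completeExceptionally(new TimeoutException(correlationId));
        }
    }

    /** Called for each incoming reply; returns false for an "unknown request". */
    boolean onReply(String correlationId, Object body) {
        CompletableFuture<Object> reply = pending.remove(correlationId);
        if (reply == null) {
            return false; // late, duplicated, or misrouted reply
        }
        reply.complete(body);
        return true;
    }
}
```

Under this model, a reply that is delayed past the timeout (for example while it is forwarded between brokers) shows up first as a timeout at the client and later as an unknown-request warning, which matches the pairing of symptoms described above.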

This one occurs when the request is processed twice, which usually happens if a
broker is taken down and then restarted:

org.springframework.dao.DataIntegrityViolationException: Hibernate
operation: Could not execute JDBC batch update; SQL []; ORA-00001:
unique constraint (ABIGGS.OV_PARID_EFFDT_EXP_OVERRIDE_AK) violated ; nested
exception is java.sql.BatchUpdateException: ORA-00001: unique constraint
(ABIGGS.OV_PARID_EFFDT_EXP_OVERRIDE_AK) violated
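
A common defence against a request being processed twice is to make the server-side handler idempotent. The following is a minimal sketch under the assumption that every request carries an ID that stays the same when the message is redelivered; IdempotentExecutor is a hypothetical class, and the in-memory map would not survive a server restart.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical server-side guard: run each request ID at most once. */
final class IdempotentExecutor {
    private final Map<String, Object> resultsByRequestId = new ConcurrentHashMap<>();

    /**
     * Runs the operation the first time a request ID is seen and caches its result;
     * a redelivered copy of the same request gets the cached result instead.
     * The operation must return a non-null result.
     */
    Object execute(String requestId, Supplier<Object> operation) {
        return resultsByRequestId.computeIfAbsent(requestId, id -> operation.get());
    }
}
```

With a guard of this kind the second delivery would return the first result rather than attempting the insert again and hitting the unique constraint. A real version would also need to bound or expire the cache.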

--
View this message in context: http://www.nabble.com/-Un-reliable%3A---network-of-AMQ-brokers-with-Lingo-t1744760.html#a4742177
Sent from the ActiveMQ - User forum at Nabble.com.

Re: [Un]reliable:// network of AMQ brokers with Lingo

Posted by red3 <al...@aeso.ca>.

James,

According to this article, the feature is only available in the ActiveMQ 4 M5
release, which does not seem to be available for download.



James.Strachan wrote:
> 
> If you want reliability and high availability i'd recommend switching
> to using ActiveMQ 4.0 with a Master/Slave configuration which works
> much better in a request-reply type of way than a store/forward
> network of brokers in 3.1.
> 
> http://incubator.apache.org/activemq/masterslave.html
> 
> James
> -------
> http://radio.weblogs.com/0112098/
> 
> 
--
View this message in context: http://www.nabble.com/-Un-reliable%3A---network-of-AMQ-brokers-with-Lingo-t1744760.html#a4760780

Re: [Un]reliable:// network of AMQ brokers with Lingo

Posted by red3 <al...@aeso.ca>.

Update: I put the ActiveMQ 4.0 release in place this week and the situation has
not improved. Timeouts are still occurring regularly.

We are suspicious about the handling of temporary queues with Networks of
Brokers and I have posted my findings in a new thread.
--
View this message in context: http://www.nabble.com/-Un-reliable%3A---network-of-AMQ-brokers-with-Lingo-t1744760.html#a4895500

Re: [Un]reliable:// network of AMQ brokers with Lingo

Posted by red3 <al...@aeso.ca>.

James.Strachan wrote:
> 
>> >
>> > Could you (or someone) please clarify?
>> >
>> >
>>
>>
>> If I am not mistaken, the failover is not a feature of network brokers
>> and
>> master/slave topologies.  Failover is handled by the JMS client or other
>> failover solutions.
>>
>> For the failover to work, the JMS client must be aware of the brokers in
>> the
>> network (failover:(tcp://master1:61616,tcp://master2:61616)).  Please
>> refer
>> to http://www.activemq.org/site/configuring-transports.html for
>> configuring
>> the client connection.
> 
> Agreed. It's a complex area this, isn't it :)
> 
> So failover of connections is a client-side feature - if the broker
> goes down the client can fail over to another broker and resend any
> in-progress messages and acknowledgements.
> 
> The problem with networks is that they are simple store/forward by
> nature - a message is owned by one broker or the other. If a broker
> goes down, messages stay on that failed broker's disk until it comes
> back up. For some folks this is fine.
> 
> ...
> 
> So if you are ever in doubt, just use master/slave :). Networks are
> really for store/forward only.
> --
> 
> James
> -------
> http://radio.weblogs.com/0112098/
> 
> 

Thanks for your input James. I finally feel like we're getting somewhere.

I think we only need the forwarding feature of the broker networks. To
explain why, I think I need to clarify a couple of things:

- We are not persisting messages in the brokers. 

- We are using ActiveMQ as part of the Message Driven POJO strategy with
Spring and Lingo. 
[Ref this article by Craig Walls:
http://www.jroller.com/page/habuma/20050722]

- We have decided that the loss of a message here or there in a situation
where a broker fails in a catastrophic way is OK. It's rare and is usually a
sign of more serious issues. What we can't live with is a system where in
normal everyday operation message delivery is delayed or times out
regularly; or a response to a successful call is not received in a timely
fashion (ideally sub-second) by the client that made it.

We have two messaging scenarios within our app, which may be causing some
confusion:
1) Publish/subscribe: We have services which publish their result sets to
other services which then act on those results to create a new result set
which eventually gets published to the client. These messages use Topics set
up using Lingo. All services and the client are JMS listeners. Think of it
as a matrix of spreadsheets, which are all interconnected, having
dependencies on one another. SOA and ESB are acronyms which definitely apply
here. 
2) Transactional messaging: Here we are using Spring/Lingo and ActiveMQ
together as a remote procedure call transport to replace RMI, if you will.
We make a transparent call through Lingo and expect a callback (or an
exception). We are setting these messages up on Queues configured in Lingo.
In this scenario we are having the most problems. This is where we are
getting timeouts and other issues.

We need this to all work against multiple brokers in a failover
configuration, because our scenario cannot allow for the case where no
services are available and the system comes to a halt. (We absolutely need
24/7 uptime.) It's OK if a request for an update does not complete, due to a
broker failure, because another request will be made a minute or two later.
As long as that second one is serviced by an alternate broker on a separate
physical server.

What we do need is for:
a) Transactional processing under Lingo to behave in a similar way to a
traditional remote procedure call client/server style system.
b) The callback to be guaranteed in normal operation (no "response
received for unknown request" problems and minimal timeouts). One rare
timeout when a broker goes down is acceptable. However, we are also
experiencing lingering problems after recovery of a broker service.
c) A broker that goes down to be brought back up easily, preferably
automatically. (Unless there's a catastrophic failure, like a hardware or
network failure, of course.)

So here are our current thoughts:
We will evaluate ActiveMQ 4's network of brokers by running our tests
through it to see if the problems we have using ActiveMQ 3's networks are
diminished. (This is one of the questions we were trying to get an answer to
in the original post. The discussion on Master/Slave has kind of clouded the
issue unfortunately. It looks like a great feature, but I don't think it's
for us.)

If our tests still fail as badly as they are now, I think we will have to
completely reconsider our strategy.

Master/Slave does not help us, because it is too convoluted to bring up the
master once it has failed. (It seems to be more designed as an insurance
policy for catastrophic failure than a failover/load balancing solution.)
And besides, it's a VERY new feature. At this stage in our project's
lifecycle, frankly, it's a risk for us.

Any advice on what else we should be looking at to solve our issues and get
a reliable SOA in place would be greatly appreciated.

Best regards,

Alan

P.S. I know we're in the thick of it, but I also want to go on record by
saying that ActiveMQ and Lingo have been the core of our architectural
design and we are delighted by the ability to run our systems at any level:
locally in development on one machine as a simple java app from the command
line or in a distributed production environment embedded within a container,
and mocked out in our unit tests. Once we get through these issues we will
be very happy to be ActiveMQ advocates.

--
View this message in context: http://www.nabble.com/-Un-reliable%3A---network-of-AMQ-brokers-with-Lingo-t1744760.html#a4777704

Re: [Un]reliable:// network of AMQ brokers with Lingo

Posted by James Strachan <ja...@gmail.com>.

On 6/7/06, amerigo5 <am...@hotmail.com> wrote:
>
>
> red3 wrote:
> >
> >
> > amerigo5 wrote:
> >>
> >> My solution to have a HA system is to combine Network of Brokers and
> >> Master/Slave topologies.
> >>
> >> Assumptions: Network of Brokers is a clustering solution.  Master/Slave
> >> is a replication solution.
> >>
> >> My setup would be:
> >>
> >> 1. Have a network of master brokers ie master1 and master2.
> >> 2. Have 2 slave brokers ie slave1 (master1 slave) and slave2 (master2
> >> slave).  Slave1 and slave2 brokers are not networked.
> >> 3. Configure the slave broker so that when its master broker shuts down
> >> (or fails), it would also shut down (activemq.xml:
> >> shutdownOnMasterFailure=true). Please refer to
> >> http://www.activemq.org/site/masterslave.html for setting this
> >> configuration.
> >>
> >> This setup is not a perfect HA solution but it is close.
> >>
> >> Thanks.
> >>
> >
> > Forgive my confusion, but James said that master/slave was probably the
> > situation we want, since we want to failover to the slave broker in the
> > event that the master fails.
> >
> > However, you are telling me that the failover is a feature of the network
> > of brokers, and to shut down the slave in the event of master failure and
> > failover to the other master/slave combination.
> >
> > On top of this we have encountered some serious anomalies in networked
> > broker scenarios (with ActiveMQ 3) and have had to switch back to using
> > just one ActiveMQ broker in production.
> >
> > Could you (or someone) please clarify?
> >
> >
>
>
> If I am not mistaken, the failover is not a feature of network brokers and
> master/slave topologies.  Failover is handled by the JMS client or other
> failover solutions.
>
> For the failover to work, the JMS client must be aware of the brokers in the
> network (failover:(tcp://master1:61616,tcp://master2:61616)).  Please refer
> to http://www.activemq.org/site/configuring-transports.html for configuring
> the client connection.

Agreed. It's a complex area this, isn't it :)

So failover of connections is a client-side feature - if the broker
goes down the client can fail over to another broker and resend any
in-progress messages and acknowledgements.

The problem with networks is that they are simple store/forward by
nature - a message is owned by one broker or the other. If a broker
goes down, messages stay on that failed broker's disk until it comes
back up. For some folks this is fine.

The problem though is that you may get duplicates or out of order
issues with queues if you use a store/forward network as an attempt at
making a HA cluster - since in a network one broker just forwards
things to another - there is no master/slave synchronisation.

With master/slave on the other hand, the slave is fully aware of what's
going on, what the master is doing and so if a failure occurs, the
clients can replay any in-progress messages/acks and things work as
you'd expect.

In many ways, the ActiveMQ project should have done Master/Slave first
- since that's what most people actually want and the main use case for
why people try to use networks - then done store/forward networks
afterwards which is a much less used feature. e.g. store/forward is
good for doing WAN optimisations, moving messages from one broker to
another due to the location of consumers and producers being separated
over a WAN etc.

i.e. you can use networks to kinda do a kinda HA cluster, but it can
lead to duplicates and can lead to messages sitting on a disk for a long
time until the old broker comes back. With Master/Slave you can lose
the entire machine & hard disk of the master and you don't lose a
message or get duplicates; things just work.

So if you are ever in doubt, just use master/slave :). Networks are
really for store/forward only.
--

James
-------
http://radio.weblogs.com/0112098/

Re: [Un]reliable:// network of AMQ brokers with Lingo

Posted by "Christopher G. Stach II" <cg...@ldsys.net>.

James Strachan wrote:
> On 6/8/06, Christopher G. Stach II <cg...@ldsys.net> wrote:
>> James Strachan wrote:
>> > Note that the client side reliable:// in AMQ3 and failover:// in AMQ4
>> > (BTW it was renamed in 4 to make it explicit what it does - failover
>> > to a new connection, that's all) - purely does client side reconnection
>> > - it does not implement HA clustering where messages are replicated to
>> > multiple brokers in case one fails and you lose the disk; only
>> > Master/Slave does that - which is a broker-side configuration.
>> >
>> > Though store/forward and Master/Slave depends on the client using
>> failover.
>>
>> What happens when a client hiccups and does failover to the slave, but
>> all of the other clients are still using the master?
> 
> The slave doesn't accept new connections  until the master fails.
> 

Awesome!  Thanks.

-- 
Christopher G. Stach II

Re: [Un]reliable:// network of AMQ brokers with Lingo

Posted by James Strachan <ja...@gmail.com>.

On 6/12/06, Thomas Swindells <ts...@ndsuk.com> wrote:
> James.Strachan wrote:
> >
> > Yes. Ultimately we'd like a quorum based system where you could have a
> > cluster of N nodes, say N=3 so that so long as 2 broker nodes are
> > communicating, you are fine and the 1 broker node that can't see
> > anyone else would shut down - then you can handle network splits OK.
> >
> > Until we get there I'd recommend running clients on different machines
> > to the master/slave broker unless you are prepared to handle network
> > splits manually when they occur.
> >
>
> Has there been any design work on doing this,

We've had a few design sessions in bars on the back of envelopes ;)
But so far that's about it.


> or (more importantly) on
> automatic replication of state between brokers such that a master can become
> a slave to the (old) slave when it comes back up (so that it would actually
> be possible to use it in a HA system). Has there been any
> designs/brainstorms on how this synchronization should occur/the
> appropriate data gathered?

No - am afraid not.

It's probably easiest to just suspend a broker, checkpoint to disk,
then rsync the file system across to the other broker.

Another option could be to suspend the broker, force a checkpoint in
the journal to JDBC; then do a JDBC <-> JDBC synchronisation; though
this approach will only work with JDBC based persistence and not kaha
(the file system based persistence model)

-- 

James
-------
http://radio.weblogs.com/0112098/

Re: [Un]reliable:// network of AMQ brokers with Lingo

Posted by Thomas Swindells <ts...@ndsuk.com>.

James.Strachan wrote:
> 
> Yes. Ultimately we'd like a quorum based system where you could have a
> cluster of N nodes, say N=3 so that so long as 2 broker nodes are
> communicating, you are fine and the 1 broker node that can't see
> anyone else would shut down - then you can handle network splits OK.
> 
> Until we get there I'd recommend running clients on different machines
> to the master/slave broker unless you are prepared to handle network
> splits manually when they occur.
> 

Has there been any design work on doing this, or (more importantly) on
automatic replication of state between brokers such that a master can become
a slave to the (old) slave when it comes back up (so that it would actually
be possible to use it in an HA system)? Have there been any
designs/brainstorms on how this synchronization should occur, or on the
appropriate data to gather?

Thomas 
--
View this message in context: http://www.nabble.com/-Un-reliable%3A---network-of-AMQ-brokers-with-Lingo-t1744760.html#a4828810

Re: [Un]reliable:// network of AMQ brokers with Lingo

Posted by James Strachan <ja...@gmail.com>.

On 6/12/06, Thomas Swindells <ts...@ndsuk.com> wrote:
> James.Strachan wrote:
> > On 6/12/06, Thomas Swindells <ts...@ndsuk.com> wrote:
> >> James.Strachan wrote:
> >> What happens if the slave and a client hiccups and thinks the master has
> >> failed?
> >
> > If the slave and the clients think the master is down then failover to
> > the slave will occur.
> >
> > Note that both the clients and the slave must think the master is
> > down. e.g. if the clients think the master is down but the slave can
> > still see the master, then the clients will not be able to connect to
> > the slave (as it doesn't allow connections until the master has failed
> > over), so they will reconnect to the master again.
> >
>
> Presumably a problem would occur on (say) a 2 host system set up as
> A[master + clientA] and B[slave+clientB] in this situation if there is a
> network hiccup between the two hosts would you end up with a bipartite
> system - A and B totally separate not talking to each other?

Yes. Ultimately we'd like a quorum based system where you could have a
cluster of N nodes, say N=3 so that so long as 2 broker nodes are
communicating, you are fine and the 1 broker node that can't see
anyone else would shut down - then you can handle network splits OK.

Until we get there I'd recommend running clients on different machines
to the master/slave broker unless you are prepared to handle network
splits manually when they occur.
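
The quorum rule described here can be sketched as a strict-majority check. This is only an illustration of the idea: the Quorum class is hypothetical and, as stated above, no such feature has been implemented in ActiveMQ yet.

```java
/** Hypothetical majority-quorum check; not an existing ActiveMQ feature. */
final class Quorum {

    /** True when this node, plus the peers it can reach, form a strict majority of the cluster. */
    static boolean hasQuorum(int reachableNodesIncludingSelf, int clusterSize) {
        if (clusterSize <= 0 || reachableNodesIncludingSelf < 0
                || reachableNodesIncludingSelf > clusterSize) {
            throw new IllegalArgumentException("invalid node counts");
        }
        return reachableNodesIncludingSelf > clusterSize / 2;
    }
}
```

With N=3, two brokers that can see each other keep running and the isolated one shuts itself down. With only two nodes, a split leaves neither side with a majority, which is why a plain master/slave pair cannot resolve a network split on its own under this rule.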

-- 

James
-------
http://radio.weblogs.com/0112098/

Re: [Un]reliable:// network of AMQ brokers with Lingo

Posted by Thomas Swindells <ts...@ndsuk.com>.


James.Strachan wrote:
> 
> On 6/12/06, Thomas Swindells <ts...@ndsuk.com> wrote:
>> James.Strachan wrote:
>> What happens if the slave and a client hiccups and thinks the master has
>> failed?
> 
> If the slave and the clients think the master is down then failover to
> the slave will occur.
> 
> Note that both the clients and the slave must think the master is
> down. e.g. if the clients think the master is down but the slave can
> still see the master, then the clients will not be able to connect to
> the slave (as it doesn't allow connections until the master has failed
> over), so they will reconnect to the master again.
> 

Presumably a problem would occur on (say) a two-host system set up as
A[master + clientA] and B[slave + clientB]. In this situation, if there is a
network hiccup between the two hosts, would you end up with a bipartite
system, with A and B totally separate and not talking to each other?

Thomas
--
View this message in context: http://www.nabble.com/-Un-reliable%3A---network-of-AMQ-brokers-with-Lingo-t1744760.html#a4828608

Re: [Un]reliable:// network of AMQ brokers with Lingo

Posted by James Strachan <ja...@gmail.com>.

On 6/12/06, Thomas Swindells <ts...@ndsuk.com> wrote:
> James.Strachan wrote:
> >
> > On 6/8/06, Christopher G. Stach II <cg...@ldsys.net> wrote:
> >> James Strachan wrote:
> >> > Note that the client side reliable:// in AMQ3 and failover:// in AMQ4
> >> > (BTW it was renamed in 4 to make it explicit what it does - failover
> >> > to a new connection, that's all) - purely does client side reconnection
> >> > - it does not implement HA clustering where messages are replicated to
> >> > multiple brokers in case one fails and you lose the disk; only
> >> > Master/Slave does that - which is a broker-side configuration.
> >> >
> >> > Though store/forward and Master/Slave depends on the client using
> >> failover.
> >>
> >> What happens when a client hiccups and does failover to the slave, but
> >> all of the other clients are still using the master?
> >
> > The slave doesn't accept new connections  until the master fails.
> >
> What happens if the slave and a client hiccups and thinks the master has
> failed?

If the slave and the clients think the master is down then failover to
the slave will occur.

Note that both the clients and the slave must think the master is
down. e.g. if the clients think the master is down but the slave can
still see the master, then the clients will not be able to connect to
the slave (as it doesn't allow connections until the master has failed
over), so they will reconnect to the master again.
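
The connection rule above can be written down as a small decision function. This is a simplified model of a single connection attempt, based only on the behaviour described in this message; FailoverModel and its parameter names are hypothetical and do not correspond to any ActiveMQ class.

```java
/** Hypothetical model of the master/slave connection rule described above. */
final class FailoverModel {
    enum Target { MASTER, SLAVE, NONE }

    /**
     * @param clientReachesMaster whether the client can connect to the master
     * @param slaveSeesMaster     whether the slave still sees the master as alive
     * @param clientReachesSlave  whether the client can reach the slave's host
     */
    static Target connect(boolean clientReachesMaster, boolean slaveSeesMaster,
                          boolean clientReachesSlave) {
        if (clientReachesMaster) {
            return Target.MASTER;
        }
        // The slave only accepts connections once it considers the master failed.
        if (clientReachesSlave && !slaveSeesMaster) {
            return Target.SLAVE;
        }
        return Target.NONE; // the client keeps retrying, master first
    }
}
```

A client that merely hiccups gets NONE on that attempt and ends up back on the master on a later retry; only when the slave also considers the master down does the client land on the slave.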

-- 

James
-------
http://radio.weblogs.com/0112098/

Re: [Un]reliable:// network of AMQ brokers with Lingo

Posted by Thomas Swindells <ts...@ndsuk.com>.


James.Strachan wrote:
> 
> On 6/8/06, Christopher G. Stach II <cg...@ldsys.net> wrote:
>> James Strachan wrote:
>> > Note that the client side reliable:// in AMQ3 and failover:// in AMQ4
>> > (BTW it was renamed in 4 to make it explicit what it does - failover
>> > to a new connection, that's all) - purely does client side reconnection
>> > - it does not implement HA clustering where messages are replicated to
>> > multiple brokers in case one fails and you lose the disk; only
>> > Master/Slave does that - which is a broker-side configuration.
>> >
>> > Though store/forward and Master/Slave depends on the client using
>> failover.
>>
>> What happens when a client hiccups and does failover to the slave, but
>> all of the other clients are still using the master?
> 
> The slave doesn't accept new connections  until the master fails.
> 
What happens if the slave and a client hiccup and think the master has
failed?

Thomas
--
View this message in context: http://www.nabble.com/-Un-reliable%3A---network-of-AMQ-brokers-with-Lingo-t1744760.html#a4828428

Re: [Un]reliable:// network of AMQ brokers with Lingo

Posted by James Strachan <ja...@gmail.com>.

On 6/8/06, Christopher G. Stach II <cg...@ldsys.net> wrote:
> James Strachan wrote:
> > Note that the client side reliable:// in AMQ3 and failover:// in AMQ4
> > (BTW it was renamed in 4 to make it explicit what it does - failover
> > to a new connection, that's all) - purely does client side reconnection
> > - it does not implement HA clustering where messages are replicated to
> > multiple brokers in case one fails and you lose the disk; only
> > Master/Slave does that - which is a broker-side configuration.
> >
> > Though store/forward and Master/Slave depends on the client using failover.
>
> What happens when a client hiccups and does failover to the slave, but
> all of the other clients are still using the master?

The slave doesn't accept new connections  until the master fails.

-- 

James
-------
http://radio.weblogs.com/0112098/

Re: [Un]reliable:// network of AMQ brokers with Lingo

Posted by "Christopher G. Stach II" <cg...@ldsys.net>.

James Strachan wrote:
> Note that the client side reliable:// in AMQ3 and failover:// in AMQ4
> (BTW it was renamed in 4 to make it explicit what it does - failover
> to a new connection, that's all) - purely does client side reconnection
> - it does not implement HA clustering where messages are replicated to
> multiple brokers in case one fails and you lose the disk; only
> Master/Slave does that - which is a broker-side configuration.
> 
> Though store/forward and Master/Slave depends on the client using failover.

What happens when a client hiccups and does failover to the slave, but
all of the other clients are still using the master?

-- 
Christopher G. Stach II

Re: [Un]reliable:// network of AMQ brokers with Lingo

Posted by James Strachan <ja...@gmail.com>.

On 6/7/06, red3 <al...@aeso.ca> wrote:
>
> If you read my original post at the start of the thread you will see that I
> have already read this documentation and that we have the reliable://
> transport in place (AMQ 3), which if you will read the documentation is the
> same as failover:// and was renamed for ActiveMQ 4.
>
> Our problems are with the unreliable behaviour experienced when this
> architecture is in place.

Note that the client side reliable:// in AMQ3 and failover:// in AMQ4
(BTW it was renamed in 4 to make it explicit what it does - failover
to a new connection, that's all) - purely does client side reconnection
- it does not implement HA clustering where messages are replicated to
multiple brokers in case one fails and you lose the disk; only
Master/Slave does that - which is a broker-side configuration.

Though store/forward and Master/Slave both depend on the client using failover.

---
James
-------
http://radio.weblogs.com/0112098/

Re: [Un]reliable:// network of AMQ brokers with Lingo

Posted by red3 <al...@aeso.ca>.

If you read my original post at the start of the thread you will see that I
have already read this documentation and that we have the reliable://
transport in place (AMQ 3), which, according to the documentation, is the
same as failover:// and was renamed for ActiveMQ 4.

Our problems are with the unreliable behaviour experienced when this
architecture is in place.

--
View this message in context: http://www.nabble.com/-Un-reliable%3A---network-of-AMQ-brokers-with-Lingo-t1744760.html#a4762360

Re: [Un]reliable:// network of AMQ brokers with Lingo

Posted by amerigo5 <am...@hotmail.com>.


red3 wrote:
> 
> 
> amerigo5 wrote:
>> 
>> My solution to have a HA system is to combine Network of Brokers and
>> Master/Slave topologies.
>> 
>> Assumptions: Network of Brokers is a clustering solution.  Master/Slave
>> is a replication solution.
>> 
>> My setup would be: 
>> 
>> 1. Have a network of master brokers ie master1 and master2.
>> 2. Have 2 slave brokers ie slave1 (master1 slave) and slave2 (master2
>> slave).  Slave1 and slave2 brokers are not networked.
>> 3. Configure the slave broker so that when its master broker shuts down
>> (or fails), it would also shut down (activemq.xml:
>> shutdownOnMasterFailure=true). Please refer to
>> http://www.activemq.org/site/masterslave.html for setting this
>> configuration.
>> 
>> This setup is not a perfect HA solution but it is close.
>> 
>> Thanks.
>> 
> 
> Forgive my confusion, but James said that master/slave was probably the
> situation we want, since we want to failover to the slave broker in the
> event that the master fails.
> 
> However, you are telling me that the failover is a feature of the network
> of brokers, and to shut down the slave in the event of master failure and
> failover to the other master/slave combination.
> 
> On top of this we have encountered some serious anomalies in networked
> broker scenarios (with ActiveMQ 3) and have had to switch back to using
> just one ActiveMQ broker in production.
> 
> Could you (or someone) please clarify?
> 
> 


If I am not mistaken, failover is not a feature of networks of brokers or
master/slave topologies.  Failover is handled by the JMS client or other
failover solutions.

For the failover to work, the JMS client must be aware of the brokers in the
network (failover:(tcp://master1:61616,tcp://master2:61616)).  Please refer
to http://www.activemq.org/site/configuring-transports.html for configuring
the client connection.

Thanks.
--
View this message in context: http://www.nabble.com/-Un-reliable%3A---network-of-AMQ-brokers-with-Lingo-t1744760.html#a4759834

Re: [Un]reliable:// network of AMQ brokers with Lingo

Posted by red3 <al...@aeso.ca>.

amerigo5 wrote:
> 
> My solution to have a HA system is to combine Network of Brokers and
> Master/Slave topologies.
> 
> Assumptions: Network of Brokers is a clustering solution.  Master/Slave is
> a replication solution.
> 
> My setup would be: 
> 
> 1. Have a network of master brokers, i.e. master1 and master2.
> 2. Have 2 slave brokers, i.e. slave1 (master1's slave) and slave2
> (master2's slave).  The slave1 and slave2 brokers are not networked.
> 3. Configure each slave broker so that it also shuts down when its
> master broker shuts down or fails (activemq.xml:
> shutdownOnMasterFailure=true). Please refer to
> http://www.activemq.org/site/masterslave.html for setting this
> configuration.
> 
> This setup is not a perfect HA solution but it is close.
> 
> Thanks.
> 

Forgive my confusion, but James said that master/slave was probably the
setup we want, since we want to fail over to the slave broker in the
event that the master fails.

However, you are telling me that failover is a feature of the network of
brokers, and that we should shut down the slave in the event of master
failure and fail over to the other master/slave combination.

On top of this we have encountered some serious anomalies in networked
broker scenarios (with ActiveMQ 3) and have had to switch back to using just
one ActiveMQ broker in production.

Could you (or someone) please clarify?

--
View this message in context: http://www.nabble.com/-Un-reliable%3A---network-of-AMQ-brokers-with-Lingo-t1744760.html#a4757892
Sent from the ActiveMQ - User forum at Nabble.com.

Re: [Un]reliable:// network of AMQ brokers with Lingo

Posted by amerigo5 <am...@hotmail.com>.

My solution to have a HA system is to combine Network of Brokers and
Master/Slave topologies.

Assumptions: Network of Brokers is a clustering solution.  Master/Slave is a
replication solution.

My setup would be: 

1. Have a network of master brokers, i.e. master1 and master2.
2. Have 2 slave brokers, i.e. slave1 (master1's slave) and slave2
(master2's slave).  The slave1 and slave2 brokers are not networked.
3. Configure each slave broker so that it also shuts down when its master
broker shuts down or fails (activemq.xml:
shutdownOnMasterFailure=true). Please refer to
http://www.activemq.org/site/masterslave.html for setting this
configuration.

This setup is not a perfect HA solution but it is close.
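
To make the three steps above a little more concrete, here is a toy Java
model of the topology. It does not talk to any broker; the class and method
names (HaTopologySketch, Pair, failMaster, reachableMasters) are invented
for illustration, and the only behaviour it encodes is the one described
above: a slave goes down together with its master, so clients are left with
the other master.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model: two networked masters, each with one slave that shuts down with its master. */
final class HaTopologySketch {
    static final class Pair {
        final String master;
        final String slave;
        boolean masterUp = true;
        boolean slaveUp = true;

        Pair(String master, String slave) {
            this.master = master;
            this.slave = slave;
        }

        /** Models shutdownOnMasterFailure=true: the slave stops when its master stops. */
        void failMaster() {
            masterUp = false;
            slaveUp = false;
        }
    }

    /** Masters a client could still reach, in the order the pairs were listed. */
    static List<String> reachableMasters(List<Pair> pairs) {
        List<String> up = new ArrayList<>();
        for (Pair pair : pairs) {
            if (pair.masterUp) {
                up.add(pair.master);
            }
        }
        return up;
    }

    public static void main(String[] args) {
        List<Pair> pairs = List.of(new Pair("master1", "slave1"), new Pair("master2", "slave2"));
        pairs.get(0).failMaster();
        System.out.println("still reachable: " + reachableMasters(pairs));
    }
}
```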

Thanks.
--
View this message in context: http://www.nabble.com/-Un-reliable%3A---network-of-AMQ-brokers-with-Lingo-t1744760.html#a4756929
Sent from the ActiveMQ - User forum at Nabble.com.

Re: [Un]reliable:// network of AMQ brokers with Lingo

Posted by red3 <al...@aeso.ca>.

James.Strachan wrote:
> 
> On 6/7/06, red3 <al...@aeso.ca> wrote:
> 
>> What are the known problems with the existing networks-of-brokers
>> strategy
> 
> It's a long story, but basically lots of bugs have been fixed since 3.x.
> 

So would you wholeheartedly recommend using networks of brokers in ActiveMQ
4?
What scenarios is it suited for?
Or is this not really suited to our application? (Lingo as RPC from Swing
to server-side services, plus pub/sub between services, with reliable
failover.)

As I see it, the only way we can call this reliable right now is to use a
single broker, which defeats the object of our HA environment.

James.Strachan wrote:
> 
>> and why was it necessary to introduce the master-slave strategy?
> 
> Master/Slave is quite different to Networks. Networks are about
> storing and forwarding messages from broker to broker. Master/Slave is
> about replicating messages to a pair of brokers so that if a broker
> goes down the other is a hot standby and can failover fast (which
> seems to be what folks want most of the time).
> 
> In the 3.x days folks would often use networks when they really wanted
> master/slave.
> 

What we want is reliable failover! If this is the recommended way to do it
then great. However, it will be tough for me to convince the operations guys
to change strategy (and move up to a major, pre-release milestone) at this
stage in the game (for this project, anyway!)

James.Strachan wrote:
> 
>> What if we wanted more than two brokers in a master-slave configuration?
>> Is
>> this possible?
> 
> You can have pairs of master/slave brokers in a network. But we don't
> support more than 1 slave right now.
> 

But would you honestly recommend this in a production environment at this
early stage of the feature's development?

James.Strachan wrote:
> 
>> After a master goes down, how do you intend that it is restarted?
>> Manually?
> 
> So the slave becomes the master automatically. If you want to bring
> the old master back online later on - it will be out of sync so you
> have to take the slave down, copy its files to the old master then
> restart the old master. That's a manual process right now.
> 

I'm sorry, but at the risk of sounding facetious (which is not my
intention), this doesn't sound like uninterrupted failover to me.

What if you bring up a new broker as the slave to the new master?
Is that feasible, or is it just as messy?

--
View this message in context: http://www.nabble.com/-Un-reliable%3A---network-of-AMQ-brokers-with-Lingo-t1744760.html#a4756297
Sent from the ActiveMQ - User forum at Nabble.com.

Re: [Un]reliable:// network of AMQ brokers with Lingo

Posted by James Strachan <ja...@gmail.com>.

On 6/7/06, red3 <al...@aeso.ca> wrote:
>
> Thanks for your response, James.
>
> What are the known problems with the existing networks-of-brokers strategy

It's a long story, but basically lots of bugs have been fixed since 3.x.

> and why was it necessary to introduce the master-slave strategy?

Master/Slave is quite different to Networks. Networks are about
storing and forwarding messages from broker to broker. Master/Slave is
about replicating messages to a pair of brokers so that if a broker
goes down the other is a hot standby and can failover fast (which
seems to be what folks want most of the time).

In the 3.x days folks would often use networks when they really wanted
master/slave.

> What if we wanted more than two brokers in a master-slave configuration? Is
> this possible?

You can have pairs of master/slave brokers in a network. But we don't
support more than 1 slave right now.

> After a master goes down, how do you intend that it is restarted? Manually?

So the slave becomes the master automatically. If you want to bring
the old master back online later on - it will be out of sync so you
have to take the slave down, copy its files to the old master then
restart the old master. That's a manual process right now.
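
The sequence above can be sketched as a small state model. This is only
an illustration under stated assumptions: it does not call ActiveMQ, the
class and method names are invented, and the check that both brokers are
stopped before the copy is an assumption added for the sketch rather than
something ActiveMQ enforces.

```java
/** Toy state model of one master/slave pair; it does not talk to a real broker. */
final class MasterSlaveRecoverySketch {
    enum State { MASTER, SLAVE, DOWN }

    State oldMaster = State.MASTER;
    State oldSlave = State.SLAVE;
    boolean oldMasterInSync = true;

    /** Automatic part: the master dies and the slave takes over as master. */
    void masterFails() {
        oldMaster = State.DOWN;
        oldSlave = State.MASTER;
        oldMasterInSync = false; // the old master misses all later traffic
    }

    /** Manual step 1: take the promoted slave down. */
    void stopPromotedSlave() {
        oldSlave = State.DOWN;
    }

    /** Manual step 2: copy the slave's files to the old master (assumed: both are stopped). */
    void copyFilesToOldMaster() {
        if (oldSlave != State.DOWN || oldMaster != State.DOWN) {
            throw new IllegalStateException("both brokers must be stopped before copying");
        }
        oldMasterInSync = true;
    }

    /** Manual step 3: restart the old master once its files are current. */
    void restartOldMaster() {
        if (!oldMasterInSync) {
            throw new IllegalStateException("old master is out of sync");
        }
        oldMaster = State.MASTER;
    }

    /** True while some broker in the pair is acting as master. */
    boolean isServing() {
        return oldMaster == State.MASTER || oldSlave == State.MASTER;
    }

    public static void main(String[] args) {
        MasterSlaveRecoverySketch pair = new MasterSlaveRecoverySketch();
        pair.masterFails();
        System.out.println("after failover, serving: " + pair.isServing());
        pair.stopPromotedSlave();
        System.out.println("during resync, serving: " + pair.isServing());
        pair.copyFilesToOldMaster();
        pair.restartOldMaster();
        System.out.println("after restart, serving: " + pair.isServing());
    }
}
```

In this model the pair keeps serving through the failover itself, but not
between stopping the promoted slave and restarting the old master.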

-- 

James
-------
http://radio.weblogs.com/0112098/

Re: [Un]reliable:// network of AMQ brokers with Lingo

Posted by red3 <al...@aeso.ca>.

Thanks for your response, James.

What are the known problems with the existing networks-of-brokers strategy
and why was it necessary to introduce the master-slave strategy?

What if we wanted more than two brokers in a master-slave configuration? Is
this possible?
After a master goes down, how do you intend that it is restarted? Manually?

--
View this message in context: http://www.nabble.com/-Un-reliable%3A---network-of-AMQ-brokers-with-Lingo-t1744760.html#a4754823
Sent from the ActiveMQ - User forum at Nabble.com.

Re: [Un]reliable:// network of AMQ brokers with Lingo

Posted by James Strachan <ja...@gmail.com>.

If you want reliability and high availability, I'd recommend switching
to ActiveMQ 4.0 with a Master/Slave configuration, which works much
better for request-reply than a store/forward network of brokers in
3.1.

http://incubator.apache.org/activemq/masterslave.html

On 6/6/06, red3 <al...@aeso.ca> wrote:
>
> We are using Lingo 1.1 and ActiveMQ 3.1 to achieve RPC through Spring and JMS
> (Message-Driven POJOs).
>
> Things were fine until... We set up two ActiveMQ brokers in a
> reliable/failover configuration.
> We use the reliable: protocol with two fixed port addresses.
> E.g.
>     <bean id="jmsBrokerUrl" abstract="true">
>         <property name="brokerURL"
>             value="reliable:(tcp://localhost:61616%3FsoTimeout=5000,tcp://localhost:61617%3FsoTimeout=5000)?maximumRetries=0&amp;establishConnectionTimeout=21000&amp;keepAliveTimeout=300000"/>
>     </bean>
>
> (in our production environment the brokers are at separate IP
> addresses.)
>
> In this scenario we have experienced random, but frequent, timeouts at the
> client. It seems that the outgoing message gets to the server fine
> (resulting in a database operation), but the response to the client is
> either lost or delayed.
>
> We have also experienced this error at the client:
>         org.logicblaze.lingo.jms.impl.MultiplexingRequestor: Response received for
> unknown request
>
> And in some scenarios we have seen the same request being processed more
> than once on the server.
>
> Suffice it to say, for now we are running just one broker in production!
>
> For the record, our client is Swing, and we are forced in our organization
> to use the Oracle OC4J container to host the JMS in production. However,
> thanks to Spring/Lingo we are able to run independent of the container in
> development environments and unit tests.
>
> I have written unit tests to simulate sending many client requests on five
> different threads. Some requests are on Topics and others on temporary
> Queues.
>
> We have many theories but no conclusions as yet. We experience different
> problems in various different scenarios. We have lost connections and
> experienced timeouts when taking down a broker and then starting it back up.
> We have experienced different problems when the container has to be
> restarted.
> We have run our tests successfully several times with just one broker in
> place.
>
> We have several theories we are evaluating - your input as to which you feel
> is most likely would be much appreciated:
> 1) That our configuration is somehow flawed.
> 2) That Lingo has not been thoroughly tested in the scenario described above
> and either has bugs to be resolved or is not designed for this scenario.
> 3) That ActiveMQ is somehow dropping connections, or not correlating
> requests correctly between the brokers.
> 4) That there is a problem with the container intercepting/blocking the
> requests. - We deem this unlikely since we are connecting directly to the
> JMS brokers through the port addresses defined in Lingo.
>
> So without going into too much detail at this point, can you confirm
> that:
> 1) Lingo and a network of ActiveMQ brokers is a feasible combination for
> reliable failover in an enterprise environment.
> 2) That temporary queues and topics are not lost or dropped in the
> dual-broker scenario.
> 3) That the container is not intercepting or blocking requests in the
> dual-broker scenario. (JTA conflicts?)
>         (Note that OC4J is not JMS 1.1 compliant, but we are overriding its JMS
> with embedded ActiveMQ in production.)
>
> A dump of the exceptions is included below:
>
> Concurrent timeout exception:
>
> Cannot access JMS invoker remote service at [null]; nested exception is
> javax.jms.JMSException: EDU.oswego.cs.dl.util.concurrent.TimeoutException
> org.springframework.remoting.RemoteAccessException: Cannot access JMS
> invoker remote service at [null]; nested exception is
> javax.jms.JMSException: EDU.oswego.cs.dl.util.concurrent.TimeoutException
> javax.jms.JMSException: EDU.oswego.cs.dl.util.concurrent.TimeoutException
>         at
> org.logicblaze.lingo.jms.impl.MultiplexingRequestor.createJMSException(MultiplexingRequestor.java:156)
> ...etc
>
> Response received for unknown request:
>
>  WARN: org.logicblaze.lingo.jms.impl.MultiplexingRequestor: Response
> received for unknown request: ACTIVEMQ_OBJECT_MESSAGE: id = 0
> ActiveMQMessage{ , jmsMessageID = ID:AD050003-3535-1149547683183-64:35,
> bodyAsBytes = org.activemq.io.util.ByteArray@1eef2c, readOnlyMessage = true,
> jmsClientID = 'ID:AD050003-1862-1149537087845-6:0' , jmsCorrelationID = '52'
> , jmsDestination =
> TemporaryQueue-{TD{ID:AD050003-3564-1149547697636-89:0}TD}ID:AD050003-3564-1149547697636-97:0,
> jmsReplyTo = null, jmsDeliveryMode = 1,
> jmsRedelivered = false, jmsType = 'null' , jmsExpiration = 0, jmsPriority =
> 4, jmsTimestamp = 1149548125466, properties = null, readOnlyProperties =
> true, entryBrokerName = 'broker1' , entryClusterName = 'default' ,
> consumerNos = [0], transactionId = 'null'
> , xaTransacted = false, consumerIdentifer =
> 'ID:AD050003-3564-1149547697636-89:0.2.1' , messageConsumed = false,
> transientConsumed = false, sequenceNumber = 97, deliveryCount = 1,
> dispatchedFromDLQ = false, messageAcknowledge =
> org.activemq.ActiveMQSession@1bf7b23, jmsMessageIdentity = null, producerKey
> = ID:AD050003-1862-1149537087845-627: } ActiveMQObjectMessage{ object =
> org.springframework.remoting.support.RemoteInvocationResult@d99277 }
>
> This one occurs when the request is processed twice, usually happens if a
> broker is taken down and then restarted:
>
> org.springframework.dao.DataIntegrityViolationException: Hibernate
> operation: Could not execute JDBC batch update; SQL []; ORA-00001:
> unique constraint (ABIGGS.OV_PARID_EFFDT_EXP_OVERRIDE_AK) violated ; nested
> exception is java.sql.BatchUpdateException: ORA-00001: unique constraint
> (ABIGGS.OV_PARID_EFFDT_EXP_OVERRIDE_AK) violated
>
> --
> View this message in context: http://www.nabble.com/-Un-reliable%3A---network-of-AMQ-brokers-with-Lingo-t1744760.html#a4742177
> Sent from the ActiveMQ - User forum at Nabble.com.
>
>


-- 

James
-------
http://radio.weblogs.com/0112098/