You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@activemq.apache.org by photonikd <br...@me.com> on 2013/02/04 21:35:50 UTC

Networked brokers dropping/suppressing messages

Hello All, 

I'm currently in the process of configuring 2 ActiveMQ (5.7.0) brokers with
the primary objective to use persistant queues to sync messages from
Datacenter 1, to a broker in Datacenter 2 (and vice versa) via a static
openwire Network Connector. After building the 2 brokers and running some
validation tests, I am experiencing sporadic behavior where some messages
that are sent to the remote broker seem to never make it to their
destination queue. Here is a "visual" representation of the configuration: 

Datacenter 1 Producer --> DC1 Broker: SYNC.OUT (Comp Q) --> DC2.SYNC (Phys
Q)(Statically Included Network Connector)--> Datacenter Broker DC2 -->
DC2.SYNC (Comp Q) --> SYNC.IN(Phys Q) (Consumer Attached to Q) 

As I mentioned perviously, the process is vice versa from DC2 back to DC1 
Datacenter 2 Producer --> DC2 Broker: SYNC.OUT (Comp Q) --> DC1.SYNC (Phys
Q)(Statically Included Network Connector)--> Datacenter Broker DC1 -->
DC1.SYNC (Comp Q) --> SYNC.IN(Phys Q) (Consumer Attached to Q) 

At first I was concerned that the messages were not actually being
transferred at a TCP level from datacenter to datacenter as the message
enqueue/dequeue count on the near end would increment/decrement as expected.
But, the far end queue would never appear to receive the message and
increment/decrement its queue. After enabling debug logging, I noticed that
when the lost message behavior would occur, I would see this entry in the
far end/receiving broker, indicating that the message did indeed arrive: 

suppressing duplicate message send
[ID:brokerDC2-60716-1359583622675-0:0:1:1:272] with producerSequenceId [272]
less than last stored: 328 |
org.apache.activemq.broker.ProducerBrokerExchange | ActiveMQ Transport:
tcp:///x.x.x.x:37877@61617 

While searching out other users experiencing this problem, I discovered and
added the following configuration items to activemq.xml: 
(within the network connector configuration) 
suppressDuplicateQueueSubscriptions="false" 
suppressDuplicateTopicSubscriptions="false" 

(within the KahaDB configuration) 
<kahaDB maxFailoverProducersToTrack="0" />

Notes
In order to rule out Datacenter to Datacenter latency (the bandwidth between
these DCs is, let's say, abundant), I built the same environment on 2
virtual machines, locally and on the same VLAN. Regardless of the
connectivity situation, the problem still existed in either scenario. 

Test Cases
- After adding the above configuration items and restarting both ActiveMQ
instances, the problem would still occur. 
- After adding the above configuration items, removing the existing KahaDB
data directory, restarting the AmQ instance and letting the data directory
re-create itself, the problem DID NOT occur again.
- After restarting the brokers again, I was faced with this problem until I
again cleared out the KahaDB directory. 


So, I'm left with a few unanswered questions: 
- Why am I receiving duplicate messages or why does AMQ think this is
happening? 
- Did the configuration changes that I made (in conjunction with the Kaha
rebuild) actually resolve the issue, if so, why? 
- The Plot Twist! Currently, I am accomplishing message replication with a
5.3.0 environment without issue. 

Any insight is greatly appreciated. 
Thanks! 




--
View this message in context: http://activemq.2283324.n4.nabble.com/Networked-brokers-dropping-suppressing-messages-tp4662747.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.

Re: Networked brokers dropping/suppressing messages

Posted by talavera <at...@gmail.com>.
Using "masterslave:" works, although the log still shows a somewhat weird
behaviour, I've pasted it below.

It looks like broker B doesn't realize of the connection failure and rejects
at first the new incoming network connector from broker A.
(I'm not testing a failover, but a lost connection between brokers)

Anyways, I started getting duplicates. It seems I needed
auditNetworkProducers="true" in the transport.
And then... lost messages again :(

And again they're being suppressed:
2013-05-27 16:08:55,298 | DEBUG | suppressing duplicate message send 
[ID:corerec1-48878-1369670925644-1:64:1:1:74] from network producer with
producerSequenceId [74] less than last stored: 76 |
org.apache.activemq.broker.ProducerBrokerExchange | ActiveMQ Transport:
tcp:///10.95.178.21:37549@62618

I find the ID's root several times in the producer's log
(corerec1-48878-1369670925644-1:64)

Can these session/connection IDs get duplicated?

-----------------
2013-05-27 14:51:20,967 | WARN  | Transport (tcp://10.95.178.22:62618)
failed, reason:  org.apache.activemq.transport.InactivityIOException:
Channel was inactive for too (>30000) long: tcp://10.95.178.22:62618, not
attempting to automatically reconnect |
org.apache.activemq.transport.failover.FailoverTransport | ActiveMQ
InactivityMonitor Worker
2013-05-27 14:51:20,967 | INFO  | Network connection between
vm://rec1_master#0 and unconnected shutdown due to a local error:
org.apache.activemq.transport.InactivityIOException: Channel was inactive
for too (>30000) long: tcp://10.95.178.22:62618 |
org.apache.activemq.network.DemandForwardingBridgeSupport | ActiveMQ
InactivityMonitor Worker
2013-05-27 14:51:20,970 | INFO  | Establishing network connection from
vm://rec1_master?async=false&network=true to
failover:(tcp://corerec2:62618,nio://slave2:62602)?randomize=false&maxReconnectAttempts=0
| org.apache.activemq.network.DiscoveryNetworkConnector | Simple Discovery
Agent-1
2013-05-27 14:51:20,973 | INFO  | Successfully connected to
tcp://corerec2:62618 |
org.apache.activemq.transport.failover.FailoverTransport | ActiveMQ Failover
Worker: 46376251
2013-05-27 14:51:20,979 | WARN  | Failed to add Connection
ID:corerec1-60717-1369666090596-3:1, reason:
javax.jms.InvalidClientIDException: Broker: rec1_master - Client:
nc1_master_rec2_inbound_rec1_master already connected from
vm://rec1_master#0 | org.apache.activemq.broker.TransportConnection |
triggerStartAsyncNetworkBridgeCreation: remoteBroker=unconnected,
localBroker= vm://rec1_master#2
2013-05-27 14:51:20,980 | INFO  | Network connection between
vm://rec1_master#2 and tcp://corerec2:62618 shutdown due to a local error:
javax.jms.InvalidClientIDException: Broker: rec1_master - Client:
nc1_master_rec2_inbound_rec1_master already connected from
vm://rec1_master#0 |
org.apache.activemq.network.DemandForwardingBridgeSupport |
triggerStartAsyncNetworkBridgeCreation: remoteBroker=unconnected,
localBroker= vm://rec1_master#2
2013-05-27 14:51:20,984 | INFO  | rec1_master bridge to master_rec2 stopped
| org.apache.activemq.network.DemandForwardingBridgeSupport | ActiveMQ
BrokerService[rec1_master] Task-210
2013-05-27 14:51:21,987 | INFO  | Establishing network connection from
vm://rec1_master?async=false&network=true to
failover:(tcp://corerec2:62618,nio://slave2:62602)?randomize=false&maxReconnectAttempts=0
| org.apache.activemq.network.DiscoveryNetworkConnector | Simple Discovery
Agent-2
2013-05-27 14:51:21,989 | INFO  | Successfully connected to
tcp://corerec2:62618 |
org.apache.activemq.transport.failover.FailoverTransport | ActiveMQ Failover
Worker: 1840307907
2013-05-27 14:51:21,994 | INFO  | Network connection between
vm://rec1_master#4 and tcp://corerec2:62618(master_rec2) has been
established. | org.apache.activemq.network.DemandForwardingBridgeSupport |
triggerStartAsyncNetworkBridgeCreation: remoteBroker=unconnected,
localBroker= vm://rec1_master#4
2013-05-27 14:51:30,974 | INFO  | Network Could not shutdown in a timely
manner | org.apache.activemq.network.DemandForwardingBridgeSupport |
ActiveMQ BrokerService[rec1_master] Task-205
2013-05-27 14:51:30,974 | INFO  | rec1_master bridge to master_rec2 stopped
| org.apache.activemq.network.DemandForwardingBridgeSupport | ActiveMQ
BrokerService[rec1_master] Task-205



--
View this message in context: http://activemq.2283324.n4.nabble.com/Networked-brokers-dropping-suppressing-messages-tp4662747p4667512.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.

Re: Networked brokers dropping/suppressing messages

Posted by talavera <at...@gmail.com>.
That explains why I don't see the "Network connection ... has been
established" log!
And of course there's no BrokerInfo command when failover is in control,
even when only reconnecting to the master.

Excellent entry indeed!
I see it's in https://activemq.apache.org/ now too.

I'll try the masterslave transport then, thank you!



--
View this message in context: http://activemq.2283324.n4.nabble.com/Networked-brokers-dropping-suppressing-messages-tp4662747p4667471.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.

Re: Networked brokers dropping/suppressing messages

Posted by Christian Posta <ch...@gmail.com>.
don't use the failover directly like that... use masterslave: as discussed
at Torsten's excellent blog entry:

http://tmielke.blogspot.com/2011/09/activemq-network-bridge-to-masterslave.html


On Fri, May 24, 2013 at 8:15 AM, talavera <at...@gmail.com> wrote:

> Did you find a solution?
>
> I've just run into the problem too, this what I get:
> 2013-05-23 17:47:04,927 | DEBUG | suppressing duplicate message send
> [ID:corerec2-53231-1368512827283-0:385:1:1:1] with producerSequenceId [1]
> less than last stored: 9 |
> org.apache.activemq.broker.ProducerBrokerExchange
> | ActiveMQ Transport: tcp:///10.25.144.85:41356@61616
>
> Looking into the code [1], it seems this particular log indicates the
> connection from which is received belongs to a local producer, but I've
> tracked the message ID back to a different broker, which says it's bridging
> the message.
> The log prints a transport originating in that remote broker, anyways.
>
> I wonder if this shouldn't be recognized as a networkProducer and then
> "demultiplex" the last stored Id, instead of using a unique index for the
> entire network connection (I'm using conduit subscriptions, the default
> config)
>
> This behaviour seems to happen after a network connection is broken (a
> firewall deciding it's been too long) and the originating broker restarts
> it.
> I haven't found anything interesting during the connection restore phase in
> logs.
> Only restarting the destination broker helps after the broker starts
> marking
> messages as duplicated.
>
> I'll try to get a network capture of the restablishment of the network
> connection, I'd like to see if there's a BrokerInfo command coming to the
> destination broker, which seems to be what helps identifying a network
> connection in destination.
> Not sure of this one though, any hints would be helpful.
>
> I'm running AMQ 5.7.0, using two standalone brokers A and B, with a network
> connection A->B which has a failover to a third C broker, which is a slave
> of A.
> Broker C have been down during the tests, only the network connection uses
> the failover protocol.
>
> This is the config in broker A:
>          <networkConnectors>
>             <networkConnector
> uri="static:(failover:(tcp://B:61616,tcp://C:61617)?randomize=false)"/>
>         </networkConnectors>
>
> [1]
>
> https://svn.apache.org/viewvc/activemq/trunk/activemq-broker/src/main/java/org/apache/activemq/broker/ProducerBrokerExchange.java?revision=1406629&view=markup
>
>
>
> --
> View this message in context:
> http://activemq.2283324.n4.nabble.com/Networked-brokers-dropping-suppressing-messages-tp4662747p4667469.html
> Sent from the ActiveMQ - User mailing list archive at Nabble.com.
>



-- 
*Christian Posta*
http://www.christianposta.com/blog
twitter: @christianposta

Re: Networked brokers dropping/suppressing messages

Posted by talavera <at...@gmail.com>.
Did you find a solution?

I've just run into the problem too, this what I get:
2013-05-23 17:47:04,927 | DEBUG | suppressing duplicate message send
[ID:corerec2-53231-1368512827283-0:385:1:1:1] with producerSequenceId [1]
less than last stored: 9 | org.apache.activemq.broker.ProducerBrokerExchange
| ActiveMQ Transport: tcp:///10.25.144.85:41356@61616

Looking into the code [1], it seems this particular log indicates the
connection from which is received belongs to a local producer, but I've
tracked the message ID back to a different broker, which says it's bridging
the message.
The log prints a transport originating in that remote broker, anyways.

I wonder if this shouldn't be recognized as a networkProducer and then
"demultiplex" the last stored Id, instead of using a unique index for the
entire network connection (I'm using conduit subscriptions, the default
config)

This behaviour seems to happen after a network connection is broken (a
firewall deciding it's been too long) and the originating broker restarts
it.
I haven't found anything interesting during the connection restore phase in
logs.
Only restarting the destination broker helps after the broker starts marking
messages as duplicated.

I'll try to get a network capture of the restablishment of the network
connection, I'd like to see if there's a BrokerInfo command coming to the
destination broker, which seems to be what helps identifying a network
connection in destination.
Not sure of this one though, any hints would be helpful.

I'm running AMQ 5.7.0, using two standalone brokers A and B, with a network
connection A->B which has a failover to a third C broker, which is a slave
of A.
Broker C have been down during the tests, only the network connection uses
the failover protocol.

This is the config in broker A:
         <networkConnectors>
            <networkConnector
uri="static:(failover:(tcp://B:61616,tcp://C:61617)?randomize=false)"/>
        </networkConnectors>

[1]
https://svn.apache.org/viewvc/activemq/trunk/activemq-broker/src/main/java/org/apache/activemq/broker/ProducerBrokerExchange.java?revision=1406629&view=markup



--
View this message in context: http://activemq.2283324.n4.nabble.com/Networked-brokers-dropping-suppressing-messages-tp4662747p4667469.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.

Re: Networked brokers dropping/suppressing messages

Posted by Christian Posta <ch...@gmail.com>.
So i'll give it a shot later today when I get a free moment. Hopefully can
get you an answer.




On Tue, Feb 5, 2013 at 8:33 AM, photonikd <br...@me.com> wrote:

> Hi Christian,
>
> Thank you for your response. I have attached both broker configurations for
> you to review. I am running a continuous testing scenario that is
> performing
> the following tasks:
>
> Produce 2 Messages at DC1 --> Forwarded to DC2 and consumed
> Pause 10 seconds
> Produce 2 Messages at DC2 --> Forwarded to DC1 and consumed.
>
> As of now, I am experiencing a 30-40% failure rate.
>
> Thanks!
> brokerDC1.xml
> <http://activemq.2283324.n4.nabble.com/file/n4662780/brokerDC1.xml>
> brokerDC2.xml
> <http://activemq.2283324.n4.nabble.com/file/n4662780/brokerDC2.xml>
>
>
>
> --
> View this message in context:
> http://activemq.2283324.n4.nabble.com/Networked-brokers-dropping-suppressing-messages-tp4662747p4662780.html
> Sent from the ActiveMQ - User mailing list archive at Nabble.com.
>



-- 
*Christian Posta*
http://www.christianposta.com/blog
twitter: @christianposta

Re: Networked brokers dropping/suppressing messages

Posted by photonikd <br...@me.com>.
Hi Christian,

Thank you for your response. I have attached both broker configurations for
you to review. I am running a continuous testing scenario that is performing
the following tasks:

Produce 2 Messages at DC1 --> Forwarded to DC2 and consumed
Pause 10 seconds
Produce 2 Messages at DC2 --> Forwarded to DC1 and consumed.

As of now, I am experiencing a 30-40% failure rate.

Thanks!
brokerDC1.xml
<http://activemq.2283324.n4.nabble.com/file/n4662780/brokerDC1.xml>  
brokerDC2.xml
<http://activemq.2283324.n4.nabble.com/file/n4662780/brokerDC2.xml>  



--
View this message in context: http://activemq.2283324.n4.nabble.com/Networked-brokers-dropping-suppressing-messages-tp4662747p4662780.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.

Re: Networked brokers dropping/suppressing messages

Posted by Christian Posta <ch...@gmail.com>.
This one seems curious. Can you post your broker configs and I can try on
my side?


On Mon, Feb 4, 2013 at 1:35 PM, photonikd <br...@me.com> wrote:

> Hello All,
>
> I'm currently in the process of configuring 2 ActiveMQ (5.7.0) brokers with
> the primary objective to use persistant queues to sync messages from
> Datacenter 1, to a broker in Datacenter 2 (and vice versa) via a static
> openwire Network Connector. After building the 2 brokers and running some
> validation tests, I am experiencing sporadic behavior where some messages
> that are sent to the remote broker seem to never make it to their
> destination queue. Here is a "visual" representation of the configuration:
>
> Datacenter 1 Producer --> DC1 Broker: SYNC.OUT (Comp Q) --> DC2.SYNC (Phys
> Q)(Statically Included Network Connector)--> Datacenter Broker DC2 -->
> DC2.SYNC (Comp Q) --> SYNC.IN(Phys Q) (Consumer Attached to Q)
>
> As I mentioned perviously, the process is vice versa from DC2 back to DC1
> Datacenter 2 Producer --> DC2 Broker: SYNC.OUT (Comp Q) --> DC1.SYNC (Phys
> Q)(Statically Included Network Connector)--> Datacenter Broker DC1 -->
> DC1.SYNC (Comp Q) --> SYNC.IN(Phys Q) (Consumer Attached to Q)
>
> At first I was concerned that the messages were not actually being
> transferred at a TCP level from datacenter to datacenter as the message
> enqueue/dequeue count on the near end would increment/decrement as
> expected.
> But, the far end queue would never appear to receive the message and
> increment/decrement its queue. After enabling debug logging, I noticed that
> when the lost message behavior would occur, I would see this entry in the
> far end/receiving broker, indicating that the message did indeed arrive:
>
> suppressing duplicate message send
> [ID:brokerDC2-60716-1359583622675-0:0:1:1:272] with producerSequenceId
> [272]
> less than last stored: 328 |
> org.apache.activemq.broker.ProducerBrokerExchange | ActiveMQ Transport:
> tcp:///x.x.x.x:37877@61617
>
> While searching out other users experiencing this problem, I discovered and
> added the following configuration items to activemq.xml:
> (within the network connector configuration)
> suppressDuplicateQueueSubscriptions="false"
> suppressDuplicateTopicSubscriptions="false"
>
> (within the KahaDB configuration)
> <kahaDB maxFailoverProducersToTrack="0" />
>
> Notes
> In order to rule out Datacenter to Datacenter latency (the bandwidth
> between
> these DCs is, let's say, abundant), I built the same environment on 2
> virtual machines, locally and on the same VLAN. Regardless of the
> connectivity situation, the problem still existed in either scenario.
>
> Test Cases
> - After adding the above configuration items and restarting both ActiveMQ
> instances, the problem would still occur.
> - After adding the above configuration items, removing the existing KahaDB
> data directory, restarting the AmQ instance and letting the data directory
> re-create itself, the problem DID NOT occur again.
> - After restarting the brokers again, I was faced with this problem until I
> again cleared out the KahaDB directory.
>
>
> So, I'm left with a few unanswered questions:
> - Why am I receiving duplicate messages or why does AMQ think this is
> happening?
> - Did the configuration changes that I made (in conjunction with the Kaha
> rebuild) actually resolve the issue, if so, why?
> - The Plot Twist! Currently, I am accomplishing message replication with a
> 5.3.0 environment without issue.
>
> Any insight is greatly appreciated.
> Thanks!
>
>
>
>
> --
> View this message in context:
> http://activemq.2283324.n4.nabble.com/Networked-brokers-dropping-suppressing-messages-tp4662747.html
> Sent from the ActiveMQ - User mailing list archive at Nabble.com.
>



-- 
*Christian Posta*
http://www.christianposta.com/blog
twitter: @christianposta