Posted to users@activemq.apache.org by AntonR <an...@volvo.com> on 2020/05/28 14:39:00 UTC

Re: Artemis cluster - Messages stuck in Delivering state

I don't know if anyone is looking into this or has any ideas, but I have
made some new discoveries that might help in figuring out what is going on.

I still have not been able to replicate the issue in a smaller, more
controlled environment, even though pretty much everything is the same with
regard to broker, configuration, application and client setup. I suspect it
might be caused in part by the number of clients in the real environment,
something I cannot really simulate.

What I have found, though, are two workarounds. Neither of them is ideal,
but maybe they can give a hint to someone other than me.

Workaround 1: If I remove the failover nodes from the RA configuration, the
problem does not appear. The config is then roughly:
RA1:
failover:(tcp://broker1:61616)?nested.soLinger=10&nested.soTimeout=200000&jms.rmIdFromConnectionId=true&maxReconnectAttempts=0
RA2:
failover:(tcp://broker2:61616)?nested.soLinger=10&nested.soTimeout=200000&jms.rmIdFromConnectionId=true&maxReconnectAttempts=0
And so on...

This eliminates the issue entirely, but at the cost that an RA and its
corresponding MDB do not fail over and are thus unable to perform any work
for the duration of the broker downtime.

Workaround 2: If I set "initialReconnectDelay" to a value of 5000 or more,
this sort of fixes the issue.
Example of one RA connectionURL:
failover:(tcp://broker1:61616,tcp://broker2:61616,tcp://broker3:61616)?nested.soLinger=10&nested.soTimeout=200000&jms.rmIdFromConnectionId=true&randomize=false&priorityBackup=true&maxReconnectAttempts=0&initialReconnectDelay=5000

This kind of works, but at least with a delay of 5000 I still get the lock
every now and then, with the upside that an additional broker restart fixes
it. I do not want this setup in a production environment, but at least it
sort of works without any major impact on application performance.
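
To make the difference between the two workarounds easier to see, here is a
rough sketch in Python that rebuilds both connection URLs from the same set of
options. It is only string building for comparison purposes. The helper name
failover_url and the COMMON_OPTIONS dict are made up for illustration and are
not part of ActiveMQ, Artemis or the RA; the option names, values and port
61616 are copied from the URLs above.

```python
from urllib.parse import urlencode

# Options present in every URL quoted in this post.
COMMON_OPTIONS = {
    "nested.soLinger": 10,
    "nested.soTimeout": 200000,
    "jms.rmIdFromConnectionId": "true",
}


def failover_url(brokers, **extra_options):
    """Build a failover: URL (hypothetical helper, illustration only).

    brokers       -- list of broker host names; port 61616 is assumed
    extra_options -- options appended after the common ones, in call order
    """
    uris = ",".join(f"tcp://{host}:61616" for host in brokers)
    options = {**COMMON_OPTIONS, **extra_options}
    return f"failover:({uris})?{urlencode(options)}"


# Workaround 1: one broker per RA, so no failover candidates at all.
workaround_1 = [
    failover_url([host], maxReconnectAttempts=0)
    for host in ("broker1", "broker2")
]

# Workaround 2: all brokers listed, plus a reconnect delay of 5000.
workaround_2 = failover_url(
    ["broker1", "broker2", "broker3"],
    randomize="false",
    priorityBackup="true",
    maxReconnectAttempts=0,
    initialReconnectDelay=5000,
)
```

The only differences between the two setups are the number of brokers inside
the parentheses and the extra randomize, priorityBackup and
initialReconnectDelay options in workaround 2.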

Without much evidence to support it, I think the issue might be explained in
the ActiveMQ Failover documentation
<https://activemq.apache.org/failover-transport-reference>. Under
"Transactions" they describe an issue that sounds sort of similar to what I
am seeing, but the release notes for the fix version seem to be offline, so I
have been unable to track down the specific fix implemented. Perhaps it is
something that could be adopted in the Artemis broker as well?

Any thoughts on this? Or is there something inherently incompatible between
my setup and the Artemis broker?

Br,
Anton


