Posted to users@activemq.apache.org by John Lilley <jo...@redpointglobal.com.INVALID> on 2024/01/30 19:34:52 UTC

HA follow-up: can I use a different Session than the Queue's ?

Looking at this more closely, I’m unsure that I can re-create the Connection/Session in the reply-to case.

In our onMessage() handler, the reply-to queue is fetched from the Message
var replyQueue = (Queue)message.getJMSReplyTo();

And presumably it “belongs to” the same session as the received message.  Can I make a whole new Connection/Session/Producer and send to the replyQueue using the new Producer, even though it does not use the Session to which the replyQueue belongs?
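
Concretely, this is roughly what I have in mind (a minimal sketch only; connectionFactory, replyJson, and the class/method names are made up for illustration, and replyQueue is the destination pulled from getJMSReplyTo() above):

import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.Queue;
import javax.jms.Session;

// Sketch only: send to the reply-to destination from a brand-new
// Connection/Session/Producer rather than the session that delivered the message.
class ReplyOnNewConnection {
    void sendReply(ConnectionFactory connectionFactory, Queue replyQueue, String replyJson)
            throws JMSException {
        try (var connection = connectionFactory.createConnection();
             var session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE)) {
            // replyQueue came from message.getJMSReplyTo() in a different session
            var producer = session.createProducer(replyQueue);
            producer.send(session.createTextMessage(replyJson));
        }
    }
}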

I am confused as to whether a Destination really “belongs to” a Session.  ActiveMQDestination does have a field:
private final transient ActiveMQSession session;
but it is often set to null and I don’t know whether it matters.

Please advise.

Thanks
John




[rg] <https://www.redpointglobal.com/>

John Lilley

Data Management Chief Architect, Redpoint Global Inc.

34 Washington Street, Suite 205 Wellesley Hills, MA 02481

M: +1 7209385761 | john.lilley@redpointglobal.com
From: John Lilley <jo...@redpointglobal.com.INVALID>
Sent: Tuesday, January 30, 2024 10:41 AM
To: users@activemq.apache.org
Subject: RE: Question about HA configuration failover problem

Hi Justin,

Thanks for the advice.  We’ll definitely look into reducing those other parameters, because we’d like to get the overall reconnect time under our default RPC timeout.

However, when we see AMQ219014, I would like to “fail different” and avoid the timeout on the 2nd retry.  I’m asking for clarification on what could be done to “change the application itself”, as you mentioned.  I’m assuming that getting a whole new Connection/Session/Producer set would be the way to go, but perhaps you have other suggestions.
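
To make that assumption concrete, here is a very rough sketch of the “fail different” path I am imagining (not real code; closeQuietly() and recreateProducer() are hypothetical helpers standing in for whatever we would actually write to tear down and rebuild the Connection/Session/Producer set):

// Rough sketch only: on AMQ219014, discard the pooled objects and rebuild a
// fresh Connection/Session/Producer before retrying, instead of retrying on
// the possibly-dead connection. closeQuietly()/recreateProducer() are hypothetical.
try {
    producer.send(destination, jmsRequest);
} catch (javax.jms.JMSException ex) {
    if (ex.getMessage() != null && ex.getMessage().contains("AMQ219014")) {
        closeQuietly(producer);           // close/abandon the old Producer, Session, Connection
        producer = recreateProducer();    // new Connection -> Session -> anonymous Producer
        producer.send(destination, jmsRequest);
    } else {
        throw ex;
    }
}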

Thanks
John



[rg] <https://www.redpointglobal.com/>

John Lilley

Data Management Chief Architect, Redpoint Global Inc.

34 Washington Street, Suite 205 Wellesley Hills, MA 02481

M: +1 7209385761 | john.lilley@redpointglobal.com
From: Justin Bertram <jb...@apache.org>
Sent: Monday, January 29, 2024 1:55 PM
To: users@activemq.apache.org
Subject: Re: Question about HA configuration failover problem

> Is this a bug in the AMQ JMS client?

At this point I don't believe it is a bug in the ActiveMQ Artemis core JMS client.

I talked about AMQ219014 previously, but I suppose it bears repeating here. The timeout is ultimately ambiguous. The client can't reliably conclude that the broker has failed due to a timeout like this. It could be the result of a network issue or a broker slow-down for some reason (e.g. long GC pause). The broker may have received the data sent but simply failed to send a response back within the timeout or it may not have received anything. How to respond to the timeout is ultimately up to the application.

In this case the application retries the operation which itself fails after 44 seconds due to a connection loss. I believe the connection loss is based on the default connection TTL of 60 seconds (i.e. 15 + 44 = 59 which is close enough) during which time the client never receives any data from the broker (i.e. no pings, etc.).

> When we encounter this error, should we attempt to close/destroy/recreate the Producer, Session, or Connection?

It's hard to say what you "should" do in this circumstance, but if this delay is too long then you should probably either change your connection URL (e.g. lower your clientFailureCheckPeriod & connectionTTL [1]) or change the application itself (as you mentioned) to deal with it so that it functions appropriately for your use-case.
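
For example, something along these lines on the client connection URL (the host and timeout values are only placeholders; the parameter names are the ones documented at [1]):

// Illustrative only: detect a dead connection after ~10 seconds instead of the 60-second default.
javax.jms.ConnectionFactory cf = new org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory(
        "tcp://broker-host:61616?clientFailureCheckPeriod=5000&connectionTTL=10000");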


Justin

[1] https://activemq.apache.org/components/artemis/documentation/latest/connection-ttl.html#detecting-failure-from-the-client

On Fri, Jan 26, 2024 at 12:51 PM John Lilley <jo...@redpointglobal.com.invalid> wrote:
Greetings,

This is something of a follow-up on previous failover issue reports, but we’ve taken careful notes and logs and hopefully we have enough information to diagnose what is happening.

We are experiencing an error during AMQ broker failover from live to backup.  We are testing this with a load generator of our own devising: multiple threads perform RPC calls, where each request is posted to a named queue and the reply is returned on a reply-to temporary queue.
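
For context, each load-generator thread does roughly the following per RPC call (simplified sketch using plain javax.jms calls, not our actual code; names are illustrative):

// Simplified per-call flow in the load generator (javax.jms; names illustrative).
TemporaryQueue replyQueue = session.createTemporaryQueue();        // reply-to queue
Message request = session.createTextMessage(requestJson);
request.setJMSReplyTo(replyQueue);
session.createProducer(requestQueue).send(request);                // post to the named request queue

try (MessageConsumer replyConsumer = session.createConsumer(replyQueue)) {
    Message reply = replyConsumer.receive(rpcTimeoutMs);           // null means the RPC timed out
}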

Both the JMS client and the broker are version 2.31.2.

Our live (master) broker.xml: https://drive.google.com/file/d/10lDHv13AJXKOHZOLIdT7Cph8pt7VghEl/view?usp=sharing
Our backup (slave) broker.xml: https://drive.google.com/file/d/10gNkpFSABskxaPODFI_1GV-DzMIj4tem/view?usp=sharing

Oddly, the failover from backup to live never has an issue.

Synopsis of the timeline is:

2024-01-23T22:31:20.719 our RPC service attempts to send a reply message to the reply-to queue

2024-01-23T22:31:35.721 the call to Producer.send() fails after 15 seconds: AMQ219014: Timed out after waiting 15000 ms for response when sending packet 45

Our code delays for two seconds and attempts to call Producer.send() again

Meanwhile, the backup AMQ broker has sensed failure and taken over, and is processing messages from *other* clients:
2024-01-23 22:29:58,245 INFO  [org.apache.activemq.artemis.core.server] AMQ221024: Backup server ActiveMQServerImpl::name=backup is synchronized with live server, nodeID=10952195-b6ec-11ee-9c87-aa03cb64206a.
2024-01-23 22:29:58,252 INFO  [org.apache.activemq.artemis.core.server] AMQ221031: backup announced
2024-01-23 22:31:20,720 WARN  [org.apache.activemq.artemis.core.server] AMQ222295: There is a possible split brain on nodeID 10952195-b6ec-11ee-9c87-aa03cb64206a. Topology update ignored
2024-01-23 22:31:20,721 INFO  [org.apache.activemq.artemis.core.server] AMQ221066: Initiating quorum vote: LiveFailoverQuorumVote
2024-01-23 22:31:20,723 INFO  [org.apache.activemq.artemis.core.server] AMQ221084: Requested 0 quorum votes
2024-01-23 22:31:20,723 INFO  [org.apache.activemq.artemis.core.server] AMQ221083: ignoring quorum vote as max cluster size is 1.
2024-01-23 22:31:20,723 INFO  [org.apache.activemq.artemis.core.server] AMQ221071: Failing over based on quorum vote results.
2024-01-23 22:31:20,732 WARN  [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure to dm-activemq-live-svc/10.0.52.174:61616 has been detected: AMQ219015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
2024-01-23 22:31:20,733 WARN  [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure to dm-activemq-live-svc/10.0.52.174:61616 has been detected: AMQ219015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
…
2024-01-23 22:31:21,450 INFO  [org.apache.activemq.artemis.core.server] AMQ221007: Server is now live
2024-01-23 22:31:21,459 INFO  [org.apache.activemq.artemis.core.server] AMQ221020: Started EPOLL Acceptor at 0.0.0.0:61617 for protocols [CORE,MQTT,AMQP,STOMP,HORNETQ,OPENWIRE]
2024-01-23 22:31:21,816 INFO  [net.redpoint.rpdm.artemis_logger.RpdmArtemisLogger] SEND: HEADER= {"version":1,"type":"get_task_status_request","id":"d70xpfwljofo","api":"test_harness","method":"get_task_status","instance":"combined","authorization":"73DZU/fb1A2fFnKdzABPbLzAHVw7Z7VsfSLcQ7VqSBQ="}, BODY={"id":"909b23ae-578f-412d-9706-9f300adb9119","progress_start_index":0,"message_...

But… our second Producer.send() attempt fails again after about 44 seconds:
2024-01-23T22:32:21.936 [Thread-2 (ActiveMQ-client-global-threads)] JmsProducerPool.send_:376 [j6ugszoiu1gl] WARN - Error sending message, will retry javax.jms.JMSException: AMQ219016: Connection failure detected. Unblocking a blocking call that will never get a response
                at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:560)
                at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:452)
                at org.apache.activemq.artemis.core.protocol.core.impl.ActiveMQSessionContext.addressQuery(ActiveMQSessionContext.java:434)
                at org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.addressQuery(ClientSessionImpl.java:808)
                at org.apache.activemq.artemis.jms.client.ActiveMQSession.checkDestination(ActiveMQSession.java:390)
                at org.apache.activemq.artemis.jms.client.ActiveMQMessageProducer.doSendx(ActiveMQMessageProducer.java:406)
                at org.apache.activemq.artemis.jms.client.ActiveMQMessageProducer.send(ActiveMQMessageProducer.java:221)
                at net.redpoint.ipc.jms.JmsProducerPool.send_(JmsProducerPool.java:372)
                at net.redpoint.ipc.jms.JmsProducerPool.sendResponse(JmsProducerPool.java:319)
                at net.redpoint.ipc.jms.JmsRpcServer$RpcReceiver.handleMessage(JmsRpcServer.java:225)
                at net.redpoint.ipc.jms.JmsRpcServer$RpcReceiver.onMessage(JmsRpcServer.java:158)
                at org.apache.activemq.artemis.jms.client.JMSMessageListenerWrapper.onMessage(JMSMessageListenerWrapper.java:110)
                at org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl.callOnMessage(ClientConsumerImpl.java:982)
                at org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl$Runner.run(ClientConsumerImpl.java:1139)
                at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:57)
                at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:32)
                at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:68)
                at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
                at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
                at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)
Caused by: ActiveMQUnBlockedException[errorType=UNBLOCKED message=AMQ219016: Connection failure detected. Unblocking a blocking call that will never get a response]
                ... 20 more
Caused by: ActiveMQDisconnectedException[errorType=DISCONNECTED message=AMQ219015: The connection was disconnected because of server shutdown]
                at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl$CloseRunnable.run(ClientSessionFactoryImpl.java:1172)
                ... 6 more

We retry again, and this third attempt at Producer.send() does succeed, as seen in the backup broker’s log:
2024-01-23 22:32:23,937 INFO  [net.redpoint.rpdm.artemis_logger.RpdmArtemisLogger] SEND: HEADER= {"version":1,"type":"testing_echo_response","id":"j6ugszoiu1gl","http_code":200}…
2024-01-23 22:32:23,937 INFO  [net.redpoint.rpdm.artemis_logger.RpdmArtemisLogger] DELIVER: HEADER= {"version":1,"type":"testing_echo_response","id":"j6ugszoiu1gl","http_code":200}…

Unfortunately, by this time a whole 63 seconds has gone by from the RPC caller’s point of view, and our RPC client timed out and gave up.

It seems to us that the problem can be summarized as "Once the client gets the 'AMQ219014: Timed out after waiting 15000 ms' error, an attempt at retry will fail again after 44 seconds".

It is worth noting that, in our send-retry code, we do not attempt to destroy/recreate the Connection, Session, or Producer; we believe the client should take care of that for us.  It mostly does, except for this one case, and even here it eventually recovers, but the 44-second delay is too long for us.  It is also unclear where that 44-second delay even comes from.

FYI our retry loop looks like:
private static final int SEND_RETRIES = 3;
private static final long SEND_RETRY_DELAY_MS = 2000;
...
var producer = ticket.pi.getProducer();
for (int retry = 0; retry < SEND_RETRIES; retry++) {
  try {
    producer.send(ticket.destination, jmsRequest, producer.getDeliveryMode(), producer.getPriority(), ttlMs);
    break;
  } catch (javax.jms.JMSException ex) {
    if (Arrays.stream(retryableCodes).anyMatch(code -> ex.getMessage().contains(code)) && retry + 1 < SEND_RETRIES) {
      LOG.warn("Error sending message, will retry", ex);
      Thread.sleep(SEND_RETRY_DELAY_MS);
      continue;
    } else {
      throw ex;
    }
  }
}

Also see the thread dump generated at the 60-second mark, which is *after* the first retry fails but *before* the second retry fails (in other words, this is the thread dump of the JVM state when our second attempt at Producer.send() is pending):
https://drive.google.com/file/d/10dIWqAL65zwWMEfN03WGzC_Ya1QayPGB/view?usp=sharing

Our questions come down to two things:

  *   Is this a bug in the AMQ JMS client?
  *   When we encounter this error, should we attempt to close/destroy/recreate the Producer, Session, or Connection?

Please let me know if you can think of a workaround, or if there is more information we should capture.  This problem is readily reproducible.

Thanks
John




[rg] <https://www.redpointglobal.com/>

John Lilley

Data Management Chief Architect, Redpoint Global Inc.

34 Washington Street, Suite 205 Wellesley Hills, MA 02481

M: +1 7209385761 | john.lilley@redpointglobal.com

PLEASE NOTE: This e-mail from Redpoint Global Inc. (“Redpoint”) is confidential and is intended solely for the use of the individual(s) to whom it is addressed. If you believe you received this e-mail in error, please notify the sender immediately, delete the e-mail from your computer and do not copy, print or disclose it to anyone else. If you properly received this e-mail as a customer, partner or vendor of Redpoint, you should maintain its contents in confidence subject to the terms and conditions of your agreement(s) with Redpoint.

Re: HA follow-up: can I use a different Session than the Queue's ?

Posted by Justin Bertram <jb...@apache.org>.
> Can I make a whole new Connection/Session/Producer and send to the
> replyQueue using the new Producer, even though it does not use the Session
> to which the replyQueue belongs?

Yes. That shouldn't be a problem.

The session is only relevant when you're deleting temporary destinations.
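
In other words (a minimal illustration using plain javax.jms calls; variable names are made up):

// Sending to a temporary queue created by another session works fine;
// the session on the destination only matters for delete().
TemporaryQueue replyQueue = consumerSession.createTemporaryQueue();

// Fine from a completely different connection/session:
otherSession.createProducer(replyQueue).send(otherSession.createTextMessage("reply"));

// Deletion goes through the TemporaryQueue object from the owning connection:
replyQueue.delete();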


Justin
