Posted to users@activemq.apache.org by foo bar <st...@gmail.com> on 2021/06/21 13:13:46 UTC

ActiveMQ Artemis Core Bridge stops forwarding messages

Hello,

I have managed to replicate an issue that is occurring in our production
environment with the setup noted below. We are using ActiveMQ Artemis
version 2.15.

There are two brokers (broker1 and broker2). Each broker has only one queue
called test. There is a bridge from broker1 test queue to broker2 test
queue.

<bridges>
 <bridge name="broker1-to-broker2-bridge">
    <queue-name>test</queue-name>
    <static-connectors>
       <connector-ref>broker2-connector</connector-ref>
    </static-connectors>
 </bridge>
</bridges>

Using a Java program (JMS API) we create 10 producers. Each producer puts
200 messages into the broker1 test queue (concurrently). The size of each
message is 70 megabytes. What should happen is that all messages move
across the bridge and eventually end up in the broker2 test queue. What
actually happens is that approximately 1 message (in some cases more) makes
it to the broker2 test queue, while the other messages pile up in the
broker1 test queue. The consumer count on the broker1 test queue is always 1
and the delivering count is 0. At this point nothing will ever get through
the bridge - it is completely sunk. There are absolutely *no log messages*
in broker1 or broker2 that would indicate something is wrong. Producers can,
however, still send messages to the broker1 test queue. In our actual
production environment, where not all messages are 70 megabytes, a restart
of the broker will move messages across the bridge until the issue occurs
again and we need to *manually restart the broker*.

If I rerun the above test with debug on, the only thing that seems to stick
out in the logs is that there are lots of SessionProducerCreditsMessage
entries with credits=0, which leads us to the test below.

If we rerun our test with this configuration:

<bridges>
 <bridge name="broker1-to-broker2-bridge">
    <queue-name>test</queue-name>
    <producer-window-size>209715200</producer-window-size>
    <static-connectors>
       <connector-ref>broker2-connector</connector-ref>
    </static-connectors>
 </bridge>
</bridges>
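
For scale, 209715200 bytes is exactly 200 MiB, so the window configured here covers fewer than three of the 70 MiB test messages. A minimal Java sketch of that arithmetic follows. The class and method names are made up for illustration, and the 70 MiB figure is an assumed payload size that ignores headers and any encoding overhead:

```java
public class WindowArithmetic {
    static final long MIB = 1024L * 1024L;

    // Number of whole messages of messageBytes that fit into windowBytes.
    static long wholeMessagesPerWindow(long windowBytes, long messageBytes) {
        return windowBytes / messageBytes;
    }

    public static void main(String[] args) {
        long windowBytes = 209_715_200L; // value from the bridge configuration
        long messageBytes = 70 * MIB;    // assumed payload size of one test message

        System.out.println(windowBytes / MIB);                                 // 200
        System.out.println(wholeMessagesPerWindow(windowBytes, messageBytes)); // 2
    }
}
```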


The messages all flow from the broker1 test queue to the broker2 test queue.
The documentation says this about producer-window-size:


   - producer-window-size. This optional parameter determines the producer
   flow control through the bridge. You usually leave this off unless you are
   dealing with huge large messages


*Questions*

1.) Could not configuring a producer-window-size on the bridge cause the
issue noted above? It says that it is disabled by default.
2.) What is considered a really huge large message?
3.) Should there not be something in the logs to indicate that something
fatal is going on here?
4.) If you do need to configure the producer-window-size, how do you
determine the size that it should be set to (especially if you get all
sorts of message sizes)?
5.) Is there any reason why we wouldn't just set a very high
producer-window-size on all of our bridges? The only time the bridge should
not send messages is if the other side is down for some reason.

The issue is with Artemis 2.15, but we can also reproduce this on 2.16.

Thanks

Re: ActiveMQ Artemis Core Bridge stops forwarding messages

Posted by foo bar <st...@gmail.com>.

*> Typically this comes down to tuning performance & providing safety for
the remote broker. The higher the producerWindowSize the less often
the producer will ask for credits and therefore the less likely it will be
to block which means the more likely it is to overwhelm the remote broker.*

Which is what we gathered. By that token, the default producer (on a
bridge) should work even for large messages, albeit slightly slower since
it has to ask the broker for credits more often. However, it does not seem
to be working that way with the version that we are using.

*> Can you reproduce this on 2.17.0? How about 2.18.0-SNAPSHOT [3]? *

I have done some preliminary testing with version 2.17 and the 2.18
snapshot. I was **not** able to reproduce the problem with those versions.
Is there something specific in 2.17 that fixes the problem? I didn't see
anything in the release notes or corresponding JIRA tickets.

If we do upgrade, can we use the 2.15 client with the 2.17/2.18 server?

I have included my test broker files and corresponding test.


*Bridge Test*

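import java.util.ArrayList;

import org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory;

public class BridgeTest {

    public static void main(String[] args) throws Exception {
        System.out.println("Start test");

        var threads = new ArrayList<Thread>();
        // 70 * 1024 * 1024 characters of text, shared by all producer threads
        var largeMessage = "X".repeat(1024 * 1024 * 70);
        // Start 10 producer threads; each opens its own connection to broker1
        // and sends one large message to the "test" queue.
        for (int i = 0; i < 10; i++) {
            var thread = new Thread(() -> {
                try (var factory = new ActiveMQConnectionFactory("tcp://localhost:61620");
                     var context = factory.createContext()) {
                    context.createProducer().send(context.createQueue("test"), largeMessage);
                }
            });
            threads.add(thread);
            thread.start();
        }
        // Wait for every producer thread to finish sending.
        for (var thread : threads) {
            thread.join();
        }

        System.out.println("Finish test");
    }
}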

*Broker 1*

<?xml version='1.0'?>
<configuration xmlns="urn:activemq"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xmlns:xi="http://www.w3.org/2001/XInclude"
               xsi:schemaLocation="urn:activemq /schema/artemis-configuration.xsd">

   <core xmlns="urn:activemq:core" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="urn:activemq:core ">

      <name>0.0.0.0</name>

      <persistence-enabled>true</persistence-enabled>

      <journal-type>MAPPED</journal-type>

      <paging-directory>data/paging</paging-directory>

      <bindings-directory>data/bindings</bindings-directory>

      <journal-directory>data/journal</journal-directory>

      <large-messages-directory>data/large-messages</large-messages-directory>

      <journal-datasync>true</journal-datasync>

      <journal-min-files>2</journal-min-files>

      <journal-pool-files>10</journal-pool-files>

      <journal-device-block-size>4096</journal-device-block-size>

      <journal-file-size>10M</journal-file-size>

      <journal-buffer-timeout>0</journal-buffer-timeout>

      <journal-max-io>1</journal-max-io>

      <disk-scan-period>5000</disk-scan-period>

      <max-disk-usage>90</max-disk-usage>

      <critical-analyzer>true</critical-analyzer>

      <critical-analyzer-timeout>120000</critical-analyzer-timeout>

      <critical-analyzer-check-period>60000</critical-analyzer-check-period>

      <critical-analyzer-policy>HALT</critical-analyzer-policy>

      <page-sync-timeout>14368000</page-sync-timeout>

      <connectors>
         <connector name="broker1-connector">tcp://localhost:61620</connector>
         <connector name="broker2-connector">tcp://localhost:61621</connector>
      </connectors>

      <acceptors>
         <acceptor name="broker1-acceptor">tcp://localhost:61620</acceptor>
      </acceptors>

      <bridges>
         <bridge name="broker1-to-broker2-bridge">
            <queue-name>test</queue-name>
           <!-- <producer-window-size>209715200</producer-window-size> -->
            <static-connectors>
               <connector-ref>broker2-connector</connector-ref>
            </static-connectors>
         </bridge>
      </bridges>

      <security-settings>
         <security-setting match="#">
            <permission type="createNonDurableQueue" roles="amq"/>
            <permission type="deleteNonDurableQueue" roles="amq"/>
            <permission type="createDurableQueue" roles="amq"/>
            <permission type="deleteDurableQueue" roles="amq"/>
            <permission type="createAddress" roles="amq"/>
            <permission type="deleteAddress" roles="amq"/>
            <permission type="consume" roles="amq"/>
            <permission type="browse" roles="amq"/>
            <permission type="send" roles="amq"/>
            <!-- we need this otherwise ./artemis data imp wouldn't work -->
            <permission type="manage" roles="amq"/>
         </security-setting>
      </security-settings>

      <address-settings>
         <!-- if you define auto-create on certain queues, management has to be auto-create -->
         <address-setting match="activemq.management#">
            <dead-letter-address>DLQ</dead-letter-address>
            <expiry-address>ExpiryQueue</expiry-address>
            <redelivery-delay>0</redelivery-delay>
            <!-- with -1 only the global-max-size is in use for limiting -->
            <max-size-bytes>-1</max-size-bytes>
            <message-counter-history-day-limit>10</message-counter-history-day-limit>
            <address-full-policy>PAGE</address-full-policy>
            <auto-create-queues>true</auto-create-queues>
            <auto-create-addresses>true</auto-create-addresses>
            <auto-create-jms-queues>true</auto-create-jms-queues>
            <auto-create-jms-topics>true</auto-create-jms-topics>
         </address-setting>
         <!--default for catch all-->
         <address-setting match="#">
            <dead-letter-address>DLQ</dead-letter-address>
            <expiry-address>ExpiryQueue</expiry-address>
            <redelivery-delay>0</redelivery-delay>
            <!-- with -1 only the global-max-size is in use for limiting -->
            <max-size-bytes>-1</max-size-bytes>
            <message-counter-history-day-limit>10</message-counter-history-day-limit>
            <address-full-policy>PAGE</address-full-policy>
            <auto-create-queues>true</auto-create-queues>
            <auto-create-addresses>true</auto-create-addresses>
            <auto-create-jms-queues>true</auto-create-jms-queues>
            <auto-create-jms-topics>true</auto-create-jms-topics>
         </address-setting>
      </address-settings>

      <addresses>
         <address name="DLQ">
            <anycast>
               <queue name="DLQ" />
            </anycast>
         </address>
         <address name="test">
            <anycast>
               <queue name="test" />
            </anycast>
         </address>
      </addresses>

   </core>
</configuration>


*Broker 2*

<?xml version='1.0'?>
<configuration xmlns="urn:activemq"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xmlns:xi="http://www.w3.org/2001/XInclude"
               xsi:schemaLocation="urn:activemq /schema/artemis-configuration.xsd">

   <core xmlns="urn:activemq:core" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="urn:activemq:core ">

      <name>0.0.0.0</name>

      <persistence-enabled>true</persistence-enabled>

      <journal-type>MAPPED</journal-type>

      <paging-directory>data/paging</paging-directory>

      <bindings-directory>data/bindings</bindings-directory>

      <journal-directory>data/journal</journal-directory>

      <large-messages-directory>data/large-messages</large-messages-directory>

      <journal-datasync>true</journal-datasync>

      <journal-min-files>2</journal-min-files>

      <journal-pool-files>10</journal-pool-files>

      <journal-device-block-size>4096</journal-device-block-size>

      <journal-file-size>10M</journal-file-size>

      <journal-buffer-timeout>0</journal-buffer-timeout>

      <journal-max-io>1</journal-max-io>

      <disk-scan-period>5000</disk-scan-period>

      <max-disk-usage>90</max-disk-usage>

      <critical-analyzer>true</critical-analyzer>

      <critical-analyzer-timeout>120000</critical-analyzer-timeout>

      <critical-analyzer-check-period>60000</critical-analyzer-check-period>

      <critical-analyzer-policy>HALT</critical-analyzer-policy>

      <page-sync-timeout>14420000</page-sync-timeout>

      <connectors>
         <connector name="broker2-connector">tcp://localhost:61621</connector>
      </connectors>

      <acceptors>
         <acceptor name="broker2-acceptor">tcp://localhost:61621</acceptor>
      </acceptors>

      <security-settings>
         <security-setting match="#">
            <permission type="createNonDurableQueue" roles="amq"/>
            <permission type="deleteNonDurableQueue" roles="amq"/>
            <permission type="createDurableQueue" roles="amq"/>
            <permission type="deleteDurableQueue" roles="amq"/>
            <permission type="createAddress" roles="amq"/>
            <permission type="deleteAddress" roles="amq"/>
            <permission type="consume" roles="amq"/>
            <permission type="browse" roles="amq"/>
            <permission type="send" roles="amq"/>
            <!-- we need this otherwise ./artemis data imp wouldn't work -->
            <permission type="manage" roles="amq"/>
         </security-setting>
      </security-settings>

      <address-settings>
         <!-- if you define auto-create on certain queues, management has to be auto-create -->
         <address-setting match="activemq.management#">
            <dead-letter-address>DLQ</dead-letter-address>
            <expiry-address>ExpiryQueue</expiry-address>
            <redelivery-delay>0</redelivery-delay>
            <!-- with -1 only the global-max-size is in use for limiting -->
            <max-size-bytes>-1</max-size-bytes>
            <message-counter-history-day-limit>10</message-counter-history-day-limit>
            <address-full-policy>PAGE</address-full-policy>
            <auto-create-queues>true</auto-create-queues>
            <auto-create-addresses>true</auto-create-addresses>
            <auto-create-jms-queues>true</auto-create-jms-queues>
            <auto-create-jms-topics>true</auto-create-jms-topics>
         </address-setting>
         <!--default for catch all-->
         <address-setting match="#">
            <dead-letter-address>DLQ</dead-letter-address>
            <expiry-address>ExpiryQueue</expiry-address>
            <redelivery-delay>0</redelivery-delay>
            <!-- with -1 only the global-max-size is in use for limiting -->
            <max-size-bytes>-1</max-size-bytes>
            <message-counter-history-day-limit>10</message-counter-history-day-limit>
            <address-full-policy>PAGE</address-full-policy>
            <auto-create-queues>true</auto-create-queues>
            <auto-create-addresses>true</auto-create-addresses>
            <auto-create-jms-queues>true</auto-create-jms-queues>
            <auto-create-jms-topics>true</auto-create-jms-topics>
         </address-setting>
      </address-settings>

      <addresses>
         <address name="DLQ">
            <anycast>
               <queue name="DLQ" />
            </anycast>
         </address>
         <address name="test">
            <anycast>
               <queue name="test" />
            </anycast>
         </address>
      </addresses>

   </core>
</configuration>





On Mon, Jun 21, 2021 at 1:59 PM Justin Bertram <jb...@apache.org> wrote:

> > Could not configuring a producer-window-size on the bridge cause the
> issue noted above? It says that it is disabled by default.
>
> Clearly it's connected at some level, but it's hard to say more without a
> better understanding of the root-cause.
>
> > What is considered a really huge large message?
>
> This documentation was added as part of ARTEMIS-94 [1] where the size of a
> message exceeded the max value of a Java Integer [2] so this appears to be
> the threshold of "really huge".
>
> > Should there not be something in the logs to indicate that something
> fatal is going here?
>
> Possibly. Again, it's hard to say more without understanding the
> root-cause.
>
> > If you do need to configure the producer-window-size, how do you
> determine the size that it should be set to (especially if you get all
> sorts of message sizes)?
>
> Typically this comes down to tuning performance & providing safety for the
> remote broker. The higher the producerWindowSize the less often the
> producer will ask for credits and therefore the less likely it will be to
> block which means the more likely it is to overwhelm the remote broker.
>
> > Is there any reason why we wouldn't just set a very high
> producer-window-size on all of our bridges? The only time the bridge should
> not send messages is if the other side is down for some reason.
>
> See my previous answer.
>
>
> Can you reproduce this on 2.17.0? How about 2.18.0-SNAPSHOT [3]?
>
>
> Justin
>
> [1] https://issues.apache.org/jira/browse/ARTEMIS-94
> [2] https://docs.oracle.com/javase/8/docs/api/java/lang/Integer.html#MAX_VALUE
> [3] https://repository.apache.org/content/repositories/snapshots/org/apache/activemq/apache-artemis/2.18.0-SNAPSHOT/
>

Re: ActiveMQ Artemis Core Bridge stops forwarding messages

Posted by Justin Bertram <jb...@apache.org>.

> Could not configuring a producer-window-size on the bridge cause the
issue noted above? It says that it is disabled by default.

Clearly it's connected at some level, but it's hard to say more without a
better understanding of the root-cause.

> What is considered a really huge large message?

This documentation was added as part of ARTEMIS-94 [1] where the size of a
message exceeded the max value of a Java Integer [2] so this appears to be
the threshold of "really huge".
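
For reference, Integer.MAX_VALUE is 2147483647, so a body of that many bytes is just under 2 GiB. A quick Java check of how the 70 MB test messages from this thread compare against that threshold is sketched below. The class and method names are illustrative only, and the 70 MiB figure is an assumed payload size:

```java
public class HugeThreshold {
    static final long MIB = 1024L * 1024L;

    // True when a body of sizeBytes is too large to be counted with a Java int.
    static boolean exceedsIntRange(long sizeBytes) {
        return sizeBytes > Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        long testMessage = 70 * MIB; // assumed size of one test message

        System.out.println(Integer.MAX_VALUE);                // 2147483647
        System.out.println(exceedsIntRange(testMessage));     // false
        System.out.println(exceedsIntRange(3L * 1024 * MIB)); // true (3 GiB)
    }
}
```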

> Should there not be something in the logs to indicate that something
fatal is going here?

Possibly. Again, it's hard to say more without understanding the root-cause.

> If you do need to configure the producer-window-size, how do you
determine the size that it should be set to (especially if you get all
sorts of message sizes)?

Typically this comes down to tuning performance & providing safety for the
remote broker. The higher the producerWindowSize the less often the
producer will ask for credits and therefore the less likely it will be to
block which means the more likely it is to overwhelm the remote broker.
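
The trade-off can be illustrated with a toy model in Java. This is only a sketch of credit-based flow control in general, not Artemis's actual implementation: the class name, the 64 KiB chunk size, and the rule that the sender asks for one full window whenever its balance runs short are all assumptions made for illustration.

```java
public class CreditModel {
    static final long KIB = 1024L;
    static final long MIB = 1024L * KIB;

    // Counts how often a sender must ask for credits to push totalBytes
    // through in chunks of chunkBytes, when each request grants windowSize
    // bytes of credit and unused credit carries over.
    static long creditRequests(long totalBytes, long chunkBytes, long windowSize) {
        long credits = 0;
        long requests = 0;
        long sent = 0;
        while (sent < totalBytes) {
            long chunk = Math.min(chunkBytes, totalBytes - sent);
            while (credits < chunk) { // not enough credit: ask for another window
                credits += windowSize;
                requests++;
            }
            credits -= chunk;
            sent += chunk;
        }
        return requests;
    }

    public static void main(String[] args) {
        long message = 70 * MIB; // one test message, assumed payload size
        long chunk = 64 * KIB;   // hypothetical chunk size

        System.out.println(creditRequests(message, chunk, 1 * MIB));   // 70
        System.out.println(creditRequests(message, chunk, 200 * MIB)); // 1
    }
}
```

In this model a larger window means fewer round trips for credits, but it also lets the sender push more data before the receiver gets a chance to slow it down.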

> Is there any reason why we wouldn't just set a very high
producer-window-size on all of our bridges? The only time the bridge should
not send messages is if the other side is down for some reason.

See my previous answer.


Can you reproduce this on 2.17.0? How about 2.18.0-SNAPSHOT [3]?


Justin

[1] https://issues.apache.org/jira/browse/ARTEMIS-94
[2] https://docs.oracle.com/javase/8/docs/api/java/lang/Integer.html#MAX_VALUE
[3] https://repository.apache.org/content/repositories/snapshots/org/apache/activemq/apache-artemis/2.18.0-SNAPSHOT/
