You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@activemq.apache.org by GitBox <gi...@apache.org> on 2020/12/05 23:11:09 UTC

[GitHub] [activemq-artemis] franz1981 opened a new pull request #3370: ARTEMIS-3021 OOM due to wrong CORE message memory estimation

franz1981 opened a new pull request #3370:
URL: https://github.com/apache/activemq-artemis/pull/3370


   https://issues.apache.org/jira/browse/ARTEMIS-3021


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [activemq-artemis] franz1981 commented on a change in pull request #3370: ARTEMIS-3021 OOM due to wrong CORE message memory estimation

Posted by GitBox <gi...@apache.org>.
franz1981 commented on a change in pull request #3370:
URL: https://github.com/apache/activemq-artemis/pull/3370#discussion_r552075767



##########
File path: artemis-core-client/src/main/java/org/apache/activemq/artemis/core/message/impl/CoreMessage.java
##########
@@ -622,6 +619,12 @@ private RuntimeException onCheckPropertiesError(Throwable e) {
    @Override
    public int getMemoryEstimate() {
       if (memoryEstimate == -1) {
+         // durable messages can enlarge buffer::capacity while persisted:
+         // better to encode it to properly account its max capacity
+         if (isDurable()) {
+            checkEncode();

Review comment:
       That means that a durable message would get encoded some time before then expected: @clebertsuconic it's something we can tolerate? 
   if not the ideal thing would be to estimate it based on its state, but I'm not sure is feasable.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [activemq-artemis] franz1981 edited a comment on pull request #3370: ARTEMIS-3021 OOM due to wrong CORE message memory estimation

Posted by GitBox <gi...@apache.org>.
franz1981 edited a comment on pull request #3370:
URL: https://github.com/apache/activemq-artemis/pull/3370#issuecomment-763872471


   @clebertsuconic @jbertram 
   I see that we've some concurrency issue on `TypedProperties`: 
   ```
   org.apache.activemq.artemis.core.protocol.hornetq.PropertiesConversionTest.testMultiThreadChanges
   org.apache.activemq.artemis.tests.compatibility.HornetQSoakTest.testSoakHornetQ on testSoakHornetQ(org.apache.activemq.artemis.tests.compatibility.HornetQSoakTest)
   org.apache.activemq.artemis.tests.integration.cluster.failover.ReplicatedMultipleServerFailoverExtraBackupsTest.testStartLiveFirst
   ```
   These tests were failing with my first version of PR that was enabling some strict assertion to check for concurrent modification of `TypedProperties` during `CoreMessage::encode` calls.
   I believe the change I've made here to replaced the assert with a debug log is masking a real (rare) issue. wdyt?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [activemq-artemis] michaelandrepearce commented on a change in pull request #3370: ARTEMIS-3021 OOM due to wrong CORE message memory estimation

Posted by GitBox <gi...@apache.org>.
michaelandrepearce commented on a change in pull request #3370:
URL: https://github.com/apache/activemq-artemis/pull/3370#discussion_r578785892



##########
File path: artemis-core-client/src/main/java/org/apache/activemq/artemis/core/message/impl/CoreMessage.java
##########
@@ -426,7 +426,6 @@ protected CoreMessage(CoreMessage other, TypedProperties copyProperties) {
       // with getEncodedBuffer(), otherwise can introduce race condition when delivering concurrently to
       // many subscriptions and bridging to other nodes in a cluster
       synchronized (other) {
-         this.body = other.body;

Review comment:
       I might be a plank, but surely this avoids using up memory as the body object is copied by reference here, rather than later needing to deserialise again and creating a new object in memory.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [activemq-artemis] franz1981 commented on pull request #3370: ARTEMIS-3021 OOM due to wrong CORE message memory estimation

Posted by GitBox <gi...@apache.org>.
franz1981 commented on pull request #3370:
URL: https://github.com/apache/activemq-artemis/pull/3370#issuecomment-740451251


   @gemmellr it seems that AMQP isn't affected by this because a pooled Netty (heap) buffer is used to encode stuff and then copy an exact sized byte[] where needed but I see another issue and is exactly the polled Netty heap buffer.
   That stealthy makes the broker memory footprint on heap much larger because Netty would create large byte[] arenas. I suggest to not use pooled heap buffer or it can stealthy cause OOM as well.
   This means either:
   - use off heap pooled Netty buffers to temporarily encode things 
   - use unpooled heap smartly sizes ones to save both the additional copy and pooling as a whole
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [activemq-artemis] gtully commented on a change in pull request #3370: ARTEMIS-3021 OOM due to wrong CORE message memory estimation

Posted by GitBox <gi...@apache.org>.
gtully commented on a change in pull request #3370:
URL: https://github.com/apache/activemq-artemis/pull/3370#discussion_r749395283



##########
File path: artemis-commons/src/main/java/org/apache/activemq/artemis/utils/ByteUtil.java
##########
@@ -389,6 +389,16 @@ private static boolean equalsUnsafe(final byte[] left,
       return true;
    }
 
+   /**
+    * This ensure a more exact resizing then {@link ByteBuf#ensureWritable(int), if needed.<br>

Review comment:
       this seems problematic:
   /home/runner/work/activemq-artemis/activemq-artemis/artemis-commons/src/main/java/org/apache/activemq/artemis/utils/ByteUtil.java:393: error: unexpected text after parenthesis
   Error:      * This ensure a more exact resizing then {@link ByteBuf#ensureWritable(int), if needed.<br>
   Error:                                               ^




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@activemq.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [activemq-artemis] franz1981 commented on pull request #3370: ARTEMIS-3021 OOM due to wrong CORE message memory estimation

Posted by GitBox <gi...@apache.org>.
franz1981 commented on pull request #3370:
URL: https://github.com/apache/activemq-artemis/pull/3370#issuecomment-766613506


   Seems that writing a proper reproducer for this is hard: I can reproduce it by hand by using a 2 nodes cluster. (upstream and downstream), a disconnected durable subscriber on the downstream broker, 40K messages and a producer that keep running vs the upstream one until getting the downstream broker to go OOM. On this scenario a dump on the upstream broker shows 64KiB buffer sized core messages ie the next available power of 2 sized one.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [activemq-artemis] franz1981 commented on a change in pull request #3370: ARTEMIS-3021 OOM due to wrong CORE message memory estimation

Posted by GitBox <gi...@apache.org>.
franz1981 commented on a change in pull request #3370:
URL: https://github.com/apache/activemq-artemis/pull/3370#discussion_r552075767



##########
File path: artemis-core-client/src/main/java/org/apache/activemq/artemis/core/message/impl/CoreMessage.java
##########
@@ -622,6 +619,12 @@ private RuntimeException onCheckPropertiesError(Throwable e) {
    @Override
    public int getMemoryEstimate() {
       if (memoryEstimate == -1) {
+         // durable messages can enlarge buffer::capacity while persisted:
+         // better to encode it to properly account its max capacity
+         if (isDurable()) {
+            checkEncode();

Review comment:
       That means that a durable message would get encoded some time before then expected: @clebertsuconic it's something we can tolerate? 
   if not the ideal thing would be to estimate it based on its state, but I'm not sure is feasible.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [activemq-artemis] franz1981 edited a comment on pull request #3370: ARTEMIS-3021 OOM due to wrong CORE message memory estimation

Posted by GitBox <gi...@apache.org>.
franz1981 edited a comment on pull request #3370:
URL: https://github.com/apache/activemq-artemis/pull/3370#issuecomment-763872471


   @clebertsuconic @jbertram 
   I see that we've some concurrency issue on `TypedProperties`: 
   ```
   org.apache.activemq.artemis.core.protocol.hornetq.PropertiesConversionTest.testMultiThreadChanges
   org.apache.activemq.artemis.tests.compatibility.HornetQSoakTest.testSoakHornetQ on testSoakHornetQ(org.apache.activemq.artemis.tests.compatibility.HornetQSoakTest)
   org.apache.activemq.artemis.tests.integration.cluster.failover.ReplicatedMultipleServerFailoverExtraBackupsTest.testStartLiveFirst
   ```
   These tests were failing with my first version of PR that was enabling some strict assertion to check for concurrent modification of `TypedProperties` during `CoreMessage::encode` calls.
   I believe the change I've made here to replace the assert with a debug log is masking a real (rare) issue. wdyt?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [activemq-artemis] franz1981 commented on pull request #3370: ARTEMIS-3021 OOM due to wrong CORE message memory estimation

Posted by GitBox <gi...@apache.org>.
franz1981 commented on pull request #3370:
URL: https://github.com/apache/activemq-artemis/pull/3370#issuecomment-970395697


   I need to check if https://github.com/apache/activemq-artemis/pull/3855 together with these changes isn't breaking lazy decoding of core messages on journal loading optimization


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@activemq.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [activemq-artemis] franz1981 commented on pull request #3370: ARTEMIS-3021 OOM due to wrong CORE message memory estimation

Posted by GitBox <gi...@apache.org>.
franz1981 commented on pull request #3370:
URL: https://github.com/apache/activemq-artemis/pull/3370#issuecomment-759505479


   This is breaking many tests, need to understand what's going on
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [activemq-artemis] franz1981 commented on pull request #3370: ARTEMIS-3021 OOM due to wrong CORE message memory estimation

Posted by GitBox <gi...@apache.org>.
franz1981 commented on pull request #3370:
URL: https://github.com/apache/activemq-artemis/pull/3370#issuecomment-983325073


   Given that CI is green and I got reviewed by @clebertsuconic about it, I think this is ready to go...merging soon today if none complain :+1: 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@activemq.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [activemq-artemis] franz1981 commented on a change in pull request #3370: ARTEMIS-3021 OOM due to wrong CORE message memory estimation

Posted by GitBox <gi...@apache.org>.
franz1981 commented on a change in pull request #3370:
URL: https://github.com/apache/activemq-artemis/pull/3370#discussion_r578949930



##########
File path: artemis-core-client/src/main/java/org/apache/activemq/artemis/core/message/impl/CoreMessage.java
##########
@@ -426,7 +426,6 @@ protected CoreMessage(CoreMessage other, TypedProperties copyProperties) {
       // with getEncodedBuffer(), otherwise can introduce race condition when delivering concurrently to
       // many subscriptions and bridging to other nodes in a cluster
       synchronized (other) {
-         this.body = other.body;

Review comment:
       That body param seems was unused, I just clean it up as dead code




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [activemq-artemis] franz1981 commented on pull request #3370: ARTEMIS-3021 OOM due to wrong CORE message memory estimation

Posted by GitBox <gi...@apache.org>.
franz1981 commented on pull request #3370:
URL: https://github.com/apache/activemq-artemis/pull/3370#issuecomment-739480817


   @gemmellr I suppose that the Netty ByteBuf behaviour in case the buffer is enlarged can (and should) be fixed for AMQP as well:
   TLDR if you got a 32K msg and write a single byte into it, Netty will enlarge it to be the next power of 2 capacity ie 64K, wasting ~50% of its space in case the message will sit on the queue awaiting it to be sent.
   
   I've "solved" this by forcing Netty to know upfront the desired required capacity and use https://netty.io/4.0/api/io/netty/buffer/ByteBuf.html#capacity-int- to size it upfront.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [activemq-artemis] franz1981 commented on pull request #3370: ARTEMIS-3021 OOM due to wrong CORE message memory estimation

Posted by GitBox <gi...@apache.org>.
franz1981 commented on pull request #3370:
URL: https://github.com/apache/activemq-artemis/pull/3370#issuecomment-969115886


   I see that we change copied message address while moving messages and that is causing re-encoding: it would be better to save it to happen as a whole...adding a commit to address it


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@activemq.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [activemq-artemis] franz1981 edited a comment on pull request #3370: ARTEMIS-3021 OOM due to wrong CORE message memory estimation

Posted by GitBox <gi...@apache.org>.
franz1981 edited a comment on pull request #3370:
URL: https://github.com/apache/activemq-artemis/pull/3370#issuecomment-766613506


   Seems that writing a proper reproducer for this is hard: I can reproduce it by hand by using a 2 nodes cluster. (upstream and downstream), a disconnected durable subscriber on the downstream broker, 40K messages and a producer that keep running vs the upstream one until getting the downstream broker to go OOM. On this scenario an heap dump on the upstream broker shows 64KiB buffer sized core messages ie the next available power of 2 sized one.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [activemq-artemis] franz1981 commented on pull request #3370: ARTEMIS-3021 OOM due to wrong CORE message memory estimation

Posted by GitBox <gi...@apache.org>.
franz1981 commented on pull request #3370:
URL: https://github.com/apache/activemq-artemis/pull/3370#issuecomment-764505865


   the CI seems fine with this, so I'm just going to prepare a test for the specific change and we're ready to go 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [activemq-artemis] franz1981 commented on pull request #3370: ARTEMIS-3021 OOM due to wrong CORE message memory estimation

Posted by GitBox <gi...@apache.org>.
franz1981 commented on pull request #3370:
URL: https://github.com/apache/activemq-artemis/pull/3370#issuecomment-970076874


   CI was green but I would like to add a benchmark + a test to verify it won't regress in the future
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@activemq.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [activemq-artemis] franz1981 commented on a change in pull request #3370: ARTEMIS-3021 OOM due to wrong CORE message memory estimation

Posted by GitBox <gi...@apache.org>.
franz1981 commented on a change in pull request #3370:
URL: https://github.com/apache/activemq-artemis/pull/3370#discussion_r753982983



##########
File path: artemis-core-client/src/main/java/org/apache/activemq/artemis/core/message/impl/CoreMessage.java
##########
@@ -748,21 +754,42 @@ public synchronized CoreMessage encode() {
 
    public void encodeHeadersAndProperties(final ByteBuf buffer) {
       final TypedProperties properties = getProperties();
-      messageIDPosition = buffer.writerIndex();
-      buffer.writeLong(messageID);
-      SimpleString.writeNullableSimpleString(buffer, address);
-      if (userID == null) {
-         buffer.writeByte(DataConstants.NULL);
-      } else {
-         buffer.writeByte(DataConstants.NOT_NULL);
-         buffer.writeBytes(userID.asBytes());
+      final int initialWriterIndex = buffer.writerIndex();
+      messageIDPosition = initialWriterIndex;
+      final UUID userID = this.userID;
+      final int userIDEncodedSize = userID == null ? Byte.BYTES : Byte.BYTES + userID.asBytes().length;
+      final SimpleString address = this.address;
+      final int addressEncodedBytes = SimpleString.sizeofNullableString(address);
+      final int headersSize =
+         Long.BYTES +                                    // messageID
+         addressEncodedBytes +                           // address
+         userIDEncodedSize +                             // userID
+         Byte.BYTES +                                    // type
+         Byte.BYTES +                                    // durable
+         Long.BYTES +                                    // expiration
+         Long.BYTES +                                    // timestamp
+         Byte.BYTES;                                     // priority
+      synchronized (properties) {

Review comment:
       @clebertsuconic this is needed because there are tests/cases where `properties`'s content can change concurrently and the presized underlying  buffer using `properties::getEncodeSize` is no longer valid.
   
   I think we shouldn't allow `properties` to change concurrently while encoding is happening: is it a valid use case? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@activemq.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [activemq-artemis] franz1981 edited a comment on pull request #3370: ARTEMIS-3021 OOM due to wrong CORE message memory estimation

Posted by GitBox <gi...@apache.org>.
franz1981 edited a comment on pull request #3370:
URL: https://github.com/apache/activemq-artemis/pull/3370#issuecomment-740451251


   @gemmellr it seems that AMQP isn't affected by this because a pooled Netty (heap) buffer is used to encode stuff and then copy an exact sized byte[] where needed but I see another issue and is the pooled Netty heap buffer.
   That stealthy makes the broker memory footprint on heap much larger because Netty would create large byte[] arenas. I suggest to not use pooled heap buffer or it can stealthy cause OOM as well.
   This means either:
   - use off heap pooled Netty buffers to temporarily encode things 
   - use unpooled heap smartly sizes ones to save both the additional copy and pooling as a whole
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [activemq-artemis] franz1981 edited a comment on pull request #3370: ARTEMIS-3021 OOM due to wrong CORE message memory estimation

Posted by GitBox <gi...@apache.org>.
franz1981 edited a comment on pull request #3370:
URL: https://github.com/apache/activemq-artemis/pull/3370#issuecomment-766613506






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [activemq-artemis] franz1981 edited a comment on pull request #3370: ARTEMIS-3021 OOM due to wrong CORE message memory estimation

Posted by GitBox <gi...@apache.org>.
franz1981 edited a comment on pull request #3370:
URL: https://github.com/apache/activemq-artemis/pull/3370#issuecomment-739480817


   @gemmellr I suppose that the Netty ByteBuf behaviour in case the buffer is enlarged can (and should) be fixed for AMQP as well:
   TLDR if you got a 32K msg and write a single byte into it, Netty will enlarge it to be the next power of 2 capacity ie 64K, wasting ~50% of its space in case the message will sit on the queue awaiting it to be sent.
   
   I've "solved" this by forcing Netty to know upfront the desired required capacity and use https://netty.io/4.0/api/io/netty/buffer/ByteBuf.html#capacity-int- to size it upfront (instead of `ensureWritable`)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [activemq-artemis] michaelandrepearce commented on a change in pull request #3370: ARTEMIS-3021 OOM due to wrong CORE message memory estimation

Posted by GitBox <gi...@apache.org>.
michaelandrepearce commented on a change in pull request #3370:
URL: https://github.com/apache/activemq-artemis/pull/3370#discussion_r578787230



##########
File path: artemis-core-client/src/main/java/org/apache/activemq/artemis/core/message/impl/CoreMessage.java
##########
@@ -426,7 +426,6 @@ protected CoreMessage(CoreMessage other, TypedProperties copyProperties) {
       // with getEncodedBuffer(), otherwise can introduce race condition when delivering concurrently to
       // many subscriptions and bridging to other nodes in a cluster
       synchronized (other) {
-         this.body = other.body;

Review comment:
       as well as a perf improvement as serialisation is a time burn.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [activemq-artemis] franz1981 commented on pull request #3370: ARTEMIS-3021 OOM due to wrong CORE message memory estimation

Posted by GitBox <gi...@apache.org>.
franz1981 commented on pull request #3370:
URL: https://github.com/apache/activemq-artemis/pull/3370#issuecomment-763874659


   I still need to write a test that would show how clustered messages were getting a too wrong estimation without this change.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [activemq-artemis] franz1981 commented on pull request #3370: ARTEMIS-3021 OOM due to wrong CORE message memory estimation

Posted by GitBox <gi...@apache.org>.
franz1981 commented on pull request #3370:
URL: https://github.com/apache/activemq-artemis/pull/3370#issuecomment-763872471


   @clebertsuconic @jbertram 
   I see that we've some concurrency issue on `TypedProperties`: 
   ```
   org.apache.activemq.artemis.core.protocol.hornetq.PropertiesConversionTest.testMultiThreadChanges
   org.apache.activemq.artemis.tests.compatibility.HornetQSoakTest.testSoakHornetQ on testSoakHornetQ(org.apache.activemq.artemis.tests.compatibility.HornetQSoakTest)
   org.apache.activemq.artemis.tests.integration.cluster.failover.ReplicatedMultipleServerFailoverExtraBackupsTest.testStartLiveFirst
   ```
   These tests were failing with my first version of PR that was enabling some strict assertion to check for concurrent modification of `TypedProperty` during `CoreMessage::encode` calls.
   I believe the change I've made here to replaced the assert with a debug log is masking a real (rare) issue. wdyt?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [activemq-artemis] franz1981 edited a comment on pull request #3370: ARTEMIS-3021 OOM due to wrong CORE message memory estimation

Posted by GitBox <gi...@apache.org>.
franz1981 edited a comment on pull request #3370:
URL: https://github.com/apache/activemq-artemis/pull/3370#issuecomment-766613506


   Seems that writing a proper reproducer for this is hard: I can reproduce it by hand by using a 2 nodes cluster (upstream and downstream), a disconnected durable subscriber on the downstream broker, 40K messages and a producer that keep running vs the upstream one until getting the downstream broker to go OOM. On this scenario an heap dump on the upstream broker shows 64KiB buffer sized core messages ie the next available power of 2 sized one.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [activemq-artemis] franz1981 commented on pull request #3370: ARTEMIS-3021 OOM due to wrong CORE message memory estimation

Posted by GitBox <gi...@apache.org>.
franz1981 commented on pull request #3370:
URL: https://github.com/apache/activemq-artemis/pull/3370#issuecomment-766613506


   Seems that writing a proper reproducer for this is hard: I can reproduce it by hand by using a 2 nodes cluster. (upstream and downstream), a disconnected durable subscriber on the downstream broker, 40K messages and a producer that keep running vs the upstream one until getting the downstream broker to go OOM. On this scenario a dump on the upstream broker shows 64KiB buffer sized core messages ie the next available power of 2 sized one.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [activemq-artemis] franz1981 edited a comment on pull request #3370: ARTEMIS-3021 OOM due to wrong CORE message memory estimation

Posted by GitBox <gi...@apache.org>.
franz1981 edited a comment on pull request #3370:
URL: https://github.com/apache/activemq-artemis/pull/3370#issuecomment-763874659


   I still need to write a test that shows how clustered messages were getting a too wrong estimation without this change.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [activemq-artemis] franz1981 commented on pull request #3370: ARTEMIS-3021 OOM due to wrong CORE message memory estimation

Posted by GitBox <gi...@apache.org>.
franz1981 commented on pull request #3370:
URL: https://github.com/apache/activemq-artemis/pull/3370#issuecomment-764505865


   the CI seems fine with this, so I'm just going to prepare a test for the specific change and we're ready to go 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [activemq-artemis] franz1981 edited a comment on pull request #3370: ARTEMIS-3021 OOM due to wrong CORE message memory estimation

Posted by GitBox <gi...@apache.org>.
franz1981 edited a comment on pull request #3370:
URL: https://github.com/apache/activemq-artemis/pull/3370#issuecomment-971226293


   There's a real failure on test `org.apache.activemq.artemis.tests.compatibility.HornetQSoakTest.testSoakHornetQ on testSoakHornetQ(org.apache.activemq.artemis.tests.compatibility.HornetQSoakTest)`:
   
   ```
   [Thread-22 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@48247149)] 11:30:12,040 WARN  [org.apache.activemq.artemis.core.server] AMQ222151: removing consumer which did not handle a message, consumer=ServerConsumerImpl [id=0, filter=null, binding=LocalQueueBinding [address=topic, queue=QueueImpl[name=ARTEMIS8.test, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::name=localhost], temp=false]@cd27f06, filter=null, name=ARTEMIS8.test, clusterName=ARTEMIS8.test77120279-46fa-11ec-a1ac-ecf4bbced624]], message=Reference[397]:RELIABLE:CoreMessage[messageID=397,durable=true,userID=78923a2e-46fa-11ec-a1ac-ecf4bbced624,priority=4, timestamp=Tue Nov 16 11:30:12 EST 2021,expiration=0, durable=true, address=jms.topic.topic,size=282,properties=TypedProperties[__AMQ_CID=78764db3-46fa-11ec-a1ac-ecf4bbced624,_AMQ_ROUTING_TYPE=0]]@350453798: java.lang.AssertionError: Bad Headers encode size estimation
   	at org.apache.activemq.artemis.core.message.impl.CoreMessage.encodeHeadersAndProperties(CoreMessage.java:785)
   	at org.apache.activemq.artemis.core.message.impl.CoreMessage.encode(CoreMessage.java:748)
   	at org.apache.activemq.artemis.core.message.impl.CoreMessage.checkEncode(CoreMessage.java:354)
   	at org.apache.activemq.artemis.core.message.impl.CoreMessage.getEncodeSize(CoreMessage.java:828)
   	at org.apache.activemq.artemis.core.protocol.core.impl.wireformat.SessionReceiveMessage.expectedEncodeSize(SessionReceiveMessage.java:54)
   	at org.apache.activemq.artemis.core.protocol.core.impl.PacketImpl.createPacket(PacketImpl.java:356)
   	at org.apache.activemq.artemis.core.protocol.core.impl.PacketImpl.encode(PacketImpl.java:327)
   	at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.send(ChannelImpl.java:379)
   	at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBatched(ChannelImpl.java:333)
   	at org.apache.activemq.artemis.core.protocol.core.impl.CoreSessionCallback.sendMessage(CoreSessionCallback.java:128)
   	at org.apache.activemq.artemis.core.server.impl.ServerConsumerImpl.deliverStandardMessage(ServerConsumerImpl.java:1165)
   	at org.apache.activemq.artemis.core.server.impl.ServerConsumerImpl.proceedDeliver(ServerConsumerImpl.java:510)
   	at org.apache.activemq.artemis.core.server.impl.QueueImpl.proceedDeliver(QueueImpl.java:3840)
   	at org.apache.activemq.artemis.core.server.impl.QueueImpl.deliver(QueueImpl.java:3782)
   	at org.apache.activemq.artemis.core.server.impl.QueueImpl.deliverDirect(QueueImpl.java:3741)
   	at org.apache.activemq.artemis.core.server.impl.QueueImpl.addTail(QueueImpl.java:1251)
   	at org.apache.activemq.artemis.core.server.impl.RoutingContextImpl.internalprocessReferences(RoutingContextImpl.java:197)
   	at org.apache.activemq.artemis.core.server.impl.RoutingContextImpl.processReferences(RoutingContextImpl.java:192)
   	at org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl$1.done(PostOfficeImpl.java:1563)
   	at org.apache.activemq.artemis.core.persistence.impl.journal.OperationContextImpl$1.run(OperationContextImpl.java:284)
   	at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42)
   	at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31)
   	at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65)
   	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
   	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
   	at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)
   ```
   I see that using `assert` for invariant has some value :P @gtully 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@activemq.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [activemq-artemis] franz1981 commented on pull request #3370: ARTEMIS-3021 OOM due to wrong CORE message memory estimation

Posted by GitBox <gi...@apache.org>.
franz1981 commented on pull request #3370:
URL: https://github.com/apache/activemq-artemis/pull/3370#issuecomment-973223879


   `org.apache.activemq.artemis.tests.compatibility.HornetQSoakTest.testSoakHornetQ` is now fixed: waiting if CI is green and we're ready to go :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@activemq.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [activemq-artemis] asfgit closed pull request #3370: ARTEMIS-3021 OOM due to wrong CORE message memory estimation

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #3370:
URL: https://github.com/apache/activemq-artemis/pull/3370


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@activemq.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [activemq-artemis] franz1981 commented on pull request #3370: ARTEMIS-3021 OOM due to wrong CORE message memory estimation

Posted by GitBox <gi...@apache.org>.
franz1981 commented on pull request #3370:
URL: https://github.com/apache/activemq-artemis/pull/3370#issuecomment-914061015


   This need some time to be ready, but it's worth to be fixed IMO, so I'll leave it opened as draft


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@activemq.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [activemq-artemis] franz1981 commented on pull request #3370: ARTEMIS-3021 OOM due to wrong CORE message memory estimation

Posted by GitBox <gi...@apache.org>.
franz1981 commented on pull request #3370:
URL: https://github.com/apache/activemq-artemis/pull/3370#issuecomment-739428558


   I need to check yet how the CI would behave on this and add some tests.
   I'll next check if large messages and AMQP are affected the same in the same way
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [activemq-artemis] franz1981 edited a comment on pull request #3370: ARTEMIS-3021 OOM due to wrong CORE message memory estimation

Posted by GitBox <gi...@apache.org>.
franz1981 edited a comment on pull request #3370:
URL: https://github.com/apache/activemq-artemis/pull/3370#issuecomment-759505479


   This is breaking many tests, need to understand what's going on
   eg testSendMessageWithCustomBodyLongString
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [activemq-artemis] franz1981 commented on pull request #3370: ARTEMIS-3021 OOM due to wrong CORE message memory estimation

Posted by GitBox <gi...@apache.org>.
franz1981 commented on pull request #3370:
URL: https://github.com/apache/activemq-artemis/pull/3370#issuecomment-971226293


   There's a real failure on test `org.apache.activemq.artemis.tests.compatibility.HornetQSoakTest.testSoakHornetQ on testSoakHornetQ(org.apache.activemq.artemis.tests.compatibility.HornetQSoakTest)`:
   
   ```
   [Thread-22 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@48247149)] 11:30:12,040 WARN  [org.apache.activemq.artemis.core.server] AMQ222151: removing consumer which did not handle a message, consumer=ServerConsumerImpl [id=0, filter=null, binding=LocalQueueBinding [address=topic, queue=QueueImpl[name=ARTEMIS8.test, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::name=localhost], temp=false]@cd27f06, filter=null, name=ARTEMIS8.test, clusterName=ARTEMIS8.test77120279-46fa-11ec-a1ac-ecf4bbced624]], message=Reference[397]:RELIABLE:CoreMessage[messageID=397,durable=true,userID=78923a2e-46fa-11ec-a1ac-ecf4bbced624,priority=4, timestamp=Tue Nov 16 11:30:12 EST 2021,expiration=0, durable=true, address=jms.topic.topic,size=282,properties=TypedProperties[__AMQ_CID=78764db3-46fa-11ec-a1ac-ecf4bbced624,_AMQ_ROUTING_TYPE=0]]@350453798: java.lang.AssertionError: Bad Headers encode size estimation
   	at org.apache.activemq.artemis.core.message.impl.CoreMessage.encodeHeadersAndProperties(CoreMessage.java:785)
   	at org.apache.activemq.artemis.core.message.impl.CoreMessage.encode(CoreMessage.java:748)
   	at org.apache.activemq.artemis.core.message.impl.CoreMessage.checkEncode(CoreMessage.java:354)
   	at org.apache.activemq.artemis.core.message.impl.CoreMessage.getEncodeSize(CoreMessage.java:828)
   	at org.apache.activemq.artemis.core.protocol.core.impl.wireformat.SessionReceiveMessage.expectedEncodeSize(SessionReceiveMessage.java:54)
   	at org.apache.activemq.artemis.core.protocol.core.impl.PacketImpl.createPacket(PacketImpl.java:356)
   	at org.apache.activemq.artemis.core.protocol.core.impl.PacketImpl.encode(PacketImpl.java:327)
   	at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.send(ChannelImpl.java:379)
   	at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBatched(ChannelImpl.java:333)
   	at org.apache.activemq.artemis.core.protocol.core.impl.CoreSessionCallback.sendMessage(CoreSessionCallback.java:128)
   	at org.apache.activemq.artemis.core.server.impl.ServerConsumerImpl.deliverStandardMessage(ServerConsumerImpl.java:1165)
   	at org.apache.activemq.artemis.core.server.impl.ServerConsumerImpl.proceedDeliver(ServerConsumerImpl.java:510)
   	at org.apache.activemq.artemis.core.server.impl.QueueImpl.proceedDeliver(QueueImpl.java:3840)
   	at org.apache.activemq.artemis.core.server.impl.QueueImpl.deliver(QueueImpl.java:3782)
   	at org.apache.activemq.artemis.core.server.impl.QueueImpl.deliverDirect(QueueImpl.java:3741)
   	at org.apache.activemq.artemis.core.server.impl.QueueImpl.addTail(QueueImpl.java:1251)
   	at org.apache.activemq.artemis.core.server.impl.RoutingContextImpl.internalprocessReferences(RoutingContextImpl.java:197)
   	at org.apache.activemq.artemis.core.server.impl.RoutingContextImpl.processReferences(RoutingContextImpl.java:192)
   	at org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl$1.done(PostOfficeImpl.java:1563)
   	at org.apache.activemq.artemis.core.persistence.impl.journal.OperationContextImpl$1.run(OperationContextImpl.java:284)
   	at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42)
   	at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31)
   	at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65)
   	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
   	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
   	at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)
   	```
   I see that using `assert` for invariant has some value :P @gtully 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@activemq.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org