You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/04/22 16:33:05 UTC

[GitHub] [pulsar] BewareMyPower opened a new pull request #10330: Fix message buffer being copied from direct memory to heap memory

BewareMyPower opened a new pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330


   ### Motivation
   
   When I tested pulsar with `AppendIndexMetadataInterceptor` configured, i.e.
   
   ```properties
   brokerEntryMetadataInterceptors=org.apache.pulsar.common.intercept.AppendIndexMetadataInterceptor
   ```
   
   I found the heap memory increased very fast so that GC happened frequently. After the analysis from the dump info, I found many messages, which have `BrokerEntryMetadata` in the head, are stored in heap memory instead of direct memory.
   
   When I removed the above configuration, the heap memory became slow to increase. 
   
   ### Modifications
   
   - Copy two buffers into a single buffer instead of using `CompositeByteBuf`. It's hard to explain, however, the heap memory increasing problem was solved after that.
   - Remove the unused return value from `ManagedLedgerInterceptor#beforeAddEntry` interface.
   - And the reference count check to ensure that after the change, the input and output buffers of `addBrokerEntryMetadata` will be released finally after the output buffer is released.
   
   ### Verifying this change
   
   - [ ] Make sure that the change passes the CI checks.
   
   *(Please pick either of the following options)*
   
   This change is a trivial rework / code cleanup without any test coverage.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] eolivelli commented on pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
eolivelli commented on pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#issuecomment-825611829


   @dlg99  PTAL


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower commented on pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#issuecomment-825812859


   I'll try to debug BK dependencies first. Maybe it was not caused by the checksum.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] dlg99 commented on pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
dlg99 commented on pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#issuecomment-825738761


   @codelipenghui nice catch. The checksum part should be easy to fix, DigestManager.computeDigestAndPackageForSending can check if ByteBuf is an instance of CompositeByteBuf, treat it as Iterable<ByteBuf> (implemented by CompositeByteBuf), update digest in a loop. Or something along these lines.
   PCBC.addEntry is doing something similar with ByteBufList. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] eolivelli commented on pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
eolivelli commented on pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#issuecomment-825733411


   @dlg99 please follow up on BK.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] dlg99 commented on pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
dlg99 commented on pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#issuecomment-825812706


   @BewareMyPower Looking at original chart, heap utilization used to go up to 4.5G, now about 3G (assuming I interpret these charts correctly, the tests and configuration are are the same etc). I'd expect these to be the same but grow slower and less frequent GC pauses but I have no idea what these charts actually show.
   
   You may need to run a profiler to check where the allocations happen. IIRC JFR can record data over some interval and then JMC can show you where the most allocations happened, with stacktraces etc.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] eolivelli commented on pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
eolivelli commented on pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#issuecomment-826085969






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] merlimat commented on pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
merlimat commented on pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#issuecomment-825892870


   > So the argument type, which is passed to LedgerHandle#asyncAddEntry, is UnpooledDuplicatedByteBuf not CompositeByteBuf.
   
   @BewareMyPower We might need to use `ByteBuf.unwrap()` to access to the wrapped buffer.
   
   
   > I tried to change the code above to
   >  ByteBuf duplicateBuffer = data.retainedDuplicate();
   > But it still didn't work (even worse). I've added some debug logs to BK:
   
   There's an extra retain, that would also require another release on the buffer.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower closed pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
BewareMyPower closed pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower commented on pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#issuecomment-826038049


   @dlg99 Really thanks for your help. It should work now.
   
   But there's still a little problem that is not related to your code change. When I run Pulsar with BK 4.14-SNAPSHOT, it looks like that the JNI library can't be loaded:
   
   > WARN  com.scurrilous.circe.checksum.Crc32cIntChecksum - Failed to load Circe JNI library. Falling back to Java based CRC32c provider
   
   This is the reason that I cannot get a good test result. However, when I run the Pulsar with BK 4.13, it succeeded to load the JNI library:
   
   > INFO  com.scurrilous.circe.checksum.Crc32cIntChecksum - SSE4.2 CRC32C provider initialized
   
   They are on the same VM so I wondered what could cause this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower commented on pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#issuecomment-825875852


   @dlg99 I just found there's something wrong with @codelipenghui 's analysis, which leads to a result that your PR didn't work.
   
   We can debug `BrokerEntryMetadataE2ETest` and go to https://github.com/apache/pulsar/blob/5a7ba52193c069bc40705bcdc74287aae1066b17/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/OpAddEntry.java#L123-L128
   
   - The original `data` is `io.netty.buffer.CompositeByteBuf`
   - However, `duplicateBuffer` is `io.netty.buffer.UnpooledDuplicatedByteBuf`
   
   So the argument type, which is passed to `LedgerHandle#asyncAddEntry`, is `UnpooledDuplicatedByteBuf` not `CompositeByteBuf`.
   
   ----
   
   Though I tried to change the code above to
   
   ```java
               ByteBuf duplicateBuffer = data.retainedDuplicate();
               final ByteBuf originalData = data;
               data = duplicateBuffer;
   
               // internally asyncAddEntry() will take the ownership of the buffer and release it at the end
               addOpCount = ManagedLedgerImpl.ADD_OP_COUNT_UPDATER.incrementAndGet(ml);
               lastInitTime = System.nanoTime();
               ledger.asyncAddEntry(originalData, this, addOpCount);
   ```
   
   But it still didn't work. I'll take a look later.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower edited a comment on pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
BewareMyPower edited a comment on pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#issuecomment-826038049


   @dlg99 Really thanks for your help. It should work now.
   
   But there's still a little problem that is not related to your code change. When I run Pulsar with BK 4.14-SNAPSHOT, it looks like that the JNI library can't be loaded:
   
   > WARN  com.scurrilous.circe.checksum.Crc32cIntChecksum - Failed to load Circe JNI library. Falling back to Java based CRC32c provider
   
   This is the reason that I still cannot get a good test result. However, when I run the Pulsar with BK 4.13, it succeeded to load the JNI library:
   
   > INFO  com.scurrilous.circe.checksum.Crc32cIntChecksum - SSE4.2 CRC32C provider initialized
   
   They are on the same VM so I wondered what could cause this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower commented on pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#issuecomment-826087017






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower edited a comment on pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
BewareMyPower edited a comment on pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#issuecomment-825008206


   Here're the monitor statistics after and before this change. The first case (before 00:15) uses this PR's `addBrokerEntryMetadata` method.
   
   <img width="1261" alt="截屏2021-04-23 上午12 19 32" src="https://user-images.githubusercontent.com/18204803/115751365-b7aa1e00-a3cb-11eb-9448-a7312bda0fce.png">
   <img width="841" alt="截屏2021-04-23 上午12 19 46" src="https://user-images.githubusercontent.com/18204803/115751387-bb3da500-a3cb-11eb-8b10-26661e6e0984.png">
   
   I used a producer with nearly 200 MB/s write speed.
   
   ```bash
   ./bin/pulsar-perf produce -r 300000 my-topic
   ```
   
   As we can see, the heap memory of second case (after 00:15) increased very fast and the GC happened very frequently. The difference of these two cases are only the `pulsar-common.jar`.
   
   I'm not an expert of Netty and also very confused why the `CompositeByteBuf` make the whole buffer be copied into the heap memory.
   
   Here's a dumped result that was generated by `jmap -dump:format=b`.
   
   ![image](https://user-images.githubusercontent.com/18204803/115752469-b4fbf880-a3cc-11eb-84a0-e609259f255c.png)
   
   We can see there're many messages in heap memory. The first two bytes are `[14, 2]`, which are equivalent to `magicBrokerEntryMetadata`. See https://github.com/apache/pulsar/blob/37e3180f06188c4d34cd347d46d66bcde3516445/pulsar-common/src/main/java/org/apache/pulsar/common/protocol/Commands.java#L117


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower commented on pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#issuecomment-825723693


   I've updated the code and PR description, PTAL again. @eolivelli @codelipenghui 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] codelipenghui commented on a change in pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on a change in pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#discussion_r619188621



##########
File path: managed-ledger/src/main/java/org/apache/bookkeeper/mledger/intercept/ManagedLedgerInterceptor.java
##########
@@ -36,9 +36,8 @@
      * Intercept an OpAddEntry and return an OpAddEntry.
      * @param op an OpAddEntry to be intercepted.
      * @param numberOfMessages
-     * @return an OpAddEntry.
      */
-    OpAddEntry beforeAddEntry(OpAddEntry op, int numberOfMessages);
+    void beforeAddEntry(OpAddEntry op, int numberOfMessages);

Review comment:
       It's new API introduced in master, we have not released it yet.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower commented on pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#issuecomment-825745204


   BTW, I'm not sure if BK could be released in time so that Pulsar 2.8.0 won't be delayed too long. Currently I suggest applying this PR's patch first. If BK can't be released in time, we could apply the BK's fix to Pulsar 2.8.1.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] michaeljmarshall commented on a change in pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
michaeljmarshall commented on a change in pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#discussion_r619588992



##########
File path: managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java
##########
@@ -766,26 +766,24 @@ private synchronized void internalAsyncAddEntry(OpAddEntry addOperation) {
                 STATE_UPDATER.set(this, State.ClosingLedger);
             }
             // interceptor entry before add to bookie
-            if (beforeAddEntry(addOperation)) {
-                addOperation.initiate();
-            }
+            beforeAddEntry(addOperation).ifPresent(OpAddEntry::initiate);
         }
     }
 
-    private boolean beforeAddEntry(OpAddEntry addOperation) {
+    private Optional<OpAddEntry> beforeAddEntry(OpAddEntry addOperation) {
         // if no interceptor, just return true to make sure addOperation will be initiate()

Review comment:
       This comment should be updated to reflect the new return type.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] dlg99 commented on pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
dlg99 commented on pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#issuecomment-825892952


   @BewareMyPower I updated the pr to unwrap DuplicatedByteBuf


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] codelipenghui commented on pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#issuecomment-825619753


   Seems the problem is related to the crc32 checksum in the bookkeeper client. Since we have more than 1 component in the CompositeByteBuf, `CompositeByteBuf.hasMemoryAddress()` will return false, details to see implement at the CompositeByteBuf:
   
   ```
   public boolean hasMemoryAddress() {
           switch(this.componentCount) {
           case 0:
               return Unpooled.EMPTY_BUFFER.hasMemoryAddress();
           case 1:
               return this.components[0].buf.hasMemoryAddress();
           default:
               return false;
           }
       }
   ```
   
   For the crc32 checksum in the bookkeeper will use nioBuffer() to resume the CRC hash, due to `CompositeByteBuf..nioBuffer()` will copy the data to the HeapByteBuffer if the component size > 1.
   
   ```
   public static int resumeChecksum(int previousChecksum, ByteBuf payload) {
           if (payload.hasMemoryAddress() && (CRC32C_HASH instanceof Sse42Crc32C)) {
               return CRC32C_HASH.resume(previousChecksum, payload.memoryAddress() + payload.readerIndex(),
                   payload.readableBytes());
           } else if (payload.hasArray()) {
               return CRC32C_HASH.resume(previousChecksum, payload.array(), payload.arrayOffset() + payload.readerIndex(),
                   payload.readableBytes());
           } else {
               return CRC32C_HASH.resume(previousChecksum, payload.nioBuffer());
           }
       }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower commented on pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#issuecomment-826087620


   @eolivelli Sounds reasonable. I'll give it a shot.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower commented on pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#issuecomment-825012935


   @codelipenghui @hangc0276 @wuzhanpeng  @dockerzhang @aloyszhang PTAL. Also it would be appreciated if you can compare the heap memory usage before and after this PR.
   
   I'm not sure if there's something wrong with my test environment. The VM is AWS EC2 Ubuntu Server 18.04 LTS (HVM) ami-05248307900d52e3a i3.4xlarge.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] eolivelli commented on pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
eolivelli commented on pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#issuecomment-830845906


   Can you please start a discussion on dev@bookkeeper.apache.org ?
   I am fine with cutting a release


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower commented on pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#issuecomment-826087017


   @eolivelli I built BK on my MacOS with JDK 8, see https://github.com/apache/pulsar/pull/10330#issuecomment-825801576
   
   And I run the Pulsar with BK 4.14 on AWS EC2 Ubuntu Server 18.04 LTS (HVM) ami-05248307900d52e3a, with JDK 11.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] dlg99 commented on pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
dlg99 commented on pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#issuecomment-827159034


   @BewareMyPower can you confirm that the copy to heap issue is resolved by the bookkeeper change? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower commented on pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#issuecomment-825740927


   >  since this only happens when the broker entry metadata enabled, if this can not be fixed completely, we'd better disable this feature by default in 2.8.0.
   
   This feature is disabled by default now because the default `brokerEntryMetadataInterceptors` is empty and `managedLedgerInterceptor` is `null`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower commented on pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#issuecomment-826038503


   Also I have a question. Should we add an option to configure that if the CompositeByteBuf could be merged? Because for the platforms that can't load the Circe JNI library, the performance impact that is caused by frequent GC is much more than the data copy in memory. With the Java based CRC32c provider, the merged `ByteBuf` is still in direct memory but the `CompositeByteBuf` will call `nioBuffer()` and copy data into heap memory.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower commented on a change in pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on a change in pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#discussion_r619252871



##########
File path: managed-ledger/src/main/java/org/apache/bookkeeper/mledger/intercept/ManagedLedgerInterceptor.java
##########
@@ -36,9 +36,8 @@
      * Intercept an OpAddEntry and return an OpAddEntry.
      * @param op an OpAddEntry to be intercepted.
      * @param numberOfMessages
-     * @return an OpAddEntry.
      */
-    OpAddEntry beforeAddEntry(OpAddEntry op, int numberOfMessages);
+    void beforeAddEntry(OpAddEntry op, int numberOfMessages);

Review comment:
       IMO, the initial API design is bad because the returned value is never used. And the reviewers, including me, all ignored this problem. Also, @codelipenghui is right, there's no release version that includes this API yet.
   
   But after thinking for a while, I'll take your advice in this PR and change where this API is used to
   
   ```java
   op = beforeAddEntry(op, numOfMessages);
   ```
   
   instead of
   
   ```java
   beforeAddEntry(op, numOfMessages);
   ```
   
   Though the assign is somehow useless, it indicated that `op` is changed during `beforeAddEntry`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower commented on a change in pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on a change in pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#discussion_r619591544



##########
File path: managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java
##########
@@ -766,26 +766,24 @@ private synchronized void internalAsyncAddEntry(OpAddEntry addOperation) {
                 STATE_UPDATER.set(this, State.ClosingLedger);
             }
             // interceptor entry before add to bookie
-            if (beforeAddEntry(addOperation)) {
-                addOperation.initiate();
-            }
+            beforeAddEntry(addOperation).ifPresent(OpAddEntry::initiate);
         }
     }
 
-    private boolean beforeAddEntry(OpAddEntry addOperation) {
+    private Optional<OpAddEntry> beforeAddEntry(OpAddEntry addOperation) {
         // if no interceptor, just return true to make sure addOperation will be initiate()

Review comment:
       OK, but since this PR may not be merged because the BookKeeper side could fix it. I may do this code refactor in another PR.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower closed pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
BewareMyPower closed pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower commented on pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#issuecomment-825694113


   @codelipenghui Thanks for you help, I'll update the PR description after I changed the `beforeAddEntry` back.
   
   @eolivelli As what @codelipenghui mentioned, here we cannot avoid copying data because in BookKeeper side `CompositeByteBuf#nioBuffer` will be called. I think we can open an issue in https://github.com/apache/bookkeeper. So after 
    BookKeeper side fixed the problem, we could change the code here to avoid copying data.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower commented on pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#issuecomment-825756860


   @dlg99 Thanks for your quick fix.
   
   And what I concern is if the PR is merged, when will BK be released and then we can upgrade Pulsar's BK dependency.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower commented on pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#issuecomment-827262101


   @dlg99 Yeah, it's resolved. Sorry I forgot to update the result. Here's the compare with the same workload of https://github.com/apache/pulsar/pull/10330#issuecomment-825801576
   
   ![image](https://user-images.githubusercontent.com/18204803/116174493-1e6b6680-a741-11eb-8796-e791a4931d8d.png)
   
   The heap memory is stable and there's no GC.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower commented on pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#issuecomment-831803572


   @eolivelli Sorry for the late reply because I'm on vacation now. I sent an email just now, PTAL.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower commented on pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#issuecomment-825008206


   Here're the monitor statistics after and before this change. The first case (before 00:15) uses this PR's `addBrokerEntryMetadata` method.
   
   <img width="1261" alt="截屏2021-04-23 上午12 19 32" src="https://user-images.githubusercontent.com/18204803/115751365-b7aa1e00-a3cb-11eb-9448-a7312bda0fce.png">
   <img width="841" alt="截屏2021-04-23 上午12 19 46" src="https://user-images.githubusercontent.com/18204803/115751387-bb3da500-a3cb-11eb-8b10-26661e6e0984.png">
   
   I used a producer with nearly 200 MB/s write speed. As we can see, the heap memory of second case (after 00:15) increased very fast and the GC happened very frequently.
   
   The difference of these two cases are only the `pulsar-common.jar`.
   
   I'm not an expert of Netty and also very confused why the `CompositeByteBuf` make the whole buffer be copied into the heap memory.
   
   Here's a dumped result that was generated by `jmap -dump:format=b`.
   
   ![image](https://user-images.githubusercontent.com/18204803/115752469-b4fbf880-a3cc-11eb-84a0-e609259f255c.png)
   
   We can see there're many messages in heap memory. The first two bytes are `[14, 2]`, which are equivalent to `magicBrokerEntryMetadata`. See https://github.com/apache/pulsar/blob/37e3180f06188c4d34cd347d46d66bcde3516445/pulsar-common/src/main/java/org/apache/pulsar/common/protocol/Commands.java#L117


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower commented on pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#issuecomment-830835804


   Hi, @eolivelli I see https://github.com/apache/bookkeeper/pull/2701 has been merged for a few days. Is it any plan for BookKeeper's release now? Then the Pulsar's BK dependency could be upgraded to fix https://github.com/apache/pulsar/pull/10330.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] eolivelli commented on a change in pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
eolivelli commented on a change in pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#discussion_r619166288



##########
File path: managed-ledger/src/main/java/org/apache/bookkeeper/mledger/intercept/ManagedLedgerInterceptor.java
##########
@@ -36,9 +36,8 @@
      * Intercept an OpAddEntry and return an OpAddEntry.
      * @param op an OpAddEntry to be intercepted.
      * @param numberOfMessages
-     * @return an OpAddEntry.
      */
-    OpAddEntry beforeAddEntry(OpAddEntry op, int numberOfMessages);
+    void beforeAddEntry(OpAddEntry op, int numberOfMessages);

Review comment:
       please do not change public APIs




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] dlg99 commented on pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
dlg99 commented on pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#issuecomment-825767997


   @BewareMyPower I think we have a few other interesting changes since bk 4.13.0 to justify 4.13.1 release. 
   @eolivelli what do you think? We'll need to review pending PRs, what can be merged etc. Release process (+voting etc) easily takes a week. 
   
   @BewareMyPower meanwhile, can I ask you to build BK locally and confirm that the fix in fact improves the situation with heap allocations in your scenario? Let's assume this is a pre-req for starting the release process.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] eolivelli commented on pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
eolivelli commented on pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#issuecomment-825769546


   @dlg99 I agree with you 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] eolivelli commented on pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
eolivelli commented on pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#issuecomment-826085969


   BK 4.14 is basically equals to 4.13 in this scope.
   So probably it is a problem about how you have built it locally.
   On which platform you are? Arw you on Linux? Which JDK?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower commented on pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#issuecomment-825590429


   It looks like the heap memory problem is a BK side problem. I doubt that when `LedgerHandle#asyncAddEntry(ByteBuf data, AddCallback cb, Object ctx)` accepts a `CompositeByteBuf` as the `data`, it will finally concat `CompositeByteBuf`'s buffers to a single buffer in heap memory.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower commented on pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#issuecomment-825772532


   @dlg99 OK, I'll verify it soon.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] dlg99 commented on pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
dlg99 commented on pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#issuecomment-825752962


   https://github.com/apache/bookkeeper/pull/2701 should fix it. 
   Existing tests should be enough, we are not unit-testing memory allocations. 
   I didn't wait for the test run locally, let's see what CI checks say.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower commented on pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#issuecomment-825801576


   @dlg99 It looks like there's not significant improvement.
   
   ![image](https://user-images.githubusercontent.com/18204803/115906442-1b028180-a49a-11eb-9a02-1bcf78ca84a4.png)
   
   I'll add some logs to check if the BK dependency is updated. BTW, my build steps are:
   1. Build latest BK with PR 2701
   ```bash
   mvn clean install -DskipTests
   ```
   2. Upgrade Pulsar's BK version
   ```xml
   <bookkeeper.version>4.14.0-SNAPSHOT</bookkeeper.version>
   ```
   3. Build Pulsar
   ```bash
   mvn clean install -DskipTests -Pcore-modules
   ```
   4. Then upload the `distribution/server/target/apache-pulsar-2.8.0-SNAPSHOT-bin.tar.gz` to my VM and start it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] eolivelli commented on pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
eolivelli commented on pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#issuecomment-826087287


   Probably (I am not sure)  the problem is that you have to build BK on Linux, in order to build the CirceChecksum library correctly.
   
   When we release BK, even while using Mace's, we run the build in a docker env, you can use the script you find in the 'dev' folder for instance 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] codelipenghui commented on pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#issuecomment-825731737


   @BewareMyPower Could you please help add an issue for tracking the Pulsar side and an issue for tracking the bookkeeper side? Before the 2.8.0 release, we need to fix both of them, this also needs a release of the bookkeeper which contains this fix. I agree with we can copying the data for now to avoid copy data into the JVM,  since this only happens when the broker entry metadata enabled, if this can not be fixed completely, we'd better disable this feature by default in 2.8.0.
   
   @eolivelli WDYT?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower edited a comment on pull request #10330: Fix message buffer being copied from direct memory to heap memory

Posted by GitBox <gi...@apache.org>.
BewareMyPower edited a comment on pull request #10330:
URL: https://github.com/apache/pulsar/pull/10330#issuecomment-825875852


   @dlg99 I just found there's something wrong with @codelipenghui 's analysis, which leads to a result that your PR didn't work.
   
   We can debug `BrokerEntryMetadataE2ETest` and go to https://github.com/apache/pulsar/blob/5a7ba52193c069bc40705bcdc74287aae1066b17/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/OpAddEntry.java#L123-L128
   
   - The original `data` is `io.netty.buffer.CompositeByteBuf`
   - However, `duplicateBuffer` is `io.netty.buffer.UnpooledDuplicatedByteBuf`
   
   So the argument type, which is passed to `LedgerHandle#asyncAddEntry`, is `UnpooledDuplicatedByteBuf` not `CompositeByteBuf`.
   
   ----
   
   I tried to change the code above to
   
   ```java
               ByteBuf duplicateBuffer = data.retainedDuplicate();
               final ByteBuf originalData = data;
               data = duplicateBuffer;
   
               // internally asyncAddEntry() will take the ownership of the buffer and release it at the end
               addOpCount = ManagedLedgerImpl.ADD_OP_COUNT_UPDATER.incrementAndGet(ml);
               lastInitTime = System.nanoTime();
               ledger.asyncAddEntry(originalData, this, addOpCount);
   ```
   
   But it still didn't work (even worse). I've added some debug logs to BK:
   
   ```java
       public static int resumeChecksum(int previousChecksum, ByteBuf payload) {
           log.info("XYZ resumeChecksum payload.hasMemoryAddress(): {}, hasArray(): {}, is CompositeByteBuf: {},"
                           + " class name: {}",
                   payload.hasMemoryAddress(), payload.hasArray(), (payload instanceof CompositeByteBuf),
                   payload.getClass().getName());
   ```
   
   Then send 1 message, the logs is:
   
   > 19:41:21.303 [BookKeeperClientWorker-OrderedExecutor-12-0] INFO  com.scurrilous.circe.checksum.Crc32cIntChecksum - XYZ resumeChecksum payload.hasMemoryAddress(): true, hasArray(): false, is CompositeByteBuf: false, class name: io.netty.buffer.PooledUnsafeDirectByteBuf
   19:41:21.304 [BookKeeperClientWorker-OrderedExecutor-12-0] INFO  com.scurrilous.circe.checksum.Crc32cIntChecksum - XYZ resumeChecksum payload.hasMemoryAddress(): true, hasArray(): false, is CompositeByteBuf: false, class name: io.netty.buffer.PooledUnsafeDirectByteBuf
   19:41:21.305 [BookKeeperClientWorker-OrderedExecutor-12-0] INFO  com.scurrilous.circe.checksum.Crc32cIntChecksum - XYZ resumeChecksum payload.hasMemoryAddress(): true, hasArray(): false, is CompositeByteBuf: false, class name: io.netty.buffer.AbstractPooledDerivedByteBuf$PooledNonRetainedSlicedByteBuf
   
   The `CompositeByteBuf` was split to a `PooledUnsafeDirectByteBuf` and a `AbstractPooledDerivedByteBuf$PooledNonRetainedSlicedByteBuf`. It seems that `nioBuffer()` wouldn't be called, but the heap memory problem still existed and might get worse.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org