You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by "szetszwo (via GitHub)" <gi...@apache.org> on 2023/10/26 23:48:19 UTC

[PR] HDDS-9536. Datanode perf: Copying (heap) buffers is costly. [ozone]

szetszwo opened a new pull request, #5497:
URL: https://github.com/apache/ozone/pull/5497

   ## What changes were proposed in this pull request?
   
   Change `ChunkBuffer` to allocate direct buffers and in order to avoid buffer copying.
   
   ## What is the link to the Apache JIRA
   
   HDDS-9536
   
   ## How was this patch tested?
   
   Modified existing tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-9536. Datanode perf: Copying (heap) buffers is costly. [ozone]

Posted by "duongkame (via GitHub)" <gi...@apache.org>.
duongkame commented on PR #5497:
URL: https://github.com/apache/ozone/pull/5497#issuecomment-1787613416

   > > What I am thinking is, what if we load the log into directBuffer  ...
   > 
   > @umamaheswararao , thanks for testing it! The log is from the network but not loaded from the log.
   
   I filed 2 JIRA to make zero-copy work for ratis GRPC.
   https://issues.apache.org/jira/browse/RATIS-1925
   https://issues.apache.org/jira/browse/RATIS-1926
   
   (Did a [quick try](https://github.com/duongkame/ratis/commit/5a26ce953a5ab25c02a2614ce32f1c695f517328) for zero-copy in GrpcService and I'm positive it's feasible).
   
   Let's keep this PR for readChunk only. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-9536. Datanode perf: Copying (heap) buffers is costly. [ozone]

Posted by "duongkame (via GitHub)" <gi...@apache.org>.
duongkame commented on PR #5497:
URL: https://github.com/apache/ozone/pull/5497#issuecomment-1782326537

   Thanks for the patch @szetszwo. Is this optimization for the WriteChunk path, or only for the read path? Looks like it's only for reads.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-9536. Datanode perf: Copying (heap) buffers is costly. [ozone]

Posted by "szetszwo (via GitHub)" <gi...@apache.org>.
szetszwo commented on PR #5497:
URL: https://github.com/apache/ozone/pull/5497#issuecomment-1789396866

   > ... found some bugs in this PR.  ...
   
   The bug is that a buffer can be released only after the proto is sent out to network.  We use [gRPC onNext(..)](https://grpc.github.io/grpc-java/javadoc/io/grpc/stub/StreamObserver.html#onNext(V)) which is asynchronous.  However, onNext(..) does not return a future.  Not sure how to wait for the asynchronous task to complete.
   
   @duongkame , do you have any idea?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-9536. Datanode perf: Copying (heap) buffers is costly. [ozone]

Posted by "umamaheswararao (via GitHub)" <gi...@apache.org>.
umamaheswararao commented on PR #5497:
URL: https://github.com/apache/ozone/pull/5497#issuecomment-1782340055

   Yes, looking at the changes in KeyValueHandler.java, changes seems to be for readChunk. Are you planning separate JIRA for writeChunk?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-9536. Datanode perf: Copying (heap) buffers is costly. [ozone]

Posted by "szetszwo (via GitHub)" <gi...@apache.org>.
szetszwo commented on PR #5497:
URL: https://github.com/apache/ozone/pull/5497#issuecomment-1783507992

   Found that gRPC uses `CodedInputStream` to decode the incoming messages and there is an `UnsafeDirectNioDecoder` implementation.  Not sure if we can config gRPC to use it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-9536. Datanode perf: Copying (heap) buffers is costly. [ozone]

Posted by "umamaheswararao (via GitHub)" <gi...@apache.org>.
umamaheswararao commented on PR #5497:
URL: https://github.com/apache/ozone/pull/5497#issuecomment-1786208956

   I was just doing some experimenting here:
   I was trying to pass direct buffer as input to codec buffer and seems like we get CodedInputStream$UnsafeDirectNioDecoder in that case.
   
   Here is my sample:
    ```
   public static void main(String[] args) throws IOException {
       ByteBuffer buf = ByteBuffer.allocateDirect(1024);
       for(int i =0;i< 1024; i++) {
         buf.put((byte)i);
       }
       CodedInputStream input = CodedInputStream.newInstance(buf);
       input.enableAliasing(true);
       System.out.println(input.readBytes());
     }
   ```
     
     However I get some expection as my input is just some random bytes and not a proper proto format. But trace is telling that bytes are getting extracted from directBuffer.
     
     ```
   Exception in thread "main" org.apache.ratis.thirdparty.com.google.protobuf.InvalidProtocolBufferException: While parsing a protocol message, the input ended unexpectedly in the middle of a field.  This could mean either that the input has been truncated or that an embedded message misreported its own length.
   	at org.apache.ratis.thirdparty.com.google.protobuf.InvalidProtocolBufferException.truncatedMessage(InvalidProtocolBufferException.java:107)
   	at org.apache.ratis.thirdparty.com.google.protobuf.CodedInputStream$UnsafeDirectNioDecoder.readRawByte(CodedInputStream.java:1954)
   	at org.apache.ratis.thirdparty.com.google.protobuf.CodedInputStream$UnsafeDirectNioDecoder.readRawVarint64SlowPath(CodedInputStream.java:1856)
   	at org.apache.ratis.thirdparty.com.google.protobuf.CodedInputStream$UnsafeDirectNioDecoder.readRawVarint32(CodedInputStream.java:1751)
   	at org.apache.ratis.thirdparty.com.google.protobuf.CodedInputStream$UnsafeDirectNioDecoder.readBytes(CodedInputStream.java:1621)
   ```
     
     What I am thinking is, what if we load the log into directBuffer and make CodedInputStream backed by that codecBuffer. I am not sure if that is possible option in Ratis code, but just throwing some thoughts if that make sense in case.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-9536. Datanode perf: Copying (heap) buffers is costly. [ozone]

Posted by "szetszwo (via GitHub)" <gi...@apache.org>.
szetszwo commented on PR #5497:
URL: https://github.com/apache/ozone/pull/5497#issuecomment-1783203272

   @duongkame , @umamaheswararao , this PR is mainly to change `ChunkBuffer` not to allocate non-direct buffers.  You are right that it is mostly for Read but not Write.
   
   For Write, the buffers are allocated by the gRPC server when it receives the requests.  Let me see if the buffers are direct or not.  If the buffers from gRPC are direct, we might have copied them to non-direct buffers somewhere in our code.
   
   > ... WriteChunk starts from the moment the ContainerCommandRequestProto [is parsed from the ratis log entry](https://github.com/apache/ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java#L656). The resulted StateMachineLogEntry is not a NioByteString backed by direct buffer. ...
   
   Thanks for the hint.  Let me check.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-9536. Datanode perf: Copying (heap) buffers is costly. [ozone]

Posted by "duongkame (via GitHub)" <gi...@apache.org>.
duongkame commented on code in PR #5497:
URL: https://github.com/apache/ozone/pull/5497#discussion_r1428647929


##########
hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/common/ChunkBuffer.java:
##########
@@ -52,6 +57,30 @@ static ChunkBuffer allocate(int capacity, int increment) {
     return new ChunkBufferImplWithByteBuffer(ByteBuffer.allocate(capacity));
   }
 
+  /** Preallocate the entire buffer. */
+  static ChunkBuffer preallocate(long capacity, int increment) {
+    Preconditions.assertTrue(increment > 0);
+    if (capacity <= increment) {
+      final CodecBuffer c = CodecBuffer.allocateDirect(Math.toIntExact(capacity));

Review Comment:
   Would be cleaner if we directly deal with `ByteBufAllocator` and  `ByteBuf` in ChunkBuffer. `CodecBuffer` logic doesn't provide much, but adds an unnecessary dependency. 



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/GrpcXceiverService.java:
##########
@@ -107,9 +108,20 @@ public StreamObserver<ContainerCommandRequestProto> send(
 
       @Override
       public void onNext(ContainerCommandRequestProto request) {
+        final DispatcherContext context;

Review Comment:
   I think we got to do the same in ContainerStateMachine.readStateMachineData. Otherwise there'll be memory leak. Not sure if I should put this comment in https://github.com/apache/ozone/pull/5805



##########
hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/common/IncrementalChunkBuffer.java:
##########
@@ -115,7 +128,9 @@ private ByteBuffer getAndAllocateAtIndex(int index) {
     // allocate upto the given index
     ByteBuffer b = null;
     for (; i <= index; i++) {
-      b = ByteBuffer.allocate(getBufferCapacityAtIndex(i));
+      final CodecBuffer c = CodecBuffer.allocateDirect(getBufferCapacityAtIndex(i));

Review Comment:
   I guess this particular change in IncrementalChunkBuffer is mostly for client-side library. 
   
   This will solve the inefficient use of memory as documented per HDDS-9843, because here we move to use `PooledByteBufAllocator` to allocate direct buffers. And usage of pooled/reused direct buffers will completely avoid the problem of repeating allocation in heap. cc. @kerneltime 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-9536. Datanode perf: Copying (heap) buffers is costly. [ozone]

Posted by "szetszwo (via GitHub)" <gi...@apache.org>.
szetszwo commented on PR #5497:
URL: https://github.com/apache/ozone/pull/5497#issuecomment-1783252989

   > ... I think the notion of "zero copies" has to start with ratis log reader, aka, the start point of the data flow. ...
   
   The client requests are received from the network (for leader, it is directly from client; for followers, it is the log entries from the leader).  It should not be related to the log reader unless the cache is full and the entry is invalidated.   This remind me that Ozone does have a `ContainerStateMachine` cache bug.
   
   Anyway, the `ByteString` received in gRPC unfortunately seems not `NioByteString`.  Let me verify it in Ozone.
   
   > ... I guess that is a part of ratis-streaming and cannot be just Ozone code change.
   
   Yes, Ratis Streaming uses Netty directly.  It neither use gRPC nor Protobuf for data.  (It does use Protobuf for headers.)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-9536. Datanode perf: Copying (heap) buffers is costly. [ozone]

Posted by "umamaheswararao (via GitHub)" <gi...@apache.org>.
umamaheswararao commented on PR #5497:
URL: https://github.com/apache/ozone/pull/5497#issuecomment-1786172768

   supportsUnsafeByteBufferOperations seems to be depending on MEMORY_ACCESSOR. I am not sure we have a way to control it. Do we ?
   private static boolean supportsUnsafeByteBufferOperations() {
       if (MEMORY_ACCESSOR == null) {
         return false;
       }
       return MEMORY_ACCESSOR.supportsUnsafeByteBufferOperations();
     }


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-9536. Datanode perf: Copying (heap) buffers is costly. [ozone]

Posted by "szetszwo (via GitHub)" <gi...@apache.org>.
szetszwo commented on PR #5497:
URL: https://github.com/apache/ozone/pull/5497#issuecomment-1786393506

   > What I am thinking is, what if we load the log into directBuffer  ...
   
   @umamaheswararao , thanks for testing it!  The log is from the network but not loaded from the log.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-9536. Datanode perf: Copying (heap) buffers is costly. [ozone]

Posted by "duongkame (via GitHub)" <gi...@apache.org>.
duongkame commented on PR #5497:
URL: https://github.com/apache/ozone/pull/5497#issuecomment-1784167529

   > Found that gRPC uses `CodedInputStream` to decode the incoming messages and there is an `UnsafeDirectNioDecoder` implementation. Not sure if we can config gRPC to use it.
   
   Thanks for the diggings, @szetszwo.
   
   Seems to me that gRPC recently finalized the APIs to support zero-copy, details in https://github.com/grpc/grpc-java/issues/7387. This implies some effort to configure the right marshaller for the `CodedInputStream` to leverage NIO.
   
   Today datannodes clone and copy a WriteChunk data buffer 3+ times, and this is due to the `LogEntryProto` is not marshaled using NioByteBuffer (so any subsequence operation on the log proto will just copy buffers instead of deriving). 
   I'm not sure if we should follow the route to try making ratis with gRPC zero-copy, or directly switch to ratis-streaming and deprecate the current approach.  @kerneltime @szetszwo 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-9536. Datanode perf: Copying (heap) buffers is costly. [ozone]

Posted by "umamaheswararao (via GitHub)" <gi...@apache.org>.
umamaheswararao commented on PR #5497:
URL: https://github.com/apache/ozone/pull/5497#issuecomment-1787590390

   @szetszwo yeah, I had offline chat with @duongkame. He is actually trying to get directBuffers from stream directly. I would let him comment once he has reasonable results!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-9536. Datanode perf: Copying (heap) buffers is costly. [ozone]

Posted by "szetszwo (via GitHub)" <gi...@apache.org>.
szetszwo commented on PR #5497:
URL: https://github.com/apache/ozone/pull/5497#issuecomment-1787701529

   @duongkame , sure, let's do only readChunk here. Could you review this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-9536. Datanode perf: Copying (heap) buffers is costly. [ozone]

Posted by "szetszwo (via GitHub)" <gi...@apache.org>.
szetszwo commented on PR #5497:
URL: https://github.com/apache/ozone/pull/5497#issuecomment-1783231777

   > 2023-10-27 09:52:40,873 INFO  server.GrpcClientProtocolService (GrpcClientProtocolService.java:onNext(244)) - XXX gRPC class org.apache.ratis.thirdparty.com.google.protobuf.ByteString$LiteralByteString
   
   In a ratis test, it shows that the `ByteString` received in gRPC unfortunately is `LiteralByteString` but not `NioByteString`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-9536. Datanode perf: Copying (heap) buffers is costly. [ozone]

Posted by "szetszwo (via GitHub)" <gi...@apache.org>.
szetszwo commented on PR #5497:
URL: https://github.com/apache/ozone/pull/5497#issuecomment-1783295637

   Tried testing Ratis with a larger message size (32MB) to see if gRPC will change to use `NioByteString`.  Unfortunately, no!
   
   > 2023-10-27 10:49:51,701 INFO  server.GrpcClientProtocolService (GrpcClientProtocolService.java:onNext(246)) - XXX message size 33554454, class LiteralByteString
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-9536. Datanode perf: Copying (heap) buffers is costly. [ozone]

Posted by "szetszwo (via GitHub)" <gi...@apache.org>.
szetszwo commented on PR #5497:
URL: https://github.com/apache/ozone/pull/5497#issuecomment-1787760388

   @duongkame , found some bugs in this PR.  Let me fix them first.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org