You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/08/20 10:44:21 UTC

[GitHub] [ozone] captainzmc opened a new pull request #2557: HDDS-5599. [Ozone-Streaming]drop BufferPool and ChunkBuffer to avoid buffer copying

captainzmc opened a new pull request #2557:
URL: https://github.com/apache/ozone/pull/2557


   ## What changes were proposed in this pull request?
   
   In BlockDataStreamOutput, we should write directly to the underlying stream. We should not use BufferPool and ChunkBuffer in BlockDataStreamOutput anymore since we want to avoid buffer copying.
   
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-5599
   
   ## How was this patch tested?
   
   Existing UT
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bshashikant commented on pull request #2557: HDDS-5599. [Ozone-Streaming]drop BufferPool and ChunkBuffer to avoid buffer copying

Posted by GitBox <gi...@apache.org>.
bshashikant commented on pull request #2557:
URL: https://github.com/apache/ozone/pull/2557#issuecomment-906107346


   @captainzmc , also, can you please put up. a doc explaining how the retry mechanism going to look like, now that buffer pool is removed?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] captainzmc commented on pull request #2557: HDDS-5599. [Ozone-Streaming]drop BufferPool and ChunkBuffer to avoid buffer copying

Posted by GitBox <gi...@apache.org>.
captainzmc commented on pull request #2557:
URL: https://github.com/apache/ozone/pull/2557#issuecomment-905601955


   hi @szetszwo, CI has been running correct. Could you help take another look this PR? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] captainzmc commented on pull request #2557: HDDS-5599. [Ozone-Streaming]drop BufferPool and ChunkBuffer to avoid buffer copying

Posted by GitBox <gi...@apache.org>.
captainzmc commented on pull request #2557:
URL: https://github.com/apache/ozone/pull/2557#issuecomment-902606684


   Hi @szetszwo @bshashikant @mukul1987 
   I've removed the BufferPool dependency in this PR. The ChunkBuffer is not completely removed due to the checksum, so maybe we can discuss it here.
   Also, since BufferPool is removed, Retry and Flush policy will be different, and I think we can discuss their implementations in this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] captainzmc edited a comment on pull request #2557: HDDS-5599. [Ozone-Streaming]drop BufferPool and ChunkBuffer to avoid buffer copying

Posted by GitBox <gi...@apache.org>.
captainzmc edited a comment on pull request #2557:
URL: https://github.com/apache/ozone/pull/2557#issuecomment-906217012


   Agree with @szetszwo.
   At present the writing flow of Ozone streaming has working. And our client also had avoided buffer copy.(Although we are currently using star Topology).
   I think we can do a simple performance test now. Currently our implementation is very simple, which makes it easier to locate performance issues. We need to address performance issues first.
   @bshashikant 
   In addition, I will test the performance of streaming with checksum enabled and without checksum enabled. To verify checksum's impact on performance. (The default value of io.bytes.per.checksum in HDFS is 512 bytes)Perhaps checksum's performance overhead isn't that high. Let's make sure by testing it first.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] szetszwo commented on a change in pull request #2557: HDDS-5599. [Ozone-Streaming]drop BufferPool and ChunkBuffer to avoid buffer copying

Posted by GitBox <gi...@apache.org>.
szetszwo commented on a change in pull request #2557:
URL: https://github.com/apache/ozone/pull/2557#discussion_r693578999



##########
File path: hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/CommitWatcher.java
##########
@@ -84,6 +84,13 @@ public CommitWatcher(BufferPool bufferPool, XceiverClientSpi xceiverClient) {
     futureMap = new ConcurrentHashMap<>();
   }
 
+  public CommitWatcher(XceiverClientSpi xceiverClient) {

Review comment:
       Let's copy CommitWatcher to a new class, say StreamCommitWatcher?  In StreamCommitWatcher, we don't need bufferPool.  The commitIndex2flushedDataMap should be different -- it should not use ChunkBuffer anymore.

##########
File path: hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/BlockDataStreamOutput.java
##########
@@ -501,38 +356,11 @@ private void watchForCommit(boolean bufferFull) throws IOException {
     return flushFuture;
   }
 
+  // TODO We are no longer using the bufferPool, so we do not need to flush the
+  //  data in it. We can discuss whether add other behaviors here.

Review comment:
       flush() should wait for all the futures to complete.  Move the code below 
   {code}
   //executePutBlock(..)
      try {
         CompletableFuture.allOf(futures.toArray(EMPTY_FUTURE_ARRAY)).get();
       } catch (Exception e) {
         LOG.warn("Failed to write all chunks through stream: " + e);
         throw new IOException(e);
       }
   {code}
   from executePutBlock(..) to flush().

##########
File path: hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/BlockDataStreamOutput.java
##########
@@ -426,16 +294,6 @@ private void watchForCommit(boolean bufferFull) throws IOException {
       boolean force) throws IOException {
     checkOpen();
     long flushPos = totalDataFlushedLength;
-    final List<ChunkBuffer> byteBufferList;
-    if (!force) {
-      Preconditions.checkNotNull(bufferList);
-      byteBufferList = bufferList;
-      bufferList = null;
-      Preconditions.checkNotNull(byteBufferList);
-    } else {
-      byteBufferList = null;
-    }
-

Review comment:
       Move the try-catch below to flush() and then call flush() here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] kaijchen commented on pull request #2557: HDDS-5599. [Ozone-Streaming]drop BufferPool and ChunkBuffer to avoid buffer copying

Posted by GitBox <gi...@apache.org>.
kaijchen commented on pull request #2557:
URL: https://github.com/apache/ozone/pull/2557#issuecomment-906115221


   We no longer copy the data into buffer pool. But we still need to retain the ByteBuf until it gets flushed. This retain is for the retry.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bshashikant edited a comment on pull request #2557: HDDS-5599. [Ozone-Streaming]drop BufferPool and ChunkBuffer to avoid buffer copying

Posted by GitBox <gi...@apache.org>.
bshashikant edited a comment on pull request #2557:
URL: https://github.com/apache/ozone/pull/2557#issuecomment-906104431


   Sorry for getting late on this. Although we have removed bufferPool, i feel we still need to cache atleast "bytePerCheckSum" worth of data in ozone client before ending it to the underlying stream , so that we don't compute checksum for very small packets (using padding to fill up 1 MB) for write packets(packets < 1 MB). 
   
   What is the solution proposed to fix this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bshashikant commented on pull request #2557: HDDS-5599. [Ozone-Streaming]drop BufferPool and ChunkBuffer to avoid buffer copying

Posted by GitBox <gi...@apache.org>.
bshashikant commented on pull request #2557:
URL: https://github.com/apache/ozone/pull/2557#issuecomment-906104431


   Sorry for getting late on this. Although we have removed bufferPool, i feel we still need to cache atleast "bytePerCheckSum" worth of data in ozone client before ending it to the underlying stream , so that we don't compute checksum for very small packets (using padding top fill up 1 MB) for write packets(packets < 1 MB). 
   
   What is the solution proposed to fix this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] captainzmc commented on pull request #2557: HDDS-5599. [Ozone-Streaming]drop BufferPool and ChunkBuffer to avoid buffer copying

Posted by GitBox <gi...@apache.org>.
captainzmc commented on pull request #2557:
URL: https://github.com/apache/ozone/pull/2557#issuecomment-906217012


   Agree with @szetszwo.
   At present the writing flow of Ozone streaming has working. And our client also had avoided buffer copy.(Although we are currently using star Topology).
   I thought we could do a simple performance test. Currently our implementation is very simple, which makes it easier to locate performance issues. We need to address performance issues first.
   @bshashikant 
   In addition, I will test the performance of streaming with checksum enabled and without checksum enabled. To verify checksum's impact on performance. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] szetszwo commented on pull request #2557: HDDS-5599. [Ozone-Streaming]drop BufferPool and ChunkBuffer to avoid buffer copying

Posted by GitBox <gi...@apache.org>.
szetszwo commented on pull request #2557:
URL: https://github.com/apache/ozone/pull/2557#issuecomment-906160813


   @bshashikant , @kaijchen , @captainzmc , I suggest that we first make the streaming write working (without retry, checksum, error handling, etc) so that we have something to test.  Then, we can add those features.  What do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] szetszwo commented on a change in pull request #2557: HDDS-5599. [Ozone-Streaming]drop BufferPool and ChunkBuffer to avoid buffer copying

Posted by GitBox <gi...@apache.org>.
szetszwo commented on a change in pull request #2557:
URL: https://github.com/apache/ozone/pull/2557#discussion_r693577605



##########
File path: hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/BlockDataStreamOutput.java
##########
@@ -501,38 +356,11 @@ private void watchForCommit(boolean bufferFull) throws IOException {
     return flushFuture;
   }
 
+  // TODO We are no longer using the bufferPool, so we do not need to flush the
+  //  data in it. We can discuss whether add other behaviors here.

Review comment:
       flush() should wait for all the futures to complete.  Move the code below 
   ```
   //executePutBlock(..)
      try {
         CompletableFuture.allOf(futures.toArray(EMPTY_FUTURE_ARRAY)).get();
       } catch (Exception e) {
         LOG.warn("Failed to write all chunks through stream: " + e);
         throw new IOException(e);
       }
   ```
   from executePutBlock(..) to flush().




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] captainzmc edited a comment on pull request #2557: HDDS-5599. [Ozone-Streaming]drop BufferPool and ChunkBuffer to avoid buffer copying

Posted by GitBox <gi...@apache.org>.
captainzmc edited a comment on pull request #2557:
URL: https://github.com/apache/ozone/pull/2557#issuecomment-906217012


   Agree with @szetszwo.
   At present the writing flow of Ozone streaming has working. And our client also had avoided buffer copy.(Although we are currently using star Topology).
   I think we can do a simple performance test now. Currently our implementation is very simple, which makes it easier to locate performance issues. We need to address performance issues first.
   @bshashikant 
   In addition, I will test the performance of streaming with checksum enabled and without checksum enabled. To verify checksum's impact on performance. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] captainzmc edited a comment on pull request #2557: HDDS-5599. [Ozone-Streaming]drop BufferPool and ChunkBuffer to avoid buffer copying

Posted by GitBox <gi...@apache.org>.
captainzmc edited a comment on pull request #2557:
URL: https://github.com/apache/ozone/pull/2557#issuecomment-902606684


   Hi @szetszwo @bshashikant @mukul1987 
   I've removed the BufferPool and ChunkBuffer from BlockDataStreamOutput in this PR. 
   Also, since BufferPool is removed, Retry and Flush policy will be different, and I think we can discuss their implementations in this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] szetszwo merged pull request #2557: HDDS-5599. [Ozone-Streaming]drop BufferPool and ChunkBuffer to avoid buffer copying

Posted by GitBox <gi...@apache.org>.
szetszwo merged pull request #2557:
URL: https://github.com/apache/ozone/pull/2557


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] captainzmc commented on pull request #2557: HDDS-5599. [Ozone-Streaming]drop BufferPool and ChunkBuffer to avoid buffer copying

Posted by GitBox <gi...@apache.org>.
captainzmc commented on pull request #2557:
URL: https://github.com/apache/ozone/pull/2557#issuecomment-905408342


   > +1, the change looks good.
   > 
   > There are some test failures. Could you take a look?
   
   [CI runs correctly](https://github.com/captainzmc/hadoop-ozone/runs/3420162405) on my personal branch. It seems the error is not caused by this PR.  I will try to re-trigger the CI to make sure this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org