You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/11/23 09:21:37 UTC

[GitHub] [ozone] sadanand48 opened a new pull request #2866: HDDS-5851. Define a PutBlock/maxBuffer fixed boundary for streaming writes.

sadanand48 opened a new pull request #2866:
URL: https://github.com/apache/ozone/pull/2866


   ## What changes were proposed in this pull request?
   Currently, putBlock call gets executed only during close() when a block is written. The aim here is to have a putBlock boundary similar to async path where the putBlock gets executed which updates the key length and help reduce some memory on client. The  default boundary here is set at 32 MB where a putBlock is executed.  Also a limit is set on the ByteBuffer size the client can send  here (4mb).
   
   ## What is the link to the Apache JIRA
   https://issues.apache.org/jira/browse/HDDS-5851
   
   ## How was this patch tested?
   Unit test
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bshashikant commented on a change in pull request #2866: HDDS-5851. Define a PutBlock/maxBuffer fixed boundary for streaming writes.

Posted by GitBox <gi...@apache.org>.
bshashikant commented on a change in pull request #2866:
URL: https://github.com/apache/ozone/pull/2866#discussion_r758030876



##########
File path: hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/OzoneClientConfig.java
##########
@@ -54,6 +54,21 @@
       tags = ConfigTag.CLIENT)
   private int streamBufferSize = 4 * 1024 * 1024;
 
+  @Config(key = "datastream.max.buffer.size",
+      defaultValue = "4MB",
+      type = ConfigType.SIZE,
+      description = "The maximum size of the ByteBuffer "

Review comment:
       Why do we need this max buffer size config here? if the total size of data in the buffer list equals "datastream.buffer.flush.size", it should be good enough. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] captainzmc commented on a change in pull request #2866: HDDS-5851. [Ozone-Streaming] Define a PutBlock/maxBuffer fixed boundary for streaming writes.

Posted by GitBox <gi...@apache.org>.
captainzmc commented on a change in pull request #2866:
URL: https://github.com/apache/ozone/pull/2866#discussion_r759234553



##########
File path: hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/BlockDataStreamOutput.java
##########
@@ -257,13 +257,29 @@ public void write(ByteBuffer b, int off, int len) throws IOException {
     if (len == 0) {
       return;
     }
+    int curLen = len;
+    // set limit on the number of bytes that a ByteBuffer(StreamBuffer) can hold
+    int maxBufferLen = config.getDataStreamMaxBufferSize();
+    while (curLen > 0) {
+      int writeLen = Math.min(curLen, maxBufferLen);
+      final StreamBuffer buf = new StreamBuffer(b, off, writeLen);
+      off += writeLen;
+      bufferList.add(buf);
+      writeChunkToContainer(buf.duplicate());
+      curLen -= writeLen;
+      writtenDataLength += writeLen;
+      doFlushIfNeeded();
+    }
+  }
 
-    final StreamBuffer buf = new StreamBuffer(b, off, len);
-    bufferList.add(buf);
-
-    writeChunkToContainer(buf.duplicate());
-
-    writtenDataLength += len;
+  private void doFlushIfNeeded() throws IOException {
+    Preconditions.checkArgument(config.getDataStreamBufferFlushSize() > config
+        .getDataStreamMaxBufferSize());
+    long boundary = config.getDataStreamBufferFlushSize() / config
+        .getDataStreamMaxBufferSize();
+    if (bufferList.size() % boundary == 0) {
+      executePutBlock(false, false);

Review comment:
       Thanks for explaining, make sense.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] captainzmc commented on a change in pull request #2866: HDDS-5851. Define a PutBlock/maxBuffer fixed boundary for streaming writes.

Posted by GitBox <gi...@apache.org>.
captainzmc commented on a change in pull request #2866:
URL: https://github.com/apache/ozone/pull/2866#discussion_r758971760



##########
File path: hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/BlockDataStreamOutput.java
##########
@@ -257,13 +257,29 @@ public void write(ByteBuffer b, int off, int len) throws IOException {
     if (len == 0) {
       return;
     }
+    int curLen = len;
+    // set limit on the number of bytes that a ByteBuffer(StreamBuffer) can hold
+    int maxBufferLen = config.getDataStreamMaxBufferSize();
+    while (curLen > 0) {
+      int writeLen = Math.min(curLen, maxBufferLen);
+      final StreamBuffer buf = new StreamBuffer(b, off, writeLen);
+      off += writeLen;
+      bufferList.add(buf);
+      writeChunkToContainer(buf.duplicate());
+      curLen -= writeLen;
+      writtenDataLength += writeLen;
+      doFlushIfNeeded();
+    }
+  }
 
-    final StreamBuffer buf = new StreamBuffer(b, off, len);
-    bufferList.add(buf);
-
-    writeChunkToContainer(buf.duplicate());
-
-    writtenDataLength += len;
+  private void doFlushIfNeeded() throws IOException {
+    Preconditions.checkArgument(config.getDataStreamBufferFlushSize() > config
+        .getDataStreamMaxBufferSize());
+    long boundary = config.getDataStreamBufferFlushSize() / config
+        .getDataStreamMaxBufferSize();
+    if (bufferList.size() % boundary == 0) {
+      executePutBlock(false, false);

Review comment:
       Did we need add updateFlushLength(), waitOnFlushFutures() and watchForCommit(false) here? Just as what we did in [handleFlush](https://github.com/apache/ozone/blob/HDDS-4454/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/BlockDataStreamOutput.java#L435).  Currently we only releaseBuffers after watchForCommit.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sadanand48 commented on a change in pull request #2866: HDDS-5851. Define a PutBlock/maxBuffer fixed boundary for streaming writes.

Posted by GitBox <gi...@apache.org>.
sadanand48 commented on a change in pull request #2866:
URL: https://github.com/apache/ozone/pull/2866#discussion_r759073315



##########
File path: hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/BlockDataStreamOutput.java
##########
@@ -257,13 +257,29 @@ public void write(ByteBuffer b, int off, int len) throws IOException {
     if (len == 0) {
       return;
     }
+    int curLen = len;
+    // set limit on the number of bytes that a ByteBuffer(StreamBuffer) can hold
+    int maxBufferLen = config.getDataStreamMaxBufferSize();
+    while (curLen > 0) {
+      int writeLen = Math.min(curLen, maxBufferLen);
+      final StreamBuffer buf = new StreamBuffer(b, off, writeLen);
+      off += writeLen;
+      bufferList.add(buf);
+      writeChunkToContainer(buf.duplicate());
+      curLen -= writeLen;
+      writtenDataLength += writeLen;
+      doFlushIfNeeded();
+    }
+  }
 
-    final StreamBuffer buf = new StreamBuffer(b, off, len);
-    bufferList.add(buf);
-
-    writeChunkToContainer(buf.duplicate());
-
-    writtenDataLength += len;
+  private void doFlushIfNeeded() throws IOException {
+    Preconditions.checkArgument(config.getDataStreamBufferFlushSize() > config
+        .getDataStreamMaxBufferSize());
+    long boundary = config.getDataStreamBufferFlushSize() / config
+        .getDataStreamMaxBufferSize();
+    if (bufferList.size() % boundary == 0) {
+      executePutBlock(false, false);

Review comment:
       Thanks @captainzmc for the comment. The plan here is to execute a putBlock at 32MB and not block the client right away by waiting for the flush futures and calling watchForCommit.  We will introduce another boundary say at 64MB to  check whether atleast 32MB is flushed and committed to all datanodes i.e watchForCommit() and waitForFlushFutures()  will occur at 64mb. (I will add this is in a different patch after this PR)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sadanand48 commented on a change in pull request #2866: HDDS-5851. Define a PutBlock/maxBuffer fixed boundary for streaming writes.

Posted by GitBox <gi...@apache.org>.
sadanand48 commented on a change in pull request #2866:
URL: https://github.com/apache/ozone/pull/2866#discussion_r759080282



##########
File path: hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/OzoneClientConfig.java
##########
@@ -54,6 +54,21 @@
       tags = ConfigTag.CLIENT)
   private int streamBufferSize = 4 * 1024 * 1024;
 
+  @Config(key = "datastream.max.buffer.size",
+      defaultValue = "4MB",
+      type = ConfigType.SIZE,
+      description = "The maximum size of the ByteBuffer "

Review comment:
       @bshashikant Currently there is no limit on the size of a ByteBuffer a client can send during write , this property would solve that and also this helps in calculating the putblock boundary effectively (putblock would occur for every 8 byteBuffers).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sadanand48 commented on a change in pull request #2866: HDDS-5851. [Ozone-Streaming] Define a PutBlock/maxBuffer fixed boundary for streaming writes.

Posted by GitBox <gi...@apache.org>.
sadanand48 commented on a change in pull request #2866:
URL: https://github.com/apache/ozone/pull/2866#discussion_r759106473



##########
File path: hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/OzoneClientConfig.java
##########
@@ -54,6 +54,21 @@
       tags = ConfigTag.CLIENT)
   private int streamBufferSize = 4 * 1024 * 1024;
 
+  @Config(key = "datastream.max.buffer.size",
+      defaultValue = "4MB",
+      type = ConfigType.SIZE,
+      description = "The maximum size of the ByteBuffer "
+          + "(used via ratis streaming)",
+      tags = ConfigTag.CLIENT)
+  private int dataStreamMaxBufferSize = 4 * 1024 * 1024;
+
+  @Config(key = "datastream.buffer.flush.size",
+      defaultValue = "32MB",
+      type = ConfigType.SIZE,
+      description = "The boundary at which putBlock is executed",
+      tags = ConfigTag.CLIENT)
+  private long dataStreamBufferFlushSize = 32 * 1024 * 1024;
+
   @Config(key = "stream.buffer.increment",

Review comment:
       Done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bshashikant commented on pull request #2866: HDDS-5851. [Ozone-Streaming] Define a PutBlock/maxBuffer fixed boundary for streaming writes.

Posted by GitBox <gi...@apache.org>.
bshashikant commented on pull request #2866:
URL: https://github.com/apache/ozone/pull/2866#issuecomment-983303756


   Thanks @sadanand48 for the contribution.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bshashikant commented on a change in pull request #2866: HDDS-5851. [Ozone-Streaming] Define a PutBlock/maxBuffer fixed boundary for streaming writes.

Posted by GitBox <gi...@apache.org>.
bshashikant commented on a change in pull request #2866:
URL: https://github.com/apache/ozone/pull/2866#discussion_r759101541



##########
File path: hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/OzoneClientConfig.java
##########
@@ -54,6 +54,21 @@
       tags = ConfigTag.CLIENT)
   private int streamBufferSize = 4 * 1024 * 1024;
 
+  @Config(key = "datastream.max.buffer.size",
+      defaultValue = "4MB",
+      type = ConfigType.SIZE,
+      description = "The maximum size of the ByteBuffer "

Review comment:
       I think, this is a good config to have. Thanks for the clarification.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bshashikant merged pull request #2866: HDDS-5851. [Ozone-Streaming] Define a PutBlock/maxBuffer fixed boundary for streaming writes.

Posted by GitBox <gi...@apache.org>.
bshashikant merged pull request #2866:
URL: https://github.com/apache/ozone/pull/2866


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bshashikant commented on a change in pull request #2866: HDDS-5851. [Ozone-Streaming] Define a PutBlock/maxBuffer fixed boundary for streaming writes.

Posted by GitBox <gi...@apache.org>.
bshashikant commented on a change in pull request #2866:
URL: https://github.com/apache/ozone/pull/2866#discussion_r759101943



##########
File path: hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/OzoneClientConfig.java
##########
@@ -54,6 +54,21 @@
       tags = ConfigTag.CLIENT)
   private int streamBufferSize = 4 * 1024 * 1024;
 
+  @Config(key = "datastream.max.buffer.size",
+      defaultValue = "4MB",
+      type = ConfigType.SIZE,
+      description = "The maximum size of the ByteBuffer "
+          + "(used via ratis streaming)",
+      tags = ConfigTag.CLIENT)
+  private int dataStreamMaxBufferSize = 4 * 1024 * 1024;
+
+  @Config(key = "datastream.buffer.flush.size",
+      defaultValue = "32MB",
+      type = ConfigType.SIZE,
+      description = "The boundary at which putBlock is executed",
+      tags = ConfigTag.CLIENT)
+  private long dataStreamBufferFlushSize = 32 * 1024 * 1024;
+
   @Config(key = "stream.buffer.increment",

Review comment:
       let's change this config value to 16 Mb as it is for async path.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org