You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/03/18 23:47:54 UTC

[GitHub] [ozone] hanishakoneru opened a new pull request #2062: HDDS-4553. ChunkInputStream should release buffer as soon as last byte in the buffer is read

hanishakoneru opened a new pull request #2062:
URL: https://github.com/apache/ozone/pull/2062


   ## What changes were proposed in this pull request?
   
   We currently wait for reading up till the Chunk EOF before releasing the buffers in ChunkInputStream (or when the stream is closed). Let's say a client reads first 3 MB of a chunk  of size 4MB and does not close the stream immediately. This would lead to the 3MB of data being cached in the ChunkInputStream buffers till the stream is closed.
   
   After HDDS-4552, a chunk read from DN would return the chunk data as an array of ByteBuffers. After each ByteBuffer is read, it can be released. This would greatly help with optimizing memory usage of ChunkInputStream.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-4553
   
   ## How was this patch tested?
   
   Added unit tests


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] umamaheswararao commented on a change in pull request #2062: HDDS-4553. ChunkInputStream should release buffer as soon as last byte in the buffer is read

Posted by GitBox <gi...@apache.org>.
umamaheswararao commented on a change in pull request #2062:
URL: https://github.com/apache/ozone/pull/2062#discussion_r605199652



##########
File path: hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/ChunkInputStream.java
##########
@@ -628,12 +645,38 @@ private boolean chunkStreamEOF() {
     }
   }
 
+
+  /**
+   * Release a buffers upto the given index.

Review comment:
       Tiny nit: make sure to change it before commit
   Release a buffers --> Release the buffers




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru commented on pull request #2062: HDDS-4553. ChunkInputStream should release buffer as soon as last byte in the buffer is read

Posted by GitBox <gi...@apache.org>.
hanishakoneru commented on pull request #2062:
URL: https://github.com/apache/ozone/pull/2062#issuecomment-803195652


   @adoroszlai, I fixed the review comments you posted in HDDS-4552 here. Please take a look. Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] umamaheswararao commented on a change in pull request #2062: HDDS-4553. ChunkInputStream should release buffer as soon as last byte in the buffer is read

Posted by GitBox <gi...@apache.org>.
umamaheswararao commented on a change in pull request #2062:
URL: https://github.com/apache/ozone/pull/2062#discussion_r600829152



##########
File path: hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/ChunkInputStream.java
##########
@@ -628,12 +644,40 @@ private boolean chunkStreamEOF() {
     }
   }
 
+
+  /**
+   * Release a buffer.

Review comment:
       Probably we can refine this doc a bit? This method releases "buffers" till the given index position . 

##########
File path: hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/ChunkInputStream.java
##########
@@ -89,6 +88,9 @@
   // of chunk data
   private long bufferOffsetWrtChunkData;
 
+  // Index of the first buffer which has not been released
+  private int minBufferIndex = 0;

Review comment:
       I am little bit confused with this. Can we think better name? unreleasedFirstBuffIdx or so ?

##########
File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/interfaces/ChunkManager.java
##########
@@ -101,4 +103,33 @@ default void finishWriteChunks(KeyValueContainer kvContainer,
       BlockData blockData) throws IOException {
     // no-op
   }
+
+  default long getBufferCapacityForChunkRead(ChunkInfo chunkInfo,

Review comment:
       this can be static method right?

##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/read/TestChunkInputStream.java
##########
@@ -56,65 +55,151 @@ public void testChunkReadBuffers() throws Exception {
     // To read 1 byte of chunk data, ChunkInputStream should get one full
     // checksum boundary worth of data from Container and store it in buffers.
     chunk0Stream.read(new byte[1]);
-    checkBufferSizeAndCapacity(chunk0Stream.getCachedBuffers(), 1,
+    checkBufferSizeAndCapacity(chunk0Stream.getCachedBuffers(), 1, 0,
         BYTES_PER_CHECKSUM);
 
     // Read > checksum boundary of data from chunk0
     int readDataLen = BYTES_PER_CHECKSUM + (BYTES_PER_CHECKSUM / 2);
-    byte[] readData = readDataFromChunk(chunk0Stream, readDataLen, 0);
+    byte[] readData = readDataFromChunk(chunk0Stream, 0, readDataLen);
     validateData(inputData, 0, readData);
 
     // The first checksum boundary size of data was already existing in the
     // ChunkStream buffers. Once that data is read, the next checksum
     // boundary size of data will be fetched again to read the remaining data.
     // Hence there should be 1 checksum boundary size of data stored in the
     // ChunkStreams buffers at the end of the read.
-    checkBufferSizeAndCapacity(chunk0Stream.getCachedBuffers(), 1,
+    checkBufferSizeAndCapacity(chunk0Stream.getCachedBuffers(), 1, 0,
         BYTES_PER_CHECKSUM);
 
     // Seek to a position in the third checksum boundary (so that current
     // buffers do not have the seeked position) and read > BYTES_PER_CHECKSUM
     // bytes of data. This should result in 2 * BYTES_PER_CHECKSUM amount of
-    // data being read into the buffers. There should be 2 buffers each with
-    // BYTES_PER_CHECKSUM capacity.
+    // data being read into the buffers. There should be 2 buffers in the
+    // stream but the the first buffer should be released after it is read
+    // and the second buffer should have BYTES_PER_CHECKSUM capacity.
     readDataLen = BYTES_PER_CHECKSUM + (BYTES_PER_CHECKSUM / 2);
     int offset = 2 * BYTES_PER_CHECKSUM + 1;
-    readData = readDataFromChunk(chunk0Stream, readDataLen, offset);
+    readData = readDataFromChunk(chunk0Stream, offset, readDataLen);
     validateData(inputData, offset, readData);
-    checkBufferSizeAndCapacity(chunk0Stream.getCachedBuffers(), 2,
+    checkBufferSizeAndCapacity(chunk0Stream.getCachedBuffers(), 2, 1,
         BYTES_PER_CHECKSUM);
 
 
-    // Read the full chunk data -1 and verify that all chunk data is read into
-    // buffers. We read CHUNK_SIZE - 1 as otherwise the buffers will be
+    // Read the full chunk data - 1 and verify that all chunk data is read into
+    // buffers. We read CHUNK_SIZE - 1 as otherwise all the buffers will be
     // released once all chunk data is read.
-    readData = readDataFromChunk(chunk0Stream, CHUNK_SIZE - 1, 0);
+    readData = readDataFromChunk(chunk0Stream, 0, CHUNK_SIZE - 1);
     validateData(inputData, 0, readData);
+    int expectedNumBuffers = CHUNK_SIZE / BYTES_PER_CHECKSUM;
     checkBufferSizeAndCapacity(chunk0Stream.getCachedBuffers(),
-        CHUNK_SIZE / BYTES_PER_CHECKSUM, BYTES_PER_CHECKSUM);
+        expectedNumBuffers, expectedNumBuffers - 1, BYTES_PER_CHECKSUM);
 
     // Read the last byte of chunk and verify that the buffers are released.
     chunk0Stream.read(new byte[1]);
     Assert.assertNull("ChunkInputStream did not release buffers after " +
         "reaching EOF.", chunk0Stream.getCachedBuffers());
+  }
+
+  /**
+   * Test that ChunkInputStream buffers are released as soon as the last byte
+   * of the buffer is read.
+   */
+  @Test
+  public void testBufferRelease() throws Exception {
+    String keyName = getNewKeyName();
+    int dataLength = CHUNK_SIZE;
+    byte[] inputData = writeRandomBytes(keyName, dataLength);
+
+    KeyInputStream keyInputStream = getKeyInputStream(keyName);

Review comment:
       You may want to use try(KeyInputStream keyInputStream = getKeyInputStream(keyName)){... }
   So, that stream will be autoclosed
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] umamaheswararao commented on pull request #2062: HDDS-4553. ChunkInputStream should release buffer as soon as last byte in the buffer is read

Posted by GitBox <gi...@apache.org>.
umamaheswararao commented on pull request #2062:
URL: https://github.com/apache/ozone/pull/2062#issuecomment-806152370


   @hanishakoneru Thank you for the patch. I have posted few minor comments. Please check when you get time. Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on pull request #2062: HDDS-4553. ChunkInputStream should release buffer as soon as last byte in the buffer is read

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on pull request #2062:
URL: https://github.com/apache/ozone/pull/2062#issuecomment-804072868


   > I fixed the review comments you posted in [HDDS-4552](https://issues.apache.org/jira/browse/HDDS-4552) [here](https://github.com/apache/ozone/pull/2062/commits/42f29eba3955a5aef36c26d1b17473d78abc402b)
   
   Thanks.  That part looks good.  I'll check the rest later.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru commented on pull request #2062: HDDS-4553. ChunkInputStream should release buffer as soon as last byte in the buffer is read

Posted by GitBox <gi...@apache.org>.
hanishakoneru commented on pull request #2062:
URL: https://github.com/apache/ozone/pull/2062#issuecomment-812052863


   Thank you @umamaheswararao and @bshashikant for the reviews.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru edited a comment on pull request #2062: HDDS-4553. ChunkInputStream should release buffer as soon as last byte in the buffer is read

Posted by GitBox <gi...@apache.org>.
hanishakoneru edited a comment on pull request #2062:
URL: https://github.com/apache/ozone/pull/2062#issuecomment-803195652


   @adoroszlai, I fixed the review comments you posted in HDDS-4552 [here](https://github.com/apache/ozone/pull/2062/commits/42f29eba3955a5aef36c26d1b17473d78abc402b). Please take a look. Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru merged pull request #2062: HDDS-4553. ChunkInputStream should release buffer as soon as last byte in the buffer is read

Posted by GitBox <gi...@apache.org>.
hanishakoneru merged pull request #2062:
URL: https://github.com/apache/ozone/pull/2062


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru commented on pull request #2062: HDDS-4553. ChunkInputStream should release buffer as soon as last byte in the buffer is read

Posted by GitBox <gi...@apache.org>.
hanishakoneru commented on pull request #2062:
URL: https://github.com/apache/ozone/pull/2062#issuecomment-809640751


   Thank you @umamaheswararao for the review. I have addressed your comments in the latest commit. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org