You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2020/09/01 14:24:24 UTC

[GitHub] [hadoop-ozone] elek opened a new pull request #1374: HDDS-4185. Remove IncrementalByteBuffer from Ozone client

elek opened a new pull request #1374:
URL: https://github.com/apache/hadoop-ozone/pull/1374


   ## What changes were proposed in this pull request?
   
   During the teragen test it was identified that the IncrementalByteBuffer is one of the biggest bottlenecks.
   
   In the PR of HDDS-4119 (#1336) a long conversation has been started if it can be removed, or we need other solution to optimize.
   
   This jira is opened to continue the discussion and either remove or optimize the IncrementalByteByffer.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-4185
   
   ## How was this patch tested?
   
   Checked with basic smoke test locally, and with full green CI on my branch.
   
   And tested with executing related, existing unit tests.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org

[GitHub] [hadoop-ozone] elek closed pull request #1374: HDDS-4185. Remove IncrementalByteBuffer from Ozone client

Posted by GitBox <gi...@apache.org>.

elek closed pull request #1374:
URL: https://github.com/apache/hadoop-ozone/pull/1374


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org

[GitHub] [hadoop-ozone] elek commented on pull request #1374: HDDS-4185. Remove IncrementalByteBuffer from Ozone client

Posted by GitBox <gi...@apache.org>.

elek commented on pull request #1374:
URL: https://github.com/apache/hadoop-ozone/pull/1374#issuecomment-688063954


   @bshashikant Do you have any feedback?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org

[GitHub] [hadoop-ozone] elek closed pull request #1374: HDDS-4185. Remove IncrementalByteBuffer from Ozone client

Posted by GitBox <gi...@apache.org>.

elek closed pull request #1374:
URL: https://github.com/apache/hadoop-ozone/pull/1374


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org

[GitHub] [hadoop-ozone] bshashikant commented on pull request #1374: HDDS-4185. Remove IncrementalByteBuffer from Ozone client

Posted by GitBox <gi...@apache.org>.

bshashikant commented on pull request #1374:
URL: https://github.com/apache/hadoop-ozone/pull/1374#issuecomment-688760210


   > @bshashikant Do you have any feedback?
   
   Thanks @elek for putting up the patch together. The patch looks good to me in general. I am just thinking of a scenario like this:
   
   1) A stream(KeyOutputStream) is open and write is initiated
   2) Let's say "write(stream)" is invoked multiple times on the stream each time with length of 100 bytes or such in multiple iterations.
   3) In this case, key size overall can be greater than a block but key is written in segments of let's say 100 bytes
   4) Stream is closed
   
   In cases like this, we will end up having too many small buffers (100 bytes in this cases) created (with each write call).
   Buffer pool is reused for multiple blocks of a key so the similar patten will be followed for all the blocks.
   
   Although, the overall key is larger than size, this pattern of write still will lead to allocation of very small buffers which is not desirable for performance as pointed by the perf tests.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org

[GitHub] [hadoop-ozone] elek commented on pull request #1374: HDDS-4185. Remove IncrementalByteBuffer from Ozone client

Posted by GitBox <gi...@apache.org>.

elek commented on pull request #1374:
URL: https://github.com/apache/hadoop-ozone/pull/1374#issuecomment-694705145


   Thanks the feedback @bshashikant 
   
   I am not sure if I understood the example. Based on my understanding bufferPool is block specific and not key specific. Writing keys are independent, and full blocks (and pool) are supposed to used for almost all the blocks.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org

[GitHub] [hadoop-ozone] elek commented on pull request #1374: HDDS-4185. Remove IncrementalByteBuffer from Ozone client

Posted by GitBox <gi...@apache.org>.

elek commented on pull request #1374:
URL: https://github.com/apache/hadoop-ozone/pull/1374#issuecomment-696169868


   Had an offline call with @bshashikant 
   
   This patch couldn't work with keySize=0 (where key size is unknown).
   
   The proposed solution is to adjust the size of increment to the size of the buffers. By default, we won't use the IncrementalByteBuffer, but the option will be kept with additional configuration...
   
   Closing this PR and opening a new one...


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org

[GitHub] [hadoop-ozone] bshashikant commented on pull request #1374: HDDS-4185. Remove IncrementalByteBuffer from Ozone client

Posted by GitBox <gi...@apache.org>.

bshashikant commented on pull request #1374:
URL: https://github.com/apache/hadoop-ozone/pull/1374#issuecomment-694769935


   > Thanks the feedback @bshashikant
   > 
   > I am not sure if I understood the example. Based on my understanding bufferPool is block specific and not key specific. Writing keys are independent, and full blocks (and pool) are supposed to used for almost all the blocks.
   
   If u see BufferPool, the initilization happens in BlockOutputStreamEntryPool which in turn gets initialised in KeyoutputStream.
   The idea of maintaining the pool was itself to reuse the same for every block in a key. For example, for 1st block of a key, if we allocate say, 4MB buffers for 256 MB blocks, we will reuse the same buffers (without any new allocation) for next block of the key.... BufferPool is shared across multiple blocks in a key.
   
   To the second point, the buffer allocation with the patch is driven by the len passed in write call write(b, off, len) or write(b).
   Let's say an application starts a stream for write and calls write() api with 1KB worth of data in each write call, repeating the same pattern again and again till it closes the stream, where the total length of data written let's say 1GB. In such pattern of writes, we will end up allocating 1KB buffers every write() gets called.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org

[GitHub] [hadoop-ozone] elek commented on pull request #1374: HDDS-4185. Remove IncrementalByteBuffer from Ozone client

Posted by GitBox <gi...@apache.org>.

elek commented on pull request #1374:
URL: https://github.com/apache/hadoop-ozone/pull/1374#issuecomment-684895095


   The key line is this:
   
   ```
   int effectiveBufferSize = Math.min(bufferSize, maxSize - bufferList.size() * bufferSize);
   ```
   
   If the block/key is smaller than the `BuffferPool` (16 * 4MB) we allocate only the required bytes.(last buffer in the pool is reduced by this `effectiveBufferSize`).
   
   If block size is bigger than the `BufferPool`, we need to allocate the full `BufferPool` anyway.
   
   cc @bshashikant 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org

[GitHub] [hadoop-ozone] elek commented on pull request #1374: HDDS-4185. Remove IncrementalByteBuffer from Ozone client

Posted by GitBox <gi...@apache.org>.

elek commented on pull request #1374:
URL: https://github.com/apache/hadoop-ozone/pull/1374#issuecomment-696169868


   Had an offline call with @bshashikant 
   
   This patch couldn't work with keySize=0 (where key size is unknown).
   
   The proposed solution is to adjust the size of increment to the size of the buffers. By default, we won't use the IncrementalByteBuffer, but the option will be kept with additional configuration...
   
   Closing this PR and opening a new one...


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org