You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "BELUGA BEHR (JIRA)" <ji...@apache.org> on 2018/03/15 20:10:00 UTC

[jira] [Comment Edited] (HBASE-20197) Review of ByteBufferWriterOutputStream.java

    [ https://issues.apache.org/jira/browse/HBASE-20197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16401040#comment-16401040 ] 

BELUGA BEHR edited comment on HBASE-20197 at 3/15/18 8:09 PM:
--------------------------------------------------------------

[~anoop.hbase] Thanks for the feedback.  I made the change in line with [KISS|https://en.wikipedia.org/wiki/KISS_principle].  The BBUtil code is doing an arrayCopy, the ByteBuffer is doing an arrayCopy... it just didn't feel necessary to include an extra module just to save the 'duplicate' call.  As a bonus, the side benefit of this change is to future-proof the code, as introduced by my failure test.  Thank you for your consideration.

The failed tests look like timeouts, unrelated to this change.


was (Author: belugabehr):
[~anoop.hbase] Thanks for the feedback.  I made the change in line with [KISS|https://en.wikipedia.org/wiki/KISS_principle].  The BBUtil code is doing an arrayCopy, the ByteBuffer is doing an arrayCopy... it just didn't feel necessary to include an extra module just for the heck of it and it's a one-line change.  As a bonus, the side benefit of this change is to future-proof the code, as introduced by my failure test.  Thank you for your consideration.

The failed tests look like timeouts, unrelated to this change.

> Review of ByteBufferWriterOutputStream.java
> -------------------------------------------
>
>                 Key: HBASE-20197
>                 URL: https://issues.apache.org/jira/browse/HBASE-20197
>             Project: HBase
>          Issue Type: Improvement
>          Components: hbase
>    Affects Versions: 2.0.0
>            Reporter: BELUGA BEHR
>            Assignee: BELUGA BEHR
>            Priority: Minor
>         Attachments: HBASE-20197.1.patch, HBASE-20197.2.patch, HBASE-20197.3.patch, HBASE-20197.4.patch
>
>
> In looking at this class, two things caught my eye.
>  # Default buffer size of 4K
>  # Re-sizing of buffer on demand
>  
> Java's {{BufferedOutputStream}} uses an internal buffer size of 8K on modern JVMs.  This is due to various bench-marking that showed optimal performance at this level.
>  The Re-sizing buffer looks a bit "unsafe":
>  
> {code:java}
> public void write(ByteBuffer b, int off, int len) throws IOException {
>   byte[] buf = null;
>   if (len > TEMP_BUF_LENGTH) {
>     buf = new byte[len];
>   } else {
>     if (this.tempBuf == null) {
>       this.tempBuf = new byte[TEMP_BUF_LENGTH];
>     }
>     buf = this.tempBuf;
>   }
> ...
> }
> {code}
> If this method gets one call with a 'len' of 4000, then 4001, then 4002, then 4003, etc. then the 'tempBuf' will be re-created many times.  Also, it seems unsafe to create a buffer as large as the 'len' input.  This could theoretically lead to an internal buffer of 2GB for each instance of this class.
> I propose:
>  # Increase the default buffer size to 8K
>  # Create the buffer once and chunk the output instead of loading data into a single array and writing it to the output stream.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)