You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "BELUGA BEHR (JIRA)" <ji...@apache.org> on 2018/03/14 15:14:39 UTC
[jira] [Commented] (HBASE-20197) Review of ByteBufferWriterOutputStream.java

    [ https://issues.apache.org/jira/browse/HBASE-20197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398726#comment-16398726 ] 

BELUGA BEHR commented on HBASE-20197:
-------------------------------------

Bench-marking on my hardware shows:

{code:java}
# Run complete. Total time: 00:26:59

Benchmark Mode Cnt Score Error Units
BenchmarkByteBufferOutputStream.testWriteByteBufferChunkingLarge thrpt 200 6374680.759 ± 308867.921 ops/s
BenchmarkByteBufferOutputStream.testWriteByteBufferChunkingSmall thrpt 200 39059426.368 ± 258920.534 ops/s
BenchmarkByteBufferOutputStream.testWriteByteBufferLarge thrpt 200 1808688.360 ± 6144.521 ops/s
BenchmarkByteBufferOutputStream.testWriteByteBufferSmall thrpt 200 47643302.832 ± 2087925.046 ops/s
{code}

{code}
package org.test;

import java.io.IOException;
import java.nio.ByteBuffer;

import org.apache.hadoop.hbase.io.ByteBufferWriterOutputStream;
import org.apache.hadoop.hbase.io.ByteBufferWriterOutputStreamChunking;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.util.NullOutputStream;

public class BenchmarkByteBufferOutputStream
{
  @State(Scope.Thread)
  public static class Data
  {
    NullOutputStream nullOutputStream = new NullOutputStream();
    ByteBufferWriterOutputStream bbwos = new ByteBufferWriterOutputStream(nullOutputStream);
    ByteBufferWriterOutputStreamChunking bbwosNio = new ByteBufferWriterOutputStreamChunking(nullOutputStream);

    ByteBuffer bbSmall;
    ByteBuffer bbLarge;

    @Setup(Level.Trial)
    public void setupTrial()
    {
      // Easily fits into the default 4K buffer
      bbSmall = ByteBuffer.wrap(new byte[512]);

      // Creates a buffer larger than 4k
      bbLarge = ByteBuffer.wrap(new byte[6144]);

      // This testing does not demonstrate heterogeneous input sizes 
    }
  }

  @Benchmark
  public void testWriteByteBufferSmall(Data data) throws IOException
  {
    data.bbwos.write(data.bbSmall, 0, 512);
  }

  @Benchmark
  public void testWriteByteBufferLarge(Data data) throws IOException
  {
    data.bbLarge.rewind();
    data.bbwos.write(data.bbLarge, 0, 6144);
  }

  @Benchmark
  public void testWriteByteBufferChunkingLarge(Data data) throws IOException
  {
    data.bbwosNio.write(data.bbLarge, 0, 6144);
  }

  @Benchmark
  public void testWriteByteBufferChunkingSmall(Data data) throws IOException
  {
    data.bbwosNio.write(data.bbSmall, 0, 512);
  }

  public static void main(String[] args) throws RunnerException, IOException
  {
    org.openjdk.jmh.Main.main(args);
  }
}
{code}

> Review of ByteBufferWriterOutputStream.java
> -------------------------------------------
>
>                 Key: HBASE-20197
>                 URL: https://issues.apache.org/jira/browse/HBASE-20197
>             Project: HBase
>          Issue Type: Improvement
>          Components: hbase
>    Affects Versions: 2.0.0, 1.4.2
>            Reporter: BELUGA BEHR
>            Priority: Minor
>         Attachments: HBASE-20197.1.patch
>
>
> In looking at this class, two things caught my eye.
>  # Default buffer size of 4K
>  # Re-sizing of buffer on demand
>  
> Java's {{BufferedOutputStream}} uses an internal buffer size of 8K on modern JVMs.  This is due to various bench-marking that showed optimal performance at this level.
>  The Re-sizing buffer looks a bit "unsafe":
>  
> {code:java}
> public void write(ByteBuffer b, int off, int len) throws IOException {
>   byte[] buf = null;
>   if (len > TEMP_BUF_LENGTH) {
>     buf = new byte[len];
>   } else {
>     if (this.tempBuf == null) {
>       this.tempBuf = new byte[TEMP_BUF_LENGTH];
>     }
>     buf = this.tempBuf;
>   }
> ...
> }
> {code}
> If this method gets one call with a 'len' of 4000, then 4001, then 4002, then 4003, etc. then the 'tempBuf' will be re-created many times.  Also, it seems unsafe to create a buffer as large as the 'len' input.  This could theoretically lead to an internal buffer of 2GB for each instance of this class.
> I propose:
>  # Increase the default buffer size to 8K
>  # Create the buffer once and chunk the output instead of loading data into a single array and writing it to the output stream.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)