Posted to dev@avro.apache.org by "David Mollitor (Jira)" <ji...@apache.org> on 2021/08/04 17:16:00 UTC

[jira] [Created] (AVRO-3183) Do Not Double Buffer Data in DataFileWriter

David Mollitor created AVRO-3183:
------------------------------------

             Summary: Do Not Double Buffer Data in DataFileWriter
                 Key: AVRO-3183
                 URL: https://issues.apache.org/jira/browse/AVRO-3183
             Project: Apache Avro
          Issue Type: Improvement
          Components: java
    Affects Versions: 1.10.0
            Reporter: David Mollitor
            Assignee: David Mollitor


{code:java|title=DataFileWriter.java}
  private void init(OutputStream outs) throws IOException {
    this.underlyingStream = outs;
    this.out = new BufferedFileOutputStream(outs);
    EncoderFactory efactory = new EncoderFactory();
    // binaryEncoder returns a buffered Encoder and is wrapping a BufferedFileOutputStream
    this.vout = efactory.binaryEncoder(out, null);
    dout.setSchema(schema);
    buffer = new NonCopyingByteArrayOutputStream(Math.min((int) (syncInterval * 1.25), Integer.MAX_VALUE / 2 - 1));
    // binaryEncoder returns a buffered Encoder and is wrapping a NonCopyingByteArrayOutputStream
    this.bufOut = efactory.binaryEncoder(buffer, null);
    if (this.codec == null) {
      this.codec = CodecFactory.nullCodec().createInstance();
    }
    this.isOpen = true;
  }
{code}

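{{DataFileWriter}} double-buffers its output: {{init}} wraps the stream in a {{BufferedFileOutputStream}} and then passes it to {{binaryEncoder}}, which returns an encoder with its own internal buffer. The second buffer only adds redundant overhead. The buffering in the encoder returned by {{binaryEncoder}} is also fairly simplistic and does not do as good a job as the buffering in {{BufferedFileOutputStream}}.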

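Remove this double buffering by using a 'direct' binary encoder (for example, the one returned by {{EncoderFactory.directBinaryEncoder}}), which writes straight through to the already-buffered stream.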
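
To illustrate the idea without depending on Avro, here is a minimal, stdlib-only sketch. The classes {{BufferedSketchEncoder}} and {{DirectSketchEncoder}} are hypothetical stand-ins, not Avro's actual encoders. They only show that an encoder with a private buffer, placed on top of an already-buffered stream, adds an extra copy stage while producing the same bytes as an encoder that writes straight through.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class DoubleBufferSketch {

  /** Hypothetical stand-in for an encoder that keeps its own internal buffer. */
  static final class BufferedSketchEncoder {
    private final OutputStream out;
    private final byte[] buf = new byte[2048]; // first buffer (inside the encoder)
    private int pos;

    BufferedSketchEncoder(OutputStream out) { this.out = out; }

    void writeInt(int n) throws IOException {
      if (buf.length - pos < 5) flushBuffer(); // a varint-encoded int needs at most 5 bytes
      pos = encodeInt(n, buf, pos);
    }

    void flush() throws IOException { flushBuffer(); out.flush(); }

    private void flushBuffer() throws IOException {
      out.write(buf, 0, pos); // copied again into the stream's own buffer
      pos = 0;
    }
  }

  /** Hypothetical stand-in for a 'direct' encoder: no buffer beyond a tiny scratch array. */
  static final class DirectSketchEncoder {
    private final OutputStream out;
    private final byte[] scratch = new byte[5];

    DirectSketchEncoder(OutputStream out) { this.out = out; }

    void writeInt(int n) throws IOException {
      int len = encodeInt(n, scratch, 0);
      out.write(scratch, 0, len); // the wrapped stream does all the buffering
    }

    void flush() throws IOException { out.flush(); }
  }

  /** Zig-zag, variable-length int encoding; returns the new write position. */
  static int encodeInt(int n, byte[] dst, int pos) {
    n = (n << 1) ^ (n >> 31);
    while ((n & ~0x7F) != 0) {
      dst[pos++] = (byte) ((n & 0x7F) | 0x80);
      n >>>= 7;
    }
    dst[pos++] = (byte) n;
    return pos;
  }

  /** Double buffered: encoder buffer on top of a BufferedOutputStream. */
  static byte[] writeBuffered(int count) {
    try {
      ByteArrayOutputStream sink = new ByteArrayOutputStream();
      BufferedSketchEncoder enc = new BufferedSketchEncoder(new BufferedOutputStream(sink));
      for (int i = 0; i < count; i++) enc.writeInt(i);
      enc.flush();
      return sink.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Single buffered: direct encoder on top of a BufferedOutputStream. */
  static byte[] writeDirect(int count) {
    try {
      ByteArrayOutputStream sink = new ByteArrayOutputStream();
      DirectSketchEncoder enc = new DirectSketchEncoder(new BufferedOutputStream(sink));
      for (int i = 0; i < count; i++) enc.writeInt(i);
      enc.flush();
      return sink.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] a = writeBuffered(10_000);
    byte[] b = writeDirect(10_000);
    System.out.println("identical output: " + Arrays.equals(a, b));
  }
}
```

In {{DataFileWriter.init}}, the analogous change would presumably be to build {{vout}} with {{efactory.directBinaryEncoder(out, null)}} instead of {{efactory.binaryEncoder(out, null)}}. Whether {{bufOut}} should change as well is not stated above and is left open here.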



--
This message was sent by Atlassian Jira
(v8.3.4#803005)