Posted to dev@avro.apache.org by "David Mollitor (Jira)" <ji...@apache.org> on 2021/08/04 17:16:00 UTC
[jira] [Created] (AVRO-3183) Do Not Double Buffer Data in DataFileWriter
David Mollitor created AVRO-3183:
------------------------------------
Summary: Do Not Double Buffer Data in DataFileWriter
Key: AVRO-3183
URL: https://issues.apache.org/jira/browse/AVRO-3183
Project: Apache Avro
Issue Type: Improvement
Components: java
Affects Versions: 1.10.0
Reporter: David Mollitor
Assignee: David Mollitor
{code:java|title=DataFileWriter.java}
private void init(OutputStream outs) throws IOException {
  this.underlyingStream = outs;
  this.out = new BufferedFileOutputStream(outs);
  EncoderFactory efactory = new EncoderFactory();
  // binaryEncoder returns a buffered Encoder and is wrapping a BufferedFileOutputStream
  this.vout = efactory.binaryEncoder(out, null);
  dout.setSchema(schema);
  buffer = new NonCopyingByteArrayOutputStream(Math.min((int) (syncInterval * 1.25), Integer.MAX_VALUE / 2 - 1));
  // binaryEncoder returns a buffered Encoder and is wrapping a NonCopyingByteArrayOutputStream
  this.bufOut = efactory.binaryEncoder(buffer, null);
  if (this.codec == null) {
    this.codec = CodecFactory.nullCodec().createInstance();
  }
  this.isOpen = true;
}
{code}
The {{DataFileWriter}} double-buffers its output: the encoder returned by {{binaryEncoder}} keeps its own buffer, and the stream it wraps is already buffered. This adds redundant overhead. The encoder's buffering is also fairly simplistic and does not do as good a job as the buffering in {{BufferedFileOutputStream}}.
Remove this double buffering by using a 'direct' {{binaryEncoder}}.
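To illustrate the overhead, here is a minimal JDK-only sketch, not Avro code. {{StagingEncoder}}, {{DirectEncoder}} and the helper methods are hypothetical names invented for this example, and {{java.io.BufferedOutputStream}} stands in for {{BufferedFileOutputStream}}. It shows that an encoder with a private buffer copies every byte once more before the byte reaches the already-buffered stream, while a direct encoder produces the same output without that extra copy. In Avro itself, the unbuffered variant would presumably come from {{EncoderFactory.directBinaryEncoder}}.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class DoubleBufferSketch {

  /** Hypothetical encoder with a private staging buffer (the "buffered" style). */
  static final class StagingEncoder {
    private final OutputStream out;
    private final byte[] staging = new byte[64];
    private int pos = 0;
    long stagedBytes = 0; // bytes copied into the private buffer before reaching 'out'

    StagingEncoder(OutputStream out) {
      this.out = out;
    }

    void writeByte(int b) throws IOException {
      if (pos == staging.length) {
        drain();
      }
      staging[pos++] = (byte) b; // first copy: into the encoder's own buffer
      stagedBytes++;
    }

    void flush() throws IOException {
      drain();
      out.flush();
    }

    private void drain() throws IOException {
      out.write(staging, 0, pos); // second copy: into the wrapped stream's buffer
      pos = 0;
    }
  }

  /** Hypothetical encoder without its own buffer (the "direct" style). */
  static final class DirectEncoder {
    private final OutputStream out;

    DirectEncoder(OutputStream out) {
      this.out = out;
    }

    void writeByte(int b) throws IOException {
      out.write(b); // the wrapped stream is the only buffer
    }

    void flush() throws IOException {
      out.flush();
    }
  }

  /** Encodes data through both buffers; returns how many bytes were staged in the encoder. */
  static long encodeStaged(byte[] data, ByteArrayOutputStream sink) {
    try {
      StagingEncoder enc = new StagingEncoder(new BufferedOutputStream(sink));
      for (byte b : data) {
        enc.writeByte(b);
      }
      enc.flush();
      return enc.stagedBytes;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Encodes data through the stream's buffer only. */
  static void encodeDirect(byte[] data, ByteArrayOutputStream sink) {
    try {
      DirectEncoder enc = new DirectEncoder(new BufferedOutputStream(sink));
      for (byte b : data) {
        enc.writeByte(b);
      }
      enc.flush();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] data = new byte[1000];
    ByteArrayOutputStream staged = new ByteArrayOutputStream();
    ByteArrayOutputStream direct = new ByteArrayOutputStream();
    long extraCopies = encodeStaged(data, staged);
    encodeDirect(data, direct);
    System.out.println("identical output: " + Arrays.equals(staged.toByteArray(), direct.toByteArray()));
    System.out.println("bytes copied an extra time: " + extraCopies);
  }
}
```

The sketch only counts copies; it does not measure throughput, so any actual gain in {{DataFileWriter}} would still need to be benchmarked.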