Posted to dev@avro.apache.org by "Ryan Skraba (Jira)" <ji...@apache.org> on 2021/08/06 13:44:00 UTC
[jira] [Updated] (AVRO-3183) Do Not Double Buffer Data in DataFileWriter
[ https://issues.apache.org/jira/browse/AVRO-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ryan Skraba updated AVRO-3183:
------------------------------
Fix Version/s: 1.11.0
> Do Not Double Buffer Data in DataFileWriter
> -------------------------------------------
>
> Key: AVRO-3183
> URL: https://issues.apache.org/jira/browse/AVRO-3183
> Project: Apache Avro
> Issue Type: Improvement
> Components: java
> Affects Versions: 1.10.0
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Minor
> Fix For: 1.11.0
>
>
> {code:java|title=DataFileWriter.java}
> private void init(OutputStream outs) throws IOException {
>   this.underlyingStream = outs;
>   this.out = new BufferedFileOutputStream(outs);
>   EncoderFactory efactory = new EncoderFactory();
>   // binaryEncoder returns a buffered Encoder and is wrapping a BufferedFileOutputStream
>   this.vout = efactory.binaryEncoder(out, null);
>   dout.setSchema(schema);
>   buffer = new NonCopyingByteArrayOutputStream(Math.min((int) (syncInterval * 1.25), Integer.MAX_VALUE / 2 - 1));
>   // binaryEncoder returns a buffered Encoder and is wrapping a NonCopyingByteArrayOutputStream
>   this.bufOut = efactory.binaryEncoder(buffer, null);
>   if (this.codec == null) {
>     this.codec = CodecFactory.nullCodec().createInstance();
>   }
>   this.isOpen = true;
> }
> {code}
> {{DataFileWriter}} double-buffers its output, which adds redundant overhead. Moreover, the buffering provided by the encoder returned by {{binaryEncoder}} is fairly simplistic and less effective than the buffering in {{BufferedFileOutputStream}}.
> Remove this double buffering by using a 'direct' encoder ({{EncoderFactory.directBinaryEncoder}}) in place of {{binaryEncoder}}.
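The cost of the extra buffering layer can be illustrated with a small toy model. This is a stdlib-only Java sketch that does not use Avro: {{CountingBuffer}} and {{run}} are hypothetical helpers, one {{CountingBuffer}} plays the role of {{BufferedFileOutputStream}}, an optional second one plays the role of the buffer inside the encoder returned by {{binaryEncoder}}, and the buffer sizes (8192 and 2048) are arbitrary. The model counts bytes copied into buffers, not elapsed time, and copies byte by byte for simplicity.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Toy model of double buffering; CountingBuffer is hypothetical, not an Avro class. */
public class DoubleBufferSketch {

  /** A minimal buffering layer that counts every byte it copies into its own array. */
  static final class CountingBuffer extends OutputStream {
    private final OutputStream out;
    private final byte[] buf;
    private int pos = 0;
    long bytesCopied = 0;

    CountingBuffer(OutputStream out, int size) {
      this.out = out;
      this.buf = new byte[size];
    }

    @Override
    public void write(int b) throws IOException {
      if (pos == buf.length) {
        drain(); // buffer full: hand its contents to the next layer down
      }
      buf[pos++] = (byte) b;
      bytesCopied++;
    }

    private void drain() throws IOException {
      out.write(buf, 0, pos);
      pos = 0;
    }

    @Override
    public void flush() throws IOException {
      drain();
      out.flush();
    }
  }

  /**
   * Writes n bytes through one or two buffering layers.
   * Returns {bytes that reached the sink, total bytes copied into buffers}.
   */
  static long[] run(boolean doubleBuffered, int n) {
    try {
      ByteArrayOutputStream sink = new ByteArrayOutputStream();
      // Stands in for BufferedFileOutputStream.
      CountingBuffer fileBuffer = new CountingBuffer(sink, 8192);
      // Stands in for the internal buffer of the encoder returned by binaryEncoder.
      CountingBuffer encoderBuffer = doubleBuffered ? new CountingBuffer(fileBuffer, 2048) : null;
      OutputStream top = doubleBuffered ? encoderBuffer : fileBuffer;
      for (int i = 0; i < n; i++) {
        top.write(i);
      }
      top.flush();
      long copies = fileBuffer.bytesCopied + (doubleBuffered ? encoderBuffer.bytesCopied : 0);
      return new long[] {sink.size(), copies};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    long[] direct = run(false, 10_000);
    long[] doubled = run(true, 10_000);
    System.out.println("single buffer: " + direct[0] + " bytes out, " + direct[1] + " bytes copied");
    System.out.println("double buffer: " + doubled[0] + " bytes out, " + doubled[1] + " bytes copied");
  }
}
```

Both stacks deliver the same 10000 bytes to the sink, but the double-buffered stack copies every byte twice (20000 copies versus 10000). The second layer adds work without changing the output, which is why letting the encoder write straight through to the already-buffered stream is sufficient.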