Posted to dev@avro.apache.org by "David Mollitor (Jira)" <ji...@apache.org> on 2021/08/04 18:06:00 UTC

[jira] [Commented] (AVRO-3183) Do Not Double Buffer Data in DataFileWriter

    [ https://issues.apache.org/jira/browse/AVRO-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393373#comment-17393373 ] 

David Mollitor commented on AVRO-3183:
--------------------------------------

I came across this while doing performance testing for another product that, somewhat inadvertently, exercises Avro as well.

> Do Not Double Buffer Data in DataFileWriter
> -------------------------------------------
>
>                 Key: AVRO-3183
>                 URL: https://issues.apache.org/jira/browse/AVRO-3183
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.10.0
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Minor
>
> {code:java|title=DataFileWriter.java}
>   private void init(OutputStream outs) throws IOException {
>     this.underlyingStream = outs;
>     this.out = new BufferedFileOutputStream(outs);
>     EncoderFactory efactory = new EncoderFactory();
>     // binaryEncoder returns a buffered Encoder and is wrapping a BufferedFileOutputStream
>     this.vout = efactory.binaryEncoder(out, null);
>     dout.setSchema(schema);
>     buffer = new NonCopyingByteArrayOutputStream(Math.min((int) (syncInterval * 1.25), Integer.MAX_VALUE / 2 - 1));
>     // binaryEncoder returns a buffered Encoder and is wrapping a NonCopyingByteArrayOutputStream
>     this.bufOut = efactory.binaryEncoder(buffer, null);
>     if (this.codec == null) {
>       this.codec = CodecFactory.nullCodec().createInstance();
>     }
>     this.isOpen = true;
>   }
> {code}
> The {{DataFileWriter}} double-buffers its output, which just adds redundant overhead. The buffering offered by the object that {{binaryEncoder}} returns is also a bit simplistic and does not do as good a job as the buffering in {{BufferedFileOutputStream}}.
> Remove this double buffering by using a 'direct' {{binaryEncoder}}.
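
To illustrate the double-buffering pattern described in the issue, here is a minimal sketch that uses only java.io rather than Avro's own classes. The names DoubleBufferSketch, CountingSink and run are hypothetical and exist only for this illustration, and the buffer sizes are arbitrary. It stacks one BufferedOutputStream on top of another and compares that with a single buffer: the sink receives the same bytes either way, so the second layer only adds an extra copy of every byte. In Avro itself, the unbuffered counterpart of binaryEncoder is EncoderFactory.directBinaryEncoder(OutputStream, BinaryEncoder), which is presumably what the 'direct' encoder in the description refers to.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;

public class DoubleBufferSketch {

  /** Hypothetical sink that records the bytes and counts the write calls reaching it. */
  static final class CountingSink extends OutputStream {
    final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    int writeCalls = 0;

    @Override
    public void write(int b) {
      writeCalls++;
      bytes.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) {
      writeCalls++;
      bytes.write(b, off, len);
    }
  }

  /**
   * Writes n single bytes to a fresh sink, through either one or two
   * buffering layers, and returns the sink for inspection.
   */
  static CountingSink run(boolean doubleBuffered, int n) throws IOException {
    CountingSink sink = new CountingSink();
    // Layer closest to the sink (stands in for BufferedFileOutputStream).
    OutputStream out = new BufferedOutputStream(sink, 8192);
    if (doubleBuffered) {
      // Second layer on top (stands in for a buffered encoder); sizes are arbitrary.
      out = new BufferedOutputStream(out, 2048);
    }
    for (int i = 0; i < n; i++) {
      out.write(i & 0xFF);
    }
    out.flush();
    return sink;
  }

  public static void main(String[] args) throws IOException {
    CountingSink twoLayers = run(true, 100_000);
    CountingSink oneLayer = run(false, 100_000);

    // Both configurations deliver identical bytes to the sink.
    System.out.println("same bytes: "
        + Arrays.equals(twoLayers.bytes.toByteArray(), oneLayer.bytes.toByteArray()));
    // Compare how many write calls actually reached the sink in each case.
    System.out.println("sink writes, two layers: " + twoLayers.writeCalls);
    System.out.println("sink writes, one layer:  " + oneLayer.writeCalls);
  }
}
```

Measuring the actual cost of the extra copy would need a proper benchmark harness; this sketch only shows that the outer buffer does not change what reaches the underlying stream.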



--
This message was sent by Atlassian Jira
(v8.3.4#803005)