Posted to dev@avro.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/09/22 15:51:00 UTC

[jira] [Commented] (AVRO-3167) Simplify Codec Buffer Allocation

    [ https://issues.apache.org/jira/browse/AVRO-3167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17418675#comment-17418675 ] 

ASF subversion and git services commented on AVRO-3167:
-------------------------------------------------------

Commit c5ffd6e4fa5a231e2b37cf0f165b9fe4fc4c3c6c in avro's branch refs/heads/master from belugabehr
[ https://gitbox.apache.org/repos/asf?p=avro.git;h=c5ffd6e ]

AVRO-3167: Simplify Codec Buffer Allocation (#1275)

* AVRO-3167: Simplify DeflateCodec Buffer Allocation

* Updated other codecs as well

> Simplify Codec Buffer Allocation
> --------------------------------
>
>                 Key: AVRO-3167
>                 URL: https://issues.apache.org/jira/browse/AVRO-3167
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Some performance testing of another product highlighted some weirdness to me in the Avro library, in particular the way that blocks are compressed/decompressed in {{DeflateCodec}}.
> Each block of raw data is compressed/decompressed into a new buffer, which is then immediately written out to disk.  The caller requests that buffer with a suggested size, but this is a bit odd because the buffer is cached, so only the first call has any effect.  The buffer is also expanded as needed but never resized smaller, so it is kept at its largest size for the life of the application and can hold a large (underutilized) allocation for a while.
> Finally, even if the requested size were working as expected, the "requested" size is quite dubious.  Right now the requested size is equal to the size of the raw block, which means the buffer requested for a decompression will always be too small and the buffer requested for a compression will always be too big.  Instead, I propose that we just fix a sensible default size for all buffers.
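
A minimal sketch of the cached, grow-only buffer pattern described above, assuming a hypothetical simplified codec class with a getOutputBuffer helper (illustrative only, not the actual {{DeflateCodec}} source):

    import java.nio.ByteBuffer;

    // Hypothetical sketch of the grow-only cached buffer pattern (not Avro source).
    class CachedBufferCodec {
      // Reused across calls for the life of the codec instance.
      private ByteBuffer outputBuffer;

      // Returns a buffer of at least the suggested size.  The buffer is created
      // lazily and grown when too small, but never shrunk, so only the first
      // (or largest) request actually affects its size, and a single large
      // block can leave an oversized buffer cached indefinitely.
      private ByteBuffer getOutputBuffer(int suggestedLength) {
        if (outputBuffer == null || outputBuffer.capacity() < suggestedLength) {
          outputBuffer = ByteBuffer.allocate(suggestedLength);
        }
        outputBuffer.clear();
        return outputBuffer;
      }

      // The ticket's proposal: drop the per-call size hint and use one sensible
      // fixed default for all buffers (the value here is illustrative).
      private static final int DEFAULT_BUFFER_SIZE = 8 * 1024;
    }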



--
This message was sent by Atlassian Jira
(v8.3.4#803005)