Posted to dev@parquet.apache.org by "Sai Sri Harsha Gudladona (Jira)" <ji...@apache.org> on 2021/02/08 22:27:00 UTC

[jira] [Commented] (PARQUET-118) Provide option to use on-heap buffers for Snappy compression/decompression

    [ https://issues.apache.org/jira/browse/PARQUET-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281418#comment-17281418 ] 

Sai Sri Harsha Gudladona commented on PARQUET-118:
--------------------------------------------------

Are there any better ways to handle this for both compression and decompression? Using this library in a streaming application that batches protobuf/JSON records into Snappy-compressed Parquet is causing sporadic OOM errors.

> Provide option to use on-heap buffers for Snappy compression/decompression
> --------------------------------------------------------------------------
>
>                 Key: PARQUET-118
>                 URL: https://issues.apache.org/jira/browse/PARQUET-118
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>    Affects Versions: 1.6.0
>            Reporter: Patrick Wendell
>            Priority: Major
>
> The current code uses direct off-heap buffers for decompression. If many decompressors are instantiated across multiple threads, and/or the objects being decompressed are large, this can lead to a huge amount of off-heap allocation by the JVM. This can be exacerbated if there is little heap contention overall, since no GC will then be performed to reclaim the space used by these buffers.
> It would be nice if there were a flag we could use to simply allocate on-heap buffers here:
> https://github.com/apache/incubator-parquet-mr/blob/master/parquet-hadoop/src/main/java/parquet/hadoop/codec/SnappyDecompressor.java#L28
> We ran into an issue today where these buffers totaled a very large amount of storage and caused our Java processes (running within containers) to be terminated by the kernel OOM-killer.
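
For illustration only, the kind of flag requested above could look roughly like the sketch below. It is not parquet-mr code: the class name BufferAllocationSketch and the system property sketch.snappy.useHeapBuffers are made up for this example. Only java.nio.ByteBuffer and Boolean.getBoolean are real JDK APIs.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper for illustration; not part of parquet-mr. */
final class BufferAllocationSketch {

    /** Hypothetical system property name, invented for this sketch. */
    static final String USE_HEAP_PROPERTY = "sketch.snappy.useHeapBuffers";

    /**
     * Returns an on-heap buffer when useHeap is true, otherwise a direct
     * (off-heap) buffer. On-heap buffers are backed by a byte[] and count
     * against -Xmx; direct buffers live outside the Java heap.
     */
    static ByteBuffer allocate(int capacity, boolean useHeap) {
        return useHeap
                ? ByteBuffer.allocate(capacity)
                : ByteBuffer.allocateDirect(capacity);
    }

    /** Reads the (hypothetical) flag from a system property; defaults to direct. */
    static ByteBuffer allocate(int capacity) {
        return allocate(capacity, Boolean.getBoolean(USE_HEAP_PROPERTY));
    }

    public static void main(String[] args) {
        ByteBuffer direct = allocate(64 * 1024, false);
        ByteBuffer heap = allocate(64 * 1024, true);
        System.out.println("direct.isDirect() = " + direct.isDirect());
        System.out.println("heap.isDirect()   = " + heap.isDirect());
    }
}
```

Whether a given Snappy binding accepts on-heap buffers is a separate question that this sketch does not address. Independently of any such flag, the standard JVM option -XX:MaxDirectMemorySize caps the total size of direct buffer allocations, which may help keep a containerized process under its memory limit.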



--
This message was sent by Atlassian Jira
(v8.3.4#803005)