Posted to notifications@accumulo.apache.org by "Eric Newton (JIRA)" <ji...@apache.org> on 2015/12/28 19:13:49 UTC

[jira] [Commented] (ACCUMULO-624) iterators may open lots of compressors

    [ https://issues.apache.org/jira/browse/ACCUMULO-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072964#comment-15072964 ] 

Eric Newton commented on ACCUMULO-624:
--------------------------------------

I wrote a little experiment: 10 threads allocate 100K decompressors each.

Using {{gz.returnDecompressor(gz.getDecompressor())}}, all threads complete in 1.4 seconds.

Using {{gz.getCodec().createDecompressor()}}, all threads complete in 20 seconds.

So the pool is quite a bit faster. But even without the pool, allocating a decompressor still takes well under a millisecond.
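For illustration, here is a rough sketch of that experiment. It does not use Hadoop's CodecPool or Accumulo's {{gz}} wrapper; instead it stands in a trivial queue-backed pool over {{java.util.zip.Inflater}}, so the class names, the pool implementation, and the exact timings are assumptions, not the code actually benchmarked:

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.zip.Inflater;

public class PoolExperiment {
    // Stand-in for a codec pool: reuse returned decompressors instead of
    // allocating a new one (and its native buffer) on every call.
    static final ConcurrentLinkedQueue<Inflater> pool = new ConcurrentLinkedQueue<>();

    static Inflater getDecompressor() {
        Inflater d = pool.poll();
        return (d != null) ? d : new Inflater();
    }

    static void returnDecompressor(Inflater d) {
        d.reset();
        pool.offer(d);
    }

    static long timeThreads(int threads, Runnable work) throws InterruptedException {
        Thread[] ts = new Thread[threads];
        long t0 = System.nanoTime();
        for (int i = 0; i < threads; i++) { ts[i] = new Thread(work); ts[i].start(); }
        for (Thread t : ts) t.join();
        return System.nanoTime() - t0;
    }

    public static void main(String[] args) throws InterruptedException {
        final int threads = 10, iterations = 100_000;

        // Pooled: get/return, so most iterations reuse an existing instance.
        long pooled = timeThreads(threads, () -> {
            for (int i = 0; i < iterations; i++) {
                returnDecompressor(getDecompressor());
            }
        });

        // Fresh: allocate a new decompressor every time, releasing it immediately.
        long fresh = timeThreads(threads, () -> {
            for (int i = 0; i < iterations; i++) {
                new Inflater().end();
            }
        });

        System.out.printf("pooled: %.2f s, fresh: %.2f s%n", pooled / 1e9, fresh / 1e9);
    }
}
```

The pooled path mostly exercises a queue poll/offer, while the fresh path pays per-iteration allocation, which is the gap the numbers above reflect.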

It seems we are not the only ones who think that [codec reuse may not be worth it|https://github.com/prestodb/presto-hive-apache/blob/master/src/main/java/org/apache/hadoop/hive/ql/io/CodecPool.java].

> iterators may open lots of compressors
> --------------------------------------
>
>                 Key: ACCUMULO-624
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-624
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>            Reporter: Eric Newton
>
> A large iterator tree may create many instances of Compressors.  These instances are pulled from a pool that never decreases in size.  So, if 50 simultaneous queries are run over dozens of files, each with a complex iterator stack, there will be thousands of compressors created.  Each of these holds a large buffer.  This can cause the server to run out of memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)