Posted to issues@drill.apache.org by "Parth Chandra (JIRA)" <ji...@apache.org> on 2014/11/07 00:19:35 UTC

[jira] [Commented] (DRILL-1480) severe memory leak query snappy compressed parquet file

    [ https://issues.apache.org/jira/browse/DRILL-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201180#comment-14201180 ] 

Parth Chandra commented on DRILL-1480:
--------------------------------------

This problem is not limited to snappy or compressed files. Direct memory in use by Drill goes up over time until the drillbit runs out of memory.
The problem with increasing memory is a result of Drill's use of Netty. Netty provides a pooled memory allocator that divides its memory into arenas; each thread draws memory from a specific arena, so two independent threads can allocate from the pool without synchronization overhead. In addition, Netty 4.0.20 uses a per-thread memory cache to further reduce synchronization. When a thread allocates memory, the allocator first looks in that thread's cache and, if a suitable chunk is found, returns it; otherwise it gets memory from the thread's arena. When memory is released, it is added to the releasing thread's cache.
This works fine when the same thread allocates and releases memory. In Drill, however, memory allocated by one thread may be passed on to another thread, and the last thread to use the memory eventually releases it. The effect is to move memory from the allocating thread's arena into the releasing thread's cache. Since threads are reused across queries, the allocating threads are constantly 'losing' memory to the releasing threads and therefore keep allocating more.
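The pattern is easy to see in isolation. The following is a minimal simulation of the behavior described above (it does not use Netty's actual classes; `allocate`/`release` and the per-thread cache are stand-ins for the allocator and PoolThreadCache): one thread allocates, hands each buffer to a second thread, and the second thread releases it. Every freed chunk lands in the releaser's cache, so the allocator never reuses anything and arena consumption grows without bound:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.atomic.AtomicLong;

class ThreadCacheLeakDemo {
    // Total memory ever carved out of the "arena".
    static final AtomicLong arenaBytesAllocated = new AtomicLong();

    // One cache per thread, analogous to Netty 4.0.20's per-thread cache.
    static final ThreadLocal<Deque<byte[]>> cache =
        ThreadLocal.withInitial(ArrayDeque::new);

    static byte[] allocate(int size) {
        byte[] chunk = cache.get().pollFirst();   // try this thread's cache first
        if (chunk == null) {
            chunk = new byte[size];               // cache miss: draw from the arena
            arenaBytesAllocated.addAndGet(size);
        }
        return chunk;
    }

    static void release(byte[] chunk) {
        cache.get().addFirst(chunk);              // goes to the RELEASING thread's cache
    }

    public static void main(String[] args) throws InterruptedException {
        final int iterations = 1000, size = 1024;
        SynchronousQueue<byte[]> handoff = new SynchronousQueue<>();

        // The "releasing" thread: receives buffers and frees them, so every
        // freed chunk accumulates in this thread's cache, never the allocator's.
        Thread releaser = new Thread(() -> {
            try {
                for (int i = 0; i < iterations; i++) release(handoff.take());
            } catch (InterruptedException ignored) { }
        });
        releaser.start();

        // The "allocating" thread: its cache is always empty, so every
        // allocation is a miss and draws fresh memory from the arena.
        for (int i = 0; i < iterations; i++) handoff.put(allocate(size));
        releaser.join();

        System.out.println(arenaBytesAllocated.get()); // prints 1024000
    }
}
```

If the same thread both allocated and released, the cache would be hit on every iteration after the first and the arena total would stay at 1024 bytes; the cross-thread handoff is what defeats reuse.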
Netty 4.0.24 fixes this. See https://github.com/netty/netty/pull/2855

The fix is to upgrade to Netty 4.0.24.
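In a Maven build, the upgrade amounts to bumping the Netty dependency version, along these lines (the coordinates shown are illustrative; Drill's actual pom may manage the version through a property or a parent pom):

```xml
<dependency>
  <groupId>io.netty</groupId>
  <artifactId>netty-buffer</artifactId>
  <version>4.0.24.Final</version>
</dependency>
```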

Running out of heap memory (as indicated in one of the log entries above) is not related to this problem.

A patch for this is here:
https://github.com/parthchandra/incubator-drill/commit/aacc63320d4e75e6c6ef98751cd8e793935f2b85

The patch also includes a fix for a minor problem in the Parquet reader that was discovered during debugging.





> severe memory leak query snappy compressed parquet file
> -------------------------------------------------------
>
>                 Key: DRILL-1480
>                 URL: https://issues.apache.org/jira/browse/DRILL-1480
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types
>    Affects Versions: 0.6.0
>            Reporter: Chun Chang
>            Assignee: Parth Chandra
>            Priority: Blocker
>             Fix For: 0.7.0
>
>
> #Wed Oct 01 00:19:24 EDT 2014
> git.commit.id.abbrev=5c220e3
> Running TPCH query #03, the drillbit shows a severe memory leak and quickly runs out of memory:
> 2014-10-02 00:51:21,520 [WorkManager-116] ERROR o.apache.drill.exec.work.WorkManager - Failure while running wrapper [FragmentExecutor: 7d345235-0eb4-4189-b34f-f535fa5ad1bb:4:10]
> java.lang.OutOfMemoryError: Direct buffer memory
>         at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.7.0_65]
>         at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) ~[na:1.7.0_65]
>         at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) ~[na:1.7.0_65]
>         at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:434) ~[netty-buffer-4.0.20.Final.jar:4.0.20.Final]
>         at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:179) ~[netty-buffer-4.0.20.Final.jar:4.0.20.Final]
>         at io.netty.buffer.PoolArena.allocate(PoolArena.java:168) ~[netty-buffer-4.0.20.Final.jar:4.0.20.Final]
>         at io.netty.buffer.PoolArena.allocate(PoolArena.java:98) ~[netty-buffer-4.0.20.Final.jar:4.0.20.Final]
>         at io.netty.buffer.PooledByteBufAllocatorL.newDirectBuffer(PooledByteBufAllocatorL.java:46) ~[drill-java-exec-0.6.0-incubating-SNAPSHOT-rebuffed.jar:4.0.20.Final]
>         at io.netty.buffer.PooledByteBufAllocatorL.directBuffer(PooledByteBufAllocatorL.java:66) ~[drill-java-exec-0.6.0-incubating-SNAPSHOT-rebuffed.jar:4.0.20.Final]
>         at org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.buffer(TopLevelAllocator.java:205) ~[drill-java-exec-0.6.0-incubating-SNAPSHOT-rebuffed.jar:0.6.0-incubating-SNAPSHOT]
>         at org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.buffer(TopLevelAllocator.java:212) ~[drill-java-exec-0.6.0-incubating-SNAPSHOT-rebuffed.jar:0.6.0-incubating-SNAPSHOT]
>         at org.apache.drill.exec.vector.IntVector.allocateNew(IntVector.java:149) ~[drill-java-exec-0.6.0-incubating-SNAPSHOT-rebuffed.jar:0.6.0-incubating-SNAPSHOT]
>         at org.apache.drill.exec.test.generated.HashTableGen478.allocMetadataVector(HashTableTemplate.java:728) ~[na:na]
>         at org.apache.drill.exec.test.generated.HashTableGen478.access$200(HashTableTemplate.java:41) ~[na:na]
>         at org.apache.drill.exec.test.generated.HashTableGen478$BatchHolder.<init>(HashTableTemplate.java:132) ~[na:na]
>         at org.apache.drill.exec.test.generated.HashTableGen478$BatchHolder.<init>(HashTableTemplate.java:101) ~[na:na]
>         at org.apache.drill.exec.test.generated.HashTableGen478.addBatchHolder(HashTableTemplate.java:654) ~[na:na]
>         at org.apache.drill.exec.test.generated.HashTableGen478.put(HashTableTemplate.java:494) ~[na:na]
>         at org.apache.drill.exec.physical.impl.join.HashJoinBatch.executeBuildPhase(HashJoinBatch.java:344) ~[drill-java-exec-0.6.0-incubating-SNAPSHOT-rebuffed.jar:0.6.0-incubating-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext(HashJoinBatch.java:193) ~[drill-java-exec-0.6.0-incubating-SNAPSHOT-rebuffed.jar:0.6.0-incubating-SNAPSHOT]
> The query runs fine against an uncompressed parquet file at the same 100G scale factor. Here is the query:
> [root@atsqa8c21 testcases]# cat 03.q
> -- tpch3 using 1395599672 as a seed to the RNG
> select
>   l.l_orderkey,
>   sum(l.l_extendedprice * (1 - l.l_discount)) as revenue,
>   o.o_orderdate,
>   o.o_shippriority
> from
>   customer c,
>   orders o,
>   lineitem l
> where
>   c.c_mktsegment = 'HOUSEHOLD'
>   and c.c_custkey = o.o_custkey
>   and l.l_orderkey = o.o_orderkey
>   and o.o_orderdate < date '1995-03-25'
>   and l.l_shipdate > date '1995-03-25'
> group by
>   l.l_orderkey,
>   o.o_orderdate,
>   o.o_shippriority
> order by
>   revenue desc,
>   o.o_orderdate
> limit 10;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)