Posted to issues@drill.apache.org by "Abhishek Girish (JIRA)" <ji...@apache.org> on 2018/05/04 01:23:00 UTC

[jira] [Updated] (DRILL-6384) TPC-H tests fail with OOM

     [ https://issues.apache.org/jira/browse/DRILL-6384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Girish updated DRILL-6384:
-----------------------------------
    Description: 
On the latest Apache master, we are observing multiple test failures. It looks like Drill runs out of direct memory and queries fail with OOM. A few other queries probably fail because they are unable to connect to Drillbits.

It looks like one of the recent commits caused this.

||Commit ID||Status||
|24193b1b038a6315681a65c76a67034b64f71fc5|FAIL|
|9173308710c3decf8ff745493ad3e85ccdaf7c37|PASS|
|c6549e58859397c88cb1de61b4f6eee52a07ed0c|PASS|

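The table above records a manual bisection: the build at 24193b1b fails while the two earlier commits pass. The same search can be automated with `git bisect` over that range. The test-runner script name below is hypothetical and not part of this report; it stands in for whatever wrapper runs the TPC-H suite and exits non-zero on failure.

```shell
# Sketch of an automated bisect over the commit range in the table above.
# run_tpch_suite.sh is an assumed wrapper, not an actual script from this report.
git bisect start
git bisect bad  24193b1b038a6315681a65c76a67034b64f71fc5   # FAIL
git bisect good 9173308710c3decf8ff745493ad3e85ccdaf7c37   # PASS
git bisect run ./run_tpch_suite.sh
```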
Two example queries and the corresponding exceptions are below. The query log is also attached.

*Query 1*: Advanced/tpch/tpch_sf100/parquet/10.q
{code}
select
    c.c_custkey,
    c.c_name,
    sum(l.l_extendedprice * (1 - l.l_discount)) as revenue,
    c.c_acctbal,
    n.n_name,
    c.c_address,
    c.c_phone,
    c.c_comment
from
    customer c,
    orders o,
    lineitem l,
    nation n
where
    c.c_custkey = o.o_custkey
    and l.l_orderkey = o.o_orderkey
    and o.o_orderdate >= date '1994-03-01'
    and o.o_orderdate < date '1994-03-01' + interval '3' month
    and l.l_returnflag = 'R'
    and c.c_nationkey = n.n_nationkey
group by
    c.c_custkey,
    c.c_name,
    c.c_acctbal,
    c.c_phone,
    n.n_name,
    c.c_address,
    c.c_comment
order by
    revenue desc
limit 20
{code}
Exception:
{code}
java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.

AGGR OOM at First Phase. Partitions: 8. Estimated batch size: 18481152. values size: 1048576. Output alloc size: 1048576. Planned batches: 8 Memory limit: 313709266 so far allocated: 2097152. 
 Fragment 4:88

[Error Id: 81017b59-dfa3-4db9-8673-bee7b80f8acd on atsqa6c82.qa.lab:31010]

(org.apache.drill.exec.exception.OutOfMemoryException) AGGR OOM at First Phase. Partitions: 8. Estimated batch size: 18481152. values size: 1048576. Output alloc size: 1048576. Planned batches: 8 Memory limit: 313709266 so far allocated: 2097152. 
 org.apache.drill.exec.test.generated.HashAggregatorGen3296.spillIfNeeded():1419
 org.apache.drill.exec.test.generated.HashAggregatorGen3296.doSpill():1381
 org.apache.drill.exec.test.generated.HashAggregatorGen3296.checkGroupAndAggrValues():1304
 org.apache.drill.exec.test.generated.HashAggregatorGen3296.doWork():592
 org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():176
 org.apache.drill.exec.record.AbstractRecordBatch.next():164
 org.apache.drill.exec.physical.impl.BaseRootExec.next():105
 org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152
 org.apache.drill.exec.physical.impl.BaseRootExec.next():95
 org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
 org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
 java.security.AccessController.doPrivileged():-2
 javax.security.auth.Subject.doAs():422
 org.apache.hadoop.security.UserGroupInformation.doAs():1595
 org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
 org.apache.drill.common.SelfCleaningRunnable.run():38
 java.util.concurrent.ThreadPoolExecutor.runWorker():1149
 java.util.concurrent.ThreadPoolExecutor$Worker.run():624
 java.lang.Thread.run():748

at org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:530)
 at org.apache.drill.jdbc.impl.DrillCursor.next(DrillCursor.java:634)
 at oadd.org.apache.calcite.avatica.AvaticaResultSet.next(AvaticaResultSet.java:207)
 at org.apache.drill.jdbc.impl.DrillResultSetImpl.next(DrillResultSetImpl.java:155)
 at org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:253)
 at org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:115)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
 Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.

AGGR OOM at First Phase. Partitions: 8. Estimated batch size: 18481152. values size: 1048576. Output alloc size: 1048576. Planned batches: 8 Memory limit: 313709266 so far allocated: 2097152. 
 Fragment 4:88

[Error Id: 81017b59-dfa3-4db9-8673-bee7b80f8acd on atsqa6c82.qa.lab:31010]

(org.apache.drill.exec.exception.OutOfMemoryException) AGGR OOM at First Phase. Partitions: 8. Estimated batch size: 18481152. values size: 1048576. Output alloc size: 1048576. Planned batches: 8 Memory limit: 313709266 so far allocated: 2097152. 
 org.apache.drill.exec.test.generated.HashAggregatorGen3296.spillIfNeeded():1419
 org.apache.drill.exec.test.generated.HashAggregatorGen3296.doSpill():1381
 org.apache.drill.exec.test.generated.HashAggregatorGen3296.checkGroupAndAggrValues():1304
 org.apache.drill.exec.test.generated.HashAggregatorGen3296.doWork():592
 org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():176
 org.apache.drill.exec.record.AbstractRecordBatch.next():164
 org.apache.drill.exec.physical.impl.BaseRootExec.next():105
 org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152
 org.apache.drill.exec.physical.impl.BaseRootExec.next():95
 org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
 org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
 java.security.AccessController.doPrivileged():-2
 javax.security.auth.Subject.doAs():422
 org.apache.hadoop.security.UserGroupInformation.doAs():1595
 org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
 org.apache.drill.common.SelfCleaningRunnable.run():38
 java.util.concurrent.ThreadPoolExecutor.runWorker():1149
 java.util.concurrent.ThreadPoolExecutor$Worker.run():624
 java.lang.Thread.run():748
{code}
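For readers decoding the "AGGR OOM at First Phase" message, the figures can be sanity-checked with plain arithmetic. This is only a reading of the logged numbers, not a reproduction of the hash aggregate's actual admission logic. Since the planned demand comes out below the stated limit, and the trace ends in spillIfNeeded(), the failure apparently arises in the spill path rather than a simple limit breach.

```shell
# Numbers copied from the error message above.
planned_bytes=$((8 * 18481152))                      # 8 planned batches at the estimated batch size
echo "planned bytes: $planned_bytes"                 # 147849216, exactly 141 MiB
echo "memory limit:  $((313709266 / 1048576)) MiB"   # ~299 MiB per-operator limit
echo "allocated:     $((2097152 / 1048576)) MiB"     # only 2 MiB allocated so far
```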

*Query 2*: Advanced/tpch/tpch_sf100/parquet/08.q
{code}
select
    o_year,
    sum(case
        when nation = 'EGYPT' then volume
        else 0
    end) / sum(volume) as mkt_share
from
    (
        select
            extract(year from o.o_orderdate) as o_year,
            l.l_extendedprice * (1 - l.l_discount) as volume,
            n2.n_name as nation
        from
            part p,
            supplier s,
            lineitem l,
            orders o,
            customer c,
            nation n1,
            nation n2,
            region r
        where
            p.p_partkey = l.l_partkey
            and s.s_suppkey = l.l_suppkey
            and l.l_orderkey = o.o_orderkey
            and o.o_custkey = c.c_custkey
            and c.c_nationkey = n1.n_nationkey
            and n1.n_regionkey = r.r_regionkey
            and r.r_name = 'MIDDLE EAST'
            and s.s_nationkey = n2.n_nationkey
            and o.o_orderdate between date '1995-01-01' and date '1996-12-31'
            and p.p_type = 'PROMO BRUSHED COPPER'
    ) as all_nations
group by
    o_year
order by
    o_year
{code}
Exception:
{code}
java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.

Failure allocating buffer.
Fragment 4:57

[Error Id: a5eeae54-ac8f-42fa-9af1-03247e6bc316 on atsqa6c82.qa.lab:31010]

  (org.apache.drill.exec.exception.OutOfMemoryException) Failure allocating buffer.
    io.netty.buffer.PooledByteBufAllocatorL.allocate():67
    org.apache.drill.exec.memory.AllocationManager.<init>():84
    org.apache.drill.exec.memory.BaseAllocator.bufferWithoutReservation():258
    org.apache.drill.exec.memory.BaseAllocator.buffer():241
    org.apache.drill.exec.memory.BaseAllocator.buffer():211
    org.apache.drill.exec.vector.VarCharVector.allocateNew():389
    org.apache.drill.exec.vector.NullableVarCharVector.allocateNew():236
    org.apache.drill.exec.vector.AllocationHelper.allocatePrecomputedChildCount():41
    org.apache.drill.exec.vector.AllocationHelper.allocate():54
    org.apache.drill.exec.vector.AllocationHelper.allocate():28
    org.apache.drill.exec.physical.impl.ScanBatch$Mutator.populateImplicitVectors():446
    org.apache.drill.exec.physical.impl.ScanBatch$Mutator.access$200():304
    org.apache.drill.exec.physical.impl.ScanBatch.populateImplicitVectorsAndSetCount():267
    org.apache.drill.exec.physical.impl.ScanBatch.next():175
    org.apache.drill.exec.record.AbstractRecordBatch.next():118
    org.apache.drill.exec.record.AbstractRecordBatch.next():108
    org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
    org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137
    org.apache.drill.exec.record.AbstractRecordBatch.next():164
    org.apache.drill.exec.record.AbstractRecordBatch.next():118
    org.apache.drill.exec.test.generated.HashJoinProbeGen4786.executeProbePhase():127
    org.apache.drill.exec.test.generated.HashJoinProbeGen4786.probeAndProject():235
    org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():220
    org.apache.drill.exec.record.AbstractRecordBatch.next():164
    org.apache.drill.exec.record.AbstractRecordBatch.next():118
    org.apache.drill.exec.test.generated.HashJoinProbeGen4788.executeProbePhase():127
    org.apache.drill.exec.test.generated.HashJoinProbeGen4788.probeAndProject():235
    org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():220
    org.apache.drill.exec.record.AbstractRecordBatch.next():164
    org.apache.drill.exec.physical.impl.BaseRootExec.next():105
    org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152
    org.apache.drill.exec.physical.impl.BaseRootExec.next():95
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
    java.security.AccessController.doPrivileged():-2
    javax.security.auth.Subject.doAs():422
    org.apache.hadoop.security.UserGroupInformation.doAs():1595
    org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
    org.apache.drill.common.SelfCleaningRunnable.run():38
    java.util.concurrent.ThreadPoolExecutor.runWorker():1149
    java.util.concurrent.ThreadPoolExecutor$Worker.run():624
    java.lang.Thread.run():748
  Caused By (io.netty.util.internal.OutOfDirectMemoryError) failed to allocate 16777216 byte(s) of direct memory (used: 34359738368, max: 34359738368)
    io.netty.util.internal.PlatformDependent.incrementMemoryCounter():510
    io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner():464
    io.netty.buffer.PoolArena$DirectArena.allocateDirect():766
    io.netty.buffer.PoolArena$DirectArena.newChunk():742
    io.netty.buffer.PoolArena.allocateNormal():244
    io.netty.buffer.PoolArena.allocate():226
    io.netty.buffer.PoolArena.allocate():146
    io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.newDirectBufferL():169
    io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.directBuffer():201
    io.netty.buffer.PooledByteBufAllocatorL.allocate():65
    org.apache.drill.exec.memory.AllocationManager.<init>():84
    org.apache.drill.exec.memory.BaseAllocator.bufferWithoutReservation():258
    org.apache.drill.exec.memory.BaseAllocator.buffer():241
    org.apache.drill.exec.memory.BaseAllocator.buffer():211
    org.apache.drill.exec.vector.VarCharVector.allocateNew():389
    org.apache.drill.exec.vector.NullableVarCharVector.allocateNew():236
    org.apache.drill.exec.vector.AllocationHelper.allocatePrecomputedChildCount():41
    org.apache.drill.exec.vector.AllocationHelper.allocate():54
    org.apache.drill.exec.vector.AllocationHelper.allocate():28
    org.apache.drill.exec.physical.impl.ScanBatch$Mutator.populateImplicitVectors():446
    org.apache.drill.exec.physical.impl.ScanBatch$Mutator.access$200():304
    org.apache.drill.exec.physical.impl.ScanBatch.populateImplicitVectorsAndSetCount():267
    org.apache.drill.exec.physical.impl.ScanBatch.next():175
    org.apache.drill.exec.record.AbstractRecordBatch.next():118
    org.apache.drill.exec.record.AbstractRecordBatch.next():108
    org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
    org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137
    org.apache.drill.exec.record.AbstractRecordBatch.next():164
    org.apache.drill.exec.record.AbstractRecordBatch.next():118
    org.apache.drill.exec.test.generated.HashJoinProbeGen4786.executeProbePhase():127
    org.apache.drill.exec.test.generated.HashJoinProbeGen4786.probeAndProject():235
    org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():220
    org.apache.drill.exec.record.AbstractRecordBatch.next():164
    org.apache.drill.exec.record.AbstractRecordBatch.next():118
    org.apache.drill.exec.test.generated.HashJoinProbeGen4788.executeProbePhase():127
    org.apache.drill.exec.test.generated.HashJoinProbeGen4788.probeAndProject():235
    org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():220
    org.apache.drill.exec.record.AbstractRecordBatch.next():164
    org.apache.drill.exec.physical.impl.BaseRootExec.next():105
    org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152
    org.apache.drill.exec.physical.impl.BaseRootExec.next():95
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
    java.security.AccessController.doPrivileged():-2
    javax.security.auth.Subject.doAs():422
    org.apache.hadoop.security.UserGroupInformation.doAs():1595
    org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
    org.apache.drill.common.SelfCleaningRunnable.run():38
    java.util.concurrent.ThreadPoolExecutor.runWorker():1149
    java.util.concurrent.ThreadPoolExecutor$Worker.run():624
    java.lang.Thread.run():748
{code}
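The Netty line at the root of Query 2's failure shows exact exhaustion of the direct-memory pool: used equals max, so even one more arena chunk cannot be carved out. The raw byte counts translate as follows (simple unit conversion, shown for readability):

```shell
# Byte counts copied from the OutOfDirectMemoryError above.
echo "max direct memory: $((34359738368 / 1073741824)) GiB"  # 32 GiB, and used == max
echo "failed allocation: $((16777216 / 1048576)) MiB"        # one 16 MiB arena chunk
```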


> TPC-H tests fail with OOM
> -------------------------
>
>                 Key: DRILL-6384
>                 URL: https://issues.apache.org/jira/browse/DRILL-6384
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.14.0
>            Reporter: Abhishek Girish
>            Priority: Critical
>         Attachments: drillbit.log.txt
>
>
> On latest Apache master, we are observing that there are multiple test failures. It looks like Drill runs out of Direct memory and queries fail with OOM. Few other queries fail probably fail because they are unable to connect to Drillbits.
> It looks like one of the recent commits caused this.
> ||Commit ID||Status||
> |24193b1b038a6315681a65c76a67034b64f71fc5|FAIL|
> |9173308710c3decf8ff745493ad3e85ccdaf7c37|PASS|
> |c6549e58859397c88cb1de61b4f6eee52a07ed0c|PASS|
> Two example queries + exceptions below. Also query log attached. 
> *Query 1*: Advanced/tpch/tpch_sf100/parquet/10.q
> {code}
>  select
>  c.c_custkey,
>  c.c_name,
>  sum(l.l_extendedprice * (1 - l.l_discount)) as revenue,
>  c.c_acctbal,
>  n.n_name,
>  c.c_address,
>  c.c_phone,
>  c.c_comment
>  from
>  customer c,
>  orders o,
>  lineitem l,
>  nation n
>  where
>  c.c_custkey = o.o_custkey
>  and l.l_orderkey = o.o_orderkey
>  and o.o_orderdate >= date '1994-03-01'
>  and o.o_orderdate < date '1994-03-01' + interval '3' month
>  and l.l_returnflag = 'R'
>  and c.c_nationkey = n.n_nationkey
>  group by
>  c.c_custkey,
>  c.c_name,
>  c.c_acctbal,
>  c.c_phone,
>  n.n_name,
>  c.c_address,
>  c.c_comment
>  order by
>  revenue desc
>  limit 20
> {code}
> Exception:
> {code}
> java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
> AGGR OOM at First Phase. Partitions: 8. Estimated batch size: 18481152. values size: 1048576. Output alloc size: 1048576. Planned batches: 8 Memory limit: 313709266 so far allocated: 2097152. 
>  Fragment 4:88
> [Error Id: 81017b59-dfa3-4db9-8673-bee7b80f8acd on atsqa6c82.qa.lab:31010]
> (org.apache.drill.exec.exception.OutOfMemoryException) AGGR OOM at First Phase. Partitions: 8. Estimated batch size: 18481152. values size: 1048576. Output alloc size: 1048576. Planned batches: 8 Memory limit: 313709266 so far allocated: 2097152. 
>  org.apache.drill.exec.test.generated.HashAggregatorGen3296.spillIfNeeded():1419
>  org.apache.drill.exec.test.generated.HashAggregatorGen3296.doSpill():1381
>  org.apache.drill.exec.test.generated.HashAggregatorGen3296.checkGroupAndAggrValues():1304
>  org.apache.drill.exec.test.generated.HashAggregatorGen3296.doWork():592
>  org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():176
>  org.apache.drill.exec.record.AbstractRecordBatch.next():164
>  org.apache.drill.exec.physical.impl.BaseRootExec.next():105
>  org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152
>  org.apache.drill.exec.physical.impl.BaseRootExec.next():95
>  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
>  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
>  java.security.AccessController.doPrivileged():-2
>  javax.security.auth.Subject.doAs():422
>  org.apache.hadoop.security.UserGroupInformation.doAs():1595
>  org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
>  org.apache.drill.common.SelfCleaningRunnable.run():38
>  java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>  java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>  java.lang.Thread.run():748
> at org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:530)
>  at org.apache.drill.jdbc.impl.DrillCursor.next(DrillCursor.java:634)
>  at oadd.org.apache.calcite.avatica.AvaticaResultSet.next(AvaticaResultSet.java:207)
>  at org.apache.drill.jdbc.impl.DrillResultSetImpl.next(DrillResultSetImpl.java:155)
>  at org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:253)
>  at org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:115)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
> AGGR OOM at First Phase. Partitions: 8. Estimated batch size: 18481152. values size: 1048576. Output alloc size: 1048576. Planned batches: 8 Memory limit: 313709266 so far allocated: 2097152. 
>  Fragment 4:88
> [Error Id: 81017b59-dfa3-4db9-8673-bee7b80f8acd on atsqa6c82.qa.lab:31010]
> (org.apache.drill.exec.exception.OutOfMemoryException) AGGR OOM at First Phase. Partitions: 8. Estimated batch size: 18481152. values size: 1048576. Output alloc size: 1048576. Planned batches: 8 Memory limit: 313709266 so far allocated: 2097152. 
>  org.apache.drill.exec.test.generated.HashAggregatorGen3296.spillIfNeeded():1419
>  org.apache.drill.exec.test.generated.HashAggregatorGen3296.doSpill():1381
>  org.apache.drill.exec.test.generated.HashAggregatorGen3296.checkGroupAndAggrValues():1304
>  org.apache.drill.exec.test.generated.HashAggregatorGen3296.doWork():592
>  org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():176
>  org.apache.drill.exec.record.AbstractRecordBatch.next():164
>  org.apache.drill.exec.physical.impl.BaseRootExec.next():105
>  org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152
>  org.apache.drill.exec.physical.impl.BaseRootExec.next():95
>  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
>  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
>  java.security.AccessController.doPrivileged():-2
>  javax.security.auth.Subject.doAs():422
>  org.apache.hadoop.security.UserGroupInformation.doAs():1595
>  org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
>  org.apache.drill.common.SelfCleaningRunnable.run():38
>  java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>  java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>  java.lang.Thread.run():748
> {code}
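The byte figures in the message above are easier to read in MiB: the hash aggregate failed with only about 2 MiB allocated against a roughly 299 MiB operator limit, which is consistent with the node-level direct memory pool being exhausted rather than the operator exceeding its own budget. A minimal sketch converting those figures (the interpretation is an assumption; only the raw numbers come from the error message):

```java
// Sketch: convert the byte figures from the AGGR OOM message into MiB.
// The operator-limit vs. node-pool interpretation is an assumption;
// only the raw numbers come from the error message.
public class AggOomNumbers {
    public static void main(String[] args) {
        long memoryLimit = 313_709_266L; // "Memory limit" from the message
        long allocated   = 2_097_152L;   // "so far allocated"
        System.out.printf("operator memory limit : %.1f MiB%n", memoryLimit / 1048576.0);
        System.out.printf("allocated so far      : %.1f MiB%n", allocated / 1048576.0);
    }
}
```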
>  *Query 2:* 
> Advanced/tpch/tpch_sf100/parquet/08.q
> {code}
> select
>   o_year,
>   sum(case
>     when nation = 'EGYPT' then volume
>     else 0
>   end) / sum(volume) as mkt_share
> from
>   (
>     select
>       extract(year from o.o_orderdate) as o_year,
>       l.l_extendedprice * (1 - l.l_discount) as volume,
>       n2.n_name as nation
>     from
>       part p,
>       supplier s,
>       lineitem l,
>       orders o,
>       customer c,
>       nation n1,
>       nation n2,
>       region r
>     where
>       p.p_partkey = l.l_partkey
>       and s.s_suppkey = l.l_suppkey
>       and l.l_orderkey = o.o_orderkey
>       and o.o_custkey = c.c_custkey
>       and c.c_nationkey = n1.n_nationkey
>       and n1.n_regionkey = r.r_regionkey
>       and r.r_name = 'MIDDLE EAST'
>       and s.s_nationkey = n2.n_nationkey
>       and o.o_orderdate between date '1995-01-01' and date '1996-12-31'
>       and p.p_type = 'PROMO BRUSHED COPPER'
>   ) as all_nations
> group by
>   o_year
> order by
>   o_year
> {code}
> Exception:
> {code}
> java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
> Failure allocating buffer.
> Fragment 4:57
> [Error Id: a5eeae54-ac8f-42fa-9af1-03247e6bc316 on atsqa6c82.qa.lab:31010]
>   (org.apache.drill.exec.exception.OutOfMemoryException) Failure allocating buffer.
>     io.netty.buffer.PooledByteBufAllocatorL.allocate():67
>     org.apache.drill.exec.memory.AllocationManager.<init>():84
>     org.apache.drill.exec.memory.BaseAllocator.bufferWithoutReservation():258
>     org.apache.drill.exec.memory.BaseAllocator.buffer():241
>     org.apache.drill.exec.memory.BaseAllocator.buffer():211
>     org.apache.drill.exec.vector.VarCharVector.allocateNew():389
>     org.apache.drill.exec.vector.NullableVarCharVector.allocateNew():236
>     org.apache.drill.exec.vector.AllocationHelper.allocatePrecomputedChildCount():41
>     org.apache.drill.exec.vector.AllocationHelper.allocate():54
>     org.apache.drill.exec.vector.AllocationHelper.allocate():28
>     org.apache.drill.exec.physical.impl.ScanBatch$Mutator.populateImplicitVectors():446
>     org.apache.drill.exec.physical.impl.ScanBatch$Mutator.access$200():304
>     org.apache.drill.exec.physical.impl.ScanBatch.populateImplicitVectorsAndSetCount():267
>     org.apache.drill.exec.physical.impl.ScanBatch.next():175
>     org.apache.drill.exec.record.AbstractRecordBatch.next():118
>     org.apache.drill.exec.record.AbstractRecordBatch.next():108
>     org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
>     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137
>     org.apache.drill.exec.record.AbstractRecordBatch.next():164
>     org.apache.drill.exec.record.AbstractRecordBatch.next():118
>     org.apache.drill.exec.test.generated.HashJoinProbeGen4786.executeProbePhase():127
>     org.apache.drill.exec.test.generated.HashJoinProbeGen4786.probeAndProject():235
>     org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():220
>     org.apache.drill.exec.record.AbstractRecordBatch.next():164
>     org.apache.drill.exec.record.AbstractRecordBatch.next():118
>     org.apache.drill.exec.test.generated.HashJoinProbeGen4788.executeProbePhase():127
>     org.apache.drill.exec.test.generated.HashJoinProbeGen4788.probeAndProject():235
>     org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():220
>     org.apache.drill.exec.record.AbstractRecordBatch.next():164
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():105
>     org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():95
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():422
>     org.apache.hadoop.security.UserGroupInformation.doAs():1595
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>     java.lang.Thread.run():748
>   Caused By (io.netty.util.internal.OutOfDirectMemoryError) failed to allocate 16777216 byte(s) of direct memory (used: 34359738368, max: 34359738368)
>     io.netty.util.internal.PlatformDependent.incrementMemoryCounter():510
>     io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner():464
>     io.netty.buffer.PoolArena$DirectArena.allocateDirect():766
>     io.netty.buffer.PoolArena$DirectArena.newChunk():742
>     io.netty.buffer.PoolArena.allocateNormal():244
>     io.netty.buffer.PoolArena.allocate():226
>     io.netty.buffer.PoolArena.allocate():146
>     io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.newDirectBufferL():169
>     io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.directBuffer():201
>     io.netty.buffer.PooledByteBufAllocatorL.allocate():65
>     org.apache.drill.exec.memory.AllocationManager.<init>():84
>     org.apache.drill.exec.memory.BaseAllocator.bufferWithoutReservation():258
>     org.apache.drill.exec.memory.BaseAllocator.buffer():241
>     org.apache.drill.exec.memory.BaseAllocator.buffer():211
>     org.apache.drill.exec.vector.VarCharVector.allocateNew():389
>     org.apache.drill.exec.vector.NullableVarCharVector.allocateNew():236
>     org.apache.drill.exec.vector.AllocationHelper.allocatePrecomputedChildCount():41
>     org.apache.drill.exec.vector.AllocationHelper.allocate():54
>     org.apache.drill.exec.vector.AllocationHelper.allocate():28
>     org.apache.drill.exec.physical.impl.ScanBatch$Mutator.populateImplicitVectors():446
>     org.apache.drill.exec.physical.impl.ScanBatch$Mutator.access$200():304
>     org.apache.drill.exec.physical.impl.ScanBatch.populateImplicitVectorsAndSetCount():267
>     org.apache.drill.exec.physical.impl.ScanBatch.next():175
>     org.apache.drill.exec.record.AbstractRecordBatch.next():118
>     org.apache.drill.exec.record.AbstractRecordBatch.next():108
>     org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
>     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137
>     org.apache.drill.exec.record.AbstractRecordBatch.next():164
>     org.apache.drill.exec.record.AbstractRecordBatch.next():118
>     org.apache.drill.exec.test.generated.HashJoinProbeGen4786.executeProbePhase():127
>     org.apache.drill.exec.test.generated.HashJoinProbeGen4786.probeAndProject():235
>     org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():220
>     org.apache.drill.exec.record.AbstractRecordBatch.next():164
>     org.apache.drill.exec.record.AbstractRecordBatch.next():118
>     org.apache.drill.exec.test.generated.HashJoinProbeGen4788.executeProbePhase():127
>     org.apache.drill.exec.test.generated.HashJoinProbeGen4788.probeAndProject():235
>     org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():220
>     org.apache.drill.exec.record.AbstractRecordBatch.next():164
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():105
>     org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():95
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():422
>     org.apache.hadoop.security.UserGroupInformation.doAs():1595
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>     java.lang.Thread.run():748
> {code}
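The root-cause line shows used == max: the node's direct memory pool was completely full when Netty tried to carve out another pool chunk. A minimal sketch checking those figures (mapping "max" to the JVM's direct-memory cap, e.g. -XX:MaxDirectMemorySize, is an assumption):

```java
// Sketch: sanity-check the byte counts in the Netty OutOfDirectMemoryError.
// Mapping "max" to the JVM's direct-memory cap is an assumption.
public class DirectOomNumbers {
    public static void main(String[] args) {
        long used      = 34_359_738_368L; // "used" from the error
        long max       = 34_359_738_368L; // "max" (likely -XX:MaxDirectMemorySize)
        long requested = 16_777_216L;     // failed allocation: one 16 MiB chunk
        System.out.println("max direct memory : " + (max >> 30) + " GiB");
        System.out.println("failed request    : " + (requested >> 20) + " MiB");
        System.out.println("pool exhausted    : " + (used == max));
    }
}
```

The exact equality of used and max (32 GiB) indicates the pool was fully committed before this 16 MiB request, so any operator needing a fresh buffer at that moment would fail the same way.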



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)