Posted to issues@drill.apache.org by "Abhishek Girish (JIRA)" <ji...@apache.org> on 2018/05/04 01:23:00 UTC
[jira] [Updated] (DRILL-6384) TPC-H tests fail with OOM
[ https://issues.apache.org/jira/browse/DRILL-6384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Abhishek Girish updated DRILL-6384:
-----------------------------------
Description:
On the latest Apache master, we are observing multiple test failures. It looks like Drill runs out of direct memory and queries fail with OOM. A few other queries probably fail because they are unable to connect to Drillbits.
It looks like one of the recent commits caused this.
||Commit ID||Status||
|24193b1b038a6315681a65c76a67034b64f71fc5|FAIL|
|9173308710c3decf8ff745493ad3e85ccdaf7c37|PASS|
|c6549e58859397c88cb1de61b4f6eee52a07ed0c|PASS|
Two example queries and their exceptions are below. The query log is also attached.
*Query 1*: Advanced/tpch/tpch_sf100/parquet/10.q
{code}
select
c.c_custkey,
c.c_name,
sum(l.l_extendedprice * (1 - l.l_discount)) as revenue,
c.c_acctbal,
n.n_name,
c.c_address,
c.c_phone,
c.c_comment
from
customer c,
orders o,
lineitem l,
nation n
where
c.c_custkey = o.o_custkey
and l.l_orderkey = o.o_orderkey
and o.o_orderdate >= date '1994-03-01'
and o.o_orderdate < date '1994-03-01' + interval '3' month
and l.l_returnflag = 'R'
and c.c_nationkey = n.n_nationkey
group by
c.c_custkey,
c.c_name,
c.c_acctbal,
c.c_phone,
n.n_name,
c.c_address,
c.c_comment
order by
revenue desc
limit 20
{code}
Exception:
{code}
java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
AGGR OOM at First Phase. Partitions: 8. Estimated batch size: 18481152. values size: 1048576. Output alloc size: 1048576. Planned batches: 8 Memory limit: 313709266 so far allocated: 2097152.
Fragment 4:88
[Error Id: 81017b59-dfa3-4db9-8673-bee7b80f8acd on atsqa6c82.qa.lab:31010]
(org.apache.drill.exec.exception.OutOfMemoryException) AGGR OOM at First Phase. Partitions: 8. Estimated batch size: 18481152. values size: 1048576. Output alloc size: 1048576. Planned batches: 8 Memory limit: 313709266 so far allocated: 2097152.
org.apache.drill.exec.test.generated.HashAggregatorGen3296.spillIfNeeded():1419
org.apache.drill.exec.test.generated.HashAggregatorGen3296.doSpill():1381
org.apache.drill.exec.test.generated.HashAggregatorGen3296.checkGroupAndAggrValues():1304
org.apache.drill.exec.test.generated.HashAggregatorGen3296.doWork():592
org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():176
org.apache.drill.exec.record.AbstractRecordBatch.next():164
org.apache.drill.exec.physical.impl.BaseRootExec.next():105
org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152
org.apache.drill.exec.physical.impl.BaseRootExec.next():95
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1595
org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1149
java.util.concurrent.ThreadPoolExecutor$Worker.run():624
java.lang.Thread.run():748
at org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:530)
at org.apache.drill.jdbc.impl.DrillCursor.next(DrillCursor.java:634)
at oadd.org.apache.calcite.avatica.AvaticaResultSet.next(AvaticaResultSet.java:207)
at org.apache.drill.jdbc.impl.DrillResultSetImpl.next(DrillResultSetImpl.java:155)
at org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:253)
at org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:115)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
AGGR OOM at First Phase. Partitions: 8. Estimated batch size: 18481152. values size: 1048576. Output alloc size: 1048576. Planned batches: 8 Memory limit: 313709266 so far allocated: 2097152.
Fragment 4:88
[Error Id: 81017b59-dfa3-4db9-8673-bee7b80f8acd on atsqa6c82.qa.lab:31010]
(org.apache.drill.exec.exception.OutOfMemoryException) AGGR OOM at First Phase. Partitions: 8. Estimated batch size: 18481152. values size: 1048576. Output alloc size: 1048576. Planned batches: 8 Memory limit: 313709266 so far allocated: 2097152.
org.apache.drill.exec.test.generated.HashAggregatorGen3296.spillIfNeeded():1419
org.apache.drill.exec.test.generated.HashAggregatorGen3296.doSpill():1381
org.apache.drill.exec.test.generated.HashAggregatorGen3296.checkGroupAndAggrValues():1304
org.apache.drill.exec.test.generated.HashAggregatorGen3296.doWork():592
org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():176
org.apache.drill.exec.record.AbstractRecordBatch.next():164
org.apache.drill.exec.physical.impl.BaseRootExec.next():105
org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152
org.apache.drill.exec.physical.impl.BaseRootExec.next():95
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1595
org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1149
java.util.concurrent.ThreadPoolExecutor$Worker.run():624
java.lang.Thread.run():748
{code}
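For scale, the byte counts in the AGGR OOM message above can be converted to MiB. The snippet below is only a reading aid: `parse_fields` is a hypothetical helper written for this note, not part of Drill, and it does not reproduce how the hash aggregate decides to spill or fail.

```python
import re

# Message text copied from the exception above.
MESSAGE = (
    "AGGR OOM at First Phase. Partitions: 8. Estimated batch size: 18481152. "
    "values size: 1048576. Output alloc size: 1048576. Planned batches: 8 "
    "Memory limit: 313709266 so far allocated: 2097152."
)

MIB = 1024 * 1024

# Labels whose values are byte counts (the other two fields are plain counts).
BYTE_FIELDS = (
    "Estimated batch size",
    "values size",
    "Output alloc size",
    "Memory limit",
    "so far allocated",
)


def parse_fields(message):
    """Return {label: int} for every 'label: number' pair in the message."""
    pairs = re.findall(r"([A-Za-z][A-Za-z ]*?):\s*(\d+)", message)
    return {label.strip(): int(value) for label, value in pairs}


fields = parse_fields(MESSAGE)
for name in BYTE_FIELDS:
    print(f"{name}: {fields[name] / MIB:.2f} MiB")

# Plain arithmetic on two reported fields; this is not necessarily the
# quantity Drill compares against the memory limit.
planned_total = fields["Planned batches"] * fields["Estimated batch size"]
print(f"Planned batches x estimated batch size: {planned_total / MIB:.2f} MiB")
```

Read this way, the operator reports a memory limit of roughly 299 MiB, an estimated batch size of about 17.6 MiB, and 2 MiB allocated so far.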
*Query 2*: Advanced/tpch/tpch_sf100/parquet/08.q
{code}
select
o_year,
sum(case
when nation = 'EGYPT' then volume
else 0
end) / sum(volume) as mkt_share
from
(
select
extract(year from o.o_orderdate) as o_year,
l.l_extendedprice * (1 - l.l_discount) as volume,
n2.n_name as nation
from
part p,
supplier s,
lineitem l,
orders o,
customer c,
nation n1,
nation n2,
region r
where
p.p_partkey = l.l_partkey
and s.s_suppkey = l.l_suppkey
and l.l_orderkey = o.o_orderkey
and o.o_custkey = c.c_custkey
and c.c_nationkey = n1.n_nationkey
and n1.n_regionkey = r.r_regionkey
and r.r_name = 'MIDDLE EAST'
and s.s_nationkey = n2.n_nationkey
and o.o_orderdate between date '1995-01-01' and date '1996-12-31'
and p.p_type = 'PROMO BRUSHED COPPER'
) as all_nations
group by
o_year
order by
o_year
{code}
Exception:
{code}
java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
Failure allocating buffer.
Fragment 4:57
[Error Id: a5eeae54-ac8f-42fa-9af1-03247e6bc316 on atsqa6c82.qa.lab:31010]
(org.apache.drill.exec.exception.OutOfMemoryException) Failure allocating buffer.
io.netty.buffer.PooledByteBufAllocatorL.allocate():67
org.apache.drill.exec.memory.AllocationManager.<init>():84
org.apache.drill.exec.memory.BaseAllocator.bufferWithoutReservation():258
org.apache.drill.exec.memory.BaseAllocator.buffer():241
org.apache.drill.exec.memory.BaseAllocator.buffer():211
org.apache.drill.exec.vector.VarCharVector.allocateNew():389
org.apache.drill.exec.vector.NullableVarCharVector.allocateNew():236
org.apache.drill.exec.vector.AllocationHelper.allocatePrecomputedChildCount():41
org.apache.drill.exec.vector.AllocationHelper.allocate():54
org.apache.drill.exec.vector.AllocationHelper.allocate():28
org.apache.drill.exec.physical.impl.ScanBatch$Mutator.populateImplicitVectors():446
org.apache.drill.exec.physical.impl.ScanBatch$Mutator.access$200():304
org.apache.drill.exec.physical.impl.ScanBatch.populateImplicitVectorsAndSetCount():267
org.apache.drill.exec.physical.impl.ScanBatch.next():175
org.apache.drill.exec.record.AbstractRecordBatch.next():118
org.apache.drill.exec.record.AbstractRecordBatch.next():108
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137
org.apache.drill.exec.record.AbstractRecordBatch.next():164
org.apache.drill.exec.record.AbstractRecordBatch.next():118
org.apache.drill.exec.test.generated.HashJoinProbeGen4786.executeProbePhase():127
org.apache.drill.exec.test.generated.HashJoinProbeGen4786.probeAndProject():235
org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():220
org.apache.drill.exec.record.AbstractRecordBatch.next():164
org.apache.drill.exec.record.AbstractRecordBatch.next():118
org.apache.drill.exec.test.generated.HashJoinProbeGen4788.executeProbePhase():127
org.apache.drill.exec.test.generated.HashJoinProbeGen4788.probeAndProject():235
org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():220
org.apache.drill.exec.record.AbstractRecordBatch.next():164
org.apache.drill.exec.physical.impl.BaseRootExec.next():105
org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152
org.apache.drill.exec.physical.impl.BaseRootExec.next():95
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1595
org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1149
java.util.concurrent.ThreadPoolExecutor$Worker.run():624
java.lang.Thread.run():748
Caused By (io.netty.util.internal.OutOfDirectMemoryError) failed to allocate 16777216 byte(s) of direct memory (used: 34359738368, max: 34359738368)
io.netty.util.internal.PlatformDependent.incrementMemoryCounter():510
io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner():464
io.netty.buffer.PoolArena$DirectArena.allocateDirect():766
io.netty.buffer.PoolArena$DirectArena.newChunk():742
io.netty.buffer.PoolArena.allocateNormal():244
io.netty.buffer.PoolArena.allocate():226
io.netty.buffer.PoolArena.allocate():146
io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.newDirectBufferL():169
io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.directBuffer():201
io.netty.buffer.PooledByteBufAllocatorL.allocate():65
org.apache.drill.exec.memory.AllocationManager.<init>():84
org.apache.drill.exec.memory.BaseAllocator.bufferWithoutReservation():258
org.apache.drill.exec.memory.BaseAllocator.buffer():241
org.apache.drill.exec.memory.BaseAllocator.buffer():211
org.apache.drill.exec.vector.VarCharVector.allocateNew():389
org.apache.drill.exec.vector.NullableVarCharVector.allocateNew():236
org.apache.drill.exec.vector.AllocationHelper.allocatePrecomputedChildCount():41
org.apache.drill.exec.vector.AllocationHelper.allocate():54
org.apache.drill.exec.vector.AllocationHelper.allocate():28
org.apache.drill.exec.physical.impl.ScanBatch$Mutator.populateImplicitVectors():446
org.apache.drill.exec.physical.impl.ScanBatch$Mutator.access$200():304
org.apache.drill.exec.physical.impl.ScanBatch.populateImplicitVectorsAndSetCount():267
org.apache.drill.exec.physical.impl.ScanBatch.next():175
org.apache.drill.exec.record.AbstractRecordBatch.next():118
org.apache.drill.exec.record.AbstractRecordBatch.next():108
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137
org.apache.drill.exec.record.AbstractRecordBatch.next():164
org.apache.drill.exec.record.AbstractRecordBatch.next():118
org.apache.drill.exec.test.generated.HashJoinProbeGen4786.executeProbePhase():127
org.apache.drill.exec.test.generated.HashJoinProbeGen4786.probeAndProject():235
org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():220
org.apache.drill.exec.record.AbstractRecordBatch.next():164
org.apache.drill.exec.record.AbstractRecordBatch.next():118
org.apache.drill.exec.test.generated.HashJoinProbeGen4788.executeProbePhase():127
org.apache.drill.exec.test.generated.HashJoinProbeGen4788.probeAndProject():235
org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():220
org.apache.drill.exec.record.AbstractRecordBatch.next():164
org.apache.drill.exec.physical.impl.BaseRootExec.next():105
org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152
org.apache.drill.exec.physical.impl.BaseRootExec.next():95
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1595
org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1149
java.util.concurrent.ThreadPoolExecutor$Worker.run():624
java.lang.Thread.run():748
{code}
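The "Caused By" line in the trace above reports the direct memory figures in raw bytes. As a rough reading aid, the sketch below converts them; `parse_direct_memory_error` is a hypothetical helper for this note, and the pattern it matches is taken only from the one message shown here.

```python
import re

# Text copied from the "Caused By" line in the exception above.
LINE = (
    "failed to allocate 16777216 byte(s) of direct memory "
    "(used: 34359738368, max: 34359738368)"
)

MIB = 1024 ** 2
GIB = 1024 ** 3


def parse_direct_memory_error(line):
    """Extract the requested, used, and max byte counts from the message."""
    match = re.search(
        r"failed to allocate (\d+) byte\(s\) of direct memory "
        r"\(used: (\d+), max: (\d+)\)",
        line,
    )
    if match is None:
        raise ValueError("line does not look like the message shown above")
    requested, used, maximum = (int(group) for group in match.groups())
    return {"requested": requested, "used": used, "max": maximum}


info = parse_direct_memory_error(LINE)
headroom = info["max"] - info["used"]
print(f"requested: {info['requested'] / MIB:.0f} MiB")
print(f"used:      {info['used'] / GIB:.0f} GiB")
print(f"max:       {info['max'] / GIB:.0f} GiB")
print(f"headroom:  {headroom} bytes")
```

In other words, a 16 MiB allocation failed because all 32 GiB of direct memory were already in use at that point.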
> TPC-H tests fail with OOM
> -------------------------
>
> Key: DRILL-6384
> URL: https://issues.apache.org/jira/browse/DRILL-6384
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Flow
> Affects Versions: 1.14.0
> Reporter: Abhishek Girish
> Priority: Critical
> Attachments: drillbit.log.txt
>
>
> On the latest Apache master, we are observing multiple test failures. It looks like Drill runs out of direct memory and queries fail with OOM. A few other queries probably fail because they are unable to connect to Drillbits.
> It looks like one of the recent commits caused this.
> ||Commit ID||Status||
> |24193b1b038a6315681a65c76a67034b64f71fc5|FAIL|
> |9173308710c3decf8ff745493ad3e85ccdaf7c37|PASS|
> |c6549e58859397c88cb1de61b4f6eee52a07ed0c|PASS|
> Two example queries and their exceptions are below. The query log is also attached.
> *Query 1*: Advanced/tpch/tpch_sf100/parquet/10.q
> {code}
> select
> c.c_custkey,
> c.c_name,
> sum(l.l_extendedprice * (1 - l.l_discount)) as revenue,
> c.c_acctbal,
> n.n_name,
> c.c_address,
> c.c_phone,
> c.c_comment
> from
> customer c,
> orders o,
> lineitem l,
> nation n
> where
> c.c_custkey = o.o_custkey
> and l.l_orderkey = o.o_orderkey
> and o.o_orderdate >= date '1994-03-01'
> and o.o_orderdate < date '1994-03-01' + interval '3' month
> and l.l_returnflag = 'R'
> and c.c_nationkey = n.n_nationkey
> group by
> c.c_custkey,
> c.c_name,
> c.c_acctbal,
> c.c_phone,
> n.n_name,
> c.c_address,
> c.c_comment
> order by
> revenue desc
> limit 20
> {code}
> Exception:
> {code}
> java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
> AGGR OOM at First Phase. Partitions: 8. Estimated batch size: 18481152. values size: 1048576. Output alloc size: 1048576. Planned batches: 8 Memory limit: 313709266 so far allocated: 2097152.
> Fragment 4:88
> [Error Id: 81017b59-dfa3-4db9-8673-bee7b80f8acd on atsqa6c82.qa.lab:31010]
> (org.apache.drill.exec.exception.OutOfMemoryException) AGGR OOM at First Phase. Partitions: 8. Estimated batch size: 18481152. values size: 1048576. Output alloc size: 1048576. Planned batches: 8 Memory limit: 313709266 so far allocated: 2097152.
> org.apache.drill.exec.test.generated.HashAggregatorGen3296.spillIfNeeded():1419
> org.apache.drill.exec.test.generated.HashAggregatorGen3296.doSpill():1381
> org.apache.drill.exec.test.generated.HashAggregatorGen3296.checkGroupAndAggrValues():1304
> org.apache.drill.exec.test.generated.HashAggregatorGen3296.doWork():592
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():176
> org.apache.drill.exec.record.AbstractRecordBatch.next():164
> org.apache.drill.exec.physical.impl.BaseRootExec.next():105
> org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152
> org.apache.drill.exec.physical.impl.BaseRootExec.next():95
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1595
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> java.lang.Thread.run():748
> at org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:530)
> at org.apache.drill.jdbc.impl.DrillCursor.next(DrillCursor.java:634)
> at oadd.org.apache.calcite.avatica.AvaticaResultSet.next(AvaticaResultSet.java:207)
> at org.apache.drill.jdbc.impl.DrillResultSetImpl.next(DrillResultSetImpl.java:155)
> at org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:253)
> at org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:115)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
> AGGR OOM at First Phase. Partitions: 8. Estimated batch size: 18481152. values size: 1048576. Output alloc size: 1048576. Planned batches: 8 Memory limit: 313709266 so far allocated: 2097152.
> Fragment 4:88
> [Error Id: 81017b59-dfa3-4db9-8673-bee7b80f8acd on atsqa6c82.qa.lab:31010]
> (org.apache.drill.exec.exception.OutOfMemoryException) AGGR OOM at First Phase. Partitions: 8. Estimated batch size: 18481152. values size: 1048576. Output alloc size: 1048576. Planned batches: 8 Memory limit: 313709266 so far allocated: 2097152.
> org.apache.drill.exec.test.generated.HashAggregatorGen3296.spillIfNeeded():1419
> org.apache.drill.exec.test.generated.HashAggregatorGen3296.doSpill():1381
> org.apache.drill.exec.test.generated.HashAggregatorGen3296.checkGroupAndAggrValues():1304
> org.apache.drill.exec.test.generated.HashAggregatorGen3296.doWork():592
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():176
> org.apache.drill.exec.record.AbstractRecordBatch.next():164
> org.apache.drill.exec.physical.impl.BaseRootExec.next():105
> org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152
> org.apache.drill.exec.physical.impl.BaseRootExec.next():95
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1595
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> java.lang.Thread.run():748
> {code}
> *Query 2*: Advanced/tpch/tpch_sf100/parquet/08.q
> {code}
> select
> o_year,
> sum(case
> when nation = 'EGYPT' then volume
> else 0
> end) / sum(volume) as mkt_share
> from
> (
> select
> extract(year from o.o_orderdate) as o_year,
> l.l_extendedprice * (1 - l.l_discount) as volume,
> n2.n_name as nation
> from
> part p,
> supplier s,
> lineitem l,
> orders o,
> customer c,
> nation n1,
> nation n2,
> region r
> where
> p.p_partkey = l.l_partkey
> and s.s_suppkey = l.l_suppkey
> and l.l_orderkey = o.o_orderkey
> and o.o_custkey = c.c_custkey
> and c.c_nationkey = n1.n_nationkey
> and n1.n_regionkey = r.r_regionkey
> and r.r_name = 'MIDDLE EAST'
> and s.s_nationkey = n2.n_nationkey
> and o.o_orderdate between date '1995-01-01' and date '1996-12-31'
> and p.p_type = 'PROMO BRUSHED COPPER'
> ) as all_nations
> group by
> o_year
> order by
> o_year
> {code}
> Exception:
> {code}
> java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
> Failure allocating buffer.
> Fragment 4:57
> [Error Id: a5eeae54-ac8f-42fa-9af1-03247e6bc316 on atsqa6c82.qa.lab:31010]
> (org.apache.drill.exec.exception.OutOfMemoryException) Failure allocating buffer.
> io.netty.buffer.PooledByteBufAllocatorL.allocate():67
> org.apache.drill.exec.memory.AllocationManager.<init>():84
> org.apache.drill.exec.memory.BaseAllocator.bufferWithoutReservation():258
> org.apache.drill.exec.memory.BaseAllocator.buffer():241
> org.apache.drill.exec.memory.BaseAllocator.buffer():211
> org.apache.drill.exec.vector.VarCharVector.allocateNew():389
> org.apache.drill.exec.vector.NullableVarCharVector.allocateNew():236
> org.apache.drill.exec.vector.AllocationHelper.allocatePrecomputedChildCount():41
> org.apache.drill.exec.vector.AllocationHelper.allocate():54
> org.apache.drill.exec.vector.AllocationHelper.allocate():28
> org.apache.drill.exec.physical.impl.ScanBatch$Mutator.populateImplicitVectors():446
> org.apache.drill.exec.physical.impl.ScanBatch$Mutator.access$200():304
> org.apache.drill.exec.physical.impl.ScanBatch.populateImplicitVectorsAndSetCount():267
> org.apache.drill.exec.physical.impl.ScanBatch.next():175
> org.apache.drill.exec.record.AbstractRecordBatch.next():118
> org.apache.drill.exec.record.AbstractRecordBatch.next():108
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137
> org.apache.drill.exec.record.AbstractRecordBatch.next():164
> org.apache.drill.exec.record.AbstractRecordBatch.next():118
> org.apache.drill.exec.test.generated.HashJoinProbeGen4786.executeProbePhase():127
> org.apache.drill.exec.test.generated.HashJoinProbeGen4786.probeAndProject():235
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():220
> org.apache.drill.exec.record.AbstractRecordBatch.next():164
> org.apache.drill.exec.record.AbstractRecordBatch.next():118
> org.apache.drill.exec.test.generated.HashJoinProbeGen4788.executeProbePhase():127
> org.apache.drill.exec.test.generated.HashJoinProbeGen4788.probeAndProject():235
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():220
> org.apache.drill.exec.record.AbstractRecordBatch.next():164
> org.apache.drill.exec.physical.impl.BaseRootExec.next():105
> org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152
> org.apache.drill.exec.physical.impl.BaseRootExec.next():95
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1595
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> java.lang.Thread.run():748
> Caused By (io.netty.util.internal.OutOfDirectMemoryError) failed to allocate 16777216 byte(s) of direct memory (used: 34359738368, max: 34359738368)
> io.netty.util.internal.PlatformDependent.incrementMemoryCounter():510
> io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner():464
> io.netty.buffer.PoolArena$DirectArena.allocateDirect():766
> io.netty.buffer.PoolArena$DirectArena.newChunk():742
> io.netty.buffer.PoolArena.allocateNormal():244
> io.netty.buffer.PoolArena.allocate():226
> io.netty.buffer.PoolArena.allocate():146
> io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.newDirectBufferL():169
> io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.directBuffer():201
> io.netty.buffer.PooledByteBufAllocatorL.allocate():65
> org.apache.drill.exec.memory.AllocationManager.<init>():84
> org.apache.drill.exec.memory.BaseAllocator.bufferWithoutReservation():258
> org.apache.drill.exec.memory.BaseAllocator.buffer():241
> org.apache.drill.exec.memory.BaseAllocator.buffer():211
> org.apache.drill.exec.vector.VarCharVector.allocateNew():389
> org.apache.drill.exec.vector.NullableVarCharVector.allocateNew():236
> org.apache.drill.exec.vector.AllocationHelper.allocatePrecomputedChildCount():41
> org.apache.drill.exec.vector.AllocationHelper.allocate():54
> org.apache.drill.exec.vector.AllocationHelper.allocate():28
> org.apache.drill.exec.physical.impl.ScanBatch$Mutator.populateImplicitVectors():446
> org.apache.drill.exec.physical.impl.ScanBatch$Mutator.access$200():304
> org.apache.drill.exec.physical.impl.ScanBatch.populateImplicitVectorsAndSetCount():267
> org.apache.drill.exec.physical.impl.ScanBatch.next():175
> org.apache.drill.exec.record.AbstractRecordBatch.next():118
> org.apache.drill.exec.record.AbstractRecordBatch.next():108
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137
> org.apache.drill.exec.record.AbstractRecordBatch.next():164
> org.apache.drill.exec.record.AbstractRecordBatch.next():118
> org.apache.drill.exec.test.generated.HashJoinProbeGen4786.executeProbePhase():127
> org.apache.drill.exec.test.generated.HashJoinProbeGen4786.probeAndProject():235
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():220
> org.apache.drill.exec.record.AbstractRecordBatch.next():164
> org.apache.drill.exec.record.AbstractRecordBatch.next():118
> org.apache.drill.exec.test.generated.HashJoinProbeGen4788.executeProbePhase():127
> org.apache.drill.exec.test.generated.HashJoinProbeGen4788.probeAndProject():235
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():220
> org.apache.drill.exec.record.AbstractRecordBatch.next():164
> org.apache.drill.exec.physical.impl.BaseRootExec.next():105
> org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152
> org.apache.drill.exec.physical.impl.BaseRootExec.next():95
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1595
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> java.lang.Thread.run():748
> {code}
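The byte counts in the Query 2 exception are easier to read in binary units. The following is a minimal sketch in Python that parses the Netty "failed to allocate" message quoted above and reports the requested size, the usage, and the remaining headroom. The helper names `to_binary_units` and `parse_netty_oom` are hypothetical and are not part of Drill or Netty; the regular expression assumes the message layout shown in this report.

```python
import re

UNITS = ["B", "KiB", "MiB", "GiB", "TiB"]


def to_binary_units(n):
    """Format a byte count using 1024-based units (hypothetical helper)."""
    value = float(n)
    for unit in UNITS:
        # Stop at the first unit where the value fits, or at the largest unit.
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024


# Message text copied from the Query 2 exception in this report.
NETTY_MSG = (
    "failed to allocate 16777216 byte(s) of direct memory "
    "(used: 34359738368, max: 34359738368)"
)


def parse_netty_oom(msg):
    """Extract the byte counts from the message (layout is an assumption)."""
    m = re.search(
        r"failed to allocate (\d+) byte\(s\) of direct memory "
        r"\(used: (\d+), max: (\d+)\)",
        msg,
    )
    if m is None:
        raise ValueError("unrecognized message format")
    requested, used, limit = map(int, m.groups())
    return {
        "requested": requested,
        "used": used,
        "max": limit,
        "headroom": limit - used,
    }


info = parse_netty_oom(NETTY_MSG)
for name, value in info.items():
    print(f"{name}: {to_binary_units(value)}")
```

For the message above, the request is 16 MiB while both the used and max figures are 32 GiB, so the direct memory pool had no headroom left when the allocation was attempted.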