Posted to issues@drill.apache.org by "Vitalii Diravka (JIRA)" <ji...@apache.org> on 2018/05/04 06:15:00 UTC

[jira] [Commented] (DRILL-6384) TPC-H tests fail with OOM

    [ https://issues.apache.org/jira/browse/DRILL-6384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16463408#comment-16463408 ] 

Vitalii Diravka commented on DRILL-6384:
----------------------------------------

[~agirish] This is the same issue as DRILL-6374. The cause is commit 6fcaf4268 (DRILL-6173: Support transitive closure).

I have described it in more detail in DRILL-6374.
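For readers unfamiliar with the change: transitive closure lets the planner derive additional predicates from existing ones, which can alter plan shape and operator memory estimates. A minimal illustration (hypothetical example, not taken from this report):

{code}
-- Given the join condition and the filter on o:
select *
from customer c, orders o
where c.c_custkey = o.o_custkey
  and o.o_custkey = 10;
-- With transitive closure the planner may additionally infer
-- c.c_custkey = 10 and push it down to the scan of customer.
{code}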


> TPC-H tests fail with OOM
> -------------------------
>
>                 Key: DRILL-6384
>                 URL: https://issues.apache.org/jira/browse/DRILL-6384
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.14.0
>            Reporter: Abhishek Girish
>            Assignee: Vitalii Diravka
>            Priority: Critical
>         Attachments: drillbit.log.txt
>
>
> On the latest Apache master, we are observing multiple test failures. It looks like Drill runs out of direct memory and queries fail with OOM. A few other queries probably fail because they are unable to connect to Drillbits.
> It looks like one of the recent commits caused this.
> ||Commit ID||Status||
> |24193b1b038a6315681a65c76a67034b64f71fc5|FAIL|
> |883c8d94b0021a83059fa79563dd516c4299b70a|FAIL|
> |2601cdd33e0685f59a7bf2ac72541bd9dcaaa18f|FAIL|
> |9173308710c3decf8ff745493ad3e85ccdaf7c37|PASS|
> |c6549e58859397c88cb1de61b4f6eee52a07ed0c|PASS|
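> As a point of reference: the OutOfDirectMemoryError below reports used: 34359738368, max: 34359738368, i.e. a 32 GB direct-memory ceiling. That ceiling is configured via DRILL_MAX_DIRECT_MEMORY in conf/drill-env.sh; a sketch, with an illustrative value only (not a recommendation or a fix for this bug):
> {code}
> # conf/drill-env.sh -- raise the Drillbit's direct-memory cap (illustrative value)
> export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"48G"}
> {code}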
> Two example queries and their exceptions are shown below. The query log is also attached.
> *Query 1*: Advanced/tpch/tpch_sf100/parquet/10.q
> {code}
>  select
>  c.c_custkey,
>  c.c_name,
>  sum(l.l_extendedprice * (1 - l.l_discount)) as revenue,
>  c.c_acctbal,
>  n.n_name,
>  c.c_address,
>  c.c_phone,
>  c.c_comment
>  from
>  customer c,
>  orders o,
>  lineitem l,
>  nation n
>  where
>  c.c_custkey = o.o_custkey
>  and l.l_orderkey = o.o_orderkey
>  and o.o_orderdate >= date '1994-03-01'
>  and o.o_orderdate < date '1994-03-01' + interval '3' month
>  and l.l_returnflag = 'R'
>  and c.c_nationkey = n.n_nationkey
>  group by
>  c.c_custkey,
>  c.c_name,
>  c.c_acctbal,
>  c.c_phone,
>  n.n_name,
>  c.c_address,
>  c.c_comment
>  order by
>  revenue desc
>  limit 20
> {code}
> Exception:
> {code}
> java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
> AGGR OOM at First Phase. Partitions: 8. Estimated batch size: 18481152. values size: 1048576. Output alloc size: 1048576. Planned batches: 8 Memory limit: 313709266 so far allocated: 2097152. 
>  Fragment 4:88
> [Error Id: 81017b59-dfa3-4db9-8673-bee7b80f8acd on atsqa6c82.qa.lab:31010]
> (org.apache.drill.exec.exception.OutOfMemoryException) AGGR OOM at First Phase. Partitions: 8. Estimated batch size: 18481152. values size: 1048576. Output alloc size: 1048576. Planned batches: 8 Memory limit: 313709266 so far allocated: 2097152. 
>  org.apache.drill.exec.test.generated.HashAggregatorGen3296.spillIfNeeded():1419
>  org.apache.drill.exec.test.generated.HashAggregatorGen3296.doSpill():1381
>  org.apache.drill.exec.test.generated.HashAggregatorGen3296.checkGroupAndAggrValues():1304
>  org.apache.drill.exec.test.generated.HashAggregatorGen3296.doWork():592
>  org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():176
>  org.apache.drill.exec.record.AbstractRecordBatch.next():164
>  org.apache.drill.exec.physical.impl.BaseRootExec.next():105
>  org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152
>  org.apache.drill.exec.physical.impl.BaseRootExec.next():95
>  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
>  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
>  java.security.AccessController.doPrivileged():-2
>  javax.security.auth.Subject.doAs():422
>  org.apache.hadoop.security.UserGroupInformation.doAs():1595
>  org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
>  org.apache.drill.common.SelfCleaningRunnable.run():38
>  java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>  java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>  java.lang.Thread.run():748
> at org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:530)
>  at org.apache.drill.jdbc.impl.DrillCursor.next(DrillCursor.java:634)
>  at oadd.org.apache.calcite.avatica.AvaticaResultSet.next(AvaticaResultSet.java:207)
>  at org.apache.drill.jdbc.impl.DrillResultSetImpl.next(DrillResultSetImpl.java:155)
>  at org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:253)
>  at org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:115)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
> AGGR OOM at First Phase. Partitions: 8. Estimated batch size: 18481152. values size: 1048576. Output alloc size: 1048576. Planned batches: 8 Memory limit: 313709266 so far allocated: 2097152. 
>  Fragment 4:88
> [Error Id: 81017b59-dfa3-4db9-8673-bee7b80f8acd on atsqa6c82.qa.lab:31010]
> (org.apache.drill.exec.exception.OutOfMemoryException) AGGR OOM at First Phase. Partitions: 8. Estimated batch size: 18481152. values size: 1048576. Output alloc size: 1048576. Planned batches: 8 Memory limit: 313709266 so far allocated: 2097152. 
>  org.apache.drill.exec.test.generated.HashAggregatorGen3296.spillIfNeeded():1419
>  org.apache.drill.exec.test.generated.HashAggregatorGen3296.doSpill():1381
>  org.apache.drill.exec.test.generated.HashAggregatorGen3296.checkGroupAndAggrValues():1304
>  org.apache.drill.exec.test.generated.HashAggregatorGen3296.doWork():592
>  org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():176
>  org.apache.drill.exec.record.AbstractRecordBatch.next():164
>  org.apache.drill.exec.physical.impl.BaseRootExec.next():105
>  org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152
>  org.apache.drill.exec.physical.impl.BaseRootExec.next():95
>  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
>  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
>  java.security.AccessController.doPrivileged():-2
>  javax.security.auth.Subject.doAs():422
>  org.apache.hadoop.security.UserGroupInformation.doAs():1595
>  org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
>  org.apache.drill.common.SelfCleaningRunnable.run():38
>  java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>  java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>  java.lang.Thread.run():748
> {code}
> *Query 2*: Advanced/tpch/tpch_sf100/parquet/08.q
> {code}
> select
>   o_year,
>   sum(case
>         when nation = 'EGYPT' then volume
>         else 0
>       end) / sum(volume) as mkt_share
> from
>   (
>     select
>       extract(year from o.o_orderdate) as o_year,
>       l.l_extendedprice * (1 - l.l_discount) as volume,
>       n2.n_name as nation
>     from
>       part p,
>       supplier s,
>       lineitem l,
>       orders o,
>       customer c,
>       nation n1,
>       nation n2,
>       region r
>     where
>       p.p_partkey = l.l_partkey
>       and s.s_suppkey = l.l_suppkey
>       and l.l_orderkey = o.o_orderkey
>       and o.o_custkey = c.c_custkey
>       and c.c_nationkey = n1.n_nationkey
>       and n1.n_regionkey = r.r_regionkey
>       and r.r_name = 'MIDDLE EAST'
>       and s.s_nationkey = n2.n_nationkey
>       and o.o_orderdate between date '1995-01-01' and date '1996-12-31'
>       and p.p_type = 'PROMO BRUSHED COPPER'
>   ) as all_nations
> group by
>   o_year
> order by
>   o_year
> {code}
> Exception:
> {code}
> java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
> Failure allocating buffer.
> Fragment 4:57
> [Error Id: a5eeae54-ac8f-42fa-9af1-03247e6bc316 on atsqa6c82.qa.lab:31010]
>   (org.apache.drill.exec.exception.OutOfMemoryException) Failure allocating buffer.
>     io.netty.buffer.PooledByteBufAllocatorL.allocate():67
>     org.apache.drill.exec.memory.AllocationManager.<init>():84
>     org.apache.drill.exec.memory.BaseAllocator.bufferWithoutReservation():258
>     org.apache.drill.exec.memory.BaseAllocator.buffer():241
>     org.apache.drill.exec.memory.BaseAllocator.buffer():211
>     org.apache.drill.exec.vector.VarCharVector.allocateNew():389
>     org.apache.drill.exec.vector.NullableVarCharVector.allocateNew():236
>     org.apache.drill.exec.vector.AllocationHelper.allocatePrecomputedChildCount():41
>     org.apache.drill.exec.vector.AllocationHelper.allocate():54
>     org.apache.drill.exec.vector.AllocationHelper.allocate():28
>     org.apache.drill.exec.physical.impl.ScanBatch$Mutator.populateImplicitVectors():446
>     org.apache.drill.exec.physical.impl.ScanBatch$Mutator.access$200():304
>     org.apache.drill.exec.physical.impl.ScanBatch.populateImplicitVectorsAndSetCount():267
>     org.apache.drill.exec.physical.impl.ScanBatch.next():175
>     org.apache.drill.exec.record.AbstractRecordBatch.next():118
>     org.apache.drill.exec.record.AbstractRecordBatch.next():108
>     org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
>     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137
>     org.apache.drill.exec.record.AbstractRecordBatch.next():164
>     org.apache.drill.exec.record.AbstractRecordBatch.next():118
>     org.apache.drill.exec.test.generated.HashJoinProbeGen4786.executeProbePhase():127
>     org.apache.drill.exec.test.generated.HashJoinProbeGen4786.probeAndProject():235
>     org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():220
>     org.apache.drill.exec.record.AbstractRecordBatch.next():164
>     org.apache.drill.exec.record.AbstractRecordBatch.next():118
>     org.apache.drill.exec.test.generated.HashJoinProbeGen4788.executeProbePhase():127
>     org.apache.drill.exec.test.generated.HashJoinProbeGen4788.probeAndProject():235
>     org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():220
>     org.apache.drill.exec.record.AbstractRecordBatch.next():164
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():105
>     org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():95
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():422
>     org.apache.hadoop.security.UserGroupInformation.doAs():1595
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>     java.lang.Thread.run():748
>   Caused By (io.netty.util.internal.OutOfDirectMemoryError) failed to allocate 16777216 byte(s) of direct memory (used: 34359738368, max: 34359738368)
>     io.netty.util.internal.PlatformDependent.incrementMemoryCounter():510
>     io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner():464
>     io.netty.buffer.PoolArena$DirectArena.allocateDirect():766
>     io.netty.buffer.PoolArena$DirectArena.newChunk():742
>     io.netty.buffer.PoolArena.allocateNormal():244
>     io.netty.buffer.PoolArena.allocate():226
>     io.netty.buffer.PoolArena.allocate():146
>     io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.newDirectBufferL():169
>     io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.directBuffer():201
>     io.netty.buffer.PooledByteBufAllocatorL.allocate():65
>     org.apache.drill.exec.memory.AllocationManager.<init>():84
>     org.apache.drill.exec.memory.BaseAllocator.bufferWithoutReservation():258
>     org.apache.drill.exec.memory.BaseAllocator.buffer():241
>     org.apache.drill.exec.memory.BaseAllocator.buffer():211
>     org.apache.drill.exec.vector.VarCharVector.allocateNew():389
>     org.apache.drill.exec.vector.NullableVarCharVector.allocateNew():236
>     org.apache.drill.exec.vector.AllocationHelper.allocatePrecomputedChildCount():41
>     org.apache.drill.exec.vector.AllocationHelper.allocate():54
>     org.apache.drill.exec.vector.AllocationHelper.allocate():28
>     org.apache.drill.exec.physical.impl.ScanBatch$Mutator.populateImplicitVectors():446
>     org.apache.drill.exec.physical.impl.ScanBatch$Mutator.access$200():304
>     org.apache.drill.exec.physical.impl.ScanBatch.populateImplicitVectorsAndSetCount():267
>     org.apache.drill.exec.physical.impl.ScanBatch.next():175
>     org.apache.drill.exec.record.AbstractRecordBatch.next():118
>     org.apache.drill.exec.record.AbstractRecordBatch.next():108
>     org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
>     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137
>     org.apache.drill.exec.record.AbstractRecordBatch.next():164
>     org.apache.drill.exec.record.AbstractRecordBatch.next():118
>     org.apache.drill.exec.test.generated.HashJoinProbeGen4786.executeProbePhase():127
>     org.apache.drill.exec.test.generated.HashJoinProbeGen4786.probeAndProject():235
>     org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():220
>     org.apache.drill.exec.record.AbstractRecordBatch.next():164
>     org.apache.drill.exec.record.AbstractRecordBatch.next():118
>     org.apache.drill.exec.test.generated.HashJoinProbeGen4788.executeProbePhase():127
>     org.apache.drill.exec.test.generated.HashJoinProbeGen4788.probeAndProject():235
>     org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():220
>     org.apache.drill.exec.record.AbstractRecordBatch.next():164
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():105
>     org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():95
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():422
>     org.apache.hadoop.security.UserGroupInformation.doAs():1595
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>     java.lang.Thread.run():748
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)