You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Deneche A. Hakim (JIRA)" <ji...@apache.org> on 2016/01/04 18:18:39 UTC

[jira] [Commented] (DRILL-4234) Drill running out of memory for a simple CTAS query

    [ https://issues.apache.org/jira/browse/DRILL-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15081397#comment-15081397 ] 

Deneche A. Hakim commented on DRILL-4234:
-----------------------------------------

I further investigated similar failures in our test cluster and for many of them the sort runs out of memory as soon as it transfers the ownership of the first batch. We should probably increase the default value of {{planner.memory.max_query_memory_per_node}} to help avoid such failures, DRILL-3549 should take care of that.

> Drill running out of memory for a simple CTAS query
> ---------------------------------------------------
>
>                 Key: DRILL-4234
>                 URL: https://issues.apache.org/jira/browse/DRILL-4234
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Writer
>    Affects Versions: 1.5.0
>            Reporter: Rahul Challapalli
>            Priority: Critical
>
> git.commit.id.abbrev=6dea429
> Memory Settings on the cluster
> {code}
> DRILL_MAX_DIRECT_MEMORY="32G"
> DRILL_MAX_HEAP="4G"
> export DRILL_JAVA_OPTS="-Xms1G -Xmx$DRILL_MAX_HEAP -XX:MaxDirectMemorySize=$DRILL_MAX_DIRECT_MEMORY -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=1G -ea"
> {code}
> The below query runs out of memory
> {code}
> create table `tpch_single_partition/lineitem` partition by (l_moddate) as select l.*, l_shipdate - extract(day from l_shipdate) + 1 l_moddate from cp.`tpch/lineitem.parquet` l;
> Error: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
> {code}
> Below is the information from the logs
> {code}
> 2015-12-30 17:44:00,164 [297be821-460b-689f-5b40-8897a6013ffb:frag:0:0] INFO  o.a.d.e.w.f.FragmentStatusReporter - 297be821-460b-689f-5b40-8897a6013ffb:0:0: State to report: RUNNING
> 2015-12-30 17:44:00,868 [297be821-460b-689f-5b40-8897a6013ffb:frag:0:0] INFO  o.a.d.e.w.fragment.FragmentExecutor - User Error Occurred
> org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
> [Error Id: 1843305a-cc43-42c5-9b96-e1a887972999 ]
> 	at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534) ~[drill-common-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:266) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_71]
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
> 	at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> Caused by: org.apache.drill.exec.exception.OutOfMemoryException: org.apache.drill.exec.exception.OutOfMemoryException: Unable to allocate sv2, and not enough batchGroups to spill
> 	at org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:356) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:94) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:132) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:132) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext(WriterRecordBatch.java:91) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:132) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:81) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:256) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:250) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at java.security.AccessController.doPrivileged(Native Method) ~[na:1.7.0_71]
> 	at javax.security.auth.Subject.doAs(Subject.java:415) ~[na:1.7.0_71]
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595) ~[hadoop-common-2.7.0-mapr-1506.jar:na]
> 	at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:250) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	... 4 common frames omitted
> Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to allocate sv2, and not enough batchGroups to spill
> 	at org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.newSV2(ExternalSortBatch.java:620) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	at org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:352) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> 	... 44 common frames omitted
> 2015-12-30 17:44:00,868 [297be821-460b-689f-5b40-8897a6013ffb:frag:0:0] INFO  o.a.d.e.w.fragment.FragmentExecutor - 297be821-460b-689f-5b40-8897a6013ffb:0:0: State change requested RUNNING --> FAILED
> 2015-12-30 17:44:00,881 [297be821-460b-689f-5b40-8897a6013ffb:frag:0:0] INFO  o.a.d.e.w.fragment.FragmentExecutor - 297be821-460b-689f-5b40-8897a6013ffb:0:0: State change requested FAILED --> FAILED
> 2015-12-30 17:44:00,883 [297be821-460b-689f-5b40-8897a6013ffb:frag:0:0] INFO  o.a.d.e.w.fragment.FragmentExecutor - 297be821-460b-689f-5b40-8897a6013ffb:0:0: State change requested FAILED --> FAILED
> 2015-12-30 17:44:00,883 [297be821-460b-689f-5b40-8897a6013ffb:frag:0:0] INFO  o.a.d.e.w.fragment.FragmentExecutor - 297be821-460b-689f-5b40-8897a6013ffb:0:0: State change requested FAILED --> FINISHED
> 2015-12-30 17:44:00,927 [CONTROL-rpc-event-queue] WARN  o.a.drill.exec.work.foreman.Foreman - Dropping request to move to COMPLETED state as query is already at FAILED state (which is terminal).
> 2015-12-30 17:44:00,930 [CONTROL-rpc-event-queue] WARN  o.a.d.e.w.b.ControlMessageHandler - Dropping request to cancel fragment. 297be821-460b-689f-5b40-8897a6013ffb:0:0 does not exist.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)