Posted to issues@drill.apache.org by "Roman (JIRA)" <ji...@apache.org> on 2017/07/19 17:40:00 UTC

[jira] [Comment Edited] (DRILL-5083) RecordIterator can sometimes restart a query on close

    [ https://issues.apache.org/jira/browse/DRILL-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16093471#comment-16093471 ] 

Roman edited comment on DRILL-5083 at 7/19/17 5:39 PM:
-------------------------------------------------------

It seems I have reproduced this issue.
I am using Drill built from master (35d07c3bd), which includes the fixes for [DRILL-5420|https://issues.apache.org/jira/browse/DRILL-5420] and [DRILL-5599|https://issues.apache.org/jira/browse/DRILL-5599] (the CANCELLATION_REQUESTED issues). These are the options I set:

planner.enable_hashjoin = false;
planner.enable_hashagg = false;
planner.enable_mergejoin = true;
planner.memory.max_query_memory_per_node = 1048576;
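
For anyone who wants to apply the same options, here is a minimal sketch that turns the list above into Drill {{ALTER SESSION SET}} statements. It assumes the options were set per session (the comment does not say whether session or system scope was used), and the class and method names are only illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SessionOptions {
    // Options copied from the list above; values are kept as strings so they print verbatim.
    static Map<String, String> options() {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("planner.enable_hashjoin", "false");
        opts.put("planner.enable_hashagg", "false");
        opts.put("planner.enable_mergejoin", "true");
        opts.put("planner.memory.max_query_memory_per_node", "1048576");
        return opts;
    }

    // Option names contain dots, so they are quoted with backticks.
    // Session scope is an assumption; the original comment does not state the scope.
    static String alterSession(String name, String value) {
        return "ALTER SESSION SET `" + name + "` = " + value;
    }

    public static void main(String[] args) {
        options().forEach((name, value) ->
            System.out.println(alterSession(name, value) + ";"));
    }
}
```

The printed statements can be pasted into sqlline or sent over any client connection before running the query.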

I ran a query (tpcds_sf100-query2 on Parquet tables) that should fail after ~2 min with "RESOURCE ERROR: External Sort encountered an error while spilling to disk", and cancelled it manually after 1 min 40 sec. In this case the query hangs in the CANCELLATION_REQUESTED state until I restart the drillbit. The query appears to hang during code generation. Here is the relevant jstack output:
{code:xml}
   "26989d8b-2aa4-1a56-f7d3-2b1d7b55d786:frag:10:1" #181 daemon prio=10 os_prio=0 tid=0x00007f1eec0b0800 nid=0x2be0 sleeping[0x00007f1edc650000]
   java.lang.Thread.State: RUNNABLE
	at com.sun.codemodel.JStringLiteral.generate(JStringLiteral.java:61)
	at com.sun.codemodel.JFormatter.g(JFormatter.java:350)
	at com.sun.codemodel.JFormatter.g(JFormatter.java:363)
	at com.sun.codemodel.JInvocation.generate(JInvocation.java:185)
	at com.sun.codemodel.JFormatter.g(JFormatter.java:350)
	at com.sun.codemodel.JThrow.state(JThrow.java:67)
	at com.sun.codemodel.JFormatter.s(JFormatter.java:386)
	at com.sun.codemodel.JBlock.generateBody(JBlock.java:448)
	at com.sun.codemodel.JBlock.generate(JBlock.java:436)
	at com.sun.codemodel.JFormatter.g(JFormatter.java:350)
	at com.sun.codemodel.JConditional.state(JConditional.java:115)
	at com.sun.codemodel.JFormatter.s(JFormatter.java:386)
	at com.sun.codemodel.JBlock.generateBody(JBlock.java:448)
	at com.sun.codemodel.JBlock.generate(JBlock.java:436)
	at com.sun.codemodel.JFormatter.g(JFormatter.java:350)
	at com.sun.codemodel.JBlock.state(JBlock.java:464)
	at com.sun.codemodel.JFormatter.s(JFormatter.java:386)
	at com.sun.codemodel.JBlock.generateBody(JBlock.java:448)
	at com.sun.codemodel.JBlock.generate(JBlock.java:436)
	at com.sun.codemodel.JFormatter.g(JFormatter.java:350)
	at com.sun.codemodel.JBlock.state(JBlock.java:464)
	at com.sun.codemodel.JFormatter.s(JFormatter.java:386)
	at com.sun.codemodel.JMethod.declare(JMethod.java:460)
	at com.sun.codemodel.JFormatter.d(JFormatter.java:376)
	at com.sun.codemodel.JDefinedClass.declareBody(JDefinedClass.java:815)
	at com.sun.codemodel.JDefinedClass.declare(JDefinedClass.java:788)
	at com.sun.codemodel.JFormatter.d(JFormatter.java:376)
	at com.sun.codemodel.JFormatter.write(JFormatter.java:406)
	at com.sun.codemodel.JPackage.build(JPackage.java:438)
	at com.sun.codemodel.JCodeModel.build(JCodeModel.java:311)
	at com.sun.codemodel.JCodeModel.build(JCodeModel.java:301)
	at org.apache.drill.exec.expr.CodeGenerator.generate(CodeGenerator.java:191)
	at org.apache.drill.exec.compile.CodeCompiler.createInstances(CodeCompiler.java:177)
	at org.apache.drill.exec.compile.CodeCompiler.createInstance(CodeCompiler.java:159)
	at org.apache.drill.exec.ops.FragmentContext.getImplementationClass(FragmentContext.java:325)
	at org.apache.drill.exec.ops.FragmentContext.getImplementationClass(FragmentContext.java:319)
	at org.apache.drill.exec.physical.impl.join.MergeJoinBatch.generateNewWorker(MergeJoinBatch.java:382)
	at org.apache.drill.exec.physical.impl.join.MergeJoinBatch.innerNext(MergeJoinBatch.java:193)
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
	at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
	at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:133)
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
	at org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:325)
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
	at org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.innerNext(StreamingAggBatch.java:140)
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
	at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:105)
	at org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext(PartitionSenderRootExec.java:144)
	at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:95)
	at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:234)
	at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:227)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
	at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:227)
	at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
	- <0x00000000e2398088> (a java.util.concurrent.ThreadPoolExecutor$Worker)

{code}

This jstack differs from the ones described in earlier comments, but in this case I can get an infinite loop in MergeJoin, and I think it is related to this issue.
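
As a side note, the same kind of information can be collected from inside a JVM with the standard {{Thread.getAllStackTraces()}} call. The sketch below is not Drill code and only sees threads of the JVM it runs in; the class name, the {{describe}} helper, and the ":frag:" marker (taken from the thread name in the dump above) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class FragmentThreadDump {
    // Returns "name [state] at topFrame" for every live thread whose name contains the marker.
    static List<String> describe(String marker) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            if (!t.getName().contains(marker)) {
                continue;
            }
            StackTraceElement[] frames = e.getValue();
            String top = frames.length > 0 ? frames[0].toString() : "(no frames)";
            out.add(t.getName() + " [" + t.getState() + "] at " + top);
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for a stuck fragment: a daemon thread named like the one in the dump.
        Thread stuck = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                // spin until interrupted, so the thread stays busy while we sample it
            }
        }, "demo:frag:10:1");
        stuck.setDaemon(true);
        stuck.start();
        describe(":frag:").forEach(System.out::println);
        stuck.interrupt();
    }
}
```

Sampling this a few times and comparing the top frames is one way to tell a thread that is making slow progress from one that is looping in the same place.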



> RecordIterator can sometimes restart a query on close
> -----------------------------------------------------
>
>                 Key: DRILL-5083
>                 URL: https://issues.apache.org/jira/browse/DRILL-5083
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.8.0
>            Reporter: Paul Rogers
>            Assignee: Roman
>            Priority: Minor
>         Attachments: DrillOperatorErrorHandlingRedesign.pdf
>
>
> This one is very confusing...
> In a test with a MergeJoin and external sort, operators are stacked something like this:
> {code}
> Screen
> - MergeJoin
> - - External Sort
> ...
> {code}
> When the injector was used to force an OOM during spill, the external sort threw a UserException up the stack. This was handled by:
> {code}
> IteratorValidatorBatchIterator.next( )
> RecordIterator.clearInflightBatches( )
> RecordIterator.close( )
> MergeJoinBatch.close( )
> {code}
> Which does the following:
> {code}
>       // Check whether next() should even have been called in current state.
>       if (null != exceptionState) {
>         throw new IllegalStateException(
> {code}
> But the exceptionState is already set, so we end up throwing an IllegalStateException during cleanup.
> It seems the code should agree: if {{next( )}} will be called during cleanup, then {{next( )}} should gracefully handle that case.
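
The mismatch described in the quoted issue can be reduced to a small standalone sketch. This is not Drill's IteratorValidatorBatchIterator or RecordIterator; the class {{GuardedIterator}}, the {{lenientAfterFailure}} flag, and the {{Outcome}} enum are hypothetical names used only to show the two behaviours: a strict {{next()}} that throws once a failure has been recorded, and a lenient one that reports end of data so a draining {{close()}} can finish.

```java
import java.util.function.Supplier;

class GuardedIterator {
    enum Outcome { OK, NONE }

    private final Supplier<Outcome> upstream;   // hypothetical stand-in for the upstream operator
    private final boolean lenientAfterFailure;  // hypothetical switch between the two behaviours
    private RuntimeException exceptionState;    // remembered failure, if any

    GuardedIterator(Supplier<Outcome> upstream, boolean lenientAfterFailure) {
        this.upstream = upstream;
        this.lenientAfterFailure = lenientAfterFailure;
    }

    Outcome next() {
        if (exceptionState != null) {
            if (lenientAfterFailure) {
                return Outcome.NONE;  // cleanup-friendly: report end of data
            }
            throw new IllegalStateException("next() called after failure", exceptionState);
        }
        try {
            return upstream.get();
        } catch (RuntimeException e) {
            exceptionState = e;       // remember the failure, then propagate it
            throw e;
        }
    }

    // Mimics a close() that drains in-flight batches by calling next().
    void close() {
        while (next() == Outcome.OK) {
            // discard the batch
        }
    }

    public static void main(String[] args) {
        for (boolean lenient : new boolean[] {false, true}) {
            GuardedIterator it = new GuardedIterator(
                () -> { throw new RuntimeException("simulated spill failure"); }, lenient);
            try {
                it.next();
            } catch (RuntimeException e) {
                // the original failure reaches the caller, which then starts cleanup
            }
            try {
                it.close();
                System.out.println("lenient=" + lenient + ": close() completed");
            } catch (IllegalStateException e) {
                System.out.println("lenient=" + lenient + ": close() threw " + e.getMessage());
            }
        }
    }
}
```

In the strict variant the cleanup path fails with a second exception that hides the original one; in the lenient variant cleanup completes and the original failure stays the only error reported.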


