Posted to dev@drill.apache.org by "Sorabh Hamirwasia (JIRA)" <ji...@apache.org> on 2019/01/25 17:11:00 UTC

[jira] [Resolved] (DRILL-6998) Queries failing with "Failed to aggregate or route the RFW" due to "java.lang.ArrayIndexOutOfBoundsException" do not complete

     [ https://issues.apache.org/jira/browse/DRILL-6998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sorabh Hamirwasia resolved DRILL-6998.
--------------------------------------
    Resolution: Fixed

> Queries failing with "Failed to aggregate or route the RFW" due to "java.lang.ArrayIndexOutOfBoundsException" do not complete
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-6998
>                 URL: https://issues.apache.org/jira/browse/DRILL-6998
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.16.0
>            Reporter: Abhishek Ravi
>            Assignee: weijie.tong
>            Priority: Major
>              Labels: ready-to-commit
>             Fix For: 1.16.0
>
>
> The following query joins two tables on *two* (>1) fields.
> {noformat}
> select count(*) from lineitem l inner join partsupp p on l.l_partkey = p.ps_partkey AND l.l_suppkey = p.ps_suppkey
> {noformat}
> The query does not return, even though Fragment 0:0 reports a state change from {{RUNNING}} -> {{FINISHED}}.
> The following is the jstack output of {{Frag0:0}}.
> {noformat}
> "23b85137-b102-39a9-70d9-72381c5fb93b:frag:0:0" #16037 daemon prio=10 os_prio=0 tid=0x00007f5f48d415d0 nid=0x1a61 waiting on condition [0x00007f61b32b2000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>         at java.lang.Thread.sleep(Native Method)
>         at org.apache.drill.exec.work.filter.RuntimeFilterSink.close(RuntimeFilterSink.java:116)
>         at org.apache.drill.exec.work.filter.RuntimeFilterRouter.waitForComplete(RuntimeFilterRouter.java:113)
>         at org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:738)
>         at org.apache.drill.exec.work.foreman.QueryStateProcessor.wrapUpCompletion(QueryStateProcessor.java:315)
>         at org.apache.drill.exec.work.foreman.QueryStateProcessor.running(QueryStateProcessor.java:276)
>         at org.apache.drill.exec.work.foreman.QueryStateProcessor.moveToState(QueryStateProcessor.java:92)
>         - locked <0x000000055f9a7468> (a org.apache.drill.exec.work.foreman.QueryStateProcessor)
>         at org.apache.drill.exec.work.foreman.QueryStateProcessor$StateSwitch.processEvent(QueryStateProcessor.java:349)
>         at org.apache.drill.exec.work.foreman.QueryStateProcessor$StateSwitch.processEvent(QueryStateProcessor.java:342)
>         at org.apache.drill.common.EventProcessor.processEvents(EventProcessor.java:107)
>         at org.apache.drill.common.EventProcessor.sendEvent(EventProcessor.java:65)
>         at org.apache.drill.exec.work.foreman.QueryStateProcessor$StateSwitch.addEvent(QueryStateProcessor.java:344)
>         at org.apache.drill.exec.work.foreman.QueryStateProcessor.addToEventQueue(QueryStateProcessor.java:155)
>         at org.apache.drill.exec.work.foreman.Foreman.addToEventQueue(Foreman.java:213)
>         at org.apache.drill.exec.work.foreman.QueryManager.nodeComplete(QueryManager.java:519)
>         at org.apache.drill.exec.work.foreman.QueryManager.access$100(QueryManager.java:65)
>         at org.apache.drill.exec.work.foreman.QueryManager$NodeTracker.fragmentComplete(QueryManager.java:483)
>         at org.apache.drill.exec.work.foreman.QueryManager.fragmentDone(QueryManager.java:155)
>         at org.apache.drill.exec.work.foreman.QueryManager.access$400(QueryManager.java:65)
>         at org.apache.drill.exec.work.foreman.QueryManager$1.statusUpdate(QueryManager.java:546)
>         at org.apache.drill.exec.rpc.control.WorkEventBus.statusUpdate(WorkEventBus.java:63)
>         at org.apache.drill.exec.work.batch.ControlMessageHandler.requestFragmentStatus(ControlMessageHandler.java:253)
>         at org.apache.drill.exec.rpc.control.LocalControlConnectionManager.runCommand(LocalControlConnectionManager.java:130)
>         at org.apache.drill.exec.rpc.control.ControlTunnel.sendFragmentStatus(ControlTunnel.java:89)
>         at org.apache.drill.exec.work.fragment.FragmentStatusReporter.sendStatus(FragmentStatusReporter.java:122)
>         at org.apache.drill.exec.work.fragment.FragmentStatusReporter.stateChanged(FragmentStatusReporter.java:91)
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:367)
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:219)
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:330)
>         at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> {noformat}
> From the code, it seems that {{RuntimeFilterSink.close}} is stuck at
> {code:java}
>     while (!asyncAggregateWorker.over.get()) {
>       try {
>         Thread.sleep(100);
>       } catch (InterruptedException e) {
>         logger.error("interrupted while sleeping to wait for the aggregating worker thread to exit", e);
>       }
>     }
> {code}
> This is because {{AsyncAggregateWorker}} exits due to the following exception before it can set {{asyncAggregateWorker.over}} to *true*, so the wait loop in {{close}} never terminates.
> {noformat}
> 2019-01-22 16:01:18,773 [drill-executor-1301] ERROR o.a.d.e.w.filter.RuntimeFilterSink - Failed to aggregate or route the RFW
> java.lang.ArrayIndexOutOfBoundsException: 1
>         at org.apache.drill.exec.work.filter.RuntimeFilterWritable.unwrap(RuntimeFilterWritable.java:67) ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>         at org.apache.drill.exec.work.filter.RuntimeFilterWritable.aggregate(RuntimeFilterWritable.java:78) ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>         at org.apache.drill.exec.work.filter.RuntimeFilterSink.aggregate(RuntimeFilterSink.java:140) ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>         at org.apache.drill.exec.work.filter.RuntimeFilterSink.access$600(RuntimeFilterSink.java:52) ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>         at org.apache.drill.exec.work.filter.RuntimeFilterSink$AsyncAggregateWorker.run(RuntimeFilterSink.java:246) ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_151]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_151]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_151]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_151]
>         at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151]
> {noformat}
> A simple fix would be to add {{over.set(true)}} to the {{finally}} block in {{AsyncAggregateWorker.run}}.
> I hit the issue with the latest changes in the PR -> https://github.com/apache/drill/pull/1600
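
The fix suggested in the description can be illustrated with a small standalone sketch. This is not Drill's actual code: the class names AggregateWorkerSketch and Worker, the failDuringAggregate switch, and the timeout in waitForWorker are all hypothetical, introduced only so the example can run on its own and cannot hang. It shows that setting the flag in a finally block lets the polling loop finish even when the worker fails with an exception.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class AggregateWorkerSketch {

  /** Hypothetical stand-in for AsyncAggregateWorker; not Drill's actual class. */
  static class Worker implements Runnable {
    final AtomicBoolean over = new AtomicBoolean(false);
    private final boolean failDuringAggregate;

    Worker(boolean failDuringAggregate) {
      this.failDuringAggregate = failDuringAggregate;
    }

    @Override
    public void run() {
      try {
        if (failDuringAggregate) {
          // Simulates the kind of failure reported in the stack trace.
          throw new ArrayIndexOutOfBoundsException(1);
        }
        // Normal aggregation work would happen here.
      } catch (RuntimeException e) {
        System.err.println("Failed to aggregate or route the RFW: " + e);
      } finally {
        // Runs on both the normal and the failing path,
        // so a thread polling 'over' is always released.
        over.set(true);
      }
    }
  }

  /**
   * Polls the flag like the loop in close(), but with a timeout (an assumption
   * added for this demo). Returns true if the worker signalled completion.
   */
  static boolean waitForWorker(Worker worker, long timeoutMillis) throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    while (!worker.over.get()) {
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      Thread.sleep(10);
    }
    return true;
  }

  /** Runs one worker on its own executor and reports whether it signalled completion. */
  static boolean runOnce(boolean fail) throws InterruptedException {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      Worker worker = new Worker(fail);
      executor.submit(worker);
      return waitForWorker(worker, 2000);
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("normal exit signalled: " + runOnce(false));
    System.out.println("failed exit signalled: " + runOnce(true));
  }
}
```

If the over.set(true) call were placed at the end of the try block instead, the failing run would leave the flag unset and the poller would wait until the demo timeout expires, which mirrors the hang described above.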



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)