You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Robert Hou (JIRA)" <ji...@apache.org> on 2017/10/24 07:32:00 UTC

[jira] [Assigned] (DRILL-5902) Regression: Queries encounter random failure due to RPC connection timed out

     [ https://issues.apache.org/jira/browse/DRILL-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Hou reassigned DRILL-5902:
---------------------------------

    Assignee: Pritesh Maker

> Regression: Queries encounter random failure due to RPC connection timed out
> ----------------------------------------------------------------------------
>
>                 Key: DRILL-5902
>                 URL: https://issues.apache.org/jira/browse/DRILL-5902
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - RPC
>    Affects Versions: 1.11.0
>            Reporter: Robert Hou
>            Assignee: Pritesh Maker
>            Priority: Critical
>         Attachments: 261230f7-e3b9-0cee-22d8-921cb56e3e12.sys.drill, node196.drillbit.log
>
>
> Multiple random failures (25) occurred with the latest Functional-Baseline-88.193 run.  Here is a sample query:
> {noformat}
> /root/drillAutomation/prasadns14/framework/resources/Functional/window_functions/multiple_partitions/q27.sql
> -- Kitchen sink
> -- Use all supported functions
> select
>         rank()                          over W,
>         dense_rank()                    over W,
>         percent_rank()                  over W,
>         cume_dist()                     over W,
>         avg(c_integer + c_integer)      over W,
>         sum(c_integer/100)              over W,
>         count(*)                        over W,
>         min(c_integer)                  over W,
>         max(c_integer)                  over W,
>         row_number()                    over W
> from
>         j7
> where
>         c_boolean is not null
> window  W as (partition by c_bigint, c_date, c_time, c_boolean order by c_integer)
> {noformat}
> From the logs:
> {noformat}
> 2017-10-23 04:14:36,536 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler - Dropping request for early fragment termination for path 261230e8-d03e-9ca9-91bf-c1039deecde2:1:1 -> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler - Dropping request for early fragment termination for path 261230e8-d03e-9ca9-91bf-c1039deecde2:1:5 -> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler - Dropping request for early fragment termination for path 261230e8-d03e-9ca9-91bf-c1039deecde2:1:9 -> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler - Dropping request for early fragment termination for path 261230e8-d03e-9ca9-91bf-c1039deecde2:1:13 -> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler - Dropping request for early fragment termination for path 261230e8-d03e-9ca9-91bf-c1039deecde2:1:17 -> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,538 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler - Dropping request for early fragment termination for path 261230e8-d03e-9ca9-91bf-c1039deecde2:1:21 -> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> 2017-10-23 04:14:36,538 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler - Dropping request for early fragment termination for path 261230e8-d03e-9ca9-91bf-c1039deecde2:1:25 -> 261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
> {noformat}
> {noformat}
> 2017-10-23 04:14:53,941 [UserServer-1] INFO  o.a.drill.exec.rpc.user.UserServer - RPC connection /10.10.88.196:31010 <--> /10.10.88.193:38281 (user server) timed out.  Timeout was set to 30 seconds. Closing connection.
> 2017-10-23 04:14:53,952 [UserServer-1] INFO  o.a.d.e.w.fragment.FragmentExecutor - 261230f8-2698-15b2-952f-d4ade8d6b180:0:0: State change requested RUNNING --> FAILED
> 2017-10-23 04:14:53,952 [261230f8-2698-15b2-952f-d4ade8d6b180:frag:0:0] INFO  o.a.d.e.w.fragment.FragmentExecutor - 261230f8-2698-15b2-952f-d4ade8d6b180:0:0: State change requested FAILED --> FINISHED
> 2017-10-23 04:14:53,956 [UserServer-1] WARN  o.apache.drill.exec.rpc.RequestIdMap - Failure while attempting to fail rpc response.
> java.lang.IllegalArgumentException: Self-suppression not permitted
>         at java.lang.Throwable.addSuppressed(Throwable.java:1043) ~[na:1.7.0_45]
>         at org.apache.drill.common.DeferredException.addException(DeferredException.java:88) ~[drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>         at org.apache.drill.common.DeferredException.addThrowable(DeferredException.java:97) ~[drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.fail(FragmentExecutor.java:413) ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.access$700(FragmentExecutor.java:55) ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor$ExecutorStateImpl.fail(FragmentExecutor.java:427) ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>         at org.apache.drill.exec.ops.FragmentContext.fail(FragmentContext.java:213) ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>         at org.apache.drill.exec.ops.FragmentContext$1.accept(FragmentContext.java:95) ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>         at org.apache.drill.exec.ops.FragmentContext$1.accept(FragmentContext.java:92) ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>         at org.apache.drill.exec.ops.StatusHandler.failed(StatusHandler.java:42) ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>         at org.apache.drill.exec.rpc.RequestIdMap$RpcListener.setException(RequestIdMap.java:139) ~[drill-rpc-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>         at org.apache.drill.exec.rpc.RequestIdMap$SetExceptionProcedure.apply(RequestIdMap.java:76) [drill-rpc-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>         at org.apache.drill.exec.rpc.RequestIdMap$SetExceptionProcedure.apply(RequestIdMap.java:66) [drill-rpc-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>         at com.carrotsearch.hppc.IntObjectHashMap.forEach(IntObjectHashMap.java:692) [hppc-0.7.1.jar:na]
>         at org.apache.drill.exec.rpc.RequestIdMap.channelClosed(RequestIdMap.java:62) [drill-rpc-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>         at org.apache.drill.exec.rpc.AbstractRemoteConnection.channelClosed(AbstractRemoteConnection.java:192) [drill-rpc-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>         at org.apache.drill.exec.rpc.AbstractServerConnection.channelClosed(AbstractServerConnection.java:165) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>         at org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:167) [drill-rpc-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>         at org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:144) [drill-rpc-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>         at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507) [netty-common-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:500) [netty-common-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:479) [netty-common-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420) [netty-common-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104) [netty-common-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82) [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.channel.AbstractChannel$CloseFuture.setClosed(AbstractChannel.java:1003) [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChannel.java:632) [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:610) [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:553) [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.channel.DefaultChannelPipeline$HeadContext.close(DefaultChannelPipeline.java:1236) [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:618) [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:602) [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.channel.ChannelOutboundHandlerAdapter.close(ChannelOutboundHandlerAdapter.java:71) [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:618) [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:602) [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:459) [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.handler.timeout.ReadTimeoutHandler.readTimedOut(ReadTimeoutHandler.java:99) [netty-handler-4.0.48.Final.jar:4.0.48.Final]
>         at org.apache.drill.exec.rpc.BasicServer$LoggingReadTimeoutHandler.readTimedOut(BasicServer.java:147) [drill-rpc-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>         at io.netty.handler.timeout.ReadTimeoutHandler.channelIdle(ReadTimeoutHandler.java:90) [netty-handler-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.handler.timeout.IdleStateHandler$ReaderIdleTimeoutTask.run(IdleStateHandler.java:494) [netty-handler-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.handler.timeout.IdleStateHandler$AbstractIdleTask.run(IdleStateHandler.java:466) [netty-handler-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38) [netty-common-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120) [netty-common-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399) [netty-common-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463) [netty-transport-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131) [netty-common-4.0.48.Final.jar:4.0.48.Final]
>         at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> Caused by: org.apache.drill.exec.rpc.ChannelClosedException: Channel closed /10.10.88.196:31010 <--> /10.10.88.193:38343.
>         at org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:164) [drill-rpc-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>         ... 29 common frames omitted
> 2017-10-23 04:14:53,963 [UserServer-1] INFO  o.a.drill.exec.rpc.user.UserSession - Deleted session temporary location [/tmp/bec73426-40b7-44b6-8819-7958634614e9] from file system [maprfs:///]
> 2017-10-23 04:14:53,966 [261230f8-2698-15b2-952f-d4ade8d6b180:frag:0:0] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: ChannelClosedException: Channel closed /10.10.88.196:31010 <--> /10.10.88.193:38343.
> Fragment 0:0
> [Error Id: d3c59456-284f-4c35-988f-f052b7f6c4b0 on atsqa4-196.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: ChannelClosedException: Channel closed /10.10.88.196:31010 <--> /10.10.88.193:38343.
> Fragment 0:0
> [Error Id: d3c59456-284f-4c35-988f-f052b7f6c4b0 on atsqa4-196.qa.lab:31010]
>         at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:586) ~[drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:298) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:267) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>         at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_45]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_45]
>         at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> Caused by: org.apache.drill.exec.rpc.ChannelClosedException: Channel closed /10.10.88.196:31010 <--> /10.10.88.193:38343.
>         at org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:164) ~[drill-rpc-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>         at org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:144) ~[drill-rpc-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>         at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507) ~[netty-common-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:500) ~[netty-common-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:479) ~[netty-common-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420) ~[netty-common-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104) ~[netty-common-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82) ~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.channel.AbstractChannel$CloseFuture.setClosed(AbstractChannel.java:1003) ~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChannel.java:632) ~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:610) ~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:553) ~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.channel.DefaultChannelPipeline$HeadContext.close(DefaultChannelPipeline.java:1236) ~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:618) ~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:602) ~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.channel.ChannelOutboundHandlerAdapter.close(ChannelOutboundHandlerAdapter.java:71) ~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:618) ~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:602) ~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:459) ~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.handler.timeout.ReadTimeoutHandler.readTimedOut(ReadTimeoutHandler.java:99) ~[netty-handler-4.0.48.Final.jar:4.0.48.Final]
>         at org.apache.drill.exec.rpc.BasicServer$LoggingReadTimeoutHandler.readTimedOut(BasicServer.java:147) ~[drill-rpc-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>         at io.netty.handler.timeout.ReadTimeoutHandler.channelIdle(ReadTimeoutHandler.java:90) ~[netty-handler-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.handler.timeout.IdleStateHandler$ReaderIdleTimeoutTask.run(IdleStateHandler.java:494) ~[netty-handler-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.handler.timeout.IdleStateHandler$AbstractIdleTask.run(IdleStateHandler.java:466) ~[netty-handler-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38) ~[netty-common-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120) ~[netty-common-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399) ~[netty-common-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463) ~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
>         at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131) ~[netty-common-4.0.48.Final.jar:4.0.48.Final]
>         ... 1 common frames omitted
> 2017-10-23 04:14:54,089 [UserServer-1] INFO  o.a.drill.exec.work.foreman.Foreman - Failure while trying communicate query result to initiating client. This would happen if a client is disconnected before response notice can be sent.
> org.apache.drill.exec.rpc.RpcException: Failure sending message.
> {noformat}
> sys.version is:
> 1.12.0-SNAPSHOT	f1d1945b3772bb782039fd6811e34a7de66441c8	DRILL-5582: C++ Client: [Threat Modeling] Drillbit may be spoofed by an attacker and this may lead to data being written to the attacker's target instead of Drillbit	19.10.2017 @ 17:13:05 PDT	Unknown	19.10.2017 @ 18:37:13 PDT



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)