Posted to issues@drill.apache.org by "Chris Westin (JIRA)" <ji...@apache.org> on 2015/04/01 20:39:56 UTC
[jira] [Commented] (DRILL-2654) Shutdown when queries are running results in an IllegalStateException

    [ https://issues.apache.org/jira/browse/DRILL-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391182#comment-14391182 ] 

Chris Westin commented on DRILL-2654:
-------------------------------------

We currently wait up to 5 seconds for all running queries to complete, but you're right that this won't always be enough. When the shutdown goes through anyway, the allocator complains about outstanding memory allocations, but these complaints are bogus: the queries are still running, so they do indeed hold outstanding memory. If the queries are allowed to complete before shutting down, the allocator complaints do not occur.

Keeping this as a reminder to retest after DRILL-2656 is taken care of, or at least to issue cancellations for running queries if we don't get to the nicer options described in that bug.
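
For illustration only, the "wait for a bounded time, then cancel" sequence discussed above could look roughly like the sketch below. This is not Drill's actual shutdown code: the class GracefulShutdownSketch, the method drainThenCancel, and both timeout parameters are hypothetical names, and running queries are modelled as plain tasks on a java.util.concurrent.ExecutorService. The point is that the allocator should only be closed after this method returns, so that no query still holds memory.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not Drill code: running queries are modelled as
// tasks on an ExecutorService.
final class GracefulShutdownSketch {

    /**
     * Waits up to graceMillis for running tasks to finish. If they do not,
     * issues cancellations (thread interrupts) and waits up to cancelMillis
     * for the tasks to react.
     *
     * @return true if everything finished within the grace period,
     *         false if cancellation had to be issued
     */
    static boolean drainThenCancel(ExecutorService queries,
                                   long graceMillis,
                                   long cancelMillis) throws InterruptedException {
        queries.shutdown(); // stop accepting new work; running tasks continue
        if (queries.awaitTermination(graceMillis, TimeUnit.MILLISECONDS)) {
            return true; // all work completed, so no memory is outstanding
        }
        queries.shutdownNow(); // interrupt the tasks that are still running
        queries.awaitTermination(cancelMillis, TimeUnit.MILLISECONDS);
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService queries = Executors.newFixedThreadPool(2);
        // A short "query" that should finish well within the grace period.
        queries.execute(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        boolean clean = drainThenCancel(queries, 5_000, 1_000);
        System.out.println("drained cleanly: " + clean);
        // Only at this point would it be safe to close a shared allocator.
    }
}
```

The cancellation step relies on tasks responding to interruption; a query that ignores interrupts would still be running after the second wait, so a caller may want to check isTerminated() before releasing shared resources.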

> Shutdown when queries are running results in an IllegalStateException
> ---------------------------------------------------------------------
>
>                 Key: DRILL-2654
>                 URL: https://issues.apache.org/jira/browse/DRILL-2654
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 0.8.0
>            Reporter: Ramana Inukonda Nagaraj
>             Fix For: 1.0.0
>
>
> Scenario: long-running TPCH queries followed by a shutdown/restart of all drillbits results in the exceptions below in drillbit.out
> {code}
> Caused by: java.lang.IllegalStateException: Failure while trying to close allocator: Child level allocators not closed. Stack trace: 
> 		java.lang.Thread.getStackTrace(Thread.java:1588)
> 		org.apache.drill.exec.memory.TopLevelAllocator.getChildAllocator(TopLevelAllocator.java:129)
> 		org.apache.drill.exec.ops.FragmentContext.<init>(FragmentContext.java:111)
> 		org.apache.drill.exec.work.fragment.NonRootFragmentManager.<init>(NonRootFragmentManager.java:55)
> 		org.apache.drill.exec.work.batch.ControlHandlerImpl.startNewRemoteFragment(ControlHandlerImpl.java:136)
> 		org.apache.drill.exec.work.batch.ControlHandlerImpl.handle(ControlHandlerImpl.java:98)
> 		org.apache.drill.exec.rpc.control.ControlServer.handle(ControlServer.java:60)
> 		org.apache.drill.exec.rpc.control.ControlServer.handle(ControlServer.java:38)
> 		org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:57)
> 		org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:194)
> 		org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:173)
> 		io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
> 		io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
> 		io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
> 		io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
> 		io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
> 		io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
> 		io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:161)
> 		io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
> 		io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
> 		io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
> 		io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
> 		io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
> 		io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
> 		io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
> 		io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
> 		io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> 		io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> 		io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> 		io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
> 		java.lang.Thread.run(Thread.java:744)
> {code}
> Currently it looks like restart/shutdown does not take executing queries into account or wait for them to complete. After DRILL-2547 we wait for a short period of time, but that's not enough for long-running queries. 
> We may need to make this a feature and clearly define shutdown/restart. 


