Posted to user@drill.apache.org by Alan Höng <al...@gmail.com> on 2017/09/20 15:08:42 UTC

User client timeout with results > 2M rows

Hello,

I'm getting errors when trying to fetch results from Drill for queries
that return larger tables. Surprisingly, it works like a charm if the
returned table has fewer than 2M rows. It also seems like the query
itself is executed and finishes successfully.

I'm querying Parquet files with GZIP compression on S3. I'm running Drill
in distributed mode with ZooKeeper. I use version 1.9 from the container
available on Docker Hub ("harisekhon/apache-drill:1.9"). I'm using the
pydrill package, which uses the REST API to submit queries and gather
results.
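
For reference, the fetch is roughly the following (a minimal sketch; the
host, port and query here are placeholders, not my actual setup):

    from pydrill.client import PyDrill

    # Connect to a drillbit's web/REST port (8047 by default).
    drill = PyDrill(host='localhost', port=8047)

    if not drill.is_active():
        raise RuntimeError('Drillbit not reachable')

    # Placeholder query; the real one selects from Parquet files on S3.
    # Rows come back as dicts over the REST API.
    result = drill.query('SELECT * FROM s3.`bucket/path/to/table` LIMIT 10')

    for row in result:
        print(row)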

I get the following error message from the client:

TransportError(500, '{\n  "errorMessage" : "CONNECTION ERROR: Connection /
172.19.0.3:52382 <--> ef53daab0ef8/172.19.0.6:31010 (user client) closed
unexpectedly. Drillbit down?\\n\\n\\n[Error Id:
6a19835b-2325-431e-9bad-dde8f1d3c192 ]"\n}'

I would appreciate any help with this.

Best
Alan Höng

Re: User client timeout with results > 2M rows

Posted by Saurabh Mahapatra <sa...@gmail.com>.
This brings us back to the feedback that John Omernik has given time and
time again: we need to improve the error/warning messages. Wasn't this
discussed at a session on the user experience at the Drill hackathon this
week?

Best,
Saurabh

RE: User client timeout with results > 2M rows

Posted by Kunal Khatua <kk...@mapr.com>.
Glad you could resolve it, Alan!

The fact that bumping up the memory settings worked suggests you might have been hitting frequent garbage-collection pauses that slowed the system down to the point that the connection timed out.
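
One way to confirm that (a rough sketch, assuming the drillbits run on Java 8 as your stack trace suggests) is to enable GC logging for the drillbit JVM and look for long or back-to-back pauses around the time the connection gets dropped:

    # GC logging flags for the drillbit JVM (Java 8), appended to the JVM
    # options in conf/drill-env.sh; adjust the log path to your setup.
    -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps
    -Xloggc:/var/log/drill/drillbit_gc.log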

Could you share the configuration of the nodes and the memory settings that worked for you? That way, others can also benefit from your experience.



Re: User client timeout with results > 2M rows

Posted by Alan Höng <al...@gmail.com>.
It was resolved by increasing the heap memory and direct memory as
suggested in the thread. Still, the error message is quite confusing.
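
For anyone who runs into the same thing: the change is essentially a
matter of raising the limits in conf/drill-env.sh, roughly of this form
(example values only, not the exact numbers from my nodes):

    # conf/drill-env.sh (example values only)
    export DRILL_HEAP="8G"
    export DRILL_MAX_DIRECT_MEMORY="48G"

A drillbit restart is needed for the new limits to take effect.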

Thanks for your help!

Re: User client timeout with results > 2M rows

Posted by Alan Höng <al...@gmail.com>.
Thanks for the search; unfortunately, no result from there seemed to fit
my problem. I think what comes closest to my issue is the following
thread, which I found on my initial search:
https://lists.apache.org/list.html?user@drill.apache.org:gte=1d:user%20client%20closed%20unexpectedly.
I'm trying to adjust resources now, but I highly doubt that this is the
problem, as I'm running the query on quite a big machine (64 cores,
~500GB RAM).

Here is the stacktrace from the drillbit log:

2017-09-20 14:54:23,860 [263d7f11-78ce-df33-344e-d9a615530c26:frag:1:0]
INFO  o.a.d.e.w.f.FragmentStatusReporter -
263d7f11-78ce-df33-344e-d9a615530c26:1:0: State to report: FINISHED
2017-09-20 14:56:24,741 [UserServer-1] INFO
o.a.drill.exec.rpc.user.UserServer - RPC connection /172.19.0.6:31010 <--> /
172.19.0.3:52382 (user client) timed out.  Timeout was set to 30 seconds.
Closing connection.
2017-09-20 14:56:24,755 [UserServer-1] INFO
o.a.d.e.w.fragment.FragmentExecutor -
263d7f11-78ce-df33-344e-d9a615530c26:0:0: State change requested RUNNING
--> FAILED
2017-09-20 14:56:24,763 [UserServer-1] WARN
o.apache.drill.exec.rpc.RequestIdMap - Failure while attempting to fail rpc
response.
java.lang.IllegalArgumentException: Self-suppression not permitted
at java.lang.Throwable.addSuppressed(Throwable.java:1043) ~[na:1.8.0_131]
at
org.apache.drill.common.DeferredException.addException(DeferredException.java:88)
~[drill-common-1.9.0.jar:1.9.0]
at
org.apache.drill.common.DeferredException.addThrowable(DeferredException.java:97)
~[drill-common-1.9.0.jar:1.9.0]
at
org.apache.drill.exec.work.fragment.FragmentExecutor.fail(FragmentExecutor.java:407)
~[drill-java-exec-1.9.0.jar:1.9.0]
at
org.apache.drill.exec.work.fragment.FragmentExecutor.access$700(FragmentExecutor.java:55)
~[drill-java-exec-1.9.0.jar:1.9.0]
at
org.apache.drill.exec.work.fragment.FragmentExecutor$ExecutorStateImpl.fail(FragmentExecutor.java:421)
~[drill-java-exec-1.9.0.jar:1.9.0]
at org.apache.drill.exec.ops.FragmentContext.fail(FragmentContext.java:208)
~[drill-java-exec-1.9.0.jar:1.9.0]
at
org.apache.drill.exec.ops.FragmentContext$1.accept(FragmentContext.java:95)
~[drill-java-exec-1.9.0.jar:1.9.0]
at
org.apache.drill.exec.ops.FragmentContext$1.accept(FragmentContext.java:92)
~[drill-java-exec-1.9.0.jar:1.9.0]
at org.apache.drill.exec.ops.StatusHandler.failed(StatusHandler.java:42)
~[drill-java-exec-1.9.0.jar:1.9.0]
at
org.apache.drill.exec.rpc.RequestIdMap$RpcListener.setException(RequestIdMap.java:134)
~[drill-rpc-1.9.0.jar:1.9.0]
at
org.apache.drill.exec.rpc.RequestIdMap$SetExceptionProcedure.apply(RequestIdMap.java:74)
[drill-rpc-1.9.0.jar:1.9.0]
at
org.apache.drill.exec.rpc.RequestIdMap$SetExceptionProcedure.apply(RequestIdMap.java:64)
[drill-rpc-1.9.0.jar:1.9.0]
at
com.carrotsearch.hppc.IntObjectHashMap.forEach(IntObjectHashMap.java:692)
[hppc-0.7.1.jar:na]
at
org.apache.drill.exec.rpc.RequestIdMap.channelClosed(RequestIdMap.java:58)
[drill-rpc-1.9.0.jar:1.9.0]
at
org.apache.drill.exec.rpc.RemoteConnection.channelClosed(RemoteConnection.java:175)
[drill-rpc-1.9.0.jar:1.9.0]
at
org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:167)
[drill-rpc-1.9.0.jar:1.9.0]
at
org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:146)
[drill-rpc-1.9.0.jar:1.9.0]
at
io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
[netty-common-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603)
[netty-common-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563)
[netty-common-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:406)
[netty-common-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.AbstractChannel$CloseFuture.setClosed(AbstractChannel.java:943)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChannel.java:592)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:584)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.DefaultChannelPipeline$HeadContext.close(DefaultChannelPipeline.java:1099)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:615)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:600)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.ChannelOutboundHandlerAdapter.close(ChannelOutboundHandlerAdapter.java:71)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:615)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:600)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:466)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.handler.timeout.ReadTimeoutHandler.readTimedOut(ReadTimeoutHandler.java:187)
[netty-handler-4.0.27.Final.jar:4.0.27.Final]
at
org.apache.drill.exec.rpc.BasicServer$LogggingReadTimeoutHandler.readTimedOut(BasicServer.java:121)
[drill-rpc-1.9.0.jar:1.9.0]
at
io.netty.handler.timeout.ReadTimeoutHandler$ReadTimeoutTask.run(ReadTimeoutHandler.java:212)
[netty-handler-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
[netty-common-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120)
[netty-common-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
[netty-common-4.0.27.Final.jar:4.0.27.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
[netty-common-4.0.27.Final.jar:4.0.27.Final]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
Caused by: org.apache.drill.exec.rpc.ChannelClosedException: Channel closed
/172.19.0.6:31010 <--> /172.19.0.3:52382.
at
org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:166)
[drill-rpc-1.9.0.jar:1.9.0]
... 25 common frames omitted

RE: User client timeout with results > 2M rows

Posted by Kunal Khatua <kk...@mapr.com>.
The error reported to the client is usually trimmed. What is the stack trace in the drillbit logs? That will tell you exactly where the timeout occurred and make it easier to narrow down the cause.

I did a quick search for 'S3 timeout' ( https://lists.apache.org/list.html?user@drill.apache.org:gte=1d:%20S3%20timeout ). You could browse through the results to see if any of the suggestions there help unblock you.



Re: User client timeout with results > 2M rows

Posted by Alan Höng <al...@gmail.com>.
Yes, it takes about 2-3 minutes for the timeout to appear; the query itself
should finish within that time. The files are not that big. As for the
archives: I have looked, but I couldn't find anything relevant or helpful
for my situation so far.


Re: User client timeout with results > 2M rows

Posted by Kunal Khatua <kk...@mapr.com>.
Do you know how long it takes for this timeout to occur? There might be some tuning needed to increase a timeout. Also, I think this (S3 specifically) has been seen before, so you might find a solution in the mailing list archives. Did you try looking there?


