Posted to issues@livy.apache.org by "Jialiang Tan (JIRA)" <ji...@apache.org> on 2017/12/06 06:33:00 UTC

[jira] [Commented] (LIVY-426) Livy stops session quietly without error message

    [ https://issues.apache.org/jira/browse/LIVY-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16279720#comment-16279720 ] 

Jialiang Tan commented on LIVY-426:
-----------------------------------

Also, I found that when Livy has this problem for one user, it tends to kill other users' sessions as well, and again there is no error message.

> Livy stops session quietly without error message
> ------------------------------------------------
>
>                 Key: LIVY-426
>                 URL: https://issues.apache.org/jira/browse/LIVY-426
>             Project: Livy
>          Issue Type: Bug
>            Reporter: Jialiang Tan
>
> Livy stops interactive sessions quietly without any error messages. The YARN log only shows RECEIVED SIGNAL TERM, which I think indicates that Livy wants to terminate the driver in YARN. The Livy log only gives "Client RPC channel closed unexpectedly". No other information is available.
> Spark Version: 2.0.1
> Scala Version: 2.11.8
> Livy Version: 0.4.0-incubating
> Zeppelin Version: 0.7.2
> Hadoop Version: 2.7.3
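> For reference, the session's state can be watched through Livy's REST API (GET /sessions/{id}); when this happens it simply flips to "dead" with no error detail. A minimal polling sketch, assuming a placeholder host and session id:
> {code:java}
> import java.io.BufferedReader;
> import java.io.InputStreamReader;
> import java.net.HttpURLConnection;
> import java.net.URL;
>
> public class LivySessionWatcher {
>     public static void main(String[] args) throws Exception {
>         String livy = "http://livy-host:8998"; // placeholder host/port
>         int sessionId = 0;                      // placeholder session id
>         while (true) {
>             HttpURLConnection conn = (HttpURLConnection)
>                     new URL(livy + "/sessions/" + sessionId).openConnection();
>             conn.setRequestMethod("GET");
>             StringBuilder body = new StringBuilder();
>             try (BufferedReader in = new BufferedReader(
>                     new InputStreamReader(conn.getInputStream()))) {
>                 String line;
>                 while ((line = in.readLine()) != null) {
>                     body.append(line);
>                 }
>             }
>             System.out.println(System.currentTimeMillis() + " " + body);
>             // The response JSON carries a "state" field; the session goes
>             // straight to "dead" when this problem hits.
>             if (body.toString().contains("\"state\":\"dead\"")) {
>                 break;
>             }
>             Thread.sleep(30_000);
>         }
>     }
> }
> {code}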
> YARN log:
> {code:java}
> 17/12/06 00:39:23 INFO storage.BlockManagerMaster: Removed 3 successfully in removeExecutor
> 17/12/06 00:39:23 INFO spark.ExecutorAllocationManager: Existing executor 3 has been removed (new total is 0)
> 17/12/06 00:46:07 ERROR yarn.ApplicationMaster: RECEIVED SIGNAL TERM
> 17/12/06 00:46:07 INFO spark.SparkContext: Invoking stop() from shutdown hook
> 17/12/06 00:46:07 INFO server.ServerConnector: Stopped ServerConnector@690ca6ae{HTTP/1.1}{0.0.0.0:0}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5959d99{/stages/stage/kill,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5daa4dcf{/api,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1f30b9f2{/,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1a36836{/static,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@56c1c4df{/executors/threadDump/json,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5ea39f64{/executors/threadDump,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5be8acbc{/executors/json,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@17a813b2{/executors,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@48780245{/environment/json,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@66bc6e53{/environment,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7b06e14a{/storage/rdd/json,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@58f0ffff{/storage/rdd,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4cfc52cb{/storage/json,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@731dd75e{/storage,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1716986b{/stages/pool/json,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@54380417{/stages/pool,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5b75d33{/stages/stage/json,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@24b81ae5{/stages/stage,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@68356b10{/stages/json,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4422af95{/stages,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@39431dbc{/jobs/job/json,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@162b5d5e{/jobs/job,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@a84d7e6{/jobs/json,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@76d179b8{/jobs,null,UNAVAILABLE}
> 17/12/06 00:46:07 INFO ui.SparkUI: Stopped Spark web UI at http://10.28.24.141:33403
> {code}
> And Livy log:
> {code:java}
> 17/12/06 00:46:07 WARN rsc.RSCClient: Client RPC channel closed unexpectedly.
> 17/12/06 00:46:07 WARN rsc.RSCClient: Error stopping RPC.
> io.netty.util.concurrent.BlockingOperationException: DefaultChannelPromise@74e40000(uncancellable)
>         at io.netty.util.concurrent.DefaultPromise.checkDeadLock(DefaultPromise.java:390)
>         at io.netty.channel.DefaultChannelPromise.checkDeadLock(DefaultChannelPromise.java:157)
>         at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:251)
>         at io.netty.channel.DefaultChannelPromise.await(DefaultChannelPromise.java:129)
>         at io.netty.channel.DefaultChannelPromise.await(DefaultChannelPromise.java:28)
>         at io.netty.util.concurrent.DefaultPromise.sync(DefaultPromise.java:218)
>         at io.netty.channel.DefaultChannelPromise.sync(DefaultChannelPromise.java:117)
>         at io.netty.channel.DefaultChannelPromise.sync(DefaultChannelPromise.java:28)
>         at org.apache.livy.rsc.rpc.Rpc.close(Rpc.java:307)
>         at org.apache.livy.rsc.RSCClient.stop(RSCClient.java:232)
>         at org.apache.livy.rsc.RSCClient$2$1.onSuccess(RSCClient.java:129)
>         at org.apache.livy.rsc.RSCClient$2$1.onSuccess(RSCClient.java:123)
>         at org.apache.livy.rsc.Utils$2.operationComplete(Utils.java:108)
>         at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
>         at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567)
>         at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:406)
>         at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)
>         at io.netty.channel.AbstractChannel$CloseFuture.setClosed(AbstractChannel.java:956)
>         at io.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChannel.java:608)
>         at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:586)
>         at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.closeOnRead(AbstractNioByteChannel.java:71)
>         at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:158)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>         at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>         at java.lang.Thread.run(Unknown Source)
> {code}
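> The BlockingOperationException looks like a secondary symptom rather than the cause: judging from the trace, Rpc.close() calls sync() on the channel's close future from a listener that is running on the Netty event loop thread, and Netty's deadlock check rejects blocking calls made on that thread. A minimal sketch of that check firing (plain Netty, not Livy code):
> {code:java}
> import io.netty.channel.nio.NioEventLoopGroup;
> import io.netty.util.concurrent.EventExecutor;
> import io.netty.util.concurrent.Promise;
>
> public class DeadlockCheckDemo {
>     public static void main(String[] args) throws Exception {
>         NioEventLoopGroup group = new NioEventLoopGroup(1);
>         EventExecutor loop = group.next();
>         Promise<Void> promise = loop.newPromise();
>         // sync()/await() on a promise bound to an event loop, invoked from
>         // that same event loop thread, triggers Netty's deadlock check and
>         // throws BlockingOperationException instead of blocking forever.
>         loop.submit(() -> {
>             try {
>                 promise.sync();
>             } catch (Exception e) {
>                 e.printStackTrace(); // BlockingOperationException
>             }
>         }).sync();
>         group.shutdownGracefully();
>     }
> }
> {code}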
> There is not enough debug information for this behavior, and it only happens occasionally. Sometimes the session gets stopped immediately upon startup; other times the session runs properly for hours before it hits this. I have changed some configurations in Livy, but I am not sure whether that is related:
> livy.conf:
> livy.spark.master = yarn
> livy.spark.deploy-mode = cluster
> livy.server.session.timeout = 24h
> livy.impersonation.enabled = true
> livy.file.local-dir-whitelist = /usr/lib/sparkExternalLibs
> livy.repl.enable-hive-context = true
> livy.server.yarn.app-lookup-timeout = 600s
> livy-client.conf:
> livy.rsc.server.connect.timeout = 360s
> livy.rsc.client.connect.timeout = 120s
> (I initially suspected this might be due to some RPC timeout issue and hence increased the timeouts, but nothing changed.)
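> Since the logs above contain so little, the per-session log that Livy buffers can also be pulled right after the kill through GET /sessions/{id}/log. A minimal sketch, again with a placeholder host and session id:
> {code:java}
> import java.io.BufferedReader;
> import java.io.InputStreamReader;
> import java.net.HttpURLConnection;
> import java.net.URL;
>
> public class LivySessionLogDump {
>     public static void main(String[] args) throws Exception {
>         String livy = "http://livy-host:8998"; // placeholder host/port
>         int sessionId = 0;                      // placeholder session id
>         // "from" and "size" page through the buffered log lines.
>         HttpURLConnection conn = (HttpURLConnection) new URL(
>                 livy + "/sessions/" + sessionId + "/log?from=0&size=1000").openConnection();
>         conn.setRequestMethod("GET");
>         try (BufferedReader in = new BufferedReader(
>                 new InputStreamReader(conn.getInputStream()))) {
>             String line;
>             while ((line = in.readLine()) != null) {
>                 System.out.println(line);
>             }
>         }
>     }
> }
> {code}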
> Has anyone encountered similar issues, and is there any workaround?


