Posted to issues@livy.apache.org by "Jialiang Tan (JIRA)" <ji...@apache.org> on 2017/12/06 04:55:00 UTC
[jira] [Created] (LIVY-426) Livy stops session quietly without error message
Jialiang Tan created LIVY-426:
---------------------------------
Summary: Livy stops session quietly without error message
Key: LIVY-426
URL: https://issues.apache.org/jira/browse/LIVY-426
Project: Livy
Issue Type: Bug
Reporter: Jialiang Tan
Livy stops interactive sessions quietly, without any error message. The YARN log only shows RECEIVED SIGNAL TERM, which I think indicates that Livy wants to terminate the driver in YARN. The Livy log only gives "Client RPC channel closed unexpectedly." No other information is found.
Spark Version: 2.0.1
Scala Version: 2.11.8
Livy Version: 0.4.0-incubating
Zeppelin Version: 0.7.2
Hadoop Version: 2.7.3
YARN log:
{code:java}
17/12/06 00:39:23 INFO storage.BlockManagerMaster: Removed 3 successfully in removeExecutor
17/12/06 00:39:23 INFO spark.ExecutorAllocationManager: Existing executor 3 has been removed (new total is 0)
17/12/06 00:46:07 ERROR yarn.ApplicationMaster: RECEIVED SIGNAL TERM
17/12/06 00:46:07 INFO spark.SparkContext: Invoking stop() from shutdown hook
17/12/06 00:46:07 INFO server.ServerConnector: Stopped ServerConnector@690ca6ae{HTTP/1.1}{0.0.0.0:0}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5959d99{/stages/stage/kill,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5daa4dcf{/api,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1f30b9f2{/,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1a36836{/static,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@56c1c4df{/executors/threadDump/json,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5ea39f64{/executors/threadDump,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5be8acbc{/executors/json,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@17a813b2{/executors,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@48780245{/environment/json,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@66bc6e53{/environment,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7b06e14a{/storage/rdd/json,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@58f0ffff{/storage/rdd,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4cfc52cb{/storage/json,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@731dd75e{/storage,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1716986b{/stages/pool/json,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@54380417{/stages/pool,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5b75d33{/stages/stage/json,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@24b81ae5{/stages/stage,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@68356b10{/stages/json,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4422af95{/stages,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@39431dbc{/jobs/job/json,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@162b5d5e{/jobs/job,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@a84d7e6{/jobs/json,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@76d179b8{/jobs,null,UNAVAILABLE}
17/12/06 00:46:07 INFO ui.SparkUI: Stopped Spark web UI at http://10.28.24.141:33403
{code}
And Livy log:
{code:java}
17/12/06 00:46:07 WARN rsc.RSCClient: Client RPC channel closed unexpectedly.
17/12/06 00:46:07 WARN rsc.RSCClient: Error stopping RPC.
io.netty.util.concurrent.BlockingOperationException: DefaultChannelPromise@74e40000(uncancellable)
at io.netty.util.concurrent.DefaultPromise.checkDeadLock(DefaultPromise.java:390)
at io.netty.channel.DefaultChannelPromise.checkDeadLock(DefaultChannelPromise.java:157)
at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:251)
at io.netty.channel.DefaultChannelPromise.await(DefaultChannelPromise.java:129)
at io.netty.channel.DefaultChannelPromise.await(DefaultChannelPromise.java:28)
at io.netty.util.concurrent.DefaultPromise.sync(DefaultPromise.java:218)
at io.netty.channel.DefaultChannelPromise.sync(DefaultChannelPromise.java:117)
at io.netty.channel.DefaultChannelPromise.sync(DefaultChannelPromise.java:28)
at org.apache.livy.rsc.rpc.Rpc.close(Rpc.java:307)
at org.apache.livy.rsc.RSCClient.stop(RSCClient.java:232)
at org.apache.livy.rsc.RSCClient$2$1.onSuccess(RSCClient.java:129)
at org.apache.livy.rsc.RSCClient$2$1.onSuccess(RSCClient.java:123)
at org.apache.livy.rsc.Utils$2.operationComplete(Utils.java:108)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567)
at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:406)
at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)
at io.netty.channel.AbstractChannel$CloseFuture.setClosed(AbstractChannel.java:956)
at io.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChannel.java:608)
at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:586)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.closeOnRead(AbstractNioByteChannel.java:71)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:158)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Unknown Source)
{code}
There is not enough debug information for this behavior, and it only happens occasionally. Sometimes the session is stopped immediately upon startup; other times it can run properly for hours before hitting this. I have changed some configurations in Livy, and I am not sure whether that is related:
livy.conf:
livy.spark.master = yarn
livy.spark.deploy-mode = cluster
livy.server.session.timeout = 24h
livy.impersonation.enabled = true
livy.file.local-dir-whitelist = /usr/lib/sparkExternalLibs
livy.repl.enable-hive-context = true
livy.server.yarn.app-lookup-timeout = 600s
livy-client.conf:
livy.rsc.server.connect.timeout = 360s
livy.rsc.client.connect.timeout = 120s
(I initially suspected it might be due to some RPC timeout issue and therefore increased the timeouts, but nothing changed.)
Has anyone encountered similar issues, and is there any workaround?
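As a diagnostic sketch, one workaround while hunting this down is to poll Livy's REST API (GET /sessions/{id}) periodically and record the session state, so there is at least a trail of when the session transitions to dead. The helper below only parses the JSON body such an endpoint returns; the server URL is a placeholder assumption, and the endpoint and field names follow the Livy REST API documentation:

```python
# Sketch: extract the session state from the JSON body returned by
# Livy's GET /sessions/{id} endpoint, so it can be logged on a timer
# before a silent shutdown. The base URL below is a hypothetical
# placeholder, not taken from this report.
import json

LIVY_URL = "http://livy-host:8998"  # assumption: adjust to your deployment


def session_state(payload: str) -> str:
    """Return the 'state' field from a /sessions/{id} response body."""
    return json.loads(payload).get("state", "unknown")


# Abridged example of a GET /sessions/{id} response body:
sample = '{"id": 0, "state": "idle", "kind": "spark"}'
print(session_state(sample))  # idle
```

Pairing this with GET /sessions/{id}/log just before the state flips to "dead" may capture more context than the driver-side YARN log alone.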
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)