You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@livy.apache.org by "Jialiang Tan (JIRA)" <ji...@apache.org> on 2017/12/06 04:55:00 UTC

[jira] [Created] (LIVY-426) Livy stops session quietly without error message

Jialiang Tan created LIVY-426:
---------------------------------

             Summary: Livy stops session quietly without error message
                 Key: LIVY-426
                 URL: https://issues.apache.org/jira/browse/LIVY-426
             Project: Livy
          Issue Type: Bug
            Reporter: Jialiang Tan


Livy stops interactive sessions quietly without any error messages. In YARN log it only shows RECEIVED SIGNAL TERM which I think indicates Livy wants to terminate the driver in yarn. In Livy log it only gives Client RPC channel closed unexpectedly. No other information found.
Spark Version: 2.0.1
Scala Version: 2.11.8
Livy Version: 0.4.0-incubating
Zeppelin Version: 0.7.2
Hadoop Version: 2.7.3

YARN log:
{code:java}
17/12/06 00:39:23 INFO storage.BlockManagerMaster: Removed 3 successfully in removeExecutor
17/12/06 00:39:23 INFO spark.ExecutorAllocationManager: Existing executor 3 has been removed (new total is 0)
17/12/06 00:46:07 ERROR yarn.ApplicationMaster: RECEIVED SIGNAL TERM
17/12/06 00:46:07 INFO spark.SparkContext: Invoking stop() from shutdown hook
17/12/06 00:46:07 INFO server.ServerConnector: Stopped ServerConnector@690ca6ae{HTTP/1.1}{0.0.0.0:0}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5959d99{/stages/stage/kill,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5daa4dcf{/api,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1f30b9f2{/,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1a36836{/static,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@56c1c4df{/executors/threadDump/json,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5ea39f64{/executors/threadDump,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5be8acbc{/executors/json,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@17a813b2{/executors,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@48780245{/environment/json,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@66bc6e53{/environment,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7b06e14a{/storage/rdd/json,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@58f0ffff{/storage/rdd,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4cfc52cb{/storage/json,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@731dd75e{/storage,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1716986b{/stages/pool/json,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@54380417{/stages/pool,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5b75d33{/stages/stage/json,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@24b81ae5{/stages/stage,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@68356b10{/stages/json,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4422af95{/stages,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@39431dbc{/jobs/job/json,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@162b5d5e{/jobs/job,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@a84d7e6{/jobs/json,null,UNAVAILABLE}
17/12/06 00:46:07 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@76d179b8{/jobs,null,UNAVAILABLE}
17/12/06 00:46:07 INFO ui.SparkUI: Stopped Spark web UI at http://10.28.24.141:33403
{code}

And Livy log:
{code:java}
17/12/06 00:46:07 WARN rsc.RSCClient: Client RPC channel closed unexpectedly.
17/12/06 00:46:07 WARN rsc.RSCClient: Error stopping RPC.
io.netty.util.concurrent.BlockingOperationException: DefaultChannelPromise@74e40000(uncancellable)
        at io.netty.util.concurrent.DefaultPromise.checkDeadLock(DefaultPromise.java:390)
        at io.netty.channel.DefaultChannelPromise.checkDeadLock(DefaultChannelPromise.java:157)
        at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:251)
        at io.netty.channel.DefaultChannelPromise.await(DefaultChannelPromise.java:129)
        at io.netty.channel.DefaultChannelPromise.await(DefaultChannelPromise.java:28)
        at io.netty.util.concurrent.DefaultPromise.sync(DefaultPromise.java:218)
        at io.netty.channel.DefaultChannelPromise.sync(DefaultChannelPromise.java:117)
        at io.netty.channel.DefaultChannelPromise.sync(DefaultChannelPromise.java:28)
        at org.apache.livy.rsc.rpc.Rpc.close(Rpc.java:307)
        at org.apache.livy.rsc.RSCClient.stop(RSCClient.java:232)
        at org.apache.livy.rsc.RSCClient$2$1.onSuccess(RSCClient.java:129)
        at org.apache.livy.rsc.RSCClient$2$1.onSuccess(RSCClient.java:123)
        at org.apache.livy.rsc.Utils$2.operationComplete(Utils.java:108)
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
        at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567)
        at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:406)
        at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)
        at io.netty.channel.AbstractChannel$CloseFuture.setClosed(AbstractChannel.java:956)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChannel.java:608)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:586)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.closeOnRead(AbstractNioByteChannel.java:71)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:158)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
        at java.lang.Thread.run(Unknown Source)
{code}

There is not enough debug information for this behavior and the behavior happens occasionally. Sometimes the session got stopped immediately upon startup. Other times the session can run properly for hours before it hit this. I have changed some configurations in Livy, not sure if that is related:
livy.conf:
livy.spark.master = yarn
livy.spark.deploy-mode = cluster
livy.server.session.timeout = 24h
livy.impersonation.enabled = true
livy.file.local-dir-whitelist = /usr/lib/sparkExternalLibs
livy.repl.enable-hive-context = true
livy.server.yarn.app-lookup-timeout = 600s

livy-client.conf:
livy.rsc.server.connect.timeout = 360s
livy.rsc.client.connect.timeout = 120s
(I initially suspect it might be due to some rpc timeout issue and hence increased the timeout, but nothing changed)

Not sure if anyone encountered similar issues and any workaround?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)