You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zeppelin.apache.org by "Frank Rosner (JIRA)" <ji...@apache.org> on 2017/03/14 13:41:41 UTC
[jira] [Created] (ZEPPELIN-2258) Spark driver not closed on
interpreter restart
Frank Rosner created ZEPPELIN-2258:
--------------------------------------
Summary: Spark driver not closed on interpreter restart
Key: ZEPPELIN-2258
URL: https://issues.apache.org/jira/browse/ZEPPELIN-2258
Project: Zeppelin
Issue Type: Bug
Components: Interpreters
Affects Versions: 0.7.0
Environment: Spark Standalone 2.0.2
Reporter: Frank Rosner
h4. Problem
Sometimes we experience an issue that when restarting the Spark interpreter, the driver is not shutdown correctly. Due to this, the applications stays submitted to the (standalone) cluster and still consumes resources.
Looking at the logs there is not much to see, at least no error message. How does Zeppelin terminate the Spark session when restarting the interpreter?
{noformat}
grep "SparkUI" zeppelin-interpreter-spark-zeppelin-zeppelin-02.log
INFO [2017-03-13 14:01:06,415] ({pool-2-thread-3} Logging.scala[logInfo]:54) - Successfully started service 'SparkUI' on port 42935.
INFO [2017-03-13 14:01:06,418] ({pool-2-thread-3} Logging.scala[logInfo]:54) - Bound SparkUI to 0.0.0.0, and started at http://xxxxxxxxxxx:42935
WARN [2017-03-13 17:27:58,704] ({pool-2-thread-2} Logging.scala[logWarning]:66) - Service 'SparkUI' could not bind on port 42935. Attempting port 42936.
INFO [2017-03-13 17:27:58,731] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Successfully started service 'SparkUI' on port 42936.
INFO [2017-03-13 17:27:58,737] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Bound SparkUI to 0.0.0.0, and started at http://xxxxxxxxxxx:42936
{noformat}
{noformat}
grep -B 300 "could not bind on port 42935" zeppelin-interpreter-spark-zeppelin-zeppelin-02.log
INFO [2017-03-13 17:26:19,998] ({pool-1-thread-1} RemoteInterpreterServer.java[cancel]:535) - cancel org.apache.zeppelin.spark.PySparkInterpreter 20170313-171538_890555184
INFO [2017-03-13 17:26:20,000] ({pool-1-thread-1} Logging.scala[logInfo]:54) - Asked to cancel job group zeppelin-2CB7T91AF-20170313-171538_890555184
INFO [2017-03-13 17:26:22,192] ({pool-1-thread-1} RemoteInterpreterServer.java[cancel]:535) - cancel org.apache.zeppelin.spark.PySparkInterpreter 20170313-171538_890555184
INFO [2017-03-13 17:26:22,193] ({pool-1-thread-1} Logging.scala[logInfo]:54) - Asked to cancel job group zeppelin-2CB7T91AF-20170313-171538_890555184
INFO [2017-03-13 17:26:22,397] ({pool-1-thread-1} RemoteInterpreterServer.java[cancel]:535) - cancel org.apache.zeppelin.spark.PySparkInterpreter 20170313-171538_890555184
INFO [2017-03-13 17:26:22,397] ({pool-1-thread-1} Logging.scala[logInfo]:54) - Asked to cancel job group zeppelin-2CB7T91AF-20170313-171538_890555184
INFO [2017-03-13 17:26:24,604] ({pool-1-thread-1} RemoteInterpreterServer.java[cancel]:535) - cancel org.apache.zeppelin.spark.PySparkInterpreter 20170313-171538_890555184
INFO [2017-03-13 17:26:24,604] ({pool-1-thread-1} Logging.scala[logInfo]:54) - Asked to cancel job group zeppelin-2CB7T91AF-20170313-171538_890555184
INFO [2017-03-13 17:27:44,299] ({pool-1-thread-1} RemoteInterpreterServer.java[cancel]:535) - cancel org.apache.zeppelin.spark.PySparkInterpreter 20170313-171538_890555184
INFO [2017-03-13 17:27:44,299] ({pool-1-thread-1} Logging.scala[logInfo]:54) - Asked to cancel job group zeppelin-2CB7T91AF-20170313-171538_890555184
INFO [2017-03-13 17:27:44,344] ({pool-1-thread-1} RemoteInterpreterServer.java[cancel]:535) - cancel org.apache.zeppelin.spark.PySparkInterpreter 20170313-171538_890555184
INFO [2017-03-13 17:27:44,344] ({pool-1-thread-1} Logging.scala[logInfo]:54) - Asked to cancel job group zeppelin-2CB7T91AF-20170313-171538_890555184
INFO [2017-03-13 17:27:44,347] ({pool-1-thread-1} RemoteInterpreterServer.java[cancel]:535) - cancel org.apache.zeppelin.spark.PySparkInterpreter 20170313-171538_890555184
INFO [2017-03-13 17:27:44,347] ({pool-1-thread-1} Logging.scala[logInfo]:54) - Asked to cancel job group zeppelin-2CB7T91AF-20170313-171538_890555184
INFO [2017-03-13 17:27:44,348] ({pool-1-thread-1} RemoteInterpreterServer.java[cancel]:535) - cancel org.apache.zeppelin.spark.PySparkInterpreter 20170313-171538_890555184
INFO [2017-03-13 17:27:44,348] ({pool-1-thread-1} Logging.scala[logInfo]:54) - Asked to cancel job group zeppelin-2CB7T91AF-20170313-171538_890555184
INFO [2017-03-13 17:27:52,196] ({Thread-0} RemoteInterpreterServer.java[run]:96) - Starting remote interpreter server on port 47609
INFO [2017-03-13 17:27:52,561] ({pool-1-thread-2} RemoteInterpreterServer.java[createInterpreter]:196) - Instantiate interpreter org.apache.zeppelin.spark.PySparkInterpreter
INFO [2017-03-13 17:27:52,617] ({pool-1-thread-2} RemoteInterpreterServer.java[createInterpreter]:196) - Instantiate interpreter org.apache.zeppelin.spark.SparkInterpreter
INFO [2017-03-13 17:27:52,623] ({pool-1-thread-2} RemoteInterpreterServer.java[createInterpreter]:196) - Instantiate interpreter org.apache.zeppelin.spark.SparkSqlInterpreter
INFO [2017-03-13 17:27:52,626] ({pool-1-thread-2} RemoteInterpreterServer.java[createInterpreter]:196) - Instantiate interpreter org.apache.zeppelin.spark.DepInterpreter
INFO [2017-03-13 17:27:52,673] ({pool-2-thread-2} SchedulerFactory.java[jobStarted]:131) - Job remoteInterpretJob_1489422472669 started by scheduler interpreter_1754241267
INFO [2017-03-13 17:27:52,682] ({pool-2-thread-2} PySparkInterpreter.java[createPythonScript]:106) - File /tmp/zeppelin_pyspark-142944822224657232.py created
INFO [2017-03-13 17:27:56,198] ({pool-2-thread-75} SchedulerFactory.java[jobFinished]:137) - Job remoteInterpretJob_1489422348563 finished by scheduler interpreter_999576780
INFO [2017-03-13 17:27:56,583] ({pool-2-thread-2} SparkInterpreter.java[createSparkSession]:350) - ------ Create new SparkContext spark://yyyyyyyyyyyyy:7077 -------
INFO [2017-03-13 17:27:56,638] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Running Spark version 2.0.2
WARN [2017-03-13 17:27:57,427] ({pool-2-thread-2} NativeCodeLoader.java[<clinit>]:62) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
INFO [2017-03-13 17:27:57,664] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Changing view acls to: zeppelin
INFO [2017-03-13 17:27:57,665] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Changing modify acls to: zeppelin
INFO [2017-03-13 17:27:57,665] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Changing view acls groups to:
INFO [2017-03-13 17:27:57,666] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Changing modify acls groups to:
INFO [2017-03-13 17:27:57,666] ({pool-2-thread-2} Logging.scala[logInfo]:54) - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(zeppelin); groups with view permissions: Set(); users with modify permissions: Set(zeppelin); groups with modify permissions: Set()
INFO [2017-03-13 17:27:58,148] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Successfully started service 'sparkDriver' on port 56861.
INFO [2017-03-13 17:27:58,179] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Registering MapOutputTracker
INFO [2017-03-13 17:27:58,212] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Registering BlockManagerMaster
INFO [2017-03-13 17:27:58,234] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Created local directory at /tmp/blockmgr-5dd56028-ee7c-4996-9011-245dcc7f3d6b
INFO [2017-03-13 17:27:58,254] ({pool-2-thread-2} Logging.scala[logInfo]:54) - MemoryStore started with capacity 4.1 GB
INFO [2017-03-13 17:27:58,324] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Registering OutputCommitCoordinator
INFO [2017-03-13 17:27:58,487] ({pool-2-thread-2} Log.java[initialized]:186) - Logging initialized @7223ms
INFO [2017-03-13 17:27:58,630] ({pool-2-thread-2} Server.java[doStart]:327) - jetty-9.2.z-SNAPSHOT
INFO [2017-03-13 17:27:58,656] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@4a482724{/jobs,null,AVAILABLE}
INFO [2017-03-13 17:27:58,657] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@3f531ec4{/jobs/json,null,AVAILABLE}
INFO [2017-03-13 17:27:58,657] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@3f300fc5{/jobs/job,null,AVAILABLE}
INFO [2017-03-13 17:27:58,658] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@b656e64{/jobs/job/json,null,AVAILABLE}
INFO [2017-03-13 17:27:58,659] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@9f092b{/stages,null,AVAILABLE}
INFO [2017-03-13 17:27:58,659] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@279d92c2{/stages/json,null,AVAILABLE}
INFO [2017-03-13 17:27:58,660] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@186cefd3{/stages/stage,null,AVAILABLE}
INFO [2017-03-13 17:27:58,661] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@38dab4f4{/stages/stage/json,null,AVAILABLE}
INFO [2017-03-13 17:27:58,661] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@40f42495{/stages/pool,null,AVAILABLE}
INFO [2017-03-13 17:27:58,662] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@bb4a5cb{/stages/pool/json,null,AVAILABLE}
INFO [2017-03-13 17:27:58,663] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@74d9c119{/storage,null,AVAILABLE}
INFO [2017-03-13 17:27:58,663] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@19c9a662{/storage/json,null,AVAILABLE}
INFO [2017-03-13 17:27:58,664] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@78f8f942{/storage/rdd,null,AVAILABLE}
INFO [2017-03-13 17:27:58,664] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@56cc816b{/storage/rdd/json,null,AVAILABLE}
INFO [2017-03-13 17:27:58,665] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@6ca753a2{/environment,null,AVAILABLE}
INFO [2017-03-13 17:27:58,666] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@380902e2{/environment/json,null,AVAILABLE}
INFO [2017-03-13 17:27:58,666] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@704ce48{/executors,null,AVAILABLE}
INFO [2017-03-13 17:27:58,667] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@35f1c01a{/executors/json,null,AVAILABLE}
INFO [2017-03-13 17:27:58,668] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@631daf45{/executors/threadDump,null,AVAILABLE}
INFO [2017-03-13 17:27:58,669] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@13f3afd6{/executors/threadDump/json,null,AVAILABLE}
INFO [2017-03-13 17:27:58,676] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@3224456e{/static,null,AVAILABLE}
INFO [2017-03-13 17:27:58,677] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@9eea220{/,null,AVAILABLE}
INFO [2017-03-13 17:27:58,679] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@70743df{/api,null,AVAILABLE}
INFO [2017-03-13 17:27:58,679] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@984c1f6{/stages/stage/kill,null,AVAILABLE}
WARN [2017-03-13 17:27:58,682] ({pool-2-thread-2} AbstractLifeCycle.java[setFailed]:212) - FAILED ServerConnector@3a4224e0{HTTP/1.1}{0.0.0.0:42935}: java.net.BindException: Address already in use
WARN [2017-03-13 17:27:58,686] ({pool-2-thread-2} AbstractLifeCycle.java[setFailed]:212) - FAILED org.spark_project.jetty.server.Server@26ba9b5f: java.net.BindException: Address already in use
INFO [2017-03-13 17:27:58,689] ({pool-2-thread-2} AbstractConnector.java[doStop]:306) - Stopped ServerConnector@3a4224e0{HTTP/1.1}{0.0.0.0:42935}
INFO [2017-03-13 17:27:58,691] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@984c1f6{/stages/stage/kill,null,UNAVAILABLE}
INFO [2017-03-13 17:27:58,692] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@70743df{/api,null,UNAVAILABLE}
INFO [2017-03-13 17:27:58,692] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@9eea220{/,null,UNAVAILABLE}
INFO [2017-03-13 17:27:58,692] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@3224456e{/static,null,UNAVAILABLE}
INFO [2017-03-13 17:27:58,692] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@13f3afd6{/executors/threadDump/json,null,UNAVAILABLE}
INFO [2017-03-13 17:27:58,693] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@631daf45{/executors/threadDump,null,UNAVAILABLE}
INFO [2017-03-13 17:27:58,693] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@35f1c01a{/executors/json,null,UNAVAILABLE}
INFO [2017-03-13 17:27:58,693] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@704ce48{/executors,null,UNAVAILABLE}
INFO [2017-03-13 17:27:58,693] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@380902e2{/environment/json,null,UNAVAILABLE}
INFO [2017-03-13 17:27:58,694] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@6ca753a2{/environment,null,UNAVAILABLE}
INFO [2017-03-13 17:27:58,694] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@56cc816b{/storage/rdd/json,null,UNAVAILABLE}
INFO [2017-03-13 17:27:58,694] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@78f8f942{/storage/rdd,null,UNAVAILABLE}
INFO [2017-03-13 17:27:58,694] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@19c9a662{/storage/json,null,UNAVAILABLE}
INFO [2017-03-13 17:27:58,695] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@74d9c119{/storage,null,UNAVAILABLE}
INFO [2017-03-13 17:27:58,695] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@bb4a5cb{/stages/pool/json,null,UNAVAILABLE}
INFO [2017-03-13 17:27:58,695] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@40f42495{/stages/pool,null,UNAVAILABLE}
INFO [2017-03-13 17:27:58,695] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@38dab4f4{/stages/stage/json,null,UNAVAILABLE}
INFO [2017-03-13 17:27:58,696] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@186cefd3{/stages/stage,null,UNAVAILABLE}
INFO [2017-03-13 17:27:58,696] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@279d92c2{/stages/json,null,UNAVAILABLE}
INFO [2017-03-13 17:27:58,696] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@9f092b{/stages,null,UNAVAILABLE}
INFO [2017-03-13 17:27:58,697] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@b656e64{/jobs/job/json,null,UNAVAILABLE}
INFO [2017-03-13 17:27:58,697] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@3f300fc5{/jobs/job,null,UNAVAILABLE}
INFO [2017-03-13 17:27:58,697] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@3f531ec4{/jobs/json,null,UNAVAILABLE}
INFO [2017-03-13 17:27:58,697] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@4a482724{/jobs,null,UNAVAILABLE}
WARN [2017-03-13 17:27:58,704] ({pool-2-thread-2} Logging.scala[logWarning]:66) - Service 'SparkUI' could not bind on port 42935. Attempting port 42936.
{noformat}
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)