You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zeppelin.apache.org by "Frank Rosner (JIRA)" <ji...@apache.org> on 2017/03/14 13:41:41 UTC

[jira] [Created] (ZEPPELIN-2258) Spark driver not closed on interpreter restart

Frank Rosner created ZEPPELIN-2258:
--------------------------------------

             Summary: Spark driver not closed on interpreter restart
                 Key: ZEPPELIN-2258
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-2258
             Project: Zeppelin
          Issue Type: Bug
          Components: Interpreters
    Affects Versions: 0.7.0
         Environment: Spark Standalone 2.0.2
            Reporter: Frank Rosner


h4. Problem

Sometimes we experience an issue that when restarting the Spark interpreter, the driver is not shutdown correctly. Due to this, the applications stays submitted to the (standalone) cluster and still consumes resources.

Looking at the logs there is not much to see, at least no error message. How does Zeppelin terminate the Spark session when restarting the interpreter?

{noformat}
grep "SparkUI" zeppelin-interpreter-spark-zeppelin-zeppelin-02.log

 INFO [2017-03-13 14:01:06,415] ({pool-2-thread-3} Logging.scala[logInfo]:54) - Successfully started service 'SparkUI' on port 42935.
 INFO [2017-03-13 14:01:06,418] ({pool-2-thread-3} Logging.scala[logInfo]:54) - Bound SparkUI to 0.0.0.0, and started at http://xxxxxxxxxxx:42935
 WARN [2017-03-13 17:27:58,704] ({pool-2-thread-2} Logging.scala[logWarning]:66) - Service 'SparkUI' could not bind on port 42935. Attempting port 42936.
 INFO [2017-03-13 17:27:58,731] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Successfully started service 'SparkUI' on port 42936.
 INFO [2017-03-13 17:27:58,737] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Bound SparkUI to 0.0.0.0, and started at http://xxxxxxxxxxx:42936
{noformat}

{noformat}
grep -B 300 "could not bind on port 42935" zeppelin-interpreter-spark-zeppelin-zeppelin-02.log

 INFO [2017-03-13 17:26:19,998] ({pool-1-thread-1} RemoteInterpreterServer.java[cancel]:535) - cancel org.apache.zeppelin.spark.PySparkInterpreter 20170313-171538_890555184
 INFO [2017-03-13 17:26:20,000] ({pool-1-thread-1} Logging.scala[logInfo]:54) - Asked to cancel job group zeppelin-2CB7T91AF-20170313-171538_890555184
 INFO [2017-03-13 17:26:22,192] ({pool-1-thread-1} RemoteInterpreterServer.java[cancel]:535) - cancel org.apache.zeppelin.spark.PySparkInterpreter 20170313-171538_890555184
 INFO [2017-03-13 17:26:22,193] ({pool-1-thread-1} Logging.scala[logInfo]:54) - Asked to cancel job group zeppelin-2CB7T91AF-20170313-171538_890555184
 INFO [2017-03-13 17:26:22,397] ({pool-1-thread-1} RemoteInterpreterServer.java[cancel]:535) - cancel org.apache.zeppelin.spark.PySparkInterpreter 20170313-171538_890555184
 INFO [2017-03-13 17:26:22,397] ({pool-1-thread-1} Logging.scala[logInfo]:54) - Asked to cancel job group zeppelin-2CB7T91AF-20170313-171538_890555184
 INFO [2017-03-13 17:26:24,604] ({pool-1-thread-1} RemoteInterpreterServer.java[cancel]:535) - cancel org.apache.zeppelin.spark.PySparkInterpreter 20170313-171538_890555184
 INFO [2017-03-13 17:26:24,604] ({pool-1-thread-1} Logging.scala[logInfo]:54) - Asked to cancel job group zeppelin-2CB7T91AF-20170313-171538_890555184
 INFO [2017-03-13 17:27:44,299] ({pool-1-thread-1} RemoteInterpreterServer.java[cancel]:535) - cancel org.apache.zeppelin.spark.PySparkInterpreter 20170313-171538_890555184
 INFO [2017-03-13 17:27:44,299] ({pool-1-thread-1} Logging.scala[logInfo]:54) - Asked to cancel job group zeppelin-2CB7T91AF-20170313-171538_890555184
 INFO [2017-03-13 17:27:44,344] ({pool-1-thread-1} RemoteInterpreterServer.java[cancel]:535) - cancel org.apache.zeppelin.spark.PySparkInterpreter 20170313-171538_890555184
 INFO [2017-03-13 17:27:44,344] ({pool-1-thread-1} Logging.scala[logInfo]:54) - Asked to cancel job group zeppelin-2CB7T91AF-20170313-171538_890555184
 INFO [2017-03-13 17:27:44,347] ({pool-1-thread-1} RemoteInterpreterServer.java[cancel]:535) - cancel org.apache.zeppelin.spark.PySparkInterpreter 20170313-171538_890555184
 INFO [2017-03-13 17:27:44,347] ({pool-1-thread-1} Logging.scala[logInfo]:54) - Asked to cancel job group zeppelin-2CB7T91AF-20170313-171538_890555184
 INFO [2017-03-13 17:27:44,348] ({pool-1-thread-1} RemoteInterpreterServer.java[cancel]:535) - cancel org.apache.zeppelin.spark.PySparkInterpreter 20170313-171538_890555184
 INFO [2017-03-13 17:27:44,348] ({pool-1-thread-1} Logging.scala[logInfo]:54) - Asked to cancel job group zeppelin-2CB7T91AF-20170313-171538_890555184
 INFO [2017-03-13 17:27:52,196] ({Thread-0} RemoteInterpreterServer.java[run]:96) - Starting remote interpreter server on port 47609
 INFO [2017-03-13 17:27:52,561] ({pool-1-thread-2} RemoteInterpreterServer.java[createInterpreter]:196) - Instantiate interpreter org.apache.zeppelin.spark.PySparkInterpreter
 INFO [2017-03-13 17:27:52,617] ({pool-1-thread-2} RemoteInterpreterServer.java[createInterpreter]:196) - Instantiate interpreter org.apache.zeppelin.spark.SparkInterpreter
 INFO [2017-03-13 17:27:52,623] ({pool-1-thread-2} RemoteInterpreterServer.java[createInterpreter]:196) - Instantiate interpreter org.apache.zeppelin.spark.SparkSqlInterpreter
 INFO [2017-03-13 17:27:52,626] ({pool-1-thread-2} RemoteInterpreterServer.java[createInterpreter]:196) - Instantiate interpreter org.apache.zeppelin.spark.DepInterpreter
 INFO [2017-03-13 17:27:52,673] ({pool-2-thread-2} SchedulerFactory.java[jobStarted]:131) - Job remoteInterpretJob_1489422472669 started by scheduler interpreter_1754241267
 INFO [2017-03-13 17:27:52,682] ({pool-2-thread-2} PySparkInterpreter.java[createPythonScript]:106) - File /tmp/zeppelin_pyspark-142944822224657232.py created
 INFO [2017-03-13 17:27:56,198] ({pool-2-thread-75} SchedulerFactory.java[jobFinished]:137) - Job remoteInterpretJob_1489422348563 finished by scheduler interpreter_999576780
 INFO [2017-03-13 17:27:56,583] ({pool-2-thread-2} SparkInterpreter.java[createSparkSession]:350) - ------ Create new SparkContext spark://yyyyyyyyyyyyy:7077 -------
 INFO [2017-03-13 17:27:56,638] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Running Spark version 2.0.2
 WARN [2017-03-13 17:27:57,427] ({pool-2-thread-2} NativeCodeLoader.java[<clinit>]:62) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
 INFO [2017-03-13 17:27:57,664] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Changing view acls to: zeppelin
 INFO [2017-03-13 17:27:57,665] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Changing modify acls to: zeppelin
 INFO [2017-03-13 17:27:57,665] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Changing view acls groups to: 
 INFO [2017-03-13 17:27:57,666] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Changing modify acls groups to: 
 INFO [2017-03-13 17:27:57,666] ({pool-2-thread-2} Logging.scala[logInfo]:54) - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(zeppelin); groups with view permissions: Set(); users  with modify permissions: Set(zeppelin); groups with modify permissions: Set()
 INFO [2017-03-13 17:27:58,148] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Successfully started service 'sparkDriver' on port 56861.
 INFO [2017-03-13 17:27:58,179] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Registering MapOutputTracker
 INFO [2017-03-13 17:27:58,212] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Registering BlockManagerMaster
 INFO [2017-03-13 17:27:58,234] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Created local directory at /tmp/blockmgr-5dd56028-ee7c-4996-9011-245dcc7f3d6b
 INFO [2017-03-13 17:27:58,254] ({pool-2-thread-2} Logging.scala[logInfo]:54) - MemoryStore started with capacity 4.1 GB
 INFO [2017-03-13 17:27:58,324] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Registering OutputCommitCoordinator
 INFO [2017-03-13 17:27:58,487] ({pool-2-thread-2} Log.java[initialized]:186) - Logging initialized @7223ms
 INFO [2017-03-13 17:27:58,630] ({pool-2-thread-2} Server.java[doStart]:327) - jetty-9.2.z-SNAPSHOT
 INFO [2017-03-13 17:27:58,656] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@4a482724{/jobs,null,AVAILABLE}
 INFO [2017-03-13 17:27:58,657] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@3f531ec4{/jobs/json,null,AVAILABLE}
 INFO [2017-03-13 17:27:58,657] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@3f300fc5{/jobs/job,null,AVAILABLE}
 INFO [2017-03-13 17:27:58,658] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@b656e64{/jobs/job/json,null,AVAILABLE}
 INFO [2017-03-13 17:27:58,659] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@9f092b{/stages,null,AVAILABLE}
 INFO [2017-03-13 17:27:58,659] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@279d92c2{/stages/json,null,AVAILABLE}
 INFO [2017-03-13 17:27:58,660] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@186cefd3{/stages/stage,null,AVAILABLE}
 INFO [2017-03-13 17:27:58,661] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@38dab4f4{/stages/stage/json,null,AVAILABLE}
 INFO [2017-03-13 17:27:58,661] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@40f42495{/stages/pool,null,AVAILABLE}
 INFO [2017-03-13 17:27:58,662] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@bb4a5cb{/stages/pool/json,null,AVAILABLE}
 INFO [2017-03-13 17:27:58,663] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@74d9c119{/storage,null,AVAILABLE}
 INFO [2017-03-13 17:27:58,663] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@19c9a662{/storage/json,null,AVAILABLE}
 INFO [2017-03-13 17:27:58,664] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@78f8f942{/storage/rdd,null,AVAILABLE}
 INFO [2017-03-13 17:27:58,664] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@56cc816b{/storage/rdd/json,null,AVAILABLE}
 INFO [2017-03-13 17:27:58,665] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@6ca753a2{/environment,null,AVAILABLE}
 INFO [2017-03-13 17:27:58,666] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@380902e2{/environment/json,null,AVAILABLE}
 INFO [2017-03-13 17:27:58,666] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@704ce48{/executors,null,AVAILABLE}
 INFO [2017-03-13 17:27:58,667] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@35f1c01a{/executors/json,null,AVAILABLE}
 INFO [2017-03-13 17:27:58,668] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@631daf45{/executors/threadDump,null,AVAILABLE}
 INFO [2017-03-13 17:27:58,669] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@13f3afd6{/executors/threadDump/json,null,AVAILABLE}
 INFO [2017-03-13 17:27:58,676] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@3224456e{/static,null,AVAILABLE}
 INFO [2017-03-13 17:27:58,677] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@9eea220{/,null,AVAILABLE}
 INFO [2017-03-13 17:27:58,679] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@70743df{/api,null,AVAILABLE}
 INFO [2017-03-13 17:27:58,679] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@984c1f6{/stages/stage/kill,null,AVAILABLE}
 WARN [2017-03-13 17:27:58,682] ({pool-2-thread-2} AbstractLifeCycle.java[setFailed]:212) - FAILED ServerConnector@3a4224e0{HTTP/1.1}{0.0.0.0:42935}: java.net.BindException: Address already in use
 WARN [2017-03-13 17:27:58,686] ({pool-2-thread-2} AbstractLifeCycle.java[setFailed]:212) - FAILED org.spark_project.jetty.server.Server@26ba9b5f: java.net.BindException: Address already in use
 INFO [2017-03-13 17:27:58,689] ({pool-2-thread-2} AbstractConnector.java[doStop]:306) - Stopped ServerConnector@3a4224e0{HTTP/1.1}{0.0.0.0:42935}
 INFO [2017-03-13 17:27:58,691] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@984c1f6{/stages/stage/kill,null,UNAVAILABLE}
 INFO [2017-03-13 17:27:58,692] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@70743df{/api,null,UNAVAILABLE}
 INFO [2017-03-13 17:27:58,692] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@9eea220{/,null,UNAVAILABLE}
 INFO [2017-03-13 17:27:58,692] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@3224456e{/static,null,UNAVAILABLE}
 INFO [2017-03-13 17:27:58,692] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@13f3afd6{/executors/threadDump/json,null,UNAVAILABLE}
 INFO [2017-03-13 17:27:58,693] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@631daf45{/executors/threadDump,null,UNAVAILABLE}
 INFO [2017-03-13 17:27:58,693] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@35f1c01a{/executors/json,null,UNAVAILABLE}
 INFO [2017-03-13 17:27:58,693] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@704ce48{/executors,null,UNAVAILABLE}
 INFO [2017-03-13 17:27:58,693] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@380902e2{/environment/json,null,UNAVAILABLE}
 INFO [2017-03-13 17:27:58,694] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@6ca753a2{/environment,null,UNAVAILABLE}
 INFO [2017-03-13 17:27:58,694] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@56cc816b{/storage/rdd/json,null,UNAVAILABLE}
 INFO [2017-03-13 17:27:58,694] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@78f8f942{/storage/rdd,null,UNAVAILABLE}
 INFO [2017-03-13 17:27:58,694] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@19c9a662{/storage/json,null,UNAVAILABLE}
 INFO [2017-03-13 17:27:58,695] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@74d9c119{/storage,null,UNAVAILABLE}
 INFO [2017-03-13 17:27:58,695] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@bb4a5cb{/stages/pool/json,null,UNAVAILABLE}
 INFO [2017-03-13 17:27:58,695] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@40f42495{/stages/pool,null,UNAVAILABLE}
 INFO [2017-03-13 17:27:58,695] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@38dab4f4{/stages/stage/json,null,UNAVAILABLE}
 INFO [2017-03-13 17:27:58,696] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@186cefd3{/stages/stage,null,UNAVAILABLE}
 INFO [2017-03-13 17:27:58,696] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@279d92c2{/stages/json,null,UNAVAILABLE}
 INFO [2017-03-13 17:27:58,696] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@9f092b{/stages,null,UNAVAILABLE}
 INFO [2017-03-13 17:27:58,697] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@b656e64{/jobs/job/json,null,UNAVAILABLE}
 INFO [2017-03-13 17:27:58,697] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@3f300fc5{/jobs/job,null,UNAVAILABLE}
 INFO [2017-03-13 17:27:58,697] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@3f531ec4{/jobs/json,null,UNAVAILABLE}
 INFO [2017-03-13 17:27:58,697] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@4a482724{/jobs,null,UNAVAILABLE}
 WARN [2017-03-13 17:27:58,704] ({pool-2-thread-2} Logging.scala[logWarning]:66) - Service 'SparkUI' could not bind on port 42935. Attempting port 42936.
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)