Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2016/03/08 00:32:41 UTC

[jira] [Assigned] (SPARK-13711) Apache Spark driver stopping JVM when master not available

     [ https://issues.apache.org/jira/browse/SPARK-13711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-13711:
------------------------------------

    Assignee: Apache Spark

> Apache Spark driver stopping JVM when master not available 
> -----------------------------------------------------------
>
>                 Key: SPARK-13711
>                 URL: https://issues.apache.org/jira/browse/SPARK-13711
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.4.1, 1.6.0
>            Reporter: Era
>            Assignee: Apache Spark
>
> In my application, a JavaSparkContext is created with an unavailable master URL (you may assume the master is down for maintenance). Creating the context stops the JVM that runs the Spark driver, with JVM exit code 50.
> When I checked the logs I found SparkUncaughtExceptionHandler calling System.exit. My program should run forever. 
> package test.mains;
> import org.apache.spark.SparkConf;
> import org.apache.spark.api.java.JavaSparkContext;
> public class CheckJavaSparkContext {
>     public static void main(String[] args) {
>         SparkConf conf = new SparkConf();
>         conf.setAppName("test");
>         conf.setMaster("spark://sunshinee:7077");
>         try {
>             new JavaSparkContext(conf);
>         } catch (Throwable e) {
>             System.out.println("Caught an exception : " + e.getMessage());           
>         }
>         System.out.println("Waiting to complete...");
>         while (true) {
>         }
>     }
> }
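The try/catch in the program above cannot intercept this failure: as the log below shows, the exit is triggered by SparkUncaughtExceptionHandler on the appclient-registration-retry-thread, not on the main thread. A minimal plain-JDK sketch of that thread behavior (class name hypothetical, no Spark involved):

```java
public class ThreadCatchDemo {
    static volatile boolean caughtInMain = false;
    static volatile boolean handlerRan = false;

    public static void main(String[] args) throws InterruptedException {
        // The exception is thrown on a worker thread, mirroring Spark's
        // appclient-registration-retry-thread in the trace below.
        Thread worker = new Thread(() -> {
            throw new RuntimeException("simulated registration failure");
        });
        // Spark installs its handler the same way; there the handler calls
        // System.exit(50). Here we merely record that it ran.
        worker.setUncaughtExceptionHandler((t, e) -> handlerRan = true);
        try {
            worker.start();
            worker.join();
        } catch (Throwable e) {
            // Never reached: the throwable stays on the worker thread and
            // never propagates through join() into the main thread.
            caughtInMain = true;
        }
        System.out.println("caughtInMain=" + caughtInMain + ", handlerRan=" + handlerRan);
    }
}
```

This is why wrapping `new JavaSparkContext(conf)` in a try/catch does not keep the driver JVM alive once Spark's own handler decides to exit.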
> Output log
> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/data/downloads/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/data/downloads/spark-1.6.0-bin-hadoop2.6/lib/spark-examples-1.6.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 16/03/04 18:01:15 INFO SparkContext: Running Spark version 1.6.0
> 16/03/04 18:01:17 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 16/03/04 18:01:17 WARN Utils: Your hostname, pesamara-mobl-vm1 resolves to a loopback address: 127.0.0.1; using 10.30.9.107 instead (on interface eth0)
> 16/03/04 18:01:17 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
> 16/03/04 18:01:18 INFO SecurityManager: Changing view acls to: ps40233
> 16/03/04 18:01:18 INFO SecurityManager: Changing modify acls to: ps40233
> 16/03/04 18:01:18 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ps40233); users with modify permissions: Set(ps40233)
> 16/03/04 18:01:19 INFO Utils: Successfully started service 'sparkDriver' on port 55309.
> 16/03/04 18:01:21 INFO Slf4jLogger: Slf4jLogger started
> 16/03/04 18:01:21 INFO Remoting: Starting remoting
> 16/03/04 18:01:22 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.30.9.107:52128]
> 16/03/04 18:01:22 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 52128.
> 16/03/04 18:01:22 INFO SparkEnv: Registering MapOutputTracker
> 16/03/04 18:01:22 INFO SparkEnv: Registering BlockManagerMaster
> 16/03/04 18:01:22 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-87c20178-357d-4252-a46a-62a755568a98
> 16/03/04 18:01:22 INFO MemoryStore: MemoryStore started with capacity 457.7 MB
> 16/03/04 18:01:22 INFO SparkEnv: Registering OutputCommitCoordinator
> 16/03/04 18:01:23 INFO Utils: Successfully started service 'SparkUI' on port 4040.
> 16/03/04 18:01:23 INFO SparkUI: Started SparkUI at http://10.30.9.107:4040
> 16/03/04 18:01:24 INFO AppClient$ClientEndpoint: Connecting to master spark://sunshinee:7077...
> 16/03/04 18:01:24 WARN AppClient$ClientEndpoint: Failed to connect to master sunshinee:7077
> java.io.IOException: Failed to connect to sunshinee:7077
>         at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
>         at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
>         at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:200)
>         at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
>         at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:183)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.nio.channels.UnresolvedAddressException
>         at sun.nio.ch.Net.checkAddress(Net.java:123)
>         at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:621)
>         at io.netty.channel.socket.nio.NioSocketChannel.doConnect(NioSocketChannel.java:209)
>         at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.connect(AbstractNioChannel.java:207)
>         at io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1097)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:471)
>         at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:456)
>         at io.netty.channel.ChannelOutboundHandlerAdapter.connect(ChannelOutboundHandlerAdapter.java:47)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:471)
>         at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:456)
>         at io.netty.channel.ChannelDuplexHandler.connect(ChannelDuplexHandler.java:50)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:471)
>         at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:456)
>         at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:438)
>         at io.netty.channel.DefaultChannelPipeline.connect(DefaultChannelPipeline.java:908)
>         at io.netty.channel.AbstractChannel.connect(AbstractChannel.java:203)
>         at io.netty.bootstrap.Bootstrap$2.run(Bootstrap.java:166)
>         at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>         at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>         ... 1 more
> 16/03/04 18:01:44 INFO AppClient$ClientEndpoint: Connecting to master spark://sunshinee:7077...
> 16/03/04 18:01:44 WARN AppClient$ClientEndpoint: Failed to connect to master sunshinee:7077
> java.io.IOException: Failed to connect to sunshinee:7077
>         (stack trace identical to the first failure above)
> 16/03/04 18:02:04 INFO AppClient$ClientEndpoint: Connecting to master spark://sunshinee:7077...
> 16/03/04 18:02:04 WARN AppClient$ClientEndpoint: Failed to connect to master sunshinee:7077
> java.io.IOException: Failed to connect to sunshinee:7077
>         (stack trace identical to the first failure above)
> 16/03/04 18:02:24 INFO AppClient$ClientEndpoint: Connecting to master spark://sunshinee:7077...
> 16/03/04 18:02:24 WARN SparkDeploySchedulerBackend: Application ID is not initialized yet.
> 16/03/04 18:02:24 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
> 16/03/04 18:02:24 WARN AppClient$ClientEndpoint: Failed to connect to master sunshinee:7077
> java.io.IOException: Failed to connect to sunshinee:7077
>         (stack trace identical to the first failure above)
> 16/03/04 18:02:24 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44298.
> 16/03/04 18:02:24 INFO NettyBlockTransferService: Server created on 44298
> 16/03/04 18:02:24 INFO BlockManagerMaster: Trying to register BlockManager
> 16/03/04 18:02:24 INFO BlockManagerMasterEndpoint: Registering block manager 10.30.9.107:44298 with 457.7 MB RAM, BlockManagerId(driver, 10.30.9.107, 44298)
> 16/03/04 18:02:24 INFO BlockManagerMaster: Registered BlockManager
> 16/03/04 18:02:24 INFO SparkUI: Stopped Spark web UI at http://10.30.9.107:4040
> 16/03/04 18:02:24 INFO SparkDeploySchedulerBackend: Shutting down all executors
> 16/03/04 18:02:24 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
> 16/03/04 18:02:24 WARN AppClient$ClientEndpoint: Drop UnregisterApplication(null) because has not yet connected to master
> 16/03/04 18:02:24 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[appclient-registration-retry-thread,5,main]
> java.lang.InterruptedException
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1039)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>         at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:208)
>         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
>         at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>         at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>         at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>         at scala.concurrent.Await$.result(package.scala:107)
>         at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
>         at org.apache.spark.deploy.client.AppClient.stop(AppClient.scala:290)
>         at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.org$apache$spark$scheduler$cluster$SparkDeploySchedulerBackend$$stop(SparkDeploySchedulerBackend.scala:198)
>         at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.stop(SparkDeploySchedulerBackend.scala:101)
>         at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:446)
>         at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1582)
>         at org.apache.spark.SparkContext$$anonfun$stop$7.apply$mcV$sp(SparkContext.scala:1731)
>         at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1229)
>         at org.apache.spark.SparkContext.stop(SparkContext.scala:1730)
>         at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.dead(SparkDeploySchedulerBackend.scala:127)
>         at org.apache.spark.deploy.client.AppClient$ClientEndpoint.markDead(AppClient.scala:264)
>         at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anon$2$$anonfun$run$1.apply$mcV$sp(AppClient.scala:134)
>         at org.apache.spark.util.Utils$.tryOrExit(Utils.scala:1163)
>         at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anon$2.run(AppClient.scala:129)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> 16/03/04 18:02:24 INFO DiskBlockManager: Shutdown hook called
> 16/03/04 18:02:24 INFO ShutdownHookManager: Shutdown hook called
> 16/03/04 18:02:24 INFO ShutdownHookManager: Deleting directory /tmp/spark-ea68a0fa-4f0d-4dbb-8407-cce90ef78a52
> 16/03/04 18:02:24 INFO ShutdownHookManager: Deleting directory /tmp/spark-ea68a0fa-4f0d-4dbb-8407-cce90ef78a52/userFiles-db548748-a55c-4406-adcb-c09e63b118bd
> Java Result: 50 
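The root cause in the trace is java.nio.channels.UnresolvedAddressException: the host name sunshinee never resolves. A driver that must keep running could pre-check name resolution before constructing the context, then wait and retry instead of failing into Spark's exit path. A sketch under that assumption (class and method names are hypothetical; plain JDK only, and it only guards against unresolvable names, not a resolvable master that is down):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class MasterPreflight {
    // Cheap pre-flight check: does the master host name resolve at all?
    // This does not change Spark's exit behavior; it only lets the
    // application avoid entering the failing code path in the first place.
    static boolean hostResolves(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // ".invalid" is reserved (RFC 2606) and never resolves.
        System.out.println(hostResolves("localhost"));            // true
        System.out.println(hostResolves("no-such-host.invalid")); // false
    }
}
```

In the reproduction program above, such a check would run before `conf.setMaster("spark://sunshinee:7077")`; on failure the application can sleep and retry rather than constructing the JavaSparkContext.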



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org