Posted to issues@spark.apache.org by "Pavan Kumar (JIRA)" <ji...@apache.org> on 2016/03/30 09:41:25 UTC

[jira] [Created] (SPARK-14266) Association with remote system [akka.tcp://sparkDriver@192.168.1.81:34047] has failed, address is now gated for [5000] ms. Reason is: [Association failed$

Pavan Kumar created SPARK-14266:
-----------------------------------

             Summary: Association with remote system [akka.tcp://sparkDriver@192.168.1.81:34047] has failed, address is now gated for [5000] ms. Reason is: [Association failed$
                 Key: SPARK-14266
                 URL: https://issues.apache.org/jira/browse/SPARK-14266
             Project: Spark
          Issue Type: Bug
          Components: PySpark, Spark Core
    Affects Versions: 1.4.1
         Environment: Ubuntu,
Spark 1.4.1
Python 2.7

java version "1.7.0_95"
OpenJDK Runtime Environment (IcedTea 2.6.4) (7u95-2.6.4-0ubuntu0.14.04.1)
OpenJDK 64-Bit Server VM (build 24.95-b01, mixed mode)


            Reporter: Pavan Kumar
            Priority: Blocker
             Fix For: 1.4.1


I have a Spark standalone cluster with 1 master and 3 slaves.

Configuration in the master's spark-env.sh:

export SPARK_PUBLIC_DNS="173.220.132.82"
export SPARK_WORKER_CORES=6
SPARK_MASTER_IP='192.168.1.81'
SPARK_LOCAL_IP='192.168.1.81'
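
For reference, a minimal spark-env.sh that keeps the master, driver, and workers all on the private 192.168.1.x network might look like the sketch below. Exporting SPARK_MASTER_IP/SPARK_LOCAL_IP and dropping SPARK_PUBLIC_DNS are assumptions here, not the configuration actually in use above:

# spark-env.sh on the master -- a sketch, not the file in use above
export SPARK_MASTER_IP=192.168.1.81   # address the standalone master binds to and advertises
export SPARK_LOCAL_IP=192.168.1.81    # address the driver and daemons bind to on this host
export SPARK_WORKER_CORES=6           # cores offered by each worker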

Configuration in the master machine's conf/slaves:
192.168.1.82
192.168.1.83
192.168.1.84

These are my 3 slaves.

Now, when I try to run

ubuntu@MyCareerVM1:/usr/local/spark$ MASTER=spark://192.168.1.81:7077 bin/pyspark

it continuously throws errors.

Error logs from the master:

ubuntu@MyCareerVM1:/usr/local/spark$ MASTER=spark://192.168.1.81:7077 bin/pyspark
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
16/03/29 09:16:48 INFO SparkContext: Running Spark version 1.4.1
16/03/29 09:16:48 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/03/29 09:16:49 INFO SecurityManager: Changing view acls to: ubuntu
16/03/29 09:16:49 INFO SecurityManager: Changing modify acls to: ubuntu
16/03/29 09:16:49 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ubuntu); users with modify permissions: Set(ubuntu)
16/03/29 09:16:49 INFO Slf4jLogger: Slf4jLogger started
16/03/29 09:16:50 INFO Remoting: Starting remoting
16/03/29 09:16:50 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.1.81:34901]
16/03/29 09:16:50 INFO Utils: Successfully started service 'sparkDriver' on port 34901.
16/03/29 09:16:50 INFO SparkEnv: Registering MapOutputTracker
16/03/29 09:16:50 INFO SparkEnv: Registering BlockManagerMaster
16/03/29 09:16:50 INFO DiskBlockManager: Created local directory at /tmp/spark-a77016c9-a9ae-49c5-908f-fc540dc7d3ff/blockmgr-a9e868af-4253-4230-9227-948fbb8a0d91
16/03/29 09:16:50 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
16/03/29 09:16:50 INFO HttpFileServer: HTTP File server directory is /tmp/spark-a77016c9-a9ae-49c5-908f-fc540dc7d3ff/httpd-a78e633c-0ae7-46cf-81e8-776d8f7c3c46
16/03/29 09:16:50 INFO HttpServer: Starting HTTP Server
16/03/29 09:16:50 INFO Utils: Successfully started service 'HTTP file server' on port 34364.
16/03/29 09:16:50 INFO SparkEnv: Registering OutputCommitCoordinator
16/03/29 09:16:50 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/03/29 09:16:50 INFO SparkUI: Started SparkUI at http://173.220.132.82:4040
16/03/29 09:16:50 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@192.168.1.81:7077/user/Master...
16/03/29 09:16:51 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20160329091651-0006
16/03/29 09:16:51 INFO AppClient$ClientActor: Executor added: app-20160329091651-0006/0 on worker-20160329072744-192.168.1.84-45492 (192.168.1.84:45492) with 6 cores
16/03/29 09:16:51 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160329091651-0006/0 on hostPort 192.168.1.84:45492 with 6 cores, 512.0 MB RAM
16/03/29 09:16:51 INFO AppClient$ClientActor: Executor added: app-20160329091651-0006/1 on worker-20160329072744-192.168.1.82-45482 (192.168.1.82:45482) with 6 cores
16/03/29 09:16:51 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160329091651-0006/1 on hostPort 192.168.1.82:45482 with 6 cores, 512.0 MB RAM
16/03/29 09:16:51 INFO AppClient$ClientActor: Executor added: app-20160329091651-0006/2 on worker-20160329072746-192.168.1.83-38065 (192.168.1.83:38065) with 6 cores
16/03/29 09:16:51 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160329091651-0006/2 on hostPort 192.168.1.83:38065 with 6 cores, 512.0 MB RAM
16/03/29 09:16:51 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/2 is now LOADING
16/03/29 09:16:51 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/1 is now LOADING
16/03/29 09:16:51 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/0 is now LOADING
16/03/29 09:16:51 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/0 is now RUNNING
16/03/29 09:16:51 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/1 is now RUNNING
16/03/29 09:16:51 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/2 is now RUNNING
16/03/29 09:16:51 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 42458.
16/03/29 09:16:51 INFO NettyBlockTransferService: Server created on 42458
16/03/29 09:16:51 INFO BlockManagerMaster: Trying to register BlockManager
16/03/29 09:16:51 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.81:42458 with 265.4 MB RAM, BlockManagerId(driver, 192.168.1.81, 42458)
16/03/29 09:16:51 INFO BlockManagerMaster: Registered BlockManager
16/03/29 09:16:51 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.4.1
      /_/

Using Python version 2.7.6 (default, Jun 22 2015 17:58:13)
SparkContext available as sc, HiveContext available as sqlContext.
>>> 16/03/29 09:16:53 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/0 is now EXITED (Command exited with code 1)
16/03/29 09:16:53 INFO SparkDeploySchedulerBackend: Executor app-20160329091651-0006/0 removed: Command exited with code 1
16/03/29 09:16:53 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 0
16/03/29 09:16:53 INFO AppClient$ClientActor: Executor added: app-20160329091651-0006/3 on worker-20160329072744-192.168.1.84-45492 (192.168.1.84:45492) with 6 cores
16/03/29 09:16:53 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160329091651-0006/3 on hostPort 192.168.1.84:45492 with 6 cores, 512.0 MB RAM
16/03/29 09:16:53 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/2 is now EXITED (Command exited with code 1)
16/03/29 09:16:53 INFO SparkDeploySchedulerBackend: Executor app-20160329091651-0006/2 removed: Command exited with code 1
16/03/29 09:16:53 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 2
16/03/29 09:16:53 INFO AppClient$ClientActor: Executor added: app-20160329091651-0006/4 on worker-20160329072746-192.168.1.83-38065 (192.168.1.83:38065) with 6 cores
16/03/29 09:16:53 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160329091651-0006/4 on hostPort 192.168.1.83:38065 with 6 cores, 512.0 MB RAM
16/03/29 09:16:53 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/3 is now LOADING
16/03/29 09:16:53 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/4 is now LOADING
16/03/29 09:16:53 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/3 is now RUNNING
16/03/29 09:16:53 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/4 is now RUNNING
16/03/29 09:16:54 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/1 is now EXITED (Command exited with code 1)
16/03/29 09:16:54 INFO SparkDeploySchedulerBackend: Executor app-20160329091651-0006/1 removed: Command exited with code 1
16/03/29 09:16:54 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 1
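
For comparison, the equivalent launch using the --master flag and, purely as an assumption/sanity check rather than something shown in the logs above, explicitly pinning the driver's bind address, would look roughly like:

ubuntu@MyCareerVM1:/usr/local/spark$ bin/pyspark --master spark://192.168.1.81:7077 \
    --conf spark.driver.host=192.168.1.81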


These are the logs from my slaves:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/03/30 01:50:18 INFO CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
16/03/30 01:50:19 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/03/30 01:50:19 INFO SecurityManager: Changing view acls to: ubuntu
16/03/30 01:50:19 INFO SecurityManager: Changing modify acls to: ubuntu
16/03/30 01:50:19 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ubuntu); users with modify permissions: Set(ubuntu)
16/03/30 01:50:20 INFO Slf4jLogger: Slf4jLogger started
16/03/30 01:50:20 INFO Remoting: Starting remoting
16/03/30 01:50:20 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://driverPropsFetcher@192.168.1.82:38333]
16/03/30 01:50:20 INFO Utils: Successfully started service 'driverPropsFetcher' on port 38333.
16/03/30 01:50:20 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkDriver@192.168.1.81:34047] has failed, address is now gated for [5000] ms. Reason is: [Association failed$
Exception in thread "main" akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka.tcp://sparkDriver@192.168.1.81:34047/), Path(/user/CoarseGrainedScheduler)]
        at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
        at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
        at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
        at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
        at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
        at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
        at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
        at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
        at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
        at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:110)
        at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
        at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
        at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
        at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:267)
        at akka.actor.EmptyLocalActorRef.specialHandle(ActorRef.scala:508)
        at akka.actor.DeadLetterActorRef.specialHandle(ActorRef.scala:541)
        at akka.actor.DeadLetterActorRef.$bang(ActorRef.scala:531)
        at akka.remote.RemoteActorRefProvider$RemoteDeadLetterActorRef.$bang(RemoteActorRefProvider.scala:87)
        at akka.remote.EndpointWriter.postStop(Endpoint.scala:561)
        at akka.actor.Actor$class.aroundPostStop(Actor.scala:475)
        at akka.remote.EndpointActor.aroundPostStop(Endpoint.scala:415)
        at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)
        at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172)
        at akka.actor.ActorCell.terminate(ActorCell.scala:369)
        at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:462)
        at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
        at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
16/03/30 01:50:20 INFO Utils: Shutdown hook called
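
The workers fail while connecting back to the driver at akka.tcp://sparkDriver@192.168.1.81:34047. As a sketch of one possible workaround (an assumption, not something verified in this report), the driver's address and port can be pinned in conf/spark-defaults.conf on the machine running pyspark, so that executors have a stable, reachable endpoint:

# conf/spark-defaults.conf -- hypothetical values, not taken from this cluster
spark.driver.host   192.168.1.81
spark.driver.port   34047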




