Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2016/03/30 13:41:25 UTC

[jira] [Resolved] (SPARK-14266) Association with remote system [akka.tcp://sparkDriver@192.168.1.81:34047] has failed, address is now gated for [5000] ms. Reason is: [Association failed$

     [ https://issues.apache.org/jira/browse/SPARK-14266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-14266.
-------------------------------
          Resolution: Invalid
       Fix Version/s:     (was: 1.4.1)
    Target Version/s:   (was: 1.4.1)

I'm going to close this, as there are too many possible causes, most of which are configuration problems. All you show is that various services can't communicate with each other, which suggests a network configuration problem.
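
As a first sanity check in a setup like this, verify that each worker can actually reach the driver's advertised address and port. A minimal sketch, assuming the gated address from the log below and a netcat build that supports -z/-v (e.g. Ubuntu's netcat-openbsd):

    # Run on each worker (192.168.1.82-84); a refused or timed-out
    # connection points to a firewall/routing problem, not a Spark bug.
    nc -zv 192.168.1.81 34047

Note that the driver binds a random port on each run (34901 in the session below, 34047 in the failing one), so substitute the port from the current "Successfully started service 'sparkDriver'" line.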

Also, read https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before opening a JIRA. For example, you should never set Blocker priority or the Target/Fix versions.
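
If the connectivity check above fails only for the auto-detected driver address, one common fix in this class of problem is to pin the address the driver advertises. A hedged sketch against the reporter's launch command, using the documented spark.driver.host setting (bin/pyspark forwards spark-submit options in 1.4.x):

    # Force the driver to advertise the cluster-internal IP instead of
    # whatever address it auto-detects (e.g. a public DNS name/IP).
    MASTER=spark://192.168.1.81:7077 bin/pyspark \
      --conf spark.driver.host=192.168.1.81

SPARK_PUBLIC_DNS mainly affects the addresses shown in the web UI, so mixing a public IP there with private IPs elsewhere is easy to confuse with this problem.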

> Association with remote system [akka.tcp://sparkDriver@192.168.1.81:34047] has failed, address is now gated for [5000] ms. Reason is: [Association failed$
> ----------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-14266
>                 URL: https://issues.apache.org/jira/browse/SPARK-14266
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, Spark Core
>    Affects Versions: 1.4.1
>         Environment: Ubuntu,
> Spark 1.4.1
> Python 2.7
> java version "1.7.0_95"
> OpenJDK Runtime Environment (IcedTea 2.6.4) (7u95-2.6.4-0ubuntu0.14.04.1)
> OpenJDK 64-Bit Server VM (build 24.95-b01, mixed mode)
>            Reporter: Pavan Kumar
>            Priority: Blocker
>
> I have a Spark standalone cluster with 1 master and 3 slaves.
> Configuration in the master's spark-env.sh:
> export SPARK_PUBLIC_DNS="173.220.132.82"
> export SPARK_WORKER_CORES=6
> SPARK_MASTER_IP='192.168.1.81'
> SPARK_LOCAL_IP='192.168.1.81'
> Configuration in the master machine's conf/slaves:
> 192.168.1.82
> 192.168.1.83
> 192.168.1.84
> These are my 3 slaves.
> Now, when I try to run:
> ubuntu@MyCareerVM1:/usr/local/spark$ MASTER=spark://192.168.1.81:7077 bin/pyspark
> it continuously throws this error.
> Error logs from the master:
> ubuntu@MyCareerVM1:/usr/local/spark$ MASTER=spark://192.168.1.81:7077 bin/pyspark
> Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
> [GCC 4.8.2] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> 16/03/29 09:16:48 INFO SparkContext: Running Spark version 1.4.1
> 16/03/29 09:16:48 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 16/03/29 09:16:49 INFO SecurityManager: Changing view acls to: ubuntu
> 16/03/29 09:16:49 INFO SecurityManager: Changing modify acls to: ubuntu
> 16/03/29 09:16:49 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ubuntu); users with modify permissions: Set(ubuntu)
> 16/03/29 09:16:49 INFO Slf4jLogger: Slf4jLogger started
> 16/03/29 09:16:50 INFO Remoting: Starting remoting
> 16/03/29 09:16:50 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.1.81:34901]
> 16/03/29 09:16:50 INFO Utils: Successfully started service 'sparkDriver' on port 34901.
> 16/03/29 09:16:50 INFO SparkEnv: Registering MapOutputTracker
> 16/03/29 09:16:50 INFO SparkEnv: Registering BlockManagerMaster
> 16/03/29 09:16:50 INFO DiskBlockManager: Created local directory at /tmp/spark-a77016c9-a9ae-49c5-908f-fc540dc7d3ff/blockmgr-a9e868af-4253-4230-9227-948fbb8a0d91
> 16/03/29 09:16:50 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
> 16/03/29 09:16:50 INFO HttpFileServer: HTTP File server directory is /tmp/spark-a77016c9-a9ae-49c5-908f-fc540dc7d3ff/httpd-a78e633c-0ae7-46cf-81e8-776d8f7c3c46
> 16/03/29 09:16:50 INFO HttpServer: Starting HTTP Server
> 16/03/29 09:16:50 INFO Utils: Successfully started service 'HTTP file server' on port 34364.
> 16/03/29 09:16:50 INFO SparkEnv: Registering OutputCommitCoordinator
> 16/03/29 09:16:50 INFO Utils: Successfully started service 'SparkUI' on port 4040.
> 16/03/29 09:16:50 INFO SparkUI: Started SparkUI at http://173.220.132.82:4040
> 16/03/29 09:16:50 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@192.168.1.81:7077/user/Master...
> 16/03/29 09:16:51 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20160329091651-0006
> 16/03/29 09:16:51 INFO AppClient$ClientActor: Executor added: app-20160329091651-0006/0 on worker-20160329072744-192.168.1.84-45492 (192.168.1.84:45492) with 6 cores
> 16/03/29 09:16:51 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160329091651-0006/0 on hostPort 192.168.1.84:45492 with 6 cores, 512.0 MB RAM
> 16/03/29 09:16:51 INFO AppClient$ClientActor: Executor added: app-20160329091651-0006/1 on worker-20160329072744-192.168.1.82-45482 (192.168.1.82:45482) with 6 cores
> 16/03/29 09:16:51 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160329091651-0006/1 on hostPort 192.168.1.82:45482 with 6 cores, 512.0 MB RAM
> 16/03/29 09:16:51 INFO AppClient$ClientActor: Executor added: app-20160329091651-0006/2 on worker-20160329072746-192.168.1.83-38065 (192.168.1.83:38065) with 6 cores
> 16/03/29 09:16:51 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160329091651-0006/2 on hostPort 192.168.1.83:38065 with 6 cores, 512.0 MB RAM
> 16/03/29 09:16:51 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/2 is now LOADING
> 16/03/29 09:16:51 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/1 is now LOADING
> 16/03/29 09:16:51 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/0 is now LOADING
> 16/03/29 09:16:51 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/0 is now RUNNING
> 16/03/29 09:16:51 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/1 is now RUNNING
> 16/03/29 09:16:51 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/2 is now RUNNING
> 16/03/29 09:16:51 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 42458.
> 16/03/29 09:16:51 INFO NettyBlockTransferService: Server created on 42458
> 16/03/29 09:16:51 INFO BlockManagerMaster: Trying to register BlockManager
> 16/03/29 09:16:51 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.81:42458 with 265.4 MB RAM, BlockManagerId(driver, 192.168.1.81, 42458)
> 16/03/29 09:16:51 INFO BlockManagerMaster: Registered BlockManager
> 16/03/29 09:16:51 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /__ / .__/\_,_/_/ /_/\_\   version 1.4.1
>       /_/
> Using Python version 2.7.6 (default, Jun 22 2015 17:58:13)
> SparkContext available as sc, HiveContext available as sqlContext.
> >>> 16/03/29 09:16:53 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/0 is now EXITED (Command exited with code 1)
> 16/03/29 09:16:53 INFO SparkDeploySchedulerBackend: Executor app-20160329091651-0006/0 removed: Command exited with code 1
> 16/03/29 09:16:53 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 0
> 16/03/29 09:16:53 INFO AppClient$ClientActor: Executor added: app-20160329091651-0006/3 on worker-20160329072744-192.168.1.84-45492 (192.168.1.84:45492) with 6 cores
> 16/03/29 09:16:53 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160329091651-0006/3 on hostPort 192.168.1.84:45492 with 6 cores, 512.0 MB RAM
> 16/03/29 09:16:53 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/2 is now EXITED (Command exited with code 1)
> 16/03/29 09:16:53 INFO SparkDeploySchedulerBackend: Executor app-20160329091651-0006/2 removed: Command exited with code 1
> 16/03/29 09:16:53 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 2
> 16/03/29 09:16:53 INFO AppClient$ClientActor: Executor added: app-20160329091651-0006/4 on worker-20160329072746-192.168.1.83-38065 (192.168.1.83:38065) with 6 cores
> 16/03/29 09:16:53 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160329091651-0006/4 on hostPort 192.168.1.83:38065 with 6 cores, 512.0 MB RAM
> 16/03/29 09:16:53 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/3 is now LOADING
> 16/03/29 09:16:53 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/4 is now LOADING
> 16/03/29 09:16:53 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/3 is now RUNNING
> 16/03/29 09:16:53 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/4 is now RUNNING
> 16/03/29 09:16:54 INFO AppClient$ClientActor: Executor updated: app-20160329091651-0006/1 is now EXITED (Command exited with code 1)
> 16/03/29 09:16:54 INFO SparkDeploySchedulerBackend: Executor app-20160329091651-0006/1 removed: Command exited with code 1
> 16/03/29 09:16:54 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 1
> These are the logs from my slaves:
> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
> 16/03/30 01:50:18 INFO CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
> 16/03/30 01:50:19 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 16/03/30 01:50:19 INFO SecurityManager: Changing view acls to: ubuntu
> 16/03/30 01:50:19 INFO SecurityManager: Changing modify acls to: ubuntu
> 16/03/30 01:50:19 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ubuntu); users with modify permissions: Set(ubuntu)
> 16/03/30 01:50:20 INFO Slf4jLogger: Slf4jLogger started
> 16/03/30 01:50:20 INFO Remoting: Starting remoting
> 16/03/30 01:50:20 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://driverPropsFetcher@192.168.1.82:38333]
> 16/03/30 01:50:20 INFO Utils: Successfully started service 'driverPropsFetcher' on port 38333.
> 16/03/30 01:50:20 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkDriver@192.168.1.81:34047] has failed, address is now gated for [5000] ms. Reason is: [Association failed$
> Exception in thread "main" akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka.tcp://sparkDriver@192.168.1.81:34047/), Path(/user/CoarseGrainedScheduler)]
>         at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
>         at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
>         at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
>         at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
>         at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
>         at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>         at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>         at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
>         at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
>         at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
>         at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:110)
>         at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
>         at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
>         at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
>         at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:267)
>         at akka.actor.EmptyLocalActorRef.specialHandle(ActorRef.scala:508)
>         at akka.actor.DeadLetterActorRef.specialHandle(ActorRef.scala:541)
>         at akka.actor.DeadLetterActorRef.$bang(ActorRef.scala:531)
>         at akka.remote.RemoteActorRefProvider$RemoteDeadLetterActorRef.$bang(RemoteActorRefProvider.scala:87)
>         at akka.remote.EndpointWriter.postStop(Endpoint.scala:561)
>         at akka.actor.Actor$class.aroundPostStop(Actor.scala:475)
>         at akka.remote.EndpointActor.aroundPostStop(Endpoint.scala:415)
>         at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)
>         at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172)
>         at akka.actor.ActorCell.terminate(ActorCell.scala:369)
>         at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:462)
>         at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
>         at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>         at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 16/03/30 01:50:20 INFO Utils: Shutdown hook called



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org