Posted to user@spark.apache.org by baraky <ba...@gmail.com> on 2015/10/08 13:04:03 UTC

Spark 1.5.1 standalone cluster - wrong Akka remoting config?

Doing my first steps with Spark, I'm facing problems submitting jobs to the
cluster from the application code. Digging through the logs, I noticed some periodic
WARN messages in the master log:

15/10/08 13:00:00 WARN remote.ReliableDeliverySupervisor: Association with
remote system [akka.tcp://sparkDriver@192.168.254.167:64014] has failed,
address is now gated for [5000] ms. Reason: [Disassociated]

The problem is that this IP address does not exist on our network and wasn't
configured anywhere. The same wrong IP shows up in the worker log when it
tries to execute the task (the wrong IP is passed as --driver-url):

15/10/08 12:58:21 INFO worker.ExecutorRunner: Launch command:
"/usr/java/latest//bin/java" "-cp"
"/path/spark/spark-1.5.1-bin-hadoop2.6/sbin/../conf/:/path/spark/spark-1.5.1-bin-hadoop2.6/lib/spark-assembly-1.5.1-hadoop2.6.0.jar:/path/spark/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/path/spark/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/path/spark/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/path/hadoop/2.6.0//etc/hadoop/"
"-Xms1024M" "-Xmx1024M" "-Dspark.driver.port=64014" "-Dspark.driver.port=53411"
"org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url"
"akka.tcp://sparkDriver@192.168.254.167:64014/user/CoarseGrainedScheduler"
"--executor-id" "39" "--hostname" "192.168.10.214" "--cores" "16" "--app-id"
"app-20151008123702-0003" "--worker-url"
"akka.tcp://sparkWorker@192.168.10.214:37625/user/Worker"
15/10/08 12:59:28 INFO worker.Worker: Executor app-20151008123702-0003/39
finished with state EXITED message Command exited with code 1 exitStatus 1

Any idea what I did wrong and how this can be fixed?

The Java version is 1.8.0_20, and I'm using the pre-built Spark binaries.

Thanks!



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-5-1-standalone-cluster-wrong-Akka-remoting-config-tp24978.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Spark 1.5.1 standalone cluster - wrong Akka remoting config?

Posted by "michal.klos81@gmail.com" <mi...@gmail.com>.
Try setting spark.driver.host to the actual IP or hostname of the box submitting the work. More info in the networking section of this link:

http://spark.apache.org/docs/latest/configuration.html
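
If you create the SparkContext yourself in the application code, one place to set it is on the SparkConf before the context is built. A minimal sketch in Scala; the master URL, driver IP, and app name below are placeholders, not values from your setup:

import org.apache.spark.{SparkConf, SparkContext}

// spark.driver.host must be an address the workers can actually reach,
// otherwise executors are handed a --driver-url that points nowhere.
val conf = new SparkConf()
  .setAppName("example-app")                      // placeholder
  .setMaster("spark://your-master-host:7077")     // your standalone master URL
  .set("spark.driver.host", "192.168.10.50")      // real IP/hostname of the submitting box (placeholder value)

val sc = new SparkContext(conf)

The same property can also go into conf/spark-defaults.conf or be passed with --conf spark.driver.host=... on spark-submit.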

Also check the Spark config for your application for these driver settings in the application web UI at http://<driver>:4040, in the “Environment” tab. More info in the "viewing configuration properties" section of that link.
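
If the web UI is hard to reach, you can also print the resolved settings from the driver itself; a small sketch, assuming an already-created SparkContext named sc:

// Dump every property Spark resolved for this application.
sc.getConf.getAll.sorted.foreach { case (k, v) => println(s"$k=$v") }

// Or just the driver address the executors will be told to use.
sc.getConf.getOption("spark.driver.host").foreach(println)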

M


