Posted to user@spark.apache.org by Shubhabrata <ma...@gmail.com> on 2014/04/24 15:46:03 UTC

Deploying a python code on a spark EC2 cluster

I have been stuck on an issue for the last two days and have not found any
solution after several hours of googling. Here are the details.

The following is a simple Python script (Temp.py):

import sys
from random import random
from operator import add

from pyspark import SparkContext
from pyspark import SparkConf

if __name__ == "__main__":

    # could also be read from the command line as sys.argv[1]
    master = 'spark://ec2-54-216-38-82.eu-west-1.compute.amazonaws.com:7077'
    conf = SparkConf()
    conf.setMaster(master)
    conf.setAppName("PythonPi")
    conf.set("spark.executor.memory", "2g")
    conf.set("spark.cores.max", "10")
    conf.setSparkHome("/root/spark")

    sc = SparkContext(conf=conf)

    slices = 2
    n = 100000 * slices
    # Monte Carlo estimate of pi: sample points uniformly in the square
    # [-1, 1] x [-1, 1] and count those that fall inside the unit circle
    def f(_):
        x = random() * 2 - 1
        y = random() * 2 - 1
        return 1 if x ** 2 + y ** 2 < 1 else 0

    count = sc.parallelize(xrange(1, n + 1), slices).map(f).reduce(add)
    print "Pi is roughly %f" % (4.0 * count / n)

    sc.stop()

I have Spark installed on my local machine, and when I run the code locally
it works fine with pyspark (master = 'local[5]' in the above code).
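
One way to switch between the local and cluster runs without editing the
script is to take the master URL from the command line, as the commented-out
sys.argv[1] hints. A minimal sketch (the local fallback is only an
illustration):

    # a sketch: use the first command-line argument as the master URL,
    # falling back to a local run with 5 threads when none is given
    master = sys.argv[1] if len(sys.argv) > 1 else 'local[5]'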

Next I installed Spark on EC2, where I can create a master and a number of
slaves for deploying my code. After I create a master I get its URL, which is
in the following format:
spark://ec2-54-216-38-82.eu-west-1.compute.amazonaws.com:7077. However, when
I run the script with pyspark (./bin/pyspark Temp.py) I get the following
warning:

TaskSchedulerImpl: Initial job has not accepted any resources; check your
cluster UI to ensure that workers are registered and have sufficient memory

I have checked from the UI that each worker has 2.7 GB of memory, none of
which has been used. Could you please give me any idea about this error?
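
For what it's worth, one way to rule out a plain resource mismatch (the
warning mentions memory and registration) is to request strictly less than
what the cluster advertises. A minimal sketch, assuming the same SparkConf as
in Temp.py and the 1-core, 2.7 GB workers shown in the UI:

    # a sketch: keep the requests below what each worker advertises
    conf.set("spark.executor.memory", "512m")  # well under 2.7 GB per worker
    conf.set("spark.cores.max", "3")           # the cluster has three 1-core workers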

Looking forward to hearing from you.





Re: Deploying a python code on a spark EC2 cluster

Posted by Shubhabrata <ma...@gmail.com>.
Well, we used the script that comes with Spark, I think v0.9.1. But I am
going to try the newer version (the 1.0.0rc2 script). I shall keep you posted
about my findings. Thanks.
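
For reference, the bundled EC2 scripts
(http://spark.apache.org/docs/latest/ec2-scripts.html) drive the whole
cluster lifecycle; the key pair, identity file, slave count, and cluster name
below are placeholders:

./spark-ec2 -k <keypair> -i <key-file> -s <num-slaves> launch <cluster-name>
./spark-ec2 -k <keypair> -i <key-file> login <cluster-name>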




Re: Deploying a python code on a spark EC2 cluster

Posted by Shubhabrata <ma...@gmail.com>.
In order to check whether there is any issue with the Python API, I ran a
Scala application provided in the examples. Still the same error:

./bin/run-example org.apache.spark.examples.SparkPi
spark://[Master-URL]:7077


SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/mnt/work/spark-0.9.1/examples/target/scala-2.10/spark-examples-assembly-0.9.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/mnt/work/spark-0.9.1/assembly/target/scala-2.10/spark-assembly-0.9.1-hadoop1.0.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14/04/25 17:07:10 INFO Utils: Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties
14/04/25 17:07:10 WARN Utils: Your hostname, rd-hu resolves to a loopback
address: 127.0.1.1; using 192.168.122.1 instead (on interface virbr0)
14/04/25 17:07:10 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to
another address
14/04/25 17:07:11 INFO Slf4jLogger: Slf4jLogger started
14/04/25 17:07:11 INFO Remoting: Starting remoting
14/04/25 17:07:11 INFO Remoting: Remoting started; listening on addresses
:[akka.tcp://spark@192.168.122.1:26278]
14/04/25 17:07:11 INFO Remoting: Remoting now listens on addresses:
[akka.tcp://spark@192.168.122.1:26278]
14/04/25 17:07:11 INFO SparkEnv: Registering BlockManagerMaster
14/04/25 17:07:11 INFO DiskBlockManager: Created local directory at
/tmp/spark-local-20140425170711-d1da
14/04/25 17:07:11 INFO MemoryStore: MemoryStore started with capacity 16.0
GB.
14/04/25 17:07:11 INFO ConnectionManager: Bound socket to port 9788 with id
= ConnectionManagerId(192.168.122.1,9788)
14/04/25 17:07:11 INFO BlockManagerMaster: Trying to register BlockManager
14/04/25 17:07:11 INFO BlockManagerMasterActor$BlockManagerInfo: Registering
block manager 192.168.122.1:9788 with 16.0 GB RAM
14/04/25 17:07:11 INFO BlockManagerMaster: Registered BlockManager
14/04/25 17:07:11 INFO HttpServer: Starting HTTP Server
14/04/25 17:07:11 INFO HttpBroadcast: Broadcast server started at
http://192.168.122.1:58091
14/04/25 17:07:11 INFO SparkEnv: Registering MapOutputTracker
14/04/25 17:07:11 INFO HttpFileServer: HTTP File server directory is
/tmp/spark-599577a4-5732-4949-a2e8-f59eb679e843
14/04/25 17:07:11 INFO HttpServer: Starting HTTP Server
14/04/25 17:07:12 WARN AbstractLifeCycle: FAILED
SelectChannelConnector@0.0.0.0:4040: java.net.BindException: Address already
in use
java.net.BindException: Address already in use
	at sun.nio.ch.Net.bind0(Native Method)
	at sun.nio.ch.Net.bind(Net.java:444)
	at sun.nio.ch.Net.bind(Net.java:436)
	at
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
	at
org.eclipse.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:187)
	at
org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:316)
	at
org.eclipse.jetty.server.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:265)
	at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
	at org.eclipse.jetty.server.Server.doStart(Server.java:286)
	at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
	at
org.apache.spark.ui.JettyUtils$$anonfun$1.apply$mcV$sp(JettyUtils.scala:118)
	at org.apache.spark.ui.JettyUtils$$anonfun$1.apply(JettyUtils.scala:118)
	at org.apache.spark.ui.JettyUtils$$anonfun$1.apply(JettyUtils.scala:118)
	at scala.util.Try$.apply(Try.scala:161)
	at org.apache.spark.ui.JettyUtils$.connect$1(JettyUtils.scala:118)
	at org.apache.spark.ui.JettyUtils$.startJettyServer(JettyUtils.scala:129)
	at org.apache.spark.ui.SparkUI.bind(SparkUI.scala:57)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:159)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:100)
	at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31)
	at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
14/04/25 17:07:12 WARN AbstractLifeCycle: FAILED
org.eclipse.jetty.server.Server@74f4b96: java.net.BindException: Address
already in use
java.net.BindException: Address already in use
	at sun.nio.ch.Net.bind0(Native Method)
	at sun.nio.ch.Net.bind(Net.java:444)
	at sun.nio.ch.Net.bind(Net.java:436)
	at
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
	at
org.eclipse.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:187)
	at
org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:316)
	at
org.eclipse.jetty.server.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:265)
	at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
	at org.eclipse.jetty.server.Server.doStart(Server.java:286)
	at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
	at
org.apache.spark.ui.JettyUtils$$anonfun$1.apply$mcV$sp(JettyUtils.scala:118)
	at org.apache.spark.ui.JettyUtils$$anonfun$1.apply(JettyUtils.scala:118)
	at org.apache.spark.ui.JettyUtils$$anonfun$1.apply(JettyUtils.scala:118)
	at scala.util.Try$.apply(Try.scala:161)
	at org.apache.spark.ui.JettyUtils$.connect$1(JettyUtils.scala:118)
	at org.apache.spark.ui.JettyUtils$.startJettyServer(JettyUtils.scala:129)
	at org.apache.spark.ui.SparkUI.bind(SparkUI.scala:57)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:159)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:100)
	at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31)
	at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
14/04/25 17:07:12 INFO JettyUtils: Failed to create UI at port, 4040. Trying
again.
14/04/25 17:07:12 INFO JettyUtils: Error was:
Failure(java.net.BindException: Address already in use)
14/04/25 17:07:12 INFO SparkUI: Started Spark Web UI at
http://192.168.122.1:4041
14/04/25 17:07:12 INFO SparkContext: Added JAR
/mnt/work/spark-0.9.1/examples/target/scala-2.10/spark-examples-assembly-0.9.1.jar
at http://192.168.122.1:49137/jars/spark-examples-assembly-0.9.1.jar with
timestamp 1398442032736
14/04/25 17:07:12 INFO AppClient$ClientActor: Connecting to master
spark://ec2-54-220-220-133.eu-west-1.compute.amazonaws.com:7077...
14/04/25 17:07:13 INFO SparkContext: Starting job: reduce at
SparkPi.scala:39
14/04/25 17:07:13 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:39)
with 2 output partitions (allowLocal=false)
14/04/25 17:07:13 INFO DAGScheduler: Final stage: Stage 0 (reduce at
SparkPi.scala:39)
14/04/25 17:07:13 INFO DAGScheduler: Parents of final stage: List()
14/04/25 17:07:13 INFO DAGScheduler: Missing parents: List()
14/04/25 17:07:13 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[1] at map
at SparkPi.scala:35), which has no missing parents
14/04/25 17:07:13 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0
(MappedRDD[1] at map at SparkPi.scala:35)
14/04/25 17:07:13 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
14/04/25 17:07:13 INFO SparkDeploySchedulerBackend: Connected to Spark
cluster with app ID app-20140425160713-0002
14/04/25 17:07:13 INFO AppClient$ClientActor: Executor added:
app-20140425160713-0002/0 on
worker-20140425133348-ip-10-84-7-178.eu-west-1.compute.internal-57839
(ip-10-84-7-178.eu-west-1.compute.internal:57839) with 1 cores
14/04/25 17:07:13 INFO SparkDeploySchedulerBackend: Granted executor ID
app-20140425160713-0002/0 on hostPort
ip-10-84-7-178.eu-west-1.compute.internal:57839 with 1 cores, 512.0 MB RAM
14/04/25 17:07:13 INFO AppClient$ClientActor: Executor updated:
app-20140425160713-0002/0 is now RUNNING
14/04/25 17:07:13 INFO AppClient$ClientActor: Executor updated:
app-20140425160713-0002/0 is now FAILED (class java.io.IOException: Cannot
run program "/mnt/work/spark/bin/compute-classpath.sh" (in directory "."):
error=2, No such file or directory)
14/04/25 17:07:13 INFO SparkDeploySchedulerBackend: Executor
app-20140425160713-0002/0 removed: class java.io.IOException: Cannot run
program "/mnt/work/spark/bin/compute-classpath.sh" (in directory "."):
error=2, No such file or directory
[... the same Executor added / RUNNING / FAILED / removed cycle, with the
identical IOException (Cannot run program
"/mnt/work/spark/bin/compute-classpath.sh" (in directory "."): error=2, No
such file or directory), repeats for executors app-20140425160713-0002/1
through app-20140425160713-0002/9 ...]
14/04/25 17:07:13 ERROR AppClient$ClientActor: Master removed our
application: FAILED; stopping client
14/04/25 17:07:13 WARN SparkDeploySchedulerBackend: Disconnected from Spark
cluster! Waiting for reconnection...
14/04/25 17:07:28 WARN TaskSchedulerImpl: Initial job has not accepted any
resources; check your cluster UI to ensure that workers are registered and
have sufficient memory
[... the same TaskSchedulerImpl warning repeats every 15 seconds ...]
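
Note that before the scheduler warning ever appears, every executor dies the
same way: the worker cannot find /mnt/work/spark/bin/compute-classpath.sh.
That path belongs to the driver machine (whose Spark lives under
/mnt/work/spark-0.9.1), not to the EC2 workers. If the workers keep Spark
under /root/spark, as the spark-ec2 AMIs do, one thing to try for the PySpark
script from the first message is to point the application at the workers'
layout explicitly. A minimal sketch, assuming that layout:

    from pyspark import SparkConf, SparkContext

    # a sketch: the spark home sent to the workers must be where Spark
    # lives on the worker nodes, not on the driver machine
    conf = SparkConf()
    conf.setMaster("spark://[Master-URL]:7077")
    conf.setSparkHome("/root/spark")  # assumed worker-side install path
    sc = SparkContext(conf=conf)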




Re: Deploying a python code on a spark EC2 cluster

Posted by Shubhabrata <ma...@gmail.com>.
This is the error from stderr:


Spark Executor Command: "java" "-cp"
":/root/ephemeral-hdfs/conf:/root/ephemeral-hdfs/conf:/root/ephemeral-hdfs/conf:/root/spark/conf:/root/spark/assembly/target/scala-2.10/spark-assembly_2.10-0.9.1-hadoop1.0.4.jar"
"-Djava.library.path=/root/ephemeral-hdfs/lib/native/"
"-Dspark.local.dir=/mnt/spark" "-Dspark.local.dir=/mnt/spark"
"-Dspark.local.dir=/mnt/spark" "-Dspark.local.dir=/mnt/spark" "-Xms2048M"
"-Xmx2048M" "org.apache.spark.executor.CoarseGrainedExecutorBackend"
"akka.tcp://spark@192.168.122.1:44577/user/CoarseGrainedScheduler" "1"
"ip-10-84-7-178.eu-west-1.compute.internal" "1"
"akka.tcp://sparkWorker@ip-10-84-7-178.eu-west-1.compute.internal:57839/user/Worker"
"app-20140425133749-0000"
========================================

14/04/25 13:39:37 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/04/25 13:39:38 INFO Remoting: Starting remoting
14/04/25 13:39:38 INFO Remoting: Remoting started; listening on addresses
:[akka.tcp://sparkExecutor@ip-10-84-7-178.eu-west-1.compute.internal:36800]
14/04/25 13:39:38 INFO Remoting: Remoting now listens on addresses:
[akka.tcp://sparkExecutor@ip-10-84-7-178.eu-west-1.compute.internal:36800]
14/04/25 13:39:38 INFO worker.WorkerWatcher: Connecting to worker
akka.tcp://sparkWorker@ip-10-84-7-178.eu-west-1.compute.internal:57839/user/Worker
14/04/25 13:39:38 INFO executor.CoarseGrainedExecutorBackend: Connecting to
driver: akka.tcp://spark@192.168.122.1:44577/user/CoarseGrainedScheduler
14/04/25 13:39:39 INFO worker.WorkerWatcher: Successfully connected to
akka.tcp://sparkWorker@ip-10-84-7-178.eu-west-1.compute.internal:57839/user/Worker
14/04/25 13:41:19 ERROR executor.CoarseGrainedExecutorBackend: Driver
Disassociated
[akka.tcp://sparkExecutor@ip-10-84-7-178.eu-west-1.compute.internal:36800]
-> [akka.tcp://spark@192.168.122.1:44577] disassociated! Shutting down.




Re: Deploying a python code on a spark EC2 cluster

Posted by John King <us...@gmail.com>.
This happens to me when using the EC2 scripts for the recent v1.0.0rc2
release. The master connects and then disconnects immediately, eventually
saying the master disconnected from the cluster.


On Thu, Apr 24, 2014 at 4:01 PM, Matei Zaharia <ma...@gmail.com> wrote:

> Did you launch this using our EC2 scripts (
> http://spark.apache.org/docs/latest/ec2-scripts.html) or did you manually
> set up the daemons? My guess is that their hostnames are not being resolved
> properly on all nodes, so executor processes can’t connect back to your
> driver app. This error message indicates that:
>
> 14/04/24 09:00:49 WARN util.Utils: Your hostname, spark-node resolves to a
> loopback address: 127.0.0.1; using 10.74.149.251 instead (on interface
> eth0)
> 14/04/24 09:00:49 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind
> to
> another address
>
> If you launch with your EC2 scripts, or don’t manually change the
> hostnames, this should not happen.
>
> Matei
>
> On Apr 24, 2014, at 11:36 AM, John King <us...@gmail.com>
> wrote:
>
> Same problem.
>
>
> On Thu, Apr 24, 2014 at 10:54 AM, Shubhabrata <ma...@gmail.com> wrote:
>
>> Moreover, it seems all the workers are registered and have sufficient
>> memory (2.7 GB, whereas I have asked for 512 MB). The UI also shows the
>> jobs are running on the slaves. But on the terminal it is still the same
>> error: "Initial job has not accepted any resources; check your cluster UI
>> to ensure that workers are registered and have sufficient memory"
>>
>> Please see the screenshot. Thanks
>>
>> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n4761/33.png>
>>
>>
>>
>>
>
>
>

Re: Deploying a python code on a spark EC2 cluster

Posted by Matei Zaharia <ma...@gmail.com>.
Did you launch this using our EC2 scripts (http://spark.apache.org/docs/latest/ec2-scripts.html) or did you manually set up the daemons? My guess is that their hostnames are not being resolved properly on all nodes, so executor processes can’t connect back to your driver app. This error message indicates that:

14/04/24 09:00:49 WARN util.Utils: Your hostname, spark-node resolves to a
loopback address: 127.0.0.1; using 10.74.149.251 instead (on interface eth0)
14/04/24 09:00:49 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to
another address

If you launch with your EC2 scripts, or don’t manually change the hostnames, this should not happen.

Matei
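
To make the SPARK_LOCAL_IP suggestion concrete: it is read from the
environment before the JVM starts, so in PySpark it should take effect as
long as it is set before the SparkContext is created. A minimal sketch; the
address is only a placeholder, taken from the log above, for a non-loopback
IP the workers can reach:

    import os
    from pyspark import SparkConf, SparkContext

    # a sketch: bind to a worker-reachable address instead of a loopback one
    os.environ["SPARK_LOCAL_IP"] = "10.74.149.251"  # placeholder address

    conf = SparkConf().setMaster("spark://[Master-URL]:7077").setAppName("PythonPi")
    sc = SparkContext(conf=conf)

Equivalently, export SPARK_LOCAL_IP in conf/spark-env.sh on the affected
machine.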

On Apr 24, 2014, at 11:36 AM, John King <us...@gmail.com> wrote:

> Same problem.
> 
> 
> On Thu, Apr 24, 2014 at 10:54 AM, Shubhabrata <ma...@gmail.com> wrote:
> Moreover, it seems all the workers are registered and have sufficient memory
> (2.7 GB, whereas I have asked for 512 MB). The UI also shows the jobs are
> running on the slaves. But on the terminal it is still the same error:
> "Initial job has not accepted any resources; check your cluster UI to ensure
> that workers are registered and have sufficient memory"
> 
> Please see the screenshot. Thanks
> 
> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n4761/33.png>
> 
> 
> 
> 


Re: Deploying a python code on a spark EC2 cluster

Posted by John King <us...@gmail.com>.
Same problem.


On Thu, Apr 24, 2014 at 10:54 AM, Shubhabrata <ma...@gmail.com> wrote:

> Moreover, it seems all the workers are registered and have sufficient memory
> (2.7 GB, whereas I have asked for 512 MB). The UI also shows the jobs are
> running on the slaves. But on the terminal it is still the same error:
> "Initial job has not accepted any resources; check your cluster UI to
> ensure that workers are registered and have sufficient memory"
>
> Please see the screenshot. Thanks
>
> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n4761/33.png>
>
>
>
>

Re: Deploying a python code on a spark EC2 cluster

Posted by Shubhabrata <ma...@gmail.com>.
Moreover, it seems all the workers are registered and have sufficient memory
(2.7 GB, whereas I have asked for 512 MB). The UI also shows the jobs are
running on the slaves. But on the terminal it is still the same error:
"Initial job has not accepted any resources; check your cluster UI to ensure
that workers are registered and have sufficient memory"

Please see the screenshot. Thanks

<http://apache-spark-user-list.1001560.n3.nabble.com/file/n4761/33.png> 




Re: Deploying a python code on a spark EC2 cluster

Posted by Shubhabrata <ma...@gmail.com>.
Spark Command: /usr/lib/jvm/java-1.7.0/bin/java -cp
:/root/ephemeral-hdfs/conf:/root/ephemeral-hdfs/conf:/root/ephemeral-hdfs/conf:/root/ephemeral-hdfs/conf:/root/spark/conf:/root/spark/assembly/target/scala-2.10/spark-assembly_2.10-0.9.1-hadoop1.0.4.jar
-Dspark.akka.logLifecycleEvents=true
-Djava.library.path=/root/ephemeral-hdfs/lib/native/ -Xms512m -Xmx512m
org.apache.spark.deploy.master.Master --ip
ec2-54-216-38-82.eu-west-1.compute.amazonaws.com --port 7077 --webui-port
8080
========================================

14/04/24 09:00:49 WARN util.Utils: Your hostname, spark-node resolves to a
loopback address: 127.0.0.1; using 10.74.149.251 instead (on interface eth0)
14/04/24 09:00:49 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to
another address
14/04/24 09:00:50 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/04/24 09:00:51 INFO Remoting: Starting remoting
14/04/24 09:00:51 INFO Remoting: Remoting started; listening on addresses
:[akka.tcp://sparkMaster@ec2-54-216-38-82.eu-west-1.compute.amazonaws.com:7077]
14/04/24 09:00:52 INFO master.Master: Starting Spark master at
spark://ec2-54-216-38-82.eu-west-1.compute.amazonaws.com:7077
14/04/24 09:00:52 INFO server.Server: jetty-7.x.y-SNAPSHOT
14/04/24 09:00:52 INFO handler.ContextHandler: started
o.e.j.s.h.ContextHandler{/metrics/master/json,null}
14/04/24 09:00:52 INFO handler.ContextHandler: started
o.e.j.s.h.ContextHandler{/metrics/applications/json,null}
14/04/24 09:00:52 INFO handler.ContextHandler: started
o.e.j.s.h.ContextHandler{/static,null}
14/04/24 09:00:52 INFO handler.ContextHandler: started
o.e.j.s.h.ContextHandler{/app/json,null}
14/04/24 09:00:52 INFO handler.ContextHandler: started
o.e.j.s.h.ContextHandler{/app,null}
14/04/24 09:00:52 INFO handler.ContextHandler: started
o.e.j.s.h.ContextHandler{/json,null}
14/04/24 09:00:52 INFO handler.ContextHandler: started
o.e.j.s.h.ContextHandler{*,null}
14/04/24 09:00:52 INFO server.AbstractConnector: Started
SelectChannelConnector@0.0.0.0:8080
14/04/24 09:00:52 INFO ui.MasterWebUI: Started Master web UI at
http://ip-10-74-149-251.eu-west-1.compute.internal:8080
14/04/24 09:00:52 INFO master.Master: I have been elected leader! New state:
ALIVE
14/04/24 09:01:16 INFO master.Master: Registering worker
ip-10-74-181-22.eu-west-1.compute.internal:44602 with 1 cores, 2.7 GB RAM
14/04/24 09:01:16 INFO master.Master: Registering worker
ip-10-75-4-137.eu-west-1.compute.internal:49042 with 1 cores, 2.7 GB RAM
14/04/24 09:01:16 INFO master.Master: Registering worker
ip-10-75-2-167.eu-west-1.compute.internal:60719 with 1 cores, 2.7 GB RAM
14/04/24 09:03:00 INFO master.Master: Registering app Pipleline FE
14/04/24 09:03:00 INFO master.Master: Registered app Pipleline FE with ID
app-20140424090300-0000
14/04/24 09:03:54 INFO master.Master: akka.tcp://spark@192.168.122.1:34717
got disassociated, removing it.
14/04/24 09:03:54 INFO master.Master: Removing app app-20140424090300-0000
14/04/24 09:03:54 INFO master.Master: akka.tcp://spark@192.168.122.1:34717
got disassociated, removing it.
14/04/24 09:03:54 INFO actor.LocalActorRef: Message
[akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from
Actor[akka://sparkMaster/deadLetters] to
Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%40188.142.228.166%3A53153-4#977853534]
was not delivered. [1] dead letters encountered. This logging can be turned
off or adjusted with configuration settings 'akka.log-dead-letters' and
'akka.log-dead-letters-during-shutdown'.
14/04/24 09:05:34 ERROR remote.EndpointWriter: AssociationError
[akka.tcp://sparkMaster@ec2-54-216-38-82.eu-west-1.compute.amazonaws.com:7077]
-> [akka.tcp://spark@192.168.122.1:34717]: Error [Association failed with
[akka.tcp://spark@192.168.122.1:34717]] [
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://spark@192.168.122.1:34717]
Caused by:
akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
connection timed out: /192.168.122.1:34717
]
14/04/24 09:05:34 INFO master.Master: akka.tcp://spark@192.168.122.1:34717
got disassociated, removing it.



Clearly, the workers are registered. What the log also shows, though, is the
master failing to associate with the driver at
akka.tcp://spark@192.168.122.1:34717 (connection timed out).


