Posted to user@spark.apache.org by Shubhabrata Roy <sh...@realeyesit.com> on 2014/04/24 15:40:59 UTC
Deploying a python code on a spark cluster
I have been stuck on an issue for the last two days and have not found any
solution after several hours of googling. Here are the details.
The following is a simple python code (Temp.py):
import sys
from random import random
from operator import add
from pyspark import SparkContext
from pyspark import SparkConf

if __name__ == "__main__":
    master = 'spark://ec2-54-216-38-82.eu-west-1.compute.amazonaws.com:7077'
    # sys.argv[1]
    conf = SparkConf()
    conf.setMaster(master)
    conf.setAppName("PythonPi")
    conf.set("spark.executor.memory", "2g")
    conf.set("spark.cores.max", "10")
    conf.setSparkHome("/root/spark")
    sc = SparkContext(conf=conf)

    slices = 2
    n = 100000 * slices

    def f(_):
        x = random() * 2 - 1
        y = random() * 2 - 1
        return 1 if x ** 2 + y ** 2 < 1 else 0

    count = sc.parallelize(xrange(1, n + 1), slices).map(f).reduce(add)
    print "Pi is roughly %f" % (4.0 * count / n)
    sc.stop()
I have Spark installed on my local machine, and when I run the code
locally with pyspark it works fine (master = 'local[5]' in the above
code).
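As an aside for anyone debugging a similar setup: the sampling logic can be
sanity-checked without Spark at all, which separates algorithm bugs from
cluster problems. A minimal standard-library sketch (`estimate_pi` is just an
illustrative name, not from the original message):

```python
from random import random

def estimate_pi(n):
    """Monte Carlo estimate of pi: 4 * (fraction of random points
    in the unit square that fall inside the unit circle)."""
    inside = 0
    for _ in range(n):
        x = random() * 2 - 1
        y = random() * 2 - 1
        if x ** 2 + y ** 2 < 1:
            inside += 1
    return 4.0 * inside / n

print("Pi is roughly %f" % estimate_pi(200000))
```

If this prints a value near 3.14, the remaining problem is cluster
configuration, not the computation itself.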
Next I installed Spark on EC2, where I can create a master and a number of
slaves for deploying my code. After I create a master I get its URL, which is
in the following format:
spark://ec2-54-216-38-82.eu-west-1.compute.amazonaws.com:7077. However, when
I run the script using pyspark (./bin/pyspark Temp.py) I get the following
warning:
TaskSchedulerImpl: Initial job has not accepted any resources; check your
cluster UI to ensure that workers are registered and have sufficient memory
I have checked in the UI that each worker has 2.7 GB of memory and is not
being used. Could you please give me any idea about this error?
Looking forward to hearing from you.
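[Editorial note, not part of the original message: this warning typically
means no worker can satisfy the job's resource request, or the workers cannot
reach the driver. One hedged guess at a configuration to try: with only
2.7 GB reported per worker, asking for 2g of executor memory leaves little
headroom, so requesting less rules that cause out. The specific values below
are illustrative, not a confirmed fix.]

```python
from pyspark import SparkConf

# Hedged sketch: request resources comfortably below what the UI reports
# per worker, so the scheduler can actually place executors.
conf = SparkConf()
conf.setMaster("spark://ec2-54-216-38-82.eu-west-1.compute.amazonaws.com:7077")
conf.setAppName("PythonPi")
conf.set("spark.executor.memory", "1g")  # below the 2.7 GB per-worker limit
conf.set("spark.cores.max", "4")         # stay within what the cluster offers
```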
On 24/04/14 15:14, neveroutgunned@hush.com wrote:
> Thanks for the info. It seems like the JTS library is exactly what I
> need (I'm not doing any raster processing at this point).
>
> So, once they successfully finish the Scala wrappers for JTS, I would
> theoretically be able to use Scala to write a Spark job that includes
> the JTS library, and then run it across a Spark cluster? That is
> absolutely fantastic!
>
> I'll have to look into contributing to the JTS wrapping effort.
>
> Thanks again!
>
>
> On 14-04-2014 at 2:25 PM, "Josh Marcus" <jm...@meetup.com> wrote:
>
> Hey there,
>
> I'd encourage you to check out the development currently going on
> with the GeoTrellis project
> (http://github.com/geotrellis/geotrellis) or talk to the
> developers on irc (freenode, #geotrellis) as they're
> currently developing raster processing capabilities with spark as
> a backend, as well as scala wrappers
> for JTS (for calculations about geometry features).
>
> --j
>
>
>
> On Wed, Apr 23, 2014 at 2:00 PM, neveroutgunned
> <ne...@hush.com> wrote:
>
> Greetings Spark users/devs! I'm interested in using Spark to
> process large volumes of data with a geospatial component, and
> I haven't been able to find much information on Spark's
> ability to handle this kind of operation. I don't need
> anything too complex; just distance between two points,
> point-in-polygon and the like.
>
> Does Spark (or possibly Shark) support this kind of query? Has
> anyone written a plugin/extension along these lines?
>
> If there isn't anything like this so far, then it seems like I
> have two options. I can either abandon Spark and fall back on
> Hadoop and Hive with the ESRI Tools extension, or I can stick
> with Spark and try to write/port a GIS toolkit. Which option
> do you think I should pursue? How hard is it for someone
> that's new to the Spark codebase to write an extension? Is
> there anyone else in the community that would be interested in
> having geospatial capability in Spark?
>
> Thanks for your help!
>
> ------------------------------------------------------------------------
> View this message in context: Is Spark a good choice for
> geospatial/GIS applications? Is a community volunteer needed in this area?
> <http://apache-spark-user-list.1001560.n3.nabble.com/Is-Spark-a-good-choice-for-geospatial-GIS-applications-Is-a-community-volunteer-needed-in-this-area-tp4685.html>
> Sent from the Apache Spark User List mailing list archive
> <http://apache-spark-user-list.1001560.n3.nabble.com/> at Nabble.com.
>
>
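[Editorial note: the thread above never shows code for the simple predicates
being asked about. A minimal sketch of one of them, a pure-Python ray-casting
point-in-polygon test with no GIS dependency, which could be shipped to
workers inside a Spark map(). The function name and polygon are illustrative
only, not from the thread.]

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: a point is inside a simple polygon iff a ray
    cast to the right from it crosses the boundary an odd number of times.
    polygon is a list of (x, y) vertex tuples."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon(2, 2, square))  # True
print(point_in_polygon(5, 5, square))  # False
```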