Posted to user@spark.apache.org by Shubhabrata Roy <sh...@realeyesit.com> on 2014/04/24 15:40:59 UTC
Deploying a python code on a spark cluster
I have been stuck on an issue for the last two days and have not found any
solution after several hours of googling. Here are the details.
The following is a simple python code (Temp.py):
import sys
from random import random
from operator import add
from pyspark import SparkContext
from pyspark import SparkConf

if __name__ == "__main__":
    master = 'spark://ec2-54-216-38-82.eu-west-1.compute.amazonaws.com:7077'
    # sys.argv[1]
    conf = SparkConf()
    conf.setMaster(master)
    conf.setAppName("PythonPi")
    conf.set("spark.executor.memory", "2g")
    conf.set("spark.cores.max", "10")
    conf.setSparkHome("/root/spark")
    sc = SparkContext(conf=conf)

    slices = 2
    n = 100000 * slices

    def f(_):
        x = random() * 2 - 1
        y = random() * 2 - 1
        return 1 if x ** 2 + y ** 2 < 1 else 0

    count = sc.parallelize(xrange(1, n + 1), slices).map(f).reduce(add)
    print "Pi is roughly %f" % (4.0 * count / n)
    sc.stop()
I have Spark installed on my local machine, and when I run the code
locally with pyspark it works fine (master = 'local[5]' in the above
code).
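As an aside for anyone debugging a similar setup: the sampling logic can be
sanity-checked without Spark at all, which separates algorithm bugs from
cluster problems. A minimal standard-library sketch (`estimate_pi` is just an
illustrative name, not from the original message):

```python
from random import random

def estimate_pi(n):
    """Monte Carlo estimate of pi: 4 * (fraction of random points
    in the unit square that fall inside the unit circle)."""
    inside = 0
    for _ in range(n):
        x = random() * 2 - 1
        y = random() * 2 - 1
        if x ** 2 + y ** 2 < 1:
            inside += 1
    return 4.0 * inside / n

print("Pi is roughly %f" % estimate_pi(200000))
```

If this prints a value near 3.14, the remaining problem is cluster
configuration, not the computation itself.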
Next I installed Spark on EC2, where I can create a master and a number of
slaves for deploying my code. After I create a master I get its URL, which is
in the following format:
spark://ec2-54-216-38-82.eu-west-1.compute.amazonaws.com:7077. However, when
I run the script using pyspark (./bin/pyspark Temp.py) I get the following
warning:
TaskSchedulerImpl: Initial job has not accepted any resources; check your
cluster UI to ensure that workers are registered and have sufficient memory
I have checked in the UI that each worker has 2.7 GB of memory and is not
being used. Could you please give me any idea about this error?
Looking forward to hearing from you.
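[Editorial note, not part of the original message: this warning typically
means no worker can satisfy the job's resource request, or the workers cannot
reach the driver. One hedged guess at a configuration to try: with only
2.7 GB reported per worker, asking for 2g of executor memory leaves little
headroom, so requesting less rules that cause out. The specific values below
are illustrative, not a confirmed fix.]

```python
from pyspark import SparkConf

# Hedged sketch: request resources comfortably below what the UI reports
# per worker, so the scheduler can actually place executors.
conf = SparkConf()
conf.setMaster("spark://ec2-54-216-38-82.eu-west-1.compute.amazonaws.com:7077")
conf.setAppName("PythonPi")
conf.set("spark.executor.memory", "1g")  # below the 2.7 GB per-worker limit
conf.set("spark.cores.max", "4")         # stay within what the cluster offers
```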
On 24/04/14 15:14, neveroutgunned@hush.com wrote:
> Thanks for the info. It seems like the JTS library is exactly what I
> need (I'm not doing any raster processing at this point).
>
> So, once they successfully finish the Scala wrappers for JTS, I would
> theoretically be able to use Scala to write a Spark job that includes
> the JTS library, and then run it across a Spark cluster? That is
> absolutely fantastic!
>
> I'll have to look into contributing to the JTS wrapping effort.
>
> Thanks again!
>
>
> On 14-04-2014 at 2:25 PM, "Josh Marcus" <jm...@meetup.com> wrote:
>
> Hey there,
>
> I'd encourage you to check out the development currently going on
> with the GeoTrellis project
> (http://github.com/geotrellis/geotrellis) or talk to the
> developers on irc (freenode, #geotrellis) as they're
> currently developing raster processing capabilities with spark as
> a backend, as well as scala wrappers
> for JTS (for calculations about geometry features).
>
> --j
>
>
>
> On Wed, Apr 23, 2014 at 2:00 PM, neveroutgunned
> <ne...@hush.com> wrote:
>
> Greetings Spark users/devs! I'm interested in using Spark to
> process large volumes of data with a geospatial component, and
> I haven't been able to find much information on Spark's
> ability to handle this kind of operation. I don't need
> anything too complex; just distance between two points,
> point-in-polygon and the like.
>
> Does Spark (or possibly Shark) support this kind of query? Has
> anyone written a plugin/extension along these lines?
>
> If there isn't anything like this so far, then it seems like I
> have two options. I can either abandon Spark and fall back on
> Hadoop and Hive with the ESRI Tools extension, or I can stick
> with Spark and try to write/port a GIS toolkit. Which option
> do you think I should pursue? How hard is it for someone
> that's new to the Spark codebase to write an extension? Is
> there anyone else in the community that would be interested in
> having geospatial capability in Spark?
>
> Thanks for your help!
>
> ------------------------------------------------------------------------
> View this message in context: Is Spark a good choice for
> geospatial/GIS applications? Is a community volunteer needed in this area?
> <http://apache-spark-user-list.1001560.n3.nabble.com/Is-Spark-a-good-choice-for-geospatial-GIS-applications-Is-a-community-volunteer-needed-in-this-area-tp4685.html>
> Sent from the Apache Spark User List mailing list archive
> <http://apache-spark-user-list.1001560.n3.nabble.com/> at Nabble.com.
>
>
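[Editorial note: the thread above never shows code for the simple predicates
being asked about. A minimal sketch of one of them, a pure-Python ray-casting
point-in-polygon test with no GIS dependency, which could be shipped to
workers inside a Spark map(). The function name and polygon are illustrative
only, not from the thread.]

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: a point is inside a simple polygon iff a ray
    cast to the right from it crosses the boundary an odd number of times.
    polygon is a list of (x, y) vertex tuples."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon(2, 2, square))  # True
print(point_in_polygon(5, 5, square))  # False
```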