Posted to user@spark.apache.org by neveroutgunned <ne...@hush.com> on 2014/04/23 20:00:37 UTC

Is Spark a good choice for geospatial/GIS applications? Is a community volunteer needed in this area?

Greetings Spark users/devs! I'm interested in using Spark to process
large volumes of data with a geospatial component, and I haven't been
able to find much information on Spark's ability to handle this kind
of operation. I don't need anything too complex; just distance between
two points, point-in-polygon and the like.
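To be concrete, both operations can be written in a few lines of plain Python (a hypothetical sketch of my own, not from any existing Spark library; functions like these could presumably be shipped to the workers inside a Spark map over an RDD of coordinates):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in km."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + \
        cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))  # 6371 km = mean Earth radius

def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Toggle when the edge (j, i) crosses a ray cast to the right of (x, y)
        if (yi > y) != (yj > y) and \
                x < (xj - xi) * (y - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside
```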

Does Spark (or possibly Shark) support this kind of query? Has anyone
written a plugin/extension along these lines?

If there isn't anything like this so far, then it seems like I have
two options. I can either abandon Spark and fall back on Hadoop and
Hive with the ESRI Tools extension, or I can stick with Spark and try
to write/port a GIS toolkit. Which option do you think I should
pursue? How hard is it for someone that's new to the Spark codebase to
write an extension? Is there anyone else in the community that would
be interested in having geospatial capability in Spark?

Thanks for your help!




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Is-Spark-a-good-choice-for-geospatial-GIS-applications-Is-a-community-volunteer-needed-in-this-area-tp4685.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Deploying a Python script on a Spark cluster

Posted by Shubhabrata Roy <sh...@realeyesit.com>.
I have been stuck on an issue for the last two days and have not found a
solution after several hours of googling. Here are the details.

The following is a simple Python script (Temp.py):

import sys
from random import random
from operator import add

from pyspark import SparkContext
from pyspark import SparkConf

if __name__ == "__main__":

    master = 'spark://ec2-54-216-38-82.eu-west-1.compute.amazonaws.com:7077'
    # master = sys.argv[1]
    conf = SparkConf()
    conf.setMaster(master)
    conf.setAppName("PythonPi")
    conf.set("spark.executor.memory", "2g")
    conf.set("spark.cores.max", "10")
    conf.setSparkHome("/root/spark")

    sc = SparkContext(conf=conf)

    slices = 2
    n = 100000 * slices

    def f(_):
        x = random() * 2 - 1
        y = random() * 2 - 1
        return 1 if x ** 2 + y ** 2 < 1 else 0

    count = sc.parallelize(xrange(1, n + 1), slices).map(f).reduce(add)
    print "Pi is roughly %f" % (4.0 * count / n)

    sc.stop()

I have Spark installed on my local machine, and when I run the code locally
it works fine with pyspark (master = 'local[5]' in the above code).
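(For reference, the Spark-free core of the computation can be sanity-checked with the standard library alone -- a sketch, not part of Temp.py itself:)

```python
from random import random

# Same Monte Carlo logic as Temp.py, minus Spark: sample points in the
# 2x2 square and count how many land inside the unit circle.
def inside_unit_circle(_):
    x = random() * 2 - 1
    y = random() * 2 - 1
    return 1 if x ** 2 + y ** 2 < 1 else 0

n = 200000
count = sum(inside_unit_circle(i) for i in range(n))
pi_estimate = 4.0 * count / n
```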

Next I installed Spark on EC2, where I can create a master and a number of
slaves for deploying my code. After I create a master I get its URL, which is
in the following format:
spark://ec2-54-216-38-82.eu-west-1.compute.amazonaws.com:7077. However, when
I run the script with pyspark (./bin/pyspark Temp.py) I get the following
warning:

 TaskSchedulerImpl: Initial job has not accepted any resources; check your
cluster UI to ensure that workers are registered and have sufficient memory

I have checked in the UI that each worker has 2.7 GB of memory and that none
of it has been used. Could you please give me any idea about this error?

Looking forward to hearing from you.





Re: Is Spark a good choice for geospatial/GIS applications? Is a community volunteer needed in this area?

Posted by ne...@hush.com.
Thanks for the info. It seems like the JTS library is exactly what I
need (I'm not doing any raster processing at this point).

So, once they successfully finish the Scala wrappers for JTS, I would
theoretically be able to use Scala to write a Spark job that includes
the JTS library, and then run it across a Spark cluster? That is
absolutely fantastic!

I'll have to look into contributing to the JTS wrapping effort.

Thanks again!


Re: Is Spark a good choice for geospatial/GIS applications? Is a community volunteer needed in this area?

Posted by Josh Marcus <jm...@meetup.com>.
Hey there,

I'd encourage you to check out the development currently going on with the
GeoTrellis project (http://github.com/geotrellis/geotrellis), or talk to the
developers on IRC (freenode, #geotrellis), as they're currently developing
raster-processing capabilities with Spark as a backend, as well as Scala
wrappers for JTS (for calculations on geometry features).

--j




Re: Is Spark a good choice for geospatial/GIS applications? Is a community volunteer needed in this area?

Posted by Debasish Das <de...@gmail.com>.
I am not sure what kind of operations you want... you could always put a
list of lat-long points inside a census block, add the census block as a
vertex in GraphX, and add edges within the census block...

Looks like a set of code that uses GraphX primitives...

Could you take a look at the GraphX APIs and see if they suffice for your
GIS use case? I have not looked into the ESRI extensions...
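In plain Python (GraphX itself is a Scala API), the vertex/edge layout I mean might look like this hypothetical sketch -- the block IDs and coordinates are made up:

```python
# Hypothetical data: each census block becomes a vertex carrying its
# list of (lat, lon) points; edges connect points within the same block.
census_blocks = {
    "block_a": [(40.71, -74.00), (40.72, -74.01), (40.73, -74.02)],
    "block_b": [(34.05, -118.24)],
}

vertices = list(census_blocks.items())

# One edge per pair of points that share a census block.
edges = [
    (block_id, i, j)
    for block_id, points in census_blocks.items()
    for i in range(len(points))
    for j in range(i + 1, len(points))
]
```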


