Posted to user@spark.apache.org by Rob Emanuele <lo...@gmail.com> on 2013/11/07 20:49:07 UTC

Spark and geospatial data

Hello,

I'm a developer on the GeoTrellis project (http://geotrellis.github.io). We
do fast raster processing over large data sets, from web-time (sub-100ms)
processing for live endpoints to distributed raster analysis over clusters
using Akka clustering.

There's currently discussion underway about moving to support a Spark
backend for doing large scale distributed raster analysis. You can see the
discussion here:
https://groups.google.com/forum/#!topic/geotrellis-user/wkUOhFwYAvc. Any
contributions to the discussion would be welcome.

My question to the list is: is there currently any development toward a
geospatial data story for Spark, that is, using Spark for large-scale
raster/vector spatial data analysis? Is anyone currently using Spark for
this sort of work?

Thanks,
Rob Emanuele

Re: Spark and geospatial data

Posted by Rob Emanuele <re...@azavea.com>.
Hi Andy,

There would be a large architectural design effort if we decided to support
Spark or to replace our current internal actor system with Spark. My thinking
is that the Spark DAG would be fully utilized for tracking lineage and
scheduling tasks in the Spark backend, while our current Actor system would
route operations using its own mechanisms. A lot of thought will have to go
into exactly where the API would split between the Spark backend and our own
dedicated Actor system backend, and some harmonization would need to happen:
we'd love to incorporate many of the great ideas Spark has for scheduling
tasks, while keeping local, high-speed use cases from running through
unnecessary machinery, for the sake of performance at small scale. This is
all in the early stages of consideration, so any input on design ideas is
very welcome!
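To make the split described above concrete, here is a minimal sketch, assuming each operation can be written once as a pure per-tile function and that a backend object decides how to run it. All names (`LocalBackend`, `PooledBackend`, `add_one`) are hypothetical and do not come from GeoTrellis, Akka, or Spark; a thread pool stands in for the distributed path, which in a real Spark backend would be something like `rdd.mapValues(op)`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a tile is a small 2D grid of cell values, keyed by its
# (col, row) position in a tiled raster layout.
Tile = List[List[int]]
TileKey = Tuple[int, int]
TileOp = Callable[[Tile], Tile]


class LocalBackend:
    """Runs an operation in-process with no scheduling or serialization
    overhead, suited to small, latency-sensitive requests."""

    def run(self, op: TileOp, tiles: Dict[TileKey, Tile]) -> Dict[TileKey, Tile]:
        return {key: op(tile) for key, tile in tiles.items()}


class PooledBackend:
    """Stand-in for a distributed backend: fans tiles out to a worker pool.
    A Spark backend would instead apply `op` to an RDD of (key, tile) pairs,
    e.g. with rdd.mapValues(op)."""

    def __init__(self, workers: int = 4) -> None:
        self.workers = workers

    def run(self, op: TileOp, tiles: Dict[TileKey, Tile]) -> Dict[TileKey, Tile]:
        keys = list(tiles)
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            # Executor.map yields results in input order, so keys line up.
            results = list(pool.map(op, (tiles[k] for k in keys)))
        return dict(zip(keys, results))


def add_one(tile: Tile) -> Tile:
    """A trivial per-tile operation, written once and shared by both backends."""
    return [[v + 1 for v in row] for row in tile]


if __name__ == "__main__":
    tiles = {(0, 0): [[1, 2], [3, 4]], (1, 0): [[5, 6], [7, 8]]}
    print(LocalBackend().run(add_one, tiles) == PooledBackend().run(add_one, tiles))
```

Because both backends accept the same function, the local path pays no scheduling cost, and the pooled path could in principle be swapped for a cluster-backed one without touching operation code.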

From the start, the aim of Spark support would be for all GeoTrellis
operations that currently support distribution over tiled rasters to be
supported in the Spark environment as well, so Map Algebra operations like
Classification would be carried over as a first step. As for feature
extraction and pyramid generation, GeoTrellis does not currently have these
operations (besides basic vectorization capabilities), as our focus has been
more on implementing fast Map Algebra operations, but they would certainly be
great additions to any geospatial data analysis library.
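Local Map Algebra operations such as classification compute each output cell from the corresponding input cell only, so every tile of a tiled raster can be processed independently; that is what makes them natural first candidates for a Spark port. Below is a minimal sketch of a break-based reclassification, assuming tiles are plain lists of integer rows keyed by (col, row) and using a hypothetical `NODATA` sentinel. The function names are illustrative and are not the GeoTrellis API.

```python
from bisect import bisect_left
from typing import Dict, List, Sequence, Tuple

NODATA = -9999  # hypothetical NODATA sentinel for integer rasters
Tile = List[List[int]]
TileKey = Tuple[int, int]


def classify_tile(tile: Tile, bounds: Sequence[int], classes: Sequence[int]) -> Tile:
    """Reclassify each cell independently.

    A value v gets classes[i] for the first i with v <= bounds[i]; `bounds`
    must be sorted ascending. Values above the last bound, and NODATA cells,
    become NODATA.
    """
    if len(bounds) != len(classes):
        raise ValueError("bounds and classes must have the same length")
    out: Tile = []
    for row in tile:
        new_row = []
        for v in row:
            i = bisect_left(bounds, v)  # index of the first bound >= v
            if v == NODATA or i == len(bounds):
                new_row.append(NODATA)
            else:
                new_row.append(classes[i])
        out.append(new_row)
    return out


def classify_layer(
    tiles: Dict[TileKey, Tile], bounds: Sequence[int], classes: Sequence[int]
) -> Dict[TileKey, Tile]:
    """Apply the per-tile classification to every tile; no tile needs its neighbors."""
    return {key: classify_tile(tile, bounds, classes) for key, tile in tiles.items()}


if __name__ == "__main__":
    layer = {(0, 0): [[5, 10, 11], [25, 31, NODATA]]}
    print(classify_layer(layer, bounds=[10, 20, 30], classes=[1, 2, 3]))
    # {(0, 0): [[1, 1, 2], [3, -9999, -9999]]}
```

On a Spark RDD of (key, tile) pairs, the same per-tile function could be applied with `mapValues`, which leaves the keys (and any existing partitioning) untouched.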

Thanks for your ideas, and looking forward to your participation.

Cheers,
Rob


On Thu, Nov 7, 2013 at 3:05 PM, andy petrella <an...@gmail.com> wrote:

> Hello Rob,
>
> As you may know, I have long experience with geospatial data, and I'm now
> investigating Spark... So I'll be very interested in further answers, and
> also in helping to take this great idea forward!
>
> For instance, I'd say that implementing classical geospatial algorithms
> like classification, feature extraction, pyramid generation and so on as a
> geo-extension library for Spark would be easier using the GeoTrellis API.
>
> My only question, for now: GeoTrellis has its own notion of lineage, and
> so does Spark, so maybe some harmonization work will be needed to
> serialize and schedule them? Maybe Pickles could help with the
> serialization part...
>
> Sorry if I missed something (or even said something silly ^^)... I'm
> heading over to the thread you mentioned now!
>
> Looking forward ;)
>
> Cheers
> andy


-- 
Rob Emanuele, GIS Software Engineer

Azavea |  340 N 12th St, Ste 402, Philadelphia, PA
remanuele@azavea.com  | T 215.701.7692  | F 215.925.2663
Web azavea.com <http://www.azavea.com/>  |  Blog azavea.com/blogs <http://www.azavea.com/Blogs>  |  Twitter @azavea <http://twitter.com/azavea>

Re: Spark and geospatial data

Posted by andy petrella <an...@gmail.com>.
Hello Rob,

As you may know, I have long experience with geospatial data, and I'm now
investigating Spark... So I'll be very interested in further answers, and
also in helping to take this great idea forward!

For instance, I'd say that implementing classical geospatial algorithms
like classification, feature extraction, pyramid generation and so on as a
geo-extension library for Spark would be easier using the GeoTrellis API.
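Pyramid generation, one of the algorithms mentioned above, fits a key-grouping pattern: each tile at one zoom level contributes to the parent tile (col // 2, row // 2) at the next coarser level, and each parent cell is aggregated from the 2 x 2 child cells beneath it. The sketch below assumes square n x n tiles, mean aggregation, and `None` as NODATA; `downsample_level` is a hypothetical name, not a GeoTrellis or Spark function. In Spark, the grouping step would correspond to a keyed shuffle such as `groupByKey`.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

TileKey = Tuple[int, int]           # (col, row) within one zoom level
Tile = List[List[Optional[float]]]  # n x n cells indexed [row][col]; None marks NODATA


def downsample_level(tiles: Dict[TileKey, Tile], n: int) -> Dict[TileKey, Tile]:
    """Build the next-coarser pyramid level.

    Each parent tile (col // 2, row // 2) covers a 2 x 2 block of child tiles,
    and each parent cell is the mean of the 2 x 2 child cells beneath it,
    ignoring NODATA. Missing child tiles are treated as all-NODATA.
    """
    # Step 1: group children under their parent key, remembering which
    # quadrant (dx, dy) each child occupies (in Spark: map + groupByKey).
    groups: Dict[TileKey, Dict[Tuple[int, int], Tile]] = defaultdict(dict)
    for (col, row), tile in tiles.items():
        groups[(col // 2, row // 2)][(col % 2, row % 2)] = tile

    # Step 2: for each parent, average 2 x 2 cells of the 2n x 2n child mosaic.
    parents: Dict[TileKey, Tile] = {}
    for parent_key, children in groups.items():
        parent: Tile = [[None] * n for _ in range(n)]
        for py in range(n):
            for px in range(n):
                values = []
                for my in (2 * py, 2 * py + 1):
                    for mx in (2 * px, 2 * px + 1):
                        child = children.get((mx // n, my // n))
                        if child is not None:
                            v = child[my % n][mx % n]
                            if v is not None:
                                values.append(v)
                if values:
                    parent[py][px] = sum(values) / len(values)
        parents[parent_key] = parent
    return parents


if __name__ == "__main__":
    level = {
        (0, 0): [[1, 1], [1, 1]],
        (1, 0): [[2, 2], [2, 2]],
        (0, 1): [[3, 3], [3, 3]],
        (1, 1): [[4, 4], [4, 4]],
    }
    print(downsample_level(level, n=2))
    # {(0, 0): [[1.0, 2.0], [3.0, 4.0]]}
```

Applying the function repeatedly until a single tile remains yields the full pyramid; mean aggregation suits continuous data, while categorical rasters would more likely use a mode.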

My only question, for now: GeoTrellis has its own notion of lineage, and so
does Spark, so maybe some harmonization work will be needed to serialize and
schedule them? Maybe Pickles could help with the serialization part...
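"Lineage" here means recording how each result was derived, so that a lost result can be recomputed on demand instead of being stored redundantly; Spark's RDDs recover lost partitions this way. The toy graph below is purely illustrative and is not how GeoTrellis or Spark represent operations internally; the `Op` class is hypothetical. One possible reading of the harmonization question is whether each such node could be translated into an equivalent RDD transformation.

```python
from typing import Any, Callable, List, Optional


class Op:
    """A node in a hypothetical lineage graph: a function plus the parent ops
    whose results it consumes. Nothing is computed until `evaluate` is called."""

    def __init__(self, name: str, fn: Callable[..., Any], *parents: "Op") -> None:
        self.name = name
        self.fn = fn
        self.parents = parents
        self._cache: Optional[Any] = None
        self._computed = False

    def evaluate(self) -> Any:
        # Compute lazily from the parents; a dropped cache is recovered by
        # replaying the recorded lineage rather than from a stored copy.
        if not self._computed:
            args = [p.evaluate() for p in self.parents]
            self._cache = self.fn(*args)
            self._computed = True
        return self._cache

    def invalidate(self) -> None:
        """Drop the cached result, as if the worker holding it had been lost."""
        self._cache = None
        self._computed = False

    def lineage(self, depth: int = 0) -> List[str]:
        """This op and its ancestors, indented by depth (cf. RDD.toDebugString)."""
        lines = ["  " * depth + self.name]
        for p in self.parents:
            lines.extend(p.lineage(depth + 1))
        return lines


if __name__ == "__main__":
    load = Op("load", lambda: [1, 2, 3])
    scaled = Op("scale", lambda xs: [2 * x for x in xs], load)
    total = Op("sum", sum, scaled)
    print(total.evaluate())  # 12
    print("\n".join(total.lineage()))
```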

Sorry if I missed something (or even said something silly ^^)... I'm heading
over to the thread you mentioned now!

Looking forward ;)

Cheers
andy



Re: Spark and geospatial data

Posted by "Mattmann, Chris A (398J)" <ch...@jpl.nasa.gov>.
Guh, this reply didn't go through, so I'm sending it again.


-----Original Message-----
From: jpluser <ma...@apache.org>
Date: Thursday, November 7, 2013 11:50 AM
To: "user@spark.incubator.apache.org" <us...@spark.incubator.apache.org>
Cc: <de...@sis.apache.org>
Subject: Re: Spark and geospatial data

>Hey Rob,
>
>That is awesome to hear! We have an effort going on in Apache SIS right
>now (http://sis.apache.org/), and we've had some discussions, but no one
>has implemented a Spark backend yet. I'd love to see it. It would be great
>to see how GeoTrellis fits in.
>
>CC'ing the dev@sis.a.o list to loop them in.
>
>Cheers,
>Chris


Re: Spark and geospatial data

Posted by Chris Mattmann <ma...@apache.org>.
Hey Rob,

That is awesome to hear! We have an effort going on in Apache SIS right now
(http://sis.apache.org/), and we've had some discussions, but no one has
implemented a Spark backend yet. I'd love to see it. It would be great to
see how GeoTrellis fits in.

CC'ing the dev@sis.a.o list to loop them in.

Cheers,
Chris



