Posted to dev@sis.apache.org by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov> on 2012/03/17 03:21:01 UTC

Fwd: A GIS contains() for Hive?

All, FYI...

Begin forwarded message:

> From: "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>
> Date: March 16, 2012 6:49:58 PM PDT
> To: "user@hive.apache.org" <us...@hive.apache.org>
> Cc: "hive-user@hadoop.apache.org" <hi...@hadoop.apache.org>
> Subject: Re: A GIS contains() for Hive?
> Reply-To: "user@hive.apache.org" <us...@hive.apache.org>
> 
> Hi Tim,
> 
> Over in the SIS community [1], eventually writing a driver for Hive or HBase to have spatial
> support a la PostGIS is something that we've wanted to get around to, but haven't yet. The 
> goal of SIS is to be an ALv2 licensed spatial toolkit, with no surprises [2]. If you are interested
> in contributing to the SIS community and helping out, I'd certainly appreciate it. I'd also
> appreciate help from anyone in the Hive community who has time to help us write the Hive driver for SIS.
> We currently have the ability to support point/radius and bbox QuadTree based searches, and
> the loading of GeoRSS data into the QuadTree index.
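
For readers unfamiliar with the two search types mentioned above, here is a minimal sketch of a point quadtree with bbox and point/radius queries. It is not the SIS implementation or its API: the QuadTree class, its method names, the capacity parameter, and the use of planar (not great-circle) distance are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class QuadTree:
    """Hypothetical point quadtree; bounds are inclusive on all sides."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float
    capacity: int = 4  # points held by a node before it splits
    points: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def insert(self, x, y, item=None):
        # Reject points outside this node's bounds.
        if not (self.min_x <= x <= self.max_x and self.min_y <= y <= self.max_y):
            return False
        if not self.children:
            if len(self.points) < self.capacity:
                self.points.append((x, y, item))
                return True
            self._split()
        # Node is full: hand the point to the first child that accepts it.
        return any(child.insert(x, y, item) for child in self.children)

    def _split(self):
        # Create four equal quadrants; existing points stay in this node.
        mid_x = (self.min_x + self.max_x) / 2
        mid_y = (self.min_y + self.max_y) / 2
        self.children = [
            QuadTree(self.min_x, self.min_y, mid_x, mid_y, self.capacity),
            QuadTree(mid_x, self.min_y, self.max_x, mid_y, self.capacity),
            QuadTree(self.min_x, mid_y, mid_x, self.max_y, self.capacity),
            QuadTree(mid_x, mid_y, self.max_x, self.max_y, self.capacity),
        ]

    def query_bbox(self, min_x, min_y, max_x, max_y):
        # Prune any node whose bounds do not intersect the query box.
        if (max_x < self.min_x or min_x > self.max_x
                or max_y < self.min_y or min_y > self.max_y):
            return []
        hits = [p for p in self.points
                if min_x <= p[0] <= max_x and min_y <= p[1] <= max_y]
        for child in self.children:
            hits.extend(child.query_bbox(min_x, min_y, max_x, max_y))
        return hits

    def query_radius(self, cx, cy, r):
        # Bbox prefilter, then an exact planar distance check (assumption:
        # coordinates are treated as flat, not as points on a sphere).
        candidates = self.query_bbox(cx - r, cy - r, cx + r, cy + r)
        return [p for p in candidates
                if math.hypot(p[0] - cx, p[1] - cy) <= r]
```

The radius query reuses the bbox query as a coarse filter, so only points inside the enclosing square pay for the distance computation.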
> 
> Cheers,
> Chris
> 
> [1] http://incubator.apache.org/sis/
> [2] http://wiki.apache.org/incubator/SpatialProposal/
> 
> On Mar 16, 2012, at 2:21 AM, Tim Robertson wrote:
> 
>> Hi all,
>> 
>> I need to perform a lot of "point in polygon" checks and want to use Hive (currently I mix Hive, Sqoop and PostGIS in an Oozie workflow to do this).
>> 
>> In an ideal world, I would like to create a Hive table from a Shapefile containing polygons, and then do the likes of the following:
>> 
>>  SELECT p.id, pp.id FROM points p, polygons pp WHERE pp.contains(geom, toPoint(p.lat,p.lng)) 
>> 
>> Has anyone done anything along these lines?
>> 
>> Alternatively, I could write a UDF that reads the shapefile into memory and does a map-side join using something like a slab decomposition technique.  It is more limited but would meet my needs, allowing e.g.:
>> 
>>  SELECT contains(p.lat,p.lng, '/data/shapefiles/countries.shp') FROM points;
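
The core test such a contains() UDF would perform can be sketched as follows. This uses plain even-odd ray casting rather than the slab decomposition mentioned above, since it is shorter to show. Reading the shapefile is omitted, and the polygons mapping, the function names, and the single-ring polygons (no holes, no multipart shapes) are assumptions for illustration only.

```python
def point_in_polygon(x, y, ring):
    """Even-odd ray casting. ring is a list of (x, y) vertices; the last
    vertex is implicitly connected back to the first."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Does the edge from vertex j to vertex i straddle the horizontal
        # line through y?
        if (yi > y) != (yj > y):
            # x coordinate where the edge crosses that line.
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            # Count only crossings to the right of the point.
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def contains(lat, lng, polygons):
    """Return the id of the first polygon containing the point, or None.

    polygons is a hypothetical in-memory mapping of id -> ring, standing in
    for whatever would be loaded from the shapefile. Rings are assumed to
    store x as longitude and y as latitude.
    """
    for poly_id, ring in polygons.items():
        if point_in_polygon(lng, lat, ring):
            return poly_id
    return None
```

Each call scans every polygon, so for a large shapefile a bounding-box prefilter or a spatial index in front of this test would be the natural next step.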
>> 
>> Before I start I thought I'd ask folks as I suspect people are doing this kind of thing on Hive by now (thinking FB and user profiling by political boundaries etc)
>> 
>> I'd love to hear from anyone who's investigated this or could provide any advice.
>> 
>> Thanks!
>> Tim
>> 
> 
> 
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattmann@nasa.gov
> WWW:   http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 

