Posted to user@hbase.apache.org by Fred Zappert <fz...@gmail.com> on 2009/06/19 21:22:21 UTC

Spatial Databases on HBase (or Hadoop)

Hi,

I would like to know if anyone is using HBase for spatial databases.

The requirements are relatively simple.

1. Two dimensions.
2. Each object represented as a point.
3. Basic query is nearest neighbor, with a few qualifications such as:
   a: Member of the same group.
   b: Status

Re: Spatial Databases on HBase (or Hadoop)

Posted by Fred Zappert <fz...@gmail.com>.
Hi - completing the message....

On Fri, Jun 19, 2009 at 2:22 PM, Fred Zappert <fz...@gmail.com> wrote:

> Hi,
>
> I would like to know if anyone is using HBase for spatial databases.
>
> The requirements are relatively simple.
>
> 1. Two dimensions.
> 2. Each object represented as a point.
> 3. Basic query is nearest neighbor, with a few qualifications such as:
>    a: Member of the same group.

       b: Status

Thanks,

Fred.
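[Editor's sketch] The requirements above (2-D points, nearest neighbour, qualifications on group and status) can be illustrated with a brute-force query. Everything here — the Point class, the group and status fields — is hypothetical, not from any schema discussed in the thread; an index would replace the linear scan at scale.

```java
import java.util.List;

// Brute-force nearest neighbour over 2-D points, filtered on the two
// qualifications in the question (group membership and status).
// All names are illustrative.
public class NearestNeighbour {

    static final class Point {
        final double x, y;
        final String group;
        final boolean active;   // "status" qualification, simplified to a flag
        Point(double x, double y, String group, boolean active) {
            this.x = x; this.y = y; this.group = group; this.active = active;
        }
    }

    // Nearest point to (qx, qy) that is in `group` and active, or null
    // when nothing qualifies.  O(n); a spatial index (grid cells, R-Tree)
    // would prune candidates before this loop at scale.
    static Point nearest(List<Point> points, double qx, double qy, String group) {
        Point best = null;
        double bestD2 = Double.POSITIVE_INFINITY;
        for (Point p : points) {
            if (!p.active || !p.group.equals(group)) continue;  // qualifications
            double dx = p.x - qx, dy = p.y - qy;
            double d2 = dx * dx + dy * dy;                      // squared distance
            if (d2 < bestD2) { bestD2 = d2; best = p; }
        }
        return best;
    }
}
```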

Re: Spatial Databases on HBase (or Hadoop)

Posted by Fred Zappert <fz...@gmail.com>.
Tim,

Thanks again for drilling into a very sound solution.

This problem is further partitioned because the vehicles and way-points
belong to fleets, and there are several thousand fleets being tracked.

I need to look into the current implementation to see whether any
prediction is going on, because the current reporting interval for the
vehicles is 15 minutes.

However, part of the architecture we're developing is intended to deal
with many more vehicles, and a reporting interval of several times per
minute.

I would also expect that many way-points, such as weigh stations and
loading docks, are common to multiple fleets.

I'm new to the map-reduce paradigm, and this is a great example of its
utility.  Most GIS databases are extensions to traditional relational
databases (Oracle, Postgres, and MySQL), and it's nice to see that those
are not needed, at least for this application.

Regards,

Fred.
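[Editor's sketch] One hypothetical way the fleet partitioning Fred describes could surface in an HBase row key, so that a scan over one fleet (and optionally one grid cell) stays contiguous. The fleetId/cellId/vehicleId layout is an assumption for illustration, not the list's design:

```java
// Hypothetical row-key layout: fleet, then grid cell, then vehicle.
// Zero-padded fixed-width fields keep lexicographic order equal to
// numeric order, so prefix scans pick out a fleet or a fleet+cell.
public class FleetRowKey {

    static String rowKey(int fleetId, int cellId, String vehicleId) {
        return String.format("%06d/%08d/%s", fleetId, cellId, vehicleId);
    }

    // Scan prefix for "all vehicles of one fleet in one cell".
    static String scanPrefix(int fleetId, int cellId) {
        return String.format("%06d/%08d/", fleetId, cellId);
    }
}
```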

On Sat, Jun 20, 2009 at 6:45 AM, tim robertson <ti...@gmail.com> wrote:

> Hi Fred,
>
> So I am guessing then your "real time" calculations are all going to
> be focused about the moving vehicles right?
> If the way-points are relatively static you can preprocess information
> about those offline (distance between each, data mining average time
> taken to travel between 2  etc).
>
> So I am guessing you would need to find way-points relative to a given
> vehicle - if this is the case, I think you are going to need to
> investigate some kind of index for the way-points.  We do this for our
> 150 million points by putting them in an identified 1 degree x 1
> degree cell (and then 0.1 x 0.1 degree cells), so that if someone is
> interested in points near a location, we first determine which cells
> are candidates and immediately we have reduced the candidate points to
> check.
>
> In database terms, we have latitude, longitude and then create a
> (cell_id int, centi_cell_id int).
>
> If you know the routes that a vehicle is taking, is there any way you
> could preplan it's route perhaps and cache that, or store somehow
> known routes between way-points?  This might allow you to really
> reduce the candidates to check.
>
> Just some ideas
>
> Tim
> skype: timrobertson100

Re: Spatial Databases on HBase (or Hadoop)

Posted by tim robertson <ti...@gmail.com>.
Hi Fred,

So I am guessing then your "real time" calculations are all going to
be focused on the moving vehicles, right?
If the way-points are relatively static you can preprocess information
about those offline (distance between each, data mining the average time
taken to travel between two, etc.).

So I am guessing you would need to find way-points relative to a given
vehicle - if this is the case, I think you are going to need to
investigate some kind of index for the way-points.  We do this for our
150 million points by putting them in an identified 1 degree x 1
degree cell (and then 0.1 x 0.1 degree cells), so that if someone is
interested in points near a location, we first determine which cells
are candidates and immediately we have reduced the candidate points to
check.

In database terms, we have latitude, longitude and then create a
(cell_id int, centi_cell_id int).
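[Editor's sketch] The two-level cell scheme described above might look like this in code; the exact id formulas are an assumption, only the 1-degree / 0.1-degree grid idea is from the message:

```java
// Two-level grid ids: each point falls into a 1x1 degree cell, and into
// a 0.1x0.1 degree "centi" cell within it.  Candidate lookup then means
// selecting the handful of cells overlapping the search radius before
// any exact distance test.
public class CellIndex {

    // 1x1 degree cell id: row-major over a 180x360 grid of the globe.
    static int cellId(double lat, double lon) {
        int row = (int) Math.floor(lat) + 90;    // 0..179
        int col = (int) Math.floor(lon) + 180;   // 0..359
        return row * 360 + col;
    }

    // 0..99 index of the 0.1-degree sub-cell inside the 1-degree cell.
    static int centiCellId(double lat, double lon) {
        int row = (int) Math.floor((lat - Math.floor(lat)) * 10);  // 0..9
        int col = (int) Math.floor((lon - Math.floor(lon)) * 10);  // 0..9
        return row * 10 + col;
    }
}
```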

If you know the routes that a vehicle is taking, is there any way you
could preplan its route perhaps and cache that, or somehow store
known routes between way-points?  This might allow you to really
reduce the candidates to check.

Just some ideas

Tim
skype: timrobertson100





On Fri, Jun 19, 2009 at 10:16 PM, Fred Zappert <fz...@gmail.com> wrote:
> Tim,
>
> Thanks so much for the additional links.
>
> Our problem is for the moment much smaller - 4,000,000 mapped way-points,
> and 80,000 moving vehicles.
>
> Clustering the way-points into polygons makes a lot of sense.
>
> Fred.

Re: Spatial Databases on HBase (or Hadoop)

Posted by Fred Zappert <fz...@gmail.com>.
Tim,

Thanks so much for the additional links.

Our problem is for the moment much smaller - 4,000,000 mapped way-points,
and 80,000 moving vehicles.

Clustering the way-points into polygons makes a lot of sense.

Fred.

On Fri, Jun 19, 2009 at 2:43 PM, tim robertson <ti...@gmail.com> wrote:

> Hi Fred,
>
> I was working on 150million point records, and 150,000 fairly detailed
> polygons.  I had to batch it up and do 40,000 polygons in memory at a
> time on the MapReduce jobs.
>
> If you are dealing with a whole bunch of points, might it be worth
> clustering them into polygons first to get candidate points?
> We are running this:
> http://code.flickr.com/blog/2008/10/30/the-shape-of-alpha/ and
> clustering 1 million points into multipolygons in 5 seconds.  This
> might get the numbers down to a sensible number.
>
> It is a problem of great interest to us also, so happy to discuss
> ideas...
> http://biodivertido.blogspot.com/2008/11/reproducing-spatial-joins-using-hadoop.html
> was one of my early tests.
>
> Cheers
>
> Tim

Re: Spatial Databases on HBase (or Hadoop)

Posted by tim robertson <ti...@gmail.com>.
Hi Fred,

I was working on 150 million point records, and 150,000 fairly detailed
polygons.  I had to batch it up and do 40,000 polygons in memory at a
time in the MapReduce jobs.

If you are dealing with a whole bunch of points, might it be worth
clustering them into polygons first to get candidate points?
We are running this:
http://code.flickr.com/blog/2008/10/30/the-shape-of-alpha/ and
clustering 1 million points into multipolygons in 5 seconds.  This
might get the numbers down to a sensible number.
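[Editor's sketch] The alpha-shape clustering in the linked Flickr post is far more refined than fits here, but the effect Tim describes — collapsing a million points into a small set of candidate regions — can be crudely approximated with the same grid cells: bucket points per 0.1-degree cell and keep one bounding box per occupied cell. Everything below is a stand-in for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Crude stand-in for alpha-shape clustering: one bounding box per
// occupied 0.1-degree cell.  Candidate filtering then tests a point
// against a few boxes instead of a million raw points.
public class GridCluster {

    static final class BBox {
        double minLat = 90, maxLat = -90, minLon = 180, maxLon = -180;
        void add(double lat, double lon) {
            minLat = Math.min(minLat, lat); maxLat = Math.max(maxLat, lat);
            minLon = Math.min(minLon, lon); maxLon = Math.max(maxLon, lon);
        }
    }

    // points[i] = {lat, lon}; returns one bbox per occupied 0.1-degree cell,
    // keyed by the cell's (latIndex, lonIndex) packed into a long.
    static Map<Long, BBox> cluster(double[][] points) {
        Map<Long, BBox> cells = new HashMap<>();
        for (double[] p : points) {
            long key = ((long) Math.floor(p[0] * 10) << 32)
                     | (((long) Math.floor(p[1] * 10)) & 0xffffffffL);
            cells.computeIfAbsent(key, k -> new BBox()).add(p[0], p[1]);
        }
        return cells;
    }
}
```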

It is a problem of great interest to us also, so happy to discuss
ideas... http://biodivertido.blogspot.com/2008/11/reproducing-spatial-joins-using-hadoop.html
was one of my early tests.

Cheers

Tim


On Fri, Jun 19, 2009 at 9:37 PM, Fred Zappert <fz...@gmail.com> wrote:
> Tim,
>
> Thanks. That suggests an implementation that could be very effective at the
> current scale.
>
> Regards,
>
> Fred.

Re: Spatial Databases on HBase (or Hadoop)

Posted by Fred Zappert <fz...@gmail.com>.
Tim,

Thanks. That suggests an implementation that could be very effective at the
current scale.

Regards,

Fred.

On Fri, Jun 19, 2009 at 2:27 PM, tim robertson <ti...@gmail.com> wrote:

> I've used it as a source for a bunch of point data, and then tested
> them in polygons with a contains().  I ended up loading the polygons
> into memory with an RTree index though using the GeoTools libraries.
>
> Cheers
>
> Tim

Re: Spatial Databases on HBase (or Hadoop)

Posted by tim robertson <ti...@gmail.com>.
I've used it as a source for a bunch of point data, and then tested
them against polygons with contains().  I ended up loading the polygons
into memory with an R-Tree index, though, using the GeoTools libraries.

Cheers

Tim

