Posted to java-user@lucene.apache.org by Stephane Nicoll <st...@gmail.com> on 2008/02/17 11:22:01 UTC

Using lucene with a Geospatial catalog

Hi,

I've been browsing the archive and the documentation about Lucene. It
really seems that it could help me implement my use case, but I would
like to be sure first.

What I need is to be able to search data in a "catalog" which is
geo-enabled. The data is stored in a database. A record has the
following fields:

* name
* keywords
* footprint (a geometry that represents the record)
* date range (optional) that defines the "validity of the data"
* timestamp
* creation date
* various boolean flags
* custom data

I understand that Lucene is powerful for full-text search, but what
about a query like "give me all records that contain the keyword foo,
with flag bar set to true, valid between 20070602 and 20070907, and
whose geometry intersects a given box"?

I've seen people on the list using tricks such as storing the
coordinates in a form that allows range queries. In my case, the
geometry is potentially very complex. The database handles that for me
(Oracle Spatial or PostGIS, for instance) with intersects, contains
and such. Is it possible to combine a Lucene search with a DB query?
Are there any best practices on that topic?

Thanks,
Stéphane


-- 
"Large Systems Suck: This rule is 100% transitive. If you build one,
you suck" -- S.Yegge

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Using lucene with a Geospatial catalog

Posted by Stephane Nicoll <st...@gmail.com>.
Hi,

Thanks for the fast answer. That's also what I sort of figured out by
searching on the web but it's good to know someone has implemented it
:)

The "rough" positioning is a very good idea, thanks. I am pretty sure
we have the kind of algorithm you are looking for, but it is in a
commercial product. Ping me off-list if you are interested.

Stéphane

On Feb 17, 2008 12:43 PM, Max Metral <ma...@artsalliancelabs.com> wrote:
> We're doing this for our site (http://boston.povo.com) the simple way: have Lucene return all matches based on non-geo criteria, then fetch the items from the db by id and run our geo logic.  We store some "rough" positioning in Lucene, such as the region, and use that for first-level rejection.  (If your geometries can get artificially large then of course this doesn't fully work, but in that case perhaps you store "total area" or some flag that says this item spans a region.)
>
> We briefly considered a bounding box model for Lucene to do better first-level rejection, but performance was more than good enough with the approach I outlined above for our corpus.
>
> There's one thing we never implemented, which was calculating the minimum distance between two geometries (we almost always have one side of the comparison as a point).  Do you happen to know a reasonably speedy algorithm to do this?
>
> Thanks!
> --Max





RE: Using lucene with a Geospatial catalog

Posted by Max Metral <ma...@artsalliancelabs.com>.
We're doing this for our site (http://boston.povo.com) the simple way: have Lucene return all matches based on non-geo criteria, then fetch the items from the db by id and run our geo logic.  We store some "rough" positioning in Lucene, such as the region, and use that for first-level rejection.  (If your geometries can get artificially large then of course this doesn't fully work, but in that case perhaps you store "total area" or some flag that says this item spans a region.)
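
A minimal sketch of the two-step flow described above, assuming the text index has already returned its hits. Candidate, TwoPhaseSearch and the exactGeoTest callback are hypothetical names for illustration, not part of Lucene or of any code from this thread; in practice the callback would wrap a database lookup by id (for instance a spatial intersects test in Oracle Spatial or PostGIS).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.LongPredicate;

/** Hypothetical hit as it might come back from the text index. */
final class Candidate {
    final long id;        // primary key of the record in the database
    final String region;  // coarse "rough positioning" label stored with the text fields

    Candidate(long id, String region) {
        this.id = id;
        this.region = region;
    }
}

final class TwoPhaseSearch {
    /**
     * Keeps the ids of the text hits that survive both geo stages.
     *
     * @param textHits       hits matching the non-geo criteria
     * @param queryRegions   regions that the query area touches
     * @param exactGeoTest   exact geometry check, assumed to be backed by the database
     */
    static List<Long> refine(List<Candidate> textHits,
                             Set<String> queryRegions,
                             LongPredicate exactGeoTest) {
        List<Long> result = new ArrayList<>();
        for (Candidate c : textHits) {
            // First-level rejection: cheap, uses only data stored in the index.
            if (!queryRegions.contains(c.region)) {
                continue;
            }
            // Second level: the expensive exact test, run only on survivors.
            if (exactGeoTest.test(c.id)) {
                result.add(c.id);
            }
        }
        return result;
    }
}
```

The point of the ordering is that the expensive geometry test only runs on records that already passed the text criteria and the coarse region check.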

We briefly considered a bounding box model for Lucene to do better first-level rejection, but performance was more than good enough with the approach I outlined above for our corpus.
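
For reference, a bounding box model could look roughly like the sketch below: each footprint is reduced to its envelope (minX, minY, maxX, maxY), and two envelopes overlap when their intervals overlap on both axes. The BoundingBox class is a hypothetical illustration in plain Java with planar coordinates assumed; it is not a Lucene API. Each of the four comparisons is a simple range condition, which is why the four values could be stored as separate range-searchable fields.

```java
/** Hypothetical axis-aligned envelope of a footprint (planar coordinates assumed). */
final class BoundingBox {
    final double minX, minY, maxX, maxY;

    BoundingBox(double minX, double minY, double maxX, double maxY) {
        this.minX = minX;
        this.minY = minY;
        this.maxX = maxX;
        this.maxY = maxY;
    }

    /** Computes the envelope of a footprint given as {x, y} vertex pairs. */
    static BoundingBox of(double[][] points) {
        if (points.length == 0) {
            throw new IllegalArgumentException("empty footprint");
        }
        double minX = points[0][0], maxX = minX;
        double minY = points[0][1], maxY = minY;
        for (double[] p : points) {
            minX = Math.min(minX, p[0]);
            maxX = Math.max(maxX, p[0]);
            minY = Math.min(minY, p[1]);
            maxY = Math.max(maxY, p[1]);
        }
        return new BoundingBox(minX, minY, maxX, maxY);
    }

    /** True when the two boxes share at least one point (touching edges count). */
    boolean intersects(BoundingBox other) {
        return minX <= other.maxX && other.minX <= maxX
            && minY <= other.maxY && other.minY <= maxY;
    }
}
```

An envelope test can only produce false positives, never false negatives: a query box may overlap the envelope without touching the footprint itself, so the exact geometry test still has to run on whatever survives.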

There's one thing we never implemented, which was calculating the minimum distance between two geometries (we almost always have one side of the comparison as a point).  Do you happen to know a reasonably speedy algorithm to do this?
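
For the point-versus-geometry case, one straightforward baseline is to take the minimum over all boundary segments of the point-to-segment distance. The sketch below assumes planar coordinates and a boundary given as an ordered vertex list; PointDistance and its method names are hypothetical, and it is a brute-force approach that is linear in the number of vertices, not a tuned algorithm. It measures distance to the boundary only, so a point inside a polygon would need a separate containment check to report zero.

```java
/** Hypothetical helper: planar distance from a point to a polyline or polygon boundary. */
final class PointDistance {

    /** Distance from point (px, py) to the segment from (ax, ay) to (bx, by). */
    static double toSegment(double px, double py,
                            double ax, double ay,
                            double bx, double by) {
        double dx = bx - ax;
        double dy = by - ay;
        double lenSq = dx * dx + dy * dy;
        // Parameter of the orthogonal projection onto the segment's line;
        // a zero-length segment degenerates to its single point.
        double t = lenSq == 0 ? 0 : ((px - ax) * dx + (py - ay) * dy) / lenSq;
        // Clamp so the closest point stays on the segment.
        t = Math.max(0, Math.min(1, t));
        return Math.hypot(px - (ax + t * dx), py - (ay + t * dy));
    }

    /** Minimum distance from (px, py) to a polyline given as {x, y} vertices. */
    static double toPolyline(double px, double py, double[][] vertices) {
        if (vertices.length == 0) {
            throw new IllegalArgumentException("empty geometry");
        }
        if (vertices.length == 1) {
            return Math.hypot(px - vertices[0][0], py - vertices[0][1]);
        }
        double best = Double.POSITIVE_INFINITY;
        for (int i = 0; i + 1 < vertices.length; i++) {
            best = Math.min(best, toSegment(px, py,
                    vertices[i][0], vertices[i][1],
                    vertices[i + 1][0], vertices[i + 1][1]));
        }
        return best;
    }
}
```

For a closed polygon, repeat the first vertex at the end of the list so the closing edge is included.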

Thanks!
--Max



Re: Using lucene with a Geospatial catalog

Posted by John Wang <jo...@gmail.com>.
Check out www.browseengine.com; it is an open source meta engine on top of
Lucene.
-John
