You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Barry Carter <ba...@bigfoot.com> on 2005/07/28 03:42:18 UTC

Lucene vs Derby (vs MySQL) for spatial indexing

Does Lucene optimize range queries that use Sort and/or limit the number 
of hits?

My situation: I have a listing of 2 million cities, with the name,
latitude, longitude, and population of each city. I want to efficiently
find the 50 most populous cities between (for example) latitudes 35.2 and
41.7 and longitudes 19.8 and 27.9

Assuming I normalize the data to be lexically sorted (in other words, I'll
write 7.52 as 007.52, so it comes before 111.01 instead of after it), can
I use a range query on the latitude and longitude fields (limiting the
number of hits to 50, and sorting by population descending) to efficiently
find what I want?

If sorting isn't efficient, can I simply boost each record by its
population (so that high population cities are returned first) and then
limit the number of hits (so I see only the 50 most populous cities in a
given area)?

I tried this in Derby, the code being:

Statement s = DriverManager.getConnection("jdbc:derby:test;create=false").createStatement();
s.setMaxRows(50);
rs = s.executeQuery("SELECT * FROM cities where lat>35.2 and lat<41.7 and lon>19.8 and lon<27.9 ORDER BY population desc");

but Derby inefficiently looks at ALL the cities matching my criteria (even
with indexes on lat and lon and population) before returning the top 50
(this is really bad when the condition is "lat>-90 and lat<90 and lon>-180
and lon<-180", for example).

The MySQL equivalent ("SELECT * FROM cities where lat>35.2 and lat<41.7
and lon>19.8 and lon<27.9 ORDER BY population desc LIMIT 50") with the
same indexes is more efficient (it uses the LIMIT condition to optimize
the query), and using MySQL w/ spatial indexes is even more efficient.  
However, I'm doing this as part of a Java application, so need something
that can be embedded in Java.

Is this a reasonable use of Lucene? Or is coercing Lucene into doing
range-based numeric queries a bad idea?

(In case anyone's interested, I'm writing a zoomable/pannable world map,
so finding the biggest cities in a given area quickly is important)


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Lucene vs Derby (vs MySQL) for spatial indexing

Posted by markharw00d <ma...@yahoo.co.uk>.
MySQL has spatial extensions now too.
Your queries lack any free-text criteria so are probably best handled by 
a database, not Lucene..

 >>In case anyone's interested, I'm writing a zoomable/pannable world map

Save yourself some time. Just use the Google maps API.  :-)


		
___________________________________________________________ 
How much free photo storage do you get? Store your holiday 
snaps for FREE with Yahoo! Photos http://uk.photos.yahoo.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Off Topic: Lucene vs Derby (vs MySQL) for spatial indexing

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Ah, I saw it about a month or two ago when moving Simpy to PostgreSQL
8.0.3.  I think I saw mentions of Java inside PostgreSQL in a
development version (8.1.*).

Otis
--
http://simpy.com


--- Dan Armbrust <da...@gmail.com> wrote:

> Otis Gospodnetic wrote:
> 
> >You may also want to consider PostgreSQL for a few reasons:  <snip>
> >3) it seems that the new
> >versions let you embed Java directly into the database (perhaps
> >something like Oracle's Java-embedding thing).
> >  
> >
> Really?  I realize this is off topic, but could you point me to some 
> documentation on that? 
> 
> Are you saying it may be possible to embed lucene into PostgreSQL,
> and 
> use it for text searching?
> 
> Dan
> 
> -- 
> ****************************
> Daniel Armbrust
> Biomedical Informatics
> Mayo Clinic Rochester
> daniel.armbrust(at)mayo.edu
> http://informatics.mayo.edu/
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Off Topic: Lucene vs Derby (vs MySQL) for spatial indexing

Posted by Dan Armbrust <da...@gmail.com>.
Otis Gospodnetic wrote:

>You may also want to consider PostgreSQL for a few reasons:  <snip>
>3) it seems that the new
>versions let you embed Java directly into the database (perhaps
>something like Oracle's Java-embedding thing).
>  
>
Really?  I realize this is off topic, but could you point me to some 
documentation on that? 

Are you saying it may be possible to embed lucene into PostgreSQL, and 
use it for text searching?

Dan

-- 
****************************
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http://informatics.mayo.edu/


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Lucene vs Derby (vs MySQL) for spatial indexing

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Barry,

You may also want to consider PostgreSQL for a few reasons: 1) it's
historically known to work well for geo-spatial data, 2) has
GIS/geo-spatial data types and such, and 3) it seems that the new
versions let you embed Java directly into the database (perhaps
something like Oracle's Java-embedding thing).

Otis


--- Barry Carter <ba...@bigfoot.com> wrote:

> Does Lucene optimize range queries that use Sort and/or limit the
> number 
> of hits?
> 
> My situation: I have a listing of 2 million cities, with the name,
> latitude, longitude, and population of each city. I want to
> efficiently
> find the 50 most populous cities between (for example) latitudes 35.2
> and
> 41.7 and longitudes 19.8 and 27.9
> 
> Assuming I normalize the data to be lexically sorted (in other words,
> I'll
> write 7.52 as 007.52, so it comes before 111.01 instead of after it),
> can
> I use a range query on the latitude and longitude fields (limiting
> the
> number of hits to 50, and sorting by population descending) to
> efficiently
> find what I want?
> 
> If sorting isn't efficient, can I simply boost each record by its
> population (so that high population cities are returned first) and
> then
> limit the number of hits (so I see only the 50 most populous cities
> in a
> given area)?
> 
> I tried this in Derby, the code being:
> 
> Statement s =
>
DriverManager.getConnection("jdbc:derby:test;create=false").createStatement();
> s.setMaxRows(50);
> rs = s.executeQuery("SELECT * FROM cities where lat>35.2 and lat<41.7
> and lon>19.8 and lon<27.9 ORDER BY population desc");
> 
> but Derby inefficiently looks at ALL the cities matching my criteria
> (even
> with indexes on lat and lon and population) before returning the top
> 50
> (this is really bad when the condition is "lat>-90 and lat<90 and
> lon>-180
> and lon<-180", for example).
> 
> The MySQL equivalent ("SELECT * FROM cities where lat>35.2 and
> lat<41.7
> and lon>19.8 and lon<27.9 ORDER BY population desc LIMIT 50") with
> the
> same indexes is more efficient (it uses the LIMIT condition to
> optimize
> the query), and using MySQL w/ spatial indexes is even more
> efficient.  
> However, I'm doing this as part of a Java application, so need
> something
> that can be embedded in Java.
> 
> Is this a reasonable use of Lucene? Or is coercing Lucene into doing
> range-based numeric queries a bad idea?
> 
> (In case anyone's interested, I'm writing a zoomable/pannable world
> map,
> so finding the biggest cities in a given area quickly is important)
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org