Posted to java-user@lucene.apache.org by Mike <mi...@musicware.com> on 2007/08/29 17:15:16 UTC

Postal Code Radius Search

I've searched the mailing list archives and the web, read the FAQ, etc., and I
don't see anything relevant, so here goes…

I'm trying to implement radius-based searching on zip/postal codes.
(The user enters their zip code and I show nearby matches less than x miles
away, sorted by linear distance.)  I already have the data required to pull
this off (zip codes, lat/long coordinates, etc.).  Extreme accuracy is not a
requirement; it just needs to be an approximation (plus or minus a few
miles).

What I'm looking for is a little direction.  How have others implemented
this type of search?  What are the pros/cons of various methods?  I have a
few ideas but obviously none of them are very good or I guess I wouldn't be
here asking.  ;)

By the way, my index is updated about every 10 minutes and holds about
25,000 records.  However, this may increase in the next year or so to
hundreds of thousands.  So whatever I do needs to be fairly scalable.  The
items being searched as well as the people searching will be located all
over the world.  Some areas may be busier than others, so there is an
opportunity to cache the more common locales.

Thank you for your time.  I'd appreciate any suggestions that you can give.

- Mike

RE: Postal Code Radius Search

Posted by Charles Patridge <ch...@fullcapture.com>.
Will,

http://www.sconsig.com/sastips/tip00156.htm

This is an example, written in SAS, that I used to find all zip codes within
a certain radius; it should be possible to convert it to another
language.

HTH,
Chuck P.

Charles Patridge
Full Capture Solutions, Inc.
333 Roberts Street, Suite 400
East Hartford, CT 06108
Phone: 860-291-9517 x 106
Email: Chuck@fullcapture.com
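The SAS tip itself is not reproduced here. As a rough Java sketch of the same idea (find every zip code within a radius of a point), the distance below uses the standard haversine great-circle formula, which may not be the formula the SAS tip uses. The `ZipRadius` class and its `Zip` holder are hypothetical, and the 3958.8-mile mean Earth radius is an approximation that stays well within the "plus or minus a few miles" tolerance Mike asked for.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: find zip codes whose centroid lies within a radius of a point. */
final class ZipRadius {
    // Mean Earth radius in statute miles; an approximation.
    static final double EARTH_RADIUS_MILES = 3958.8;

    /** A zip code and its centroid in decimal degrees (hypothetical holder). */
    static final class Zip {
        final String code;
        final double lat;
        final double lon;

        Zip(String code, double lat, double lon) {
            this.code = code;
            this.lat = lat;
            this.lon = lon;
        }
    }

    /** Great-circle distance in miles between two points, via the haversine formula. */
    static double distanceMiles(double lat1, double lon1, double lat2, double lon2) {
        double dPhi = Math.toRadians(lat2 - lat1);
        double dLambda = Math.toRadians(lon2 - lon1);
        double sinPhi = Math.sin(dPhi / 2);
        double sinLambda = Math.sin(dLambda / 2);
        double a = sinPhi * sinPhi
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2)) * sinLambda * sinLambda;
        // min() guards against floating-point rounding pushing sqrt(a) slightly above 1.
        return 2 * EARTH_RADIUS_MILES * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Linear scan: codes of all zips within radiusMiles of (lat, lon), in input order. */
    static List<String> within(List<Zip> zips, double lat, double lon, double radiusMiles) {
        List<String> result = new ArrayList<>();
        for (Zip z : zips) {
            if (distanceMiles(lat, lon, z.lat, z.lon) <= radiusMiles) {
                result.add(z.code);
            }
        }
        return result;
    }
}
```

A linear scan like this is simple but computes one distance per row, so it suits a lookup table of postal codes better than a large, frequently updated search index.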

-----Original Message-----
From: Will Johnson [mailto:willjohnsonsearch@gmail.com] 
Sent: Wednesday, August 29, 2007 11:46 AM
To: java-user@lucene.apache.org
Subject: Re: Postal Code Radius Search

A CustomScoreQuery combined with a FieldCacheSource that holds the
lat/lon might work.

- will




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Postal Code Radius Search

Posted by Will Johnson <wi...@gmail.com>.
A CustomScoreQuery combined with a FieldCacheSource that holds the
lat/lon might work.

- will
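Will's suggestion refers to Lucene's function-query classes (`CustomScoreQuery`, `FieldCacheSource`), whose exact API differs between Lucene versions, so the sketch below deliberately does not call them. It is a plain-Java illustration of the idea under stated assumptions: two `float` arrays indexed by document number stand in for field-cache values of hypothetical lat/lon fields, and a hypothetical `DistanceBoost.score` method multiplies the text score by a distance decay, roughly the role a `CustomScoreQuery` subclass would play. Distance uses the equirectangular approximation, which suits short radii but degrades near the poles and across the 180th meridian.

```java
/**
 * Hypothetical plain-Java stand-in for a distance-aware custom score.
 * lats[doc] and lons[doc] play the role of per-document values from a field cache.
 */
final class DistanceBoost {
    // Approximate miles per degree of latitude (mean Earth radius 3958.8 mi * pi / 180).
    static final double MILES_PER_DEGREE = 3958.8 * Math.PI / 180;

    private final float[] lats;
    private final float[] lons;
    private final double originLat;
    private final double originLon;
    private final double halfScoreMiles; // distance at which the boost drops to 0.5

    DistanceBoost(float[] lats, float[] lons, double originLat, double originLon, double halfScoreMiles) {
        this.lats = lats;
        this.lons = lons;
        this.originLat = originLat;
        this.originLon = originLon;
        this.halfScoreMiles = halfScoreMiles;
    }

    /** Equirectangular approximation: adequate for short distances away from the poles. */
    double approxMiles(int doc) {
        double meanLatRad = Math.toRadians((lats[doc] + originLat) / 2);
        // A degree of longitude shrinks with the cosine of latitude.
        double dx = (lons[doc] - originLon) * Math.cos(meanLatRad);
        double dy = lats[doc] - originLat;
        return MILES_PER_DEGREE * Math.sqrt(dx * dx + dy * dy);
    }

    /** Multiply the text-match score by a decay that is 1 at the origin and falls toward 0. */
    float score(int doc, float textScore) {
        double boost = halfScoreMiles / (halfScoreMiles + approxMiles(doc));
        return (float) (textScore * boost);
    }
}
```

Because the boost multiplies relevance, results come back ordered by a blend of text match and proximity; ordering purely by distance, as Mike describes, would mean sorting on `approxMiles` instead.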




RE: Postal Code Radius Search

Posted by Charles Patridge <ch...@fullcapture.com>.
Here is an example of getting all the zip codes within a certain radius,
something I did in SAS; I am sure you can convert the formula into
another language.

http://www.sconsig.com/sastips/tip00156.htm

Chuck Patridge

-----Original Message-----
From: Steven Rowe [mailto:sarowe@syr.edu] 
Sent: Wednesday, August 29, 2007 12:37 PM
To: java-user@lucene.apache.org
Subject: Re: Postal Code Radius Search



Re: Postal Code Radius Search

Posted by Steven Rowe <sa...@syr.edu>.
Mike wrote:
> I've searched the mailing list archives, the web, read the FAQ, etc and I
> don't see anything relevant so here it goes…
> 
> I'm trying to implement a radius based searching based on zip/postal codes.

Here is a selection of interesting threads from the Lucene ML with
relevant info:

<http://www.nabble.com/all-records-within-distance----small-index-tf3303731.html>

<http://www.nabble.com/Hacking-proximity-search%3A-looking-for-feedback-tf1201416.html>

<http://www.nabble.com/Announcement%3A-Lucene-powering-Monster-job-search-index-%28Beta%29-tf2522646.html>

The standard answer seems to be something like:

1. Index latitude and longitude fields with fixed length
(left-zero-padded) integral values - shift the decimal point to the
right to the desired level of discriminability.  (In your case, convert
the postal codes to lats/longs.)

2. Do a range query on both your lat and your long fields to collect
hits inside a bounding box with your target at the center and with sides
of length double the desired radius.

3. Optionally, sort (and filter) the results by distance from your
target, displaying only those within the desired radius.  If you leave
out this step, you'll get some hits that are outside of the desired
radius, in between the bounding circle and the bounding box.

Steve

-- 
Steve Rowe
Center for Natural Language Processing
http://www.cnlp.org/tech/lucene.asp
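Steve's three steps can be sketched in plain Java. The class and method names, the precision (`SCALE`) and the field width are hypothetical choices, and building the actual Lucene fields and range queries is omitted because that API differs between versions. One assumption the recipe leaves implicit: range queries of that era compare terms as strings, so negative coordinates (southern and western hemispheres, which Mike's worldwide data will contain) would sort incorrectly with zero-padding alone. The sketch therefore adds 90 to latitude and 180 to longitude before scaling. For step 2 it widens the longitude range by `asin(sin(d) / cos(lat))` for an angular radius `d`, and it only clamps a box that crosses the 180th meridian instead of splitting it into two ranges.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Hypothetical helpers for the three-step bounding-box recipe (plain Java, no Lucene calls). */
final class BoundingBoxSearch {
    static final double EARTH_RADIUS_MILES = 3958.8; // mean radius, approximate
    static final int SCALE = 10_000; // keep 4 decimal places (0.0001 degree)
    static final int WIDTH = 7;      // 360 * 10_000 = 3_600_000 fits in 7 digits

    // Step 1: shift to a non-negative range, scale to an integer, left-zero-pad.
    static String encode(double degrees, double offset) {
        long scaled = Math.round((degrees + offset) * SCALE);
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", scaled);
    }

    static String encodeLat(double lat) { return encode(lat, 90); }  // [-90, 90] -> [0, 1_800_000]

    static String encodeLon(double lon) { return encode(lon, 180); } // [-180, 180] -> [0, 3_600_000]

    // Step 2: {minLat, maxLat, minLon, maxLon} terms for the two range queries.
    static String[] boxTerms(double lat, double lon, double radiusMiles) {
        double delta = radiusMiles / EARTH_RADIUS_MILES; // angular radius in radians
        double dLat = Math.toDegrees(delta);
        double minLat = lat - dLat;
        double maxLat = lat + dLat;
        double minLon;
        double maxLon;
        if (minLat > -90 && maxLat < 90) {
            // Widest longitude span of a circle of angular radius delta centred at lat.
            double dLon = Math.toDegrees(Math.asin(Math.sin(delta) / Math.cos(Math.toRadians(lat))));
            // A box crossing the 180th meridian needs two lon ranges; this sketch only clamps.
            minLon = Math.max(-180, lon - dLon);
            maxLon = Math.min(180, lon + dLon);
        } else {
            // The circle contains a pole, so every longitude qualifies.
            minLat = Math.max(-90, minLat);
            maxLat = Math.min(90, maxLat);
            minLon = -180;
            maxLon = 180;
        }
        return new String[] {encodeLat(minLat), encodeLat(maxLat), encodeLon(minLon), encodeLon(maxLon)};
    }

    // Step 3: a document returned by the box query (hypothetical holder).
    static final class Hit {
        final String id;
        final double lat;
        final double lon;
        double miles; // filled in by filterAndSort

        Hit(String id, double lat, double lon) {
            this.id = id;
            this.lat = lat;
            this.lon = lon;
        }
    }

    /** Great-circle distance in miles (haversine). */
    static double distanceMiles(double lat1, double lon1, double lat2, double lon2) {
        double sinPhi = Math.sin(Math.toRadians(lat2 - lat1) / 2);
        double sinLambda = Math.sin(Math.toRadians(lon2 - lon1) / 2);
        double a = sinPhi * sinPhi
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2)) * sinLambda * sinLambda;
        return 2 * EARTH_RADIUS_MILES * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Drop box hits outside the circle, then order the rest nearest first. */
    static List<String> filterAndSort(List<Hit> boxHits, double lat, double lon, double radiusMiles) {
        List<Hit> inside = new ArrayList<>();
        for (Hit h : boxHits) {
            h.miles = distanceMiles(lat, lon, h.lat, h.lon);
            if (h.miles <= radiusMiles) {
                inside.add(h);
            }
        }
        inside.sort(Comparator.comparingDouble((Hit h) -> h.miles));
        List<String> ids = new ArrayList<>();
        for (Hit h : inside) {
            ids.add(h.id);
        }
        return ids;
    }
}
```

The four strings from `boxTerms` would serve as the lower and upper bounds of the lat and long range queries; the documents those queries return, wrapped as `Hit` objects, then go through `filterAndSort`, which removes the corner hits Steve mentions and produces the distance ordering Mike wants.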
