Posted to solr-user@lucene.apache.org by Dirceu Vieira <di...@gmail.com> on 2012/03/01 08:59:13 UTC

Re: Solr Design question on spatial search

I believe that what you need is spatial search...

Have a look at the documentation:  http://wiki.apache.org/solr/SpatialSearch

On Wed, Feb 29, 2012 at 10:54 PM, Venu Shankar <ve...@gmail.com> wrote:

> Hello,
>
> I have a design question for Solr.
>
> I work for an enterprise which has a lot of retail stores (approx. 20K).
> These retail stores are spread across the world.  My search requirement is
> to find all the cities which are within x miles of a retail store.
>
> So let's say we have a retail store in San Francisco and I search for
> "San": then San Francisco, Santa Clara, San Jose, San Juan, etc. should be
> returned, as they are within x miles of San Francisco. I also want to rank
> the search results by their distance.
>
> I can create an index with all the cities in it, but I am not sure how to
> ensure that the cities returned in a search result have a nearby retail
> store. Any suggestions?
>
> Thanks,
> Venu,
>



-- 
Dirceu Vieira Júnior
-------------------------------------------------------------------
+47 9753 2473
dirceuvjr.blogspot.com
twitter.com/dirceuvjr

Re: Solr Design question on spatial search

Posted by Lance Norskog <go...@gmail.com>.
The Lucene geo searching code is very fast. Geosearch queries
calculate the distance from the city to all 20k stores and sort on
this.

If this is not fast enough, you can pre-calculate the city/store lists
by doing all of this searching in advance. You can store these in a DB
and do incremental updates to your index. As to re-indexing all the
data, you should assume you will do this regularly.
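
To make the pre-calculation idea concrete, here is a minimal sketch in Python. It is only an illustration, not Lucene or Solr code: it brute-forces the great-circle (haversine) distance from every city to every store and keeps the cities that have a store within the radius. The record layout (dicts with "name", "lat", "lon"), the function names and the sample coordinates are all assumptions made for this example.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, good enough for a 10-mile cutoff


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def searchable_cities(cities, stores, radius_miles=10.0):
    """Return {city name: miles to nearest store} for cities within the radius of any store."""
    result = {}
    for city in cities:
        nearest = min(
            haversine_miles(city["lat"], city["lon"], s["lat"], s["lon"]) for s in stores
        )
        if nearest <= radius_miles:
            result[city["name"]] = nearest
    return result


if __name__ == "__main__":
    # Hypothetical sample data: one store, three candidate cities.
    stores = [{"name": "store-1", "lat": 37.7749, "lon": -122.4194}]
    cities = [
        {"name": "San Francisco", "lat": 37.7749, "lon": -122.4194},
        {"name": "South San Francisco", "lat": 37.6547, "lon": -122.4077},
        {"name": "San Jose", "lat": 37.3382, "lon": -121.8863},
    ]
    for name, miles in sorted(searchable_cities(cities, stores).items()):
        print(f"{name}: {miles:.1f} miles to nearest store")
```

The result (city name plus distance to the nearest store) is what you would keep in the DB and feed into the index. The loop is O(cities x stores), which is fine for an offline job; a spatial grid or similar would cut it down if needed.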

Lance

On Fri, Mar 2, 2012 at 2:06 PM, Venu Gmail Dev <ve...@gmail.com> wrote:
> Sorry for not being clear enough.
>
> I don't know the point of origin. All I know is that there are 20K retail stores. Only the cities within a 10-mile radius of these stores should be searchable. Any city outside these small 10-mile circles around the 20K stores should be ignored.
>
> So when somebody searches for a city, I need to query the cities that are inside these 20K 10-mile circles, but I don't know which 10-mile circle I should query.
>
> So the approaches I was considering were:
>
> a) Have 2 separate indexes: the first one to store the information about all the cities and the second one to store the retail store information. Whenever a user searches for a city, I return all the matching cities (and hence their lat-long) from the first index and then do a spatial search in the second index for each matched city. But this is too costly.
>
> b) Index only the cities which have a nearby store. Do all the calculations before indexing the data so that the search is fast. The problem I see with this approach is that if a new retail store or a city is added, I would have to re-index all the data again.
>
> Does this answer the problem that you posed?
>
> Thanks,
> Venu.
>



-- 
Lance Norskog
goksron@gmail.com

Re: Solr Design question on spatial search

Posted by Venu Gmail Dev <ve...@gmail.com>.
Sorry for not being clear enough.

I don't know the point of origin. All I know is that there are 20K retail stores. Only the cities within a 10-mile radius of these stores should be searchable. Any city outside these small 10-mile circles around the 20K stores should be ignored.

So when somebody searches for a city, I need to query the cities that are inside these 20K 10-mile circles, but I don't know which 10-mile circle I should query.

So the approaches I was considering were:

a) Have 2 separate indexes: the first one to store the information about all the cities and the second one to store the retail store information. Whenever a user searches for a city, I return all the matching cities (and hence their lat-long) from the first index and then do a spatial search in the second index for each matched city. But this is too costly.

b) Index only the cities which have a nearby store. Do all the calculations before indexing the data so that the search is fast. The problem I see with this approach is that if a new retail store or a city is added, I would have to re-index all the data again.

Does this answer the problem that you posed?

Thanks,
Venu.

On Mar 2, 2012, at 9:52 PM, Erick Erickson wrote:

> But again, that doesn't answer the problem I posed. Where is your
> point of origin? There's nothing in what you've written that indicates
> how you would know that 10 miles is relative to San Francisco. All
> you've said is that you're searching on "San". Which would presumably
> return San Francisco, San Mateo, San Jose.
>
> Then, also presumably, you're looking for all the cities with stores
> within 10 miles of one of these cities. But nothing in your criteria so
> far says that that city is San Francisco.
>
> If you already know that San Francisco is the locus, simple distance
> will work just fine. You can index both city and store info in the same
> index and restrict, say, facets (or, indeed search results) by fq clause
> (e.g. fq=type:city or fq=type:store).
> 
> Or I'm completely missing the boat here.
> 
> Best
> Erick
> 
> 


Re: Solr Design question on spatial search

Posted by Erick Erickson <er...@gmail.com>.
But again, that doesn't answer the problem I posed. Where is your
point of origin? There's nothing in what you've written that indicates
how you would know that 10 miles is relative to San Francisco. All
you've said is that you're searching on "San". Which would presumably
return San Francisco, San Mateo, San Jose.

Then, also presumably, you're looking for all the cities with stores
within 10 miles of one of these cities. But nothing in your criteria so
far says that that city is San Francisco.

If you already know that San Francisco is the locus, simple distance
will work just fine. You can index both city and store info in the same
index and restrict, say, facets (or, indeed search results) by fq clause
(e.g. fq=type:city or fq=type:store).
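
For what it's worth, a request along those lines might be assembled as in the Python sketch below. The geofilt filter, the sfield, pt and d parameters and the geodist() function are the ones described on the SpatialSearch wiki page; the field names "name" and "location" and the helper itself are assumptions for illustration, and "type" is just the discriminator field from the fq example above. As far as I know d is given in kilometers, so the radius is converted from miles; check that against the docs for your Solr version.

```python
from urllib.parse import urlencode

MILES_TO_KM = 1.609344


def city_query_params(prefix, lat, lon, radius_miles=10.0):
    """Build Solr query parameters for cities near a known point.

    Assumed schema (hypothetical): 'name' is a searchable city name field,
    'type' marks a document as city or store, 'location' is a lat/lon field.
    """
    return [
        ("q", f"name:{prefix}*"),          # prefix match on the city name
        ("fq", "type:city"),               # only city documents
        ("fq", "{!geofilt}"),              # spatial filter around pt
        ("sfield", "location"),
        ("pt", f"{lat},{lon}"),
        ("d", f"{radius_miles * MILES_TO_KM:.3f}"),  # radius converted to km
        ("sort", "geodist() asc"),         # nearest first
    ]


if __name__ == "__main__":
    params = city_query_params("San", 37.7749, -122.4194)
    # The host, port and path below are placeholders.
    print("http://localhost:8983/solr/select?" + urlencode(params))
```

This only works once you have the point (pt) to search around, which is exactly the piece that is missing from the requirement so far.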

Or I'm completely missing the boat here.

Best
Erick


On Fri, Mar 2, 2012 at 11:50 AM, Venu Dev <ve...@gmail.com> wrote:
> So let's say x=10 miles. Now if I search for San, then San Francisco and San Mateo should be returned because there is a retail store in San Francisco. But San Jose should not be returned because it is more than 10 miles away from San Francisco. Had there been a retail store in San Jose, then it should also be returned when you search for San. I can restrict the queries to a country.
>
> Thanks,
> ~Venu
>

Re: Solr Design question on spatial search

Posted by Venu Dev <ve...@gmail.com>.
So let's say x=10 miles. Now if I search for San, then San Francisco and San Mateo should be returned because there is a retail store in San Francisco. But San Jose should not be returned because it is more than 10 miles away from San Francisco. Had there been a retail store in San Jose, then it should also be returned when you search for San. I can restrict the queries to a country.

Thanks,
~Venu

On Mar 2, 2012, at 5:57 AM, Erick Erickson <er...@gmail.com> wrote:

> I don't see how this works, since your search for San could also return
> San Marino, Italy. Would you then return all retail stores in
> X miles of that city? What about San Salvador de Jujuy, Argentina?
> 
> And even in your example, San would match San Mateo. But should
> the search then return any stores within X miles of San Mateo?
> You have to stop somewhere....
> 
> Is there any other information you have that restricts how far to expand the
> search?
> 
> Best
> Erick
> 

Re: Solr Design question on spatial search

Posted by Erick Erickson <er...@gmail.com>.
I don't see how this works, since your search for San could also return
San Marino, Italy. Would you then return all retail stores in
X miles of that city? What about San Salvador de Jujuy, Argentina?

And even in your example, San would match San Mateo. But should
the search then return any stores within X miles of San Mateo?
You have to stop somewhere....

Is there any other information you have that restricts how far to expand the
search?

Best
Erick

On Thu, Mar 1, 2012 at 4:57 PM, Venu Gmail Dev <ve...@gmail.com> wrote:
> I don't think spatial search will fully fit this. I have 2 approaches in mind, but I am not satisfied with either of them.
>
> a) Have 2 separate indexes: the first one to store the information about all the cities and the second one to store the retail store information. Whenever a user searches for a city, I return all the matching cities from the first index and then do a spatial search in the second index for each matched city. But this is too costly.
>
> b) Index only the cities which have a nearby store. Do all the calculations before indexing the data so that the search is fast. The problem I see with this approach is that if a new retail store or a city is added, I would have to re-index all the data again.
>
>

Re: Solr Design question on spatial search

Posted by Venu Gmail Dev <ve...@gmail.com>.
I don't think spatial search will fully fit this. I have 2 approaches in mind, but I am not satisfied with either of them.

a) Have 2 separate indexes: the first one to store the information about all the cities and the second one to store the retail store information. Whenever a user searches for a city, I return all the matching cities from the first index and then do a spatial search in the second index for each matched city. But this is too costly.

b) Index only the cities which have a nearby store. Do all the calculations before indexing the data so that the search is fast. The problem I see with this approach is that if a new retail store or a city is added, I would have to re-index all the data again.
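
On the re-indexing worry in (b), one possible way around it, sketched here in Python under my own assumptions, is to keep for each city the set of stores within the radius. Adding or removing a store then only changes the cities near that store, and only cities that flip between "no nearby store" and "at least one nearby store" would need to be added to or removed from the index. The class and method names are hypothetical, and the distance check is a plain haversine scan rather than anything Solr-specific.

```python
import math

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


class SearchableCities:
    """Tracks which cities have at least one store within radius_miles."""

    def __init__(self, cities, radius_miles=10.0):
        self.cities = dict(cities)                   # city name -> (lat, lon)
        self.radius = radius_miles
        self.nearby = {name: set() for name in self.cities}  # city -> nearby store ids

    def add_store(self, store_id, lat, lon):
        """Register a store; return the cities that just became searchable."""
        newly_searchable = []
        for name, (clat, clon) in self.cities.items():
            if haversine_miles(lat, lon, clat, clon) <= self.radius:
                if not self.nearby[name]:
                    newly_searchable.append(name)
                self.nearby[name].add(store_id)
        return newly_searchable

    def remove_store(self, store_id):
        """Unregister a store; return the cities that are no longer searchable."""
        no_longer_searchable = []
        for name, ids in self.nearby.items():
            if store_id in ids:
                ids.remove(store_id)
                if not ids:
                    no_longer_searchable.append(name)
        return no_longer_searchable

    def searchable(self):
        return sorted(name for name, ids in self.nearby.items() if ids)
```

The lists returned by add_store and remove_store are the only documents that would have to be sent to the index as adds or deletes, so a new store does not force a full rebuild.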


On Mar 1, 2012, at 7:59 AM, Dirceu Vieira wrote:

> I believe that what you need is spatial search...
> 
> Have a look at the documentation:  http://wiki.apache.org/solr/SpatialSearch
> 