Posted to user@couchdb.apache.org by Magnus Ottosson <ma...@magnusottosson.se> on 2010/04/12 10:25:36 UTC

Search for city by name, order by population

Hi,

I have a database with the names and populations of cities (about 7
million entities). Is it possible, with CouchDB, to create a key that
includes both the city name and the population, so that I can search for a
name and get the matching results ordered by population?

I tried to create a key like [population, name] and search like this:
?startkey=[0, "name"]&endkey=[10000000, "name"]&limit=10 but it does
not work the way I hoped.
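Why the [population, name] attempt fails can be sketched with a simplified model of CouchDB's view collation. This model is an assumption for illustration, not CouchDB's code: it handles only numbers, strings, arrays and objects, and it uses plain JS string comparison in place of CouchDB's Unicode collation. Array keys compare element by element, so the first element (the population) decides the order almost entirely. The names and populations are made up.

```javascript
// Simplified model of CouchDB view collation for the key types used here:
// numbers < strings < arrays < objects; arrays compare element by element.
// Plain JS string comparison stands in for CouchDB's Unicode collation.
function rank(v) {
  if (typeof v === "number") return 1;
  if (typeof v === "string") return 2;
  if (Array.isArray(v)) return 3;
  return 4; // objects such as {} sort after everything else
}

function collate(a, b) {
  const ra = rank(a), rb = rank(b);
  if (ra !== rb) return ra - rb;
  if (ra === 1) return a - b;
  if (ra === 2) return a < b ? -1 : a > b ? 1 : 0;
  if (ra === 3) {
    for (let i = 0; i < Math.min(a.length, b.length); i++) {
      const c = collate(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length; // a shorter array sorts first
  }
  return 0;
}

// View rows keyed [population, name] (made-up sample values).
const keys = [
  [8000000, "new york"],
  [280000, "newark"],
  [150000, "springfield"],
].sort(collate);

// Equivalent of ?startkey=[0, "new york"]&endkey=[10000000, "new york"]
const start = [0, "new york"];
const end = [10000000, "new york"];
const hits = keys.filter(k => collate(k, start) >= 0 && collate(k, end) <= 0);

// Every row comes back, ordered by population: the name in the range
// only matters between rows with identical populations.
console.log(hits.map(k => k[1])); // [ 'springfield', 'newark', 'new york' ]
```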

Any ideas?

Magnus

Re: Search for city by name, order by population

Posted by Metin Akat <ak...@gmail.com>.
What you need is called couchdb-lucene. In my opinion, all the hacks that
try to do something similar require much more effort.

Re: Search for city by name, order by population

Posted by Sebastian Cohnen <se...@googlemail.com>.
You are right, this won't work in your case... but I'm not sure this is easily solvable... *hmm*

Re: Search for city by name, order by population

Posted by Metin Akat <ak...@gmail.com>.
> I was thinking about something like this (correct me if this is a
> stupid way of doing it):

As I said above, all the hackish solutions that try to do something
like a full-text search implementation require much more effort than
couchdb-lucene, which can do everything you want (and more).
It's fast, it can sort in whatever way you want, it can do fuzzy
search (partial or misspelled words), etc.

Re: Search for city by name, order by population

Posted by Magnus Ottosson <ma...@magnusottosson.se>.
I was thinking about something like this (correct me if this is a
stupid way of doing it):

For the map function of the view I have:

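function(doc) {
    // Emit every prefix of the name that is at least 3 characters long,
    // e.g. "new", "new ", ..., "new york" for the city "new york".
    for (var i = 3; i <= doc.name.length; i++) {
        emit(doc.name.substring(0, i), doc);
    }
}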

This will create several keys for each doc. For the city "new york" it
would create the following keys:

"new", "new ", "new y", "new yo", "new yor", "new york".

This way I can search for all cities starting with the name "new" like
this: ?key="new", and this will return all cities with the "new" key,
right?
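A small harness can check what the map function above emits. The `emit` stand-in and the sample documents are hypothetical (CouchDB's view server supplies the real `emit`), and the populations are made up.

```javascript
// Minimal stand-in for CouchDB's view server: emit() just records rows.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// The prefix map function from above, with the loop separator fixed.
function map(doc) {
  for (let i = 3; i <= doc.name.length; i++) {
    emit(doc.name.substring(0, i), doc);
  }
}

// Illustrative documents (made-up populations).
const docs = [
  { name: "new york", population: 8000000 },
  { name: "newark", population: 280000 },
  { name: "springfield", population: 150000 },
];
docs.forEach(doc => map(doc));

// ?key="new" selects rows whose key is exactly "new".
const matches = rows.filter(r => r.key === "new").map(r => r.value.name);
console.log(rows.length, matches); // 19 [ 'new york', 'newark' ]
```

Each emitted row stores its own copy of the value in the view index, so emitting the whole doc for every prefix grows the index roughly in proportion to name length. Emitting a small value such as the population (or null, then querying with include_docs=true) keeps rows small.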

To limit the result I create a reduce function like this:

function(keys, values, rereduce) {
    // Keep only the resultLength most populous cities.
    var resultLength = 20;

    // On rereduce, each value is an array returned by an earlier call,
    // so flatten them into a single list of city docs first.
    var cities = rereduce ? [].concat.apply([], values) : values.slice();

    // Sort by population, largest first, and keep the top entries.
    cities.sort(function(a, b) {
        return b.population - a.population;
    });
    return cities.slice(0, resultLength);
}

What this function does is build an array of the x (defined in the
reduce function) largest cities and return it.

The downside of this is that the number of returned cities is fixed in
the reduce function, but I can live with that.
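Here is a minimal sketch of how a top-N reduce like this behaves when CouchDB reduces chunks of rows and then rereduces the partial results. The function name `topCities`, N = 3, and the sample data are illustrative assumptions, and plain Node stands in for CouchDB's view server.

```javascript
// Top-N reduce, exercised the way CouchDB may call it: first on chunks of
// mapped values, then again (rereduce) on the partial results.
function topCities(keys, values, rereduce) {
  const resultLength = 3; // small N for the demo (the post uses 20)
  // On rereduce, values is a list of arrays returned by earlier calls.
  const cities = rereduce ? [].concat(...values) : values.slice();
  cities.sort((a, b) => b.population - a.population); // largest first
  return cities.slice(0, resultLength);
}

// Illustrative city docs sharing the key "new" (made-up populations).
const docs = [
  { name: "new york", population: 8000000 },
  { name: "newark", population: 280000 },
  { name: "new haven", population: 130000 },
  { name: "newport", population: 25000 },
  { name: "new orleans", population: 340000 },
];

// Reduce two chunks independently, then rereduce the partial results.
const partA = topCities(null, docs.slice(0, 2), false);
const partB = topCities(null, docs.slice(2), false);
const merged = topCities(null, [partA, partB], true);
console.log(merged.map(c => c.name)); // [ 'new york', 'new orleans', 'newark' ]
```

Because each call returns up to N full documents, the output shrinks very little; CouchDB's `reduce_limit` check (under `[query_server_config]`) may reject such a reduce, which is another reason to emit only the fields you need.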

Will this work as I expect? Will this have a large impact on the size
of the database? Do all keys point at the same document, or is the
whole document stored once for each key? Have I understood how the
reduce function works correctly?

Magnus

RE: Search for city by name, order by population

Posted by Nils Breunese <N....@vpro.nl>.
I'm sorry, I totally misread your use case. Yes, you are correct. Forget what I said. :o)

I found couchdb-footrest (http://github.com/assembly/couchdb-footrest) a while ago, which apparently features 'Order Docs By Value Fields'. Sadly, I never got couchdb-footrest to work.

Like others have suggested: I think you'll want to go and look into couchdb-lucene, which will certainly do what you're looking for: http://github.com/rnewson/couchdb-lucene

Nils Breunese.

The information contained in this e-mail and any attachments is intended solely for use by the addressee and may contain confidential information. Disclosure, reproduction, distribution and/or provision of this information to third parties is reserved to the addressee. VPRO does not guarantee the correct and complete transmission of the contents of a sent e-mail, nor its timely receipt.

Re: Search for city by name, order by population

Posted by Magnus Ottosson <ma...@magnusottosson.se>.
Are you sure about that?

Wouldn't [name, population] first order the cities by name and, if
there is more than one city with the same name, then order them by
population?

Magnus

RE: Search for city by name, order by population

Posted by Nils Breunese <N....@vpro.nl>.
If you use [name, population] as the key, I believe you'll get what you need. You could query the view like this: ?startkey=["name"]&endkey=["name",{}]

Also check out the article on view collation on the wiki: http://wiki.apache.org/couchdb/View_collation
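A small sketch of this suggestion, using a simplified model of CouchDB collation (an assumption for illustration, not CouchDB's implementation; plain JS string comparison stands in for Unicode collation). With [name, population] keys, the range ["springfield"] to ["springfield",{}] returns only exact-name rows, ordered by population, while a prefix such as "new" matches nothing, which is the limitation Magnus points out. The sample rows are made up.

```javascript
// Simplified CouchDB-style collation: numbers < strings < arrays < objects;
// arrays compare element by element.
function rank(v) {
  if (typeof v === "number") return 1;
  if (typeof v === "string") return 2;
  if (Array.isArray(v)) return 3;
  return 4; // objects such as {} sort after everything else
}
function collate(a, b) {
  const ra = rank(a), rb = rank(b);
  if (ra !== rb) return ra - rb;
  if (ra === 1) return a - b;
  if (ra === 2) return a < b ? -1 : a > b ? 1 : 0;
  if (ra === 3) {
    for (let i = 0; i < Math.min(a.length, b.length); i++) {
      const c = collate(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length; // ["x"] sorts before ["x", anything]
  }
  return 0;
}

// View rows keyed [name, population] (made-up sample values).
const keys = [
  ["springfield", 150000],
  ["new york", 8000000],
  ["springfield", 60000],
  ["newark", 280000],
].sort(collate);

// Equivalent of ?startkey=[name]&endkey=[name,{}]
function query(name) {
  return keys.filter(k => collate(k, [name]) >= 0 && collate(k, [name, {}]) <= 0);
}

console.log(query("springfield")); // both "springfield" rows, smaller population first
console.log(query("new"));         // no rows: "new york" and "newark" are not exactly "new"
```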

Nils Breunese.

Re: Search for city by name, order by population

Posted by Darran White <da...@googlemail.com>.
I've not tried it myself, but maybe take a look at the couchdb-lucene
project:
http://github.com/rnewson/couchdb-lucene
It may help with your autocomplete search.

Re: Search for city by name, order by population

Posted by Magnus Ottosson <ma...@magnusottosson.se>.
The problem with sorting on the client side is that I have a lot of
entities! If I fetch all cities whose names start with "new", there
could potentially be thousands, and the performance would not be that
great.

I know it's a hack and that it might not be suitable, but I found that
if I create the key as I previously mentioned but append the population
to the end of the key, I can search for entities like this:

?startkey="new\u9999"&endkey="new0"&descending=true&limit=10

This will sort the cities correctly and I do not need to use the
reduce function.
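A hedged sketch of two pitfalls with that string key, assuming it is built as prefix + population digits (e.g. "new8000000"), followed by an alternative that relies on CouchDB's array-key collation: emit([prefix, population]) and query with startkey=["new",{}]&endkey=["new"]&descending=true&limit=10. Plain JS string comparison approximates CouchDB's Unicode collation for these lowercase ASCII keys (digits sort before letters), the second half only mimics what the array-key range selects, and the documents and populations are made up.

```javascript
// Made-up sample documents (illustrative populations only).
const docs = [
  { name: "new york", population: 8000000 },
  { name: "newark", population: 280000 },
  { name: "new haven", population: 130000 },
];

// Every prefix of at least 3 characters, as in the earlier map function.
function prefixes(name) {
  const out = [];
  for (let i = 3; i <= name.length; i++) out.push(name.substring(0, i));
  return out;
}

// 1) String key = prefix + population digits, queried with
//    startkey="new\u9999"&endkey="new0"&descending=true.
const stringKeys = docs.flatMap(d => prefixes(d.name).map(p => p + d.population));
const stringHits = stringKeys
  .filter(k => k >= "new0" && k <= "new\u9999") // the key range
  .sort()
  .reverse();                                   // descending=true
// Longer prefixes of "newark" ("newa", "newar", "newark") also fall in the
// range and sort above every "new" + digits key, so they come back first.

// Digits compare character by character, so unpadded numbers misorder:
const unpaddedMisorders = "new90000" > "new8000000"; // true

// 2) Array key [prefix, population], queried with
//    startkey=["new",{}]&endkey=["new"]&descending=true&limit=10.
//    The filter/sort below mimics what that range selects and how it orders.
const arrayHits = docs
  .flatMap(d => prefixes(d.name).map(p => [p, d.population]))
  .filter(k => k[0] === "new")   // only keys whose first element is "new"
  .sort((a, b) => b[1] - a[1])   // numeric population, largest first
  .slice(0, 10);                 // limit=10

console.log(stringHits.slice(0, 3)); // [ 'newark280000', 'newar280000', 'newa280000' ]
console.log(arrayHits.map(k => k[1])); // [ 8000000, 280000, 130000 ]
```

With the array key, the population stays a number, so no zero-padding is needed, and only rows whose first element is exactly the typed prefix fall in the range.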

I will also look at couchdb-lucene, but I'm in a hosted environment
where I have no control over the installed software...

Magnus

Re: Search for city by name, order by population

Posted by Jon Gretar Borgthorsson <jo...@gmail.com>.
To be honest, in this exact case as you describe it, I would skip sorting
it in CouchDB and sort it on the client side. It's a very simple solution
that can probably be done in a single line of code. I might even check
whether the autocomplete plugin you use already offers to do that for you.

"When in doubt, always do what takes the least amount of code," my
great-grandfather used to tell me. ;)
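A minimal sketch of that client-side sort, assuming the view returns rows shaped like CouchDB's {key, value} rows with the population inside value (the rows and populations are made up):

```javascript
// Rows as they might come back from a name-keyed view queried with
// startkey="new"&endkey="new\u9999" (illustrative data).
const rows = [
  { key: "newark", value: { population: 280000 } },
  { key: "new york", value: { population: 8000000 } },
  { key: "new haven", value: { population: 130000 } },
];

// The "single line": sort by population, largest first, keep the top 10.
const top10 = rows
  .slice()
  .sort((a, b) => b.value.population - a.value.population)
  .slice(0, 10);

console.log(top10.map(r => r.key)); // [ 'new york', 'newark', 'new haven' ]
```

This keeps the view simple, but every row matching the prefix is transferred to the client before it is trimmed.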

Re: Search for city by name, order by population

Posted by Magnus Ottosson <ma...@magnusottosson.se>.
Oh, I should have made that clearer. I will use this to
autocomplete a search box. When the user starts typing, I will search
for the cities that match the string. The user might type "new".
Then I want to fetch the 10 largest cities, by population, whose
names start with "new".

If I just wanted to search by name, I could have created an index
with the name as the key and searched like this:
startkey="new"&endkey="new\u9999", and this would have matched all the
cities whose names start with "new". Right?
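A quick sketch of that prefix range, approximating CouchDB's Unicode collation with plain JS string comparison (adequate for these lowercase ASCII names; the names are sample data):

```javascript
// Name keys as a name-keyed view would hold them, in sorted order.
const names = ["boston", "new haven", "new york", "newark", "newcastle", "nice"].sort();

// startkey="new"&endkey="new\u9999": every key that begins with "new"
// sorts between "new" itself and "new" followed by a high code point.
const start = "new";
const end = "new\u9999";
const matches = names.filter(n => n >= start && n <= end);

console.log(matches); // [ 'new haven', 'new york', 'newark', 'newcastle' ]
```

The matches come back in name order, not by population, which is exactly the ordering problem that remains.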

I want to sort this result by population in descending order so I can
fetch the 10 largest cities matching the input.

Magnus

Re: Search for city by name, order by population

Posted by Sebastian Cohnen <se...@googlemail.com>.
Hmm, I do not quite follow... isn't the name of the city unique? What do you mean by *searching* for a city?