Posted to user@couchdb.apache.org by Tommy Chheng <to...@gmail.com> on 2009/07/20 03:14:14 UTC

multiple key word count query problem

I have a simple word count view defined as:
--------
function(doc) {
   if(doc['couchrest-type'] == 'NsfGrant'){
     var words = doc['abstract'].split(/\W+/);
     words.forEach(function(word){
       if (word.length > 1) emit([word, doc['_id']],1);
     });
   }
}

function(keys, values, rereduce) {
   return sum(values);
}
--------
where the key's first parameter is the word and the 2nd parameter is  
the document_id.

so i can do a query like this to get all the documents with the word  
"the" correctly.
http://localhost:5984/nsf_grants/_design/NsfGrant/_view/by_word_doc_count?startkey=["the"]&endkey=["the",{}]&group_level=2

I'm having trouble doing queries on the 2nd parameter, how can i find  
all the words in a particular document?
I tried
http://localhost:5984/nsf_grants/_design/NsfGrant/_view/by_word_doc_count?key=[null,"0808605"]&group_level=2
which gives nothing (thinking that null would match all words)
and
http://localhost:5984/nsf_grants/_design/NsfGrant/_view/by_word_doc_count?startkey=[null,"0808605"]&endkey=[{},"0808605"]&group_level=2
which gives all results. Why is this?

Thanks,
Tommy

Re: multiple key word count query problem

Posted by Brian Candler <B....@pobox.com>.
On Sun, Jul 19, 2009 at 08:06:44PM -0700, Tommy Chheng wrote:
> the problem with having two views:
> If i had two views, one for [word, doc] => count and [doc, word] =>  
> count; it would be re-doing the same word counting function twice.

It's usually not a problem, but if it is for some reason (i.e. your view
calculation is especially expensive) then you could do:

  words.forEach(function (word) {
    // Tag each key so both orderings can share one view.
    emit(["by_word", word, doc._id], 1);
    emit(["by_id", doc._id, word], 1);
  });

Then this single view could be queried for e.g.
   startkey=["by_word","the"]&endkey=["by_word","the",{}]
or
   startkey=["by_id","mydoc1"]&endkey=["by_id","mydoc1",{}]

I would expect a single view like this to be a bit larger than two separate
views, because of the extra tags being stored.
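
For reference, here is a rough sketch of how the tagged-key idea might look as a complete map function, reusing the word splitting from Tommy's original view. The emit() stub, the rows array, the name mapTagged, the guard on a missing abstract and the sample abstract text are assumptions added only so the snippet runs outside CouchDB; inside a design document the map function would be anonymous and emit() is provided by the view server.

```javascript
// Stand-in for CouchDB's emit() so the sketch runs outside the view server.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map function emitting both orderings into a single view.
// The leading tag ("by_word" / "by_id") keeps the two orderings apart.
function mapTagged(doc) {
  // The check on doc['abstract'] is an added guard, not part of the original view.
  if (doc['couchrest-type'] == 'NsfGrant' && doc['abstract']) {
    doc['abstract'].split(/\W+/).forEach(function (word) {
      if (word.length > 1) {
        emit(["by_word", word, doc._id], 1);
        emit(["by_id", doc._id, word], 1);
      }
    });
  }
}

// Hypothetical sample document; the abstract text is invented.
mapTagged({
  _id: "0808605",
  'couchrest-type': 'NsfGrant',
  abstract: "the study of the cell"
});

rows.forEach(function (row) {
  console.log(JSON.stringify(row.key), row.value);
});
```

With the same sum reduce as the original view, group_level=3 would group on the full [tag, x, y] key.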

Re: multiple key word count query problem

Posted by Paul Davis <pa...@gmail.com>.
On Sun, Jul 19, 2009 at 11:06 PM, Tommy Chheng<to...@gmail.com> wrote:
> so for keys with two or more parameters, only the first parameter can be
> used for range selection? the 2nd and remaining keys can only be used for
> grouping/sorting?
>

There are no parameters here. There's only one key: the whole array.
Array sorting works the same way it does in any other situation: the
arrays are compared element by element, and if the first elements
aren't equal there's no reason to consider the second position.
Second elements aren't treated any differently from the first or
eighth.
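
To make the sorting concrete, here is a simplified sketch of the comparison in plain JavaScript. It is an illustration only, not CouchDB's implementation: the function names are made up, strings are compared with plain < and > (CouchDB uses ICU collation), and objects are treated as equal to each other. It shows why the range from [null,"0808605"] to [{},"0808605"] matches every row: null sorts before any string and {} sorts after any string, so the first element alone decides and the doc id is never looked at.

```javascript
// Type order used by view collation, lowest first:
// null, false, true, numbers, strings, arrays, objects.
function typeRank(v) {
  if (v === null) return 0;
  if (v === false) return 1;
  if (v === true) return 2;
  if (typeof v === "number") return 3;
  if (typeof v === "string") return 4;
  if (Array.isArray(v)) return 5;
  return 6; // objects sort last
}

// Returns a negative number, zero, or a positive number, like a sort comparator.
function compareKeys(a, b) {
  var ra = typeRank(a), rb = typeRank(b);
  if (ra !== rb) return ra - rb;
  if (ra === 3 || ra === 4) return a < b ? -1 : a > b ? 1 : 0;
  if (ra === 5) {
    // Arrays: the first differing position decides the order.
    for (var i = 0; i < Math.min(a.length, b.length); i++) {
      var c = compareKeys(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length; // a shorter prefix sorts first
  }
  return 0; // simplification: same-rank null/booleans/objects compare equal
}

function inRange(key, startkey, endkey) {
  return compareKeys(startkey, key) <= 0 && compareKeys(key, endkey) <= 0;
}

var start = [null, "0808605"];
var end = [{}, "0808605"];
console.log(inRange(["the", "0808605"], start, end));   // true
console.log(inRange(["grant", "9999999"], start, end)); // true: other docs match too
```

By contrast, startkey=["the"] and endkey=["the",{}] pin the first element, so only rows whose word is exactly "the" fall inside the range.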

> the problem with having two views:
> If i had two views, one for [word, doc] => count and [doc, word] => count;
> it would be re-doing the same word counting function twice.
>

Indeed. But as Nitin notes, we do this to ensure incremental
calculations, among other things.

> I'm gonna try to compute the docs word counts and store the results in
> database itself.

This may save you some computation, but I'd be greatly surprised if
having two separate views causes you issues. It may be slower when
bulk loading data, but under a normal production load the extra
computational demand most likely isn't going to affect you.


Paul Davis


Re: multiple key word count query problem

Posted by Nitin Borwankar <ni...@borwankar.com>.
Tommy Chheng wrote:
> so for keys with two or more parameters, only the first parameter can 
> be used for range selection? the 2nd and remaining keys can only be 
> used for grouping/sorting?
>
> the problem with having two views:
> If i had two views, one for [word, doc] => count and [doc, word] => 
> count; it would be re-doing the same word counting function twice.
>
> I'm gonna try to compute the docs word counts and store the results in 
> database itself.

Yes, but the advantage of letting the db do it is that indexes (views) 
are updated incrementally and dynamically whenever a new doc is added.
To get that functionality from your approach you would have to invoke 
the view explicitly via a REST call every time you or someone else added 
a new doc. And then you would have to update all your stored counts or do 
some diffing to find out which ones had changed. If you expect your 
document store to be growing this could create performance issues. 
However, if you have a static data store your approach may be fine.

I suspect the db can do all this more efficiently for you, though. So 
unless you are severely disk space constrained you may want to just have 
the two views.

Nitin Borwankar


( P.S. I see some NSF related text in there - I am also working on an 
NSF funded project and using Couch - I'd be happy to exchange notes off 
line also if you want)


Re: multiple key word count query problem

Posted by Tommy Chheng <to...@gmail.com>.
so for keys with two or more parameters, only the first parameter can  
be used for range selection? the 2nd and remaining keys can only be  
used for grouping/sorting?

the problem with having two views:
If i had two views, one for [word, doc] => count and [doc, word] =>  
count; it would be re-doing the same word counting function twice.

I'm gonna try to compute the docs word counts and store the results in  
database itself.

thanks,
tommy



Re: multiple key word count query problem

Posted by Paul Davis <pa...@gmail.com>.
On Sun, Jul 19, 2009 at 9:14 PM, Tommy Chheng<to...@gmail.com> wrote:
> I have a simple word count view defined as:
> --------
> function(doc) {
>  if(doc['couchrest-type'] == 'NsfGrant'){
>    var words = doc['abstract'].split(/\W+/);
>    words.forEach(function(word){
>      if (word.length > 1) emit([word, doc['_id']],1);
>    });
>  }
> }
>
> function(keys, values, rereduce) {
>  return sum(values);
> }
> --------
> where the key's first parameter is the word and the 2nd parameter is the
> document_id.
>
> so i can do a query like this to get all the documents with the word "the"
> correctly.
> http://localhost:5984/nsf_grants/_design/NsfGrant/_view/by_word_doc_count?startkey=["the"]&endkey=["the",{}]&group_level=2
>
> I'm having trouble doing queries on the 2nd parameter, how can i find all
> the words in a particular document?
> I tried
> http://localhost:5984/nsf_grants/_design/NsfGrant/_view/by_word_doc_count?key=[null,"0808605"]&group_level=2
> which gives nothing(thinking that null would match all words)
> and
> http://localhost:5984/nsf_grants/_design/NsfGrant/_view/by_word_doc_count?startkey=[null,"0808605"]&endkey=[{},"0808605"]&group_level=2
> which gives all results. Why is this?
>
> Thanks,
> Tommy
>

Querying a view is asking for a slice of a sorted list. Start and end
keys delimit the range of rows returned. The solution to your problem
is to create a second view so you can query by docid.

Paul Davis
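
A second view along these lines might look like the sketch below: the same map logic with the key order reversed, plus a small helper that builds the query URL with JSON-encoded, URL-escaped keys. The view name by_doc_word_count, the helper names mapByDoc and viewUrl, the emit() stub, the guard on a missing abstract and the sample abstract are all assumptions for illustration; the reduce function would stay the same sum(values) as in the original view.

```javascript
// Stand-in for CouchDB's emit() so the sketch runs outside the view server.
var emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// Map function for the hypothetical second view: [doc id, word] instead of
// [word, doc id], so a key range can select a single document.
function mapByDoc(doc) {
  // The check on doc['abstract'] is an added guard, not part of the original view.
  if (doc['couchrest-type'] == 'NsfGrant' && doc['abstract']) {
    doc['abstract'].split(/\W+/).forEach(function (word) {
      if (word.length > 1) emit([doc['_id'], word], 1);
    });
  }
}

// Build a view query URL; every parameter value is sent as URL-encoded JSON.
function viewUrl(designUrl, viewName, params) {
  var parts = Object.keys(params).map(function (name) {
    return name + "=" + encodeURIComponent(JSON.stringify(params[name]));
  });
  return designUrl + "/_view/" + viewName + "?" + parts.join("&");
}

// Hypothetical sample document; the abstract text is invented.
mapByDoc({ _id: "0808605", 'couchrest-type': 'NsfGrant', abstract: "the cell" });

// All words in document 0808605, one grouped row per [doc id, word] pair.
var url = viewUrl(
  "http://localhost:5984/nsf_grants/_design/NsfGrant",
  "by_doc_word_count",
  { startkey: ["0808605"], endkey: ["0808605", {}], group_level: 2 }
);
console.log(url);
```

This mirrors the working ["the"] to ["the",{}] query from the original post, with the document id in the leading position instead of the word.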