Posted to user@couchdb.apache.org by Joel Reed <jo...@visn.biz> on 2008/08/26 23:01:56 UTC

using reduce to get SQL distinct like results

I have a bunch of documents that look vaguely like this:

[{ "User":"Jane Doe", "Date":"2008/08/12" }, { "User":"Jane Doe",
"Date":"2008/08/15" }, etc...]

IOW, There might be several entries for each user all with different dates.

I'd like to combine all "Jane Doe's" records into 1 entry. Some kind of 
output like:

"rows": [ {"key": "Jane Doe", "value": [
{"id":"481bf9e8a0c23bb61eeed4b3707bae59","Date":"2008/08/12"},
{"id":"1e6e541b391efcb9c137d7097e7de6ed","Date":"2008/08/15"}
] } ]


I tried a map/reduce function like this:

"all": {
  "map": "function(doc) { emit(doc.User, doc); }",
  "reduce": "function(keys, values) {
     var tree = {};
     for (var i in keys) {
       var id = keys[i][1];
       var key = values[i].User;
       var value = values[i].Date;

       if (!tree.hasOwnProperty(key)) tree[key] = [];
       var len = tree[key].length;
       tree[key][len] = { id: id, Date: value };
     }

     return tree;
   }"
}

And I get OK output, but not as nice:

"rows":[{"key":null,"value":{"Terri 
Hepburn":[{"id":"481bf9e8a0c23bb61eeed4b3707bae59","Date":"Aug 12, 
2008"},{"id":"1e6e541b391efcb9c137d7097e7de6ed","Date":"Aug 13, 
2008"}],"Portuguese 
Nimrod":[{"id":"5beddf3f80d2662f51a46c3f29dd4e9d","Date":"Aug 20, 
2008"},{"id":"1343194e6dd8e1b07ee1e41949c5a3a3","Date":"Aug 20, 
2008"}],"Paraguayans 
Briana":[{"id":"b0d65454d1395f48a47e4f7b09298a17","Date":"Aug 11, 
2008"},{"id":"af3617bdffd0abfc150e0ebb9b2d6968","Date":"Aug 12, 2008"}],
...

A null key, and everything jammed into the first value. Is that the best 
I can do? Is there a better way to output this from a reduce function? 
Googling has not turned up much for me.

jr


Re: using reduce to get SQL distinct like results

Posted by Joel Reed <jo...@visn.biz>.
Joel Reed wrote:
> Chris Anderson wrote:
>> I would avoid using a reduce at all in this case, and just request the
>> key-range corresponding to the particular user directly from the map.
>>   
> OK, but just to be clear, the data set I want to display would be 
> something like
>
> User 1 -- Dates: 11/02/2003, 12/02/2004, 1/02/2005
> User 2 -- Dates: 3/12/2004, 10/02/2005, 1/02/2006
> User 3 -- Dates: 4/12/2003, 6/02/2006, 1/02/2006
>
> etc...
>
> You wouldn't advise me to make multiple calls to the same view for 
> each user would you? Once to get list of all users, then once for 
> every user? I was thinking that maybe in this case I should be using a 
> reduce function. What would you advise?
Btw, I don't know Erlang, but if I can help out by adding info to the 
wiki please let me know. Some guidance on what is worthy of the 
wiki and where best to put it would be helpful.

jr

Re: using reduce to get SQL distinct like results

Posted by Paul Davis <pa...@gmail.com>.
Superficial? Bah! :D

Also, if you want to list your users alphabetically (or in some
other sortable order), you can page through the sorted list using
startkey and endkey.

Paul

On Tue, Aug 26, 2008 at 10:04 PM, Chris Anderson <jc...@grabb.it> wrote:
> On Tue, Aug 26, 2008 at 6:19 PM, Joel Reed <jo...@visn.biz> wrote:
>> You wouldn't advise me to make multiple calls to the same view for each user
>> would you? Once to get list of all users, then once for every user? I was
>> thinking that maybe in this case I should be using a reduce function. What
>> would you advise?
>
> Using a reduce function shouldn't have any impact on the number of
> calls you have to make. But specifically, I think a reduce function
> like the one in this thread won't stand up to a very large dataset.
>
> In your case I would have one view that outputs all the interesting
> users, and then query the dates-per-user view once for each user. If
> you're interested in all users, of course you can just query the whole
> view all at once and avoid the N-queries problem.
>
> In my application I routinely go the N queries route (for N of a few
> hundred) and can still keep response times to within 5 or 6 seconds
> (acceptable, especially with caching).
>
> There is a patch waiting to be applied that allows for multi-key
> queries, which should help (superficially) with the N queries problem.
>
> --
> Chris Anderson
> http://jchris.mfdz.com
>

Re: using reduce to get SQL distinct like results

Posted by Chris Anderson <jc...@grabb.it>.
On Tue, Aug 26, 2008 at 6:19 PM, Joel Reed <jo...@visn.biz> wrote:
> You wouldn't advise me to make multiple calls to the same view for each user
> would you? Once to get list of all users, then once for every user? I was
> thinking that maybe in this case I should be using a reduce function. What
> would you advise?

Using a reduce function shouldn't have any impact on the number of
calls you have to make. But specifically, I think a reduce function
like the one in this thread won't stand up to a very large dataset.

In your case I would have one view that outputs all the interesting
users, and then query the dates-per-user view once for each user. If
you're interested in all users, of course you can just query the whole
view all at once and avoid the N-queries problem.

In my application I routinely go the N queries route (for N of a few
hundred) and can still keep response times to within 5 or 6 seconds
(acceptable, especially with caching).

There is a patch waiting to be applied that allows for multi-key
queries, which should help (superficially) with the N queries problem.

-- 
Chris Anderson
http://jchris.mfdz.com
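
For the per-user and key-range queries discussed above, here is a minimal sketch of how the query string might be built. The helper name viewQuery is made up for illustration, and the URL path to the view is left out because it depends on the CouchDB version and on your database and design document names. The one detail that matters is that key, startkey and endkey values are JSON-encoded, so a string key keeps its double quotes.

```javascript
// Hypothetical helper (not part of CouchDB): build a view query string.
// Each parameter value is JSON-encoded first, then URL-encoded, so the
// string "Jane Doe" is sent as %22Jane%20Doe%22.
function viewQuery(params) {
  var parts = [];
  for (var name in params) {
    if (Object.prototype.hasOwnProperty.call(params, name)) {
      parts.push(
        encodeURIComponent(name) + "=" +
        encodeURIComponent(JSON.stringify(params[name]))
      );
    }
  }
  return parts.join("&");
}

// One query per user (the N-queries route).
var oneUser = viewQuery({ key: "Jane Doe" });

// One query for a sorted range of users, e.g. for paging.
// The bounds "A" and "M" are arbitrary sample values.
var range = viewQuery({ startkey: "A", endkey: "M" });

console.log(oneUser);
console.log(range);
```

Append the result after a "?" on whatever view URL your CouchDB version uses.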

Re: using reduce to get SQL distinct like results

Posted by Joel Reed <jo...@visn.biz>.
Chris Anderson wrote:
> I would avoid using a reduce at all in this case, and just request the
> key-range corresponding to the particular user directly from the map.
>   
OK, but just to be clear, the data set I want to display would be 
something like

User 1 -- Dates: 11/02/2003, 12/02/2004, 1/02/2005
User 2 -- Dates: 3/12/2004, 10/02/2005, 1/02/2006
User 3 -- Dates: 4/12/2003, 6/02/2006, 1/02/2006

etc...

You wouldn't advise me to make multiple calls to the same view for each 
user would you? Once to get list of all users, then once for every user? 
I was thinking that maybe in this case I should be using a reduce 
function. What would you advise?

jr


> You'll get all the information you need, and little extraneous
> information. You can then munge the data into whatever shape you need
> in your application. However, I bet you won't need to do much munging.
>
>  "all": {
>    "map": "function(doc) { emit(doc.User, doc.Date); }"
>  }
>
> Reduce should be reserved for when you want to aggregate a scalar
> value from a set of map rows. E.g., the number of messages a user
> received in a given month, or some such. When you merely want all the
> data from a map, about a given key, just query the map.
>
> Chris
>
>
>
>   


Re: using reduce to get SQL distinct like results

Posted by Chris Anderson <jc...@grabb.it>.
I would avoid using a reduce at all in this case, and just request the
key-range corresponding to the particular user directly from the map.
You'll get all the information you need, and little extraneous
information. You can then munge the data into whatever shape you need
in your application. However, I bet you won't need to do much munging.

 "all": {
   "map": "function(doc) { emit(doc.User, doc.Date); }"
 }

Reduce should be reserved for when you want to aggregate a scalar
value from a set of map rows. E.g., the number of messages a user
received in a given month, or some such. When you merely want all the
data from a map, about a given key, just query the map.

Chris



-- 
Chris Anderson
http://jchris.mfdz.com
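
As a rough illustration of the "munging" step suggested above, here is a sketch that groups the rows of a map-only view by key in the application. It assumes the rows have the shape found in the "rows" array of a view response (id, key, value) and that they arrive sorted by key, as view results do. The function name groupRowsByKey and the third sample row are made up for the example.

```javascript
// Hypothetical helper: collapse consecutive view rows that share a key
// into one entry per key. Assumes string keys and rows sorted by key.
function groupRowsByKey(rows) {
  var groups = [];
  var current = null;
  for (var i = 0; i < rows.length; i++) {
    var row = rows[i];
    // Start a new group whenever the key changes.
    if (current === null || current.key !== row.key) {
      current = { key: row.key, value: [] };
      groups.push(current);
    }
    current.value.push({ id: row.id, Date: row.value });
  }
  return groups;
}

// Sample rows as emitted by: function(doc) { emit(doc.User, doc.Date); }
var sampleRows = [
  { id: "481bf9e8a0c23bb61eeed4b3707bae59", key: "Jane Doe", value: "2008/08/12" },
  { id: "1e6e541b391efcb9c137d7097e7de6ed", key: "Jane Doe", value: "2008/08/15" },
  { id: "made-up-id", key: "John Roe", value: "2008/08/20" }
];

var grouped = groupRowsByKey(sampleRows);
console.log(JSON.stringify(grouped, null, 2));
```

Because the grouping happens in the client, the view index stores only the small (User, Date) rows and no reduce function is involved.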

Re: using reduce to get SQL distinct like results

Posted by Joel Reed <jo...@visn.biz>.
Michael Hendricks wrote:
> On Tue, Aug 26, 2008 at 05:01:56PM -0400, Joel Reed wrote:
>   
>> I have a bunch of documents that look vaguely like this:
>>
>> [{ "User":"Jane Doe", "Date":"2008/08/12" }, { "User":"Jane Doe",
>> "Date":"2008/08/15" }, etc...]
>>
>> IOW, There might be several entries for each user all with different dates.
>>
>> I'd like to combine all "Jane Doe's" records into 1 entry. Some kind of 
>> output like:
>>
>> "rows": [ {"key": "Jane Doe", "value": [
>> {"id":"481bf9e8a0c23bb61eeed4b3707bae59","Date":"2008/08/12"},
>> {"id":"1e6e541b391efcb9c137d7097e7de6ed","Date":"2008/08/15"}
>> ] } ]
>>     
>
> How about something like this:
>
>  "all": {
>    "map": "function(doc) { emit(doc.User, doc.Date); }",
>    "reduce": "function(keys, values) {
>         var results = [];
>         for (var i = 0; i < values.length; i++) {
>             results.push({
>                 id   : keys[i][1],
>                 Date : values[i]
>             });
>         }
>
>         return results;
>     }"
>  }

Thanks Michael, I'll give this a try. The results look a lot nicer.

jr

Re: using reduce to get SQL distinct like results

Posted by Michael Hendricks <mi...@ndrix.org>.
On Tue, Aug 26, 2008 at 05:01:56PM -0400, Joel Reed wrote:
> I have a bunch of documents that look vaguely like this:
>
> [{ "User":"Jane Doe", "Date":"2008/08/12" }, { "User":"Jane Doe",
> "Date":"2008/08/15" }, etc...]
>
> IOW, There might be several entries for each user all with different dates.
>
> I'd like to combine all "Jane Doe's" records into 1 entry. Some kind of 
> output like:
>
> "rows": [ {"key": "Jane Doe", "value": [
> {"id":"481bf9e8a0c23bb61eeed4b3707bae59","Date":"2008/08/12"},
> {"id":"1e6e541b391efcb9c137d7097e7de6ed","Date":"2008/08/15"}
> ] } ]

How about something like this:

 "all": {
   "map": "function(doc) { emit(doc.User, doc.Date); }",
   "reduce": "function(keys, values) {
        var results = [];
        for (var i = 0; i < values.length; i++) {
            results.push({
                id   : keys[i][1],
                Date : values[i]
            });
        }

        return results;
    }"
 }

When I fetch the view with group=true, that gives me output like:

{"rows":[{"key":"Jane Doe","value":[
{"id":"3f35758bf54e24dcc14ddccbe9a68412","Date":"2008\/08\/12"},
{"id":"17c69a1298f0218f68e94c9912be160f","Date":"2008\/08\/15"}
]}]}

-- 
Michael
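
One caveat with a reduce like the one above: CouchDB can call a reduce function again on its own earlier outputs (a "rereduce"), in which case keys is null and a third argument is true. A function that reads keys[i] would presumably fail on that pass. Below is a sketch of a variant that handles both passes, assuming a CouchDB version that passes the three-argument (keys, values, rereduce) signature. The name collectDates and the ids are made up, and the calls at the bottom only simulate locally what the server would do.

```javascript
// Sketch of a rereduce-aware reduce function.
// First pass: keys is an array of [key, docid] pairs, values holds the
// emitted values. Rereduce pass: keys is null and values holds the
// arrays returned by earlier calls, which are simply concatenated.
function collectDates(keys, values, rereduce) {
  var results = [];
  var i;
  if (rereduce) {
    for (i = 0; i < values.length; i++) {
      results = results.concat(values[i]);
    }
    return results;
  }
  for (i = 0; i < values.length; i++) {
    results.push({ id: keys[i][1], Date: values[i] });
  }
  return results;
}

// Local simulation: two first-pass calls, then one rereduce over them.
var partA = collectDates([["Jane Doe", "id-1"]], ["2008/08/12"], false);
var partB = collectDates([["Jane Doe", "id-2"]], ["2008/08/15"], false);
var merged = collectDates(null, [partA, partB], true);
console.log(JSON.stringify(merged));
```

This only addresses the rereduce pass. The returned array still grows with the number of documents per user, which is the scaling concern raised elsewhere in this thread.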