Posted to user@couchdb.apache.org by Mike Bannister <mi...@gmail.com> on 2010/11/11 21:59:56 UTC

First doc in a group?

Hi all, I'm a new user and have made some strides in understanding The
CouchDB Way. I've come to a piece of my project that I'm unsure how to
handle, and I would greatly appreciate any nudging in the right direction
that people can offer.

I have a set of documents that are related to users in my system and each
document has a timestamp. What's the CouchDB way of getting the latest
document for each user? Right now I'm going through each user and then
selecting their latest doc using limit/start_key parameters to query the
view. Is there a crafty map function solution to this problem or should I be
thinking about a reduce function to somehow get rid of the docs I don't
want?

Thanks, I think understanding this will be very helpful to me generally.

Mike

Re: First doc in a group?

Posted by Robert Newson <ro...@gmail.com>.
No, it's not a summary in the sense of map/reduce. "sum", "max" and
"count" are examples of reasonable reduce functions: specifically,
functions that are commutative and associative, and that reduce a
variable-sized input to a fixed-size output.

What you want is to find the last document matching certain criteria
from a sorted list, and that doesn't need a reduce function, just the
right view query.

B.

On Thu, Nov 11, 2010 at 10:21 PM, Mike Bannister
<mi...@gmail.com> wrote:
> Cory, cool thanks. Wasn't able to decide on my own if reducing was OK for
> this kind of thing.
>
> Robert, but I need one document for each user, wouldn't that be a summary of
> sorts?
>
> -Mike
>
>
>  On Nov 11, 2010 4:46 PM, "Cory Zue" <cz...@dimagi.com> wrote:
>
> You could emit the users as keys, and in your reduce function just
> return the latest by date.
>
>
> On Thu, Nov 11, 2010 at 4:35 PM, Mike Bannister <mi...@gmail.com>
> wrote:
>> Yeah, I'm trying...
>

Re: First doc in a group?

Posted by Mike Bannister <mi...@gmail.com>.
What I failed to make explicit initially is that I'm already doing what
Robert and Matthew are suggesting, but that seems to require having a
user list up front. I'd also like to get the latest doc for each user in
one query rather than one query per user. Thanks everyone. -Mike

On Thu, Nov 11, 2010 at 5:21 PM, Mike Bannister <mi...@gmail.com> wrote:

> Cory, cool thanks. Wasn't able to decide on my own if reducing was OK for
> this kind of thing.
>
> Robert, but I need one document for each user, wouldn't that be a summary
> of sorts?
>
> -Mike
>
>
>  On Nov 11, 2010 4:46 PM, "Cory Zue" <cz...@dimagi.com> wrote:
>
> You could emit the users as keys, and in your reduce function just
> return the latest by date.
>
>
> On Thu, Nov 11, 2010 at 4:35 PM, Mike Bannister <mi...@gmail.com>
> wrote:
> > Yeah, I'm trying...
>
>

Re: First doc in a group?

Posted by Mike Bannister <mi...@gmail.com>.
Thanks to everyone, the discussion has been very illustrative to me and
helped deepen my understanding of Couch.

On Thu, Nov 11, 2010 at 8:06 PM, Chad George <ch...@mgproducts.com> wrote:

> After a little reflection, maybe my original response was a little too
> strong. You might need to use a reduce if you want the latest document for a
> whole set of users in a single call.
>
> If you only wanted the latest document for a single user, then the view+key
> ranges+limit=1  is definitely the best approach.
>
> Maybe Cory's idea is the best approach. A reduce function that calculates
> the latest document (maximum timestamp) and also use group=True or
> group_level=1 (depending on the final view key structure used)
>
> -Chad
>

Re: First doc in a group?

Posted by Chad George <ch...@mgproducts.com>.
After a little reflection, maybe my original response was a little too
strong. You might need to use a reduce if you want the latest document for a
whole set of users in a single call.

If you only wanted the latest document for a single user, then the view +
key range + limit=1 is definitely the best approach.

Maybe Cory's idea is the best approach: a reduce function that picks the
latest document (maximum timestamp), queried with group=true or
group_level=1 (depending on the final view key structure used).
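
A rough sketch of what that could look like, not a tested design doc: the
map emits the user as the key and the reduce keeps whichever value has the
larger timestamp. In a real view, emit is a global provided by CouchDB; here
it is passed in as an argument, and queryGrouped is a hypothetical in-memory
stand-in for querying with group=true, so the sketch can run on its own. It
assumes timestamps are numbers (or otherwise compare correctly with >).

```javascript
// Map: key is the user; value carries just enough to find the doc later.
// (emit is passed in here only so the sketch is self-contained.)
function map(doc, emit) {
  emit(doc.user, { id: doc._id, timestamp: doc.timestamp });
}

// Reduce: keep the value with the largest timestamp. The output has the
// same shape as the input values, so the same code also works on rereduce.
function reduce(keys, values, rereduce) {
  var latest = values[0];
  for (var i = 1; i < values.length; i++) {
    if (values[i].timestamp > latest.timestamp) latest = values[i];
  }
  return latest;
}

// Hypothetical stand-in for a group=true query: one reduced row per key.
function queryGrouped(docs) {
  var byUser = new Map();
  docs.forEach(function (doc) {
    map(doc, function (key, value) {
      if (!byUser.has(key)) byUser.set(key, []);
      byUser.get(key).push(value);
    });
  });
  return Array.from(byUser).map(function (entry) {
    return { key: entry[0], value: reduce(null, entry[1], false) };
  });
}
```

Returning only the doc id and timestamp, rather than the whole document,
keeps the reduce output small; the full documents can then be fetched by id.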

-Chad

On Thu, Nov 11, 2010 at 6:29 PM, Mike Bannister <mi...@gmail.com> wrote:

> Cool, I understand now reduce isn't right, initially posted to the list
> because I couldn't decide that on my own.
>
> So what's the most efficient way to get the latest document for each user?
> Seems like I shouldn't have to do one query per user but I'm open minded (:
>
> -Mike

Re: First doc in a group?

Posted by Mike Bannister <mi...@gmail.com>.
Cool, I understand now reduce isn't right, initially posted to the list
because I couldn't decide that on my own.

So what's the most efficient way to get the latest document for each user?
Seems like I shouldn't have to do one query per user but I'm open minded (:

-Mike



On Thu, Nov 11, 2010 at 5:52 PM, Chad George <ch...@mgproducts.com> wrote:

> Selecting one row in a view out of many possible isn't what reduce is for.
>
> I try not to think of it as reducing a set of view results to a smaller set
> but rather reducing each and every entry in the view to something smaller.
>
> The fact that reduce gets multiple view rows to work on at once is just an
> optimization. I think its better to think of reduce as working on exactly
> one view row at a time then rereduce the result to get final answer.
> On Nov 11, 2010 5:22 PM, "Mike Bannister" <mi...@gmail.com> wrote:
> > Cory, cool thanks. Wasn't able to decide on my own if reducing was OK for
> > this kind of thing.
> >
> > Robert, but I need one document for each user, wouldn't that be a summary of
> > sorts?
> >
> > -Mike
> >
> >
> > On Nov 11, 2010 4:46 PM, "Cory Zue" <cz...@dimagi.com> wrote:
> >
> > You could emit the users as keys, and in your reduce function just
> > return the latest by date.
> >
> >
> > On Thu, Nov 11, 2010 at 4:35 PM, Mike Bannister <mikebannister@gmail.com>
> > wrote:
> >> Yeah, I'm trying...
>

Re: First doc in a group?

Posted by Chad George <ch...@mgproducts.com>.
Selecting one row in a view out of many possible isn't what reduce is for.

I try not to think of it as reducing a set of view results to a smaller set
but rather reducing each and every entry in the view to something smaller.

The fact that reduce gets multiple view rows to work on at once is just an
optimization. I think it's better to think of reduce as working on exactly
one view row at a time then rereduce the result to get final answer.
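
To make that mental model concrete, here is a small sketch of a row-counting
reduce. The (keys, values, rereduce) signature is the one CouchDB passes to a
JavaScript reduce function; runReduce is a hypothetical driver, not part of
CouchDB, that reduces the rows in chunks and then rereduces the partial
results. With a chunk size of 1 it is exactly "one row at a time".

```javascript
// Count rows. On the first pass `values` holds emitted values; on
// rereduce it holds earlier outputs of this same function.
function reduce(keys, values, rereduce) {
  if (!rereduce) return values.length;
  var total = 0;
  for (var i = 0; i < values.length; i++) total += values[i];
  return total;
}

// Hypothetical driver: reduce fixed-size chunks, then rereduce the partials.
function runReduce(values, chunkSize) {
  var partials = [];
  for (var i = 0; i < values.length; i += chunkSize) {
    partials.push(reduce(null, values.slice(i, i + chunkSize), false));
  }
  return reduce(null, partials, true);
}
```

The final answer should not depend on how the rows were chunked, which is why
the reduce has to be commutative and associative.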
On Nov 11, 2010 5:22 PM, "Mike Bannister" <mi...@gmail.com> wrote:
> Cory, cool thanks. Wasn't able to decide on my own if reducing was OK for
> this kind of thing.
>
> Robert, but I need one document for each user, wouldn't that be a summary of
> sorts?
>
> -Mike
>
>
> On Nov 11, 2010 4:46 PM, "Cory Zue" <cz...@dimagi.com> wrote:
>
> You could emit the users as keys, and in your reduce function just
> return the latest by date.
>
>
> On Thu, Nov 11, 2010 at 4:35 PM, Mike Bannister <mi...@gmail.com>
> wrote:
>> Yeah, I'm trying...

Re: First doc in a group?

Posted by Mike Bannister <mi...@gmail.com>.
Cory, cool thanks. Wasn't able to decide on my own if reducing was OK for
this kind of thing.

Robert, but I need one document for each user, wouldn't that be a summary of
sorts?

-Mike


 On Nov 11, 2010 4:46 PM, "Cory Zue" <cz...@dimagi.com> wrote:

You could emit the users as keys, and in your reduce function just
return the latest by date.


On Thu, Nov 11, 2010 at 4:35 PM, Mike Bannister <mi...@gmail.com>
wrote:
> Yeah, I'm trying...

Re: First doc in a group?

Posted by Robert Newson <ro...@gmail.com>.
The purpose of reduce is to summarize or aggregate.

Matthew's suggestion is correct. Emit keys such that your documents sort
by date, then use descending=true to get the latest item of interest.
Use limit=1 to ensure you get one result.

To expand on that more fully:

map: function(doc) { emit([doc.user, doc.timestamp], null); }

then query with:

startkey=["user I want",{}]
endkey=["user I want"]
descending=true
limit=1
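
As a sketch of why this works: rows sort by user and then by timestamp, and
{} collates after any number or string, so with descending=true the range
from [user, {}] down to [user] starts at that user's newest row. The code
below is an in-memory imitation, not CouchDB itself; the sample docs, the
simplified sort, and the helper names (buildRows, latestRowFor, latestQuery)
are all assumptions made for illustration, and timestamps are assumed to be
numbers.

```javascript
// Hypothetical sample data.
var docs = [
  { _id: "a", user: "alice", timestamp: 100 },
  { _id: "b", user: "alice", timestamp: 300 },
  { _id: "c", user: "bob", timestamp: 200 }
];

// Same key shape as the map above. (In CouchDB emit is a global; it is
// passed in here only so the sketch is self-contained.)
function map(doc, emit) {
  emit([doc.user, doc.timestamp], null);
}

// Build rows and sort ascending by user, then timestamp: a simplified
// imitation of how the view orders [user, timestamp] keys.
function buildRows(allDocs) {
  var rows = [];
  allDocs.forEach(function (doc) {
    map(doc, function (key, value) {
      rows.push({ id: doc._id, key: key, value: value });
    });
  });
  return rows.sort(function (x, y) {
    if (x.key[0] < y.key[0]) return -1;
    if (x.key[0] > y.key[0]) return 1;
    return x.key[1] - y.key[1];
  });
}

// Imitates descending=true with limit=1 inside one user's key range:
// walk from the end and return the first row belonging to that user.
function latestRowFor(user, rows) {
  for (var i = rows.length - 1; i >= 0; i--) {
    if (rows[i].key[0] === user) return rows[i];
  }
  return null;
}

// Query string for the real view: keys are JSON, then URL-encoded.
function latestQuery(user) {
  return [
    "startkey=" + encodeURIComponent(JSON.stringify([user, {}])),
    "endkey=" + encodeURIComponent(JSON.stringify([user])),
    "descending=true",
    "limit=1"
  ].join("&");
}
```

Here latestRowFor("alice", buildRows(docs)) picks the row for doc "b", and
latestQuery("alice") gives the parameters to append to the view URL.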

B.

On Thu, Nov 11, 2010 at 9:45 PM, Cory Zue <cz...@dimagi.com> wrote:
> You could emit the users as keys, and in your reduce function just
> return the latest by date.
>
> On Thu, Nov 11, 2010 at 4:35 PM, Mike Bannister <mi...@gmail.com> wrote:
>> Yeah, I'm trying to get the latest document for each user in one query
>> rather than one query per user.
>>
>> On Thu, Nov 11, 2010 at 4:11 PM, Matthew Woodward <ma...@mattwoodward.com> wrote:
>>
>>> On Thu, Nov 11, 2010 at 12:59 PM, Mike Bannister <mikebannister@gmail.com>
>>> wrote:
>>>
>>> > I have a set of documents that are related to users in my system and each
>>> > document has a timestamp. What's the CouchDB way of getting the latest
>>> > document for each user?
>>>
>>>
>>> I'm pretty new to CouchDB as well so take this for what it's worth--in one
>>> of our databases we needed to get the most recent document based on a
>>> timestamp, so we just have a view called "byDate" with the key being the
>>> timestamp, and then we just call the view like this:
>>>
>>> http://server/database/_design/designname/_view/byDate?descending=true&limit=1
>>>
>>> Not sure that's quite what you're after and might not address the user part
>>> of what you're doing, but hope that helps a bit.
>>>
>>> --
>>> Matthew Woodward
>>> matt@mattwoodward.com
>>> http://blog.mattwoodward.com
>>> identi.ca / Twitter: @mpwoodward
>>>
>>> Please do not send me proprietary file formats such as Word, PowerPoint,
>>> etc. as attachments.
>>> http://www.gnu.org/philosophy/no-word-attachments.html
>>>
>>
>

Re: First doc in a group?

Posted by Cory Zue <cz...@dimagi.com>.
You could emit the users as keys, and in your reduce function just
return the latest by date.

On Thu, Nov 11, 2010 at 4:35 PM, Mike Bannister <mi...@gmail.com> wrote:
> Yeah, I'm trying to get the latest document for each user in one query
> rather than one query per user.
>
> On Thu, Nov 11, 2010 at 4:11 PM, Matthew Woodward <ma...@mattwoodward.com> wrote:
>
>> On Thu, Nov 11, 2010 at 12:59 PM, Mike Bannister <mikebannister@gmail.com>
>> wrote:
>>
>> > I have a set of documents that are related to users in my system and each
>> > document has a timestamp. What's the CouchDB way of getting the latest
>> > document for each user?
>>
>>
>> I'm pretty new to CouchDB as well so take this for what it's worth--in one
>> of our databases we needed to get the most recent document based on a
>> timestamp, so we just have a view called "byDate" with the key being the
>> timestamp, and then we just call the view like this:
>>
>> http://server/database/_design/designname/_view/byDate?descending=true&limit=1
>>
>> Not sure that's quite what you're after and might not address the user part
>> of what you're doing, but hope that helps a bit.
>>
>> --
>> Matthew Woodward
>> matt@mattwoodward.com
>> http://blog.mattwoodward.com
>> identi.ca / Twitter: @mpwoodward
>>
>> Please do not send me proprietary file formats such as Word, PowerPoint,
>> etc. as attachments.
>> http://www.gnu.org/philosophy/no-word-attachments.html
>>
>

Re: First doc in a group?

Posted by Mike Bannister <mi...@gmail.com>.
Yeah, I'm trying to get the latest document for each user in one query
rather than one query per user.

On Thu, Nov 11, 2010 at 4:11 PM, Matthew Woodward <ma...@mattwoodward.com> wrote:

> On Thu, Nov 11, 2010 at 12:59 PM, Mike Bannister <mikebannister@gmail.com>
> wrote:
>
> > I have a set of documents that are related to users in my system and each
> > document has a timestamp. What's the CouchDB way of getting the latest
> > document for each user?
>
>
> I'm pretty new to CouchDB as well so take this for what it's worth--in one
> of our databases we needed to get the most recent document based on a
> timestamp, so we just have a view called "byDate" with the key being the
> timestamp, and then we just call the view like this:
>
> http://server/database/_design/designname/_view/byDate?descending=true&limit=1
>
> Not sure that's quite what you're after and might not address the user part
> of what you're doing, but hope that helps a bit.
>
> --
> Matthew Woodward
> matt@mattwoodward.com
> http://blog.mattwoodward.com
> identi.ca / Twitter: @mpwoodward
>
> Please do not send me proprietary file formats such as Word, PowerPoint,
> etc. as attachments.
> http://www.gnu.org/philosophy/no-word-attachments.html
>

Re: First doc in a group?

Posted by Matthew Woodward <ma...@mattwoodward.com>.
On Thu, Nov 11, 2010 at 12:59 PM, Mike Bannister <mi...@gmail.com> wrote:

> I have a set of documents that are related to users in my system and each
> document has a timestamp. What's the CouchDB way of getting the latest
> document for each user?


I'm pretty new to CouchDB as well so take this for what it's worth--in one
of our databases we needed to get the most recent document based on a
timestamp, so we just have a view called "byDate" with the key being the
timestamp, and then we just call the view like this:
http://server/database/_design/designname/_view/byDate?descending=true&limit=1

Not sure that's quite what you're after and might not address the user part
of what you're doing, but hope that helps a bit.

-- 
Matthew Woodward
matt@mattwoodward.com
http://blog.mattwoodward.com
identi.ca / Twitter: @mpwoodward

Please do not send me proprietary file formats such as Word, PowerPoint,
etc. as attachments.
http://www.gnu.org/philosophy/no-word-attachments.html