Posted to user@couchdb.apache.org by He Shiming <he...@gmail.com> on 2011/04/07 07:43:34 UTC

Grouping, with order, and aggregate

Dear Community,

I'm trying to figure out how to write map/reduce for these two scenarios:

SELECT DISTINCT author FROM books ORDER BY releasedate;
SELECT COUNT(DISTINCT author) FROM books;

I consulted http://guide.couchdb.org/draft/cookbook.html , and the
first query looked like this (with group=true&limit=100):
map: function(doc) {
  if (doc.type == 'book' && doc.author)
    emit(doc.author, 1);
}
reduce: function(keys, values) {
  return sum(values);
}

I couldn't figure out how to sort by date based on this map function.
I would like to provide a list of authors, with the most recently
released books first. Pagination would be a plus if possible.

And because I specified "group=true", I'm not getting the total row
count (the number of distinct authors, not their books), since the
reduce function runs per group. The only solution I can think of is to
emit all authors of books (without grouping), then manually count the
distinct ones in reduce with two "for" loops. I'm imagining a fairly
large data set, so this could be slow and use a lot of memory. What are
the other options?

Thanks!
-- 
Best regards,
He Shiming

Re: Grouping, with order, and aggregate

Posted by Nebu Pookins <ne...@gmail.com>.
On Thu, Apr 7, 2011 at 1:43 AM, He Shiming <he...@gmail.com> wrote:
> Dear Community,
>
> I'm trying to figure out how to write map/reduce for these two scenarios:
>
> SELECT DISTINCT author FROM books ORDER BY releasedate;
> SELECT COUNT(DISTINCT author) FROM books;
>
> I consulted http://guide.couchdb.org/draft/cookbook.html , and the
> first query looked like this (with group=true&limit=100):
> map: function(doc) {
>  if (doc.type == 'book' && doc.author)
>    emit(doc.author, 1);
> }
> reduce: function(keys, values) {
>  return sum(values);
> }
>
> I couldn't figure out how to sort by date based on this map function.
> I would like to provide a list of authors, with the most recently
> released books first. Pagination would be a plus if possible.

Given that the same author may have written multiple books, you need
to define what the semantics are for "reducing" multiple release
dates. It sounds like for each author, you want the date of their most
recently released book, so your reduce function should probably look
at all of the release dates for a given author and return the most
recent one (in addition to the summing logic you've already got in
place to count how many books an author has written).
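
As a rough sketch of that idea (not a tested design document), the map
could emit an object holding both a count and the release date, and the
reduce could merge those objects. The field name "releasedate" and its
ISO 8601 string format (so that string comparison matches date order)
are assumptions, as are the names mapBooks, reduceBooks and
queryGrouped; the emit() stub and queryGrouped() only imitate
group=true locally so the logic can be tried outside CouchDB.

```javascript
// Local stand-in for CouchDB's emit(); collects rows for the sketch below.
var rows = [];
function emit(key, value) { rows.push({ key: key, value: value }); }

// Map: one row per book, keyed by author.
// Assumes doc.releasedate is an ISO 8601 string such as '2011-04-07'.
function mapBooks(doc) {
  if (doc.type == 'book' && doc.author && doc.releasedate)
    emit(doc.author, { count: 1, latest: doc.releasedate });
}

// Reduce: sum the counts and keep the most recent date.
// Mapped values and partial results share the shape {count, latest},
// so the same loop serves both the reduce and the rereduce pass.
function reduceBooks(keys, values, rereduce) {
  var result = { count: 0, latest: '' };
  for (var i = 0; i < values.length; i++) {
    result.count += values[i].count;
    if (values[i].latest > result.latest) result.latest = values[i].latest;
  }
  return result;
}

// Hypothetical helper imitating a group=true query over an array of docs.
function queryGrouped(docs) {
  rows = [];
  docs.forEach(function (doc) { mapBooks(doc); });
  var byAuthor = {};
  rows.forEach(function (row) {
    (byAuthor[row.key] = byAuthor[row.key] || []).push(row.value);
  });
  return Object.keys(byAuthor).map(function (author) {
    return { key: author, value: reduceBooks(null, byAuthor[author], false) };
  });
}

// Comparator for sorting grouped rows by most recent release, newest first.
function byLatestDesc(a, b) {
  if (a.value.latest === b.value.latest) return 0;
  return a.value.latest < b.value.latest ? 1 : -1;
}
```

Note that the grouped rows still come back ordered by key (author),
not by the reduced value, so with this approach the ordering by date
would have to happen on the client, for example with
queryGrouped(docs).sort(byLatestDesc).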

- Nebu