Posted to user@couchdb.apache.org by Simon Wex <si...@simonwex.com> on 2008/12/24 02:53:51 UTC

Basic re-understanding of reduce.

I have been working on a few projects with couchdb, some of which are
in production so it came as a complete surprise today when I realized
I don't understand CouchDB's reduce implementation.

I have a map that emits a complex key and a count of 1 for each
document. From that I end up with a data set like so:

{"id":"89c285e3-109f-4ce8-9a11-f5219c6a0b36","key":[341235,"auctioned"],"value":1},
{"id":"ab37e07a-d19e-4ddb-b0ab-7da8b5e24a03","key":[341235,"auctioned"],"value":1},
{"id":"b5cf63e0-1892-41df-833b-84468365a08a","key":[341235,"auctioned"],"value":1},
{"id":"89c285e3-109f-4ce8-9a11-f5219c6a0b36","key":[341235,"teasered","125"],"value":1},
{"id":"3341b67a-c789-492e-a9e7-1a3a99540c59","key":[341235,"teasered","127"],"value":1},
{"id":"7e6d6077-619b-4cec-8cc2-d2e8779cdc15","key":[341235,"teasered","127"],"value":1}...

Now, if the above were the only map output, what I want from my reduce
is a set like so:

{"key":[341235,"auctioned"],"value":3},
{"key":[341235,"teasered","125"],"value":1},
{"key":[341235,"teasered","127"],"value":2}...

I thought my reduce statement would have to look simply like so:

function(keys, values) {
  return sum(values);
}

Much to my surprise, when I didn't get the numbers expected, I did
some logging from my reduce function:

{
  "keys": [
    [[341263,"teasered","125"],"e13d8135-844e-4fec-bf3a-ed189359b30c"],
    [[341263,"teasered","127"],"ae447f40-51cd-406e-8a65-f5a7f163d20e"], ...
  ],
  "values": [1,1,...],
  "rereduce": false
}

I thought that the reduce function was only called with results of the
same key. Essentially I think I was assuming that the functionality of
reduce when rereduce is true was always the case. Can someone help me
understand why a reduce function would get multiple keys for a single
reduce? Also, what should my reduce then look like?

Thanks for the help, Simon.

Re: Basic re-understanding of reduce.

Posted by Chris Anderson <jc...@gmail.com>.
On Tue, Dec 23, 2008 at 7:25 PM, Simon Wex <si...@simonwex.com> wrote:
>
> So this is just the system reducing to enable group_level=3|2|1|0 (since
> my key is max 3 elements)?  That does explain what I'm seeing in the
> logs. Very clever.
>

Reduce is designed to allow you to retrieve the reduction value from
arbitrary key ranges. group_level queries are just a way of
calculating a set of key ranges (e.g. all those keys which share a common
2 element prefix, etc.) You can query a reduce view with any start or
end key you'd like. Group level is just a way to ask CouchDB to make a
series of those queries for you. The cleverness is in being able to
reuse intermediate reduce values to calculate the reduction for novel
key ranges. That is, if part of a reduce has been calculated for a
range, then that partial calculation can be reused when computing the
reduction value for the completed range.
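
A minimal sketch of the "group_level is just a set of key ranges" idea,
runnable outside CouchDB. It is an illustration under assumptions, not
CouchDB's implementation: groupedReduce is a hypothetical helper, sum is a
local stand-in for the helper the view server provides, and the rows are
the map output from the original post with the doc ids left out.

```javascript
// Stand-in for the sum() helper that CouchDB's JavaScript view server provides.
function sum(values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// The reduce function from the original post, unchanged.
function reduceFn(keys, values, rereduce) {
  return sum(values);
}

// Map output from the original post (doc ids omitted for brevity).
var rows = [
  { key: [341235, "auctioned"], value: 1 },
  { key: [341235, "auctioned"], value: 1 },
  { key: [341235, "auctioned"], value: 1 },
  { key: [341235, "teasered", "125"], value: 1 },
  { key: [341235, "teasered", "127"], value: 1 },
  { key: [341235, "teasered", "127"], value: 1 }
];

// Hypothetical helper: mimic a grouped query by cutting the sorted rows into
// runs that share the same key prefix, then reducing each run on its own.
// level = Infinity groups on the full key; level = 0 reduces everything.
function groupedReduce(sortedRows, level) {
  var groups = [];
  var current = null;
  sortedRows.forEach(function (row) {
    var prefix = level === 0 ? null : row.key.slice(0, level);
    var tag = JSON.stringify(prefix);
    if (current === null || current.tag !== tag) {
      current = { tag: tag, key: prefix, keys: [], values: [] };
      groups.push(current);
    }
    current.keys.push(row.key);
    current.values.push(row.value);
  });
  return groups.map(function (g) {
    return { key: g.key, value: reduceFn(g.keys, g.values, false) };
  });
}

console.log(JSON.stringify(groupedReduce(rows, Infinity)));
console.log(JSON.stringify(groupedReduce(rows, 2)));
```

Grouping on the full key yields the three rows asked for in the original
post (3, 1 and 2). Grouping at level 2 folds both "teasered" keys into one
row with value 3, and level 0 gives a single total of 6. The reduce
function is the same in every case; only the key ranges change.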

There are some figures that may clear the rereduce parts up here:
http://horicky.blogspot.com/2008/10/couchdb-implementation.html

-- 
Chris Anderson
http://jchris.mfdz.com

Re: Basic re-understanding of reduce.

Posted by Mark Gallop <ma...@gmail.com>.
Hi Jason,

Jason Davies wrote:
> I've created a ticket for this known issue here: 
> https://issues.apache.org/jira/browse/COUCHDB-183
>
> Jason

Very handy patch. Thanks for sharing.

Mark




Re: Basic re-understanding of reduce.

Posted by Jason Davies <ja...@jasondavies.com>.
I've created a ticket for this known issue here: https://issues.apache.org/jira/browse/COUCHDB-183

Jason

On 24 Dec 2008, at 15:36, Paul Davis wrote:

> This is the expected behavior. Reduce is unable to calculate the
> number of rows efficiently and Futon doesn't try to do anything
> heroic.
>
> On Wed, Dec 24, 2008 at 2:26 AM, Mark Gallop <ma...@gmail.com> wrote:
>> Hi Simon,
>>
>> Simon Wex wrote:
>>>
>>> The interesting thing is that if I query the reduce in futon, I see
>>> only 10 rows (of 10) and the maximum value is 15. Though if I hit the
>>> view up with a simple GET request, I see what I expected.  This may be
>>> a firefox caching issue, I'll dig a bit deeper and report anything of
>>> interest.
>>
>> I also get the same futon behavior when I use reduce. The number of rows
>> displayed is either 10 (default) or the last value I set for "Rows per
>> page".  Try going to "All Documents", then change the "Rows per page" to 100
>> and then return to your view with a map+reduce. Do you now see a maximum of
>> 100?
>>
>> I am getting this with Camino and Safari on a Mac.
>>
>> Cheers,
>> Mark
>>


Re: Basic re-understanding of reduce.

Posted by Paul Davis <pa...@gmail.com>.
This is the expected behavior. Reduce is unable to calculate the
number of rows efficiently and Futon doesn't try to do anything
heroic.

On Wed, Dec 24, 2008 at 2:26 AM, Mark Gallop <ma...@gmail.com> wrote:
> Hi Simon,
>
> Simon Wex wrote:
>>
>> The interesting thing is that if I query the reduce in futon, I see
>> only 10 rows (of 10) and the maximum value is 15. Though if I hit the
>> view up with a simple GET request, I see what I expected.  This may be
>> a firefox caching issue, I'll dig a bit deeper and report anything of
>> interest.
>
> I also get the same futon behavior when I use reduce. The number of rows
> displayed is either 10 (default) or the last value I set for "Rows per
> page".  Try going to "All Documents", then change the "Rows per page" to 100
> and then return to your view with a map+reduce. Do you now see a maximum of
> 100?
>
> I am getting this with Camino and Safari on a Mac.
>
> Cheers,
> Mark
>

Re: Basic re-understanding of reduce.

Posted by Mark Gallop <ma...@gmail.com>.
Hi Simon,

Simon Wex wrote:
> The interesting thing is that if I query the reduce in futon, I see
> only 10 rows (of 10) and the maximum value is 15. Though if I hit the
> view up with a simple GET request, I see what I expected.  This may be
> a firefox caching issue, I'll dig a bit deeper and report anything of
> interest.
I also get the same futon behavior when I use reduce. The number of rows 
displayed is either 10 (default) or the last value I set for "Rows per 
page".  Try going to "All Documents", then change the "Rows per page" to 
100 and then return to your view with a map+reduce. Do you now see a 
maximum of 100?

I am getting this with Camino and Safari on a Mac.

Cheers,
Mark

Re: Basic re-understanding of reduce.

Posted by Simon Wex <si...@simonwex.com>.
Damien,

So this is just the system reducing to enable group_level=3|2|1|0 (since
my key is max 3 elements)?  That does explain what I'm seeing in the
logs. Very clever.

The interesting thing is that if I query the reduce in futon, I see
only 10 rows (of 10) and the maximum value is 15. Though if I hit the
view up with a simple GET request, I see what I expected.  This may be
a firefox caching issue, I'll dig a bit deeper and report anything of
interest.

Thanks for the speedy reply.

-Simon

On Tue, Dec 23, 2008 at 6:14 PM, Damien Katz <da...@apache.org> wrote:
> You are using the reduce properly.  CouchDb reduces everything regardless of
> the keys. So what you are seeing is couchdb doing full reductions of
> modified nodes during index updates. This gives you the option of reducing
> across arbitrary key ranges.
>
> -Damien
>
> On Dec 23, 2008, at 8:53 PM, Simon Wex wrote:
>
>> I have been working on a few projects with couchdb, some of which are
>> in production so it came as a complete surprise today when I realized
>> I don't understand CouchDB's reduce implementation.
>>
>> I have a map that emits a complex key and a count of 1 for each
>> document. From that I end up with a data set like so:
>>
>> {"id":"89c285e3-109f-4ce8-9a11-f5219c6a0b36","key":[341235,"auctioned"],"value":1},
>> {"id":"ab37e07a-d19e-4ddb-b0ab-7da8b5e24a03","key":[341235,"auctioned"],"value":1},
>> {"id":"b5cf63e0-1892-41df-833b-84468365a08a","key":[341235,"auctioned"],"value":1},
>> {"id":"89c285e3-109f-4ce8-9a11-f5219c6a0b36","key":[341235,"teasered","125"],"value":1},
>> {"id":"3341b67a-c789-492e-a9e7-1a3a99540c59","key":[341235,"teasered","127"],"value":1},
>> {"id":"7e6d6077-619b-4cec-8cc2-d2e8779cdc15","key":[341235,"teasered","127"],"value":1}...
>>
>> Now what I want from my reduce if the above was the only output is to
>> end up with a set like so:
>>
>> {"key":[341235,"auctioned"],"value":3},
>> {"key":[341235,"teasered","125"],"value":1},
>> {"key":[341235,"teasered","127"],"value":2}...
>>
>> I thought my reduce statement would have to look simply like so:
>>
>> function(keys, values) {
>>  return sum(values);
>> }
>>
>> Much to my surprise, when I didn't get the numbers expected, I did
>> some logging from my reduce function:
>>
>> {
>>  "keys": [
>>   [[341263,"teasered","125"],"e13d8135-844e-4fec-bf3a-ed189359b30c"],
>>   [[341263,"teasered","127"],"ae447f40-51cd-406e-8a65-f5a7f163d20e"], ...
>>  ],
>> "values": [1,1,...],
>> "rereduce": false
>> }
>>
>> I thought that the reduce function was only called with results of the
>> same key. Essentially I think I was assuming that the functionality of
>> reduce when rereduce is true was always the case. Can someone help me
>> understand why a reduce function would get multiple keys for a single
>> reduce? Also, what should my reduce then look like?
>>
>> Thanks for the help, Simon.
>
>



-- 
Simon Wex
CTO and Co-Founder
Zeep
http://zeepmobile.com
http://zeepmedia.com

Re: Basic re-understanding of reduce.

Posted by "ara.t.howard" <ar...@gmail.com>.
On Dec 23, 2008, at 7:14 PM, Damien Katz wrote:

> You are using the reduce properly.  CouchDb reduces everything  
> regardless of the keys. So what you are seeing is couchdb doing full  
> reductions of modified nodes during index updates. This gives you  
> the option of reducing across arbitrary key ranges.
>
> -Damien

i'd like to get some clarification on this - any pointers someone can  
throw at me?  an idiot's guide to reduce as it were?

a @ http://codeforpeople.com/
--
we can deny everything, except that we have the possibility of being  
better. simply reflect on that.
h.h. the 14th dalai lama




Re: Basic re-understanding of reduce.

Posted by Damien Katz <da...@apache.org>.
You are using the reduce properly.  CouchDB reduces everything
regardless of the keys. So what you are seeing is CouchDB doing full
reductions of modified nodes during index updates. This gives you the
option of reducing across arbitrary key ranges.
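
A rough sketch of what "full reductions of nodes" means for the reduce
function, runnable outside CouchDB. Everything in it is an assumption made
for illustration: the index nodes are modelled as fixed-size slices of the
sorted rows (chunk and reduceViaNodes are hypothetical helpers), the size
4 is arbitrary, the short ids stand in for real doc ids, and sum is a
local stand-in for the view server helper. naiveCount is included only to
show what goes wrong when a reduce ignores rereduce.

```javascript
// Stand-in for the sum() helper that CouchDB's JavaScript view server provides.
function sum(values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// The reduce from the original post: correct for both passes, because a sum
// of partial sums is still the sum.
function sumReduce(keys, values, rereduce) {
  return sum(values);
}

// A tempting but broken count: on rereduce it counts the partial results
// instead of adding them up.
function naiveCount(keys, values, rereduce) {
  return values.length;
}

var rows = [
  { id: "a", key: [341235, "auctioned"], value: 1 },
  { id: "b", key: [341235, "auctioned"], value: 1 },
  { id: "c", key: [341235, "auctioned"], value: 1 },
  { id: "d", key: [341235, "teasered", "125"], value: 1 },
  { id: "e", key: [341235, "teasered", "127"], value: 1 },
  { id: "f", key: [341235, "teasered", "127"], value: 1 }
];

// Hypothetical stand-in for index nodes: fixed-size slices of the sorted rows.
function chunk(list, size) {
  var out = [];
  for (var i = 0; i < list.length; i += size) {
    out.push(list.slice(i, i + size));
  }
  return out;
}

// Reduce each node on its own (rereduce = false, keys are [key, id] pairs),
// then combine the partial results (rereduce = true, keys = null).
function reduceViaNodes(reduceFn, sortedRows, nodeSize) {
  var partials = chunk(sortedRows, nodeSize).map(function (node) {
    var keys = node.map(function (r) { return [r.key, r.id]; });
    var values = node.map(function (r) { return r.value; });
    return reduceFn(keys, values, false);
  });
  return reduceFn(null, partials, true);
}

console.log(reduceViaNodes(sumReduce, rows, 4));
console.log(reduceViaNodes(naiveCount, rows, 4));
```

With a node size of 4 the first node mixes "auctioned" and "teasered"
keys, which matches the logged call with several keys and rereduce false.
sumReduce still arrives at 6, while naiveCount returns 2, the number of
nodes rather than the number of rows.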

-Damien

On Dec 23, 2008, at 8:53 PM, Simon Wex wrote:

> I have been working on a few projects with couchdb, some of which are
> in production so it came as a complete surprise today when I realized
> I don't understand CouchDB's reduce implementation.
>
> I have a map that emits a complex key and a count of 1 for each
> document. From that I end up with a data set like so:
>
> {"id":"89c285e3-109f-4ce8-9a11-f5219c6a0b36","key":[341235,"auctioned"],"value":1},
> {"id":"ab37e07a-d19e-4ddb-b0ab-7da8b5e24a03","key":[341235,"auctioned"],"value":1},
> {"id":"b5cf63e0-1892-41df-833b-84468365a08a","key":[341235,"auctioned"],"value":1},
> {"id":"89c285e3-109f-4ce8-9a11-f5219c6a0b36","key":[341235,"teasered","125"],"value":1},
> {"id":"3341b67a-c789-492e-a9e7-1a3a99540c59","key":[341235,"teasered","127"],"value":1},
> {"id":"7e6d6077-619b-4cec-8cc2-d2e8779cdc15","key":[341235,"teasered","127"],"value":1}...
>
> Now what I want from my reduce if the above was the only output is to
> end up with a set like so:
>
> {"key":[341235,"auctioned"],"value":3},
> {"key":[341235,"teasered","125"],"value":1},
> {"key":[341235,"teasered","127"],"value":2}...
>
> I thought my reduce statement would have to look simply like so:
>
> function(keys, values) {
>  return sum(values);
> }
>
> Much to my surprise, when I didn't get the numbers expected, I did
> some logging from my reduce function:
>
> {
>  "keys": [
>    [[341263,"teasered","125"],"e13d8135-844e-4fec-bf3a-ed189359b30c"],
>    [[341263,"teasered","127"],"ae447f40-51cd-406e-8a65-f5a7f163d20e"], ...
>  ],
> "values": [1,1,...],
> "rereduce": false
> }
>
> I thought that the reduce function was only called with results of the
> same key. Essentially I think I was assuming that the functionality of
> reduce when rereduce is true was always the case. Can someone help me
> understand why a reduce function would get multiple keys for a single
> reduce? Also, what should my reduce then look like?
>
> Thanks for the help, Simon.