Posted to user@couchdb.apache.org by Mathias Leppich <ml...@muhqu.de> on 2011/02/02 08:47:18 UTC

Re: Reporting aggregated data using reduce function

Maybe related: a typical reduce function I use to sum over objects:

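function (keys, values, rereduce) {
  // The output has the same shape as the emitted values, so the same
  // code handles both the reduce and the rereduce pass.
  var sums = {};
  for (var i = 0; i < values.length; i++) {
    for (var k in values[i]) {
      // Start each field at 0 the first time it is seen.
      sums[k] = (sums[k] || 0) + values[i][k];
    }
  }
  return sums;
}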

which sums the emitted fields as follows:
emit("somekey",{"A":1});
emit("somekey",{"B":2});
emit("somekey",{"A":1,"C":1});
emit("somekey",{"A":1,"B":2});

reduced output:
"somekey": {"A":3,"B":4,"C":1}

But I guess the array approach is more efficient, as it saves space by using indexes instead of keys...

- Mathias
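
To try the object-summing reduce above outside CouchDB, something like the following sketch may help. It is only a local harness, not part of the original post: the names sumObjects, emitted, direct and combined are made up for illustration, and keys is passed as null because the function never reads it. The second half simulates a rereduce by reducing two partial results, which should give the same answer as a single pass.

```javascript
// The reduce function from the message, given a hypothetical name so it
// can be called locally (uses an index loop over the values array).
function sumObjects(keys, values, rereduce) {
  var sums = {};
  for (var i = 0; i < values.length; i++) {
    for (var k in values[i]) {
      sums[k] = (sums[k] || 0) + values[i][k];
    }
  }
  return sums;
}

// The four values emitted for "somekey" in the example.
var emitted = [{A: 1}, {B: 2}, {A: 1, C: 1}, {A: 1, B: 2}];

// Single reduce pass over all four values.
var direct = sumObjects(null, emitted, false);

// Simulated rereduce: reduce two halves, then reduce the partial results.
var left = sumObjects(null, emitted.slice(0, 2), false);
var right = sumObjects(null, emitted.slice(2), false);
var combined = sumObjects(null, [left, right], true);

console.log(JSON.stringify(direct));   // {"A":3,"B":4,"C":1}
console.log(JSON.stringify(combined)); // {"A":3,"B":4,"C":1}
```

Because the output object has the same shape as each emitted value, no special handling of the rereduce flag is needed here.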

On 31.01.2011, at 12:07, John wrote:

> That's perfect, thanks Robert. Funny how something so simple can be so confusing if the concept is new to you...
> 
> For anyone who searches for how to do a reduce over an array in the future, here's the code:
> 
> function(keys, values, rereduce){
>   var total = [0,0];
>   values.forEach(function(value){
>     total[0] += value[0];
>     total[1] += value[1];
>   });
>   return total;
> }
> 
> Looks like I might get to add reporting to my successful use cases for CouchDB!
> 
> John
> 
> 
> On 31 Jan 2011, at 09:01, Robert Newson wrote:
> 
>> in 1.1, _sum will work for arrays of numbers too (rather than
>> concatenating them as above). for now, just loop over the array of
>> arrays and do the sum yourself.
>> 
>> On Mon, Jan 31, 2011 at 1:02 AM, Keith Gable <zi...@gmail.com> wrote:
>>> It sounds like you need a new view for each piece of data.
>>> 
>>> by_answered, by_busy, by_time_to_answer, etc.
>>> 
>>> Then you'd query each view to get the reduction, and the reduce would be as
>>> simple as _sum.
>>> 
>>> 
>>> 
>>> On Jan 30, 2011, at 5:55 PM, John <jo...@netdev.co.uk> wrote:
>>> 
>>>> Hi
>>>> 
>>>> I'm looking to extend our usage of CouchDB by replacing our MySQL
>>>> reporting db.
>>>> Whilst using CouchDB successfully for a number of varied use cases, I've
>>>> never had to do much with reduce, so I'm unsure how to use it to reduce an
>>>> array of values.
>>>> 
>>>> Basically I want to be able to search a database using a composite key and
>>>> retrieve some aggregated information about number of calls, call status,
>>>> avg time to answer and avg duration.
>>>> 
>>>> 
>>>> The following view shows how I'd like it to work:
>>>> 
>>>> Key = <Application, Account, Subscription>
>>>> Value = <1, answered, busy, noreply, time to answer, duration>
>>>> 
>>>> e.g.
>>>> 
>>>> ["NTS", "NetDev", "MySub1"], [1,1,0,0,100,200]
>>>> ["NTS", "NetDev", "MySub1"], [1,1,0,0,150,400]
>>>> ["NTS", "NetDev", "MySub1"], [1,1,0,0,170,500]
>>>> ["NTS", "NetDev", "MySub1"], [1,0,1,0,0,0]
>>>> ["NTS", "NetDev", "MySub1"], [1,0,1,0,0,0]
>>>> ["NTS", "NetDev", "MySub1"], [1,0,0,1,0,0]
>>>> ["NTS", "NetDev", "MySub1"], [1,0,0,1,0,0]
>>>> 
>>>> My Reduced output should look like this:
>>>> 
>>>> [7,3,2,2,420,1100]
>>>> i.e. 7 calls in total, 3 answered, 2 busy, 2 no reply, the total time for
>>>> time to answer is 420 and the total time for call duration is 1100.
>>>> 
>>>> I can then compute the two averages after getting the data back from couch
>>>> i.e. 420/no. of answered calls(3) and 1100/no. of answered calls(3)
>>>> 
>>>> I thought that sum(values) would do this for me but it just upsets couch:
>>>> 
>>>> Reduce output must shrink more rapidly: Current output:
>>>> '["001,11,11,11,11,11,11,11,11,11,11,101,11,11,11,11,11,11,11,11,11,11,11,101,11,11,11,11,11,11,11,11'...
>>>> (first 100 of 277 bytes)
>>>> 
>>>> What should my reduce function look like?
>>>> 
>>>> Thanks
>>>> 
>>>> John
>>> 
> 
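
For the six-element values in the original question, the two-element array reduce quoted above could be generalised to any array length. The following is a minimal sketch under some assumptions, not code from the thread: sumColumns, rows, totals and the average variables are hypothetical names, keys is passed as null since it is unused, and each no-reply call is counted as 1, which is what the stated total of 2 implies. The averages are computed client-side from the reduced totals, as the question suggests.

```javascript
// Column-wise sum over arrays of numbers; works for reduce and rereduce
// because the output has the same shape as each input value.
function sumColumns(keys, values, rereduce) {
  var total = [];
  values.forEach(function (value) {
    for (var i = 0; i < value.length; i++) {
      total[i] = (total[i] || 0) + value[i];
    }
  });
  return total;
}

// Values: [1, answered, busy, noreply, time to answer, duration]
var rows = [
  [1, 1, 0, 0, 100, 200],
  [1, 1, 0, 0, 150, 400],
  [1, 1, 0, 0, 170, 500],
  [1, 0, 1, 0, 0, 0],
  [1, 0, 1, 0, 0, 0],
  [1, 0, 0, 1, 0, 0],
  [1, 0, 0, 1, 0, 0]
];

var totals = sumColumns(null, rows, false);
console.log(JSON.stringify(totals)); // [7,3,2,2,420,1100]

// Simulated rereduce over two partial results gives the same totals.
var partials = [
  sumColumns(null, rows.slice(0, 3), false),
  sumColumns(null, rows.slice(3), false)
];
var rereduced = sumColumns(null, partials, true);

// Averages over answered calls, guarding against division by zero.
var answered = totals[1];
var avgTimeToAnswer = answered ? totals[4] / answered : 0;
var avgDuration = answered ? totals[5] / answered : 0;
console.log(avgTimeToAnswer); // 140
```

Keeping the reduced value a fixed-length array of totals, and deriving averages afterwards, keeps the reduce output small regardless of how many rows are reduced.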


Re: Reporting aggregated data using reduce function

Posted by John <jo...@netdev.co.uk>.
That's definitely more friendly on the eye and probably less brittle than my array example. Old habits die hard, and I'm a telecoms guy who can't get used to all this extra memory...

In any case both are useful examples of doing something a bit more complex than the usual examples I've seen for Reduce.

I've had some cracking support from this list and developed some really useful queries which, again, I don't see in the Wiki or Book. Turning the answers in this mailing list into a knowledge base would be an invaluable aid for people looking at the technology for the first time. I certainly don't mind returning the effort I've received from others here and contributing to that, but where should such examples go? In a section on the Wiki?
Making them easy to find on Google, each showing a common problem or pattern that is a bit more than trivial along with a real-world example, would benefit everyone and take some of the strain off this list.

Just a thought but please do reply, anyone, if you have ideas on this or think another approach is better.

John
