Posted to common-user@hadoop.apache.org by Shirley Cohen <sc...@cs.utexas.edu> on 2008/09/09 21:20:21 UTC
output multiple values?
I have a simple reducer that computes the average by doing a sum/count.
But I want to output both the average and the count for a
given key, not just the average. Is it possible to output both values
from the same invocation of the reducer? Or do I need two reducer
invocations? If I try to call output.collect() twice from the reducer
and label the key with "type=avg" or "type=count", I get a bunch of
garbage out. Please let me know if you have any suggestions.
Thanks,
Shirley
Re: output multiple values?
Posted by Shirley Cohen <sh...@cis.upenn.edu>.
Thanks Owen! I found the bug in my code: Doing collect twice does
work now :))
Shirley
Re: output multiple values?
Posted by Owen O'Malley <om...@apache.org>.
On Sep 9, 2008, at 12:20 PM, Shirley Cohen wrote:
> I have a simple reducer that computes the average by doing a
> sum/count. But I want to output both the average and the count for a
> given key, not just the average. Is it possible to output both
> values from the same invocation of the reducer? Or do I need two
> reducer invocations? If I try to call output.collect() twice from
> the reducer and label the key with "type=avg" or "type=count", I get
> a bunch of garbage out. Please let me know if you have any
> suggestions.
I'd be tempted to define a type like:
class AverageAndCount implements Writable {
  private long sum;
  private long count;
  ...
  public String toString() {
    // Cast to double so the division is not truncated to a long.
    return "avg = " + (sum / (double) count) + ", count = " + count;
  }
}
Then you could use your reducer as both a combiner and reducer, since
partial sums and counts can be merged, and you would get both values
out if you use TextOutputFormat, which writes each value with
toString(). That said, it should absolutely work to do collect twice.
-- Owen
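
For reference, here is one way the type sketched above might be filled in. This is only a minimal sketch under some assumptions: the add() and merge() helpers are hypothetical names that do not appear in the thread, and the small Writable interface declared here is a local stand-in for org.apache.hadoop.io.Writable so that the example compiles without Hadoop on the classpath. It keeps sum and count rather than the average itself, which is what would let the same logic run as a combiner.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Local stand-in for org.apache.hadoop.io.Writable (assumption: same two
// methods). In a real job, import the Hadoop interface instead.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

class AverageAndCount implements Writable {
    private long sum;
    private long count;

    // Hypothetical helper: fold one raw input value into the running totals.
    void add(long value) {
        sum += value;
        count += 1;
    }

    // Hypothetical helper: merge a partial result, such as one emitted by a
    // combiner. Sums and counts add up safely; averages would not.
    void merge(AverageAndCount other) {
        sum += other.sum;
        count += other.count;
    }

    // Serialize both fields in a fixed order.
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(sum);
        out.writeLong(count);
    }

    // Read the fields back in the same order they were written.
    @Override
    public void readFields(DataInput in) throws IOException {
        sum = in.readLong();
        count = in.readLong();
    }

    // With a text-based output format, this string is what ends up in the
    // output file next to the key.
    @Override
    public String toString() {
        return "avg = " + (sum / (double) count) + ", count = " + count;
    }
}
```

With this shape, a reducer would call merge() for each incoming value and emit the merged object once per key, so both the average and the count land on one output line. Note that an empty instance prints a NaN average, because the division is done in floating point.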