Posted to mapreduce-user@hadoop.apache.org by Pedro Costa <ps...@gmail.com> on 2011/03/25 17:24:24 UTC

"Reduce input groups" vs "Reduce input records"

Hi,

In this MR example, the counters include both "Reduce input groups" and
"Reduce input records". What's the difference between these two fields?


$ hadoop jar cloud9.jar edu.umd.cloud9.example.simple.DemoWordCount
data/bible+shakes.nopunc wc 1
10/07/11 22:25:42 INFO simple.DemoWordCount: Tool: DemoWordCount
10/07/11 22:25:42 INFO simple.DemoWordCount:  - input path:
data/bible+shakes.nopunc
10/07/11 22:25:42 INFO simple.DemoWordCount:  - output path: wc
10/07/11 22:25:42 INFO simple.DemoWordCount:  - number of reducers: 1
[...]
10/07/11 22:25:48 INFO mapred.JobClient: Counters: 12
10/07/11 22:25:48 INFO mapred.JobClient:   FileSystemCounters
10/07/11 22:25:48 INFO mapred.JobClient:     FILE_BYTES_READ=22907000
10/07/11 22:25:48 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=5867160
10/07/11 22:25:48 INFO mapred.JobClient:   Map-Reduce Framework
10/07/11 22:25:48 INFO mapred.JobClient:     Reduce input groups=41788
10/07/11 22:25:48 INFO mapred.JobClient:     Combine output records=128253
10/07/11 22:25:48 INFO mapred.JobClient:     Map input records=156215
10/07/11 22:25:48 INFO mapred.JobClient:     Reduce shuffle bytes=0
10/07/11 22:25:48 INFO mapred.JobClient:     Reduce output records=41788
10/07/11 22:25:48 INFO mapred.JobClient:     Spilled Records=170041
10/07/11 22:25:48 INFO mapred.JobClient:     Map output bytes=15919397
10/07/11 22:25:48 INFO mapred.JobClient:     Combine input records=1820763
10/07/11 22:25:48 INFO mapred.JobClient:     Map output records=1734298
10/07/11 22:25:48 INFO mapred.JobClient:     Reduce input records=41788
10/07/11 22:25:48 INFO simple.DemoWordCount: Job Finished in 5.345 seconds


-- 
Pedro

Re: "Reduce input groups" vs "Reduce input records"

Posted by Todd Lipcon <to...@cloudera.com>.

Hi Pedro,

Reduce Input Groups is the number of unique keys fed into the
reducers. Reduce Input Records is the number of values. Each key has
one or more values associated with it coming into the reducer.

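For example, with the canonical wordcount example and no combiner,
"reduce input groups" would be the total number of unique words in the
document. Reduce input records would be the total number of words in
the document (equivalent to `wc -w` in Unix terms). Your job runs a
combiner, which merges the values for each word on the map side before
they reach the reducer. That is consistent with both counters showing
41788 in your output.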
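
To make the distinction concrete, here is a minimal sketch in plain Python (not Hadoop code) that imitates a single-mapper word count and reports the two counters. The function name `reduce_input_counters` and the `use_combiner` flag are made up for illustration, and the combiner is modeled as one full pre-aggregation pass on the map side, which is a simplification of what Hadoop does.

```python
from collections import Counter, defaultdict


def reduce_input_counters(words, use_combiner=False):
    """Imitate a one-mapper word count; return (input_groups, input_records)."""
    # Map phase: emit one (word, 1) pair per input word.
    map_output = [(word, 1) for word in words]

    if use_combiner:
        # Simplified combiner: sum the counts per word on the map side,
        # leaving one (word, partial_sum) pair per distinct word.
        partial_sums = Counter()
        for word, count in map_output:
            partial_sums[word] += count
        map_output = list(partial_sums.items())

    # Shuffle: collect all values that share a key.
    grouped = defaultdict(list)
    for key, value in map_output:
        grouped[key].append(value)

    input_groups = len(grouped)  # distinct keys
    input_records = sum(len(values) for values in grouped.values())  # all values
    return input_groups, input_records


words = "to be or not to be".split()
print(reduce_input_counters(words))                     # (4, 6)
print(reduce_input_counters(words, use_combiner=True))  # (4, 4)
```

Without the combiner the reducer sees 6 records in 4 groups. With it, each word arrives as a single pre-summed record, so the two counters match, as they do in the log posted above.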

-Todd

-- 
Todd Lipcon
Software Engineer, Cloudera