Posted to mapreduce-user@hadoop.apache.org by Thamizhannal Paramasivam <th...@gmail.com> on 2012/01/21 19:17:48 UTC

reducer behavior

Hi All,
I am experimenting with a MapReduce program on Hadoop 0.19. The program has
a single input file with 7 records (later it may have many records across
multiple files), and each input record is supposed to produce 11 output
records. When it runs with no_of_reducer=4, it produces only 33 records.
But when I run it with no_of_reducer=1, it produces all 77 records as
expected.

What could be the reason for this? Am I missing a configuration parameter?

Thanks
Tamil

Re: reducer behavior

Posted by Thamizhannal Paramasivam <th...@gmail.com>.
Thanks a lot, Harsh.
I agree it is probably a logic issue in the reducer; with reducer=1 it
works as expected. But the counter output also gives the expected result
irrespective of the number of reducers.
Here is the counter output:
12/01/24 17:02:16 INFO mapred.JobClient:     NUM_RECORDS=66
12/01/24 17:02:16 INFO mapred.JobClient:   Job Counters
12/01/24 17:02:16 INFO mapred.JobClient:     Launched reduce tasks=4
12/01/24 17:02:16 INFO mapred.JobClient:     Launched map tasks=2
12/01/24 17:02:16 INFO mapred.JobClient:     Data-local map tasks=2
12/01/24 17:02:16 INFO mapred.JobClient:   FileSystemCounters
12/01/24 17:02:16 INFO mapred.JobClient:     FILE_BYTES_READ=1028
12/01/24 17:02:16 INFO mapred.JobClient:     HDFS_BYTES_READ=984
12/01/24 17:02:16 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=2288
12/01/24 17:02:16 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=5139
12/01/24 17:02:16 INFO mapred.JobClient:   Map-Reduce Framework
12/01/24 17:02:16 INFO mapred.JobClient:     Reduce input groups=6
12/01/24 17:02:16 INFO mapred.JobClient:     Combine output records=0
12/01/24 17:02:16 INFO mapred.JobClient:     Map input records=6
12/01/24 17:02:16 INFO mapred.JobClient:     Reduce shuffle bytes=873
12/01/24 17:02:16 INFO mapred.JobClient:     Reduce output records=66
12/01/24 17:02:16 INFO mapred.JobClient:     Spilled Records=12
12/01/24 17:02:16 INFO mapred.JobClient:     Map output bytes=992
12/01/24 17:02:16 INFO mapred.JobClient:     Map input bytes=794
12/01/24 17:02:16 INFO mapred.JobClient:     Combine input records=0
12/01/24 17:02:16 INFO mapred.JobClient:     Map output records=6
12/01/24 17:02:16 INFO mapred.JobClient:     Reduce input records=6

It says Reduce input records=6 and Reduce output records=66, but there are
actually only 22 output records from the reducers.

I use a custom output format:

import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

public class CustomMultipleTextOutputFormat<K, V>
        extends MultipleTextOutputFormat<K, V> {

    // Route each record to a file named after the third "%"-separated
    // component of its key; fall back to the whole key otherwise.
    @Override
    protected String generateFileNameForKeyValue(K key, V value, String name) {
        String[] keys = key.toString().split("%");
        if (keys.length != 3) {
            return key.toString();
        }
        return keys[2];
    }
}
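
One thing I plan to verify (my own assumption, untested): the returned
name replaces the per-task leaf name (e.g. part-00003), so if two reduce
tasks generate the same file name, one task's file can overwrite the
other's when the task outputs are committed. That would also explain why
the Reduce output records counter (66) is higher than what ends up on
disk (22): the counter counts collect() calls, while overwritten files
simply disappear. A sketch that keeps the leaf name in the generated
path, so each reduce task writes its own file under a per-key directory:

import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

public class UniqueNameTextOutputFormat<K, V>
        extends MultipleTextOutputFormat<K, V> {

    @Override
    protected String generateFileNameForKeyValue(K key, V value, String name) {
        String[] keys = key.toString().split("%");
        // Append "/" + name (the original leaf, e.g. part-00003) so two
        // reduce tasks emitting the same key component write to different
        // files inside the same directory instead of clobbering each other.
        if (keys.length != 3) {
            return key.toString() + "/" + name;
        }
        return keys[2] + "/" + name;
    }
}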
I am not sure what I am missing; any suggestions would be appreciated.
Thanks,
Tamil

On Sun, Jan 22, 2012 at 1:24 AM, Harsh J <ha...@cloudera.com> wrote:

> The only difference is that with 4 reducers your keys get partitioned
> based on their hashCode() implementation (if you use the default hash
> partitioner), and each partition is sent to one reducer. I'd check the
> key implementation here first, if it's a custom key.
>
> Check the input record counters on your reducers and the total map output
> record counter - the former should add up to the latter. Also make sure
> you aren't skipping any values from the reducer iterator under any
> condition in your reduce logic.
>
> I'm guessing it's most likely your logic that's somehow causing this, but
> I do not have your source bits to say that for sure.
>
> On 21-Jan-2012, at 11:47 PM, Thamizhannal Paramasivam wrote:
>
> Hi All,
> I am experimenting with a MapReduce program on Hadoop 0.19. The program
> has a single input file with 7 records (later it may have many records
> across multiple files), and each input record is supposed to produce 11
> output records. When it runs with no_of_reducer=4, it produces only 33
> records. But when I run it with no_of_reducer=1, it produces all 77
> records as expected.
>
> What could be the reason for this? Am I missing a configuration
> parameter?
>
> Thanks
> Tamil
>
>
> --
> Harsh J
> Customer Ops. Engineer, Cloudera
>
>

Re: reducer behavior

Posted by Harsh J <ha...@cloudera.com>.
The only difference is that with 4 reducers your keys get partitioned based on their hashCode() implementation (if you use the default hash partitioner), and each partition is sent to one reducer. I'd check the key implementation here first, if it's a custom key.
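
For reference, the default partitioner amounts to the following (a sketch
along the lines of the old-API HashPartitioner), which is why a custom key
with a broken or inconsistent hashCode() directly changes which reducer
sees which records:

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

public class SketchHashPartitioner<K2, V2> implements Partitioner<K2, V2> {

    public void configure(JobConf job) {
        // No configuration needed for hash partitioning.
    }

    public int getPartition(K2 key, V2 value, int numReduceTasks) {
        // Mask off the sign bit so the modulo yields a valid,
        // non-negative partition index in [0, numReduceTasks).
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}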

Check the input record counters on your reducers and the total map output record counter - the former should add up to the latter. Also make sure you aren't skipping any values from the reducer iterator under any condition in your reduce logic.
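
If you want to compare those two programmatically from the driver,
something like this should work (a sketch, assuming the old JobClient
API; the group name is how the framework counters appear in the 0.19
logs):

import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class CounterCheck {

    public static void runAndCheck(JobConf conf) throws Exception {
        // Run the job synchronously and fetch its aggregated counters.
        RunningJob job = JobClient.runJob(conf);
        Counters counters = job.getCounters();
        Counters.Group fw =
                counters.getGroup("org.apache.hadoop.mapred.Task$Counter");
        long mapOut = fw.getCounter("MAP_OUTPUT_RECORDS");
        long reduceIn = fw.getCounter("REDUCE_INPUT_RECORDS");
        // These should match; a gap means records are being lost between
        // the map output and the reduce input.
        System.out.println("map output records=" + mapOut
                + ", reduce input records=" + reduceIn);
    }
}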

I'm guessing it's most likely your logic that's somehow causing this, but I do not have your source bits to say that for sure.
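
For completeness, a reduce that provably emits every value it receives
looks like this (a minimal pass-through sketch, assuming Text keys and
values):

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class PassThroughReducer extends MapReduceBase
        implements Reducer<Text, Text, Text, Text> {

    public void reduce(Text key, Iterator<Text> values,
            OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // Drain the iterator fully; an early return or break here would
        // silently drop the remaining values for this key.
        while (values.hasNext()) {
            output.collect(key, values.next());
        }
    }
}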

On 21-Jan-2012, at 11:47 PM, Thamizhannal Paramasivam wrote:

> Hi All,
> I am experimenting with a MapReduce program on Hadoop 0.19. The program has a single input file with 7 records (later it may have many records across multiple files), and each input record is supposed to produce 11 output records. When it runs with no_of_reducer=4, it produces only 33 records. But when I run it with no_of_reducer=1, it produces all 77 records as expected.
> 
> What could be the reason for this? Am I missing a configuration parameter?
> 
> Thanks
> Tamil
> 

--
Harsh J
Customer Ops. Engineer, Cloudera