Posted to common-user@hadoop.apache.org by Rakhi Khatwani <rk...@gmail.com> on 2009/08/27 13:52:20 UTC

Doubt in reducer

Hi,
        I am running a MapReduce program which reads data from a file,
processes it, and writes the output into another file.
I run 4 maps and 4 reduces, and my output is as follows:
09/08/27 17:34:37 INFO mapred.JobClient: Running job: job_200908271142_0026
09/08/27 17:34:38 INFO mapred.JobClient: map 0% reduce 0%
09/08/27 17:34:45 INFO mapred.JobClient: map 25% reduce 0%
09/08/27 17:34:47 INFO mapred.JobClient: map 50% reduce 0%
09/08/27 17:34:48 INFO mapred.JobClient: map 75% reduce 0%
09/08/27 17:34:50 INFO mapred.JobClient: map 100% reduce 0%
09/08/27 17:35:00 INFO mapred.JobClient: map 100% reduce 4%
09/08/27 17:35:03 INFO mapred.JobClient: map 100% reduce 25%
09/08/27 17:35:12 INFO mapred.JobClient: map 100% reduce 50%
09/08/27 17:35:21 INFO mapred.JobClient: map 100% reduce 75%
09/08/27 17:35:30 INFO mapred.JobClient: map 100% reduce 100%
09/08/27 17:35:31 INFO mapred.JobClient: Job complete: job_200908271142_0026
09/08/27 17:35:31 INFO mapred.JobClient: Counters: 15
09/08/27 17:35:31 INFO mapred.JobClient: File Systems
09/08/27 17:35:31 INFO mapred.JobClient: HDFS bytes read=6666974
09/08/27 17:35:31 INFO mapred.JobClient: Local bytes read=24
09/08/27 17:35:31 INFO mapred.JobClient: Local bytes written=520
09/08/27 17:35:31 INFO mapred.JobClient: Job Counters
09/08/27 17:35:31 INFO mapred.JobClient: Launched reduce tasks=4
09/08/27 17:35:31 INFO mapred.JobClient: Launched map tasks=4
09/08/27 17:35:31 INFO mapred.JobClient: Data-local map tasks=4
09/08/27 17:35:31 INFO mapred.JobClient: Map-Reduce Framework
09/08/27 17:35:31 INFO mapred.JobClient: Reduce input groups=0
09/08/27 17:35:31 INFO mapred.JobClient: Combine output records=0
09/08/27 17:35:31 INFO mapred.JobClient: Map input records=8940
09/08/27 17:35:31 INFO mapred.JobClient: Reduce output records=0
09/08/27 17:35:31 INFO mapred.JobClient: Map output bytes=0
09/08/27 17:35:31 INFO mapred.JobClient: Map input bytes=6663028
09/08/27 17:35:31 INFO mapred.JobClient: Combine input records=0
09/08/27 17:35:31 INFO mapred.JobClient: Map output records=0
09/08/27 17:35:31 INFO mapred.JobClient: Reduce input records=0


But I want my reduce to run earlier: that is, if 25% of the map is done, then
I want the reduce to save that much data, so that even if the 2nd map fails,
I don't lose data.
Any pointers?
Regards,
Raakhi

Re: Doubt in reducer

Posted by Vladimir Klimontovich <kl...@gmail.com>.
But the reducers can do some preparation while the maps are still running:
map output can be copied across to the nodes that will work as reducers.

Copying and sorting map output is also a time-consuming process (maybe
more time-consuming than the reduce itself). For example, a piece of a job
run log on a 40-node cluster could look like this:

09/08/27 11:08:24 INFO job.JobRunningListener:  map 36% reduce 10%
09/08/27 11:08:28 INFO job.JobRunningListener:  map 37% reduce 10%
09/08/27 11:08:29 INFO job.JobRunningListener:  map 37% reduce 11%

But if you run the job on a single-node cluster, the reduce will start only
after the maps have finished.
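
One way to read the "map 36% reduce 10%" line above is sketched below. It assumes the reported reduce percentage weights the copy, sort, and reduce phases of a reduce task equally; that is my understanding of how the figure is computed, not something stated in this thread. The function name `reduce_progress` is hypothetical and is not a Hadoop API.

```python
def reduce_progress(copy_fraction, sort_fraction, reduce_fraction):
    """Overall progress of a reduce task.

    Assumption: copy, sort, and reduce each count for one third of the
    reported figure. Each argument is the completed fraction (0.0 to 1.0)
    of that phase.
    """
    for fraction in (copy_fraction, sort_fraction, reduce_fraction):
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("phase fractions must be between 0 and 1")
    return (copy_fraction + sort_fraction + reduce_fraction) / 3.0


# A reducer that has fetched about 30% of the map outputs, but has not
# started sorting or calling reduce(), would show up as roughly 10%.
print(round(reduce_progress(0.30, 0.0, 0.0) * 100))  # 10

# Only when all three phases are complete does it reach 100%.
print(round(reduce_progress(1.0, 1.0, 1.0) * 100))   # 100
```

Under that assumption, a reduce percentage below about 33% means only copying has happened so far, and no reduce() call has run yet.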

On Aug 27, 2009, at 4:31 PM, Harish Mallipeddi wrote:

> On Thu, Aug 27, 2009 at 5:22 PM, Rakhi Khatwani  
> <rk...@gmail.com> wrote:
>
>>
>> but i want my reduce to run , tht is if 25% map is done, thn i want the
>> reduce 2 save that much data. even if the 2nd map fails, i dont loose data.
>> any pointers?
>> Regards,
>> Raakhi
>>
>
> What you're asking for will break the semantics of reduce(). Reduce can only
> proceed after receiving all the map-outputs.
>
> -- 
> Harish Mallipeddi
> http://blog.poundbang.in

---
Vladimir Klimontovich,
skype: klimontovich
GoogleTalk/Jabber: klimontovich@gmail.com
Cell phone: +7926 890 2349


Re: Doubt in reducer

Posted by Harish Mallipeddi <ha...@gmail.com>.
On Thu, Aug 27, 2009 at 5:22 PM, Rakhi Khatwani <rk...@gmail.com> wrote:

>
> but i want my reduce to run , tht is if 25% map is done, thn i want the
> reduce 2 save that much data. even if the 2nd map fails, i dont loose data.
> any pointers?
> Regards,
> Raakhi
>

What you're asking for will break the semantics of reduce(). Reduce can only
proceed after receiving all the map-outputs.
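
A minimal sketch of why that is, using a toy word count in plain Python rather than Hadoop itself. The helper names `run_maps` and `shuffle_and_reduce` and the sample splits are made up for illustration. The point is that a key's values can come from any map, so a reduce over only the first finished map produces an incomplete answer for that key.

```python
from collections import defaultdict


def run_maps(splits):
    """Toy map phase: one list of (word, 1) pairs per input split."""
    return [[(word, 1) for word in split.split()] for split in splits]


def shuffle_and_reduce(map_outputs):
    """Group values by key across the given map outputs, then sum them."""
    groups = defaultdict(list)
    for output in map_outputs:
        for key, value in output:
            groups[key].append(value)
    return {key: sum(values) for key, values in sorted(groups.items())}


splits = ["a b a", "b c", "a c c", "b b a"]  # 4 splits, so 4 maps
map_outputs = run_maps(splits)

# Reducing after only 1 of 4 maps (25%) gives wrong totals.
partial = shuffle_and_reduce(map_outputs[:1])
print(partial)  # {'a': 2, 'b': 1}

# Reducing after all maps gives the real totals.
full = shuffle_and_reduce(map_outputs)
print(full)     # {'a': 4, 'b': 4, 'c': 3}
```

On the worry about losing data: Hadoop re-runs a failed map task, so a single map failure should not by itself lose the output of the maps that already finished.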

-- 
Harish Mallipeddi
http://blog.poundbang.in