Posted to mapreduce-user@hadoop.apache.org by Pedro Costa <ps...@gmail.com> on 2011/02/12 17:04:53 UTC

Map tasks execution

Hi,

1 - When a Map task is taking too long to finish, the JT launches
another Map task to do the same work. Does this mean that the task
that was replaced is killed?

2 - Does Hadoop MR allow the same input split to be processed by 2
different mappers at the same time?


Thanks,
-- 
Pedro

Re: Map tasks execution

Posted by Harsh J <qw...@gmail.com>.
Hello,

On Sat, Feb 12, 2011 at 9:34 PM, Pedro Costa <ps...@gmail.com> wrote:
> Hi,
>
> 1 - When a Map task is taking too long to finish, the JT launches
> another Map task to do the same work. Does this mean that the task
> that was replaced is killed?

If a task times out, it is killed and rescheduled. If you're noticing
this in the final waves of a job, it could be the speculative execution
feature of Hadoop MapReduce, which is enabled by default.
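
If the kills come from timeouts, the timeout itself is configurable per
job. A minimal sketch, assuming the 0.20-era property name
mapred.task.timeout (value in milliseconds):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class TimeoutExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    // Raise the per-task timeout so a map task that goes longer between
    // progress reports is not killed and rescheduled
    // (the default is 600000 ms, i.e. 10 minutes).
    conf.setLong("mapred.task.timeout", 1200000L);
    Job job = new Job(conf, "long-running-maps");
    // ... set mapper, input/output paths, and submit the job as usual.
  }
}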

> 2 - Does Hadoop MR allow the same input split to be processed by 2
> different mappers at the same time?

In some ways, yes.

There is a speculative execution feature that does this exact thing
(two attempts of the same task may be computing in a race - whichever
reports completion first wins, and the losing attempt is killed). See
the 'Speculative execution' sub-topic of this YDN Hadoop tutorial page
for some details:
http://developer.yahoo.com/hadoop/tutorial/module4.html#tolerence
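
You can also toggle it per job through the old-API JobConf setters. A
rough sketch (property names below are from memory for the 0.20 line,
so double-check them against your version):

import org.apache.hadoop.mapred.JobConf;

public class SpeculationExample {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    // Enable/disable speculative execution for this job only;
    // both flags default to true.
    conf.setMapSpeculativeExecution(true);
    conf.setReduceSpeculativeExecution(false);
    // These correspond to the boolean properties
    // mapred.map.tasks.speculative.execution and
    // mapred.reduce.tasks.speculative.execution.
  }
}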

But it should also be possible to duplicate input splits / paths in
order to do this (although, again, running at the same time is not
guaranteed).
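
For example, with the new-API FileInputFormat you can add the same
input path twice when setting up the job. A quick sketch (the path
below is made up; substitute your own):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class DuplicateInputExample {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "duplicate-splits");
    Path input = new Path("/user/pedro/data"); // made-up path
    // Adding the same path twice makes FileInputFormat generate two sets
    // of splits over the same data, so two different map tasks end up
    // processing each split's data (not necessarily at the same time).
    FileInputFormat.addInputPath(job, input);
    FileInputFormat.addInputPath(job, input);
    // ... set mapper/reducer, output path, and submit the job as usual.
  }
}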

-- 
Harsh J
www.harshj.com