Posted to mapreduce-user@hadoop.apache.org by Leonardo Gamas <le...@jusbrasil.com.br> on 2011/09/29 18:15:23 UTC

Lost task tracker reschedules all tasktracker's successful map tasks

Hi,

I have a very large MapReduce job, and sometimes a TaskTracker fails to send a
heartbeat within the configured amount of time, so it is considered dead. That
is fine, but all tasks already finished by this TaskTracker are lost too; more
precisely, they are rescheduled and re-executed by another TaskTracker.

Is this the default behavior, or am I hitting a bug or a
misconfiguration?

Regards,

Leonardo Gamas
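
Editor's sketch: the heartbeat check described in the post above can be
modeled roughly as follows. This is an illustration only, not Hadoop code.
The function name and data layout are hypothetical, and the 10-minute timeout
is an assumption (in Hadoop 1.x-era releases the setting is, as far as I know,
mapred.tasktracker.expiry.interval, in milliseconds).

```python
# Illustrative model only; the names below are hypothetical, not Hadoop APIs.
EXPIRY_INTERVAL_MS = 600_000  # assumed timeout: 10 minutes


def find_lost_trackers(last_heartbeat_ms, now_ms, expiry_ms=EXPIRY_INTERVAL_MS):
    """Return the trackers whose last heartbeat is older than expiry_ms."""
    return sorted(
        name
        for name, seen_ms in last_heartbeat_ms.items()
        if now_ms - seen_ms > expiry_ms
    )


# tt1 reported just now; tt2 last reported 650 seconds ago.
heartbeats = {"tt1": 1_000_000, "tt2": 350_000}
lost = find_lost_trackers(heartbeats, now_ms=1_000_000)
print(lost)
```

A tracker that is merely slow (for example, stuck in a long GC pause) looks
the same to this check as one that has crashed, which is why a busy node can
be declared dead during a large job.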

Re: Lost task tracker reschedules all tasktracker's successful map tasks

Posted by Leonardo Gamas <le...@jusbrasil.com.br>.
OK, that explains a lot! Thanks, guys! :)


Re: Lost task tracker reschedules all tasktracker's successful map tasks

Posted by Joey Echeverria <jo...@cloudera.com>.
> The question is: the intermediate results (before any reducer) of completed
> individual tasks are recorded in HDFS, right? So why are these results
> discarded, since the loss of the TaskTracker is not the loss of already
> processed data?

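Intermediate results are not written to HDFS. They are stored on the
TaskTracker's local disks and served to the reducers via an embedded Jetty
HTTP server. If the TaskTracker goes down, so does the embedded HTTP server,
and its map output can no longer be fetched.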

-Joey
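
Editor's sketch: the consequence of the explanation above can be modeled in a
few lines. This is a simplified illustration, not the JobTracker's actual
logic. The Task class and tasks_to_rerun function are hypothetical, and the
sketch assumes the job still has reducers that need the map output and that
completed reduce output is already safely stored in HDFS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    task_id: str
    kind: str     # "map" or "reduce"
    tracker: str  # name of the TaskTracker that ran the task
    done: bool


def tasks_to_rerun(tasks, lost_tracker):
    """Pick the tasks that must run again once lost_tracker is declared dead."""
    rerun = []
    for task in tasks:
        if task.tracker != lost_tracker:
            continue  # tasks on healthy trackers are unaffected
        if not task.done:
            rerun.append(task.task_id)  # in-flight work is lost with the node
        elif task.kind == "map":
            # Completed map output lived only on the lost node's local disk
            # and was served by its embedded HTTP server, so it is unreachable.
            rerun.append(task.task_id)
        # A completed reduce is skipped: its output is assumed to be in HDFS.
    return rerun


tasks = [
    Task("m1", "map", "tt1", True),
    Task("m2", "map", "tt2", True),
    Task("r1", "reduce", "tt1", True),
    Task("m3", "map", "tt1", False),
]
print(tasks_to_rerun(tasks, "tt1"))
```

In this model, losing tt1 forces m1 and m3 to run again, while the finished
reduce r1 and the map m2 on the healthy tracker are left alone.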




-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434

Re: Lost task tracker reschedules all tasktracker's successful map tasks

Posted by Leonardo Gamas <le...@jusbrasil.com.br>.
No, the reducers are fine, or at least I didn't observe any problem.

The question is: the intermediate results (before any reducer) of completed
individual tasks are recorded in HDFS, right? So why are these results
discarded, since the loss of the TaskTracker is not the loss of already
processed data?

--Leonardo Gamas


Re: Lost task tracker reschedules all tasktracker's successful map tasks

Posted by Robert Evans <ev...@yahoo-inc.com>.
If a TaskTracker is lost, it can no longer serve its map results to the reducers that need them, so those map tasks have to be rerun. I am not sure whether this is the behavior you are seeing. Are completed reducers being rerun as well?

--Bobby Evans
