Posted to common-user@hadoop.apache.org by "Brendan W." <bw...@gmail.com> on 2011/11/03 14:18:27 UTC

map task attempt progress at 400%?

Hi,

Running 0.20.2:

A job with about 4000 map tasks quickly blew through all but 3 in a couple
of hours, with the tasks taking about two minutes each.  The remaining
three, however, inched along, with their progress passing 100% and keeping
on going.  After 20 hours or so, I killed the running task attempts.  They
restarted, and same thing:  they inched their way past 100%, getting up
past 400% and continuing.  They finally finished in the middle of last
night.

What does progress > 100% indicate?

Thanks for any help.

Re: map task attempt progress at 400%?

Posted by Joey Echeverria <jo...@cloudera.com>.
The first thing I would check is that your mappers are processing the
same amount of data. I'm not familiar with the Cassandra InputFormat,
but if it doesn't properly split the data, then you could end up with
this behavior. If the data is split properly, I'd look into swapping
as a possible cause.
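
A rough way to check for uneven splits is to compare the per-task input
record counts and flag any task far above the median. This is only a
sketch: the task ids and counts below are made up, and the factor of 10 is
an arbitrary assumption. The counts would come from each task's "Map input
records" counter.

```python
from statistics import median


def find_skewed_tasks(records_per_task, factor=10.0):
    """Return ids of tasks whose input record count exceeds factor x the median."""
    typical = median(records_per_task.values())
    return sorted(
        task
        for task, count in records_per_task.items()
        if typical > 0 and count > factor * typical
    )


# Hypothetical counts, as might be copied from each task's "Map input records" counter.
counts = {
    "m_000001": 10_000,
    "m_000002": 10_400,
    "m_000003": 9_800,
    "m_003998": 5_200_000,
}
print(find_skewed_tasks(counts))  # ['m_003998']
```

If the slow tasks show up here, the split logic is the likely culprit
rather than the nodes.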

Is it always the same nodes that are slow?
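
To answer that, one could tally the slow attempts by host. A minimal
sketch, assuming the (host, runtime in seconds) pairs have been copied by
hand from the job's task pages; the host names, runtimes, and one-hour
threshold are invented for illustration.

```python
from collections import Counter


def slow_attempts_by_host(attempts, threshold_seconds):
    """Count, per host, the attempts that ran longer than the threshold."""
    return Counter(host for host, seconds in attempts if seconds > threshold_seconds)


# Hypothetical (host, runtime in seconds) pairs for a handful of attempts.
attempts = [
    ("node07", 120),
    ("node12", 72_000),
    ("node07", 115),
    ("node12", 71_500),
    ("node03", 130),
]
print(slow_attempts_by_host(attempts, threshold_seconds=3600).most_common())
# [('node12', 2)]
```

If the slow attempts cluster on one or two hosts, the problem is more
likely the machine than the data.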

-Joey

On Thu, Nov 3, 2011 at 10:43 AM, Brendan W. <bw...@gmail.com> wrote:
> The input is actually read through the Apache Cassandra 0.6.9 API for
> MapReduce.  And yes, each Cassandra row that is read into the mapper
> consists of a block of 100 compressed lines of text.  So maybe that
> accounts for the progress report.
>
> Any idea what the huge time difference might be due to (2 minutes average
> vs. 20 hrs for the last 3 tasks)?  Does that sound like swapping to you?
>
> Thanks,
>
> Brendan

-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434

Re: map task attempt progress at 400%?

Posted by "Brendan W." <bw...@gmail.com>.
The input is actually read through the Apache Cassandra 0.6.9 API for
MapReduce.  And yes, each Cassandra row that is read into the mapper
consists of a block of 100 compressed lines of text.  So maybe that
accounts for the progress report.

Any idea what the huge time difference might be due to (2 minutes average
vs. 20 hrs for the last 3 tasks)?  Does that sound like swapping to you?
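
One way to test the swapping theory on a Linux node is to sample the
cumulative pswpin and pswpout counters in /proc/vmstat twice and compare.
The sketch below assumes Linux and uses two made-up samples in place of
real reads of that file.

```python
def swap_counters(vmstat_text):
    """Extract the cumulative pswpin/pswpout page counts from /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def pages_swapped(before_text, after_text):
    """Pages swapped in and out between two samples of /proc/vmstat."""
    before = swap_counters(before_text)
    after = swap_counters(after_text)
    return {name: after[name] - before[name] for name in ("pswpin", "pswpout")}


# Two made-up samples standing in for reads of /proc/vmstat taken seconds apart.
first = "nr_free_pages 1024\npswpin 500\npswpout 900\n"
second = "nr_free_pages 512\npswpin 4500\npswpout 12900\n"
print(pages_swapped(first, second))  # {'pswpin': 4000, 'pswpout': 12000}
```

Counters that keep climbing while a slow task runs would point toward
swapping; counters that stay flat would point elsewhere.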

Thanks,

Brendan

On Thu, Nov 3, 2011 at 9:44 AM, Joey Echeverria <jo...@cloudera.com> wrote:

> Is your input data compressed? There have been some bugs in the past
> with reporting progress when reading compressed data.
>
> -Joey

Re: map task attempt progress at 400%?

Posted by Joey Echeverria <jo...@cloudera.com>.
Is your input data compressed? There have been some bugs in the past
with reporting progress when reading compressed data.
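
To illustrate how such a bug can push progress past 100%: if a reader
divides the bytes it has consumed by the split length, but counts
decompressed bytes against a compressed length, the ratio is unbounded.
This is a sketch of that arithmetic only, not the actual Hadoop or
Cassandra code; the function names and sizes are made up.

```python
def progress(bytes_consumed, split_length):
    """Fraction of the split consumed, with no upper bound applied."""
    return bytes_consumed / split_length if split_length else 0.0


def clamped_progress(bytes_consumed, split_length):
    """Same ratio, capped at 1.0 so it never reports more than 100%."""
    return min(1.0, progress(bytes_consumed, split_length))


# Hypothetical sizes: the split length is measured in compressed bytes,
# but the reader counts the decompressed bytes it has handed to the mapper.
compressed_split_length = 64 * 1024 * 1024
decompressed_bytes_read = 256 * 1024 * 1024

print(f"{progress(decompressed_bytes_read, compressed_split_length):.0%}")          # 400%
print(f"{clamped_progress(decompressed_bytes_read, compressed_split_length):.0%}")  # 100%
```

A figure above 100% from this kind of mismatch says nothing about how much
work is left; it only means the numerator and denominator are measured in
different units.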

-Joey

On Thu, Nov 3, 2011 at 9:18 AM, Brendan W. <bw...@gmail.com> wrote:
> Hi,
>
> Running 0.20.2:
>
> A job with about 4000 map tasks quickly blew through all but 3 in a couple
> of hours, with the tasks taking about two minutes each.  The remaining
> three, however, inched along, with their progress passing 100% and keeping
> on going.  After 20 hours or so, I killed the running task attempts.  They
> restarted, and same thing:  they inched their way past 100%, getting up
> past 400% and continuing.  They finally finished in the middle of last
> night.
>
> What does progress > 100% indicate?
>
> Thanks for any help.
>



-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434