Posted to mapreduce-user@hadoop.apache.org by Keren Ouaknine <ke...@gmail.com> on 2011/12/06 17:02:23 UTC

reduce tasks pending - no progress for few hours

Hello,

Please find details below. I would like to resume the pending tasks (not
sure why they went into pending state in the first place).
I had a look at the logs; there were some task failures (3), but they
succeeded on a second attempt.

*Job Setup:* Successful<http://10.239.24.12:50030/jobtasks.jsp?jobid=job_201111302144_0187&type=setup&pagenum=1&state=completed>
*Status:* Running
*Started at:* Tue Dec 06 05:41:39 EST 2011
*Running for:* 5hrs, 18mins, 49sec
*Job Cleanup:* Pending

Kind    % Complete  Num Tasks  Pending  Running  Complete  Killed  Failed/Killed Task Attempts
map     100.00%     1260       0        0        1260      0       0 / 6
reduce  9.99%       90         81       0        9         0       3 / 0

Thanks,
Keren

-- 
Keren Ouaknine
Cell: +972 54 2565404
Web: www.kereno.com

Re: reduce tasks pending - no progress for few hours

Posted by Harsh J <ha...@cloudera.com>.
A "Child" is basically your Map or Reduce JVM. It is spawned by the TaskTracker.
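
To see which JVMs on a node are TaskTracker daemons and which are task JVMs, one option is to group the output of the JDK's `jps` tool by class name. The sketch below assumes the usual `<pid> <MainClass>` line format; the sample text and its PIDs are invented for illustration and are not from the cluster discussed here.

```python
from typing import Dict, List


def group_jvms(jps_output: str) -> Dict[str, List[int]]:
    """Map each JVM main-class name to the list of PIDs running it."""
    groups: Dict[str, List[int]] = {}
    for line in jps_output.splitlines():
        parts = line.split()
        # Skip blank lines and lines that do not look like "<pid> <name>".
        if len(parts) < 2 or not parts[0].isdigit():
            continue
        groups.setdefault(parts[1], []).append(int(parts[0]))
    return groups


# Invented sample output for one worker node (hypothetical PIDs).
SAMPLE = """\
4021 DataNode
4188 TaskTracker
9730 Child
9912 Jps
"""

groups = group_jvms(SAMPLE)
print("TaskTracker daemons:", groups.get("TaskTracker", []))
print("Task JVMs (Child):", groups.get("Child", []))
```

A node that is running tasks would normally show one TaskTracker entry plus one Child entry per task slot in use.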

I assume that there is just this job running, yes?

Can you check if any of your TaskTrackers have been blacklisted? Also, can you pastebin your day's JobTracker log, grepped for this specific Job ID?
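
A minimal sketch of the log filtering requested above, for anyone without `grep` to hand. It matches on the numeric part of the job ID (`201111302144_0187`) rather than the full `job_...` string, on the assumption that task and attempt IDs in the log embed the same suffix; the function name and the command-line usage are illustrative, and the log file path is whatever your installation uses.

```python
import sys
from typing import Iterable, List

# Job ID taken from the JobTracker URLs in the original post.
JOB_ID = "job_201111302144_0187"


def filter_job_lines(lines: Iterable[str], job_id: str) -> List[str]:
    """Return the log lines that mention the job or one of its tasks/attempts."""
    # Drop the "job_" prefix so that task_... and attempt_... IDs also match.
    key = job_id[len("job_"):] if job_id.startswith("job_") else job_id
    return [line.rstrip("\n") for line in lines if key in line]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python filter_job.py <path-to-jobtracker-log>
    with open(sys.argv[1], errors="replace") as log:
        for matched in filter_job_lines(log, JOB_ID):
            print(matched)
```

Redirecting the output to a file gives something small enough to pastebin.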

You could also try bouncing (stop/start) one TaskTracker to see if it helps get some traction, but it would be good to have a copy of the logs before you do that, so we can be sure what caused it.

Also, which Hadoop version are you running?

On 06-Dec-2011, at 10:46 PM, Keren Ouaknine wrote:

> Hello Harsh,
> 
> 1. All tasktrackers are up; there are 10 available. What do you reckon?
> 
> 2. On one node, there is additionally a process called Child. It seems like a tasktracker, so I am not sure why both Child and tasktracker are running on that node?
> 
> Thanks,
> Keren


Re: reduce tasks pending - no progress for few hours

Posted by Keren Ouaknine <ke...@gmail.com>.
Hello Harsh,

1. All tasktrackers are up; there are 10 available. What do you reckon?

2. On one node, there is additionally a process called Child. It seems like
a tasktracker, so I am not sure why both Child and tasktracker are running
on that node?

Thanks,
Keren

On Tue, Dec 6, 2011 at 11:26 AM, Harsh J <ha...@cloudera.com> wrote:

> Keren,
>
> How many tasktrackers are you running, and are they all still up?


-- 
Keren Ouaknine
Cell: +972 54 2565404
Web: www.kereno.com

Re: reduce tasks pending - no progress for few hours

Posted by Harsh J <ha...@cloudera.com>.
Keren,

How many tasktrackers are you running, and are they all still up?

On 06-Dec-2011, at 9:32 PM, Keren Ouaknine wrote:

> Hello,
> 
> Please find details below. I would like to resume the pending tasks (not sure why they went into pending state in the first place).
> I had a look at the logs; there were some task failures (3), but they succeeded on a second attempt.
> 
> Job Setup: Successful
> Status: Running
> Started at: Tue Dec 06 05:41:39 EST 2011
> Running for: 5hrs, 18mins, 49sec
> Job Cleanup: Pending
> 
> Kind    % Complete  Num Tasks  Pending  Running  Complete  Killed  Failed/Killed Task Attempts
> map     100.00%     1260       0        0        1260      0       0 / 6
> reduce  9.99%       90         81       0        9         0       3 / 0
> 
> Thanks,
> Keren
> 
> -- 
> Keren Ouaknine
> Cell: +972 54 2565404
> Web: www.kereno.com
> 
>