Posted to common-user@hadoop.apache.org by Abhinay Mehta <ab...@gmail.com> on 2011/04/28 12:33:46 UTC

MapReduce Jobs being 'stuck' for several hours and then completing

Hi all,

We are using CDH3B4 on the Hadoop Cluster.

We have hourly jobs that run through the streaming API. Each of these
jobs used to take 4-5 mins to complete, but since 1pm yesterday they
have all of a sudden started taking 3-4 hours.

We looked at the data the jobs are working on and the data is exactly the
same as it always has been.
The cluster / config has not been touched since the upgrade to CDH3B4 which
was one month ago.

No errors are being reported in any of the logs; the jobs are just taking
much longer.
One thing I have noticed: when a job just sits there partway through, I
consistently see this entry in the slave log files:

2011-04-28 11:16:07,849 INFO org.apache.hadoop.streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
2011-04-28 11:16:07,849 INFO org.apache.hadoop.streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]

I see that entry in both Map and Reduce phases, when the jobs sit idle
for tens of mins at a time without doing anything.
This happens even if there is nothing else running on the cluster.
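
For anyone wanting to scan the slave logs for these entries automatically, here is a minimal sketch. It assumes that R/W/S means records read / written / skipped by the streaming task (I believe this matches the PipeMapRed counters, but treat it as an assumption), and it flags entries where input has been read but nothing has been written yet. The function name find_stalled and the sample text are made up for illustration; the pattern tolerates the mail-wrapped form where "R/W/S=..." lands on the next line.

```python
import re

# Matches one PipeMapRed progress entry. The \s+ after "PipeMapRed:" accepts
# either a space or a line break, so wrapped and unwrapped logs both match.
ENTRY = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} INFO "
    r"org\.apache\.hadoop\.streaming\.PipeMapRed:\s+"
    r"R/W/S=(?P<r>\d+)/(?P<w>\d+)/(?P<s>\d+)"
)


def find_stalled(log_text):
    """Return (timestamp, read, written, skipped) for entries with R > 0 and W == 0.

    Assumption: R/W/S are the read/written/skipped record counts.
    """
    stalled = []
    for m in ENTRY.finditer(log_text):
        r, w, s = int(m["r"]), int(m["w"]), int(m["s"])
        if r > 0 and w == 0:
            stalled.append((m["ts"], r, w, s))
    return stalled


# Sample text copied from the log excerpt in this post.
sample = """2011-04-28 11:16:07,849 INFO org.apache.hadoop.streaming.PipeMapRed:
R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
2011-04-28 11:16:07,849 INFO org.apache.hadoop.streaming.PipeMapRed:
R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
"""

for entry in find_stalled(sample):
    print(entry)
```

A W of 0 right after a task starts is not necessarily a problem, since a script may buffer or aggregate before emitting anything. It is only suspicious if the task stays in that state for a long time.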

If anyone can shed some light on this or give me a direction to look into
further then it would be much appreciated.

Thank you.

Regards,
Abhinay Mehta

Re: MapReduce Jobs being 'stuck' for several hours and then completing

Posted by Abhinay Mehta <ab...@gmail.com>.
Thanks Koji, I'll have a go.

On 28 April 2011 16:44, Koji Noguchi <kn...@yahoo-inc.com> wrote:

> Hi Abhinay,
>
> If you have access to the compute nodes, then
>
> 1) jstack of streaming mapper jvm
> 2) strace -f of streaming mapper jvm
> 3) strace -f of streaming map process itself
>
> might help.
>
> Koji

Re: MapReduce Jobs being 'stuck' for several hours and then completing

Posted by Koji Noguchi <kn...@yahoo-inc.com>.
Hi Abhinay, 

If you have access to the compute nodes, then

1) jstack of streaming mapper jvm
2) strace -f of streaming mapper jvm
3) strace -f of streaming map process itself

might help.
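
As a rough illustration, the three steps above could be written out as concrete command lines. This is only a sketch: the helper name diagnostic_commands and the example PIDs are hypothetical, and the real PIDs have to be looked up on the compute node (for example with ps). It uses the usual invocations, jstack <pid> and strace -f -p <pid>, and only builds and prints the commands; it does not run them.

```python
import shlex


def diagnostic_commands(jvm_pid, script_pid):
    """Build the three diagnostic command lines for one stuck streaming task.

    jvm_pid    -- PID of the task JVM running the streaming mapper
    script_pid -- PID of the streaming map process started by that JVM
    Both values are supplied by the caller; nothing is discovered automatically.
    """
    return [
        # 1) thread dump of the streaming mapper JVM
        ["jstack", str(jvm_pid)],
        # 2) system calls of the JVM (-f covers its threads and follows new children)
        ["strace", "-f", "-p", str(jvm_pid)],
        # 3) system calls of the streaming map process itself
        ["strace", "-f", "-p", str(script_pid)],
    ]


# Example PIDs are made up; substitute the real ones from the compute node.
for argv in diagnostic_commands(12345, 12346):
    print(" ".join(shlex.quote(arg) for arg in argv))
```

The commands generally need to be run as the user that owns the task processes (or as root for strace).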

Koji

