Posted to mapreduce-user@hadoop.apache.org by Robert Dyer <ps...@gmail.com> on 2012/07/13 06:03:02 UTC

Jobs randomly not starting

I'm using Hadoop 1.0.3 on a small cluster (1 namenode, 1 jobtracker, 2
compute nodes).  My input is a sequence file of around 280 MB.

Generally, my jobs run just fine and all finish in 2-5 minutes.  However,
quite randomly the jobs refuse to run.  They submit and appear when running
'hadoop job -list' but don't appear on the jobtracker's webpage.  If I
manually type in the job ID on the webpage I can see it is trying to run
the setup task - the map tasks haven't even started.  I've left them to run
and even after several minutes they are still in this state.

When I spot this, I kill the job and resubmit it and generally it works.

A couple of times I have seen similar problems with reduce tasks that get
stuck while 'initializing'.

Any ideas?

Re: Jobs randomly not starting

Posted by Robert Dyer <ps...@gmail.com>.
Upon further inspection of that log, it appears the problem is that the setup
task sometimes takes a very long time.

Typically it takes at most 6 seconds, but sometimes (the cases where I thought
it was hanging) it actually runs and finishes, but takes 3-5 minutes.

Same problem with the cleanup (which is where I thought the reduce was
getting stuck).

I am currently the only user on this cluster and I never have more than 1
job in the queue at a time.

Ideas?

On Fri, Jul 13, 2012 at 1:04 AM, Harsh J <ha...@cloudera.com> wrote:

> Hey Robert,
>
> Any chance you can pastebin the JT logs, grepped for the bad job ID,
> and send the link across? They shouldn't hang the way you describe.

Re: Jobs randomly not starting

Posted by Harsh J <ha...@cloudera.com>.
Hey Robert,

Any chance you can pastebin the JT logs, grepped for the bad job ID,
and send the link across? They shouldn't hang the way you describe.
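
A minimal sketch of the kind of filtering being asked for here, assuming a POSIX shell and a JobTracker log file whose path you already know. The function name `jt_job_lines` and the job ID in the usage note are hypothetical. It searches for the ID without its `job_` prefix, on the assumption that task and attempt IDs in Hadoop 1.x reuse the same numeric suffix (`task_...`, `attempt_...`), so that lines about the setup and cleanup tasks are matched as well.

```shell
# Hypothetical helper: print the JobTracker log lines that belong to one job.
# Usage: jt_job_lines <jobtracker-log-file> <job-id>
jt_job_lines() {
    log_file="$1"
    job_id="$2"
    # Drop the leading "job_" so task_* and attempt_* IDs of the same job match too
    # (assumes those IDs share the job's numeric suffix).
    id_suffix="${job_id#job_}"
    # -F matches the suffix as a fixed string, not as a regular expression.
    grep -F -- "$id_suffix" "$log_file"
}
```

For example, `jt_job_lines /path/to/jobtracker.log job_201207130001_0007 > job_0007.log` would write the matching lines to a file that can then be pasted somewhere; both the path and the job ID are placeholders.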

On Fri, Jul 13, 2012 at 9:33 AM, Robert Dyer <ps...@gmail.com> wrote:
> I'm using Hadoop 1.0.3 on a small cluster (1 namenode, 1 jobtracker, 2
> compute nodes).  My input size is a sequence file of around 280mb.
>
> Generally, my jobs run just fine and all finish in 2-5 minutes.  However,
> quite randomly the jobs refuse to run.  They submit and appear when running
> 'hadoop job -list' but don't appear on the jobtracker's webpage.  If I
> manually type in the job ID on the webpage I can see it is trying to run the
> setup task - the map tasks haven't even started.  I've left them to run and
> even after several minutes it is still in this state.
>
> When I spot this, I kill the job and resubmit it and generally it works.
>
> A couple of times I have seen similar problems with reduce tasks that get
> stuck while 'initializing'.
>
> Any ideas?
>



-- 
Harsh J

Re: Jobs randomly not starting

Posted by Bejoy KS <be...@gmail.com>.
Hi Robert

It could be because there are no free slots available in your cluster at job submission time to launch those tasks. Other tasks may already be occupying the map/reduce slots.

When you run into this issue, please verify whether there are free task slots available.
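
As a rough illustration of that check, assuming every TaskTracker offers the same number of slots: the sketch below subtracts the occupied slots from the cluster's capacity. The function name `free_slots` is hypothetical, and the numbers are placeholders you would read off the JobTracker web page or your `mapred.tasktracker.map.tasks.maximum` setting; nothing here queries Hadoop itself.

```shell
# Hypothetical helper: free slots = TaskTrackers * slots per TaskTracker - occupied slots.
# Usage: free_slots <tasktracker-count> <slots-per-tasktracker> <occupied-slots>
free_slots() {
    trackers="$1"
    slots_per_tracker="$2"
    occupied="$3"
    echo $(( trackers * slots_per_tracker - occupied ))
}
```

With 2 compute nodes and an assumed 2 map slots on each, `free_slots 2 2 4` prints 0, which is the situation where a newly submitted job would have to wait for a slot.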

Regards
Bejoy KS

Sent from handheld, please excuse typos.
