Posted to common-user@hadoop.apache.org by yang song <ha...@gmail.com> on 2009/08/17 08:36:02 UTC

Why the jobs are suspended when I add new nodes?

Hi, all
    When I add another 50 nodes to the current cluster (200 nodes) at the
same time, the jobs run very smoothly at first. However, after a while, all
the jobs are suspended and never continue.
    I had no choice but to remove the new nodes, and then the jobs ran
smoothly again. Now I have to add nodes one by one: I won't add a second node
until the jobs run smoothly after adding the first one.
    Have you ever encountered the same situation? Could you give me some
tips?
    Thank you!
    Inifok

Re: Why the jobs are suspended when I add new nodes?

Posted by yang song <ha...@gmail.com>.
Thank you for providing this information. I think the problem may be caused
by "too many fetch failures". I have now accumulated some experience and
I think I'll solve it soon. Thanks again.
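
For anyone checking for the same symptom, here is a minimal sketch of counting
log lines that mention fetch failures. The pattern and the sample lines are
assumptions made up for illustration, not actual Hadoop output; the exact
wording in your Hadoop version may differ.

```python
import re
from typing import Iterable

# Assumption: the message contains "fetch failure(s)" or "fetch-failure(s)".
# Adjust the pattern to whatever wording your logs actually use.
FETCH_FAILURE = re.compile(r"fetch[- ]failures?", re.IGNORECASE)


def count_fetch_failures(lines: Iterable[str]) -> int:
    """Return how many lines mention a fetch failure."""
    return sum(1 for line in lines if FETCH_FAILURE.search(line))


# Made-up sample lines, for illustration only.
sample = [
    "task_0001_m_000003 failed: Too many fetch-failures",
    "task_0001_r_000000 copying map output",
    "reduce task reported too many fetch failures",
]
print(count_fetch_failures(sample))
```

To scan a real file, pass an open file object instead of the sample list; the
function only needs something that yields lines.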

2009/8/19 Jason Venner <ja...@gmail.com>

> I have added small numbers of nodes into running clusters, with running jobs
> without issue - when the machines were correctly configured for the cluster,
> so this is known to work at least in the 0.18 release series (when I was
> doing this operation).

Re: Why the jobs are suspended when I add new nodes?

Posted by Jason Venner <ja...@gmail.com>.
I have added small numbers of nodes into running clusters, with running jobs
without issue - when the machines were correctly configured for the cluster,
so this is known to work at least in the 0.18 release series (when I was
doing this operation).

On Mon, Aug 17, 2009 at 6:56 AM, yang song <ha...@gmail.com> wrote:

> The situation is I can't find any unusual thing from the logs.
> Maybe there is a lot of data to transfer since so many new nodes and the
> jobs are waiting for it



-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals

Re: Why the jobs are suspended when I add new nodes?

Posted by yang song <ha...@gmail.com>.
The thing is, I can't find anything unusual in the logs.
Maybe there is a lot of data to transfer because so many new nodes were added,
and the jobs are waiting for it.

2009/8/17 Ted Dunning <te...@gmail.com>

> Have you looked at the logs?

Re: Why the jobs are suspended when I add new nodes?

Posted by Ted Dunning <te...@gmail.com>.
Have you looked at the logs?

On Sun, Aug 16, 2009 at 11:36 PM, yang song <ha...@gmail.com> wrote:

> Hi, all
>    When I add another 50 nodes into the current cluster(200 nodes) at the
> same time, the jobs run very smoothly at first. However, after a while, all
> the jobs are suspended and never continue.