Posted to common-user@hadoop.apache.org by Marc Limotte <ml...@feeva.com> on 2009/04/24 18:31:00 UTC

Advice on restarting HDFS in a cron

Hi.

I've heard that HDFS starts to slow down after it's been running for a long time, and I believe I've experienced this. So I was thinking of setting up a cron job that runs every week to shut down HDFS and start it up again.

In concept, it would be something like:

0 0 * * 0 $HADOOP_HOME/bin/stop-dfs.sh; $HADOOP_HOME/bin/start-dfs.sh

But I'm wondering if there is a safer way to do this.  In particular:

* What if a map/reduce job is running when this cron job fires? Is there a way to suspend jobs while the HDFS restart happens?

* Should I also restart the mapred daemons?

* Should I wait some time after "stop-dfs.sh" for things to settle down before executing "start-dfs.sh"? Or should I run a command to verify that it is stopped before I run the start?

Thanks for any help.
Marc
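
One way to make the restart in this question more careful is sketched below. The crontab fields are minute, hour, day of month, month, and day of week, so a weekly run at midnight on Sunday is `0 0 * * 0`. The script path `/usr/local/bin/restart-dfs.sh`, the function names, the 60-second timeout, and the PID file location are assumptions for illustration; the PID path follows the default naming used by `hadoop-daemon.sh` but depends on `HADOOP_PID_DIR` and on the user that runs the daemons.

```shell
#!/bin/sh
# Hypothetical wrapper (restart-dfs.sh), scheduled with a crontab line such as:
#   0 0 * * 0 /usr/local/bin/restart-dfs.sh
# Run it as the user that owns the Hadoop daemons, or kill -0 cannot see them.

# Wait until process $1 has exited, polling once per second for at most $2 seconds.
# Returns 0 if the process is gone, 1 if it is still alive after the timeout.
wait_for_pid_exit() {
    pid=$1
    remaining=$2
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$remaining" -le 0 ]; then
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Stop HDFS, confirm the namenode process is gone, start HDFS again, and block
# until the namenode has left safe mode. Assumes HADOOP_HOME is set.
restart_dfs() {
    # Assumed default PID file location; adjust for your HADOOP_PID_DIR.
    pid_file="${HADOOP_PID_DIR:-/tmp}/hadoop-${USER:-$(id -un)}-namenode.pid"
    nn_pid=""
    [ -f "$pid_file" ] && nn_pid=$(cat "$pid_file")

    "$HADOOP_HOME/bin/stop-dfs.sh"
    if [ -n "$nn_pid" ] && ! wait_for_pid_exit "$nn_pid" 60; then
        echo "namenode (pid $nn_pid) did not stop; not restarting" >&2
        return 1
    fi
    "$HADOOP_HOME/bin/start-dfs.sh" &&
        "$HADOOP_HOME/bin/hadoop" dfsadmin -safemode wait
}
```

The script would end with a call to `restart_dfs`. This sketch only covers the "verify that it is stopped" part of the question; it does nothing about map/reduce jobs that are running when the restart begins.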

________________________________
PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE, DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.

Re: Advice on restarting HDFS in a cron

Posted by jason hadoop <ja...@gmail.com>.

You can also turn down the server logging level via
bin/hadoop daemonlog -getlevel <host:port> <name>
bin/hadoop daemonlog -setlevel <host:port> <name> <level>

where port is the daemon's web UI port. The defaults are:
namenode 50070
jobtracker 50030
tasktracker 50060
secondary namenode 50090
datanode 50075

Someone wiser than I with log4j may have a better suggestion for the
logger name to pick other than root, perhaps org.apache.hadoop.

I believe this, run from the master node, would set the log level to WARN for
all the datanodes and tasktrackers:
for a in `cat conf/slaves`; do bin/hadoop daemonlog -setlevel $a:50075 root WARN; bin/hadoop daemonlog -setlevel $a:50060 root WARN; done
and of course for the master node jobtracker and namenode:
bin/hadoop daemonlog -setlevel localhost:50030 root WARN
bin/hadoop daemonlog -setlevel localhost:50070 root WARN
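
The per-slave loop could also be wrapped in a small function that prints the commands first, so they can be reviewed before anything is changed. This is only a sketch: the function name `daemonlog_cmds` is made up here, the ports are the defaults listed earlier in this message (a cluster may have overridden them), and `root` as the logger name is the same guess as above.

```shell
#!/bin/sh
# Print (not run) the daemonlog commands that set logger $3 (default: root)
# to level $2 on the datanode (50075) and tasktracker (50060) of every host
# listed in file $1. Blank lines and lines starting with '#' are skipped.
daemonlog_cmds() {
    slaves_file=$1
    level=$2
    logger=${3:-root}
    while IFS= read -r host || [ -n "$host" ]; do
        case $host in
            ''|'#'*) continue ;;
        esac
        echo "bin/hadoop daemonlog -setlevel $host:50075 $logger $level"
        echo "bin/hadoop daemonlog -setlevel $host:50060 $logger $level"
    done < "$slaves_file"
}

# Usage, from the Hadoop install directory on the master node:
#   daemonlog_cmds conf/slaves WARN          # review the output
#   daemonlog_cmds conf/slaves WARN | sh     # then run it
```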




-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422

Re: Advice on restarting HDFS in a cron

Posted by Rakhi Khatwani <ra...@gmail.com>.

Thanks Aaron.


Re: Advice on restarting HDFS in a cron

Posted by Aaron Kimball <aa...@cloudera.com>.

If your logs were being written to the root partition (/dev/sda1), that's
going to fill up fast. This partition is always <= 10 GB on EC2 and much of
that space is consumed by the OS install. You should redirect your logs to
some place under /mnt (/dev/sdb1); that's 160 GB.

- Aaron
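
A minimal sketch of the redirect, assuming the stock `conf/hadoop-env.sh` is in use: `HADOOP_LOG_DIR` is the variable the Hadoop start scripts read for the log location. The directory `/mnt/hadoop/logs`, the function name `prune_old_logs`, and the 7-day retention are assumptions chosen for illustration.

```shell
# In conf/hadoop-env.sh: write daemon logs under /mnt instead of the root partition.
export HADOOP_LOG_DIR=/mnt/hadoop/logs

# Delete regular files under directory $1 whose last modification is more than
# $2 days old (as counted by find -mtime). Intended for a daily cron job, so
# that old logs are removed without shutting anything down.
prune_old_logs() {
    log_dir=$1
    max_age_days=$2
    [ -d "$log_dir" ] || return 1
    find "$log_dir" -type f -mtime +"$max_age_days" -exec rm -f {} +
}

# Usage: prune_old_logs "$HADOOP_LOG_DIR" 7
```

Because the selection is by modification time, log files that a daemon is still writing to are left alone.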


Re: Advice on restarting HDFS in a cron

Posted by Rakhi Khatwani <ra...@gmail.com>.

Hi,
   I have faced a somewhat similar issue.
   I have a couple of map reduce jobs running on EC2. After a week or so,
I get a "no space left on device" exception while running any Linux command,
so I end up shutting down Hadoop and HBase, clearing the logs, and then
restarting them.

Is there a cleaner way to do it?

thanks
Raakhi


Re: Advice on restarting HDFS in a cron

Posted by Todd Lipcon <to...@cloudera.com>.

On Fri, Apr 24, 2009 at 11:18 AM, Marc Limotte <ml...@feeva.com> wrote:

> Actually, I'm concerned about performance of map/reduce jobs for a
> long-running cluster.  I.e. it seems to get slower the longer it's running.
> After a restart of HDFS, the jobs seem to run faster.  Not concerned about
> the start-up time of HDFS.
>

Hi Marc,

Does it sound like this JIRA describes your problem?

https://issues.apache.org/jira/browse/HADOOP-4766

If so, restarting just the JT should help with the symptoms. (I say symptoms
because this is clearly a problem! Hadoop should be stable and performant
for months without a cluster restart!)

-Todd
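
Restarting only the JobTracker could look like the sketch below. It relies on `bin/hadoop-daemon.sh`, which ships with Hadoop and takes `start` or `stop` followed by the daemon name; the function name `restart_jobtracker`, the `RESTART_PAUSE` variable, and the 5-second default pause are assumptions made for this example. Jobs that are running when the JobTracker stops may fail, so this is best run when the cluster is idle.

```shell
#!/bin/sh
# Restart only the JobTracker on the local (master) node.
# $1 is the Hadoop install directory (defaults to $HADOOP_HOME).
restart_jobtracker() {
    hadoop_home=${1:-$HADOOP_HOME}
    daemon_script="$hadoop_home/bin/hadoop-daemon.sh"
    if [ ! -x "$daemon_script" ]; then
        echo "cannot execute $daemon_script" >&2
        return 1
    fi
    "$daemon_script" stop jobtracker
    sleep "${RESTART_PAUSE:-5}"   # assumed pause to let the old process exit
    "$daemon_script" start jobtracker
}

# Usage: restart_jobtracker /usr/local/hadoop   (example path)
```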



RE: Advice on restarting HDFS in a cron

Posted by Marc Limotte <ml...@feeva.com>.

Actually, I'm concerned about the performance of map/reduce jobs on a long-running cluster, i.e. it seems to get slower the longer it's running. After a restart of HDFS, the jobs seem to run faster. I'm not concerned about the start-up time of HDFS.

Of course, as you suggest, this could be poor configuration of the cluster on my part; but I'd still like to hear best practices around doing a scheduled restart.

Marc


Re: Advice on restarting HDFS in a cron

Posted by Allen Wittenauer <aw...@yahoo-inc.com>.

On 4/24/09 9:31 AM, "Marc Limotte" <ml...@feeva.com> wrote:
> I've heard that HDFS starts to slow down after it's been running for a long
> time.  And I believe I've experienced this.

We did an upgrade (== complete restart) of a 2000 node instance in ~20
minutes on Wednesday. I wouldn't really consider that 'slow', but YMMV.

I suspect people aren't running the secondary name node and therefore have
a massively large edits file.  The name node appears slow on restart because
it has to apply the edits to the fsimage rather than having the secondary
keep it up to date.
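
To check whether this applies to a given cluster, the size of the edits file can be compared with the fsimage. The sketch below assumes the on-disk layout of Hadoop releases from this period, where both files sit in the `current` subdirectory of `dfs.name.dir`; the function name `report_edits_size` and the example path are illustrative only.

```shell
#!/bin/sh
# Print the size in bytes of the edits and fsimage files under name directory $1,
# one "name size" pair per line, or "name missing" if the file is not there.
report_edits_size() {
    current_dir="$1/current"
    for f in edits fsimage; do
        if [ -f "$current_dir/$f" ]; then
            size=$(wc -c < "$current_dir/$f" | tr -d ' ')
            echo "$f $size"
        else
            echo "$f missing"
        fi
    done
}

# Usage (path is an example; use the value of dfs.name.dir):
#   report_edits_size /mnt/hadoop/dfs/name
```

An edits file that keeps growing from one check to the next would suggest that no checkpoint is being taken.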