Posted to common-user@hadoop.apache.org by Meng Mao <me...@gmail.com> on 2008/06/27 16:48:56 UTC

best command line way to check up/down status of HDFS?

For a Nagios script I'm writing, I'd like a command-line method that checks
if HDFS is up and running.
Is there a better way than to attempt a hadoop dfs command and check the
error code?
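
For reference, the kind of check I have in mind is roughly the
following (just a sketch; the path is arbitrary and the exit codes are
the usual Nagios conventions, 0 = OK, 2 = CRITICAL):

  #!/bin/sh
  # crude HDFS probe: run a cheap dfs command and map its exit status
  # to a Nagios result
  if hadoop dfs -ls / > /dev/null 2>&1; then
      echo "OK - hadoop dfs -ls / succeeded"
      exit 0
  else
      echo "CRITICAL - hadoop dfs -ls / failed"
      exit 2
  fi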

Re: best command line way to check up/down status of HDFS?

Posted by Meng Mao <me...@gmail.com>.
I really like method 3.

I am doing screen-scraping of the jobtracker JSP page, but I thought that was
only a partial solution, since the format of the page could change at any
moment, and because it's potentially much more computationally intensive,
depending on how much information I want to extract. One thing I thought of
would be to create a custom 'naked' JSP that has very little formatting.
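
For reference, my current check is roughly the following (a sketch;
the host, the port, and the string I grep for are specific to our
setup - 50030 is just the default jobtracker web port):

  #!/bin/sh
  # screen-scrape the jobtracker front page and look for a string the
  # page shows when the tracker is up (adjust for your version)
  if curl -s -f --max-time 10 http://jobtracker:50030/jobtracker.jsp \
       | grep -q "State: RUNNING"; then
      echo "OK - jobtracker page reports RUNNING"
      exit 0
  else
      echo "CRITICAL - jobtracker page missing or not RUNNING"
      exit 2
  fi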

On Wed, Jul 2, 2008 at 6:19 AM, Steve Loughran <st...@apache.org> wrote:

> Meng Mao wrote:
>
>> For a Nagios script I'm writing, I'd like a command-line method that
>> checks
>> if HDFS is up and running.
>> Is there a better way than to attempt a hadoop dfs command and check the
>> error code?
>>
>
> 1. There is JMX support built in to Hadoop. If you can bring up Hadoop
> running a JMX agent that is compatible with Nagios, you can keep a close eye
> on the internals.
>
> 2. I'm making some lifecycle changes to Hadoop; if/when they are accepted,
> every service (name, data, job, ...) will have an internal ping() operation
> to check its health - this can be checked in-process only. I'm also adding
> SmartFrog support to do that in-process pinging, fallback, etc.; I don't
> know how Nagios would work there, but JMX support for these ops should also
> be possible.
>
> 3. When a datanode comes up it starts Jetty on a specific port - you can do
> a GET against that Jetty instance to see if it is responding. This is a good
> test, as it really does verify that the service is live and responding.
> Indeed, that is the official definition of "liveness", at least according to
> Lamport.
>  * Review the code to make sure it turns caching off, or you can be burned
> probing for health long-haul, seeing the happy page and thinking all is
> well. I forgot to do that in happyaxis.jsp, which is why Axis 1.x health
> checks don't work long-haul.
>  * I could imagine improving those pages with better ones, like something
> that checks that the available free space is above a given minimum and
> returns an error code if it isn't, e.g.
>  http://datanode7:5000/checkDiskSpace?mingb=1500
> would test for a minimum disk space of 1500 GB.
>
> There are also web pages for job trackers & the like; better for remote
> health checking than jps checks. jps (and killall) is better as a fallback
> when things stop responding, but not adequate for liveness checks.
>
>


-- 
hustlin, hustlin, everyday I'm hustlin

Re: best command line way to check up/down status of HDFS?

Posted by Steve Loughran <st...@apache.org>.
Meng Mao wrote:
> For a Nagios script I'm writing, I'd like a command-line method that checks
> if HDFS is up and running.
> Is there a better way than to attempt a hadoop dfs command and check the
> error code?

1. There is JMX support built in to Hadoop. If you can bring up Hadoop 
running a JMX agent that is compatible with Nagios, you can keep a close 
eye on the internals.
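
Something along these lines in conf/hadoop-env.sh would do it (a
sketch only: the port is arbitrary, and disabling authentication/SSL
is just to keep the example short, not a recommendation):

  # conf/hadoop-env.sh - expose the namenode JVM over JMX
  export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote \
    -Dcom.sun.management.jmxremote.port=8004 \
    -Dcom.sun.management.jmxremote.authenticate=false \
    -Dcom.sun.management.jmxremote.ssl=false \
    $HADOOP_NAMENODE_OPTS"

A JMX-capable Nagios plugin (or any JMX client) can then poll the
namenode's MBeans remotely.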

2. I'm making some lifecycle changes to Hadoop; if/when they are accepted, 
every service (name, data, job, ...) will have an internal ping() operation 
to check its health - this can be checked in-process only. I'm also adding 
SmartFrog support to do that in-process pinging, fallback, etc.; I don't 
know how Nagios would work there, but JMX support for these ops should 
also be possible.

3. When a datanode comes up it starts Jetty on a specific port - you can 
do a GET against that Jetty instance to see if it is responding. This is 
a good test, as it really does verify that the service is live and 
responding. Indeed, that is the official definition of "liveness", at 
least according to Lamport.
  * Review the code to make sure it turns caching off, or you can be 
burned probing for health long-haul, seeing the happy page and thinking 
all is well. I forgot to do that in happyaxis.jsp, which is why Axis 1.x 
health checks don't work long-haul.
  * I could imagine improving those pages with better ones, like 
something that checks that the available free space is above a given 
minimum and returns an error code if it isn't, e.g.
  http://datanode7:5000/checkDiskSpace?mingb=1500
would test for a minimum disk space of 1500 GB.
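
A liveness probe against the page the datanode already serves could be
as simple as this (a sketch: host and port are examples - 50075 is the
usual default datanode HTTP port - and the no-cache request headers are
there because of the caching point above):

  #!/bin/sh
  # GET the datanode's embedded jetty; ask any proxy in the path not
  # to serve a cached copy, so a stale "happy page" cannot fool us
  if curl -s -f --max-time 10 \
       -H 'Cache-Control: no-cache' -H 'Pragma: no-cache' \
       http://datanode7:50075/ > /dev/null; then
      echo "OK - datanode web interface is responding"
      exit 0
  else
      echo "CRITICAL - no response from the datanode web interface"
      exit 2
  fi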

There are also web pages for job trackers & the like; better for 
remote health checking than jps checks. jps (and killall) is better as a 
fallback when things stop responding, but not adequate for liveness 
checks.


Re: best command line way to check up/down status of HDFS?

Posted by Miles Osborne <mi...@inf.ed.ac.uk>.
in that case, do:

jps

and look for both the namenode and also the secondary node
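
e.g. something like this (just a sketch; which daemons you insist on
is up to you, and -w stops "NameNode" from also matching
"SecondaryNameNode"):

  #!/bin/sh
  # jps lists the JVMs owned by the current user, one "pid Name" line
  # per process
  if jps | grep -qw NameNode && jps | grep -qw SecondaryNameNode; then
      echo "OK - NameNode and SecondaryNameNode are running"
      exit 0
  else
      echo "CRITICAL - expected HDFS daemons missing from jps output"
      exit 2
  fi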

Miles

2008/6/27 Meng Mao <me...@gmail.com>:

> I was thinking of checking for both independently, and taking a logical OR.
> Would that be sufficient?
>
> I'm trying to avoid file reading if possible. Not that reading through a
> log
> is that intensive,
> but it'd be cleaner if I could poll either Hadoop itself or inspect the
> processes running.
>
> On Fri, Jun 27, 2008 at 1:23 PM, Miles Osborne <mi...@inf.ed.ac.uk> wrote:
>
> > that won't work since the namenode may be down, but the secondary
> namenode
> > may be up instead
> >
> > why not instead just look at the respective logs?
> >
> > Miles
> >
> > 2008/6/27 Meng Mao <me...@gmail.com>:
> >
> > > Is running:
> > > ps aux | grep [\\.]NameNode
> > >
> > > and looking for a non empty response a good way to test HDFS up status?
> > >
> > > I'm assuming that if the NameNode process is down, then DFS is
> definitely
> > > down?
> > > Worried that there'd be frequent cases of DFS being messed up but the
> > > process still running just fine.
> > >
> > > On Fri, Jun 27, 2008 at 10:48 AM, Meng Mao <me...@gmail.com> wrote:
> > >
> > > > For a Nagios script I'm writing, I'd like a command-line method that
> > > checks
> > > > if HDFS is up and running.
> > > > Is there a better way than to attempt a hadoop dfs command and check
> > the
> > > > error code?
> > > >
> > >
> > >
> > >
> > > --
> > > hustlin, hustlin, everyday I'm hustlin
> > >
> >
> >
> >
> > --
> > The University of Edinburgh is a charitable body, registered in Scotland,
> > with registration number SC005336.
> >
>
>
>
> --
> hustlin, hustlin, everyday I'm hustlin
>



-- 
The University of Edinburgh is a charitable body, registered in Scotland,
with registration number SC005336.

Re: best command line way to check up/down status of HDFS?

Posted by Meng Mao <me...@gmail.com>.
I was thinking of checking for both independently, and taking a logical OR.
Would that be sufficient?

I'm trying to avoid file reading if possible. Not that reading through a log
is that intensive,
but it'd be cleaner if I could poll either Hadoop itself or inspect the
processes running.

On Fri, Jun 27, 2008 at 1:23 PM, Miles Osborne <mi...@inf.ed.ac.uk> wrote:

> that won't work since the namenode may be down, but the secondary namenode
> may be up instead
>
> why not instead just look at the respective logs?
>
> Miles
>
> 2008/6/27 Meng Mao <me...@gmail.com>:
>
> > Is running:
> > ps aux | grep [\\.]NameNode
> >
> > and looking for a non empty response a good way to test HDFS up status?
> >
> > I'm assuming that if the NameNode process is down, then DFS is definitely
> > down?
> > Worried that there'd be frequent cases of DFS being messed up but the
> > process still running just fine.
> >
> > On Fri, Jun 27, 2008 at 10:48 AM, Meng Mao <me...@gmail.com> wrote:
> >
> > > For a Nagios script I'm writing, I'd like a command-line method that
> > checks
> > > if HDFS is up and running.
> > > Is there a better way than to attempt a hadoop dfs command and check
> the
> > > error code?
> > >
> >
> >
> >
> > --
> > hustlin, hustlin, everyday I'm hustlin
> >
>
>
>
> --
> The University of Edinburgh is a charitable body, registered in Scotland,
> with registration number SC005336.
>



-- 
hustlin, hustlin, everyday I'm hustlin

Re: best command line way to check up/down status of HDFS?

Posted by Miles Osborne <mi...@inf.ed.ac.uk>.
that won't work since the namenode may be down, but the secondary namenode
may be up instead

why not instead just look at the respective logs?

Miles

2008/6/27 Meng Mao <me...@gmail.com>:

> Is running:
> ps aux | grep [\\.]NameNode
>
> and looking for a non empty response a good way to test HDFS up status?
>
> I'm assuming that if the NameNode process is down, then DFS is definitely
> down?
> Worried that there'd be frequent cases of DFS being messed up but the
> process still running just fine.
>
> On Fri, Jun 27, 2008 at 10:48 AM, Meng Mao <me...@gmail.com> wrote:
>
> > For a Nagios script I'm writing, I'd like a command-line method that
> checks
> > if HDFS is up and running.
> > Is there a better way than to attempt a hadoop dfs command and check the
> > error code?
> >
>
>
>
> --
> hustlin, hustlin, everyday I'm hustlin
>



-- 
The University of Edinburgh is a charitable body, registered in Scotland,
with registration number SC005336.

Re: best command line way to check up/down status of HDFS?

Posted by Meng Mao <me...@gmail.com>.
Is running:
ps aux | grep [\\.]NameNode

and looking for a non empty response a good way to test HDFS up status?
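
In script form, what I mean is roughly this (just a sketch):

  #!/bin/sh
  # a non-empty match means a NameNode JVM is running; the bracket in
  # the pattern keeps the grep command itself out of the matches
  if ps aux | grep -q '[\.]NameNode'; then
      echo "OK - NameNode process found"
      exit 0
  else
      echo "CRITICAL - no NameNode process found"
      exit 2
  fi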

I'm assuming that if the NameNode process is down, then DFS is definitely
down?
Worried that there'd be frequent cases of DFS being messed up but the
process still running just fine.

On Fri, Jun 27, 2008 at 10:48 AM, Meng Mao <me...@gmail.com> wrote:

> For a Nagios script I'm writing, I'd like a command-line method that checks
> if HDFS is up and running.
> Is there a better way than to attempt a hadoop dfs command and check the
> error code?
>



-- 
hustlin, hustlin, everyday I'm hustlin