Posted to common-user@hadoop.apache.org by Bill Au <bi...@gmail.com> on 2009/01/27 23:08:09 UTC

decommissioned node showing up as dead node in web based interface to namenode (dfshealth.jsp)

I was able to decommission a datanode successfully without having to stop my
cluster.  But I noticed that after a node has been decommissioned, it shows
up as a dead node in the web-based interface to the namenode (i.e.
dfshealth.jsp).  My cluster is relatively small, and losing a datanode will
have a performance impact.  So I need to monitor the health of my
cluster and take steps to revive any dead datanode in a timely fashion.
Is there any way to "get rid of" decommissioned datanodes altogether from
the web interface of the namenode?  Or is there a better way to monitor the
health of the cluster?

Bill

Re: decommissioned node showing up as dead node in web based interface to namenode (dfshealth.jsp)

Posted by Bill Au <bi...@gmail.com>.
I have been looking into this some more by looking at the output of dfsadmin
-report during the decommissioning process.  After a node has been
decommissioned, dfsadmin -report shows that the node is in the
Decommissioned state.  The web interface dfshealth.jsp shows it as a dead
node.  After I removed the decommissioned node from the exclude file and ran
the refreshNodes command, the web interface continued to show it as a dead
node but dfsadmin -report showed the node to be in service.  After I restart
HDFS, dfsadmin -report shows the correct information again.
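
A minimal sketch of one way to watch a single node's entry in the dfsadmin
-report output while repeating the steps above.  It assumes the report lists
each datanode as its own block of lines separated by blank lines; the helper
name report_block_for and the hostname in the usage comment are hypothetical,
not part of Hadoop.

```shell
#!/bin/sh
# Print every blank-line-separated block read from stdin that mentions the host.
# Assumption: the report prints one block of lines per datanode, with blocks
# separated by blank lines. The match is a plain substring match, so
# "10.0.0.2" would also match "10.0.0.20".
report_block_for() {
  awk -v host="$1" 'BEGIN { RS = "" } index($0, host) > 0'
}

# Typical use (hostname is hypothetical):
#   hadoop dfsadmin -report | report_block_for datanode07.example.com
```

Running this before and after each refreshNodes or restart makes it easier to
compare what dfsadmin -report says with what dfshealth.jsp shows.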

If I restart HDFS leaving the decommissioned node in the exclude file, the web
interface shows it as a dead node and dfsadmin -report shows it to be in
service.  But after I remove it from the exclude file and run the
refreshNodes command, both the web interface and dfsadmin -report show the
correct information.

It looks to me like I should only remove the decommissioned node from the
exclude file after restarting HDFS.

I would still like to see the web interface report any decommissioned node
as decommissioned rather than dead, as is the case with dfsadmin -report.
I am willing to work on a patch for this.  Before I start, does anyone know
if this is already in the works?

Bill

On Mon, Feb 2, 2009 at 5:00 PM, Bill Au <bi...@gmail.com> wrote:

> It looks like the behavior is the same with 0.18.2 and 0.19.0.  Even though
> I removed the decommissioned node from the exclude file and run the
> refreshNode command, the decommissioned node still show up as a dead node.
> What I did noticed is that if I leave the decommissioned node in the exclude
> and restart HDFS, the node will show up as a dead node after restart.  But
> then if I remove it from the exclude file and run the refreshNode command,
> it will disappear from the status page (dfshealth.jsp).
>
> So it looks like I will have to stop and start the entire cluster in order
> to get what I want.
>
> Bill
>
>
> On Thu, Jan 29, 2009 at 5:40 PM, Bill Au <bi...@gmail.com> wrote:
>
>> Not sure why but this does not work for me.  I am running 0.18.2.  I ran
>> hadoop dfsadmin -refreshNodes after removing the decommissioned node from
>> the exclude file.  It still shows up as a dead node.  I also removed it from
>> the slaves file and ran the refresh nodes command again.  It still shows up
>> as a dead node after that.
>>
>> I am going to upgrade to 0.19.0 to see if it makes any difference.
>>
>> Bill
>>
>>
>> On Tue, Jan 27, 2009 at 7:01 PM, paul <pa...@gmail.com> wrote:
>>
>>> Once the nodes are listed as dead, if you still have the host names in
>>> your
>>> conf/exclude file, remove the entries and then run hadoop dfsadmin
>>> -refreshNodes.
>>>
>>>
>>> This works for us on our cluster.
>>>
>>>
>>>
>>> -paul
>>>
>>>
>>> On Tue, Jan 27, 2009 at 5:08 PM, Bill Au <bi...@gmail.com> wrote:
>>>
>>> > I was able to decommission a datanode successfully without having to
>>> stop
>>> > my
>>> > cluster.  But I noticed that after a node has been decommissioned, it
>>> shows
>>> > up as a dead node in the web base interface to the namenode (ie
>>> > dfshealth.jsp).  My cluster is relatively small and losing a datanode
>>> will
>>> > have performance impact.  So I have a need to monitor the health of my
>>> > cluster and take steps to revive any dead datanode in a timely fashion.
>>>  So
>>> > is there any way to altogether "get rid of" any decommissioned datanode
>>> > from
>>> > the web interace of the namenode?  Or is there a better way to monitor
>>> the
>>> > health of the cluster?
>>> >
>>> > Bill
>>> >
>>>
>>
>>
>

Re: decommissioned node showing up as dead node in web based interface to namenode (dfshealth.jsp)

Posted by Bill Au <bi...@gmail.com>.
It looks like the behavior is the same with 0.18.2 and 0.19.0.  Even though
I removed the decommissioned node from the exclude file and ran the
refreshNodes command, the decommissioned node still shows up as a dead node.
What I did notice is that if I leave the decommissioned node in the exclude
file and restart HDFS, the node will show up as a dead node after restart.  But
then if I remove it from the exclude file and run the refreshNodes command,
it will disappear from the status page (dfshealth.jsp).

So it looks like I will have to stop and start the entire cluster in order
to get what I want.

Bill

On Thu, Jan 29, 2009 at 5:40 PM, Bill Au <bi...@gmail.com> wrote:

> Not sure why but this does not work for me.  I am running 0.18.2.  I ran
> hadoop dfsadmin -refreshNodes after removing the decommissioned node from
> the exclude file.  It still shows up as a dead node.  I also removed it from
> the slaves file and ran the refresh nodes command again.  It still shows up
> as a dead node after that.
>
> I am going to upgrade to 0.19.0 to see if it makes any difference.
>
> Bill
>
>
> On Tue, Jan 27, 2009 at 7:01 PM, paul <pa...@gmail.com> wrote:
>
>> Once the nodes are listed as dead, if you still have the host names in
>> your
>> conf/exclude file, remove the entries and then run hadoop dfsadmin
>> -refreshNodes.
>>
>>
>> This works for us on our cluster.
>>
>>
>>
>> -paul
>>
>>
>> On Tue, Jan 27, 2009 at 5:08 PM, Bill Au <bi...@gmail.com> wrote:
>>
>> > I was able to decommission a datanode successfully without having to
>> stop
>> > my
>> > cluster.  But I noticed that after a node has been decommissioned, it
>> shows
>> > up as a dead node in the web base interface to the namenode (ie
>> > dfshealth.jsp).  My cluster is relatively small and losing a datanode
>> will
>> > have performance impact.  So I have a need to monitor the health of my
>> > cluster and take steps to revive any dead datanode in a timely fashion.
>>  So
>> > is there any way to altogether "get rid of" any decommissioned datanode
>> > from
>> > the web interace of the namenode?  Or is there a better way to monitor
>> the
>> > health of the cluster?
>> >
>> > Bill
>> >
>>
>
>

Re: decommissioned node showing up as dead node in web based interface to namenode (dfshealth.jsp)

Posted by Bill Au <bi...@gmail.com>.
Alyssa,
     I am not trying to revive the dead node.  I want to permanently remove
a node from the cluster.  But after decommissioning it, it shows up as a
dead node until I restart the cluster.  I am looking for a way to get rid of
it from the dfshealth.jsp page without having to restart the cluster.

Bill

On Thu, Jan 29, 2009 at 5:45 PM, Hargraves, Alyssa <al...@wpi.edu> wrote:

> Bill-
>
> I believe once the node is decommissioned you'll also have to run
> bin/hadoop-daemon.sh start datanode and bin/hadoop-daemon.sh start
> tasktracker (both run on the slave node, not master) to revive the dead
> node.  Just removing it from exclude and refreshing doesn't work for me
> either, but with those two additional commands it does.
>
> - Alyssa
> ________________________________________
> From: Bill Au [bill.w.au@gmail.com]
> Sent: Thursday, January 29, 2009 5:40 PM
> To: core-user@hadoop.apache.org
> Subject: Re: decommissioned node showing up ad dead node in web based
> interface to namenode (dfshealth.jsp)
>
> Not sure why but this does not work for me.  I am running 0.18.2.  I ran
> hadoop dfsadmin -refreshNodes after removing the decommissioned node from
> the exclude file.  It still shows up as a dead node.  I also removed it
> from
> the slaves file and ran the refresh nodes command again.  It still shows up
> as a dead node after that.
>
> I am going to upgrade to 0.19.0 to see if it makes any difference.
>
> Bill
>
> On Tue, Jan 27, 2009 at 7:01 PM, paul <pa...@gmail.com> wrote:
>
> > Once the nodes are listed as dead, if you still have the host names in
> your
> > conf/exclude file, remove the entries and then run hadoop dfsadmin
> > -refreshNodes.
> >
> >
> > This works for us on our cluster.
> >
> >
> >
> > -paul
> >
> >
> > On Tue, Jan 27, 2009 at 5:08 PM, Bill Au <bi...@gmail.com> wrote:
> >
> > > I was able to decommission a datanode successfully without having to
> stop
> > > my
> > > cluster.  But I noticed that after a node has been decommissioned, it
> > shows
> > > up as a dead node in the web base interface to the namenode (ie
> > > dfshealth.jsp).  My cluster is relatively small and losing a datanode
> > will
> > > have performance impact.  So I have a need to monitor the health of my
> > > cluster and take steps to revive any dead datanode in a timely fashion.
> >  So
> > > is there any way to altogether "get rid of" any decommissioned datanode
> > > from
> > > the web interace of the namenode?  Or is there a better way to monitor
> > the
> > > health of the cluster?
> > >
> > > Bill
> > >
> >
>

RE: decommissioned node showing up as dead node in web based interface to namenode (dfshealth.jsp)

Posted by "Hargraves, Alyssa" <al...@WPI.EDU>.
Bill-

I believe once the node is decommissioned you'll also have to run bin/hadoop-daemon.sh start datanode and bin/hadoop-daemon.sh start tasktracker (both run on the slave node, not master) to revive the dead node.  Just removing it from exclude and refreshing doesn't work for me either, but with those two additional commands it does.
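
A minimal sketch of the two commands named above, wrapped so they can be
previewed before being run on the slave node.  The function name
start_worker_daemons, its dry-run argument, and the install path in the usage
comment are hypothetical; only the two hadoop-daemon.sh invocations come from
the message.

```shell
#!/bin/sh
# Start the datanode and tasktracker daemons from a given Hadoop install.
# $1: Hadoop install directory (the one that contains bin/hadoop-daemon.sh)
# $2: optional command prefix; pass "echo" to print the commands instead of
#     running them.
start_worker_daemons() {
  hadoop_home=$1
  run=${2:-}
  for daemon in datanode tasktracker; do
    $run "$hadoop_home/bin/hadoop-daemon.sh" start "$daemon" || return 1
  done
}

# Typical use on the slave node (path is hypothetical):
#   start_worker_daemons /opt/hadoop echo   # preview only
#   start_worker_daemons /opt/hadoop        # actually start the daemons
```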

- Alyssa
________________________________________
From: Bill Au [bill.w.au@gmail.com]
Sent: Thursday, January 29, 2009 5:40 PM
To: core-user@hadoop.apache.org
Subject: Re: decommissioned node showing up ad dead node in web based   interface to namenode (dfshealth.jsp)

Not sure why but this does not work for me.  I am running 0.18.2.  I ran
hadoop dfsadmin -refreshNodes after removing the decommissioned node from
the exclude file.  It still shows up as a dead node.  I also removed it from
the slaves file and ran the refresh nodes command again.  It still shows up
as a dead node after that.

I am going to upgrade to 0.19.0 to see if it makes any difference.

Bill

On Tue, Jan 27, 2009 at 7:01 PM, paul <pa...@gmail.com> wrote:

> Once the nodes are listed as dead, if you still have the host names in your
> conf/exclude file, remove the entries and then run hadoop dfsadmin
> -refreshNodes.
>
>
> This works for us on our cluster.
>
>
>
> -paul
>
>
> On Tue, Jan 27, 2009 at 5:08 PM, Bill Au <bi...@gmail.com> wrote:
>
> > I was able to decommission a datanode successfully without having to stop
> > my
> > cluster.  But I noticed that after a node has been decommissioned, it
> shows
> > up as a dead node in the web base interface to the namenode (ie
> > dfshealth.jsp).  My cluster is relatively small and losing a datanode
> will
> > have performance impact.  So I have a need to monitor the health of my
> > cluster and take steps to revive any dead datanode in a timely fashion.
>  So
> > is there any way to altogether "get rid of" any decommissioned datanode
> > from
> > the web interace of the namenode?  Or is there a better way to monitor
> the
> > health of the cluster?
> >
> > Bill
> >
>

Re: decommissioned node showing up as dead node in web based interface to namenode (dfshealth.jsp)

Posted by Bill Au <bi...@gmail.com>.
Not sure why but this does not work for me.  I am running 0.18.2.  I ran
hadoop dfsadmin -refreshNodes after removing the decommissioned node from
the exclude file.  It still shows up as a dead node.  I also removed it from
the slaves file and ran the refresh nodes command again.  It still shows up
as a dead node after that.

I am going to upgrade to 0.19.0 to see if it makes any difference.

Bill

On Tue, Jan 27, 2009 at 7:01 PM, paul <pa...@gmail.com> wrote:

> Once the nodes are listed as dead, if you still have the host names in your
> conf/exclude file, remove the entries and then run hadoop dfsadmin
> -refreshNodes.
>
>
> This works for us on our cluster.
>
>
>
> -paul
>
>
> On Tue, Jan 27, 2009 at 5:08 PM, Bill Au <bi...@gmail.com> wrote:
>
> > I was able to decommission a datanode successfully without having to stop
> > my
> > cluster.  But I noticed that after a node has been decommissioned, it
> shows
> > up as a dead node in the web base interface to the namenode (ie
> > dfshealth.jsp).  My cluster is relatively small and losing a datanode
> will
> > have performance impact.  So I have a need to monitor the health of my
> > cluster and take steps to revive any dead datanode in a timely fashion.
>  So
> > is there any way to altogether "get rid of" any decommissioned datanode
> > from
> > the web interace of the namenode?  Or is there a better way to monitor
> the
> > health of the cluster?
> >
> > Bill
> >
>

Re: decommissioned node showing up as dead node in web based interface to namenode (dfshealth.jsp)

Posted by paul <pa...@gmail.com>.
Once the nodes are listed as dead, if you still have the host names in your
conf/exclude file, remove the entries and then run hadoop dfsadmin
-refreshNodes.


This works for us on our cluster.
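
A minimal sketch of the steps described above, assuming the exclude file
holds one host name per line.  The helper name remove_from_exclude and the
host name in the usage comment are hypothetical; hadoop dfsadmin
-refreshNodes is the command from the message.

```shell
#!/bin/sh
# Remove one host name from an exclude file (one host per line), in place.
# $1: host name to remove, $2: path to the exclude file
remove_from_exclude() {
  host=$1
  file=$2
  [ -f "$file" ] || return 1
  tmp=$(mktemp) || return 1
  # -v keeps non-matching lines, -x matches whole lines, -F treats the host
  # as a fixed string. grep exits 1 when no lines remain, which is fine here.
  grep -vxF -- "$host" "$file" > "$tmp" || true
  # Write back through the original path so its permissions are preserved.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Typical use on the namenode host (host name is hypothetical):
#   remove_from_exclude datanode07.example.com conf/exclude
#   hadoop dfsadmin -refreshNodes
```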



-paul


On Tue, Jan 27, 2009 at 5:08 PM, Bill Au <bi...@gmail.com> wrote:

> I was able to decommission a datanode successfully without having to stop
> my
> cluster.  But I noticed that after a node has been decommissioned, it shows
> up as a dead node in the web base interface to the namenode (ie
> dfshealth.jsp).  My cluster is relatively small and losing a datanode will
> have performance impact.  So I have a need to monitor the health of my
> cluster and take steps to revive any dead datanode in a timely fashion.  So
> is there any way to altogether "get rid of" any decommissioned datanode
> from
> the web interace of the namenode?  Or is there a better way to monitor the
> health of the cluster?
>
> Bill
>