Posted to common-user@hadoop.apache.org by David Hall <dl...@stanford.edu> on 2008/12/04 09:48:28 UTC

When is decommissioning done?

Hi,

I'm trying to decommission some nodes. The process I tried to follow is:

1) add them to conf/excluding (hadoop-site points there)
2) invoke hadoop dfsadmin -refreshNodes
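
In shell terms, that's roughly the following (the hostname is invented;
conf/excluding is the file my hadoop-site's dfs.hosts.exclude points at):

    # 1) list the nodes to retire, one hostname per line
    echo node07.example.org >> conf/excluding
    # 2) ask the namenode to re-read its include/exclude files
    hadoop dfsadmin -refreshNodes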

This returns immediately, so I thought it was done; I killed off
the cluster and rebooted without the excluded nodes, but then fsck was
very unhappy...

Is there some way to watch the progress of decommissioning?

Thanks,
-- David

Re: When is decommissioning done?

Posted by Brian Bockelman <bb...@cse.unl.edu>.
Hey David,

Look at the web interface.  Here's mine:

http://dcache-head.unl.edu:8088/dfshealth.jsp

The "admin state" column says "in service" for normal nodes, and  
"decommissioning in progress" for the rest.  When the decommissioning  
is done, the nodes will migrate to the list of "dead nodes" and shut  
themselves off.  Only then can you safely turn off nodes.
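
If you'd rather poll from a shell, dfsadmin prints the same state per
datanode. A rough sketch (the exact label text may vary by version):

    # each datanode's entry in the report carries a "Decommission Status" line
    hadoop dfsadmin -report | grep -i 'decommission'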

Brian

On Dec 4, 2008, at 2:56 AM, David Hall wrote:

> I'm starting to think I'm doing things wrong.
>
> I have an absolute path to dfs.hosts.exclude that includes what I want
> decommissioned, and a dfs.hosts which includes those I want to remain
> commissioned (this points to the slaves file).
>
> Nothing seems to do anything...
>
> What am I missing?
>
> -- David


Re: When is decommissioning done?

Posted by David Hall <dl...@stanford.edu>.
Thanks for the link.

I followed that guide, and now I have rather strange behavior. If I
have dfs.hosts set (I didn't when I wrote my last email) to an empty
file when I start the cluster, nothing happens when I run
-refreshNodes; I take it that's expected. If it's set to the hosts I
want to keep, none of the datanodes come up at startup, and they die
with the error below. On the dfshealth page, they're all listed as
dead. If instead it's empty at startup and I then add the hosts, every
datanode dies when I run -refreshNodes.

Thoughts? I'm running 0.18.2. (We haven't moved to Java 6 here yet.)

Thanks!
-- David

2008-12-04 01:18:10,909 ERROR org.apache.hadoop.dfs.DataNode:
org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.dfs.DisallowedDatanodeException: Datanode denied
communication with namenode: HOST:PORT  # redacted
        at org.apache.hadoop.dfs.FSNamesystem.registerDatanode(FSNamesystem.java:1938)
        at org.apache.hadoop.dfs.NameNode.register(NameNode.java:585)
        at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)

        at org.apache.hadoop.ipc.Client.call(Client.java:715)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
        at org.apache.hadoop.dfs.$Proxy4.register(Unknown Source)
        at org.apache.hadoop.dfs.DataNode.register(DataNode.java:529)
        at org.apache.hadoop.dfs.DataNode.runDatanodeDaemon(DataNode.java:2960)
        at org.apache.hadoop.dfs.DataNode.createDataNode(DataNode.java:2995)
        at org.apache.hadoop.dfs.DataNode.main(DataNode.java:3116)


On Thu, Dec 4, 2008 at 9:12 AM, Konstantin Shvachko <sh...@yahoo-inc.com> wrote:
> Just for reference, these links:
> http://wiki.apache.org/hadoop/FAQ#17
> http://hadoop.apache.org/core/docs/r0.19.0/hdfs_user_guide.html#DFSAdmin+Command
>
> Decommissioning does not happen at once.
> -refreshNodes just starts the process, but does not complete it.
> There could be a lot of blocks on the nodes you want to decommission,
> and replication takes time.
> The progress can be monitored on the name-node web UI.
> Right after -refreshNodes, the web UI will show the nodes you chose for
> decommissioning in state "Decommission In Progress"; wait until that
> changes to "Decommissioned", and then turn the nodes off.
>
> --Konstantin

Re: When is decommissioning done?

Posted by Konstantin Shvachko <sh...@yahoo-inc.com>.
Just for reference, these links:
http://wiki.apache.org/hadoop/FAQ#17
http://hadoop.apache.org/core/docs/r0.19.0/hdfs_user_guide.html#DFSAdmin+Command

Decommissioning does not happen at once.
-refreshNodes just starts the process, but does not complete it.
There could be a lot of blocks on the nodes you want to decommission,
and replication takes time.
The progress can be monitored on the name-node web UI.
Right after -refreshNodes, the web UI will show the nodes you chose for
decommissioning in state "Decommission In Progress"; wait until that
changes to "Decommissioned", and then turn the nodes off.
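
If you want to script the wait, a loop over dfsadmin works; a sketch,
assuming the status string your version of -report prints matches:

    # block until no datanode still reports an in-progress decommission
    while hadoop dfsadmin -report | grep -qi 'decommission in progress'; do
        sleep 60
    done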

--Konstantin


David Hall wrote:
> I'm starting to think I'm doing things wrong.
> 
> I have an absolute path to dfs.hosts.exclude that includes what I want
> decommissioned, and a dfs.hosts which includes those I want to remain
> commissioned (this points to the slaves file).
> 
> Nothing seems to do anything...
> 
> What am I missing?
> 
> -- David

Re: When is decommissioning done?

Posted by David Hall <dl...@stanford.edu>.
I'm starting to think I'm doing things wrong.

I have an absolute path to dfs.hosts.exclude that includes what I want
decommissioned, and a dfs.hosts which includes those I want to remain
commissioned (this points to the slaves file).
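
For reference, the relevant part of my hadoop-site.xml looks roughly
like this (the paths are placeholders):

    <property>
      <name>dfs.hosts</name>
      <value>/path/to/conf/slaves</value>      <!-- nodes allowed to connect -->
    </property>
    <property>
      <name>dfs.hosts.exclude</name>
      <value>/path/to/conf/excluding</value>   <!-- nodes to decommission -->
    </property>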

Nothing seems to do anything...

What am I missing?

-- David

On Thu, Dec 4, 2008 at 12:48 AM, David Hall <dl...@stanford.edu> wrote:
> Hi,
>
> I'm trying to decommission some nodes. The process I tried to follow is:
>
> 1) add them to conf/excluding (hadoop-site points there)
> 2) invoke hadoop dfsadmin -refreshNodes
>
> This returns immediately, so I thought it was done; I killed off
> the cluster and rebooted without the excluded nodes, but then fsck was
> very unhappy...
>
> Is there some way to watch the progress of decommissioning?
>
> Thanks,
> -- David
>