Posted to common-user@hadoop.apache.org by "Hargraves, Alyssa" <al...@WPI.EDU> on 2009/01/22 01:34:48 UTC

Decommissioning Nodes

Hello Hadoop Users,

I was hoping someone would be able to answer a question about node decommissioning.  I have a test Hadoop cluster set up which only consists of my computer and a master node.  I am looking at the removal and addition of nodes.  Adding a node is nearly instant (only about 5 seconds), but removing a node by decommissioning it takes a while, and I don't understand why. Currently, the systems are running no map/reduce tasks and storing no data. DFS Health reports:

7 files and directories, 0 blocks = 7 total. Heap Size is 6.68 MB / 992.31 MB (0%)
Capacity	:	298.02 GB
DFS Remaining	:	245.79 GB
DFS Used	:	4 KB
DFS Used%	:	0 %
Live Nodes 	:	2
Dead Nodes 	:	0

Node    Last Contact  Admin State               Size (GB)  Used (%)  Remaining (GB)  Blocks
master  0             In Service                149.01     0         122.22          0
slave   82            Decommission In Progress  149.01     0         123.58          0

However, even with nothing stored and nothing running, the decommission process takes 3 to 5 minutes, and I'm not quite sure why. There isn't any data to move anywhere, and there aren't any jobs to worry about.  I am using 0.18.2.

Thank you for any help in solving this,
Alyssa Hargraves

Re: Decommissioning Nodes

Posted by Jeremy Chow <co...@gmail.com>.
Hey Alyssa,
If one of those datanodes goes down, a few minutes will pass before the
master discovers it. The master node treats nodes that have not sent a
heartbeat for quite a while as dead.
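A rough illustration of the timeout described above, in Python. As far as I understand the 0.18-era namenode, a datanode is declared dead once it has been silent for 2 * heartbeat.recheck.interval (milliseconds, default 5 minutes) plus 10 * dfs.heartbeat.interval (seconds, default 3). Treat that formula and those defaults as assumptions to verify against your version; `heartbeat_expire_ms` is a hypothetical helper, not a Hadoop API.

```python
def heartbeat_expire_ms(recheck_interval_ms: int = 5 * 60 * 1000,
                        heartbeat_interval_s: int = 3) -> int:
    """Milliseconds of silence before a datanode is treated as dead.

    Hypothetical helper. Assumed formula (0.18-era namenode):
        2 * heartbeat.recheck.interval + 10 * dfs.heartbeat.interval
    where the recheck interval is in milliseconds and the heartbeat
    interval is in seconds.
    """
    return 2 * recheck_interval_ms + 10 * heartbeat_interval_s * 1000


if __name__ == "__main__":
    expire = heartbeat_expire_ms()
    # With the assumed defaults: 600000 + 30000 = 630000 ms, i.e. 10.5 minutes.
    print(f"defaults: {expire} ms = {expire / 60000:.1f} min")
```

This governs when a silent node is marked dead, not how long Decommission In Progress lasts.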





-- 
My research interests are distributed systems, parallel computing and
bytecode based virtual machine.

http://coderplay.javaeye.com

Re: Decommissioning Nodes

Posted by Kumar Pandey <ku...@gmail.com>.
Can you try setting the following in hadoop-site.xml on the namenode and
see whether the time comes down to around a minute?

    <property>
      <name>heartbeat.recheck.interval</name>
      <value>1</value>
    </property>

This effectively

On Thu, Jan 22, 2009 at 9:42 AM, Hargraves, Alyssa <al...@wpi.edu> wrote:

> I was following the steps at <http://wiki.apache.org/hadoop/FAQ#17> to do
> the decommission.  However, you have to be patient with it since it seems to
> take a long time.  If it took 3-5 minutes with my nodes that have no data
> and no jobs running, I can't imagine how long it would be for a real
> cluster.  One thing that I had trouble with originally was the fact that it
> doesn't seem to work if your replication is set to the same value as your number of
> machines (since I was just testing things, I had replication set to 2 with 2
> machines, but that's not a good real-world example).
>
> The problem I'm having though (from Jeremy's reply earlier it sounds like
> he misinterpreted it) isn't how long it is taking for the node to go from
> decommissioned to being recognized by the master as dead.  Whether or not
> it's recognized as dead isn't something that matters for what I'm doing.
>  The real problem is that going from the In Service to Decommissioned state
> is taking forever.  Decommission In Progress lasts 3 to 5 minutes despite
> the fact that there aren't jobs or data on those nodes.  If anyone else has
> any idea why that might be (I can see why it would take time if there are
> jobs or data, but not otherwise) please let me know.
>
> - Alyssa
> ________________________________________
> From: Rob Hamilton [rob@lotame.com]
> Sent: Thursday, January 22, 2009 12:26 PM
> To: core-user@hadoop.apache.org
> Subject: RE: Decommissioning Nodes
>
> I wasn't able to get decommissioning to work at all and found that just
> taking the node down got it out of the cluster. What version are you running
> and how are you initiating the decommissioning?
>
> -Rob
>
>
> Rob Hamilton - VP Network Operations
> P +1 (410) 379-2195 x 240
> E rob@lotame.com
> 6085 Marshalee Drive, Suite 210
> Elkridge, MD 21075
>
>
> -----Original Message-----
> From: Hargraves, Alyssa [mailto:alyssa@WPI.EDU]
> Sent: Wednesday, January 21, 2009 7:35 PM
> To: core-user@hadoop.apache.org
> Subject: Decommissioning Nodes
>
> Hello Hadoop Users,
>
> I was hoping someone would be able to answer a question about node
> decommissioning.  I have a test Hadoop cluster set up which only consists of
> my computer and a master node.  I am looking at the removal and addition of
> nodes.  Adding a node is nearly instant (only about 5 seconds), but removing
> a node by decommissioning it takes a while, and I don't understand why.
> Currently, the systems are running no map/reduce tasks and storing no data.
> DFS Health reports:
>
> 7 files and directories, 0 blocks = 7 total. Heap Size is 6.68 MB / 992.31
> MB (0%)
> Capacity        :       298.02 GB
> DFS Remaining   :       245.79 GB
> DFS Used        :       4 KB
> DFS Used%       :       0 %
> Live Nodes      :       2
> Dead Nodes      :       0
>
> Node    Last Contact  Admin State               Size (GB)  Used (%)  Remaining (GB)  Blocks
> master  0             In Service                149.01     0         122.22          0
> slave   82            Decommission In Progress  149.01     0         123.58          0
>
> However, even with nothing stored and nothing running, the decommission
> process takes 3 to 5 minutes, and I'm not quite sure why. There isn't any
> data to move anywhere, and there aren't any jobs to worry about.  I am using
> 0.18.2.
>
> Thank you for any help in solving this,
> Alyssa Hargraves
>
> The information transmitted in this email is intended only for the
> person(s) or entity to which it is addressed and may contain confidential
> and/or privileged material. Any review, retransmission, dissemination or
> other use of, or taking of any action in reliance upon, this information by
> persons or entities other than the intended recipient is prohibited. If you
> received this email in error, please contact the sender and permanently
> delete the email from any computer.
>
>
>


-- 
Kumar Pandey
http://www.linkedin.com/in/kumarpandey

Re: Decommissioning Nodes

Posted by Konstantin Shvachko <sh...@yahoo-inc.com>.
Hi Alyssa,

In 0.18.2 the decommission monitor wakes up every 5 minutes and
checks the nodes whose decommissioning was requested during that
period. So how long decommissioning takes depends on when within
this period the request was issued: anything up to 5 minutes, so
3 to 5 minutes, as you describe, is to be expected.

This was changed in 0.18.3 and later versions. The decommission
monitor now wakes up every 30 seconds, but checks only a limited
number of nodes on each pass. Since you have only one node, it
will be decommissioned within 30 seconds.
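To make the timing concrete, here is a toy Python model of a periodic monitor, written under stated assumptions rather than from the Hadoop source: the monitor wakes every `period_s` seconds, a node with no blocks is marked Decommissioned on its first check, and, for the 0.18.3-style behaviour, each pass examines at most `nodes_per_pass` pending nodes. The real per-pass limit is not given in this thread, so that parameter and the `wait_until_checked` helper are hypothetical.

```python
def wait_until_checked(request_offset_s: float, period_s: float,
                       queue_position: int = 0,
                       nodes_per_pass: int = 0) -> float:
    """Seconds from a decommission request until the monitor examines the node.

    Toy model with hypothetical parameters, not Hadoop code:
      * the monitor wakes every period_s seconds;
      * the request arrives request_offset_s seconds after the last wake-up;
      * nodes_per_pass > 0 limits how many pending nodes one pass examines,
        taken in queue order (queue_position is 0-based); 0 means no limit.
    """
    if not 0 <= request_offset_s < period_s:
        raise ValueError("request_offset_s must be in [0, period_s)")
    first_wake = period_s - request_offset_s
    extra_passes = queue_position // nodes_per_pass if nodes_per_pass > 0 else 0
    return first_wake + extra_passes * period_s


if __name__ == "__main__":
    # 0.18.2-style: a single 5-minute period, every pending node checked.
    for offset in (0, 60, 120):
        minutes = wait_until_checked(offset, 300) / 60
        print(f"0.18.2, request {offset:3d} s after wake-up: {minutes:.1f} min")
    # 0.18.3-style: 30-second period; a lone node is always first in the queue.
    print(f"0.18.3, single node, worst case: {wait_until_checked(0, 30):.0f} s")
```

Under these assumptions a request issued just after a wake-up waits the full 5 minutes, the upper end of what Alyssa observed; with several queued nodes and a per-pass limit, later nodes wait additional whole periods.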

Hope this helps.

Thanks,
--Konstantin



RE: Decommissioning Nodes

Posted by "Hargraves, Alyssa" <al...@WPI.EDU>.
I was following the steps at <http://wiki.apache.org/hadoop/FAQ#17> to do the decommission.  However, you have to be patient with it since it seems to take a long time.  If it took 3-5 minutes with my nodes that have no data and no jobs running, I can't imagine how long it would be for a real cluster.  One thing that I had trouble with originally was the fact that it doesn't seem to work if your replication is set to the same value as your number of machines (since I was just testing things, I had replication set to 2 with 2 machines, but that's not a good real-world example).

The problem I'm having though (from Jeremy's reply earlier it sounds like he misinterpreted it) isn't how long it is taking for the node to go from decommissioned to being recognized by the master as dead.  Whether or not it's recognized as dead isn't something that matters for what I'm doing.  The real problem is that going from the In Service to Decommissioned state is taking forever.  Decommission In Progress lasts 3 to 5 minutes despite the fact that there aren't jobs or data on those nodes.  If anyone else has any idea why that might be (I can see why it would take time if there are jobs or data, but not otherwise) please let me know.

- Alyssa
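The replication observation above can be expressed as a rough feasibility check. This is a simplified Python sketch under one assumption, not the namenode's actual logic: a node leaves Decommission In Progress only once every block it holds has `replication` live copies on nodes that stay in service. `decommission_can_complete` is a hypothetical helper, and the block count in the example is made up for illustration.

```python
def decommission_can_complete(total_nodes: int, decommissioning: int,
                              replication: int, blocks_on_nodes: int) -> bool:
    """Rough check of whether decommissioning can ever finish (hypothetical).

    Assumes each block held by a decommissioning node must end up with
    `replication` live replicas on nodes that remain in service.
    """
    remaining = total_nodes - decommissioning
    if blocks_on_nodes == 0:
        return True  # nothing to re-replicate
    return remaining >= replication


if __name__ == "__main__":
    # 2 machines, replication 2, a hypothetical 10 blocks stored.
    print(decommission_can_complete(2, 1, 2, blocks_on_nodes=10))  # False
    # Same cluster with replication 1: the remaining node can hold every copy.
    print(decommission_can_complete(2, 1, 1, blocks_on_nodes=10))  # True
```

The empty case returning True is consistent with the empty test cluster in this thread eventually reaching Decommissioned; how long that takes is a separate timing question.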

RE: Decommissioning Nodes

Posted by Rob Hamilton <ro...@lotame.com>.
I wasn't able to get decommissioning to work at all and found that just taking the node down got it out of the cluster. What version are you running and how are you initiating the decommissioning?

-Rob


Rob Hamilton - VP Network Operations 
P +1 (410) 379-2195 x 240   
E rob@lotame.com   
6085 Marshalee Drive, Suite 210   
Elkridge, MD 21075   

