Posted to mapreduce-user@hadoop.apache.org by "Phan, Truong Q" <Tr...@team.telstra.com> on 2014/03/20 03:34:36 UTC

how to free up space of the old Data Node

Hi

I have a 3-node Hadoop cluster in which I created 3 DataNodes.
However, one of the nodes does not have enough space to hold other projects' logs, so I decommissioned it from the DataNode list, but I could not reclaim the space from it.
Is there a way to get this node to release the space?

[root@nsda3dmsrpt02] /data/dfs/dn# du -sh /data/dfs/dn/*
47G     /data/dfs/dn/current

$ sudo -u hdfs hadoop fsck /data
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Status: HEALTHY
Total size:    7186453688 B
Total dirs:    11
Total files:   62
Total symlinks:                0
Total blocks (validated):      105 (avg. block size 68442416 B)
Minimally replicated blocks:   105 (100.0 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       105 (100.0 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     2.0
Corrupt blocks:                0
Missing replicas:              105 (33.333332 %)
Number of data-nodes:          2
Number of racks:               1
FSCK ended at Thu Mar 20 13:30:03 EST 2014 in 22 milliseconds


The filesystem under path '/data' is HEALTHY

Thanks and Regards,
Truong Phan
Senior Technology Specialist
Database Engineering
Transport & Routing Engineering | Networks | Telstra Operations



P    + 61 2 8576 5771
M   + 61 4 1463 7424
E    troung.phan@team.telstra.com
W  www.telstra.com






RE: how to free up space of the old Data Node

Posted by Vinayakumar B <vi...@huawei.com>.
You can change the replication factor using the following command:
hdfs dfs -setrep [-R] <rep> <path>

Once this is done, you can re-commission the datanode; all the over-replicated blocks will then be removed.
If they are not removed, restart the datanode.
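For example, a minimal end-to-end sketch (the values and the excludes handling are illustrative, not prescribed in this thread):

  # Lower the stored replication factor of everything under / and wait for it to take effect.
  sudo -u hdfs hdfs dfs -setrep -R -w 2 /
  # Verify that no block still expects 3 replicas (should print 0).
  sudo -u hdfs hdfs fsck / -files -blocks | grep -c "repl=3"
  # Re-commission: remove the node from the excludes file referenced by dfs.hosts.exclude,
  # then tell the NameNode to re-read it.
  sudo -u hdfs hdfs dfsadmin -refreshNodes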

Regards,
Vinayakumar B

From: Phan, Truong Q [mailto:Troung.Phan@team.telstra.com]
Sent: 20 March 2014 10:28
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node

Thanks for the reply.
This Hadoop cluster is our POC and the node has less space compared to the other two nodes.
How do I change the Replication Factor (RF) from 3 down to 2?
Is this controlled by this parameter (dfs.datanode.handler.count)?





RE: how to free up space of the old Data Node

Posted by Brahma Reddy Battula <br...@huawei.com>.
Please check my inline comments below, marked with ">>"...





________________________________
From: Phan, Truong Q [Troung.Phan@team.telstra.com]
Sent: Thursday, March 20, 2014 10:28 AM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node

Thanks for the reply.
This Hadoop cluster is our POC and the node has less space compared to the other two nodes.
How do I change the Replication Factor (RF) from 3 down to 2?
>> hadoop fs -setrep -R -w 2 /<path to the dir or file>
Is this controlled by this parameter (dfs.datanode.handler.count)?
>> No, that property only sets the number of DataNode server threads.
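For context, the default replication factor for newly created files comes from dfs.replication in hdfs-site.xml; it can also be overridden per command. A small sketch (the file and directory names are illustrative only):

  # Write one file with replication 2 regardless of the cluster default.
  hadoop fs -Ddfs.replication=2 -put localfile.txt /data/MR/
  # Confirm: the second column of the listing is the replication factor.
  hadoop fs -ls /data/MR/localfile.txt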





RE: how to free up space of the old Data Node

Posted by "Phan, Truong Q" <Tr...@team.telstra.com>.
Thanks for the reply.
This Hadoop cluster is our POC and the node has less space compared to the other two nodes.
How do I change the Replication Factor (RF) from 3 down to 2?
Is this controlled by this parameter (dfs.datanode.handler.count)?

Thanks and Regards,
Truong Phan





From: Brahma Reddy Battula [mailto:brahmareddy.battula@huawei.com]
Sent: Thursday, 20 March 2014 3:27 PM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node


Please check my inline comments below, marked with ">>>"...



________________________________
From: Phan, Truong Q [Troung.Phan@team.telstra.com]
Sent: Thursday, March 20, 2014 8:04 AM
To: user@hadoop.apache.org
Subject: how to free up space of the old Data Node
Hi

I have a 3-node Hadoop cluster in which I created 3 DataNodes.
However, one of the nodes does not have enough space to hold other projects' logs, so I decommissioned it from the DataNode list, but I could not reclaim the space from it.
>>> Is your replication factor 3? If it is, then since you have 3 DataNodes, the disk space occupied by all the nodes should ideally be the same (the 47G should be present on all the DNs).
>>> Also, if RF=3, the decommission will not succeed while you have only 3 DNs; you would need to add another DN to the cluster before the decommission can complete.
>>> So please mention the replication factor of the file.

Is there a way to get this node to release the space?
>>> There are ways, but please explain why only this node's disk is full and not the others'. Is it because this node has less space than the other nodes?
>>> If RF=3, then decrease the replication factor to RF=2, and then decommission this node.
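For reference, the usual decommission sequence on this generation of Hadoop looks roughly like the sketch below (the excludes-file path is an assumption; use whatever dfs.hosts.exclude points to in your hdfs-site.xml):

  # On the NameNode host: list the node to retire, then ask the NameNode to re-read the file.
  echo "nsda3dmsrpt02.internal.bigpond.com" >> /etc/hadoop/conf/dfs.exclude
  sudo -u hdfs hdfs dfsadmin -refreshNodes
  # Watch the node move through "Decommission in progress" to "Decommissioned".
  sudo -u hdfs hdfs dfsadmin -report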
[root@nsda3dmsrpt02] /data/dfs/dn# du -sh /data/dfs/dn/*
47G     /data/dfs/dn/current
>>> Please also provide the output of the following:
      sudo -u hdfs hadoop fsck /
      sudo -u hdfs hadoop dfsadmin -report


Thanks and Regards,
Truong Phan




RE: how to free up space of the old Data Node

Posted by Brahma Reddy Battula <br...@huawei.com>.
Normally, re-replication takes time; how long depends on the network and on the following configurations.



You can check the following:



a) Whether all 3 DNs are running in the cluster. Post the fsck and dfsadmin reports once all 3 DNs are up and running.



b) You can watch the output of: hdfs fsck / -files -blocks -racks | grep -i "repl=3" | wc -l
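For instance, a small polling loop around that command (the loop itself is only an illustration):

  # Every 60 seconds, print how many block lines still carry repl=3.
  while true; do
    hdfs fsck / -files -blocks -racks | grep -i "repl=3" | wc -l
    sleep 60
  done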



c) The properties that control this are dfs.namenode.replication.work.multiplier.per.iteration (2), dfs.namenode.replication.max-streams (2) and dfs.namenode.replication.max-streams-hard-limit (4). The first controls the amount of work scheduled to a DN at every heartbeat, and the other two limit the maximum parallel threaded network transfers a DataNode performs at a time. The values in parentheses are the defaults. Some description of these is available at https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

You can perhaps try increasing the set of values to (10, 50, 100) respectively to speed up the re-replication (this requires a NameNode restart), but note that DN memory usage may increase slightly as a result of more block information being propagated to each one. A reasonable heap size for the DN role with these values would be about 4 GB.
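To confirm what the cluster is currently running with before tuning, something like this should work (hdfs getconf reads the local configuration files, so run it on the NameNode host):

  hdfs getconf -confKey dfs.namenode.replication.work.multiplier.per.iteration
  hdfs getconf -confKey dfs.namenode.replication.max-streams
  hdfs getconf -confKey dfs.namenode.replication.max-streams-hard-limit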



Please check the following link for more details:

http://stackoverflow.com/questions/17599498/block-replication-limits-in-hdfs







 Thanks & Regards

 Brahma Reddy Battula

________________________________
From: Phan, Truong Q [Troung.Phan@team.telstra.com]
Sent: Friday, March 21, 2014 8:32 AM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node

Hi Battula,

I have followed your advice to reduce the replication factor, but the command is taking too long and got stuck at the line below.
I am trying to reduce the replication factor on the '/' HDFS filesystem, and it has been stuck on this line for more than 4 hours.
Is there any way to troubleshoot what it is waiting on, or to improve the command?

[root@nsda3dmsrpt02] ~# sudo -u hdfs hadoop fs -setrep -R -w 2 /
Replication 2 set: /data/MR/Outline_of_Sciences.txt
… CUT OFF LINES
Waiting for /tmp/logs/hdfs/logs/application_1394755763143_0019/bpdevdmsdbs02_8041 ... done
Waiting for /tmp/logs/hdfs/logs/application_1394755763143_0020/bpdevdmsdbs01_8041 ... done
Waiting for /tmp/logs/hdfs/logs/application_1394755763143_0020/bpdevdmsdbs02_8041 ... done
Waiting for /tmp/logs/oracle/logs/application_1394755763143_0021/bpdevdmsdbs01_8041 ... done
Waiting for /tmp/logs/oracle/logs/application_1394755763143_0021/bpdevdmsdbs02_8041 ... done
Waiting for /tmp/logs/oracle/logs/application_1394755763143_0022/bpdevdmsdbs01_8041 ... done
Waiting for /tmp/logs/oracle/logs/application_1394755763143_0022/bpdevdmsdbs02_8041 ... done
Waiting for /tmp/mapred/system/jobtracker.info ... done
Waiting for /user/d284671/.staging/job_1391480945032_0001/job.split ..............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................


Thanks and Regards,
Truong Phan





From: Brahma Reddy Battula [mailto:brahmareddy.battula@huawei.com]
Sent: Thursday, 20 March 2014 4:18 PM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node


The following node is down. Please have a look at the DataNode logs and try to bring it back up before taking further action (like decreasing the replication factor):


Dead datanodes:
Name: 172.18.126.99:50010 (nsda3dmsrpt02.internal.bigpond.com)
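The usual first checks would be something like the following (the service name and log path are assumptions based on a CDH-style layout such as this cluster appears to use):

  # On nsda3dmsrpt02: is the DataNode process running, and what do its logs say?
  sudo service hadoop-hdfs-datanode status
  tail -n 100 /var/log/hadoop-hdfs/*datanode*.log
  sudo service hadoop-hdfs-datanode start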





________________________________
From: Phan, Truong Q [Troung.Phan@team.telstra.com]
Sent: Thursday, March 20, 2014 10:39 AM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node
Hi Battula,

I hope Battula is your first name. :P
Here are the outputs of your suggested commands:

[root@nsda3dmsrpt02] /usr/lib/hadoop-0.20-mapreduce# sudo -u hdfs hadoop fsck /
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Connecting to namenode via http://nsda3dmsrpt02.internal.bigpond.com:50070
FSCK started by hdfs (auth:SIMPLE) from /172.18.126.99 for path / at Thu Mar 20 16:04:35 EST 2014

Status: CORRUPT
Total size:    7325542923 B
Total dirs:    138
Total files:   383
Total symlinks:                0 (Files currently being written: 2)
Total blocks (validated):      424 (avg. block size 17277223 B)
  ********************************
  CORRUPT FILES:        3
  MISSING BLOCKS:       3
  MISSING SIZE:         791 B
  CORRUPT BLOCKS:       3
  ********************************
Minimally replicated blocks:   421 (99.29245 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       417 (98.34906 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     1.976415
Corrupt blocks:                3
Missing replicas:              417 (33.147854 %)
Number of data-nodes:          2
Number of racks:               1
FSCK ended at Thu Mar 20 16:04:35 EST 2014 in 105 milliseconds


The filesystem under path '/' is CORRUPT
[root@nsda3dmsrpt02] /usr/lib/hadoop-0.20-mapreduce# sudo -u hdfs hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Configured Capacity: 1100387597518 (1.00 TB)
Present Capacity: 727189155840 (677.25 GB)
DFS Remaining: 712401227776 (663.48 GB)
DFS Used: 14787928064 (13.77 GB)
DFS Used%: 2.03%
Under replicated blocks: 420
Blocks with corrupt replicas: 0
Missing blocks: 3

-------------------------------------------------
Datanodes available: 2 (3 total, 1 dead)

Live datanodes:
Name: 172.18.127.248:50010 (bpdevdmsdbs02)
Hostname: bpdevdmsdbs02
Rack: /default
Decommission Status : Normal
Configured Capacity: 550193798759 (512.41 GB)
DFS Used: 7394033664 (6.89 GB)
Non DFS Used: 169131224679 (157.52 GB)
DFS Remaining: 373668540416 (348.01 GB)
DFS Used%: 1.34%
DFS Remaining%: 67.92%
Last contact: Thu Mar 20 16:05:44 EST 2014


Name: 172.18.127.245:50010 (bpdevdmsdbs01)
Hostname: bpdevdmsdbs01
Rack: /default
Decommission Status : Normal
Configured Capacity: 550193798759 (512.41 GB)
DFS Used: 7393894400 (6.89 GB)
Non DFS Used: 204067216999 (190.05 GB)
DFS Remaining: 338732687360 (315.47 GB)
DFS Used%: 1.34%
DFS Remaining%: 61.57%
Last contact: Thu Mar 20 16:05:44 EST 2014


Dead datanodes:
Name: 172.18.126.99:50010 (nsda3dmsrpt02.internal.bigpond.com)
Hostname: nsda3dmsrpt02.internal.bigpond.com
Rack: /default
Decommission Status : Normal
Configured Capacity: 0 (0 B)
DFS Used: 0 (0 B)
Non DFS Used: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used%: 100.00%
DFS Remaining%: 0.00%
Last contact: Wed Mar 19 11:44:44 EST 2014




Thanks and Regards,
Truong Phan









RE: how to free up space of the old Data Node

Posted by Brahma Reddy Battula <br...@huawei.com>.
Okay, good.



Thanks & Regards

Brahma Reddy Battula



________________________________
From: Phan, Truong Q [Troung.Phan@team.telstra.com]
Sent: Friday, March 21, 2014 10:57 AM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node

Hi

I have managed to fix the hang in my set-replication-factor command.
It was hanging on the hidden staging directory of the user d284671 (/user/d284671/.staging/job_1391480945032_0001/).
I had to use the HDFS remove command to delete all the temporary files in this user's hidden staging directory.
However, I still cannot see the low-space old data node releasing its space.
This node has been decommissioned from the DataNode list.

[d284671@bpdevdmsdbs01]  hadoop fs -rm -f -R .staging/*

oracle@bpdevdmsdbs01 sudo -u hdfs hadoop fs -setrep -R -w 2 /

oracle@bpdevdmsdbs01 $ sudo -u hdfs hadoop fsck /
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Connecting to namenode via http://nsda3dmsrpt02.internal.bigpond.com:50070
FSCK started by hdfs (auth:SIMPLE) from /172.18.127.245 for path / at Fri Mar 21 16:09:34 EST 2014
....................................................................................................
....................................................................................................
....................................................................................................
..................................................Status: HEALTHY
Total size:    7317850024 B
Total dirs:    130
Total files:   350
Total symlinks:                0 (Files currently being written: 2)
Total blocks (validated):      391 (avg. block size 18715728 B)
Minimally replicated blocks:   391 (100.0 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       0 (0.0 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     2.0
Corrupt blocks:                0
Missing replicas:              0 (0.0 %)
Number of data-nodes:          2
Number of racks:               1
FSCK ended at Fri Mar 21 16:09:34 EST 2014 in 55 milliseconds


The filesystem under path '/' is HEALTHY


[root@nsda3dmsrpt02] ~# sudo -u hdfs hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Configured Capacity: 1100387597518 (1.00 TB)
Present Capacity: 722413383680 (672.80 GB)
DFS Remaining: 707641401344 (659.04 GB)
DFS Used: 14771982336 (13.76 GB)
DFS Used%: 2.04%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 2 (2 total, 0 dead)

Live datanodes:
Name: 172.18.127.248:50010 (bpdevdmsdbs02)
Hostname: bpdevdmsdbs02
Rack: /default
Decommission Status : Normal
Configured Capacity: 550193798759 (512.41 GB)
DFS Used: 7385993216 (6.88 GB)
Non DFS Used: 172196048487 (160.37 GB)
DFS Remaining: 370611757056 (345.16 GB)
DFS Used%: 1.34%
DFS Remaining%: 67.36%
Last contact: Fri Mar 21 16:10:42 EST 2014


Name: 172.18.127.245:50010 (bpdevdmsdbs01)
Hostname: bpdevdmsdbs01
Rack: /default
Decommission Status : Normal
Configured Capacity: 550193798759 (512.41 GB)
DFS Used: 7385989120 (6.88 GB)
Non DFS Used: 205778165351 (191.65 GB)
DFS Remaining: 337029644288 (313.88 GB)
DFS Used%: 1.34%
DFS Remaining%: 61.26%
Last contact: Fri Mar 21 16:10:42 EST 2014
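Note that a decommissioned DataNode does not delete its own block files, so the 47G under /data/dfs/dn stays until it is removed by hand. Once fsck reports HEALTHY with no missing replicas, something like the following reclaims the space (the service name is an assumption; the path is the one shown earlier in this thread):

  # On the decommissioned node only:
  sudo service hadoop-hdfs-datanode stop
  sudo rm -rf /data/dfs/dn/current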


Thanks and Regards,
Truong Phan





From: Phan, Truong Q [mailto:Troung.Phan@team.telstra.com]
Sent: Friday, 21 March 2014 2:02 PM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node

Hi Battula,

I have followed your advice to reduce the replication factor but the command is taking too long and got stuck at the line below.
I am trying to reduce the replicator factor on the ‘/’ HDFS filesystem but it got stuck to this linke for more than 4hrs.
Is there any way to troubleshoot to see what it has been waiting or improve the command?

[root@nsda3dmsrpt02] ~# sudo -u hdfs hadoop fs -setrep -R -w 2 /
Replication 2 set: /data/MR/Outline_of_Sciences.txt
… CUT OFF LINES
Waiting for /tmp/logs/hdfs/logs/application_1394755763143_0019/bpdevdmsdbs02_8041 ... done
Waiting for /tmp/logs/hdfs/logs/application_1394755763143_0020/bpdevdmsdbs01_8041 ... done
Waiting for /tmp/logs/hdfs/logs/application_1394755763143_0020/bpdevdmsdbs02_8041 ... done
Waiting for /tmp/logs/oracle/logs/application_1394755763143_0021/bpdevdmsdbs01_8041 ... done
Waiting for /tmp/logs/oracle/logs/application_1394755763143_0021/bpdevdmsdbs02_8041 ... done
Waiting for /tmp/logs/oracle/logs/application_1394755763143_0022/bpdevdmsdbs01_8041 ... done
Waiting for /tmp/logs/oracle/logs/application_1394755763143_0022/bpdevdmsdbs02_8041 ... done
Waiting for /tmp/mapred/system/jobtracker.info ... done
Waiting for /user/d284671/.staging/job_1391480945032_0001/job.split ..............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
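
One way to see what -setrep -R -w is actually blocked on is to fsck the stuck path directly and check for files still open for write, which often cannot reach the target replication while they are being written; a sketch using the path from the output above:

sudo -u hdfs hdfs fsck /user/d284671/.staging -files -blocks -locations -openforwrite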


Thanks and Regards,
Truong Phan


P    + 61 2 8576 5771
M   + 61 4 1463 7424
E    troung.phan@team.telstra.com
W  www.telstra.com



From: Brahma Reddy Battula [mailto:brahmareddy.battula@huawei.com]
Sent: Thursday, 20 March 2014 4:18 PM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node


The following node is down. Please have a look at the DataNode logs and try to bring it back up before taking any further action (such as decreasing the replication factor).


Dead datanodes:
Name: 172.18.126.99:50010 (nsda3dmsrpt02.internal.bigpond.com)
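
For reference, a hedged sketch of those checks, assuming CDH-style log and service locations (both paths are assumptions; adjust to your install):

# On nsda3dmsrpt02, look for the reason the DataNode process died:
tail -n 200 /var/log/hadoop-hdfs/hadoop-hdfs-datanode-nsda3dmsrpt02.log
# Then try to bring it back up:
sudo service hadoop-hdfs-datanode start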





________________________________
From: Phan, Truong Q [Troung.Phan@team.telstra.com]
Sent: Thursday, March 20, 2014 10:39 AM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node
Hi Battula,

I hope Battula is your first name. :P
Here is the output of the commands you suggested:

[root@nsda3dmsrpt02] /usr/lib/hadoop-0.20-mapreduce# sudo -u hdfs hadoop fsck /
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Connecting to namenode via http://nsda3dmsrpt02.internal.bigpond.com:50070
FSCK started by hdfs (auth:SIMPLE) from /172.18.126.99 for path / at Thu Mar 20 16:04:35 EST 2014

Status: CORRUPT
Total size:    7325542923 B
Total dirs:    138
Total files:   383
Total symlinks:                0 (Files currently being written: 2)
Total blocks (validated):      424 (avg. block size 17277223 B)
  ********************************
  CORRUPT FILES:        3
  MISSING BLOCKS:       3
  MISSING SIZE:         791 B
  CORRUPT BLOCKS:       3
  ********************************
Minimally replicated blocks:   421 (99.29245 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       417 (98.34906 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     1.976415
Corrupt blocks:                3
Missing replicas:              417 (33.147854 %)
Number of data-nodes:          2
Number of racks:               1
FSCK ended at Thu Mar 20 16:04:35 EST 2014 in 105 milliseconds


The filesystem under path '/' is CORRUPT
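
If the three corrupt files are disposable (the 791 B of missing data suggests small temporary files), fsck itself can list and clean them up; a sketch:

sudo -u hdfs hdfs fsck / -list-corruptfileblocks   # show which files are affected
sudo -u hdfs hdfs fsck / -move                     # move affected files to /lost+found, or:
sudo -u hdfs hdfs fsck / -delete                   # delete them outright (irreversible)
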
[root@nsda3dmsrpt02] /usr/lib/hadoop-0.20-mapreduce# sudo -u hdfs hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Configured Capacity: 1100387597518 (1.00 TB)
Present Capacity: 727189155840 (677.25 GB)
DFS Remaining: 712401227776 (663.48 GB)
DFS Used: 14787928064 (13.77 GB)
DFS Used%: 2.03%
Under replicated blocks: 420
Blocks with corrupt replicas: 0
Missing blocks: 3

-------------------------------------------------
Datanodes available: 2 (3 total, 1 dead)

Live datanodes:
Name: 172.18.127.248:50010 (bpdevdmsdbs02)
Hostname: bpdevdmsdbs02
Rack: /default
Decommission Status : Normal
Configured Capacity: 550193798759 (512.41 GB)
DFS Used: 7394033664 (6.89 GB)
Non DFS Used: 169131224679 (157.52 GB)
DFS Remaining: 373668540416 (348.01 GB)
DFS Used%: 1.34%
DFS Remaining%: 67.92%
Last contact: Thu Mar 20 16:05:44 EST 2014


Name: 172.18.127.245:50010 (bpdevdmsdbs01)
Hostname: bpdevdmsdbs01
Rack: /default
Decommission Status : Normal
Configured Capacity: 550193798759 (512.41 GB)
DFS Used: 7393894400 (6.89 GB)
Non DFS Used: 204067216999 (190.05 GB)
DFS Remaining: 338732687360 (315.47 GB)
DFS Used%: 1.34%
DFS Remaining%: 61.57%
Last contact: Thu Mar 20 16:05:44 EST 2014


Dead datanodes:
Name: 172.18.126.99:50010 (nsda3dmsrpt02.internal.bigpond.com)
Hostname: nsda3dmsrpt02.internal.bigpond.com
Rack: /default
Decommission Status : Normal
Configured Capacity: 0 (0 B)
DFS Used: 0 (0 B)
Non DFS Used: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used%: 100.00%
DFS Remaining%: 0.00%
Last contact: Wed Mar 19 11:44:44 EST 2014




Thanks and Regards,
Truong Phan


P    + 61 2 8576 5771
M   + 61 4 1463 7424
E    troung.phan@team.telstra.com
W  www.telstra.com



From: Brahma Reddy Battula [mailto:brahmareddy.battula@huawei.com]
Sent: Thursday, 20 March 2014 3:27 PM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node


Please check my inline comments, marked with >>> below...



________________________________
From: Phan, Truong Q [Troung.Phan@team.telstra.com]
Sent: Thursday, March 20, 2014 8:04 AM
To: user@hadoop.apache.org
Subject: how to free up space of the old Data Node
Hi

I have a 3-node Hadoop cluster in which I created 3 DataNodes.
However, I don't have enough space on one of the nodes to cater for other projects' logs. So I decommissioned this node from the DataNode list, but I could not reclaim the space from it.
>>> Is your replication factor 3? If it is 3, then since you have 3 datanodes, the disk space occupied by each node should ideally be the same (the 47G should be present on all the DNs).
>>> Also, if RF=3, the decommission cannot succeed while you have only 3 DNs; you would need to add another DN to the cluster before the decommission can complete.
Hence, please mention the replication factor of the files.
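
As an aside, a quick way to check a file's replication factor (the path below is one that appears later in this thread):

hadoop fs -stat %r /data/MR/Outline_of_Sciences.txt   # prints the file's replication factor
hadoop fs -ls /data/MR                                # the second column is the replication factor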

Is there a way to get this node to release space?
>>> There are ways, but you need to explain why only this node's disk is full and not the others. Is it because this node has less space than the other nodes?
>>> If RF=3, then set RF=2 (decrease the replication factor) and then decommission this node.
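
For completeness, a sketch of the usual decommission mechanics; the excludes-file path below is an assumption, and should be whatever dfs.hosts.exclude points at in your hdfs-site.xml:

echo "nsda3dmsrpt02.internal.bigpond.com" >> /etc/hadoop/conf/dfs.exclude   # path is an assumption
sudo -u hdfs hdfs dfsadmin -refreshNodes   # NameNode then re-replicates this node's blocks elsewhere

Progress shows up in hdfs dfsadmin -report, where the node's Decommission Status should move from "Decommission in progress" to "Decommissioned".
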
[root@nsda3dmsrpt02] /data/dfs/dn# du -sh /data/dfs/dn/*
47G     /data/dfs/dn/current
>>> Please also provide the output of the following commands:
      sudo -u hdfs hadoop fsck /
      sudo -u hdfs hadoop dfsadmin -report

$ sudo -u hdfs hadoop fsck /data
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Status: HEALTHY
Total size:    7186453688 B
Total dirs:    11
Total files:   62
Total symlinks:                0
Total blocks (validated):      105 (avg. block size 68442416 B)
Minimally replicated blocks:   105 (100.0 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       105 (100.0 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     2.0
Corrupt blocks:                0
Missing replicas:              105 (33.333332 %)
Number of data-nodes:          2
Number of racks:               1
FSCK ended at Thu Mar 20 13:30:03 EST 2014 in 22 milliseconds


The filesystem under path '/data' is HEALTHY

Thanks and Regards,
Truong Phan
Senior Technology Specialist
Database Engineering
Transport & Routing Engineering | Networks | Telstra Operations



P    + 61 2 8576 5771
M   + 61 4 1463 7424
E    troung.phan@team.telstra.com
W  www.telstra.com





RE: how to free up space of the old Data Node

Posted by Brahma Reddy Battula <br...@huawei.com>.
Okie..Good



Thanks & Regards

Brahma Reddy Battula



RE: how to free up space of the old Data Node

Posted by "Phan, Truong Q" <Tr...@team.telstra.com>.
Hi

I have managed to fix the hanging of my set replicator factor command.
The hanging was in the hidden staging directory of the user d284671 (/user/d284671/.staging/job_1391480945032_0001/).
I have to use the HDFS's remove files command to remove all temporary files in this hidden staging directory of this user.
However, I still could not see the low space old data node is releasing space.
This node has been decommissioned from the Data nodes list.

[d284671@bpdevdmsdbs01]  hadoop fs -rm -f -R .staging/*

oracle@bpdevdmsdbs01 sudo -u hdfs hadoop fs -setrep -R -w 2 /

oracle@bpdevdmsdbs01 $ sudo -u hdfs hadoop fsck /
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Connecting to namenode via http://nsda3dmsrpt02.internal.bigpond.com:50070
FSCK started by hdfs (auth:SIMPLE) from /172.18.127.245 for path / at Fri Mar 21 16:09:34 EST 2014
....................................................................................................
....................................................................................................
....................................................................................................
..................................................Status: HEALTHY
Total size:    7317850024 B
Total dirs:    130
Total files:   350
Total symlinks:                0 (Files currently being written: 2)
Total blocks (validated):      391 (avg. block size 18715728 B)
Minimally replicated blocks:   391 (100.0 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       0 (0.0 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     2.0
Corrupt blocks:                0
Missing replicas:              0 (0.0 %)
Number of data-nodes:          2
Number of racks:               1
FSCK ended at Fri Mar 21 16:09:34 EST 2014 in 55 milliseconds


The filesystem under path '/' is HEALTHY


[root@nsda3dmsrpt02] ~# sudo -u hdfs hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Configured Capacity: 1100387597518 (1.00 TB)
Present Capacity: 722413383680 (672.80 GB)
DFS Remaining: 707641401344 (659.04 GB)
DFS Used: 14771982336 (13.76 GB)
DFS Used%: 2.04%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 2 (2 total, 0 dead)

Live datanodes:
Name: 172.18.127.248:50010 (bpdevdmsdbs02)
Hostname: bpdevdmsdbs02
Rack: /default
Decommission Status : Normal
Configured Capacity: 550193798759 (512.41 GB)
DFS Used: 7385993216 (6.88 GB)
Non DFS Used: 172196048487 (160.37 GB)
DFS Remaining: 370611757056 (345.16 GB)
DFS Used%: 1.34%
DFS Remaining%: 67.36%
Last contact: Fri Mar 21 16:10:42 EST 2014


Name: 172.18.127.245:50010 (bpdevdmsdbs01)
Hostname: bpdevdmsdbs01
Rack: /default
Decommission Status : Normal
Configured Capacity: 550193798759 (512.41 GB)
DFS Used: 7385989120 (6.88 GB)
Non DFS Used: 205778165351 (191.65 GB)
DFS Remaining: 337029644288 (313.88 GB)
DFS Used%: 1.34%
DFS Remaining%: 61.26%
Last contact: Fri Mar 21 16:10:42 EST 2014


Thanks and Regards,
Truong Phan


P    + 61 2 8576 5771
M   + 61 4 1463 7424
E    troung.phan@team.telstra.com
W  www.telstra.com



RE: how to free up space of the old Data Node

Posted by Brahma Reddy Battula <br...@huawei.com>.
Normally, replication takes time; how long depends on the network and on the configurations below.

You can check the following:



a) Whether all 3 DNs are running in the cluster. Post the fsck and dfsadmin reports again once all 3 DNs are up and running.



b) hdfs fsck / -files -blocks -racks | grep -i "repl=3" | wc -l. You can watch this output to track progress; a small monitoring sketch follows this list.



c) The properties that control this are dfs.namenode.replication.work.multiplier.per.iteration (2), dfs.namenode.replication.max-streams (2) and dfs.namenode.replication.max-streams-hard-limit (4). The foremost controls the rate of work to be scheduled to a DN at every heartbeat that occurs, and the other two further limit the maximum parallel threaded network transfers done by a DataNode at a time. The values in () indicate their defaults. Some description of this is available at https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

You can perhaps try to increase the set of values to (10, 50, 100) respectively to spruce up the network usage (requires a NameNode restart), but note that your DN memory usage may increase slightly as a result of more blocks information being propagated to it. A reasonable heap size for these values for the DN role would be about 4 GB.
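
As a minimal sketch of point (b) above, the count of blocks still carrying three replicas can be polled while the setrep 2 change works through the namespace. The fsck pipeline is exactly the one from point (b); the 30-second interval is an arbitrary choice.

# Sketch: poll the number of blocks that still have 3 replicas; the
# count should fall toward 0 as the cluster applies the setrep change.
while true; do
  date
  sudo -u hdfs hdfs fsck / -files -blocks -racks 2>/dev/null | grep -ic 'repl=3'
  sleep 30
done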



Please check the following link for more details:

http://stackoverflow.com/questions/17599498/block-replication-limits-in-hdfs
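
To confirm what the NameNode is actually running with, the effective values of the three throttle properties can be read back with hdfs getconf, assuming -confKey behaves as documented for your release:

# Sketch: print the effective replication-throttle settings (defaults
# are 2 / 2 / 4). Raising them in hdfs-site.xml needs a NameNode
# restart, as noted above.
for key in dfs.namenode.replication.work.multiplier.per.iteration \
           dfs.namenode.replication.max-streams \
           dfs.namenode.replication.max-streams-hard-limit; do
  printf '%s = ' "$key"
  hdfs getconf -confKey "$key"
done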







 Thanks & Regards

 Brahma Reddy Battula

RE: how to free up space of the old Data Node

Posted by Brahma Reddy Battula <br...@huawei.com>.
Normally replication will take more time that depends N/w Following configurations..



you can check the following



a) whether 3 DN's are running in cluster or not....post the fsck and admin report once 3 DN's are up and running..



b) hdfs fsck / -files -blocks -racks | grep -i "repl=3" | wc -l,, you can watch this output..



c) The properties that control this are dfs.namenode.replication.work.multiplier.per.iteration (2), dfs.namenode.replication.max-streams (2) and dfs.namenode.replication.max-streams-hard-limit (4). The foremost controls the rate of work to be scheduled to a DN at every heartbeat that occurs, and the other two further limit the maximum parallel threaded network transfers done by a DataNode at a time. The values in () indicate their defaults. Some description of this is available at https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

You can perhaps try to increase the set of values to (10, 50, 100) respectively to spruce up the network usage (requires a NameNode restart), but note that your DN memory usage may increase slightly as a result of more blocks information being propagated to it. A reasonable heap size for these values for the DN role would be about 4 GB.



Please check following link for more details

http://stackoverflow.com/questions/17599498/block-replication-limits-in-hdfs







 Thanks & Regards

 Brahma Reddy Battula

________________________________
From: Phan, Truong Q [Troung.Phan@team.telstra.com]
Sent: Friday, March 21, 2014 8:32 AM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node

Hi Battula,

I have followed your advice to reduce the replication factor but the command is taking too long and got stuck at the line below.
I am trying to reduce the replicator factor on the ‘/’ HDFS filesystem but it got stuck to this linke for more than 4hrs.
Is there any way to troubleshoot to see what it has been waiting or improve the command?

[root@nsda3dmsrpt02] ~# sudo -u hdfs hadoop fs -setrep -R -w 2 /
Replication 2 set: /data/MR/Outline_of_Sciences.txt
… CUT OFF LINES
Waiting for /tmp/logs/hdfs/logs/application_1394755763143_0019/bpdevdmsdbs02_8041 ... done
Waiting for /tmp/logs/hdfs/logs/application_1394755763143_0020/bpdevdmsdbs01_8041 ... done
Waiting for /tmp/logs/hdfs/logs/application_1394755763143_0020/bpdevdmsdbs02_8041 ... done
Waiting for /tmp/logs/oracle/logs/application_1394755763143_0021/bpdevdmsdbs01_8041 ... done
Waiting for /tmp/logs/oracle/logs/application_1394755763143_0021/bpdevdmsdbs02_8041 ... done
Waiting for /tmp/logs/oracle/logs/application_1394755763143_0022/bpdevdmsdbs01_8041 ... done
Waiting for /tmp/logs/oracle/logs/application_1394755763143_0022/bpdevdmsdbs02_8041 ... done
Waiting for /tmp/mapred/system/jobtracker.info ... done
Waiting for /user/d284671/.staging/job_1391480945032_0001/job.split ..............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................


Thanks and Regards,
Truong Phan


P    + 61 2 8576 5771
M   + 61 4 1463 7424
E    troung.phan@team.telstra.com
W  www.telstra.com<https://email-cn.huawei.com/owa/UrlBlockedError.aspx>



From: Brahma Reddy Battula [mailto:brahmareddy.battula@huawei.com]
Sent: Thursday, 20 March 2014 4:18 PM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node


Following Node is down..Please have look on datanode logs and try to make it up...Before going for further action..(like decreasing the replication factor..)


Dead datanodes:
Name: 172.18.126.99:50010 (nsda3dmsrpt02.internal.bigpond.com)





________________________________
From: Phan, Truong Q [Troung.Phan@team.telstra.com]
Sent: Thursday, March 20, 2014 10:39 AM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: RE: how to free up space of the old Data Node
Hi Battula,

I hope Battula is your first name. :P
Here are the output of your suggested commands:

[root@nsda3dmsrpt02] /usr/lib/hadoop-0.20-mapreduce# sudo -u hdfs hadoop fsck /
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Connecting to namenode via http://nsda3dmsrpt02.internal.bigpond.com:50070
FSCK started by hdfs (auth:SIMPLE) from /172.18.126.99 for path / at Thu Mar 20 16:04:35 EST 2014

Status: CORRUPT
Total size:    7325542923 B
Total dirs:    138
Total files:   383
Total symlinks:                0 (Files currently being written: 2)
Total blocks (validated):      424 (avg. block size 17277223 B)
  ********************************
  CORRUPT FILES:        3
  MISSING BLOCKS:       3
  MISSING SIZE:         791 B
  CORRUPT BLOCKS:       3
  ********************************
Minimally replicated blocks:   421 (99.29245 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       417 (98.34906 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     1.976415
Corrupt blocks:                3
Missing replicas:              417 (33.147854 %)
Number of data-nodes:          2
Number of racks:               1
FSCK ended at Thu Mar 20 16:04:35 EST 2014 in 105 milliseconds


The filesystem under path '/' is CORRUPT
[root@nsda3dmsrpt02] /usr/lib/hadoop-0.20-mapreduce# sudo -u hdfs hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Configured Capacity: 1100387597518 (1.00 TB)
Present Capacity: 727189155840 (677.25 GB)
DFS Remaining: 712401227776 (663.48 GB)
DFS Used: 14787928064 (13.77 GB)
DFS Used%: 2.03%
Under replicated blocks: 420
Blocks with corrupt replicas: 0
Missing blocks: 3

-------------------------------------------------
Datanodes available: 2 (3 total, 1 dead)

Live datanodes:
Name: 172.18.127.248:50010 (bpdevdmsdbs02)
Hostname: bpdevdmsdbs02
Rack: /default
Decommission Status : Normal
Configured Capacity: 550193798759 (512.41 GB)
DFS Used: 7394033664 (6.89 GB)
Non DFS Used: 169131224679 (157.52 GB)
DFS Remaining: 373668540416 (348.01 GB)
DFS Used%: 1.34%
DFS Remaining%: 67.92%
Last contact: Thu Mar 20 16:05:44 EST 2014


Name: 172.18.127.245:50010 (bpdevdmsdbs01)
Hostname: bpdevdmsdbs01
Rack: /default
Decommission Status : Normal
Configured Capacity: 550193798759 (512.41 GB)
DFS Used: 7393894400 (6.89 GB)
Non DFS Used: 204067216999 (190.05 GB)
DFS Remaining: 338732687360 (315.47 GB)
DFS Used%: 1.34%
DFS Remaining%: 61.57%
Last contact: Thu Mar 20 16:05:44 EST 2014


Dead datanodes:
Name: 172.18.126.99:50010 (nsda3dmsrpt02.internal.bigpond.com)
Hostname: nsda3dmsrpt02.internal.bigpond.com
Rack: /default
Decommission Status : Normal
Configured Capacity: 0 (0 B)
DFS Used: 0 (0 B)
Non DFS Used: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used%: 100.00%
DFS Remaining%: 0.00%
Last contact: Wed Mar 19 11:44:44 EST 2014




Thanks and Regards,
Truong Phan


P    + 61 2 8576 5771
M   + 61 4 1463 7424
E    troung.phan@team.telstra.com<ma...@team.telstra.com>
W  www.telstra.com<https://email-cn.huawei.com/owa/UrlBlockedError.aspx>



From: Brahma Reddy Battula [mailto:brahmareddy.battula@huawei.com]
Sent: Thursday, 20 March 2014 3:27 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: RE: how to free up space of the old Data Node


Please check my inline comments which are in blue color...



________________________________
From: Phan, Truong Q [Troung.Phan@team.telstra.com]
Sent: Thursday, March 20, 2014 8:04 AM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: how to free up space of the old Data Node
Hi

I have 3 nodes Hadoop cluster in which I created 3 Data Nodes.
However, I don’t have enough space in one of the node to cater other projects’ log. So I decommissioned this node from a Data node list but I could not re-claimed the space from it.
>>> is your Replication is 3..? If it is 3 and as you have 3 datanodes,ideally disk space occupied by all nodes should be same(47G, should be present in all the DN's)..
>>>And if you RF=3,Decommission will not be success as you've only 3 DN's..you need to add another DN to cluster,,then only decommission will be success..
Hence please mention the replication factor of the file..

Is there a way to get this node to release space?
>>> There are ways, but you need to explain why only this node's disk is full and not the others'. Is it because this node has less space than the other nodes?
>>> If RF=3, then set RF=2 (decrease the replication factor), and then decommission this node.
[root@nsda3dmsrpt02] /data/dfs/dn# du -sh /data/dfs/dn/*
47G     /data/dfs/dn/current
>>> Please also provide the output of the following:
      sudo -u hdfs hadoop fsck /
      sudo -u hdfs hadoop dfsadmin -report

$ sudo -u hdfs hadoop fsck /data
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Status: HEALTHY
Total size:    7186453688 B
Total dirs:    11
Total files:   62
Total symlinks:                0
Total blocks (validated):      105 (avg. block size 68442416 B)
Minimally replicated blocks:   105 (100.0 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       105 (100.0 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     2.0
Corrupt blocks:                0
Missing replicas:              105 (33.333332 %)
Number of data-nodes:          2
Number of racks:               1
FSCK ended at Thu Mar 20 13:30:03 EST 2014 in 22 milliseconds


The filesystem under path '/data' is HEALTHY

Thanks and Regards,
Truong Phan
Senior Technology Specialist
Database Engineering
Transport & Routing Engineering | Networks | Telstra Operations



P    + 61 2 8576 5771
M   + 61 4 1463 7424
E    troung.phan@team.telstra.com
W  www.telstra.com





RE: how to free up space of the old Data Node

Posted by "Phan, Truong Q" <Tr...@team.telstra.com>.
Hi

I have managed to fix the hanging of my set-replication-factor command.
It was hanging on files in the hidden staging directory of user d284671 (/user/d284671/.staging/job_1391480945032_0001/).
I had to use the HDFS remove command to delete all of the temporary files in that user's hidden staging directory.
However, I still could not see the old, low-space data node releasing its space.
This node has been decommissioned from the Data Node list.
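
For anyone reproducing this: the decommission itself is driven by the excludes file plus a node refresh, not by setrep. A rough sketch, where the excludes path is an assumption and should be whatever dfs.hosts.exclude points to in hdfs-site.xml:

echo "nsda3dmsrpt02.internal.bigpond.com" >> /etc/hadoop/conf/dfs.hosts.exclude   # run on the NameNode host
sudo -u hdfs hdfs dfsadmin -refreshNodes
sudo -u hdfs hdfs dfsadmin -report   # watch Decommission Status until it no longer reads Normal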

[d284671@bpdevdmsdbs01]  hadoop fs -rm -f -R .staging/*

oracle@bpdevdmsdbs01 sudo -u hdfs hadoop fs -setrep -R -w 2 /

oracle@bpdevdmsdbs01 $ sudo -u hdfs hadoop fsck /
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Connecting to namenode via http://nsda3dmsrpt02.internal.bigpond.com:50070
FSCK started by hdfs (auth:SIMPLE) from /172.18.127.245 for path / at Fri Mar 21 16:09:34 EST 2014
....................................................................................................
....................................................................................................
....................................................................................................
..................................................Status: HEALTHY
Total size:    7317850024 B
Total dirs:    130
Total files:   350
Total symlinks:                0 (Files currently being written: 2)
Total blocks (validated):      391 (avg. block size 18715728 B)
Minimally replicated blocks:   391 (100.0 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       0 (0.0 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     2.0
Corrupt blocks:                0
Missing replicas:              0 (0.0 %)
Number of data-nodes:          2
Number of racks:               1
FSCK ended at Fri Mar 21 16:09:34 EST 2014 in 55 milliseconds


The filesystem under path '/' is HEALTHY


[root@nsda3dmsrpt02] ~# sudo -u hdfs hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Configured Capacity: 1100387597518 (1.00 TB)
Present Capacity: 722413383680 (672.80 GB)
DFS Remaining: 707641401344 (659.04 GB)
DFS Used: 14771982336 (13.76 GB)
DFS Used%: 2.04%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 2 (2 total, 0 dead)

Live datanodes:
Name: 172.18.127.248:50010 (bpdevdmsdbs02)
Hostname: bpdevdmsdbs02
Rack: /default
Decommission Status : Normal
Configured Capacity: 550193798759 (512.41 GB)
DFS Used: 7385993216 (6.88 GB)
Non DFS Used: 172196048487 (160.37 GB)
DFS Remaining: 370611757056 (345.16 GB)
DFS Used%: 1.34%
DFS Remaining%: 67.36%
Last contact: Fri Mar 21 16:10:42 EST 2014


Name: 172.18.127.245:50010 (bpdevdmsdbs01)
Hostname: bpdevdmsdbs01
Rack: /default
Decommission Status : Normal
Configured Capacity: 550193798759 (512.41 GB)
DFS Used: 7385989120 (6.88 GB)
Non DFS Used: 205778165351 (191.65 GB)
DFS Remaining: 337029644288 (313.88 GB)
DFS Used%: 1.34%
DFS Remaining%: 61.26%
Last contact: Fri Mar 21 16:10:42 EST 2014
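
On the original question of the 47G still sitting under /data/dfs/dn on the old node: HDFS never deletes block files from a node that is dead or decommissioned, so that space will not be released automatically. Once the cluster is healthy without the node (as the report above shows) and the node will never rejoin as a DataNode, the local block data can be removed by hand. A sketch, with the CDH-style service name being an assumption:

service hadoop-hdfs-datanode stop   # on the old node, if the daemon is still installed and running
rm -rf /data/dfs/dn/current         # destroys the local block replicas; only safe once the node is permanently out of the cluster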


Thanks and Regards,
Truong Phan


P    + 61 2 8576 5771
M   + 61 4 1463 7424
E    troung.phan@team.telstra.com
W  www.telstra.com



From: Phan, Truong Q [mailto:Troung.Phan@team.telstra.com]
Sent: Friday, 21 March 2014 2:02 PM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node

Hi Battula,

I have followed your advice to reduce the replication factor, but the command is taking too long and got stuck at the line below.
I am trying to reduce the replication factor on the '/' HDFS filesystem, but it has been stuck at this line for more than 4 hours.
Is there any way to troubleshoot what it is waiting on, or to speed up the command?

[root@nsda3dmsrpt02] ~# sudo -u hdfs hadoop fs -setrep -R -w 2 /
Replication 2 set: /data/MR/Outline_of_Sciences.txt
... CUT OFF LINES
Waiting for /tmp/logs/hdfs/logs/application_1394755763143_0019/bpdevdmsdbs02_8041 ... done
Waiting for /tmp/logs/hdfs/logs/application_1394755763143_0020/bpdevdmsdbs01_8041 ... done
Waiting for /tmp/logs/hdfs/logs/application_1394755763143_0020/bpdevdmsdbs02_8041 ... done
Waiting for /tmp/logs/oracle/logs/application_1394755763143_0021/bpdevdmsdbs01_8041 ... done
Waiting for /tmp/logs/oracle/logs/application_1394755763143_0021/bpdevdmsdbs02_8041 ... done
Waiting for /tmp/logs/oracle/logs/application_1394755763143_0022/bpdevdmsdbs01_8041 ... done
Waiting for /tmp/logs/oracle/logs/application_1394755763143_0022/bpdevdmsdbs02_8041 ... done
Waiting for /tmp/mapred/system/jobtracker.info ... done
Waiting for /user/d284671/.staging/job_1391480945032_0001/job.split ..............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................


Thanks and Regards,
Truong Phan


P    + 61 2 8576 5771
M   + 61 4 1463 7424
E    troung.phan@team.telstra.com
W  www.telstra.com



From: Brahma Reddy Battula [mailto:brahmareddy.battula@huawei.com]
Sent: Thursday, 20 March 2014 4:18 PM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node


The following node is down. Please have a look at the datanode logs and try to bring it back up before taking further action (like decreasing the replication factor):


Dead datanodes:
Name: 172.18.126.99:50010 (nsda3dmsrpt02.internal.bigpond.com)
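
A quick way to see why that DataNode is down is to check its log on the node itself; the log path below is an assumption based on typical CDH package layouts:

tail -n 200 /var/log/hadoop-hdfs/hadoop-hdfs-datanode-nsda3dmsrpt02.log   # look for the shutdown/exception messages
service hadoop-hdfs-datanode restart   # once the root cause is fixed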





________________________________
From: Phan, Truong Q [Troung.Phan@team.telstra.com]
Sent: Thursday, March 20, 2014 10:39 AM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node
Hi Battula,

I hope Battula is your first name. :P
Here is the output of your suggested commands:

[root@nsda3dmsrpt02] /usr/lib/hadoop-0.20-mapreduce# sudo -u hdfs hadoop fsck /
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Connecting to namenode via http://nsda3dmsrpt02.internal.bigpond.com:50070
FSCK started by hdfs (auth:SIMPLE) from /172.18.126.99 for path / at Thu Mar 20 16:04:35 EST 2014

Status: CORRUPT
Total size:    7325542923 B
Total dirs:    138
Total files:   383
Total symlinks:                0 (Files currently being written: 2)
Total blocks (validated):      424 (avg. block size 17277223 B)
  ********************************
  CORRUPT FILES:        3
  MISSING BLOCKS:       3
  MISSING SIZE:         791 B
  CORRUPT BLOCKS:       3
  ********************************
Minimally replicated blocks:   421 (99.29245 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       417 (98.34906 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     1.976415
Corrupt blocks:                3
Missing replicas:              417 (33.147854 %)
Number of data-nodes:          2
Number of racks:               1
FSCK ended at Thu Mar 20 16:04:35 EST 2014 in 105 milliseconds


The filesystem under path '/' is CORRUPT
[root@nsda3dmsrpt02] /usr/lib/hadoop-0.20-mapreduce# sudo -u hdfs hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Configured Capacity: 1100387597518 (1.00 TB)
Present Capacity: 727189155840 (677.25 GB)
DFS Remaining: 712401227776 (663.48 GB)
DFS Used: 14787928064 (13.77 GB)
DFS Used%: 2.03%
Under replicated blocks: 420
Blocks with corrupt replicas: 0
Missing blocks: 3

-------------------------------------------------
Datanodes available: 2 (3 total, 1 dead)

Live datanodes:
Name: 172.18.127.248:50010 (bpdevdmsdbs02)
Hostname: bpdevdmsdbs02
Rack: /default
Decommission Status : Normal
Configured Capacity: 550193798759 (512.41 GB)
DFS Used: 7394033664 (6.89 GB)
Non DFS Used: 169131224679 (157.52 GB)
DFS Remaining: 373668540416 (348.01 GB)
DFS Used%: 1.34%
DFS Remaining%: 67.92%
Last contact: Thu Mar 20 16:05:44 EST 2014


Name: 172.18.127.245:50010 (bpdevdmsdbs01)
Hostname: bpdevdmsdbs01
Rack: /default
Decommission Status : Normal
Configured Capacity: 550193798759 (512.41 GB)
DFS Used: 7393894400 (6.89 GB)
Non DFS Used: 204067216999 (190.05 GB)
DFS Remaining: 338732687360 (315.47 GB)
DFS Used%: 1.34%
DFS Remaining%: 61.57%
Last contact: Thu Mar 20 16:05:44 EST 2014


Dead datanodes:
Name: 172.18.126.99:50010 (nsda3dmsrpt02.internal.bigpond.com)
Hostname: nsda3dmsrpt02.internal.bigpond.com
Rack: /default
Decommission Status : Normal
Configured Capacity: 0 (0 B)
DFS Used: 0 (0 B)
Non DFS Used: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used%: 100.00%
DFS Remaining%: 0.00%
Last contact: Wed Mar 19 11:44:44 EST 2014




Thanks and Regards,
Truong Phan


P    + 61 2 8576 5771
M   + 61 4 1463 7424
E    troung.phan@team.telstra.com
W  www.telstra.com



From: Brahma Reddy Battula [mailto:brahmareddy.battula@huawei.com]
Sent: Thursday, 20 March 2014 3:27 PM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node


Please check my inline comments below, marked with >>>.



________________________________
From: Phan, Truong Q [Troung.Phan@team.telstra.com]
Sent: Thursday, March 20, 2014 8:04 AM
To: user@hadoop.apache.org
Subject: how to free up space of the old Data Node
Hi

I have a 3-node Hadoop cluster in which I created 3 Data Nodes.
However, I don't have enough space on one of the nodes to cater for other projects' logs. So I decommissioned this node from the Data Node list, but I could not reclaim the space from it.
>>> Is your replication factor 3? If it is 3 and you have 3 datanodes, the disk space occupied by all nodes should ideally be the same (47G should be present on all the DNs).
>>> And if RF=3, decommissioning will not succeed while you have only 3 DNs; you need to add another DN to the cluster before the decommission can complete.
Hence, please mention the replication factor of the files.

Is there a way to get this node to release space?
>>> There are ways, but you need to explain why only this node's disk is full and not the others'. Is it because this node has less space than the other nodes?
>>> If RF=3, then set RF=2 (decrease the replication factor), and then decommission this node.
[root@nsda3dmsrpt02] /data/dfs/dn# du -sh /data/dfs/dn/*
47G     /data/dfs/dn/current
>>> Please also provide the output of the following:
      sudo -u hdfs hadoop fsck /
      sudo -u hdfs hadoop dfsadmin -report

$ sudo -u hdfs hadoop fsck /data
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Status: HEALTHY
Total size:    7186453688 B
Total dirs:    11
Total files:   62
Total symlinks:                0
Total blocks (validated):      105 (avg. block size 68442416 B)
Minimally replicated blocks:   105 (100.0 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       105 (100.0 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     2.0
Corrupt blocks:                0
Missing replicas:              105 (33.333332 %)
Number of data-nodes:          2
Number of racks:               1
FSCK ended at Thu Mar 20 13:30:03 EST 2014 in 22 milliseconds


The filesystem under path '/data' is HEALTHY

Thanks and Regards,
Truong Phan
Senior Technology Specialist
Database Engineering
Transport & Routing Engineering | Networks | Telstra Operations



P    + 61 2 8576 5771
M   + 61 4 1463 7424
E    troung.phan@team.telstra.com
W  www.telstra.com





RE: how to free up space of the old Data Node

Posted by "Phan, Truong Q" <Tr...@team.telstra.com>.
Hi Battula,

I have followed your advice to reduce the replication factor, but the command is taking too long and got stuck at the line below.
I am trying to reduce the replication factor on the '/' HDFS filesystem, but it has been stuck at this line for more than 4 hours.
Is there any way to troubleshoot what it is waiting on, or to speed up the command?

[root@nsda3dmsrpt02] ~# sudo -u hdfs hadoop fs -setrep -R -w 2 /
Replication 2 set: /data/MR/Outline_of_Sciences.txt
... CUT OFF LINES
Waiting for /tmp/logs/hdfs/logs/application_1394755763143_0019/bpdevdmsdbs02_8041 ... done
Waiting for /tmp/logs/hdfs/logs/application_1394755763143_0020/bpdevdmsdbs01_8041 ... done
Waiting for /tmp/logs/hdfs/logs/application_1394755763143_0020/bpdevdmsdbs02_8041 ... done
Waiting for /tmp/logs/oracle/logs/application_1394755763143_0021/bpdevdmsdbs01_8041 ... done
Waiting for /tmp/logs/oracle/logs/application_1394755763143_0021/bpdevdmsdbs02_8041 ... done
Waiting for /tmp/logs/oracle/logs/application_1394755763143_0022/bpdevdmsdbs01_8041 ... done
Waiting for /tmp/logs/oracle/logs/application_1394755763143_0022/bpdevdmsdbs02_8041 ... done
Waiting for /tmp/mapred/system/jobtracker.info ... done
Waiting for /user/d284671/.staging/job_1391480945032_0001/job.split ..............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
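
As the follow-up in this thread shows, the -w flag is what blocks here: it makes setrep wait until every file actually reaches the target replication, and a file that is still open for write (such as the stale .staging job.split above) can hold it up indefinitely. A less blocking approach, sketched against the same paths:

sudo -u hdfs hadoop fs -setrep -R 2 /        # set the factor without waiting
sudo -u hdfs hadoop fsck / -openforwrite     # list files still open for write that could block -w
sudo -u hdfs hadoop fsck /user/d284671/.staging -files -blocks -locations   # inspect the stuck file's blocks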


Thanks and Regards,
Truong Phan


P    + 61 2 8576 5771
M   + 61 4 1463 7424
E    troung.phan@team.telstra.com
W  www.telstra.com



From: Brahma Reddy Battula [mailto:brahmareddy.battula@huawei.com]
Sent: Thursday, 20 March 2014 4:18 PM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node


The following node is down. Please have a look at the datanode logs and try to bring it back up before taking further action (like decreasing the replication factor):


Dead datanodes:
Name: 172.18.126.99:50010 (nsda3dmsrpt02.internal.bigpond.com)





________________________________
From: Phan, Truong Q [Troung.Phan@team.telstra.com]
Sent: Thursday, March 20, 2014 10:39 AM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node
Hi Battula,

I hope Battula is your first name. :P
Here is the output of your suggested commands:

[root@nsda3dmsrpt02] /usr/lib/hadoop-0.20-mapreduce# sudo -u hdfs hadoop fsck /
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Connecting to namenode via http://nsda3dmsrpt02.internal.bigpond.com:50070
FSCK started by hdfs (auth:SIMPLE) from /172.18.126.99 for path / at Thu Mar 20 16:04:35 EST 2014

Status: CORRUPT
Total size:    7325542923 B
Total dirs:    138
Total files:   383
Total symlinks:                0 (Files currently being written: 2)
Total blocks (validated):      424 (avg. block size 17277223 B)
  ********************************
  CORRUPT FILES:        3
  MISSING BLOCKS:       3
  MISSING SIZE:         791 B
  CORRUPT BLOCKS:       3
  ********************************
Minimally replicated blocks:   421 (99.29245 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       417 (98.34906 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     1.976415
Corrupt blocks:                3
Missing replicas:              417 (33.147854 %)
Number of data-nodes:          2
Number of racks:               1
FSCK ended at Thu Mar 20 16:04:35 EST 2014 in 105 milliseconds


The filesystem under path '/' is CORRUPT
[root@nsda3dmsrpt02] /usr/lib/hadoop-0.20-mapreduce# sudo -u hdfs hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Configured Capacity: 1100387597518 (1.00 TB)
Present Capacity: 727189155840 (677.25 GB)
DFS Remaining: 712401227776 (663.48 GB)
DFS Used: 14787928064 (13.77 GB)
DFS Used%: 2.03%
Under replicated blocks: 420
Blocks with corrupt replicas: 0
Missing blocks: 3

-------------------------------------------------
Datanodes available: 2 (3 total, 1 dead)

Live datanodes:
Name: 172.18.127.248:50010 (bpdevdmsdbs02)
Hostname: bpdevdmsdbs02
Rack: /default
Decommission Status : Normal
Configured Capacity: 550193798759 (512.41 GB)
DFS Used: 7394033664 (6.89 GB)
Non DFS Used: 169131224679 (157.52 GB)
DFS Remaining: 373668540416 (348.01 GB)
DFS Used%: 1.34%
DFS Remaining%: 67.92%
Last contact: Thu Mar 20 16:05:44 EST 2014


Name: 172.18.127.245:50010 (bpdevdmsdbs01)
Hostname: bpdevdmsdbs01
Rack: /default
Decommission Status : Normal
Configured Capacity: 550193798759 (512.41 GB)
DFS Used: 7393894400 (6.89 GB)
Non DFS Used: 204067216999 (190.05 GB)
DFS Remaining: 338732687360 (315.47 GB)
DFS Used%: 1.34%
DFS Remaining%: 61.57%
Last contact: Thu Mar 20 16:05:44 EST 2014


Dead datanodes:
Name: 172.18.126.99:50010 (nsda3dmsrpt02.internal.bigpond.com)
Hostname: nsda3dmsrpt02.internal.bigpond.com
Rack: /default
Decommission Status : Normal
Configured Capacity: 0 (0 B)
DFS Used: 0 (0 B)
Non DFS Used: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used%: 100.00%
DFS Remaining%: 0.00%
Last contact: Wed Mar 19 11:44:44 EST 2014




Thanks and Regards,
Truong Phan


P    + 61 2 8576 5771
M   + 61 4 1463 7424
E    troung.phan@team.telstra.com
W  www.telstra.com



From: Brahma Reddy Battula [mailto:brahmareddy.battula@huawei.com]
Sent: Thursday, 20 March 2014 3:27 PM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node


Please check my inline comments below, marked with >>>.



________________________________
From: Phan, Truong Q [Troung.Phan@team.telstra.com]
Sent: Thursday, March 20, 2014 8:04 AM
To: user@hadoop.apache.org
Subject: how to free up space of the old Data Node
Hi

I have a 3-node Hadoop cluster in which I created 3 Data Nodes.
However, I don't have enough space on one of the nodes to cater for other projects' logs. So I decommissioned this node from the Data Node list, but I could not reclaim the space from it.
>>> Is your replication factor 3? If it is 3 and you have 3 datanodes, the disk space occupied by all nodes should ideally be the same (47G should be present on all the DNs).
>>> And if RF=3, decommissioning will not succeed while you have only 3 DNs; you need to add another DN to the cluster before the decommission can complete.
Hence, please mention the replication factor of the files.

Is there a way to get this node to release space?
>>> There are ways, but you need to explain why only this node's disk is full and not the others'. Is it because this node has less space than the other nodes?
>>> If RF=3, then set RF=2 (decrease the replication factor), and then decommission this node.
[root@nsda3dmsrpt02] /data/dfs/dn# du -sh /data/dfs/dn/*
47G     /data/dfs/dn/current
>>> Please also provide the output of the following:
      sudo -u hdfs hadoop fsck /
      sudo -u hdfs hadoop dfsadmin -report

$ sudo -u hdfs hadoop fsck /data
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Status: HEALTHY
Total size:    7186453688 B
Total dirs:    11
Total files:   62
Total symlinks:                0
Total blocks (validated):      105 (avg. block size 68442416 B)
Minimally replicated blocks:   105 (100.0 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       105 (100.0 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     2.0
Corrupt blocks:                0
Missing replicas:              105 (33.333332 %)
Number of data-nodes:          2
Number of racks:               1
FSCK ended at Thu Mar 20 13:30:03 EST 2014 in 22 milliseconds


The filesystem under path '/data' is HEALTHY

Thanks and Regards,
Truong Phan
Senior Technology Specialist
Database Engineering
Transport & Routing Engineering | Networks | Telstra Operations



P    + 61 2 8576 5771
M   + 61 4 1463 7424
E    troung.phan@team.telstra.com
W  www.telstra.com





RE: how to free up space of the old Data Node

Posted by "Phan, Truong Q" <Tr...@team.telstra.com>.
Hi Battula,

I have followed your advice to reduce the replication factor but the command is taking too long and got stuck at the line below.
I am trying to reduce the replicator factor on the '/' HDFS filesystem but it got stuck to this linke for more than 4hrs.
Is there any way to troubleshoot to see what it has been waiting or improve the command?

[root@nsda3dmsrpt02] ~# sudo -u hdfs hadoop fs -setrep -R -w 2 /
Replication 2 set: /data/MR/Outline_of_Sciences.txt
... CUT OFF LINES
Waiting for /tmp/logs/hdfs/logs/application_1394755763143_0019/bpdevdmsdbs02_8041 ... done
Waiting for /tmp/logs/hdfs/logs/application_1394755763143_0020/bpdevdmsdbs01_8041 ... done
Waiting for /tmp/logs/hdfs/logs/application_1394755763143_0020/bpdevdmsdbs02_8041 ... done
Waiting for /tmp/logs/oracle/logs/application_1394755763143_0021/bpdevdmsdbs01_8041 ... done
Waiting for /tmp/logs/oracle/logs/application_1394755763143_0021/bpdevdmsdbs02_8041 ... done
Waiting for /tmp/logs/oracle/logs/application_1394755763143_0022/bpdevdmsdbs01_8041 ... done
Waiting for /tmp/logs/oracle/logs/application_1394755763143_0022/bpdevdmsdbs02_8041 ... done
Waiting for /tmp/mapred/system/jobtracker.info ... done
Waiting for /user/d284671/.staging/job_1391480945032_0001/job.split ..............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................


Thanks and Regards,
Truong Phan


P    + 61 2 8576 5771
M   + 61 4 1463 7424
E    troung.phan@team.telstra.com
W  www.telstra.com



From: Brahma Reddy Battula [mailto:brahmareddy.battula@huawei.com]
Sent: Thursday, 20 March 2014 4:18 PM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node


Following Node is down..Please have look on datanode logs and try to make it up...Before going for further action..(like decreasing the replication factor..)


Dead datanodes:
Name: 172.18.126.99:50010 (nsda3dmsrpt02.internal.bigpond.com)





________________________________
From: Phan, Truong Q [Troung.Phan@team.telstra.com]
Sent: Thursday, March 20, 2014 10:39 AM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: RE: how to free up space of the old Data Node
Hi Battula,

I hope Battula is your first name. :P
Here are the output of your suggested commands:

[root@nsda3dmsrpt02] /usr/lib/hadoop-0.20-mapreduce# sudo -u hdfs hadoop fsck /
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Connecting to namenode via http://nsda3dmsrpt02.internal.bigpond.com:50070
FSCK started by hdfs (auth:SIMPLE) from /172.18.126.99 for path / at Thu Mar 20 16:04:35 EST 2014

Status: CORRUPT
Total size:    7325542923 B
Total dirs:    138
Total files:   383
Total symlinks:                0 (Files currently being written: 2)
Total blocks (validated):      424 (avg. block size 17277223 B)
  ********************************
  CORRUPT FILES:        3
  MISSING BLOCKS:       3
  MISSING SIZE:         791 B
  CORRUPT BLOCKS:       3
  ********************************
Minimally replicated blocks:   421 (99.29245 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       417 (98.34906 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     1.976415
Corrupt blocks:                3
Missing replicas:              417 (33.147854 %)
Number of data-nodes:          2
Number of racks:               1
FSCK ended at Thu Mar 20 16:04:35 EST 2014 in 105 milliseconds


The filesystem under path '/' is CORRUPT
[root@nsda3dmsrpt02] /usr/lib/hadoop-0.20-mapreduce# sudo -u hdfs hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Configured Capacity: 1100387597518 (1.00 TB)
Present Capacity: 727189155840 (677.25 GB)
DFS Remaining: 712401227776 (663.48 GB)
DFS Used: 14787928064 (13.77 GB)
DFS Used%: 2.03%
Under replicated blocks: 420
Blocks with corrupt replicas: 0
Missing blocks: 3

-------------------------------------------------
Datanodes available: 2 (3 total, 1 dead)

Live datanodes:
Name: 172.18.127.248:50010 (bpdevdmsdbs02)
Hostname: bpdevdmsdbs02
Rack: /default
Decommission Status : Normal
Configured Capacity: 550193798759 (512.41 GB)
DFS Used: 7394033664 (6.89 GB)
Non DFS Used: 169131224679 (157.52 GB)
DFS Remaining: 373668540416 (348.01 GB)
DFS Used%: 1.34%
DFS Remaining%: 67.92%
Last contact: Thu Mar 20 16:05:44 EST 2014


Name: 172.18.127.245:50010 (bpdevdmsdbs01)
Hostname: bpdevdmsdbs01
Rack: /default
Decommission Status : Normal
Configured Capacity: 550193798759 (512.41 GB)
DFS Used: 7393894400 (6.89 GB)
Non DFS Used: 204067216999 (190.05 GB)
DFS Remaining: 338732687360 (315.47 GB)
DFS Used%: 1.34%
DFS Remaining%: 61.57%
Last contact: Thu Mar 20 16:05:44 EST 2014


Dead datanodes:
Name: 172.18.126.99:50010 (nsda3dmsrpt02.internal.bigpond.com)
Hostname: nsda3dmsrpt02.internal.bigpond.com
Rack: /default
Decommission Status : Normal
Configured Capacity: 0 (0 B)
DFS Used: 0 (0 B)
Non DFS Used: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used%: 100.00%
DFS Remaining%: 0.00%
Last contact: Wed Mar 19 11:44:44 EST 2014




Thanks and Regards,
Truong Phan


P    + 61 2 8576 5771
M   + 61 4 1463 7424
E    troung.phan@team.telstra.com<ma...@team.telstra.com>
W  www.telstra.com<https://email-cn.huawei.com/owa/UrlBlockedError.aspx>



From: Brahma Reddy Battula [mailto:brahmareddy.battula@huawei.com]
Sent: Thursday, 20 March 2014 3:27 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: RE: how to free up space of the old Data Node


Please check my inline comments which are in blue color...



________________________________
From: Phan, Truong Q [Troung.Phan@team.telstra.com]
Sent: Thursday, March 20, 2014 8:04 AM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: how to free up space of the old Data Node
Hi

I have 3 nodes Hadoop cluster in which I created 3 Data Nodes.
However, I don't have enough space in one of the node to cater other projects' log. So I decommissioned this node from a Data node list but I could not re-claimed the space from it.
>>> is your Replication is 3..? If it is 3 and as you have 3 datanodes,ideally disk space occupied by all nodes should be same(47G, should be present in all the DN's)..
>>>And if you RF=3,Decommission will not be success as you've only 3 DN's..you need to add another DN to cluster,,then only decommission will be success..
Hence please mention the replication factor of the file..

Is there a way to get this node to release space?
>>> ways are there,,but you need to mention, why only this node disk is full..why not other nodes..?  is it because,this node is having less space compared to other nodes
>>> If RF=3, then make RF=2(decrease the replication factor)..then do decommission of this node
[root@nsda3dmsrpt02] /data/dfs/dn# du -sh /data/dfs/dn/*
47G     /data/dfs/dn/current
>>> try to give the following output also
      sudo -u hdfs hadoop fsck /
     sudo -u hdfs hadoop dfsadmin -report

$ sudo -u hdfs hadoop fsck /data
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Status: HEALTHY
Total size:    7186453688 B
Total dirs:    11
Total files:   62
Total symlinks:                0
Total blocks (validated):      105 (avg. block size 68442416 B)
Minimally replicated blocks:   105 (100.0 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       105 (100.0 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     2.0
Corrupt blocks:                0
Missing replicas:              105 (33.333332 %)
Number of data-nodes:          2
Number of racks:               1
FSCK ended at Thu Mar 20 13:30:03 EST 2014 in 22 milliseconds


The filesystem under path '/data' is HEALTHY

Thanks and Regards,
Truong Phan
Senior Technology Specialist
Database Engineering
Transport & Routing Engineering | Networks | Telstra Operations

[cid:image001.gif@01CF450E.2DE05910]


P    + 61 2 8576 5771
M   + 61 4 1463 7424
E    troung.phan@team.telstra.com<ma...@team.telstra.com>
W  www.telstra.com<https://email-cn.huawei.com/owa/UrlBlockedError.aspx>

Love the movies? Telstra takes you there with $10 movie tickets, just to say thanks. Available now at telstra.com/movies<http://www.telstra.com/movies>

This communication may contain confidential or copyright information of Telstra Corporation Limited (ABN 33 051 775 556). If you are not an intended recipient, you must not keep, forward, copy, use, save or rely on this communication, and any such action is unauthorised and prohibited. If you have received this communication in error, please reply to this email to notify the sender of its incorrect delivery, and then delete both it and your reply.




RE: how to free up space of the old Data Node

Posted by "Phan, Truong Q" <Tr...@team.telstra.com>.
Hi Battula,

I have followed your advice to reduce the replication factor but the command is taking too long and got stuck at the line below.
I am trying to reduce the replicator factor on the '/' HDFS filesystem but it got stuck to this linke for more than 4hrs.
Is there any way to troubleshoot to see what it has been waiting or improve the command?

[root@nsda3dmsrpt02] ~# sudo -u hdfs hadoop fs -setrep -R -w 2 /
Replication 2 set: /data/MR/Outline_of_Sciences.txt
... CUT OFF LINES
Waiting for /tmp/logs/hdfs/logs/application_1394755763143_0019/bpdevdmsdbs02_8041 ... done
Waiting for /tmp/logs/hdfs/logs/application_1394755763143_0020/bpdevdmsdbs01_8041 ... done
Waiting for /tmp/logs/hdfs/logs/application_1394755763143_0020/bpdevdmsdbs02_8041 ... done
Waiting for /tmp/logs/oracle/logs/application_1394755763143_0021/bpdevdmsdbs01_8041 ... done
Waiting for /tmp/logs/oracle/logs/application_1394755763143_0021/bpdevdmsdbs02_8041 ... done
Waiting for /tmp/logs/oracle/logs/application_1394755763143_0022/bpdevdmsdbs01_8041 ... done
Waiting for /tmp/logs/oracle/logs/application_1394755763143_0022/bpdevdmsdbs02_8041 ... done
Waiting for /tmp/mapred/system/jobtracker.info ... done
Waiting for /user/d284671/.staging/job_1391480945032_0001/job.split ..............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................


Thanks and Regards,
Truong Phan


P    + 61 2 8576 5771
M   + 61 4 1463 7424
E    troung.phan@team.telstra.com
W  www.telstra.com



From: Brahma Reddy Battula [mailto:brahmareddy.battula@huawei.com]
Sent: Thursday, 20 March 2014 4:18 PM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node


Following Node is down..Please have look on datanode logs and try to make it up...Before going for further action..(like decreasing the replication factor..)


Dead datanodes:
Name: 172.18.126.99:50010 (nsda3dmsrpt02.internal.bigpond.com)





________________________________
From: Phan, Truong Q [Troung.Phan@team.telstra.com]
Sent: Thursday, March 20, 2014 10:39 AM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: RE: how to free up space of the old Data Node
Hi Battula,

I hope Battula is your first name. :P
Here are the output of your suggested commands:

[root@nsda3dmsrpt02] /usr/lib/hadoop-0.20-mapreduce# sudo -u hdfs hadoop fsck /
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Connecting to namenode via http://nsda3dmsrpt02.internal.bigpond.com:50070
FSCK started by hdfs (auth:SIMPLE) from /172.18.126.99 for path / at Thu Mar 20 16:04:35 EST 2014

Status: CORRUPT
Total size:    7325542923 B
Total dirs:    138
Total files:   383
Total symlinks:                0 (Files currently being written: 2)
Total blocks (validated):      424 (avg. block size 17277223 B)
  ********************************
  CORRUPT FILES:        3
  MISSING BLOCKS:       3
  MISSING SIZE:         791 B
  CORRUPT BLOCKS:       3
  ********************************
Minimally replicated blocks:   421 (99.29245 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       417 (98.34906 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     1.976415
Corrupt blocks:                3
Missing replicas:              417 (33.147854 %)
Number of data-nodes:          2
Number of racks:               1
FSCK ended at Thu Mar 20 16:04:35 EST 2014 in 105 milliseconds


The filesystem under path '/' is CORRUPT
[root@nsda3dmsrpt02] /usr/lib/hadoop-0.20-mapreduce# sudo -u hdfs hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Configured Capacity: 1100387597518 (1.00 TB)
Present Capacity: 727189155840 (677.25 GB)
DFS Remaining: 712401227776 (663.48 GB)
DFS Used: 14787928064 (13.77 GB)
DFS Used%: 2.03%
Under replicated blocks: 420
Blocks with corrupt replicas: 0
Missing blocks: 3

-------------------------------------------------
Datanodes available: 2 (3 total, 1 dead)

Live datanodes:
Name: 172.18.127.248:50010 (bpdevdmsdbs02)
Hostname: bpdevdmsdbs02
Rack: /default
Decommission Status : Normal
Configured Capacity: 550193798759 (512.41 GB)
DFS Used: 7394033664 (6.89 GB)
Non DFS Used: 169131224679 (157.52 GB)
DFS Remaining: 373668540416 (348.01 GB)
DFS Used%: 1.34%
DFS Remaining%: 67.92%
Last contact: Thu Mar 20 16:05:44 EST 2014


Name: 172.18.127.245:50010 (bpdevdmsdbs01)
Hostname: bpdevdmsdbs01
Rack: /default
Decommission Status : Normal
Configured Capacity: 550193798759 (512.41 GB)
DFS Used: 7393894400 (6.89 GB)
Non DFS Used: 204067216999 (190.05 GB)
DFS Remaining: 338732687360 (315.47 GB)
DFS Used%: 1.34%
DFS Remaining%: 61.57%
Last contact: Thu Mar 20 16:05:44 EST 2014


Dead datanodes:
Name: 172.18.126.99:50010 (nsda3dmsrpt02.internal.bigpond.com)
Hostname: nsda3dmsrpt02.internal.bigpond.com
Rack: /default
Decommission Status : Normal
Configured Capacity: 0 (0 B)
DFS Used: 0 (0 B)
Non DFS Used: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used%: 100.00%
DFS Remaining%: 0.00%
Last contact: Wed Mar 19 11:44:44 EST 2014




Thanks and Regards,
Truong Phan


P    + 61 2 8576 5771
M   + 61 4 1463 7424
E    troung.phan@team.telstra.com
W  www.telstra.com
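
On the CORRUPT status in the fsck output above: the three files with missing blocks can be listed and then either moved aside or deleted, after which fsck should report HEALTHY again. A minimal sketch; which option to use depends on whether the data in those files is expendable:

    # List the files whose blocks are corrupt or missing:
    sudo -u hdfs hdfs fsck / -list-corruptfileblocks

    # Either move the affected files to /lost+found for later inspection...
    sudo -u hdfs hdfs fsck / -move

    # ...or, if their contents are expendable, delete them outright:
    sudo -u hdfs hdfs fsck / -delete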



From: Brahma Reddy Battula [mailto:brahmareddy.battula@huawei.com]
Sent: Thursday, 20 March 2014 3:27 PM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node


Please check my inline comments (marked with >>> below)...



________________________________
From: Phan, Truong Q [Troung.Phan@team.telstra.com]
Sent: Thursday, March 20, 2014 8:04 AM
To: user@hadoop.apache.org
Subject: how to free up space of the old Data Node
Hi

I have a 3-node Hadoop cluster in which I created 3 Data Nodes.
However, I don't have enough space on one of the nodes to cater for other projects' logs, so I decommissioned this node from the Data Node list, but I could not reclaim the space from it.
>>> Is your replication factor 3? If it is 3, then since you have 3 datanodes, the disk space occupied should ideally be the same on all nodes (the 47G should be present on all the DNs).
>>> And if RF=3, decommissioning will not succeed while you have only 3 DNs; you would need to add another DN to the cluster, and only then would the decommission complete.
Hence, please mention the replication factor of the files.

Is there a way to get this node to release space?
>>> There are ways, but you need to mention why only this node's disk is full and not the others'. Is it because this node has less space than the other nodes?
>>> If RF=3, then make RF=2 (decrease the replication factor) and then decommission this node; a sketch of the commands follows this message.
[root@nsda3dmsrpt02] /data/dfs/dn# du -sh /data/dfs/dn/*
47G     /data/dfs/dn/current
>>> Please also provide the output of the following commands:
      sudo -u hdfs hadoop fsck /
      sudo -u hdfs hadoop dfsadmin -report

$ sudo -u hdfs hadoop fsck /data
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Status: HEALTHY
Total size:    7186453688 B
Total dirs:    11
Total files:   62
Total symlinks:                0
Total blocks (validated):      105 (avg. block size 68442416 B)
Minimally replicated blocks:   105 (100.0 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       105 (100.0 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     2.0
Corrupt blocks:                0
Missing replicas:              105 (33.333332 %)
Number of data-nodes:          2
Number of racks:               1
FSCK ended at Thu Mar 20 13:30:03 EST 2014 in 22 milliseconds


The filesystem under path '/data' is HEALTHY

Thanks and Regards,
Truong Phan
Senior Technology Specialist
Database Engineering
Transport & Routing Engineering | Networks | Telstra Operations



P    + 61 2 8576 5771
M   + 61 4 1463 7424
E    troung.phan@team.telstra.com
W  www.telstra.com
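
On reducing the replication factor before decommissioning: note that dfs.replication in hdfs-site.xml only applies to files created afterwards, so existing files need an explicit setrep as well. A minimal sketch (the -w flag, which waits until the target replication is actually reached, can be omitted and progress checked separately). dfs.datanode.handler.count, mentioned earlier in the thread, is unrelated: it sets the number of DataNode server threads.

    # Existing files: set replication to 2 recursively across the filesystem:
    sudo -u hdfs hdfs dfs -setrep -R 2 /

    # New files: set dfs.replication to 2 in hdfs-site.xml, i.e.
    #   <property><name>dfs.replication</name><value>2</value></property>

    # Check progress; excess replicas are deleted asynchronously:
    sudo -u hdfs hdfs fsck / | grep -i replicat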





RE: how to free up space of the old Data Node

Posted by "Phan, Truong Q" <Tr...@team.telstra.com>.
Hi Battula,

I have followed your advice to reduce the replication factor, but the command is taking too long and got stuck at the line below.
I am trying to reduce the replication factor on the '/' HDFS filesystem, and it has been stuck on this line for more than 4 hours.
Is there any way to troubleshoot what it is waiting on, or to speed the command up?

[root@nsda3dmsrpt02] ~# sudo -u hdfs hadoop fs -setrep -R -w 2 /
Replication 2 set: /data/MR/Outline_of_Sciences.txt
... CUT OFF LINES
Waiting for /tmp/logs/hdfs/logs/application_1394755763143_0019/bpdevdmsdbs02_8041 ... done
Waiting for /tmp/logs/hdfs/logs/application_1394755763143_0020/bpdevdmsdbs01_8041 ... done
Waiting for /tmp/logs/hdfs/logs/application_1394755763143_0020/bpdevdmsdbs02_8041 ... done
Waiting for /tmp/logs/oracle/logs/application_1394755763143_0021/bpdevdmsdbs01_8041 ... done
Waiting for /tmp/logs/oracle/logs/application_1394755763143_0021/bpdevdmsdbs02_8041 ... done
Waiting for /tmp/logs/oracle/logs/application_1394755763143_0022/bpdevdmsdbs01_8041 ... done
Waiting for /tmp/logs/oracle/logs/application_1394755763143_0022/bpdevdmsdbs02_8041 ... done
Waiting for /tmp/mapred/system/jobtracker.info ... done
Waiting for /user/d284671/.staging/job_1391480945032_0001/job.split ..............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................


Thanks and Regards,
Truong Phan


P    + 61 2 8576 5771
M   + 61 4 1463 7424
E    troung.phan@team.telstra.com
W  www.telstra.com
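
The trailing dots in the output above are -w polling: setrep -w blocks until every file reaches the target replication, and it can wait indefinitely on blocks it cannot bring to the target, for example files still open for write (the fsck output above shows 2 files currently being written) or blocks the NameNode cannot re-replicate while a datanode is dead. A workable sketch is to rerun without -w and watch the block counters instead:

    # Interrupt the waiting command (Ctrl-C), then set the replication
    # without waiting for it to converge:
    sudo -u hdfs hdfs dfs -setrep -R 2 /

    # Watch the under-replicated count fall to zero over time:
    sudo -u hdfs hdfs dfsadmin -report | grep -i 'under replicated'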



From: Brahma Reddy Battula [mailto:brahmareddy.battula@huawei.com]
Sent: Thursday, 20 March 2014 4:18 PM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node


Following Node is down..Please have look on datanode logs and try to make it up...Before going for further action..(like decreasing the replication factor..)


Dead datanodes:
Name: 172.18.126.99:50010 (nsda3dmsrpt02.internal.bigpond.com)





________________________________
From: Phan, Truong Q [Troung.Phan@team.telstra.com]
Sent: Thursday, March 20, 2014 10:39 AM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: RE: how to free up space of the old Data Node
Hi Battula,

I hope Battula is your first name. :P
Here are the output of your suggested commands:

[root@nsda3dmsrpt02] /usr/lib/hadoop-0.20-mapreduce# sudo -u hdfs hadoop fsck /
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Connecting to namenode via http://nsda3dmsrpt02.internal.bigpond.com:50070
FSCK started by hdfs (auth:SIMPLE) from /172.18.126.99 for path / at Thu Mar 20 16:04:35 EST 2014

Status: CORRUPT
Total size:    7325542923 B
Total dirs:    138
Total files:   383
Total symlinks:                0 (Files currently being written: 2)
Total blocks (validated):      424 (avg. block size 17277223 B)
  ********************************
  CORRUPT FILES:        3
  MISSING BLOCKS:       3
  MISSING SIZE:         791 B
  CORRUPT BLOCKS:       3
  ********************************
Minimally replicated blocks:   421 (99.29245 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       417 (98.34906 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     1.976415
Corrupt blocks:                3
Missing replicas:              417 (33.147854 %)
Number of data-nodes:          2
Number of racks:               1
FSCK ended at Thu Mar 20 16:04:35 EST 2014 in 105 milliseconds


The filesystem under path '/' is CORRUPT
[root@nsda3dmsrpt02] /usr/lib/hadoop-0.20-mapreduce# sudo -u hdfs hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Configured Capacity: 1100387597518 (1.00 TB)
Present Capacity: 727189155840 (677.25 GB)
DFS Remaining: 712401227776 (663.48 GB)
DFS Used: 14787928064 (13.77 GB)
DFS Used%: 2.03%
Under replicated blocks: 420
Blocks with corrupt replicas: 0
Missing blocks: 3

-------------------------------------------------
Datanodes available: 2 (3 total, 1 dead)

Live datanodes:
Name: 172.18.127.248:50010 (bpdevdmsdbs02)
Hostname: bpdevdmsdbs02
Rack: /default
Decommission Status : Normal
Configured Capacity: 550193798759 (512.41 GB)
DFS Used: 7394033664 (6.89 GB)
Non DFS Used: 169131224679 (157.52 GB)
DFS Remaining: 373668540416 (348.01 GB)
DFS Used%: 1.34%
DFS Remaining%: 67.92%
Last contact: Thu Mar 20 16:05:44 EST 2014


Name: 172.18.127.245:50010 (bpdevdmsdbs01)
Hostname: bpdevdmsdbs01
Rack: /default
Decommission Status : Normal
Configured Capacity: 550193798759 (512.41 GB)
DFS Used: 7393894400 (6.89 GB)
Non DFS Used: 204067216999 (190.05 GB)
DFS Remaining: 338732687360 (315.47 GB)
DFS Used%: 1.34%
DFS Remaining%: 61.57%
Last contact: Thu Mar 20 16:05:44 EST 2014


Dead datanodes:
Name: 172.18.126.99:50010 (nsda3dmsrpt02.internal.bigpond.com)
Hostname: nsda3dmsrpt02.internal.bigpond.com
Rack: /default
Decommission Status : Normal
Configured Capacity: 0 (0 B)
DFS Used: 0 (0 B)
Non DFS Used: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used%: 100.00%
DFS Remaining%: 0.00%
Last contact: Wed Mar 19 11:44:44 EST 2014




Thanks and Regards,
Truong Phan


P    + 61 2 8576 5771
M   + 61 4 1463 7424
E    troung.phan@team.telstra.com<ma...@team.telstra.com>
W  www.telstra.com<https://email-cn.huawei.com/owa/UrlBlockedError.aspx>



From: Brahma Reddy Battula [mailto:brahmareddy.battula@huawei.com]
Sent: Thursday, 20 March 2014 3:27 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: RE: how to free up space of the old Data Node


Please check my inline comments which are in blue color...



________________________________
From: Phan, Truong Q [Troung.Phan@team.telstra.com]
Sent: Thursday, March 20, 2014 8:04 AM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: how to free up space of the old Data Node
Hi

I have 3 nodes Hadoop cluster in which I created 3 Data Nodes.
However, I don't have enough space in one of the node to cater other projects' log. So I decommissioned this node from a Data node list but I could not re-claimed the space from it.
>>> is your Replication is 3..? If it is 3 and as you have 3 datanodes,ideally disk space occupied by all nodes should be same(47G, should be present in all the DN's)..
>>>And if you RF=3,Decommission will not be success as you've only 3 DN's..you need to add another DN to cluster,,then only decommission will be success..
Hence please mention the replication factor of the file..

Is there a way to get this node to release space?
>>> ways are there,,but you need to mention, why only this node disk is full..why not other nodes..?  is it because,this node is having less space compared to other nodes
>>> If RF=3, then make RF=2(decrease the replication factor)..then do decommission of this node
[root@nsda3dmsrpt02] /data/dfs/dn# du -sh /data/dfs/dn/*
47G     /data/dfs/dn/current
>>> try to give the following output also
      sudo -u hdfs hadoop fsck /
     sudo -u hdfs hadoop dfsadmin -report

$ sudo -u hdfs hadoop fsck /data
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Status: HEALTHY
Total size:    7186453688 B
Total dirs:    11
Total files:   62
Total symlinks:                0
Total blocks (validated):      105 (avg. block size 68442416 B)
Minimally replicated blocks:   105 (100.0 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       105 (100.0 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     2.0
Corrupt blocks:                0
Missing replicas:              105 (33.333332 %)
Number of data-nodes:          2
Number of racks:               1
FSCK ended at Thu Mar 20 13:30:03 EST 2014 in 22 milliseconds


The filesystem under path '/data' is HEALTHY

Thanks and Regards,
Truong Phan
Senior Technology Specialist
Database Engineering
Transport & Routing Engineering | Networks | Telstra Operations

[cid:image001.gif@01CF450E.2DE05910]


P    + 61 2 8576 5771
M   + 61 4 1463 7424
E    troung.phan@team.telstra.com<ma...@team.telstra.com>
W  www.telstra.com<https://email-cn.huawei.com/owa/UrlBlockedError.aspx>

Love the movies? Telstra takes you there with $10 movie tickets, just to say thanks. Available now at telstra.com/movies<http://www.telstra.com/movies>

This communication may contain confidential or copyright information of Telstra Corporation Limited (ABN 33 051 775 556). If you are not an intended recipient, you must not keep, forward, copy, use, save or rely on this communication, and any such action is unauthorised and prohibited. If you have received this communication in error, please reply to this email to notify the sender of its incorrect delivery, and then delete both it and your reply.




RE: how to free up space of the old Data Node

Posted by Brahma Reddy Battula <br...@huawei.com>.
Following Node is down..Please have look on datanode logs and try to make it up...Before going for further action..(like decreasing the replication factor..)


Dead datanodes:
Name: 172.18.126.99:50010 (nsda3dmsrpt02.internal.bigpond.com)





________________________________
From: Phan, Truong Q [Troung.Phan@team.telstra.com]
Sent: Thursday, March 20, 2014 10:39 AM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node

Hi Battula,

I hope Battula is your first name. :P
Here are the output of your suggested commands:

[root@nsda3dmsrpt02] /usr/lib/hadoop-0.20-mapreduce# sudo -u hdfs hadoop fsck /
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Connecting to namenode via http://nsda3dmsrpt02.internal.bigpond.com:50070
FSCK started by hdfs (auth:SIMPLE) from /172.18.126.99 for path / at Thu Mar 20 16:04:35 EST 2014

Status: CORRUPT
Total size:    7325542923 B
Total dirs:    138
Total files:   383
Total symlinks:                0 (Files currently being written: 2)
Total blocks (validated):      424 (avg. block size 17277223 B)
  ********************************
  CORRUPT FILES:        3
  MISSING BLOCKS:       3
  MISSING SIZE:         791 B
  CORRUPT BLOCKS:       3
  ********************************
Minimally replicated blocks:   421 (99.29245 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       417 (98.34906 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     1.976415
Corrupt blocks:                3
Missing replicas:              417 (33.147854 %)
Number of data-nodes:          2
Number of racks:               1
FSCK ended at Thu Mar 20 16:04:35 EST 2014 in 105 milliseconds


The filesystem under path '/' is CORRUPT
[root@nsda3dmsrpt02] /usr/lib/hadoop-0.20-mapreduce# sudo -u hdfs hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Configured Capacity: 1100387597518 (1.00 TB)
Present Capacity: 727189155840 (677.25 GB)
DFS Remaining: 712401227776 (663.48 GB)
DFS Used: 14787928064 (13.77 GB)
DFS Used%: 2.03%
Under replicated blocks: 420
Blocks with corrupt replicas: 0
Missing blocks: 3

-------------------------------------------------
Datanodes available: 2 (3 total, 1 dead)

Live datanodes:
Name: 172.18.127.248:50010 (bpdevdmsdbs02)
Hostname: bpdevdmsdbs02
Rack: /default
Decommission Status : Normal
Configured Capacity: 550193798759 (512.41 GB)
DFS Used: 7394033664 (6.89 GB)
Non DFS Used: 169131224679 (157.52 GB)
DFS Remaining: 373668540416 (348.01 GB)
DFS Used%: 1.34%
DFS Remaining%: 67.92%
Last contact: Thu Mar 20 16:05:44 EST 2014


Name: 172.18.127.245:50010 (bpdevdmsdbs01)
Hostname: bpdevdmsdbs01
Rack: /default
Decommission Status : Normal
Configured Capacity: 550193798759 (512.41 GB)
DFS Used: 7393894400 (6.89 GB)
Non DFS Used: 204067216999 (190.05 GB)
DFS Remaining: 338732687360 (315.47 GB)
DFS Used%: 1.34%
DFS Remaining%: 61.57%
Last contact: Thu Mar 20 16:05:44 EST 2014


Dead datanodes:
Name: 172.18.126.99:50010 (nsda3dmsrpt02.internal.bigpond.com)
Hostname: nsda3dmsrpt02.internal.bigpond.com
Rack: /default
Decommission Status : Normal
Configured Capacity: 0 (0 B)
DFS Used: 0 (0 B)
Non DFS Used: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used%: 100.00%
DFS Remaining%: 0.00%
Last contact: Wed Mar 19 11:44:44 EST 2014




Thanks and Regards,
Truong Phan


P    + 61 2 8576 5771
M   + 61 4 1463 7424
E    troung.phan@team.telstra.com
W  www.telstra.com<https://email-cn.huawei.com/owa/UrlBlockedError.aspx>



From: Brahma Reddy Battula [mailto:brahmareddy.battula@huawei.com]
Sent: Thursday, 20 March 2014 3:27 PM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node


Please check my inline comments which are in blue color...



________________________________
From: Phan, Truong Q [Troung.Phan@team.telstra.com]
Sent: Thursday, March 20, 2014 8:04 AM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: how to free up space of the old Data Node
Hi

I have 3 nodes Hadoop cluster in which I created 3 Data Nodes.
However, I don’t have enough space in one of the node to cater other projects’ log. So I decommissioned this node from a Data node list but I could not re-claimed the space from it.
>>> is your Replication is 3..? If it is 3 and as you have 3 datanodes,ideally disk space occupied by all nodes should be same(47G, should be present in all the DN's)..
>>>And if you RF=3,Decommission will not be success as you've only 3 DN's..you need to add another DN to cluster,,then only decommission will be success..
Hence please mention the replication factor of the file..

Is there a way to get this node to release space?
>>> ways are there,,but you need to mention, why only this node disk is full..why not other nodes..?  is it because,this node is having less space compared to other nodes
>>> If RF=3, then make RF=2(decrease the replication factor)..then do decommission of this node
[root@nsda3dmsrpt02] /data/dfs/dn# du -sh /data/dfs/dn/*
47G     /data/dfs/dn/current
>>> try to give the following output also
      sudo -u hdfs hadoop fsck /
     sudo -u hdfs hadoop dfsadmin -report

$ sudo -u hdfs hadoop fsck /data
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Status: HEALTHY
Total size:    7186453688 B
Total dirs:    11
Total files:   62
Total symlinks:                0
Total blocks (validated):      105 (avg. block size 68442416 B)
Minimally replicated blocks:   105 (100.0 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       105 (100.0 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     2.0
Corrupt blocks:                0
Missing replicas:              105 (33.333332 %)
Number of data-nodes:          2
Number of racks:               1
FSCK ended at Thu Mar 20 13:30:03 EST 2014 in 22 milliseconds


The filesystem under path '/data' is HEALTHY

Thanks and Regards,
Truong Phan
Senior Technology Specialist
Database Engineering
Transport & Routing Engineering | Networks | Telstra Operations

[Telstra]


P    + 61 2 8576 5771
M   + 61 4 1463 7424
E    troung.phan@team.telstra.com<ma...@team.telstra.com>
W  www.telstra.com<https://email-cn.huawei.com/owa/UrlBlockedError.aspx>

Love the movies? Telstra takes you there with $10 movie tickets, just to say thanks. Available now at telstra.com/movies<http://www.telstra.com/movies>

This communication may contain confidential or copyright information of Telstra Corporation Limited (ABN 33 051 775 556). If you are not an intended recipient, you must not keep, forward, copy, use, save or rely on this communication, and any such action is unauthorised and prohibited. If you have received this communication in error, please reply to this email to notify the sender of its incorrect delivery, and then delete both it and your reply.




RE: how to free up space of the old Data Node

Posted by Brahma Reddy Battula <br...@huawei.com>.
Following Node is down..Please have look on datanode logs and try to make it up...Before going for further action..(like decreasing the replication factor..)


Dead datanodes:
Name: 172.18.126.99:50010 (nsda3dmsrpt02.internal.bigpond.com)





________________________________
From: Phan, Truong Q [Troung.Phan@team.telstra.com]
Sent: Thursday, March 20, 2014 10:39 AM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node

Hi Battula,

I hope Battula is your first name. :P
Here are the output of your suggested commands:

[root@nsda3dmsrpt02] /usr/lib/hadoop-0.20-mapreduce# sudo -u hdfs hadoop fsck /
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Connecting to namenode via http://nsda3dmsrpt02.internal.bigpond.com:50070
FSCK started by hdfs (auth:SIMPLE) from /172.18.126.99 for path / at Thu Mar 20 16:04:35 EST 2014

Status: CORRUPT
Total size:    7325542923 B
Total dirs:    138
Total files:   383
Total symlinks:                0 (Files currently being written: 2)
Total blocks (validated):      424 (avg. block size 17277223 B)
  ********************************
  CORRUPT FILES:        3
  MISSING BLOCKS:       3
  MISSING SIZE:         791 B
  CORRUPT BLOCKS:       3
  ********************************
Minimally replicated blocks:   421 (99.29245 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       417 (98.34906 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     1.976415
Corrupt blocks:                3
Missing replicas:              417 (33.147854 %)
Number of data-nodes:          2
Number of racks:               1
FSCK ended at Thu Mar 20 16:04:35 EST 2014 in 105 milliseconds


The filesystem under path '/' is CORRUPT
[root@nsda3dmsrpt02] /usr/lib/hadoop-0.20-mapreduce# sudo -u hdfs hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Configured Capacity: 1100387597518 (1.00 TB)
Present Capacity: 727189155840 (677.25 GB)
DFS Remaining: 712401227776 (663.48 GB)
DFS Used: 14787928064 (13.77 GB)
DFS Used%: 2.03%
Under replicated blocks: 420
Blocks with corrupt replicas: 0
Missing blocks: 3

-------------------------------------------------
Datanodes available: 2 (3 total, 1 dead)

Live datanodes:
Name: 172.18.127.248:50010 (bpdevdmsdbs02)
Hostname: bpdevdmsdbs02
Rack: /default
Decommission Status : Normal
Configured Capacity: 550193798759 (512.41 GB)
DFS Used: 7394033664 (6.89 GB)
Non DFS Used: 169131224679 (157.52 GB)
DFS Remaining: 373668540416 (348.01 GB)
DFS Used%: 1.34%
DFS Remaining%: 67.92%
Last contact: Thu Mar 20 16:05:44 EST 2014


Name: 172.18.127.245:50010 (bpdevdmsdbs01)
Hostname: bpdevdmsdbs01
Rack: /default
Decommission Status : Normal
Configured Capacity: 550193798759 (512.41 GB)
DFS Used: 7393894400 (6.89 GB)
Non DFS Used: 204067216999 (190.05 GB)
DFS Remaining: 338732687360 (315.47 GB)
DFS Used%: 1.34%
DFS Remaining%: 61.57%
Last contact: Thu Mar 20 16:05:44 EST 2014


Dead datanodes:
Name: 172.18.126.99:50010 (nsda3dmsrpt02.internal.bigpond.com)
Hostname: nsda3dmsrpt02.internal.bigpond.com
Rack: /default
Decommission Status : Normal
Configured Capacity: 0 (0 B)
DFS Used: 0 (0 B)
Non DFS Used: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used%: 100.00%
DFS Remaining%: 0.00%
Last contact: Wed Mar 19 11:44:44 EST 2014




Thanks and Regards,
Truong Phan


P    + 61 2 8576 5771
M   + 61 4 1463 7424
E    troung.phan@team.telstra.com
W  www.telstra.com<https://email-cn.huawei.com/owa/UrlBlockedError.aspx>



From: Brahma Reddy Battula [mailto:brahmareddy.battula@huawei.com]
Sent: Thursday, 20 March 2014 3:27 PM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node


Please check my inline comments which are in blue color...



________________________________
From: Phan, Truong Q [Troung.Phan@team.telstra.com]
Sent: Thursday, March 20, 2014 8:04 AM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: how to free up space of the old Data Node
Hi

I have 3 nodes Hadoop cluster in which I created 3 Data Nodes.
However, I don’t have enough space in one of the node to cater other projects’ log. So I decommissioned this node from a Data node list but I could not re-claimed the space from it.
>>> is your Replication is 3..? If it is 3 and as you have 3 datanodes,ideally disk space occupied by all nodes should be same(47G, should be present in all the DN's)..
>>>And if you RF=3,Decommission will not be success as you've only 3 DN's..you need to add another DN to cluster,,then only decommission will be success..
Hence please mention the replication factor of the file..

Is there a way to get this node to release space?
>>> ways are there,,but you need to mention, why only this node disk is full..why not other nodes..?  is it because,this node is having less space compared to other nodes
>>> If RF=3, then make RF=2(decrease the replication factor)..then do decommission of this node
[root@nsda3dmsrpt02] /data/dfs/dn# du -sh /data/dfs/dn/*
47G     /data/dfs/dn/current
>>> try to give the following output also
      sudo -u hdfs hadoop fsck /
     sudo -u hdfs hadoop dfsadmin -report

$ sudo -u hdfs hadoop fsck /data
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Status: HEALTHY
Total size:    7186453688 B
Total dirs:    11
Total files:   62
Total symlinks:                0
Total blocks (validated):      105 (avg. block size 68442416 B)
Minimally replicated blocks:   105 (100.0 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       105 (100.0 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     2.0
Corrupt blocks:                0
Missing replicas:              105 (33.333332 %)
Number of data-nodes:          2
Number of racks:               1
FSCK ended at Thu Mar 20 13:30:03 EST 2014 in 22 milliseconds


The filesystem under path '/data' is HEALTHY

Thanks and Regards,
Truong Phan
Senior Technology Specialist
Database Engineering
Transport & Routing Engineering | Networks | Telstra Operations

[Telstra]


P    + 61 2 8576 5771
M   + 61 4 1463 7424
E    troung.phan@team.telstra.com<ma...@team.telstra.com>
W  www.telstra.com<https://email-cn.huawei.com/owa/UrlBlockedError.aspx>

Love the movies? Telstra takes you there with $10 movie tickets, just to say thanks. Available now at telstra.com/movies<http://www.telstra.com/movies>

This communication may contain confidential or copyright information of Telstra Corporation Limited (ABN 33 051 775 556). If you are not an intended recipient, you must not keep, forward, copy, use, save or rely on this communication, and any such action is unauthorised and prohibited. If you have received this communication in error, please reply to this email to notify the sender of its incorrect delivery, and then delete both it and your reply.




RE: how to free up space of the old Data Node

Posted by Brahma Reddy Battula <br...@huawei.com>.
Following Node is down..Please have look on datanode logs and try to make it up...Before going for further action..(like decreasing the replication factor..)


Dead datanodes:
Name: 172.18.126.99:50010 (nsda3dmsrpt02.internal.bigpond.com)





________________________________
From: Phan, Truong Q [Troung.Phan@team.telstra.com]
Sent: Thursday, March 20, 2014 10:39 AM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node

Hi Battula,

I hope Battula is your first name. :P
Here are the output of your suggested commands:

[root@nsda3dmsrpt02] /usr/lib/hadoop-0.20-mapreduce# sudo -u hdfs hadoop fsck /
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Connecting to namenode via http://nsda3dmsrpt02.internal.bigpond.com:50070
FSCK started by hdfs (auth:SIMPLE) from /172.18.126.99 for path / at Thu Mar 20 16:04:35 EST 2014

Status: CORRUPT
Total size:    7325542923 B
Total dirs:    138
Total files:   383
Total symlinks:                0 (Files currently being written: 2)
Total blocks (validated):      424 (avg. block size 17277223 B)
  ********************************
  CORRUPT FILES:        3
  MISSING BLOCKS:       3
  MISSING SIZE:         791 B
  CORRUPT BLOCKS:       3
  ********************************
Minimally replicated blocks:   421 (99.29245 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       417 (98.34906 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     1.976415
Corrupt blocks:                3
Missing replicas:              417 (33.147854 %)
Number of data-nodes:          2
Number of racks:               1
FSCK ended at Thu Mar 20 16:04:35 EST 2014 in 105 milliseconds


The filesystem under path '/' is CORRUPT
[root@nsda3dmsrpt02] /usr/lib/hadoop-0.20-mapreduce# sudo -u hdfs hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Configured Capacity: 1100387597518 (1.00 TB)
Present Capacity: 727189155840 (677.25 GB)
DFS Remaining: 712401227776 (663.48 GB)
DFS Used: 14787928064 (13.77 GB)
DFS Used%: 2.03%
Under replicated blocks: 420
Blocks with corrupt replicas: 0
Missing blocks: 3

-------------------------------------------------
Datanodes available: 2 (3 total, 1 dead)

Live datanodes:
Name: 172.18.127.248:50010 (bpdevdmsdbs02)
Hostname: bpdevdmsdbs02
Rack: /default
Decommission Status : Normal
Configured Capacity: 550193798759 (512.41 GB)
DFS Used: 7394033664 (6.89 GB)
Non DFS Used: 169131224679 (157.52 GB)
DFS Remaining: 373668540416 (348.01 GB)
DFS Used%: 1.34%
DFS Remaining%: 67.92%
Last contact: Thu Mar 20 16:05:44 EST 2014


Name: 172.18.127.245:50010 (bpdevdmsdbs01)
Hostname: bpdevdmsdbs01
Rack: /default
Decommission Status : Normal
Configured Capacity: 550193798759 (512.41 GB)
DFS Used: 7393894400 (6.89 GB)
Non DFS Used: 204067216999 (190.05 GB)
DFS Remaining: 338732687360 (315.47 GB)
DFS Used%: 1.34%
DFS Remaining%: 61.57%
Last contact: Thu Mar 20 16:05:44 EST 2014


Dead datanodes:
Name: 172.18.126.99:50010 (nsda3dmsrpt02.internal.bigpond.com)
Hostname: nsda3dmsrpt02.internal.bigpond.com
Rack: /default
Decommission Status : Normal
Configured Capacity: 0 (0 B)
DFS Used: 0 (0 B)
Non DFS Used: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used%: 100.00%
DFS Remaining%: 0.00%
Last contact: Wed Mar 19 11:44:44 EST 2014




Thanks and Regards,
Truong Phan


P    + 61 2 8576 5771
M   + 61 4 1463 7424
E    troung.phan@team.telstra.com
W  www.telstra.com<https://email-cn.huawei.com/owa/UrlBlockedError.aspx>



From: Brahma Reddy Battula [mailto:brahmareddy.battula@huawei.com]
Sent: Thursday, 20 March 2014 3:27 PM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node


Please check my inline comments which are in blue color...



________________________________
From: Phan, Truong Q [Troung.Phan@team.telstra.com]
Sent: Thursday, March 20, 2014 8:04 AM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: how to free up space of the old Data Node
Hi

I have 3 nodes Hadoop cluster in which I created 3 Data Nodes.
However, I don’t have enough space in one of the node to cater other projects’ log. So I decommissioned this node from a Data node list but I could not re-claimed the space from it.
>>> is your Replication is 3..? If it is 3 and as you have 3 datanodes,ideally disk space occupied by all nodes should be same(47G, should be present in all the DN's)..
>>>And if you RF=3,Decommission will not be success as you've only 3 DN's..you need to add another DN to cluster,,then only decommission will be success..
Hence please mention the replication factor of the file..

Is there a way to get this node to release space?
>>> ways are there,,but you need to mention, why only this node disk is full..why not other nodes..?  is it because,this node is having less space compared to other nodes
>>> If RF=3, then make RF=2(decrease the replication factor)..then do decommission of this node
[root@nsda3dmsrpt02] /data/dfs/dn# du -sh /data/dfs/dn/*
47G     /data/dfs/dn/current
>>> try to give the following output also
      sudo -u hdfs hadoop fsck /
     sudo -u hdfs hadoop dfsadmin -report

$ sudo -u hdfs hadoop fsck /data
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Status: HEALTHY
Total size:    7186453688 B
Total dirs:    11
Total files:   62
Total symlinks:                0
Total blocks (validated):      105 (avg. block size 68442416 B)
Minimally replicated blocks:   105 (100.0 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       105 (100.0 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     2.0
Corrupt blocks:                0
Missing replicas:              105 (33.333332 %)
Number of data-nodes:          2
Number of racks:               1
FSCK ended at Thu Mar 20 13:30:03 EST 2014 in 22 milliseconds


The filesystem under path '/data' is HEALTHY

Thanks and Regards,
Truong Phan
Senior Technology Specialist
Database Engineering
Transport & Routing Engineering | Networks | Telstra Operations

[Telstra]


P    + 61 2 8576 5771
M   + 61 4 1463 7424
E    troung.phan@team.telstra.com<ma...@team.telstra.com>
W  www.telstra.com<https://email-cn.huawei.com/owa/UrlBlockedError.aspx>

Love the movies? Telstra takes you there with $10 movie tickets, just to say thanks. Available now at telstra.com/movies<http://www.telstra.com/movies>

This communication may contain confidential or copyright information of Telstra Corporation Limited (ABN 33 051 775 556). If you are not an intended recipient, you must not keep, forward, copy, use, save or rely on this communication, and any such action is unauthorised and prohibited. If you have received this communication in error, please reply to this email to notify the sender of its incorrect delivery, and then delete both it and your reply.




RE: how to free up space of the old Data Node

Posted by Brahma Reddy Battula <br...@huawei.com>.
Following Node is down..Please have look on datanode logs and try to make it up...Before going for further action..(like decreasing the replication factor..)


Dead datanodes:
Name: 172.18.126.99:50010 (nsda3dmsrpt02.internal.bigpond.com)





________________________________
From: Phan, Truong Q [Troung.Phan@team.telstra.com]
Sent: Thursday, March 20, 2014 10:39 AM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node

Hi Battula,

I hope Battula is your first name. :P
Here are the output of your suggested commands:

[root@nsda3dmsrpt02] /usr/lib/hadoop-0.20-mapreduce# sudo -u hdfs hadoop fsck /
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Connecting to namenode via http://nsda3dmsrpt02.internal.bigpond.com:50070
FSCK started by hdfs (auth:SIMPLE) from /172.18.126.99 for path / at Thu Mar 20 16:04:35 EST 2014

Status: CORRUPT
Total size:    7325542923 B
Total dirs:    138
Total files:   383
Total symlinks:                0 (Files currently being written: 2)
Total blocks (validated):      424 (avg. block size 17277223 B)
  ********************************
  CORRUPT FILES:        3
  MISSING BLOCKS:       3
  MISSING SIZE:         791 B
  CORRUPT BLOCKS:       3
  ********************************
Minimally replicated blocks:   421 (99.29245 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       417 (98.34906 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     1.976415
Corrupt blocks:                3
Missing replicas:              417 (33.147854 %)
Number of data-nodes:          2
Number of racks:               1
FSCK ended at Thu Mar 20 16:04:35 EST 2014 in 105 milliseconds


The filesystem under path '/' is CORRUPT
[root@nsda3dmsrpt02] /usr/lib/hadoop-0.20-mapreduce# sudo -u hdfs hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Configured Capacity: 1100387597518 (1.00 TB)
Present Capacity: 727189155840 (677.25 GB)
DFS Remaining: 712401227776 (663.48 GB)
DFS Used: 14787928064 (13.77 GB)
DFS Used%: 2.03%
Under replicated blocks: 420
Blocks with corrupt replicas: 0
Missing blocks: 3

-------------------------------------------------
Datanodes available: 2 (3 total, 1 dead)

Live datanodes:
Name: 172.18.127.248:50010 (bpdevdmsdbs02)
Hostname: bpdevdmsdbs02
Rack: /default
Decommission Status : Normal
Configured Capacity: 550193798759 (512.41 GB)
DFS Used: 7394033664 (6.89 GB)
Non DFS Used: 169131224679 (157.52 GB)
DFS Remaining: 373668540416 (348.01 GB)
DFS Used%: 1.34%
DFS Remaining%: 67.92%
Last contact: Thu Mar 20 16:05:44 EST 2014


Name: 172.18.127.245:50010 (bpdevdmsdbs01)
Hostname: bpdevdmsdbs01
Rack: /default
Decommission Status : Normal
Configured Capacity: 550193798759 (512.41 GB)
DFS Used: 7393894400 (6.89 GB)
Non DFS Used: 204067216999 (190.05 GB)
DFS Remaining: 338732687360 (315.47 GB)
DFS Used%: 1.34%
DFS Remaining%: 61.57%
Last contact: Thu Mar 20 16:05:44 EST 2014


Dead datanodes:
Name: 172.18.126.99:50010 (nsda3dmsrpt02.internal.bigpond.com)
Hostname: nsda3dmsrpt02.internal.bigpond.com
Rack: /default
Decommission Status : Normal
Configured Capacity: 0 (0 B)
DFS Used: 0 (0 B)
Non DFS Used: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used%: 100.00%
DFS Remaining%: 0.00%
Last contact: Wed Mar 19 11:44:44 EST 2014




Thanks and Regards,
Truong Phan


P    + 61 2 8576 5771
M   + 61 4 1463 7424
E    troung.phan@team.telstra.com
W  www.telstra.com<https://email-cn.huawei.com/owa/UrlBlockedError.aspx>



From: Brahma Reddy Battula [mailto:brahmareddy.battula@huawei.com]
Sent: Thursday, 20 March 2014 3:27 PM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node


Please check my inline comments which are in blue color...



________________________________
From: Phan, Truong Q [Troung.Phan@team.telstra.com]
Sent: Thursday, March 20, 2014 8:04 AM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: how to free up space of the old Data Node
Hi

I have 3 nodes Hadoop cluster in which I created 3 Data Nodes.
However, I don’t have enough space in one of the node to cater other projects’ log. So I decommissioned this node from a Data node list but I could not re-claimed the space from it.
>>> is your Replication is 3..? If it is 3 and as you have 3 datanodes,ideally disk space occupied by all nodes should be same(47G, should be present in all the DN's)..
>>>And if you RF=3,Decommission will not be success as you've only 3 DN's..you need to add another DN to cluster,,then only decommission will be success..
Hence please mention the replication factor of the file..

Is there a way to get this node to release space?
>>> ways are there,,but you need to mention, why only this node disk is full..why not other nodes..?  is it because,this node is having less space compared to other nodes
>>> If RF=3, then make RF=2(decrease the replication factor)..then do decommission of this node
[root@nsda3dmsrpt02] /data/dfs/dn# du -sh /data/dfs/dn/*
47G     /data/dfs/dn/current
>>> try to give the following output also
      sudo -u hdfs hadoop fsck /
     sudo -u hdfs hadoop dfsadmin -report

$ sudo -u hdfs hadoop fsck /data
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Status: HEALTHY
Total size:    7186453688 B
Total dirs:    11
Total files:   62
Total symlinks:                0
Total blocks (validated):      105 (avg. block size 68442416 B)
Minimally replicated blocks:   105 (100.0 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       105 (100.0 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     2.0
Corrupt blocks:                0
Missing replicas:              105 (33.333332 %)
Number of data-nodes:          2
Number of racks:               1
FSCK ended at Thu Mar 20 13:30:03 EST 2014 in 22 milliseconds


The filesystem under path '/data' is HEALTHY

Thanks and Regards,
Truong Phan
Senior Technology Specialist
Database Engineering
Transport & Routing Engineering | Networks | Telstra Operations

[Telstra]


P    + 61 2 8576 5771
M   + 61 4 1463 7424
E    troung.phan@team.telstra.com<ma...@team.telstra.com>
W  www.telstra.com<https://email-cn.huawei.com/owa/UrlBlockedError.aspx>

Love the movies? Telstra takes you there with $10 movie tickets, just to say thanks. Available now at telstra.com/movies<http://www.telstra.com/movies>

This communication may contain confidential or copyright information of Telstra Corporation Limited (ABN 33 051 775 556). If you are not an intended recipient, you must not keep, forward, copy, use, save or rely on this communication, and any such action is unauthorised and prohibited. If you have received this communication in error, please reply to this email to notify the sender of its incorrect delivery, and then delete both it and your reply.




RE: how to free up space of the old Data Node

Posted by "Phan, Truong Q" <Tr...@team.telstra.com>.
Hi Battula,

I hope Battula is your first name. :P
Here are the output of your suggested commands:

[root@nsda3dmsrpt02] /usr/lib/hadoop-0.20-mapreduce# sudo -u hdfs hadoop fsck /
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Connecting to namenode via http://nsda3dmsrpt02.internal.bigpond.com:50070
FSCK started by hdfs (auth:SIMPLE) from /172.18.126.99 for path / at Thu Mar 20 16:04:35 EST 2014

Status: CORRUPT
Total size:    7325542923 B
Total dirs:    138
Total files:   383
Total symlinks:                0 (Files currently being written: 2)
Total blocks (validated):      424 (avg. block size 17277223 B)
  ********************************
  CORRUPT FILES:        3
  MISSING BLOCKS:       3
  MISSING SIZE:         791 B
  CORRUPT BLOCKS:       3
  ********************************
Minimally replicated blocks:   421 (99.29245 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       417 (98.34906 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     1.976415
Corrupt blocks:                3
Missing replicas:              417 (33.147854 %)
Number of data-nodes:          2
Number of racks:               1
FSCK ended at Thu Mar 20 16:04:35 EST 2014 in 105 milliseconds


The filesystem under path '/' is CORRUPT
[root@nsda3dmsrpt02] /usr/lib/hadoop-0.20-mapreduce# sudo -u hdfs hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Configured Capacity: 1100387597518 (1.00 TB)
Present Capacity: 727189155840 (677.25 GB)
DFS Remaining: 712401227776 (663.48 GB)
DFS Used: 14787928064 (13.77 GB)
DFS Used%: 2.03%
Under replicated blocks: 420
Blocks with corrupt replicas: 0
Missing blocks: 3

-------------------------------------------------
Datanodes available: 2 (3 total, 1 dead)

Live datanodes:
Name: 172.18.127.248:50010 (bpdevdmsdbs02)
Hostname: bpdevdmsdbs02
Rack: /default
Decommission Status : Normal
Configured Capacity: 550193798759 (512.41 GB)
DFS Used: 7394033664 (6.89 GB)
Non DFS Used: 169131224679 (157.52 GB)
DFS Remaining: 373668540416 (348.01 GB)
DFS Used%: 1.34%
DFS Remaining%: 67.92%
Last contact: Thu Mar 20 16:05:44 EST 2014


Name: 172.18.127.245:50010 (bpdevdmsdbs01)
Hostname: bpdevdmsdbs01
Rack: /default
Decommission Status : Normal
Configured Capacity: 550193798759 (512.41 GB)
DFS Used: 7393894400 (6.89 GB)
Non DFS Used: 204067216999 (190.05 GB)
DFS Remaining: 338732687360 (315.47 GB)
DFS Used%: 1.34%
DFS Remaining%: 61.57%
Last contact: Thu Mar 20 16:05:44 EST 2014


Dead datanodes:
Name: 172.18.126.99:50010 (nsda3dmsrpt02.internal.bigpond.com)
Hostname: nsda3dmsrpt02.internal.bigpond.com
Rack: /default
Decommission Status : Normal
Configured Capacity: 0 (0 B)
DFS Used: 0 (0 B)
Non DFS Used: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used%: 100.00%
DFS Remaining%: 0.00%
Last contact: Wed Mar 19 11:44:44 EST 2014




Thanks and Regards,
Truong Phan


P    + 61 2 8576 5771
M   + 61 4 1463 7424
E    troung.phan@team.telstra.com
W  www.telstra.com



From: Brahma Reddy Battula [mailto:brahmareddy.battula@huawei.com]
Sent: Thursday, 20 March 2014 3:27 PM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node


Please check my inline comments which are in blue color...



________________________________
From: Phan, Truong Q [Troung.Phan@team.telstra.com]
Sent: Thursday, March 20, 2014 8:04 AM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: how to free up space of the old Data Node
Hi

I have 3 nodes Hadoop cluster in which I created 3 Data Nodes.
However, I don't have enough space in one of the node to cater other projects' log. So I decommissioned this node from a Data node list but I could not re-claimed the space from it.
>>> is your Replication is 3..? If it is 3 and as you have 3 datanodes,ideally disk space occupied by all nodes should be same(47G, should be present in all the DN's)..
>>>And if you RF=3,Decommission will not be success as you've only 3 DN's..you need to add another DN to cluster,,then only decommission will be success..
Hence please mention the replication factor of the file..

Is there a way to get this node to release space?
>>> ways are there,,but you need to mention, why only this node disk is full..why not other nodes..?  is it because,this node is having less space compared to other nodes
>>> If RF=3, then make RF=2(decrease the replication factor)..then do decommission of this node
[root@nsda3dmsrpt02] /data/dfs/dn# du -sh /data/dfs/dn/*
47G     /data/dfs/dn/current
>>> try to give the following output also
      sudo -u hdfs hadoop fsck /
     sudo -u hdfs hadoop dfsadmin -report

$ sudo -u hdfs hadoop fsck /data
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Status: HEALTHY
Total size:    7186453688 B
Total dirs:    11
Total files:   62
Total symlinks:                0
Total blocks (validated):      105 (avg. block size 68442416 B)
Minimally replicated blocks:   105 (100.0 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       105 (100.0 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     2.0
Corrupt blocks:                0
Missing replicas:              105 (33.333332 %)
Number of data-nodes:          2
Number of racks:               1
FSCK ended at Thu Mar 20 13:30:03 EST 2014 in 22 milliseconds


The filesystem under path '/data' is HEALTHY

Thanks and Regards,
Truong Phan
Senior Technology Specialist
Database Engineering
Transport & Routing Engineering | Networks | Telstra Operations

[cid:image001.gif@01CF4456.B75D8770]


P    + 61 2 8576 5771
M   + 61 4 1463 7424
E    troung.phan@team.telstra.com<ma...@team.telstra.com>
W  www.telstra.com<https://email-cn.huawei.com/owa/UrlBlockedError.aspx>

Love the movies? Telstra takes you there with $10 movie tickets, just to say thanks. Available now at telstra.com/movies<http://www.telstra.com/movies>

This communication may contain confidential or copyright information of Telstra Corporation Limited (ABN 33 051 775 556). If you are not an intended recipient, you must not keep, forward, copy, use, save or rely on this communication, and any such action is unauthorised and prohibited. If you have received this communication in error, please reply to this email to notify the sender of its incorrect delivery, and then delete both it and your reply.




RE: how to free up space of the old Data Node

Posted by "Phan, Truong Q" <Tr...@team.telstra.com>.
Thanks for the reply.
This Hadoop cluster is our POC and the node has less space compare to the other two nodes.
How do I change the Replication Factore (RF) from 3 down to 2?
Is this controlled by this parameter (dfs.datanode.handler.count)?

Thanks and Regards,
Truong Phan


P    + 61 2 8576 5771
M   + 61 4 1463 7424
E    troung.phan@team.telstra.com
W  www.telstra.com



From: Brahma Reddy Battula [mailto:brahmareddy.battula@huawei.com]
Sent: Thursday, 20 March 2014 3:27 PM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node


Please check my inline comments which are in blue color...



________________________________
From: Phan, Truong Q [Troung.Phan@team.telstra.com]
Sent: Thursday, March 20, 2014 8:04 AM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: how to free up space of the old Data Node
Hi

I have 3 nodes Hadoop cluster in which I created 3 Data Nodes.
However, I don't have enough space in one of the node to cater other projects' log. So I decommissioned this node from a Data node list but I could not re-claimed the space from it.
>>> is your Replication is 3..? If it is 3 and as you have 3 datanodes,ideally disk space occupied by all nodes should be same(47G, should be present in all the DN's)..
>>>And if you RF=3,Decommission will not be success as you've only 3 DN's..you need to add another DN to cluster,,then only decommission will be success..
Hence please mention the replication factor of the file..

Is there a way to get this node to release space?
>>> ways are there,,but you need to mention, why only this node disk is full..why not other nodes..?  is it because,this node is having less space compared to other nodes
>>> If RF=3, then make RF=2(decrease the replication factor)..then do decommission of this node
[root@nsda3dmsrpt02] /data/dfs/dn# du -sh /data/dfs/dn/*
47G     /data/dfs/dn/current
>>> try to give the following output also
      sudo -u hdfs hadoop fsck /
     sudo -u hdfs hadoop dfsadmin -report

$ sudo -u hdfs hadoop fsck /data
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Status: HEALTHY
Total size:    7186453688 B
Total dirs:    11
Total files:   62
Total symlinks:                0
Total blocks (validated):      105 (avg. block size 68442416 B)
Minimally replicated blocks:   105 (100.0 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       105 (100.0 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     2.0
Corrupt blocks:                0
Missing replicas:              105 (33.333332 %)
Number of data-nodes:          2
Number of racks:               1
FSCK ended at Thu Mar 20 13:30:03 EST 2014 in 22 milliseconds


The filesystem under path '/data' is HEALTHY

Thanks and Regards,
Truong Phan
Senior Technology Specialist
Database Engineering
Transport & Routing Engineering | Networks | Telstra Operations

[cid:image001.gif@01CF4455.2B5B0CD0]


P    + 61 2 8576 5771
M   + 61 4 1463 7424
E    troung.phan@team.telstra.com<ma...@team.telstra.com>
W  www.telstra.com<https://email-cn.huawei.com/owa/UrlBlockedError.aspx>

Love the movies? Telstra takes you there with $10 movie tickets, just to say thanks. Available now at telstra.com/movies<http://www.telstra.com/movies>

This communication may contain confidential or copyright information of Telstra Corporation Limited (ABN 33 051 775 556). If you are not an intended recipient, you must not keep, forward, copy, use, save or rely on this communication, and any such action is unauthorised and prohibited. If you have received this communication in error, please reply to this email to notify the sender of its incorrect delivery, and then delete both it and your reply.




RE: how to free up space of the old Data Node

Posted by "Phan, Truong Q" <Tr...@team.telstra.com>.
Hi Battula,

I hope Battula is your first name. :P
Here are the output of your suggested commands:

[root@nsda3dmsrpt02] /usr/lib/hadoop-0.20-mapreduce# sudo -u hdfs hadoop fsck /
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Connecting to namenode via http://nsda3dmsrpt02.internal.bigpond.com:50070
FSCK started by hdfs (auth:SIMPLE) from /172.18.126.99 for path / at Thu Mar 20 16:04:35 EST 2014

Status: CORRUPT
Total size:    7325542923 B
Total dirs:    138
Total files:   383
Total symlinks:                0 (Files currently being written: 2)
Total blocks (validated):      424 (avg. block size 17277223 B)
  ********************************
  CORRUPT FILES:        3
  MISSING BLOCKS:       3
  MISSING SIZE:         791 B
  CORRUPT BLOCKS:       3
  ********************************
Minimally replicated blocks:   421 (99.29245 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       417 (98.34906 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     1.976415
Corrupt blocks:                3
Missing replicas:              417 (33.147854 %)
Number of data-nodes:          2
Number of racks:               1
FSCK ended at Thu Mar 20 16:04:35 EST 2014 in 105 milliseconds


The filesystem under path '/' is CORRUPT
[root@nsda3dmsrpt02] /usr/lib/hadoop-0.20-mapreduce# sudo -u hdfs hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Configured Capacity: 1100387597518 (1.00 TB)
Present Capacity: 727189155840 (677.25 GB)
DFS Remaining: 712401227776 (663.48 GB)
DFS Used: 14787928064 (13.77 GB)
DFS Used%: 2.03%
Under replicated blocks: 420
Blocks with corrupt replicas: 0
Missing blocks: 3

-------------------------------------------------
Datanodes available: 2 (3 total, 1 dead)

Live datanodes:
Name: 172.18.127.248:50010 (bpdevdmsdbs02)
Hostname: bpdevdmsdbs02
Rack: /default
Decommission Status : Normal
Configured Capacity: 550193798759 (512.41 GB)
DFS Used: 7394033664 (6.89 GB)
Non DFS Used: 169131224679 (157.52 GB)
DFS Remaining: 373668540416 (348.01 GB)
DFS Used%: 1.34%
DFS Remaining%: 67.92%
Last contact: Thu Mar 20 16:05:44 EST 2014


Name: 172.18.127.245:50010 (bpdevdmsdbs01)
Hostname: bpdevdmsdbs01
Rack: /default
Decommission Status : Normal
Configured Capacity: 550193798759 (512.41 GB)
DFS Used: 7393894400 (6.89 GB)
Non DFS Used: 204067216999 (190.05 GB)
DFS Remaining: 338732687360 (315.47 GB)
DFS Used%: 1.34%
DFS Remaining%: 61.57%
Last contact: Thu Mar 20 16:05:44 EST 2014


Dead datanodes:
Name: 172.18.126.99:50010 (nsda3dmsrpt02.internal.bigpond.com)
Hostname: nsda3dmsrpt02.internal.bigpond.com
Rack: /default
Decommission Status : Normal
Configured Capacity: 0 (0 B)
DFS Used: 0 (0 B)
Non DFS Used: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used%: 100.00%
DFS Remaining%: 0.00%
Last contact: Wed Mar 19 11:44:44 EST 2014




Thanks and Regards,
Truong Phan


P    + 61 2 8576 5771
M   + 61 4 1463 7424
E    troung.phan@team.telstra.com
W  www.telstra.com




RE: how to free up space of the old Data Node

Posted by "Phan, Truong Q" <Tr...@team.telstra.com>.
Thanks for the reply.
This Hadoop cluster is our POC, and the node has less space compared to the other two nodes.
How do I change the Replication Factor (RF) from 3 down to 2?
Is this controlled by the dfs.datanode.handler.count parameter?

Thanks and Regards,
Truong Phan


P    + 61 2 8576 5771
M   + 61 4 1463 7424
E    troung.phan@team.telstra.com
W  www.telstra.com
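
(On the two questions above: dfs.datanode.handler.count only sets the number of DataNode server threads and has nothing to do with replication. The replication factor of existing files is changed with setrep; the default for new files comes from dfs.replication in hdfs-site.xml. A minimal sketch, assuming an hdfs client on the cluster:

$ sudo -u hdfs hdfs dfs -setrep -w 2 /    # recursively sets RF=2 on existing files; -w waits for re-replication

And in hdfs-site.xml, so that new files default to two replicas:

<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>

On a large tree -w can take a long time; it can be omitted and progress watched with hdfs dfsadmin -report instead.)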



RE: how to free up space of the old Data Node

Posted by Brahma Reddy Battula <br...@huawei.com>.
Please check my inline comments, marked with '>>>' below...



________________________________
From: Phan, Truong Q [Troung.Phan@team.telstra.com]
Sent: Thursday, March 20, 2014 8:04 AM
To: user@hadoop.apache.org
Subject: how to free up space of the old Data Node

Hi

I have a 3-node Hadoop cluster in which I created 3 DataNodes.
However, I don't have enough space on one of the nodes to hold other projects' logs, so I decommissioned that node from the DataNode list, but I could not reclaim its space.
>>> Is your replication factor 3? If it is 3, then since you have 3 datanodes, the disk space occupied by each node should ideally be the same (the 47G should be present on all the DNs).
>>> Also, if RF=3, decommissioning will not succeed while you have only 3 DNs; you would need to add another DN to the cluster before the decommission can complete.
Hence, please mention the replication factor of the files.

Is there a way to get this node to release space?
>>> There are ways, but you need to explain why only this node's disk is full and not the others'. Is it because this node has less space than the other nodes?
>>> If RF=3, then decrease the replication factor to RF=2, and then decommission this node.
[root@nsda3dmsrpt02] /data/dfs/dn# du -sh /data/dfs/dn/*
47G     /data/dfs/dn/current
>>> Please also share the output of the following:
      sudo -u hdfs hadoop fsck /
      sudo -u hdfs hadoop dfsadmin -report

$ sudo -u hdfs hadoop fsck /data
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Status: HEALTHY
Total size:    7186453688 B
Total dirs:    11
Total files:   62
Total symlinks:                0
Total blocks (validated):      105 (avg. block size 68442416 B)
Minimally replicated blocks:   105 (100.0 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       105 (100.0 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     2.0
Corrupt blocks:                0
Missing replicas:              105 (33.333332 %)
Number of data-nodes:          2
Number of racks:               1
FSCK ended at Thu Mar 20 13:30:03 EST 2014 in 22 milliseconds


The filesystem under path '/data' is HEALTHY

Thanks and Regards,
Truong Phan
Senior Technology Specialist
Database Engineering
Transport & Routing Engineering | Networks | Telstra Operations



P    + 61 2 8576 5771
M   + 61 4 1463 7424
E    troung.phan@team.telstra.com
W  www.telstra.com

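
(A sketch of the decommission step described above, assuming a vanilla Apache Hadoop 2.x setup in which dfs.hosts.exclude in hdfs-site.xml already points at an excludes file; the file path below is illustrative:

$ echo "nsda3dmsrpt02.internal.bigpond.com" >> /etc/hadoop/conf/dfs.exclude
$ sudo -u hdfs hdfs dfsadmin -refreshNodes
$ sudo -u hdfs hdfs dfsadmin -report    # wait for "Decommission Status : Decommissioned"

Only once the status flips to Decommissioned should the node's local block directories be deleted.)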



