Posted to user@hadoop.apache.org by Margus Roo <ma...@roo.ee> on 2015/01/08 09:52:51 UTC
Manually deleted blocks from datanodes
Hi
I have a simple HDFS setup: 1 nn and 2 dn.
I created a file and added it to HDFS.
About the file:
-bash-4.1$ hdfs fsck -blocks -locations -files /user/margusja/file2.txt
Connecting to namenode via http://nn:50070
FSCK started by hdfs (auth:SIMPLE) from /10.101.9.122 for path
/user/margusja/file2.txt at Thu Jan 08 10:34:13 EET 2015
/user/margusja/file2.txt 409600000 bytes, 4 block(s): OK
0. BP-808850907-10.101.21.132-1420641040354:blk_1073741828_1004
len=134217728 repl=2 [10.87.13.166:50010, 10.85.145.228:50010]
1. BP-808850907-10.101.21.132-1420641040354:blk_1073741829_1005
len=134217728 repl=2 [10.87.13.166:50010, 10.85.145.228:50010]
2. BP-808850907-10.101.21.132-1420641040354:blk_1073741830_1006
len=134217728 repl=2 [10.87.13.166:50010, 10.85.145.228:50010]
3. BP-808850907-10.101.21.132-1420641040354:blk_1073741831_1007
len=6946816 repl=2 [10.87.13.166:50010, 10.85.145.228:50010]
Status: HEALTHY
Total size: 409600000 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 4 (avg. block size 102400000 B)
Minimally replicated blocks: 4 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 2.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 2
Number of racks: 1
FSCK ended at Thu Jan 08 10:34:13 EET 2015 in 1 milliseconds
The filesystem under path '/user/margusja/file2.txt' is HEALTHY
Now I went to one datanode, deleted blk_1073741828 by hand, and saw this
in the dn's log:
2015-01-08 10:02:00,994 WARN
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
Removed block 1073741828 from memory with missing block file on the disk
2015-01-08 10:02:00,994 WARN
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
Deleted a metadata file for the deleted block
/grid/hadoop/hdfs/dn/current/BP-808850907-10.101.21.132-1420641040354/current/finalized/blk_1073741828_1004.meta
But hdfs fsck still reports that HDFS is healthy.
I can still download the file using hdfs dfs -get
/user/margusja/file2.txt - there are some warnings that a block is missing.
Then I went to the second dn and deleted blk_1073741828 there as well.
hdfs fsck on the nn still reports that HDFS is OK.
Of course now I can't get my file anymore using hdfs dfs -get
/user/margusja/file2.txt, because blk_1073741828 no longer exists on dn1
or dn2. But the nn is still happy and thinks HDFS is fine.
I guess I am testing this the wrong way.
Are there best practices for testing HDFS before going live - for
example, for what happens if a block goes missing or gets corrupted?
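For what it's worth, one drill that should surface the missing replica (a
sketch, not an official procedure - "dn1" is a placeholder hostname, and the
daemon script name varies by distribution): fsck only reads the namenode's
metadata, and the nn only learns a replica is gone from a datanode block
report or the periodic scanners, so after deleting the block file you can
restart that datanode to force a fresh full block report.

```shell
# Hypothetical failure drill for a non-production cluster.
# Guarded so it is a no-op on machines without an HDFS client installed.
if command -v hdfs >/dev/null 2>&1; then
  # Restarting the dn makes it re-register and send a full block report,
  # so the nn notices the replica deleted from disk:
  ssh dn1 'hadoop-daemon.sh stop datanode && hadoop-daemon.sh start datanode'

  # Ask the nn which files now have missing or corrupt blocks:
  hdfs fsck / -list-corruptfileblocks

  # Check the overall replication state of the cluster:
  hdfs dfsadmin -report
fi
```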
--
Margus (margusja) Roo
http://margus.roo.ee
skype: margusja
+372 51 480
Re: Manually deleted blocks from datanodes
Posted by Margus Roo <ma...@roo.ee>.
Hi
As I understand it, there is a property for the dn's -
dfs.datanode.scan.period.hours - and by default it is 504 hours. So if I
change it to 1 hour, does that mean that one hour after I delete a block
from one server, the nn will learn the block is missing and restore it?
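If anyone wants to experiment with this, a hypothetical hdfs-site.xml
fragment for the dn (verify the names and defaults against your version's
hdfs-default.xml; note that dfs.datanode.directoryscan.interval, in seconds,
controls the directory scan that produced the WARN lines above, while
dfs.datanode.scan.period.hours controls the checksum block scanner):

```xml
<!-- Sketch only: shortened dn scan intervals for testing, not production. -->
<property>
  <name>dfs.datanode.directoryscan.interval</name>
  <value>3600</value> <!-- seconds; default 21600 (6 hours) -->
</property>
<property>
  <name>dfs.datanode.scan.period.hours</name>
  <value>1</value> <!-- default 504 hours (3 weeks) -->
</property>
```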
Margus (margusja) Roo
http://margus.roo.ee
skype: margusja
+372 51 480
On 08/01/15 10:52, Margus Roo wrote:
> Hi
>
> I have a simple HDFS setup: 1 nn and 2 dn.
>
> I created a file and added it to HDFS.
> About the file:
> -bash-4.1$ hdfs fsck -blocks -locations -files /user/margusja/file2.txt
> Connecting to namenode via http://nn:50070
> FSCK started by hdfs (auth:SIMPLE) from /10.101.9.122 for path
> /user/margusja/file2.txt at Thu Jan 08 10:34:13 EET 2015
> /user/margusja/file2.txt 409600000 bytes, 4 block(s): OK
> 0. BP-808850907-10.101.21.132-1420641040354:blk_1073741828_1004
> len=134217728 repl=2 [10.87.13.166:50010, 10.85.145.228:50010]
> 1. BP-808850907-10.101.21.132-1420641040354:blk_1073741829_1005
> len=134217728 repl=2 [10.87.13.166:50010, 10.85.145.228:50010]
> 2. BP-808850907-10.101.21.132-1420641040354:blk_1073741830_1006
> len=134217728 repl=2 [10.87.13.166:50010, 10.85.145.228:50010]
> 3. BP-808850907-10.101.21.132-1420641040354:blk_1073741831_1007
> len=6946816 repl=2 [10.87.13.166:50010, 10.85.145.228:50010]
>
> Status: HEALTHY
> Total size: 409600000 B
> Total dirs: 0
> Total files: 1
> Total symlinks: 0
> Total blocks (validated): 4 (avg. block size 102400000 B)
> Minimally replicated blocks: 4 (100.0 %)
> Over-replicated blocks: 0 (0.0 %)
> Under-replicated blocks: 0 (0.0 %)
> Mis-replicated blocks: 0 (0.0 %)
> Default replication factor: 2
> Average block replication: 2.0
> Corrupt blocks: 0
> Missing replicas: 0 (0.0 %)
> Number of data-nodes: 2
> Number of racks: 1
> FSCK ended at Thu Jan 08 10:34:13 EET 2015 in 1 milliseconds
>
>
> The filesystem under path '/user/margusja/file2.txt' is HEALTHY
>
> Now I went to one datanode, deleted blk_1073741828 by hand, and saw
> this in the dn's log:
> 2015-01-08 10:02:00,994 WARN
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
> Removed block 1073741828 from memory with missing block file on the disk
> 2015-01-08 10:02:00,994 WARN
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
> Deleted a metadata file for the deleted block
> /grid/hadoop/hdfs/dn/current/BP-808850907-10.101.21.132-1420641040354/current/finalized/blk_1073741828_1004.meta
>
> But hdfs fsck still reports that HDFS is healthy.
>
> I can still download the file using hdfs dfs -get
> /user/margusja/file2.txt - there are some warnings that a block is missing.
>
> Then I went to the second dn and deleted blk_1073741828 there as well.
>
> hdfs fsck on the nn still reports that HDFS is OK.
>
> Of course now I can't get my file anymore using hdfs dfs -get
> /user/margusja/file2.txt, because blk_1073741828 no longer exists on
> dn1 or dn2. But the nn is still happy and thinks HDFS is fine.
>
> I guess I am testing this the wrong way.
> Are there best practices for testing HDFS before going live - for
> example, for what happens if a block goes missing or gets corrupted?
>
>
>