Posted to common-user@hadoop.apache.org by yo...@wipro.com on 2012/10/29 09:33:55 UTC

How to do HADOOP RECOVERY ???

Hi All,

I ran this command:

hadoop fsck -Ddfs.http.address=localhost:50070 /

and found that some blocks are missing or corrupt.

The results look like this:

/user/hive/warehouse/tt_report_htcount/000000_0: MISSING 2 blocks of total size 71826120 B..
/user/hive/warehouse/tt_report_perhour_hit/000000_0: CORRUPT block blk_75438572351073797

/user/hive/warehouse/tt_report_perhour_hit/000000_0: MISSING 1 blocks of total size 1531 B..
/user/hive/warehouse/vw_cc/000000_0: CORRUPT block blk_-1280621588594166706

/user/hive/warehouse/vw_cc/000000_0: MISSING 1 blocks of total size 1774 B..
/user/hive/warehouse/vw_report2/000000_0: CORRUPT block blk_8637186139854977656

/user/hive/warehouse/vw_report2/000000_0: CORRUPT block blk_4019541597438638886

/user/hive/warehouse/vw_report2/000000_0: MISSING 2 blocks of total size 71826120 B..
/user/zoo/foo.har/_index: CORRUPT block blk_3404803591387558276
.
.
.
.
.

Total size:    7600625746 B
 Total dirs:    205
 Total files:    173
 Total blocks (validated):    270 (avg. block size 28150465 B)
  ********************************
  CORRUPT FILES:    171
  MISSING BLOCKS:    269
  MISSING SIZE:        7600625742 B
  CORRUPT BLOCKS:     269
  ********************************
 Minimally replicated blocks:    1 (0.37037036 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:    0 (0.0 %)
 Mis-replicated blocks:        0 (0.0 %)
 Default replication factor:    1
 Average block replication:    0.0037037036
 Corrupt blocks:        269
 Missing replicas:        0 (0.0 %)
 Number of data-nodes:        1
 Number of racks:        1
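
For reference, fsck can also be pointed at a single file to list the blocks and datanode locations it expects (the -files, -blocks and -locations flags are available on the 0.20 line), e.g. for one of the paths above:

hadoop fsck -Ddfs.http.address=localhost:50070 /user/hive/warehouse/vw_cc/000000_0 -files -blocks -locations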




Is there any way to recover them ?

Please help and suggest

Thanks & Regards
yogesh kumar


Re: How to do HADOOP RECOVERY ???

Posted by Bejoy KS <be...@gmail.com>.
Hi Yogesh

You have the dfs.name.dir of the previous install, so only the metadata is available. HDFS stores the actual blocks under dfs.data.dir, which may no longer be there. If you still have the previous data dir, pointing dfs.data.dir back to it should resolve your issue.
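
As a rough sketch of that change on the old machine, assuming the /HADOOP/SINGLENODE paths quoted later in this thread (if conf/hdfs-site.xml already carries other properties, merge these in by hand instead of overwriting):

cat > conf/hdfs-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/HADOOP/SINGLENODE/Name_Dir</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/HADOOP/SINGLENODE/Data_Dir</value>
  </property>
</configuration>
EOF
# then restart HDFS with the stock 0.20.x scripts
bin/stop-dfs.sh
bin/start-dfs.sh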

Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: <yo...@wipro.com>
Date: Mon, 29 Oct 2012 10:43:44 
To: <us...@hadoop.apache.org>
Reply-To: user@hadoop.apache.org
Subject: RE: How to do HADOOP RECOVERY ???

Thanks Uma,

I am using hadoop-0.20.2.

The UI shows:
Cluster Summary
379 files and directories, 270 blocks = 649 total. Heap Size is 81.06 MB / 991.69 MB (8%)

WARNING : There are about 270 missing blocks. Please check the log or run fsck.

Configured Capacity     :       465.44 GB
DFS Used        :       20 KB
Non DFS Used    :       439.37 GB
DFS Remaining   :       26.07 GB
DFS Used%       :       0 %
DFS Remaining%  :       5.6 %
Live Nodes<http://localhost:50070/dfsnodelist.jsp?whatNodes=LIVE>       :       1
Dead Nodes<http://localhost:50070/dfsnodelist.jsp?whatNodes=DEAD>       :       0
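
Roughly the same summary is also available from the command line (0.20 syntax), which helps when the web UI is unreachable:

bin/hadoop dfsadmin -report    # capacity, DFS used/remaining, and per-datanode status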


First I configured a single-node cluster and worked on it; after that I added another machine, made the new one the master + a worker, and the first machine a worker only.

I saved the dfs.name.dir separately and started with a fresh cluster...

Now I have switched back to the previous stage: a single-node cluster on the same old machine.
I have given dfs.name.dir the path where I kept the saved copy.

Now I am running it and getting this.

I did an -ls operation and got this exception:


mediaadmins-iMac-2:haadoop-0.20.2 mediaadmin$ HADOOP dfs -ls /user/hive/warehouse/vw_cc/
Found 1 items

-rw-r--r--   1 mediaadmin supergroup       1774 2012-10-17 16:15 /user/hive/warehouse/vw_cc/000000_0


mediaadmins-iMac-2:haadoop-0.20.2 mediaadmin$ HADOOP dfs -cat /user/hive/warehouse/vw_cc/000000_0


12/10/29 16:01:15 INFO hdfs.DFSClient: No node available for block: blk_-1280621588594166706_3595 file=/user/hive/warehouse/vw_cc/000000_0
12/10/29 16:01:15 INFO hdfs.DFSClient: Could not obtain block blk_-1280621588594166706_3595 from any node:  java.io.IOException: No live nodes contain current block
12/10/29 16:01:18 INFO hdfs.DFSClient: No node available for block: blk_-1280621588594166706_3595 file=/user/hive/warehouse/vw_cc/000000_0
12/10/29 16:01:18 INFO hdfs.DFSClient: Could not obtain block blk_-1280621588594166706_3595 from any node:  java.io.IOException: No live nodes contain current block
12/10/29 16:01:21 INFO hdfs.DFSClient: No node available for block: blk_-1280621588594166706_3595 file=/user/hive/warehouse/vw_cc/000000_0
12/10/29 16:01:21 INFO hdfs.DFSClient: Could not obtain block blk_-1280621588594166706_3595 from any node:  java.io.IOException: No live nodes contain current block
12/10/29 16:01:24 WARN hdfs.DFSClient: DFS Read: java.io.IOException: Could not obtain block: blk_-1280621588594166706_3595 file=/user/hive/warehouse/vw_cc/000000_0
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1812)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1638)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1767)
    at java.io.DataInputStream.read(DataInputStream.java:83)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:47)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
    at org.apache.hadoop.fs.FsShell.printToStdout(FsShell.java:114)
    at org.apache.hadoop.fs.FsShell.access$100(FsShell.java:49)
    at org.apache.hadoop.fs.FsShell$1.process(FsShell.java:352)
    at org.apache.hadoop.fs.FsShell$DelayedExceptionThrowing.globAndProcess(FsShell.java:1898)
    at org.apache.hadoop.fs.FsShell.cat(FsShell.java:346)


I looked at the NN logs for one of the files..

It shows:

2012-10-29 15:26:02,560 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=null    ip=null    cmd=open    src=/user/hive/warehouse/vw_cc/000000_0    dst=null    perm=null
.
.
.
.

Please suggest

Regards
Yogesh Kumar



________________________________
From: Uma Maheswara Rao G [maheswara@huawei.com]
Sent: Monday, October 29, 2012 3:52 PM
To: user@hadoop.apache.org
Subject: RE: How to do HADOOP RECOVERY ???


Which version of Hadoop are you using?



Do you have all DNs running? Can you check the UI report to see whether all DNs are alive?

Can you check whether the DN disks are good or not?

Can you grep the NN and DN logs for one of the corrupt block IDs from below?
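
With the stock 0.20 log layout under $HADOOP_HOME/logs, that grep would look something like this (block ID taken from the fsck output above):

grep "blk_-1280621588594166706" logs/hadoop-*-namenode-*.log
grep "blk_-1280621588594166706" logs/hadoop-*-datanode-*.log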



Regards,

Uma



RE: How to do HADOOP RECOVERY ???

Posted by Uma Maheswara Rao G <ma...@huawei.com>.
If you backed up both the data directory and the namespace dirs correctly, configuring them back should work fine.

Please check once whether your configuration is actually getting applied. For example, I can see dfs.data,dir below, with a ',' instead of a '.'. This might just be a typo in the mail, but please relook carefully at your configs.

If you simply backed up those dirs and you now configure the same dirs back on the cluster, starting it up would be just the same as restarting the cluster.
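
One quick way to confirm that the restored dfs.data.dir is the one actually in use and still holds block files (path is the SINGLENODE one from the message below):

ls /HADOOP/SINGLENODE/Data_Dir/current/ | head    # should list blk_* block files and their .meta files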



Regards,

Uma

________________________________
From: yogesh.kumar13@wipro.com [yogesh.kumar13@wipro.com]
Sent: Monday, October 29, 2012 5:43 PM
To: user@hadoop.apache.org
Subject: RE: How to do HADOOP RECOVERY ???

Hi Uma,

You are correct: when I start the cluster it goes into safe mode, and if I just wait it doesn't come out.
I use the -safemode leave option.

Safe mode is ON. The ratio of reported blocks 0.0037 has not reached the threshold 0.9990. Safe mode will be turned off automatically.
379 files and directories, 270 blocks = 649 total. Heap Size is 81.06 MB / 991.69 MB (8%)
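
For reference, the 0.20 commands involved here are:

bin/hadoop dfsadmin -safemode get      # show the current safe mode state
bin/hadoop dfsadmin -safemode leave    # force it off; this does not bring missing blocks back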


What I mean by starting a fresh cluster:

I saved dfs.name.dir and dfs.data.dir separately as a backup of the old (single-node) cluster, and used the old machine plus a new machine to start a new cluster (the old machine acted as a DN and the newly added machine acted as NN + DN). At the same time I gave different directory locations for dfs.name.dir and dfs.data.dir on the old machine.

When it was a single node:
 dfs.name.dir -->  /HADOOP/SINGLENODE/Name_Dir   && dfs.data,dir  --> /HADOOP/SINGLENODE/Data_Dir

When I used it with another machine as a DN:
dfs.name.dir --> /HADOOP/MULTINODE/Name_Dir  && dfs.data.dir --> /HADOOP/MULTINODE/Data_Dir



Now I have gone back to the previous stage.

Old Machine as single Node Cluster (NN + DN)
and gave the path for dfs.name.dir && dfs.data.dir (  dfs.name.dir -->  /HADOOP/SINGLENODE/Name_Dir   && dfs.data,dir  --> /HADOOP/SINGLENODE/Data_Dir)

I had saved the namespace and data before configuring the multi-node cluster with the new machine.



It should work after giving the namespace and data directory paths in the conf files of the single-node machine, and it should show the previous content. Or am I wrong?

Why is this happening, and why is it not coming out of safe mode by itself?

Please suggest

Regards
Yogesh Kumar
________________________________
From: Uma Maheswara Rao G [maheswara@huawei.com]
Sent: Monday, October 29, 2012 5:10 PM
To: user@hadoop.apache.org
Subject: RE: How to do HADOOP RECOVERY ???


I am not sure I understood your scenario correctly. Here is one possibility for this situation, based on what you have explained.



>>I saved the dfs.name.dir separately and started with a fresh cluster...
  When you started the fresh cluster, did you use the same DNs? If so, the blocks will be invalidated, because your namespace is fresh now (in fact a DN cannot even register until you clean its data dirs, since the namespaceID differs).

  Now you are putting the older image back and starting again. So the older image will expect enough blocks to be reported from the DNs in order to start; otherwise it will stay in safe mode. How is it coming out of safe mode?
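
The namespaceID the restored image expects and the one the DN storage was formatted with can be compared directly on disk (0.20 layout, SINGLENODE paths from the message above):

cat /HADOOP/SINGLENODE/Name_Dir/current/VERSION    # namespaceID the image expects
cat /HADOOP/SINGLENODE/Data_Dir/current/VERSION    # namespaceID the DN storage carries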



Or did you continue with the same cluster, additionally save the namespace separately as a backup of the current state, and then add an extra DN to the cluster, referring to that as the fresh cluster?

 In this case, if you deleted any existing files, the data blocks will have been invalidated on the DNs.

 After this, if you go back to the older cluster with the backed-up namespace, the older image will not know about those deletions; it will still expect the blocks to be reported, and if no blocks are available for a file, that file will be treated as corrupt.

>>I did an -ls operation and got this exception


>>mediaadmins-iMac-2:haadoop-0.20.2 mediaadmin$ HADOOP dfs -ls /user/hive/warehouse/vw_cc/
>>Found 1 items

ls will show it because the namespace has the metadata for this file, but the DNs do not have any blocks for it.


RE: How to do HADOOP RECOVERY ???

Posted by Uma Maheswara Rao G <ma...@huawei.com>.
If you backed up both data directory and namespace dirs correctly, configuring them back should work fine.

please check whether your configuration are getting applied properly once. for ex: I can see below dfs.data,dir

here ',' instead of '.' . this might be typo, I am just asking you relook crefully once your configs.



if you just backed up and configure the same dirs to a cluster back and starting would be just equal to restarting the cluster.



Regards,

Uma

________________________________
From: yogesh.kumar13@wipro.com [yogesh.kumar13@wipro.com]
Sent: Monday, October 29, 2012 5:43 PM
To: user@hadoop.apache.org
Subject: RE: How to do HADOOP RECOVERY ???

Hi Uma,

You are correct, when I start cluster it goes into safemode and if I do wait its doesn't come out.
I use  -safemode leave option.

Safe mode is ON. The ratio of reported blocks 0.0037 has not reached the threshold 0.9990. Safe mode will be turned off automatically.
379 files and directories, 270 blocks = 649 total. Heap Size is 81.06 MB / 991.69 MB (8%)


When I start fresh cluster mean..

I have saved the fs.name.dir and fs.data.dir seprately for back-up of old cluster(single node). and used old machine and new machine to start new cluster ( old machine acted as DN and newly added machine was acted as NN+DN). and at the same time I have given different directory location for dfs.name.dir and dfs.data.dir on old machine

Say when it was single node
 dfs.name.dir -->  /HADOOP/SINGLENODE/Name_Dir   && dfs.data,dir  --> /HADOOP/SINGLENODE/Data_Dir

when I used it with another machine as D.N
dfs.name.dir --> /HADOOP/MULTINODE/Name_Dir  && dfs.data.dir --> /HADOOP/MULTINODE/Data_Dir



Now I get back to previous stage.

Old Machine as single Node Cluster (NN + DN)
and gave the path for dfs.name.dir && dfs.data.dir (  dfs.name.dir -->  /HADOOP/SINGLENODE/Name_Dir   && dfs.data,dir  --> /HADOOP/SINGLENODE/Data_Dir)

I have saved namespace and data before configuring the multi node cluster with new machine.



It should work after giving the name space and data directory path in conf files of single node machine and should show the previous content, or I am wrong ??

Why Its it happening, and why is it not cumming from safe mode by itself

Please suggest

Regards
Yogesh Kumar
________________________________
From: Uma Maheswara Rao G [maheswara@huawei.com]
Sent: Monday, October 29, 2012 5:10 PM
To: user@hadoop.apache.org
Subject: RE: How to do HADOOP RECOVERY ???


I am not sure, I understood your scenario correctly here. Here is one possibility for this situation with your explained case.



>>I have saved the dfs.name.dir seprately, and started with fresh cluster...
  When you start fresh cluster, have you used same DNs? if so, blocks will be invalidated as your name space is fresh now(infact it can not register untill you clean the data dirs in DN as namespace id differs).

  Now, you are keeping the older image back and starting again. So, your older image will expect the enough blocks to be reported from DNs to start. Otherwise it will be in safe mode. How it is coming out of safemode?



or if you continue with the same cluster  and additionally you saved the namespace separately as a backup the current state, then added extra DN to the cluster refering as fresh cluster?

 In this case, if you delete any existing files, data blocks will be invalidated in DN.

 After this if you go back to older cluster with the backedup namespace, this deleted files infomation will not be known by by older image and it will expect the blocks to be report and if not blocks available for a file then that will be treated as corrupt.

>>I did -ls / operation and got this exception


>>mediaadmins-iMac-2:haadoop-0.20.2 mediaadmin$ HADOOP dfs -ls /user/hive/warehouse/vw_cc/
>>Found 1 items

ls will show because namespace has this info for this file. But DNs does not have any block related to it.

________________________________
From: yogesh.kumar13@wipro.com [yogesh.kumar13@wipro.com]
Sent: Monday, October 29, 2012 4:13 PM
To: user@hadoop.apache.org
Subject: RE: How to do HADOOP RECOVERY ???

Thanks Uma,

I am using hadoop-0.20.2 version.

UI shows.
Cluster Summary
379 files and directories, 270 blocks = 649 total. Heap Size is 81.06 MB / 991.69 MB (8%)

WARNING : There are about 270 missing blocks. Please check the log or run fsck.

Configured Capacity     :       465.44 GB
DFS Used        :       20 KB
Non DFS Used    :       439.37 GB
DFS Remaining   :       26.07 GB
DFS Used%       :       0 %
DFS Remaining%  :       5.6 %
Live Nodes<http://localhost:50070/dfsnodelist.jsp?whatNodes=LIVE>       :       1
Dead Nodes<http://localhost:50070/dfsnodelist.jsp?whatNodes=DEAD>       :       0


Firstly I have configured single node cluster and worked over it, after that I have added another machine and made another one as a master + worker and the fist machine as a worker only.

I have saved the dfs.name.dir seprately, and started with fresh cluster...

Now I have switched back to previous stage with single node with same old machine having single node cluster.
I have given the path for dfs.name.dir where I have kept that.

Now I am running and getting this.

I did -ls / operation and got this exception


mediaadmins-iMac-2:haadoop-0.20.2 mediaadmin$ HADOOP dfs -ls /user/hive/warehouse/vw_cc/
Found 1 items

-rw-r--r--   1 mediaadmin supergroup       1774 2012-10-17 16:15 /user/hive/warehouse/vw_cc/000000_0


mediaadmins-iMac-2:haadoop-0.20.2 mediaadmin$ HADOOP dfs -cat /user/hive/warehouse/vw_cc/000000_0


12/10/29 16:01:15 INFO hdfs.DFSClient: No node available for block: blk_-1280621588594166706_3595 file=/user/hive/warehouse/vw_cc/000000_0
12/10/29 16:01:15 INFO hdfs.DFSClient: Could not obtain block blk_-1280621588594166706_3595 from any node:  java.io.IOException: No live nodes contain current block
12/10/29 16:01:18 INFO hdfs.DFSClient: No node available for block: blk_-1280621588594166706_3595 file=/user/hive/warehouse/vw_cc/000000_0
12/10/29 16:01:18 INFO hdfs.DFSClient: Could not obtain block blk_-1280621588594166706_3595 from any node:  java.io.IOException: No live nodes contain current block
12/10/29 16:01:21 INFO hdfs.DFSClient: No node available for block: blk_-1280621588594166706_3595 file=/user/hive/warehouse/vw_cc/000000_0
12/10/29 16:01:21 INFO hdfs.DFSClient: Could not obtain block blk_-1280621588594166706_3595 from any node:  java.io.IOException: No live nodes contain current block
12/10/29 16:01:24 WARN hdfs.DFSClient: DFS Read: java.io.IOException: Could not obtain block: blk_-1280621588594166706_3595 file=/user/hive/warehouse/vw_cc/000000_0
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1812)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1638)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1767)
    at java.io.DataInputStream.read(DataInputStream.java:83)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:47)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
    at org.apache.hadoop.fs.FsShell.printToStdout(FsShell.java:114)
    at org.apache.hadoop.fs.FsShell.access$100(FsShell.java:49)
    at org.apache.hadoop.fs.FsShell$1.process(FsShell.java:352)
    at org.apache.hadoop.fs.FsShell$DelayedExceptionThrowing.globAndProcess(FsShell.java:1898)
    at org.apache.hadoop.fs.FsShell.cat(FsShell.java:346)


I looked at NN Logs for one of the file..

it showing

2012-10-29 15:26:02,560 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=null    ip=null    cmd=open    src=/user/hive/warehouse/vw_cc/000000_0    dst=null    perm=null
.
.
.
.

Please suggest

Regards
Yogesh Kumar



________________________________
From: Uma Maheswara Rao G [maheswara@huawei.com]
Sent: Monday, October 29, 2012 3:52 PM
To: user@hadoop.apache.org
Subject: RE: How to do HADOOP RECOVERY ???


Which version of Hadoop are you using?



Do you have all DNs running? can you check UI report, wehther all DN are a live?

Can you check the DN disks are good or not?

Can you grep the NN and DN logs with one of the corrupt blockID from below?



Regards,

Uma

________________________________
From: yogesh.kumar13@wipro.com [yogesh.kumar13@wipro.com]
Sent: Monday, October 29, 2012 2:03 PM
To: user@hadoop.apache.org
Subject: How to do HADOOP RECOVERY ???

Hi All,

I run this command

hadoop fsck -Ddfs.http.address=localhost:50070 /

and found that some blocks are missing and corrupted

results comes like..

/user/hive/warehouse/tt_report_htcount/000000_0: MISSING 2 blocks of total size 71826120 B..
/user/hive/warehouse/tt_report_perhour_hit/000000_0: CORRUPT block blk_75438572351073797

/user/hive/warehouse/tt_report_perhour_hit/000000_0: MISSING 1 blocks of total size 1531 B..
/user/hive/warehouse/vw_cc/000000_0: CORRUPT block blk_-1280621588594166706

/user/hive/warehouse/vw_cc/000000_0: MISSING 1 blocks of total size 1774 B..
/user/hive/warehouse/vw_report2/000000_0: CORRUPT block blk_8637186139854977656

/user/hive/warehouse/vw_report2/000000_0: CORRUPT block blk_4019541597438638886

/user/hive/warehouse/vw_report2/000000_0: MISSING 2 blocks of total size 71826120 B..
/user/zoo/foo.har/_index: CORRUPT block blk_3404803591387558276
.
.
.
.
.

Total size:    7600625746 B
 Total dirs:    205
 Total files:    173
 Total blocks (validated):    270 (avg. block size 28150465 B)
  ********************************
  CORRUPT FILES:    171
  MISSING BLOCKS:    269
  MISSING SIZE:        7600625742 B
  CORRUPT BLOCKS:     269
  ********************************
 Minimally replicated blocks:    1 (0.37037036 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:    0 (0.0 %)
 Mis-replicated blocks:        0 (0.0 %)
 Default replication factor:    1
 Average block replication:    0.0037037036
 Corrupt blocks:        269
 Missing replicas:        0 (0.0 %)
 Number of data-nodes:        1
 Number of racks:        1




Is there any way to recover them ?

Please help and suggest

Thanks & Regards
yogesh kumar

The information contained in this electronic message and any attachments to this message are intended for the exclusive use of the addressee(s) and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately and destroy all copies of this message and any attachments.

WARNING: Computer viruses can be transmitted via email. The recipient should check this email and any attachments for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email.

www.wipro.com

The information contained in this electronic message and any attachments to this message are intended for the exclusive use of the addressee(s) and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately and destroy all copies of this message and any attachments.

WARNING: Computer viruses can be transmitted via email. The recipient should check this email and any attachments for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email.

www.wipro.com

Please do not print this email unless it is absolutely necessary.

The information contained in this electronic message and any attachments to this message are intended for the exclusive use of the addressee(s) and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately and destroy all copies of this message and any attachments.

WARNING: Computer viruses can be transmitted via email. The recipient should check this email and any attachments for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email.

www.wipro.com

RE: How to do HADOOP RECOVERY ???

Posted by Uma Maheswara Rao G <ma...@huawei.com>.
If you backed up both data directory and namespace dirs correctly, configuring them back should work fine.

please check whether your configuration are getting applied properly once. for ex: I can see below dfs.data,dir

here ',' instead of '.' . this might be typo, I am just asking you relook crefully once your configs.



if you just backed up and configure the same dirs to a cluster back and starting would be just equal to restarting the cluster.



Regards,

Uma

________________________________
From: yogesh.kumar13@wipro.com [yogesh.kumar13@wipro.com]
Sent: Monday, October 29, 2012 5:43 PM
To: user@hadoop.apache.org
Subject: RE: How to do HADOOP RECOVERY ???

Hi Uma,

You are correct, when I start cluster it goes into safemode and if I do wait its doesn't come out.
I use  -safemode leave option.

Safe mode is ON. The ratio of reported blocks 0.0037 has not reached the threshold 0.9990. Safe mode will be turned off automatically.
379 files and directories, 270 blocks = 649 total. Heap Size is 81.06 MB / 991.69 MB (8%)


When I start fresh cluster mean..

I have saved the fs.name.dir and fs.data.dir seprately for back-up of old cluster(single node). and used old machine and new machine to start new cluster ( old machine acted as DN and newly added machine was acted as NN+DN). and at the same time I have given different directory location for dfs.name.dir and dfs.data.dir on old machine

Say when it was single node
 dfs.name.dir -->  /HADOOP/SINGLENODE/Name_Dir   && dfs.data,dir  --> /HADOOP/SINGLENODE/Data_Dir

when I used it with another machine as D.N
dfs.name.dir --> /HADOOP/MULTINODE/Name_Dir  && dfs.data.dir --> /HADOOP/MULTINODE/Data_Dir



Now I get back to previous stage.

Old Machine as single Node Cluster (NN + DN)
and gave the path for dfs.name.dir && dfs.data.dir (  dfs.name.dir -->  /HADOOP/SINGLENODE/Name_Dir   && dfs.data,dir  --> /HADOOP/SINGLENODE/Data_Dir)

I have saved namespace and data before configuring the multi node cluster with new machine.



It should work after giving the name space and data directory path in conf files of single node machine and should show the previous content, or I am wrong ??

Why Its it happening, and why is it not cumming from safe mode by itself

Please suggest

Regards
Yogesh Kumar
________________________________
From: Uma Maheswara Rao G [maheswara@huawei.com]
Sent: Monday, October 29, 2012 5:10 PM
To: user@hadoop.apache.org
Subject: RE: How to do HADOOP RECOVERY ???


I am not sure, I understood your scenario correctly here. Here is one possibility for this situation with your explained case.



>>I have saved the dfs.name.dir seprately, and started with fresh cluster...
  When you start fresh cluster, have you used same DNs? if so, blocks will be invalidated as your name space is fresh now(infact it can not register untill you clean the data dirs in DN as namespace id differs).

  Now, you are keeping the older image back and starting again. So, your older image will expect the enough blocks to be reported from DNs to start. Otherwise it will be in safe mode. How it is coming out of safemode?



or if you continue with the same cluster  and additionally you saved the namespace separately as a backup the current state, then added extra DN to the cluster refering as fresh cluster?

 In this case, if you delete any existing files, data blocks will be invalidated in DN.

 After this if you go back to older cluster with the backedup namespace, this deleted files infomation will not be known by by older image and it will expect the blocks to be report and if not blocks available for a file then that will be treated as corrupt.

>>I did -ls / operation and got this exception


>>mediaadmins-iMac-2:haadoop-0.20.2 mediaadmin$ HADOOP dfs -ls /user/hive/warehouse/vw_cc/
>>Found 1 items

ls will show because namespace has this info for this file. But DNs does not have any block related to it.

________________________________
From: yogesh.kumar13@wipro.com [yogesh.kumar13@wipro.com]
Sent: Monday, October 29, 2012 4:13 PM
To: user@hadoop.apache.org
Subject: RE: How to do HADOOP RECOVERY ???

Thanks Uma,

I am using hadoop-0.20.2 version.

UI shows.
Cluster Summary
379 files and directories, 270 blocks = 649 total. Heap Size is 81.06 MB / 991.69 MB (8%)

WARNING : There are about 270 missing blocks. Please check the log or run fsck.

Configured Capacity     :       465.44 GB
DFS Used        :       20 KB
Non DFS Used    :       439.37 GB
DFS Remaining   :       26.07 GB
DFS Used%       :       0 %
DFS Remaining%  :       5.6 %
Live Nodes<http://localhost:50070/dfsnodelist.jsp?whatNodes=LIVE>       :       1
Dead Nodes<http://localhost:50070/dfsnodelist.jsp?whatNodes=DEAD>       :       0


Firstly I have configured single node cluster and worked over it, after that I have added another machine and made another one as a master + worker and the fist machine as a worker only.

I have saved the dfs.name.dir seprately, and started with fresh cluster...

Now I have switched back to previous stage with single node with same old machine having single node cluster.
I have given the path for dfs.name.dir where I have kept that.

Now I am running and getting this.

I did -ls / operation and got this exception


mediaadmins-iMac-2:haadoop-0.20.2 mediaadmin$ HADOOP dfs -ls /user/hive/warehouse/vw_cc/
Found 1 items

-rw-r--r--   1 mediaadmin supergroup       1774 2012-10-17 16:15 /user/hive/warehouse/vw_cc/000000_0


mediaadmins-iMac-2:haadoop-0.20.2 mediaadmin$ HADOOP dfs -cat /user/hive/warehouse/vw_cc/000000_0


12/10/29 16:01:15 INFO hdfs.DFSClient: No node available for block: blk_-1280621588594166706_3595 file=/user/hive/warehouse/vw_cc/000000_0
12/10/29 16:01:15 INFO hdfs.DFSClient: Could not obtain block blk_-1280621588594166706_3595 from any node:  java.io.IOException: No live nodes contain current block
12/10/29 16:01:18 INFO hdfs.DFSClient: No node available for block: blk_-1280621588594166706_3595 file=/user/hive/warehouse/vw_cc/000000_0
12/10/29 16:01:18 INFO hdfs.DFSClient: Could not obtain block blk_-1280621588594166706_3595 from any node:  java.io.IOException: No live nodes contain current block
12/10/29 16:01:21 INFO hdfs.DFSClient: No node available for block: blk_-1280621588594166706_3595 file=/user/hive/warehouse/vw_cc/000000_0
12/10/29 16:01:21 INFO hdfs.DFSClient: Could not obtain block blk_-1280621588594166706_3595 from any node:  java.io.IOException: No live nodes contain current block
12/10/29 16:01:24 WARN hdfs.DFSClient: DFS Read: java.io.IOException: Could not obtain block: blk_-1280621588594166706_3595 file=/user/hive/warehouse/vw_cc/000000_0
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1812)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1638)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1767)
    at java.io.DataInputStream.read(DataInputStream.java:83)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:47)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
    at org.apache.hadoop.fs.FsShell.printToStdout(FsShell.java:114)
    at org.apache.hadoop.fs.FsShell.access$100(FsShell.java:49)
    at org.apache.hadoop.fs.FsShell$1.process(FsShell.java:352)
    at org.apache.hadoop.fs.FsShell$DelayedExceptionThrowing.globAndProcess(FsShell.java:1898)
    at org.apache.hadoop.fs.FsShell.cat(FsShell.java:346)


I looked at NN Logs for one of the file..

it showing

2012-10-29 15:26:02,560 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=null    ip=null    cmd=open    src=/user/hive/warehouse/vw_cc/000000_0    dst=null    perm=null
.
.
.
.

Please suggest

Regards
Yogesh Kumar



________________________________
From: Uma Maheswara Rao G [maheswara@huawei.com]
Sent: Monday, October 29, 2012 3:52 PM
To: user@hadoop.apache.org
Subject: RE: How to do HADOOP RECOVERY ???


Which version of Hadoop are you using?



Do you have all DNs running? can you check UI report, wehther all DN are a live?

Can you check the DN disks are good or not?

Can you grep the NN and DN logs with one of the corrupt blockID from below?



Regards,

Uma

________________________________
From: yogesh.kumar13@wipro.com [yogesh.kumar13@wipro.com]
Sent: Monday, October 29, 2012 2:03 PM
To: user@hadoop.apache.org
Subject: How to do HADOOP RECOVERY ???

Hi All,

I run this command

hadoop fsck -Ddfs.http.address=localhost:50070 /

and found that some blocks are missing and corrupted

results comes like..

/user/hive/warehouse/tt_report_htcount/000000_0: MISSING 2 blocks of total size 71826120 B..
/user/hive/warehouse/tt_report_perhour_hit/000000_0: CORRUPT block blk_75438572351073797

/user/hive/warehouse/tt_report_perhour_hit/000000_0: MISSING 1 blocks of total size 1531 B..
/user/hive/warehouse/vw_cc/000000_0: CORRUPT block blk_-1280621588594166706

/user/hive/warehouse/vw_cc/000000_0: MISSING 1 blocks of total size 1774 B..
/user/hive/warehouse/vw_report2/000000_0: CORRUPT block blk_8637186139854977656

/user/hive/warehouse/vw_report2/000000_0: CORRUPT block blk_4019541597438638886

/user/hive/warehouse/vw_report2/000000_0: MISSING 2 blocks of total size 71826120 B..
/user/zoo/foo.har/_index: CORRUPT block blk_3404803591387558276
.
.
.
.
.

Total size:    7600625746 B
 Total dirs:    205
 Total files:    173
 Total blocks (validated):    270 (avg. block size 28150465 B)
  ********************************
  CORRUPT FILES:    171
  MISSING BLOCKS:    269
  MISSING SIZE:        7600625742 B
  CORRUPT BLOCKS:     269
  ********************************
 Minimally replicated blocks:    1 (0.37037036 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:    0 (0.0 %)
 Mis-replicated blocks:        0 (0.0 %)
 Default replication factor:    1
 Average block replication:    0.0037037036
 Corrupt blocks:        269
 Missing replicas:        0 (0.0 %)
 Number of data-nodes:        1
 Number of racks:        1




Is there any way to recover them ?

Please help and suggest

Thanks & Regards
yogesh kumar

The information contained in this electronic message and any attachments to this message are intended for the exclusive use of the addressee(s) and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately and destroy all copies of this message and any attachments.

WARNING: Computer viruses can be transmitted via email. The recipient should check this email and any attachments for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email.

www.wipro.com

The information contained in this electronic message and any attachments to this message are intended for the exclusive use of the addressee(s) and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately and destroy all copies of this message and any attachments.

WARNING: Computer viruses can be transmitted via email. The recipient should check this email and any attachments for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email.

www.wipro.com

Please do not print this email unless it is absolutely necessary.

The information contained in this electronic message and any attachments to this message are intended for the exclusive use of the addressee(s) and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately and destroy all copies of this message and any attachments.

WARNING: Computer viruses can be transmitted via email. The recipient should check this email and any attachments for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email.

www.wipro.com

RE: How to do HADOOP RECOVERY ???

Posted by Uma Maheswara Rao G <ma...@huawei.com>.
If you backed up both data directory and namespace dirs correctly, configuring them back should work fine.

please check whether your configuration are getting applied properly once. for ex: I can see below dfs.data,dir

here ',' instead of '.' . this might be typo, I am just asking you relook crefully once your configs.



if you just backed up and configure the same dirs to a cluster back and starting would be just equal to restarting the cluster.



Regards,

Uma

________________________________
From: yogesh.kumar13@wipro.com [yogesh.kumar13@wipro.com]
Sent: Monday, October 29, 2012 5:43 PM
To: user@hadoop.apache.org
Subject: RE: How to do HADOOP RECOVERY ???

Hi Uma,

You are correct, when I start cluster it goes into safemode and if I do wait its doesn't come out.
I use  -safemode leave option.

Safe mode is ON. The ratio of reported blocks 0.0037 has not reached the threshold 0.9990. Safe mode will be turned off automatically.
379 files and directories, 270 blocks = 649 total. Heap Size is 81.06 MB / 991.69 MB (8%)


When I start fresh cluster mean..

I have saved the fs.name.dir and fs.data.dir seprately for back-up of old cluster(single node). and used old machine and new machine to start new cluster ( old machine acted as DN and newly added machine was acted as NN+DN). and at the same time I have given different directory location for dfs.name.dir and dfs.data.dir on old machine

Say when it was single node
 dfs.name.dir -->  /HADOOP/SINGLENODE/Name_Dir   && dfs.data,dir  --> /HADOOP/SINGLENODE/Data_Dir

when I used it with another machine as D.N
dfs.name.dir --> /HADOOP/MULTINODE/Name_Dir  && dfs.data.dir --> /HADOOP/MULTINODE/Data_Dir



Now I get back to previous stage.

Old Machine as single Node Cluster (NN + DN)
and gave the path for dfs.name.dir && dfs.data.dir (  dfs.name.dir -->  /HADOOP/SINGLENODE/Name_Dir   && dfs.data,dir  --> /HADOOP/SINGLENODE/Data_Dir)

I have saved namespace and data before configuring the multi node cluster with new machine.



It should work after giving the name space and data directory path in conf files of single node machine and should show the previous content, or I am wrong ??

Why Its it happening, and why is it not cumming from safe mode by itself

Please suggest

Regards
Yogesh Kumar
________________________________
From: Uma Maheswara Rao G [maheswara@huawei.com]
Sent: Monday, October 29, 2012 5:10 PM
To: user@hadoop.apache.org
Subject: RE: How to do HADOOP RECOVERY ???


I am not sure, I understood your scenario correctly here. Here is one possibility for this situation with your explained case.



>>I have saved the dfs.name.dir seprately, and started with fresh cluster...
  When you start fresh cluster, have you used same DNs? if so, blocks will be invalidated as your name space is fresh now(infact it can not register untill you clean the data dirs in DN as namespace id differs).

  Now, you are keeping the older image back and starting again. So, your older image will expect the enough blocks to be reported from DNs to start. Otherwise it will be in safe mode. How it is coming out of safemode?



or if you continue with the same cluster  and additionally you saved the namespace separately as a backup the current state, then added extra DN to the cluster refering as fresh cluster?

 In this case, if you delete any existing files, data blocks will be invalidated in DN.

 After this if you go back to older cluster with the backedup namespace, this deleted files infomation will not be known by by older image and it will expect the blocks to be report and if not blocks available for a file then that will be treated as corrupt.

>>I did -ls / operation and got this exception


>>mediaadmins-iMac-2:haadoop-0.20.2 mediaadmin$ HADOOP dfs -ls /user/hive/warehouse/vw_cc/
>>Found 1 items

ls will show because namespace has this info for this file. But DNs does not have any block related to it.

________________________________
From: yogesh.kumar13@wipro.com [yogesh.kumar13@wipro.com]
Sent: Monday, October 29, 2012 4:13 PM
To: user@hadoop.apache.org
Subject: RE: How to do HADOOP RECOVERY ???

Thanks Uma,

I am using hadoop-0.20.2 version.

UI shows.
Cluster Summary
379 files and directories, 270 blocks = 649 total. Heap Size is 81.06 MB / 991.69 MB (8%)

WARNING : There are about 270 missing blocks. Please check the log or run fsck.

Configured Capacity     :       465.44 GB
DFS Used        :       20 KB
Non DFS Used    :       439.37 GB
DFS Remaining   :       26.07 GB
DFS Used%       :       0 %
DFS Remaining%  :       5.6 %
Live Nodes<http://localhost:50070/dfsnodelist.jsp?whatNodes=LIVE>       :       1
Dead Nodes<http://localhost:50070/dfsnodelist.jsp?whatNodes=DEAD>       :       0


Firstly I have configured single node cluster and worked over it, after that I have added another machine and made another one as a master + worker and the fist machine as a worker only.

I have saved the dfs.name.dir seprately, and started with fresh cluster...

Now I have switched back to previous stage with single node with same old machine having single node cluster.
I have given the path for dfs.name.dir where I have kept that.

Now I am running and getting this.

I did -ls / operation and got this exception


mediaadmins-iMac-2:haadoop-0.20.2 mediaadmin$ HADOOP dfs -ls /user/hive/warehouse/vw_cc/
Found 1 items

-rw-r--r--   1 mediaadmin supergroup       1774 2012-10-17 16:15 /user/hive/warehouse/vw_cc/000000_0


mediaadmins-iMac-2:haadoop-0.20.2 mediaadmin$ HADOOP dfs -cat /user/hive/warehouse/vw_cc/000000_0


12/10/29 16:01:15 INFO hdfs.DFSClient: No node available for block: blk_-1280621588594166706_3595 file=/user/hive/warehouse/vw_cc/000000_0
12/10/29 16:01:15 INFO hdfs.DFSClient: Could not obtain block blk_-1280621588594166706_3595 from any node:  java.io.IOException: No live nodes contain current block
12/10/29 16:01:18 INFO hdfs.DFSClient: No node available for block: blk_-1280621588594166706_3595 file=/user/hive/warehouse/vw_cc/000000_0
12/10/29 16:01:18 INFO hdfs.DFSClient: Could not obtain block blk_-1280621588594166706_3595 from any node:  java.io.IOException: No live nodes contain current block
12/10/29 16:01:21 INFO hdfs.DFSClient: No node available for block: blk_-1280621588594166706_3595 file=/user/hive/warehouse/vw_cc/000000_0
12/10/29 16:01:21 INFO hdfs.DFSClient: Could not obtain block blk_-1280621588594166706_3595 from any node:  java.io.IOException: No live nodes contain current block
12/10/29 16:01:24 WARN hdfs.DFSClient: DFS Read: java.io.IOException: Could not obtain block: blk_-1280621588594166706_3595 file=/user/hive/warehouse/vw_cc/000000_0
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1812)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1638)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1767)
    at java.io.DataInputStream.read(DataInputStream.java:83)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:47)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
    at org.apache.hadoop.fs.FsShell.printToStdout(FsShell.java:114)
    at org.apache.hadoop.fs.FsShell.access$100(FsShell.java:49)
    at org.apache.hadoop.fs.FsShell$1.process(FsShell.java:352)
    at org.apache.hadoop.fs.FsShell$DelayedExceptionThrowing.globAndProcess(FsShell.java:1898)
    at org.apache.hadoop.fs.FsShell.cat(FsShell.java:346)


I looked at NN Logs for one of the file..

it showing

2012-10-29 15:26:02,560 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=null    ip=null    cmd=open    src=/user/hive/warehouse/vw_cc/000000_0    dst=null    perm=null
.
.
.
.

Please suggest

Regards
Yogesh Kumar



________________________________
From: Uma Maheswara Rao G [maheswara@huawei.com]
Sent: Monday, October 29, 2012 3:52 PM
To: user@hadoop.apache.org
Subject: RE: How to do HADOOP RECOVERY ???


Which version of Hadoop are you using?



Do you have all DNs running? can you check UI report, wehther all DN are a live?

Can you check the DN disks are good or not?

Can you grep the NN and DN logs with one of the corrupt blockID from below?



Regards,

Uma


RE: How to do HADOOP RECOVERY ???

Posted by yo...@wipro.com.
Hi Uma,

You are correct: when I start the cluster it goes into safe mode, and even if I wait it doesn't come out.
I use the -safemode leave option.

Safe mode is ON. The ratio of reported blocks 0.0037 has not reached the threshold 0.9990. Safe mode will be turned off automatically.
379 files and directories, 270 blocks = 649 total. Heap Size is 81.06 MB / 991.69 MB (8%)
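For reference, safe mode can be inspected and controlled from the shell with dfsadmin (standard in 0.20):

hadoop dfsadmin -safemode get      # report whether safe mode is ON or OFF
hadoop dfsadmin -safemode wait     # block until the NN leaves safe mode on its own
hadoop dfsadmin -safemode leave    # force the NN out of safe mode; this does not bring any blocks back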


By starting a fresh cluster, I mean the following:

I saved the dfs.name.dir and dfs.data.dir contents separately as a backup of the old (single-node) cluster, and used the old machine plus a new machine to start a new cluster (the old machine acted as a DN and the newly added machine acted as NN+DN). At the same time I gave different directory locations for dfs.name.dir and dfs.data.dir on the old machine.

Say, when it was a single node:
 dfs.name.dir -->  /HADOOP/SINGLENODE/Name_Dir   && dfs.data.dir  --> /HADOOP/SINGLENODE/Data_Dir

When I used it with another machine as a DN:
dfs.name.dir --> /HADOOP/MULTINODE/Name_Dir  && dfs.data.dir --> /HADOOP/MULTINODE/Data_Dir



Now I have gone back to the previous stage:

the old machine as a single-node cluster (NN + DN),
with the paths for dfs.name.dir && dfs.data.dir set back to ( dfs.name.dir --> /HADOOP/SINGLENODE/Name_Dir && dfs.data.dir --> /HADOOP/SINGLENODE/Data_Dir )
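As a sketch, that rollback amounts to pointing conf/hdfs-site.xml back at the old single-node directories (property names as in 0.20; the paths are the ones above):

<property>
  <name>dfs.name.dir</name>
  <value>/HADOOP/SINGLENODE/Name_Dir</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/HADOOP/SINGLENODE/Data_Dir</value>
</property>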

I saved the namespace and data directories before configuring the multi-node cluster with the new machine.



It should work after giving the namespace and data directory paths in the conf files of the single-node machine, and it should show the previous content, or am I wrong?

Why is this happening, and why is it not coming out of safe mode by itself?

Please suggest

Regards
Yogesh Kumar
________________________________
From: Uma Maheswara Rao G [maheswara@huawei.com]
Sent: Monday, October 29, 2012 5:10 PM
To: user@hadoop.apache.org
Subject: RE: How to do HADOOP RECOVERY ???


I am not sure I understood your scenario correctly here. Here is one possibility for this situation, based on what you have described.



>>I have saved the dfs.name.dir seprately, and started with fresh cluster...
  When you started the fresh cluster, did you use the same DNs? If so, the blocks will be invalidated because your namespace is fresh now (in fact the DNs cannot register until you clean their data dirs, as the namespace ID differs).

  Now you are putting the older image back and starting again, so your older image will expect enough blocks to be reported from the DNs to start; otherwise it will stay in safe mode. How is it coming out of safe mode?



Or did you continue with the same cluster, additionally save the namespace separately as a backup of the current state, and then add an extra DN to the cluster, referring to that as the fresh cluster?

 In this case, if you delete any existing files, the data blocks will be invalidated on the DNs.

 After this, if you go back to the older cluster with the backed-up namespace, the older image will not know about these deleted files; it will expect their blocks to be reported, and any file with no blocks available will be treated as corrupt.

>>I did -ls / operation and got this exception


>>mediaadmins-iMac-2:haadoop-0.20.2 mediaadmin$ HADOOP dfs -ls /user/hive/warehouse/vw_cc/
>>Found 1 items

ls will show it because the namespace has the metadata for this file, but the DNs do not have any block related to it.


RE: How to do HADOOP RECOVERY ???

Posted by Uma Maheswara Rao G <ma...@huawei.com>.
I am not sure I understood your scenario correctly here. Here is one possibility for this situation, based on what you have described.



>>I have saved the dfs.name.dir seprately, and started with fresh cluster...
  When you started the fresh cluster, did you use the same DNs? If so, the blocks will be invalidated because your namespace is fresh now (in fact the DNs cannot register until you clean their data dirs, as the namespace ID differs).
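A quick way to confirm such a mismatch is to compare the namespaceID recorded in the VERSION files of the name and data directories (paths taken from the earlier mails):

cat /HADOOP/SINGLENODE/Name_Dir/current/VERSION
cat /HADOOP/SINGLENODE/Data_Dir/current/VERSION

If the namespaceID values differ, the DN will not register with that NN.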

  Now you are putting the older image back and starting again, so your older image will expect enough blocks to be reported from the DNs to start; otherwise it will stay in safe mode. How is it coming out of safe mode?



Or did you continue with the same cluster, additionally save the namespace separately as a backup of the current state, and then add an extra DN to the cluster, referring to that as the fresh cluster?

 In this case, if you delete any existing files, the data blocks will be invalidated on the DNs.

 After this, if you go back to the older cluster with the backed-up namespace, the older image will not know about these deleted files; it will expect their blocks to be reported, and any file with no blocks available will be treated as corrupt.

>>I did -ls / operation and got this exception


>>mediaadmins-iMac-2:haadoop-0.20.2 mediaadmin$ HADOOP dfs -ls /user/hive/warehouse/vw_cc/
>>Found 1 items

ls will show it because the namespace has the metadata for this file, but the DNs do not have any block related to it.
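To see exactly which files the NN still tracks but no DN is reporting, fsck can be run with more detail (same tool as in the first mail):

hadoop fsck / -files -blocks -locations

Blocks listed with no locations are exactly this case. fsck also accepts -move and -delete for corrupt files, but those only move them to /lost+found or remove them; they do not recover data.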


RE: How to do HADOOP RECOVERY ???

Posted by Uma Maheswara Rao G <ma...@huawei.com>.
I am not sure I understood your scenario correctly here. Here is one possibility for this situation, based on what you have described.



>>I have saved the dfs.name.dir seprately, and started with fresh cluster...
  When you started the fresh cluster, did you use the same DNs? If so, the blocks will have been invalidated because your namespace is fresh now (in fact the DNs cannot even register until you clean their data dirs, since the namespace ID differs).
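
A quick way to confirm such a namespace-ID mismatch is to compare the VERSION files on both sides (the paths below are placeholders for your actual dfs.name.dir and dfs.data.dir):

# On the NameNode: namespace ID of the image you are starting from
grep namespaceID /path/to/dfs/name/current/VERSION

# On the DataNode: namespace ID the data dir was formatted with
grep namespaceID /path/to/dfs/data/current/VERSION

# If the two values differ, the DN cannot register with this NameNode.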

  Now you are bringing the older image back and starting again. So your older image will expect enough blocks to be reported from the DNs at startup; otherwise it will stay in safe mode. How is it coming out of safemode?
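
The safe-mode state and the number of registered DNs can be checked from the command line, for example:

hadoop dfsadmin -safemode get    # reports whether the NN is still in safe mode
hadoop dfsadmin -report          # lists the DNs that have registered and their usage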



Or did you continue with the same cluster, additionally save the namespace separately as a backup of its current state, and then add an extra DN to the cluster, referring to that as the fresh cluster?

 In this case, if you delete any existing files, the data blocks will be invalidated on the DNs.

 After this, if you go back to the older cluster with the backed-up namespace, the older image will not know about those deleted files; it will still expect their blocks to be reported, and if no blocks are available for a file, that file will be treated as corrupt.

>>I did -ls / operation and got this exception


>>mediaadmins-iMac-2:haadoop-0.20.2 mediaadmin$ HADOOP dfs -ls /user/hive/warehouse/vw_cc/
>>Found 1 items

ls will show it because the namespace has the metadata for this file, but the DNs do not have any block belonging to it.
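
A sketch of how one might check whether such a block still exists anywhere on disk, and what fsck can do when it does not (the data-dir path is a placeholder):

# On the DataNode, search the data dir for one of the missing blocks
find /path/to/dfs/data -name 'blk_-1280621588594166706*'

# If nothing is found and no copy of the old dfs.data.dir exists,
# fsck can only clean up the namespace, not bring the data back:
hadoop fsck / -move      # move the remains of corrupt files to /lost+found
hadoop fsck / -delete    # or drop the corrupt files from the namespace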

RE: How to do HADOOP RECOVERY ???

Posted by yo...@wipro.com.
Thanks Uma,

I am using hadoop-0.20.2 version.

UI shows.
Cluster Summary
379 files and directories, 270 blocks = 649 total. Heap Size is 81.06 MB / 991.69 MB (8%)

WARNING : There are about 270 missing blocks. Please check the log or run fsck.

Configured Capacity     :       465.44 GB
DFS Used        :       20 KB
Non DFS Used    :       439.37 GB
DFS Remaining   :       26.07 GB
DFS Used%       :       0 %
DFS Remaining%  :       5.6 %
Live Nodes<http://localhost:50070/dfsnodelist.jsp?whatNodes=LIVE>       :       1
Dead Nodes<http://localhost:50070/dfsnodelist.jsp?whatNodes=DEAD>       :       0


Initially I had configured a single-node cluster and worked on it; after that I added another machine, made the new one a master + worker, and made the first machine a worker only.

I saved the dfs.name.dir separately, and started with a fresh cluster...

Now I have switched back to the previous stage: a single-node cluster on the same old machine.
I have pointed dfs.name.dir at the path where I had kept that saved copy.

Now I am running it and getting this.

I did an -ls operation and got this exception:


mediaadmins-iMac-2:haadoop-0.20.2 mediaadmin$ HADOOP dfs -ls /user/hive/warehouse/vw_cc/
Found 1 items

-rw-r--r--   1 mediaadmin supergroup       1774 2012-10-17 16:15 /user/hive/warehouse/vw_cc/000000_0


mediaadmins-iMac-2:haadoop-0.20.2 mediaadmin$ HADOOP dfs -cat /user/hive/warehouse/vw_cc/000000_0


12/10/29 16:01:15 INFO hdfs.DFSClient: No node available for block: blk_-1280621588594166706_3595 file=/user/hive/warehouse/vw_cc/000000_0
12/10/29 16:01:15 INFO hdfs.DFSClient: Could not obtain block blk_-1280621588594166706_3595 from any node:  java.io.IOException: No live nodes contain current block
12/10/29 16:01:18 INFO hdfs.DFSClient: No node available for block: blk_-1280621588594166706_3595 file=/user/hive/warehouse/vw_cc/000000_0
12/10/29 16:01:18 INFO hdfs.DFSClient: Could not obtain block blk_-1280621588594166706_3595 from any node:  java.io.IOException: No live nodes contain current block
12/10/29 16:01:21 INFO hdfs.DFSClient: No node available for block: blk_-1280621588594166706_3595 file=/user/hive/warehouse/vw_cc/000000_0
12/10/29 16:01:21 INFO hdfs.DFSClient: Could not obtain block blk_-1280621588594166706_3595 from any node:  java.io.IOException: No live nodes contain current block
12/10/29 16:01:24 WARN hdfs.DFSClient: DFS Read: java.io.IOException: Could not obtain block: blk_-1280621588594166706_3595 file=/user/hive/warehouse/vw_cc/000000_0
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1812)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1638)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1767)
    at java.io.DataInputStream.read(DataInputStream.java:83)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:47)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
    at org.apache.hadoop.fs.FsShell.printToStdout(FsShell.java:114)
    at org.apache.hadoop.fs.FsShell.access$100(FsShell.java:49)
    at org.apache.hadoop.fs.FsShell$1.process(FsShell.java:352)
    at org.apache.hadoop.fs.FsShell$DelayedExceptionThrowing.globAndProcess(FsShell.java:1898)
    at org.apache.hadoop.fs.FsShell.cat(FsShell.java:346)


I looked at the NN logs for one of the files.

It shows:

2012-10-29 15:26:02,560 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=null    ip=null    cmd=open    src=/user/hive/warehouse/vw_cc/000000_0    dst=null    perm=null
.
.
.
.

Please suggest

Regards
Yogesh Kumar
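
For one of the affected files, a more detailed fsck run can show which blocks the namespace expects and where it thinks their replicas should be, for example:

hadoop fsck /user/hive/warehouse/vw_cc/000000_0 -files -blocks -locations
# With no live replicas, each block is reported as MISSING and no locations are listed.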



________________________________
From: Uma Maheswara Rao G [maheswara@huawei.com]
Sent: Monday, October 29, 2012 3:52 PM
To: user@hadoop.apache.org
Subject: RE: How to do HADOOP RECOVERY ???


Which version of Hadoop are you using?



Do you have all DNs running? can you check UI report, wehther all DN are a live?

Can you check the DN disks are good or not?

Can you grep the NN and DN logs with one of the corrupt blockID from below?



Regards,

Uma

________________________________
From: yogesh.kumar13@wipro.com [yogesh.kumar13@wipro.com]
Sent: Monday, October 29, 2012 2:03 PM
To: user@hadoop.apache.org
Subject: How to do HADOOP RECOVERY ???

Hi All,

I run this command

hadoop fsck -Ddfs.http.address=localhost:50070 /

and found that some blocks are missing and corrupted

results comes like..

/user/hive/warehouse/tt_report_htcount/000000_0: MISSING 2 blocks of total size 71826120 B..
/user/hive/warehouse/tt_report_perhour_hit/000000_0: CORRUPT block blk_75438572351073797

/user/hive/warehouse/tt_report_perhour_hit/000000_0: MISSING 1 blocks of total size 1531 B..
/user/hive/warehouse/vw_cc/000000_0: CORRUPT block blk_-1280621588594166706

/user/hive/warehouse/vw_cc/000000_0: MISSING 1 blocks of total size 1774 B..
/user/hive/warehouse/vw_report2/000000_0: CORRUPT block blk_8637186139854977656

/user/hive/warehouse/vw_report2/000000_0: CORRUPT block blk_4019541597438638886

/user/hive/warehouse/vw_report2/000000_0: MISSING 2 blocks of total size 71826120 B..
/user/zoo/foo.har/_index: CORRUPT block blk_3404803591387558276
.
.
.
.
.

Total size:    7600625746 B
 Total dirs:    205
 Total files:    173
 Total blocks (validated):    270 (avg. block size 28150465 B)
  ********************************
  CORRUPT FILES:    171
  MISSING BLOCKS:    269
  MISSING SIZE:        7600625742 B
  CORRUPT BLOCKS:     269
  ********************************
 Minimally replicated blocks:    1 (0.37037036 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:    0 (0.0 %)
 Mis-replicated blocks:        0 (0.0 %)
 Default replication factor:    1
 Average block replication:    0.0037037036
 Corrupt blocks:        269
 Missing replicas:        0 (0.0 %)
 Number of data-nodes:        1
 Number of racks:        1




Is there any way to recover them ?

Please help and suggest

Thanks & Regards
yogesh kumar
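
(Aside: fsck can also be pointed at a single path and asked to print each file, its block IDs, and whatever replica locations the NameNode still knows about; that helps pin down whether the one live DataNode was ever expected to hold these blocks. A sketch, reusing the same generic option as in the original command:

    hadoop fsck -Ddfs.http.address=localhost:50070 /user/hive/warehouse -files -blocks -locations
)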


RE: How to do HADOOP RECOVERY ???

Posted by Uma Maheswara Rao G <ma...@huawei.com>.
Which version of Hadoop are you using?



Do you have all DNs running? Can you check the UI report to see whether all the DNs are alive?

Can you check whether the DN disks are good?

Can you grep the NN and DN logs for one of the corrupt block IDs listed below?
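
If the block files really are gone from every dfs.data.dir, then with replication factor 1 there is no other copy to rebuild from; about all fsck can do at that point is quarantine or drop the affected entries. A destructive last resort (sketch only, after the data is confirmed unrecoverable):

    hadoop fsck / -move      # move files with missing blocks into /lost+found
    hadoop fsck / -delete    # drop files with missing blocks from the namespace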



Regards,

Uma

________________________________
From: yogesh.kumar13@wipro.com [yogesh.kumar13@wipro.com]
Sent: Monday, October 29, 2012 2:03 PM
To: user@hadoop.apache.org
Subject: How to do HADOOP RECOVERY ???

Hi All,

I ran this command:

hadoop fsck -Ddfs.http.address=localhost:50070 /

and found that some blocks are missing or corrupt.

The results look like this:

/user/hive/warehouse/tt_report_htcount/000000_0: MISSING 2 blocks of total size 71826120 B..
/user/hive/warehouse/tt_report_perhour_hit/000000_0: CORRUPT block blk_75438572351073797

/user/hive/warehouse/tt_report_perhour_hit/000000_0: MISSING 1 blocks of total size 1531 B..
/user/hive/warehouse/vw_cc/000000_0: CORRUPT block blk_-1280621588594166706

/user/hive/warehouse/vw_cc/000000_0: MISSING 1 blocks of total size 1774 B..
/user/hive/warehouse/vw_report2/000000_0: CORRUPT block blk_8637186139854977656

/user/hive/warehouse/vw_report2/000000_0: CORRUPT block blk_4019541597438638886

/user/hive/warehouse/vw_report2/000000_0: MISSING 2 blocks of total size 71826120 B..
/user/zoo/foo.har/_index: CORRUPT block blk_3404803591387558276
.
.
.
.
.

Total size:    7600625746 B
 Total dirs:    205
 Total files:    173
 Total blocks (validated):    270 (avg. block size 28150465 B)
  ********************************
  CORRUPT FILES:    171
  MISSING BLOCKS:    269
  MISSING SIZE:        7600625742 B
  CORRUPT BLOCKS:     269
  ********************************
 Minimally replicated blocks:    1 (0.37037036 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:    0 (0.0 %)
 Mis-replicated blocks:        0 (0.0 %)
 Default replication factor:    1
 Average block replication:    0.0037037036
 Corrupt blocks:        269
 Missing replicas:        0 (0.0 %)
 Number of data-nodes:        1
 Number of racks:        1




Is there any way to recover them ?

Please help and suggest

Thanks & Regards
yogesh kumar
