Posted to common-user@hadoop.apache.org by Mayuran Yogarajah <ma...@casalemedia.com> on 2009/03/10 01:20:37 UTC

HDFS is corrupt, need to salvage the data.

Hello, it seems the HDFS in my cluster is corrupt.  This is the output 
from hadoop fsck:
 Total size:    9196815693 B
 Total dirs:    17
 Total files:   157
 Total blocks:  157 (avg. block size 58578443 B)
  ********************************
  CORRUPT FILES:        157
  MISSING BLOCKS:       157
  MISSING SIZE:         9196815693 B
  ********************************
 Minimally replicated blocks:   0 (0.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    1
 Average block replication:     0.0
 Missing replicas:              0
 Number of data-nodes:          1
 Number of racks:               1

It seems to say that every file in the cluster consists of a single block,
and that every one of those blocks is missing.
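
For reference, output in this shape comes from running fsck against the
root of the filesystem. A minimal invocation, assuming an 0.18-era release
(flags vary slightly by version), looks like:

# fsck is read-only; it reports block state but changes nothing
hadoop fsck / -files -blocks -locations

The -blocks and -locations options also print the block IDs and the
datanodes expected to hold them, which helps with the debugging further
down the thread.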

I'm not sure how to proceed, so any guidance would be much appreciated. My
primary concern is recovering the data.

thanks

Re: HDFS is corrupt, need to salvage the data.

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
Mayuran,

Going through each debugging step one at a time over the mailing list takes
a lot of iterations and a very long time. Maybe a JIRA is a better place
for this.

- Run fsck with the -blocks option.

- Check whether those block IDs match the IDs in the file names found by
'find'.

- Check which directory these files are in, and verify that it matches the
datanode's configured data directory (a sketch of all three steps follows
below).
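
A hedged sketch of those three steps as shell commands; /path/to/dfs/data
is a placeholder for whatever dfs.data.dir is actually set to, not a real
path:

# 1. Block IDs the namenode expects (fsck is read-only).
hadoop fsck / -files -blocks | grep -o 'blk_[0-9-]*' | sort -u > /tmp/fsck-ids

# 2. Block IDs actually on the datanode disk; .meta checksum files skipped.
find /path/to/dfs/data -name 'blk_*' ! -name '*.meta' \
    -exec basename {} \; | sort -u > /tmp/disk-ids

# 3. IDs the namenode expects but the disk does not have.
comm -23 /tmp/fsck-ids /tmp/disk-ids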

You are saying there is nothing wrong in the log files, but does that mean
the datanode actually reports those 157 missing blocks? Maybe you should
post the log, or verify that yourself. If the DN really is working
correctly, then you should not have 100% of your blocks missing.
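
One hedged way to check that instead of eyeballing the log; the log
location is an assumption ($HADOOP_HOME/logs is only a common default):

# Recent errors or exceptions in the datanode log, if any.
grep -iE 'error|exception' $HADOOP_HOME/logs/*datanode*.log | tail -50

# What the datanode said about its blocks around startup.
grep -i 'block' $HADOOP_HOME/logs/*datanode*.log | tail -20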

There are many possibilities; without more information it is not easy for
me to pick out the right one in your case, or to list all the possible
conditions.

Raghu.

Mayuran Yogarajah wrote:
> Mayuran Yogarajah wrote:
>> Raghu Angadi wrote:
>>  
>>> The block files usually don't disappear easily. Check whether you can
>>> find any files starting with "blk" on the datanode. Also check the
>>> datanode log to see what happened there... maybe it was started on a
>>> different directory or something like that.
>>>
>>> Raghu.
>>>
>>>     
>>
>> There are indeed blk files:
>> find -name 'blk*' | wc -l
>> 158
>>
>> I didn't see anything out of the ordinary in the datanode log.
>>
>> At this point, is there anything I can do to recover the files? Or do I
>> need to reformat the data node and load the data in again?
>>
>> thanks
>>   
> Sorry to resend this, but I didn't receive a response and wanted to know
> how to proceed.
> Is it possible to recover the data at this stage? Or is it gone?
> 
> thanks


Re: HDFS is corrupt, need to salvage the data.

Posted by Mayuran Yogarajah <ma...@casalemedia.com>.
Mayuran Yogarajah wrote:
> Raghu Angadi wrote:
>   
>> The block files usually don't disappear easily. Check whether you can
>> find any files starting with "blk" on the datanode. Also check the
>> datanode log to see what happened there... maybe it was started on a
>> different directory or something like that.
>>
>> Raghu.
>>
>>     
>
> There are indeed blk files:
> find -name 'blk*' | wc -l
> 158
>
> I didn't see anything out of the ordinary in the datanode log.
>
> At this point, is there anything I can do to recover the files? Or do I
> need to reformat the data node and load the data in again?
>
> thanks
>   
Sorry to resend this, but I didn't receive a response and wanted to know
how to proceed.
Is it possible to recover the data at this stage? Or is it gone?

thanks

Re: HDFS is corrupt, need to salvage the data.

Posted by Mayuran Yogarajah <ma...@casalemedia.com>.
Raghu Angadi wrote:
> The block files usually don't disappear easily. Check whether you can
> find any files starting with "blk" on the datanode. Also check the
> datanode log to see what happened there... maybe it was started on a
> different directory or something like that.
>
> Raghu.
>   

There are indeed blk files:
find -name 'blk*' | wc -l
158
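
One detail worth a second look: fsck reports 157 blocks, yet find matched
158 files. On disk each block normally has both a blk_<id> data file and a
blk_<id>*.meta checksum file, so counting only the data files (the path is
again a placeholder for dfs.data.dir) is more telling:

# Count block data files only; exclude the .meta checksum files.
find /path/to/dfs/data -name 'blk_*' ! -name '*.meta' | wc -l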

I didn't see anything out of the ordinary in the datanode log.

At this point, is there anything I can do to recover the files? Or do I
need to reformat the data node and load the data in again?

thanks

Re: HDFS is corrupt, need to salvage the data.

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
Mayuran Yogarajah wrote:
> lohit wrote:
>> How many Datanodes do you have?
>> From the output it looks like at the point when you ran fsck, you had
>> only one datanode connected to your NameNode. Did you have others?
>> Also, I see that your default replication is set to 1. Can you check
>> whether your datanodes are up and running?
>> Lohit
>>
>>
>>   
> There is only one data node at the moment. Does this mean the data is
> not recoverable?
> The hard drive on the machine seems fine, so I'm a little confused as to
> what caused the HDFS to become corrupted.

The block files usually don't disappear easily. Check whether you can find
any files starting with "blk" on the datanode. Also check the datanode log
to see what happened there... maybe it was started on a different directory
or something like that.
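
A hedged way to check the "started on a different directory" theory: see
what storage directory the datanode is configured with. The property name
dfs.data.dir and the hadoop-site.xml location are assumptions for an
0.18-era install:

# Explicit storage directory, if one is set.
grep -B 1 -A 2 'dfs.data.dir' $HADOOP_HOME/conf/hadoop-site.xml

# If unset, the default is ${hadoop.tmp.dir}/dfs/data, and hadoop.tmp.dir
# itself defaults to /tmp/hadoop-${user.name}, so blocks written under /tmp
# can be orphaned by a restart with different settings.
grep -B 1 -A 2 'hadoop.tmp.dir' $HADOOP_HOME/conf/hadoop-site.xml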

Raghu.

Re: HDFS is corrupt, need to salvage the data.

Posted by Mayuran Yogarajah <ma...@casalemedia.com>.
lohit wrote:
> How many Datanodes do you have?
> From the output it looks like at the point when you ran fsck, you had only one datanode connected to your NameNode. Did you have others?
> Also, I see that your default replication is set to 1. Can you check whether your datanodes are up and running?
> Lohit
>
>
>   
There is only one data node at the moment. Does this mean the data is
not recoverable?
The hard drive on the machine seems fine, so I'm a little confused as to
what caused the HDFS to become corrupted.

M

Re: HDFS is corrupt, need to salvage the data.

Posted by lohit <lo...@yahoo.com>.
How many Datanodes do you have?
From the output it looks like at the point when you ran fsck, you had only one datanode connected to your NameNode. Did you have others?
Also, I see that your default replication is set to 1. Can you check whether your datanodes are up and running?
Lohit
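
A quick, hedged way to check that from the command line on releases of
this era:

# Prints cluster capacity plus a per-datanode section; a healthy node
# shows up under "Datanodes available" with a recent "Last contact" time.
hadoop dfsadmin -report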



----- Original Message ----
From: Mayuran Yogarajah <ma...@casalemedia.com>
To: core-user@hadoop.apache.org
Sent: Monday, March 9, 2009 5:20:37 PM
Subject: HDFS is corrupt, need to salvage the data.

Hello, it seems the HDFS in my cluster is corrupt.  This is the output from hadoop fsck:
Total size:    9196815693 B
Total dirs:    17
Total files:   157
Total blocks:  157 (avg. block size 58578443 B)
********************************
CORRUPT FILES:        157
MISSING BLOCKS:       157
MISSING SIZE:         9196815693 B
********************************
Minimally replicated blocks:   0 (0.0 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       0 (0.0 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    1
Average block replication:     0.0
Missing replicas:              0
Number of data-nodes:          1
Number of racks:               1

It seems to say that every file in the cluster consists of a single block, and that every one of those blocks is missing.

I'm not sure how to proceed, so any guidance would be much appreciated. My primary concern is recovering the data.

thanks