Posted to hdfs-user@hadoop.apache.org by Hs <as...@gmail.com> on 2012/11/24 03:12:57 UTC

Is it possible to read a corrupted Sequence File

Hi,

I am running hadoop-1.0.3 and hbase-0.94.0 on a 12-node cluster. Due to an
unknown operational fault, 6 datanodes have suffered a complete data loss
(the HDFS data directory is gone). When I restart hadoop, it reports "The
ratio of reported blocks 0.8252".

I have a folder in HDFS containing many important files in hadoop
SequenceFile format. The hadoop fsck tool shows the following for this folder:

Total size:    134867556461 B
 Total dirs:    16
 Total files:   251
 Total blocks (validated):      2136 (avg. block size 63140241 B)
  ********************************
  CORRUPT FILES:        167
  MISSING BLOCKS:       405
  MISSING SIZE:         25819446263 B
  CORRUPT BLOCKS:       405
  ********************************

I wonder if I can read these corrupted SequenceFiles with the missing blocks
skipped? Or, what else can I do now to recover as much of these
SequenceFiles as possible?

Please save me.

Thanks !

(Sorry for duplicating this post on the user and hdfs-dev lists; I did not
know exactly where I should put it.)
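[Editor's note: before writing custom recovery code, it may be worth trying HDFS's own tooling first. The commands below are a hedged sketch against the Hadoop 1.x CLI; the path /data/seqfiles is a placeholder for the folder in question.]

```shell
# Enumerate exactly which files under the folder have corrupt/missing blocks.
hadoop fsck /data/seqfiles -list-corruptfileblocks

# Try copying a damaged file out while ignoring checksum failures. This
# only helps when blocks are present but fail their CRC check; it cannot
# conjure up blocks that are missing entirely.
hadoop fs -copyToLocal -ignoreCrc /data/seqfiles/part-00000 ./part-00000

# As a last resort, -move relocates corrupt files into /lost+found
# (and -delete removes them). Both change namespace state, so salvage
# whatever is readable before running either.
hadoop fsck /data/seqfiles -move
```

Note that fsck here is diagnostic and bookkeeping only: it will not reconstruct the 405 missing blocks, since their only replicas lived on the lost datanodes.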

Re: Is it possible to read a corrupted Sequence File

Posted by Mohit Anchlia <mo...@gmail.com>.
I guess one way might be to write your own DFS reader that ignores the exceptions and reads whatever it can.
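[Editor's note: the exception-ignoring reader suggested above could be sketched roughly as follows against the Hadoop 1.x SequenceFile API. This is an untested, best-effort salvage sketch, not a vetted tool: on a read failure it seeks ahead one HDFS block and uses SequenceFile's sync markers to realign on the next record boundary. Records inside missing blocks are lost by construction.]

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class SalvageSequenceFile {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    Path in = new Path(args[0]);
    FileSystem fs = in.getFileSystem(conf);
    long fileLen = fs.getFileStatus(in).getLen();
    long blockSize = fs.getFileStatus(in).getBlockSize();

    SequenceFile.Reader reader = new SequenceFile.Reader(fs, in, conf);
    Writable key =
        (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
    Writable value =
        (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);

    long recovered = 0;
    long pos = 0;
    while (pos < fileLen) {
      try {
        if (!reader.next(key, value)) break;   // clean end of file
        recovered++;                           // record read OK: keep/emit it here
        pos = reader.getPosition();
      } catch (IOException e) {                // missing block / checksum error
        // Skip past the unreadable region: jump ahead one HDFS block from
        // the last good position, then realign on the next sync marker.
        long skipTo = Math.min(pos + blockSize, fileLen);
        if (skipTo >= fileLen) break;
        reader.sync(skipTo);                   // may itself throw if the
        pos = reader.getPosition();            // next block is also missing
      }
    }
    reader.close();
    System.out.println("records recovered: " + recovered);
  }
}
```

One caveat on this sketch: if the sync() call lands inside another missing block it will throw out of main, so a hardened version would also wrap the skip in a try/catch and advance again. Whether a record that straddles a lost block is silently truncated or raises an error depends on where the corruption falls relative to the sync markers, so the recovered output should be sanity-checked.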

Sent from my iPad

On Nov 23, 2012, at 6:12 PM, Hs <as...@gmail.com> wrote:

> Hi,
> 
> I am running hadoop 1.0.3 and hbase-0.94.0on a 12-node cluster. For unknown operational faults, 6 datanodes  have suffered a complete data loss(hdfs data directory gone).  When I restart hadoop, it reports "The ratio of reported blocks 0.8252".
> 
> I have a folder in hdfs containing many important files in hadoop SequenceFile format. The hadoop fsck tool shows that  (in this folder) 
> 
> Total size:    134867556461 B
>  Total dirs:    16
>  Total files:   251
>  Total blocks (validated):      2136 (avg. block size 63140241 B)
>   ********************************
>   CORRUPT FILES:        167
>   MISSING BLOCKS:       405
>   MISSING SIZE:         25819446263 B
>   CORRUPT BLOCKS:       405
>   ********************************
> 
> I wonder if I can read these corrupted SequenceFiles with missing blocks skipped ?  Or, what else can I do now to recover these SequenceFiles as much as possible ? 
> 
> Please save me.
> 
> Thanks !
> 
> (Sorry for duplicating this post on user and hdfs-dev list, I do not know where exactly i should put it.)
