Posted to common-user@hadoop.apache.org by Joydeep Sen Sarma <js...@facebook.com> on 2008/01/08 17:51:05 UTC

missing VERSION files leading to failed datanodes

2008-01-08 08:36:20,045 ERROR org.apache.hadoop.dfs.DataNode: org.apache.hadoop.dfs.InconsistentFSStateException: Directory /var/hadoop/tmp/dfs/data is in an inconsistent state: file VERSION is invalid.

[root@hadoop034.sf2p data]# ssh hadoop003.sf2p cat /var/hadoop/tmp/dfs/data/current/VERSION 
[root@hadoop034.sf2p data]# 

any idea why the VERSION file is empty? and how can i regenerate it - or ask the system to generate a new one without discarding all the blocks?


i had previously shut down and restarted dfs once (to debug a different bug where it's not honoring du.reserved - more on that later).

help appreciated,

Joydeep

Re: missing VERSION files leading to failed datanodes

Posted by Konstantin Shvachko <sh...@yahoo-inc.com>.
Joydeep Sen Sarma wrote:
> i don't see a previous dir right now. i am pretty sure it wasn't there earlier either.
> 
> i can send u the ls output offline - but there's nothing in it other than tons of 'subdir*' 

I am just checking whether there was an ongoing upgrade at that time.
If not, then there is no recovery.

> we know how it happened (disk was full due to a separate bug. restart could not flush VERSION and it went missing,

Yes, this is exactly what HADOOP-2073 fixed.

> subsequent restarts failed). what kind of automatic recovery were u expecting? 
> (perhaps there's some option we should be setting, but are not).

I can only recommend using the -upgrade option in suspicious cases, even if you are not actually upgrading the software.
The upgrade creates a "snapshot" so that you can roll back if something goes wrong during the startup.
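
For reference, the sequence is roughly this (a sketch based on the
0.14/0.15-era scripts; adjust the paths to your installation):

# start HDFS while taking a snapshot of the storage directories;
# each datanode keeps its old state under .../previous
bin/start-dfs.sh -upgrade

# if the startup goes wrong, revert to the snapshot
bin/start-dfs.sh -rollback

# once everything looks healthy, discard the snapshot
bin/hadoop dfsadmin -finalizeUpgrade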

You did the right thing by recovering the version file.
Thanks,
--Konstantin


RE: missing VERSION files leading to failed datanodes

Posted by Joydeep Sen Sarma <js...@facebook.com>.
i don't see a previous dir right now. i am pretty sure it wasn't there earlier either.

i can send u the ls output offline - but there's nothing in it other than tons of 'subdir*' 

we know how it happened (disk was full due to a separate bug. restart could not flush VERSION and it went missing, subsequent restarts failed). what kind of automatic recovery were u expecting? (perhaps there's some option we should be setting, but are not).


Re: missing VERSION files leading to failed datanodes

Posted by Konstantin Shvachko <sh...@yahoo-inc.com>.
Joydeep,

Do you still have the previous directory? It should be
/var/hadoop/tmp/dfs/data/previous

If you do, you can use the VERSION file from there.
If not, could you please run ls -R /var/hadoop/tmp/dfs/data for me?
The block files are not needed, of course.
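
For example, restoring it would be something like this (assuming the
single data directory above):

cp /var/hadoop/tmp/dfs/data/previous/VERSION \
   /var/hadoop/tmp/dfs/data/current/VERSION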

In any case I am interested in how it happened and why automatic recovery is not happening.
Do you have any log messages from the time the data-node first failed?
Was it upgrading at that time?
Any information would be useful.

Thank you,
--Konstantin


Joydeep Sen Sarma wrote:
> we are running 0.14.4
> 
> the fix won't help me recover the current version files. all i need is the storageid. it seems to be stored in some file header somewhere. can u tell me how to get it?

Re: missing VERSION files leading to failed datanodes

Posted by Ted Dunning <td...@veoh.com>.
Can you put this on the wiki or as a comment on the jira?  This could be (as
you just noticed) a life-saver.


On 1/8/08 10:48 AM, "Joydeep Sen Sarma" <js...@facebook.com> wrote:

> never mind. the storageID is logged in the namenode logs. i am able to restore
> the version files and add the datanodes back.
> 
> phew.


RE: missing VERSION files leading to failed datanodes

Posted by Joydeep Sen Sarma <js...@facebook.com>.
never mind. the storageID is logged in the namenode logs. i am able to restore the version files and add the datanodes back.

phew.
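
For the archives, the recovery amounts to something like the following
sketch. The log path, the grep pattern, and every field value below are
illustrative - take the real storageID from the namenode log and the
remaining fields from a healthy datanode's VERSION file:

# 1. find the storageID the namenode recorded for this datanode
#    (logged when the node registers; the format is
#    DS-<random>-<ip>-<port>-<timestamp>)
grep 'DS-' /var/log/hadoop/hadoop-namenode-*.log | grep hadoop003

# 2. rebuild current/VERSION with that storageID. write a temp file
#    and rename it, so that a full disk cannot leave a half-written
#    VERSION behind again.
cat > /var/hadoop/tmp/dfs/data/current/VERSION.tmp <<EOF
namespaceID=1039763326
storageID=DS-123456789-10.1.2.3-50010-1199810000000
cTime=0
storageType=DATA_NODE
layoutVersion=-7
EOF
mv /var/hadoop/tmp/dfs/data/current/VERSION.tmp \
   /var/hadoop/tmp/dfs/data/current/VERSION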

RE: missing VERSION files leading to failed datanodes

Posted by Joydeep Sen Sarma <js...@facebook.com>.
we are running 0.14.4

the fix won't help me recover the current version files. all i need is the storageid. it seems to be stored in some file header somewhere. can u tell me how to get it?


Re: missing VERSION files leading to failed datanodes

Posted by Konstantin Shvachko <sh...@yahoo-inc.com>.
Ted,

There was a discussion on that in HADOOP-2073.
You are right that in general the moves should be atomic,
but in this particular case the in-place modification works well.
There is a comment in the code explaining this too,
but that code is in 0.15, not in 0.14.4.

--Konstantin

Ted Dunning wrote:
> Dhruba,
> 
> It looks from the discussion like the file was overwritten in place.
> 
> Is that good practice?  Normally the way that this sort of update is handled
> is to write a temp file, move the live file to a backup, then move the temp
> file to the live place.  Both moves are atomic so the worst case is that you
> wind up with either a temp and a live file (ignore the temp file since it
> may be incomplete) or a backup and a temp file (move temp to live since it
> must be complete).

Re: missing VERSION files leading to failed datanodes

Posted by Ted Dunning <td...@veoh.com>.
Dhruba,

It looks from the discussion like the file was overwritten in place.

Is that good practice?  Normally the way that this sort of update is handled
is to write a temp file, move the live file to a backup, then move the temp
file to the live place.  Both moves are atomic so the worst case is that you
wind up with either a temp and a live file (ignore the temp file since it
may be incomplete) or a backup and a temp file (move temp to live since it
must be complete).
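
In shell terms the pattern is just this (a sketch, not the actual
Hadoop code):

# write the new contents to a temp file first; if the disk fills up
# here, the live file is untouched
cat > VERSION.tmp <<EOF
...new contents...
EOF

mv VERSION VERSION.bak    # atomic: live -> backup
mv VERSION.tmp VERSION    # atomic: temp -> live

# a crash between the two renames leaves VERSION.bak plus a complete
# VERSION.tmp, so recovery is unambiguous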


On 1/8/08 10:06 AM, "dhruba Borthakur" <dh...@yahoo-inc.com> wrote:

> Hi Joydeep,
> 
> Which version of hadoop are you running? We had earlier fixed a bug
> https://issues.apache.org/jira/browse/HADOOP-2073
> in version 0.15.
> 
> Thanks,
> dhruba

RE: missing VERSION files leading to failed datanodes

Posted by dhruba Borthakur <dh...@yahoo-inc.com>.
Hi Joydeep,

Which version of hadoop are you running? We had earlier fixed a bug
https://issues.apache.org/jira/browse/HADOOP-2073
in version 0.15.

Thanks,
dhruba


RE: missing VERSION files leading to failed datanodes

Posted by Joydeep Sen Sarma <js...@facebook.com>.
partitions seem to be selected round-robin. we know why the reservation is not honored (there's a small bug that i reported as HADOOP-2549) - patching it is easy. unfortunately, things soured trying to restart with the patch.

problem is - how do i recover these VERSION files? we don't have a high enough number of surviving nodes - a few blocks have gone missing - and all will be good if i can generate a good version file. it contains a 'storageID' that seems machine-specific and i don't know how to find it anywhere else (looked in old logs as well). (the other fields seem to be constant across the cluster.)

Re: missing VERSION files leading to failed datanodes

Posted by Ted Dunning <td...@veoh.com>.
This has bitten me as well.  It used to be that I would have two possible
partitions depending on which kind of machine I was on.  Some machines had
both partitions available, but one was much smaller.  Hadoop had a nasty
tendency to fill up the smaller partition.  Reordering the partitions in the
configuration helped because it appears that the first partition is always
selected.  The free space parameters do not appear to be honored in any
case.

The good news is that aggressive rebalancing seems to put things in the
right place.
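
For reference, the reservation in question is configured per datanode
in hadoop-site.xml, along these lines (the exact key name has varied
across versions - check your hadoop-default.xml; the value is just an
example):

<!-- bytes to leave free on each volume; 1073741824 = 1 GB -->
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>1073741824</value>
</property>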


RE: missing VERSION files leading to failed datanodes

Posted by Joydeep Sen Sarma <js...@facebook.com>.
well - at least i know why this happened. (still looking for a way to restore the version file).

https://issues.apache.org/jira/browse/HADOOP-2549 is causing a disk-full condition on one of the disks (in spite of the du.reserved setting). looks like while starting up, the VERSION file could not be written and went missing. that would seem like another bug (writing a tmp file and renaming it to VERSION would have prevented this mishap):

2008-01-08 08:24:01,597 ERROR org.apache.hadoop.dfs.DataNode: java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:260)
        at sun.nio.cs.StreamEncoder$CharsetSE.writeBytes(StreamEncoder.java:336)
        at sun.nio.cs.StreamEncoder$CharsetSE.implFlushBuffer(StreamEncoder.java:404)
        at sun.nio.cs.StreamEncoder$CharsetSE.implFlush(StreamEncoder.java:408)
        at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:152)
        at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:213)
        at java.io.BufferedWriter.flush(BufferedWriter.java:236)
        at java.util.Properties.store(Properties.java:666)
        at org.apache.hadoop.dfs.Storage$StorageDirectory.write(Storage.java:176)
        at org.apache.hadoop.dfs.Storage$StorageDirectory.write(Storage.java:164)
        at org.apache.hadoop.dfs.Storage.writeAll(Storage.java:510)
        at org.apache.hadoop.dfs.DataStorage.recoverTransitionRead(DataStorage.java:146)
        at org.apache.hadoop.dfs.DataNode.startDataNode(DataNode.java:243)

