Posted to common-dev@hadoop.apache.org by "Michael Bieniosek (JIRA)" <ji...@apache.org> on 2007/10/18 02:32:50 UTC

[jira] Created: (HADOOP-2073) Datanode corruption if machine dies while writing VERSION file

Datanode corruption if machine dies while writing VERSION file
--------------------------------------------------------------

                 Key: HADOOP-2073
                 URL: https://issues.apache.org/jira/browse/HADOOP-2073
             Project: Hadoop
          Issue Type: Bug
    Affects Versions: 0.14.0
            Reporter: Michael Bieniosek


Yesterday, due to a bad mapreduce job, some of my machines went on OOM killing sprees and killed a bunch of datanodes, among other processes.  Since my monitoring software kept trying to bring up the datanodes, only to have the kernel kill them off again, each machine's datanode was probably killed many times.  A large percentage of these datanodes will not come up now, and write this message to the logs:

2007-10-18 00:23:28,076 ERROR org.apache.hadoop.dfs.DataNode: org.apache.hadoop.dfs.InconsistentFSStateException: Directory /hadoop/dfs/data is in an inconsistent state: file VERSION is invalid.

When I check, /hadoop/dfs/data/current/VERSION is an empty file.  Consequently, I have to delete all the blocks on the datanode and start over.  Since the OOM killing sprees happened simultaneously on several datanodes, this could have crippled my DFS cluster.

I checked the hadoop code, and in org.apache.hadoop.dfs.Storage, I see this:
{{{
    /**
     * Write version file.
     * 
     * @throws IOException
     */
    void write() throws IOException {
      corruptPreUpgradeStorage(root);
      write(getVersionFile());
    }

    void write(File to) throws IOException {
      Properties props = new Properties();
      setFields(props, this);
      RandomAccessFile file = new RandomAccessFile(to, "rws");
      FileOutputStream out = null;
      try {
        file.setLength(0);
        file.seek(0);
        out = new FileOutputStream(file.getFD());
        props.store(out, null);
      } finally {
        if (out != null) {
          out.close();
        }
        file.close();
      }
    }
}}}

So if the datanode dies after file.setLength(0), but before props.store(out, null), the VERSION file will get trashed in the corrupted state I see.  Maybe it would be better if this method created a temporary file VERSION.tmp, and then copied it to VERSION, then deleted VERSION.tmp?  That way, if VERSION was detected to be corrupt, the datanode could look at VERSION.tmp to recover the data.
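Just to make the idea concrete, here is a minimal sketch of what I mean (the method name and structure are only illustrative, not taken from the Hadoop source; it uses java.io.File, java.io.FileOutputStream, java.io.IOException and java.util.Properties):

{{{
    // Sketch: write the new contents to VERSION.tmp, sync them to disk, and
    // only then move the file over VERSION, so a crash leaves either the
    // complete old file or the complete new one, never an empty VERSION.
    void writeVersionAtomically(File versionFile, Properties props) throws IOException {
      File tmp = new File(versionFile.getParentFile(), versionFile.getName() + ".tmp");
      FileOutputStream out = new FileOutputStream(tmp);
      try {
        props.store(out, null);
        out.getFD().sync();          // force the new contents to disk
      } finally {
        out.close();
      }
      // File.renameTo() is atomic on POSIX file systems but may return false
      // (e.g. on Windows when the target exists), so delete and retry once.
      if (!tmp.renameTo(versionFile)) {
        versionFile.delete();
        if (!tmp.renameTo(versionFile)) {
          throw new IOException("Could not rename " + tmp + " to " + versionFile);
        }
      }
    }
}}}

A leftover VERSION.tmp found on startup could then either be deleted or, if VERSION itself is missing or empty, used to recover the data.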


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-2073) Datanode corruption if machine dies while writing VERSION file

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536379 ] 

Konstantin Shvachko commented on HADOOP-2073:
---------------------------------------------

I think you are generalizing the problem.
IMO, the problem we are solving here is that the VERSION file can become empty, which results in data loss.
Inconsistent data in different data directories can occur with or without this patch, simply because somebody edited the data-node config file or the version files themselves.



[jira] Commented: (HADOOP-2073) Datanode corruption if machine dies while writing VERSION file

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536071 ] 

Raghu Angadi commented on HADOOP-2073:
--------------------------------------

I am sure there are multiple places that can result in data loss or a non-operational state (like the above) if the Datanode or Namenode is killed abruptly. The more times the process is killed, the greater the chance of hitting one of them. The problem reported here is pretty severe (from the user's point of view, since they cannot recreate the VERSION file) and relatively straightforward to fix.

- First storing a backup copy before writing the new VERSION file looks fine (see the sketch below).
- We still need to fix the case with multiple data directories. I will see if this can be handled well.
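
Roughly what I mean by a backup copy, as a sketch (the method and file names here are made up for illustration, not from the actual patch):

{{{
    // Sketch only: keep the previous VERSION as VERSION.bak while the new one
    // is written, so an interrupted write can be recovered on the next startup.
    void writeWithBackup(File versionFile, Properties props) throws IOException {
      File backup = new File(versionFile.getParentFile(), versionFile.getName() + ".bak");
      if (versionFile.exists()) {
        backup.delete();
        if (!versionFile.renameTo(backup)) {
          throw new IOException("Could not back up " + versionFile);
        }
      }
      FileOutputStream out = new FileOutputStream(versionFile);
      try {
        props.store(out, null);
        out.getFD().sync();          // force the new contents to disk
      } finally {
        out.close();
      }
      backup.delete();               // new file is safely written; drop the backup
    }
}}}

On startup, if VERSION is missing or empty but VERSION.bak exists, the datanode could fall back to reading VERSION.bak instead of giving up.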



[jira] Commented: (HADOOP-2073) Datanode corruption if machine dies while writing VERSION file

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536333 ] 

Raghu Angadi commented on HADOOP-2073:
--------------------------------------

Does Windows allow changing the length while the file is open? Otherwise you could close the stream before setting the length.

It still leaves the problem with multiple data directories, though. We exit if the different VERSION files are inconsistent, right?




[jira] Commented: (HADOOP-2073) Datanode corruption if machine dies while writing VERSION file

Posted by "Michael Bieniosek (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536344 ] 

Michael Bieniosek commented on HADOOP-2073:
-------------------------------------------

> I am attaching a patch that changes file size after writing the data rather than before.

That's still not perfect, though, since the datanode could die while the file is being rewritten, or before the file is resized.



[jira] Issue Comment Edited: (HADOOP-2073) Datanode corruption if machine dies while writing VERSION file

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536354 ] 

shv edited comment on HADOOP-2073 at 10/19/07 2:58 PM:
-----------------------------------------------------------------------

> Does Windows allow changing the length while the file is open?
Yes.

> It still leaves the problem with multiple data directories, though.
What is that problem? This is not intended to solve all reliability problems, just this one.

> We exit if the different VERSION files are inconsistent, right?
For data-nodes, inconsistent file values will cause an exception; for the name-node, we choose the most recently updated directory.

> datanode could die while the file is being rewritten, or before the file is resized.
Yes, the perfect solution would be to write and resize in memory and then flush and close all at once.
Our problem here is that Properties.store() writes the data and flushes. So if the data-node dies at that point, the version file will have extra data if the new data is shorter than the old data. The only extra data we write in the version file now is related to distributed upgrades. Even if the upgrade fields remain in the version file, the data-node will restart and detect that the upgrade has been completed.
So you never end up with an empty version file; we always have either the new or the old data in it, and never a mixture of the two.
The point is that although this approach does not work for arbitrary file modifications, it works for what we do with the version file.
Unless proven otherwise, of course.
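
To make the ordering concrete, here is a rough sketch of write(File) with the size change moved after the data is written (this illustrates the idea and may not match the committed patch line for line):

{{{
    void write(File to) throws IOException {
      Properties props = new Properties();
      setFields(props, this);
      RandomAccessFile file = new RandomAccessFile(to, "rws");
      FileOutputStream out = null;
      try {
        file.seek(0);
        out = new FileOutputStream(file.getFD());
        // The file is never truncated before the new data is written, so a
        // crash before or during this call never produces an empty file; at
        // worst the file holds a partly rewritten version of the same fields.
        props.store(out, null);
        // Only now shrink the file to the number of bytes actually written,
        // dropping any stale trailing bytes from a longer previous version.
        file.setLength(out.getChannel().position());
      } finally {
        if (out != null) {
          out.close();
        }
        file.close();
      }
    }
}}}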



[jira] Updated: (HADOOP-2073) Datanode corruption if machine dies while writing VERSION file

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-2073:
-------------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this. Thanks Konstantin!

> Datanode corruption if machine dies while writing VERSION file
> --------------------------------------------------------------
>
>                 Key: HADOOP-2073
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2073
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.14.0
>            Reporter: Michael Bieniosek
>            Assignee: Konstantin Shvachko
>            Priority: Blocker
>             Fix For: 0.15.0
>
>         Attachments: versionFileSize.patch, versionFileSize1.patch
>
>


[jira] Updated: (HADOOP-2073) Datanode corruption if machine dies while writing VERSION file

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HADOOP-2073:
----------------------------------------

    Attachment: versionFileSize1.patch

Here is the patch, with comments.
I think it should go into 0.15.



[jira] Commented: (HADOOP-2073) Datanode corruption if machine dies while writing VERSION file

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537683 ] 

Hudson commented on HADOOP-2073:
--------------------------------

Integrated in Hadoop-Nightly #282 (See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/282/])



[jira] Commented: (HADOOP-2073) Datanode corruption if machine dies while writing VERSION file

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536402 ] 

Hadoop QA commented on HADOOP-2073:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12368060/versionFileSize1.patch
against trunk revision r586264.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/973/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/973/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/973/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/973/console

This message is automatically generated.



[jira] Commented: (HADOOP-2073) Datanode corruption if machine dies while writing VERSION file

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536354 ] 

Konstantin Shvachko commented on HADOOP-2073:
---------------------------------------------

> Does Windows allow changing the length while the file is open?
Yes.

> It still leaves the problem with multiple data directories, though.
What is that problem? This is not intended to solve all reliability problems, just this one.

> We exit if the different VERSION files are inconsistent, right?
For data-nodes, inconsistent file values will cause an exception; for the name-node, we choose the most recently updated directory.

> datanode could die while the file is being rewritten, or before the file is resized.
Yes, the perfect solution would be to write and resize in memory and then flush and close all at once.
Our problem here is that Properties.store() writes the data and flushes. So if the data-node dies at that point, the version file will have extra data if the new data is shorter than the old data. The only extra data we write in the version file now is related to distributed upgrades. Even if the upgrade fields remain in the version file, the data-node will restart and detect that the upgrade has been completed.
So you never end up with an empty version file; we always have either the new or the old data in it, and never a mixture of the two.
The point is that although this approach does not work for arbitrary file modifications, it works for what we do with the version file.




[jira] Commented: (HADOOP-2073) Datanode corruption if machine dies while writing VERSION file

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12535812 ] 

Konstantin Shvachko commented on HADOOP-2073:
---------------------------------------------

Could you please check your logs for the exception that was thrown during write(), so that we can trace why the VERSION file ended up empty.
StorageDirectory.write(File to) is called in two places:
- from recoverTransitionRead(), and
- from register().

I presume no upgrades are involved here, right?




[jira] Commented: (HADOOP-2073) Datanode corruption if machine dies while writing VERSION file

Posted by "Michael Bieniosek (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536383 ] 

Michael Bieniosek commented on HADOOP-2073:
-------------------------------------------

> [...] The point is that although this approach does not work for arbitrary file modifications, it works for what we do with the version file.

Konstantin, could you put a comment in your patch explaining this argument?  Thanks.



[jira] Updated: (HADOOP-2073) Datanode corruption if machine dies while writing VERSION file

Posted by "Michael Bieniosek (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Bieniosek updated HADOOP-2073:
--------------------------------------

    Component/s: dfs



[jira] Updated: (HADOOP-2073) Datanode corruption if machine dies while writing VERSION file

Posted by "Nigel Daley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nigel Daley updated HADOOP-2073:
--------------------------------

    Priority: Blocker  (was: Major)

Making this a blocker, as that is the way to request that the fix go into 0.15.



[jira] Assigned: (HADOOP-2073) Datanode corruption if machine dies while writing VERSION file

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi reassigned HADOOP-2073:
------------------------------------

    Assignee: Raghu Angadi



[jira] Commented: (HADOOP-2073) Datanode corruption if machine dies while writing VERSION file

Posted by "Michael Bieniosek (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12535990 ] 

Michael Bieniosek commented on HADOOP-2073:
-------------------------------------------

There's no exception in my logs. I'm assuming the Linux OOM killer sends the JVM a SIGKILL (http://lxr.linux.no/source/mm/oom_kill.c#L271), so the JVM prints out the shutdown message and exits without giving exception handlers a chance to do anything.

There aren't any upgrades involved here.

FWIW, my logs look like this (repeated over and over again):

2007-10-17 00:19:07,051 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = (the hostname)
STARTUP_MSG:   args = []
************************************************************/
2007-10-17 00:19:08,338 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=DataNode, sessionId=null
2007-10-17 00:19:08,439 INFO org.apache.hadoop.dfs.DataNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at (the hostname)
************************************************************/

Note that it didn't take long before the process was killed.
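
For reference, SIGKILL cannot be caught or handled, so neither exception handlers nor shutdown hooks get a chance to run. A minimal (hypothetical, non-Hadoop) demo of that behaviour:
{{{
    public class KillDemo {
      public static void main(String[] args) throws InterruptedException {
        Runtime.getRuntime().addShutdownHook(new Thread() {
          public void run() {
            // Runs on normal exit or SIGTERM ("kill <pid>"), never on SIGKILL.
            System.err.println("shutdown hook ran");
          }
        });
        // Park the process; "kill -9 <pid>" (what the OOM killer sends) ends it
        // right here with no hook, finally block, or handler executing.
        Thread.sleep(600000);
      }
    }
}}}
So a datanode killed between setLength(0) and props.store() has no opportunity to undo the truncation.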

> Datanode corruption if machine dies while writing VERSION file
> --------------------------------------------------------------
>
>                 Key: HADOOP-2073
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2073
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.14.0
>            Reporter: Michael Bieniosek
>



[jira] Updated: (HADOOP-2073) Datanode corruption if machine dies while writing VERSION file

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HADOOP-2073:
----------------------------------------

    Attachment: versionFileSize.patch

Killing data-nodes immediately after they start turned out to be a good crash test. Thanks, Michael.
I am attaching a patch that changes the file size after writing the data rather than before.
That way VERSION never gets emptied, which solves Michael's current problem.
In general, I agree with Raghu that we should check our code for the inconsistent states
the file system can get into as a result of different crash scenarios.
I think this patch should go into 0.15.
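
A minimal sketch of that ordering (this is not the attached patch itself; the getChannel().position() call is just one way to find out how many bytes were written):
{{{
    void write(File to) throws IOException {
      Properties props = new Properties();
      setFields(props, this);                        // same fields as before
      RandomAccessFile file = new RandomAccessFile(to, "rws");
      FileOutputStream out = null;
      try {
        file.seek(0);
        out = new FileOutputStream(file.getFD());
        props.store(out, null);                      // overwrite starting at offset 0
        file.setLength(out.getChannel().position()); // truncate any leftover tail last
      } finally {
        if (out != null) {
          out.close();
        }
        file.close();
      }
    }
}}}
If the process is killed anywhere before the final setLength(), the file keeps its previous length instead of being truncated to zero up front, so the empty-VERSION failure reported here cannot happen.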

> Datanode corruption if machine dies while writing VERSION file
> --------------------------------------------------------------
>
>                 Key: HADOOP-2073
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2073
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.14.0
>            Reporter: Michael Bieniosek
>            Assignee: Raghu Angadi
>         Attachments: versionFileSize.patch
>



[jira] Updated: (HADOOP-2073) Datanode corruption if machine dies while writing VERSION file

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HADOOP-2073:
----------------------------------------

    Fix Version/s: 0.15.0
         Assignee: Konstantin Shvachko  (was: Raghu Angadi)
           Status: Patch Available  (was: Open)

Raghu, could you please create another JIRA issue for the problems you mentioned here?

> Datanode corruption if machine dies while writing VERSION file
> --------------------------------------------------------------
>
>                 Key: HADOOP-2073
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2073
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.14.0
>            Reporter: Michael Bieniosek
>            Assignee: Konstantin Shvachko
>             Fix For: 0.15.0
>
>         Attachments: versionFileSize.patch, versionFileSize1.patch
>



[jira] Commented: (HADOOP-2073) Datanode corruption if machine dies while writing VERSION file

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536356 ] 

Raghu Angadi commented on HADOOP-2073:
--------------------------------------

>> It still leaves the problem with multiple data directories?
>What is that problem? This is not intended to solve all reliability problems, just one.

Yes, we want to fix just this problem ("Inconsistent VERSION file, caused by frequent restarts of the datanode"). If there are multiple data directories (a common case in many installations), I was wondering whether we will see the same problem/symptoms from the same root cause even with this patch, when only the datanode's config differs.
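
To make the multi-directory case concrete: each directory listed in dfs.data.dir keeps its own current/VERSION, so a crash can leave one copy empty while its siblings stay intact. A purely hypothetical repair pass (nothing like this is in versionFileSize.patch) could restore a damaged copy from a healthy sibling:
{{{
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    public class VersionCrossCheckSketch {

      // Scan every data directory, keep one non-empty VERSION as the reference,
      // and copy it over any zero-length copy left behind by a crash.
      static void repair(List<File> dataDirs) throws IOException {
        File intact = null;
        List<File> empty = new ArrayList<File>();
        for (File dir : dataDirs) {
          File v = new File(new File(dir, "current"), "VERSION");
          if (v.length() > 0) {
            intact = v;            // treat any non-empty copy as usable
          } else if (v.exists()) {
            empty.add(v);          // zero-length copy left behind by a crash
          }
        }
        if (intact == null) {
          throw new IOException("no intact VERSION copy in any data directory");
        }
        for (File broken : empty) {
          copy(intact, broken);    // restore the damaged copy from a sibling
        }
      }

      private static void copy(File src, File dst) throws IOException {
        FileInputStream in = new FileInputStream(src);
        FileOutputStream out = new FileOutputStream(dst);
        try {
          byte[] buf = new byte[4096];
          int n;
          while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
          }
        } finally {
          in.close();
          out.close();
        }
      }
    }
}}}
This only helps when at least one copy survives, and it says nothing about the single-directory case, which is what the attached patch addresses.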


> Datanode corruption if machine dies while writing VERSION file
> --------------------------------------------------------------
>
>                 Key: HADOOP-2073
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2073
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.14.0
>            Reporter: Michael Bieniosek
>            Assignee: Raghu Angadi
>         Attachments: versionFileSize.patch
>
