Posted to common-dev@hadoop.apache.org by "Allen Wittenauer (JIRA)" <ji...@apache.org> on 2008/10/21 21:09:44 UTC

[jira] Created: (HADOOP-4480) data node process should not die if one dir goes bad

data node process should not die if one dir goes bad
----------------------------------------------------

                 Key: HADOOP-4480
                 URL: https://issues.apache.org/jira/browse/HADOOP-4480
             Project: Hadoop Core
          Issue Type: Bug
          Components: fs
    Affects Versions: 0.18.1
            Reporter: Allen Wittenauer


When multiple directories are configured for the data node process to use to store blocks, it currently exits when one of them is not writable. Instead, it should either completely ignore that directory, or attempt to continue reading from it and then mark it unusable if reads fail.
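
A minimal sketch of what the first option might look like, purely for illustration; the class, method names, and probe logic below are hypothetical and not the actual datanode code:

    import java.io.File;
    import java.io.IOException;
    import java.util.List;
    import java.util.concurrent.CopyOnWriteArrayList;

    // Hypothetical sketch: keep a live list of usable storage directories and
    // drop a directory that fails a write probe instead of exiting the process.
    public class DataDirMonitor {
        private final List<File> activeDirs = new CopyOnWriteArrayList<File>();

        public DataDirMonitor(List<File> configuredDirs) {
            activeDirs.addAll(configuredDirs);
        }

        /** Probe every active directory; an unusable one is dropped, not fatal. */
        public void checkDirs() {
            for (File dir : activeDirs) {
                if (!isWritable(dir)) {
                    activeDirs.remove(dir);  // safe with CopyOnWriteArrayList
                    System.err.println("WARN: dropping unusable data dir " + dir);
                }
            }
            if (activeDirs.isEmpty()) {
                // Only when *all* directories are gone does exiting make sense.
                throw new RuntimeException("no usable data directories left");
            }
        }

        private boolean isWritable(File dir) {
            try {
                // An actual write probe, since permission bits alone do not
                // reveal a file system that was remounted read-only.
                File probe = File.createTempFile("probe", ".tmp", dir);
                return probe.delete();
            } catch (IOException e) {
                return false;
            }
        }
    }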

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4480) data node process should not die if one dir goes bad

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641721#action_12641721 ] 

dhruba borthakur commented on HADOOP-4480:
------------------------------------------

@Konstantin: At the end of a decommission process, a datanode actually shuts down, doesn't it? If so, then using decommissioning might not work in the current scenario.

@Allen: HDFS currently assumes that the entire disk space on the cluster is readable/writable. This keeps the accounting of used space/free space, etc. pretty simple. If there is available disk space, then it can be used to store new files in HDFS; the entire space available in the data directories is free disk space. If we allow the datanode to remember read-only disks, then there will be some changes in accounting. Similarly, there will be some changes needed for reporting and alerting administrators. This means that the "administration" of the cluster becomes slightly more complex. The advantage is that the last bit of disk space gets to be used.

So, my question is: are you seeing this to be a real problem on production clusters?
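
For illustration of the accounting change this would imply, one possible shape; the Volume interface below is made up, not the real HDFS code. Per-node capacity and free space would simply stop counting directories that have been dropped or gone read-only:

    import java.util.List;

    // Illustrative only: totals are summed over healthy volumes, so a failed
    // or read-only data directory no longer inflates the node's capacity.
    public class VolumeAccounting {
        public interface Volume {
            long capacity();      // total bytes on this data directory
            long available();     // free bytes on this data directory
            boolean isHealthy();  // false once the directory has been dropped
        }

        public static long totalCapacity(List<Volume> volumes) {
            long sum = 0;
            for (Volume v : volumes) {
                if (v.isHealthy()) {
                    sum += v.capacity();
                }
            }
            return sum;
        }

        public static long totalAvailable(List<Volume> volumes) {
            long sum = 0;
            for (Volume v : volumes) {
                if (v.isHealthy()) {
                    sum += v.available();
                }
            }
            return sum;
        }
    }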



[jira] Commented: (HADOOP-4480) data node process should not die if one dir goes bad

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641662#action_12641662 ] 

Konstantin Shvachko commented on HADOOP-4480:
---------------------------------------------

Does it make more sense for a node with a bad drive to go to decommissioning mode?
The web UI can be modified to have 3 sections (live, dead and decommissioning nodes) rather than just 2.
And the node can still be used for replication and read-only purposes.



[jira] Commented: (HADOOP-4480) data node process should not die if one dir goes bad

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642998#action_12642998 ] 

Allen Wittenauer commented on HADOOP-4480:
------------------------------------------

I specifically targeted the HDFS framework in this bug primarily because the MR framework issues are actually worse. There is a very good chance that if you have multiple disks, you have swap spread across those disks. In the case of drive failure, this means you lose a chunk of swap. Loss of swap == less memory for streaming jobs == job failure in many, many instances. So let's not get distracted by the issues around MR, job failure, job speed, etc.

What I'm seeing is that at any given time we have 10-20% of our nodes down. The vast majority have a single failed disk. This means we're leaving capacity on the floor, waiting for a drive replacement. Why can't these machines just stay up, providing blocks and providing space on the good drives? For large clusters, this might be a minor inconvenience, but for small clusters it could be deadly.

The current fix is done with wetware, a source of additional strain on traditionally overloaded operations teams. Random failure times vs. letting the ops team decide when a data node goes down? This seems like a no-brainer from a practicality perspective. Yes, this is clearly more difficult than just killing the node completely. But over the long haul, it is going to be cheaper in human labor to fix this in Hadoop than to throw more admins at it.



[jira] Commented: (HADOOP-4480) data node process should not die if one dir goes bad

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642801#action_12642801 ] 

Runping Qi commented on HADOOP-4480:
------------------------------------



I think the map/reduce framework has to handle similar problems.
If a drive of a machine goes bad, the tasks on that machine tend to become stragglers.
The overall performance will be impacted.
Overall, Hadoop is much better at handling total failure than partial failure of nodes, so I think it is better to decommission a bad node on a drive failure.
The admin may later choose to remove the drive from the configuration file and restart the node, if he does not want to take the node away for repair.
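
For reference, the restart workaround described above amounts to editing the datanode's data directory list and restarting. Assuming the 0.18-era property name dfs.data.dir in hadoop-site.xml, and with made-up paths, the edit would look roughly like this (the failed /grid/2 disk removed from the comma-separated list):

    <property>
      <name>dfs.data.dir</name>
      <value>/grid/0/hdfs/data,/grid/1/hdfs/data,/grid/3/hdfs/data</value>
    </property>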




[jira] Commented: (HADOOP-4480) data node process should not die if one dir goes bad

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642838#action_12642838 ] 

dhruba borthakur commented on HADOOP-4480:
------------------------------------------

Handling total failures is easier than handling partial failures. I agree with Runping on this one.



[jira] Commented: (HADOOP-4480) data node process should not die if one dir goes bad

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641695#action_12641695 ] 

Allen Wittenauer commented on HADOOP-4480:
------------------------------------------

Dhruba makes an excellent point. The admin definitely needs more status information on the data nodes in this sort of design.

For smaller clusters, it seems like a bad thing to decommission an entire node when you only have one bad disk.  It would be better for the data node to just start decomm'ing that dir and/or stop giving block reports for that dir.

[For equal-sized disks, RAID may be an alternative. But if you have unequal-sized disks, RAID isn't an option, as you'll be throwing storage away.]
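
A rough sketch of the "stop giving block reports for that dir" idea, with hypothetical types: blocks whose files live under a failed directory are left out of the report, so the namenode treats those replicas as missing and re-replicates them elsewhere.

    import java.io.File;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Set;

    // Hypothetical sketch: filter a block report so replicas stored under a
    // failed data directory are omitted; the namenode will then schedule
    // re-replication of those blocks from other datanodes.
    public class BlockReportFilter {
        public static class BlockRecord {
            final long blockId;
            final File blockFile;  // on-disk location of the replica
            public BlockRecord(long blockId, File blockFile) {
                this.blockId = blockId;
                this.blockFile = blockFile;
            }
        }

        public static List<BlockRecord> filter(List<BlockRecord> allBlocks,
                                               Set<File> failedDirs) {
            List<BlockRecord> report = new ArrayList<BlockRecord>();
            for (BlockRecord b : allBlocks) {
                if (!isUnder(b.blockFile, failedDirs)) {
                    report.add(b);
                }
            }
            return report;
        }

        private static boolean isUnder(File file, Set<File> failedDirs) {
            for (File dir : failedDirs) {
                if (file.getPath().startsWith(dir.getPath() + File.separator)) {
                    return true;
                }
            }
            return false;
        }
    }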





[jira] Commented: (HADOOP-4480) data node process should not die if one dir goes bad

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641914#action_12641914 ] 

dhruba borthakur commented on HADOOP-4480:
------------------------------------------

I think a solution to this problem is needed, especially since you are seeing it occur quite often. Is it caused by bad disks or buggy ext3 software? Any ideas on whether XFS on Linux avoids this problem of disks going read-only?





[jira] Commented: (HADOOP-4480) data node process should not die if one dir goes bad

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641869#action_12641869 ] 

Allen Wittenauer commented on HADOOP-4480:
------------------------------------------

Yes, we're seeing this as a real problem. We have file systems go read-only all the time.

But if it makes things easier, then go ahead with the first of the two options I mentioned in the description: if the directory can't be written to, drop it completely.



[jira] Updated: (HADOOP-4480) data node process should not die if one dir goes bad

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-4480:
-------------------------------------

    Component/s:     (was: fs)
                 dfs



[jira] Commented: (HADOOP-4480) data node process should not die if one dir goes bad

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641629#action_12641629 ] 

dhruba borthakur commented on HADOOP-4480:
------------------------------------------

If you really want to do this, then we will need better reporting to the administrator so that he/she can tell that this datanode needs tending to.

In the existing scheme of things, if a partition becomes read-only, the datanode shuts down and the administrator can see it listed as "dead" in the HDFS UI.

