Posted to hdfs-dev@hadoop.apache.org by "Brandon Li (Created) (JIRA)" <ji...@apache.org> on 2012/03/11 23:18:40 UTC

[jira] [Created] (HDFS-3075) Add mechanism to restore the removed storage directories

Add mechanism to restore the removed storage directories
--------------------------------------------------------

                 Key: HDFS-3075
                 URL: https://issues.apache.org/jira/browse/HDFS-3075
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: name-node
    Affects Versions: 0.24.0, 1.1.0
            Reporter: Brandon Li
            Assignee: Brandon Li


When a storage directory becomes inaccessible, the namenode moves it from the list of valid storage directories to a removedStorageDirs list. These directories are not restored when they become healthy again.

The proposed solution is to restore previously failed directories at the beginning of checkpointing (for example, in rollEdits) by copying the necessary metadata files from a healthy directory to the previously failed ones. This way, once an administrator has repaired a failed storage directory, he or she can force a checkpoint to restore it immediately.
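
The proposal above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the namenode code or the committed patch: the class StorageDirRestorer and its methods reportFailure and attemptRestore are hypothetical names, a directory counts as healthy again when it exists and is writable, and "copying necessary metadata files" is simplified to a flat copy of the regular files in one healthy directory.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of the proposal; not the actual NameNode implementation. */
class StorageDirRestorer {
    private final List<Path> activeDirs = new ArrayList<>();
    private final List<Path> removedDirs = new ArrayList<>();

    StorageDirRestorer(List<Path> dirs) {
        activeDirs.addAll(dirs);
    }

    /** On failure, move the directory from the active list to the removed list. */
    void reportFailure(Path dir) {
        if (activeDirs.remove(dir)) {
            removedDirs.add(dir);
        }
    }

    /**
     * Meant to be called at the start of a checkpoint (for example, before
     * rolling the edit log). Returns the directories that were restored.
     */
    List<Path> attemptRestore() {
        List<Path> restored = new ArrayList<>();
        if (activeDirs.isEmpty()) {
            return restored; // no healthy directory to copy from
        }
        Path source = activeDirs.get(0);
        Iterator<Path> it = removedDirs.iterator();
        while (it.hasNext()) {
            Path dir = it.next();
            // Assumption: "healthy again" means the directory exists and is writable.
            if (!Files.isDirectory(dir) || !Files.isWritable(dir)) {
                continue; // still unhealthy; keep it on the removed list
            }
            try {
                copyFiles(source, dir);
            } catch (IOException e) {
                continue; // copy failed; try again at the next checkpoint
            }
            it.remove();
            restored.add(dir);
        }
        activeDirs.addAll(restored);
        return restored;
    }

    /** Simplification: copy only the regular files directly inside the source. */
    private static void copyFiles(Path from, Path to) throws IOException {
        try (DirectoryStream<Path> files = Files.newDirectoryStream(from)) {
            for (Path f : files) {
                if (Files.isRegularFile(f)) {
                    Files.copy(f, to.resolve(f.getFileName()),
                               StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    List<Path> getActiveDirs() { return new ArrayList<>(activeDirs); }

    List<Path> getRemovedDirs() { return new ArrayList<>(removedDirs); }
}
```

In this sketch a directory that cannot be restored simply stays on the removed list and is retried at the next checkpoint, which matches the idea that an administrator can repair the directory and then force a checkpoint.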

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Reopened] (HDFS-3075) Add mechanism to restore the removed storage directories

Posted by "Tsz Wo (Nicholas), SZE (Reopened) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HDFS-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE reopened HDFS-3075:
------------------------------------------


@Uma, you are right that HADOOP-4885 has already fixed this, so this one is a backport. I will revise the title.

@Eli, this is not a dupe of HDFS-2781.
                


[jira] [Resolved] (HDFS-3075) Backport HADOOP-4885 to branch-1

Posted by "Tsz Wo (Nicholas), SZE (Resolved) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HDFS-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE resolved HDFS-3075.
------------------------------------------

       Resolution: Fixed
    Fix Version/s: 1.1.0
     Hadoop Flags: Reviewed

I have committed this (the patch was posted on HADOOP-4885). Thanks, Brandon!
                
> Backport HADOOP-4885 to branch-1
> --------------------------------
>
>                 Key: HDFS-3075
>                 URL: https://issues.apache.org/jira/browse/HDFS-3075
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.24.0, 1.1.0
>            Reporter: Brandon Li
>            Assignee: Brandon Li
>             Fix For: 1.1.0
>
>
> When a storage directory becomes inaccessible, the namenode moves it from the list of valid storage directories to a removedStorageDirs list. These directories are not restored when they become healthy again.
> The proposed solution is to restore previously failed directories at the beginning of checkpointing (for example, in rollEdits) by copying the necessary metadata files from a healthy directory to the previously failed ones. This way, once an administrator has repaired a failed storage directory, he or she can force a checkpoint to restore it immediately.
> See also HADOOP-4885.


[jira] [Resolved] (HDFS-3075) Add mechanism to restore the removed storage directories

Posted by "Eli Collins (Resolved) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HDFS-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins resolved HDFS-3075.
-------------------------------

    Resolution: Duplicate

This is a dupe of HDFS-2781. Brandon, feel free to post a patch there.
                
