Posted to common-dev@hadoop.apache.org by "Wendy Chien (JIRA)" <ji...@apache.org> on 2006/09/22 20:54:24 UTC

[jira] Commented: (HADOOP-432) support undelete, snapshots, or other mechanism to recover lost files

    [ http://issues.apache.org/jira/browse/HADOOP-432?page=comments#action_12436970 ] 
            
Wendy Chien commented on HADOOP-432:
------------------------------------

Here's the current proposal.  Comments are welcome. 

Two config items:
* maximum size (say 5TB). DFS tries to keep the recycle bin under this size.
* minimum time (say 1 hour). A file is never removed until at least this long after it was deleted.

Namenode:
* keeps track of recycle bin size
* records deletion time of each file
* occasionally wakes up, scans the deleted files, and removes the oldest ones until the recycle bin is back under the maximum size or the remaining files are younger than the minimum time, whichever comes first.
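
To make the scan concrete, here is a minimal sketch of the bookkeeping described above. It is not Hadoop code: the class name RecycleBinSketch, its fields, and its methods are all hypothetical, and it assumes deletions are recorded in non-decreasing time order so that the head of the queue is always the oldest file.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of the recycle-bin scan; none of these names come from Hadoop. */
class RecycleBinSketch {
    /** One deleted file: its name, its size in bytes, and when it was deleted. */
    static final class Entry {
        final String name;
        final long sizeBytes;
        final long deletedAtMs;

        Entry(String name, long sizeBytes, long deletedAtMs) {
            this.name = name;
            this.sizeBytes = sizeBytes;
            this.deletedAtMs = deletedAtMs;
        }
    }

    private final long maxSizeBytes; // the "maximum size" config item
    private final long minAgeMs;     // the "minimum time" config item
    // Oldest deletion at the head, so finding the oldest file is O(1).
    private final Deque<Entry> byDeletionTime = new ArrayDeque<>();
    private long totalBytes = 0;     // running recycle bin size

    RecycleBinSketch(long maxSizeBytes, long minAgeMs) {
        this.maxSizeBytes = maxSizeBytes;
        this.minAgeMs = minAgeMs;
    }

    /** Records a deletion. Assumes calls arrive in non-decreasing time order. */
    void recordDelete(String name, long sizeBytes, long nowMs) {
        byDeletionTime.addLast(new Entry(name, sizeBytes, nowMs));
        totalBytes += sizeBytes;
    }

    /**
     * Periodic scan: removes the oldest files until the bin is within the
     * size limit or the oldest remaining file is younger than the minimum
     * age, whichever comes first. Returns the number of files removed.
     */
    int scan(long nowMs) {
        int removed = 0;
        while (totalBytes > maxSizeBytes) {
            Entry oldest = byDeletionTime.peekFirst();
            if (oldest == null || nowMs - oldest.deletedAtMs < minAgeMs) {
                break; // nothing left, or everything left is too young
            }
            byDeletionTime.pollFirst();
            totalBytes -= oldest.sizeBytes;
            removed++;
        }
        return removed;
    }

    long totalBytes() {
        return totalBytes;
    }
}
```

Note that in this sketch the minimum time wins over the maximum size: if every remaining file is too young, the bin stays over the limit until the next scan.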

Notes:
* the namenode keeps deleted files sorted by deletion time, so finding the oldest file is O(1). It's the equivalent of ls -tr * (internal to the namenode, not exposed externally).
* it's all automatic. No user intervention ever, no purge command.
* file removal is lazy. Options:
  - the namenode wakes up occasionally (once a minute?) and removes *all* the files pending deletion
  - the namenode wakes up frequently (once a second?) and removes at most a small number (100?) of files
* deleted files are renamed so that the new name includes the entire original path, the username of the deleter, and the deletion time.
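
As an illustration of the last two notes, the sketch below shows one possible rename layout and the "small batch per wake-up" removal option. Both are assumptions for discussion only: the proposal does not fix a naming format, so the /.recycle/<user>/<time><path> layout, the class name LazyRemovalSketch, and both method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch; the naming layout and method names are assumptions. */
class LazyRemovalSketch {
    /**
     * Builds a recycle-bin name that carries the entire original path, the
     * username of the deleter, and the deletion time. Assumes the original
     * path is absolute (starts with "/").
     */
    static String recycledName(String path, String user, long deletedAtMs) {
        return "/.recycle/" + user + "/" + deletedAtMs + path;
    }

    /**
     * One wake-up of the lazy remover: takes at most maxPerWake entries from
     * the head of the pending queue and returns them for removal. Passing
     * Integer.MAX_VALUE gives the "remove all pending files" option.
     */
    static List<String> drain(Deque<String> pending, int maxPerWake) {
        List<String> removed = new ArrayList<>();
        while (removed.size() < maxPerWake && !pending.isEmpty()) {
            removed.add(pending.pollFirst());
        }
        return removed;
    }
}
```

The batch limit bounds how much work the namenode does per wake-up, at the cost of pending files lingering longer when many are queued.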


> support undelete, snapshots, or other mechanism to recover lost files
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-432
>                 URL: http://issues.apache.org/jira/browse/HADOOP-432
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Yoram Arnon
>         Assigned To: Wendy Chien
>
> Currently, once you delete a file it's gone forever.
> Most file systems allow some form of recovery of deleted files.
> A simple solution would be an 'undelete' command.
> A more comprehensive solution would include snapshots, manual and automatic, with scheduling options.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira