Posted to common-dev@hadoop.apache.org by "Raghu Angadi (JIRA)" <ji...@apache.org> on 2008/09/03 19:57:44 UTC

[jira] Issue Comment Edited: (HADOOP-1869) access times of HDFS files

    [ https://issues.apache.org/jira/browse/HADOOP-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628073#action_12628073 ] 

rangadi edited comment on HADOOP-1869 at 9/3/08 10:55 AM:
---------------------------------------------------------------

> a file create (such as expansion of an archive and a distcp), [...]

Setting the time during create is not enough even for these use cases. Say distcp copies a 10GB file and sets the mod time at create time (to t - 1 month), and the last block is written 1 min later. Then the mod time after distcp will be (t + 1 min) rather than (t - 1 month). -1 for extra options to create, or close, etc.

Why not just provide utimes(), since we are using POSIX as a tie breaker?
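
To illustrate the ordering problem on an ordinary local filesystem (not HDFS), here is a minimal Python sketch. It uses os.utime(), which wraps the POSIX utimes() family. The function names and the ONE_MONTH constant are made up for this example, and the exact behavior of the first function depends on the OS and filesystem updating the mod time on write.

```python
import os
import tempfile
import time

ONE_MONTH = 30 * 24 * 3600  # rough "1 month" in seconds (illustrative only)


def copy_setting_mtime_at_create(path, data, mtime):
    """Stamp the file right after create, then keep writing (the problem case)."""
    with open(path, "wb") as f:
        os.utime(path, (mtime, mtime))  # mod time set at create time
        f.write(data)                   # data reaches the file later and bumps mod time
    return os.stat(path).st_mtime


def copy_then_utime(path, data, mtime):
    """Write everything first, then set the time with a utimes()-style call."""
    with open(path, "wb") as f:
        f.write(data)
    os.utime(path, (mtime, mtime))      # applied after the last write, so it sticks
    return os.stat(path).st_mtime


if __name__ == "__main__":
    source_mtime = time.time() - ONE_MONTH
    with tempfile.TemporaryDirectory() as d:
        early = copy_setting_mtime_at_create(os.path.join(d, "a"), b"x" * 1024, source_mtime)
        late = copy_then_utime(os.path.join(d, "b"), b"x" * 1024, source_mtime)
        print("set at create, drift in seconds:", early - source_mtime)
        print("set after close, drift in seconds:", late - source_mtime)
```

Only the second variant is expected to keep the source's mod time, which is the argument for a separate call over extra options on create or close.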

Another of Konstantin's points is that the FS should not allow setting a future time, which sounds ok, but it is just a file attribute to help users, not something the filesystem inherently depends upon. I don't see a need to police it that much, and since POSIX is a tie breaker we could just stick to its functionality. Note that for all these use cases we need to be able to set modtime too.

      was (Author: rangadi):
    > a file create (such as expansion of an archive and a distcp), [...]

During create is not enough even for these test cases. Say distcp copies 10GB file and set Mod time at create time (to t - 1month), and the last block was written 1 min later.. then the mod time after Distcp will be (t + 1min) rather than (t - 1month). -1 for extra options to create, or close, etc.

Why not just provide utimes().. since we are using POSIX as a tie breaker? 

Another Konstantin's point is that FS should not allow setting future time.. which sounds ok.. but it is just a file attribute to help users not something filesystem inherently depends upon. I don't see need to police it that much .. and since POSIX is a tie breaker we could just stick to it. For all the use case we need to be able to set modtime too. 
  
> access times of HDFS files
> --------------------------
>
>                 Key: HADOOP-1869
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1869
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>             Fix For: 0.19.0
>
>         Attachments: accessTime1.patch, accessTime4.patch, accessTime5.patch
>
>
> HDFS should support some type of statistics that allows an administrator to determine when a file was last accessed. 
> Since HDFS does not have quotas yet, it is likely that users keep on accumulating files in their home directories without much regard to the amount of space they are occupying. This causes memory-related problems with the namenode.
> Access times are costly to maintain. AFS does not maintain access times. I think DCE-DFS does maintain access times with a coarse granularity.
> One proposal for HDFS would be to implement something like an "access bit". 
> 1. This access-bit is set when a file is accessed. If the access bit is already set, then this call does not result in a transaction.
> 2. A FileSystem.clearAccessBits() indicates that the access bits of all files need to be cleared.
> An administrator can effectively use the above mechanism (maybe a daily cron job) to determine files that are recently used.
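
The access-bit proposal above can be sketched as a toy in-memory model. This is only an illustration under stated assumptions: the class and method names are hypothetical, the transaction counter stands in for namenode edit-log transactions, and nothing here reflects the attached patches.

```python
class AccessBitTracker:
    """Toy model of the proposed per-file access bit (hypothetical names)."""

    def __init__(self):
        self._accessed = set()  # paths whose access bit is currently set
        self.transactions = 0   # stand-in for logged transactions

    def record_access(self, path):
        """Set the bit for path; return True only if a transaction was needed."""
        if path in self._accessed:
            return False        # bit already set: no transaction (point 1)
        self._accessed.add(path)
        self.transactions += 1
        return True

    def accessed_paths(self):
        """Paths accessed since the bits were last cleared."""
        return sorted(self._accessed)

    def clear_access_bits(self):
        """Clear the access bit of every file (point 2)."""
        self._accessed.clear()


if __name__ == "__main__":
    tracker = AccessBitTracker()
    tracker.record_access("/user/alice/data.txt")
    tracker.record_access("/user/alice/data.txt")  # repeat access, no new transaction
    print(tracker.accessed_paths(), tracker.transactions)
    tracker.clear_access_bits()                    # e.g. from a daily cron job
```

A periodic job would read accessed_paths() and then call clear_access_bits(), so repeated reads of a hot file cost at most one transaction per period.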
