Posted to common-issues@hadoop.apache.org by "Sean Mackrory (JIRA)" <ji...@apache.org> on 2017/02/02 17:22:51 UTC

[jira] [Updated] (HADOOP-14041) CLI command to prune old metadata

     [ https://issues.apache.org/jira/browse/HADOOP-14041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Mackrory updated HADOOP-14041:
-----------------------------------
    Attachment: HADOOP-14041-HADOOP-13345.001.patch

Attaching a patch that adds prune(timestamp) to the MetadataStore interface and its existing implementations, along with a CLI tool and tests for all of that. prune() takes a timestamp in milliseconds since the epoch (UTC), as returned by System.currentTimeMillis(), and should trim everything with a modification time older than that. The CLI tool determines the timestamp by taking the current time and subtracting the requested lengths of time. One wrinkle: minutes are specified with -M, and all the time-range options are in caps so that -M doesn't clash with -m, which specifies the metastore URL.
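
To illustrate the timestamp arithmetic described above, here is a minimal sketch of how a cutoff could be derived from an age. It is not taken from the patch: the class name PruneCutoff, the method cutoffMillis, and the choice of days/hours/minutes/seconds as inputs are all assumptions made for the example.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not from the patch): turns an age into a prune cutoff. */
public class PruneCutoff {

  /**
   * Returns the cutoff in milliseconds since the epoch. Under this sketch's
   * assumption, entries with a modification time strictly less than the
   * returned value are the ones that would be pruned.
   */
  public static long cutoffMillis(long nowMillis, long days, long hours,
      long minutes, long seconds) {
    long ageMillis = TimeUnit.DAYS.toMillis(days)
        + TimeUnit.HOURS.toMillis(hours)
        + TimeUnit.MINUTES.toMillis(minutes)
        + TimeUnit.SECONDS.toMillis(seconds);
    return nowMillis - ageMillis;
  }

  public static void main(String[] args) {
    // Example: everything older than 1 day and 30 minutes.
    long cutoff = cutoffMillis(System.currentTimeMillis(), 1, 0, 30, 0);
    System.out.println("prune cutoff (ms since epoch): " + cutoff);
  }
}
```

The resulting value is what would be handed to prune(timestamp); passing the current time in as a parameter rather than reading the clock inside the method keeps the arithmetic easy to test.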

One thing that probably needs more work is what to do about directories. The local implementation deletes its record of a directory if all the files it tracks in that directory get pruned. I should at least do the equivalent for the DynamoDB implementation, but since there has been some special consideration for handling empty directories, that may warrant more thought. I know [~fabbri] has been thinking about the nuances of empty directories - any thoughts on that?
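
For discussion purposes, the directory behaviour described above can be sketched with a toy in-memory store. This is not the LocalMetadataStore code from the patch: ToyMetadataStore, its fields, and its methods are hypothetical, and it only considers a file's immediate parent directory.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory store; not the actual LocalMetadataStore. */
public class ToyMetadataStore {
  // File path -> modification time (ms since epoch).
  private final Map<String, Long> files = new HashMap<>();
  // Directories that currently have a record.
  private final Set<String> dirs = new HashSet<>();

  public void putFile(String path, long modTime) {
    files.put(path, modTime);
    dirs.add(parentOf(path));
  }

  public boolean hasFile(String path) { return files.containsKey(path); }

  public boolean hasDir(String path) { return dirs.contains(path); }

  /** Drops files older than the cutoff, then directories left with no tracked files. */
  public void prune(long cutoffMillis) {
    Set<String> touched = new HashSet<>();
    Iterator<Map.Entry<String, Long>> it = files.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, Long> e = it.next();
      if (e.getValue() < cutoffMillis) {
        touched.add(parentOf(e.getKey()));
        it.remove();
      }
    }
    // Only directories that lost a file can have become empty.
    for (String dir : touched) {
      boolean stillTracked = false;
      for (String f : files.keySet()) {
        if (parentOf(f).equals(dir)) {
          stillTracked = true;
          break;
        }
      }
      if (!stillTracked) {
        dirs.remove(dir);
      }
    }
  }

  private static String parentOf(String path) {
    int i = path.lastIndexOf('/');
    return i <= 0 ? "/" : path.substring(0, i);
  }

  public static void main(String[] args) {
    ToyMetadataStore store = new ToyMetadataStore();
    store.putFile("/a/old", 10L);
    store.putFile("/b/old", 10L);
    store.putFile("/b/new", 500L);
    store.prune(100L);
    System.out.println("/a kept: " + store.hasDir("/a"));
    System.out.println("/b kept: " + store.hasDir("/b"));
  }
}
```

Note that the sketch cannot tell "directory whose files were all pruned" apart from "directory known to be empty", which is the kind of nuance the empty-directory handling would need to settle.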

All tests pass except for the failures already documented in other JIRAs. For a time I did have a lot of tests fail at the assertion on type S3AFileStatus in PathMetadataDynamoDBTranslation.pathMetadataToItem. We do have a lot of instances of FileStatus (the parent class of S3AFileStatus) flying around S3Guard, so I'm surprised I don't hit it consistently, but today all the tests are passing. I can't see how anything I've changed while working on this patch would affect it, so I'm just throwing this out there in case others have seen it or have any insight.

> CLI command to prune old metadata
> ---------------------------------
>
>                 Key: HADOOP-14041
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14041
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Sean Mackrory
>            Assignee: Sean Mackrory
>         Attachments: HADOOP-14041-HADOOP-13345.001.patch
>
>
> Add a CLI command that allows users to specify an age at which to prune metadata that hasn't been modified for an extended period of time. Since the primary use-case targeted at the moment is list consistency, it would make sense (especially when authoritative=false) to prune metadata that is expected to have become consistent a long time ago.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
