Posted to common-issues@hadoop.apache.org by "Aaron Fabbri (JIRA)" <ji...@apache.org> on 2016/10/10 22:48:20 UTC
[jira] [Comment Edited] (HADOOP-13651) S3Guard: S3AFileSystem Integration with MetadataStore
[ https://issues.apache.org/jira/browse/HADOOP-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563782#comment-15563782 ]
Aaron Fabbri edited comment on HADOOP-13651 at 10/10/16 10:47 PM:
------------------------------------------------------------------
Minor status update, since this JIRA has a long gestation period. I'm working on this now. So far I have code for:
- New config values: {{fs.s3a.metadatastore.authoritative}} and {{fs.s3a.metadatastore.impl}}.
- getFileStatus()
- listStatus()
- rename()
- delete()
- mkdirs()
- copyFromLocalFile()
- copyFile()
What remains for this jira:
- create(). Figuring out the OutputStream plumbing now.
- More testing.
What I'd like to do as separate jiras (because I favor smaller code reviews):
- Delete tracking
- Retries (i.e. eventual consistency retry policy). Would love to see this in isolation since it is non-trivial.
I'm inserting TODO comments as I go at key locations for those two items.
Interesting things about my approach so far:
I'm trying to minimize changes to {{S3AFileSystem}}
- diff stat so far: {quote}
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java | 116 ++++++++++++++++++++++++++++++------
{quote}
- I introduce a "metadatastore s3a helper/glue" class, {{S3Guard}}, which so far is just a bunch of static helper functions.
- I introduce {{NullMetadataStore}}, which is a no-op metadata store. The goal was to simplify the {{S3AFileSystem}} changes (always call the MetadataStore, don't care if it is a no-op), but I also like that it further clarifies {{MetadataStore}} semantics. It turns out S3AFileSystem still sometimes wants to know whether there is no MetadataStore, to avoid allocating stuff that isn't needed. Seems like an OK tradeoff, but I'll let folks comment when I post the v1 patch.
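To illustrate the no-op store idea, here is a rough sketch and not the patch itself. The {{MetadataStore}} interface below is a heavily simplified, hypothetical stand-in (the real one will differ), and {{InMemoryMetadataStore}}, {{SketchFileSystem}}, {{getLength()}} and {{hasRealStore()}} are names made up for this example. Only {{NullMetadataStore}} and {{writeFinished()}} are names taken from the notes above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, heavily simplified stand-in for the MetadataStore interface.
interface MetadataStore {
    void put(String path, long length);
    Long get(String path);   // null means "no entry for this path"
}

// No-op store: accepts every call and remembers nothing.
final class NullMetadataStore implements MetadataStore {
    @Override public void put(String path, long length) { /* intentionally empty */ }
    @Override public Long get(String path) { return null; }
}

// Trivial in-memory store, included only to contrast with the no-op one.
final class InMemoryMetadataStore implements MetadataStore {
    private final Map<String, Long> entries = new HashMap<>();
    @Override public void put(String path, long length) { entries.put(path, length); }
    @Override public Long get(String path) { return entries.get(path); }
}

// Stand-in for the filesystem: it always calls the store and never null-checks it.
final class SketchFileSystem {
    private final MetadataStore store;
    int remoteLookups = 0;   // counts simulated round trips to S3

    SketchFileSystem(MetadataStore store) { this.store = store; }

    // Same call regardless of which store is configured.
    void writeFinished(String path, long length) {
        store.put(path, length);
    }

    // Consult the store first; fall back to the (simulated) S3 answer.
    long getLength(String path, long lengthInS3) {
        Long cached = store.get(path);
        if (cached != null) {
            return cached;
        }
        remoteLookups++;
        return lengthInS3;
    }

    // Assumption: one possible way for the caller to find out that no real
    // store is configured, e.g. to skip allocating things it will not need.
    boolean hasRealStore() {
        return !(store instanceof NullMetadataStore);
    }
}
```

With {{new SketchFileSystem(new NullMetadataStore())}} every {{getLength()}} call falls through to the simulated S3 lookup, while the in-memory store answers from its map after {{writeFinished()}}. The calling code is identical in both cases, which is the simplification described above.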
I'm trying to keep PathMetadata simple: either you have a PathMetadata, including an S3AFileStatus, or you don't. There are some spots where it would be convenient to just record "this path exists, but we don't have metadata yet" (e.g. create() -> OutputStream.close() -> S3AFileSystem.writeFinished(); at that point I don't have a FileStatus), but that would complicate S3AFileSystem logic. We'll see.
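As a rough illustration of the "either you have a PathMetadata or you don't" rule, here is a minimal sketch. All three class names ({{FileStatusSketch}}, {{PathMetadataSketch}}, {{StoreSketch}}) are hypothetical stand-ins invented for this example, not the classes in the patch. The assumption is that a missing entry is signalled by {{null}} and that a PathMetadata cannot be built without a status.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for S3AFileStatus.
final class FileStatusSketch {
    final String path;
    final long length;
    final boolean isDirectory;

    FileStatusSketch(String path, long length, boolean isDirectory) {
        this.path = Objects.requireNonNull(path, "path is required");
        this.length = length;
        this.isDirectory = isDirectory;
    }
}

// A PathMetadata always carries a status, so "exists, but metadata
// unknown" cannot be expressed with this type.
final class PathMetadataSketch {
    private final FileStatusSketch status;

    PathMetadataSketch(FileStatusSketch status) {
        this.status = Objects.requireNonNull(status, "status is required");
    }

    FileStatusSketch getStatus() { return status; }
}

// Minimal store: an entry is either fully present or absent.
final class StoreSketch {
    private final Map<String, PathMetadataSketch> entries = new HashMap<>();

    void put(PathMetadataSketch meta) {
        entries.put(meta.getStatus().path, meta);
    }

    // Returns null when the store holds no metadata for the path.
    PathMetadataSketch get(String path) {
        return entries.get(path);
    }
}
```

Under this shape a caller only has two cases to handle after {{get()}}. Supporting "exists, but no status yet" would add a third state that every caller has to branch on, which is the complication mentioned above.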
> S3Guard: S3AFileSystem Integration with MetadataStore
> -----------------------------------------------------
>
> Key: HADOOP-13651
> URL: https://issues.apache.org/jira/browse/HADOOP-13651
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Reporter: Aaron Fabbri
> Assignee: Aaron Fabbri
>
> Modify S3AFileSystem et al. to optionally use a MetadataStore for metadata consistency and caching.
> Implementation should have minimal overhead when no MetadataStore is configured.