Posted to common-issues@hadoop.apache.org by "Aaron Fabbri (JIRA)" <ji...@apache.org> on 2016/10/10 22:48:20 UTC

[jira] [Comment Edited] (HADOOP-13651) S3Guard: S3AFileSystem Integration with MetadataStore

    [ https://issues.apache.org/jira/browse/HADOOP-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563782#comment-15563782 ] 

Aaron Fabbri edited comment on HADOOP-13651 at 10/10/16 10:47 PM:
------------------------------------------------------------------

Minor status update, since this JIRA has a long gestation period. I'm working on this now.  So far I have code for:

- New config values: {{fs.s3a.metadatastore.authoritative}} and {{fs.s3a.metadatastore.impl}}.
- getFileStatus()
- listStatus()
- rename()
- delete()
- mkdirs()
- copyFromLocalFile()
- copyFile()

What remains for this jira:
- create().  Figuring out the OutputStream plumbing now.
- More testing.

What I'd like to do as separate jiras (because I favor smaller code reviews):
- Delete tracking
- Retries (i.e. eventual consistency retry policy).  Would love to see this in isolation since it is non-trivial.

I'm inserting TODO comments as I go at key locations for those two items.

Interesting things about my approach so far:

I'm trying to minimize changes to {{S3AFileSystem}}:
   - diff stat so far: {quote}
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java                   | 116 ++++++++++++++++++++++++++++++------
{quote}
   - I introduce a "metadatastore s3a helper/glue" class, {{S3Guard}}, which so far is a collection of static helper functions.
   - I introduce {{NullMetadataStore}}, which is a no-op metadata store.  The goal was to simplify the S3AFileSystem changes (always call the MetadataStore, without caring whether it is a no-op), but I also like that it further clarifies {{MetadataStore}} semantics.  It turns out S3AFileSystem still sometimes wants to know whether there is no MetadataStore, so it can avoid allocating things that aren't needed.  Seems like an OK tradeoff, but I'll let folks comment when I post the v1 patch.
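
To illustrate the no-op store idea, here is a minimal sketch of the null-object pattern.  The interface and every name in it ({{SimpleMetadataStore}}, {{put}}, {{get}}, {{isNullStore}}, and the classes around them) are hypothetical stand-ins, not the actual {{MetadataStore}} API from the patch, and paths and metadata are plain strings just to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for MetadataStore; not the real interface.
interface SimpleMetadataStore {
  void put(String path, String metadata);
  String get(String path);   // returns null when the path is unknown
  boolean isNullStore();     // lets callers skip allocations they don't need
}

// No-op implementation: remembers nothing, so every lookup is a miss.
class NullStoreSketch implements SimpleMetadataStore {
  public void put(String path, String metadata) { }
  public String get(String path) { return null; }
  public boolean isNullStore() { return true; }
}

// In-memory implementation, shown only for contrast.
class MapStoreSketch implements SimpleMetadataStore {
  private final Map<String, String> entries = new HashMap<>();
  public void put(String path, String metadata) { entries.put(path, metadata); }
  public String get(String path) { return entries.get(path); }
  public boolean isNullStore() { return false; }
}

// Caller side: always talks to a store and never checks whether one is configured.
class FileSystemSketch {
  private final SimpleMetadataStore store;

  FileSystemSketch(SimpleMetadataStore store) {
    this.store = store;
  }

  String getFileStatus(String path) {
    String cached = store.get(path);
    if (cached != null) {
      return cached;
    }
    // Placeholder for the real S3 lookup (assumption for this sketch).
    String fromS3 = "status-from-s3:" + path;
    store.put(path, fromS3);
    return fromS3;
  }
}
```

With the no-op store the caller falls through to the S3 lookup on every call; with a real store the second call for the same path is served from the store.  The {{isNullStore()}} check is one possible way to cover the "avoid allocating stuff" case mentioned above.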

I'm trying to keep PathMetadata simple: either you have a PathMetadata, including an S3AFileStatus, or you don't.  There are some spots where it would be convenient to just record "this path exists, but we don't have metadata yet" (e.g. create() -> OutputStream.close() -> S3AFileSystem.writeFinished(), at which point I don't have a FileStatus), but that would complicate the S3AFileSystem logic.  We'll see.
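
A rough sketch of that "all or nothing" rule, under heavy assumptions: {{StatusSketch}}, {{PathMetadataSketch}} and {{PathTableSketch}} are hypothetical names invented for illustration, and the status is reduced to two fields rather than the real {{S3AFileStatus}}.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for S3AFileStatus, reduced to two fields.
final class StatusSketch {
  final long length;
  final boolean isDirectory;

  StatusSketch(long length, boolean isDirectory) {
    this.length = length;
    this.isDirectory = isDirectory;
  }
}

// An entry cannot exist without a status, so there is no
// "path exists, metadata pending" state to handle.
final class PathMetadataSketch {
  private final StatusSketch status;

  PathMetadataSketch(StatusSketch status) {
    this.status = Objects.requireNonNull(status, "status");
  }

  StatusSketch getStatus() {
    return status;
  }
}

// Lookup table keyed by path; a miss is reported as null.
final class PathTableSketch {
  private final Map<String, PathMetadataSketch> entries = new HashMap<>();

  void put(String path, PathMetadataSketch meta) {
    entries.put(path, meta);
  }

  PathMetadataSketch get(String path) {
    return entries.get(path);
  }
}
```

The cost of this simplicity is the case described above: a writer that knows a path exists but has no status yet has nothing it can record.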





> S3Guard: S3AFileSystem Integration with MetadataStore
> -----------------------------------------------------
>
>                 Key: HADOOP-13651
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13651
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Aaron Fabbri
>            Assignee: Aaron Fabbri
>
> Modify S3AFileSystem et al. to optionally use a MetadataStore for metadata consistency and caching.
> Implementation should have minimal overhead when no MetadataStore is configured.


