Posted to common-issues@hadoop.apache.org by "Steve Loughran (Jira)" <ji...@apache.org> on 2019/08/30 12:30:00 UTC

[jira] [Commented] (HADOOP-16540) Pluggable Filesystem Caching Support in FileSystem Class

    [ https://issues.apache.org/jira/browse/HADOOP-16540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919497#comment-16919497 ] 

Steve Loughran commented on HADOOP-16540:
-----------------------------------------

* It's (user, prefix, auth), not just prefix and auth; bear that in mind
* given your example use case of S3, I'd like to know a lot more about what you are considering here and why
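
To make the first point concrete, here is a minimal sketch of a cache keyed on (user, scheme, authority). It is an illustration only, not the actual FileSystem cache code: the class FsCacheKeySketch, its Key type and the get() helper are hypothetical names, and a plain Object stands in for a filesystem instance.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch; not Hadoop's actual cache implementation. */
class FsCacheKeySketch {

    /** Cache key of (user, scheme, authority). The path is deliberately ignored. */
    static final class Key {
        final String user;
        final String scheme;
        final String authority;

        Key(String user, URI uri) {
            this.user = user;
            this.scheme = uri.getScheme() == null
                ? "" : uri.getScheme().toLowerCase(Locale.ROOT);
            this.authority = uri.getAuthority() == null
                ? "" : uri.getAuthority().toLowerCase(Locale.ROOT);
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof Key)) {
                return false;
            }
            Key other = (Key) o;
            return user.equals(other.user)
                && scheme.equals(other.scheme)
                && authority.equals(other.authority);
        }

        @Override
        public int hashCode() {
            return Objects.hash(user, scheme, authority);
        }
    }

    /** Stand-in cache: one placeholder object per distinct key. */
    static final Map<Key, Object> CACHE = new HashMap<>();

    /** Returns the cached instance for (user, uri), creating it on first use. */
    static Object get(String user, URI uri) {
        return CACHE.computeIfAbsent(new Key(user, uri), k -> new Object());
    }

    public static void main(String[] args) {
        URI a = URI.create("s3a://bucket/data/a");
        URI b = URI.create("s3a://bucket/logs/b");
        // Same user, same bucket, different prefixes: one shared instance.
        System.out.println(get("alice", a) == get("alice", b)); // true
        // Different users on the same path: separate instances.
        System.out.println(get("alice", a) == get("bob", a));   // false
    }
}
```

Under these assumptions, caching at prefix level would mean adding some part of the path to the key, while the user component still has to stay in it.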

S3A FS instances are fairly expensive: thread and HTTP pools, DynamoDB pools, AWS transfer managers... You don't want to have more than one per bucket if you can avoid it. It may be better to support some tuning within the store, as HADOOP-16396 did for S3Guard authoritative mode.

That leaves different user credentials as the main justification, or similar things like encryption keys to use on different paths. True? Or maybe seek policies?

If so, it'll be fun trying to work out how to deal with operations which span paths.

All work has to be against hadoop trunk; you'll also need to make sure that it works with delegation tokens for job submit, including S3A DTs. That is non-trivial, as it is another place which uses (token identifier + FS URI) as the map key. Only one DT per bucket is going to be collected or provided, regardless of how many FS instances are in the cache. So please, get familiar with that code before starting to do things with fairly major implications.
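
A rough sketch of the delegation token concern, assuming tokens are looked up by a name derived from the filesystem URI alone. The names TokenPerBucketSketch, serviceName() and collectTokens() are hypothetical, strings stand in for real tokens, and the token identifier part of the key is left out for brevity; this is not the real token collection code.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of token collection keyed by filesystem URI. */
class TokenPerBucketSketch {

    /** Assumed service name: scheme and authority only, with no path component. */
    static String serviceName(URI path) {
        return path.getScheme() + "://" + path.getAuthority();
    }

    /**
     * Collects at most one placeholder token per service name. Later
     * prefix-level roots that map to an already-seen bucket are skipped.
     */
    static Map<String, String> collectTokens(URI... fsRoots) {
        Map<String, String> tokens = new LinkedHashMap<>();
        for (URI root : fsRoots) {
            tokens.putIfAbsent(serviceName(root), "token-for-" + root);
        }
        return tokens;
    }

    public static void main(String[] args) {
        Map<String, String> tokens = collectTokens(
            URI.create("s3a://bucket/data/"),
            URI.create("s3a://bucket/logs/"),
            URI.create("s3a://other/"));
        // Two prefix-level roots in "bucket" collapse to a single entry.
        System.out.println(tokens.size()); // 2
        System.out.println(tokens);
    }
}
```

In this model the credentials configured for the second prefix never produce a token of their own, which is the mismatch a prefix-level cache would have to resolve.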

> Pluggable Filesystem Caching Support in FileSystem Class
> --------------------------------------------------------
>
>                 Key: HADOOP-16540
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16540
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs
>    Affects Versions: 3.3.0
>            Reporter: Arun Ravi M V
>            Priority: Major
>
> Provide an option to use a custom cache class in the FileSystem class. Currently, caching is enabled by default and uses the URI scheme and authority to decide whether to create a new FS instance for a given URI or to fetch an existing one from the cache.
> In the case of the AWS S3 FS implementation, the authority of an S3 path is the bucket name, i.e. the FileSystem object is cached at the bucket level. Custom caching logic would let the user cache at some prefix level instead and give more flexibility.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
