Posted to common-issues@hadoop.apache.org by "Chris Nauroth (JIRA)" <ji...@apache.org> on 2016/08/10 15:58:20 UTC

[jira] [Comment Edited] (HADOOP-13447) S3Guard: Refactor S3AFileSystem to support introduction of separate metadata repository and tests.

    [ https://issues.apache.org/jira/browse/HADOOP-13447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415482#comment-15415482 ] 

Chris Nauroth edited comment on HADOOP-13447 at 8/10/16 3:57 PM:
-----------------------------------------------------------------

I'm attaching patch v001 to demonstrate what I have in mind.  The test code refactoring in HADOOP-13446 is a prerequisite for this patch.

There are at least 2 more things I want to do with this patch before it's ready:

# I want to write a true unit test that mocks S3 client interactions, to prove that the patch does in fact set us up to be able to mock the S3 calls effectively (and therefore simulate eventual consistency).
# I have introduced a test failure in {{ITestS3AFileContextStatistics#testStatistics}}.  The root cause is that the handling of {{FileSystem.Statistics}} through {{DelegateToFileSystem}} is a bit funky in terms of the scope/lifetime of that stats instance.  I haven't found the best fix yet, though.  All other existing tests are passing.
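
To make the first item concrete, here is a minimal sketch of how a fake client could simulate eventual consistency deterministically.  It is not taken from the patch.  {{ObjectClient}} and {{DelayedVisibilityClient}} are hypothetical stand-ins, not AWS SDK or S3A classes, and the model is deliberately simplified: it only delays read-after-write visibility by a fixed number of reads.

```java
import java.util.HashMap;
import java.util.Map;

public class EventualConsistencySketch {

  /** Hypothetical stand-in for the small slice of an S3 client that a test needs. */
  interface ObjectClient {
    void put(String key, String value);

    /** Returns null when the key is absent or not yet visible. */
    String get(String key);
  }

  /** Fake client: a written key stays invisible for a fixed number of reads. */
  static final class DelayedVisibilityClient implements ObjectClient {
    private final int staleReads;
    private final Map<String, String> data = new HashMap<>();
    private final Map<String, Integer> remainingStale = new HashMap<>();

    DelayedVisibilityClient(int staleReads) {
      this.staleReads = staleReads;
    }

    @Override
    public void put(String key, String value) {
      data.put(key, value);
      // Every write resets the countdown, so the delay is fully deterministic.
      remainingStale.put(key, staleReads);
    }

    @Override
    public String get(String key) {
      int left = remainingStale.getOrDefault(key, 0);
      if (left > 0) {
        // Still inside the simulated inconsistency window: hide the object.
        remainingStale.put(key, left - 1);
        return null;
      }
      return data.get(key);
    }
  }

  public static void main(String[] args) {
    ObjectClient client = new DelayedVisibilityClient(2);
    client.put("dir/file", "contents");
    System.out.println(client.get("dir/file")); // stale read
    System.out.println(client.get("dir/file")); // stale read
    System.out.println(client.get("dir/file")); // object is now visible
  }
}
```

Because the delay is counted in reads rather than wall-clock time, a test built on a fake like this needs no sleeps or retries to exercise the inconsistent window.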

Here is a summary of changes broken down by significant classes:
* {{S3AFileSystem}}: This is now a much smaller class.  It will be responsible for initializing an {{S3Store}}, which encapsulates the S3 calls, and a concrete subclass of {{AbstractS3AccessPolicy}}, which will control how client calls coordinate with S3 and optionally other external metadata repositories.
* {{S3ClientFactory}}: This is a factory for construction of the S3 client instance.  Note that its return type is defined as {{AmazonS3}} (an interface from the AWS SDK), not {{AmazonS3Client}} (the concrete implementation that issues HTTP calls to the S3 back-end).  This is the indirection that will allow us to mock the S3 calls.  Tests will be able to configure a different factory to return a mock client.  The default implementation is {{DefaultS3ClientFactory}}, and all pre-existing configuration logic related to the S3 client has moved here.
* {{S3Store}}: Much of the existing code of {{S3AFileSystem}} has moved here.  This class encapsulates how client calls translate to S3 calls.  This layer uses {{Configuration}} to look up the desired {{S3ClientFactory}} implementation.
* {{AbstractS3AccessPolicy}} / {{DirectS3AccessPolicy}}: Policy classes define how client calls coordinate between S3 calls (the {{S3Store}}) and optionally other external metadata repositories.  Currently, the only concrete implementation just delegates directly to S3, which provides the same semantics as the existing S3A codebase.  The scope of the various "implement access policy" sub-tasks is to implement other sub-classes that provide different semantics: caching, cross-validation for strong consistency, etc.
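
The layering described above can be sketched roughly as follows.  This is an illustration under stated assumptions, not the code in the patch: every type here is a simplified, hypothetical stand-in (a plain {{Map}} replaces Hadoop's {{Configuration}}, {{ObjectStoreClient}} replaces the SDK's {{AmazonS3}} interface, and the configuration key is invented for the example).

```java
import java.util.HashMap;
import java.util.Map;

public class S3RefactoringSketch {

  /** Stand-in for the AmazonS3 interface (one illustrative method only). */
  interface ObjectStoreClient {
    boolean exists(String key);
  }

  /** Stand-in for S3ClientFactory: returns the interface, never a concrete client. */
  interface ClientFactory {
    ObjectStoreClient createClient();
  }

  /** Stand-in for DefaultS3ClientFactory; a real one would build an HTTP-backed client. */
  public static final class DefaultClientFactory implements ClientFactory {
    @Override
    public ObjectStoreClient createClient() {
      return key -> {
        throw new UnsupportedOperationException("no network access in this sketch");
      };
    }
  }

  /** Factory that a test could configure in order to inject a mock client. */
  public static final class MockClientFactory implements ClientFactory {
    @Override
    public ObjectStoreClient createClient() {
      return key -> key.startsWith("present/");
    }
  }

  /** Stand-in for S3Store: looks up the factory class from configuration. */
  static final class Store {
    static final String FACTORY_KEY = "sketch.client.factory"; // hypothetical key
    private final ObjectStoreClient client;

    Store(Map<String, String> conf) {
      String name = conf.getOrDefault(FACTORY_KEY, DefaultClientFactory.class.getName());
      try {
        ClientFactory factory =
            (ClientFactory) Class.forName(name).getDeclaredConstructor().newInstance();
        this.client = factory.createClient();
      } catch (ReflectiveOperationException e) {
        throw new IllegalArgumentException("Cannot instantiate factory " + name, e);
      }
    }

    boolean exists(String key) {
      return client.exists(key);
    }
  }

  /** Stand-in for AbstractS3AccessPolicy. */
  abstract static class AccessPolicy {
    protected final Store store;

    AccessPolicy(Store store) {
      this.store = store;
    }

    abstract boolean exists(String key);
  }

  /** Stand-in for DirectS3AccessPolicy: delegates straight to the store. */
  static final class DirectAccessPolicy extends AccessPolicy {
    DirectAccessPolicy(Store store) {
      super(store);
    }

    @Override
    boolean exists(String key) {
      return store.exists(key);
    }
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(Store.FACTORY_KEY, MockClientFactory.class.getName());
    AccessPolicy policy = new DirectAccessPolicy(new Store(conf));
    System.out.println(policy.exists("present/a")); // true
    System.out.println(policy.exists("missing/b")); // false
  }
}
```

The point of the sketch is that neither the store nor the policy names a concrete client class, so swapping the configured factory is enough to run the whole stack against a mock.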



> S3Guard: Refactor S3AFileSystem to support introduction of separate metadata repository and tests.
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-13447
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13447
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>         Attachments: HADOOP-13447-HADOOP-13446.001.patch
>
>
> The scope of this issue is to refactor the existing {{S3AFileSystem}} into multiple coordinating classes.  The goal of this refactoring is to separate the {{FileSystem}} API binding from the AWS SDK integration, make code maintenance easier while we're making changes for S3Guard, and make it easier to mock some implementation details so that tests can simulate eventual consistency behavior in a deterministic way.


