You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-issues@hadoop.apache.org by "Sean Mackrory (JIRA)" <ji...@apache.org> on 2017/03/02 23:46:45 UTC

[jira] [Commented] (HADOOP-13914) s3guard: improve S3AFileStatus#isEmptyDirectory handling

    [ https://issues.apache.org/jira/browse/HADOOP-13914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15893273#comment-15893273 ] 

Sean Mackrory commented on HADOOP-13914:
----------------------------------------

Already discussed this a bit with you while reviewing, but sharing all thoughts for everyone else's sake:
* In case other reviewers get confused by it, the hints from Git in each hunk about which function a change is in are misleading S3AFileSystem, and in the places where putAndReturn is removed it is only removed in the S3-only version of the function that does no metadatastore operation. putAndReturn now happens higher in the callstack only when there is a metadatastore context.
* I'm a bit concerned about all the S3AFileStatus -> FileStatus changes (although I could've sworn that we already removed that assertion in PathMetadataTranslation...). I think it's definitely a change for the better, but I'm a little worried there is some application out there that this change will break, despite being @Private and @Evolving... Let's definitely at least call this out in a release note or something.
* Along similar lines, I wondered if a `public S3AFileStatus getFileStatus(Path, bool)` function might be in order in S3AFileSystem? No idea how much it'll get used, but if anyone IS depending on the current S3AFileStatus return type and wants that work done it'd be useful.
* I also thought about a S3AFileStatus.setIsEmptyDirectory(bool) wrapper to avoid any unnecessary change in compatibility, but again, very unlikely to be used publicly, so feel free to ignore. isEmptyDirectory is probably going break the same set of applications that might use it anyway (which for all I know is an empty set), so no perfectly compatible way to do this anyway...
* JavaDoc comment on ITestSGuardEmptyDirs recommends a refactoring after HADOOP-13345 is merged - we should file a JIRA dependent on HADOOP-13345 for that when this is committed.
* There's an assertion in TestDynamoDBMetadataStore.java that isEmptyDirectory() is what we expect. Why is that removed? Is it a Boolean -> Tristate issue? If so, shouldn't we modify the logic to accept either the expected one or UNKNOWN?
* +1 to ignoring the findbugs issue that way.
Tests are running now - almost done and no related failures, will report back if the end result is otherwise.

> s3guard: improve S3AFileStatus#isEmptyDirectory handling
> --------------------------------------------------------
>
>                 Key: HADOOP-13914
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13914
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: HADOOP-13345
>            Reporter: Aaron Fabbri
>            Assignee: Aaron Fabbri
>         Attachments: HADOOP-13914-HADOOP-13345.000.patch, HADOOP-13914-HADOOP-13345.002.patch, HADOOP-13914-HADOOP-13345.003.patch, s3guard-empty-dirs.md, test-only-HADOOP-13914.patch
>
>
> As discussed in HADOOP-13449, proper support for the isEmptyDirectory() flag stored in S3AFileStatus is missing from DynamoDBMetadataStore.
> The approach taken by LocalMetadataStore is not suitable for the DynamoDB implementation, and also sacrifices good code separation to minimize S3AFileSystem changes pre-merge to trunk.
> I will attach a design doc that attempts to clearly explain the problem and preferred solution.  I suggest we do this work after merging the HADOOP-13345 branch to trunk, but am open to suggestions.
> I can also attach a patch of a integration test that exercises the missing case and demonstrates a failure with DynamoDBMetadataStore.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org