Posted to common-issues@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2017/03/13 21:58:41 UTC

[jira] [Comment Edited] (HADOOP-13786) Add S3Guard committer for zero-rename commits to consistent S3 endpoints

    [ https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15923030#comment-15923030 ] 

Steve Loughran edited comment on HADOOP-13786 at 3/13/17 9:58 PM:
------------------------------------------------------------------

patch 012:
* UUID reinstated by default; you can turn this off.
* Almost all the unit tests are passing.
* The mocks in {{TestStagingPartitionedJobCommit}} and {{TestStagingDirectoryOutputCommitter}} are seeing an unexpected delete in their mock invocations (good? bad?), and because the staging committer doesn't yet handle directories, some of the protocol tests are failing.
* Added {{AbstractITCommitProtocol}} subclasses for the directory and partition committers.
* The protocol IT tests which fail in the job/commit failure cases do so because the tests aren't looking for the right exception.

One thing to highlight here is that when running these tests from my desktop, the staging commits seem to be faster than the magic ones. Why? There is a lot less S3 communication over a long-haul link during task setup/commit, and there's no real data to upload, so the cost of delaying the upload until the task commit is negligible.
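
To make that reasoning concrete, here is a back-of-envelope sketch. It is not taken from the patch: the class name {{CommitCostSketch}}, the request counts, the round-trip time and the bandwidth are all made-up assumptions for illustration, not measurements from these test runs.

```java
/** Back-of-envelope latency model. All names and numbers here are hypothetical. */
final class CommitCostSketch {

    private CommitCostSketch() {
    }

    /**
     * Estimated wall-clock time in milliseconds for one task:
     * every S3 request pays one round trip, plus the time to transfer the data.
     */
    public static long estimateMillis(int s3Requests, long roundTripMillis,
                                      long bytesToUpload, long bytesPerSecond) {
        long requestCost = s3Requests * roundTripMillis;
        long transferCost = bytesToUpload * 1000 / bytesPerSecond;
        return requestCost + transferCost;
    }

    public static void main(String[] args) {
        long rtt = 100;              // assumed long-haul round trip, in ms
        long bandwidth = 1_000_000;  // assumed uplink, in bytes/second
        long bytes = 0;              // the tests write no real data

        // Assumed request counts per task, purely illustrative, not measured.
        long magic = estimateMillis(12, rtt, bytes, bandwidth);
        long staging = estimateMillis(4, rtt, bytes, bandwidth);
        System.out.println("magic=" + magic + "ms staging=" + staging + "ms");
    }
}
```

With zero bytes to upload, the estimate reduces to request count times round-trip time, so whichever committer issues fewer S3 requests wins on a long-haul link. With real data the transfer term would grow and could change the picture.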


was (Author: stevel@apache.org):
patch 012:  
* mostly all the tests are passing, 
* the mocks in {{TestStagingPartitionedJobCommit}}, {{TestStagingDirectoryOutputCommitter}}  are having an unexpected delete in mock invocations (good? bad?), and because the staging committer doesn't yet handle directories, some of the protocol tests are failing.

One thing to highlight here is that when running these tests from my desktop, the staging commits seem to be faster than the magic ones. Why? A lot less S3 communication over a long haul link during task setup/commit, and there's no real data to upload, so the cost delaying the upload until the task commit is negligible.

> Add S3Guard committer for zero-rename commits to consistent S3 endpoints
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-13786
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13786
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/s3
>    Affects Versions: HADOOP-13345
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13786-HADOOP-13345-001.patch, HADOOP-13786-HADOOP-13345-002.patch, HADOOP-13786-HADOOP-13345-003.patch, HADOOP-13786-HADOOP-13345-004.patch, HADOOP-13786-HADOOP-13345-005.patch, HADOOP-13786-HADOOP-13345-006.patch, HADOOP-13786-HADOOP-13345-006.patch, HADOOP-13786-HADOOP-13345-007.patch, HADOOP-13786-HADOOP-13345-009.patch, HADOOP-13786-HADOOP-13345-010.patch, HADOOP-13786-HADOOP-13345-011.patch, HADOOP-13786-HADOOP-13345-012.patch, s3committer-master.zip
>
>
> A goal of this code is "support O(1) commits to S3 repositories in the presence of failures". Implement it, including whatever is needed to demonstrate the correctness of the algorithm (that is, assuming that s3guard provides a consistent view of the presence/absence of blobs, show that we can commit directly).
> I consider us free to expose the blobstore-ness of the s3 output streams (i.e. data is not visible until the close()), if we need to use that to allow us to abort commit operations.


