Posted to common-issues@hadoop.apache.org by GitBox <gi...@apache.org> on 2019/06/12 09:04:52 UTC

[GitHub] [hadoop] steveloughran opened a new pull request #951: HADOOP-15183. S3Guard store becomes inconsistent after partial failure of rename

URL: https://github.com/apache/hadoop/pull/951
 
 
   
   Contributed by Steve Loughran.
   
   This is the squashed patch of PR #843 commit 115fb770
   
   Contains:
   
   * HADOOP-13936. S3Guard: DynamoDB can go out of sync with S3AFileSystem.delete()
   
   * HADOOP-15604. Bulk commits of S3A MPUs place needless excessive load on S3 & S3Guard
   
   * HADOOP-15658. Memory leak in S3AOutputStream
   
   * HADOOP-16364. S3Guard table destroy to map IllegalArgumentExceptions to IOEs
   
   This work adds two concepts to the S3Guard Metastore APIs:
   
   * the notion of a "BulkOperation" : A store-specific class which is requested before initiating bulk work (put, purge, rename) and which then can be used to cache table changes performed during the bulk operation. This allows for renames and commit operations to avoid duplicate creation of parent entries in the tree: the store can track what is already created/found.
   
   * the notion of a "RenameTracker" which factors out the task of updating a metastore with changes to the filesystem during a rename, (files added + deleted) and after the completion of the operation, successful or not.
   
   The original rename update, the one which did not update the store until the end of the rename, is implemented as the DelayedUpdateRenameTracker, while a new ProgressiveRenameTracker updates the store as individual files are copied and as bulk deletes complete. To avoid performance problems, stores must provide a BulkOperation implementation which remembers the ancestors already added; the DynamoDBMetadataStore does this.
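
   The two tracker strategies can be contrasted with a small sketch. The
   class and method names below are illustrative; the real RenameTracker
   API in Hadoop differs in its signatures and callbacks.

   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.List;

   abstract class RenameTrackerSketch {
     /** Called after each individual file copy succeeds. */
     abstract void fileCopied(String src, String dest);
     /** Called when a batch of source objects has been deleted. */
     abstract void sourceObjectsDeleted(List<String> paths);
     /** Called once, after the rename finishes or fails. */
     abstract void completeRename(boolean succeeded);
   }

   /** Delayed strategy: buffer all changes, update the store only at the end.
    *  If the rename fails partway, the store never learns of the copies. */
   class DelayedTrackerSketch extends RenameTrackerSketch {
     private final List<String[]> copies = new ArrayList<>();
     private final List<String> deletes = new ArrayList<>();

     @Override void fileCopied(String src, String dest) {
       copies.add(new String[] {src, dest});
     }
     @Override void sourceObjectsDeleted(List<String> paths) {
       deletes.addAll(paths);
     }
     @Override void completeRename(boolean succeeded) {
       if (succeeded) {
         System.out.println("final update: " + copies.size() + " adds, "
             + deletes.size() + " deletes");
       }
     }
   }

   /** Progressive strategy: update the store as each change happens, so a
    *  partial failure still leaves every completed copy recorded. */
   class ProgressiveTrackerSketch extends RenameTrackerSketch {
     @Override void fileCopied(String src, String dest) {
       System.out.println("store.put(" + dest + ")");
     }
     @Override void sourceObjectsDeleted(List<String> paths) {
       paths.forEach(p -> System.out.println("store.delete(" + p + ")"));
     }
     @Override void completeRename(boolean succeeded) {
       // Nothing buffered: the store is already up to date.
     }
   }

   class RenameDemo {
     public static void main(String[] args) {
       RenameTrackerSketch tracker = new ProgressiveTrackerSketch();
       tracker.fileCopied("/src/a", "/dest/a");
       tracker.sourceObjectsDeleted(Arrays.asList("/src/a"));
       tracker.completeRename(true);
     }
   }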
   
   Some of the new features are implemented as part of a gradual refactoring of the S3AFileSystem itself: the handling of partial delete failures moves into its own class, org.apache.hadoop.fs.s3a.impl.MultiObjectDeleteSupport, which, rather than being given a reference back to the owning S3AFileSystem, is handed a StoreContext containing restricted attributes and callbacks. As this refactoring continues in future patches and the different layers of a new store model are factored out, this will be extended.
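
   A sketch of this "narrow context" pattern, assuming hypothetical names
   (StoreContextSketch, pathForKey); the real StoreContext carries more
   attributes and callbacks than shown here.

   import java.util.function.Function;

   /** A small value object handed to helper classes instead of the FS. */
   final class StoreContextSketch {
     private final String bucket;
     private final Function<String, String> keyToPath; // restricted callback

     StoreContextSketch(String bucket, Function<String, String> keyToPath) {
       this.bucket = bucket;
       this.keyToPath = keyToPath;
     }

     String getBucket() { return bucket; }
     String pathForKey(String key) { return keyToPath.apply(key); }
   }

   /** A helper in the spirit of MultiObjectDeleteSupport: it sees only the
    *  context, which keeps dependencies explicit and eases unit testing. */
   class DeleteSupportSketch {
     private final StoreContextSketch context;

     DeleteSupportSketch(StoreContextSketch context) { this.context = context; }

     void reportUndeleted(String key) {
       System.out.println("undeleted: " + context.pathForKey(key)
           + " in bucket " + context.getBucket());
     }

     public static void main(String[] args) {
       StoreContextSketch ctx = new StoreContextSketch("example-bucket",
           k -> "s3a://example-bucket/" + k);
       new DeleteSupportSketch(ctx).reportUndeleted("dir/file");
     }
   }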
   
   Change-Id: Ie0bd96ab861f0f30170b75f78e5503fc0e929524
