Posted to common-issues@hadoop.apache.org by GitBox <gi...@apache.org> on 2019/08/27 22:15:03 UTC

[GitHub] [hadoop] steveloughran opened a new pull request #1359: HADOOP-16430.S3AFilesystem.delete to incrementally update s3guard with deletions

URL: https://github.com/apache/hadoop/pull/1359
 
 
   This started off as a performance improvement for very large directory trees: updating the S3Guard metastore table incrementally, as each page of files is deleted.
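A minimal Java sketch of the incremental idea follows. `ObjectStore`, `MetadataStore` and `PagedDelete` are hypothetical stand-ins, not the S3A or S3Guard APIs. The only external fact it relies on is that an S3 multi-object delete request accepts at most 1000 keys. The metastore is updated after every page rather than once at the end, so if a very large delete fails part-way through, the table already records every page that was actually removed.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical view of the object store: deletes one batch of keys in a single bulk request. */
interface ObjectStore {
  void deleteObjects(List<String> keys);
}

/** Hypothetical view of the S3Guard metastore: records that these keys have been deleted. */
interface MetadataStore {
  void deletePaths(Collection<String> keys);
}

final class PagedDelete {
  // An S3 multi-object delete request accepts at most 1000 keys.
  static final int MAX_PAGE_SIZE = 1000;

  private PagedDelete() {
  }

  /**
   * Deletes the keys page by page, updating the metastore after each page
   * instead of once at the end. Returns the number of pages issued.
   */
  static int deleteInPages(List<String> keys, int pageSize,
      ObjectStore store, MetadataStore metastore) {
    if (pageSize < 1 || pageSize > MAX_PAGE_SIZE) {
      throw new IllegalArgumentException("page size out of range: " + pageSize);
    }
    int pages = 0;
    for (int start = 0; start < keys.size(); start += pageSize) {
      // Copy the page so callers may retain it after the source list changes.
      List<String> page = new ArrayList<>(
          keys.subList(start, Math.min(start + pageSize, keys.size())));
      store.deleteObjects(page);   // delete this page from the store first
      metastore.deletePaths(page); // then record exactly those deletions
      pages++;
    }
    return pages;
  }
}
```

With 25 keys and a page size of 10, the store and metastore calls interleave as 10, 10, 5, so the table never lags the bucket by more than one page.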
   
   But while coding it, I discovered that although delete() updated S3Guard with its changes, *it never used S3Guard to get a consistent directory listing*.
   
   That means recursive deletes of a directory tree could be inconsistent, which is bad. This PR:
   
   * Moves the delete code into its own DeleteOperation class, alongside the rename operation.
   * Has both operations share the same callbacks, which expose only the limited set of methods they can invoke on the S3A FS.
   * Uses the S3Guard-enabled listing to scan the directory tree, without using tombstones to invalidate entries, so that it recovers from out-of-band (OOB) changes.
   * In authoritative mode, does a final raw S3 scan and cleanup, so that even if the metastore is inconsistent with S3, no objects are left behind.
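The structure these bullets describe can be sketched in Java. This is a hypothetical illustration, not the PR's code: `DeleteCallbacks`, `SketchDeleteOperation` and their methods are invented names. The operation depends only on a narrow callback interface rather than on the whole filesystem. It lists the tree through the S3Guard-aware listing, and in authoritative mode it finishes with a raw S3 listing, deleting anything the metastore did not know about.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical narrow callback interface: the only FS operations a delete may invoke. */
interface DeleteCallbacks {
  /** Lists all file keys under a prefix through the S3Guard-aware listing. */
  List<String> listViaMetastore(String prefix);

  /** Lists all file keys under a prefix directly from S3, bypassing S3Guard. */
  List<String> listRaw(String prefix);

  /** Bulk-deletes the keys from S3 and records the deletions in the metastore. */
  void removeKeys(List<String> keys);
}

final class SketchDeleteOperation {
  private final DeleteCallbacks callbacks;
  private final boolean authoritative;
  private final int pageSize;

  SketchDeleteOperation(DeleteCallbacks callbacks, boolean authoritative, int pageSize) {
    if (pageSize < 1) {
      throw new IllegalArgumentException("page size must be positive: " + pageSize);
    }
    this.callbacks = callbacks;
    this.authoritative = authoritative;
    this.pageSize = pageSize;
  }

  /** Deletes everything under the prefix; returns the number of keys deleted. */
  int execute(String prefix) {
    int deleted = deleteInPages(callbacks.listViaMetastore(prefix));
    if (authoritative) {
      // In authoritative mode the listing may come from the metastore alone,
      // so objects it never saw would survive; a raw S3 scan catches them.
      deleted += deleteInPages(callbacks.listRaw(prefix));
    }
    return deleted;
  }

  /** Issues one removeKeys() call per page of keys. */
  private int deleteInPages(List<String> keys) {
    for (int start = 0; start < keys.size(); start += pageSize) {
      callbacks.removeKeys(new ArrayList<>(
          keys.subList(start, Math.min(start + pageSize, keys.size()))));
    }
    return keys.size();
  }
}
```

Because the operation only sees `DeleteCallbacks`, a test can drive it with an in-memory fake. For example, a fake whose metastore lacks an object that exists in S3 can check that the authoritative-mode final scan still deletes that object.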
   
   Test changes as appropriate: debugging aids, working out why InconsistentClient wasn't being inconsistent, and so on.
