You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-issues@hadoop.apache.org by GitBox <gi...@apache.org> on 2021/03/02 15:29:08 UTC

[GitHub] [hadoop] steveloughran commented on pull request #2735: HADOOP-11452 make rename/3 public

steveloughran commented on pull request #2735:
URL: https://github.com/apache/hadoop/pull/2735#issuecomment-788992374


   ## full commit logs of squashed commits
   
   ---------
   
   PR with the previous patches.
   
   This is a WiP but ready for some iniital review; lacks tests & spec.
   
   Also, because the base FileSystem.rename/3 does its own src/dest checks, it's less efficient against object stores. They need their own high-perf subclass.
   
   Change-Id: I1586ef2290d7a3d2d33b1a32e2f0999b07c26143
   
   HADOOP-11452 rename/3 has tests for local fs, raw local and hdfs.
   
   + updates docs
   + contains fix for HADOOP-16255 : checksum FS doesn't do rename/3 properly
   
   tests are based on those of rename; there's still some stuff about renaming into an empty directory which makes me suspect there's ambiguity there
   
   Change-Id: Ic1ca6cbe3e9ca80ab8e1459167a7012678e856fc
   
   HADOOP-11452 rename(path, path, options) to become public
   
   * exceptions match expectations of FSMainOperationsBaseTest
   * clean up FSMainOperationsBaseTest by moving to intercept, try with resources.
   * S3A to implement subclass of FSMainOperationsBaseTest, ITestS3AFSMainOperations
   * S3A to implement ITestS3AContractRenameEx
   * Add protected override point, executeInnerRename,  for implementing rename/3; base class calls rename() and throws if that returns false, i.e. current behaviour
   * S3A overrides executeInnerRename to call its own innerRename() and so raise all failures as IOEs.
   
   Issues: is the rename point executeInnerRename() a good name?
   
   S3AFS should really implement a direct rename/3 so there's no duplication of checks for parent etc, maybe even passing
   the values down.  Or we make sure innerRename() is consistent with the spec, which primarily means logic about dest dir existing.
   
   Change-Id: I0ac3695434d85072ab860854e5e88bc6d36e754a
   
   HADOOP-11452 trying to move rename/3 logic into its own class
   
   Change-Id: If2ab67152e08e4c2a225f6e89c24a5d1ff79ee59
   
   HADOOP-15183:  S3AFileSystem does rename/3
   
   Factored the rename check logic out into a RenameHelper which is then used in S3A FileSystem as  the PoC of how it can use the RenameHelper then directly invoke the inner operations. Added some more @Retry attributes as appropriate.
   
   Conclusion: it works, but for efficient IOPS then `innerRename()` needs to take optional source and dest status values and so so omit repeating the checks.
   
   For more work on this
   
   * the tests; file context has some so review and add to AbstractContractRenameTest. Also, based on some feedback from Sean Mackrory: verify the renamed files actually have the data.
   * move internal use of rename (distcp, maprv2 FileOutputcommitter), etc to use this.
   
   of the 50+ places which call rename, they seem split 3-ways int
   
   1. subclasses and forwarding
   2. invocations whch check the return value and throw an IOE
   3. invocations which are written on the assumption that renames raise exceptions on failure
   
   2 & 3 are the ones to change.
   
   Change-Id: Id77ed7352b9d5ddb124f9191c5c5f1b8a76da7bb
   
   HADOOP-11452. Rename
   
   Review of RenameHelper based on current coding styles and
   plans (IOStats, etc)
   
   Change-Id: I3d39ee3ed04a10e7db2c2b2c79833b945b4d691b
   
   HADOOP-11452 Rename/3
   
   S3A high performance edition.
   
   This avoids all surplus S3 calls and has meaningful exception
   raising.
   
   TODO:
   * pull the S3A code out into is own operation + extra callbacks
     (innerGetFileStatus is all that's needed)
   * see if the FileContext default logic can be pulled out too, using
     a custom set of callbacks. If it can't the logic is broken.
   * do some testing
   
   Change-Id: I408b2cfe93f266cf0c9084fa8f05bb84b65c2bad
   
   HADOOP-11452 Rename/3
   
   * Add RawLocalFileSystem rename with useful errors
   * pull out all rename docs into their own filesystem.md doc
   * Add a callback impl which => FileContext too, at least for
     the nonatomic rename. FC doesn't do path name checking. Should we?
   
   Proposed changes
   
   * move the new interfaces up to o.a.h.fs, so that .impl is never
     imported in FileSystem APIS.
   * remove the createRename callbacks method, just have stores
     with implement rename/3 other than the base FS to override all of
     rename 3.
   
   Change-Id: I1fab598553b8e9de4d659b80248bac440dbac018
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org