Posted to common-issues@hadoop.apache.org by GitBox <gi...@apache.org> on 2020/08/24 07:46:05 UTC

[GitHub] [hadoop] yangagile commented on pull request #2235: HDFS-15484 Add new method batchRename for DistributedFileSystem and W…

yangagile commented on pull request #2235:
URL: https://github.com/apache/hadoop/pull/2235#issuecomment-678966359


   > I've discussed some of what I'd like in https://issues.apache.org/jira/browse/HDFS-15484?focusedCommentId=17162752&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17162752
   > 
   > * matching hadoop common JIRA to make clear this is going near the FS APIs
   > * something in the filesystem spec to cover the new interface, define its semantics, etc. In particular, we need all things which are "obvious" to be written down, because often it turns out they aren't obvious at all.
   > * tests which really set out to break things. Writing the spec will help you think of them
   > 
   > Some ideas for tests
   > 
   > * renaming root
   > * rename to root
   > * rename to self
   > * path under self
   > * path above self
   > * two sources to same dest
   > * chained rename
   > * swapping paths
   > 
   > API-wise, this could be our chance to fix rename properly, that is: I should be able to use this for a single rename((src, dest), opts) and have it do what I want. And as discussed, I want something which works well with object stores.
   > 
   > * use a builder to let apps specify options (see openFile()) and use the same base builder classes
   > * Return a future of the outcome. If we can get the HADOOP-16830 IOStatistics patch in first, the outcome returned can be declared as something which implements IOStatisticsSource. This matters to me, as I want to know the costs of rename operations.
   > 
   > I think we should also add a rename option about atomicity; for a single rename() this would be that the rename itself is atomic. For a batch with size > 1, this means "the entire set of renames is atomic".
   > 
   > FileContext and ViewFS will also need to pass this through. Sorry.
   > 
   > One thing we could do here is actually provide a base implementation which iterates through the list/array of (src, dest) paths. This would let us add a non-atomic implementation to all filesystems/filecontexts. That would be very nice as it really would let me switch to using this API wherever we used rename(), such as distcp and MR committers.
   > 
   > rename() is the trickiest of all FS API calls to get right. I don't think we fully understand what right is. Certainly if I was asked about the nuances (src = file, dest = dir) and (src = dir, dest = dir), I'm not confident I could give an answer which is consistent for both POSIX and HDFS. This is our opportunity to make some progress here!
   > 
   > I know, this is going to add more work. But it is time.
   
   Thanks @steveloughran for the detailed introduction.
   Yes, we should do this the right way and implement it step by step.
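
   As a starting point for discussion, the non-atomic base implementation suggested in the quoted comment (iterate through the list of (src, dest) pairs and return a future of the outcome) might look roughly like the sketch below. Everything in it is hypothetical: `BatchRenameSketch`, `RenamePair`, `SingleRename`, and `Outcome` are illustrative names, not Hadoop APIs, and plain strings stand in for paths. Stopping at the first failure without rolling back is only one possible policy; the spec would have to define the real behaviour.

   ```java
   import java.io.IOException;
   import java.io.UncheckedIOException;
   import java.util.ArrayList;
   import java.util.List;
   import java.util.concurrent.CompletableFuture;

   /** Hypothetical sketch only; none of these types are Hadoop APIs. */
   final class BatchRenameSketch {

     /** One (src, dest) entry of a batch; plain strings stand in for paths. */
     static final class RenamePair {
       final String src;
       final String dest;

       RenamePair(String src, String dest) {
         this.src = src;
         this.dest = dest;
       }
     }

     /** Stand-in for a filesystem's existing single rename call. */
     interface SingleRename {
       boolean rename(String src, String dest) throws IOException;
     }

     /** Outcome of a batch: which pairs were renamed, and whether all were. */
     static final class Outcome {
       final List<RenamePair> completed;
       final boolean allRenamed;

       Outcome(List<RenamePair> completed, boolean allRenamed) {
         this.completed = completed;
         this.allRenamed = allRenamed;
       }
     }

     /**
      * Non-atomic fallback: renames the pairs in order and stops at the
      * first rename that reports failure. Earlier renames are NOT rolled back.
      */
     static Outcome renameSequentially(SingleRename fs, List<RenamePair> pairs)
         throws IOException {
       List<RenamePair> done = new ArrayList<>();
       for (RenamePair pair : pairs) {
         if (!fs.rename(pair.src, pair.dest)) {
           return new Outcome(done, false);
         }
         done.add(pair);
       }
       return new Outcome(done, true);
     }

     /** The same operation, exposed as a future of the outcome. */
     static CompletableFuture<Outcome> renameAsync(
         SingleRename fs, List<RenamePair> pairs) {
       return CompletableFuture.supplyAsync(() -> {
         try {
           return renameSequentially(fs, pairs);
         } catch (IOException e) {
           // supplyAsync cannot throw checked exceptions, so wrap it.
           throw new UncheckedIOException(e);
         }
       });
     }
   }
   ```

   A filesystem that can do better (for example, an atomic batch in HDFS) would override this fallback, and callers would inspect the returned outcome to see how far a non-atomic batch got.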

