Posted to common-issues@hadoop.apache.org by GitBox <gi...@apache.org> on 2019/10/28 13:00:58 UTC

[GitHub] [hadoop] steveloughran commented on issue #1655: HADOOP-16629: support copyFile in s3afilesystem

steveloughran commented on issue #1655: HADOOP-16629: support copyFile in s3afilesystem
URL: https://github.com/apache/hadoop/pull/1655#issuecomment-546934788
 
 
   See #1679 for my proposal for a Filesystem API for multipart uploads. It is only a draft implementation right now, and it lacks:
   
   1. FileContext support.
   1. updated specification.
   1. parent directory checks.
   
   I must highlight the `BulkOperationState` issue. A big part of speeding up rename/delete/commit in S3A was eliminating needless duplicate checks of S3Guard state during the bulk operations; we did this by caching the ongoing state. If you plan to copy many files in the same operation, you will need this.
   
   If you look at how `RenameOperation` uses `copyFile()`, it updates its `RenameTracker` to keep that operation state consistent; for the `CommitOperations`, the state also gets passed around by way of a `CommitContext`. Copy (like the multipart upload) is going to have to do the same.
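   
   As a rough illustration of why that shared state matters, here is a minimal sketch. Every name in it (`BulkCopySketch`, `MetadataStore`, `BulkState`, `copyAll`) is hypothetical and is not the real S3A or S3Guard API; it only assumes that a parent-directory check is an expensive lookup and shows how one state object passed through a bulk copy avoids repeating it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch only: none of these types are the real S3A classes. */
public class BulkCopySketch {

  /** Stand-in for an expensive metadata lookup; it just counts calls. */
  static class MetadataStore {
    int lookups = 0;

    boolean isDirectory(String path) {
      lookups++;
      return true; // assumption: every parent exists and is a directory
    }
  }

  /** State shared by every copy issued within one bulk operation. */
  static class BulkState {
    private final Set<String> verifiedParents = new HashSet<>();

    /** Returns true only the first time a given parent is seen. */
    boolean needsCheck(String parent) {
      return verifiedParents.add(parent);
    }
  }

  static String parentOf(String path) {
    int i = path.lastIndexOf('/');
    return i <= 0 ? "/" : path.substring(0, i);
  }

  /** Copies one file; a null state means "no bulk operation in progress". */
  static void copyFile(MetadataStore store, String src, String dest, BulkState state) {
    String parent = parentOf(dest);
    if (state == null || state.needsCheck(parent)) {
      if (!store.isDirectory(parent)) {
        throw new IllegalStateException("Not a directory: " + parent);
      }
    }
    // The actual store-side copy of src to dest would happen here.
  }

  /** Copies all sources into destDir and returns how many lookups were made. */
  static int copyAll(MetadataStore store, List<String> sources, String destDir, BulkState state) {
    for (String src : sources) {
      String name = src.substring(src.lastIndexOf('/') + 1);
      copyFile(store, src, destDir + "/" + name, state);
    }
    return store.lookups;
  }

  public static void main(String[] args) {
    List<String> sources = Arrays.asList("/in/a", "/in/b", "/in/c");
    int without = copyAll(new MetadataStore(), sources, "/out", null);
    int shared = copyAll(new MetadataStore(), sources, "/out", new BulkState());
    System.out.println(without + " lookups without shared state, " + shared + " with");
  }
}
```

   In this toy version the three copies into the same directory cost three lookups without the shared state and one with it. The real operation state tracks considerably more than a set of verified parents.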
   
   I can help by rounding out the #1679 proposal with the tracking of that state to show you what to do, but:
   
   * yes, you are going to have to design something which works across different stores,
   * is stable over time,
   * and comes with a spec and tests.
   
   I know HDFS is not above adding some private API to help out HBase and then, when nobody is looking, pulling it up into hadoop-common, but I don't like that; see [HDFS-8631](https://issues.apache.org/jira/browse/HDFS-8631?focusedCommentId=16961004&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16961004). I do expect a fair amount of rigour here because we are going to be maintaining these APIs for decades.
   
   
