Posted to common-issues@hadoop.apache.org by GitBox <gi...@apache.org> on 2019/10/10 13:11:24 UTC

[GitHub] [hadoop] steveloughran commented on issue #1601: HADOOP-16635. S3A innerGetFileStatus scans for directories-only still does a HEAD.

URL: https://github.com/apache/hadoop/pull/1601#issuecomment-540571728
 
 
   Sid, thanks for the comments; I will review and update the patch.
   
   Interesting point about the double list. This code path is how it's always been, presumably descended from the s3n code. LIST is slower, costs more, and is much more prone to eventual consistency problems, which are all good arguments for doing the HEAD first.
   
   I actually plan to tune some of the calls which always seem to get used on directory walks (listStatus, listFiles, listLocatedStatus) to do the subtree list first, and only go for the HEAD calls if the list doesn't find any children. This is to reduce the cost of treewalks, where the bias is towards populated directories.
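
To make the trade-off concrete, here is a rough sketch of the two probe orderings against a toy in-memory store. It is not the S3A code: `ProbeOrderSketch`, `headFirst`, `listFirst` and the request counters are hypothetical names made up for illustration, and the HEAD-first sequence (file key, then directory marker, then LIST) is an assumption about the general shape of the existing path.

```java
import java.util.TreeSet;

/** Toy stand-in for an object store; illustrative only, not the S3A implementation. */
class ProbeOrderSketch {
    /** Keys in the simulated bucket; a directory marker would end with "/". */
    private final TreeSet<String> keys = new TreeSet<>();
    /** Counters for simulated requests (hypothetical, for comparison only). */
    int headCalls = 0;
    int listCalls = 0;

    ProbeOrderSketch(String... objectKeys) {
        for (String k : objectKeys) {
            keys.add(k);
        }
    }

    /** Simulated HEAD: true if an object with exactly this key exists. */
    private boolean head(String key) {
        headCalls++;
        return keys.contains(key);
    }

    /** Simulated LIST: true if any key starts with the prefix path + "/". */
    private boolean listHasChildren(String path) {
        listCalls++;
        String prefix = path + "/";
        // The smallest key >= prefix starts with prefix if any key does.
        String next = keys.ceiling(prefix);
        return next != null && next.startsWith(prefix);
    }

    /** HEAD first: probe the file key, then the directory marker, then LIST. */
    String headFirst(String path) {
        if (head(path)) {
            return "file";
        }
        if (head(path + "/")) {
            return "dir";
        }
        return listHasChildren(path) ? "dir" : "missing";
    }

    /** LIST first: only fall back to HEAD when the list finds no children. */
    String listFirst(String path) {
        if (listHasChildren(path)) {
            return "dir";
        }
        return head(path) ? "file" : "missing";
    }
}
```

In this toy model a populated directory without a marker costs two HEADs plus a LIST with the HEAD-first ordering but a single LIST with the LIST-first ordering, while a plain file costs one HEAD versus a LIST plus a HEAD. That is why the LIST-first ordering only makes sense for calls biased towards populated directories.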
