Posted to common-issues@hadoop.apache.org by "Sam Kramer (Jira)" <ji...@apache.org> on 2022/06/08 13:19:00 UTC
[jira] [Comment Edited] (HADOOP-18278) Do not perform a LIST call when creating a file

    [ https://issues.apache.org/jira/browse/HADOOP-18278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551616#comment-17551616 ] 

Sam Kramer edited comment on HADOOP-18278 at 6/8/22 1:18 PM:
-------------------------------------------------------------

Hey [~stevel@apache.org] ,

Thank you for your detailed comment; this is exactly what we need :)

I took a quick look at your PR, and from my perspective it does exactly what we need. Our workflow is write-once, read-many, delete-once, so we can make very strong assumptions (i.e. we do not need to check whether a file already exists, or whether it's a directory).

Edit: We also know ahead of time what _is_ a directory and what _isn't_.

Any idea when this might be committed / released?

Cheers,

Sam



> Do not perform a LIST call when creating a file
> -----------------------------------------------
>
>                 Key: HADOOP-18278
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18278
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3
>    Affects Versions: 3.3.3
>            Reporter: Sam Kramer
>            Priority: Minor
>
> Hello,
> We've noticed that when creating a file that does not exist in S3, an extra LIST call is issued to check whether it's a directory (i.e. if key = "bar", an object list request is issued for "bar/").
> Is this really necessary? Shouldn't a HEAD request be sufficient to determine whether it actually exists? As we're creating 1000s of files, this is quite expensive: we're effectively doubling our costs for file creation. Curious if others have experienced similar or identical issues, or if there are any workarounds.
> [https://github.com/apache/hadoop/blob/516a2a8e440378c868ddb02cb3ad14d0d879037f/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L3359-L3369]
>
> Thanks,
> Sam
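To make the cost argument in the issue concrete, here is a minimal sketch that counts probe requests for a batch of creates. It is a hypothetical in-memory model, not S3A code: `ProbeCostModel`, `FakeStore`, `createChecked` and `createUnchecked` are illustrative names, and it assumes each checked create issues exactly one HEAD on the key plus one LIST on the key with "/" appended, as described in the issue.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical model (not the S3A implementation) of the existence probes
 * described in HADOOP-18278. Assumption: before creating "bar", one HEAD is
 * issued for "bar" and one LIST for the prefix "bar/". No network I/O.
 */
public class ProbeCostModel {

    /** In-memory stand-in for a bucket that counts HEAD and LIST requests. */
    static final class FakeStore {
        private final Set<String> keys = new HashSet<>();
        long heads = 0;
        long lists = 0;

        boolean head(String key) {
            heads++;
            return keys.contains(key);
        }

        boolean listPrefixNonEmpty(String prefix) {
            lists++;
            for (String k : keys) {
                if (k.startsWith(prefix)) {
                    return true;
                }
            }
            return false;
        }

        void put(String key) {
            keys.add(key);
        }
    }

    /** Create with probes: reject keys that exist as a file or a "directory". */
    static void createChecked(FakeStore store, String key) {
        if (store.head(key)) {
            throw new IllegalStateException("file exists: " + key);
        }
        if (store.listPrefixNonEmpty(key + "/")) {
            throw new IllegalStateException("is a directory: " + key);
        }
        store.put(key);
    }

    /** Create without probes: the caller guarantees the key is new and not a directory. */
    static void createUnchecked(FakeStore store, String key) {
        store.put(key);
    }

    /** Total HEAD + LIST requests issued while creating n distinct files. */
    static long probeRequests(int n, boolean skipProbes) {
        FakeStore store = new FakeStore();
        for (int i = 0; i < n; i++) {
            String key = "data/part-" + i;
            if (skipProbes) {
                createUnchecked(store, key);
            } else {
                createChecked(store, key);
            }
        }
        return store.heads + store.lists;
    }

    public static void main(String[] args) {
        System.out.println("with probes:    " + probeRequests(1000, false));
        System.out.println("without probes: " + probeRequests(1000, true));
    }
}
```

Under this model, 1000 checked creates issue 2000 probe requests on top of the 1000 uploads, while unchecked creates issue none. Skipping the probes is only safe when, as in the write-once workflow described in the comment, the caller already knows each key is new and is not a directory.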



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org