Posted to common-issues@hadoop.apache.org by "Steve Loughran (Jira)" <ji...@apache.org> on 2022/10/04 16:56:00 UTC

[jira] [Commented] (HADOOP-18420) Optimise S3A’s recursive delete to drop successful S3 keys on retry of S3 DeleteObjects

    [ https://issues.apache.org/jira/browse/HADOOP-18420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612686#comment-17612686 ] 

Steve Loughran commented on HADOOP-18420:
-----------------------------------------

I think when deleting fake dirs we should also use Invoke.once() rather than retrying:
* it's only cleanup
* retained markers aren't a problem for recent clients
* permissions problems may be the cause of this
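A minimal sketch of the once-vs-retry distinction being suggested, with stand-in helpers — the real S3A Invoker API and signatures differ, these names are only illustrative:

```java
import java.util.concurrent.Callable;

public class OnceVsRetry {
    // Illustrative stand-in for a retrying invoker: re-attempts the
    // operation up to maxAttempts times (maxAttempts >= 1 assumed).
    static <T> T retry(int maxAttempts, Callable<T> op) {
        RuntimeException last = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = new RuntimeException(e);
            }
        }
        throw last;
    }

    // Illustrative stand-in for a once() invocation used for best-effort
    // cleanup: a single attempt, with failures swallowed, since a retained
    // directory marker is harmless to recent clients.
    static void onceQuietly(Callable<?> op) {
        try {
            op.call();
        } catch (Exception ignored) {
            // cleanup only; e.g. a permissions failure here should not
            // fail (or retry inside) the surrounding operation
        }
    }
}
```

The point of the suggestion: if a fake-dir delete fails because of permissions, retrying just repeats the failure, so a single best-effort attempt is cheaper and equally safe.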

> Optimise S3A’s recursive delete to drop successful S3 keys on retry of S3 DeleteObjects
> ---------------------------------------------------------------------------------------
>
>                 Key: HADOOP-18420
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18420
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Daniel Carl Jones
>            Priority: Major
>
> S3A users with large filesystems performing renames or deletes can run into throttling when S3A issues bulk deletes, currently in batches of 250 keys ([https://github.com/apache/hadoop/blob/c1d82cd95e375410cb0dffc2931063d48687386f/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java#L319-L323]).
> When a bulk delete ([S3 DeleteObjects|https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html]) partially fails, the response lists the keys that failed and why. Today, S3A recovers from throttling by resending the DeleteObjects request unchanged, so keys that were already deleted are deleted again, and those extra mutations count towards the same throttling limits.
> Instead, S3A should retry only the keys that failed, limiting the number of mutations against the S3 bucket and hopefully mitigating errors when deleting a large number of objects.
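The proposed behaviour can be sketched with a toy model of a partially failing bulk delete. Here deleteBatch and the one-retry throttle behaviour are assumptions for illustration, not the hadoop-aws implementation or the AWS SDK API:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class PartialDeleteRetry {
    // Toy stand-in for S3 DeleteObjects: deletes every key except those
    // currently throttled, and returns the set of keys that failed.
    static Set<String> deleteBatch(List<String> keys, Set<String> throttled) {
        return keys.stream()
                   .filter(throttled::contains)
                   .collect(Collectors.toSet());
    }

    // Proposed behaviour: each retry resends only the keys the previous
    // attempt reported as failed, never the whole original batch.
    static int deleteWithRetries(List<String> keys, Set<String> throttledOnFirstTry) {
        int attempts = 0;
        List<String> pending = new ArrayList<>(keys);
        Set<String> throttled = new HashSet<>(throttledOnFirstTry);
        while (!pending.isEmpty()) {
            attempts++;
            Set<String> failed = deleteBatch(pending, throttled);
            throttled.clear();  // assume the throttle clears after one attempt
            pending = new ArrayList<>(failed);
        }
        return attempts;
    }

    public static void main(String[] args) {
        List<String> batch = IntStream.range(0, 250)
            .mapToObj(i -> "dir/file-" + i)
            .collect(Collectors.toList());
        // Suppose 10 of the 250 keys are throttled on the first attempt:
        Set<String> throttled = new HashSet<>(batch.subList(0, 10));
        // First attempt sends 250 keys; the retry sends only the 10 failures.
        System.out.println(deleteWithRetries(batch, throttled)); // prints 2
    }
}
```

With today's behaviour the retry would resend all 250 keys, re-deleting 240 already-deleted objects; under the proposal the retry carries only the 10 failed keys.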



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
