You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by "Akira Ajisaka (Jira)" <ji...@apache.org> on 2019/09/10 04:39:00 UTC
[jira] [Resolved] (HADOOP-16430) S3AFilesystem.delete to
incrementally update s3guard with deletions
[ https://issues.apache.org/jira/browse/HADOOP-16430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Akira Ajisaka resolved HADOOP-16430.
------------------------------------
Resolution: Fixed
> S3AFilesystem.delete to incrementally update s3guard with deletions
> -------------------------------------------------------------------
>
> Key: HADOOP-16430
> URL: https://issues.apache.org/jira/browse/HADOOP-16430
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.2.0, 3.3.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
> Fix For: 3.3.0
>
> Attachments: Screenshot 2019-07-16 at 22.08.31.png
>
>
> Currently S3AFilesystem.delete() only updates the delete at the end of a paged delete operation. This makes it slow when there are many thousands of files to delete ,and increases the window of vulnerability to failures
> Preferred
> * after every bulk DELETE call is issued to S3, queue the (async) delete of all entries in that post.
> * at the end of the delete, await the completion of these operations.
> * inside S3AFS, also do the delete across threads, so that different HTTPS connections can be used.
> This should maximise DDB throughput against tables which aren't IO limited.
> When executed against small IOP limited tables, the parallel DDB DELETE batches will trigger a lot of throttling events; we should make sure these aren't going to trigger failures
--
This message was sent by Atlassian Jira
(v8.3.2#803003)
---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-dev-help@hadoop.apache.org