Posted to common-issues@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2016/05/01 13:19:13 UTC

[jira] [Updated] (HADOOP-11572) s3a delete() operation fails during a concurrent delete of child entries

     [ https://issues.apache.org/jira/browse/HADOOP-11572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-11572:
------------------------------------
    Attachment: HADOOP-11572-001.patch

Patch 001

This is just a bit of code I started to write in HADOOP-13028; I've pulled it out as it was getting too complex to work through within that patch.

This code
# catches the multi-delete exception
# extracts the list of keys that failed
# and retries them as part of a one-by-one deletion (see the sketch below).
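
For reference, a minimal sketch of that catch-and-retry, written against the AWS SDK 1.x client that S3A uses (AmazonS3, DeleteObjectsRequest, MultiObjectDeleteException); the class and method names here are illustrative, not the actual patch code.

{code:java}
// Sketch of the patch-001 approach: bulk delete, then retry the failed subset one by one.
import java.util.List;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.DeleteObjectsRequest;
import com.amazonaws.services.s3.model.MultiObjectDeleteException;

class MultiDeleteFallback {

  /** Issue a bulk delete; if some keys fail, retry each failed key individually. */
  static void deleteWithFallback(AmazonS3 s3, String bucket, List<String> keys) {
    DeleteObjectsRequest request = new DeleteObjectsRequest(bucket)
        .withKeys(keys.toArray(new String[0]));
    try {
      s3.deleteObjects(request);
    } catch (MultiObjectDeleteException e) {
      // Extract the subset of keys the bulk call could not delete...
      for (MultiObjectDeleteException.DeleteError error : e.getErrors()) {
        // ...and retry them one by one. Note: no handling of a second failure here,
        // which is exactly the gap called out below.
        s3.deleteObject(bucket, error.getKey());
      }
    }
  }
}
{code}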

I've realised this isn't quite the right approach, as it assumes that the failures are transient and that individual retries may therefore work. Furthermore, there's no handling of failures in the one-by-one path.

h3. A core way that a key delete can fail is that the file has already been deleted.

Retrying doesn't help, and anyway we don't need to retry: the system is already in the desired state.

What is needed, then, is

# the failure cause of a single delete, or of each failed element in a multi-delete, must be examined.
# if it is a not-found failure: upgrade it to success.
# if it is some other failure, we need to consider what to do. Something like a permissions failure probably merits escalation rather than being swallowed. What other failures can arise? (See the sketch below.)
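
A hedged sketch of that classification, assuming the error surfaces as an AmazonS3Exception on the single-key path and as a per-key error-code string on the bulk path; which codes S3 actually sends back is exactly what the tests need to establish.

{code:java}
// Sketch of the proposed error classification for delete failures.
import com.amazonaws.services.s3.model.AmazonS3Exception;

class DeleteErrorClassifier {

  /** Single-key delete failure: not-found is the desired end state, so swallow it. */
  static void handleSingleDeleteFailure(AmazonS3Exception e) {
    if (e.getStatusCode() == 404 || "NoSuchKey".equals(e.getErrorCode())) {
      // Upgrade to success: the object is already gone, which is what we wanted.
      return;
    }
    // Anything else (e.g. AccessDenied) is a real problem: propagate it.
    throw e;
  }

  /** Per-key error code from a bulk delete: same rule, keyed on the code string. */
  static boolean isBenign(String errorCode) {
    return "NoSuchKey".equals(errorCode);
  }
}
{code}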

It's critical we add tests for this, as that's the only way to understand what AWS S3 will really send back.

My draft test plan here is

# make that removeKeys call protected/package-private
# subclass S3AFileSystem with one which overrides removeKeys() and, prior to calling the superclass, deletes one or more of the keys from S3. This will trigger the failure path (see the sketch below).
# make sure the removeKeys() call still succeeds
# test with the FileSystem.rename() and FileSystem.delete() operations, covering both multi-key and single-key removal paths.
Alongside this: try to delete from a read-only bucket, such as the Amazon Landsat data.
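
A sketch of what that fault-injecting subclass could look like. The removeKeys() signature, its KeyVersion parameter type and the injected raw client are all assumptions about how the method will look once it has been opened up for overriding; they are not the current S3AFileSystem internals.

{code:java}
// Test-only subclass which deletes a key behind the filesystem's back before
// the real removeKeys() runs, to exercise the missing-key failure path.
import java.io.IOException;
import java.util.List;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.DeleteObjectsRequest.KeyVersion;
import org.apache.hadoop.fs.s3a.S3AFileSystem;

public class FaultInjectingS3AFileSystem extends S3AFileSystem {

  // Separate raw client and bucket name injected by the test harness.
  private AmazonS3 rawClient;
  private String bucket;

  void setFaultInjection(AmazonS3 rawClient, String bucket) {
    this.rawClient = rawClient;
    this.bucket = bucket;
  }

  @Override
  protected void removeKeys(List<KeyVersion> keysToDelete) throws IOException {
    // Delete one of the keys out of band, so the bulk (or single-key) delete
    // that follows hits an already-missing entry.
    if (rawClient != null && !keysToDelete.isEmpty()) {
      rawClient.deleteObject(bucket, keysToDelete.get(0).getKey());
    }
    // The test asserts that this still returns normally, and that the
    // rename()/delete() operations built on top of it succeed.
    super.removeKeys(keysToDelete);
  }
}
{code}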

Like I said: more complex

> s3a delete() operation fails during a concurrent delete of child entries
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-11572
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11572
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.6.0
>            Reporter: Steve Loughran
>         Attachments: HADOOP-11572-001.patch
>
>
> Reviewing the code, s3a has the problem raised in HADOOP-6688: deletion of a child entry during a recursive directory delete is propagated as an exception, rather than ignored as a detail which idempotent operations should just ignore.
> The exception should be caught and, if it is a file-not-found problem, logged rather than propagated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org