Posted to common-issues@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2018/10/08 18:04:00 UTC

[jira] [Commented] (HADOOP-13936) S3Guard: DynamoDB can go out of sync with S3AFileSystem::delete operation

    [ https://issues.apache.org/jira/browse/HADOOP-13936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16642237#comment-16642237 ] 

Steve Loughran commented on HADOOP-13936:
-----------------------------------------

Reviewing this with the goal of fixing it.

Options:
# As innerDelete/innerRename delete objects, send the deletes down to the metastore in batches, *maybe in their own thread*.
# Make sure all that cleanup is done in a finally clause, and hope the actual execution never fails (which is really the problem we are trying to address).
# Have the metastore take a set of deleted files, knowing that it is part of a larger bulk rename or delete operation, so giving it the option of being clever.

I'm thinking of option 3: the caller initiates some multi-object operation (delete? rename?) against the metastore and gets a context object back, which it updates as it goes along and then finally calls complete() on.

{code}
bulkDelete = s3guard.initiateBulkDelete(path)
// ..iterate through the listings; with every batch of deletes:
bulkDelete.deleted(List<Path>)
// and then finally:
bulkDelete.complete()
{code}
Naive implementation: ignore the deleted() ops and do what happens today in complete(): delete the tree.
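
A minimal sketch of what that could look like; {{BulkDeleteOperation}} and {{NaiveBulkDelete}} are made-up names, only the {{MetadataStore}} calls are existing API:
{code}
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.s3guard.MetadataStore;

/** Hypothetical context object handed back by initiateBulkDelete(). */
interface BulkDeleteOperation {
  /** Notification that a batch of objects has been deleted from S3. */
  void deleted(List<Path> paths) throws IOException;
  /** All batches done: bring the metastore fully up to date. */
  void complete() throws IOException;
}

/** Naive implementation: ignore per-batch notifications, prune at the end. */
class NaiveBulkDelete implements BulkDeleteOperation {
  private final MetadataStore store;
  private final Path root;

  NaiveBulkDelete(MetadataStore store, Path root) {
    this.store = store;
    this.root = root;
  }

  @Override
  public void deleted(List<Path> paths) {
    // deliberately a no-op: exactly what happens today
  }

  @Override
  public void complete() throws IOException {
    store.deleteSubtree(root);   // the existing recursive prune
  }
}
{code}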

Clever implementation: on each deleted() batch, kick off the deletion of those entries (wrapped in a duration log); in the complete() call, do a final cleanup treewalk to get rid of parent entries.
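
A sketch of that cleverer variant, reusing the hypothetical {{BulkDeleteOperation}} interface from the snippet above:
{code}
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.s3guard.MetadataStore;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/** Sketch: prune each batch from the store as soon as S3 reports it deleted. */
class IncrementalBulkDelete implements BulkDeleteOperation {
  private static final Logger LOG =
      LoggerFactory.getLogger(IncrementalBulkDelete.class);

  private final MetadataStore store;
  private final Path root;

  IncrementalBulkDelete(MetadataStore store, Path root) {
    this.store = store;
    this.root = root;
  }

  @Override
  public void deleted(List<Path> paths) throws IOException {
    long start = System.nanoTime();
    for (Path p : paths) {
      store.delete(p);
    }
    LOG.debug("Removed {} store entries in {} ms", paths.size(),
        (System.nanoTime() - start) / 1_000_000);
  }

  @Override
  public void complete() throws IOException {
    // final cleanup treewalk: whatever is left under the root,
    // including the now-empty parent entries
    store.deleteSubtree(root);
  }
}
{code}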

The move operation would be similar; as it does its updates in batches, it could also track which parent directories have already been created across batches, so there'd be no duplication of parent dir creation.
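
Roughly, as a sketch, with all names invented and the actual store write stubbed out:
{code}
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.fs.Path;

/**
 * Hypothetical per-rename state: remember which parent directories have
 * already been written to the metastore across all batches of a single
 * rename, so each one is only created once.
 */
class RenameTracker {
  private final Set<Path> ancestorsCreated = new HashSet<>();

  void addAncestors(Path destPath) throws IOException {
    Path parent = destPath.getParent();
    // walk up until the root or the first ancestor we've already handled
    while (parent != null && !parent.isRoot() && ancestorsCreated.add(parent)) {
      putDirectoryMarker(parent);
      parent = parent.getParent();
    }
  }

  private void putDirectoryMarker(Path dir) throws IOException {
    // illustrative stub: in a real patch this would write a directory
    // entry for dir to the MetadataStore
  }
}
{code}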

On the topic of batches, these updates could also be done in a (single) worker thread within S3AFileSystem, so that even throttled DDB operations wouldn't take up time which the copy calls could be using.
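
Something like this, purely illustrative; the class, field and method names are all made up:
{code}
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.s3guard.MetadataStore;

/** Illustrative holder for a single background thread doing metastore updates. */
class MetastoreUpdater {
  // one daemon thread: DDB throttling delays this queue, not the S3 copy calls
  private final ExecutorService executor =
      Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "s3guard-bulk-update");
        t.setDaemon(true);
        return t;
      });

  Future<?> queueDeletes(MetadataStore store, List<Path> batch) {
    return executor.submit(() -> {
      for (Path p : batch) {
        store.delete(p);
      }
      return null;
    });
  }

  void shutdown() {
    executor.shutdown();
  }
}
{code}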

(Also while doing this: log the duration of copies @ debug; print out the duration & total effective bandwidth. These are things we need to know, and it'd give us a before/after benchmark of any changes.)
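
A sketch of that measurement, as a made-up helper wrapped around whatever does the actual copy:
{code}
import java.util.concurrent.Callable;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/** Illustrative helper: time a single object copy and log effective bandwidth. */
final class CopyTimer {
  private static final Logger LOG = LoggerFactory.getLogger(CopyTimer.class);

  static void timedCopy(String srcKey, String dstKey, long size,
      Callable<Void> copy) throws Exception {
    long start = System.nanoTime();
    copy.call();                 // run the wrapped copy operation
    long durationMs = (System.nanoTime() - start) / 1_000_000;
    double mbPerSec = durationMs > 0
        ? (size / (1024.0 * 1024.0)) / (durationMs / 1000.0)
        : 0.0;
    LOG.debug("Copied {} -> {}: {} bytes in {} ms ({} MB/s)",
        srcKey, dstKey, size, durationMs, String.format("%.2f", mbPerSec));
  }
}
{code}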



> S3Guard: DynamoDB can go out of sync with S3AFileSystem::delete operation
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-13936
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13936
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.0.0-beta1, 3.1.0, 3.1.1
>            Reporter: Rajesh Balamohan
>            Assignee: Steve Loughran
>            Priority: Blocker
>
> As a part of the {{S3AFileSystem.delete}} operation, {{innerDelete}} is invoked, which deletes keys from S3 in batches (default is 1000). But DynamoDB is updated only at the end of this operation. This can cause issues when deleting a large number of keys. 
> E.g. it is possible to get an exception after deleting 1000 keys, and in such cases DynamoDB would not be updated. This can cause DynamoDB to go out of sync. 


