Posted to common-issues@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2019/05/29 17:49:00 UTC

[jira] [Commented] (HADOOP-13980) S3Guard CLI: Add fsck check command

    [ https://issues.apache.org/jira/browse/HADOOP-13980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16851113#comment-16851113 ] 

Steve Loughran commented on HADOOP-13980:
-----------------------------------------

My thoughts on the topic. I managed to get a DDB table inconsistent today: fun. If there is a tombstone marker for a directory but child entries still exist underneath it in DDB, then rm(dir) succeeds (there is no directory), yet exists() on a child underneath still returns true.

h1. What would an {{fsck}} operation against S3Guard actually do?

The goal of an {{fsck}} operation is to 

# verify that the DDB table is internally consistent
# verify that the DDB table is consistent with the store.


h2. Definitions of inconsistency

h3. Both modes

* All entries in DDB other than the root entry have a parent entry.
* No entry has a parent which is a file (except in the special case where the object store itself is in this state).
* Every file entry has a directory entry as a parent, or it is in the root directory.
* Every directory entry has a directory entry as a parent, or it is in the root directory.
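The parent-entry invariants above can be expressed as a simple predicate. A minimal sketch, assuming a toy {{path -> kind}} model of the table (the field names and the flat-dict representation are hypothetical, not the actual DynamoDBMetadataStore schema):

```python
import posixpath

def violates_parent_invariants(entries):
    """Return paths whose parent entry is missing, a file, or a tombstone.
    `entries` maps path -> one of "file", "dir", "tombstone" (toy model)."""
    bad = []
    for path in entries:
        if path == "/":
            continue  # the root entry has no parent
        parent = posixpath.dirname(path)
        if parent == "/":
            continue  # entries in the root directory always have a valid parent
        kind = entries.get(parent)
        if kind != "dir":  # absent, file, or tombstone parent => inconsistent
            bad.append(path)
    return sorted(bad)
```

Running this over a table snapshot flags both orphan entries (no parent at all) and file-under-file / file-under-tombstone cases in one pass.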


h3. Auth mode

* No file exists under a path for which there is a tombstone entry.
* Every directory entry has a matching directory in the store ({{getFileStatus()}} on the path returns isDirectory).
* Every file entry has a path in the S3 store where the size, versionId and etag match.

h3. Non-auth mode

* Where a path has a tombstone marker, no file exists. If one does, update DDB.
* Where a path has an entry, the file exists. If it does not, update DDB.
* Where a path points to a file, the size and any etag and version ID match those of the object.
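The non-auth rules above amount to a small decision table over (DDB state, store state). An illustrative sketch, not the actual S3Guard repair code:

```python
def nonauth_action(ddb_state, s3_exists):
    """Decide the repair action for one path in non-auth mode.
    ddb_state: "tombstone", "entry", or None; s3_exists: bool.
    (Hypothetical decision table mirroring the rules above.)"""
    if ddb_state == "tombstone" and s3_exists:
        return "remove-tombstone"   # a file exists under a tombstoned path
    if ddb_state == "entry" and not s3_exists:
        return "delete-entry"       # the entry points at a missing file
    if ddb_state == "entry" and s3_exists:
        return "verify-metadata"    # compare size, etag, version ID
    return "no-op"                  # no entry, or tombstone with no file
```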



h2. Assumptions

* The S3 bucket is consistent w.r.t. any changes and is not actively being updated.
* No other fsck of the S3 bucket and its matching DDB table is in progress.
* Other S3 buckets may share the same DDB table, and may be in active use.
* We don't care about throttling/conflict with other users, even in the same table (i.e. it's on-demand or isolated).
* We aren't trying to optimise for minimising throttling with S3, though performance always matters.
* We have lots of memory in a local process to cache information like a directory listing.
* The store could have many millions of entries.
* There are more files than directories.
* Nobody wants to address inconsistencies by deleting data.
* Some of the entries in the table lack etags and versions.

h1. Operations we could do

h2. DDB internal consistency

* verify that the DDB table is a consistent tree: that there are no orphan entries or entries under a deleted directory
* optionally: create the parent entries, including overwriting tombstones.

This will fix consistency within the table, without making any assertions about consistency with the store.


h2. S3Guard to S3 consistency

* all files in S3Guard exist in S3
* if an entry lists an etag, that matches the real status
* if an entry includes a version, a file with that versionID exists
* if an entry is a tombstone marker, there is no S3 file at that path

h2. S3 to S3Guard consistency

* All entries in S3 exist in the DDB table. This is only valid in auth mode, as in non-auth mode it is moot.
* There are no entries in S3 for which the DDB entry is a tombstone marker.

h1. Actions on success

* Collect the entire listing and export it as: CSV, XML, Avro, JSON

h1. Actions on failure

h2. All

* Collect the entire listing and export it as: CSV, XML, Avro, JSON
* Generate report on the problem.
* Fail if there is an inconsistency

h2. DDB internal consistency
* DDB internal consistency: add missing parent entries, replace tombstones with entries.
* Purge all tombstones irrespective of age

h2. S3Guard to S3 consistency

* If a file does not exist, delete that entry.
* If a directory does not exist, delete that entry.
* Update versionID, etag and size with any new values.
* Log files deleted, updated.

h2. S3 to S3Guard consistency

* Add any new files with parents. This is the import operation.
* Log/record files added.


h1. How to implement (efficiently)


h2. DDB consistency

Requirement: every item is either root or has a parent directory which has not been deleted.

It is not enough to list the children off root and then recurse down the tree, because that will not find orphan entries (though it will find file-under-file and file-under-tombstone errors). 

A breadth first search where we cache all entries in the parent level would work, if we can create queries for parent paths like "/*/*". 

# list all entries at a depth, {{d}}.
# verify that all entries have a parent in the directory list of depth {{d-1}}.
# add all directory entries to the directory list of depth {{d}}
# repeat until there is a listing with no child entries
# then do a final scan of all children of arbitrary depth under that level (these are implicitly orphans)
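The level-by-level scan above can be sketched as follows, again over a toy {{path -> kind}} map standing in for the DDB table (grouping by depth here substitutes for the hypothetical "/*/*"-style parent-path queries):

```python
import posixpath
from collections import defaultdict

def find_orphans(entries):
    """Breadth-first, depth-by-depth scan of a path -> "file"/"dir" map.
    An entry is an orphan unless its parent is a directory that was itself
    reachable from the root at the previous depth."""
    by_depth = defaultdict(list)
    for path in entries:
        if path != "/":
            by_depth[path.count("/")].append(path)  # "/a" depth 1, "/a/b" depth 2
    valid_dirs = {"/"}                 # directory list of depth d-1, seeded with root
    orphans = set()
    for depth in sorted(by_depth):
        next_dirs = set()
        for path in by_depth[depth]:
            if posixpath.dirname(path) in valid_dirs:
                if entries[path] == "dir":
                    next_dirs.add(path)  # directory list of depth d
            else:
                orphans.add(path)        # no valid parent chain => orphan
        valid_dirs = next_dirs
    return orphans
```

Because a missing depth leaves {{valid_dirs}} empty of the relevant parents, deeper entries under a gap are classified as orphans automatically, which matches the "final scan of arbitrary depth" step.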

An S3 HEAD for every entry in the DDB table is too expensive, and too slow.

Better to do a bulk list under a path and compare that with DDB, building up a list of parent directories to also look for.
That list is used to skip rechecking parents we know were already probed for (it would also track the found/not-found state).
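That probe cache might look like this minimal sketch, where {{probe}} is a hypothetical callable standing in for a DDB GET:

```python
import posixpath

def check_parents(paths, probe):
    """Probe each distinct parent directory at most once. `probe(path)` is
    a hypothetical stand-in for a DDB GET returning found/not-found; results
    are cached so parents shared by many children are only looked up once."""
    cache = {}        # parent path -> True (found) / False (not found)
    missing = []
    for path in paths:
        parent = posixpath.dirname(path)
        if parent not in cache:
            cache[parent] = probe(parent)
        if not cache[parent]:
            missing.append(path)  # child whose parent entry is absent
    return missing, cache
```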

h2. S3A => S3Guard consistency

# List all of S3A recursively.
# For each page of entries returned, do a DDB GET to retrieve the DDB entry.
# Verify/update (size, etag, version, deleted flag).
# Add the entry to a set of known paths.
# If the entry was absent or deleted, add its parent dir to a set of parents to validate/create.

Finally: iterate through that set of parent directories. For each entry whose parent is not also in the set, do a GET of the parent; entries whose parent is in the set can be skipped, since that parent will be validated/created in its own right.
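The page-by-page scan and the parent-set pruning can be sketched together. Assumptions: each page is a list of (path, etag) tuples and {{ddb_get(path)}} returns a dict or None; both are toy stand-ins, not the real listing or DDB APIs:

```python
import posixpath

def scan_s3_pages(pages, ddb_get):
    """One pass over paged S3 listing results. Returns the set of known
    paths plus the subset of missing parent dirs that actually need a GET."""
    known, parents_to_check = set(), set()
    for page in pages:
        for path, etag in page:
            known.add(path)
            entry = ddb_get(path)
            if entry is None or entry.get("deleted"):
                # absent or tombstoned: parent dir must be validated/created
                parents_to_check.add(posixpath.dirname(path))
            elif entry.get("etag") != etag:
                pass  # here: update size/etag/version ID in DDB
    # only probe parents whose own parent is NOT also in the set; the rest
    # will be validated/created when their parent is handled
    to_probe = {p for p in parents_to_check
                if posixpath.dirname(p) not in parents_to_check}
    return known, to_probe
```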

h2. S3Guard => S3 Consistency

*Are there entries in S3Guard which don't match S3?*

This goes the other way from the S3A => S3Guard check, except that a per-file S3 GET is expensive enough that we don't want to do it.

Better to list a subset of the DDB table and a subset of the S3A bucket and compare them. If a record was kept of all S3 entries found in the S3A => S3Guard scan, then you only need to look for missing entries.
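With that record kept, the reverse check over a matching key range reduces to a set difference. A trivial sketch under that assumption:

```python
def s3guard_minus_s3(ddb_file_paths, s3_paths_seen):
    """Non-tombstone file entries present in S3Guard but absent from the
    set of S3 keys recorded during the forward scan. Assumes both listings
    cover the same key range of the bucket."""
    return sorted(set(ddb_file_paths) - set(s3_paths_seen))
```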





> S3Guard CLI: Add fsck check command
> -----------------------------------
>
>                 Key: HADOOP-13980
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13980
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.0.0-beta1
>            Reporter: Aaron Fabbri
>            Assignee: Gabor Bota
>            Priority: Major
>
> As discussed in HADOOP-13650, we want to add an S3Guard CLI command which compares S3 with MetadataStore, and returns a failure status if any invariants are violated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
