Posted to common-issues@hadoop.apache.org by "Aaron Fabbri (JIRA)" <ji...@apache.org> on 2019/06/05 20:55:00 UTC

[jira] [Commented] (HADOOP-13980) S3Guard CLI: Add fsck check command

    [ https://issues.apache.org/jira/browse/HADOOP-13980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16857029#comment-16857029 ] 

Aaron Fabbri commented on HADOOP-13980:
---------------------------------------

Thanks for your draft of FSCK requirements [~stevel@apache.org]. This is a good start.

One thing that comes to mind: I don't know that we want to consider "auth mode" as a factor here.  Erring on the side of over-explaining this stuff for clarity:

There are two main authoritative mode flags in play:

(1) per-directory metastore bit that says "this directory is fully loaded into the metastore"

(2) s3a client config bit fs.s3a.metadatastore.authoritative, which allows s3a to short-circuit (skip) s3 on some metadata queries. This one is just a runtime client behavior flag. You could have multiple clients with different settings sharing a bucket, and FSCK could have yet another config. I think you'll still want some FSCK options to select the level of enforcement / paranoia as you outline; I just don't think they need to be conflated with the client's allow-auth flag. I'd imagine this as a growing set of invariant checks that can be categorized into something like basic / paranoid / full.
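
To make the basic / paranoid / full idea concrete, here is a rough sketch of how checks might be grouped by level. Everything in it is hypothetical: the class, the level enum, and the three check names are placeholders for illustration, not existing S3Guard code. The point is only that the level is an FSCK option, chosen independently of any client's fs.s3a.metadatastore.authoritative setting.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: FSCK invariant checks grouped by enforcement level. */
public class FsckLevels {

    /** Enforcement levels, ordered from least to most strict. */
    public enum Level { BASIC, PARANOID, FULL }

    /** Placeholder check names; each is tagged with the lowest level that runs it. */
    public enum Check {
        METASTORE_ENTRY_EXISTS_IN_S3(Level.BASIC),
        AUTHORITATIVE_LISTING_MATCHES_S3(Level.PARANOID),
        METASTORE_INTERNAL_CONSISTENCY(Level.FULL);

        private final Level minLevel;

        Check(Level minLevel) {
            this.minLevel = minLevel;
        }

        public Level minLevel() {
            return minLevel;
        }
    }

    /** Returns every check whose minimum level is at or below the selected level. */
    public static List<Check> checksFor(Level selected) {
        List<Check> result = new ArrayList<>();
        for (Check check : Check.values()) {
            if (check.minLevel().ordinal() <= selected.ordinal()) {
                result.add(check);
            }
        }
        return result;
    }
}
```

New checks would then just be new enum constants tagged with a level, so the set can grow without changing how the level is selected.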

Whether or not an s3a client has the metadatastore.authoritative bit set in its config doesn't really affect the contents of the metadata store or its relationship to the underlying storage (s3) state**.  If the is_authoritative bit is set on a directory in the metastore, however, that directory's listing from the metastore should *match* the listing of that directory from s3. If the bit is not set, the metastore listing should be a subset of the s3 listing.
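
A minimal sketch of that per-directory invariant, assuming both listings have already been reduced to plain sets of child names. The class and method names are made up for illustration and are not part of S3Guard; a real check would work on the MetadataStore and S3 listing types and would have to deal with details this ignores.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of the per-directory listing invariant. */
public class ListingInvariant {

    /**
     * Compares one directory's metastore listing with its S3 listing.
     * Returns a description of each violation; an empty set means the invariant holds.
     */
    public static Set<String> check(Set<String> metastoreListing,
                                    Set<String> s3Listing,
                                    boolean isAuthoritative) {
        Set<String> violations = new TreeSet<>();

        // In both cases the metastore listing must be a subset of the S3 listing.
        for (String child : metastoreListing) {
            if (!s3Listing.contains(child)) {
                violations.add("in metastore but not in S3: " + child);
            }
        }

        // If the directory is marked authoritative, the listings must match exactly,
        // so anything S3 has that the metastore lacks is also a violation.
        if (isAuthoritative) {
            for (String child : s3Listing) {
                if (!metastoreListing.contains(child)) {
                    violations.add("in S3 but not in metastore: " + child);
                }
            }
        }
        return violations;
    }
}
```

Note that the only input besides the two listings is the per-directory is_authoritative bit stored in the metastore; the client config flag does not appear anywhere.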

I would also split the consistency checks into two categories: MetadataStore-specific and generic. Most of the checks here are generic tests that work with any MetadataStore. DDB also needs to check its internal consistency (since it uses the ancestor-exists invariant to avoid table scans).
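
As an example of a store-specific check, the ancestor-exists invariant could be sketched roughly as below. This assumes the table contents have already been scanned into a set of absolute path strings, which is a simplification; the class and method names are hypothetical, and the root is treated as implicitly present.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of an ancestor-exists check over scanned metastore paths. */
public class AncestorInvariant {

    /**
     * Returns every ancestor directory that is implied by some path in the set
     * but has no entry of its own. An empty result means the invariant holds.
     * Paths are assumed to be absolute, e.g. "/a/b/c".
     */
    public static Set<String> missingAncestors(Set<String> paths) {
        Set<String> missing = new TreeSet<>();
        for (String path : paths) {
            int slash = path.lastIndexOf('/');
            // Walk up one level at a time; stop before the root ("/" at index 0).
            while (slash > 0) {
                String parent = path.substring(0, slash);
                if (!paths.contains(parent)) {
                    missing.add(parent);
                }
                slash = parent.lastIndexOf('/');
            }
        }
        return missing;
    }
}
```

For instance, a table holding only "/a" and "/a/b/c" would report "/a/b" as missing.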

Also agreed that you'll need table scans here, but how do we expose this for FSCK only? FSCK traditionally reaches below the FS to check its structures (e.g. ext3 fsck uses the block device below the ext3 fs to check the on-disk format, right?).

 

** some nuance here, if we want to discuss further.

> S3Guard CLI: Add fsck check command
> -----------------------------------
>
>                 Key: HADOOP-13980
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13980
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.0.0-beta1
>            Reporter: Aaron Fabbri
>            Assignee: Gabor Bota
>            Priority: Major
>
> As discussed in HADOOP-13650, we want to add an S3Guard CLI command which compares S3 with MetadataStore, and returns a failure status if any invariants are violated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
