Posted to common-issues@hadoop.apache.org by GitBox <gi...@apache.org> on 2019/08/22 14:21:08 UTC
[GitHub] [hadoop] steveloughran commented on issue #1208: HADOOP-16423.
S3Guard fsck: Check metadata consistency between S3 and metadatastore (log)
URL: https://github.com/apache/hadoop/pull/1208#issuecomment-523927611
## Overall
There are checks, but as it doesn't recurse for me, it's hard to validate them.
The UX can be improved. I propose:
* for successful entries, print their details (such as length and etag) as they are processed.
* when the filesystem fails to initialize, include the underlying error in the output.
* print the total duration of the check and the number of entries scanned.
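To make the last proposal concrete, here is a minimal sketch of tracking and printing per-entry details plus a scan summary. The class and method names (`ScanSummary`, `entryScanned`) are hypothetical illustrations, not the PR's API.

```java
import java.time.Duration;

/** Hypothetical sketch: collect per-entry details and an overall scan summary. */
public class ScanSummary {
    private final long startNanos = System.nanoTime();
    private long entriesScanned;

    /** Print each successful entry's details as it is processed. */
    public void entryScanned(String path, long length, String etag) {
        entriesScanned++;
        System.out.printf("== Path: %s [length=%d, etag=%s]%n", path, length, etag);
    }

    public long entriesScanned() {
        return entriesScanned;
    }

    public Duration elapsed() {
        return Duration.ofNanos(System.nanoTime() - startNanos);
    }

    public static void main(String[] args) {
        ScanSummary summary = new ScanSummary();
        summary.entryScanned("s3a://guarded-table/example", 1351, "9a0364b9e99bb480dd25e1f0284c8555");
        // Final summary line: totals and duration.
        System.out.printf("Scanned %d entries in %dms%n",
            summary.entriesScanned(), summary.elapsed().toMillis());
    }
}
```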
## Operations
Failed on the root entry:
```
bin/hadoop s3guard fsck -check s3a://guarded-table/
2019-08-22 14:17:37,973 [main] INFO s3guard.S3GuardTool (S3GuardTool.java:initMetadataStore(322)) - Metadata store DynamoDBMetadataStore{region=eu-west-2, tableName=guarded-table, tableArn=arn:aws:dynamodb:eu-west-2:980678866538:table/guarded-table} is initialized.
== Path: s3a://guarded-table/
2019-08-22 14:17:38,160 [main] INFO s3guard.S3GuardFsck (S3GuardFsck.java:compareFileStatusToPathMetadata(220)) - Entry is in the root, so there's no parent
== Path: s3a://guarded-table/example
2019-08-22 14:17:38,189 [main] ERROR s3guard.S3GuardFsckViolationHandler (S3GuardFsckViolationHandler.java:handle(76)) -
On path: s3a://guarded-table/
No etag.
```
With a path, I got the same message twice:
```
bin/hadoop s3guard fsck -check s3a://guarded-table/example
2019-08-22 14:19:26,674 [main] INFO s3guard.S3GuardTool (S3GuardTool.java:initMetadataStore(322)) - Metadata store DynamoDBMetadataStore{region=eu-west-2, tableName=guarded-table, tableArn=arn:aws:dynamodb:eu-west-2:980678866538:table/guarded-table} is initialized.
== Path: s3a://guarded-table/example
== Path: s3a://guarded-table/example
```
Missing file:
```
bin/hadoop s3guard fsck -check s3a://guarded-table/example/missing
2019-08-22 14:21:23,252 [main] INFO s3guard.S3GuardTool (S3GuardTool.java:initMetadataStore(322)) - Metadata store DynamoDBMetadataStore{region=eu-west-2, tableName=guarded-table, tableArn=arn:aws:dynamodb:eu-west-2:980678866538:table/guarded-table} is initialized.
java.io.FileNotFoundException: No such file or directory: s3a://guarded-table/example/missing
2019-08-22 14:21:23,404 [main] INFO util.ExitUtil (ExitUtil.java:terminate(210)) - Exiting with status 44: java.io.FileNotFoundException: No such file or directory: s3a://guarded-table/example/missing
```
This is good. Add a test for it.
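The run above exited with status 44 on the missing path, which is what a test should pin down. A self-contained sketch of that mapping (the class `FsckExitCodes` and method `exitCodeFor` are illustrative stand-ins, not the PR's code; 44 matches Hadoop's "not found" launcher exit code):

```java
import java.io.FileNotFoundException;

/** Hypothetical sketch of the exit-code mapping a test would assert. */
public class FsckExitCodes {
    static final int EXIT_NOT_FOUND = 44; // "not found" status, as in the log above
    static final int EXIT_FAIL = -1;      // generic failure

    /** A missing path should map to the dedicated "not found" status. */
    static int exitCodeFor(Exception e) {
        if (e instanceof FileNotFoundException) {
            return EXIT_NOT_FOUND;
        }
        return EXIT_FAIL;
    }
}
```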
`s3a://bucket/..`: this is bad. Add a test and then fix.
```
bin/hadoop s3guard fsck -check s3a://guarded-table/..
2019-08-22 14:23:14,640 [main] INFO s3guard.S3GuardTool (S3GuardTool.java:initMetadataStore(322)) - Metadata store DynamoDBMetadataStore{region=eu-west-2, tableName=guarded-table, tableArn=arn:aws:dynamodb:eu-west-2:980678866538:table/guarded-table} is initialized.
org.apache.hadoop.fs.s3a.AWSBadRequestException: getFileStatus on s3a://guarded-table/..: com.amazonaws.services.s3.model.AmazonS3Exception: Invalid URI (Service: Amazon S3; Status Code: 400; Error Code: 400 Invalid URI; Request ID: null; S3 Extended Request ID: null), S3 Extended Request ID: null:400 Invalid URI: Invalid URI (Service: Amazon S3; Status Code: 400; Error Code: 400 Invalid URI; Request ID: null; S3 Extended Request ID: null)
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:237)
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:164)
at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2732)
at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2694)
at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2587)
at org.apache.hadoop.fs.s3a.s3guard.S3GuardFsck.compareS3RootToMs(S3GuardFsck.java:94)
at org.apache.hadoop.fs.s3a.s3guard.S3GuardTool$Fsck.run(S3GuardTool.java:1560)
at org.apache.hadoop.fs.s3a.s3guard.S3GuardTool.run(S3GuardTool.java:402)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.fs.s3a.s3guard.S3GuardTool.run(S3GuardTool.java:1759)
at org.apache.hadoop.fs.s3a.s3guard.S3GuardTool.main(S3GuardTool.java:1768)
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Invalid URI (Service: Amazon S3; Status Code: 400; Error Code: 400 Invalid URI; Request ID: null; S3 Extended Request ID: null), S3 Extended Request ID: null
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4920)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4866)
at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1320)
at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$getObjectMetadata$5(S3AFileSystem.java:1623)
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:406)
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:369)
at org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:1617)
at org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:1593)
at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2715)
... 8 more
2019-08-22 14:23:14,677 [main] INFO util.ExitUtil (ExitUtil.java:terminate(210)) - Exiting with status -1: org.apache.hadoop.fs.s3a.AWSBadRequestException: getFileStatus on s3a://guarded-table/..: com.amazonaws.services.s3.model.AmazonS3Exception: Invalid URI (Service: Amazon S3; Status Code: 400; Error Code: 400 Invalid URI; Request ID: null; S3 Extended Request ID: null), S3 Extended Request ID: null:400 Invalid URI: Invalid URI (Service: Amazon S3; Status Code: 400; Error Code: 400 Invalid URI; Request ID: null; S3 Extended Request ID: null)
```
A check of `s3a://guarded-table/example/..` shows qualification is taking place; it is the root dir where things break.
Unknown bucket: I'd like just the failure message; there's no need for the usage string, as it is unrelated.
```
bin/hadoop s3guard fsck -check s3a://hwdev-ireland-new/
Failed to initialize S3AFileSystem from path: s3a://hwdev-ireland-new/
fsck [OPTIONS] [s3a://BUCKET]
Compares S3 with MetadataStore, and returns a failure status if any rules or invariants are violated. Only works with DynamoDbMetadataStore.
Common options:
check Check the metadata store for errors, but do not fix any issues.
```
* same for `bin/hadoop s3guard fsck -check file://`
For a missing path in a valid FS, a failure:
```
bin/hadoop s3guard fsck -check s3a:///hwdev-steve-ireland-new/etc/something
Failed to initialize S3AFileSystem from path: s3a:///hwdev-steve-ireland-new/etc/something
fsck [OPTIONS] [s3a://BUCKET]
Compares S3 with MetadataStore, and returns a failure status if any rules or invariants are violated. Only works with DynamoDbMetadataStore.
Common options:
check Check the metadata store for errors, but do not fix any issues.
2019-08-22 14:42:14,007 [main] INFO util.ExitUtil (ExitUtil.java:terminate(210)) - Exiting with status -1:
```
I'd expect the check to flag that the file is missing in both S3Guard and S3, and if there's a tombstone, to catch that and consider it a check failure.
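A sketch of that tombstone-aware rule, under stated assumptions: the `MsEntry` stub with its `isDeleted` flag is an illustrative stand-in for the real metadata-store entry type, not the PR's `PathMetadata` class.

```java
/** Hypothetical sketch: treat "missing everywhere" and "tombstoned" as violations. */
public class TombstoneCheck {
    static class MsEntry {
        final boolean isDeleted; // tombstone marker in the metadata store
        MsEntry(boolean isDeleted) { this.isDeleted = isDeleted; }
    }

    /**
     * Flag a violation when the path is absent from S3 and the metadata
     * store has no entry, or only a tombstone, for it. An entry that is
     * present and not tombstoned is covered by the normal S3-to-MS
     * comparison elsewhere.
     */
    static boolean isViolation(boolean existsInS3, MsEntry msEntry) {
        if (existsInS3) {
            return false;
        }
        return msEntry == null || msEntry.isDeleted;
    }
}
```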
Now, prepare a dir with `bin/hadoop fs -copyFromLocal -t 8 etc s3a://hwdev-steve-ireland-new/` to create data:
```
bin/hadoop fs -ls -R s3a://hwdev-steve-ireland-new/
drwxrwxrwx - stevel stevel 0 2019-08-22 14:34 s3a://hwdev-steve-ireland-new/etc
drwxrwxrwx - stevel stevel 0 2019-08-22 14:34 s3a://hwdev-steve-ireland-new/etc/hadoop
-rw-rw-rw- 1 stevel stevel 1351 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/kms-env.sh
-rw-rw-rw- 1 stevel stevel 3999 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/hadoop-env.cmd
-rw-rw-rw- 1 stevel stevel 118 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/tokens-exclude-aws-secrets.xml
-rw-rw-rw- 1 stevel stevel 620 2019-08-22 14:29 s3a://hwdev-steve-ireland-new/etc/hadoop/httpfs-site.xml
-rw-rw-rw- 1 stevel stevel 3823 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/log4j.properties~
-rw-rw-rw- 1 stevel stevel 2316 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/ssl-client.xml.example
-rw-rw-rw- 1 stevel stevel 6113 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/yarn-env.sh
-rw-rw-rw- 1 stevel stevel 11765 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/hadoop-policy.xml
-rw-rw-rw- 1 stevel stevel 3321 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/hadoop-metrics2.properties
-rw-rw-rw- 1 stevel stevel 3414 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/hadoop-user-functions.sh.example
-rw-rw-rw- 1 stevel stevel 10 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/workers
drwxrwxrwx - stevel stevel 0 2019-08-22 14:34 s3a://hwdev-steve-ireland-new/etc/hadoop/shellprofile.d
-rw-rw-rw- 1 stevel stevel 3880 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/shellprofile.d/example.sh
-rw-rw-rw- 1 stevel stevel 951 2019-08-22 14:29 s3a://hwdev-steve-ireland-new/etc/hadoop/mapred-env.cmd
-rw-rw-rw- 1 stevel stevel 682 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/kms-site.xml
-rw-rw-rw- 1 stevel stevel 683 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/hdfs-rbf-site.xml
-rw-rw-rw- 1 stevel stevel 775 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/hdfs-site.xml
-rw-rw-rw- 1 stevel stevel 2393 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/container-executor.cfg
-rw-rw-rw- 1 stevel stevel 1860 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/kms-log4j.properties
-rw-rw-rw- 1 stevel stevel 1867 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/core-site.xml
-rw-rw-rw- 1 stevel stevel 1335 2019-08-22 14:29 s3a://hwdev-steve-ireland-new/etc/hadoop/configuration.xsl
-rw-rw-rw- 1 stevel stevel 2697 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/ssl-server.xml.example
-rw-rw-rw- 1 stevel stevel 758 2019-08-22 14:29 s3a://hwdev-steve-ireland-new/etc/hadoop/mapred-site.xml
-rw-rw-rw- 1 stevel stevel 1484 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/httpfs-env.sh
-rw-rw-rw- 1 stevel stevel 8260 2019-08-22 14:29 s3a://hwdev-steve-ireland-new/etc/hadoop/capacity-scheduler.xml
-rw-rw-rw- 1 stevel stevel 6858 2019-08-22 14:29 s3a://hwdev-steve-ireland-new/etc/hadoop/log4j.properties
-rw-rw-rw- 1 stevel stevel 2681 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/user_ec_policies.xml.template
-rw-rw-rw- 1 stevel stevel 2250 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/yarn-env.cmd
-rw-rw-rw- 1 stevel stevel 2591 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/yarnservice-log4j.properties
-rw-rw-rw- 1 stevel stevel 1657 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/httpfs-log4j.properties
-rw-rw-rw- 1 stevel stevel 1764 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/mapred-env.sh
-rw-rw-rw- 1 stevel stevel 3518 2019-08-22 14:29 s3a://hwdev-steve-ireland-new/etc/hadoop/kms-acls.xml
-rw-rw-rw- 1 stevel stevel 690 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/yarn-site.xml
-rw-rw-rw- 1 stevel stevel 4113 2019-08-22 14:28 s3a://hwdev-steve-ireland-new/etc/hadoop/mapred-queues.xml.template
-rw-rw-rw- 1 stevel stevel 16948 2019-08-22 14:29 s3a://hwdev-steve-ireland-new/etc/hadoop/hadoop-env.sh
```
On both root and `etc/`, it failed with a message about etags. These are directories; etags should not be needed.
```
bin/hadoop s3guard fsck -check s3a://hwdev-steve-ireland-new/
2019-08-22 14:34:25,915 [main] INFO s3guard.S3GuardTool (S3GuardTool.java:initMetadataStore(322)) - Metadata store DynamoDBMetadataStore{region=eu-west-1, tableName=hwdev-steve-ireland-new, tableArn=arn:aws:dynamodb:eu-west-1:980678866538:table/hwdev-steve-ireland-new} is initialized.
== Path: s3a://hwdev-steve-ireland-new/
2019-08-22 14:34:26,615 [main] INFO s3guard.S3GuardFsck (S3GuardFsck.java:compareFileStatusToPathMetadata(220)) - Entry is in the root, so there's no parent
2019-08-22 14:34:26,623 [main] ERROR s3guard.S3GuardFsckViolationHandler (S3GuardFsckViolationHandler.java:handle(76)) -
On path: s3a://hwdev-steve-ireland-new/
No etag.
~/P/R/fsck bin/hadoop s3guard fsck -check s3a://hwdev-steve-ireland-new/etc
2019-08-22 14:34:39,682 [main] INFO s3guard.S3GuardTool (S3GuardTool.java:initMetadataStore(322)) - Metadata store DynamoDBMetadataStore{region=eu-west-1, tableName=hwdev-steve-ireland-new, tableArn=arn:aws:dynamodb:eu-west-1:980678866538:table/hwdev-steve-ireland-new} is initialized.
== Path: s3a://hwdev-steve-ireland-new/etc
2019-08-22 14:34:40,140 [main] ERROR s3guard.S3GuardFsckViolationHandler (S3GuardFsckViolationHandler.java:handle(76)) -
On path: s3a://hwdev-steve-ireland-new/etc
No etag.
```
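One way the etag rule could be restricted to files is sketched below; the `FileEntry` stub is an illustrative stand-in for the PR's entry type, not its actual API.

```java
/** Hypothetical sketch: only flag a missing etag on file entries. */
public class EtagCheck {
    static class FileEntry {
        final boolean isDirectory;
        final String etag;
        FileEntry(boolean isDirectory, String etag) {
            this.isDirectory = isDirectory;
            this.etag = etag;
        }
    }

    /** Directories have no etag in S3, so skip the check for them. */
    static boolean violatesEtagRule(FileEntry entry) {
        return !entry.isDirectory && entry.etag == null;
    }
}
```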
A valid file is good. I think we can/should print size, timestamp, etag and version ID for more info:
```
bin/hadoop s3guard fsck -check s3a://hwdev-steve-ireland-new/etc/hadoop/kms-env.sh
2019-08-22 14:45:25,752 [main] INFO s3guard.S3GuardTool (S3GuardTool.java:initMetadataStore(322)) - Metadata store DynamoDBMetadataStore{region=eu-west-1, tableName=hwdev-steve-ireland-new, tableArn=arn:aws:dynamodb:eu-west-1:980678866538:table/hwdev-steve-ireland-new} is initialized.
== Path: s3a://hwdev-steve-ireland-new/etc/hadoop/kms-env.sh
== Path: s3a://hwdev-steve-ireland-new/etc/hadoop/kms-env.sh
```
Now purge the DDB table and repeat.
Prune *didn't work*. This looks like a prune problem, as it does the same for tombstones and DDB shows a lot of them. I'd also have expected to see some debug-level logging.
```
bin/hadoop s3guard prune -seconds 0 s3a://hwdev-steve-ireland-new/
2019-08-22 14:47:56,314 [main] INFO s3guard.S3GuardTool (S3GuardTool.java:initMetadataStore(322)) - Metadata store DynamoDBMetadataStore{region=eu-west-1, tableName=hwdev-steve-ireland-new, tableArn=arn:aws:dynamodb:eu-west-1:980678866538:table/hwdev-steve-ireland-new} is initialized.
2019-08-22 14:47:56,341 [main] INFO s3guard.DynamoDBMetadataStore (DurationInfo.java:<init>(72)) - Starting: Pruning DynamoDB Store
2019-08-22 14:47:56,395 [main] INFO s3guard.DynamoDBMetadataStore (DurationInfo.java:close(87)) - Pruning DynamoDB Store: duration 0:00.054s
2019-08-22 14:47:56,395 [main] INFO s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:innerPrune(1576)) - Finished pruning 0 items in batches of 25
```
Manually delete the entry from the AWS console and fsck the file:
```
bin/hadoop s3guard fsck -check s3a://hwdev-steve-ireland-new/etc/hadoop/kms-env.sh
2019-08-22 15:04:11,635 [main] INFO s3guard.S3GuardTool (S3GuardTool.java:initMetadataStore(322)) - Metadata store DynamoDBMetadataStore{region=eu-west-1, tableName=hwdev-steve-ireland-new, tableArn=arn:aws:dynamodb:eu-west-1:980678866538:table/hwdev-steve-ireland-new} is initialized.
== Path: s3a://hwdev-steve-ireland-new/etc/hadoop/kms-env.sh
== Path: s3a://hwdev-steve-ireland-new/etc/hadoop/kms-env.sh
2019-08-22 15:04:11,951 [main] ERROR s3guard.S3GuardFsckViolationHandler (S3GuardFsckViolationHandler.java:handle(76)) -
On path: s3a://hwdev-steve-ireland-new/etc/hadoop/kms-env.sh
No PathMetadata for this path in the MS.
2019-08-22 15:04:11,951 [main] ERROR s3guard.S3GuardFsckViolationHandler (S3GuardFsckViolationHandler.java:handle(76)) -
On path: s3a://hwdev-steve-ireland-new/etc/hadoop/kms-env.sh
No PathMetadata for this path in the MS.
```
1. Yes, it found the problem.
1. But it reported it twice.
1. And it didn't cover what was in S3 (e.g. S3 contains a file of size...)
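The duplicate reports could be suppressed by remembering which (path, violation) pairs have already been handled; a minimal sketch, where `ViolationReporter` is a hypothetical name rather than the real `S3GuardFsckViolationHandler` API:

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: report each (path, violation) pair only once. */
public class ViolationReporter {
    private final Set<String> reported = new HashSet<>();

    /** Returns true only the first time a given violation is seen on a path. */
    public boolean report(String path, String violation) {
        return reported.add(path + "::" + violation);
    }
}
```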
Recover into DDB, then retry. Now it fails on a version ID mismatch:
```
bin/hadoop fs -ls -R s3a://hwdev-steve-ireland-new/etc/hadoop/kms-env.sh
2019-08-22 15:08:28,868 [main] DEBUG s3guard.Operations (DynamoDBMetadataStore.java:logPut(2403)) - #(Put-0001) TOMBSTONE s3a:///hwdev-steve-ireland-new/etc
2019-08-22 15:08:28,870 [main] DEBUG s3guard.Operations (DynamoDBMetadataStore.java:logPut(2403)) - #(Put-0001) TOMBSTONE s3a:///hwdev-steve-ireland-new/etc/hadoop
2019-08-22 15:08:28,870 [main] DEBUG s3guard.Operations (DynamoDBMetadataStore.java:logPut(2403)) - #(Put-0001) TOMBSTONE s3a:///hwdev-steve-ireland-new/etc/hadoop/kms-env.sh
-rw-rw-rw- 1 stevel stevel 1351 2019-08-22 14:39 s3a://hwdev-steve-ireland-new/etc/hadoop/kms-env.sh
~/P/R/fsck bin/hadoop s3guard fsck -check s3a://hwdev-steve-ireland-new/etc/hadoop/kms-env.sh
2019-08-22 15:08:49,920 [main] INFO s3guard.S3GuardTool (S3GuardTool.java:initMetadataStore(322)) - Metadata store DynamoDBMetadataStore{region=eu-west-1, tableName=hwdev-steve-ireland-new, tableArn=arn:aws:dynamodb:eu-west-1:980678866538:table/hwdev-steve-ireland-new} is initialized.
== Path: s3a://hwdev-steve-ireland-new/etc/hadoop/kms-env.sh
== Path: s3a://hwdev-steve-ireland-new/etc/hadoop/kms-env.sh
2019-08-22 15:08:50,222 [main] ERROR s3guard.S3GuardFsckViolationHandler (S3GuardFsckViolationHandler.java:handle(76)) -
On path: s3a://hwdev-steve-ireland-new/etc/hadoop/kms-env.sh
getVersionId mismatch - s3: Q1Czkv5AjxTbDE9Frv6sjexrulQsNvde, ms: Q1Czkv5AjxTbDE9Frv6sjexrulQsNvde
2019-08-22 15:08:50,223 [main] ERROR s3guard.S3GuardFsckViolationHandler (S3GuardFsckViolationHandler.java:handle(76)) -
On path: s3a://hwdev-steve-ireland-new/etc/hadoop/kms-env.sh
getVersionId mismatch - s3: Q1Czkv5AjxTbDE9Frv6sjexrulQsNvde, ms: Q1Czkv5AjxTbDE9Frv6sjexrulQsNvde
```
The two IDs match; I don't know what the problem is. And if it were a real failure, I'd like to see the rest of the details about the entry.
Looking at DDB, this is the first table entry with a versionId. Maybe the check is wrong.
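Since the two printed IDs are byte-for-byte identical, one plausible cause (an assumption, not confirmed from the PR's code) is a reference comparison with `==` instead of `equals()`, which in Java reports distinct `String` objects with equal contents as different:

```java
import java.util.Objects;

/** Hypothetical illustration of a reference-vs-value String comparison bug. */
public class VersionIdCompare {
    /** Buggy: distinct String objects with equal contents are != by reference. */
    static boolean mismatchBuggy(String s3VersionId, String msVersionId) {
        return s3VersionId != msVersionId;
    }

    /** Fixed: null-safe value comparison. */
    static boolean mismatchFixed(String s3VersionId, String msVersionId) {
        return !Objects.equals(s3VersionId, msVersionId);
    }
}
```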