Posted to common-issues@hadoop.apache.org by "Steve Loughran (Jira)" <ji...@apache.org> on 2020/01/29 18:49:00 UTC

[jira] [Commented] (HADOOP-16823) Manage S3 Throttling exclusively in S3A client

    [ https://issues.apache.org/jira/browse/HADOOP-16823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026111#comment-17026111 ] 

Steve Loughran commented on HADOOP-16823:
-----------------------------------------

There's a tension between prepaid IO capacity and the load we put on the table; ITestDynamoDBMetadataStoreScale is the example of this.


Special callout for a throttling failure during test setup:

{code}
[ERROR] test_070_putDirMarker(org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale)  Time elapsed: 314.323 s  <<< ERROR!
org.apache.hadoop.fs.s3a.AWSServiceThrottledException: getVersionMarkerItem on ../VERSION: com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException: The level of configured provisioned throughput for the table was exceeded. Consider increasing your provisioning level with the UpdateTable API. (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ProvisionedThroughputExceededException; Request ID: LUKART1RQBVKV0T7BPURUN95QVVV4KQNSO5AEMVJF66Q9ASUAAJG): The level of configured provisioned throughput for the table was exceeded. Consider increasing your provisioning level with the UpdateTable API. (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ProvisionedThroughputExceededException; Request ID: LUKART1RQBVKV0T7BPURUN95QVVV4KQNSO5AEMVJF66Q9ASUAAJG)
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.createMetadataStore(ITestDynamoDBMetadataStoreScale.java:152)
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.setup(ITestDynamoDBMetadataStoreScale.java:162)
Caused by: com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException: The level of configured provisioned throughput for the table was exceeded. Consider increasing your provisioning level with the UpdateTable API. (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ProvisionedThroughputExceededException; Request ID: LUKART1RQBVKV0T7BPURUN95QVVV4KQNSO5AEMVJF66Q9ASUAAJG)
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.createMetadataStore(ITestDynamoDBMetadataStoreScale.java:152)
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.setup(ITestDynamoDBMetadataStoreScale.java:162)
{code}

Note how long the retrying went on there: over 300 seconds elapsed before setup gave up. We were backing off hard, but it still failed on us.
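
For a sense of how the elapsed time gets that large, some illustrative arithmetic (the base delay, per-attempt cap and retry limit below are hypothetical, not the values the S3Guard retry policy actually uses): exponential backoff with a cap quickly turns a few tens of retries into minutes of sleeping.

{code}
// Illustrative only: shows how exponential backoff with a per-attempt cap
// accumulates into minutes of wall-clock time before a hard failure.
// The base delay, cap and retry limit here are hypothetical, not the
// values used by the S3Guard DynamoDB retry policy.
public class BackoffEstimate {
  public static void main(String[] args) {
    long baseDelayMs = 100;      // hypothetical initial backoff
    long maxDelayMs = 20_000;    // hypothetical per-attempt cap
    int retries = 30;            // hypothetical retry limit
    long totalMs = 0;
    for (int attempt = 0; attempt < retries; attempt++) {
      long delay = Math.min(maxDelayMs, baseDelayMs << Math.min(attempt, 20));
      totalMs += delay;
    }
    // With these numbers the sleep time alone is several minutes, the same
    // order of magnitude as the 314s setup failure above.
    System.out.printf("total backoff: %.1f s%n", totalMs / 1000.0);
  }
}
{code}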


Looking at the AWS metrics, part of the fun is the way bursty traffic is handled: you may get your capacity at the time of the initial load, but get blocked afterwards. That is, the throttling may not happen under load, but the next time a low-load API call is made.


Also, S3GuardTableAccess isn't retrying, so some code in the tests and in the purge/dump table entry points goes on to fail when throttling happens while iterating through scans. Fix: let callers ask a DDB metastore to wrap their scan with one bonded to its retry policy and metrics, and use that wherever appropriate.
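
To illustrate the kind of wrapper meant here (a sketch, not the actual S3Guard code; the RetryingScanIterator name, the hard-coded backoff and the plain java.util.Iterator are assumptions for the example), something along these lines retries each step of the iteration when DynamoDB reports throttling:

{code}
import java.util.Iterator;
import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException;

/**
 * Illustrative wrapper only: retries each step of a scan iterator when
 * DynamoDB reports throttling, with simple exponential backoff.
 * The real fix would bond the wrapper to the metastore's own retry
 * policy and metrics rather than hard-coding values here.
 */
class RetryingScanIterator<T> implements Iterator<T> {
  private final Iterator<T> inner;
  private final int maxRetries;
  private final long baseDelayMs;

  RetryingScanIterator(Iterator<T> inner, int maxRetries, long baseDelayMs) {
    this.inner = inner;
    this.maxRetries = maxRetries;
    this.baseDelayMs = baseDelayMs;
  }

  @Override
  public boolean hasNext() {
    return retry(() -> inner.hasNext());
  }

  @Override
  public T next() {
    return retry(() -> inner.next());
  }

  private <R> R retry(java.util.function.Supplier<R> op) {
    ProvisionedThroughputExceededException last = null;
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        return op.get();
      } catch (ProvisionedThroughputExceededException e) {
        last = e;
        try {
          // back off before retrying the throttled page fetch
          Thread.sleep(baseDelayMs << Math.min(attempt, 10));
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw e;
        }
      }
    }
    throw last;
  }
}
{code}

In the real change the wrapper would delegate to the metastore's existing retry invoker, so the throttle events also show up in its metrics.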

ITestDynamoDBMetadataStoreScale is really slow; either these changes make it worse, or it has always been really slow and we haven't noticed because it was happening during the (slow) parallel test runs. Proposed: we review it, look at what we want it to show, and then see if we can make things fail faster.
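
As a starting point for "fail faster", the test could shrink the DDB retry settings in its configuration. A minimal sketch, assuming the keys below are the ones the DDB metastore reads (verify the exact names against org.apache.hadoop.fs.s3a.Constants before relying on them):

{code}
import org.apache.hadoop.conf.Configuration;

// Sketch only: trim the DDB retry settings so a throttled run fails in
// seconds rather than minutes.  The key names are assumed, not verified
// here; check them against the Constants class.
public class FastFailScaleConf {
  public static Configuration create() {
    Configuration conf = new Configuration();
    conf.setInt("fs.s3a.s3guard.ddb.max.retries", 3);
    conf.set("fs.s3a.s3guard.ddb.throttle.retry.interval", "50ms");
    return conf;
  }
}
{code}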

> Manage S3 Throttling exclusively in S3A client
> ----------------------------------------------
>
>                 Key: HADOOP-16823
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16823
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.2.1
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Minor
>
> Currently AWS S3 throttling is initially handled in the AWS SDK, only reaching the S3 client code after it has given up.
> This means we don't always directly observe when throttling is taking place.
> Proposed:
> * disable throttling retries in the AWS client library
> * add a quantile for the S3 throttle events, as DDB has
> * isolate counters of s3 and DDB throttle events to classify issues better
> Because we are taking over the AWS retries, we will need to expand the initial delay between retries and the number of retries we should support before giving up.
> Also: should we log throttling events? It could be useful, but there is a risk of overloading the logs, especially if many threads in the same process are triggering the problem.
> Proposed: log at debug.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org