Posted to common-issues@hadoop.apache.org by "Steve Loughran (Jira)" <ji...@apache.org> on 2022/06/10 14:19:00 UTC

[jira] [Comment Edited] (HADOOP-18285) S3a should retry when being throttled by STS (assumed roles)

    [ https://issues.apache.org/jira/browse/HADOOP-18285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552778#comment-17552778 ] 

Steve Loughran edited comment on HADOOP-18285 at 6/10/22 2:18 PM:
------------------------------------------------------------------

You must have been creating a lot of requests! We have a test (ILoadTestSessionCredentials) to see what happens in such a situation. I was curious, as nobody seemed to know what the actual maximum rate was, and the behaviour when overloaded wasn't defined. My fear was that the entire account would get locked out for an extended period of time. (I tested on a Saturday, in a region nobody else in our test account used, just for safety.)

Support for handling throttling events here would be welcome; ILoadTestSessionCredentials is the place to add the logic. We would want to wire up some IOStatistics collection here, which would tie in with the ongoing work to collect and aggregate those statistics across an entire job. It would also be interesting to think about wiring up auditing so that the HTTP referrer header recorded in the STS logs (they collect these, right?) includes things like the job ID and principal. But as throttling should normally be very rare, it is not necessarily critical here.

It is annoying that every AWS service seems to have a different way of reporting throttling events: isThrottleException() already needed explicit handling of DynamoDB events alongside S3. That is where you will need to add the extra logic to recognise the STS event.
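A minimal sketch of the kind of check isThrottleException() would need, written against plain status and error-code values instead of the AWS SDK exception types so that it stands alone. The class `ThrottleCheck`, its method signature and the exact set of error codes are assumptions for illustration, not the S3A implementation: "Throttling" with HTTP 400 is taken from the STS stack trace in the issue description, "SlowDown" (503) is S3's throttle response and "ProvisionedThroughputExceededException" is DynamoDB's.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical helper (not the S3A code): decide whether an AWS error response
 * signals throttling. Services report throttling differently, so both the
 * HTTP status code and the service-specific error code are checked.
 */
final class ThrottleCheck {

  /** Error codes treated as throttling whatever the HTTP status (illustrative, not exhaustive). */
  private static final Set<String> THROTTLE_ERROR_CODES =
      Collections.unmodifiableSet(new HashSet<>(Arrays.asList(
          "Throttling",                                 // STS: HTTP 400 "Rate exceeded"
          "SlowDown",                                   // S3: HTTP 503
          "ProvisionedThroughputExceededException")));  // DynamoDB: HTTP 400

  private ThrottleCheck() {
  }

  /**
   * @param statusCode HTTP status code of the failed request
   * @param errorCode  service error code; may be null
   * @return true if the failure should be retried with back-off
   */
  static boolean isThrottle(int statusCode, String errorCode) {
    if (statusCode == 503 || statusCode == 429) {
      // Assumption: treat 503 (S3 "SlowDown") and 429 (Too Many Requests)
      // as throttling whatever the error code.
      return true;
    }
    // Otherwise (including a 400) only the error code can distinguish a
    // throttle from a genuinely bad request.
    return errorCode != null && THROTTLE_ERROR_CODES.contains(errorCode);
  }
}
```

For the failure in this issue, `isThrottle(400, "Throttling")` returns true, while a 400 carrying any other error code still returns false, so genuine bad requests keep failing fast.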

Patches welcome; we plan to branch off for a new release at the end of July. Do look at the S3a testing policy and know that we really mean it: no declaration of endpoint, no review.







> S3a should retry when being throttled by STS (assumed roles)
> ------------------------------------------------------------
>
>                 Key: HADOOP-18285
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18285
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3
>    Affects Versions: 3.3.3
>            Reporter: André Kelpe
>            Priority: Major
>
> We ran into an issue where we were being throttled by AWS when reading from a bucket using the STS assume-role mechanism.
>  
> The stacktrace looks like this:
>  
> {code:java}
> Caused by: com.amazonaws.services.securitytoken.model.AWSSecurityTokenServiceException: Rate exceeded (Service: AWSSecurityTokenService; Status Code: 400; Error Code: Throttling; Request ID: 02f32511-418c-4b2a-96ef-2d7ba8dafab1; Proxy: null)
>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1862)
>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1415)
>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1384)
>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1154)
>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:811)
>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:779)
>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:753)
>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:713)
>         at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:695)
>         at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:559)
>         at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:539)
>         at com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.doInvoke(AWSSecurityTokenServiceClient.java:1682)
>         at com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.invoke(AWSSecurityTokenServiceClient.java:1649)
>         at com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.invoke(AWSSecurityTokenServiceClient.java:1638)
>         at com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.executeAssumeRole(AWSSecurityTokenServiceClient.java:498)
>         at com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.assumeRole(AWSSecurityTokenServiceClient.java:467)
>         at com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider.newSession(STSAssumeRoleSessionCredentialsProvider.java:348)
>         at com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider.access$000(STSAssumeRoleSessionCredentialsProvider.java:44)
>         at com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider$1.call(STSAssumeRoleSessionCredentialsProvider.java:93)
>         at com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider$1.call(STSAssumeRoleSessionCredentialsProvider.java:90)
>         at com.amazonaws.auth.RefreshableTask.refreshValue(RefreshableTask.java:295)
>         at com.amazonaws.auth.RefreshableTask.blockingRefresh(RefreshableTask.java:251)
>         at com.amazonaws.auth.RefreshableTask.getValue(RefreshableTask.java:192)
>         at com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider.getCredentials(STSAssumeRoleSessionCredentialsProvider.java:320)
> {code}
> From reading the code, the exception is translated by S3AUtils here: [https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AUtils.java#L240]
> The translation does not inspect the message any further and assumes that a 400 is a genuine bad request. As a result the failure is raised as an {{AWSBadRequestException}}, which S3ARetryPolicy treats as non-retryable, so the request fails instead of being retried:
> [https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3ARetryPolicy.java#L215-L217]
>  
> A better approach seems to be to inspect the sub-type and message of the original exception and, when they indicate throttling, throw a different exception than {{AWSBadRequestException}}, so that the request is handled with back-off and retry.
>  
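The approach proposed in the issue can be sketched as two steps: translate a failure whose error code marks it as throttling into a distinct, retryable exception type instead of a bad-request exception, then retry only that type with exponential back-off while failing fast on everything else. This is a self-contained sketch, not the S3A code: `ServiceErrorException`, `ThrottledException`, `BadRequestException`, `translate()` and `withRetries()` are hypothetical stand-ins for the AWS SDK exception, the S3AUtils translation and S3ARetryPolicy, and the attempt count and delays are arbitrary assumptions.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical stand-in for the SDK's service exception: status code plus error code. */
class ServiceErrorException extends RuntimeException {
  final int statusCode;
  final String errorCode;

  ServiceErrorException(int statusCode, String errorCode, String message) {
    super(message);
    this.statusCode = statusCode;
    this.errorCode = errorCode;
  }
}

/** Retryable: the service asked the client to slow down. */
class ThrottledException extends IOException {
  ThrottledException(String message, Throwable cause) {
    super(message, cause);
  }
}

/** Not retryable: the request itself is wrong. */
class BadRequestException extends IOException {
  BadRequestException(String message, Throwable cause) {
    super(message, cause);
  }
}

final class StsRetrySketch {
  private StsRetrySketch() {
  }

  /** Translate using the error code as well as the status, so a throttled 400 is not a bad request. */
  static IOException translate(ServiceErrorException e) {
    boolean throttled = e.statusCode == 503 || e.statusCode == 429
        || "Throttling".equals(e.errorCode);
    return throttled
        ? new ThrottledException(e.getMessage(), e)
        : new BadRequestException(e.getMessage(), e);
  }

  /** Invoke the operation, retrying only throttle failures, doubling the delay each time. */
  static <T> T withRetries(Callable<T> operation, int maxAttempts, long initialDelayMs)
      throws Exception {
    long delayMs = initialDelayMs;
    for (int attempt = 1; ; attempt++) {
      try {
        return operation.call();
      } catch (ServiceErrorException e) {
        IOException translated = translate(e);
        if (!(translated instanceof ThrottledException) || attempt >= maxAttempts) {
          throw translated;  // bad request, or out of attempts: fail now
        }
        Thread.sleep(delayMs);  // back off before the next attempt
        delayMs *= 2;
      }
    }
  }
}
```

Doubling the delay with no cap or jitter keeps the sketch short; a real retry policy would normally cap the delay and add jitter, so that many clients throttled at the same moment do not all retry against STS in lockstep.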



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
