Posted to common-issues@hadoop.apache.org by "Steve Loughran (Jira)" <ji...@apache.org> on 2020/08/05 18:13:00 UTC

[jira] [Commented] (HADOOP-17180) S3Guard: Include 500 DynamoDB system errors in exponential backoff retries

    [ https://issues.apache.org/jira/browse/HADOOP-17180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17171659#comment-17171659 ] 

Steve Loughran commented on HADOOP-17180:
-----------------------------------------

DDB's error reporting is pretty painful. I don't see any support for it in S3AUtils.isThrottleException in trunk, so upgrading alone isn't going to make your life better.
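Something like this is what a patch would add (a sketch only; isDynamoDBThrottle is a made-up helper, not the current S3AUtils code):

{code:java}
import com.amazonaws.AmazonServiceException;
import com.amazonaws.services.dynamodbv2.model.InternalServerErrorException;

public final class ThrottleProbeSketch {

  /**
   * Sketch of a broadened throttle probe: treat a DDB 500
   * InternalServerError the same way as a throttle event.
   * Hypothetical helper, not the existing S3AUtils.isThrottleException().
   */
  public static boolean isDynamoDBThrottle(Exception ex) {
    if (ex instanceof InternalServerErrorException) {
      // Per AWS support, these 500s mean "overloaded": retry with backoff.
      return true;
    }
    if (ex instanceof AmazonServiceException) {
      AmazonServiceException ase = (AmazonServiceException) ex;
      return ase.getStatusCode() == 503
          || ase.getStatusCode() == 500
          || "ProvisionedThroughputExceededException".equals(ase.getErrorCode());
    }
    return false;
  }
}
{code}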

InMemoryFileIndex is something I've heard bad reviews of in terms of S3 load (https://stackoverflow.com/questions/60590925/spark-making-expensive-s3-api-calls); it's probably the trigger of this.
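If that is the cause, one mitigation on the Spark side is to cap the listing fan-out while you wait for any Hadoop-side fix; the parallelism value below is purely illustrative:

{code:java}
import org.apache.spark.sql.SparkSession;

// Cap the parallelism of InMemoryFileIndex's distributed file listing so
// a burst of listStatus() calls doesn't hammer S3 and the S3Guard DDB
// table. 64 is an illustrative value, not a recommendation; the Spark
// default is much higher.
SparkSession spark = SparkSession.builder()
    .appName("listing-throttle-example")
    .config("spark.sql.sources.parallelPartitionDiscovery.parallelism", "64")
    .getOrCreate();
{code}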

Anyway, if treating a DDB 500 as a throttle event is what's needed, a patch against trunk is welcome; we can backport to 3.3.x as well. We've moved a long way from the 3.1 line, though.

Looking at the changelog, you need to be on 3.2.x to get the current DDB throttle logic from HADOOP-15426.
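On 3.2.x you can also tune how hard that logic retries. A sketch, with illustrative values; note these retries only kick in for events the code already classifies as throttles, which is exactly why the 500 classification matters:

{code:java}
import org.apache.hadoop.conf.Configuration;

// Tune S3Guard's DDB retry behaviour (keys from the HADOOP-15426-era
// code; values here are illustrative, not recommendations).
Configuration conf = new Configuration();
conf.setInt("fs.s3a.s3guard.ddb.max.retries", 12);               // attempts before giving up
conf.set("fs.s3a.s3guard.ddb.throttle.retry.interval", "200ms"); // base retry interval
{code}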

Can you try that and see if it helps?


> S3Guard: Include 500 DynamoDB system errors in exponential backoff retries
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-17180
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17180
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 3.1.3
>            Reporter: David Kats
>            Priority: Major
>         Attachments: image-2020-08-03-09-58-54-102.png
>
>
> We get fatal failures from S3Guard (which in turn fail our Spark jobs) because of internal DynamoDB system errors.
> com.amazonaws.services.dynamodbv2.model.InternalServerErrorException: Internal server error (Service: AmazonDynamoDBv2; Status Code: 500; Error Code: InternalServerError; Request ID: 00EBRE6J6V8UGD7040C9DUP2MNVV4KQNSO5AEMVJF66Q9ASUAAJG): Internal server error (Service: AmazonDynamoDBv2; Status Code: 500; Error Code: InternalServerError; Request ID: 00EBRE6J6V8UGD7040C9DUP2MNVV4KQNSO5AEMVJF66Q9ASUAAJG)
> DynamoDB has a separate statistic for system errors:
> !image-2020-08-03-09-58-54-102.png!
> I contacted AWS Support and got an explanation that those 500 errors are returned to the client once DynamoDB gets overwhelmed with client requests.
> So essentially the traffic should have been throttled, but it wasn't, and we got 500 system errors instead.
> My point is that the client should handle those errors just like throttling exceptions - with exponential backoff retries.
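> For illustration, a minimal sketch of what that handling could look like (retryOn500 is a hypothetical helper, not Hadoop's actual retry machinery):
> {code:java}
> import java.util.concurrent.Callable;
> import java.util.concurrent.ThreadLocalRandom;
> import com.amazonaws.services.dynamodbv2.model.InternalServerErrorException;
>
> // Hypothetical helper: retry a DDB call on 500 with capped exponential
> // backoff plus jitter, instead of failing the job on the first error.
> static <T> T retryOn500(Callable<T> ddbCall) throws Exception {
>   long delayMs = 100;              // initial backoff
>   final long maxDelayMs = 30_000;  // cap on any single sleep
>   final int maxAttempts = 9;       // bounded, so real outages still fail
>   for (int attempt = 1; ; attempt++) {
>     try {
>       return ddbCall.call();
>     } catch (InternalServerErrorException e) {
>       if (attempt == maxAttempts) {
>         throw e;                   // give up, surface the 500
>       }
>       // jitter spreads out retries from many executors
>       Thread.sleep(delayMs + ThreadLocalRandom.current().nextLong(delayMs / 2 + 1));
>       delayMs = Math.min(delayMs * 2, maxDelayMs);
>     }
>   }
> }
> {code}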
>  
> Here is a more complete exception stack trace:
>  
> org.apache.hadoop.fs.s3a.AWSServiceIOException: get on s3a://rem-spark/persisted_step_data/15/0afb1ccb73854f1fa55517a77ec7cc5e__b67e2221-f0e3-4c89-90ab-f49618ea4557__SDTopology/parquet.all_ranges/topo_id=321: com.amazonaws.services.dynamodbv2.model.InternalServerErrorException: Internal server error (Service: AmazonDynamoDBv2; Status Code: 500; Error Code: InternalServerError; Request ID: 00EBRE6J6V8UGD7040C9DUP2MNVV4KQNSO5AEMVJF66Q9ASUAAJG): Internal server error (Service: AmazonDynamoDBv2; Status Code: 500; Error Code: InternalServerError; Request ID: 00EBRE6J6V8UGD7040C9DUP2MNVV4KQNSO5AEMVJF66Q9ASUAAJG)
>   at org.apache.hadoop.fs.s3a.S3AUtils.translateDynamoDBException(S3AUtils.java:389)
>   at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:181)
>   at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:111)
>   at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.get(DynamoDBMetadataStore.java:438)
>   at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2110)
>   at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2088)
>   at org.apache.hadoop.fs.s3a.S3AFileSystem.innerListStatus(S3AFileSystem.java:1889)
>   at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listStatus$9(S3AFileSystem.java:1868)
>   at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109)
>   at org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:1868)
>   at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$.org$apache$spark$sql$execution$datasources$InMemoryFileIndex$$listLeafFiles(InMemoryFileIndex.scala:277)
>   at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$3$$anonfun$apply$2.apply(InMemoryFileIndex.scala:207)
>   at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$3$$anonfun$apply$2.apply(InMemoryFileIndex.scala:206)
>   at scala.collection.immutable.Stream.map(Stream.scala:418)
>   at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$3.apply(InMemoryFileIndex.scala:206)
>   at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$3.apply(InMemoryFileIndex.scala:204)
>   at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
>   at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>   at org.apache.spark.scheduler.Task.run(Task.scala:123)
>   at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: com.amazonaws.services.dynamodbv2.model.InternalServerErrorException: Internal server error (Service: AmazonDynamoDBv2; Status Code: 500; Error Code: InternalServerError; Request ID: 00EBRE6J6V8UGD7040C9DUP2MNVV4KQNSO5AEMVJF66Q9ASUAAJG)
>   at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1639)
>   at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1304)
>   at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056)
>   at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
>   at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
>   at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
>   at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
>   at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
>   at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
>   at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.doInvoke(AmazonDynamoDBClient.java:2925)
>   at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.invoke(AmazonDynamoDBClient.java:2901)
>   at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.executeGetItem(AmazonDynamoDBClient.java:1640)
>   at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.getItem(AmazonDynamoDBClient.java:1616)
>   at com.amazonaws.services.dynamodbv2.document.internal.GetItemImpl.doLoadItem(GetItemImpl.java:77)
>   at com.amazonaws.services.dynamodbv2.document.internal.GetItemImpl.getItem(GetItemImpl.java:66)
>   at com.amazonaws.services.dynamodbv2.document.Table.getItem(Table.java:608)
>   at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.getConsistentItem(DynamoDBMetadataStore.java:423)
>   at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.innerGet(DynamoDBMetadataStore.java:459)
>   at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.lambda$get$2(DynamoDBMetadataStore.java:439)
>   at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109)
>   ... 29 more



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
