Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/04/01 03:26:04 UTC

[GitHub] [druid] splunk-tschetter opened a new issue #11059: S3Utils can only upload files with max size of 5GB

splunk-tschetter opened a new issue #11059:
URL: https://github.com/apache/druid/issues/11059


   ### Affected Version
   
   0.18 and main branch (https://github.com/apache/druid/blob/8296123d895db7d06bc4517db5e767afb7862b83/extensions-core/s3-extensions/src/main/java/org/apache/druid/storage/s3/S3Utils.java#L272)
   
   ### Description
   
   If S3Utils.uploadFileIfPossible is used to push a file larger than 5GB, it will hit the S3 object-size limit. A single PutObject request can handle at most 5GB; a multipart upload must be used for larger files.
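A minimal sketch of the decision this implies (the class, method, and constant names here are illustrative, not Druid's actual API; the real fix would live in `S3Utils.uploadFileIfPossible`):

```java
// Sketch: decide between a single PutObject and a multipart upload
// based on file size. MAX_SINGLE_PUT_BYTES mirrors S3's documented
// per-PutObject ceiling.
public class S3UploadStrategy {
    static final long MAX_SINGLE_PUT_BYTES = 5L * 1024 * 1024 * 1024; // 5GB

    static boolean needsMultipart(long fileSizeBytes) {
        return fileSizeBytes > MAX_SINGLE_PUT_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(needsMultipart(4L * 1024 * 1024 * 1024)); // false
        System.out.println(needsMultipart(6L * 1024 * 1024 * 1024)); // true
    }
}
```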
   
   We ran into this when a task log being uploaded exceeded the size limit. I don't know the exact size of the file, but it failed with:
   
   ```
   java.lang.RuntimeException: com.amazonaws.services.s3.model.AmazonS3Exception: Your proposed upload exceeds the maximum allowed size (Service: Amazon S3; Status Code: 400; Error Code: EntityTooLarge; Request ID: XXXX; S3 Extended Request ID: XXXX; Proxy: null), S3 Extended Request ID: XXXX
   	at org.apache.druid.storage.s3.S3TaskLogs.pushTaskFile(S3TaskLogs.java:145) ~[?:?]
   	at org.apache.druid.storage.s3.S3TaskLogs.pushTaskLog(S3TaskLogs.java:122) ~[?:?]
   	at org.apache.druid.indexing.overlord.ForkingTaskRunner$1.call(ForkingTaskRunner.java:369) [druid-indexing-service-0.18.0]
   	at org.apache.druid.indexing.overlord.ForkingTaskRunner$1.call(ForkingTaskRunner.java:132) [druid-indexing-service-0.18.0]
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_181]
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
   	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
   Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Your proposed upload exceeds the maximum allowed size (Service: Amazon S3; Status Code: 400; Error Code: EntityTooLarge; Request ID: XXXX; S3 Extended Request ID: XXXX; Proxy: null)
   	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1811) ~[aws-java-sdk-core-1.11.837.jar:?]
   	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1395) ~[aws-java-sdk-core-1.11.837.jar:?]
   	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1371) ~[aws-java-sdk-core-1.11.837.jar:?]
   	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145) ~[aws-java-sdk-core-1.11.837.jar:?]
   	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802) ~[aws-java-sdk-core-1.11.837.jar:?]
   	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770) ~[aws-java-sdk-core-1.11.837.jar:?]
   	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744) ~[aws-java-sdk-core-1.11.837.jar:?]
   	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704) ~[aws-java-sdk-core-1.11.837.jar:?]
   	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686) ~[aws-java-sdk-core-1.11.837.jar:?]
   	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550) ~[aws-java-sdk-core-1.11.837.jar:?]
   	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530) ~[aws-java-sdk-core-1.11.837.jar:?]
   	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5062) ~[aws-java-sdk-s3-1.11.837.jar:?]
   	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5008) ~[aws-java-sdk-s3-1.11.837.jar:?]
   	at com.amazonaws.services.s3.AmazonS3Client.access$300(AmazonS3Client.java:394) ~[aws-java-sdk-s3-1.11.837.jar:?]
   	at com.amazonaws.services.s3.AmazonS3Client$PutObjectStrategy.invokeServiceCall(AmazonS3Client.java:5950) ~[aws-java-sdk-s3-1.11.837.jar:?]
   	at com.amazonaws.services.s3.AmazonS3Client.uploadObject(AmazonS3Client.java:1812) ~[aws-java-sdk-s3-1.11.837.jar:?]
   	at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1772) ~[aws-java-sdk-s3-1.11.837.jar:?]
   	at org.apache.druid.storage.s3.ServerSideEncryptingAmazonS3.putObject(ServerSideEncryptingAmazonS3.java:110) ~[?:?]
   	at org.apache.druid.storage.s3.S3Utils.uploadFileIfPossible(S3Utils.java:225) ~[?:?]
   	at org.apache.druid.storage.s3.S3TaskLogs.lambda$pushTaskFile$0(S3TaskLogs.java:138) ~[?:?]
   	at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:87) ~[druid-core-0.18.0]
   	at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:115) ~[druid-core-0.18.0]
   	at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:105) ~[druid-core-0.18.0]
   	at org.apache.druid.storage.s3.S3Utils.retryS3Operation(S3Utils.java:87) ~[?:?]
   	at org.apache.druid.storage.s3.S3TaskLogs.pushTaskFile(S3TaskLogs.java:136) ~[?:?]
   	... 7 more
   ```
   
   This Stack Overflow question suggests the limit is 5GB, which matches what we hit: https://stackoverflow.com/questions/26319815/entitytoolarge-error-when-uploading-a-5g-file-to-amazon-s3
   
   This AWS documentation agrees that a single PutObject maxes out at 5GB: https://docs.aws.amazon.com/AmazonS3/latest/userguide/upload-objects.html
   
   It also points to this documentation for how to do a multipart upload:
   https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html
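For context on what a multipart fix entails, per the AWS docs: S3 allows up to 10,000 parts per upload, each part at least 5MB except the last, so the part size has to be chosen to fit the file. (The Java SDK's `com.amazonaws.services.s3.transfer.TransferManager` handles this splitting automatically and may be the simplest fix.) A hypothetical sketch of the arithmetic, not Druid code:

```java
// Hypothetical sketch of the part-size arithmetic behind a multipart upload.
public class MultipartMath {
    static final long MIN_PART_BYTES = 5L * 1024 * 1024; // 5MB minimum part size (except the last part)
    static final int MAX_PARTS = 10_000;                 // S3's per-upload part-count ceiling

    // Parts needed for a file at a given part size; the last part may be smaller.
    static long partCount(long fileSizeBytes, long partSizeBytes) {
        return (fileSizeBytes + partSizeBytes - 1) / partSizeBytes;
    }

    // True if this part size is usable for this file under S3's limits.
    static boolean partSizeFits(long fileSizeBytes, long partSizeBytes) {
        return partSizeBytes >= MIN_PART_BYTES
            && partCount(fileSizeBytes, partSizeBytes) <= MAX_PARTS;
    }

    public static void main(String[] args) {
        long sixGiB = 6L * 1024 * 1024 * 1024;
        long partSize = 64L * 1024 * 1024; // 64MB parts, an arbitrary illustrative choice
        System.out.println(partCount(sixGiB, partSize));    // 96
        System.out.println(partSizeFits(sixGiB, partSize)); // true
    }
}
```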

