Posted to common-issues@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2018/06/25 15:26:00 UTC

[jira] [Commented] (HADOOP-14070) S3a: Failed to reset the request input stream

    [ https://issues.apache.org/jira/browse/HADOOP-14070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16522437#comment-16522437 ] 

Steve Loughran commented on HADOOP-14070:
-----------------------------------------

Looking at this again, tracing code paths in the IDE:

* When disk is being used as the buffer for block upload, we pass in the File reference; the expectation is that the AWS SDK does mark/reset against it, reopening the file if needed. We explicitly moved to passing in the file reference for better recovery.
* When a byte array or byte buffer is used to buffer blocks, we pass in an associated stream which has no limits on where it can reset; it's just an offset into an array.
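To illustrate the second bullet, here's a standalone sketch (not S3A code) of why array-backed streams recover cleanly: reset is just moving an offset, so it works no matter how far past the mark we've read. A plain {{FileInputStream}}, by contrast, returns {{markSupported() == false}}, which is why the SDK has to do its own recovery for file sources.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

public class MarkResetDemo {
    public static void main(String[] args) throws Exception {
        // Array-backed stream: reset is just an offset move, so the
        // readlimit argument to mark() is effectively ignored.
        InputStream in = new ByteArrayInputStream(new byte[1024]);
        System.out.println(in.markSupported());     // true
        in.mark(1);                                  // tiny readlimit; irrelevant here
        in.read(new byte[512], 0, 512);              // read far past the "limit"
        in.reset();                                  // succeeds: back to offset 0
        System.out.println(in.read(new byte[512], 0, 512)); // 512
        // new FileInputStream(...).markSupported() would return false,
        // forcing any caller that needs replay to wrap or reopen the file.
    }
}
```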

The stack trace here doesn't show us what is being used in the S3A connector, just that the AWS client couldn't reset its buffering wrapper because it had gone too far. And why was that {{SdkBufferedInputStream}} being used at all? That's the interesting question.

# Tracing things through the IDE, it would only seem to happen if {{InputStream.markSupported() == false}}.
# The byte array and our own block array output streams do return true, and let you reset to anywhere.
# So either that wasn't picked up and they were wrapped, or this was a file source (it's the default, after all) and somehow that got wrapped by a buffer.
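A minimal sketch of how that failure mode reproduces, using the same {{java.io.BufferedInputStream}} machinery that {{SdkBufferedInputStream}} extends (this is a standalone approximation, not the SDK's actual wrapping code): hide {{markSupported()}} on a source, wrap it in a buffer, read past the readlimit, and reset throws exactly the "Resetting to invalid mark" seen in the trace.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ResetBeyondLimit {
    public static void main(String[] args) throws Exception {
        // Pretend the source cannot mark/reset, as a wrapped file source might;
        // a buffering layer is then the only way to get replay on retry.
        InputStream raw = new FilterInputStream(new ByteArrayInputStream(new byte[1 << 20])) {
            @Override public boolean markSupported() { return false; }
        };
        BufferedInputStream buffered = new BufferedInputStream(raw);
        buffered.mark(128 * 1024);                 // readlimit, cf. setReadLimit(int)
        buffered.read(new byte[256 * 1024], 0, 256 * 1024); // read past the limit...
        try {
            buffered.reset();                      // ...so the mark is invalid
        } catch (IOException e) {
            System.out.println(e.getMessage());    // Resetting to invalid mark
        }
    }
}
```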

[~mojodna] it's been a while, but do you remember if you explicitly set {{fs.s3a.fast.upload.buffer}} to buffer via byte array ("array") or byte buffer ("bytebuffer")? If not, something isn't right with what the SDK claims to do for wrapping file sources and handling failures there.
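For reference, the buffer mode is a core-site.xml option; "disk" is the default, with "array" (on-heap) and "bytebuffer" (off-heap) as the alternatives:

```xml
<!-- S3A block upload buffering: "disk" (default), "array", or "bytebuffer" -->
<property>
  <name>fs.s3a.fast.upload.buffer</name>
  <value>disk</value>
</property>
```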

W.r.t. actions: if it is in the SDK, that's the place for the fix. It's happening in the transfer manager threads and only manifests for us in the waitForUploads() code; there's no way we could wrap and fix this.


> S3a: Failed to reset the request input stream
> ---------------------------------------------
>
>                 Key: HADOOP-14070
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14070
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Seth Fitzsimmons
>            Priority: Major
>
> {code}
> Feb 07, 2017 8:05:46 AM com.google.common.util.concurrent.Futures$CombinedFuture setExceptionAndMaybeLog
> SEVERE: input future failed.
> com.amazonaws.ResetException: Failed to reset the request input stream; If the request involves an input stream, the maximum stream buffer size can be configured via request.getRequestClientOptions().setReadLimit(int)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.resetRequestInputStream(AmazonHttpClient.java:1221)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1042)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:948)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:661)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:635)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:618)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$300(AmazonHttpClient.java:586)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:573)
> at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:445)
> at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4041)
> at com.amazonaws.services.s3.AmazonS3Client.doUploadPart(AmazonS3Client.java:3041)
> at com.amazonaws.services.s3.AmazonS3Client.uploadPart(AmazonS3Client.java:3026)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.uploadPart(S3AFileSystem.java:1114)
> at org.apache.hadoop.fs.s3a.S3ABlockOutputStream$MultiPartUpload$1.call(S3ABlockOutputStream.java:501)
> at org.apache.hadoop.fs.s3a.S3ABlockOutputStream$MultiPartUpload$1.call(S3ABlockOutputStream.java:492)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.resetRequestInputStream(AmazonHttpClient.java:1219)
> at org.apache.hadoop.fs.s3a.SemaphoredDelegatingExecutor$CallableWithPermitRelease.call(SemaphoredDelegatingExecutor.java:222)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Resetting to invalid mark
> at java.io.BufferedInputStream.reset(BufferedInputStream.java:448)
> at com.amazonaws.internal.SdkBufferedInputStream.reset(SdkBufferedInputStream.java:106)
> at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:102)
> at com.amazonaws.event.ProgressInputStream.reset(ProgressInputStream.java:169)
> at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:102)
> at org.apache.hadoop.fs.s3a.SemaphoredDelegatingExecutor$CallableWithPermitRelease.call(SemaphoredDelegatingExecutor.java:222)
> ... 20 more
> 2017-02-07 08:05:46 WARN S3AInstrumentation:777 - Closing output stream statistics while data is still marked as pending upload in OutputStreamStatistics{blocksSubmitted=519, blocksInQueue=0, blocksActive=1, blockUploadsCompleted=518, blockUploadsFailed=2, bytesPendingUpload=82528300, bytesUploaded=54316236800, blocksAllocated=519, blocksReleased=519, blocksActivelyAllocated=0, exceptionsInMultipartFinalize=0, transferDuration=2637812 ms, queueDuration=839 ms, averageQueueTime=1 ms, totalUploadDuration=2638651 ms, effectiveBandwidth=2.05848506680118E7 bytes/s}
> Exception in thread "main" org.apache.hadoop.fs.s3a.AWSClientIOException: Multi-part upload with id 'uDonLgtsyeToSmhyZuNb7YrubCDiyXCCQy4mdVc5ZmYWPPHyZ3H3ZlFZzKktaPUiYb7uT4.oM.lcyoazHF7W8pK4xWmXV4RWmIYGYYhN6m25nWRrBEE9DcJHcgIhFD8xd7EKIjijEd1k4S5JY1HQvA--' to 2017/history-170130.orc on 2017/history-170130.orc: com.amazonaws.ResetException: Failed to reset the request input stream; If the request involves an input stream, the maximum stream buffer size can be configured via request.getRequestClientOptions().setReadLimit(int): Failed to reset the request input stream; If the request involves an input stream, the maximum stream buffer size can be configured via request.getRequestClientOptions().setReadLimit(int)
> at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:128)
> at org.apache.hadoop.fs.s3a.S3AUtils.extractException(S3AUtils.java:200)
> at org.apache.hadoop.fs.s3a.S3ABlockOutputStream$MultiPartUpload.waitForAllPartUploads(S3ABlockOutputStream.java:539)
> at org.apache.hadoop.fs.s3a.S3ABlockOutputStream$MultiPartUpload.access$100(S3ABlockOutputStream.java:456)
> at org.apache.hadoop.fs.s3a.S3ABlockOutputStream.close(S3ABlockOutputStream.java:351)
> at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
> at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
> at org.apache.orc.impl.PhysicalFsWriter.close(PhysicalFsWriter.java:221)
> at org.apache.orc.impl.WriterImpl.close(WriterImpl.java:2827)
> at net.mojodna.osm2orc.standalone.OsmPbf2Orc.convert(OsmPbf2Orc.java:296)
> at net.mojodna.osm2orc.Osm2Orc.main(Osm2Orc.java:47)
> Caused by: com.amazonaws.ResetException: Failed to reset the request input stream; If the request involves an input stream, the maximum stream buffer size can be configured via request.getRequestClientOptions().setReadLimit(int)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.resetRequestInputStream(AmazonHttpClient.java:1221)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1042)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:948)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:661)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:635)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:618)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$300(AmazonHttpClient.java:586)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:573)
> at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:445)
> at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4041)
> at com.amazonaws.services.s3.AmazonS3Client.doUploadPart(AmazonS3Client.java:3041)
> at com.amazonaws.services.s3.AmazonS3Client.uploadPart(AmazonS3Client.java:3026)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.uploadPart(S3AFileSystem.java:1114)
> at org.apache.hadoop.fs.s3a.S3ABlockOutputStream$MultiPartUpload$1.call(S3ABlockOutputStream.java:501)
> at org.apache.hadoop.fs.s3a.S3ABlockOutputStream$MultiPartUpload$1.call(S3ABlockOutputStream.java:492)
> at org.apache.hadoop.fs.s3a.SemaphoredDelegatingExecutor$CallableWithPermitRelease.call(SemaphoredDelegatingExecutor.java:222)
> at org.apache.hadoop.fs.s3a.SemaphoredDelegatingExecutor$CallableWithPermitRelease.call(SemaphoredDelegatingExecutor.java:222)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Resetting to invalid mark
> at java.io.BufferedInputStream.reset(BufferedInputStream.java:448)
> at com.amazonaws.internal.SdkBufferedInputStream.reset(SdkBufferedInputStream.java:106)
> at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:102)
> at com.amazonaws.event.ProgressInputStream.reset(ProgressInputStream.java:169)
> at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:102)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.resetRequestInputStream(AmazonHttpClient.java:1219)
> ... 20 more
> {code}
> Potentially relevant: https://github.com/aws/aws-sdk-java/issues/427



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org