Posted to common-issues@hadoop.apache.org by "Sean Mackrory (JIRA)" <ji...@apache.org> on 2018/06/18 20:28:00 UTC

[jira] [Comment Edited] (HADOOP-15541) AWS SDK can mistake stream timeouts for EOF and throw SdkClientExceptions

    [ https://issues.apache.org/jira/browse/HADOOP-15541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16516306#comment-16516306 ] 

Sean Mackrory edited comment on HADOOP-15541 at 6/18/18 8:27 PM:
-----------------------------------------------------------------

{quote}Like you say, no real point in not aborting here.{quote}

Help me understand, though: when *do* we get a benefit from draining the stream instead of simply aborting?
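
My understanding of the supposed benefit, as a rough sketch - draining lets the pooled HTTP connection be reused for the next request, while abort() tears down the TCP connection outright. Hypothetical helper and threshold, not the actual S3AInputStream code:

{code:java}
import java.io.IOException;
import com.amazonaws.services.s3.model.S3ObjectInputStream;

final class DrainOrAbortSketch {
  // Hypothetical threshold - the real heuristic (if any) would live in
  // S3AInputStream, not here.
  private static final long SMALL_DRAIN_THRESHOLD = 16 * 1024;

  static void drainOrAbort(S3ObjectInputStream in, long remaining) throws IOException {
    if (remaining < SMALL_DRAIN_THRESHOLD) {
      // Cheap: discard the few leftover bytes so the pooled HTTP
      // connection stays reusable for the next request.
      while (in.read() >= 0) {
        // discard
      }
      in.close();
    } else {
      // Draining would cost more than reconnecting: give up on reuse
      // and tear down the TCP connection.
      in.abort();
    }
  }
}
{code}

If the drain cost usually exceeds the reconnect cost for our read patterns, that heuristic never pays off - hence the question.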

{quote}Happy for a patch, I don't think we can test this easily so not expecting any tests in the patch...{quote}

Yeah. This was (at the time anyway) happening pretty repeatedly with a particular workload - I'm hoping that keeps up so I can be fairly confident that the end result here is correct handling of timeouts.

Instead of the forceAbort option, any objection to simply aborting when we catch IOExceptions AND SdkClientExceptions? If the intent is to close the previous stream and open a new one, and draining the stream fails for any reason at all, I'd think we'd still want to force an abort and proceed, regardless of the option that led us to this point.
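
Concretely, something like this is what I have in mind - a sketch with simplified names, not a patch against the real S3AInputStream.closeStream:

{code:java}
import java.io.IOException;
import com.amazonaws.SdkClientException;
import com.amazonaws.services.s3.model.S3ObjectInputStream;

final class CloseStreamSketch {
  static void closeStream(S3ObjectInputStream wrappedStream, boolean forceAbort) {
    if (!forceAbort) {
      try {
        // Drain so the pooled HTTP connection can be reused.
        while (wrappedStream.read() >= 0) {
          // discard
        }
        wrappedStream.close();
        return;
      } catch (IOException | SdkClientException e) {
        // Draining failed - e.g. the -1 was a timeout rather than EOF
        // and the SDK's length check threw. Fall through and abort.
      }
    }
    // Either we were asked to abort, or draining failed: abort regardless.
    wrappedStream.abort();
  }
}
{code}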


was (Author: mackrorysd):
{quote}Like you say, no real point in not aborting here.{quote}

Help me understand, though: when *do* we get a benefit from draining the stream instead of simply aborting?

{quote}Happy for a patch, I don't think we can test this easily so not expecting any tests in the patch...{quote}

Yeah. This was (at the time anyway) happening pretty repeatedly with a particular workload - I'm hoping that keeps up so I can be fairly confident that the end result here is correct handling of timeouts.

> AWS SDK can mistake stream timeouts for EOF and throw SdkClientExceptions
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-15541
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15541
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 2.9.1, 2.8.4, 3.0.2, 3.1.1
>            Reporter: Sean Mackrory
>            Assignee: Sean Mackrory
>            Priority: Major
>
> I've gotten a few reports of read timeouts not being handled properly in some Impala workloads. What happens is the following sequence of events (credit to Sailesh Mukil for figuring this out):
>  * S3AInputStream.read() gets a SocketTimeoutException when it calls wrappedStream.read()
>  * This is handled by onReadFailure -> reopen -> closeStream. When we try to drain the stream, SdkFilterInputStream.read() in the AWS SDK fails because of checkLength. The underlying Apache Commons stream returns -1 both on a timeout and at EOF.
>  * The SDK takes the -1 to mean EOF, so it expects the bytes read to equal the expected content length; because they don't (it's a timeout, not an EOF), it throws an SdkClientException. (Sketched below.)
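>
> For illustration, the shape of the check that trips - a paraphrase of the SDK's length-checking stream, not the actual source:
> {code:java}
> import java.io.FilterInputStream;
> import java.io.IOException;
> import java.io.InputStream;
> import com.amazonaws.SdkClientException;
>
> // Paraphrase of the SDK's length check (simplified, not the real source).
> class LengthCheckSketch extends FilterInputStream {
>   private final long expectedLength;
>   private long bytesRead;
>
>   LengthCheckSketch(InputStream in, long expectedLength) {
>     super(in);
>     this.expectedLength = expectedLength;
>   }
>
>   @Override
>   public int read() throws IOException {
>     int b = super.read(); // the wrapped stream returns -1 on a timeout
>                           // as well as at real EOF
>     if (b >= 0) {
>       bytesRead++;
>     } else if (bytesRead != expectedLength) {
>       // -1 taken as EOF: the byte counts disagree, so this throws even
>       // though the stream merely timed out mid-read.
>       throw new SdkClientException("Data read has a different length than the expected");
>     }
>     return b;
>   }
> }
> {code}
> Wrapping a stub stream that returns a few bytes and then -1, with a larger expectedLength, reproduces the exception without any real S3 connection.
>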
> This is tricky to test without a ton of mocking of AWS SDK internals, because you have to get the SDK into the inconsistent state where it has read only a subset of the expected bytes and then gets a -1.
> closeStream will abort the stream in the event of an IOException when draining. We could simply also abort in the event of an SdkClientException. I'm testing that this results in correct behavior in the workloads that seem to hit these timeouts a lot; in the meantime, all the s3a tests continue to pass with that change. I'm going to open an issue on the AWS SDK GitHub as well, but short of a good way to distinguish a stream that has timed out from one that has read all of its data, I'm not sure what the ideal outcome would be without huge rewrites.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org