Posted to issues@trafficserver.apache.org by "Sudheer Vinukonda (JIRA)" <ji...@apache.org> on 2014/09/18 19:35:35 UTC

[jira] [Commented] (TS-3085) Large POSTs over (relatively) slower connections failing in ats5

    [ https://issues.apache.org/jira/browse/TS-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139222#comment-14139222 ] 

Sudheer Vinukonda commented on TS-3085:
---------------------------------------

Per Leif's suggestion on a different JIRA, I've marked this for 5.2 and added a backport to 5.1.1. However, this is a blocker for our ats5 rollout, and anyone with use cases involving large POSTs may need to cherry-pick the fix.

> Large POSTs over (relatively) slower connections failing in ats5
> ----------------------------------------------------------------
>
>                 Key: TS-3085
>                 URL: https://issues.apache.org/jira/browse/TS-3085
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: SSL
>    Affects Versions: 5.0.1
>            Reporter: Sudheer Vinukonda
>            Assignee: Sudheer Vinukonda
>              Labels: yahoo
>             Fix For: 5.2.0
>
>
> We ran into a production issue where large POSTs (30MB or larger) fail over slower connections after the ats5 rollout (the problem can be easily reproduced using a Charles proxy with throttling enabled).
> Further debugging isolated the issue to uploads over SSL connections. After more investigation, the cause appears to be the following:
> ATS calls SSL_read() followed by SSL_get_error() to check whether the read hit an error. This is repeated until either the complete data is read or an error occurs. However, the OpenSSL documentation recommends calling ERR_clear_error() before SSL_read() + SSL_get_error() to ensure the error queue is free of leftover/garbage errors. It's not clear what might be leaving stale entries in the thread's OpenSSL error queue in a tight loop - possibly some new feature in ats5. In any case, calling ERR_clear_error() is a good idea, and adding it seems to resolve the POST failures.
> Documentation from OpenSSL and some related notes on Stack Overflow:
> https://www.openssl.org/docs/ssl/SSL_get_error.html
> http://stackoverflow.com/questions/18179128/how-to-manage-the-error-queue-in-openssl-ssl-get-error-and-err-get-error
> {code}
> "SSL_get_error() returns a result code (suitable for the C ``switch''
> statement) for a preceding call to SSL_connect(), SSL_accept(),
> SSL_do_handshake(), SSL_read(), SSL_peek(), or SSL_write() on ssl. The value
> returned by that TLS/SSL I/O function must be passed to SSL_get_error() in
> parameter ret.
> In addition to ssl and ret, SSL_get_error() inspects the current thread's
> OpenSSL error queue. Thus, SSL_get_error() must be used in the same thread that
> performed the TLS/SSL I/O operation, and no other OpenSSL function calls should
> appear in between. The current thread's error queue must be empty before the
> TLS/SSL I/O operation is attempted, or SSL_get_error() will not work reliably."
> "SSL_get_error does not call ERR_get_error. So if you just call SSL_get_error,
> the error stays in the queue.
> You should be calling ERR_clear_error prior to ANY SSL-call(SSL_read, SSL_write
> etc) that is followed by SSL_get_error, otherwise you may be reading an old
> error that occurred previously in the current thread."
> {code}


