You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@trafficserver.apache.org by "Sudheer Vinukonda (JIRA)" <ji...@apache.org> on 2014/09/20 01:21:33 UTC

[jira] [Comment Edited] (TS-3085) Large POSTs over (relatively) slower connections failing in ats5

    [ https://issues.apache.org/jira/browse/TS-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141522#comment-14141522 ] 

Sudheer Vinukonda edited comment on TS-3085 at 9/19/14 11:21 PM:
-----------------------------------------------------------------

[~jpeach@apache.org] : Sorry, I couldn't get all of your suggestions incorporated in the first pass. For example, if I restrict calling SSL_get_error() only when return value of SSL API call is < 0, it gets stuck in a loop. 

Haven't debugged it much, but, it seems like even OpenSSL code also does something like what ats is doing (pls see below) - https://github.com/luvit/openssl/blob/master/openssl/ssl/bio_ssl.c. So, I am wondering if we can leave it that way. The other suggestion you had about moving the SSL_Write wrapper to SSLUtils.cc is easy enough to do..if you would like, I can update the patch with that, or do that in a separate jira with perhaps some other cleanup.

{code}
static int ssl_read(BIO *b, char *out, int outl)
	{
//......
/*	if (ret > 0) */
	ret=SSL_read(ssl,out,outl);

	switch (SSL_get_error(ssl,ret))
		{
	case SSL_ERROR_NONE:
//...
{code}


was (Author: sudheerv):
[~jpeach@apache.org] : Sorry, I couldn't get all of your suggestions incorporated in the first pass. For example, if I restrict calling SSL_get_error() only when return value of SSL API call is < 0, it gets stuck in a loop. 

Haven't debugged it much, but, it seems like even OpenSSL code also does something like that (pls see below) - https://github.com/luvit/openssl/blob/master/openssl/ssl/bio_ssl.c. So, I am wondering if we can leave it that way. The other suggestion you had about moving the SSL_Write wrapper to SSLUtils.cc is easy enough to do..if you would like, I can update the patch with that, or do that in a separate jira with perhaps some other cleanup.

{code}
static int ssl_read(BIO *b, char *out, int outl)
	{
//......
/*	if (ret > 0) */
	ret=SSL_read(ssl,out,outl);

	switch (SSL_get_error(ssl,ret))
		{
	case SSL_ERROR_NONE:
//...
{code}

> Large POSTs over (relatively) slower connections failing in ats5
> ----------------------------------------------------------------
>
>                 Key: TS-3085
>                 URL: https://issues.apache.org/jira/browse/TS-3085
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: SSL
>    Affects Versions: 5.0.1
>            Reporter: Sudheer Vinukonda
>            Assignee: Sudheer Vinukonda
>              Labels: yahoo
>             Fix For: 5.2.0
>
>         Attachments: TS-3085.diff
>
>
> We ran into a production issue where large POSTs (30MB or high) are failing over slower connection speeds after ats5 roll out (the problem could be easily reproduced using a charles proxy with throttling enabled). 
> Further debugging isolated the issue to uploads over SSL connections and after a lot of debugging the issue appears to be the below:
> ATS calls SSL_read() followed by SSL_get_error() to check if there was any error in the read. This is repeated until either the complete data is read or an error occurs. However, from the openssl documentation, it is recommended to call ERR_clear_error() prior to calling SSL_read() + SSL_get_error() to ensure the error queue is clean of any leftover/garbage errors.  It's not clear what might be corrupting the error queue of the SSL context in a tight loop - possibly, some new feature in ats5. In any case, calling ERR_clear_error() is a good idea and adding this seems to resolve the post failures.
> Documentation from openSSL and some related notes on stackoverflow:
> https://www.openssl.org/docs/ssl/SSL_get_error.html
> http://stackoverflow.com/questions/18179128/how-to-manage-the-error-queue-in-openssl-ssl-get-error-and-err-get-error
> {code}
> "SSL_get_error() returns a result code (suitable for the C ``switch''
> statement) for a preceding call to SSL_connect(), SSL_accept(),
> SSL_do_handshake(), SSL_read(), SSL_peek(), or SSL_write() on ssl. The value
> returned by that TLS/SSL I/O function must be passed to SSL_get_error() in
> parameter ret.
> In addition to ssl and ret, SSL_get_error() inspects the current thread's
> OpenSSL error queue. Thus, SSL_get_error() must be used in the same thread that
> performed the TLS/SSL I/O operation, and no other OpenSSL function calls should
> appear in between. The current thread's error queue must be empty before the
> TLS/SSL I/O operation is attempted, or SSL_get_error() will not work reliably."
> "SSL_get_error does not call ERR_get_error. So if you just call SSL_get_error,
> the error stays in the queue.
> You should be calling ERR_clear_error prior to ANY SSL-call(SSL_read, SSL_write
> etc) that is followed by SSL_get_error, otherwise you may be reading an old
> error that occurred previously in the current thread."
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)