Posted to users@trafficserver.apache.org by "Walsh, Peter" <Pe...@disney.com> on 2012/08/07 22:34:50 UTC

SSL Error on RHEL 5, wrong cipher returned

Hello all,
We recently experienced an issue in which our ATS instances got into a bad state and requests to origin servers over HTTPS began failing. The traffic.out log file has many SSL errors reporting "wrong cipher returned" (see below). Restarting Traffic Server resolved the issue. We have only seen this a few times and are unable to reproduce it ourselves.

Has anyone experienced this?

While researching this, I found several mentions of thread-safety issues in OpenSSL that could lead to this type of error. However, we have been unable to pinpoint an OpenSSL patch that gives us a high degree of confidence that upgrading would fix it. Since the problem is rare and we can't reproduce it, we have no way to verify that the bug is gone.

traffic.out Log Snippet:
[Aug  6 14:38:02.261] Server {1103939904} ERROR: SSL::9:error:14092105:SSL routines:SSL3_GET_SERVER_HELLO:wrong cipher returned:s3_clnt.c:744:
[Aug  6 14:38:02.263] Server {1103939904} ERROR: SSL ERROR: sslClientHandShakeEvent.

Error.log snippet (with our IPs, hosts and paths removed):
20120806.13h07m22s CONNECT:[1] could not connect [CONNECTION_ERROR] to <insert IP here> for 'https://<insert host and path>'
20120806.13h07m22s CONNECT:[2] could not connect [CONNECTION_ERROR] to <insert IP here> for 'https://<insert host and path>'
20120806.13h07m22s RESPONSE: sent 0.0.0.0 status 502 (Connect Error <Success/0>) for 'https://<insert host and path>'



Re: SSL Error on RHEL 5, wrong cipher returned

Posted by "Walsh, Peter" <Pe...@disney.com>.
Hi Leif,
We upgraded from OpenSSL 0.9.8e-fips-rhel5 (01 Jul 2008) to openssl.x86_64 0.9.8e-22.el5_8.4.

From: Leif Hedstrom <zw...@apache.org>
Date: Tue, 14 Aug 2012 00:29:57 -0700
To: users@trafficserver.apache.org
Cc: Peter Walsh <pe...@disney.com>
Subject: Re: SSL Error on RHEL 5, wrong cipher returned

On 8/13/12 8:41 AM, Walsh, Peter wrote:
In case anyone else experiences this on RHEL 5, we updated our OpenSSL library to the latest and so far haven’t seen this issue again.

Interesting. Do you know which version(s) of OpenSSL had this problem?

-- Leif


Re: SSL Error on RHEL 5, wrong cipher returned

Posted by Leif Hedstrom <zw...@apache.org>.
On 8/13/12 8:41 AM, Walsh, Peter wrote:
>
> In case anyone else experiences this on RHEL 5, we updated our OpenSSL 
> library to the latest and so far haven't seen this issue again.
>

Interesting. Do you know which version(s) of OpenSSL had this problem?

-- Leif


RE: SSL Error on RHEL 5, wrong cipher returned

Posted by "Walsh, Peter" <Pe...@disney.com>.
In case anyone else experiences this on RHEL 5, we updated our OpenSSL library to the latest and so far haven't seen this issue again.

Pete Walsh
Software Engineer
206-664-4150




Re: SSL Error on RHEL 5, wrong cipher returned

Posted by "Owens, Steve" <St...@disney.com>.
Looking at the log entry:
[Aug  6 14:38:02.261] Server {1103939904} ERROR: SSL::9:error:14092105:SSL routines:SSL3_GET_SERVER_HELLO:wrong cipher returned:s3_clnt.c:744:

we can see the following in the file it names (s3_clnt.c):

693     if (i < 0)
694         {
695         /* we did not say we would use this cipher */
696         al=SSL_AD_ILLEGAL_PARAMETER;
697         SSLerr(SSL_F_SSL3_GET_SERVER_HELLO,SSL_R_WRONG_CIPHER_RETURNED);
698         goto f_err;
699         }

744
745     return(1);
746 f_err:
747     ssl3_send_alert(s,SSL3_AL_FATAL,al);
748 err:
749     return(-1);
750     }
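
As an aside, the numeric code 14092105 in that log entry can be unpacked by hand. The sketch below assumes the packing used by the ERR_PACK macro in OpenSSL 0.9.8-era headers (8 bits of library, 12 bits of function, 12 bits of reason). The err_fields struct and the unpack_err() helper are hypothetical names for illustration, not OpenSSL API.

```c
#include <assert.h>

/* Hypothetical container for the three fields of a packed error code. */
typedef struct {
    unsigned lib;     /* which library raised the error */
    unsigned func;    /* which function inside that library */
    unsigned reason;  /* why it failed */
} err_fields;

/* Split a packed code, assuming the 8/12/12-bit layout described above. */
static err_fields unpack_err(unsigned long code)
{
    err_fields f;

    assert(code <= 0xffffffffUL);  /* the layout only covers 32 bits */
    f.lib    = (unsigned)((code >> 24) & 0xffUL);
    f.func   = (unsigned)((code >> 12) & 0xfffUL);
    f.reason = (unsigned)(code & 0xfffUL);
    return f;
}
```

Calling unpack_err(0x14092105UL) yields library 20, function 146 and reason 261 (0x14, 0x092, 0x105), which the log line itself spells out as "SSL routines", SSL3_GET_SERVER_HELLO and "wrong cipher returned".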

Searching for SSL_R_WRONG_CIPHER_RETURNED, we get this openssl-dev post, quoted below:

http://marc.info/?l=openssl-dev&m=122789102030356


Wow. This may explain a once-in-a-million anomaly going on here.
Hmmmmm... Checked this and yes, same story in 0.9.9 HEAD.

Note that this is a particularly wicked insect, because, as it is the
stack/stack.c internal_find() function which exhibits the _side effect_
of sorting a yet unsorted stack, _anyone_ who employs a sk_xyz_find()
finds themselves with a piece of non-threadsafe code. (You got /my/
attention now!)


There are two ways about this:

a) either forgo the side effect and resort to a linear (slow)
search/scan when the stack is unsorted (which can happen, for
instance, after an element has been inserted (sk_*_insert())), so that
sk_find() will be threadsafe (assuming no parallel insert()/push()
calls) or

b) provide a threadsafe environment by surrounding at least each sort
with a lock. (I can see why lock-encapsulating the find() would be
bothersome, as it would make threads wait for each other, where they
wouldn't really have to in 99% of cases, so the brute-force approach
of lock-covering the find() shouldn't be contemplated.)
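
Option (a) could be sketched roughly as follows. This is not OpenSSL code: ro_stack and ro_stack_find() are hypothetical stand-ins for a stack of ints, and the point is only that the lookup never writes to the container, so concurrent readers cannot disturb each other (still assuming nobody inserts or pushes in parallel).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical read-only view of a stack of ints. */
typedef struct {
    const int *items;
    size_t count;
    bool sorted;  /* true only if items is known to be in ascending order */
} ro_stack;

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Return the index of key, or -1 if it is absent. Never sorts as a side
 * effect: an unsorted stack is scanned linearly instead. */
static int ro_stack_find(const ro_stack *st, int key)
{
    assert(st != NULL);

    if (st->sorted) {
        /* Fast path: binary search is valid because the data is sorted. */
        const int *hit = bsearch(&key, st->items, st->count,
                                 sizeof(int), cmp_int);
        return hit ? (int)(hit - st->items) : -1;
    }

    /* Slow path: linear scan, leaving the element order untouched. */
    for (size_t i = 0; i < st->count; i++) {
        if (st->items[i] == key)
            return (int)i;
    }
    return -1;
}
```

The cost is an O(n) lookup whenever the stack has not been sorted yet, which is the "slow" part the post mentions.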


Which leads to a code inspection:
This finds us one working example of a solution of type (b):

crypto/asn1/x_crl.c, function def_crl_lookup(): here locking surrounds
the sort() operation in the flow

if (!..._is_sorted())
{
  lock();
  sort();
  unlock();
}
find();

and thus _is_ thread safe (my initial kneejerk reaction was: it is
not, but then I looked again...): since the find() won't ever be
reached if the stack is unsorted, the lock surrounding the sort() is
good enough -- assuming no-one will sk_*_insert() / sk_*_push() items
into the stack in another thread. Even a non-atomic read in
_is_sorted() is no problem, as it can only conclude the stack is /not/
sorted a little too often, walking the code straight into the
lock+sort, resulting in the stack getting sorted more often than
necessary in this fringe, but the sort() is repeatable and when your
RTL provides a high quality qsort() which checks for partially/fully
sorted input (and several implementations do so), this is a minor
overhead at the start, while allowing full parallelism of _find()
later on.
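
The check / lock / sort / unlock / find flow described above might look like this with POSIX threads and C11 atomics. It is a sketch under assumptions, not the x_crl.c code: locked_stack and locked_stack_find() are hypothetical names, and unlike the pseudocode it re-checks the sorted flag after taking the lock, so a thread that lost the race does not sort again while others are already searching.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a thread-shared stack of ints. */
typedef struct {
    int *items;
    size_t count;
    atomic_bool sorted;    /* published with release order after a sort */
    pthread_mutex_t lock;  /* guards the sort only, never the find */
} locked_stack;

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Return the index of key, or -1 if it is absent. Intended for concurrent
 * callers, provided nobody inserts or pushes while lookups are running. */
static int locked_stack_find(locked_stack *st, int key)
{
    assert(st != NULL);

    if (!atomic_load_explicit(&st->sorted, memory_order_acquire)) {
        pthread_mutex_lock(&st->lock);
        /* Re-check under the lock: another thread may have sorted already. */
        if (!atomic_load_explicit(&st->sorted, memory_order_relaxed)) {
            qsort(st->items, st->count, sizeof(int), cmp_int);
            atomic_store_explicit(&st->sorted, true, memory_order_release);
        }
        pthread_mutex_unlock(&st->lock);
    }

    /* Reached only once the data is sorted, so no lock is needed here. */
    const int *hit = bsearch(&key, st->items, st->count,
                             sizeof(int), cmp_int);
    return hit ? (int)(hit - st->items) : -1;
}
```

After the first lookup has sorted the data, later callers take only the atomic load and the binary search, which is the "full parallelism" the post is after.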

Also, note that sk_*_insert()/push() are thread unsafe (hm, no .pod
for this stuff yet; I should write one then...) and should not be used
on a thread-shared stack while multiple threads may access it, as
_insert() will surely corrupt any parallel running find() - as the
latter doesn't come with locks. Worse, insert/push() will make the
stack object not-thread-safe until it is followed by a completed
sk_sort().


