You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@tomcat.apache.org by Rainer Jung <ra...@kippdata.de> on 2018/11/25 09:42:19 UTC

Current problems with TLS 1.0 and NIO(2)+native+openssl 1.1.1

I observed that when building tcnative against OpenSSL 1.1.1 I ran into 
hangs when talking TLS 1.0 with Tomcat trunk using that tcnative plus 
Nio(2).

A simple "GET /" request eg. send with curl, hangs for 60 seconds after 
a successful TLS handshake, then the client ends with an "empty reply 
from server".

You can also reproduce with openssl s_client. The request will hang 
until you send another additional empty line (in addition to the usual 
empty line ending the request). The additional one will then trigger 
another read which will find the old request data and handle it.

The problem does not occur with the APR connector. APR and Nio(2) seem 
to use very different code paths in tcnative for TLS handling 
(sslnetwork.c versus ssl.c).

I have some understanding of the root cause but currently no good idea 
how to fix it. The root cause is incorrect handling of SSL_read when it 
returns "0". The OpenSSL man page has a relevant description at [1]. As 
observed also in mod_ssl (Apache web server), OpenSSL 1.1.1 behaves 
different than older version in that it can return "0", were old 
versions returned "-1". That was always documented as a possibility but 
in reality now really happens. The tcnative code used by APR handles 
this in the native part. The code used by Nio(2) simply returns the 
value it gets from SSL_read() and leaves it to the calling Java to 
handle that. netty, from which we borrowed the ideas for Java plus 
OpenSSL, does include such code in ReferenceCountedOpenSslEngine.java, 
especially the SSL_ERROR_WANT_READ and SSL_ERROR_WANT_WRITE handling.

I could have experimented with their approach, but for some reason there 
seems to be another problem that makes it harder. The relevant call to 
SSL_read() returns "0", but does not return WANT_READ or WANT_WRITE from 
a following SSL_get_error(), but instead "5", which is 
SSL_ERROR_SYSCALL. I do not have a good idea, where this comes from. 
When tracing system calls, it seems it comes from an EAGAIN in a socket 
read, but I am not sure about that.

In our Java code, what happens is a call to unwrap() in OpenSSLEngine. 
This call writes I think 146 bytes, then checks 
pendingReadableBytesInSSL(). That call in turn calls SSL.readFromSSL() 
and gets back "0" (from SSL_read()). Up in unwrap() we then skip the 
while loop and finally return with BUFFER_UNDERFLOW. Then we hang, 
probably because the data was read by OpenSSL and no more socket event 
happens. If I artificially add another call to 
pendingReadableBytesInSSL() which triggers another SSL_read(), the hang 
does not occur.

IMHO TLS 1.0 is not such a big problem, but we should at least document 
it when we do a new release.

I might drill down debugging into the native layer checking errno etc. 
but I am not sure I will find the time.

[1]: https://www.openssl.org/docs/man1.1.1/man3/SSL_read.html

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org

Re: Current problems with TLS 1.0 and NIO(2)+native+openssl 1.1.1

Posted by Christopher Schultz <ch...@christopherschultz.net>.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Rémy,

On 11/29/18 08:27, Rémy Maucherat wrote:
> On Sun, Nov 25, 2018 at 10:42 AM Rainer Jung
> <ra...@kippdata.de> wrote:
> 
>> In our Java code, what happens is a call to unwrap() in
>> OpenSSLEngine. This call writes I think 146 bytes, then checks 
>> pendingReadableBytesInSSL(). That call in turn calls
>> SSL.readFromSSL() and gets back "0" (from SSL_read()). Up in
>> unwrap() we then skip the while loop and finally return with
>> BUFFER_UNDERFLOW. Then we hang, probably because the data was
>> read by OpenSSL and no more socket event happens. If I
>> artificially add another call to pendingReadableBytesInSSL()
>> which triggers another SSL_read(), the hang does not occur.
>> 
>> IMHO TLS 1.0 is not such a big problem, but we should at least
>> document it when we do a new release.
>> 
>> Last time, Mark fixed pendingReadableBytesInSSL (=
> SSL.pendingReadableBytesInSSL) not working by using a fake read 
> (SSL.readFromSSL(ssl, EMPTY_ADDR, 0)) to start a new record. So
> then we actually need to make *two* fake reads (if the result of
> the first is zero) and we would be fine ? This OpenSSL API sounds
> ridiculously bad ... (IMO)
> 
> So you made it sound like it would work: Index:
> java/org/apache/tomcat/util/net/openssl/OpenSSLEngine.java 
> ===================================================================
>
> 
- --- java/org/apache/tomcat/util/net/openssl/OpenSSLEngine.java    (revis
ion
> 1847712) +++
> java/org/apache/tomcat/util/net/openssl/OpenSSLEngine.java
> (working copy) @@ -635,6 +635,9 @@ // See
> https://www.openssl.org/docs/manmaster/ssl/SSL_pending.html 
> clearLastError(); int lastPrimingReadResult = SSL.readFromSSL(ssl,
> EMPTY_ADDR, 0); // priming read +        if (lastPrimingReadResult
> == 0) { +            lastPrimingReadResult = SSL.readFromSSL(ssl,
> EMPTY_ADDR, 0); +        } // check if SSL_read returned <= 0. In
> this case we need to check the error and see if it was something //
> fatal. if (lastPrimingReadResult <= 0) {

Is anyone in contact with the OpenSSL folks?

I mean... we could bumble our way through this by ourselves, or we
could ask people who know everything about OpenSSL for some help...

- -chris
-----BEGIN PGP SIGNATURE-----
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlwAIBYACgkQHPApP6U8
pFgvHw/7BREGqoGDkhGuoWM5dHc4NbaLyDOFF+sFVVDvgE6Ll9oT+jTnotlRlrNy
7ZVspEW6On9VdPKhZYW1itmu1iCqLMnh1Akk2HzUZzKmJR/oMz/gshb76rASwg+u
IyZwDfsemoXVXXIgkT6iEA4yzA3je/L3ahug6JNHC8EGp/QIBBun8OQfwWY30PHY
NfQ5tWmBeJ8bBdb1iV9YPdfWNvBX6LBTjr+iBFJs3QUDgAkiSWLRRu9h8QpfN4XH
gb8SbOGM/YLXHvNoY6flONWzXVFX9mERq4a1JKQHxRJm6HOYy3IX0CZnx+ebhiN1
G1PNVnUPMo5P6hUbo1/W186zFQI5X8C1F/pUKR43fWYyXpaY7q58vHKrfeHEpkbT
SmAUBpAJhT3v9qL+ITW51eeLkIC2CmFz1Xmj8xf89Z5gT37nM9MehtOp23fiu3+A
TrgGxmEjst3NJ6gNSD8OJllYoUHk+tsZXEUo1KH5amDYCk1d0LVpJ8/xorwBuZhX
AXkMgleq0l0Qk5WFaFkhswZ+w6uJCqoEVMbWWfHhp+aFreO5ZJzOQWouDrrMvOTi
WfbJl44qwgBSBwAWKxrDwPbj3ifc0rIhEIZ72A1f1Vf3AllT2moFodZ+HBym3Tgu
bqn1zWjWwP57mtFgvSQS7VWSp1zbZ6xQb7YnSQbIZr47TEYFSG4=
=43IP
-----END PGP SIGNATURE-----

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org

Re: Current problems with TLS 1.0 and NIO(2)+native+openssl 1.1.1

Posted by Rainer Jung <ra...@kippdata.de>.

Am 29.11.2018 um 15:55 schrieb Mark Thomas:
> On 29/11/2018 13:27, Rémy Maucherat wrote:
>> On Sun, Nov 25, 2018 at 10:42 AM Rainer Jung <ra...@kippdata.de>
>> wrote:
>>
>>> In our Java code, what happens is a call to unwrap() in OpenSSLEngine.
>>> This call writes I think 146 bytes, then checks
>>> pendingReadableBytesInSSL(). That call in turn calls SSL.readFromSSL()
>>> and gets back "0" (from SSL_read()). Up in unwrap() we then skip the
>>> while loop and finally return with BUFFER_UNDERFLOW. Then we hang,
>>> probably because the data was read by OpenSSL and no more socket event
>>> happens. If I artificially add another call to
>>> pendingReadableBytesInSSL() which triggers another SSL_read(), the hang
>>> does not occur.
>>>
>>> IMHO TLS 1.0 is not such a big problem, but we should at least document
>>> it when we do a new release.
>>>
>> Last time, Mark fixed pendingReadableBytesInSSL (=
>> SSL.pendingReadableBytesInSSL) not working by using a fake read
>> (SSL.readFromSSL(ssl, EMPTY_ADDR, 0)) to start a new record. So then we
>> actually need to make *two* fake reads (if the result of the first is zero)
>> and we would be fine ? This OpenSSL API sounds ridiculously bad ... (IMO)
> 
> My research is leading me in that direction. So far, it looks like TLS
> 1.0 is sending two application data packets (I assume this is mitigation
> for BEAST) whereas TLS 1.1 and later only send 1. It appears those two
> packets each need a priming read. Not sure why. I'd expect to be able to
> read a single character after the first packet but the API indicates 0
> characters are available.
> 
> I have been trying to find some API to call that reliably indicates that
> another priming read is required but I haven't found it yet. I am
> currently looking at triggering a second priming read if:
> - TLS 1.0 is being used
> - a priming read returns 0
> - SSL_get_error returns 5
> - pendingReadableBytesInSSL returns 0
> 
> I also want to think about trying to reduce the number of times we call
> into native code. I'm not sure how practical this is and - since it only
> affects TLS 1.0 - I'm not overly concerned since folks should really be
> on TLS 1.2 / 1.3 by now.

I like the restriction of the workaround for TLS 1.0. My experiments 
seemed to indicate, that such an additional priming read might not 
directly help at the point in time where the normal read returns 0 byte, 
but it might help indirectly, because it is also done at some 
communication stage before and then the first read after processing no 
longer returns 0 but instead some bytes. Everything quite vague, I know.

Regards,

Rainer

>> So you made it sound like it would work:
>> Index: java/org/apache/tomcat/util/net/openssl/OpenSSLEngine.java
>> ===================================================================
>> --- java/org/apache/tomcat/util/net/openssl/OpenSSLEngine.java    (revision
>> 1847712)
>> +++ java/org/apache/tomcat/util/net/openssl/OpenSSLEngine.java    (working
>> copy)
>> @@ -635,6 +635,9 @@
>>           // See https://www.openssl.org/docs/manmaster/ssl/SSL_pending.html
>>           clearLastError();
>>           int lastPrimingReadResult = SSL.readFromSSL(ssl, EMPTY_ADDR, 0);
>> // priming read
>> +        if (lastPrimingReadResult == 0) {
>> +            lastPrimingReadResult = SSL.readFromSSL(ssl, EMPTY_ADDR, 0);
>> +        }
>>           // check if SSL_read returned <= 0. In this case we need to check
>> the error and see if it was something
>>           // fatal.
>>           if (lastPrimingReadResult <= 0) {
> 
> Yep. That is a much cleaner version of what I have currently working.
> 
> Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org

Re: Current problems with TLS 1.0 and NIO(2)+native+openssl 1.1.1

Posted by Mark Thomas <ma...@apache.org>.

On 29/11/2018 13:27, Rémy Maucherat wrote:
> On Sun, Nov 25, 2018 at 10:42 AM Rainer Jung <ra...@kippdata.de>
> wrote:
> 
>> In our Java code, what happens is a call to unwrap() in OpenSSLEngine.
>> This call writes I think 146 bytes, then checks
>> pendingReadableBytesInSSL(). That call in turn calls SSL.readFromSSL()
>> and gets back "0" (from SSL_read()). Up in unwrap() we then skip the
>> while loop and finally return with BUFFER_UNDERFLOW. Then we hang,
>> probably because the data was read by OpenSSL and no more socket event
>> happens. If I artificially add another call to
>> pendingReadableBytesInSSL() which triggers another SSL_read(), the hang
>> does not occur.
>>
>> IMHO TLS 1.0 is not such a big problem, but we should at least document
>> it when we do a new release.
>>
> Last time, Mark fixed pendingReadableBytesInSSL (=
> SSL.pendingReadableBytesInSSL) not working by using a fake read
> (SSL.readFromSSL(ssl, EMPTY_ADDR, 0)) to start a new record. So then we
> actually need to make *two* fake reads (if the result of the first is zero)
> and we would be fine ? This OpenSSL API sounds ridiculously bad ... (IMO)

My research is leading me in that direction. So far, it looks like TLS
1.0 is sending two application data packets (I assume this is mitigation
for BEAST) whereas TLS 1.1 and later only send 1. It appears those two
packets each need a priming read. Not sure why. I'd expect to be able to
read a single character after the first packet but the API indicates 0
characters are available.

I have been trying to find some API to call that reliably indicates that
another priming read is required but I haven't found it yet. I am
currently looking at triggering a second priming read if:
- TLS 1.0 is being used
- a priming read returns 0
- SSL_get_error returns 5
- pendingReadableBytesInSSL returns 0

I also want to think about trying to reduce the number of times we call
into native code. I'm not sure how practical this is and - since it only
affects TLS 1.0 - I'm not overly concerned since folks should really be
on TLS 1.2 / 1.3 by now.

> So you made it sound like it would work:
> Index: java/org/apache/tomcat/util/net/openssl/OpenSSLEngine.java
> ===================================================================
> --- java/org/apache/tomcat/util/net/openssl/OpenSSLEngine.java    (revision
> 1847712)
> +++ java/org/apache/tomcat/util/net/openssl/OpenSSLEngine.java    (working
> copy)
> @@ -635,6 +635,9 @@
>          // See https://www.openssl.org/docs/manmaster/ssl/SSL_pending.html
>          clearLastError();
>          int lastPrimingReadResult = SSL.readFromSSL(ssl, EMPTY_ADDR, 0);
> // priming read
> +        if (lastPrimingReadResult == 0) {
> +            lastPrimingReadResult = SSL.readFromSSL(ssl, EMPTY_ADDR, 0);
> +        }
>          // check if SSL_read returned <= 0. In this case we need to check
> the error and see if it was something
>          // fatal.
>          if (lastPrimingReadResult <= 0) {

Yep. That is a much cleaner version of what I have currently working.

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org

Re: Current problems with TLS 1.0 and NIO(2)+native+openssl 1.1.1

Posted by Rémy Maucherat <re...@apache.org>.

On Sun, Nov 25, 2018 at 10:42 AM Rainer Jung <ra...@kippdata.de>
wrote:

> In our Java code, what happens is a call to unwrap() in OpenSSLEngine.
> This call writes I think 146 bytes, then checks
> pendingReadableBytesInSSL(). That call in turn calls SSL.readFromSSL()
> and gets back "0" (from SSL_read()). Up in unwrap() we then skip the
> while loop and finally return with BUFFER_UNDERFLOW. Then we hang,
> probably because the data was read by OpenSSL and no more socket event
> happens. If I artificially add another call to
> pendingReadableBytesInSSL() which triggers another SSL_read(), the hang
> does not occur.
>
> IMHO TLS 1.0 is not such a big problem, but we should at least document
> it when we do a new release.
>
> Last time, Mark fixed pendingReadableBytesInSSL (=
SSL.pendingReadableBytesInSSL) not working by using a fake read
(SSL.readFromSSL(ssl, EMPTY_ADDR, 0)) to start a new record. So then we
actually need to make *two* fake reads (if the result of the first is zero)
and we would be fine ? This OpenSSL API sounds ridiculously bad ... (IMO)

So you made it sound like it would work:
Index: java/org/apache/tomcat/util/net/openssl/OpenSSLEngine.java
===================================================================
--- java/org/apache/tomcat/util/net/openssl/OpenSSLEngine.java    (revision
1847712)
+++ java/org/apache/tomcat/util/net/openssl/OpenSSLEngine.java    (working
copy)
@@ -635,6 +635,9 @@
         // See https://www.openssl.org/docs/manmaster/ssl/SSL_pending.html
         clearLastError();
         int lastPrimingReadResult = SSL.readFromSSL(ssl, EMPTY_ADDR, 0);
// priming read
+        if (lastPrimingReadResult == 0) {
+            lastPrimingReadResult = SSL.readFromSSL(ssl, EMPTY_ADDR, 0);
+        }
         // check if SSL_read returned <= 0. In this case we need to check
the error and see if it was something
         // fatal.
         if (lastPrimingReadResult <= 0) {

Rémy

Re: Current problems with TLS 1.0 and NIO(2)+native+openssl 1.1.1

Posted by Mark Thomas <ma...@apache.org>.

I've been looking at this in case we need a change in native before I
roll the 1.2.19 release.

On 25/11/2018 09:42, Rainer Jung wrote:
> I observed that when building tcnative against OpenSSL 1.1.1 I ran into
> hangs when talking TLS 1.0 with Tomcat trunk using that tcnative plus
> Nio(2).
> 
> A simple "GET /" request eg. send with curl, hangs for 60 seconds after
> a successful TLS handshake, then the client ends with an "empty reply
> from server".
> 
> You can also reproduce with openssl s_client. The request will hang
> until you send another additional empty line (in addition to the usual
> empty line ending the request). The additional one will then trigger
> another read which will find the old request data and handle it.

I also see this with openssl s_client

> The problem does not occur with the APR connector. APR and Nio(2) seem
> to use very different code paths in tcnative for TLS handling
> (sslnetwork.c versus ssl.c).
> 
> I have some understanding of the root cause but currently no good idea
> how to fix it. The root cause is incorrect handling of SSL_read when it
> returns "0". The OpenSSL man page has a relevant description at [1]. As
> observed also in mod_ssl (Apache web server), OpenSSL 1.1.1 behaves
> different than older version in that it can return "0", were old
> versions returned "-1". That was always documented as a possibility but
> in reality now really happens. The tcnative code used by APR handles
> this in the native part. The code used by Nio(2) simply returns the
> value it gets from SSL_read() and leaves it to the calling Java to
> handle that. netty, from which we borrowed the ideas for Java plus
> OpenSSL, does include such code in ReferenceCountedOpenSslEngine.java,
> especially the SSL_ERROR_WANT_READ and SSL_ERROR_WANT_WRITE handling.
> 
> I could have experimented with their approach, but for some reason there
> seems to be another problem that makes it harder. The relevant call to
> SSL_read() returns "0", but does not return WANT_READ or WANT_WRITE from
> a following SSL_get_error(), but instead "5", which is
> SSL_ERROR_SYSCALL. I do not have a good idea, where this comes from.
> When tracing system calls, it seems it comes from an EAGAIN in a socket
> read, but I am not sure about that.

I did not see this. All the error codes I saw were zero (which makes it
even harder to figure out a solution).

Which OS were you testing? Where exactly did you observe that EAGAIN error?

> In our Java code, what happens is a call to unwrap() in OpenSSLEngine.
> This call writes I think 146 bytes, then checks
> pendingReadableBytesInSSL(). That call in turn calls SSL.readFromSSL()
> and gets back "0" (from SSL_read()). Up in unwrap() we then skip the
> while loop and finally return with BUFFER_UNDERFLOW. Then we hang,
> probably because the data was read by OpenSSL and no more socket event
> happens. If I artificially add another call to
> pendingReadableBytesInSSL() which triggers another SSL_read(), the hang
> does not occur.

I have tried various ways to differentiate between "there is some data
there somewhere if you just keep trying" and "no, there really isn't any
data there" without success so far.

> IMHO TLS 1.0 is not such a big problem, but we should at least document
> it when we do a new release.
> 
> I might drill down debugging into the native layer checking errno etc.
> but I am not sure I will find the time.
> 
> [1]: https://www.openssl.org/docs/man1.1.1/man3/SSL_read.html

I'd like to spend a little more time looking at this before I tag the
release.

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org