You are viewing a plain text version of this content. The canonical link for it is here.

Posted to bugs@httpd.apache.org by bu...@apache.org on 2017/03/31 01:51:08 UTC

[Bug 60947] New: Segfault on startup when using mod_ssl with APR-crypto

https://bz.apache.org/bugzilla/show_bug.cgi?id=60947

            Bug ID: 60947
           Summary: Segfault on startup when using mod_ssl with APR-crypto
           Product: Apache httpd-2
           Version: 2.5-HEAD
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: mod_ssl
          Assignee: bugs@httpd.apache.org
          Reporter: jchampion@apache.org
  Target Milestone: ---

mod_ssl's use of CRYPTO_THREADID_set_callback() is unfortunately bugged if
another module has loaded OpenSSL, either via the apr_crypto routines or some
other pathway.

When mod_ssl loads, it calls CRYPTO_THREADID_set_callback() as part of setting
up the thread-safety routines in OpenSSL. On unload, it attempts to unset this
callback by calling the registration function with a NULL argument. This always
fails, and the function returns zero, because unfortunately this API is
write-once.

*If* mod_ssl is the only OpenSSL consumer, then libcrypto will be unloaded and
reloaded and the OpenSSL static initialization will reset the callback, masking
any issue. But if something else has loaded OpenSSL as well (say, by using the
apr_crypto_* API), that threadid callback will now be pointing at junk.

Even after this, there's still a way to "succeed" -- *if* mod_ssl is reloaded
into exactly the same position, everything will proceed as normal. But if not,
we'll segfault the next time we call into OpenSSL. Currently, our Ubuntu
autotest machine is running into this crash.

Things we might consider:
- don't set any callback at all, and rely on OpenSSL's default threadid
callback
- for platforms on which the default threadid callback is unsafe/nonfunctional,
revert to the deprecated CRYPTO_set_id_callback(), which is not write-once
- talk to OpenSSL upstream to figure out the "right way" to deal with multiple
(possibly competing) OpenSSL consumers in the same address space

See also:
- bug 54357, which similarly discusses an inability to reset a static callback
in OpenSSL
- postgresql dev conversation on this:
https://postgrespro.com/list/id/87zj6ht6ef.fsf@wulczer.org

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org

[Bug 60947] Segfault on startup when using mod_ssl with APR-crypto

Posted by bu...@apache.org.

https://bz.apache.org/bugzilla/show_bug.cgi?id=60947

--- Comment #5 from Jacob Champion <jc...@apache.org> ---
(In reply to Rainer Jung from comment #4)
> Probably not really helpful, but this problem somewhat reminds me of the
> older
> 
> https://bz.apache.org/bugzilla/show_bug.cgi?id=54357
Yep, and just like with that bug, there's no way to unregister the callback
once it's been set. The only winning move is not to play...

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org

[Bug 60947] Segfault on startup when using mod_ssl with APR-crypto

Posted by bu...@apache.org.

https://bz.apache.org/bugzilla/show_bug.cgi?id=60947

--- Comment #7 from sta@cpanel.net ---
We've recently seen the same bug(crashing during graceful restart):
====
(gdb) bt
#0  0x00002b2fac30f340 in ?? ()
#1  0x00002b2fac8b74d4 in ERR_get_state () from
/opt/cpanel/ea-openssl/lib64/libcrypto.so.1.0.0
#2  0x00002b2fac8b793f in ERR_clear_error () from
/opt/cpanel/ea-openssl/lib64/libcrypto.so.1.0.0
#3  0x00002b2fac8a669e in ENGINE_load_builtin_engines () from
/opt/cpanel/ea-openssl/lib64/libcrypto.so.1.0.0
#4  0x00002b2facbf6fd5 in ssl_hook_pre_config (pconf=0x55ae04945138,
plog=<optimized out>, ptemp=<optimized out>) at mod_ssl.c:407
#5  0x000055ae03f4986e in ap_run_pre_config (pconf=pconf@entry=0x55ae04945138,
plog=0x55ae04972358, ptemp=0x55ae049c4608) at config.c:89
#6  0x000055ae03f2321c in main (argc=3, argv=0x7fff9928ab88) at main.c:775
(gdb) list ERR_get_state
1029     ERR_remove_thread_state(NULL);
1030    }
1031    #endif
1032
1033    ERR_STATE *ERR_get_state(void)
1034    {
1035     static ERR_STATE fallback;
1036     ERR_STATE *ret, tmp, *tmpp = NULL;
1037     int i;
1038     CRYPTO_THREADID tid;
(gdb) l
1039
1040     err_fns_check();
1041     CRYPTO_THREADID_current(&tid);
1042     CRYPTO_THREADID_cpy(&tmp.tid, &tid);
1043     ret = ERRFN(thread_get_item) (&tmp);
1044
1045     /* ret == the error state, if NULL, make a new one */
1046     if (ret == NULL) {
1047         ret = (ERR_STATE *)OPENSSL_malloc(sizeof(ERR_STATE));
1048         if (ret == NULL)
(gdb) l CRYPTO_THREADID_current
481     return threadid_callback;
482    }
483
484    void CRYPTO_THREADID_current(CRYPTO_THREADID *id)
485    {
486     if (threadid_callback) {
487         threadid_callback(id);
488         return;
489     }
490    #ifndef OPENSSL_NO_DEPRECATED
(gdb)
====

I saw a patch was committed to trunk, but it doesn't seem it ever made it into
the 2.4.x branch:
====
https://github.com/apache/httpd/commit/d7dad162b7294df358284134c06d0889076d98af#diff-6b684bff3a841b11b1a51e5ab3c79761
====

Before we include the change into our own distribution, is there any reason the
patch wasn't merged into 2.4.x and this case closed?

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org

[Bug 60947] Segfault on startup when using mod_ssl with APR-crypto

Posted by bu...@apache.org.

https://bz.apache.org/bugzilla/show_bug.cgi?id=60947

bz.apache@egrechishkina.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bz.apache@egrechishkina.com

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org

[Bug 60947] Segfault on startup when using mod_ssl with APR-crypto

Posted by bu...@apache.org.

https://bz.apache.org/bugzilla/show_bug.cgi?id=60947

Paul Spangler <pa...@ni.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |paul.spangler@ni.com

--- Comment #2 from Paul Spangler <pa...@ni.com> ---
This comment isn't related to the particular issue you've described with the
callback, but is related to the title of the bug. We've also seen a rare
segfault when mixing mod_ssl with APR-crypto, but in SSL_load_error_strings
(which is supposedly fixed in 1.1.0, see
https://rt.openssl.org/Ticket/Display.html?user=guest&pass=guest&id=2325):

1. SSL_load_error_strings sets up a static hash table in libeay containing
error strings, some of which point to memory in ssleay.
2. libapr loads apr_crypto_openssl, which bumps the refcount on libeay.
3. Server executes normally... until say the child process crashes or a
graceful restart.
4. All modules are unloaded, but APR drivers are not (i.e. apr_crypto_openssl
remains loaded).
5. During unload, ssleay gets unloaded but libeay remains loaded due to step 2.
6. Modules are loaded again.
7. ssleay may load into a different location than it was in step 1.
8. The hash table from step 1 still exists (in libeay), but now if it tries to
check one of the SSL error strings, it points to a now-invalid place in memory,
crashing.

Our research led to this (somewhat old) mailing list thread:
http://mailing.openssl.dev.narkive.com/syUDbGxq/memory-corruption-after-libssl-is-unloaded-from-memory
where one of the devs is "not sure it has ever been a feasible goal to make
OpenSSL DSO/DLLs
able to be unloaded (with the aim of subsequently loading)."

In the end, we decided it was safest to pin ssleay and libeay in memory for the
lifetime of the server via a module that preloads them and never unloads them.
That "solution" wouldn't fix this particular callback issue since it's related
to mod_ssl moving in memory, but I figured this might be worth sharing anyway.
Feel free to ignore if it's irrelevant :)

Maybe the "fix" here would also be to pin whatever the callback points to in
memory (if that would even be feasible) so it survives a mod_ssl reload?

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org

[Bug 60947] Segfault on startup when using mod_ssl with APR-crypto

Posted by bu...@apache.org.

https://bz.apache.org/bugzilla/show_bug.cgi?id=60947

--- Comment #4 from Rainer Jung <ra...@kippdata.de> ---
Probably not really helpful, but this problem somewhat reminds me of the older

https://bz.apache.org/bugzilla/show_bug.cgi?id=54357

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org

[Bug 60947] Segfault on startup when using mod_ssl with APR-crypto

Posted by bu...@apache.org.

https://bz.apache.org/bugzilla/show_bug.cgi?id=60947

--- Comment #6 from Jacob Champion <jc...@apache.org> ---
First attempt at a fix checked into trunk at r1791849.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org

[Bug 60947] Segfault on startup when using mod_ssl with APR-crypto

Posted by bu...@apache.org.

https://bz.apache.org/bugzilla/show_bug.cgi?id=60947

--- Comment #1 from Jacob Champion <jc...@apache.org> ---
More research:

1) OpenSSL 1.1.0 no longer requires the use of CRYPTO_THREADID_set_callback()
at all, and in fact the CRYPTO_THREADID_* stuff is now a no-op. See rsalz's
post at https://www.openssl.org/blog/blog/2017/02/21/threads/ .

2) OpenSSL 1.0.2 (and possibly prior, but I haven't verified) contains default
implementations for Windows and BeOS. For all other platforms, its fallback
implementation uses the address of errno to tell threads apart.

3) Our current implementation of the threadid callback is possibly incorrect on
some platforms, since we're using CRYPTO_THREADID_set_numeric() and truncating
an apr_os_thread_t into the space of an unsigned long. This opens up the
possibility of collisions. As a concrete example: this makes our implementation
objectively worse than the default implementation on Windows, because we
truncate a HANDLE into a long, whereas the default uses GetCurrentThreadId()
directly.

There is a CRYPTO_THREADID_set_pointer() that allows us to pass in a void*, and
if it turns out we have to set a threadid callback, I think we should prefer
the set_pointer() flavor if the native thread identifier is bigger than a long.

This is all making me feel like we should try to dump the callback entirely on
as many platforms as possible (i.e. those with an addressable, per-thread
errno), and decide what to do with the remaining platforms (if any) on a
case-by-case basis.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org

[Bug 60947] Segfault on startup when using mod_ssl with APR-crypto

Posted by bu...@apache.org.

https://bz.apache.org/bugzilla/show_bug.cgi?id=60947

Dimitry Andric <di...@unified-streaming.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dimitry@unified-streaming.c
                   |                            |om

--- Comment #8 from Dimitry Andric <di...@unified-streaming.com> ---
Another ping on this bug. It would be nice to have this officially backported
to 2.4.x, as Apache + mod_ssl now segfaults by default, on various Dockerized
platforms. In particular, running arm64 containers on x86_64 hardware, where
qemu-user-static is used.

Unfortunately r1791849 (and its other patches from the trunk-openssl-threadid
branch) don't seem to apply cleanly to the current state of the 2.4.x branch...

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org

[Bug 60947] Segfault on startup when using mod_ssl with APR-crypto

Posted by bu...@apache.org.

https://bz.apache.org/bugzilla/show_bug.cgi?id=60947

--- Comment #3 from Jacob Champion <jc...@apache.org> ---
(In reply to Paul Spangler from comment #2)
> This comment isn't related to the particular issue you've described with the
> callback, but is related to the title of the bug. We've also seen a rare
> segfault when mixing mod_ssl with APR-crypto, but in SSL_load_error_strings
> (which is supposedly fixed in 1.1.0, see
> https://rt.openssl.org/Ticket/Display.html?user=guest&pass=guest&id=2325):
Nice, thanks for the link! More evidence that perhaps the long-term fix for
these sorts of problems is just to move to 1.1.0, where they've done away with
a lot of the static global initialization stuff...

> Maybe the "fix" here would also be to pin whatever the callback points to in
> memory (if that would even be feasible) so it survives a mod_ssl reload?
Interesting idea. I suppose we could have the callback(s) live in a completely
separate DSO that never gets unloaded. Feels like the bang-for-buck might not
be there, if this is a problem that's just going to go away after everyone
finally updates to 1.1.0-or-later.

*If* we have to keep supporting stuff like 0.9.8 forever, though, it might be
worth a shot. Keep an eye on my recent dev thread [1]; maybe someone will have
some input on that front.

[1]
https://lists.apache.org/thread.html/15c735e7c513f150a534bc0be69c101106b4d64d076f2c22e2a0ad52@%3Cdev.httpd.apache.org%3E

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org