Posted to dev@httpd.apache.org by Yann Ylavic <yl...@gmail.com> on 2017/06/30 10:18:59 UTC

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Hi Luca,

[better/easier to talk about details on dev@]

On Fri, Jun 30, 2017 at 11:05 AM,  <bu...@apache.org> wrote:
> https://bz.apache.org/bugzilla/show_bug.cgi?id=60956
>
> --- Comment #11 from Luca Toscano <to...@gmail.com> ---
> Two other interesting trunk improvements that have not been backported yet:
>
> http://svn.apache.org/viewvc?view=revision&revision=1706669
> http://svn.apache.org/viewvc?view=revision&revision=1734656
>
> IIUC these are meant to provide a more async behavior to most of the
> output filters, namely setting aside buckets (on the heap) to avoid blocking.

These are quite orthogonal, I think, and don't seem to fix this particular issue.

>
> After a bit of thinking it seems to me that we'd need to find a solution that
> prevents the mod_ssl_output filter to block, but in a safe way.

IMHO mod_ssl shouldn't (BIO_)flush unconditionally in
modssl_smart_shutdown(), only in the "abortive" mode of
ssl_filter_io_shutdown().

The caller of the ssl output filters should decide when to flush, so
http://svn.apache.org/r1651077 was not the right fix in this regard.

Even if we don't BIO_flush, openssl shouldn't retain the close-notify
by itself, so at least it should go down to the next/core output filter
(and stay there until the socket is writable or asked to flush).
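[Editor's sketch] To illustrate the intent, here is a toy model of the buffering idea: the close-notify bytes go down to the next filter, which sets them aside when the socket is not writable instead of flushing unconditionally. None of these names are the real httpd/mod_ssl API; this is purely an illustration.

```c
#include <assert.h>
#include <string.h>

/* Toy model: instead of BIO_flush()ing the TLS close-notify
 * unconditionally, the SSL filter passes it down and the next/core
 * output filter sets it aside until the socket is writable.
 * All names here are illustrative, not the real httpd API. */

#define PENDING_MAX 64

struct core_filter {
    char pending[PENDING_MAX]; /* bytes set aside, awaiting writability */
    int  npending;
    int  writable;             /* would a socket write succeed right now? */
};

/* Returns 0 if everything went out, 1 if the data was set aside
 * (the moral equivalent of APR_INCOMPLETE), -1 on toy-buffer overflow. */
int core_filter_write(struct core_filter *f, const char *data, int n)
{
    if (f->npending + n > PENDING_MAX)
        return -1;
    memcpy(f->pending + f->npending, data, n);
    f->npending += n;
    if (!f->writable)
        return 1;      /* buffered: don't block, retry when pollable */
    f->npending = 0;   /* pretend the kernel accepted everything */
    return 0;
}
```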

>
> In this particular case we assume this about start_lingering_close_nonblocking:
>
> """
> /*
>  * Close our side of the connection, NOT flushing data to the client.
>  * This should only be called if there has been an error or if we know
>  * that our send buffers are empty.
>  * Pre-condition: cs is not in any timeout queue and not in the pollset,
>  *                timeout_mutex is not locked
>  * return: 0 if connection is fully closed,
>  *         1 if connection is lingering
>  * may be called by listener thread
>  */
> """
>
> I tried the following patch:
>
> """
> Index: server/mpm/event/event.c
> ===================================================================
> --- server/mpm/event/event.c    (revision 1800362)
> +++ server/mpm/event/event.c    (working copy)
> @@ -744,10 +744,7 @@
>      conn_rec *c = cs->c;
>      apr_socket_t *csd = cs->pfd.desc.s;
>
> -    if (ap_prep_lingering_close(c)
> -        || c->aborted
> -        || ap_shutdown_conn(c, 0) != APR_SUCCESS || c->aborted
> -        || apr_socket_shutdown(csd, APR_SHUTDOWN_WRITE) != APR_SUCCESS) {
> +    if (ap_prep_lingering_close(c) || c->aborted) {
>          apr_socket_close(csd);
>          ap_push_pool(worker_queue_info, cs->p);
>          if (dying)
> """
>
> So the idea was to brutally close the connection only if
> ap_prep_lingering_close(c) is not 0 or if the client has already aborted, but
> to leave all the other cases to start_lingering_close_common. This is
> probably not enough/correct because the connection would go into the
> lingering_close queue, to be picked up again by
> process_timeout_queue(linger_q,..) after the timeout, which would call
> stop_lingering_close, which would in turn simply close the socket without
> giving mod_ssl the chance to flush its close-notify (because no
> ap_shutdown_conn would be called).

With a possibly non-blocking modssl_smart_shutdown(), I think we could
make ap_shutdown_conn(c, 0) return something like APR_INCOMPLETE for
the case where the shutdown was "buffered" in the output filter stack
(e.g. the core output filter).

In mpm_event, we would then go to (or stay in) the WRITE_COMPLETION
state instead of LINGER, until every remaining piece of data is flushed
successfully.

Does this sound OK?
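[Editor's sketch] The state decision proposed above could look like the following. The enum names are modeled on mpm_event's conn_state_e and APR's status codes, but this is an illustrative model, not the actual event.c code.

```c
#include <assert.h>

/* Illustrative model: a non-blocking shutdown returns an
 * APR_INCOMPLETE-like status when the close-notify was merely buffered
 * in the filter stack, and the listener keeps the connection in
 * WRITE_COMPLETION until it is fully flushed.  Not the real code. */

enum shutdown_rv { SHUTDOWN_DONE, SHUTDOWN_INCOMPLETE /* ~APR_INCOMPLETE */ };
enum conn_state  { STATE_WRITE_COMPLETION, STATE_LINGER };

enum conn_state next_state(enum shutdown_rv rv)
{
    /* Buffered shutdown data means there is still something to flush:
     * go to (or stay in) WRITE_COMPLETION; otherwise proceed to LINGER. */
    return rv == SHUTDOWN_INCOMPLETE ? STATE_WRITE_COMPLETION : STATE_LINGER;
}
```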

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Stefan Eissing <st...@greenbytes.de>.
> Am 30.06.2017 um 13:33 schrieb Yann Ylavic <yl...@gmail.com>:
> 
> On Fri, Jun 30, 2017 at 1:20 PM, Ruediger Pluem <rp...@apache.org> wrote:
>> 
>> On 06/30/2017 12:18 PM, Yann Ylavic wrote:
>>> 
>>> IMHO mod_ssl shouldn't (BIO_)flush unconditionally in
>>> modssl_smart_shutdown(), only in the "abortive" mode of
>>> ssl_filter_io_shutdown().
>> 
>> I think the issue starts before that.
>> ap_prep_lingering_close calls the pre_close_connection hook and modules that are registered
>> to this hook can perform all sorts of long-lasting blocking operations there.
>> While it can be argued that this would be a bug in the module I think the only safe way
>> is to have the whole start_lingering_close_nonblocking being executed by a worker thread.
> 
> Correct, that'd be much simpler/safer indeed.
> We need a new SHUTDOWN state then, right?

+1



Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Luca Toscano <to...@gmail.com>.
2017-07-17 9:33 GMT+02:00 Stefan Eissing <st...@greenbytes.de>:

>
> > Am 14.07.2017 um 21:52 schrieb Yann Ylavic <yl...@gmail.com>:
> >
> > On Fri, Jun 30, 2017 at 1:33 PM, Yann Ylavic <yl...@gmail.com> wrote:
> >> On Fri, Jun 30, 2017 at 1:20 PM, Ruediger Pluem <rp...@apache.org> wrote:
> >>>
> >>> On 06/30/2017 12:18 PM, Yann Ylavic wrote:
> >>>>
> >>>> IMHO mod_ssl shouldn't (BIO_)flush unconditionally in
> >>>> modssl_smart_shutdown(), only in the "abortive" mode of
> >>>> ssl_filter_io_shutdown().
> >>>
> >>> I think the issue starts before that.
> >>> ap_prep_lingering_close calls the pre_close_connection hook and modules that are registered
> >>> to this hook can perform all sorts of long-lasting blocking operations there.
> >>> While it can be argued that this would be a bug in the module I think the only safe way
> >>> is to have the whole start_lingering_close_nonblocking being executed by a worker thread.
> >>
> >> Correct, that'd be much simpler/safer indeed.
> >> We need a new SHUTDOWN state then, right?
> >
> > Actually it was less simple than expected, and it has some caveats obviously...
> >
> > The attached patch does not introduce a new state but reuses the
> > existing CONN_STATE_LINGER since it was not really considered by the
> > listener thread (which uses CONN_STATE_LINGER_NORMAL and
> > CONN_STATE_LINGER_SHORT instead), but that's a detail.
> >
> > Mainly, start_lingering_close_nonblocking() now simply schedules a
> > shutdown (i.e. pre_close_connection() followed by immediate close)
> > that will be run by a worker thread.
> > A new shutdown_linger_q is created/handled (with the same timeout as
> > the short_linger_q, namely 2 seconds) to hold connections to be
> > shut down.
> >
> > So now when a connection times out in the write_completion or
> > keepalive queues, it needs (i.e. the listener may wait for) an
> > available worker to process its shutdown/close.
> > This means we can *not* close kept-alive connections immediately like
> > before when becoming short of workers, which will favor active KA
> > connections over new ones in this case (I don't think it's that
> > serious, but the previous code was taking care of that; for me it's up
> > to the admin to size the workers appropriately...).
> >
> > Similarly, when a connection in the shutdown_linger_q itself times out,
> > the patch will require a worker immediately to do the job (see
> > shutdown_lingering_close() callback).
> >
> > So overall, this patch may introduce the need for more workers than
> > before; what was (wrongly) done by the listener thread has to be done
> > somewhere anyway...
> >
> > Finally, I think there is room for improvements like batching
> > shutdowns in the same worker, if there is no objection on the approach
> > so far.
> >
> > WDYT?
>
> I will test the patch, most likely today. A lot of +1s for the initiative!
>
>
Thanks a lot for this work Yann, really nice! I will try to test it as well
over the next few days. I was not convinced at first that triggering a
(potential) increase in worker usage was great for our users, but it is
definitely the most correct and safe thing to do. Even if we find a way to
fix the ssl-lingering-close-block issue we might encounter a similar one in
the future, so it is better IMHO to fix it at the source.

Luca
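[Editor's sketch] The listener-to-worker hand-off described in Yann's quoted summary can be modeled very roughly as follows. This is a single-threaded toy (no real locking or pollset); the name shutdown_q echoes the quoted description, everything else is illustrative and not the actual patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of the deferred-shutdown idea: the listener never runs
 * pre_close_connection() itself, it only enqueues the connection; a
 * worker later dequeues it and performs the (possibly blocking)
 * shutdown + close.  Illustrative names, not the actual patch. */

#define QSIZE 8

struct conn { int fd; int closed; };

struct shutdown_q {
    struct conn *items[QSIZE];
    int head, tail, count;
};

/* Listener side: O(1) and never blocks. */
int q_push(struct shutdown_q *q, struct conn *c)
{
    if (q->count == QSIZE)
        return -1; /* real code would instead wait for / wake a worker */
    q->items[q->tail] = c;
    q->tail = (q->tail + 1) % QSIZE;
    q->count++;
    return 0;
}

/* Worker side: pop one connection to shut down, NULL if none pending. */
struct conn *q_pop(struct shutdown_q *q)
{
    struct conn *c;
    if (q->count == 0)
        return NULL;
    c = q->items[q->head];
    q->head = (q->head + 1) % QSIZE;
    q->count--;
    return c;
}

/* Worker side: pre_close hooks and blocking flushes happen here,
 * safely off the listener thread. */
void worker_shutdown(struct conn *c)
{
    c->closed = 1;
}
```

The point of the model is that q_push is the only thing the listener ever does, so a slow pre_close hook can no longer stall the event loop.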

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Stefan Eissing <st...@greenbytes.de>.
Threw it into my test suite and works nicely. 

> Am 17.07.2017 um 14:02 schrieb Yann Ylavic <yl...@gmail.com>:
> 
> On Mon, Jul 17, 2017 at 9:33 AM, Stefan Eissing
> <st...@greenbytes.de> wrote:
>> 
>> I will test the patch, most likely today. A lot of +1s for the initiative!
> 
> Thanks Stefan, as I said the proposed patch currently reuses the
> existing CONN_STATE_LINGER state to shut down connections, but if it
> needs to be set from outside mpm_event (e.g. mod_h2 ;) we could add a
> new state...

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Yann Ylavic <yl...@gmail.com>.
On Mon, Jul 17, 2017 at 9:33 AM, Stefan Eissing
<st...@greenbytes.de> wrote:
>
> I will test the patch, most likely today. A lot of +1s for the initiative!

Thanks Stefan, as I said the proposed patch currently reuses the
existing CONN_STATE_LINGER state to shut down connections, but if it
needs to be set from outside mpm_event (e.g. mod_h2 ;) we could add a
new state...

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Stefan Eissing <st...@greenbytes.de>.
> Am 14.07.2017 um 21:52 schrieb Yann Ylavic <yl...@gmail.com>:
> 
> On Fri, Jun 30, 2017 at 1:33 PM, Yann Ylavic <yl...@gmail.com> wrote:
>> On Fri, Jun 30, 2017 at 1:20 PM, Ruediger Pluem <rp...@apache.org> wrote:
>>> 
>>> On 06/30/2017 12:18 PM, Yann Ylavic wrote:
>>>> 
>>>> IMHO mod_ssl shouldn't (BIO_)flush unconditionally in
>>>> modssl_smart_shutdown(), only in the "abortive" mode of
>>>> ssl_filter_io_shutdown().
>>> 
>>> I think the issue starts before that.
>>> ap_prep_lingering_close calls the pre_close_connection hook and modules that are registered
>>> to this hook can perform all sorts of long-lasting blocking operations there.
>>> While it can be argued that this would be a bug in the module I think the only safe way
>>> is to have the whole start_lingering_close_nonblocking being executed by a worker thread.
>> 
>> Correct, that'd be much simpler/safer indeed.
>> We need a new SHUTDOWN state then, right?
> 
> Actually it was less simple than expected, and it has some caveats obviously...
> 
> The attached patch does not introduce a new state but reuses the
> existing CONN_STATE_LINGER since it was not really considered by the
> listener thread (which uses CONN_STATE_LINGER_NORMAL and
> CONN_STATE_LINGER_SHORT instead), but that's a detail.
> 
> Mainly, start_lingering_close_nonblocking() now simply schedules a
> shutdown (i.e. pre_close_connection() followed by immediate close)
> that will be run by a worker thread.
> A new shutdown_linger_q is created/handled (with the same timeout as
> the short_linger_q, namely 2 seconds) to hold connections to be
> shut down.
>
> So now when a connection times out in the write_completion or
> keepalive queues, it needs (i.e. the listener may wait for) an
> available worker to process its shutdown/close.
> This means we can *not* close kept-alive connections immediately like
> before when becoming short of workers, which will favor active KA
> connections over new ones in this case (I don't think it's that
> serious, but the previous code was taking care of that; for me it's up
> to the admin to size the workers appropriately...).
>
> Similarly, when a connection in the shutdown_linger_q itself times out,
> the patch will require a worker immediately to do the job (see
> shutdown_lingering_close() callback).
>
> So overall, this patch may introduce the need for more workers than
> before; what was (wrongly) done by the listener thread has to be done
> somewhere anyway...
>
> Finally, I think there is room for improvements like batching
> shutdowns in the same worker, if there is no objection on the approach
> so far.
> 
> WDYT?

I will test the patch, most likely today. A lot of +1s for the initiative!

-Stefan



Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Stefan Priebe - Profihost AG <s....@profihost.ag>.
Hello,

fullstatus says:
   Slot  PID  Stopping   Connections    Threads       Async connections
                       total accepting busy idle writing keep-alive  closing
   0    25042 no       0     no        2    198  0       0
4294966698
   1    4347  no       0     no        0    200  0       0
4294966700
   2    26273 no       0     no        1    199  0       0
4294966698
   3    4348  no       0     no        0    200  0       0
4294966699
   4    10224 no       0     no        0    200  0       0
4294966697
   5    12157 no       0     no        0    200  0       0
4294966700
   6    23027 no       0     no        0    200  0       0
4294966698
   7    28597 no       0     no        0    200  0       0
4294966698
   8    7519  no       0     no        0    200  0       0
4294966697
   9    18609 no       0     no        2    198  0       0
4294966698
   10   3183  no       0     no        0    200  0       0
4294966698
   11   14704 no       0     no        0    200  0       0
4294966698
   12   26237 no       0     no        0    200  0       0
4294966700
   13   32070 no       0     no        0    200  0       0
4294966697
   14   12070 no       1     no        0    200  0       0
4294966699
   15   16627 no       0     no        0    200  0       0
4294966698
   16   29413 no       0     no        0    200  0       0
4294966699
   17   435   no       0     no        0    200  0       0
4294966699
   18   24808 no       0     no        0    200  0       0
4294966700
   19   1822  no       0     no        0    200  0       0
4294966699
   20   1721  no       0     no        0    200  0       0
4294966698
   21   2875  no       0     no        0    200  0       0
4294966698
   22   25879 no       0     no        0    200  0       0
4294966698
   23   28091 no       0     no        0    200  0       0
4294966696
   24   31452 no       0     no        0    200  0       0
4294966698
   25   32706 no       0     no        0    200  0       0
4294966698
   26   8858  no       14    yes       3    197  0       6
4294966783
   27   10203 no       5     yes       2    198  0       2
4294966949
   Sum  28    0        20              10   5590 0       8          -16400

Greets,
Stefan

Am 19.07.2017 um 17:05 schrieb Stefan Priebe - Profihost AG:
> 
> Am 19.07.2017 um 16:59 schrieb Stefan Priebe - Profihost AG:
>> Hello Yann,
>>
>> i'm observing some deadlocks again.
>>
>> I'm using
>> httpd 2.4.27
>> + mod_h2
>> + httpd-2.4.x-mpm_event-wakeup-v7.1.patch
>> + your ssl linger fix patch from this thread
>>
>> What kind of information do you need? If you need a full stack backtrace
>>  - from which pid? Or from all httpd pids?
> 
> Something i forgot to tell:
> 
> it seems httpd is running at max threads:
> awk '{print $10 $11}' lsof.txt | sort | uniq -c | grep LISTEN
>   25050 *:http(LISTEN)
>   25050 *:https(LISTEN)
> 
> Stefan
> 
>>
>> Thanks!
>>
>> Greets,
>> Stefan
>>
>> Am 14.07.2017 um 21:52 schrieb Yann Ylavic:
>>> On Fri, Jun 30, 2017 at 1:33 PM, Yann Ylavic <yl...@gmail.com> wrote:
>>>> On Fri, Jun 30, 2017 at 1:20 PM, Ruediger Pluem <rp...@apache.org> wrote:
>>>>>
>>>>> On 06/30/2017 12:18 PM, Yann Ylavic wrote:
>>>>>>
>>>>>> IMHO mod_ssl shouldn't (BIO_)flush unconditionally in
>>>>>> modssl_smart_shutdown(), only in the "abortive" mode of
>>>>>> ssl_filter_io_shutdown().
>>>>>
>>>>> I think the issue starts before that.
>>>>> ap_prep_lingering_close calls the pre_close_connection hook and modules that are registered
>>>>> to this hook can perform all sorts of long-lasting blocking operations there.
>>>>> While it can be argued that this would be a bug in the module I think the only safe way
>>>>> is to have the whole start_lingering_close_nonblocking being executed by a worker thread.
>>>>
>>>> Correct, that'd be much simpler/safer indeed.
>>>> We need a new SHUTDOWN state then, right?
>>>
>>> Actually it was less simple than expected, and it has some caveats obviously...
>>>
>>> The attached patch does not introduce a new state but reuses the
>>> existing CONN_STATE_LINGER since it was not really considered by the
>>> listener thread (which uses CONN_STATE_LINGER_NORMAL and
>>> CONN_STATE_LINGER_SHORT instead), but that's a detail.
>>>
>>> Mainly, start_lingering_close_nonblocking() now simply schedules a
>>> shutdown (i.e. pre_close_connection() followed by immediate close)
>>> that will be run by a worker thread.
>>> A new shutdown_linger_q is created/handled (with the same timeout as
>>> the short_linger_q, namely 2 seconds) to hold connections to be
>>> shut down.
>>>
>>> So now when a connection times out in the write_completion or
>>> keepalive queues, it needs (i.e. the listener may wait for) an
>>> available worker to process its shutdown/close.
>>> This means we can *not* close kept-alive connections immediately like
>>> before when becoming short of workers, which will favor active KA
>>> connections over new ones in this case (I don't think it's that
>>> serious, but the previous code was taking care of that; for me it's up
>>> to the admin to size the workers appropriately...).
>>>
>>> Similarly, when a connection in the shutdown_linger_q itself times out,
>>> the patch will require a worker immediately to do the job (see
>>> shutdown_lingering_close() callback).
>>>
>>> So overall, this patch may introduce the need for more workers than
>>> before; what was (wrongly) done by the listener thread has to be done
>>> somewhere anyway...
>>>
>>> Finally, I think there is room for improvements like batching
>>> shutdowns in the same worker, if there is no objection on the approach
>>> so far.
>>>
>>> WDYT?
>>>
>>> Regards,
>>> Yann.
>>>

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Eric Covener <co...@gmail.com>.
On Wed, Jul 19, 2017 at 2:25 PM, Stefan Priebe - Profihost AG
<s....@profihost.ag> wrote:
> Hello,
>
> here we go:
>
> This one is from a server where the first httpd process got stuck:
>
>    Slot  PID  Stopping   Connections    Threads       Async connections
>                        total accepting busy idle writing keep-alive  closing
>    0    31675 no       0     no        0    200  0       0
> 4294966700
>
> gdb thread apply all from this pid:
> https://apaste.info/cBT5
>

summary from a script I use:

1: read>read>ap>child_main>make_child>server_main_loop>event>ap_run_mpm>main
300: pthread_cond_wait@@GLIBC_2.3.2>apr_thread_cond_wait>get_next>slot>start_thread>clone
200: pthread_cond_wait@@GLIBC_2.3.2>apr_thread_cond_wait>ap_queue_pop_something>worker_thread>start_thread>clone
1: epoll_wait>impl_pollset_poll>listener_thread>start_thread>clone
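[Editor's sketch] Eric's exact script isn't shown; a rough awk equivalent that collapses `gdb thread apply all bt` output into counted call chains (top frame first, like the summary above) might look like this:

```shell
# Collapse "gdb thread apply all bt" output into counted unique call
# chains, one "<count>: frame0>frame1>..." line per chain.  A guess at
# the kind of script used above, not the original.
summarize_bt() {
  awk '
    function flush_chain() { if (chain != "") counts[chain]++; chain = "" }
    /^Thread / { flush_chain() }
    /^#/       { f = ($2 ~ /^0x/) ? $4 : $2        # skip the address column
                 chain = (chain == "" ? f : chain ">" f) }
    END        { flush_chain()
                 for (c in counts) print counts[c] ": " c }
  ' "$@" | sort -rn
}
```

Usage: `summarize_bt backtrace.txt`, or pipe the gdb output straight in.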

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Stefan Priebe - Profihost AG <s....@profihost.ag>.
Hello,

here we go:

This one is from a server where the first httpd process got stuck:

   Slot  PID  Stopping   Connections    Threads       Async connections
                       total accepting busy idle writing keep-alive  closing
   0    31675 no       0     no        0    200  0       0
4294966700

gdb thread apply all from this pid:
https://apaste.info/cBT5

Greets,
Stefan

Am 19.07.2017 um 17:48 schrieb Luca Toscano:
> Hello Stefan,
> 
> 2017-07-19 17:05 GMT+02:00 Stefan Priebe - Profihost AG
> <s.priebe@profihost.ag>:
> 
> 
>     Am 19.07.2017 um 16:59 schrieb Stefan Priebe - Profihost AG:
>     > Hello Yann,
>     >
>     > i'm observing some deadlocks again.
>     >
>     > I'm using
>     > httpd 2.4.27
>     > + mod_h2
>     > + httpd-2.4.x-mpm_event-wakeup-v7.1.patch
>     > + your ssl linger fix patch from this thread
>     >
>     > What kind of information do you need? If you need a full stack backtrace
>     >  - from which pid? Or from all httpd pids?
> 
>     Something i forgot to tell:
> 
>     it seems httpd is running at max threads:
>     awk '{print $10 $11}' lsof.txt | sort | uniq -c | grep LISTEN
>       25050 *:http(LISTEN)
>       25050 *:https(LISTEN)
> 
> 
> First of all let me tell you how awesome your regular testing is, thank
> you! It is helping a ton to deliver stable code :)
> 
> From my point of view I think that you can attach gdb to one or more
> httpd processes and do the usual "thread apply all", IIUC your httpd
> ends up having all of its processes in more or less the same state, right?
> 
> Let's use https://apaste.info/ or similar though otherwise we'll need to
> exchange super long emails.
> 
> Thanks again!
> 
> Luca  

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Stefan Priebe - Profihost AG <s....@profihost.ag>.
Am 20.07.2017 um 01:26 schrieb Yann Ylavic:
> On Wed, Jul 19, 2017 at 11:14 PM, Stefan Priebe - Profihost AG
> <s....@profihost.ag> wrote:
>> Am 19.07.2017 um 22:46 schrieb Yann Ylavic:
>>>
>>> Attached is a v2 if you feel confident enough, still ;)
>>
>> Thanks, yes i will.
> 
> If you managed to install v2 already you may want to ignore this new
> v3, which only addresses a very unlikely error case where
> lingering_count++ is missing (plus some non-functional changes, a bit
> of renaming and the factorization which would have avoided this
> mistake in the first place).
> 
> Otherwise, you could try this one instead.

Thanks, switched to V3.

> 
> Thanks,
> Yann.
> 

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Stefan Priebe - Profihost AG <s....@profihost.ag>.
Yes:
Slot  PID  Stopping   Connections    Threads       Async connections
                      total accepting busy idle writing keep-alive  closing
  0    3614  no       1     no        4    196  0       0
4294966701
  1    3615  no       0     no        5    195  0       0
4294966697
  2    10228 no       0     no        6    194  0       0
4294966698
  3    12030 no       0     no        4    196  0       0
4294966697
  4    19495 no       3     yes       6    194  0       0
4294966801
  5    22098 no       6     yes       5    195  0       5
4294966706
  6    30071 no       15    yes       8    192  0       5
4294967283
  Sum  7     0        25              38   1362 0       10         -3489

Stefan

Excuse my typo sent from my mobile phone.

> Am 20.07.2017 um 15:18 schrieb Yann Ylavic <yl...@gmail.com>:
> 
> On Thu, Jul 20, 2017 at 2:58 PM, Stefan Priebe - Profihost AG
> <s....@profihost.ag> wrote:
>> Yes it looks the same but I can't tell if it is.
>> 
>> Here's a backtrace from V3:
>> https://apaste.info/Aw0r
> 
> Thanks Stefan, how about mod_status, still some strange entries?
> 
> Regards,
> Yann.

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Yann Ylavic <yl...@gmail.com>.
On Thu, Jul 20, 2017 at 2:58 PM, Stefan Priebe - Profihost AG
<s....@profihost.ag> wrote:
> Yes it looks the same but I can't tell if it is.
>
> Here's a backtrace from V3:
> https://apaste.info/Aw0r

Thanks Stefan, how about mod_status, still some strange entries?

Regards,
Yann.

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Stefan Priebe - Profihost AG <s....@profihost.ag>.
Yes it looks the same but I can't tell if it is.

Here's a backtrace from V3:
https://apaste.info/Aw0r

Greets,
Stefan

Excuse my typo sent from my mobile phone.

> Am 20.07.2017 um 13:01 schrieb Yann Ylavic <yl...@gmail.com>:
> 
> On Thu, Jul 20, 2017 at 12:48 PM, Stefan Priebe - Profihost AG
> <s....@profihost.ag> wrote:
>> V3 didn't help. Will post a new gdb backtrace soon; it takes some time
>> as I'm on holiday.
> 
> Thanks Stefan, still some/the same issue in status?
> 
> Regards,
> Yann.

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Yann Ylavic <yl...@gmail.com>.
On Thu, Jul 20, 2017 at 12:48 PM, Stefan Priebe - Profihost AG
<s....@profihost.ag> wrote:
> V3 didn't help. Will post a new gdb backtrace soon; it takes some time
> as I'm on holiday.

Thanks Stefan, still some/the same issue in status?

Regards,
Yann.

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Yann Ylavic <yl...@gmail.com>.
On Fri, Jul 21, 2017 at 1:08 AM, Yann Ylavic <yl...@gmail.com> wrote:
> On Thu, Jul 20, 2017 at 12:48 PM, Stefan Priebe - Profihost AG
> <s....@profihost.ag> wrote:
>> V3 didn't help.
>
> I just posted a new patch in this thread, with a new approach which I
> think is better anyway.
>
> Would you mind testing it in your environment?

Latest (defer_linger_chain-v2.patch, minor changes) attached to PR 60956.

>
>
> Regards,
> Yann.

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Stefan Eissing <st...@greenbytes.de>.
Thanks for testing and verifying the fix, Stefan!

> Am 31.07.2017 um 11:32 schrieb Stefan Priebe - Profihost AG <s....@profihost.ag>:
> 
> FTR: i was able to fix this with mod_h2 v1.10.10
> 
> Greets,
> Stefan
> 
> Am 25.07.2017 um 15:40 schrieb Stefan Eissing:
>> Well, if the customer could reproduce this at a 
>> 
>>  LogLevel http2:trace2
>> 
>> that would help.
>> 
>>> Am 25.07.2017 um 15:38 schrieb Stefan Priebe - Profihost AG <s....@profihost.ag>:
>>> 
>>> Hello Stefan,
>>> 
>>> thanks for the patch. No it does not solve the problem our customer is
>>> seeing.
>>> 
>>> What kind of details / logs you need?
>>> 
>>> Greets,
>>> Stefan
>>> 
>>> Am 25.07.2017 um 11:59 schrieb Stefan Eissing:
>>>> The issue was opened here: https://github.com/icing/mod_h2/issues/143
>>>> 
>>>> I made a patch that i hope addresses the problem. The 2.4.x version I attach to this mail.
>>>> 
>>>> Thanks!
>>>> 
>>>> Stefan
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> Am 25.07.2017 um 08:13 schrieb Stefan Priebe - Profihost AG <s....@profihost.ag>:
>>>>> 
>>>>> 
>>>>> Am 24.07.2017 um 23:06 schrieb Stefan Eissing:
>>>>>> I have another report of requests getting stuck, resulting in the error you noticed. Will look tomorrow and report back here what I find.
>>>>> 
>>>>> Thanks, if you need any logs, please ask.
>>>>> 
>>>>> Stefan
>>>>> 
>>>>>> 
>>>>>>> Am 24.07.2017 um 22:20 schrieb Stefan Priebe - Profihost AG <s....@profihost.ag>:
>>>>>>> 
>>>>>>> Hello all,
>>>>>>> 
>>>>>>> currently 8 hours of testing without any issues.
>>>>>>> 
>>>>>>> @Stefan
>>>>>>> i've most probably another issue with http2 where some elements of the
>>>>>>> page are sometimes missing and the connection results in
>>>>>>> ERR_CONNECTION_CLOSED after 60s. What kind of details do you need?
>>>>>>> 
>>>>>>> Greets,
>>>>>>> Stefan
>>>>>>>> Am 22.07.2017 um 13:35 schrieb Yann Ylavic:
>>>>>>>>> On Sat, Jul 22, 2017 at 2:18 AM, Yann Ylavic <yl...@gmail.com> wrote:
>>>>>>>>> On Fri, Jul 21, 2017 at 10:31 PM, Stefan Priebe - Profihost AG
>>>>>>>>> <s....@profihost.ag> wrote:
>>>>>>>>>> 
>>>>>>>>>> your new defer linger V3 deadlocked as well.
>>>>>>>>>> 
>>>>>>>>>> GDB traces:
>>>>>>>>>> https://www.apaste.info/LMfJ
>>>>>>>>> 
>>>>>>>>> This shows the listener thread waiting for a worker while there are
>>>>>>>>> many available.
>>>>>>>>> My mistake, the worker threads failed to rearm their idle state for
>>>>>>>>> the deferred close case.
>>>>>>>>> 
>>>>>>>>> V4 available, thanks Stefan!
>>>>>>>> 
>>>>>>>> Since I didn't want you to test again something that fails quickly, I
>>>>>>>> did some stress tests myself this time with several tools, with V5
>>>>>>>> (same as V4 from a 2.4.x POV).
>>>>>>>> 
>>>>>>>> It seems to work well here...
>>>>>>>> 
>>>>>>>>> Regards,
>>>>>>>>> Yann.
>>>>>> 
>>>> 
>> 


Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Stefan Priebe - Profihost AG <s....@profihost.ag>.
FTR: i was able to fix this with mod_h2 v1.10.10

Greets,
Stefan

Am 25.07.2017 um 15:40 schrieb Stefan Eissing:
> Well, if the customer could reproduce this at a 
> 
>   LogLevel http2:trace2
> 
> that would help.
> 
>> Am 25.07.2017 um 15:38 schrieb Stefan Priebe - Profihost AG <s....@profihost.ag>:
>>
>> Hello Stefan,
>>
>> thanks for the patch. No it does not solve the problem our customer is
>> seeing.
>>
>> What kind of details / logs you need?
>>
>> Greets,
>> Stefan
>>
>> Am 25.07.2017 um 11:59 schrieb Stefan Eissing:
>>> The issue was opened here: https://github.com/icing/mod_h2/issues/143
>>>
>>> I made a patch that i hope addresses the problem. The 2.4.x version I attach to this mail.
>>>
>>> Thanks!
>>>
>>> Stefan
>>>
>>>
>>>
>>>
>>>
>>>> Am 25.07.2017 um 08:13 schrieb Stefan Priebe - Profihost AG <s....@profihost.ag>:
>>>>
>>>>
>>>> Am 24.07.2017 um 23:06 schrieb Stefan Eissing:
>>>>> I have another report of requests getting stuck, resulting in the error you noticed. Will look tomorrow and report back here what I find.
>>>>
>>>> Thanks, if you need any logs, please ask.
>>>>
>>>> Stefan
>>>>
>>>>>
>>>>>> Am 24.07.2017 um 22:20 schrieb Stefan Priebe - Profihost AG <s....@profihost.ag>:
>>>>>>
>>>>>> Hello all,
>>>>>>
>>>>>> currently 8 hours of testing without any issues.
>>>>>>
>>>>>> @Stefan
>>>>>> i've most probably another issue with http2 where some elements of the
>>>>>> page are sometimes missing and the connection results in
>>>>>> ERR_CONNECTION_CLOSED after 60s. What kind of details do you need?
>>>>>>
>>>>>> Greets,
>>>>>> Stefan
>>>>>>> Am 22.07.2017 um 13:35 schrieb Yann Ylavic:
>>>>>>>> On Sat, Jul 22, 2017 at 2:18 AM, Yann Ylavic <yl...@gmail.com> wrote:
>>>>>>>> On Fri, Jul 21, 2017 at 10:31 PM, Stefan Priebe - Profihost AG
>>>>>>>> <s....@profihost.ag> wrote:
>>>>>>>>>
>>>>>>>>> your new defer linger V3 deadlocked as well.
>>>>>>>>>
>>>>>>>>> GDB traces:
>>>>>>>>> https://www.apaste.info/LMfJ
>>>>>>>>
>>>>>>>> This shows the listener thread waiting for a worker while there are
>>>>>>>> many available.
>>>>>>>> My mistake, the worker threads failed to rearm their idle state for
>>>>>>>> the deferred close case.
>>>>>>>>
>>>>>>>> V4 available, thanks Stefan!
>>>>>>>
>>>>>>> Since I didn't want you to test again something that fails quickly, I
>>>>>>> did some stress tests myself this time with several tools, with V5
>>>>>>> (same as V4 from a 2.4.x POV).
>>>>>>>
>>>>>>> It seems to work well here...
>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Yann.
>>>>>
>>>
> 

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Stefan Eissing <st...@greenbytes.de>.
I am waiting to hear back from the peeps that opened the github issue. From how I read their logs, the patch should help them. Will report what they say. 

-Stefan

> Am 25.07.2017 um 15:40 schrieb Stefan Eissing <st...@greenbytes.de>:
> 
> Well, if the customer could reproduce this at a 
> 
>  LogLevel http2:trace2
> 
> that would help.
> 
>> Am 25.07.2017 um 15:38 schrieb Stefan Priebe - Profihost AG <s....@profihost.ag>:
>> 
>> Hello Stefan,
>> 
>> thanks for the patch. No it does not solve the problem our customer is
>> seeing.
>> 
>> What kind of details / logs you need?
>> 
>> Greets,
>> Stefan
>> 
>>> Am 25.07.2017 um 11:59 schrieb Stefan Eissing:
>>> The issue was opened here: https://github.com/icing/mod_h2/issues/143
>>> 
>>> I made a patch that i hope addresses the problem. The 2.4.x version I attach to this mail.
>>> 
>>> Thanks!
>>> 
>>> Stefan
>>> 
>>> 
>>> 
>>> 
>>> 
>>>> Am 25.07.2017 um 08:13 schrieb Stefan Priebe - Profihost AG <s....@profihost.ag>:
>>>> 
>>>> 
>>>>> Am 24.07.2017 um 23:06 schrieb Stefan Eissing:
>>>>> I have another report of requests getting stuck, resulting in the error you noticed. Will look tomorrow and report back here what I find.
>>>> 
>>>> Thanks, if you need any logs, please ask.
>>>> 
>>>> Stefan
>>>> 
>>>>> 
>>>>>> Am 24.07.2017 um 22:20 schrieb Stefan Priebe - Profihost AG <s....@profihost.ag>:
>>>>>> 
>>>>>> Hello all,
>>>>>> 
>>>>>> currently 8 hours of testing without any issues.
>>>>>> 
>>>>>> @Stefan
>>>>>> i've most probably another issue with http2 where some elements of the
>>>>>> page are sometimes missing and the connection results in
>>>>>> ERR_CONNECTION_CLOSED after 60s. What kind of details do you need?
>>>>>> 
>>>>>> Greets,
>>>>>> Stefan
>>>>>>>> Am 22.07.2017 um 13:35 schrieb Yann Ylavic:
>>>>>>>> On Sat, Jul 22, 2017 at 2:18 AM, Yann Ylavic <yl...@gmail.com> wrote:
>>>>>>>> On Fri, Jul 21, 2017 at 10:31 PM, Stefan Priebe - Profihost AG
>>>>>>>> <s....@profihost.ag> wrote:
>>>>>>>>> 
>>>>>>>>> your new defer linger V3 deadlocked as well.
>>>>>>>>> 
>>>>>>>>> GDB traces:
>>>>>>>>> https://www.apaste.info/LMfJ
>>>>>>>> 
>>>>>>>> This shows the listener thread waiting for a worker while there are
>>>>>>>> many available.
>>>>>>>> My mistake, the worker threads failed to rearm their idle state for
>>>>>>>> the deferred close case.
>>>>>>>> 
>>>>>>>> V4 available, thanks Stefan!
>>>>>>> 
>>>>>>> Since I didn't want you to test again something that fails quickly, I
>>>>>>> did some stress tests myself this time with several tools, with V5
>>>>>>> (same as V4 from a 2.4.x POV).
>>>>>>> 
>>>>>>> It seems to work well here...
>>>>>>> 
>>>>>>>> Regards,
>>>>>>>> Yann.
>>>>> 
>>> 
> 


Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Stefan Eissing <st...@greenbytes.de>.
Well, if the customer could reproduce this at a 

  LogLevel http2:trace2

that would help.

> On 25.07.2017 at 15:38, Stefan Priebe - Profihost AG <s....@profihost.ag> wrote:
> 
> Hello Stefan,
> 
> thanks for the patch. No it does not solve the problem our customer is
> seeing.
> 
> What kind of details / logs you need?
> 
> Greets,
> Stefan


Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Stefan Priebe - Profihost AG <s....@profihost.ag>.
Hello Stefan,

Thanks for the patch. No, it does not solve the problem our customer is
seeing.

What kind of details / logs do you need?

Greets,
Stefan

On 25.07.2017 at 11:59, Stefan Eissing wrote:
> The issue was opened here: https://github.com/icing/mod_h2/issues/143
> 
> I made a patch that i hope addresses the problem. The 2.4.x version I attach to this mail.
> 
> Thanks!
> 
> Stefan
> 

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Stefan Eissing <st...@greenbytes.de>.
The issue was opened here: https://github.com/icing/mod_h2/issues/143

I made a patch that I hope addresses the problem. The 2.4.x version I attach to this mail.

Thanks!

Stefan


Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Stefan Priebe - Profihost AG <s....@profihost.ag>.
On 24.07.2017 at 23:06, Stefan Eissing wrote:
> I have another report of requests getting stuck, resulting in the error you noticed. Will look tomorrow and report back here what I find.

Thanks; if you need any logs, please ask.

Stefan


Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Stefan Eissing <st...@greenbytes.de>.
I have another report of requests getting stuck, resulting in the error you noticed. I will look tomorrow and report back here what I find.

> On 24.07.2017 at 22:20, Stefan Priebe - Profihost AG <s....@profihost.ag> wrote:
> 
> Hello all,
> 
> currently 8 hours of testing without any issues.
> 
> @Stefan
> i've most probably another issue with http2 where some elements of the
> page are sometimes missing and the connection results in
> ERR_CONNECTION_CLOSED after 60s. What kind of details do you need?
> 
> Greets,
> Stefan


Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Stefan Priebe - Profihost AG <s....@profihost.ag>.
Hello all,

currently 8 hours of testing without any issues.

@Stefan
I most probably have another issue with HTTP/2 where some elements of the
page are sometimes missing and the connection results in
ERR_CONNECTION_CLOSED after 60s. What kind of details do you need?

Greets,
Stefan
On 22.07.2017 at 13:35, Yann Ylavic wrote:
> Since I didn't want you to test again something that fails quickly, I
> did some stress tests myself this time with several tools, with V5
> (same as V4 from a 2.4.x POV).
> 
> It seems to work well here...
> 
>> Regards,
>> Yann.

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Stefan Priebe - Profihost AG <s....@profihost.ag>.
First test with version five looks good so far; I will continue extensive testing tomorrow.

Greets,
Stefan

Excuse my typo sent from my mobile phone.

> On 22.07.2017 at 13:35, Yann Ylavic <yl...@gmail.com> wrote:
> Since I didn't want you to test again something that fails quickly, I
> did some stress tests myself this time with several tools, with V5
> (same as V4 from a 2.4.x POV).
> 
> It seems to work well here...
> 
>> Regards,
>> Yann.

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Yann Ylavic <yl...@gmail.com>.
On Sat, Jul 22, 2017 at 2:18 AM, Yann Ylavic <yl...@gmail.com> wrote:
> On Fri, Jul 21, 2017 at 10:31 PM, Stefan Priebe - Profihost AG
> <s....@profihost.ag> wrote:
>>
>> your new defer linger V3 deadlocked as well.
>>
>> GDB traces:
>> https://www.apaste.info/LMfJ
>
> This shows the listener thread waiting for a worker while there are
> many available.
> My mistake, the worker threads failed to rearm their idle state for
> the deferred close case.
>
> V4 available, thanks Stefan!

Since I didn't want you to test again something that fails quickly, I
did some stress tests myself this time with several tools, with V5
(same as V4 from a 2.4.x POV).

It seems to work well here...

> Regards,
> Yann.

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Stefan Eissing <st...@greenbytes.de>.
And to answer myself: no, the v3 patch does not expose anything when running in h2fuzz.

> On 22.07.2017 at 07:17, Stefan Eissing <st...@greenbytes.de> wrote:
> 
> Profihost, where bugs come to die!
> 
> I am currently fully overloaded, but it would be interesting to check how the previous versions of the patch fare in a h2fuzz setup.
> 
> -Stefan
> 


Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Stefan Eissing <st...@greenbytes.de>.
Profihost, where bugs come to die!

I am currently fully overloaded, but it would be interesting to check how the previous versions of the patch fare in a h2fuzz setup.

-Stefan

> On 22.07.2017 at 02:18, Yann Ylavic <yl...@gmail.com> wrote:
> 
> On Fri, Jul 21, 2017 at 10:31 PM, Stefan Priebe - Profihost AG
> <s....@profihost.ag> wrote:
>> 
>> your new defer linger V3 deadlocked as well.
>> 
>> GDB traces:
>> https://www.apaste.info/LMfJ
> 
> This shows the listener thread waiting for a worker while there are
> many available.
> My mistake, the worker threads failed to rearm their idle state for
> the deferred close case.
> 
> V4 available, thanks Stefan!
> 
> Regards,
> Yann.


Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Yann Ylavic <yl...@gmail.com>.
On Fri, Jul 21, 2017 at 10:31 PM, Stefan Priebe - Profihost AG
<s....@profihost.ag> wrote:
>
> your new defer linger V3 deadlocked as well.
>
> GDB traces:
> https://www.apaste.info/LMfJ

This shows the listener thread waiting for a worker while there are
many available.
My mistake, the worker threads failed to rearm their idle state for
the deferred close case.

V4 available, thanks Stefan!

Regards,
Yann.

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Stefan Priebe - Profihost AG <s....@profihost.ag>.
Hello Yann,

your new defer linger V3 deadlocked as well.

GDB traces:
https://www.apaste.info/LMfJ

But this time I have no fullstatus for you, as Apache didn't serve
any connections at all anymore. Even before that, I did NOT see those
strange values for closing connections.

Thanks!

Greets,
Stefan
On 21.07.2017 at 01:08, Yann Ylavic wrote:
> On Thu, Jul 20, 2017 at 12:48 PM, Stefan Priebe - Profihost AG
> <s....@profihost.ag> wrote:
>> V3 didn't help.
> 
> I just posted a new patch in this thread, with a new approach which I
> think is better anyway.
> 
> Would you mind testing it in your environment?
> 
> 
> Regards,
> Yann.
> 

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Stefan Priebe - Profihost AG <s....@profihost.ag>.
Hello Yann,

I downloaded V3. I can't guarantee when I can test; maybe today or on Monday.

Greets,
Stefan

On 21.07.2017 at 01:08, Yann Ylavic wrote:
> On Thu, Jul 20, 2017 at 12:48 PM, Stefan Priebe - Profihost AG
> <s....@profihost.ag> wrote:
>> V3 didn't help.
> 
> I just posted a new patch in this thread, with a new approach which I
> think is better anyway.
> 
> Would you mind testing it in your environment?
> 
> 
> Regards,
> Yann.
> 

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Yann Ylavic <yl...@gmail.com>.
On Thu, Jul 20, 2017 at 12:48 PM, Stefan Priebe - Profihost AG
<s....@profihost.ag> wrote:
> V3 didn't help.

I just posted a new patch in this thread, with a new approach which I
think is better anyway.

Would you mind testing it in your environment?


Regards,
Yann.

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Stefan Priebe - Profihost AG <s....@profihost.ag>.
V3 didn't help. I will post a new gdb backtrace soon; it takes some time as I'm on holiday.

Stefan

Excuse my typo sent from my mobile phone.

> On 20.07.2017 at 01:26, Yann Ylavic <yl...@gmail.com> wrote:
> 
> On Wed, Jul 19, 2017 at 11:14 PM, Stefan Priebe - Profihost AG
> <s....@profihost.ag> wrote:
>> On 19.07.2017 at 22:46, Yann Ylavic wrote:
>>> 
>>> Attached is a v2 if you feel confident enough, still ;)
>> 
>> Thanks, yes i will.
> 
> If you managed to install v2 already you may want to ignore this new
> v3, which only addresses a very unlikely error case where
> lingering_count++ is missing (plus some non-functional changes, a bit
> of renaming and the factorization which would have avoided this
> mistake in the first place).
> 
> Otherwise, you could try this one instead.
> 
> Thanks,
> Yann.
> <shutdown_linger_q-v3.patch>

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Yann Ylavic <yl...@gmail.com>.
On Wed, Jul 19, 2017 at 11:14 PM, Stefan Priebe - Profihost AG
<s....@profihost.ag> wrote:
> On 19.07.2017 at 22:46, Yann Ylavic wrote:
>>
>> Attached is a v2 if you feel confident enough, still ;)
>
> Thanks, yes i will.

If you managed to install v2 already you may want to ignore this new
v3, which only addresses a very unlikely error case where
lingering_count++ is missing (plus some non-functional changes, a bit
of renaming and the factorization which would have avoided this
mistake in the first place).

Otherwise, you could try this one instead.

Thanks,
Yann.

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Stefan Priebe - Profihost AG <s....@profihost.ag>.
On 19.07.2017 at 22:46, Yann Ylavic wrote:
> Hi Stefan,
> 
> thanks for testing again!
> 
> On Wed, Jul 19, 2017 at 7:42 PM, Stefan Priebe - Profihost AG
> <s....@profihost.ag> wrote:
>>
>> What looks strange
>> from a first view is that async connections closing has very high and
>> strange values:
>> 4294967211
> 
> Indeed, I messed up with mpm_event's lingering_count in the first patch.
> And it can lead to disabling the listener, which I think is what you observe.
> 
> Attached is a v2 if you feel confident enough, still ;)

Thanks, yes I will.

> 
> Regards,
> Yann.
> 

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Yann Ylavic <yl...@gmail.com>.
Hi Stefan,

thanks for testing again!

On Wed, Jul 19, 2017 at 7:42 PM, Stefan Priebe - Profihost AG
<s....@profihost.ag> wrote:
>
> What looks strange
> from a first view is that async connections closing has very high and
> strange values:
> 4294967211

Indeed, I messed up with mpm_event's lingering_count in the first patch.
And it can lead to disabling the listener, which I think is what you observe.

Attached is a v2 if you feel confident enough, still ;)


Regards,
Yann.

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Stefan Priebe - Profihost AG <s....@profihost.ag>.
Hello Luca,

I need to wait until a machine crashes again. What looks strange
at first view is that the async connections "closing" counter has very
high and strange values:
4294967211

Even a not yet crashed system has those:

   Slot  PID    Stopping  Connections       Threads     Async connections
                          total  accepting  busy  idle  writing  keep-alive  closing
   0     25157  no        12     yes        4     196   0        9           4294967231  <== HERE ==
   1     25159  no        22     yes        8     192   0        13          4294967211  <== HERE ==
   Sum   2      0         34                12    388   0        22          -150

Greets,
Stefan
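[The strange "closing" figures above look like a signed counter that went negative and is rendered as an unsigned 32-bit value. A small sketch of the arithmetic, purely illustrative and not httpd's actual scoreboard code:]

```python
# The "closing" column above shows 4294967231 and 4294967211. Interpreted
# as 32-bit unsigned wraparound, these are counters decremented below zero
# (to -65 and -85), which also matches the "-150" in the Sum row.

def as_uint32(n):
    """Render a (possibly negative) counter the way an unsigned 32-bit
    field would display it."""
    return n % 2**32

def as_int32(u):
    """Recover the signed value from its unsigned 32-bit rendering."""
    return u - 2**32 if u >= 2**31 else u

print(as_uint32(-65))    # 4294967231
print(as_uint32(-85))    # 4294967211
print(as_int32(4294967231) + as_int32(4294967211))  # -150
```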


Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Luca Toscano <to...@gmail.com>.
Hello Stefan,

2017-07-19 17:05 GMT+02:00 Stefan Priebe - Profihost AG <
s.priebe@profihost.ag>:

>
> On 19.07.2017 at 16:59, Stefan Priebe - Profihost AG wrote:
> > Hello Yann,
> >
> > i'm observing some deadlocks again.
> >
> > I'm using
> > httpd 2.4.27
> > + mod_h2
> > + httpd-2.4.x-mpm_event-wakeup-v7.1.patch
> > + your ssl linger fix patch from this thread
> >
> > What kind of information do you need? If you need a full stack backtrace
> >  - from which pid? Or from all httpd pids?
>
> Something i forgot to tell:
>
> it seems httpd is running at max threads:
> awk '{print $10 $11}' lsof.txt | sort | uniq -c | grep LISTEN
>   25050 *:http(LISTEN)
>   25050 *:https(LISTEN)
>
>
First of all, let me tell you how awesome your regular testing is, thank
you! It is helping a ton to deliver stable code :)

From my point of view, I think that you can attach gdb to one or more httpd
processes and do the usual "thread apply all"; IIUC your httpd ends up
having all of its processes in more or less the same state, right?

Let's use https://apaste.info/ or similar though, otherwise we'll need to
exchange super-long emails.

Thanks again!

Luca

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Stefan Priebe - Profihost AG <s....@profihost.ag>.
On 19.07.2017 at 16:59, Stefan Priebe - Profihost AG wrote:
> Hello Yann,
> 
> i'm observing some deadlocks again.
> 
> I'm using
> httpd 2.4.27
> + mod_h2
> + httpd-2.4.x-mpm_event-wakeup-v7.1.patch
> + your ssl linger fix patch from this thread
> 
> What kind of information do you need? If you need a full stack backtrace
>  - from which pid? Or from all httpd pids?

Something I forgot to mention:

it seems httpd is running at max threads:
awk '{print $10 $11}' lsof.txt | sort | uniq -c | grep LISTEN
  25050 *:http(LISTEN)
  25050 *:https(LISTEN)

Stefan

> 
> Thanks!
> 
> Greets,
> Stefan
> 

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Stefan Priebe - Profihost AG <s....@profihost.ag>.
Hello Yann,

I'm observing some deadlocks again.

I'm using
httpd 2.4.27
+ mod_h2
+ httpd-2.4.x-mpm_event-wakeup-v7.1.patch
+ your ssl linger fix patch from this thread

What kind of information do you need? If you need a full stack backtrace
 - from which pid? Or from all httpd pids?

Thanks!

Greets,
Stefan

On 14.07.2017 at 21:52, Yann Ylavic wrote:
> On Fri, Jun 30, 2017 at 1:33 PM, Yann Ylavic <yl...@gmail.com> wrote:
>> On Fri, Jun 30, 2017 at 1:20 PM, Ruediger Pluem <rp...@apache.org> wrote:
>>>
>>> On 06/30/2017 12:18 PM, Yann Ylavic wrote:
>>>>
>>>> IMHO mod_ssl shouldn't (BIO_)flush unconditionally in
>>>> modssl_smart_shutdown(), only in the "abortive" mode of
>>>> ssl_filter_io_shutdown().
>>>
>>> I think the issue starts before that.
>>> ap_prep_lingering_close calls the pre_close_connection hook and modules that are registered
>>> to this hook can perform all sort of long lasting blocking operations there.
>>> While it can be argued that this would be a bug in the module I think the only safe way
>>> is to have the whole start_lingering_close_nonblocking being executed by a worker thread.
>>
>> Correct, that'd be much simpler/safer indeed.
>> We need a new SHUTDOWN state then, right?
> 
> Actually it was less simple than expected, and it has some caveats obviously...
> 
> The attached patch does not introduce a new state but reuses the
> existing CONN_STATE_LINGER since it was not really considered by the
> listener thread (which uses CONN_STATE_LINGER_NORMAL and
> CONN_STATE_LINGER_SHORT instead), but that's a detail.
> 
> Mainly, start_lingering_close_nonblocking() now simply schedules a
> shutdown (i.e. pre_close_connection() followed by immediate close)
> that will be run by a worker thread.
> A new shutdown_linger_q is created/handled (with the same timeout as
> the short_linger_q, namely 2 seconds) to hold connections to be
> shutdown.
> 
> So now when a connection times out in the write_completion or
> keepalive queues, it needs (i.e. the listener may wait for) an
> available worker to process its shutdown/close.
> This means we can *not* close kept-alive connections immediately like
> before when becoming short of workers, which will favor active KA
> connections over new ones in this case (I don't think it's that
> serious but the previous was taking care of that. For me it's up to
> the admin to size the workers appropriately...).
> 
> Same when a connection in the shutdown_linger_q itself times out, the
> patch will require a worker immediately to do the job (see
> shutdown_lingering_close() callback).
> 
> So overall, this patch may introduce the need for more workers than
> before, what was (wrongly) done by the listener thread has to be done
> somewhere anyway...
> 
> Finally, I think there is room for improvements like batching
> shutdowns in the same worker if there is no objection on the approach
> so far.
> 
> WDYT?
> 
> Regards,
> Yann.
> 
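[The deferred-shutdown scheme quoted above, where the listener only schedules the close and a worker runs the potentially blocking hooks, can be modeled with a tiny sketch. `pre_close_connection` here is merely a stand-in for httpd's hook; all other names are illustrative, not httpd's real API:]

```python
import queue
import threading

# Minimal model of the shutdown_linger_q idea: the listener never calls
# potentially blocking close hooks itself; it enqueues the connection and
# a worker thread performs the shutdown.

shutdown_q = queue.Queue()
closed = []

def pre_close_connection(conn):
    # Stand-in for pre-close hooks (e.g. mod_ssl sending close-notify)
    # that may block; harmless no-op in this model.
    pass

def start_lingering_close_nonblocking(conn):
    # Called from the listener: nonblocking, only schedules the shutdown.
    shutdown_q.put(conn)

def worker():
    while True:
        conn = shutdown_q.get()
        if conn is None:             # sentinel to stop the worker
            break
        pre_close_connection(conn)   # may block, but only in a worker
        closed.append(conn)          # immediate close afterwards

t = threading.Thread(target=worker)
t.start()
for c in ("conn-1", "conn-2", "conn-3"):
    start_lingering_close_nonblocking(c)
shutdown_q.put(None)
t.join()
print(closed)   # ['conn-1', 'conn-2', 'conn-3']
```

This also shows the cost Yann points out: every scheduled shutdown now needs a worker to become available before the connection is actually closed.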

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Luca Toscano <to...@gmail.com>.
2017-07-21 1:16 GMT+02:00 Yann Ylavic <yl...@gmail.com>:

> On Fri, Jul 21, 2017 at 1:05 AM, Postmaster
> <po...@cienacorp.onmicrosoft.com> wrote:
> > This message was created automatically by mail delivery software. Your
> email message was not delivered as is to the intended recipients because
> malware was detected in one or more attachments included with it. All
> attachments were deleted.
> >
> > --- Additional Information ---:
> []
> >
> > Detections found:
> > defer_linger_chain.patch         JS/Jasobfus.A!ml
>
> wtf?
>

Was that a -1 for the patch? :D

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Yann Ylavic <yl...@gmail.com>.
On Fri, Jul 21, 2017 at 1:05 AM, Postmaster
<po...@cienacorp.onmicrosoft.com> wrote:
> This message was created automatically by mail delivery software. Your email message was not delivered as is to the intended recipients because malware was detected in one or more attachments included with it. All attachments were deleted.
>
> --- Additional Information ---:
[]
>
> Detections found:
> defer_linger_chain.patch         JS/Jasobfus.A!ml

wtf?

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Yann Ylavic <yl...@gmail.com>.
On Fri, Jul 14, 2017 at 9:52 PM, Yann Ylavic <yl...@gmail.com> wrote:
>
> So overall, this patch may introduce the need for more workers than
> before, what was (wrongly) done by the listener thread has to be done
> somewhere anyway...

That patch didn't work (as reported by Stefan Priebe) and I now don't
feel the need to debug it further, see below.

>
> Finally, I think there is room for improvements like batching
> shutdowns in the same worker if there is no objection on the approach
> so far.

That's the way to go IMO; here is a new patch which I think is much
simpler and more effective.

The idea is that when nonblocking is required (i.e. in the listener),
connections to flush and close are atomically pushed/popped to/from a
chain (linked list) by the listener/some worker.

So start_lingering_close_nonblocking() simply fills the chain (this is
atomic/nonblocking), and any worker thread which is done with its
current connection will empty the chain while calling
start_lingering_close_blocking() for each connection.

To prevent starvation of deferred lingering closes, the listener may
create a worker at the end of its loop, when/if the chain is (fully)
filled.

While the previous patch potentially induced some overhead in the number
of workers and thread context switches, I think this new one is much
better in this regard.

What do you think of it?

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Eric Covener <co...@gmail.com>.
> Also, it seems that in the deferred lingering case we should probably
> shorten the socket timeout before calling (and possibly blocking on)
> ap_start_lingering_close()'s hooks/flush, since we likely come from a
> time-up already...

+1

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Eric Covener <co...@gmail.com>.
> So, should we favor the draining of defer_linger_chain with as many
> workers as necessary like the current patch, or should we have as few
> workers as possible and not start new workers in loops with no effect
> on defer_linger_chain?

I think the fewer-workers option could lead to hard-to-debug (from an
end user POV) intermittent problems with clients far back in the queue
who see a FIN delayed by someone ahead in the queue.  They may send
subsequent requests in the meantime that will spin.

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Yann Ylavic <yl...@gmail.com>.
On Fri, Jul 21, 2017 at 3:07 PM, Luca Toscano <to...@gmail.com> wrote:
>>
>> To prevent starvation of deferred lingering closes, the listener may
>> create a worker at the end of its loop, when/if the chain is (fully)
>> filled.
>
> IIUC the trick is to run "(have_idle_worker && push2worker(NULL) ==
> APR_SUCCESS)" that reserves a worker that in turn eventually checks the
> defer_linger_chain as part of its new code.

That will be a dedicated worker: it won't handle a connection before
flushing/closing the ones in defer_linger_chain, going straight to that
(thanks to the NULL arg).

> This also seems to leverage
> workers_were_busy that will prioritize lingering closes over ka connections
> right? (or at least try to)

workers_were_busy is per loop (zeroed), so the new (late) get_worker
which updates it won't affect the next loop (and since there will be a
poll() in between this get_worker and the next time we'll have to
decide whether to kill KA connections or not, it looks reasonable to
not depend on the previous loop anyway).

Regarding this part, though :

        if (defer_linger_chain) {
            get_worker(&have_idle_worker, 0, &workers_were_busy);
            if (have_idle_worker && push2worker(NULL) == APR_SUCCESS) {
                have_idle_worker = 0;
            }

I think we could be smarter and possibly not create a worker if no
lingering close was chained in *this* loop.
With the current patch we would create one if, for example, a new
connection is accepted while the worker in charge of the lingering
closes (from the previous loop) has not finished its work yet.
Here defer_linger_chain was not filled by the current loop, but not
completely emptied either because the worker is blocking on its first
connection to shutdown, hence != NULL.

So, should we favor the draining of defer_linger_chain with as many
workers as necessary like the current patch, or should we have as few
workers as possible and not start new workers in loops with no effect
on defer_linger_chain?

Also, it seems that in the deferred lingering case we should probably
shorten the socket timeout before calling (and possibly blocking on)
ap_start_lingering_close()'s hooks/flush, since we likely come from a
time-up already...
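That timeout-shortening idea could look something like the fragment below. This is a hedged sketch only, not the final patch: csd (the connection's apr_socket_t) and the reuse of SECONDS_TO_LINGER as the cap are assumptions for illustration.

```c
/* Sketch: before running ap_start_lingering_close()'s hooks for a
 * deferred (already timed-out) connection, cap the socket timeout so a
 * blocking flush cannot stall the worker for the full Timeout.
 * csd and the SECONDS_TO_LINGER cap are illustrative assumptions. */
apr_socket_timeout_set(csd, apr_time_from_sec(SECONDS_TO_LINGER));
if (ap_start_lingering_close(cs->c)) {
    /* hooks/flush failed or timed out: just close the socket */
    apr_socket_close(csd);
}
```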

Thoughts?


Regards,
Yann.

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Luca Toscano <to...@gmail.com>.
Hi Yann,

2017-07-21 1:05 GMT+02:00 Yann Ylavic <yl...@gmail.com>:

> On Fri, Jul 14, 2017 at 9:52 PM, Yann Ylavic <yl...@gmail.com> wrote:
> >
> > So overall, this patch may introduce the need for more workers than
> > before, what was (wrongly) done by the listener thread has to be done
> > somewhere anyway...
>
> That patch didn't work (as reported by Stefan Priebe) and I now don't
> feel the need to debug it further, see below.
>
> >
> > Finally, I think there is room for improvements like batching
> > shutdowns in the same worker if there is no objection on the approach
> > so far.
>
> That's the way to go IMO, here is a new patch which is much simpler
> and, I think, more effective.
>

It is indeed much simpler to follow, even for people like me with
little experience in mpm-event's code :)


> The idea is that when nonblocking is required (i.e. in the listener),
> connections to flush and close are atomically pushed/popped to/from a
> chain (linked list) by the listener/some worker.
>
> So start_lingering_close_nonblocking() simply fills the chain (this is
> atomic/nonblocking), and any worker thread which is done with its
> current connection will empty the chain while calling
> start_lingering_close_blocking() for each connection.
>
> To prevent starvation of deferred lingering closes, the listener may
> create a worker at the end of its loop, when/if the chain is (fully)
> filled.
>

IIUC the trick is to run "(have_idle_worker && push2worker(NULL) ==
APR_SUCCESS)" that reserves a worker that in turn eventually checks the
defer_linger_chain as part of its new code. This also seems to leverage
workers_were_busy that will prioritize lingering closes over ka connections
right? (or at least try to)



> While the previous patch potentially induced some overhead in the
> number of workers and thread context switches, I think this new one is
> much better in this regard.
>
> What do you think of it?
>

I like it, from a first look I'd +1 it but as said above I am not really
authoritative for mpm-event's code :)

Let's see how Stefan's tests go!

Thanks for this work.

Luca

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Yann Ylavic <yl...@gmail.com>.
On Fri, Jun 30, 2017 at 1:33 PM, Yann Ylavic <yl...@gmail.com> wrote:
> On Fri, Jun 30, 2017 at 1:20 PM, Ruediger Pluem <rp...@apache.org> wrote:
>>
>> On 06/30/2017 12:18 PM, Yann Ylavic wrote:
>>>
>>> IMHO mod_ssl shouldn't (BIO_)flush unconditionally in
>>> modssl_smart_shutdown(), only in the "abortive" mode of
>>> ssl_filter_io_shutdown().
>>
>> I think the issue starts before that.
>> ap_prep_lingering_close calls the pre_close_connection hook and modules that are registered
>> to this hook can perform all sort of long lasting blocking operations there.
>> While it can be argued that this would be a bug in the module I think the only safe way
>> is to have the whole start_lingering_close_nonblocking being executed by a worker thread.
>
> Correct, that'd be much simpler/safer indeed.
> We need a new SHUTDOWN state then, right?

Actually it was less simple than expected, and it has some caveats obviously...

The attached patch does not introduce a new state but reuses the
existing CONN_STATE_LINGER since it was not really considered by the
listener thread (which uses CONN_STATE_LINGER_NORMAL and
CONN_STATE_LINGER_SHORT instead), but that's a detail.

Mainly, start_lingering_close_nonblocking() now simply schedules a
shutdown (i.e. pre_close_connection() followed by immediate close)
that will be run by a worker thread.
A new shutdown_linger_q is created/handled (with the same timeout as
the short_linger_q, namely 2 seconds) to hold connections to be
shutdown.

So now when a connection times out in the write_completion or
keepalive queues, it needs (i.e. the listener may wait for) an
available worker to process its shutdown/close.
This means we can *not* close kept alive connections immediately like
before when becoming short of workers, which will favor active KA
connections over new ones in this case (I don't think it's that
serious, but the previous code was taking care of it. For me it's up
to the admin to size the workers appropriately...).

Same when a connection in the shutdown_linger_q itself times out: the
patch will require a worker immediately to do the job (see
shutdown_lingering_close() callback).

So overall, this patch may introduce the need for more workers than
before, what was (wrongly) done by the listener thread has to be done
somewhere anyway...

Finally, I think there is room for improvements like batching
shutdowns in the same worker if there is no objection on the approach
so far.

WDYT?

Regards,
Yann.

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Luca Toscano <to...@gmail.com>.
Hi Yann and Ruediger,

2c from a mpm-event newbie inline:

2017-06-30 13:33 GMT+02:00 Yann Ylavic <yl...@gmail.com>:

> On Fri, Jun 30, 2017 at 1:20 PM, Ruediger Pluem <rp...@apache.org> wrote:
> >
> > On 06/30/2017 12:18 PM, Yann Ylavic wrote:
> >>
> >> IMHO mod_ssl shouldn't (BIO_)flush unconditionally in
> >> modssl_smart_shutdown(), only in the "abortive" mode of
> >> ssl_filter_io_shutdown().
> >
> > I think the issue starts before that.
> > ap_prep_lingering_close calls the pre_close_connection hook and modules
> that are registered
> > to this hook can perform all sort of long lasting blocking operations
> there.
> > While it can be argued that this would be a bug in the module I think
> the only safe way
> > is to have the whole start_lingering_close_nonblocking being executed
> by a worker thread.
>

This makes a lot of sense and I agree, but at the same time I feel that it
would be really great not to move lingering-close responsibilities away
from the listener. As far as we know mod_ssl is the only one that shows
this "buggy" behavior; would it make sense to attempt to "fix it" now and
postpone the decision about pushing start_lingering_close_nonblocking to a
worker thread?


> Correct, that'd be much simpler/safer indeed.
> We need a new SHUTDOWN state then, right?
>

IIUC in each case that the listener calls start_lingering_close_nonblocking
we'd need to set the connection to SHUTDOWN, check if there is a free
worker and push2worker the connection to it right?

Thanks!

Luca

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Yann Ylavic <yl...@gmail.com>.
On Fri, Jun 30, 2017 at 1:20 PM, Ruediger Pluem <rp...@apache.org> wrote:
>
> On 06/30/2017 12:18 PM, Yann Ylavic wrote:
>>
>> IMHO mod_ssl shouldn't (BIO_)flush unconditionally in
>> modssl_smart_shutdown(), only in the "abortive" mode of
>> ssl_filter_io_shutdown().
>
> I think the issue starts before that.
> ap_prep_lingering_close calls the pre_close_connection hook and modules that are registered
> to this hook can perform all sort of long lasting blocking operations there.
> While it can be argued that this would be a bug in the module I think the only safe way
> is to have the whole start_lingering_close_nonblocking being executed by a worker thread.

Correct, that'd be much simpler/safer indeed.
We need a new SHUTDOWN state then, right?


Regards,
Yann.

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Ruediger Pluem <rp...@apache.org>.

On 06/30/2017 12:18 PM, Yann Ylavic wrote:
> Hi Luca,
> 
> [better/easier to talk about details on dev@]
> 
> On Fri, Jun 30, 2017 at 11:05 AM,  <bu...@apache.org> wrote:
>> https://bz.apache.org/bugzilla/show_bug.cgi?id=60956
>>
>> --- Comment #11 from Luca Toscano <to...@gmail.com> ---
>> Other two interesting trunk improvements that have not been backported yet:
>>
>> http://svn.apache.org/viewvc?view=revision&revision=1706669
>> http://svn.apache.org/viewvc?view=revision&revision=1734656
>>
>> IIUC these ones are meant to provide a more async behavior to most of the
>> output filters, namely setting aside buckets (on the heap) to avoid blocking.
> 
> These are quite orthogonal I think, and don't seem to fix this particular issue.
> 
>>
>> After a bit of thinking it seems to me that we'd need to find a solution that
>> prevents the mod_ssl_output filter to block, but in a safe way.
> 
> IMHO mod_ssl shouldn't (BIO_)flush unconditionally in
> modssl_smart_shutdown(), only in the "abortive" mode of
> ssl_filter_io_shutdown().

I think the issue starts before that.
ap_prep_lingering_close calls the pre_close_connection hook and modules that are registered
to this hook can perform all sort of long lasting blocking operations there.
While it can be argued that this would be a bug in the module I think the only safe way
is to have the whole start_lingering_close_nonblocking being executed by a worker thread.

Regards

Rüdiger


Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Yann Ylavic <yl...@gmail.com>.
On Fri, Jun 30, 2017 at 12:52 PM, Luca Toscano <to...@gmail.com> wrote:
>
> 2017-06-30 12:18 GMT+02:00 Yann Ylavic <yl...@gmail.com>:
>> >
>> > http://svn.apache.org/viewvc?view=revision&revision=1706669
>> > http://svn.apache.org/viewvc?view=revision&revision=1734656
>> >
>> > IIUC these ones are meant to provide a more async behavior to most of
>> > the
>> > output filters, namely setting aside buckets (on the heap) to avoid
>> > blocking.
>>
>> These are quite orthogonal I think, and don't seem to fix this particular
>> issue.
>
> Sorry for the noise, I am still reviewing those to understand what they
> really do and added them as reference to the task.

NP, it's not really noise either, because with my proposed change
they'd imply a different fix for 2.4 and trunk.

>>
>> With a possibly non-blocking modssl_smart_shutdown(), I think we could
>> make ap_shutdown_conn(c, 0) return something like APR_INCOMPLETE for
>> the case the shutdown was "buffered" in the output filter stack (e.g.
>> core output filter).
>>
>> In mpm_event, we would then go to (or stay in) the WRITE_COMPLETION
>> state instead of LINGER, until every remaining piece of data is flushed
>> successfully.
>
> IIUC in this case the listener is calling
> process_timeout_queue(write_completion_q, ..), so the conn has already been
> in the WRITE_COMPLETION state for Timeout seconds and needs to be closed.
> Your suggestion would be to still force the listener to call
> ap_shutdown_conn(c, 0), but a "smarter" version that eventually returns
> APR_INCOMPLETE rather than blocking. Then the listener could put the
> connection again in the WRITE_COMPLETION queue, and wait for the client to
> unblock and read the missing bytes about close-notify (or just let Timeout
> seconds to pass, forcing the listener to shutdown the socket and be done
> with it).

We'd use a shorter timeout but that was the idea yes.

But nevermind, I like Rüdiger's idea better actually, less invasive (I
think) and more bulletproof. Continued there...

Re: [Bug 60956] Event MPM listener thread may get blocked by SSL shutdowns

Posted by Luca Toscano <to...@gmail.com>.
Hi Yann!

2017-06-30 12:18 GMT+02:00 Yann Ylavic <yl...@gmail.com>:

> Hi Luca,
>
> [better/easier to talk about details on dev@]
>
> On Fri, Jun 30, 2017 at 11:05 AM,  <bu...@apache.org> wrote:
> > https://bz.apache.org/bugzilla/show_bug.cgi?id=60956
> >
> > --- Comment #11 from Luca Toscano <to...@gmail.com> ---
> > Other two interesting trunk improvements that have not been backported
> yet:
> >
> > http://svn.apache.org/viewvc?view=revision&revision=1706669
> > http://svn.apache.org/viewvc?view=revision&revision=1734656
> >
> > IIUC these ones are meant to provide a more async behavior to most of the
> > output filters, namely setting aside buckets (on the heap) to avoid
> blocking.
>
> These are quite orthogonal I think, and don't seem to fix this particular
> issue.
>

Sorry for the noise, I am still reviewing those to understand what they
really do and added them as reference to the task.
I thought that they might have been useful while thinking about a solution,
since from my (ignorant :) point of view the fact that ap_core_output_filter
was blocked by mod_ssl's BIO_flush was somehow pointing me to those
commits.

Will skip these notes in the future, they might be confusing!

>
> [..]
>


> With a possibly non-blocking modssl_smart_shutdown(), I think we could
> make ap_shutdown_conn(c, 0) return something like APR_INCOMPLETE for
> the case the shutdown was "buffered" in the output filter stack (e.g.
> core output filter).
>
> In mpm_event, we would then go to (or stay in) the WRITE_COMPLETION
> state instead of LINGER, until every remaining piece of data is flushed
> successfully.
>

IIUC in this case the listener is calling
process_timeout_queue(write_completion_q, ..), so the conn has already been
in the WRITE_COMPLETION state for Timeout seconds and needs to be closed.
Your suggestion would be to still force the listener to
call ap_shutdown_conn(c, 0), but a "smarter" version that eventually
returns APR_INCOMPLETE rather than blocking. Then the listener could put
the connection again in the WRITE_COMPLETION queue, and wait for the client
to unblock and read the missing bytes about close-notify (or just let
Timeout seconds to pass, forcing the listener to shutdown the socket and be
done with it).

You are free to trash the email if it doesn't make sense, I am probably
missing too many important pieces of the puzzle :)

Thanks for the help!

Luca