Posted to dev@httpd.apache.org by Luca Toscano <to...@gmail.com> on 2017/01/17 17:40:58 UTC

Re: Async write completion broken in trunk?

2016-09-18 10:57 GMT+02:00 Stefan Fritsch <sf...@sfritsch.de>:

> Hi Graham,
>
> On Wed, 14 Sep 2016, Graham Leggett wrote:
>
> > On 06 Sep 2016, at 12:06 AM, Stefan Fritsch <sf...@sfritsch.de> wrote:
> >
> > > in trunk, when having a lot of slow, long-running transfers, most of
> > > them seem to hog a thread and only a few of them go into async write
> > > completion mode. Compare this to 2.4, where many transfers are doing
> > > async write completion and only a small number of threads are busy.
> > >
> > > Is this a known issue?
> >
> > I’ve seen this before, but couldn’t reproduce it reliably.
> >
> > If you can catch a long-running transfer, can you get a stacktrace
> > from it? One possible option is that a bucket is insisting on being
> > consumed in one go and thus refuses to leave the handler hook and
> > enter write completion; the second option is that the bucket ends up
> > in write completion but blocks, and we would need to know why.
> >
> > There are still a number of filters out there that insist on swallowing
> > whole bucket brigades in one go. Any of these filters will prevent write
> > completion from working. This will show up in the stacktrace.
>
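
As an illustration of the pattern Graham describes: a well-behaved output
filter processes the buckets it is handed and passes them straight on, so
the core output filter can set data aside for write completion. A minimal
sketch (the filter name is hypothetical, not actual httpd code):

    #include "httpd.h"
    #include "util_filter.h"
    #include "apr_buckets.h"

    /* Walk the brigade we were handed and pass it on in one go; do not
     * loop fetching more data until EOS. */
    static apr_status_t well_behaved_filter(ap_filter_t *f,
                                            apr_bucket_brigade *bb)
    {
        apr_bucket *e;

        for (e = APR_BRIGADE_FIRST(bb);
             e != APR_BRIGADE_SENTINEL(bb);
             e = APR_BUCKET_NEXT(e)) {
            if (APR_BUCKET_IS_EOS(e)) {
                break;          /* end of response, nothing to hold back */
            }
            /* transform the bucket in place here, if needed */
        }
        /* Pass whatever we have now; the core output filter decides what
         * to send immediately and what to set aside for completion. */
        return ap_pass_brigade(f->next, bb);
    }

A filter that instead buffers the whole response until it has seen EOS
keeps the transfer inside the handler thread, which is exactly what
prevents async write completion.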
> Stack traces of a few such threads are below. I don't see anything
> besides the core output filter there. Maybe you see more.
>
> I have also included the list of loaded modules. From extra/ I have
> httpd-userdir.conf, proxy-html.conf, httpd-info.conf and httpd-mpm.conf
> included. Apart from that, it's a somewhat outdated but unmodified
> default config.
>
>
>
> Cheers,
> Stefan
>
>
> Thread 5 (Thread 0x7f70c9a69700 (LWP 3398)):
>
> #0  0x00007f70daaea50d in poll () at ../sysdeps/unix/syscall-template.S:84
> No locals.
> #1  0x00007f70db83c5d2 in apr_poll (aprset=0x7f70c9a68c20, num=1,
> nsds=0x7f70c9a68c0c, timeout=60000) at poll/unix/poll.c:120
>         i = 1
>         num_to_poll = 1
>         pollset = 0x7f70c9a68b60
> #2  0x00000000004529b1 in send_brigade_blocking (c=0x7f70c825b348,
> bytes_written=0x7f70c8257a00, bb=0x7f70c8257658, s=0x7f70c825b0b0)
>     at core_filters.c:633
>         nsds = 1
>         timeout = 60000000
>         pollset = {p = 0x7f70c8257028, desc_type = APR_POLL_SOCKET,
> reqevents = 4, rtnevents = 4, desc = {f = 0x7f70c825b0b0,
>             s = 0x7f70c825b0b0}, client_data = 0x7f70c82576a8}
>         rv = <optimized out>
> #3  ap_core_output_filter (f=0x7f70c8257830, bb=0x7f70c8257658) at
> core_filters.c:449
>         c = 0x7f70c825b348
>         net = 0x7f70c82577d8
>         ctx = 0x7f70c82579f0
>         flush_upto = 0x7f70c825a6e8
>         rv = <optimized out>
>         loglevel = 4
> #4  0x000000000043955f in ap_pass_brigade (bb=<optimized out>,
> next=0x7f70c8257830) at util_filter.c:610
>         e = <optimized out>
> #5  ap_filter_output_pending (c=0x7f70c825b348) at util_filter.c:985
>         f = 0x7f70c8257830
>         rindex = 0x7f70c82576a8
>         data_in_output_filters = -1
> #6  0x0000000000435ef0 in ap_run_output_pending (c=c@entry=0x7f70c825b348)
> at mpm_common.c:122
>         pHook = <optimized out>
>         n = 0
>         rv = -1
> #7  0x00007f70ce5da6fc in process_socket (my_thread_num=3, my_child_num=1,
> cs=0x7f70c825b2b8, sock=0x7f70c825b0b0, p=0x7f70c825b028,
>     thd=0x7f70dc57cee8) at event.c:1152
>         not_complete_yet = <optimized out>
>         c = 0x7f70c825b348
>         sbh = 0x7f70c825b2a0
>         conn_id = <optimized out>
>         rc = <optimized out>
> #8  worker_thread (thd=0x7f70dc57cee8, dummy=<optimized out>) at
> event.c:2261
>         csd = 0x7f70c825b0b0
>         cs = 0x0
>         te = 0x0
>         ptrans = 0x7f70c825b028
>         ti = <optimized out>
>         process_slot = 1
>         thread_slot = 3
>         rv = <optimized out>
>         is_idle = 0
> #9  0x00007f70db842488 in dummy_worker (opaque=0x7f70dc57cee8) at
> threadproc/unix/thread.c:142
>         thread = 0x7f70dc57cee8
> #10 0x00007f70dafb4464 in start_thread (arg=0x7f70c9a69700) at
> pthread_create.c:333
>         __res = <optimized out>
>         pd = 0x7f70c9a69700
>         now = <optimized out>
>         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140122396202752,
> 6480505367792406515, 0, 140122429770751, 0, 140122711343168,
>                 -6408919108241895437, -6408951309539651597},
> mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {
>               prev = 0x0, cleanup = 0x0, canceltype = 0}}}
>         not_first_call = <optimized out>
>         pagesize_m1 = <optimized out>
>         sp = <optimized out>
>         freesize = <optimized out>
>         __PRETTY_FUNCTION__ = "start_thread"
> #11 0x00007f70daaf397f in clone () at ../sysdeps/unix/sysv/linux/
> x86_64/clone.S:105
> No locals.
>
> Thread 3 (Thread 0x7f70caa6b700 (LWP 3394)):
> #0  0x00007f70daaea50d in poll () at ../sysdeps/unix/syscall-template.S:84
> No locals.
> #1  0x00007f70db83c5d2 in apr_poll (aprset=0x7f70caa6ac20, num=1,
> nsds=0x7f70caa6ac0c, timeout=60000) at poll/unix/poll.c:120
>         i = 1
>         num_to_poll = 1
>         pollset = 0x7f70caa6ab60
> #2  0x00000000004529b1 in send_brigade_blocking (c=0x7f70dc506348,
> bytes_written=0x7f70dc502a00, bb=0x7f70dc502658, s=0x7f70dc5060b0)
>     at core_filters.c:633
>         nsds = 1
>         timeout = 60000000
>         pollset = {p = 0x7f70dc502028, desc_type = APR_POLL_SOCKET,
> reqevents = 4, rtnevents = 4, desc = {f = 0x7f70dc5060b0,
>             s = 0x7f70dc5060b0}, client_data = 0x7f70dc5026a8}
>         rv = <optimized out>
> #3  ap_core_output_filter (f=0x7f70dc502830, bb=0x7f70dc502658) at
> core_filters.c:449
>         c = 0x7f70dc506348
>         net = 0x7f70dc5027d8
>         ctx = 0x7f70dc5029f0
>         flush_upto = 0x7f70dc5056e8
>         rv = <optimized out>
>         loglevel = 4
> #4  0x000000000043955f in ap_pass_brigade (bb=<optimized out>,
> next=0x7f70dc502830) at util_filter.c:610
>         e = <optimized out>
> #5  ap_filter_output_pending (c=0x7f70dc506348) at util_filter.c:985
>         f = 0x7f70dc502830
>         rindex = 0x7f70dc5026a8
>         data_in_output_filters = -1
> #6  0x0000000000435ef0 in ap_run_output_pending (c=c@entry=0x7f70dc506348)
> at mpm_common.c:122
>         pHook = <optimized out>
>         n = 0
>         rv = -1
> #7  0x00007f70ce5da6fc in process_socket (my_thread_num=1, my_child_num=1,
> cs=0x7f70dc5062b8, sock=0x7f70dc5060b0, p=0x7f70dc506028,
>     thd=0x7f70dc57ce88) at event.c:1152
>         not_complete_yet = <optimized out>
>         c = 0x7f70dc506348
>         sbh = 0x7f70dc5062a0
>         conn_id = <optimized out>
>         rc = <optimized out>
> #8  worker_thread (thd=0x7f70dc57ce88, dummy=<optimized out>) at
> event.c:2261
>         csd = 0x7f70dc5060b0
>         cs = 0x0
>         te = 0x0
>         ptrans = 0x7f70dc506028
>         ti = <optimized out>
>         process_slot = 1
>         thread_slot = 1
>         rv = <optimized out>
>         is_idle = 0
> #9  0x00007f70db842488 in dummy_worker (opaque=0x7f70dc57ce88) at
> threadproc/unix/thread.c:142
>         thread = 0x7f70dc57ce88
> #10 0x00007f70dafb4464 in start_thread (arg=0x7f70caa6b700) at
> pthread_create.c:333
>         __res = <optimized out>
>         pd = 0x7f70caa6b700
>         now = <optimized out>
>         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140122412988160,
> 6480505367792406515, 0, 140122429770751, 0, 140122711343168,
>                 -6408916910292381709, -6408951309539651597},
> mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {
>               prev = 0x0, cleanup = 0x0, canceltype = 0}}}
>         not_first_call = <optimized out>
>         pagesize_m1 = <optimized out>
>         sp = <optimized out>
>         freesize = <optimized out>
>         __PRETTY_FUNCTION__ = "start_thread"
> #11 0x00007f70daaf397f in clone () at ../sysdeps/unix/sysv/linux/
> x86_64/clone.S:105
> No locals.
>
> Thread 2 (Thread 0x7f70cb26c700 (LWP 3391)):
> #0  0x00007f70daaea50d in poll () at ../sysdeps/unix/syscall-template.S:84
> No locals.
> #1  0x00007f70db83c5d2 in apr_poll (aprset=0x7f70cb26bc20, num=1,
> nsds=0x7f70cb26bc0c, timeout=60000) at poll/unix/poll.c:120
>         i = 1
>         num_to_poll = 1
>         pollset = 0x7f70cb26bb60
> #2  0x00000000004529b1 in send_brigade_blocking (c=0x7f70dc514348,
> bytes_written=0x7f70dc510a00, bb=0x7f70dc510658, s=0x7f70dc5140b0)
>     at core_filters.c:633
>         nsds = 1
>         timeout = 60000000
>         pollset = {p = 0x7f70dc510028, desc_type = APR_POLL_SOCKET,
> reqevents = 4, rtnevents = 4, desc = {f = 0x7f70dc5140b0,
>             s = 0x7f70dc5140b0}, client_data = 0x7f70dc5106a8}
>         rv = <optimized out>
> #3  ap_core_output_filter (f=0x7f70dc510830, bb=0x7f70dc510658) at
> core_filters.c:449
>         c = 0x7f70dc514348
>         net = 0x7f70dc5107d8
>         ctx = 0x7f70dc5109f0
>         flush_upto = 0x7f70dc5136e8
>         rv = <optimized out>
>         loglevel = 4
> #4  0x000000000043955f in ap_pass_brigade (bb=<optimized out>,
> next=0x7f70dc510830) at util_filter.c:610
>         e = <optimized out>
> #5  ap_filter_output_pending (c=0x7f70dc514348) at util_filter.c:985
>         f = 0x7f70dc510830
>         rindex = 0x7f70dc5106a8
>         data_in_output_filters = 0
> #6  0x0000000000435ef0 in ap_run_output_pending (c=c@entry=0x7f70dc514348)
> at mpm_common.c:122
>         pHook = <optimized out>
>         n = 0
>         rv = -1
> #7  0x00007f70ce5da6fc in process_socket (my_thread_num=0, my_child_num=1,
> cs=0x7f70dc5142b8, sock=0x7f70dc5140b0, p=0x7f70dc514028,
>     thd=0x7f70dc57ce58) at event.c:1152
>         not_complete_yet = <optimized out>
>         c = 0x7f70dc514348
>         sbh = 0x7f70dc5142a0
>         conn_id = <optimized out>
>         rc = <optimized out>
> #8  worker_thread (thd=0x7f70dc57ce58, dummy=<optimized out>) at
> event.c:2261
>         csd = 0x7f70dc5140b0
>         cs = 0x0
>         te = 0x0
>         ptrans = 0x7f70dc514028
>         ti = <optimized out>
>         process_slot = 1
>         thread_slot = 0
>         rv = <optimized out>
>         is_idle = 0
> #9  0x00007f70db842488 in dummy_worker (opaque=0x7f70dc57ce58) at
> threadproc/unix/thread.c:142
>         thread = 0x7f70dc57ce58
> #10 0x00007f70dafb4464 in start_thread (arg=0x7f70cb26c700) at
> pthread_create.c:333
>         __res = <optimized out>
>         pd = 0x7f70cb26c700
>         now = <optimized out>
>         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140122421380864,
> 6480505367792406515, 0, 140122429770751, 0, 140122711343168,
>                 -6408913612294369293, -6408951309539651597},
> mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {
>               prev = 0x0, cleanup = 0x0, canceltype = 0}}}
>         not_first_call = <optimized out>
>         pagesize_m1 = <optimized out>
>         sp = <optimized out>
>         freesize = <optimized out>
>         __PRETTY_FUNCTION__ = "start_thread"
> #11 0x00007f70daaf397f in clone () at ../sysdeps/unix/sysv/linux/
> x86_64/clone.S:105
> No locals.
>
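
(Reading the traces: in each thread, frame #2 shows send_brigade_blocking()
sitting in apr_poll() with a 60 second timeout, reached from
ap_filter_output_pending() via ap_core_output_filter(); in other words the
worker thread blocks inside the core output filter waiting for the socket
to become writable, instead of handing the connection back to the event
MPM to poll asynchronously.)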
>
>
>
> LoadModule authn_file_module modules/mod_authn_file.so
> LoadModule authn_dbm_module modules/mod_authn_dbm.so
> LoadModule authn_anon_module modules/mod_authn_anon.so
> LoadModule authn_dbd_module modules/mod_authn_dbd.so
> LoadModule authn_socache_module modules/mod_authn_socache.so
> LoadModule authn_core_module modules/mod_authn_core.so
> LoadModule authz_host_module modules/mod_authz_host.so
> LoadModule authz_groupfile_module modules/mod_authz_groupfile.so
> LoadModule authz_user_module modules/mod_authz_user.so
> LoadModule authz_dbm_module modules/mod_authz_dbm.so
> LoadModule authz_owner_module modules/mod_authz_owner.so
> LoadModule authz_dbd_module modules/mod_authz_dbd.so
> LoadModule authz_core_module modules/mod_authz_core.so
> LoadModule authnz_fcgi_module modules/mod_authnz_fcgi.so
> LoadModule access_compat_module modules/mod_access_compat.so
> LoadModule auth_basic_module modules/mod_auth_basic.so
> LoadModule auth_form_module modules/mod_auth_form.so
> LoadModule auth_digest_module modules/mod_auth_digest.so
> LoadModule allowmethods_module modules/mod_allowmethods.so
> LoadModule allowhandlers_module modules/mod_allowhandlers.so
> LoadModule socache_shmcb_module modules/mod_socache_shmcb.so
> LoadModule reqtimeout_module modules/mod_reqtimeout.so
> LoadModule request_module modules/mod_request.so
> LoadModule include_module modules/mod_include.so
> LoadModule filter_module modules/mod_filter.so
> LoadModule substitute_module modules/mod_substitute.so
> LoadModule sed_module modules/mod_sed.so
> LoadModule charset_lite_module modules/mod_charset_lite.so
> LoadModule xml2enc_module modules/mod_xml2enc.so
> LoadModule proxy_html_module modules/mod_proxy_html.so
> LoadModule mime_module modules/mod_mime.so
> LoadModule syslog_module modules/mod_syslog.so
> LoadModule log_config_module modules/mod_log_config.so
> LoadModule logio_module modules/mod_logio.so
> LoadModule lua_module modules/mod_lua.so
> LoadModule env_module modules/mod_env.so
> LoadModule expires_module modules/mod_expires.so
> LoadModule headers_module modules/mod_headers.so
> LoadModule ident_module modules/mod_ident.so
> LoadModule usertrack_module modules/mod_usertrack.so
> LoadModule unique_id_module modules/mod_unique_id.so
> LoadModule setenvif_module modules/mod_setenvif.so
> LoadModule version_module modules/mod_version.so
> LoadModule remoteip_module modules/mod_remoteip.so
> LoadModule proxy_module modules/mod_proxy.so
> LoadModule proxy_connect_module modules/mod_proxy_connect.so
> LoadModule proxy_ftp_module modules/mod_proxy_ftp.so
> LoadModule proxy_http_module modules/mod_proxy_http.so
> LoadModule proxy_fcgi_module modules/mod_proxy_fcgi.so
> LoadModule proxy_scgi_module modules/mod_proxy_scgi.so
> LoadModule proxy_fdpass_module modules/mod_proxy_fdpass.so
> LoadModule proxy_wstunnel_module modules/mod_proxy_wstunnel.so
> LoadModule proxy_ajp_module modules/mod_proxy_ajp.so
> LoadModule proxy_balancer_module modules/mod_proxy_balancer.so
> LoadModule proxy_express_module modules/mod_proxy_express.so
> LoadModule session_module modules/mod_session.so
> LoadModule session_cookie_module modules/mod_session_cookie.so
> LoadModule session_crypto_module modules/mod_session_crypto.so
> LoadModule session_dbd_module modules/mod_session_dbd.so
> LoadModule slotmem_shm_module modules/mod_slotmem_shm.so
> LoadModule slotmem_plain_module modules/mod_slotmem_plain.so
> LoadModule policy_module modules/mod_policy.so
> LoadModule lbmethod_byrequests_module modules/mod_lbmethod_byrequests.so
> LoadModule lbmethod_bytraffic_module modules/mod_lbmethod_bytraffic.so
> LoadModule lbmethod_bybusyness_module modules/mod_lbmethod_bybusyness.so
> LoadModule lbmethod_heartbeat_module modules/mod_lbmethod_heartbeat.so
> LoadModule mpm_event_module modules/mod_mpm_event.so
> LoadModule unixd_module modules/mod_unixd.so
> LoadModule status_module modules/mod_status.so
> LoadModule autoindex_module modules/mod_autoindex.so
> LoadModule asis_module modules/mod_asis.so
> LoadModule info_module modules/mod_info.so
> LoadModule suexec_module modules/mod_suexec.so
> LoadModule vhost_alias_module modules/mod_vhost_alias.so
> LoadModule negotiation_module modules/mod_negotiation.so
> LoadModule dir_module modules/mod_dir.so
> LoadModule userdir_module modules/mod_userdir.so
> LoadModule alias_module modules/mod_alias.so
> LoadModule rewrite_module modules/mod_rewrite.so


Since this email thread seems important, is there any update from anybody
working on it? Otherwise it would be great to open a Bugzilla task, to
track everything and make sure that we don't forget about it :)

Luca

Re: Async write completion broken in trunk?

Posted by Luca Toscano <to...@gmail.com>.
2017-01-18 13:50 GMT+01:00 Graham Leggett <mi...@sharp.fm>:

> On 17 Jan 2017, at 7:40 PM, Luca Toscano <to...@gmail.com> wrote:
>
> > Since this email thread seems important, is there any update from
> > anybody working on it? Otherwise it would be great to open a Bugzilla
> > task, to track everything and make sure that we don't forget about it :)
>
> I looked at this a while back - I found that pipelining support is causing
> the blocking.
>
> When we’re handling pipelined requests and we reach a limit, we block.
> What we should be doing instead is looping round back to WRITE_COMPLETION,
> waiting for permission to write again. This should be reasonably
> straightforward to fix, but my financial year end is the end of this month
> so I can’t look at this till February.
>
> I suspect pipelining support has been suppressed in the v2.4.x event MPM,
> was at some point “fixed” on trunk, bringing this problem back.
>
>
Thanks a lot for the explanation! I didn't mean to rush you, just to
ping the list to figure out if anything new had come up :)

Luca

Re: Async write completion broken in trunk?

Posted by Yann Ylavic <yl...@gmail.com>.
On Wed, Jan 18, 2017 at 2:43 PM, Graham Leggett <mi...@sharp.fm> wrote:
> On 18 Jan 2017, at 3:38 PM, Jim Jagielski <ji...@jaguNET.com> wrote:
>
>> That's a good overview... Yeah, it might be getting better, but it does
>> seem that the nature of the bugs implies it's also getting more
>> fragile. Could be, and likely is, just my impression due to some weird
>> things I've seen lately in some testing.

We could make no improvements in trunk (bugfixes only) and feel better
about it, yet that's not how we'll build httpd-next...

Let's fix the bugs or revert if things really prove catastrophic, right?

>
> Pipelining support as it stands is really ugly.
>
> There is a point where we need to ask the question “is there more data
> available to read”, and if the answer is yes, we need to follow the
> pipeline path. We answer the question “is there more data available to
> read” by trying to read some data, and then in turn by tying ourselves
> in ugly knots trying to “unread” that data so the next request can work.

Are we talking about ap_check_pipeline()?

>
> I want to fix this properly so that we don’t read data; we just get the
> answer “there is data to be read”.

Not sure I get the nuance, but "there is *HTTP* data to be read" (like
ap_check_pipeline() does) looks reasonable to me, unless we want
something like a "TLS close notify" (or trailing CRLFs) to start new
request processing, which would fail (or block).
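
For the record, a rough sketch of answering “is there data to be read”
without consuming anything, along the lines of what ap_check_pipeline()
does (the function name and the one-byte probe are illustrative, not the
actual httpd code):

    #include "httpd.h"
    #include "util_filter.h"
    #include "apr_buckets.h"

    /* Probe the input filter stack for pending data without consuming
     * it: a speculative, non-blocking read leaves the data in place,
     * so the next request can still parse it. */
    static int pipeline_data_pending(conn_rec *c, apr_pool_t *p)
    {
        apr_bucket_brigade *bb = apr_brigade_create(p, c->bucket_alloc);
        apr_status_t rv;
        int pending;

        rv = ap_get_brigade(c->input_filters, bb, AP_MODE_SPECULATIVE,
                            APR_NONBLOCK_READ, 1);
        pending = (rv == APR_SUCCESS && !APR_BRIGADE_EMPTY(bb));

        apr_brigade_cleanup(bb); /* drop our references; data stays queued */
        return pending;
    }

As noted above, a raw-bytes probe like this can still be fooled by a TLS
close notify or trailing CRLFs, which is why the distinction between raw
data and HTTP data matters.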


Regards,
Yann.

Re: Async write completion broken in trunk?

Posted by Graham Leggett <mi...@sharp.fm>.
On 18 Jan 2017, at 3:38 PM, Jim Jagielski <ji...@jaguNET.com> wrote:

> That's a good overview... Yeah, it might be getting better, but it does
> seem that the nature of the bugs implies it's also getting more
> fragile. Could be, and likely is, just my impression due to some weird
> things I've seen lately in some testing.

Pipelining support as it stands is really ugly.

There is a point where we need to ask the question “is there more data available to read”, and if the answer is yes, we need to follow the pipeline path. We answer the question “is there more data available to read” by trying to read some data, and then in turn by tying ourselves in ugly knots trying to “unread” that data so the next request can work.

I want to fix this properly so that we don’t read data; we just get the answer “there is data to be read”. This will take a bit of thought, but doesn’t seem difficult to do.

Regards,
Graham
—


Re: Async write completion broken in trunk?

Posted by Jim Jagielski <ji...@jaguNET.com>.
> On Jan 18, 2017, at 8:25 AM, Luca Toscano <to...@gmail.com> wrote:
> 
> 
> 
> 2017-01-18 14:00 GMT+01:00 Jim Jagielski <ji...@jagunet.com>:
> 
> > On Jan 18, 2017, at 7:50 AM, Graham Leggett <mi...@sharp.fm> wrote:
> >
> > On 17 Jan 2017, at 7:40 PM, Luca Toscano <to...@gmail.com> wrote:
> >
> >> Since this email thread seems important, is there any update from anybody working on it? Otherwise it would be great to open a Bugzilla task, to track everything and make sure that we don't forget about it :)
> >
> > I looked at this a while back - I found that pipelining support is causing the blocking.
> >
> > When we’re handling pipelined requests and we reach a limit, we block. What we should be doing instead is looping round back to WRITE_COMPLETION, waiting for permission to write again. This should be reasonably straightforward to fix, but my financial year end is the end of this month so I can’t look at this till February.
> >
> > I suspect pipelining support has been suppressed in the v2.4.x event MPM, and was at some point “fixed” on trunk, bringing this problem back.
> >
> 
> Somewhat on-topic, but also somewhat off-topic: it does seem
> that event on trunk is getting much more fragile instead of more
> stable. It seems to be picking up some cruft which is making
> event on trunk a *worse* option than what's in 2.4, despite a deeper
> async alignment.
> 
> 
> My 2c view on this is that mpm-event has been getting better over the last few months, but the nature of the changes, together with the fact that we have not developed a good way to test for regressions, leads to long-running debugging tasks. In the case of the wake-ups, we are basically testing a big new feature thanks to live traffic from Stefan Priebe (thanks again for the help, btw!); meanwhile, it would be really great to have some sort of test to fake traffic and spot problems (which would absolutely not substitute for invaluable user feedback, of course).
> 

That's a good overview... Yeah, it might be getting better, but it does
seem that the nature of the bugs implies it's also getting more
fragile. Could be, and likely is, just my impression due to some weird
things I've seen lately in some testing.


Re: Async write completion broken in trunk?

Posted by Luca Toscano <to...@gmail.com>.
2017-01-18 14:00 GMT+01:00 Jim Jagielski <ji...@jagunet.com>:

>
> > On Jan 18, 2017, at 7:50 AM, Graham Leggett <mi...@sharp.fm> wrote:
> >
> > On 17 Jan 2017, at 7:40 PM, Luca Toscano <to...@gmail.com> wrote:
> >
> >> Since this email thread seems important, is there any update from
> >> anybody working on it? Otherwise it would be great to open a Bugzilla
> >> task, to track everything and make sure that we don't forget about it :)
> >
> > I looked at this a while back - I found that pipelining support is
> > causing the blocking.
> >
> > When we’re handling pipelined requests and we reach a limit, we block.
> > What we should be doing instead is looping round back to
> > WRITE_COMPLETION, waiting for permission to write again. This should be
> > reasonably straightforward to fix, but my financial year end is the end
> > of this month so I can’t look at this till February.
> >
> > I suspect pipelining support has been suppressed in the v2.4.x event
> > MPM, and was at some point “fixed” on trunk, bringing this problem back.
> >
>
> Somewhat on-topic, but also somewhat off-topic: it does seem
> that event on trunk is getting much more fragile instead of more
> stable. It seems to be picking up some cruft which is making
> event on trunk a *worse* option than what's in 2.4, despite a deeper
> async alignment.
>
>
My 2c view on this is that mpm-event has been getting better over the
last few months, but the nature of the changes, together with the fact
that we have not developed a good way to test for regressions, leads to
long-running debugging tasks. In the case of the wake-ups, we are
basically testing a big new feature thanks to live traffic from Stefan
Priebe (thanks again for the help, btw!); meanwhile, it would be really
great to have some sort of test to fake traffic and spot problems (which
would absolutely not substitute for invaluable user feedback, of course).
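
As a strawman for such a test: run many copies of a toy slow client like
the sketch below against trunk and 2.4, then compare the async connection
counts reported by mod_status with the number of busy workers (the URL,
port and timing here are made up for illustration):

    #include <string.h>
    #include <unistd.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    int main(void)
    {
        const char *req = "GET /large-file HTTP/1.1\r\n"
                          "Host: localhost\r\n"
                          "Connection: close\r\n\r\n";
        struct sockaddr_in sa = { 0 };
        char buf[128];
        int fd = socket(AF_INET, SOCK_STREAM, 0);

        sa.sin_family = AF_INET;
        sa.sin_port = htons(80);
        inet_pton(AF_INET, "127.0.0.1", &sa.sin_addr);
        if (fd < 0 || connect(fd, (struct sockaddr *)&sa, sizeof(sa)) != 0)
            return 1;
        (void) write(fd, req, strlen(req));

        /* Drain the response a few bytes at a time with a pause between
         * reads, so the server's send buffer stays full and the transfer
         * stays in flight for a long time. */
        while (read(fd, buf, sizeof(buf)) > 0)
            sleep(1);

        close(fd);
        return 0;
    }

With a healthy event MPM, most of these connections should show up in
write completion rather than each one pinning a worker thread.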

Luca

Re: Async write completion broken in trunk?

Posted by Jim Jagielski <ji...@jaguNET.com>.
> On Jan 18, 2017, at 8:14 AM, Graham Leggett <mi...@sharp.fm> wrote:
> 
> On 18 Jan 2017, at 3:05 PM, Jim Jagielski <ji...@jaguNET.com> wrote:
> 
>> Well, there's this, and it seems like there are issues w/
>> the current mutexing as well, causing weird wake-ups.
> 
> Only solution really is to find the bugs and fix them.
> 

Yep... I'm trying to do some stress testing.


Re: Async write completion broken in trunk?

Posted by Graham Leggett <mi...@sharp.fm>.
On 18 Jan 2017, at 3:05 PM, Jim Jagielski <ji...@jaguNET.com> wrote:

> Well, there's this, and it seems like there are issues w/
> the current mutexing as well, causing weird wake-ups.

Only solution really is to find the bugs and fix them.

Regards,
Graham
—


Re: Async write completion broken in trunk?

Posted by Jim Jagielski <ji...@jaguNET.com>.
> On Jan 18, 2017, at 8:01 AM, Graham Leggett <mi...@sharp.fm> wrote:
> 
> On 18 Jan 2017, at 3:00 PM, Jim Jagielski <ji...@jaguNET.com> wrote:
> 
>> Somewhat on-topic, but also somewhat off-topic: it does seem
>> that event on trunk is getting much more fragile instead of more
>> stable. It seems to be picking up some cruft which is making
>> event on trunk a *worse* option than what's in 2.4, despite a deeper
>> async alignment.
> 
> Can you give an example?
> 

Well, there's this, and it seems like there are issues w/
the current mutexing as well, causing weird wake-ups.


Re: Async write completion broken in trunk?

Posted by Graham Leggett <mi...@sharp.fm>.
On 18 Jan 2017, at 3:00 PM, Jim Jagielski <ji...@jaguNET.com> wrote:

> Somewhat on-topic, but also somewhat off-topic: it does seem
> that event on trunk is getting much more fragile instead of more
> stable. It seems to be picking up some cruft which is making
> event on trunk a *worse* option than what's in 2.4, despite a deeper
> async alignment.

Can you give an example?

Regards,
Graham
—


Re: Async write completion broken in trunk?

Posted by Jim Jagielski <ji...@jaguNET.com>.
> On Jan 18, 2017, at 7:50 AM, Graham Leggett <mi...@sharp.fm> wrote:
> 
> On 17 Jan 2017, at 7:40 PM, Luca Toscano <to...@gmail.com> wrote:
> 
>> Since this email thread seems important, is there any update from anybody working on it? Otherwise it would be great to open a Bugzilla task, to track everything and make sure that we don't forget about it :)
> 
> I looked at this a while back - I found that pipelining support is causing the blocking.
> 
> When we’re handling pipelined requests and we reach a limit, we block. What we should be doing instead is looping round back to WRITE_COMPLETION, waiting for permission to write again. This should be reasonably straightforward to fix, but my financial year end is the end of this month so I can’t look at this till February.
> 
> I suspect pipelining support has been suppressed in the v2.4.x event MPM, and was at some point “fixed” on trunk, bringing this problem back.
> 

Somewhat on-topic, but also somewhat off-topic: it does seem
that event on trunk is getting much more fragile instead of more
stable. It seems to be picking up some cruft which is making
event on trunk a *worse* option than what's in 2.4, despite a deeper
async alignment.


Re: Async write completion broken in trunk?

Posted by Graham Leggett <mi...@sharp.fm>.
On 17 Jan 2017, at 7:40 PM, Luca Toscano <to...@gmail.com> wrote:

> Since this email thread seems important, is there any update from anybody working on it? Otherwise it would be great to open a Bugzilla task, to track everything and make sure that we don't forget about it :)

I looked at this a while back - I found that pipelining support is causing the blocking.

When we’re handling pipelined requests and we reach a limit, we block. What we should be doing instead is looping round back to WRITE_COMPLETION, waiting for permission to write again. This should be reasonably straightforward to fix, but my financial year end is the end of this month so I can’t look at this till February.
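
In event MPM terms, the fix would look something like the sketch below (a
hypothetical helper, not the actual patch): when a send would block, park
the connection in write-completion state and yield the worker, instead of
poll()ing inside the filter the way send_brigade_blocking() does in the
stack traces earlier in this thread.

    #include "httpd.h"

    /* Hypothetical helper: when a send would block, hand the connection
     * back to an async MPM instead of blocking in the filter. */
    static apr_status_t yield_to_write_completion(conn_rec *c)
    {
        conn_state_t *cs = c->cs;

        if (cs == NULL) {
            /* Sync MPM: no event loop to hand back to; the caller must
             * fall back to the blocking poll-and-retry path. */
            return APR_ENOTIMPL;
        }
        /* Event MPM: ask the listener to poll the socket for writability
         * and re-enter write completion when it becomes ready. */
        cs->state = CONN_STATE_WRITE_COMPLETION;
        cs->sense = CONN_SENSE_WANT_WRITE;
        return APR_SUCCESS;
    }

The pipelining limit path would then loop back through this instead of
blocking.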

I suspect pipelining support has been suppressed in the v2.4.x event MPM, and was at some point “fixed” on trunk, bringing this problem back.

Regards,
Graham
—