You are viewing a plain text version of this content. The canonical link for it is here.
Posted to bugs@httpd.apache.org by bu...@apache.org on 2005/09/13 14:11:10 UTC

DO NOT REPLY [Bug 36636] New: - blocking writev()/poll()-call causes mod_dav to stop responding

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG�
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=36636>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND�
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=36636

           Summary: blocking writev()/poll()-call causes mod_dav to stop
                    responding
           Product: Apache httpd-2.0
           Version: 2.0.54
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: major
          Priority: P2
         Component: All
        AssignedTo: bugs@httpd.apache.org
        ReportedBy: arne@arne.net


After load on our production mod_dav server increased, we saw sporadic
non-functioning of DAV requests (PROPFIND especially), while GET requests
would still work. Each DAV request would hereafter block a child. This would
sometimes lead to effective DoS due to the
continous stream of new incoming requests blocking more and more children.
After long investigation of the problem, I am able to reproduce the bug, but
only using low-level
flowreplay (application layer tcpdump replay tool, part of tcpreplay).
I think the bug is caused by a blocking poll()-call (similar to bug 22030) after
a writev() call due to (missing TCP ACKs|some other network problem)
The PROPFIND causing it requests a large amount of data, so poll() blocks
several times [Timeout] sec.
Before blocking, this child has allocated a filesystem lock

10:53:01.096934 [ffffe410] fcntl64(9, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET,
start=0, len=0}) = 0 <0.000010>

on /tmp/DavLock.dir, which is the file the other children (new DAV requests)
require a lock for, too, hence blocking in the lock call:

10:53:01.370153 [ffffe410] fcntl64(9, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET,
start=0, len=0}) = 0 <716.870032>

The GET requests still work (when enough non-blocked children are still
available) since they do not require the /tmp/DavLock.dir fs-lock.

The part where the offending child is blocking poll() is straced here:

[...]
10:53:01.216048 [ffffe410] open("/srv/www/webdav/..........", O_RDONLY) = -1
ENOENT (No such file or directory) <0.000014>
10:53:01.216220 [ffffe410] writev(8, [{"1f5e\r\n", 6}, {"<insert large amount of
data here>"}, {"\r\n", 2}], 4) = 2487 <0.000011>
10:53:01.216489 [ffffe410] poll([{fd=8, events=POLLOUT}], 1, 300000) = 0
<299.956071>
10:58:01.172688 [ffffe410] open("/srv/www/webdav/..........", O_RDONLY) = -1
ENOENT (No such file or directory) <0.000023>
[...]
10:58:01.174614 [ffffe410] writev(8, [{"1f60\r\n", 6}, {"<insert large amount of
data here>"}, {"\r\n", 2}], 4) = -1 EAGAIN <0.000013>
10:58:01.174957 [ffffe410] poll([{fd=8, events=POLLOUT}], 1, 300000) = 0
<299.844216>
11:03:01.019320 [ffffe410] open("/src/www/webdav/..........", O_RDONLY) = -1
ENOENT (No such file or directory) <0.000015>
[...]
11:03:01.021422 [ffffe410] writev(8, [{"1f67\r\n", 6}, {"<insert large amount of
data here>"}, {"\r\n", 2}], 4) = -1 EAGAIN <0.000012>
11:03:01.021711 [ffffe410] poll([{fd=8, events=POLLOUT,
revents=POLLERR|POLLHUP}], 1, 300000) = 1 <113.248743>
11:04:54.270509 [ffffe410] writev(8, [{"1f67\r\n", 6}, {"<insert large amount of
data here>"}, {"\r\n", 2}], 4) = -1 EPIPE (Broken pipe) <0.000016>
11:04:54.270812 [ffffe410] --- SIGPIPE (Broken pipe) @ 0 (0) ---
[...]

After this last blocking poll(), all writev()s are repeated similar to the last
one ("1f67"...) resulting in EPIPE and the server
starts working again.

A backtrace on this process during a blocking poll() gives:

#0  0xffffe410 in ?? ()
#1  0xbfffc7d8 in ?? ()
#2  0x000493e0 in ?? ()
#3  0x00000001 in ?? ()
#4  0x404207bd in poll () from /lib/tls/libc.so.6
#5  0x402e9aad in apr_poll (aprset=0xbfffc860, num=1, nsds=0xbfffc85c,
timeout=Variable "timeout" is not available.
    ) at poll.c:130
#6  0x402e9e9f in apr_wait_for_io_or_timeout (f=Variable "f" is not available.
    ) at waitio.c:54
#7  0x402dfbee in apr_socket_sendv (sock=0x81bdb80, vec=0xbfffea58, nvec=3,
len=0xbfffc908) at sendrecv.c:208
#8  0x402dfcf6 in apr_sendv (sock=0x81bdb80, vec=0xbfffea58, nvec=3,
len=0xbfffc908) at sendrecv.c:991
#9  0x0807850a in writev_it_all (s=0x81bdb80, vec=0x493e1, nvec=1, len=8038,
nbytes=0xbfffc988) at core.c:2914
#10 0x08078fed in core_output_filter (f=0x81bdff8, b=0x81c7c58) at core.c:4247
#11 0x0807236b in ap_pass_brigade (next=0x493e1, bb=0x1) at util_filter.c:512
#12 0x08060ca8 in chunk_filter (f=0x81d9a90, b=0x81c7c58) at http_core.c:218
#13 0x0807236b in ap_pass_brigade (next=0x493e1, bb=0x1) at util_filter.c:512
#14 0x08075653 in ap_content_length_filter (f=0x81cdc60, b=0x81c7c58) at
protocol.c:1232
#15 0x0807236b in ap_pass_brigade (next=0x493e1, bb=0x1) at util_filter.c:512
#16 0x4002f7bb in apr_brigade_write (b=0x81c7c58, flush=0x80723a0
<ap_filter_flush>, ctx=0x81cdc60,
    str=0x82580a8 "<lp1:getetag>\"2229f-1e5e-3f81de40\"</lp1:getetag>\n",
nbyte=49) at apr_brigade.c:393
#17 0x4002fa28 in apr_brigade_puts (bb=0x81c7c58, flush=0x80723a0
<ap_filter_flush>, ctx=0x81cdc60,
    str=0x82580a8 "<lp1:getetag>\"2229f-1e5e-3f81de40\"</lp1:getetag>\n") at
apr_brigade.c:569
#18 0x404c2be4 in dav_send_one_response (response=Variable "response" is not
available.
    ) at mod_dav.c:464
#19 0x404c2d73 in dav_stream_response (wres=Variable "wres" is not available.
    ) at mod_dav.c:1100
#20 0x404c439a in dav_propfind_walker (wres=0xbfffeeb4, calltype=1) at
mod_dav.c:1917
#21 0x404d7cb8 in dav_fs_walker (fsctx=Variable "fsctx" is not available.
    ) at repos.c:1485
#22 0x404d826e in dav_fs_internal_walk (params=Variable "params" is not available.
    ) at repos.c:1729
#23 0x404c7dd2 in dav_handler (r=0x81ccff0) at mod_dav.c:2045
#24 0x080674eb in ap_run_handler (r=0x81ccff0) at config.c:152
#25 0x08069f45 in ap_invoke_handler (r=0x81ccff0) at config.c:364
#26 0x080656ef in ap_process_request (r=0x81ccff0) at http_request.c:249
#27 0x08060e79 in ap_process_http_connection (c=0x81bdc58) at http_core.c:251
#28 0x080702fb in ap_run_process_connection (c=0x81bdc58) at connection.c:43
#29 0x080665f8 in child_main (child_num_arg=Variable "child_num_arg" is not
available.
    ) at prefork.c:610
#30 0x08066781 in make_child (s=Variable "s" is not available.
    ) at prefork.c:704
#31 0x08066811 in startup_children (number_to_start=3) at prefork.c:722
#32 0x08066edd in ap_mpm_run (_pconf=0x809c0a8, plog=0x80c6150, s=0x809de58) at
prefork.c:941
#33 0x0806bffa in main (argc=3, argv=0xbffff504) at main.c:618

Please let me know if you need more information. I am sorry not being able to
provide an easy way to reproduce the bug, since it seems
specific to network effects. I can provide the .pcap dumps and part of the data
used in the DAV, or, alternatively, test the
bug with patched httpd versions.

Thank you for your efforts!

Regards,
Arne Voigtlaender

System-Info:
------------
Tested on:Linux 2.6.7, 2.6.13 (Distro: SuSE SLES 9)
          httpd 2.0.49, 2.0.53, 2.0.54
Hardware: 2x Xeon 3.4GHz, 5GB RAM, 8GB HDD,
          2x Broadcom Corporation NetXtreme BCM5704S Gigabit Ethernet
          (IBM Blade)

References:
-----------
http://archives.free.net.ph/message/20050909.014740.64bfea69.en.html
http://issues.apache.org/bugzilla/show_bug.cgi?id=22030

-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org