You are viewing a plain text version of this content. The canonical link for it is here.
Posted to rivet-dev@tcl.apache.org by bu...@apache.org on 2013/07/02 10:37:51 UTC
[Bug 55153] fileevents do not fire
https://issues.apache.org/bugzilla/show_bug.cgi?id=55153
--- Comment #1 from Harald Oehlmann <ha...@elmicron.de> ---
I asked on clt for help:
https://groups.google.com/forum/#!topic/comp.lang.tcl/Ja01kv75tVk
Thread name: "Async sockets hang under Rivet but work in tclsh 8.6"
Start Date: 27.6.2013
Here is the essence:
- CentOS 6.2 64 bit
- /usr/sbin/httpd -V
Server version: Apache/2.2.15 (Unix)
Server built: May 13 2013 22:11:16
Server's Module Magic Number: 20051115:25
Server loaded: APR 1.3.9, APR-Util 1.3.9
Compiled using: APR 1.3.9, APR-Util 1.3.9
Architecture: 64-bit
Server MPM: Prefork
threaded: no
forked: yes (variable process count)
- Apache was configured to only have one thread (/etc/httpd/conf/httpd.conf:
<IfModule prefork.c>
StartServers 1
MinSpareServers 1
MaxSpareServers 20
ServerLimit 256
MaxClients 256
MaxRequestsPerChild 4000
</IfModule>
)
- Test file "test_file_async.tcl" in "/var/www/html". File contents as in the
original bug.
- Start http: service httpd start
- start strace:
ps awux | grep sbin/httpd | awk '{print"-p " $2}' | xargs strace -tt -f -o
straceall.txt
- the resulting trace file is attached to the bug.
Here are the comments by Alexandre Ferieux to the strace log from clt:
OK here is what I can see:
(1) open the file, and wake up the notifier (by writing a single byte to the
pipe) thread so that it takes the new fd into account for its select() mask:
> 29969 12:04:52.368364 open("/usr/tmp/test.txt", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 21
> ...
> 29969 12:04:52.368806 write(17, "\0", 1) = 1
(2) In the notifier: detect that activity on the control pipe...
> 29965 12:04:52.368843 <... select resumed> ) = 1 (in [16])
> 29965 12:04:52.368889 read(16, <unfinished ...>
> 29965 12:04:52.368922 <... read resumed> "\0", 1) = 1
(3) ... and normally restart a select with the new fd (21) in the write mask,
but here, FAIL TO DO SO:
> 29965 12:04:52.368963 select(17, [16], [], [], NULL <unfinished ...>
(4) So, sleep stupidly (only monitoring the control pipe) until the timeout
expires:
> 29962 12:04:53.218193 <... select resumed> ) = 0 (Timeout)
I ran your script in vanilla Tcl (which works, as we already know), and
verified that the difference is exactly this (ignore the difference in fds):
2297 17:31:15.800767 open("/tmp/test.txt",
O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 5
2298 17:31:15.803090 <... read resumed> "\0", 1) = 1
2298 17:31:15.803165 select(6, [3], [5], [], NULL) = 1 (out [5])
So here, somehow the communication with the notifier is incomplete: we do get
the control signal through ("hey, new mask to recompute"), but the
recomputation miserably fails.
Next step, I guess, is to look in detail at the data structures that are shared
between the caller and notifier. Perhaps the maintainer knows how the Rivet
environment is different in that respect ?
--- Comment 2 ---
> 29965 12:04:52.368963 select(17, [16], [], [], NULL <unfinished ...>
> ...
> 29962 12:04:53.218193 <... select resumed> ) = 0 (Timeout)
Hmmm, just noticed the different thread IDs: here we have a dance of 3 threads
!
- 29969 is your main thread
- 29965 is the notifier
- 29962 is a mysterious other thread that does a 1-second periodic task
Apart from the (remote) similarity between this 1-second period and your [after
1000], this does not really match what we see in the script. Definitely
something to be explained by a Rivet-savvy person...
-Alex
--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: rivet-dev-unsubscribe@tcl.apache.org
For additional commands, e-mail: rivet-dev-help@tcl.apache.org