You are viewing a plain text version of this content. The canonical link for it is here.
Posted to rivet-dev@tcl.apache.org by bu...@apache.org on 2013/07/02 10:37:51 UTC

[Bug 55153] fileevents do not fire

https://issues.apache.org/bugzilla/show_bug.cgi?id=55153

--- Comment #1 from Harald Oehlmann <ha...@elmicron.de> ---
I asked on clt for help:
https://groups.google.com/forum/#!topic/comp.lang.tcl/Ja01kv75tVk
Thread name: "Async sockets hang under Rivet but work in tclsh 8.6"
Start Date: 27.6.2013

Here is the essence:
- CentOS 6.2 64 bit
- /usr/sbin/httpd -V
Server version: Apache/2.2.15 (Unix)
Server built:   May 13 2013 22:11:16
Server's Module Magic Number: 20051115:25
Server loaded:  APR 1.3.9, APR-Util 1.3.9
Compiled using: APR 1.3.9, APR-Util 1.3.9
Architecture:   64-bit
Server MPM:     Prefork
  threaded:     no
    forked:     yes (variable process count)
- Apache was configured to only have one thread (/etc/httpd/conf/httpd.conf:
<IfModule prefork.c>
StartServers       1
MinSpareServers    1
MaxSpareServers   20
ServerLimit      256
MaxClients       256
MaxRequestsPerChild  4000
</IfModule>
)

- Test file "test_file_async.tcl" in "/var/www/html". File contents as in the
original bug.

- Start http: service httpd start
- start strace:
  ps awux | grep sbin/httpd | awk '{print"-p " $2}' | xargs strace -tt -f -o
straceall.txt 
- the resulting trace file is attached to the bug.

Here are the comments by Alexandre Ferieux to the strace log from clt:

OK here is what I can see:

(1) open the file, and wake up the notifier (by writing a single byte to the
pipe) thread so that it takes the new fd into account for its select() mask:

> 29969 12:04:52.368364 open("/usr/tmp/test.txt", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 21
> ...
> 29969 12:04:52.368806 write(17, "\0", 1) = 1

(2) In the notifier: detect that activity on the control pipe...

> 29965 12:04:52.368843 <... select resumed> ) = 1 (in [16])
> 29965 12:04:52.368889 read(16,  <unfinished ...>
> 29965 12:04:52.368922 <... read resumed> "\0", 1) = 1

(3) ... and normally restart a select with the new fd (21) in the write mask,
but here, FAIL TO DO SO:

> 29965 12:04:52.368963 select(17, [16], [], [], NULL <unfinished ...>

(4) So, sleep stupidly (only monitoring the control pipe) until the timeout
expires:

> 29962 12:04:53.218193 <... select resumed> ) = 0 (Timeout)


I ran your script in vanilla Tcl (which works, as we already know), and
verified that the difference is exactly this (ignore the difference in fds):

2297  17:31:15.800767 open("/tmp/test.txt",
O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 5
2298  17:31:15.803090 <... read resumed> "\0", 1) = 1
2298  17:31:15.803165 select(6, [3], [5], [], NULL) = 1 (out [5])

So here, somehow the communication with the notifier is incomplete: we do get
the control signal through ("hey, new mask to recompute"), but the
recomputation miserably fails.

Next step, I guess, is to look in detail at the data structures that are shared
between the caller and notifier. Perhaps the maintainer knows how the Rivet
environment is different in that respect ?

--- Comment 2 ---

> 29965 12:04:52.368963 select(17, [16], [], [], NULL <unfinished ...>
> ...
> 29962 12:04:53.218193 <... select resumed> ) = 0 (Timeout)

Hmmm, just noticed the different thread IDs: here we have a dance of 3 threads
!

 - 29969 is your main thread
 - 29965 is the notifier
 - 29962 is a mysterious other thread that does a 1-second periodic task

Apart from the (remote) similarity between this 1-second period and your [after
1000], this does not really match what we see in the script. Definitely
something to be explained by a Rivet-savvy person...

-Alex

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: rivet-dev-unsubscribe@tcl.apache.org
For additional commands, e-mail: rivet-dev-help@tcl.apache.org