Posted to dev@apr.apache.org by Paul Querna <ch...@force-elite.com> on 2007/01/17 04:41:57 UTC
Bug in epoll
Hey All,
I am observing a bug in apr_pollset_poll.
What I am seeing is this:
1) About 140 sockets to different machines are added to the pollset in
apr_memcache_multgetp(), watching for read availability.
2) Write memcache requests to the servers
3) Start _poll()'ing for data:
a) The first couple sockets come back within a few milliseconds, and
are read correctly.
b) The next time apr_pollset_poll is called, it does return, but only
a SINGLE socket is marked as available, and it waits to within 1
millisecond of the TIMEOUT value. This single socket is read correctly.
c) The next time apr_pollset_poll is called, it behaves like normal,
returning multiple results, in a very short time period.
The pattern of a,b,c sometimes repeats multiple times before all of the
data has been received from the servers.
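To see the a/b/c pattern outside APR, one could log, for every wait call, how many descriptors came back and how long the call blocked. The following is a minimal sketch under stated assumptions. It uses raw Linux epoll on pipes instead of apr_pollset_poll() on memcache sockets. The names run_poll_loop, elapsed_ms and NPIPES are hypothetical and are not APR functions. On a kernel without the suspected problem, no call should block for anywhere near timeout_ms while a descriptor is readable. Because every pipe is made readable before the first wait, this covers only the simplest case. Replies that arrive while epoll_wait() is already blocked would need a second writer thread or process.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/epoll.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for the ~140 memcache sockets (pipes, not TCP). */
#define NPIPES 140

/* Hypothetical helper: whole milliseconds between two monotonic timestamps. */
static long elapsed_ms(const struct timespec *a, const struct timespec *b)
{
    return (long)(b->tv_sec - a->tv_sec) * 1000L
         + (b->tv_nsec - a->tv_nsec) / 1000000L;
}

/* Hypothetical diagnostic, not part of APR: registers NPIPES pipe read ends
 * for EPOLLIN, makes each one readable (the "reply"), then calls
 * epoll_wait() until every byte has been read, logging how many descriptors
 * each call returned and how long it blocked. Returns the slowest call in
 * ms, or -1 on error or on a timeout while data was still pending. */
static long run_poll_loop(int timeout_ms, int *calls)
{
    int fds[NPIPES][2];
    int opened = 0, remaining = NPIPES;
    long worst = -1;
    int epfd = epoll_create(NPIPES);

    *calls = 0;
    if (epfd < 0)
        return -1;

    while (opened < NPIPES) {
        struct epoll_event ev = { .events = EPOLLIN,
                                  .data.u32 = (uint32_t)opened };
        if (pipe(fds[opened]) < 0)
            goto out;
        opened++; /* cleanup below now owns this pair */
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[opened - 1][0], &ev) < 0
            || write(fds[opened - 1][1], "x", 1) != 1)
            goto out;
    }

    worst = 0;
    while (remaining > 0) {
        struct epoll_event evs[NPIPES];
        struct timespec t0, t1;
        long ms;
        int n, j;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        n = epoll_wait(epfd, evs, NPIPES, timeout_ms);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0) { /* error, or a real timeout with data still pending */
            worst = -1;
            goto out;
        }
        ms = elapsed_ms(&t0, &t1);
        if (ms > worst)
            worst = ms;
        printf("call %d: %d ready after %ld ms\n", ++*calls, n, ms);
        for (j = 0; j < n; j++) {
            char c;
            if (read(fds[evs[j].data.u32][0], &c, 1) != 1) {
                worst = -1;
                goto out;
            }
            remaining--;
        }
    }

out:
    for (int i = 0; i < opened; i++) {
        close(fds[i][0]);
        close(fds[i][1]);
    }
    close(epfd);
    return worst;
}
```

Calling run_poll_loop(1000, &calls) should report a worst case of a few milliseconds at most. A value near 1000 would match step b) above.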
Other notes:
- This is in a single threaded client, so there is no cross locking of
the linked lists from _add or _remove in the pollset.
- OS is RHEL 4 update 2.
- This is 99.9% reproducible in a large-scale test and production
environment.
The most interesting aspect to me is that if I compile APR using poll()
instead of epoll() as the apr_pollset backend, the exact same code works
great, with no extra delay. (just pass apr_cv_epoll=no to your
./configure line).
I googled'^H^H^H^H^H^Hsearched around, and wasn't able to find mention
of a bug like this.
To me, the non-kernel programmer, it looks like epoll is only getting
triggered on the wakeup timer for the timeout, and not returning
instantly when it has found a socket available for read. When it
finally does hit the timeout wakeup, it does notice that there is a
socket available to read, and returns it, rather than an actual timeout.
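One way to probe this hypothesis on the affected kernel is to make a descriptor readable first, then time a single wait on it with both backends. This mirrors the epoll() versus poll() comparison above. The following is a hedged sketch, not APR code. It uses a pipe rather than a socket, and ready_latency_ms and ms_since are hypothetical names. A single pre-readable descriptor may well not trigger the problem, so a fast result here does not rule it out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <poll.h>
#include <sys/epoll.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: whole milliseconds elapsed since *start. */
static long ms_since(const struct timespec *start)
{
    struct timespec now;

    clock_gettime(CLOCK_MONOTONIC, &now);
    return (long)(now.tv_sec - start->tv_sec) * 1000L
         + (now.tv_nsec - start->tv_nsec) / 1000000L;
}

/* Hypothetical check, not part of APR: makes a pipe readable *before*
 * waiting, then times one wait on its read end using epoll_wait()
 * (use_epoll != 0) or poll() (use_epoll == 0). A healthy kernel returns
 * almost at once; the suspected behaviour would show up as a value close
 * to timeout_ms. Returns -1 if the wait failed or did not report the pipe. */
static long ready_latency_ms(int use_epoll, int timeout_ms)
{
    struct timespec start;
    int p[2], ready = 0;
    long ms = -1;

    if (pipe(p) < 0)
        return -1;
    if (write(p[1], "x", 1) != 1)
        goto out;

    if (use_epoll) {
        struct epoll_event ev = { .events = EPOLLIN }, got;
        int epfd = epoll_create(1);

        if (epfd < 0)
            goto out;
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0) {
            clock_gettime(CLOCK_MONOTONIC, &start);
            ready = epoll_wait(epfd, &got, 1, timeout_ms) == 1
                    && (got.events & EPOLLIN);
            ms = ms_since(&start);
        }
        close(epfd);
    } else {
        struct pollfd pfd = { .fd = p[0], .events = POLLIN };

        clock_gettime(CLOCK_MONOTONIC, &start);
        ready = poll(&pfd, 1, timeout_ms) == 1 && (pfd.revents & POLLIN);
        ms = ms_since(&start);
    }
    if (!ready)
        ms = -1;
out:
    close(p[0]);
    close(p[1]);
    return ms;
}
```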
For the short term, I am satisfied with disabling epoll on my builds of
APR. I think we should consider disabling epoll by default on APR, if I
can isolate the bug to a kernel revision.
Any ideas or pointers to epoll bugs|fixes would be great.....
-Paul
Re: Bug in epoll
Posted by Davi Arnaut <da...@haxent.com.br>.
Paul Querna wrote:
> Hey All,
>
> I am observing a bug in apr_pollset_poll.
>
..
>
> To me, the non-kernel programmer, it looks like epoll is only getting
> triggered on the wakeup timer for the timeout, and not returning
> instantly when it has found a socket available for read. When it
> finally does hit the timeout wakeup, it does notice that there is a
> socket available to read, and returns it, rather than an actual timeout.
>
> For the short term, I am satisfied with disabling epoll on my builds of
> APR. I think we should consider disabling epoll by default on APR, if I
> can isolate the bug to a kernel revision.
>
epoll code hasn't changed much in ages. I would probably blame
apr_memcache code :)
Example:
apr_memcache_multgetp():
..
rv = apr_pollset_create(&pollset, apr_hash_count(server_queries),
temp_pool, 0);
..
..
apr_pool_clear(temp_pool);
apr_pollset_destroy(pollset);
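Here the pollset is created in temp_pool, but temp_pool is cleared before
apr_pollset_destroy() is called on it. Also, there are a lot of calls whose
return values are not checked.
Anyway, better double-check the apr_memcache code. A full strace log
should help shed some light on the issue.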
--
Davi Arnaut