Posted to dev@httpd.apache.org by Shail Bhatnagar <sh...@cisco.com> on 2001/08/06 18:55:19 UTC

Spurious return from select()

I have observed that select() sometimes
returns a positive value to one or more processes,
but only one of them is able to read the UDP data.

In this case the server has been modified
to listen on a well-known UDP port.

Does anybody have a clue? The errno returned
by recvfrom() is 11 (EAGAIN) - resource temporarily unavailable.

Secondly, is there a known crash on Solaris when
APR_HAS_THREADS is on? This is about httpd 2.0.16 beta.

Thanks,
Shail

Re: Spurious return from select()

Posted by Shail Bhatnagar <sh...@cisco.com>.
Jeff, fair enough. So the accept() actually makes the
well-known socket "not readable", and the other worker httpd
processes do not get "false" returns from select().

The solution would be to do the recvfrom() before releasing
the mutex.

Thanks,
Shail


Re: Spurious return from select()

Posted by Jeff Trawick <tr...@attglobal.net>.
Shail Bhatnagar <sh...@cisco.com> writes:

> Jeff, the recvfrom() does not take place until the process_connection()
> hook is invoked. This is consistent with the Apache code, I think. So one
> would think that vanilla httpd processes (without my changes) encounter
> the same problem? After the mutex is released and before the read/recv is
> performed, there is a window of opportunity in which other processes can
> enter select(), and it will immediately return a positive value,
> indicating that the socket is readable. One has to set the O_NDELAY flag
> on these descriptors to work around this problem.
>
> Comments?

recvfrom() on your UDP socket is analogous to the accept() done on the
Apache listening socket in a couple of respects, one of which is the
locking requirements.

You want to allow only one select() pop for a given datagram (the
mutex takes care of this) *AND* the thread for which select() popped
must call recvfrom() before releasing the mutex; otherwise, as soon as
you release the mutex, select() can pop for another thread.
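
For illustration, the locking rule described above could be sketched roughly as follows. This is not the httpd child_main() code: the names set_lock() and locked_recv() are hypothetical, and an fcntl() whole-file lock is assumed here as a stand-in for the cross-process accept mutex. The point is only the ordering: lock, select(), recvfrom(), and only then unlock.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: acquire (F_WRLCK) or release (F_UNLCK) a
 * whole-file fcntl() lock. It stands in for the accept mutex and
 * serializes processes, not threads within one process. */
static int set_lock(int lock_fd, short type)
{
    struct flock fl = {0};   /* l_start = 0, l_len = 0: whole file */
    int rc;

    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    do {
        rc = fcntl(lock_fd, F_SETLKW, &fl);
    } while (rc == -1 && errno == EINTR);
    return rc;
}

/* Wait for one datagram and read it while still holding the lock.
 * Returns the datagram length, or -1 on error. */
ssize_t locked_recv(int sock, int lock_fd, void *buf, size_t len)
{
    fd_set rfds;
    ssize_t n = -1;
    int saved_errno;

    if (set_lock(lock_fd, F_WRLCK) == -1)
        return -1;

    FD_ZERO(&rfds);
    FD_SET(sock, &rfds);
    if (select(sock + 1, &rfds, NULL, NULL, NULL) > 0)
        n = recvfrom(sock, buf, len, 0, NULL, NULL);

    /* Release only after the datagram has been consumed, so the next
     * process to enter select() does not see the same datagram. */
    saved_errno = errno;
    set_lock(lock_fd, F_UNLCK);
    errno = saved_errno;
    return n;
}
```

Each worker process would call locked_recv() in its loop with the shared socket and a descriptor for a common lock file; the request is then processed after the function returns, outside the lock.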

You can teach the code to recover if a thread is awakened and doesn't
get a datagram, but performance will suffer due to the extra dispatches.

-- 
Jeff Trawick | trawick@attglobal.net | PGP public key at web site:
       http://www.geocities.com/SiliconValley/Park/9289/
             Born in Roswell... married an alien...

Re: Spurious return from select()

Posted by Shail Bhatnagar <sh...@cisco.com>.
Jeff, the recvfrom() does not take place until the process_connection()
hook is invoked. This is consistent with the Apache code, I think. So one
would think that vanilla httpd processes (without my changes) encounter
the same problem? After the mutex is released and before the read/recv is
performed, there is a window of opportunity in which other processes can
enter select(), and it will immediately return a positive value,
indicating that the socket is readable. One has to set the O_NDELAY flag
on these descriptors to work around this problem.

Comments?

Thanks,
Shail



Re: Spurious return from select()

Posted by Jeff Trawick <tr...@attglobal.net>.
Shail Bhatnagar <sh...@cisco.com> writes:

> Jeff, thanks for your response. I am
> using the standard child_main() loop,
> in which select() is protected by the
> mutex. The only difference is that the
> parent is bound to a well-known UDP
> port, so all children are monitoring
> this well-known port. Despite the mutex,
> I see this behavior fairly consistently.
> It is more frequent on Solaris than on Linux.

you are calling recvfrom() while still holding the accept mutex,
right?

no other idea...

> The crash that I saw was in apr_pool_alloc_init().
> apr_lock_create() failed, although there were
> no permission problems, and then apr_lock_destroy()
> crashed while accessing a NULL pointer.
> 
> The relevant code fragment in apr_pool_alloc_init() is:
> #if APR_HAS_THREADS
>     status = apr_lock_create(&alloc_mutex, APR_MUTEX, APR_INTRAPROCESS,
>                    NULL, globalp);
>     if (status != APR_SUCCESS) {
>         apr_lock_destroy(alloc_mutex);
>         return status;
>     }

Well, clearly apr_lock_destroy() shouldn't be called if the mutex
wasn't successfully created... I'll commit a fix in a jiffy.
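
The general shape of such a fix might look like the sketch below. The actual commit is not shown in this thread, so this is only an assumption about the pattern: demo_lock, demo_lock_create(), demo_lock_destroy() and demo_alloc_init() are hypothetical stand-ins written for this example, not the APR functions.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a lock object; not the APR type. */
typedef struct demo_lock { int held; } demo_lock;

/* Creates a lock, or fails on request so the error path can be tried.
 * On failure *out is left NULL, as in the crash described above. */
int demo_lock_create(demo_lock **out, int simulate_failure)
{
    *out = NULL;
    if (simulate_failure)
        return -1;
    *out = calloc(1, sizeof **out);
    return *out ? 0 : -1;
}

/* Dereferences its argument, so passing NULL would crash. */
void demo_lock_destroy(demo_lock *lock)
{
    lock->held = 0;
    free(lock);
}

static demo_lock *alloc_mutex;

/* Initialization with the corrected error path. */
int demo_alloc_init(int simulate_failure)
{
    int status = demo_lock_create(&alloc_mutex, simulate_failure);

    if (status != 0) {
        /* Creation failed, so there is nothing to destroy:
         * just report the error. */
        return status;
    }
    return 0;
}
```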

The question most interesting to me is why this apr_lock_create() is
failing for you and for no one else...  (I guess it just happened
once?)

If you can manage to recreate it, please run truss/strace on the program
so we can see which syscall is failing, and with what errno.


Re: Spurious return from select()

Posted by Shail Bhatnagar <sh...@cisco.com>.
Jeff, thanks for your response. I am
using the standard child_main() loop,
in which select() is protected by the
mutex. The only difference is that the
parent is bound to a well-known UDP
port, so all children are monitoring
this well-known port. Despite the mutex,
I see this behavior fairly consistently.
It is more frequent on Solaris than on Linux.

The crash that I saw was in apr_pool_alloc_init().
apr_lock_create() failed, although there were
no permission problems, and then apr_lock_destroy()
crashed while accessing a NULL pointer.

The relevant code fragment in apr_pool_alloc_init() is:
#if APR_HAS_THREADS
    status = apr_lock_create(&alloc_mutex, APR_MUTEX, APR_INTRAPROCESS,
                   NULL, globalp);
    if (status != APR_SUCCESS) {
        apr_lock_destroy(alloc_mutex);
        return status;
    }


Thanks,
Shail



Re: Spurious return from select()

Posted by Jeff Trawick <tr...@attglobal.net>.
Shail Bhatnagar <sh...@cisco.com> writes:

> I have observed that select() sometimes
> returns a positive value to one or more processes,
> but only one of them is able to read the UDP data.

On every system I know of, select() wakes up every process/thread
selecting on the same descriptor (e.g., UDP socket) when the condition
is met (e.g., a datagram is ready to read).

You'll want to use a mutex to ensure that only one process is in the 
select()+recvfrom() path at a time.  Otherwise, the extra wakeups will
hurt performance.

But Apache already does this when there are multiple listening (TCP)
sockets.  Take advantage of that mutex.

> In this case the server has been modified
> to listen on a well-known UDP port.
> 
> Does anybody have a clue? The errno returned
> by recvfrom() is 11 (EAGAIN) - resource temporarily unavailable.
> 
> Secondly, is there a known crash on Solaris when
> APR_HAS_THREADS is on? This is about httpd 2.0.16 beta.

Justin Erenkrantz fixed a gethostbyname() issue with a threaded build
on Solaris < 8.  You may wish to try the latest code from CVS.
