Posted to apache-bugdb@apache.org by Brian Moore <be...@cmc.net> on 1997/04/13 21:00:02 UTC

mod_proxy/374: mod_proxy(?) seems to alarm(0) somewhere

>Number:         374
>Category:       mod_proxy
>Synopsis:       mod_proxy(?) seems to alarm(0) somewhere
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    apache (Apache HTTP Project)
>State:          open
>Class:          sw-bug
>Submitter-Id:   apache
>Arrival-Date:   Sun Apr 13 12:00:01 1997
>Originator:     bem@cmc.net
>Organization:
apache
>Release:        1.2b8
>Environment:
Solaris 2.5, all recommended patches, gcc 2.7.2
>Description:
Looks like there's one other problem in mod_proxy with alarms being turned off
(not blocked via the block_alarms() call, but alarm(0)'d for some reason).  I'm
guessing on the module involved, since the three dead children this morning
were all doing proxy stuff.

The backtrace of a child that's been waiting for 110k seconds:
#0  0xef67792c in _read ()
#1  0x29364 in saferead ()
#2  0x29480 in bread ()
#3  0x488b0 in proxy_send_fb ()
#4  0x47e78 in proxy_http_handler ()
#5  0x432c0 in proxy_handler ()
#6  0x1f040 in invoke_handler ()
#7  0x21dc0 in process_request_internal ()
#8  0x21df4 in process_request ()
#9  0x1bf30 in child_main ()
#10 0x1c0cc in make_child ()
#11 0x1c8c8 in standalone_main ()
#12 0x1cb88 in main ()
(gdb) up
#1  0x29364 in saferead ()
(gdb) print alarms_blocked
$1 = 0

So this seems to be something calling alarm(0) somewhere instead of a 'logical'
alarms-off via the official mechanism.
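
To make the distinction concrete, here is a minimal sketch of a counter-based "logical" alarms-off. It is only an illustration of the idea: every name in it (block_alarms_sketch, unblock_alarms_sketch, alarm_pending, handle_timeout) is an assumption made for this example and is not copied from the Apache source. While the counter is non-zero the handler just records the alarm, and the last unblock delivers it. A bare alarm(0) behaves differently: it cancels the timer outright and leaves the counter at 0, which would be consistent with the gdb output above.

```c
#include <signal.h>

/* All names are illustrative; this is not the Apache source. */
static volatile sig_atomic_t alarms_blocked = 0;   /* nesting depth */
static volatile sig_atomic_t alarm_pending = 0;    /* fired while blocked */
static volatile sig_atomic_t timeouts_handled = 0;

static void handle_timeout(void)
{
    timeouts_handled++;
}

static void on_alarm(int sig)
{
    (void) sig;
    if (alarms_blocked > 0) {
        alarm_pending = 1;      /* defer until the critical section ends */
        return;
    }
    handle_timeout();
}

static void block_alarms_sketch(void)
{
    alarms_blocked++;
}

static void unblock_alarms_sketch(void)
{
    if (--alarms_blocked == 0 && alarm_pending) {
        alarm_pending = 0;
        handle_timeout();       /* deliver the deferred timeout */
    }
}

/* Returns the number of timeouts handled; a deferred alarm is not lost. */
static int deferred_alarm_demo(void)
{
    signal(SIGALRM, on_alarm);
    block_alarms_sketch();
    raise(SIGALRM);             /* arrives while blocked: only recorded */
    unblock_alarms_sketch();    /* handled here */
    return (int) timeouts_handled;
}
```

Calling deferred_alarm_demo() from a small main() exercises the path where the signal arrives inside the blocked region and is handled on unblock.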

>How-To-Repeat:
Not sure: virtually all of our proxy users are on a 10Mbps ethernet but behind
a firewall.  This usage may or may not be relevant.  The children I found dead
this morning were fetching files from cdrom.com via http, so it should be normal;
the only odd thing is that these were Quake files, so they were no doubt huge.
>Fix:
Will be looking at the code myself this week
>Audit-Trail:
>Unformatted:



Re: mod_proxy/374: mod_proxy(?) seems to alarm(0) somewhere

Posted by Brian Moore <be...@cmc.net>.
FWIW, there's a post tonight in comp.infosystems.www.servers.unix about
a similar problem with the proxy module on b8.  He has a LOT more proxy
traffic than I, and being on a 128k line in Australia, is more subject
to net weirdnesses than those of us a hop from MCI on a lightly loaded T1.

I took the liberty of mailing him and having him attach GDB to one of the
sick children to see if his backtrace shows up the same as mine.  If so,
and if he's willing, he might make a good test spot.  (Dealing with Silly
User Night around here and didn't get a chance to play with it again.
Neato trick: http://www.disneyland.com/ crashes Netscape for Linux and
IE3.02 for Win95, I think.  It's been a night of explaining to users why
other sites are broken. :))



Re: mod_proxy/374: mod_proxy(?) seems to alarm(0) somewhere

Posted by Marc Slemko <ma...@znep.com>.
On Tue, 15 Apr 1997, Brian Moore wrote:

> I went to play with one of my demon-children (300k seconds on her now) and
> looked some at the code.  
> 
> (gdb) print (char *) timeout_name
> $7 = 0x5c480 "proxy send body"
> 
> So, at least I know where it thinks the last timeout was set.  Will take me a
> while to grok proxy_send_fb, but that seems to be where the problem is.
> 
> Looking through the logs, I see at about the same time that this child was last
> seen fetching stuff from www.cdrom.com:
> [Fri Apr 11 14:59:16 1997] proxy send body timed out for 192.168.1.32
> [Fri Apr 11 15:01:50 1997] proxy send body timed out for 192.168.1.32
> [Fri Apr 11 15:02:10 1997] proxy send body timed out for 192.168.1.32
> 
> It looks like the timeout was received but send_body never saw it.
> 
> I won't profess to be an expert on Apache's timeouts, but it looks like either
> the timeout_req flag should be set here (it's not) or current_conn->aborted
> should be set, though gdb won't let me check the latter.
> 
> All the while loops in proxy_send_fb do check that... but there is a while loop
> in saferead that doesn't, and I don't see an easy way to make it get to the
> connection structure.
> 
> Would changing that to a hard_timeout fix it?

You could try it, however be careful because it may cause things like
cache coherency problems; I haven't looked at the cache code, so I don't
know what bad things happen if it doesn't get to do whatever it does after
a soft_timeout.  I don't trust soft timeouts in certain parts of the code.
The problem is that it is possible for certain operations to block either
for very very long periods or forever on certain operating systems under
certain conditions; in those cases, only a hard_timeout will work
properly. 

Do you have all the newest Solaris TCP patches installed?  It is possible
that there is some condition which would normally eventually timeout
(but the soft_timeout doesn't stop, of course) but due to one of the many
bugs that have popped up in Solaris' networking it didn't.

Even if saferead did check connection->aborted that would not necessarily
make things safe because the read may well be automatically restarted on
systems that support interruptible system calls.  Hmm... perhaps
soft_timeout should set a hard_timeout.  Sigh.  Yes, the timeout code is a
bit of a mess.



Re: mod_proxy/374: mod_proxy(?) seems to alarm(0) somewhere

Posted by Brian Moore <be...@cmc.net>.
I went to play with one of my demon-children (300k seconds on her now) and
looked some at the code.  

(gdb) print (char *) timeout_name
$7 = 0x5c480 "proxy send body"

So, at least I know where it thinks the last timeout was set.  Will take me a
while to grok proxy_send_fb, but that seems to be where the problem is.

Looking through the logs, I see at about the same time that this child was last
seen fetching stuff from www.cdrom.com:
[Fri Apr 11 14:59:16 1997] proxy send body timed out for 192.168.1.32
[Fri Apr 11 15:01:50 1997] proxy send body timed out for 192.168.1.32
[Fri Apr 11 15:02:10 1997] proxy send body timed out for 192.168.1.32

It looks like the timeout was received but send_body never saw it.

I won't profess to be an expert on Apache's timeouts, but it looks like either
the timeout_req flag should be set here (it's not) or current_conn->aborted
should be set, though gdb won't let me check the latter.

All the while loops in proxy_send_fb do check that... but there is a while loop
in saferead that doesn't, and I don't see an easy way to make it get to the
connection structure.

Would changing that to a hard_timeout fix it?



Re: mod_proxy/374: mod_proxy(?) seems to alarm(0) somewhere

Posted by Marc Slemko <ma...@znep.com>.
You may want to double check any soft_timeout()s.  If any of them go off,
there will be no further timeouts around and the code is expected to
properly check connection->aborted before doing anything.  A quick glance
doesn't show any such obvious things, but it is possible.  Temporarily
replacing all soft_timeout()s with hard_timeout()s would tell if this is
the problem, although it could break some things; haven't really looked at
the code... 



Re: mod_proxy/374: mod_proxy(?) seems to alarm(0) somewhere

Posted by Brian Moore <be...@cmc.net>.
Nope, it's a http transfer (but a large one).  Not sure what it is, though:
it seems to be that alarm(0) is getting called [which in my looking at the
code is a bad thing to do] somewhere.  The client request on this is on the
other side of a packet filtering router, but at 10mbps, so it shouldn't be
a client timeout.

Since it got through the flush-the-buffer stuff in saferead(), I think it's
not the speed, just the dropped alarm.  I printed out the value of
alarms_blocked (0), which in theory should mean alarms aren't blocked. :)  I've
left a couple of these children running (there were three transferring files
via http from www.cdrom.com).  The specific URLs (though I doubt it matters):
GET http://www.cdrom.com/pub/quake/quakec/weapons/mini20.zip
and 
GET http://www.cdrom.com/pub/quake/quakec/weapons/pnc1_02b.zip
I killed the third demon-child -- those two are still running.

Since we have about 50 machines on the far side of this router using illegal
IPs, it may be hard to spot: they do a good amount of web/ftp access and
it all runs through Apache, so it's a rare occurrence.

When tracking down the missing unblocks, I did insert some code to
whine... something like:

    alarm_save = alarm(0);      /* cancels the alarm, returns seconds left */
    if (!alarm_save)
        syslog(LOG_DAEMON | LOG_EMERG, "saferead, no alarm! %ld",
               (long) getpid());
    else
        alarm(alarm_save);      /* re-arm what we just cancelled */

When that code was in place in 1.2b7 I did see a bunch of times saferead
was getting called with no alarm, which shouldn't happen (though I will
confess I don't know what alarm() returns if say 1/2 a second is remaining
on Solaris).
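
On the sub-second question: alarm() reports whole seconds, so how a fractional remainder is rounded is up to the implementation, and cancelling and re-arming also throws the fraction away. A possible alternative, sketched below, is to peek at the timer with getitimer(), which reports microseconds and does not cancel anything. This rests on an assumption that is labeled in the code: that alarm() is implemented on top of ITIMER_REAL. That holds on Linux, but POSIX leaves the interaction between alarm() and the interval timers unspecified, so it would need checking on Solaris. The name alarm_is_armed is made up for this example.

```c
#include <sys/time.h>
#include <unistd.h>

/*
 * Returns 1 if a real-time timer is armed, 0 if not, -1 on error.
 * Assumes alarm() shares ITIMER_REAL, which POSIX does not promise.
 */
static int alarm_is_armed(void)
{
    struct itimerval left;

    if (getitimer(ITIMER_REAL, &left) == -1)
        return -1;
    return left.it_value.tv_sec != 0 || left.it_value.tv_usec != 0;
}
```

Used in place of the alarm(0)/alarm(alarm_save) pair, this leaves the pending alarm untouched, so the check itself cannot shorten, lengthen, or drop the timeout.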

I'll see if I can find who was downloading quake this week and if they did
anything like abort it or anything.



Re: mod_proxy/374: mod_proxy(?) seems to alarm(0) somewhere

Posted by Chuck Murcko <ch...@topsail.org>.
Thanks for the report, Brian. It looks like a large file transfer is
indeed punching through a soft timeout. I assume these are FTP
transfers? I can duplicate your environment, so I should see the problem
when I test for it.


-- 
chuck
Chuck Murcko
The Topsail Group, West Chester PA USA
chuck@topsail.org