Posted to dev@httpd.apache.org by Brian Behlendorf <br...@organic.com> on 1997/01/10 07:57:49 UTC

Finish 1.2!

If I have any foot to put down with this group, I'd like to put it down now.
Let's get 1.2 out the fucking door, and then think about the future.  Okay?
It is absolutely unproductive to talk about post-1.2 right now.

Here's a timetable: 

  01/14 Release 1.2b5
  01/21 Release 1.2b6
  01/28 Release 1.2 final, warts and all.

If we don't get many bug reports for 1.2b5 we might be able to skip the b6
stage, but I'd like the header parse API to get some good thrashing and that
might require one more step.

What do people think?

	Brian

--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@organic.com  www.apache.org  hyperreal.com  http://www.organic.com/JOBS

FIN_WAIT_2

Posted by Brian Behlendorf <br...@organic.com>.

On Fri, 10 Jan 1997, Marc Slemko wrote:
> Is the BSDI box patched to have a FIN_WAIT_2 timeout? 

Oops, I was mistaken - I am seeing it on BSDI.  I haven't given it Chuck's
BSDI patch since I don't have completely ready access to BSDI source, but I'll
test the NO_LINGCLOSE define just to be safe.  

	Brian

--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@organic.com  www.apache.org  hyperreal.com  http://www.organic.com/JOBS

Re: Finish 1.2!

Posted by Marc Slemko <ma...@znep.com>.

On Thu, 9 Jan 1997, Brian Behlendorf wrote:

> On Fri, 10 Jan 1997, Marc Slemko wrote:
> > > Here's a timetable: 
> > > 
> > >   01/14 Release 1.2b5
> > 
> > Urp.  If I'm unlucky, it may be a couple of days after that before I can
> > finish snprintf patches, have them looked at, and we can sort out a few
> > details about how they should be compiled in.  If buffer overflow fixes
> > are going to go in 1.2, they need to be in the next beta.
> 
> I consider that to be a priority as well, and worth waiting for.  Biting the
> bullet: it will need at least a b6 to account for porting issues related to
> snprintf, I'm guessing.  So back up my timetable by a week?
> 
> 01/21 Release 1.2b5  (with snprintf)
> 01/28 Release 1.2b6
> 02/04 Release 1.2 final?

Hopefully not that long for my stuff; I'm hoping to have it mostly done
this weekend (11th-12th) but if not it may be the next one (18th-19th)
before I can get the time.  

I'm guessing there will need to be a b6 for snprintf porting issues, but
hoping there won't.  It is looking better than I had hoped in terms of
portability, but only time will tell.

> 
> > I'm wondering if the FIN_WAIT_2 thing may start causing a lot of problems
> > after release?  There are a good number of people having tons of
> > connections stuck in FIN_WAIT_2 that they didn't have before 1.2 but do
> > with 1.2.  
> 
> I wish we knew more about what may be causing this.  Certainly adding a note to
> the docs suggesting toggling the lingering close code may be appropriate, yet I
> haven't seen conclusive evidence that that's the problem.  For what it's worth
> it's not been a problem on any of the servers I watch (Solaris with 1.5 mill
> hits/day, BSDI with ~150K, SGI with ~40K).

Is the BSDI box patched to have a FIN_WAIT_2 timeout? 

Anyone have a list of addresses of people who have been having trouble? 
It wouldn't hurt to send out a little survey with a few questions (i.e.
what OS, does this happen with 1.1.1, does it happen if you modify this in
the source, does it happen if you modify that in the source, etc.) ...

Re: Finish 1.2!

Posted by Marc Slemko <ma...@znep.com>.

On Thu, 9 Jan 1997, Cliff Skolnick wrote:
> 
> Well this is actually a well known TCP/IP bug.  If I remember correctly 
> it had something to do with the client "disappearing" from the net (like 
> hanging up their ISP) and not letting the socket fully close.  I thought 
> a while about why this could be worse with 1.2, and the only two things I 
> come up with are:

The problem of connections getting hung in FIN_WAIT_2 is a well known
"bug" in the TCP spec in that it lets a client's behaviour adversely
affect the server.  But I don't see the normal path to hanging in
FIN_WAIT_2 being that common.

What happens is:
	- the server sends a FIN to the client, which says that the
	  server will no longer be sending any more data.  When it
	  gets an ACK to this FIN, it goes into the FIN_WAIT_2 state.
	  Just looking at the packet exchange, a connection at this
	  stage could still be used to transfer data from the client
	  to the server because we have only done a half close.
	- the server is now in FIN_WAIT_2; it waits forever (or until
	  a timeout) until it gets a FIN from the client; when it
	  does, it sends an ACK and goes to TIME_WAIT.

In the normal situation, unless it is set up to do a half-close, the
client should then send a FIN right away, which the server ACKs before
going into TIME_WAIT.  
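
The sequence above can be written down as a tiny state machine.  This is
only a sketch of the path described in this message (the side that closes
first), not the full TCP state diagram: simultaneous close, retransmits
and timeouts are left out, and the ST_/EV_ names are made up for
illustration.

```c
#include <assert.h>

/* States on the side that sends the first FIN (the server here). */
enum tcp_state { ST_ESTABLISHED, ST_FIN_WAIT_1, ST_FIN_WAIT_2, ST_TIME_WAIT };

/* Events seen by that side. */
enum tcp_event { EV_SEND_FIN, EV_RECV_ACK_OF_FIN, EV_RECV_FIN };

/* Return the next state; an event that does not apply leaves it unchanged. */
enum tcp_state next_state(enum tcp_state s, enum tcp_event e)
{
    switch (s) {
    case ST_ESTABLISHED:
        return e == EV_SEND_FIN ? ST_FIN_WAIT_1 : s;
    case ST_FIN_WAIT_1:
        return e == EV_RECV_ACK_OF_FIN ? ST_FIN_WAIT_2 : s;
    case ST_FIN_WAIT_2:
        /* The server ACKs the client's FIN on this transition. */
        return e == EV_RECV_FIN ? ST_TIME_WAIT : s;
    default:
        return s;
    }
}

/* Replay n events starting from ST_ESTABLISHED. */
enum tcp_state replay(const enum tcp_event *events, int n)
{
    enum tcp_state s = ST_ESTABLISHED;

    assert(n >= 0);
    for (int i = 0; i < n; i++)
        s = next_state(s, events[i]);
    return s;
}
```

A client that ACKs the FIN and then vanishes leaves replay() at
ST_FIN_WAIT_2; only the client's own FIN moves it on to ST_TIME_WAIT.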

On the surface of things, you should only get into this state if the
client disconnects between sending the ACK to your FIN, and sending a
FIN back to you.  In a normal situation, those two should happen
almost at the same time.  Hmm.  

I'll have to check the kernel, but by doing a sort of half-close 
(shutdown(2) with 1 as a second param) Apache may be putting the
kernel in a state where things don't time out as they would if it
shut down both directions at the same time.  That said, lingering_close
may have nothing to do with it; there is certainly some evidence
pointing towards disabling it not fixing anything.
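
For reference, a minimal sketch of what a half-close looks like from user
code.  It is not the lingering_close code; it only shows the semantics of
shutdown(2) with SHUT_WR (the symbolic name for 1 on the usual BSD and
Linux headers).  It uses a local AF_UNIX socket pair so it runs without a
network, which is an assumption of the sketch: such sockets have no
FIN_WAIT_2 state, so this shows the half-close itself, not the hang.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns 1 if, after the "server" end shuts down its sending side, the
 * "client" end sees end-of-file yet can still send data that the server
 * reads; 0 if not; -1 if the socket pair could not be created.
 */
int half_close_demo(void)
{
    int sv[2];
    char buf[16];
    const char msg[] = "late data";
    int ok = 0;

    assert(sizeof msg <= sizeof buf);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    int server = sv[0], client = sv[1];

    /* Half-close: the server will send nothing more but can still read. */
    if (shutdown(server, SHUT_WR) == 0) {
        /* The client sees EOF on the server-to-client direction. */
        ssize_t n = read(client, buf, sizeof buf);

        /* The client-to-server direction is still open. */
        if (n == 0 && write(client, msg, sizeof msg) == (ssize_t)sizeof msg) {
            n = read(server, buf, sizeof buf);
            ok = (n == (ssize_t)sizeof msg &&
                  memcmp(buf, msg, sizeof msg) == 0);
        }
    }
    close(server);
    close(client);
    return ok;
}
```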

Hmm.  I think that lingering_close will behave differently on Linux
than other platforms because Linux modifies the timer passed to
select, no?
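
A sketch of the portable way to cope with that difference, assuming a
loop that calls select() more than once: rebuild the timeval before every
call instead of reusing it.  Linux overwrites the timeout with the time
not slept, while most other systems leave it alone.  The function name
and the slice/retry parameters are made up for illustration; this is not
the code in http_main.c.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Wait until fd is readable, in slices of slice_ms milliseconds, for at
 * most max_slices slices.  Returns 1 if readable, 0 on timeout, -1 on
 * error (including an interrupted select).
 */
int wait_readable(int fd, int slice_ms, int max_slices)
{
    assert(slice_ms >= 0 && max_slices >= 0);
    for (int i = 0; i < max_slices; i++) {
        fd_set rfds;
        struct timeval tv;

        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);

        /* Reset every time around: select() may have modified tv. */
        tv.tv_sec = slice_ms / 1000;
        tv.tv_usec = (slice_ms % 1000) * 1000;

        int rc = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (rc != 0)
            return rc < 0 ? -1 : 1;
    }
    return 0;
}
```

If tv were set once outside the loop, a Linux kernel could leave it at
zero after the first timeout and every later call would return at once,
whereas other platforms would wait the full interval each time.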

> 
> 	1) They were always there, and people are just noticing them.  The
> 	people who downgraded and still saw them said nothing, but the people
> 	who downgraded and did not see them spoke up really loudly.  The only
> 	real cause was the random there/not there factor.  I know I have
> 	always seen these until I applied a patch to get rid of them
> 	after a while; this was a kernel thing since these sockets are not
> 	tied to a user process.

I certainly would be able to believe that, but there seem to be too
many people who have a server that simply will not run under 1.2
because it runs out of mbufs, but under 1.1.1 it is definitely fine.

> 	2) Sockets are getting stuck in 1.2, increasing the chance that
> 	this may happen.  If the server does not close the connection and
> 	lets it time out I can guess there may be a greater chance of a
> 	FIN_WAIT_2 when the server starts timing stuff out and you have a
> 	bunch of dialup users logging off the net.

> 
> Any other thoughts?  Maybe we should ask the people to send the error_log and
> see if there are more timeouts reported for 1.2 than 1.1?  Way to test 
> for #2.

I'm not sure that would help that much.  I think that most of the
things that you would expect to cause this would be in http_main.c, so
perhaps getting them to try a http_main.c that was hacked to be as
much like 1.1.1 as possible would help.  We also can't forget that we
may be seeing several problems here.  With 1.1.1 on FreeBSD, I
normally see... 60 or so connections in FIN_WAIT_2 on a server doing
perhaps 10 connections/sec on average, but it doesn't cause a problem
because they time out.
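
If people do send in numbers, a small helper along these lines could
tally them.  It is a sketch that assumes netstat-style text with one
connection per line and the state name somewhere on the line; the exact
netstat output format differs between systems, and the function name is
made up.

```c
#include <assert.h>
#include <string.h>

/* Count the lines of text that contain the given state name. */
int count_state(const char *text, const char *state)
{
    int count = 0;
    size_t slen;

    assert(text != NULL && state != NULL);
    slen = strlen(state);
    if (slen == 0)
        return 0;

    while (*text != '\0') {
        const char *eol = strchr(text, '\n');
        size_t len = eol ? (size_t)(eol - text) : strlen(text);

        /* Search for the state name inside this line only. */
        for (size_t i = 0; i + slen <= len; i++) {
            if (memcmp(text + i, state, slen) == 0) {
                count++;
                break;      /* count each line at most once */
            }
        }
        text += len;
        if (*text == '\n')
            text++;
    }
    return count;
}
```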

Re: Finish 1.2!

Posted by Cliff Skolnick <cl...@steam.com>.

On Thu, 9 Jan 1997, Brian Behlendorf wrote:
> 
> > I'm wondering if the FIN_WAIT_2 thing may start causing a lot of problems
> > after release?  There are a good number of people having tons of
> > connections stuck in FIN_WAIT_2 that they didn't have before 1.2 but do
> > with 1.2.  
> 
> I wish we knew more about what may be causing this.  Certainly adding a note to
> the docs suggesting toggling the lingering close code may be appropriate, yet I
> haven't seen conclusive evidence that that's the problem.  For what it's worth
> it's not been a problem on any of the servers I watch (Solaris with 1.5 mill
> hits/day, BSDI with ~150K, SGI with ~40K).

Well this is actually a well known TCP/IP bug.  If I remember correctly 
it had something to do with the client "disappearing" from the net (like 
hanging up their ISP) and not letting the socket fully close.  I thought 
a while about why this could be worse with 1.2, and the only two things I 
come up with are:

	1) They were always there, and people are just noticing them.  The
	people who downgraded and still saw them said nothing, but the people
	who downgraded and did not see them spoke up really loudly.  The only
	real cause was the random there/not there factor.  I know I have
	always seen these until I applied a patch to get rid of them
	after a while; this was a kernel thing since these sockets are not
	tied to a user process.

	2) Sockets are getting stuck in 1.2, increasing the chance that
	this may happen.  If the server does not close the connection and
	lets it time out I can guess there may be a greater chance of a
	FIN_WAIT_2 when the server starts timing stuff out and you have a
	bunch of dialup users logging off the net.

Any other thoughts?  Maybe we should ask the people to send the error_log and
see if there are more timeouts reported for 1.2 than 1.1?  Way to test 
for #2.

Cliff

--
Cliff Skolnick, Technical Consultant
Steam Tunnel Operations
cliff@steam.com, 415.297.5938
http://www.steam.com/

Re: Finish 1.2!

Posted by Brian Behlendorf <br...@organic.com>.

On Fri, 10 Jan 1997, Marc Slemko wrote:
> > Here's a timetable: 
> > 
> >   01/14 Release 1.2b5
> 
> Urp.  If I'm unlucky, it may be a couple of days after that before I can
> finish snprintf patches, have them looked at, and we can sort out a few
> details about how they should be compiled in.  If buffer overflow fixes
> are going to go in 1.2, they need to be in the next beta.

I consider that to be a priority as well, and worth waiting for.  Biting the
bullet: it will need at least a b6 to account for porting issues related to
snprintf, I'm guessing.  So back up my timetable by a week?

01/21 Release 1.2b5  (with snprintf)
01/28 Release 1.2b6
02/04 Release 1.2 final?

> I'm wondering if the FIN_WAIT_2 thing may start causing a lot of problems
> after release?  There are a good number of people having tons of
> connections stuck in FIN_WAIT_2 that they didn't have before 1.2 but do
> with 1.2.  

I wish we knew more about what may be causing this.  Certainly adding a note to
the docs suggesting toggling the lingering close code may be appropriate, yet I
haven't seen conclusive evidence that that's the problem.  For what it's worth
it's not been a problem on any of the servers I watch (Solaris with 1.5 mill
hits/day, BSDI with ~150K, SGI with ~40K).

	Brian

--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@organic.com  www.apache.org  hyperreal.com  http://www.organic.com/JOBS

Re: Finish 1.2!

Posted by Marc Slemko <ma...@znep.com>.

On Thu, 9 Jan 1997, Brian Behlendorf wrote:

> 
> If I have any foot to put down with this group, I'd like to put it down now.
> Let's get 1.2 out the fucking door, and then think about the future.  Okay?
> It is absolutely unproductive to talk about post-1.2 right now.
> 
> Here's a timetable: 
> 
>   01/14 Release 1.2b5

Urp.  If I'm unlucky, it may be a couple of days after that before I can
finish snprintf patches, have them looked at, and we can sort out a few
details about how they should be compiled in.  If buffer overflow fixes
are going to go in 1.2, they need to be in the next beta.
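
To illustrate the kind of change involved, here is a sketch of bounded
formatting.  It is not one of the actual patches: it assumes a snprintf
that returns the length the full output would have needed (the behaviour
later standardized in C99), and the function name and "host:port" format
are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format "host:port" into dst without writing past dst_len bytes.
 * Returns 1 if the whole string fit, 0 if it was truncated or the
 * formatting failed.  Unlike sprintf, an oversized host cannot overrun
 * dst; the output is cut short and still NUL-terminated.
 */
int format_hostport(char *dst, size_t dst_len, const char *host, int port)
{
    assert(dst != NULL && dst_len > 0 && host != NULL);

    int needed = snprintf(dst, dst_len, "%s:%d", host, port);

    return needed >= 0 && (size_t)needed < dst_len;
}
```

The caller gets to decide what to do with a 0 return (reject the request,
log it, grow the buffer) rather than finding out from a corrupted stack.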

But, that sounds like a good start.

I'm wondering if the FIN_WAIT_2 thing may start causing a lot of problems
after release?  There are a good number of people having tons of
connections stuck in FIN_WAIT_2 that they didn't have before 1.2 but do
with 1.2.  

>   01/21 Release 1.2b6
>   01/28 Release 1.2 final, warts and all.
> 
> If we don't get many bug reports for 1.2b5 we might be able to skip the b6
> stage, but I'd like the header parse API to get some good thrashing and that
> might require one more step.
> 
> What do people think?
> 
> 	Brian
> 
> --=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
> brian@organic.com  www.apache.org  hyperreal.com  http://www.organic.com/JOBS
>