Posted to dev@httpd.apache.org by Sander Striker <st...@apache.org> on 2002/04/18 11:33:09 UTC

Release 2.0.36

Hi,

There have been several bug fixes since 2.0.35,
and the community should benefit from them IMO.

What is the current status on 2.0.36-dev?
Are we still core dumping on daedalus?  How
close are we to finding/fixing the problem
we are seeing there?

Greg (Ames), can you holler when you've
seen daedalus run for 3 days without dumping
core? ;)

Do we have other known issues that should hold
up 2.0.36?

I'm bringing this up because I don't want us
to end up with the same release rate we had
for the betas.  Better to release often IMO.


Sander



Re: Release 2.0.36

Posted by Jeff Trawick <tr...@attglobal.net>.
Greg Ames <gr...@apache.org> writes:

> The other is a sendfile assert that we've seen intermittently for a long time. 
> Jeff thinks the FreeBSD kernel is giving us a return value of 0 with no bytes
> sent.  I think we might be passing a length or an offset that's too big, perhaps
> when the site is updated after we stat() the file.  I plan to write traps to
> test both theories.

Here's what I think in a little more detail:

[I suspect that] both FreeBSD and Linux sendfile() will return 0 in
some conditions where they send no bytes.  The conditions?  Bogus
offset/length combination.  I just verified that on Linux I get return
code 0 from sendfile() when asking it to send 8192 bytes starting at
offset 999999999 in a file that is much smaller than that.  The file
offset was updated to 999999999.  (It seems that sendfile picks up
some semantics from sparse file support in seek-type operations.)

Why would we give sendfile a bogus offset/length combination?

A couple of possible reasons:

a) like you said, the file size could change during the request (a
   definite problem)
b) we bungle the processing somehow and come up with a bogus file
   bucket
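
One way to guard against reason (a) is to re-check the file size right
before the call and clamp the range to it.  The sketch below is only an
illustration of that idea: clamp_send_length() is a hypothetical helper,
not an APR or httpd function, and it assumes the caller obtains
file_size from a fresh fstat() on the open descriptor.  It narrows the
window but cannot close it, since the file can still change between the
fstat() and the sendfile().

```c
#include <assert.h>
#include <sys/types.h>

/* Hypothetical helper (not part of APR or httpd).
 *
 * Given the file size as observed just now, clamp a requested
 * offset/length pair to what the file can actually supply.
 *
 * Returns the number of bytes that may be sent.  A return of 0 means
 * the offset is at or past end-of-file (or the arguments are invalid),
 * so a sendfile() call would have nothing to send. */
static off_t clamp_send_length(off_t file_size, off_t offset, off_t length)
{
    off_t available;

    if (file_size < 0 || offset < 0 || length <= 0) {
        return 0;               /* nonsensical request */
    }
    if (offset >= file_size) {
        return 0;               /* the "bogus offset" case */
    }

    available = file_size - offset;
    assert(available > 0);      /* follows from the checks above */

    return length < available ? length : available;
}
```

A caller that gets 0 back could treat the file bucket as stale instead
of handing the range to sendfile() and asserting on the result.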

-- 
Jeff Trawick | trawick@attglobal.net
Born in Roswell... married an alien...

Re: Release 2.0.36

Posted by Greg Ames <gr...@apache.org>.
Sander Striker wrote:

> What is the current status on 2.0.36-dev?
> Are we still core dumping on daedalus?  How
> close are we to finding/fixing the problem
> we are seeing there?
> 
> Greg (Ames), can you holler when you've
> seen daedalus run for 3 days without dumping
> core? ;)

(sorry for the delay...just catching up on e-mail after an outage last week)

daedalus has had only 2 core dumps in about 10 days.  One was a mmap cleanup
issue which Cliff has a promising looking fix for in 2.0.36.  

The other is a sendfile assert that we've seen intermittently for a long time. 
Jeff thinks the FreeBSD kernel is giving us a return value of 0 with no bytes
sent.  I think we might be passing a length or an offset that's too big, perhaps
when the site is updated after we stat() the file.  I plan to write traps to
test both theories.

So I don't know of any reason why 2.0.36 won't be stable.

Greg

Re: Release 2.0.36

Posted by Justin Erenkrantz <je...@apache.org>.
On Thu, Apr 18, 2002 at 11:33:09AM +0200, Sander Striker wrote:
> Do we have other known issues that should hold
> up 2.0.36?

Well, by my count, we have 74 open PRs in Bugzilla.  A number of us
have been triaging them (you know who you are), but more developers
assisting in closing these suckers would do us well.  

That said, enough gotchas and major faults have been fixed between
2.0.35 and HEAD that it may be worth considering a release.  I don't
think we'll ever fix all of the PRs, but we should
attempt to make sure we've addressed PRs where multiple people have
reported the same problem.  -- justin

Re: Release 2.0.36

Posted by Greg Stein <gs...@lyra.org>.
On Thu, Apr 18, 2002 at 07:33:54AM -0500, William A. Rowe, Jr. wrote:
>...
> If I can get that semantics change done on optional fns/hooks so we can
> avoid all mmn version bumps for optional fn/hooks, I think that would also
> cut down on the bumps for foreign modules.  Will look to make some
> progress and post an example patch by Friday morning.

Well, you always want to do a minor bump when you introduce new APIs. A
minor bump should not invalidate any old code, but it provides a way for
modules to recognize when new APIs are available (especially nice with the
optional hooks, but also fine for compile-time).
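
A minimal sketch of the kind of check a minor bump makes possible.
Everything here is a stand-in for illustration: struct sketch_mmn,
sketch_mmn_at_least() and the numbers used with them are hypothetical
and are not taken from the httpd headers.  The sketch also assumes that
a newer major number still carries the API in question.

```c
#include <assert.h>

/* Hypothetical stand-in for a server's module magic number. */
struct sketch_mmn {
    long major;   /* changes when existing modules must be rebuilt */
    int  minor;   /* increases when new APIs are added compatibly */
};

/* Returns nonzero if the server described by 'have' is new enough to
 * provide an API that first appeared at want_major/want_minor.
 *
 * A minor bump never invalidates old code: a module asking for a
 * lower minor than the server has still passes this check. */
static int sketch_mmn_at_least(struct sketch_mmn have,
                               long want_major, int want_minor)
{
    assert(want_minor >= 0);

    return have.major > want_major
        || (have.major == want_major && have.minor >= want_minor);
}
```

A module could use a check like this to enable an optional feature only
when the server is new enough, and otherwise fall back to older
behaviour.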

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: Release 2.0.36

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
At 04:33 AM 4/18/2002, Sander Striker wrote:

>There have been several bug fixes since 2.0.35,
>and the community should benefit from them IMO.

No doubt :)

>What is the current status on 2.0.36-dev?

I just let a nightly binary build fly on ~wrowe/ that I've posted in response
to certain bugfixes nobody on this list ever reproduced (win32 socket
problems and other cgi/scriptinterpreter fooness.)  Hope to have some
feedback quickly!

>Greg (Ames), can you holler when you've
>seen daedalus run for 3 days without dumping
>core? ;)

I was under the impression we are still running a near-.35 with specific
bugfixes.  After those patches prove themselves, it probably will be time
to bump to head.  [But I could just be confused :-]

>I'm bringing this up because I don't want us
>to end up with the same release rate we had
>for the betas.  Better to release often IMO.

Wholeheartedly agreed here.  Every two to four weeks would seem about
ideal: one to three weeks of bugfixes and a week of stabilization should
help folks accept the new codebase as solid and build on it.

If I can get that semantics change done on optional fns/hooks so we can
avoid all mmn version bumps for optional fn/hooks, I think that would also
cut down on the bumps for foreign modules.  Will look to make some
progress and post an example patch by Friday morning.


Re: Release 2.0.36

Posted by Cliff Woolley <jw...@virginia.edu>.
On 18 Apr 2002, Jeff Trawick wrote:

> > 1) The MMAP bucket cleanup problem, which has been responsible for
> >    some (rare-ish) segv's on daedalus [I think I figured out
> >    how to fix this last night]
>
> great!

UGGGGHHHHHH It's even MORE complicated than I could have imagined.  It's
quite difficult (it might even be impossible) to ensure that you never
touch m->mmap after it's cleaned up.  You can register your own cleanup,
but what if the apr_mmap_t is dup'ed and ownership is transferred?  Then
you're screwed.

May I just say that cleanups are a bitch.  :)

Anyway, it looks like we might have no choice but to go back to NOT
calling apr_mmap_delete() from mmap_bucket_destroy().  But that means
that due to file_read morphing itself we could potentially have a large
amount of data mmaped at a time before any of it goes away.  Is that
killer?  Because if I could just remove that one line this would all go
away.  =-]

--Cliff


--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA



Re: Release 2.0.36

Posted by Jeff Trawick <tr...@attglobal.net>.
Cliff Woolley <jw...@virginia.edu> writes:

> On Thu, 18 Apr 2002, Sander Striker wrote:
> 
> > What is the current status on 2.0.36-dev?
> 
> Big things that I know of besides what's in bugzilla:
> 
> 1) The MMAP bucket cleanup problem, which has been responsible for
>    some (rare-ish) segv's on daedalus [I think I figured out
>    how to fix this last night]

great!

> 2) The worker shutdown segfault ... Jeff, does the patch you committed
>    fix this for sure, or did it perhaps just hide the problem?  There
>    didn't seem to be a consensus about this on-list.

It definitely doesn't deal with the fact that cleaning up pchild
before all threads terminate during graceless termination can
potentially cause all sorts of problems.

I certainly don't see any segfaults in this area anymore (which is
valuable to me since my regression tests can test other code).

I think the code I put in yesterday should be backed out, but
something else needs to change Real Soon Now to avoid the segfault.
I'll be -1 on another release otherwise (not that such an opinion
would make any difference).  I'll back it out reasonably soon (need
some lunch first) and hope for the best.

-- 
Jeff Trawick | trawick@attglobal.net
Born in Roswell... married an alien...

Re: Release 2.0.36

Posted by Jeff Trawick <tr...@attglobal.net>.
Cliff Woolley <jw...@virginia.edu> writes:

>   5) What about the libtool --install issue and Sander's partial
>      patch for it?

I'll try to get this done by tomorrow afternoon.  Besides the coding,
it needs to be tested out on some platforms which Sander doesn't have
access to.

-- 
Jeff Trawick | trawick@attglobal.net
Born in Roswell... married an alien...

Re: Volunteering to be RM, WAS: RE: Release 2.0.36

Posted by Justin Erenkrantz <je...@apache.org>.
On Tue, Apr 23, 2002 at 02:00:25PM +0200, Sander Striker wrote:
> Hi,
> 
> I volunteer to be RM for 2.0.36 (that is, if no one
> has a problem with that ;).
> 
> I'm aware of the issues we still have in HEAD, which
> is why we need to tag and run that on daedalus.
> 
> However, I'll hold off on the tag since there are
> probably going to be some file moves in the atomics
> section.  When that is done, we can tag.  The tag
> can be bumped for the worker fix (if it's not
> already in by then).

The reason I suggested a hold to Sander on account of the atomics
is that we have a bunch of PRs relating to building atomics on
Solaris that haven't been (yet) resolved.

I believe we could rethink how we are building the atomics to
make it a bit less troublesome.  I'd rather see us use CPU-based
defines rather than OS-based defines for those platforms where
the OS is of no help (Linux, Solaris, and old FreeBSD).

Personally, I don't care to spend any more time on the atomics
code.  If someone wants to see this included, I heartily suggest
that it should get fixed soon.  -- justin

Re: Volunteering to be RM, WAS: RE: Release 2.0.36

Posted by Bill Stoddard <bi...@wstoddard.com>.
+1

> Hi,
> 
> I volunteer to be RM for 2.0.36 (that is, if no one
> has a problem with that ;).
> 
> I'm aware of the issues we still have in HEAD, which
> is why we need to tag and run that on daedalus.
> 
> However, I'll hold off on the tag since there are
> probably going to be some file moves in the atomics
> section.  When that is done, we can tag.  The tag
> can be bumped for the worker fix (if it's not
> already in by then).
> 
> Thoughts?
> 
> Sander
> 


Volunteering to be RM, WAS: RE: Release 2.0.36

Posted by Sander Striker <st...@apache.org>.
Hi,

I volunteer to be RM for 2.0.36 (that is, if no one
has a problem with that ;).

I'm aware of the issues we still have in HEAD, which
is why we need to tag and run that on daedalus.

However, I'll hold off on the tag since there are
probably going to be some file moves in the atomics
section.  When that is done, we can tag.  The tag
can be bumped for the worker fix (if it's not
already in by then).

Thoughts?

Sander


RE: Release 2.0.36

Posted by Cliff Woolley <jw...@virginia.edu>.
On Mon, 22 Apr 2002, Sander Striker wrote:

>   - allocate the sockets out of a special pool so we can clean up the
> sockets (using apr_pool_clear(psock)), sleep for 1 sec (should be enough
> for all threads to notice the sockets are gone).  After that clean
> pchild as usual.

From my uninformed perspective, this sounds like a good compromise...

--Cliff

--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA



RE: Release 2.0.36

Posted by Sander Striker <st...@apache.org>.
> From: trawick@rdu88-250-035.nc.rr.com
> [mailto:trawick@rdu88-250-035.nc.rr.com]On Behalf Of Jeff Trawick
> Sent: 22 April 2002 16:33

> "Sander Striker" <st...@apache.org> writes:
> 
> > > From: Cliff Woolley [mailto:jwoolley@virginia.edu]
> > > Sent: 18 April 2002 16:44
> >  
> > >> What is the current status on 2.0.36-dev?
> 
> > Saw the fixes, so this is gone.
> > 
> > > 2) The worker shutdown segfault ... Jeff, does the patch you committed
> > >    fix this for sure, or did it perhaps just hide the problem?  There
> > >    didn't seem to be a consensus about this on-list.
> > 
> > Jeff?
> 
> This problem still exists.  Note that I backed out a previous change I
> had made which eliminated the segfaults but didn't attack the real
> issue that pchild needs to live as long as our worker threads or bad
> stuff will surely happen.

Yes.  Now the questions are:

- Do we want to hold up a release for this?  If so, for how long?

- Is a graceless shutdown/restart at all possible with worker?
  [given the current APR thread API]

  Options I see for solving the problem are:

  - don't do graceless shutdown/restart, only do graceful.  (not acceptable)
  - implement apr_thread_cancel and call this on all threads prior to cleaning
    pchild.
  - allocate the sockets out of a special pool so we can clean up the sockets
    (using apr_pool_clear(psock)), sleep for 1 sec (should be enough for all
    threads to notice the sockets are gone).  After that clean pchild as usual.
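
The ordering in the third option can be sketched without threads or APR.
This is only an illustration under stated assumptions: struct
sketch_pool and its functions are hypothetical stand-ins for a pool with
registered cleanups (run here in reverse registration order), and the
wait step is a caller-supplied callback where the proposal above would
sleep for 1 sec.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_CLEANUPS 8

typedef void (*sketch_cleanup_fn)(void *data);

/* Hypothetical stand-in for a pool: just a list of cleanups. */
struct sketch_pool {
    sketch_cleanup_fn fns[SKETCH_MAX_CLEANUPS];
    void *data[SKETCH_MAX_CLEANUPS];
    size_t count;
};

/* Register a cleanup; returns 0 on success, -1 if the pool is full. */
static int sketch_pool_register(struct sketch_pool *p,
                                sketch_cleanup_fn fn, void *data)
{
    if (p->count == SKETCH_MAX_CLEANUPS) {
        return -1;
    }
    p->fns[p->count] = fn;
    p->data[p->count] = data;
    p->count++;
    return 0;
}

/* Run all cleanups, most recently registered first, and empty the pool. */
static void sketch_pool_clear(struct sketch_pool *p)
{
    while (p->count > 0) {
        p->count--;
        p->fns[p->count](p->data[p->count]);
    }
}

/* State shared between the two cleanups, for demonstration only. */
struct child_state {
    int sockets_open;
    int child_alive;
    int sockets_closed_while_child_alive;
};

static void close_sockets(void *data)
{
    struct child_state *s = data;
    s->sockets_open = 0;
    /* Record whether the child-wide state still existed at this point. */
    s->sockets_closed_while_child_alive = s->child_alive;
}

static void destroy_child(void *data)
{
    struct child_state *s = data;
    s->child_alive = 0;
}

/* Graceless shutdown in the proposed order:
 *   1. clear the socket pool so workers see their sockets vanish,
 *   2. give workers time to notice (callback may be NULL),
 *   3. only then clear the child pool. */
static void graceless_shutdown(struct sketch_pool *psock,
                               struct sketch_pool *pchild,
                               void (*wait_for_workers)(void))
{
    sketch_pool_clear(psock);
    if (wait_for_workers != NULL) {
        wait_for_workers();
    }
    sketch_pool_clear(pchild);
}
```

The point of the separate socket pool is that step 1 can happen while
everything the workers still touch in pchild remains valid.  A fixed
sleep in step 2 is a heuristic; it does not guarantee that every thread
has actually stopped using pchild.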

Sander




Re: Release 2.0.36

Posted by Jeff Trawick <tr...@attglobal.net>.
"Sander Striker" <st...@apache.org> writes:

> > From: Cliff Woolley [mailto:jwoolley@virginia.edu]
> > Sent: 18 April 2002 16:44
>  
> >> What is the current status on 2.0.36-dev?

> Saw the fixes, so this is gone.
> 
> > 2) The worker shutdown segfault ... Jeff, does the patch you committed
> >    fix this for sure, or did it perhaps just hide the problem?  There
> >    didn't seem to be a consensus about this on-list.
> 
> Jeff?

This problem still exists.  Note that I backed out a previous change I
had made which eliminated the segfaults but didn't attack the real
issue that pchild needs to live as long as our worker threads or bad
stuff will surely happen.

-- 
Jeff Trawick | trawick@attglobal.net
Born in Roswell... married an alien...

RE: Release 2.0.36

Posted by Sander Striker <st...@apache.org>.
> From: Cliff Woolley [mailto:jwoolley@virginia.edu]
> Sent: 18 April 2002 16:44
 
>> What is the current status on 2.0.36-dev?
>
> Big things that I know of besides what's in bugzilla:
>
> 1) The MMAP bucket cleanup problem, which has been responsible for
>    some (rare-ish) segv's on daedalus [I think I figured out
>    how to fix this last night]

Saw the fixes, so this is gone.

> 2) The worker shutdown segfault ... Jeff, does the patch you committed
>    fix this for sure, or did it perhaps just hide the problem?  There
>    didn't seem to be a consensus about this on-list.

Jeff?

> 3) Did Greg's 416 problems get fixed?  Greg, I think you had a patch,
>    but did it get committed?  Is there still a known condition (related
>    to this or not) that will cause the byterange filter to segv?

Looks like this was fixed in httpd-2.0/modules/http/http_protocol.c rev 1.408.

> 4) Is #3 different from the C-L / byterange filter misordering problem?
>    If not, is this one fixed?

Greg?

> 5) What about the libtool --install issue and Sander's partial
>    patch for it?

This is taken care of.

I would very much like to get another release out the door by the end of the
week/start of next week.  We've had many bugfixes and I don't think we
should sit on them for much longer.

Sander



Re: Release 2.0.36

Posted by Cliff Woolley <jw...@virginia.edu>.
On Thu, 18 Apr 2002, Cliff Woolley wrote:

> > What is the current status on 2.0.36-dev?
>
> Big things that I know of besides what's in bugzilla:
>
> 1) The MMAP bucket cleanup problem, which has been responsible for
>    some (rare-ish) segv's on daedalus [I think I figured out
>    how to fix this last night]
>
> 2) The worker shutdown segfault ... Jeff, does the patch you committed
>    fix this for sure, or did it perhaps just hide the problem?  There
>    didn't seem to be a consensus about this on-list.
>
> 3) Did Greg's 416 problems get fixed?  Greg, I think you had a patch,
>    but did it get committed?  Is there still a known condition (related
>    to this or not) that will cause the byterange filter to segv?
>
> 4) Is #3 different from the C-L / byterange filter misordering problem?
>    If not, is this one fixed?

  5) What about the libtool --install issue and Sander's partial
     patch for it?

--Cliff


--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA



Re: Release 2.0.36

Posted by Cliff Woolley <jw...@virginia.edu>.
On Thu, 18 Apr 2002, Sander Striker wrote:

> What is the current status on 2.0.36-dev?

Big things that I know of besides what's in bugzilla:

1) The MMAP bucket cleanup problem, which has been responsible for
   some (rare-ish) segv's on daedalus [I think I figured out
   how to fix this last night]

2) The worker shutdown segfault ... Jeff, does the patch you committed
   fix this for sure, or did it perhaps just hide the problem?  There
   didn't seem to be a consensus about this on-list.

3) Did Greg's 416 problems get fixed?  Greg, I think you had a patch,
   but did it get committed?  Is there still a known condition (related
   to this or not) that will cause the byterange filter to segv?

4) Is #3 different from the C-L / byterange filter misordering problem?
   If not, is this one fixed?

--Cliff