Posted to bugs@httpd.apache.org by bu...@apache.org on 2005/08/02 17:11:39 UTC

DO NOT REPLY [Bug 35974] New: - Occasional seg fault/bus error in NFS hosted includes-parsed files

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=35974>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=35974

           Summary: Occasional seg fault/bus error in NFS hosted includes-
                    parsed files
           Product: Apache httpd-2.0
           Version: 2.0.54
          Platform: Sun
        OS/Version: Solaris
            Status: NEW
          Severity: normal
          Priority: P2
         Component: mod_include
        AssignedTo: bugs@httpd.apache.org
        ReportedBy: stuart@terminus.co.uk


We're running Apache 2.0.54 on multiple Solaris 8 machines, using the prefork
MPM with mmap and sendfile turned off, sharing an NFS-mounted docroot hosted by
a NetApp. Occasionally child processes die with either a bus error or a
segmentation fault; it seems to take on the order of 1-2 hours of high-load
production traffic to happen. In all the cores I've inspected, the crash seems
to occur while a request is being parsed by mod_include (so I've marked that as
the component - though something else may be at fault).

Having searched around, I found bug 34708 and bug 19325, which look as if they
might be related to this. Certainly the files in our docroot do get modified.
The process is that a file is removed and then recreated by another system, so
Apache might be getting upset by the file disappearing mid-request. It's
unlikely (though I guess possible) that a file would be removed and recreated
in time for Apache to have one request on the original file and the subsequent
request on the new file (i.e. without accessing it in the period of "no file").
My attempts to recreate this scenario without production traffic have failed,
so it's been difficult to rule out particular system components (though we have
unloaded as much of our code as possible). Additionally, we are seeing some
errors on a couple of servers that have a different docroot (though still on
NFS) where files seldom change (or at least not at the times we've seen errors).
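
For anyone trying to reproduce the remove-then-recreate cycle described above,
here is a minimal Python sketch of it. It is only an illustration under stated
assumptions: the file name page.shtml, the `between` hook and the helper names
are hypothetical, and it runs against a local POSIX filesystem, so it does not
reproduce NFS behaviour or the crash itself.

```python
import os
import tempfile


def remove_then_recreate(path, new_data, between=None):
    """Replace a file by removing it and then recreating it.

    `between` is a hypothetical hook that runs in the "no file" window.
    """
    os.remove(path)
    if between is not None:
        between()
    with open(path, "wb") as fh:
        fh.write(new_data)


def demo():
    results = {}
    with tempfile.TemporaryDirectory() as docroot:
        path = os.path.join(docroot, "page.shtml")  # hypothetical file name
        with open(path, "wb") as fh:
            fh.write(b"old content")

        # Stands in for a request that opened the file before the replacement.
        in_flight = open(path, "rb")

        def request_during_gap():
            # Stands in for a request that arrives while no file exists.
            results["exists_in_gap"] = os.path.exists(path)

        remove_then_recreate(path, b"new", between=request_during_gap)

        # On a local POSIX filesystem the old handle still reads the old data.
        # On an NFS client, when another host removed the file, this read may
        # instead fail (for example with a stale file handle error); that case
        # is not simulated here.
        results["in_flight_read"] = in_flight.read()
        in_flight.close()

        with open(path, "rb") as fh:
            results["after"] = fh.read()
    return results
```

Calling demo() returns a dict with the three observations: whether the file
was visible during the gap, what the in-flight handle read, and what a fresh
open read afterwards.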

Based on comments in one of the bugs above, and looking at the head revision of
apr-util-trunk, I've also patched apr_buckets_file.c (a two-line change so that
it checks for APR_EOF) - I'll attach the diff. We still get errors, though the
patch doesn't appear to hurt, so I've left it in for now.
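
To illustrate the kind of condition an end-of-file check guards against, here
is a rough Python sketch. It is not the attached diff and not APR code: the
names read_expected and demo_truncated are hypothetical, and the idea is simply
that a reader which recorded a file's length up front should notice when EOF
arrives before that many bytes have been read.

```python
import os
import tempfile


def read_expected(fh, expected_len, chunk=8192):
    """Read up to expected_len bytes and report whether EOF arrived early."""
    parts = []
    remaining = expected_len
    while remaining > 0:
        data = fh.read(min(chunk, remaining))
        if not data:
            # EOF before the recorded length was reached.
            return b"".join(parts), True
        parts.append(data)
        remaining -= len(data)
    return b"".join(parts), False


def demo_truncated():
    with tempfile.TemporaryDirectory() as docroot:
        path = os.path.join(docroot, "page.shtml")  # hypothetical file name
        with open(path, "wb") as fh:
            fh.write(b"x" * 100)

        # Unbuffered handle, so every read goes straight to the file.
        reader = open(path, "rb", buffering=0)
        recorded_len = os.fstat(reader.fileno()).st_size  # 100 at this point

        # The file shrinks after its length was recorded.
        os.truncate(path, 10)

        data, hit_eof = read_expected(reader, recorded_len)
        reader.close()
    return data, hit_eof
```

Here demo_truncated() returns the bytes that were actually available together
with a flag saying that EOF was hit early, rather than assuming the recorded
length still holds.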

I've got two backtraces which I'll attach. Unfortunately we weren't initially
compiling with debugging symbols, though we have now enabled that. Since
enabling them, we've had only one error - otherwise I'd attach more.

I'd offer to upload the core file but it's 100MB as we have a huge chunk of
shared memory in use. Note that I don't think the shared memory is to blame,
because the other servers don't use it (I'm waiting for a segfault there so I
can upload that core instead), and because the points of failure don't
implicate it at all.

Any questions or things you'd like me to try - please ask. Really keen to get to
the bottom of this.
