Posted to apache-bugdb@apache.org by Mike Essex <ms...@metro1.com> on 1999/12/16 23:02:20 UTC

mod_jserv/5485: Servlets stop responding after working correctly for an extended period

>Number:         5485
>Category:       mod_jserv
>Synopsis:       Servlets stop responding after working correctly for an extended period
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    jserv
>State:          open
>Class:          sw-bug
>Submitter-Id:   apache
>Arrival-Date:   Thu Dec 16 14:10:01 PST 1999
>Last-Modified:
>Originator:     msx@metro1.com
>Organization:
apache
>Release:        apache 1.3.6 and jserv 1.0
>Environment:
SunOS dat-chi 5.6 Generic_105181-13 sun4u sparc SUNW,Ultra-4
JDBC access to an oracle listener
Sun javac compiler
>Description:
We are using Apache and JServ as an Oracle database front end for a web-based application.
Approximately 40 servlets work together to produce the correct HTML displays and control
database activities.

The web server and JServ run for extended periods with no problems, and then the servlets stop responding.
The web server is still running.  Each servlet logs its transactions and status to log files, but those files
show no sign of a problem.  The files in /usr/local/apache/jserv/logs show none either.  The
/usr/local/apache/logs/access_log shows this request in progress when the servlets stopped responding:

[15/Dec/1999:22:29:49 -0600] "GET /mdex/DirectorySelectForAdd?. . . . . . (deleted this user's personal info)

By checking the log file of a different servlet, which has activity about once a second, I was able to see
that it stopped responding at the same time.

The next lines in the access_log show this request repeated 5 more times at intervals of 5 minutes 2 seconds.
I'm guessing the repeats come from the web server itself, since our user applications do not automatically
send retries.
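
The retry spacing described above can be checked mechanically rather than by eye. The following is only a minimal sketch: the function names are hypothetical, it assumes the access_log timestamps use the bracketed form shown in the excerpt above, and it assumes an English locale for month abbreviations such as "Dec".

```python
import re
from datetime import datetime

# Matches a bracketed access_log timestamp, e.g. [15/Dec/1999:22:29:49 -0600]
TIMESTAMP = re.compile(
    r"\[(\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]"
)


def request_times(lines, path_fragment):
    """Return the timestamps of log lines whose text contains path_fragment."""
    times = []
    for line in lines:
        if path_fragment not in line:
            continue
        match = TIMESTAMP.search(line)
        if match:
            # %b depends on the locale; an English locale is assumed here.
            times.append(
                datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
            )
    return times


def gaps_in_seconds(times):
    """Return the gap in seconds between each pair of consecutive timestamps."""
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]
```

Feeding the lines of access_log to request_times() with "/mdex/DirectorySelectForAdd" as the fragment, then passing the result to gaps_in_seconds(), would show whether every repeat really is 302 seconds after the previous one.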

The /usr/local/apache/logs/error_log file has no entry timestamped at the moment the servlets stopped
responding, but the following lines appear between the last entry and the point where the web server
was restarted:

thr_continue of 0xeabc0740(5367200) failed: 3 = ESRCH.
thr_continue of 0xeabc0748(1124094120) failed: 3 = ESRCH.
thr_continue of 0xeabc0740(5367200) failed: 3 = ESRCH.
thr_continue of 0xeabc0748(1124094120) failed: 3 = ESRCH.
thr_continue of 0xeabc0740(5367200) failed: 3 = ESRCH.
thr_continue of 0xeabc0740(5367200) failed: 3 = ESRCH.
thr_continue of 0xeabc0748(1124094120) failed: 3 = ESRCH.
thr_continue of 0xeabc0740(5367200) failed: 3 = ESRCH.
thr_continue of 0xeabc0748(1124094120) failed: 3 = ESRCH.
thr_continue of 0xeabc0740(5367200) failed: 3 = ESRCH.
thr_continue of 0xeabc0740(5367200) failed: 3 = ESRCH.
thr_continue of 0xeabc0748(1124094120) failed: 3 = ESRCH.
thr_continue of 0xeabc0740(5367200) failed: 3 = ESRCH.
thr_continue of 0xeabc0740(5367200) failed: 3 = ESRCH.
thr_continue of 0xeabc0748(1124094120) failed: 3 = ESRCH.
thr_continue of 0xeabc0740(5367200) failed: 3 = ESRCH.
thr_continue of 0xeabc0740(5367200) failed: 3 = ESRCH.
thr_continue of 0xeabc0748(1124094120) failed: 3 = ESRCH.
thr_continue of 0xeabc07f8(16) failed: 3 = ESRCH.
thr_continue of 0xeabc06d0(-280320148) failed: 3 = ESRCH.
thr_continue of 0xeabc06e0(0) failed: 3 = ESRCH.
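
To see which addresses dominate these messages, the lines can be tallied. This is a rough sketch, not part of JServ or Apache: the function name is hypothetical and the pattern is inferred only from the lines shown above. ESRCH (errno 3) is the standard "no such process" error, which thr_continue returns when the target thread cannot be found.

```python
import re
from collections import Counter

# Pattern inferred from lines such as:
#   thr_continue of 0xeabc0740(5367200) failed: 3 = ESRCH.
THR_CONTINUE = re.compile(
    r"thr_continue of (0x[0-9a-fA-F]+)\((-?\d+)\) failed: (\d+) = (\w+)\."
)


def tally_thr_continue(lines):
    """Count thr_continue failure lines, keyed by the hex address reported."""
    counts = Counter()
    for line in lines:
        match = THR_CONTINUE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Calling tally_thr_continue() on the lines of error_log and then most_common() on the result lists the addresses from most to least frequent.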

The servlets did not respond until the problem was discovered about 12 hours later.  The web server was
restarted by "apachectl graceful" and the servlets started working again.
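
Until the cause is found, a periodic probe could shorten the time before a hang is noticed. The sketch below is a stopgap that masks the problem rather than fixing it. All names are hypothetical, the apachectl path is an assumption based on the /usr/local/apache log locations above, and the URL to probe would have to be a servlet that is safe to request repeatedly.

```python
import http.client
import subprocess
import urllib.request


def servlet_responds(url, timeout_seconds=30):
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are OSError subclasses;
        # HTTPException covers malformed or truncated responses.
        return False


def graceful_restart():
    """Run 'apachectl graceful'; the install path here is an assumption."""
    subprocess.run(["/usr/local/apache/bin/apachectl", "graceful"], check=True)


def check_and_restart(probe, restart):
    """Call restart() only when probe() fails; return True if it restarted."""
    if probe():
        return False
    restart()
    return True
```

Run from cron every few minutes, a call such as check_and_restart(lambda: servlet_responds(url), graceful_restart) would restart the server only after a failed probe. Passing the probe and restart actions in as arguments keeps the decision logic testable without touching a live server.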
>How-To-Repeat:
This problem occurs randomly as far as we can tell.  We have not been able to trigger it deliberately.  Since
the web server is restarted frequently as we make changes to the servlets (the product is still in beta), it is
not clear whether the problem is related to uptime or to the number of transactions since startup.  It has
occurred about once a week.  Our application is a 7x24 service used by major cell phone carriers, and even
short outages are not permitted.
>Fix:
Don't have the slightest idea, but it has the feel of a breakdown in communications between the web server
and JServ, or of JServ locking up on an internal error.
>Audit-Trail:
>Unformatted: