Posted to dev@tomcat.apache.org by Glenn Nielsen <gl...@mail.more.net> on 2003/09/25 15:53:35 UTC

[Fwd: Re: mod_jk does not detect a hung Tomcat]

Bill Barker wrote:
> ----- Original Message -----
> From: "Glenn Nielsen" <gl...@mail.more.net>
> To: "Tomcat Developers List" <to...@jakarta.apache.org>
> Sent: Wednesday, September 24, 2003 12:28 PM
> Subject: Re: mod_jk does not detect a hung Tomcat
> 
> 
> 
>>
>>Henri Gomez wrote:
>>
>>>David Rees wrote:
>>>
>>>
>>>>Henri Gomez said:
>>>>
>>>>
>>>>>Henri Gomez wrote:
>>>>>
>>>>>
>>>>>>>Nope, since you have to test not just at the protocol level but also at
>>>>>>>a higher level, for instance checking the full chain, up to servlet
>>>>>>>handling.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>It's easy to simulate this behavior by sending a STOP signal to
>>>>>>>>Tomcat.
>>>>>>>>
>>>>>>>>I've also attached a log from mod_jk showing the problem.  I marked
>>>>>>>>the
>>>>>>>>point at which processing in mod_jk stopped until I sent a CONT
>>>>>>>>signal to
>>>>>>>>tomcat.
>>>>>>>>
>>>>>>>>Does mod_jk2 have this same problem?  Is there any interest in fixing
>>>>>>>>this? Does anyone have a workaround for this issue?
>>>>>>>
>>>>>>>
>>>>>>>Well, if you have a hung Tomcat, you're probably already in serious
>>>>>>>trouble.
>>>>>>
>>>>
>>>>No, actually in my case I wasn't.  I had two Tomcats running, as one was
>>>>prone to locking up due to a JVM or application bug.  With a 50-50 load
>>>>distribution between two Tomcats, this left me with 1/2 of the requests
>>>>getting stuck and clients waiting forever and tying up Apache
>>>>processes. Eventually, a DOS will be the result if action is not taken
>>>>in time.  If mod_jk noticed it wasn't really alive, this wouldn't be an
>>>>issue at all.
>>>>
>>>>>>>Anyway, if we add something like a timeout to AJP requests, you could
>>>>>>>get stuck with long-running servlets. Also, jk reads requests in
>>>>>>>blocking mode for performance, so adding a timeout here is not an option.
>>>>>>
>>>>
>>>>Agreed that we wouldn't want a timeout normally to handle normal long
>>>>running servlet processes, but if there was a PING/PONG added to the
>>>>protocol there should be a timeout to prevent the above situation.
>>>>
>>>>
>>>>
>>>>>>When I worked on the ajp13++ (ajp14) protocol, I added a more secure auth
>>>>>>mechanism at connection time.
>>>>>>
>>>>>>Since there is bidirectional communication, jk could detect that
>>>>>>even if the connection is open, the remote didn't respond, and so fall
>>>>>>back to the next one in the cluster configuration.
>>>>>>
>>>>>>But on already established connections, the problem persists.
>>>>>>
>>>>>>Or we should add a PING/PONG before sending any request to Tomcat.
>>>>>>
>>>>>>It could be done as an option, but I would work on it only if many users
>>>>>>have such requirements
>>>>>
>>>>>
>>>>>if many users ask for such a feature ;)
>>>>
>>>>
>>>>
>>>>Well, you've got one so far.  ;-)  Adding a configurable option to have
>>>>mod_jk verify (PING/PONG) that Tomcat is actually responding before using
>>>>the connection would solve the problem and I can't imagine that it would
>>>>add a lot of complexity to the code as well.  If I wasn't so rusty with my
>>>>C programming and had some spare time, I would offer to help code it
>>>>up. ;-)  In any case, I'll be more than happy to help test.
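
The PING/PONG check discussed here could look roughly like the C sketch below. It is not AJP or mod_jk code: the one-byte PING_BYTE and PONG_BYTE messages and the function backend_is_alive are hypothetical, chosen only to show the idea of probing the connection with a bounded wait before a request is sent. A stopped Tomcat would still let the kernel accept the ping, but no reply would arrive, so the poll() timeout is what detects the hang.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>

/* Hypothetical one-byte probe messages; these are NOT real AJP packets. */
#define PING_BYTE 'P'
#define PONG_BYTE 'O'

/*
 * Send a ping on an already connected socket and wait up to timeout_ms
 * for the pong. Returns 1 if the peer answered in time, 0 otherwise.
 */
int backend_is_alive(int fd, int timeout_ms)
{
    char ping = PING_BYTE;
    char reply = 0;
    struct pollfd pfd;

    assert(fd >= 0);

    if (send(fd, &ping, 1, 0) != 1)
        return 0;                     /* could not even send the probe */

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    if (poll(&pfd, 1, timeout_ms) <= 0)
        return 0;                     /* timeout (hung peer) or poll error */

    if (recv(fd, &reply, 1, 0) != 1)
        return 0;                     /* connection closed or read error */

    return reply == PONG_BYTE;
}
```

Because the timeout applies only to the probe, a long-running servlet request sent afterwards would still be read in blocking mode as before.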
>>>
>>>
>>>Well, if you could find more users or at least one Tomcat committer
>>>(Glenn, Remy, Costin, JFC...) who needs it, I'll add the necessary code
>>>in java and C areas ;)
>>>
>>
>>
>>There may be a simple way to achieve what David is asking for without
>>setting a request timeout or implementing a PING/PONG between mod_jk
>>and Tomcat.
>>
>>What if each worker tracked the number of requests which were handled
>>by the worker since the last successful completion of a request?
>>
>>i.e. add the following to a worker
>>
>>worker->last_completed // Time in seconds since last successfully completed request
>>worker->requests_since_last_completed  // Number of requests sent to worker since last successful completion.
>>
>>Then logic could be added to try and detect an instance of Tomcat which has
>>failed.  Perhaps even allow several additional worker properties to determine
>>when mod_jk should consider the worker failed.
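
The bookkeeping proposed above can be sketched in C as follows. This is only an illustration, not mod_jk code: the two field names come from the proposal, while the types worker_health and health_limits, the threshold names, and the three functions are hypothetical. The sketch also assumes last_completed stores a timestamp rather than an elapsed time, and that a worker counts as failed only when both thresholds are exceeded.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical per-worker state; field names follow the proposal. */
typedef struct {
    time_t last_completed;                /* time of last successful completion */
    int    requests_since_last_completed; /* requests sent since that time */
} worker_health;

/* Hypothetical thresholds standing in for "additional worker properties". */
typedef struct {
    int max_pending;   /* tolerated requests without any completion */
    int max_idle_secs; /* tolerated seconds without any completion */
} health_limits;

/* Call when a request is handed to the worker. */
void worker_request_sent(worker_health *w)
{
    w->requests_since_last_completed++;
}

/* Call when a request completes successfully. */
void worker_request_completed(worker_health *w, time_t now)
{
    w->last_completed = now;
    w->requests_since_last_completed = 0;
}

/* Returns 1 if too many requests have gone unanswered for too long. */
int worker_looks_failed(const worker_health *w, const health_limits *lim,
                        time_t now)
{
    assert(w != NULL && lim != NULL);
    return w->requests_since_last_completed > lim->max_pending
        && difftime(now, w->last_completed) > (double)lim->max_idle_secs;
}
```

Requiring both conditions is one possible design choice: a single slow servlet keeps the pending count low, and a busy but healthy worker keeps resetting the timestamp.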
> 
> 
> This won't work  with the pre-fork MPM, since each Apache child will have
> its own idea of the timing.  The only way that it could tell that a Tomcat
> failed is to try the request and fail :).
> 

Argh, you are right, this goes back to the age-old problem of not being able
to write a global worker connection pool or shared memory with the current code.

The only way to move forward would be to rewrite mod_jk 1.2 to use APR.
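
The per-child problem can be demonstrated with a small C sketch. It uses plain POSIX fork() and mmap() rather than APR or anything from mod_jk, and both function names are hypothetical. A counter in ordinary memory that a forked child increments is never seen by the parent, which is the situation each prefork Apache child would be in; a counter placed in a shared mapping is seen by both.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Zero *counter, fork a child that increments it, wait for the child,
 * and return the value the parent sees afterwards (-1 if fork fails).
 */
int parent_view_after_child_increment(int *counter)
{
    pid_t pid;

    assert(counter != NULL);
    *counter = 0;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {          /* child: update the counter and exit */
        (*counter)++;
        _exit(0);
    }
    waitpid(pid, NULL, 0);   /* parent: wait, then read its own view */
    return *counter;
}

/*
 * Allocate one int in memory that stays shared across fork(), using a
 * MAP_SHARED mapping of /dev/zero. Returns NULL on failure.
 */
int *alloc_shared_int(void)
{
    void *p;
    int fd = open("/dev/zero", O_RDWR);

    if (fd < 0)
        return NULL;
    p = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    return p == MAP_FAILED ? NULL : (int *)p;
}
```

With a stack or heap variable the parent still reads 0 after the child's increment; with the pointer from alloc_shared_int() it reads 1. Real shared worker state would additionally need locking between the children.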

Glenn

----------------------------------------------------------------------
Glenn Nielsen             glenn@more.net | /* Spelin donut madder    |
MOREnet System Programming               |  * if iz ina coment.      |
Missouri Research and Education Network  |  */                       |
----------------------------------------------------------------------