Posted to dev@tomcat.apache.org by Glenn Nielsen <gl...@mail.more.net> on 2003/09/25 15:53:35 UTC
[Fwd: Re: mod_jk does not detect a hung Tomcat]
Bill Barker wrote:
> ----- Original Message -----
> From: "Glenn Nielsen" <gl...@mail.more.net>
> To: "Tomcat Developers List" <to...@jakarta.apache.org>
> Sent: Wednesday, September 24, 2003 12:28 PM
> Subject: Re: mod_jk does not detect a hung Tomcat
>
>
>
>>
>>Henri Gomez wrote:
>>
>>>David Rees wrote:
>>>
>>>
>>>>Henri Gomez said:
>>>>
>>>>
>>>>>Henri Gomez wrote:
>>>>>
>>>>>
>>>>>>>Nope, since you have to test not just at the protocol level but also
>>>>>>>at a higher level, for instance checking the full chain, up to servlet
>>>>>>>handling.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>It's easy to simulate this behavior by sending a STOP signal to
>>>>>>>>Tomcat.
>>>>>>>>
>>>>>>>>I've also attached a log from mod_jk showing the problem. I marked
>>>>>>>>the point at which processing in mod_jk stopped until I sent a CONT
>>>>>>>>signal to tomcat.
>>>>>>>>
>>>>>>>>Does mod_jk2 have this same problem? Is there any interest in fixing
>>>>>>>>this? Does anyone have a workaround for this issue?
>>>>>>>
>>>>>>>
>>>>>>>Well, if you have a hung tomcat, you're probably already in serious
>>>>>>>trouble.
>>>>>>
>>>>
>>>>No, actually in my case I wasn't. I had two Tomcats running, as one was
>>>>prone to locking up due to a JVM or application bug. With a 50-50 load
>>>>distribution between two Tomcats, this left me with 1/2 of the requests
>>>>getting stuck and clients waiting forever and tying up Apache
>>>>processes. Eventually, a DOS will be the result if action is not taken
>>>>in time. If mod_jk noticed it wasn't really alive, this wouldn't be an
>>>>issue at all.
>>>>
>>>>>>>Anyway, if we add something like a timeout to AJP requests, you could
>>>>>>>get stuck on long-running servlets. Also, jk reads requests in blocking
>>>>>>>mode for performance, so adding a timeout here is not an option.
>>>>>>
>>>>
>>>>Agreed that we wouldn't want a timeout normally to handle normal long
>>>>running servlet processes, but if there was a PING/PONG added to the
>>>>protocol there should be a timeout to prevent the above situation.
>>>>
>>>>
>>>>
>>>>>>When I worked on the ajp13++ (ajp14) protocol, I added a more secure
>>>>>>auth mechanism at connection time.
>>>>>>
>>>>>>Since the communication is bidirectional, jk could detect that even
>>>>>>if the connection is open, the remote isn't responding, and so fall
>>>>>>back to the next worker in the cluster configuration.
>>>>>>
>>>>>>But on already established connections, the problem persists.
>>>>>>
>>>>>>Or we should add a PING/PONG before sending any request to tomcat.
>>>>>>
>>>>>>It could be done as optional but I work on it only if many users make
>>>>>>such requirements
>>>>>
>>>>>
>>>>>if many users ask for such feature ;)
>>>>
>>>>
>>>>
>>>>Well, you've got one so far. ;-) Adding a configurable option to have
>>>>mod_jk verify (PING/PONG) that Tomcat is actually responding before
>>>>using the connection would solve the problem and I can't imagine that
>>>>it would add a lot of complexity to the code as well. If I wasn't so
>>>>rusty with my C programming and had some spare time, I would offer to
>>>>help code it up. ;-) In any case, I'll be more than happy to help test.
>>>
>>>
>>>Well, if you could find more users or at least one Tomcat committer
>>>(Glenn, Remy, Costin, JFC...) who needs it, I'll add the necessary code
>>>on the Java and C sides ;)
>>>
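For illustration only, the optional PING/PONG check discussed above could look roughly like the sketch below. The probe bytes and the function name `backend_responds` are hypothetical placeholders, not the AJP wire format or mod_jk code. The timeout applies only to the probe, so a long-running servlet would not be cut off by it.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

/* Placeholder probe bytes for this sketch; NOT the real AJP packet format. */
enum { PROBE_PING = 'P', PROBE_PONG = 'p' };

/* Send a one-byte probe on an already connected socket and wait up to
 * timeout_ms for the reply. Returns true only if the peer answered with
 * PROBE_PONG in time; a timeout or any socket error counts as "not
 * responding". */
bool backend_responds(int fd, int timeout_ms)
{
    assert(fd >= 0);

    unsigned char byte = PROBE_PING;
    if (send(fd, &byte, 1, 0) != 1)
        return false;

    /* Bound only the wait for the probe reply, not the request itself. */
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    if (poll(&pfd, 1, timeout_ms) <= 0)
        return false;

    if (recv(fd, &byte, 1, 0) != 1)
        return false;
    return byte == PROBE_PONG;
}
```

A caller would run this check on a pooled connection before forwarding a request and fall back to the next worker when it returns false.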
>>
>>
>>There may be a simple way to achieve what David is asking for without
>>setting a request timeout or implementing a PING/PONG between mod_jk
>>and Tomcat.
>>
>>What if each worker tracked the number of requests which were handled
>>by the worker since the last successful completion of a request?
>>
>>i.e. add the following to a worker
>>
>>worker->last_completed // Time in seconds since last successfully
>>                       // completed request
>>worker->requests_since_last_completed // Number of requests sent to worker
>>                                      // since last successful completion.
>>
>>Then logic could be added to try and detect an instance of Tomcat which has
>>failed. Perhaps even allow several additional worker properties to determine
>>when mod_jk should consider the worker failed.
>
>
> This won't work with the pre-fork MPM, since each Apache child will have
> its own idea of the timing. The only way that it could tell that a Tomcat
> failed is to try the request and fail :).
>
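As a rough illustration of the bookkeeping proposed above, here is a minimal sketch. The struct, the threshold names, and the helper functions are hypothetical and are not taken from the mod_jk source. It stores the completion timestamp rather than an elapsed time. As noted in the reply, under the pre-fork MPM each Apache child would hold its own private copy of this state.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-worker health record (illustrative names only). */
typedef struct {
    time_t   last_completed;                /* when a request last completed */
    unsigned requests_since_last_completed; /* requests sent since then */
} worker_health;

/* Hypothetical thresholds that could be exposed as worker properties. */
typedef struct {
    unsigned max_outstanding; /* requests tolerated without a completion */
    double   max_idle_secs;   /* seconds tolerated without a completion */
} worker_limits;

/* Call when a request is handed to the worker. */
void worker_request_sent(worker_health *w)
{
    assert(w != NULL);
    w->requests_since_last_completed++;
}

/* Call when the worker finishes a request successfully. */
void worker_request_completed(worker_health *w, time_t now)
{
    assert(w != NULL);
    w->last_completed = now;
    w->requests_since_last_completed = 0;
}

/* Suspect a hang only when BOTH thresholds are exceeded, so that a single
 * long-running servlet does not mark the worker as failed. */
bool worker_looks_hung(const worker_health *w, const worker_limits *lim,
                       time_t now)
{
    assert(w != NULL && lim != NULL);
    return w->requests_since_last_completed > lim->max_outstanding
        && difftime(now, w->last_completed) > lim->max_idle_secs;
}
```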
Argh, you are right. This goes back to the age-old problem of not being able
to write a global worker connection pool or use shared memory with the current
code. The only way to move forward would be to rewrite mod_jk 1.2 to use APR.
Glenn
----------------------------------------------------------------------
Glenn Nielsen glenn@more.net | /* Spelin donut madder |
MOREnet System Programming | * if iz ina coment. |
Missouri Research and Education Network | */ |
----------------------------------------------------------------------
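To illustrate the shared-state point raised in the thread, the sketch below keeps a counter in memory shared across forked processes. It uses plain POSIX-style `mmap` and C11 atomics as stand-ins; it does not use APR, and the type and function names are hypothetical. `MAP_ANONYMOUS` is assumed to be available, as it is on Linux, the BSDs, and macOS.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical worker state visible to every forked child. */
typedef struct {
    atomic_uint requests_since_last_completed;
} shared_worker_state;

/* Map the state MAP_SHARED before forking so that children update the
 * same memory instead of private copy-on-write pages. */
shared_worker_state *shared_state_create(void)
{
    void *mem = mmap(NULL, sizeof(shared_worker_state),
                     PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;
    shared_worker_state *s = mem;
    atomic_init(&s->requests_since_last_completed, 0);
    return s;
}

/* Fork `children` processes one after another; each records one request.
 * Returns the counter value as seen by the parent, or -1 if fork fails. */
int record_from_children(shared_worker_state *s, int children)
{
    assert(s != NULL);
    for (int i = 0; i < children; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            atomic_fetch_add(&s->requests_since_last_completed, 1);
            _exit(0);
        }
        /* Wait for the child, retrying if a signal interrupts the wait. */
        while (waitpid(pid, NULL, 0) < 0 && errno == EINTR) {
        }
    }
    return (int)atomic_load(&s->requests_since_last_completed);
}
```

With an ordinary (non-shared) struct, the parent would still read zero after the children exit, which is the per-child view described for the pre-fork MPM.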