Posted to dev@tomcat.apache.org by Lars George <la...@worldlingo.com> on 2003/05/16 11:21:05 UTC

Problems with hanging Ajp13Processors

Sorry for posting this both here and in the user list, but I feel this is a
developer issue.

Hi,

We have a big problem. We run 3 Apache 1.3.27 servers using mod_jk2 from the
connectors 4.1.24 package. mod_jk2 is set up as a load balancer talking to
4 app servers running Tomcat 4.1.24 - all of them run on Linux 2.2.14 servers.
Currently we are using the Ajp13Connector on the Tomcat side, but bear in
mind that the problem I describe here is also present with the
CoyoteConnector.

What happens is the following: as long as the request rate stays below a
certain threshold everything works fine, but once it goes over that threshold
more and more Ajp13Processor threads start to hang on the Tomcat side. We are
not talking about a very heavy load here, maybe 1-3 requests per second per
Tomcat.

Once over the threshold, the Ajp13Processors get to the point in Ajp13.java
where they try to receive the next request, or more specifically the header
of that next request. There is a "readN()" call where they just hang. Since
they never recover they stay stuck, and within a few minutes we run out of
processors and the Ajp13Connector starts rejecting further connections.
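
For reference, that blocking read is essentially a loop over the socket's
InputStream with no read timeout set, roughly like the sketch below (a
simplification, not the exact code from Ajp13.java; class and method names
are mine):

import java.io.IOException;
import java.io.InputStream;

// Simplified sketch of a readN()-style helper: keep reading until 'len'
// bytes have arrived. With no SO_TIMEOUT set on the socket, the read()
// call blocks indefinitely if the peer never sends anything.
public class BlockingReadSketch {
    static int readFully(InputStream in, byte[] buf, int off, int len)
            throws IOException {
        int total = 0;
        while (total < len) {
            int n = in.read(buf, off + total, len - total); // blocks here
            if (n < 0) {
                return total; // peer closed the connection cleanly
            }
            total += n;
        }
        return total;
    }
}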

We have 177 processors defined for each Tomcat, and apart from the fewer
than 10 that do the actual work, i.e. are still serving requests, we have
some 167 hanging on that read operation.

I checked the mod_jk2 sources plus the Tomcat connector sources and I cannot
see where this is happening or why. We average 70-80 concurrent requests per
Apache (using mod_status to get the current values), so with 3 Apaches that
is roughly 210-240 concurrent requests - far fewer than the 4 x 177 = 708
slots we should be able to handle on the Tomcats.

I actually hacked the code to put a timeout around that readN() where our
threads block, but this caused havoc on the Tomcats: they did not handle it
properly and accepted more and more requests without finishing them.
Anyhow, that was not the solution to our problem.
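
For illustration, a timeout around that read would look roughly like this (a
sketch only, not the actual change we made; the 60-second value and the names
are made up):

import java.io.InputStream;
import java.io.InterruptedIOException;
import java.net.Socket;

// Rough sketch of such a timeout hack: set SO_TIMEOUT on the AJP
// connection socket so the blocking read throws InterruptedIOException
// instead of hanging forever. The hard part is making the processor
// recycle itself cleanly afterwards.
public class TimeoutHackSketch {
    static int readWithTimeout(Socket socket, byte[] buf) throws Exception {
        socket.setSoTimeout(60 * 1000); // 60s read timeout, arbitrary value
        InputStream in = socket.getInputStream();
        try {
            return in.read(buf); // now fails after 60s instead of blocking
        } catch (InterruptedIOException e) {
            // Read timed out: drop the connection and free the processor.
            socket.close();
            return -1;
        }
    }
}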

I can only guess that it results from mod_jk2 attempting to connect three
times (the standard value) per failed worker. Assuming our Tomcat processors
somehow do not accept the request quickly enough, that means three processors
will be contacted. They all start receiving the request header after being
woken up, and that is where they sit, since nothing is ever sent. mod_jk2
does seem to close the socket for workers it considers failed, but apparently
no IOException is thrown on the Tomcat side for that socket.
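
Note that in Java a clean remote close does not show up as an IOException on
a blocking read anyway: read() simply returns -1 once the FIN arrives, and if
the peer disappears without a FIN the read just keeps blocking unless
SO_TIMEOUT or TCP keepalive kicks in. A tiny standalone demo of the
clean-close case (names are mine, just to illustrate the behaviour):

import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;

// Minimal demo: a clean close by the peer makes a blocking read()
// return -1 rather than throw an IOException. If the peer vanishes
// without closing, the same read() blocks indefinitely.
public class RemoteCloseDemo {
    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(0);
        Socket client = new Socket("localhost", server.getLocalPort());
        Socket accepted = server.accept();

        client.close();                    // peer closes cleanly (FIN)
        InputStream in = accepted.getInputStream();
        System.out.println(in.read());     // prints -1, no exception

        accepted.close();
        server.close();
    }
}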

Off-peak, when there is, say, 1 request or less per second per Tomcat, all
works fine. If the same problem were present then, the processors should
still fill up, just more slowly. But that does not seem to be the case; it
only ever happens under heavier load.

Also, when this happens, i.e. all 4 Tomcats have all 177 slots full each and
start refusing further connections, the Apache scoreboard as shown by
mod_status shows no pending request reads etc. - it only shows the
established connections. This is why I think it is caused by the internal
retries of mod_jk2.

Could there be a bug in the Linux kernel we use that causes the socket not
to be notified of its remote closure? But then, why does mod_jk2 think the
first worker it tries has failed in the first place? A timing issue?
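
If it really is a case of the peer vanishing without the close being seen,
perhaps enabling TCP keepalive on the accepted AJP sockets would at least
make such reads fail eventually instead of hanging forever (again just a
sketch, not tested; the class and method names are made up, and it depends
on the kernel's keepalive settings):

import java.net.ServerSocket;
import java.net.Socket;

// Sketch only: turn on TCP keepalive for accepted AJP sockets so that a
// peer that silently disappeared eventually causes the blocked read to
// fail (after the kernel's keepalive interval) instead of blocking forever.
public class KeepAliveSketch {
    static Socket acceptWithKeepAlive(ServerSocket server) throws Exception {
        Socket socket = server.accept();
        socket.setKeepAlive(true); // uses the kernel tcp_keepalive_* settings
        return socket;
    }
}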

Please, is there anyone out there who could help with this issue? I would
appreciate it greatly!

Kind regards,
Lars

