Posted to users@tomcat.apache.org by Ahmed Musa <do...@gmx.at> on 2008/02/20 15:39:38 UTC

mod_jk problems - worker goes into error state and does not recover

Hello all,
After long and unsuccessful research I hope someone can give me a hint about
the following problems.

Our Apache/mod_jk/Tomcat infrastructure ran without problems for about one
year. For the last two months, mod_jk errors have been occurring.
We upgraded mod_jk and made improvements in worker.properties. The problems
changed and became less frequent, but they still appear from time to time.
 
It seems that the mod_jk workers lose the connection to their Tomcat backend
servers; there are messages in the mod_jk log files that point in this
direction.
Normally this does not seem to be a big problem, but under certain conditions
(which ones?) a worker goes into an error state and cannot recover by itself;
recovery has to be done manually.

Problem 1: The Tomcats are reachable, so it is unclear why the workers think
the server is dead.
Problem 2: I have no idea why a worker goes into an error state and cannot
recover.
Problem 3: I am missing explanations of the logged messages. I read the
messages but cannot match them to the situation: when does a worker log
these messages?
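
One way to check reachability at the AJP level, rather than only at the TCP level, is an AJP13 CPing/CPong probe. As far as I understand, this is the same kind of ping that the connect_timeout and prepost_timeout settings rely on. The following is only a minimal Python sketch and not part of mod_jk; the function name ajp_ping is made up for illustration, and the packet bytes are the CPing and CPong packets as described in the AJP13 protocol documentation.

```python
import socket

# AJP13 CPing (web server -> Tomcat) and CPong (Tomcat -> web server) packets.
CPING = bytes([0x12, 0x34, 0x00, 0x01, 0x0A])
CPONG = bytes([0x41, 0x42, 0x00, 0x01, 0x09])


def ajp_ping(host, port, timeout=5.0):
    """Return True if host:port answers a CPing with a CPong within `timeout` seconds.

    `ajp_ping` is a hypothetical helper for this sketch, not a mod_jk function.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            sock.sendall(CPING)
            reply = b""
            # Read until the full 5-byte CPong arrived or the peer closed.
            while len(reply) < len(CPONG):
                chunk = sock.recv(len(CPONG) - len(reply))
                if not chunk:
                    break
                reply += chunk
            return reply == CPONG
    except OSError:
        # Covers refused connections, DNS failures and timeouts.
        return False


# Example call (host and port taken from the worker definition in this post):
# ajp_ping("AS-INETP101.AEAT.ALLIANZ.AT", 65001, timeout=7.0)
```

Running such a probe from the Apache host means the firewall between web server and Tomcat is on the tested path, which a check from inside the Tomcat network would not cover.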

[Wed Feb 20 10:04:01.889 2008] [19237:3086010048] [info] jk_handler::mod_jk.c (2270): Aborting connection for worker=ajp_ggi
[Wed Feb 20 10:04:39.799 2008] [19294:3086010048] [error] ajp_get_reply::jk_ajp_common.c (1623): (INETP1011) Timeout with waiting reply from tomcat. Tomcat is down, stopped or network problems (errno=110)
[Wed Feb 20 10:04:39.799 2008] [19294:3086010048] [error] ajp_service::jk_ajp_common.c (2034): (INETP1011) receiving reply from tomcat failed without recovery in send loop attempt=0
[Wed Feb 20 10:04:41.799 2008] [19294:3086010048] [error] service::jk_lb_worker.c (1105): unrecoverable error 504, request failed. Tomcat failed in the middle of request, we can't recover to another instance.

-> Which timeout is meant, and how does mod_jk decide that Tomcat is down?
Where can I find details on errno=110?
-> "receiving reply from tomcat failed without recovery in send loop
attempt=0": what does "without recovery in send loop" mean?
-> "unrecoverable error 504": where can I find details on this error?
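
For the errno part, the number is an operating system error code, not a mod_jk code, so it can be looked up on the machine that wrote the log. On Linux, 110 is ETIMEDOUT ("Connection timed out"); the numbering differs between operating systems, so the lookup should run on the Apache host itself. A minimal Python sketch, where describe_errno is a made-up helper name:

```python
import errno
import os


def describe_errno(code):
    """Return 'NAME: message' for an OS error number on the current platform.

    `describe_errno` is a hypothetical helper for this sketch.
    """
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


# The value logged by mod_jk in the messages above.
print(describe_errno(110))
```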

OK, I set the log level to debug. The course of events becomes clearer, but
more questions appear. There are socket numbers: which sockets are these, and
what do the numbers mean? For example:
will be shutting down socket 35 for worker INETP1021
What are the sockets used for? How many are there per worker? Can I
configure them?

=> In general: how can I solve such problems? I tried to look into the
mod_jk code, searching for error codes and error messages, but could not find
any relevant information.
I am also studying the log files, but cannot work out what really happens.
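
To make the log files easier to study, the entries can be split into fields and counted per worker. The sketch below is only an assumption about the layout, derived from the lines quoted above: timestamp, what appears to be process id and thread id, log level, function, source file and line, then the message, optionally starting with the worker name in parentheses. The names parse_line and errors_per_worker are made up for illustration.

```python
import re
from collections import Counter

# Layout inferred from the log lines quoted in this post (an assumption).
LOG_LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\] "
    r"\[(?P<pid>\d+):(?P<tid>\d+)\] "
    r"\[(?P<level>\w+)\] "
    r"(?P<function>\w+)::(?P<source>[\w.]+) \((?P<line>\d+)\): "
    r"(?P<message>.*)$"
)
# Some messages start with the worker name in parentheses, e.g. "(INETP1011) ...".
WORKER = re.compile(r"^\((?P<worker>[^)]+)\) ")


def parse_line(line):
    """Split one log line into a dict of fields, or return None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    worker = WORKER.match(entry["message"])
    entry["worker"] = worker.group("worker") if worker else None
    return entry


def errors_per_worker(lines):
    """Count [error] entries per worker name for lines that carry a worker name."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry and entry["level"] == "error" and entry["worker"]:
            counts[entry["worker"]] += 1
    return counts


SAMPLE = [
    "[Wed Feb 20 10:04:01.889 2008] [19237:3086010048] [info] "
    "jk_handler::mod_jk.c (2270): Aborting connection for worker=ajp_ggi",
    "[Wed Feb 20 10:04:39.799 2008] [19294:3086010048] [error] "
    "ajp_get_reply::jk_ajp_common.c (1623): (INETP1011) Timeout with waiting "
    "reply from tomcat. Tomcat is down, stopped or network problems (errno=110)",
    "[Wed Feb 20 10:04:39.799 2008] [19294:3086010048] [error] "
    "ajp_service::jk_ajp_common.c (2034): (INETP1011) receiving reply from "
    "tomcat failed without recovery in send loop attempt=0",
]

print(errors_per_worker(SAMPLE))
```

The function, source and line fields point at the place in the mod_jk source that wrote the message, which narrows down where to read the code.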

So maybe someone has an idea why the worker thinks that the corresponding
Tomcat is dead, and why it does not recover by itself.

I am also looking for tips on how I can help myself, and where to find
something about the error codes and messages in mod_jk.

thanks for your attention
Best
ahmed musa (writing from vienna)
 
Current infrastructure
We have 3 Apache web servers (2.2.6) running on CentOS release 4.3, kernel
version 2.6.9-34.
In front of the web servers there are two hardware load balancers (one per
location), but they play no role in this story.
The web servers are hosted at our ISP.
 
The web servers balance the requests via mod_jk (version 1.2.25) for approx.
10 webapps across 18 backend Tomcat servers (blade servers; because of
underlying application parts the OS is Windows 2003 Server, a long story not
worth explaining :-) ). The Tomcat servers get their data via requests
against DB2 servers/DB2 databases on the mainframe. The Tomcat servers are
in-house and are rebooted nightly because of automated deployment processes.

Between the web servers and the Tomcat servers there is a Checkpoint firewall.
 
All webapps are deployed on all Tomcats; only mod_jk routes the requests
to particular Tomcat instances.
(On each blade server there are two identical Tomcat instances running.)
 
Versions: Tomcat 5.5.17_11, JDK 1.5.0_11-b03. The requests against the
public website(s) are normal short-lived requests, and there are not many of
them.
Most webapps (portals) require a login and have a strong focus on business
logic, so the instances are big (many MBs in RAM), the sessions are sticky,
and the session timeout is 20 minutes. These also receive few requests. In
addition to the user requests, there are monitoring requests from our ISP.
The problems appear on servers/portals with very few user requests.

worker.properties
worker.list=ajp_bam,ajp_ggi,ajp_ad,ajp_svp,.......,jkstatus

worker.template.type=ajp13
worker.template.lbfactor=5
worker.template.socket_keepalive=1
worker.template.connect_timeout=7000
worker.template.prepost_timeout=5000
worker.template.reply_timeout=120000
worker.template.retries=6
worker.template.activation=Active
worker.template.recovery_options=7

worker.lbtemplate.type=lb
worker.lbtemplate.max_reply_timeouts=6
worker.lbtemplate.method=Session

# Production workers
# AS-INETP101 - 106 - 6/6 GGI
worker.INETP1011.host=AS-INETP101.AEAT.ALLIANZ.AT
worker.INETP1011.port=65001
worker.INETP1011.reference=worker.template

....many more of the same

then

worker.ajp_ad.reference=worker.lbtemplate
worker.ajp_ad.balance_workers=INETP1032,INETP1062

.... many more portals

and finally jkstatus

The JkMount is very simple:
JkMount /* ajp_ad    --- mostly the same for the other portals
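
The reference lines in the worker.properties excerpt above mean that a worker inherits every attribute of the referenced prefix, with its own explicit attributes taking precedence. To see which settings a worker such as INETP1011 effectively ends up with, the inheritance can be resolved by hand or with a small script. The following Python sketch only mimics that behaviour for illustration; load_properties and effective_settings are made-up names, and it does not guard against circular references.

```python
def load_properties(text):
    """Parse key=value lines into a dict, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def effective_settings(props, worker):
    """Return the worker's attributes after following its `reference` chain."""
    prefix = f"worker.{worker}."
    own = {k[len(prefix):]: v for k, v in props.items() if k.startswith(prefix)}
    ref = own.pop("reference", None)
    if ref is None:
        return own
    # "worker.template" -> "template"
    base = ref[len("worker."):] if ref.startswith("worker.") else ref
    merged = effective_settings(props, base)
    merged.update(own)  # explicit attributes override inherited ones
    return merged


# Excerpt of the configuration shown in this post.
EXCERPT = """
worker.template.type=ajp13
worker.template.socket_keepalive=1
worker.template.connect_timeout=7000
worker.template.prepost_timeout=5000
worker.template.reply_timeout=120000
worker.template.retries=6
worker.template.recovery_options=7
worker.INETP1011.host=AS-INETP101.AEAT.ALLIANZ.AT
worker.INETP1011.port=65001
worker.INETP1011.reference=worker.template
"""

for name, value in sorted(effective_settings(load_properties(EXCERPT), "INETP1011").items()):
    print(f"{name}={value}")
```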

The portals are virtual hosts on Apache.

Tomcat - server.xml (example)
<Connector port="65001" maxThreads="300" protocol="AJP/1.3" />
<Engine name="Catalina" jvmRoute="INETP5021" defaultHost="default">
......
  <Host name="slfinsol.com" appBase="webapps" unpackWARs="true"
        autoDeploy="false" deployOnStartup="false" xmlValidation="false"
        xmlNamespaceAware="false">
    <Alias>www.slfinsol.com</Alias>
    <Alias>web1.slfinsol.com</Alias>
    ...
    <Alias>testweb.slfinsol.com</Alias>
    .....
    <Valve className="org.apache.catalina.valves.AccessLogValve"
           directory="logs" prefix="swl_access_log." suffix=".txt"
           pattern="common" resolveHosts="false" />
    <Valve className="at.allianz.tomcat.valve.RequestTimeValve"/>
    <Valve className="at.allianz.tomcat.valve.WebcollaborationWorkaroundValve"/>
    <Context path="" docBase="swl" />
    <Context path="/monitor5" docBase="monitor" />
    <Context path="/swl" docBase="swl" />
  </Host>
-- 

--------------------------------------------
Dipl.Ing.Ahmed Musa
Arnethgasse 62/12
A-1160 Wien
ahmed.musa@gmx.at
http://www.positiver-dialog.at
--------------------------------------------