
Posted to users@tomcat.apache.org by ti...@bt.com on 2007/12/10 12:17:14 UTC

ISAPI JK2 ran better than JK, how can that be?

OK,
 
So our website keeps crashing over the past couple of weeks (usual story on this list eh?)
 
We've been running JK isapi plugin v1.2.15 for a fair while, but the isapi redirector log always contains huge numbers of errors being thrown (see snippet below). We were getting a complete failure of IIS to serve traffic, solved only by a restart of IIS and Tomcat.

Very recently we moved up to v1.2.25 in the hope of improving performance but it seems to have little effect- we're still getting high numbers of 503 responses sent back (maybe 5+ per minute).
We do however now serve static resources from IIS to reduce the use of the ISAPI calls where possible.
 
But here's the kicker: - previously this year we were still using a JK2 isapi_redirector2.dll, and that seemed to be serving comparable traffic rates with fewer errors (certainly no complete failures). No hard data to support this yet, just my recollection of serious outages over the past couple of years.
 
 
AWStats on our log files suggests our incident traffic is ~7 million pages per month, peaking at lunchtime & early evening at perhaps 3-5 reqs/sec.
 
Scaling to multiple tomcats is not an option right now due to 3rd party license costs in the webapp (it's a CMS system).
 
Our environment:
Java 1.3.1
Tomcat 4.1.18
IIS v5
IIS & Tomcat are co-located on same server  (4GB RAM, win2k o/s)
 
Questions:

- Are there obvious worker directives that would help the issue further ?
- In the list archives I've seen conflicting views on what to set connectionTimeout to be in the tomcat and worker config. Some say 0, some say 600 secs. Which tends to be more useful? 
- All of the 00002745 errors - do they indicate a network problem upstream of the server?
- When viewing the jkstatus page, the worker only shows type, host, address. I was expecting further data as listed in the legend. Am I missing something?
 
 
isapi log:
 
[Fri Dec 07 03:35:16 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002745
[Fri Dec 07 03:35:16 2007] [info]  jk_ajp_common.c (1384): Connection aborted or network problems
[Fri Dec 07 03:35:16 2007] [info]  jk_ajp_common.c (1731): Receiving from tomcat failed, because of client error without recovery in send loop 0
[Fri Dec 07 03:35:17 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002746
[Fri Dec 07 03:35:17 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002746
[Fri Dec 07 03:35:17 2007] [info]  jk_ajp_common.c (1384): Connection aborted or network problems
[Fri Dec 07 03:35:17 2007] [info]  jk_ajp_common.c (1384): Connection aborted or network problems
[Fri Dec 07 03:35:17 2007] [info]  jk_ajp_common.c (1731): Receiving from tomcat failed, because of client error without recovery in send loop 0
[Fri Dec 07 03:35:17 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002746
[Fri Dec 07 03:35:17 2007] [info]  jk_ajp_common.c (1731): Receiving from tomcat failed, because of client error without recovery in send loop 0
[Fri Dec 07 03:35:17 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002746
[Fri Dec 07 03:35:17 2007] [info]  jk_ajp_common.c (1384): Connection aborted or network problems
[Fri Dec 07 03:35:17 2007] [info]  jk_ajp_common.c (1384): Connection aborted or network problems
[Fri Dec 07 03:35:17 2007] [info]  jk_ajp_common.c (1731): Receiving from tomcat failed, because of client error without recovery in send loop 0
[Fri Dec 07 03:35:17 2007] [info]  jk_ajp_common.c (1731): Receiving from tomcat failed, because of client error without recovery in send loop 0
[Fri Dec 07 03:35:17 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002746
[Fri Dec 07 03:35:17 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002746
[Fri Dec 07 03:35:17 2007] [info]  jk_ajp_common.c (1384): Connection aborted or network problems
 
Our Tomcat connector config - 
 
<Connector className="org.apache.coyote.tomcat4.CoyoteConnector" redirectPort="8443" bufferSize="2048" port="8009" connectionTimeout="300000" scheme="http" enableLookups="false" secure="false" protocolHandlerClassName="org.apache.jk.server.JkCoyoteHandler" debug="0" disableUploadTimeout="false" proxyPort="0" maxProcessors="200" minProcessors="2" tcpNoDelay="true" acceptCount="20" useURIValidationHack="false">
  <Factory className="org.apache.catalina.net.DefaultServerSocketFactory"/>
</Connector>
 
worker.properties - 
 
worker.website.type=ajp13
worker.website.host=localhost
worker.website.port=8009
# 200 concurrent users
worker.website.connection_pool_size=200
worker.website.connection_pool_timeout=300 
 

 

---------------------------------------------------------------------
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org

RE: ISAPI JK2 ran better than JK, how can that be?

Posted by ti...@bt.com.

Martin
 
Yes, I've read that article, and we're on a win2k server box. That isn't the issue I believe.
 
thanks anyway
 
 
Tim

 
 
________________________________

From: Martin Gainty [mailto:mgainty@hotmail.com]
Sent: Mon 10/12/2007 13:28
To: Tomcat Users List
Subject: Re: ISAPI JK2 ran better than JK, how can that be?



Hi Tim

On non-server products like Windows 2000 Professional or Windows XP, the
number of concurrent connections is limited to 10.
This is discussed in Mladen's article located at
http://people.apache.org/~mturk/docs/article/ftwai.html

M-



Re: ISAPI JK2 ran better than JK, how can that be?

Posted by Martin Gainty <mg...@hotmail.com>.

Hi Tim

On non-server products like Windows 2000 Professional or Windows XP, the
number of concurrent connections is limited to 10.
This is discussed in Mladen's article located at
http://people.apache.org/~mturk/docs/article/ftwai.html

M-

Re: ISAPI JK2 ran better than JK, how can that be?

Posted by Rainer Jung <ra...@kippdata.de>.

Hi Tim,

tim.fulcher@bt.com wrote:
> But now I have a load balancer in place, with one worker. A status is
> shown below after about 6hrs of traffic. As you can see the client
> errors make up about 6% of all incident traffic, though they dwarf
> server errors.
> 
> Acc     Err   CE     RE   Wr    Rd     Busy   Max
> 42245   97    2556   0    24M   1.4G   18     60

OK, first of all we have 42245 requests forwarded since last restart of
the web server, if this is 6 hours, it's about 2 requests per second
(mean value, not peak). Your busyness (see also below) is 18. If your
busyness is often that high, it means that your mean response time
should be something like

busyness / load = 18 requests / (2 requests per second) = 9 seconds.
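
The estimate above is an application of Little's law (requests in flight = arrival rate x mean time in system). A minimal Python sketch with the unrounded numbers from the status snapshot; the function name is only illustrative, and the 6 hours are taken from Tim's description rather than measured:

```python
def mean_response_time(busy, requests, seconds):
    """Estimate mean response time in seconds as busyness / throughput."""
    throughput = requests / seconds  # mean requests per second
    return busy / throughput


# Values from the first status snapshot: Acc=42245, Busy=18, about 6 hours.
acc, busy, uptime_seconds = 42245, 18, 6 * 3600

print("mean load: %.2f requests/s" % (acc / uptime_seconds))
print("estimated mean response time: %.1f s"
      % mean_response_time(busy, acc, uptime_seconds))
```

This only holds if a busyness of 18 is typical for the period; a single snapshot may have caught a momentary spike.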

2556 client errors, i.e. either the request could not be read
completely or, more likely, the response could not be returned completely,
is quite a lot (more than 5%). Also, 97 errors really means that 97
times there was a serious problem between the web server and Tomcat. We
don't know, though, how often there was a root cause. Maybe it was only a
problem once, and 97 requests ran into it in quick succession.

> But then just as I'm writing this my site falls over - I get the
> Service Temporarily Unavailable consistently on my pages. I grabbed a
> snapshot of the status worker -

Good, so you know at least, that the web server is still able to respond :)

State     Acc     Err   CE     RE   Wr    Rd     Busy   Max
ERR/FRC   43181   691   2597   0    24M   1.4G   100    115

OK, so the plugin detected an error ("ERR") when talking to Tomcat.
Details on this error should be in the log file. Since the load balancer
only has one member, it doesn't take that member out of service
(which it would do if it had more members). Instead it does forced
recovery ("FRC"), i.e. it still sends requests there, although the
previous requests failed.

We can see that of the 936 requests forwarded since the last
snapshot, 594 ran into an error (the counter rose from 97 to 691), and of
the remaining 342 requests 41 got a client error (about 12%), even more
than the 5% we had over 6 hours.

The most important thing though is, that the busyness went up to 100
with a max of 115. Busyness is the number of requests, which are
currently being processed by the backend (more precisely those, that
have been forwarded by this web server and not fully returned yet).

The fact that your busyness went up that far usually means that
something in your backend got very slow. Whether this is true, and what is
slow, can best be analyzed using Java thread dumps of the backend process.

> Clearly there seemed to be some catastrophic occurrence that made the
> Error count rocket and the worker state change. I'm unfamiliar with
> the load balancer - will a state of ERR/FRC be rectified somehow? For
> now I just restarted IIS & Tomcat which appears to be the only method
> of recovery at present

To check the theory, that the problem lies within the backend, and not 
IIS, you could try to find out what's happening, if you only restart 
tomcat. If it's really the backend, the isapi redirector should be able 
to forward traffic again without IIS restart.

In order to prevent issues from long TC timeouts when you shut down the
backend, you could first set the status of the backend to "Stop" in the
load balancer. Then the load balancer will immediately answer all
requests with an error and not stack them up in front of the backend.
Then restart Tomcat and, after full Tomcat startup, set the load balancer
member back to "Active". If your service runs again, then it's likely a
backend problem.

Diagnosing without thread dumps might be hard. Have a look at the log files.

> cheers
> 
> 
> Tim

Regards,

Rainer


RE: ISAPI JK2 ran better than JK, how can that be?

Posted by ti...@bt.com.

Hmm, I'm running Tomcat 4.1.18 which didn't even support the %D logging directive :-/

But now I have a load balancer in place, with one worker. A status is shown below after about 6hrs of traffic.
As you can see the client errors make up about 6% of all incident traffic, though they dwarf server errors.

Type	 Sticky Sessions	 Force Sticky Sessions	 Retries	 LB Method	 Locking	 Recover Wait Time	 Max Reply Timeouts	
lb	 True	 False	 2	 Request	 Optimistic	 60	 0	

Good	 Degraded	 Bad/Stopped	 Busy	 Max Busy	 Next Maintenance	
1	 0	 0	 18	 60	 37/99	

Balancer Members

 	 Name	 Type	 Host	 Addr	 Act	 State	 D	 F	 M	 V	 Acc	 Err	 CE	 RE	 Wr	 Rd	 Busy	 Max	 Route	 RR	 Cd	 Rs	
	worker41	 ajp13	 localhost:8009	 127.0.0.1:8009	 ACT	 OK	 0	 1	 1	 140	 42245	 97	 2556	 0	 24M	 1.4G	 18	 60	 worker41	  	  	 0/0	

It certainly gives me a better picture of what's going on, but I'm no further towards what is stopping everything working on a daily basis.

But then just as I'm writing this my site falls over - I get the Service Temporarily Unavailable consistently on my pages. I grabbed a snapshot of the status worker - 

Type	 Sticky Sessions	 Force Sticky Sessions	 Retries	 LB Method	 Locking	 Recover Wait Time	 Max Reply Timeouts	
lb	 True	 False	 2	 Request	 Optimistic	 60	 0	

Good	 Degraded	 Bad/Stopped	 Busy	 Max Busy	 Next Maintenance	
1	 0	 0	 100	 115	 21/83	

Balancer Members

 	 Name	 Type	 Host	 Addr	 Act	 State	 D	 F	 M	 V	 Acc	 Err	 CE	 RE	 Wr	 Rd	 Busy	 Max	 Route	 RR	 Cd	 Rs	
	worker41	 ajp13	 localhost:8009	 127.0.0.1:8009	 ACT	 ERR/FRC	 0	 1	 1	 132	 43181	 691	 2597	 0	 24M	 1.4G	 100	 115	 worker41	  	  	 0/0	

Clearly there seemed to be some catastrophic occurrence that made the Error count rocket and the worker state change. I'm unfamiliar with the load balancer - will a state of ERR/FRC be rectified somehow? For now I just restarted IIS & Tomcat, which appears to be the only method of recovery at present.

cheers

Tim

________________________________

From: Rainer Jung [mailto:rainer.jung@kippdata.de]
Sent: Mon 10/12/2007 15:39
To: Tomcat Users List
Cc: Ludwig,GJA,Graeme,DGE R
Subject: Re: ISAPI JK2 ran better than JK, how can that be?


Re: ISAPI JK2 ran better than JK, how can that be?

Posted by Rainer Jung <ra...@kippdata.de>.

Hi Tim,

tim.fulcher@bt.com wrote:
> Hi  Rainer,
> 
> Thanks for the response. To cover a few points you made -
> 
> - Yes, I had a hunch long running requests are a problem; because of
> our application design, some pages invoked for the first time take a
> while (we can't cache them all!).  Is there an easy way to correlate
> (apart from timestamp) the errors in the isapi and the requests made
> to IIS ?  I mean can I get isapi log to show the URL being processed?

I'm afraid the answer most likely is "no". You can file an enhancement
request in our bugzilla. That way the feature might materialize one day.

For Apache httpd, correlation between error and log message is a little
better. Even though we don't include the URL, we include the PID and
thread ID, and both can be logged in the httpd access log. So having
time, process ID and thread ID usually makes it possible to correlate
successfully, although it is still some work and not perfect.

>  - I've now got the %D option in place on Tomcat to figure out from
> tomorrow which are the heavy pages - Yes, thread dumps on JDK 1.3 &
> TC4.1.x are tricky - I'm looking at the Tomcat JavaWrapper approach
> as a way forward. The version of the 3rd party product we have in
> place is only supported on jdk1.3  (this is a pretty ancient set up!)
>  - I agree that my incident traffic load is not huge, and should be
> supportable by the environment in place. - I'll try a load balancer
> worker to see if that tells me more info

At least it does tell you the number of errors a worker had, and also
the number of client errors. That way you can check quickly whether there
are more errors than client errors, and by polling the values, you can
find out how often and during which times things are happening. Only
counters though, no per-incident information.

The page can be configured to return machine-readable content. Have a
look at:

http://tomcat.apache.org/connectors-doc/reference/status.html
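
Because the status worker only exposes cumulative counters, polling means subtracting consecutive snapshots. A minimal Python sketch, using the two snapshots Tim posted in this thread as sample data; how the values are fetched from the status page is deliberately left out, and the dictionary keys and function name are hypothetical:

```python
def counter_deltas(before, after):
    """Return the per-counter increase between two cumulative snapshots."""
    return {name: after[name] - before[name] for name in before}


# Acc / Err / CE values copied from the two status snapshots in this thread.
snapshot_1 = {"acc": 42245, "err": 97, "ce": 2556}
snapshot_2 = {"acc": 43181, "err": 691, "ce": 2597}

delta = counter_deltas(snapshot_1, snapshot_2)
error_share = delta["err"] / delta["acc"]

print(delta)
print("share of forwarded requests that failed: %.0f%%" % (error_share * 100))
```

Running this at a fixed interval and logging the deltas with a timestamp would show when the error counter starts to climb.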

> If possible I'll have some more information in a day or so from
> this...
> 
> 
> cheers
> 
> 
> Tim

Regards,

Rainer

> 
> ________________________________
> 
> From: Rainer Jung [mailto:rainer.jung@kippdata.de]
> Sent: Mon 10/12/2007 12:11
> To: Tomcat Users List
> Subject: Re: ISAPI JK2 ran better than JK, how can that be?
> 
> 
> 
> Hi Tim,
> 
> tim.fulcher@bt.com wrote:
>> OK,
>> 
>> So our website keeps crashing over the past couple of weeks (usual 
>> story on this list eh?)
> 
> Not really (although a users list is always focused on problems and
> not on the working side of things ...)
> 
>> We've been running JK isapi plugin v1.2.15 for a fair while, but
>> the isapi redirector log always contains huge numbers of errors
>> being thrown (see snippet below). We were getting a complete
>> failure of IIS to serve traffic, solved only by a restart of IIS
>> and Tomcat.
>> 
>> Very recently we moved up to v1.2.25 in the hope of improving 
>> performance but it seems to have little effect- we're still getting
>>  high numbers of 503 responses sent back (maybe 5+ per minute). We
>> do however now serve static resources from IIS to reduce the use of
>> the ISAPI calls where possible.
> 
> The error number 2746 is hex for 10054, which is a connection reset
> by peer winsock error. peer in this case is your IIS client (browser
> etc.).
> 
> Often this is caused by long running requests, where the users press
> the retry/again button. Then the browser immediately closes the
> connection and uses a new one for the same request. When you are
> sending back the response later, the closed connection gets detected
> and logged.
> 
> You should configure logging of response durations to find out, if
> maybe you've got a problem with long running requests. You can do
> that using the JK request logging, or with the Tomcat access log (add
> format %D to your pattern, which is duration milliseconds).
> 
> Usually this does *not* mean, that restarting IIS or Tomcat will
> help.
> 
> Concerning Tomcat: you should do a couple of thread dumps before 
> restarting it. That way you can find out, if lots of requests got
> stuck inside the container, and if so, what they are actually doing
> or waiting for.
> 
> Concerning IIS: does "netstat -an" look fine, once you think you need
> to restart?
> 
>> But here's the kicker: - previously this year we were still using a
>>  JK2 isapi_redirector2.dll, and that seemed to be serving
>> comparable traffic rates with fewer errors (certainly no complete
>> failures). No hard data to support this yet, just my recollection
>> of serious outages over the past couple of years.
> 
> I think we should make a distinction between the number of log
> messages (here we simply might be more detailed with JK) and serious
> problems, like the container no longer responding, or responding too
> slowly.
> 
>> AWStats on our log files suggests our incident traffic is ~7
>> million pages per month, peaking at lunchtime & early evening at
>> perhaps 3-5 reqs/sec.
> 
> That's not a lot of traffic. What are average response times? Is it 
> usual webapp load, or very special use cases, like long running
> uploads or downloads?
> 
>> Scaling to multiple tomcats is not an option right now due to 3rd 
>> party license costs in the webapp (it's a CMS system).
> 
> The request numbers do not seem to justify horizontal scaling as an
> option that you should consider yet (unless request handling is
> very CPU intensive, or you need a lot of memory, or ...).
> 
>> Our environment:
>> Java 1.3.1
>> Tomcat 4.1.18
>> IIS v5
>> IIS & Tomcat are co-located on same server  (4GB RAM, win2k o/s)
> 
> Ooops. I'm not really sure about the behaviour of 1.3 for thread
> dumps. It's fine for 1.4.2, but you should test in a staging or dev
> system what happens with 1.3. Consider updating to 4.1.36 and, if
> possible, 1.4.2_some_recent_patch_level.
> 
>> Questions:
>> 
>> - Are there obvious worker directives that would help the issue
>>   further ?
>> - In the list archives I've seen conflicting views on what to set
>>   connectionTimeout to be in the tomcat and worker config. Some say 0,
>>   some say 600 secs. Which tends to be more useful?
>> - All of the 00002745 errors - do they indicate a network problem
>>   upstream of the server?
>> - When viewing the jkstatus page, the worker only shows type, host,
>>   address. I was expecting further data as listed in the legend. Am I
>>   missing something?
> 
> 2746: see above. I would not expect any worker setting to help if
> the root cause really is long running requests. Then you would
> really have to log request duration and do a couple of thread dumps
> to find out which requests are running too long, and for which reason.
> 
> jkstatus: add a load balancer worker to your ajp13 worker and use the
> load balancer as the worker you map. The load balancer keeps a lot of
> statistics and shows all the detailed information in jkstatus.
> Because of its manageability a load balancer is interesting, even if
> you have only one backend.
> 
>> isapi log:
>> 
>> [Fri Dec 07 03:35:16 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002745
>> [Fri Dec 07 03:35:16 2007] [info]  jk_ajp_common.c (1384): Connection aborted or network problems
>> [Fri Dec 07 03:35:16 2007] [info]  jk_ajp_common.c (1731): Receiving from tomcat failed, because of client error without recovery in send loop 0
>> [Fri Dec 07 03:35:17 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002746
>> [Fri Dec 07 03:35:17 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002746
>> [Fri Dec 07 03:35:17 2007] [info]  jk_ajp_common.c (1384): Connection aborted or network problems
>> [Fri Dec 07 03:35:17 2007] [info]  jk_ajp_common.c (1384): Connection aborted or network problems
>> [Fri Dec 07 03:35:17 2007] [info]  jk_ajp_common.c (1731): Receiving from tomcat failed, because of client error without recovery in send loop 0
>> [Fri Dec 07 03:35:17 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002746
>> [Fri Dec 07 03:35:17 2007] [info]  jk_ajp_common.c (1731): Receiving from tomcat failed, because of client error without recovery in send loop 0
>> [Fri Dec 07 03:35:17 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002746
>> [Fri Dec 07 03:35:17 2007] [info]  jk_ajp_common.c (1384): Connection aborted or network problems
>> [Fri Dec 07 03:35:17 2007] [info]  jk_ajp_common.c (1384): Connection aborted or network problems
>> [Fri Dec 07 03:35:17 2007] [info]  jk_ajp_common.c (1731): Receiving from tomcat failed, because of client error without recovery in send loop 0
>> [Fri Dec 07 03:35:17 2007] [info]  jk_ajp_common.c (1731): Receiving from tomcat failed, because of client error without recovery in send loop 0
>> [Fri Dec 07 03:35:17 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002746
>> [Fri Dec 07 03:35:17 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002746
>> [Fri Dec 07 03:35:17 2007] [info]  jk_ajp_common.c (1384): Connection aborted or network problems
>> 
>> Our Tomcat connector config -
>> 
>> <Connector className="org.apache.coyote.tomcat4.CoyoteConnector" 
>> redirectPort="8443" bufferSize="2048" port="8009" 
>> connectionTimeout="300000" scheme="http" enableLookups="false" 
>> secure="false" 
>> protocolHandlerClassName="org.apache.jk.server.JkCoyoteHandler" 
>> debug="0" disableUploadTimeout="false" proxyPort="0" 
>> maxProcessors="200" minProcessors="2" tcpNoDelay="true" 
>> acceptCount="20" useURIValidationHack="false">
>>   <Factory className="org.apache.catalina.net.DefaultServerSocketFactory"/>
>> </Connector>
>> 
>> worker.properties -
>> 
>> worker.website.type=ajp13
>> worker.website.host=localhost
>> worker.website.port=8009
>> # 200 concurrent users
>> worker.website.connection_pool_size=200
>> worker.website.connection_pool_timeout=300
> 
> Regards,
> 
> Rainer


RE: ISAPI JK2 ran better than JK, how can that be?

Posted by ti...@bt.com.

Hi  Rainer, 

Thanks for the response. To cover a few points you made -

- Yes, I had a hunch long running requests are a problem; because of our application design, some pages invoked for the first time take a while (we can't cache them all!).  Is there an easy way to correlate (apart from timestamp) the errors in the isapi and the requests made to IIS ?  I mean can I get isapi log to show the URL being processed?
- I've now got the %D option in place on Tomcat to figure out from tomorrow which are the heavy pages
- Yes, thread dumps on JDK 1.3 & TC4.1.x are tricky - I'm looking at the Tomcat JavaWrapper approach as a way forward. The version of the 3rd party product we have in place is only supported on jdk1.3  (this is a pretty ancient set up!)
- I agree that my incident traffic load is not huge, and should be supportable by the environment in place.
- I'll try a load balancer worker to see if that tells me more info

If possible I'll have some more information in a day or so from this...

cheers

Tim

________________________________

From: Rainer Jung [mailto:rainer.jung@kippdata.de]
Sent: Mon 10/12/2007 12:11
To: Tomcat Users List
Subject: Re: ISAPI JK2 ran better than JK, how can that be?

Hi Tim,

tim.fulcher@bt.com wrote:
> OK,
>
> So our website keeps crashing over the past couple of weeks (usual
> story on this list eh?)

Not really (although a users list is always focused on problems and not
on the working side of things ...)

> We've been running JK isapi plugin v1.2.15 for a fair while, but the
> isapi redirector log always contains huge numbers of errors being
> thrown (see snippet below). We were getting a complete failure of IIS
> to serve traffic, solved only by a restart of IIS and Tomcat.
>
> Very recently we moved up to v1.2.25 in the hope of improving
> performance but it seems to have little effect- we're still getting
> high numbers of 503 responses sent back (maybe 5+ per minute). We do
> however now serve static resources from IIS to reduce the use of the
> ISAPI calls where possible.

The error number 2746 is hex for 10054, which is the Winsock "connection
reset by peer" error. The peer in this case is your IIS client (browser
etc.).

Often this is caused by long-running requests, where the user presses
the retry/reload button. The browser then immediately closes the
connection and uses a new one for the same request. When the response
is sent back later, the closed connection is detected and logged.

You should configure logging of response durations to find out whether
you have a problem with long-running requests. You can do that using
the JK request logging, or with the Tomcat access log (add %D to your
pattern, which logs the duration in milliseconds).

Usually this does *not* mean that restarting IIS or Tomcat will help.

Concerning Tomcat: you should take a couple of thread dumps before
restarting it. That way you can find out whether lots of requests are
stuck inside the container and, if so, what they are actually doing or
waiting for.

Concerning IIS: does "netstat -an" look fine, once you think you need to
restart?

> But here's the kicker: - previously this year we were still using a
> JK2 isapi_redirector2.dll, and that seemed to be serving comparable
> traffic rates with fewer errors (certainly no complete failures). No
> hard data to support this yet, just my recollection of serious
> outages over the past couple of years.

I think we should make a distinction between the number of log messages
(here JK might simply be more detailed) and serious problems, like the
container no longer responding, or responding too slowly.

> AWStats on our log files suggests our incident traffic is ~7 million
> pages per month, peaking at lunchtime & early evening at perhaps 3-5
> reqs/sec.

That's not a lot of traffic. What are average response times? Is it
usual webapp load, or very special use cases, like long running uploads
or downloads?

> Scaling to multiple tomcats is not an option right now due to 3rd
> party license costs in the webapp (its a CMS system).

The request numbers don't suggest that scaling horizontally is an option
you need to consider yet (unless request handling is very CPU intensive,
or you need a lot of memory, or ...).

> Our environment:
> Java 1.3.1
> Tomcat 4.1.18
> IIS v5
> IIS & Tomcat are co-located on same server  (4GB RAM, win2k o/s)

Ooops. I'm not really sure about the behaviour of 1.3 for thread dumps.
It's fine for 1.4.2, but you should test in a staging or dev system
what happens with 1.3. Consider updating to 4.1.36 and, if possible,
1.4.2_some_recent_patch_level.

> Questions:
>
> - Are there obvious worker directives that would help the issue
>   further ?
> - In the list archives I've seen conflicting views on what to set
>   connectionTimeout to be in the tomcat and worker config. Some say 0,
>   some say 600 secs. Which tends to be more useful?
> - All of the 00002745 errors - do they indicate a network problem
>   upstream of the server?
> - When viewing the jkstatus page, the worker only shows type, host,
>   address. I was expecting further data as listed in the legend. Am I
>   missing something?

2746: see above. I would not expect any worker setting to help if the
root cause really is long-running requests. In that case you would have
to log request durations and take a couple of thread dumps to find out
which requests are running too long and for what reason.

jkstatus: add a load balancer worker on top of your ajp13 worker and use
the load balancer as the worker you map. The load balancer keeps a lot
of statistics and shows all the detailed information in jkstatus.
Because of its manageability, a load balancer is interesting even if you
have only one backend.

> isapi log:
>
> [Fri Dec 07 03:35:16 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002745
> [Fri Dec 07 03:35:16 2007] [info]  jk_ajp_common.c (1384): Connection aborted or network problems
> [Fri Dec 07 03:35:16 2007] [info]  jk_ajp_common.c (1731): Receiving from tomcat failed, because of client error without recovery in send loop 0
> [Fri Dec 07 03:35:17 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002746
> [Fri Dec 07 03:35:17 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002746
> [Fri Dec 07 03:35:17 2007] [info]  jk_ajp_common.c (1384): Connection aborted or network problems
> [Fri Dec 07 03:35:17 2007] [info]  jk_ajp_common.c (1384): Connection aborted or network problems
> [Fri Dec 07 03:35:17 2007] [info]  jk_ajp_common.c (1731): Receiving from tomcat failed, because of client error without recovery in send loop 0
> [Fri Dec 07 03:35:17 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002746
> [Fri Dec 07 03:35:17 2007] [info]  jk_ajp_common.c (1731): Receiving from tomcat failed, because of client error without recovery in send loop 0
> [Fri Dec 07 03:35:17 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002746
> [Fri Dec 07 03:35:17 2007] [info]  jk_ajp_common.c (1384): Connection aborted or network problems
> [Fri Dec 07 03:35:17 2007] [info]  jk_ajp_common.c (1384): Connection aborted or network problems
> [Fri Dec 07 03:35:17 2007] [info]  jk_ajp_common.c (1731): Receiving from tomcat failed, because of client error without recovery in send loop 0
> [Fri Dec 07 03:35:17 2007] [info]  jk_ajp_common.c (1731): Receiving from tomcat failed, because of client error without recovery in send loop 0
> [Fri Dec 07 03:35:17 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002746
> [Fri Dec 07 03:35:17 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002746
> [Fri Dec 07 03:35:17 2007] [info]  jk_ajp_common.c (1384): Connection aborted or network problems
>
> Our Tomcat connector config -
>
> <Connector className="org.apache.coyote.tomcat4.CoyoteConnector"
>            redirectPort="8443" bufferSize="2048" port="8009"
>            connectionTimeout="300000" scheme="http" enableLookups="false"
>            secure="false"
>            protocolHandlerClassName="org.apache.jk.server.JkCoyoteHandler"
>            debug="0" disableUploadTimeout="false" proxyPort="0"
>            maxProcessors="200" minProcessors="2" tcpNoDelay="true"
>            acceptCount="20" useURIValidationHack="false">
>   <Factory className="org.apache.catalina.net.DefaultServerSocketFactory"/>
> </Connector>
>
> worker.properties -
>
> worker.website.type=ajp13
> worker.website.host=localhost
> worker.website.port=8009
> # 200 concurrent users
> worker.website.connection_pool_size=200
> worker.website.connection_pool_timeout=300

Regards,

Rainer


Re: ISAPI JK2 ran better than JK, how can that be?

Posted by Rainer Jung <ra...@kippdata.de>.

Hi Tim,

tim.fulcher@bt.com wrote:
> OK,
> 
> So our website keeps crashing over the past couple of weeks (usual
> story on this list eh?)

Not really (although a users list is always focused on problems and not 
on the working side of things ...)

> We've been running JK isapi plugin v1.2.15 for a fair while, but the
> isapi redirector log always contains huge numbers of errors being
> thrown (see snippet below). We were getting a complete failure of IIS
> to serve traffic, solved only by a restart of IIS and Tomcat.
>
> Very recently we moved up to v1.2.25 in the hope of improving
> performance but it seems to have little effect- we're still getting
> high numbers of 503 responses sent back (maybe 5+ per minute). We do
> however now serve static resources from IIS to reduce the use of the
> ISAPI calls where possible.

The error number 2746 is hex for 10054, which is the Winsock "connection
reset by peer" error. The peer in this case is your IIS client (browser
etc.).
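
As a quick check of that conversion, here is a minimal Python sketch. It
is not part of JK; the function name and the small lookup table are my
own illustration, and the table only covers a few well-known Winsock
codes. Note that the log snippet also contains 00002745, which decodes
to 10053, the Winsock "software caused connection abort" code.

```python
# Hypothetical helper: decode the zero-padded hex codes that
# "WriteClient failed with ..." prints into Winsock error names.
WINSOCK_ERRORS = {
    10053: "WSAECONNABORTED (software caused connection abort)",
    10054: "WSAECONNRESET (connection reset by peer)",
    10060: "WSAETIMEDOUT (connection timed out)",
}


def decode_jk_error(hex_code: str) -> str:
    """Turn e.g. '00002746' into '00002746 = 10054 = WSAECONNRESET (...)'."""
    number = int(hex_code, 16)  # the log prints the code in hexadecimal
    name = WINSOCK_ERRORS.get(number, "unknown (not in this small table)")
    return f"{hex_code} = {number} = {name}"


for code in ("00002745", "00002746"):
    print(decode_jk_error(code))
# 00002745 = 10053 = WSAECONNABORTED (software caused connection abort)
# 00002746 = 10054 = WSAECONNRESET (connection reset by peer)
```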

Often this is caused by long-running requests, where the user presses
the retry/reload button. The browser then immediately closes the
connection and uses a new one for the same request. When the response
is sent back later, the closed connection is detected and logged.

You should configure logging of response durations to find out whether
you have a problem with long-running requests. You can do that using
the JK request logging, or with the Tomcat access log (add %D to your
pattern, which logs the duration in milliseconds).
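
Once %D is in the access log, something like the following Python sketch
could rank URLs by their slowest response. It assumes a common-log-style
pattern with %D appended as the last field (for example
%h %l %u %t "%r" %s %b %D); the function name, the regular expression
and the sample lines are made up for illustration, so adjust them to
your actual pattern.

```python
import re
from collections import defaultdict

# Assumed layout: ... "METHOD URL PROTOCOL" STATUS BYTES DURATION_MS
LINE_RE = re.compile(
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+ (?P<ms>\d+)\s*$'
)


def slowest_urls(lines, top=5):
    """Return (url, max_ms, hits) tuples, slowest maximum duration first."""
    stats = defaultdict(lambda: [0, 0])  # url -> [max_ms, hits]
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # skip lines that do not fit the assumed pattern
        entry = stats[match.group("url")]
        entry[0] = max(entry[0], int(match.group("ms")))
        entry[1] += 1
    ranked = sorted(
        ((url, v[0], v[1]) for url, v in stats.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:top]


# Made-up sample lines, only to show the expected input shape.
sample = [
    '10.0.0.1 - - [07/Dec/2007:03:35:16 +0000] "GET /cms/page1.jsp HTTP/1.1" 200 5120 42000',
    '10.0.0.2 - - [07/Dec/2007:03:35:17 +0000] "GET /cms/page2.jsp HTTP/1.1" 200 2048 150',
    '10.0.0.3 - - [07/Dec/2007:03:35:17 +0000] "GET /cms/page1.jsp HTTP/1.1" 200 5120 310',
]
for url, max_ms, hits in slowest_urls(sample):
    print(f"{max_ms:>8} ms  {hits:>4} hits  {url}")
```

In practice you would pass an open log file instead of the sample list,
e.g. slowest_urls(open(path)) with path pointing at your access log.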

Usually this does *not* mean that restarting IIS or Tomcat will help.

Concerning Tomcat: you should take a couple of thread dumps before
restarting it. That way you can find out whether lots of requests are
stuck inside the container and, if so, what they are actually doing or
waiting for.

Concerning IIS: does "netstat -an" look fine, once you think you need to 
restart?

> But here's the kicker: - previously this year we were still using a
> JK2 isapi_redirector2.dll, and that seemed to be serving comparable
> traffic rates with fewer errors (certainly no complete failures). No
> hard data to support this yet, just my recollection of serious
> outages over the past couple of years.

I think we should make a distinction between the number of log messages
(here JK might simply be more detailed) and serious problems, like the
container no longer responding, or responding too slowly.

> AWStats on our log files suggests our incident traffic is ~7 million
> pages per month, peaking at lunchtime & early evening at perhaps 3-5
> reqs/sec.

That's not a lot of traffic. What are average response times? Is it 
usual webapp load, or very special use cases, like long running uploads 
or downloads?

> Scaling to multiple tomcats is not an option right now due to 3rd
> party license costs in the webapp (its a CMS system).

The request numbers don't suggest that scaling horizontally is an option
you need to consider yet (unless request handling is very CPU intensive,
or you need a lot of memory, or ...).

> Our environment:
> Java 1.3.1
> Tomcat 4.1.18
> IIS v5
> IIS & Tomcat are co-located on same server  (4GB RAM, win2k o/s)

Ooops. I'm not really sure about the behaviour of 1.3 for thread dumps.
It's fine for 1.4.2, but you should test in a staging or dev system
what happens with 1.3. Consider updating to 4.1.36 and, if possible,
1.4.2_some_recent_patch_level.

> Questions:
>
> - Are there obvious worker directives that would help the issue
>   further ?
> - In the list archives I've seen conflicting views on what to set
>   connectionTimeout to be in the tomcat and worker config. Some say 0,
>   some say 600 secs. Which tends to be more useful?
> - All of the 00002745 errors - do they indicate a network problem
>   upstream of the server?
> - When viewing the jkstatus page, the worker only shows type, host,
>   address. I was expecting further data as listed in the legend. Am I
>   missing something?

2746: see above. I would not expect any worker setting to help if the
root cause really is long-running requests. In that case you would have
to log request durations and take a couple of thread dumps to find out
which requests are running too long and for what reason.

jkstatus: add a load balancer worker on top of your ajp13 worker and use
the load balancer as the worker you map. The load balancer keeps a lot
of statistics and shows all the detailed information in jkstatus.
Because of its manageability, a load balancer is interesting even if you
have only one backend.
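
A minimal sketch of such a setup, based on the worker.properties quoted
below, could look like this. The worker names "lb" and "jkstatus" are
placeholders I picked for illustration; only "website" comes from your
config. Treat it as a starting point to check against the workers
reference for your JK version, not as a tested configuration.

```properties
# Only the balancer and the status worker are listed; the ajp13 worker
# is reached through the balancer (names "lb" and "jkstatus" are assumed).
worker.list=lb,jkstatus

# Existing ajp13 worker, unchanged
worker.website.type=ajp13
worker.website.host=localhost
worker.website.port=8009
worker.website.connection_pool_size=200
worker.website.connection_pool_timeout=300

# Load balancer with a single member
worker.lb.type=lb
worker.lb.balance_workers=website

# Status worker that renders the jkstatus page
worker.jkstatus.type=status
```

Your URL mappings would then point at "lb" instead of "website", and the
status page would need its own mapping to the "jkstatus" worker.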

> isapi log:
> 
> [Fri Dec 07 03:35:16 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002745
> [Fri Dec 07 03:35:16 2007] [info]  jk_ajp_common.c (1384): Connection aborted or network problems
> [Fri Dec 07 03:35:16 2007] [info]  jk_ajp_common.c (1731): Receiving from tomcat failed, because of client error without recovery in send loop 0
> [Fri Dec 07 03:35:17 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002746
> [Fri Dec 07 03:35:17 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002746
> [Fri Dec 07 03:35:17 2007] [info]  jk_ajp_common.c (1384): Connection aborted or network problems
> [Fri Dec 07 03:35:17 2007] [info]  jk_ajp_common.c (1384): Connection aborted or network problems
> [Fri Dec 07 03:35:17 2007] [info]  jk_ajp_common.c (1731): Receiving from tomcat failed, because of client error without recovery in send loop 0
> [Fri Dec 07 03:35:17 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002746
> [Fri Dec 07 03:35:17 2007] [info]  jk_ajp_common.c (1731): Receiving from tomcat failed, because of client error without recovery in send loop 0
> [Fri Dec 07 03:35:17 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002746
> [Fri Dec 07 03:35:17 2007] [info]  jk_ajp_common.c (1384): Connection aborted or network problems
> [Fri Dec 07 03:35:17 2007] [info]  jk_ajp_common.c (1384): Connection aborted or network problems
> [Fri Dec 07 03:35:17 2007] [info]  jk_ajp_common.c (1731): Receiving from tomcat failed, because of client error without recovery in send loop 0
> [Fri Dec 07 03:35:17 2007] [info]  jk_ajp_common.c (1731): Receiving from tomcat failed, because of client error without recovery in send loop 0
> [Fri Dec 07 03:35:17 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002746
> [Fri Dec 07 03:35:17 2007] [error] jk_isapi_plugin.c (639): WriteClient failed with 00002746
> [Fri Dec 07 03:35:17 2007] [info]  jk_ajp_common.c (1384): Connection aborted or network problems
> 
> Our Tomcat connector config -
> 
> <Connector className="org.apache.coyote.tomcat4.CoyoteConnector"
>            redirectPort="8443" bufferSize="2048" port="8009"
>            connectionTimeout="300000" scheme="http" enableLookups="false"
>            secure="false"
>            protocolHandlerClassName="org.apache.jk.server.JkCoyoteHandler"
>            debug="0" disableUploadTimeout="false" proxyPort="0"
>            maxProcessors="200" minProcessors="2" tcpNoDelay="true"
>            acceptCount="20" useURIValidationHack="false">
>   <Factory className="org.apache.catalina.net.DefaultServerSocketFactory"/>
> </Connector>
> 
> worker.properties -
> 
> worker.website.type=ajp13
> worker.website.host=localhost
> worker.website.port=8009
> # 200 concurrent users
> worker.website.connection_pool_size=200
> worker.website.connection_pool_timeout=300

Regards,

Rainer
