Posted to users@tomcat.apache.org by Dmitry Beransky <dm...@gmail.com> on 2009/05/01 18:19:54 UTC

jk-to-tomcat multiple retries

Hi,

The strangest problem started happening to us a few weeks ago
(after several years of running pretty much the same configuration).

1. The problem is only happening in the production environment.  We
cannot reproduce it on staging, which as far as we can tell is
configured identically to production.
2. The problem seems to be tied to the traffic volume and possibly the
traffic pattern (which is probably why we cannot reproduce it on staging).
3. Our configuration:  W2K server running IIS 6 and JK 1.2.27, Tomcat
v. 5.5.12 running on the same box.

The problem is as follows: some requests result in multiple copies of
the first buffer-full of expected data, followed by either a 503 error
page, a 502 error page, or the rest of the proper page.  The number of
copies is directly related to JK's retries setting.  With retries
initially set to 10, the maximum number of repeated copies in a
response was 20.  When we set retries to 2, invalid replies contained
only a single buffer-full of the page's proper data followed by an
error page.  On the Tomcat side, these multiple copies show up as
multiple request entries in the access log, while there is only one
corresponding request entry in the IIS log.
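
One way to quantify the "multiple request entries" symptom is to count
identical entries in the Tomcat access log.  The following is only a
minimal sketch: it assumes the AccessLogValve "common" pattern
(%h %l %u %t "%r" %s %b), and the function name and the same-second
grouping are illustrative choices, not anything taken from this setup.
Retries that straddle a second boundary would be missed by this
simplification.

```python
import re
from collections import Counter

# Assumption: access log lines follow the "common" pattern
#   %h %l %u %t "%r" %s %b
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+$'
)

def repeated_requests(lines, threshold=2):
    """Return {(host, time, request): count} for entries that appear
    at least `threshold` times with the same timestamp."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            key = (match.group("host"), match.group("time"),
                   match.group("request"))
            counts[key] += 1
    return {key: n for key, n in counts.items() if n >= threshold}
```

Comparing the keys this returns against the single matching entry in
the IIS log would show how many times JK resent each request.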

After a fresh restart of Tomcat, it takes a little while for the
problem to start manifesting itself.  Over time, it affects more and
more requests until Tomcat finally locks up entirely.  At that point
Tomcat needs to be restarted... lather, rinse, repeat...

Here's what a sample of error messages in JK's log looks like:

[2760:2476] [error] jk_isapi_plugin.c (1199): WriteClient failed with
10053 (0x00002745)
[2760:4020] [error] jk_ajp_common.c (1726): Chunk length too large.
Length of AJP message is 8188, chunk length is 8192.
[2760:4020] [error] jk_ajp_common.c (2426): (default_1) connecting to
tomcat failed.
[2760:1104] [error] jk_ajp_common.c (1726): Chunk length too large.
Length of AJP message is 8188, chunk length is 8192.
[2760:1104] [error] jk_ajp_common.c (2426): (default_1) connecting to
tomcat failed.
[2760:1104] [error] jk_ajp_common.c (1726): Chunk length too large.
Length of AJP message is 8188, chunk length is 8192.
[2760:1104] [error] jk_ajp_common.c (2426): (default_1) connecting to
tomcat failed.
[2760:1104] [error] jk_lb_worker.c (1432): All tomcat instances
failed, no more workers left
[2760:1104] [error] jk_isapi_plugin.c (2199): service() failed with
http error 503
[2760:3876] [error] jk_ajp_common.c (1726): Chunk length too large.
Length of AJP message is 8188, chunk length is 8192.
[2760:3876] [error] jk_ajp_common.c (2426): (default_1) connecting to
tomcat failed.
[2760:3232] [error] jk_ajp_common.c (1726): Chunk length too large.
Length of AJP message is 8188, chunk length is 8192.
[2760:3232] [error] jk_ajp_common.c (2426): (default_1) connecting to
tomcat failed.
[2760:3232] [error] jk_ajp_common.c (1726): Chunk length too large.
Length of AJP message is 8188, chunk length is 8192.
[2760:3232] [error] jk_ajp_common.c (2426): (default_1) connecting to
tomcat failed.
[2760:3232] [error] jk_lb_worker.c (1432): All tomcat instances
failed, no more workers left
[2760:3232] [error] jk_isapi_plugin.c (2199): service() failed with
http error 503
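
To see how many failures each JK thread logged before giving up with a
503, the excerpt above can be grouped by thread id.  This is a rough
sketch that assumes the log lines look exactly like the excerpt as
posted (no timestamp prefix, entries wrapped onto a second line by the
mail client); the function names are made up for illustration.

```python
import re
from collections import Counter, defaultdict

# Assumption: one entry looks like
#   [pid:tid] [level] source.c (line): message
ENTRY_RE = re.compile(
    r'^\[(?P<pid>\d+):(?P<tid>\d+)\] \[(?P<level>\w+)\] '
    r'(?P<src>\S+) \((?P<line>\d+)\): (?P<msg>.*)$'
)

def join_wrapped(lines):
    """Re-join wrapped entries: a line that does not start with '['
    is treated as the continuation of the previous entry."""
    entries = []
    for raw in lines:
        text = raw.strip()
        if not text:
            continue
        if text.startswith("[") or not entries:
            entries.append(text)
        else:
            entries[-1] += " " + text
    return entries

def summarize(lines):
    """Return {thread_id: Counter({message: count})}."""
    per_thread = defaultdict(Counter)
    for entry in join_wrapped(lines):
        match = ENTRY_RE.match(entry)
        if match:
            per_thread[match.group("tid")][match.group("msg")] += 1
    return dict(per_thread)
```

Running summarize() over the full mod_jk log would show whether the
count of "connecting to tomcat failed" per thread tracks the retries
value in use at the time.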

The chunk length messages have been in our logs forever.  Yesterday I
temporarily changed the JK and Tomcat configuration so that the packet
sizes matched.  The chunk errors went away, but the problem seemed to
persist, so I put everything back the way it was.
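
For readers puzzling over the 8188 and 8192 in those messages, here is
the arithmetic as a small sketch.  It assumes the AJP13 packet layout
as I understand it from the protocol documentation (a 4-byte packet
header, then for a "send body chunk" message a 1-byte prefix code and a
2-byte chunk length); the exact check in jk_ajp_common.c may differ in
detail, for example by accounting for a trailing terminator byte.

```python
AJP_HEADER_BYTES = 4        # assumed: 2 magic bytes + 2-byte payload length
SEND_BODY_CHUNK_PREFIX = 3  # assumed: 1-byte prefix code + 2-byte chunk length

def max_payload(packet_size):
    """Largest payload that fits in one AJP packet of `packet_size` bytes."""
    return packet_size - AJP_HEADER_BYTES

def chunk_fits(message_len, chunk_len):
    """True if a body chunk of `chunk_len` bytes can fit inside a message
    whose payload is `message_len` bytes long."""
    return chunk_len <= message_len - SEND_BODY_CHUNK_PREFIX

print(max_payload(8192))       # payload length for an 8192-byte packet
print(chunk_fits(8188, 8192))  # the combination reported in the log
```

Under these assumptions a message of length 8188 cannot carry a chunk
that claims to be 8192 bytes long, which is consistent with JK
rejecting the message.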

To me this looks like some weird error condition in Tomcat has hit an
obscure bug in JK whereby it doesn't clear the response buffer between
retries.  Has anyone encountered this issue before, or is anyone
willing to lend a helping hand in troubleshooting?


Thanks
Dmitry

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


Re: jk-to-tomcat multiple retries

Posted by Rainer Jung <ra...@kippdata.de>.
Thanks for letting us know.

On 06.05.2009 19:32, Dmitry Beransky wrote:
> We were finally allowed to upgrade to Tomcat 5.5.27 and that seems to
> have done away with the symptoms (I'm reluctant to say that upgrading
> fixed the problem, since I don't even know what it was in the first
> place ;-)
> 
> Thanks for the help, everyone.
> d.



Re: jk-to-tomcat multiple retries

Posted by Dmitry Beransky <dm...@gmail.com>.
We were finally allowed to upgrade to Tomcat 5.5.27 and that seems to
have done away with the symptoms (I'm reluctant to say that upgrading
fixed the problem, since I don't even know what it was in the first
place ;-)

Thanks for the help, everyone.
d.




Re: jk-to-tomcat multiple retries

Posted by Rainer Jung <ra...@kippdata.de>.
On 01.05.2009 18:19, Dmitry Beransky wrote:
> The Chunk length messages have been in our logs forever.  Yesterday I
> temporarily changed JK & Tomcat configuration matching the packet
> sizes.  The chunk errors went away, but the problem seemed to persist,
> so I put everything back the way it was.

The chunk length message seems pretty weird. It looks like protocol
corruption, which indicates that you should really try a TC update.
Concerning your restriction "can't update before any other options are
exhausted": the other options will never all be exhausted. But after
some options have been tried, the rest get more and more expensive and
risky, with a low chance of success.

> To me this looks like some weird error condition in Tomcat has hit an
> obscure bug in JK whereby it doesn't clear the response buffer between
> retries.  Has anyone encountered this issue before, or is anyone
> willing to lend a helping hand in troubleshooting?

I have not encountered this before, and I think no one has reported a
similar observation. Concerning "retries": could you provide your full
configuration? (For example, retries for an ajp13 worker is something
very different from retries for a load balancer worker.)

Regards,

Rainer
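
To make the distinction about "retries" concrete, a workers.properties
fragment might look roughly like the one below.  This is an
illustrative sketch, not the configuration from this thread: the worker
name default_1 is taken from the log excerpt, while the balancer name,
host, port, and all values are assumptions.  The precise semantics of
each directive should be checked against the workers.properties
reference for the JK version in use.

```properties
# Illustrative only; names and values are assumptions.
worker.list=balancer

# ajp13 member worker: "retries" here applies to resending a request
# to this one Tomcat instance after a communication error.
worker.default_1.type=ajp13
worker.default_1.host=localhost
worker.default_1.port=8009
worker.default_1.retries=2
# If changed from the default, this has to agree with the packet size
# configured on the Tomcat AJP connector.
worker.default_1.max_packet_size=8192

# lb worker: "retries" is a separate setting with its own meaning.
worker.balancer.type=lb
worker.balancer.balance_workers=default_1
worker.balancer.retries=2
```

Because the two settings are independent, knowing which of them was
changed from 10 to 2 matters when interpreting the number of repeated
copies in a response.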



Re: jk-to-tomcat multiple retries

Posted by Christopher Schultz <ch...@christopherschultz.net>.
Dmitry,

On 5/1/2009 4:18 PM, Dmitry Beransky wrote:
> Hence, I'm asking here for help
> diagnosing the problem, but if everyone else is as stumped as I am
> then, sure, the next (reluctant) step would be to upgrade Tomcat in
> hope that it will go away (but as we all know, hope is not a strategy
> :-)

Yeah, I can't really lend any specific advice. If it's any consolation,
the upgrade should be relatively painless. There are only a few things
that might break - mostly due to changes in default security settings like

You should carefully read through
http://tomcat.apache.org/tomcat-5.5-doc/changelog.html
and
http://tomcat.apache.org/security-5.html
for changes since your version.

Specifically, 5.5.22 contains a change to URL encoding that may affect you.

Good luck,
-chris



Re: jk-to-tomcat multiple retries

Posted by Dmitry Beransky <dm...@gmail.com>.
> If you suspect a bug in Tomcat, wouldn't upgrading to a version of
> Tomcat where that bug had potentially been fixed be a good strategy?

True, but the problem is that I don't have enough data to make a
decision.  For all I know, there might be something really bad in our
code that's triggering this weirdness, something that can be fixed
with a 5-minute change. I would like to figure this out before
just "reinstalling Windows" :)  Hence, I'm asking here for help
diagnosing the problem, but if everyone else is as stumped as I am
then, sure, the next (reluctant) step would be to upgrade Tomcat in
hope that it will go away (but as we all know, hope is not a strategy
:-)

> Is there a way for you to re-play activity from your
> production logs against a staging server to generate the proper load
> and/or usage profile?

We've tried exactly that with JMeter, but the problem didn't surface.


D.
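
For anyone attempting the same kind of replay, one detail that is easy
to lose is the original spacing between requests.  The sketch below
turns access log lines into a replay schedule of (delay, method, path)
tuples.  It assumes the "common" access log timestamp format (for
example 01/May/2009:10:15:02 -0700) and English month abbreviations;
the function name is hypothetical, and actually sending the requests
(and reproducing concurrency) is left out.

```python
import re
from datetime import datetime

# Assumption: each line contains [dd/Mon/yyyy:HH:MM:SS +zzzz] "METHOD path protocol"
LINE_RE = re.compile(
    r'\[(?P<time>[^\]]+)\] "(?P<method>[A-Z]+) (?P<path>\S+) [^"]*"'
)

def replay_schedule(lines):
    """Return [(delay_seconds, method, path)], where delay_seconds is the
    offset from the first request in the log."""
    events = []
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            when = datetime.strptime(match.group("time"),
                                     "%d/%b/%Y:%H:%M:%S %z")
            events.append((when, match.group("method"), match.group("path")))
    events.sort(key=lambda event: event[0])
    if not events:
        return []
    start = events[0][0]
    return [((when - start).total_seconds(), method, path)
            for when, method, path in events]
```

A replay driver could sleep until each delay has elapsed before issuing
the corresponding request, so that bursts in the production traffic are
preserved rather than flattened into a steady rate.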



Re: jk-to-tomcat multiple retries

Posted by Christopher Schultz <ch...@christopherschultz.net>.
Dmitry,

On 5/1/2009 12:37 PM, Dmitry Beransky wrote:
>> Upgrade that Tomcat, it is *years* old - 5.5.27 is the latest.
>> See if it still does it then.
> 
> Unfortunately, the pesky reality is that I would not be permitted to
> do such an upgrade to our entire infrastructure until I can show that
> all other options have been exhausted.

If you suspect a bug in Tomcat, wouldn't upgrading to a version of
Tomcat where that bug had potentially been fixed be a good strategy?

It's really too bad you can't reproduce the error in your staging
environment. Is there a way for you to re-play activity from your
production logs against a staging server to generate the proper load
and/or usage profile?

-chris



RE: jk-to-tomcat multiple retries

Posted by "Caldarale, Charles R" <Ch...@unisys.com>.
> From: Dmitry Beransky [mailto:dmitry.maven@gmail.com]
> Subject: Re: jk-to-tomcat multiple retries
> 
> Unfortunately, the pesky reality is that I would not be permitted to
> do such an upgrade to our entire infrastructure until I can show that
> all other options have been exhausted.

This reminds me of the opening line to "Still Crazy":

"History teaches us that men behave wisely... once they've exhausted all other alternatives."

Which is actually a variation on a quote from Churchill.

 - Chuck





Re: jk-to-tomcat multiple retries

Posted by Dmitry Beransky <dm...@gmail.com>.
> Upgrade that Tomcat, it is *years* old - 5.5.27 is the latest.
> See if it still does it then.

Unfortunately, the pesky reality is that I would not be permitted to
do such an upgrade to our entire infrastructure until I can show that
all other options have been exhausted.



Re: jk-to-tomcat multiple retries

Posted by Pid <p...@pidster.com>.
Dmitry Beransky wrote:
> Hi,
> 
> We have the strangest problem started happening to us a few weeks ago
> (after several years of running pretty much the same configuration).
> 
> 1. The problem is only happening in the production environment.  We
> cannot reproduce it on staging, which as far as we can tell is
> configured identically to production.
> 2. The problem seems to be tied to the traffic volume and possible
> pattern (Hence probably why we cannot reproduce it on staging).
> 3. Our configuration:  W2K server running IIS 6 and JK 1.2.27, Tomcat
> v. 5.5.12 running on the same box.

Upgrade that Tomcat, it is *years* old - 5.5.27 is the latest.
See if it still does it then.

p


