Posted to users@tomcat.apache.org by St...@fiducia.de on 2011/09/19 10:49:43 UTC

mod_jk doesn't distribute and fail over on Tomcat error

Hello Users,

We have a problem in one of our environments.

If one out of 6 balanced Tomcat servers throws an OutOfMemoryError, mod_jk
doesn't distribute any requests to the other servers until we restart the
faulted server or stop it via jkstatus.

Here is a sample of the mod_jk log file from when our Server1 got an
OutOfMemoryError. From 10:15 to 10:22, mod_jk did not distribute requests
to any of the other servers, even though they had no problem at all.

.
.
.
[Mon Sep 05 10:14:55 2011]
POST /jbf/servlet/SME-Service;jsessionid=3279D33AC494E9F6B82FC98729718837.Server4
 HTTP/1.1 HTTP/1.1
www.mydomain.com 200 0.048129
[Mon Sep 05 10:14:55 2011]
POST /jbf/servlet/SME-Service;jsessionid=BF8A94149A1B57FF2635450DEA135FA5.Server3
 HTTP/1.1 HTTP/1.1
www.mydomain.com 200 0.330673
[Mon Sep 05 10:14:55 2011]
POST /jbf/servlet/SME-Service;jsessionid=04B28E9048CF8F5732D5131CEE0EC1C0.Server4
 HTTP/1.1 HTTP/1.1
www.mydomain.com 200 0.130111
[Mon Sep 05 10:14:55 2011] POST /jbf/servlet/SME-Service HTTP/1.1 HTTP/1.1
www.mydomain.com 200 0.012367
[Mon Sep 05 10:14:55 2011]
POST /jbf/servlet/SME-Service;jsessionid=36A071BE7AFDBB5102E442F73C46C705.Server6
 HTTP/1.1 HTTP/1.1
www.mydomain.com 200 0.017777
[Mon Sep 05 10:14:55 2011]
POST /jbf/servlet/SME-Service;jsessionid=3279D33AC494E9F6B82FC98729718837.Server4
 HTTP/1.1 HTTP/1.1
www.mydomain.com 200 0.041176
[Mon Sep 05 10:14:55 2011]
POST /jbf/servlet/SME-Service;jsessionid=36A1A33494BE681EE2460D8FC8A175CC.Server4
 HTTP/1.1 HTTP/1.1
www.mydomain.com 200 0.211262
[Mon Sep 05 10:14:55 2011] POST /jbf/servlet/SME-Service HTTP/1.1 HTTP/1.1
www.mydomain.com 200 0.012553
[Mon Sep 05 10:14:55 2011]
POST /jbf/servlet/SME-Service;jsessionid=A177A2D694F7B98F1F9C89F1C78D5DA9.Server6
 HTTP/1.1 HTTP/1.1
www.mydomain.com 200 0.017297
[Mon Sep 05 10:14:55 2011]
POST /jbf/servlet/SME-Service;jsessionid=3279D33AC494E9F6B82FC98729718837.Server4
 HTTP/1.1 HTTP/1.1
www.mydomain.com 200 0.086294
[Mon Sep 05 10:14:55 2011] POST /jbf/servlet/SME-Service HTTP/1.1 HTTP/1.1
www.mydomain.com 200 0.012679
[Mon Sep 05 10:14:56 2011]
POST /jbf/servlet/SME-Service;jsessionid=1E9233F828A7B069E824E9B6C5ADE276.Server6
 HTTP/1.1 HTTP/1.1
www.mydomain.com 200 0.534394
[Mon Sep 05 10:14:56 2011]
POST /jbf/servlet/SME-Service;jsessionid=139500E8FAD13B9F1C69D0A7C09CBA52.Server3
 HTTP/1.1 HTTP/1.1
www.mydomain.com 200 0.576801
[Mon Sep 05 10:14:56 2011]
POST /jbf/servlet/SME-Service;jsessionid=854053063286D4D8628294EF40CE7899.Server2
 HTTP/1.1 HTTP/1.1
www.mydomain.com 200 17.076338
[Mon Sep 05 10:14:56 2011]
POST /jbf/servlet/SME-Service;jsessionid=36A1A33494BE681EE2460D8FC8A175CC.Server4
 HTTP/1.1 HTTP/1.1
www.mydomain.com 200 0.066612
[Mon Sep 05 10:14:56 2011] POST /jbf/servlet/SME-Service HTTP/1.1 HTTP/1.1
www.mydomain.com 200 0.012472
[Mon Sep 05 10:14:56 2011]
POST /jbf/servlet/SME-Service;jsessionid=854964D0A5D4947AA8D669ACE3FDFD4F.Server4
 HTTP/1.1 HTTP/1.1
www.mydomain.com 200 0.038900
[Mon Sep 05 10:14:57 2011]
POST /jbf/servlet/SME-Service;jsessionid=D4B3F0B6D0B569A71C9604D879C0A238.Server6
 HTTP/1.1 HTTP/1.1
www.mydomain.com 200 1.471433
[Mon Sep 05 10:14:57 2011]
POST /jbf/servlet/SME-Service;jsessionid=CFD905CD3F369504A4FDD84B5ACE7C7E.Server2
 HTTP/1.1 HTTP/1.1
www.mydomain.com 200 0.173437
[Mon Sep 05 10:14:57 2011] POST /jbf/servlet/SME-Service HTTP/1.1 HTTP/1.1
www.mydomain.com 200 0.012598
[Mon Sep 05 10:14:57 2011]
POST /jbf/servlet/SME-Service;jsessionid=D68B10A1B7B6383998FED523AEAC736D.Server3
 HTTP/1.1 HTTP/1.1
www.mydomain.com 200 0.044001
[Mon Sep 05 10:14:57 2011]
POST /jbf/servlet/SME-Service;jsessionid=7DE4895920959C15037723BB615DBD3F.Server5
 HTTP/1.1 HTTP/1.1
www.mydomain.com 200 0.046596
[Mon Sep 05 10:14:57 2011]
POST /jbf/servlet/SME-Service;jsessionid=A0865023784DEEAFE5D0552DB4CD90E3.Server6
 HTTP/1.1 HTTP/1.1
www.mydomain.com 200 0.045351
[Mon Sep 05 10:14:57 2011]
POST /jbf/servlet/SME-Service;jsessionid=3944C465202D852B098BFEA1B8A38F34.Server2
 HTTP/1.1 HTTP/1.1
www.mydomain.com 200 0.088592
[Mon Sep 05 10:14:58 2011]
POST /jbf/servlet/SME-Service;jsessionid=26D9A00A5E3D45C75C9549381E8CA547.Server6
 HTTP/1.1 HTTP/1.1
www.mydomain.com 200 0.042704
[Mon Sep 05 10:14:58 2011]
POST /jbf/servlet/SME-Service;jsessionid=0E92920CC85581BAD769B0576B9104E2.Server6
 HTTP/1.1 HTTP/1.1
www.mydomain.com 200 0.108742
[Mon Sep 05 10:14:58 2011]
POST /jbf/servlet/SME-Service;jsessionid=3AC092BC5AF98300B77803391E27A9D8.Server5
 HTTP/1.1 HTTP/1.1
www.mydomain.com 200 0.075578
[Mon Sep 05 10:14:58 2011]
POST /jbf/servlet/SME-Service;jsessionid=B9F7EE3D76D298FE8B306E123A510F90.Server3
 HTTP/1.1 HTTP/1.1
www.mydomain.com 200 0.108107
[Mon Sep 05 10:14:58 2011]
POST /jbf/servlet/SME-Service;jsessionid=467AE6BC7B60D4924BB7B7FA530E2978.Server6
 HTTP/1.1 HTTP/1.1
www.mydomain.com 200 0.017212
[Mon Sep 05 10:14:58 2011]
POST /jbf/servlet/SME-Service;jsessionid=2F4B0D8ED3B97975A76ECF0824A56D16.Server5
 HTTP/1.1 HTTP/1.1
www.mydomain.com 200 0.037867
[Mon Sep 05 10:15:07 2011] [26959:22] [error]
ajp_send_request::jk_ajp_common.c (1585): (Server1) connecting to backend
failed. Tomcat is probably not started or is listening on the wrong port
(errno=145)
[Mon Sep 05 10:15:08 2011] [26959:20] [error]
ajp_send_request::jk_ajp_common.c (1585): (Server1) connecting to backend
failed. Tomcat is probably not started or is listening on the wrong port
(errno=145)
[Mon Sep 05 10:15:09 2011] [26959:21] [error]
ajp_send_request::jk_ajp_common.c (1585): (Server1) connecting to backend
failed. Tomcat is probably not started or is listening on the wrong port
(errno=145)
[Mon Sep 05 10:15:10 2011] [26959:15] [error]
ajp_send_request::jk_ajp_common.c (1585): (Server1) connecting to backend
failed. Tomcat is probably not started or is listening on the wrong port
(errno=145)
[Mon Sep 05 10:15:10 2011] [26960:26] [error]
ajp_send_request::jk_ajp_common.c (1585): (Server1) connecting to backend
failed. Tomcat is probably not started or is listening on the wrong port
(errno=145)
[Mon Sep 05 10:15:10 2011] [26959:25] [error]
ajp_send_request::jk_ajp_common.c (1585): (Server1) connecting to backend
failed. Tomcat is probably not started or is listening on the wrong port
(errno=145)
[Mon Sep 05 10:15:10 2011] [26959:23] [error]
ajp_send_request::jk_ajp_common.c (1585): (Server1) connecting to backend
failed. Tomcat is probably not started or is listening on the wrong port
(errno=145)
[Mon Sep 05 10:15:11 2011] [26960:22] [error]
ajp_send_request::jk_ajp_common.c (1585): (Server1) connecting to backend
failed. Tomcat is probably not started or is listening on the wrong port
(errno=145)
[Mon Sep 05 10:15:11 2011] [26959:14] [error]
ajp_send_request::jk_ajp_common.c (1585): (Server1) connecting to backend
failed. Tomcat is probably not started or is listening on the wrong port
(errno=145)
[Mon Sep 05 10:15:11 2011] [26960:21] [error]
ajp_send_request::jk_ajp_common.c (1585): (Server1) connecting to backend
failed. Tomcat is probably not started or is listening on the wrong port
(errno=145)
[Mon Sep 05 10:15:11 2011] [26959:11] [error]
ajp_send_request::jk_ajp_common.c (1585): (Server1) connecting to backend
failed. Tomcat is probably not started or is listening on the wrong port
(errno=145)
[Mon Sep 05 10:15:11 2011] [26960:20] [error]
ajp_send_request::jk_ajp_common.c (1585): (Server1) connecting to backend
failed. Tomcat is probably not started or is listening on the wrong port
(errno=145)
[Mon Sep 05 10:15:12 2011] [26960:27] [error]
ajp_send_request::jk_ajp_common.c (1585): (Server1) connecting to backend
failed. Tomcat is probably not started or is listening on the wrong port
(errno=145)
[Mon Sep 05 10:15:13 2011] [26959:13] [error]
ajp_send_request::jk_ajp_common.c (1585): (Server1) connecting to backend
failed. Tomcat is probably not started or is listening on the wrong port
(errno=145)
[Mon Sep 05 10:15:14 2011] [26960:15] [error]
ajp_send_request::jk_ajp_common.c (1585): (Server1) connecting to backend
failed. Tomcat is probably not started or is listening on the wrong port
(errno=145)
[Mon Sep 05 10:15:14 2011] [26961:27] [error]
ajp_send_request::jk_ajp_common.c (1585): (Server1) connecting to backend
failed. Tomcat is probably not started or is listening on the wrong port
(errno=145)
[Mon Sep 05 10:15:16 2011] [26959:10] [error]
ajp_send_request::jk_ajp_common.c (1585): (Server1) connecting to backend
failed. Tomcat is probably not started or is listening on the wrong port
(errno=145)
.
.
.
.
.
[Mon Sep 05 10:22:00 2011] [26974:19] [error]
ajp_send_request::jk_ajp_common.c (1585): (Server1) connecting to backend
failed. Tomcat is probably not started or is listening on the wrong port
(errno=145)
[Mon Sep 05 10:22:01 2011] [26973:9] [error]
ajp_send_request::jk_ajp_common.c (1585): (Server1) connecting to backend
failed. Tomcat is probably not started or is listening on the wrong port
(errno=145)
[Mon Sep 05 10:22:01 2011] [26969:6] [error]
ajp_send_request::jk_ajp_common.c (1585): (Server1) connecting to backend
failed. Tomcat is probably not started or is listening on the wrong port
(errno=145)
[Mon Sep 05 10:22:01 2011] [26974:14] [error]
ajp_send_request::jk_ajp_common.c (1585): (Server1) connecting to backend
failed. Tomcat is probably not started or is listening on the wrong port
(errno=145)
[Mon Sep 05 10:22:03 2011] [26974:23] [error]
ajp_send_request::jk_ajp_common.c (1585): (Server1) connecting to backend
failed. Tomcat is probably not started or is listening on the wrong port
(errno=145)
[Mon Sep 05 10:22:13 2011] [26959:22] [error]
ajp_send_request::jk_ajp_common.c (1585): (Server1) connecting to backend
failed. Tomcat is probably not started or is listening on the wrong port
(errno=145)
[Mon Sep 05 10:22:13 2011] [26959:22] [error] ajp_service::jk_ajp_common.c
(2559): (Server1) connecting to tomcat failed.
[Mon Sep 05 10:22:13 2011]
POST /jbf/servlet/SME-Service;jsessionid=CE0CC07297809A414B919A4C01DFF107.Server1
 HTTP/1.1 HTTP/1.1
www.mydomain.com 200 494.351793
[Mon Sep 05 10:22:13 2011] POST /jbf/servlet/SME-Service HTTP/1.1 HTTP/1.1
www.mydomain.com 200 0.012753
[Mon Sep 05 10:22:13 2011] [26959:20] [error]
ajp_send_request::jk_ajp_common.c (1585): (Server1) connecting to backend
failed. Tomcat is probably not started or is listening on the wrong port
(errno=145)
[Mon Sep 05 10:22:13 2011] [26959:20] [error] ajp_service::jk_ajp_common.c
(2559): (Server1) connecting to tomcat failed.
[Mon Sep 05 10:22:13 2011]
POST /jbf/servlet/SME-Service;jsessionid=BBFADFD00EA147FDE6A1CB45B51B1372.Server1
 HTTP/1.1 HTTP/1.1
www.mydomain.com 200 494.401530
[Mon Sep 05 10:22:14 2011]
POST /jbf/servlet/SME-Service;jsessionid=294A0E4C9413FF440C64C6B1E4934D6F.Server2
 HTTP/1.1 HTTP/1.1
www.mydomain.com 200 0.062610
[Mon Sep 05 10:22:14 2011] [26959:21] [error]
ajp_send_request::jk_ajp_common.c (1585): (Server1) connecting to backend
failed. Tomcat is probably not started or is listening on the wrong port
(errno=145)
[Mon Sep 05 10:22:14 2011] [26959:21] [error] ajp_service::jk_ajp_common.c
(2559): (Server1) connecting to tomcat failed.
[Mon Sep 05 10:22:14 2011]
POST /jbf/servlet/SME-Service;jsessionid=941AF9D5FB7B78A564FE39346AEE7BBE.Server1
 HTTP/1.1 HTTP/1.1
www.mydomain.com 200 494.364604
[Mon Sep 05 10:22:14 2011] POST /jbf/servlet/SME-Service HTTP/1.1 HTTP/1.1
www.mydomain.com 200 0.012492
[Mon Sep 05 10:22:15 2011] [26960:26] [error]
ajp_send_request::jk_ajp_common.c (1585): (Server1) connecting to backend
failed. Tomcat is probably not started or is listening on the wrong port
(errno=145)
[Mon Sep 05 10:22:15 2011] [26959:15] [error]
ajp_send_request::jk_ajp_common.c (1585): (Server1) connecting to backend
failed. Tomcat is probably not started or is listening on the wrong port
(errno=145)
[Mon Sep 05 10:22:15 2011] [26960:26] [error] ajp_service::jk_ajp_common.c
(2559): (Server1) connecting to tomcat failed.
[Mon Sep 05 10:22:15 2011] [26959:15] [error] ajp_service::jk_ajp_common.c
(2559): (Server1) connecting to tomcat failed.


I analyzed the number of requests processed by each server per minute:

Timest.  Srv1  Srv2  Srv3  Srv4  Srv5  Srv6
10:08     750   821   829   792   796   754
10:09     677   630   624   635   617   647
10:10     598   641   604   605   598   624
10:11     573   551   592   547   560   585
10:12     634   613   616   628   662   623
10:13     680   708   634   617   735   771
10:14      10   546   521   450   437   409
10:15  Timestamp missing
10:16  Timestamp missing
10:17  Timestamp missing
10:18  Timestamp missing
10:19  Timestamp missing
10:20  Timestamp missing
10:21  Timestamp missing
10:22      56   238   243   261   250   247
10:23      10   728   742   716   740   761
10:24       0   671   638   649   669   608
10:25       0   559   607   562   583   592
10:26       0   618   577   588   559   595
10:27       0   571   605   632   609   614
10:28       0   498   446   453   470   458
10:29       0   526   525   522   491   495
10:30       5   484   497   489   534   543
10:31       0   551   539   554   531   528
10:32       0   470   497   501   508   484
10:33       0   540   492   516   491   509
10:34       0   411   480   444   424   402
10:35       0   443   396   419   418   408
10:36       0   360   370   412   391   381
10:37       0   386   329   294   340   366
10:38       0   356   367   371   334   338
10:39       0   371   375   376   387   393
10:40       0   441   448   470   441   421
10:41       0   935   890   893   912  1077
10:42       0   833   880   862   855   780
10:43       0   745   792   787   787   816
10:44       0   866   849   834   793   758
10:45       0   909   801   831   828   811
10:46       0   752   875   957   894   897
10:47       0   793   779   834   732   728
10:48       0   671   738   678   693   816
10:49       0   878   709   799   746   729
10:50       0   698   713   809   730   678
10:51       0   679   735   587   690   749
10:52       0   680   686   646   692   650
10:53       0   638   691   664   638   633
10:54     686   549   562   506   529   548
10:55     669   573   556   634   663   661
10:56     479   609   532   606   611   625
10:57     618   534   662   580   549   535
10:58     513   524   508   619   536   511
10:59     540   570   556   495   640   543



workers.properties Configuration:

worker.loadbalancer.type=lb
worker.loadbalancer.balance_workers=Server1,Server2,Server3,Server4,Server5,Server6,Server7,Server8
worker.loadbalancer.sticky_session=True
worker.loadbalancer.sticky_session_force=False
worker.loadbalancer.method=Request
worker.loadbalancer.lock=Optimistic

#########################################################################
# Worker loadbalancer Server1 #
#########################################################################

worker.Server1.type=ajp13
workerServer1.host=Server1.mydomain
workerServer1.port=8800
workerServer1.socket_timeout=600
workerServer1.socket_keepalive=0
workerServer1.retries=7
workerServer1.retry_interval=100
workerServer1.connection_pool_timeout=600
workerServer1.lbfactor=100
workerServer1.connect_timeout=3000
workerServer1.prepost_timeout=3000
workerServer1.reply_timeout=0
workerServer1.recovery_options=0
workerServer1.activation=Active
workerServer1.route=Server1
workerServer1.domain=Server1
workerServer1.redirect=-

#########################################################################
# Worker loadbalancer Server2 #
#########################################################################

worker.Server1.type=ajp13
workerServer1.host=Server1.mydomain
workerServer1.port=8800
workerServer1.socket_timeout=600
workerServer1.socket_keepalive=0
workerServer1.retries=7
workerServer1.retry_interval=100
workerServer1.connection_pool_timeout=600
workerServer1.lbfactor=100
workerServer1.connect_timeout=3000
workerServer1.prepost_timeout=3000
workerServer1.reply_timeout=0
workerServer1.recovery_options=0
workerServer1.activation=Active
workerServer1.route=Server1
workerServer1.domain=Server1
workerServer1.redirect=-

.
.
.
.
Tomcat is version 6.0.32, Apache httpd is 2.2.16, and mod_jk is 1.2.30.

Many Thx
Steffen



----------------------------------------------------------------------------------------------------------------------------------------------


Fiducia IT AG
Fiduciastraße 20
76227 Karlsruhe

Registered office: Karlsruhe
Register court: Amtsgericht Mannheim, HRB 100059

Chairman of the Supervisory Board: Gregor Scheller
Chairman of the Executive Board: Michael Krings
Deputy Chairman of the Executive Board: Klaus-Peter Bruns
Executive Board: Jens-Olaf Bartels, Carsten Pfläging, Hans-Peter Straberger

VAT ID No. DE143582320, http://www.fiducia.de
----------------------------------------------------------------------------------------------------------------------------------------------

Re: Reply: Re: mod_jk doesn't distribute and fail over on Tomcat error

Posted by Christopher Schultz <ch...@christopherschultz.net>.

Steffen,

On 9/20/2011 3:18 AM, Steffen.Scheuler@fiducia.de wrote:
>> 1. You should look into using "template" workers.
> Yes, the configuration would be cleaner, but is it a functional
> problem?

No, but it allows people to more easily debug your configuration
without having to cross-check all the settings between workers. Just
read them once and know that they are all the same.

>> 2. Unless you really want to explicitly set all those properties,
>> don't set anything that is the same as a documented default.
>> There's no reason to specify all those details.
> In some cases, we set the same value as the default values to
> avoid problems with changed defaults.

Fair enough.

>> So, 10 requests were sent to Server1 during this minute? That
>> sounds reasonable, given:
>> 
>>> workerServer1.retry_interval=100
>> 
>> That means that mod_jk will try 10 times per second to reach
>> Server1 when it's in an error state.
>> 
>>> 10:22, 56 238 243 261 250 247
>>> 10:23, 10 728 742 716 740 761
>> 
>> 56 seems high, but that might be due to multiple httpd workers
>> all re-trying.
>> 
> The request counts in my analysis are requests that were actually
> processed with RC=200, not just tries (unfortunately, I can't see
> tries at log level error).

You might want to bump up your LogLevel while investigating, then.

>>> 10:54, 686 549 562 506 529 548
>> 
>> So, this is when Server1 becomes operational again? Did you have
>> to use the status worker to trigger mod_jk to allow it back into
>> the cluster, or did it recover on its own?
> The behavior after 10:22 seems reasonable to me: mod_jk recovered on
> its own, and there was no operational intervention until 10:37, when
> the faulted server was deactivated in the status worker and restarted.
> 
> The problem is the behavior from 10:15 to 10:21, because no request
> was routed to the operational servers.

Woah, so the entire cluster stalled? That's clearly a problem. Or, did
you just not get any good sampling data for those time periods? I only
see errors for "Server1". Were there other errors? Where are you
getting your numbers for your table with headings "Timest. Srv1 Srv2
Srv3 Srv4 Srv5 Srv6"?

>> You might want to consider configuring mod_jk to use a
>> "ping_mode" for activation management.
> 
> Which ping_mode would you recommend I try on a server with that
> number of requests? "P" seems to add a lot of overhead...

You might try "I", but I would consult with your networking folks to
determine which strategy might be best. Honestly, 50 req/sec shouldn't
be a problem to do a prepost connection check: they are fairly fast.
But you're right, it would slow things down just a bit. What is your
expected response time for most of those requests?
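For reference, a sketch of what the two ping modes discussed here might look like in workers.properties. It assumes the ping_mode, ping_timeout and connection_ping_interval worker attributes that mod_jk has offered since 1.2.27 (so they should be present in the 1.2.30 used here). The timing values are illustrative only, not recommendations; check units and defaults in the workers.properties reference for your version.

```properties
# Interval mode ("I"): probe pooled connections that have been idle
# longer than connection_ping_interval (seconds) with a CPing, and treat
# a missing CPong within ping_timeout (milliseconds) as a failure.
worker.Server1.ping_mode=I
worker.Server1.connection_ping_interval=100
worker.Server1.ping_timeout=10000

# Prepost mode ("P"): send a CPing before every request instead. This
# notices a hung backend sooner, at the cost of one extra round trip
# per request.
#worker.Server1.ping_mode=P

# The same attributes would be set for Server2 .. Server8 as well.
```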

-chris

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


Reply: Re: mod_jk doesn't distribute and fail over on Tomcat error

Posted by St...@fiducia.de.
Hi Chris,

First, regarding your "several thoughts":
>
> 0. You are missing dots (but you told Andre that it was a huge typo...
>    not sure how that kind of typo happens).
It happened because I substituted the real worker names to anonymize the
configuration.

> 1. You should look into using "template" workers.
Yes, the configuration would be cleaner, but is it a functional problem?

> 2. Unless you really want to explicitly set all those properties, don't
>    set anything that is the same as a documented default. There's no
>    reason to specify all those details.
In some cases, we set the same value as the default values to avoid
problems with changed defaults.


> So, 10 requests were sent to Server1 during this minute? That sounds
> reasonable, given:
>
> > workerServer1.retry_interval=100
>
> That means that mod_jk will try 10 times per second to reach Server1
> when it's in an error state.
>
> > 10:22, 56 238 243 261 250 247
> > 10:23, 10 728 742 716 740 761
>
> 56 seems high, but that might be due to multiple httpd workers all
> re-trying.
>
The request counts in my analysis are requests that were actually processed
with RC=200, not just tries (unfortunately, I can't see tries at log level
error).

> > 10:54, 686 549 562 506 529 548
>
> So, this is when Server1 becomes operational again? Did you have to
> use the status worker to trigger mod_jk to allow it back into the
> cluster, or did it recover on its own?
The behavior after 10:22 seems reasonable to me: mod_jk recovered on its
own, and there was no operational intervention until 10:37, when the
faulted server was deactivated in the status worker and restarted.

The problem is the behavior from 10:15 to 10:21, because no request was
routed to the operational servers.

> You might want to consider configuring mod_jk to use a "ping_mode" for
> activation management.

Which ping_mode would you recommend I try on a server with that number of
requests? "P" seems to add a lot of overhead...

Thx
Steffen


Christopher Schultz <ch...@christopherschultz.net> wrote on 19.09.2011
17:47:39:

> [snip]





Re: mod_jk doesn't distribute and fail over on Tomcat error

Posted by Christopher Schultz <ch...@christopherschultz.net>.

Steffen,

On 9/19/2011 4:49 AM, Steffen.Scheuler@fiducia.de wrote:
> If one out of 6 balanced Tomcat servers throws an
> OutOfMemoryError, mod_jk doesn't distribute any requests to the
> other servers until we restart the faulted server or stop it via
> jkstatus.
> 
> Here is a sample of the mod_jk log file from when our Server1 got an
> OutOfMemoryError. From 10:15 to 10:22, mod_jk did not distribute
> requests to any of the other servers, even though they had no
> problem at all.
> 
> [snip]
> 
> [Mon Sep 05 10:15:07 2011] [26959:22] [error] 
> ajp_send_request::jk_ajp_common.c (1585): (Server1) connecting to
> backend failed. Tomcat is probably not started or is listening on
> the wrong port (errno=145)

The good thing is that mod_jk could detect the error. I was going to
guess that Tomcat continued to return 200 responses or something
foolish like that. This ought to be easier to fix than what I was fearing.

> I analyzed the number of requests processed by each server per
> minute:
> 
> Timest. Srv1 Srv2 Srv3 Srv4 Srv5 Srv6
> 10:08, 750 821 829 792 796 754
> 10:09, 677 630 624 635 617 647
> 10:10, 598 641 604 605 598 624
> 10:11, 573 551 592 547 560 585
> 10:12, 634 613 616 628 662 623
> 10:13, 680 708 634 617 735 771
> 10:14, 10 546 521 450 437 409

So, 10 requests were sent to Server1 during this minute? That sounds
reasonable, given:

> workerServer1.retry_interval=100

That means that mod_jk will try 10 times per second to reach Server1
when it's in an error state.

> 10:22, 56 238 243 261 250 247
> 10:23, 10 728 742 716 740 761

56 seems high, but that might be due to multiple httpd workers all
re-trying.

> 10:24, 0 671 638 649 669 608

At this point, it looks like mod_jk has finally given up on the worker.

What did the Tomcat status worker say at this point for the worker
"Server1"?

> 10:54, 686 549 562 506 529 548

So, this is when Server1 becomes operational again? Did you have to
use the status worker to trigger mod_jk to allow it back into the
cluster, or did it recover on its own?

Do you know if the [error] messages in mod_jk.log above actually
correspond to client connection failures, or are they just notices
that one member of the cluster dropped-out?

> workers.properties Configuration:
> 
> worker.loadbalancer.type=lb
> worker.loadbalancer.balance_workers=Server1,Server2,Server3,Server4,Server5,Server6,Server7,Server8
> worker.loadbalancer.sticky_session=True
> worker.loadbalancer.sticky_session_force=False
> worker.loadbalancer.method=Request
> worker.loadbalancer.lock=Optimistic

You might want to set "retries" here.
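A minimal sketch of that suggestion, reusing the balancer block from the posted configuration. As far as I understand the mod_jk documentation, retries on an lb worker is the number of attempts the balancer makes to get a request through, possibly on a different member, after a communication error. The value 3 is only an illustration; check the default and exact semantics in the workers.properties reference for 1.2.30.

```properties
worker.loadbalancer.type=lb
worker.loadbalancer.balance_workers=Server1,Server2,Server3,Server4,Server5,Server6,Server7,Server8
worker.loadbalancer.sticky_session=True
worker.loadbalancer.sticky_session_force=False
worker.loadbalancer.method=Request
worker.loadbalancer.lock=Optimistic
# Added: how many times the balancer retries a failed request
# (illustrative value; see the lb worker docs for the default).
worker.loadbalancer.retries=3
```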

> #########################################################################
> # Worker loadbalancer Server1 #
> #########################################################################
> 
> worker.Server1.type=ajp13
> workerServer1.host=Server1.mydomain
> workerServer1.port=8800
> workerServer1.socket_timeout=600
> workerServer1.socket_keepalive=0
> workerServer1.retries=7
> workerServer1.retry_interval=100
> workerServer1.connection_pool_timeout=600
> workerServer1.lbfactor=100
> workerServer1.connect_timeout=3000
> workerServer1.prepost_timeout=3000
> workerServer1.reply_timeout=0
> workerServer1.recovery_options=0
> workerServer1.activation=Active
> workerServer1.route=Server1
> workerServer1.domain=Server1
> workerServer1.redirect=-

Several thoughts:

0. You are missing dots (but you told Andre that it was a huge typo...
   not sure how that kind of typo happens).
1. You should look into using "template" workers.
2. Unless you really want to explicitly set all those properties, don't
   set anything that is the same as a documented default. There's no
   reason to specify all those details.
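A minimal sketch of points 0 to 2, assuming mod_jk's reference attribute for template workers (available in recent 1.2.x releases, including, as far as I know, the 1.2.30 used here). The template name "tmpl" and the host "Server2.mydomain" are hypothetical; the shared values are taken from the posted Server1 block, and per point 2 any of them that equal the documented defaults could simply be dropped.

```properties
# Hypothetical template worker holding the shared settings. It is only
# referenced; it is NOT listed in worker.list or balance_workers.
worker.tmpl.type=ajp13
worker.tmpl.port=8800
worker.tmpl.socket_timeout=600
worker.tmpl.socket_keepalive=0
worker.tmpl.retries=7
worker.tmpl.retry_interval=100
worker.tmpl.connection_pool_timeout=600
worker.tmpl.lbfactor=100
worker.tmpl.connect_timeout=3000
worker.tmpl.prepost_timeout=3000
worker.tmpl.reply_timeout=0
worker.tmpl.recovery_options=0
worker.tmpl.activation=Active

# Each member inherits from the template and sets only what differs.
# Note the "." between "worker" and the worker name.
worker.Server1.reference=worker.tmpl
worker.Server1.host=Server1.mydomain
worker.Server1.route=Server1
worker.Server1.domain=Server1

worker.Server2.reference=worker.tmpl
worker.Server2.host=Server2.mydomain
worker.Server2.route=Server2
worker.Server2.domain=Server2
```

Written this way, a reviewer only has to cross-check the handful of per-member lines instead of every attribute of every worker.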

> #########################################################################
> # Worker loadbalancer Server2 #
> #########################################################################
> 
> worker.Server1.type=ajp13
> workerServer1.host=Server1.mydomain

Another huge typo?

You might want to consider configuring mod_jk to use a "ping_mode" for
activation management.

-chris

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


Reply: Re: mod_jk doesn't distribute and fail over on Tomcat error

Posted by St...@fiducia.de.
Hi André,

Sorry, it's a typo in my posting. The real workers.properties is correct.

Greets
Steffen



From:    André Warnier <aw...@ice-sa.com>
To:      Tomcat Users List <us...@tomcat.apache.org>
Date:    19.09.2011 11:45
Subject: Re: mod_jk doesn't distribute and fail over on Tomcat error



[snip]

Re: mod_jk doesn't distribute and fail over on Tomcat error

Posted by André Warnier <aw...@ice-sa.com>.
Hi.

Steffen.Scheuler@fiducia.de wrote:
...
> 
> workers.properties Configuration:
> 
> worker.loadbalancer.type=lb
> worker.loadbalancer.balance_workers=Server1,Server2,Server3,Server4,Server5,Server6,Server7,Server8
> worker.loadbalancer.sticky_session=True
> worker.loadbalancer.sticky_session_force=False
> worker.loadbalancer.method=Request
> worker.loadbalancer.lock=Optimistic
> 
> #########################################################################
> # Worker loadbalancer Server1 #
> #########################################################################
> 
> worker.Server1.type=ajp13
> workerServer1.host=Server1.mydomain
> workerServer1.port=8800
> workerServer1.socket_timeout=600
...
Is this a typo?
All your configuration options, except the first line, are missing a "." between "worker" 
and "ServerX".
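To illustrate the point: mod_jk property names have the form worker.<worker name>.<attribute>, so each quoted line needs a dot after "worker". This is just the quoted fragment corrected; as far as I know, a key like workerServer1.host is simply not recognized as a setting of worker Server1.

```properties
# As posted (no "." between "worker" and the worker name):
#   workerServer1.host=Server1.mydomain
# Corrected form, worker.<worker name>.<attribute>:
worker.Server1.type=ajp13
worker.Server1.host=Server1.mydomain
worker.Server1.port=8800
worker.Server1.socket_timeout=600
```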

