Posted to users@tomcat.apache.org by "Kozak, Milos" <Mi...@commerzsystems.com> on 2016/10/18 08:10:49 UTC

Load balancing problem with activation=disabled

Hi,

I am debugging a mod_jk load-balancing configuration which has been used a lot, but with two nodes only. We recently changed it to support more nodes, and we are facing a problem.

The original idea was to have one PROD and one DR server, such that all requests are handled by PROD and, if PROD goes down, DR takes over. To do that we used activation=disabled for the DR worker, like this:


worker.list=jkstatus,lbhierarchy
worker.jkstatus.type=status
worker.lbhierarchy.type=lb
worker.lbhierarchy.balance_workers=hierarchy-1,hierarchy-2

worker.hierarchy-1.type=ajp13
worker.hierarchy-1.host=host1
worker.hierarchy-1.port=8009
worker.hierarchy-1.socket_timeout=0
worker.hierarchy-1.socket_keepalive=False
worker.hierarchy-1.retries=2
worker.hierarchy-1.connection_pool_timeout=0
worker.hierarchy-1.lbfactor=1
worker.hierarchy-1.redirect=hierarchy-2

worker.hierarchy-2.type=ajp13
worker.hierarchy-2.host=host2
worker.hierarchy-2.port=8009
worker.hierarchy-2.socket_timeout=0
worker.hierarchy-2.socket_keepalive=False
worker.hierarchy-2.retries=2
worker.hierarchy-2.connection_pool_timeout=0
worker.hierarchy-2.lbfactor=1
worker.hierarchy-2.activation=disabled


However, the current requirement is to have four chained servers: one PROD and 3 DR servers. Each DR server is activated when the previous worker goes down. Therefore, we prepared a configuration like this:



worker.list=jkstatus,lbhierarchy,lb2hierarchy
worker.jkstatus.type=status
worker.lbhierarchy.type=lb
worker.lbhierarchy.balance_workers=hierarchy-1,hierarchy-2,hierarchy-3,hierarchy-4

worker.hierarchy-1.type=ajp13
worker.hierarchy-1.host=host
worker.hierarchy-1.port=8009
worker.hierarchy-1.socket_timeout=0
worker.hierarchy-1.socket_keepalive=False
worker.hierarchy-1.retries=2
worker.hierarchy-1.connection_pool_timeout=0
worker.hierarchy-1.lbfactor=1
worker.hierarchy-1.redirect=hierarchy-2

worker.hierarchy-2.type=ajp13
worker.hierarchy-2.host=host
worker.hierarchy-2.port=8010
worker.hierarchy-2.socket_timeout=0
worker.hierarchy-2.socket_keepalive=False
worker.hierarchy-2.retries=2
worker.hierarchy-2.connection_pool_timeout=0
worker.hierarchy-2.lbfactor=1
worker.hierarchy-2.activation=disabled
worker.hierarchy-2.redirect=hierarchy-3

worker.hierarchy-3.type=ajp13
worker.hierarchy-3.host=host
worker.hierarchy-3.port=8011
worker.hierarchy-3.socket_timeout=0
worker.hierarchy-3.socket_keepalive=False
worker.hierarchy-3.retries=2
worker.hierarchy-3.connection_pool_timeout=0
worker.hierarchy-3.lbfactor=1
worker.hierarchy-3.activation=disabled
worker.hierarchy-3.redirect=hierarchy-4

worker.hierarchy-4.type=ajp13
worker.hierarchy-4.host=host12
worker.hierarchy-4.port=10603
worker.hierarchy-4.socket_timeout=0
worker.hierarchy-4.socket_keepalive=False
worker.hierarchy-4.retries=2
worker.hierarchy-4.connection_pool_timeout=0
worker.hierarchy-4.lbfactor=1
worker.hierarchy-4.activation=disabled
worker.hierarchy-4.redirect=hierarchy-1

Initially, 3 servers are disabled and a redirect is specified for each worker.

The problem occurs when the hierarchy-3 worker goes down: hierarchy-4 never gets activated, and the mod_jk log says:
All tomcat instances failed, no more workers left

However, the workers list looks like this:
worker.lbhierarchy.balance_workers=hierarchy-1,hierarchy-2,hierarchy-3,hierarchy-4

This means we have more than 3 workers, so there should be more workers left... Here is the log from when I tried to take down the workers one by one:

[Tue Oct 18 09:51:30.623 2016] [31890:139909801125856] [info] ajp_service::jk_ajp_common.c (2773): (hierarchy-1) sending request to tomcat failed (recoverable), because of error during request sending (attempt=1)
[Tue Oct 18 09:51:30.724 2016] [31890:139909801125856] [info] jk_open_socket::jk_connect.c (817): connect to HOST:8009 failed (errno=111)
[Tue Oct 18 09:51:30.724 2016] [31890:139909801125856] [info] ajp_connect_to_endpoint::jk_ajp_common.c (1068): (hierarchy-1) Failed opening socket to (HOST:8009) (errno=111)
[Tue Oct 18 09:51:30.724 2016] [31890:139909801125856] [error] ajp_send_request::jk_ajp_common.c (1728): (hierarchy-1) connecting to backend failed. Tomcat is probably not started or is listening on the wrong port (errno=111)
[Tue Oct 18 09:51:30.724 2016] [31890:139909801125856] [info] ajp_service::jk_ajp_common.c (2773): (hierarchy-1) sending request to tomcat failed (recoverable), because of error during request sending (attempt=2)
[Tue Oct 18 09:51:30.724 2016] [31890:139909801125856] [error] ajp_service::jk_ajp_common.c (2794): (hierarchy-1) connecting to tomcat failed (rc=-3, errors=5, client_errors=0).
[Tue Oct 18 09:51:30.724 2016] [31890:139909801125856] [info] service::jk_lb_worker.c (1595): service failed, worker hierarchy-1 is in error state
[Tue Oct 18 09:51:30.724 2016] [31890:139909801125856] [info] jk_open_socket::jk_connect.c (817): connect to HOST:8010 failed (errno=111)
[Tue Oct 18 09:51:30.724 2016] [31890:139909801125856] [info] ajp_connect_to_endpoint::jk_ajp_common.c (1068): (hierarchy-2) Failed opening socket to (HOST:8010) (errno=111)
[Tue Oct 18 09:51:30.724 2016] [31890:139909801125856] [error] ajp_send_request::jk_ajp_common.c (1728): (hierarchy-2) connecting to backend failed. Tomcat is probably not started or is listening on the wrong port (errno=111)
[Tue Oct 18 09:51:30.724 2016] [31890:139909801125856] [info] ajp_service::jk_ajp_common.c (2773): (hierarchy-2) sending request to tomcat failed (recoverable), because of error during request sending (attempt=1)
[Tue Oct 18 09:51:30.824 2016] [31890:139909801125856] [info] jk_open_socket::jk_connect.c (817): connect to HOST:8010 failed (errno=111)
[Tue Oct 18 09:51:30.824 2016] [31890:139909801125856] [info] ajp_connect_to_endpoint::jk_ajp_common.c (1068): (hierarchy-2) Failed opening socket to (HOST:8010) (errno=111)
[Tue Oct 18 09:51:30.824 2016] [31890:139909801125856] [error] ajp_send_request::jk_ajp_common.c (1728): (hierarchy-2) connecting to backend failed. Tomcat is probably not started or is listening on the wrong port (errno=111)
[Tue Oct 18 09:51:30.824 2016] [31890:139909801125856] [info] ajp_service::jk_ajp_common.c (2773): (hierarchy-2) sending request to tomcat failed (recoverable), because of error during request sending (attempt=2)
[Tue Oct 18 09:51:30.824 2016] [31890:139909801125856] [error] ajp_service::jk_ajp_common.c (2794): (hierarchy-2) connecting to tomcat failed (rc=-3, errors=5, client_errors=0).
[Tue Oct 18 09:51:30.825 2016] [31890:139909801125856] [info] service::jk_lb_worker.c (1595): service failed, worker hierarchy-2 is in error state
[Tue Oct 18 09:51:30.825 2016] [31890:139909801125856] [info] jk_open_socket::jk_connect.c (817): connect to HOST:8011 failed (errno=111)
[Tue Oct 18 09:51:30.825 2016] [31890:139909801125856] [info] ajp_connect_to_endpoint::jk_ajp_common.c (1068): (hierarchy-3) Failed opening socket to (HOST:8011) (errno=111)
[Tue Oct 18 09:51:30.825 2016] [31890:139909801125856] [error] ajp_send_request::jk_ajp_common.c (1728): (hierarchy-3) connecting to backend failed. Tomcat is probably not started or is listening on the wrong port (errno=111)
[Tue Oct 18 09:51:30.825 2016] [31890:139909801125856] [info] ajp_service::jk_ajp_common.c (2773): (hierarchy-3) sending request to tomcat failed (recoverable), because of error during request sending (attempt=1)
[Tue Oct 18 09:51:30.925 2016] [31890:139909801125856] [info] jk_open_socket::jk_connect.c (817): connect to HOST:8011 failed (errno=111)
[Tue Oct 18 09:51:30.925 2016] [31890:139909801125856] [info] ajp_connect_to_endpoint::jk_ajp_common.c (1068): (hierarchy-3) Failed opening socket to (HOST:8011) (errno=111)
[Tue Oct 18 09:51:30.925 2016] [31890:139909801125856] [error] ajp_send_request::jk_ajp_common.c (1728): (hierarchy-3) connecting to backend failed. Tomcat is probably not started or is listening on the wrong port (errno=111)
[Tue Oct 18 09:51:30.925 2016] [31890:139909801125856] [info] ajp_service::jk_ajp_common.c (2773): (hierarchy-3) sending request to tomcat failed (recoverable), because of error during request sending (attempt=2)
[Tue Oct 18 09:51:30.925 2016] [31890:139909801125856] [error] ajp_service::jk_ajp_common.c (2794): (hierarchy-3) connecting to tomcat failed (rc=-3, errors=5, client_errors=0).
[Tue Oct 18 09:51:30.925 2016] [31890:139909801125856] [info] service::jk_lb_worker.c (1595): service failed, worker hierarchy-3 is in error state
[Tue Oct 18 09:51:31.026 2016] [31890:139909801125856] [info] service::jk_lb_worker.c (1664): All tomcat instances failed, no more workers left (attempt=0, retry=1)
[Tue Oct 18 09:51:31.026 2016] [31890:139909801125856] [info] service::jk_lb_worker.c (1664): All tomcat instances failed, no more workers left (attempt=1, retry=1)
[Tue Oct 18 09:51:31.026 2016] [31890:139909801125856] [info] service::jk_lb_worker.c (1664): All tomcat instances failed, no more workers left (attempt=2, retry=1)
[Tue Oct 18 09:51:31.026 2016] [31890:139909801125856] [info] service::jk_lb_worker.c (1664): All tomcat instances failed, no more workers left (attempt=3, retry=1)
[Tue Oct 18 09:51:31.026 2016] [31890:139909801125856] [info] service::jk_lb_worker.c (1675): All tomcat instances are busy or in error state
[Tue Oct 18 09:51:31.026 2016] [31890:139909801125856] [error] service::jk_lb_worker.c (1680): All tomcat instances failed, no more workers left
[Tue Oct 18 09:51:31.026 2016] [31890:139909801125856] [info] jk_handler::mod_jk.c (2991): Service error=0 for worker=lbhierarchy


I tested a similar configuration with 5 workers, and I never got past worker 3.


We are using:
Server Version: Apache/2.2.15 (Unix) mod_jk/1.2.41
JK Version: mod_jk/1.2.41


I don't know whether this is a bug or whether I missed something in the documentation. Thanks for your help!


Miloš Kozák
Application Operations
AO Front Office Support
Commerz Systems Prague


________________________________

Právní informace: Tento e-mail a všechny připojené soubory jsou důvěrné a mohou být chráněny zákonem a jsou určeny pouze oprávněným adresátům. Pokud jste obdržel/a tento e-mail omylem, oznamte to, prosím, neprodleně jeho odesílateli a pak jej vymažte. Tento e-mail ani připojené soubory nejsou návrhem na uzavření smlouvy ani jeho přijetím, ledaže jsou tak výslovně označeny. Commerzbank neručí za bezchybný a úplný přenos zasílaných informací, ani za zpoždění nebo přerušení přenosu a ani za škody způsobené použitím nebo důvěrou v tyto informace.

Legal information: This e-mail message and all attached files are confidential, may be protected under the law, and are intended for authorized recipients only. If you have received this e-mail message in error, please notify the sender immediately and subsequently delete the message. This e-mail message and the attached files do not constitute a proposal to enter into an agreement or the acceptance of such a proposal, unless expressly designated as such. COMMERZBANK gives no guarantee that the transmitted information is error-free and complete. COMMERZBANK will not be held liable for any transmission delay or interruption, nor for any loss incurred due to the use of, or reliance on, the transmitted information.

COMMERZBANK Aktiengesellschaft, pobočka Praha, se sídlem Praha 2, Jugoslávská 1, 120 21, IČO: 47 61 09 21, zapsaná v Obchodním rejstříku u Městského soudu v Praze, v oddílu A, vložka 7341, pobočka COMMERZBANK Aktiengesellschaft, Frankfurt nad Mohanem, Německo.
Povinné údaje: http://www.commerzbank.de/pflichtangaben

COMMERZBANK Aktiengesellschaft, pobočka Praha, with its registered office: Praha 2, Jugoslávská 1, 120 21, registered number: 47610921, entered in the Commercial Register at the Municipal Court Prague, Section A, Entry 7341, branch of COMMERZBANK Aktiengesellschaft, Frankfurt am Main, Germany.
Mandatory information: http://www.commerzbank.com/pflichtangaben

RE: Load balancing problem with activation=disabled

Posted by "Kozak, Milos" <Mi...@commerzsystems.com>.
I actually achieved my solution with the activation directive, and that configuration works. The only drawback is:

When PROD recovers, all traffic is still handled by DR until DR goes down, whereas I would want PROD to take all traffic back as soon as it has recovered.


Milos

-----Original Message-----
From: Rainer Jung [mailto:rainer.jung@kippdata.de]
Sent: 18 October 2016 11:33
To: Tomcat Users List
Subject: Re: Load balancing problem with activation=disabled




Look at "distance" instead of activation. Set all workers active and use
distance values, e.g. "0" for all "normal workers", "1" for the first DR
group, "2" etc.

See:

http://tomcat.apache.org/connectors-doc/reference/workers.html

Regards,

Rainer


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org



Re: Load balancing problem with activation=disabled

Posted by Rainer Jung <ra...@kippdata.de>.


Look at "distance" instead of activation. Set all workers active and use 
distance values, e.g. "0" for all "normal workers", "1" for the first DR 
group, "2" etc.
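
A minimal sketch of how this could look for the four-worker setup from the original post, assuming the same worker names and with the activation and redirect lines dropped. Only the lines that differ are shown; the type, host, port, and timeout settings stay as they are. The distance values are illustrative, and this sketch has not been tested against mod_jk 1.2.41:

```properties
# All four workers stay members of the same lb worker.
worker.lbhierarchy.balance_workers=hierarchy-1,hierarchy-2,hierarchy-3,hierarchy-4

# No activation=disabled and no redirect: every worker is active,
# and the preference order comes from distance alone.
# PROD: lowest distance, preferred whenever it is usable.
worker.hierarchy-1.distance=0
# DR workers: each one is only chosen when no usable worker
# with a lower distance exists.
worker.hierarchy-2.distance=1
worker.hierarchy-3.distance=2
worker.hierarchy-4.distance=3
```

According to the workers.properties reference, an lb worker never chooses a member while another usable member with a lower distance exists, so new requests should move back to hierarchy-1 once mod_jk considers it usable again. How this interacts with sticky sessions is worth checking in the same reference.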

See:

http://tomcat.apache.org/connectors-doc/reference/workers.html

Regards,

Rainer

