Posted to users@tomcat.apache.org by WenDong Zhang <zw...@gmail.com> on 2009/04/09 04:02:09 UTC

how mod_jk load balancer auto recover node?

Hi all,

I'm using httpd 2.2 with mod_jk as the load balancer. There are 2
Tomcat nodes in the cluster. After running for a long time, a Tomcat node
needs to be restarted (because its system resource usage gets too high;
CPU usage is almost 100%). During the restart, the httpd load balancer
apparently puts the node into error status and tries to recover it.
I set the retries for each node to 2 (the default value):
  worker.node1.retries = 2
but it seems that httpd tries to recover the failed node for a while
and, if that fails, never tries to recover it again.

HERE IS MY QUESTION:
Is there any way to make the httpd load balancer try to recover failed
nodes every few minutes (e.g. every 5 minutes), independently of the
"retries" parameter?

Thanks.


-- 
Best Regards!
Wen Dong

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org

Re: how mod_jk load balancer auto recover node?

Posted by WenDong Zhang <zw...@gmail.com>.

Oh, I am not familiar with mod_jk. I configured the load balancer following
these two guides:
1. workers.properties configuration
http://tomcat.apache.org/connectors-doc/reference/workers.html#Advanced%20Worker%20Directives
2. Apache HowTo
http://tomcat.apache.org/connectors-doc/webserver_howto/apache.html

My configuration files are as follows:
## httpd.conf

LoadModule jk_module modules/mod_jk.so
<IfModule mod_jk.c>
    JkWorkersFile       conf/workers.properties
    JkLogFile           logs/mod_jk.log
    JkLogLevel          info
    JkLogStampFormat    "[%a %b %d %H:%M:%S %Y] "
    JkOptions           +ForwardKeySize +ForwardURICompat -ForwardDirectories
    #JkRequestLogFormat  "%w %V %T"

    <Location /*/WEB-INF/*>
        AllowOverride None
        deny from all
    </Location>

    <Location /*/META-INF/*>
        AllowOverride None
        deny from all
    </Location>

    # forward ALL web requests to our mod_jk loadbalancer workers
    JKMount /* loadbalancer

    JkMount /jkstatus/* jkstatus

</IfModule>

## workers.properties
worker.list=loadbalancer,jkstatus
worker.maintain=60

worker.node65.port=8009
worker.node65.host=9.186.10.65
worker.node65.type=ajp13
worker.node65.connection_pool_size=100
worker.node65.connection_pool_minsize=50
worker.node65.connection_pool_timeout=500
worker.node65.lbfactor=2

worker.node106.port=8009
worker.node106.host=9.186.10.106
worker.node106.type=ajp13
worker.node106.connection_pool_size=160
worker.node106.connection_pool_minsize=80
worker.node106.connection_pool_timeout=500
worker.node106.lbfactor=2

worker.loadbalancer.type=lb
worker.loadbalancer.balance_workers=node65,node106
worker.loadbalancer.sticky_session=1
worker.loadbalancer.sticky_session_force=0
worker.jkstatus.type=status

======
BTW: I ran some tests on load balancer performance. I found that the
nodes' response time is a factor in the lb's performance, and that the
connection pool also needs to be configured. As the pool size increases, the
tolerance for each node's response time increases too, but the pool size
also has a limit: in my test case, errors still occurred when the pool size
was increased to 200.

The errors in mod_jk.log:

[Tue Mar 24 15:36:37 2009] [14425:735282400] [warn] ajp_get_endpoint::jk_ajp_common.c (2946): Unable to get the free endpoint for worker node65 from 50 slots
[Tue Mar 24 15:36:37 2009] [14425:4059268320] [info] service::jk_lb_worker.c (1161): could not get free endpoint for worker node65 (0 retries)
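
The warning above is logged when every endpoint in the connection pool of an httpd process is already in use for that worker. The following workers.properties fragment is only a sketch of one way to approach the pool settings, assuming the node name and address from the configuration posted above and the behaviour described in the mod_jk workers.properties reference; the timeout value is illustrative, not a tested recommendation.

```
# Sketch only; node name and address taken from the configuration above.
worker.node65.type=ajp13
worker.node65.host=9.186.10.65
worker.node65.port=8009
worker.node65.lbfactor=2

# connection_pool_size is deliberately left unset here. The pool is kept
# per httpd process, and under Apache httpd mod_jk derives its default
# from the number of threads per process, so a larger explicit value does
# not add usable connections.

# Close backend connections that have been idle for 10 minutes.
# The value is in seconds (illustrative value, an assumption).
worker.node65.connection_pool_timeout=600
```

If connection_pool_timeout is set, the mod_jk documentation advises keeping it in sync with the connectionTimeout attribute of the AJP connector in Tomcat's server.xml, which is given in milliseconds (600000 for the value assumed above).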


Thanks.

2009/4/9 Rainer Jung <ra...@kippdata.de>:
> On 09.04.2009 04:02, WenDong Zhang wrote:
>> Hi all,
>>
>> I'm using httpd 2.2 with mod_jk as the load balancer. There are 2
>> Tomcat nodes in the cluster. After running for a long time, a Tomcat node
>> needs to be restarted (because its system resource usage gets too high;
>> CPU usage is almost 100%). During the restart, the httpd load balancer
>> apparently puts the node into error status and tries to recover it.
>> I set the retries for each node to 2 (the default value):
>>   worker.node1.retries = 2
>> but it seems that httpd tries to recover the failed node for a while
>> and, if that fails, never tries to recover it again.
>
> No, that's not how it works.
>
>> HERE IS MY QUESTION:
>> Is there any way to make the httpd load balancer try to recover failed
>> nodes every few minutes (e.g. every 5 minutes), independently of the
>> "retries" parameter?
>
> Recovery has nothing to do with the retries parameter. It happens
> automatically if
>
> - enough time has passed (at least a minute since the last recovery
> attempt or since the error was detected)
> - a request comes in that the load balancer wants to send to the broken
> node
>
> Show us your configuration (JK directives in the httpd config,
> workers.properties, uriworkermap.properties if used, and the mod_jk version)
> and your jk log.
>
> In recent versions of mod_jk you can also request a recovery via
> the status worker. But that should only be necessary if the one
> minute wait period is too long for you.
>
> Regards,
>
> Rainer



-- 
Best Regards!
Wen Dong

Re: how mod_jk load balancer auto recover node?

Posted by Rainer Jung <ra...@kippdata.de>.

On 09.04.2009 04:02, WenDong Zhang wrote:
> Hi all,
> 
> I'm using httpd 2.2 with mod_jk as the load balancer. There are 2
> Tomcat nodes in the cluster. After running for a long time, a Tomcat node
> needs to be restarted (because its system resource usage gets too high;
> CPU usage is almost 100%). During the restart, the httpd load balancer
> apparently puts the node into error status and tries to recover it.
> I set the retries for each node to 2 (the default value):
>   worker.node1.retries = 2
> but it seems that httpd tries to recover the failed node for a while
> and, if that fails, never tries to recover it again.

No, that's not how it works.

> HERE IS MY QUESTION:
> Is there any way to make the httpd load balancer try to recover failed
> nodes every few minutes (e.g. every 5 minutes), independently of the
> "retries" parameter?

Recovery has nothing to do with the retries parameter. It happens
automatically if

- enough time has passed (at least a minute since the last recovery
attempt or since the error was detected)
- a request comes in that the load balancer wants to send to the broken
node
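
The wait period in the first condition corresponds to the load balancer's recover_time directive, whose documented default is 60 seconds. As a minimal sketch, assuming the worker name "loadbalancer" from the configuration posted in this thread and a mod_jk version in which recover_time is set on the lb worker, the 5-minute interval asked about could be written like this:

```
# Sketch only; "loadbalancer" is the lb worker name used in this thread.
# Seconds a member stays in error state before the balancer marks it as
# recovering and tries it again for a new request (documented default: 60).
# 300 matches the "every 5 minutes" in the question; the default already
# retries more often than that.
worker.loadbalancer.recover_time=300

# recover_time is checked during global maintenance, which runs at this
# interval in seconds (60 is the documented default).
worker.maintain=60
```

Even after recover_time has elapsed, the member is only contacted again when a request arrives that the balancer decides to route to it, as the second condition above states.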

Show us your configuration (JK directives in the httpd config,
workers.properties, uriworkermap.properties if used, and the mod_jk version)
and your jk log.

In recent versions of mod_jk you can also request a recovery via
the status worker. But that should only be necessary if the one
minute wait period is too long for you.
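
For the status worker route, the relevant part of workers.properties might look like the sketch below. It assumes the worker names from the configuration posted in this thread and a mod_jk version whose status worker supports the recover action; read_only is shown explicitly only for illustration.

```
# Sketch only; worker names taken from the configuration in this thread.
worker.list=loadbalancer,jkstatus
worker.jkstatus.type=status
# A read-only status worker refuses actions that change runtime state,
# including recover. False is the documented default.
worker.jkstatus.read_only=False
```

With the JkMount /jkstatus/* mapping from the posted httpd.conf, a request along the lines of /jkstatus/?cmd=recover&w=loadbalancer&sw=node65 should then mark node65 as recovering. The parameter names follow the status worker documentation and should be checked against the installed mod_jk version.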

Regards,

Rainer
