Posted to users@tomcat.apache.org by ben short <ja...@gmail.com> on 2007/08/23 12:51:27 UTC

JK Loadbalancer not balancing fairly

Hi All,

We are doing some load testing on our setup and find that the CPU usage
of Tomcat reported by top on the two systems is not equal. Typically we
see figures like ~400% to 800% CPU on one machine and ~50% on the other
machine for the java process. We would expect the two CPU values to be
equal.

The jkstatus page on box one shows the following after a restart,
although before the restart the Max column was showing 250 for jcpres1
and 32 for jcpres2.

Name     Type   Host              Addr              Act  State  D  F  M  V    Acc   Err  CE  RE  Wr    Rd    Busy  Max  Route    RR  Cd  Rs
jcpres1  ajp13  172.16.4.11:8009  172.16.4.11:8009  ACT  OK     0  1  1  869  4276  2    4   0   939K  286M  1     11   jcpres1          0/0
jcpres2  ajp13  172.16.4.12:8009  172.16.4.12:8009  ACT  OK     0  1  1  869  4277  2    1   0   943K  280M  2     9    jcpres2          0/0

and box 2

Name     Type   Host              Addr              Act  State  D  F  M  V    Acc   Err  CE  RE  Wr    Rd    Busy  Max  Route    RR  Cd  Rs
jcpres1  ajp13  172.16.4.11:8009  172.16.4.11:8009  ACT  OK     0  1  1  484  3872  0    4   0   850K  256M  3     10   jcpres1          0/0
jcpres2  ajp13  172.16.4.12:8009  172.16.4.12:8009  ACT  OK     0  1  1  483  3871  0    4   0   850K  260M  1     10   jcpres2          0/0


Our system setup.

Both machines are running the following software on RedHat 4ES

Httpd 2.2.4
Mod JK 1.2.25
Tomcat 6.0.12
Java 1.6.0_01

Box 1.

workers.properties

# JK Status worker config

worker.list=jkstatus
worker.jkstatus.type=status

# Presentation Load Balancer Config

worker.list=preslb

worker.preslb.type=lb
worker.preslb.balance_workers=jcpres1,jcpres2
worker.preslb.sticky_session=1

worker.jcpres1.port=8009
worker.jcpres1.host=172.16.4.11
worker.jcpres1.type=ajp13
worker.jcpres1.lbfactor=1
worker.jcpres1.fail_on_status=503,400,500,909

worker.jcpres2.port=8009
worker.jcpres2.host=172.16.4.12
worker.jcpres2.type=ajp13
worker.jcpres2.lbfactor=1
worker.jcpres2.fail_on_status=503,400,500,909


Box 2.

workers.properties

# JK Status worker config

worker.list=jkstatus
worker.jkstatus.type=status

# Presentation Load Balancer Config

worker.list=preslb

worker.preslb.type=lb
worker.preslb.balance_workers=jcpres1,jcpres2
worker.preslb.sticky_session=1

worker.jcpres1.port=8009
worker.jcpres1.host=172.16.4.11
worker.jcpres1.type=ajp13
worker.jcpres1.lbfactor=1
worker.jcpres1.fail_on_status=503,400,500,909

worker.jcpres2.port=8009
worker.jcpres2.host=172.16.4.12
worker.jcpres2.type=ajp13
worker.jcpres2.lbfactor=1
worker.jcpres2.fail_on_status=503,400,500,909

---------------------------------------------------------------------
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org

Re: JK Loadbalancer not balancing fairly

Posted by Rainer Jung <ra...@kippdata.de>.

Brian.Horblit@thomson.com wrote:
> Rainer,
> 
> Thanks very much for the clarification! Since I have been playing with
> the load balancing strategy set to session ("worker.router.method=S" on
> my load balancer), is there a way to tell roughly how many sessions have
> been pinned to each worker/tomcat? In this case would the load balancer

No, unfortunately not. You can log cookies (if used) with Apache, along
with the name of the target worker, in the access log. It may be easier
to log the session ID in Tomcat's access log (I think %S, check the Valve
docs) and then count the distinct IDs (not nice, but it will work).
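
The counting step could be sketched roughly as follows. This is only an
illustration under assumptions that are not in the thread: the access log
pattern is assumed to end with the session ID (%S), requests without a
session are assumed to be logged as "-", and the session IDs are assumed
to carry the jvmRoute as a ".route" suffix, as is usual when sticky
sessions are configured. The function name and the sample lines are
hypothetical.

```python
from collections import defaultdict


def sessions_per_route(lines):
    """Map jvmRoute suffix -> set of distinct session IDs seen in the log.

    Assumes the session ID is the last whitespace-separated field of
    each access log line (an assumption about the log pattern).
    """
    seen = defaultdict(set)
    for line in lines:
        fields = line.split()
        if not fields or fields[-1] == "-":
            continue  # blank line, or a request without a session
        session_id = fields[-1]
        # With sticky sessions the ID usually looks like "<id>.<jvmRoute>".
        _, dot, route = session_id.rpartition(".")
        seen[route if dot else "(no route)"].add(session_id)
    return seen


# Hypothetical sample lines, not taken from a real log.
sample = [
    '172.16.4.1 [23/Aug/2007:12:00:01 +0000] "GET /a HTTP/1.1" 200 A1B2.jcpres1',
    '172.16.4.1 [23/Aug/2007:12:00:02 +0000] "GET /b HTTP/1.1" 200 A1B2.jcpres1',
    '172.16.4.2 [23/Aug/2007:12:00:03 +0000] "GET /a HTTP/1.1" 200 C3D4.jcpres1',
    '172.16.4.3 [23/Aug/2007:12:00:04 +0000] "GET /a HTTP/1.1" 200 E5F6.jcpres2',
    '172.16.4.4 [23/Aug/2007:12:00:05 +0000] "GET /img HTTP/1.1" 200 -',
]

counts = {route: len(ids) for route, ids in sessions_per_route(sample).items()}
print(counts)
```

Note that this counts every session seen in the log, including ones that
have since expired, so it is an upper bound on the sessions currently
pinned to each Tomcat.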

> value be (something like) the number of new sessions sent to a
> particular worker divided by two some number of times? If this were true
> you still would not know the number of sessions pinned to a worker
> because of the factors of two having been divided out. I just got an HTTP

It is true.

> JMX adapter wired up in Tomcat so I'll see if I can get session info
> that way...

Yes, the manager MBean of the context contains session info.
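
As a rough sketch of how that session info might be collected: the code
below assumes the HTTP JMX adapter returns plain "attribute: value" text
with one "Name:" line per MBean, and that the manager MBean exposes an
activeSessions attribute. Both the output format and the sample text are
assumptions for illustration, so check what your adapter actually
returns.

```python
def parse_active_sessions(text):
    """Return {mbean_name: activeSessions} from 'Name:' / 'attr: value' text."""
    result = {}
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if not sep:
            continue  # blank line or a line without "key: value" shape
        if key == "Name":
            current = value.strip()
        elif key == "activeSessions" and current is not None:
            result[current] = int(value)
    return result


# Hypothetical adapter output for two web applications on one Tomcat.
sample = """\
OK - Number of results: 2

Name: Catalina:type=Manager,path=/shop,host=localhost
modelerType: org.apache.catalina.session.StandardManager
activeSessions: 37

Name: Catalina:type=Manager,path=/admin,host=localhost
modelerType: org.apache.catalina.session.StandardManager
activeSessions: 2
"""

sessions = parse_active_sessions(sample)
total = sum(sessions.values())
print(sessions, total)
```

Running this against each Tomcat and comparing the totals gives the
current number of sessions per node, which the V value cannot tell you.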

> Thanks again,
> 
> Brian

Regards,

Rainer

RE: JK Loadbalancer not balancing fairly

Posted by Br...@thomson.com.

Rainer,

Thanks very much for the clarification! Since I have been playing with
the load balancing strategy set to session ("worker.router.method=S" on
my load balancer), is there a way to tell roughly how many sessions have
been pinned to each worker/tomcat? In this case would the load balancer
value be (something like) the number of new sessions sent to a
particular worker divided by two some number of times? If this were true
you still would not know the number of sessions pinned to a worker
because of the factors of two having been divided out. I just got an HTTP
JMX adapter wired up in Tomcat so I'll see if I can get session info
that way...

Thanks again,

Brian

Re: JK Loadbalancer not balancing fairly

Posted by Rainer Jung <ra...@kippdata.de>.

Brian.Horblit@thomson.com wrote:
> Ben,
> 
> So I assume you have two web servers fronting two app servers - or there
> are two servers both of which have a web server and an app server? For
> the restart you talk about - did you restart both web servers? Do you
> have a good load balancer (local director, content director like an F5)
> in front of the two web servers?
> 
> If I am reading your JKStatus text correctly I noticed the following:
> 
> Load balancer value on web server 2
> ----------------------------------- = ~0.56
> Load balancer value on web server 1  
> 
> but
> 
> Number requests on web server 2
> ----------------------------------- = ~0.91
> Number requests on web server 1  
> 
> 
> Now, if I am interpreting the meaning of "load balancer value" and
> "number of requests" correctly, that would imply that the number of
> sessions stuck to each app server from web server 1 is very roughly
> twice as high as from 2, but the total number of requests sent to each
> app server from both web servers is very roughly the same. (Can someone
> confirm I'm interpreting those #s correctly?)

The number of requests is the total since the last JK/Apache restart, so
if the last restart was shortly before, the numbers will not help. If
they were not reset after the tests, we would know that Apache 1 had a
few more requests than Apache 2, but that both of them sent almost
exactly the same number of requests to the two Tomcat nodes (delta = 1
request).

The V column is the balancing value used to decide where the next
request goes. It is the number of requests sent to the Tomcat, divided
by two once a minute, so in effect it is multiplied by a decay curve.
The big difference between the V values of Apache 1 and Apache 2 does
not matter. It could simply mean that the one with the smaller V value
did its division more recently. The V values for the two Tomcats are
again very similar on the same Apache, another indication of good
balancing.

All this is true for the default balancing method "Requests".
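
To illustrate why V can differ between two Apache instances that see the
same traffic, here is a small simulation of a counter that is halved once
a minute. It is a simplified sketch, not mod_jk's actual code: the
request rate, the run length and the halving offsets are made-up values,
and each request is assumed to add exactly 1 (lbfactor=1).

```python
def simulate_v(requests_per_second, seconds, first_halving_at, interval=60):
    """Return V after `seconds` of steady traffic.

    V grows by one per request and is halved every `interval` seconds,
    starting at second `first_halving_at`.
    """
    v = 0
    for t in range(1, seconds + 1):
        v += requests_per_second  # requests forwarded during this second
        if t >= first_halving_at and (t - first_halving_at) % interval == 0:
            v //= 2  # periodic decay: divide by two
    return v


# Same load on both instances; only the phase of the halving differs.
apache_a = simulate_v(10, 600, first_halving_at=60)  # halved at t=600
apache_b = simulate_v(10, 600, first_halving_at=5)   # last halved at t=545

print(apache_a, apache_b, round(apache_a / apache_b, 2))
```

In this sketch the instance that has just halved shows a V of roughly
half the other one, even though both forwarded the same number of
requests. Only the comparison of V between workers on the same Apache is
meaningful.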

I would suggest first tracking CPU per Tomcat process over the test
period (not per system, and not as a single number, but as a graph over
time).

> According to the docs, each connector by default tries to keep the number
> of requests sent to each worker the same, which looks to be happening
> reasonably well. (I'm playing with trying to keep the number of
> sessions balanced since our apps tend to be more of a memory issue than
> a cpu issue. There is a setting on the connector for this.)
> 
> With some info on your setup we can try to figure out the load
> imbalance.
> 
> As a note, I am playing with the jk1.2.x connector, but our production
> systems use the old jk2.x connector. With that, I've seen a load
> imbalance on the app servers when one of the app servers has gone down
> for a while, and then has come back up. If the connectors are not reset,
> they will try to "catch up" the restarted app server in terms of the
> number of requests it has handled, thus loading it more heavily than
> servers that have been up the whole time.

The catch-up problem should be fixed. A recovered or reactivated worker
is given the biggest "work done" value of all the other workers, so it
should start with a normal load, or even a slightly lighter one.
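
The idea can be sketched like this. It is a toy model of the behaviour
described above, not the mod_jk implementation; the function names and
the numbers are hypothetical.

```python
def pick_worker(values, usable):
    """Choose the usable worker with the smallest 'work done' value."""
    return min(usable, key=lambda name: values[name])


def recover(values, name):
    """On recovery, give the worker the biggest value among the others."""
    others = [v for other, v in values.items() if other != name]
    values[name] = max(others)


# jcpres2 was down for a while, so its counter lags far behind.
values = {"jcpres1": 5000, "jcpres2": 120}
recover(values, "jcpres2")

sent = {"jcpres1": 0, "jcpres2": 0}
for _ in range(100):
    target = pick_worker(values, ["jcpres1", "jcpres2"])
    values[target] += 1  # one more request handled by this worker
    sent[target] += 1

print(sent)
```

With the recover() step the 100 requests are split evenly. Without it,
jcpres2 would keep the smaller value and receive all of them until it had
caught up, which is the imbalance described in the quoted text.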

> 
> Brian

Regards,

Rainer

RE: JK Loadbalancer not balancing fairly

Posted by Br...@thomson.com.

Ben,

So I assume you have two web servers fronting two app servers - or there
are two servers both of which have a web server and an app server? For
the restart you talk about - did you restart both web servers? Do you
have a good load balancer (local director, content director like an F5)
in front of the two web servers?

If I am reading your JKStatus text correctly I noticed the following:

Load balancer value on web server 2
----------------------------------- = ~0.56
Load balancer value on web server 1  

but

Number requests on web server 2
----------------------------------- = ~0.91
Number requests on web server 1  


Now, if I am interpreting the meaning of "load balancer value" and
"number of requests" correctly, that would imply that the number of
sessions stuck to each app server from web server 1 is very roughly
twice as high as from 2, but the total number of requests sent to each
app server from both web servers is very roughly the same. (Can someone
confirm I'm interpreting those #s correctly?)
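
For reference, the two ratios can be reproduced from the jkstatus figures
in Ben's original post. This assumes that "load balancer value" refers to
the V column and "number of requests" to the Acc column, and it uses the
jcpres1 rows (the jcpres2 rows give nearly the same result).

```python
# V and Acc for worker jcpres1 as reported by each web server's jkstatus page.
box1 = {"V": 869, "Acc": 4276}
box2 = {"V": 484, "Acc": 3872}

v_ratio = box2["V"] / box1["V"]        # load balancer value, server 2 / server 1
acc_ratio = box2["Acc"] / box1["Acc"]  # number of requests, server 2 / server 1

print(f"V ratio   ~{v_ratio:.2f}")
print(f"Acc ratio ~{acc_ratio:.2f}")
```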

According to the docs, each connector by default tries to keep the number
of requests sent to each worker the same, which looks to be happening
reasonably well. (I'm playing with trying to keep the number of
sessions balanced since our apps tend to be more of a memory issue than
a cpu issue. There is a setting on the connector for this.)

With some info on your setup we can try to figure out the load
imbalance.

As a note, I am playing with the jk1.2.x connector, but our production
systems use the old jk2.x connector. With that, I've seen a load
imbalance on the app servers when one of the app servers has gone down
for a while, and then has come back up. If the connectors are not reset,
they will try to "catch up" the restarted app server in terms of the
number of requests it has handled, thus loading it more heavily than
servers that have been up the whole time.

Brian


