Posted to dev@tomcat.apache.org by Mathias Herberts <Ma...@iroise.net> on 2002/05/10 23:19:51 UTC

Re: cvs commit: jakarta-tomcat-connectors/jk/native2/common jk_worker_lb.c

costin@apache.org wrote:
> 
> costin      02/05/09 14:06:48
> 
>   Modified:    jk/native2/common jk_worker_lb.c
>   Log:
>   That's the big one.
> 
>   Please review !
> 
>   It changes the handling of lb_value to int. I also cleaned up the logic so
>   it's easier ( I hope ) to understand what's happening. "Levels" replace
>   the 'local worker'; I think I got the logic straight for those.
> 
>   I started to add 'introspection' data, to validate and better report
>   the config.
> 
>   We use one table per level. At the moment the maximum number of workers
>   is hardcoded ( to 255 ), we could make it dynamic but that would make things
>   pretty complex when we add workers dynamically ( it won't work without
>   a CS or atomic operations )

Hi Costin,

I read your code thoroughly and found no problem in
get_most_suitable_worker; I think your approach to prioritizing workers
is the best. What Bernd and I had done was mainly driven by the need to
have a frontal load balancer detect the failure of the local worker(s).
Since my last patch, and having read yours, I think I have found a better
way to make the load balancer detect failures.

Configure all Apache instances so they see all Tomcat instances, and assign
a higher priority to local workers on each Apache, so that local workers
are chosen first. On each Apache, the load balancing worker is called lb.
Another load balancing worker, balancing only the local workers, is called
hwlb. The hardware load balancer checks the health of the Apache servers
using a URI served by hwlb instead of lb. Therefore, if no local workers
are left alive, the requests the hardware load balancer dispatches to the
associated Apache, before it can detect the local workers' failure, will
be rerouted to the other, non-local workers; the client will only lose its
session information and will not get any errors. When the hardware load
balancer eventually detects the local workers' failure (because the hwlb
worker rejected the request due to the lack of an available worker), it
will declare the Apache inactive and will only use the other ones.

This setup solves my use cases at least; I don't know about Bernd's.

There remains a related problem in jk_requtil, in
jk2_requtil_getCookieByName: as I mentioned several months ago on the
list, the cookie extraction does not work for cookies whose format
conforms to RFC 2109, that is, when the cookie value is enclosed in
double quotes. Such a cookie format is used by lynx, for example. I had
submitted a patch to the bug database but cannot find it anymore; I'll
have to look through my archives.

Good job on the lb worker Costin,

Mathias.

--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: cvs commit: jakarta-tomcat-connectors/jk/native2/common jk_worker_lb.c

Posted by Bernd Koecke <bk...@schlund.de>.
costinm@covalent.net wrote:
> On Tue, 14 May 2002, Bernd Koecke wrote:
> 
> 
>>Hi Costin,
>>
>>the new patch seems to work, but I'll test it more thoroughly tomorrow. Then I'll 
>>create the patches and the functional description.
>>
>>In short, the patched lb_worker uses an additional flag on the other workers (e.g. 
>>worker.ajp13.local_worker=1) to determine if it should be moved to the beginning 
>>of the balanced_workers. So we don't need to deal with two lists in lb_worker, 
>>and the lb_value '0' has no special meaning. The flag for sending requests only 
>>to local workers is 'local_worker_only' on the lb_worker. More when the patch is 
>>tested and ready.
> 
> 
> Ok. I already committed part of the changes for jk2 - but my version is 
> called 'hwBalanceErr', on worker_lb.
> 
> If 0, normal selection of non-local workers takes place if all locals are 
> in error state. If non-zero, we'll return the value as the error code - for 
> a front-end balancer to detect and stop forwarding requests for this 
> instance. 
> 
> I think that's the behavior you need - and it also allows customization
> for the returned error code.
> 

That sounds great, many thanks!

The patch for jk1 is on the way, and I added some explanation of how it works and 
about the two config flags.

Bernd


> Costin
> 
> 
> 



-- 
Dipl.-Inform. Bernd Koecke
UNIX-Entwicklung
Schlund+Partner AG
Fon: +49-721-91374-0
E-Mail: bk@schlund.de




Re: cvs commit: jakarta-tomcat-connectors/jk/native2/common jk_worker_lb.c

Posted by co...@covalent.net.
On Tue, 14 May 2002, Bernd Koecke wrote:

> Hi Costin,
> 
> the new patch seems to work, but I'll test it more thoroughly tomorrow. Then I'll 
> create the patches and the functional description.
> 
> In short, the patched lb_worker uses an additional flag on the other workers (e.g. 
> worker.ajp13.local_worker=1) to determine if it should be moved to the beginning 
> of the balanced_workers. So we don't need to deal with two lists in lb_worker, 
> and the lb_value '0' has no special meaning. The flag for sending requests only 
> to local workers is 'local_worker_only' on the lb_worker. More when the patch is 
> tested and ready.

Ok. I already committed part of the changes for jk2 - but my version is 
called 'hwBalanceErr', on worker_lb.

If 0, normal selection of non-local workers takes place if all locals are 
in error state. If non-zero, we'll return the value as the error code - for 
a front-end balancer to detect and stop forwarding requests for this 
instance. 

I think that's the behavior you need - and it also allows customization
for the returned error code.

Costin





Re: cvs commit: jakarta-tomcat-connectors/jk/native2/common jk_worker_lb.c

Posted by Bernd Koecke <bk...@schlund.de>.
costinm@covalent.net wrote:
[...]

> 
> I'll implement the same thing in jk2, but I wait your patch for jk1.
> 

Hi Costin,

the new patch seems to work, but I'll test it more thoroughly tomorrow. Then I'll 
create the patches and the functional description.

In short, the patched lb_worker uses an additional flag on the other workers (e.g. 
worker.ajp13.local_worker=1) to determine if it should be moved to the beginning 
of the balanced_workers. So we don't need to deal with two lists in lb_worker, 
and the lb_value '0' has no special meaning. The flag for sending requests only 
to local workers is 'local_worker_only' on the lb_worker. More when the patch is 
tested and ready.
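A hedged sketch of what such a configuration might look like in jk1 syntax (the worker names and hosts are illustrative, and the exact property names follow the flags discussed in this thread):

```properties
# Hypothetical workers.properties fragment (names illustrative)
worker.list=lb

worker.ajp13a.type=ajp13
worker.ajp13a.host=localhost
worker.ajp13a.port=8009
# flagged local: moved to the front of balanced_workers
worker.ajp13a.local_worker=1

worker.ajp13b.type=ajp13
worker.ajp13b.host=node2.example.com
worker.ajp13b.port=8009

worker.lb.type=lb
worker.lb.balanced_workers=ajp13a,ajp13b
# reject new requests instead of falling back to remote workers
worker.lb.local_worker_only=1
```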

Bernd

> Costin
> 
> 
> 
> 







Re: cvs commit: jakarta-tomcat-connectors/jk/native2/common jk_worker_lb.c

Posted by Bernd Koecke <bk...@schlund.de>.
costinm@covalent.net wrote:
> On Tue, 14 May 2002, Bernd Koecke wrote:
> 
> 
>>The '0' as lb_value is needed to determine which are the main/local workers. If 
>>we don't have this special value, we need an additional config flag with a list 
>>of the local/main workers, as in Mathias' patch.
>>
>>Should I add an additional config flag (I will take it from Mathias' patch), or do 
>>we stay with the special '0' value?
> 
> 
> I think it would be a good idea, it'll make things cleaner.
> 
> 'local_worker' would always be selected, and if 'main_worker_mode' ( or 
> maybe 'hw_lb_mode' ) is set, no fallback will happen.
> 
> 
> 
>>The 'main_worker_mode' flag is not the same as the 'in_main_worker_mode' var in 
>>the lb_worker struct. If the 'main_worker_mode' flag is set to 'reject' in 
>>workers.properties, the reject var of the lb_worker struct is set to JK_TRUE. The 
>>'in_main_worker_mode' var of the lb_worker struct is set to JK_TRUE if there is 
>>at least one worker with '0' as lb_value.
> 
> 
> That's a bit confusing. Maybe some better variable names are needed.
> 
> 2 flags should be enough - 'local_worker' and 'local_worker_only' ( or 
> something that makes it clear that if the flag is set, no fallback will
> occur but an error is returned for the hw balancer ).

Ok, how should we handle the local_worker list? The current code depends on one 
worker list, and for requests with a session it's easier to look into one list. 
Is it ok to keep the balanced_workers, where one or more of these workers could 
also be in the local_worker list? Then we could leave most of the code in the 
validate function untouched; after getting all the workers, we go through the 
local_worker list, if any, move each worker from this list to the beginning 
of the balanced_workers, and mark it as local. Would this be ok? Otherwise we 
have to handle two lists, and it would be possible to have only local workers and 
no balanced_workers. Then the lb_module makes no sense, but it is configurable 
and we have to deal with this. Another solution is to have two lists in the config 
but only one in lb_worker, but then we have to rewrite most of the code in 
validate, handle memory, etc. You know I'm not so experienced in C, so I would 
prefer the first suggestion :).

Bernd

> 
> I'll implement the same thing in jk2, but I wait your patch for jk1.
> 
> Costin
> 
> 
> 
> 







Re: cvs commit: jakarta-tomcat-connectors/jk/native2/common jk_worker_lb.c

Posted by co...@covalent.net.
On Tue, 14 May 2002, Bernd Koecke wrote:

> The '0' as lb_value is needed to determine which are the main/local workers. If 
> we don't have this special value, we need an additional config flag with a list 
> of the local/main workers, as in Mathias' patch.
> 
> Should I add an additional config flag (I will take it from Mathias' patch), or do 
> we stay with the special '0' value?

I think it would be a good idea, it'll make things cleaner.

'local_worker' would always be selected, and if 'main_worker_mode' ( or 
maybe 'hw_lb_mode' ) is set, no fallback will happen.


> The 'main_worker_mode' flag is not the same as the 'in_main_worker_mode' var in 
> the lb_worker struct. If the 'main_worker_mode' flag is set to 'reject' in 
> workers.properties, the reject var of the lb_worker struct is set to JK_TRUE. The 
> 'in_main_worker_mode' var of the lb_worker struct is set to JK_TRUE if there is 
> at least one worker with '0' as lb_value.

That's a bit confusing. Maybe some better variable names are needed.

2 flags should be enough - 'local_worker' and 'local_worker_only' ( or 
something that makes it clear that if the flag is set, no fallback will
occur but an error is returned for the hw balancer ).

I'll implement the same thing in jk2, but I wait your patch for jk1.

Costin






Re: cvs commit: jakarta-tomcat-connectors/jk/native2/common jk_worker_lb.c

Posted by Bernd Koecke <bk...@schlund.de>.
costinm@covalent.net wrote:
> On Mon, 13 May 2002, Bernd Koecke wrote:
> 
> 
>>Sorry, I must say it again: for my environment it is an error if a _switched
>>off_ Tomcat gets a request without a session id or with a session from another 
>>node. It's not necessary that this Tomcat-Apache tandem is
> 
> 
> In the current code ( in jk2 ), if a worker is in 'disabled' state it'll 
> only get requests with sessionid, as you need.
> 
> If it is not disabled, but has a higher level ( == distance ), it'll
> still not get any new requests unless all closer workers are in error
> state.
> 
> 
>>update and start them up again. If there are no local/main workers, I need an 
>>error response and no routing to a switched-off Tomcat. It's possible that this 
>>happens once per day.
> 
> 
> Setting the non-local workers in disabled state should do that. 
> 
> 
> 
>>I know this might be a special environment. I spent some time in jk1 to
>>build a working patch. Then I started looking into jk2. I'm not a good C
> 
> 
> Your patch looks ok. Would it be possible to remove the use of '0' as 
> a special value, and keep only the main_worker_mode flag for that ?
> Also, what's the meaning of 'reject' flag ? 
> 

The '0' as lb_value is needed to determine which are the main/local workers. If 
we don't have this special value, we need an additional config flag with a list 
of the local/main workers, as in Mathias' patch.

Should I add an additional config flag (I will take it from Mathias' patch), or do 
we stay with the special '0' value?

The 'reject' value of the 'main_worker_mode' flag is for the special behavior of 
not balancing even if no main-worker is up. Without this flag, a request would be 
sent to a non-main-worker if all main-workers are in error state. When the 
main-workers are only a preference, it might be ok to send a request to a 
non-main-worker and lose only the session rather than send an error response; I 
think this is what Mathias said. But I need an error response if the main-workers 
are down.

The 'main_worker_mode' flag is not the same as the 'in_main_worker_mode' var in 
the lb_worker struct. If the 'main_worker_mode' flag is set to 'reject' in 
workers.properties, the reject var of the lb_worker struct is set to JK_TRUE. The 
'in_main_worker_mode' var of the lb_worker struct is set to JK_TRUE if there is 
at least one worker with '0' as lb_value.

> Also it would be nice to get some documentation for the new settings.
> 

That's no problem; I could write a patch for the HTML page.

> Regarding jk2 - I just want to know if the current solution is ok or 
> what problems remain. For now the priority is getting the patch into jk1
> so it can be released in 4.0.4 final ( so today or early tomorrow I would 
> like to close this issue ). 

This sounds pretty good, many thanks!

Bernd


> 
> Costin 
> 
> 







Re: cvs commit: jakarta-tomcat-connectors/jk/native2/common jk_worker_lb.c

Posted by co...@covalent.net.
On Mon, 13 May 2002, Bernd Koecke wrote:

> Sorry, I must say it again: for my environment it is an error if a _switched
> off_ Tomcat gets a request without a session id or with a session from another 
> node. It's not necessary that this Tomcat-Apache tandem is

In the current code ( in jk2 ), if a worker is in 'disabled' state it'll 
only get requests with a session id, as you need.

If it is not disabled, but has a higher level ( == distance ), it'll
still not get any new requests unless all closer workers are in error
state.

> update and start them up again. If there are no local/main workers, I need an 
> error response and no routing to a switched-off Tomcat. It's possible that this 
> happens once per day.

Setting the non-local workers in disabled state should do that. 


> I know this might be a special environment. I spent some time in jk1 to
> build a working patch. Then I started looking into jk2. I'm not a good C

Your patch looks ok. Would it be possible to remove the use of '0' as 
a special value, and keep only the main_worker_mode flag for that?
Also, what's the meaning of the 'reject' flag? 

Also it would be nice to get some documentation for the new settings.

Regarding jk2 - I just want to know if the current solution is ok or 
what problems remain. For now the priority is getting the patch into jk1
so it can be released in 4.0.4 final ( so today or early tomorrow I would 
like to close this issue ). 

Costin 




Re: cvs commit: jakarta-tomcat-connectors/jk/native2/common jk_worker_lb.c

Posted by Bernd Koecke <bk...@schlund.de>.
Mathias Herberts wrote:
 > costin@apache.org wrote:
 >
 >> costin      02/05/09 14:06:48
 >>
 >> Modified:    jk/native2/common jk_worker_lb.c Log: That's the big one.
 >>
 >> Please review !
 >>
 >> It changes the handling of lb_value to int. I also cleaned up the logic so
 >> it's easier ( I hope ) to understand what's happening. "Levels" replace the
 >> 'local worker'; I think I got the logic straight for those.
 >>
 >> I started to add a 'introspection' data, to validate and better report the
 >> config.
 >>
 >> We use one table per level. At the moment the maximum number of workers is
 >> hardcoded ( to 255 ), we could make it dynamic but that would make things
 >> pretty complex when we add workers dynamically ( it won't work without a CS
 >> or atomic operations )
 >
 >
 > Hi Costin,
 >
 > I read your code thoroughly and found no problem in get_most_suitable_worker;
 > I think your approach to prioritizing workers is the best. What Bernd and I
 > had done was mainly driven by the need to have a frontal load balancer detect
 > the failure of the local worker(s). Since my last patch, and having read yours,
 > I think I have found a better way to make the load balancer detect failures.
 >
 > Configure all Apache instances so they see all Tomcat instances, and assign a
 > higher priority to local workers on each Apache, so that local workers are
 > chosen first. On each Apache, the load balancing worker is called lb.
 > Another load balancing worker, balancing only the local workers, is called
 > hwlb. The hardware load balancer checks the health of the Apache servers
 > using a URI served by hwlb instead of lb. Therefore, if no local workers
 > are left alive, the requests the hardware load balancer dispatches to the
 > associated Apache, before it can detect the local workers' failure, will be
 > rerouted to the other, non-local workers; the client will only lose its
 > session information and will not get any errors. When the hardware load
 > balancer eventually detects the local workers' failure (because the hwlb
 > worker rejected the request due to the lack of an available worker), it
 > will declare the Apache inactive and will only use the other ones.
 >
 > This setup solves my use cases at least, I don't know for Bernd's.

Sorry, I must say it again: for my environment it is an error if a _switched
off_ Tomcat gets a request without a session id or with a session from another 
node. It's not necessary that this Tomcat-Apache tandem is
shut down. We switch off a port on this node, and then the balancer won't send
a request to it. And then no mod_jk is allowed to send it a request without a
session for this node. It is normal that some nodes are _switched off_; we need
this for a graceful update. We switch off some nodes, wait till there are no 
active sessions (all timed out), and then we shut down Apache + Tomcat, make an 
update, and start them up again. If there are no local/main workers, I need an 
error response and no routing to a switched-off Tomcat. It's possible that this 
happens once per day.

I know this might be a special environment. I spent some time in jk1 to
build a working patch. Then I started looking into jk2. I'm not a good C
developer, so I needed some time to look into jk2. Now I think I understand
the internal structure. I don't want to send untested patches, or patches that
cause more problems than they solve. The last patch I sent for jk1 solved my
problem; I tested it here on a test cluster, and I hope it broke no prior
functionality. But it will take some time till I can send a patch for jk2. My 
boss gave me some deadlines for other projects, one of which is next Wednesday. 
I would be happy if jk2 made it possible to use local/main workers with sticky 
sessions (I need only one per node/mod_jk), and if, when all local/main workers 
are down, the request got an error response. I will do my best to install jk2 
on my test cluster and try to play around with it.

Maybe I misunderstood Mathias' suggestion for jk2; if so, delete this whole mail 
:). I hope I can send a patch for jk2 or look into the new code shortly.

Again, I think it's a very good idea to use ints for lb_value, set a maximum, and 
correct the value if one reaches this upper bound. And it's a good idea to make 
the local/main worker a more general thing. For a cluster environment it is a 
nice feature :).

Thanks

Bernd

 >
 > There remains a related problem in jk_requtil, in jk2_requtil_getCookieByName:
 > as I mentioned several months ago on the list, the cookie extraction does not
 > work for cookies whose format conforms to RFC 2109, that is, when the cookie
 > value is enclosed in double quotes. Such a cookie format is used by lynx, for
 > example. I had submitted a patch to the bug database but cannot find it
 > anymore; I'll have to look through my archives.
 >
 > Good job on the lb worker Costin,
 >
 > Mathias.







Re: cvs commit: jakarta-tomcat-connectors/jk/native2/common jk_worker_lb.c

Posted by co...@covalent.net.
Hi Mathias,

Thanks for the review.

Few comments:

> Configure all Apache instances so they see all Tomcat instances, assign
> a higher priority to local workers on each Apache, therefore local

What you set is the 'level' ( or proximity, distance, etc ) - lower 
numbers mean closer ( and higher priority, local worker ). 

I still need to add and verify the setters and check the various cases.

> workers, is called hwlb. The hardware load balancer checks the health of
> the Apache servers using a URI which is served by hwlb instead of lb,

You may have noticed the 'jk_status' worker - it displays runtime 
information ( still in progress - we need to aggregate the statistics 
from all workers using shm, but the status of the workers should be fine ).

It can be easily extended ( or a similar handler added ) so it 
can generate info that can be 'consumed' by a front load balancer
or other tools. Like number of active workers and average response times.


I am also investigating how we can use the number of active connections
on each worker and the response times in the main lb; any idea would
be welcome :-)


> This setup solves my use cases at least, I don't know for Bernd's.

Ok, but let me know if you find jk2 acceptable for your case and 
what the minimal change to jk1 is that we can make. I still have 
to merge your patches; I was waiting for more comments.

I don't think we can/should backport the new code, it's far too
much. 


> There remains a related problem in jk_requtil, in
> jk2_requtil_getCookieByName: as I mentioned several months ago on the
> list, the cookie extraction does not work for cookies whose format
> conforms to RFC 2109, that is, when the cookie value is enclosed in double
> quotes. Such a cookie format is used by lynx, for example. I had submitted
> a patch to the bug database but cannot find it anymore; I'll have to
> look through my archives.

Please do, and send it to the list ( with PATCH ).

I would appreciate 2 patches, one for jk1 and one for jk2 ( if the problem 
is in both ).

Costin

