
Posted to dev@tomcat.apache.org by Amund Elstad <Am...@ergo.no> on 2002/05/02 13:41:54 UTC

RE: PROPOSAL: mod_jk2: Group/Instance

Hi,

My understanding of the old lb-worker is that although you can tweak its
behavior using load-balancing factors, it does not theoretically cover:

(1) all requests without a session are routed to a specific tomcat instance
(if that instance is working). 
(2) Tomcat instances in standby or "soft shutdown" mode where they serve
requests bound by established sessions, and requests without a session only
if all non-standby instances have failed.

(1) is (as noted previously) useful when you have a cluster of Apache
servers.
(2) is useful in a 24x7 production environment, e.g. mark an instance for
"soft shutdown", wait for existing sessions to expire, do maintenance
(hardware or software), and then add the instance back to normal lb mode.

It would be great to have support for both in jk2, for example by using
percentage values for load-balancing factors (adding up to 100% for all
instances in a group). Then, for (1) adjust the lbfactor to 100%, and for
(2) adjust the lbfactor to 0%. 

BTW, I think both the Group/Instance and the Autoconfig proposals are very
good indeed.

cheers,
amund

costinm@covalent.net wrote:

>On Tue, 30 Apr 2002, Bernd Koecke wrote:
>
>> Some weeks ago I sent a patch for mod_jk for a routing-only lb_worker. A
>> few days later I sent the docs. Henry Gomez said that it should be
>> committed, but I think it isn't in the repository. But it's the same
>> with me here: too much work for too little time :).
>
>I think it is in mod_jk; I remember seeing the commit.
>
>And I think I committed it in jk2 as well ( after some modifications ).
>
>> I need sticky sessions but no load balancing in the module. If a request
>> without a session comes in, it should be routed to the _local_ tomcat.
>
>Well, there is another use-case with the exact same behavior - Apache2 
>with tomcat in JNI mode. All requests without session should be routed to 
>the _jni_ channel ( i.e. in-process, minimal overhead ).
>
>It's exactly the same - so be sure I'll do my best to handle this case :-)
>
>Apache2 acts like a 'natural' load-balancer/fail-over, with the parent
>process monitoring for crashes and starting/stopping children based on 
>load.
>
>
>> I think this could be possible with the associated instance of a channel
>> (item 7). Then I have to configure all four nodes for the same group,
>> because all nodes will serve the same webapps, and associate the channel
>> with this group. But for this I need a non-balancing group. I can't tell
>> whether the default behavior of a group is balancing and whether this can
>> be switched off. Is this right, or am I missing something?
>
>The default is balancing, but you can tune this using weights ( and I 
>think we use your code for making one instance 'top priority').
>
>Please check the code, take a look and send additional comments/patches.
>
>It's not yet completely done, of course.
>
>
>Thanks,
>Costin 

--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


[PATCH] jakarta-tomcat-connectors Re: PROPOSAL: mod_jk2: Group/Instance

Posted by Bernd Koecke <bk...@schlund.de>.

Hi Costin,

Now here is my patch. It is very small and it works, and we don't need
additional config flags. When the lb_value is read from the config file, it is
checked against zero, and a flag is set in the lb_worker so that the
get_max_lb function can decide whether this worker should be used. When you
set the lb_value of the main (local) worker to 1 and 0 for the others,
everything works fine. When you switch off the main worker, you will be routed
to the first of the other workers. That's not very balanced, but the load
balancer in front of the cluster shouldn't send requests to a node whose
Tomcat is shut down. It only matters for requests with sessions on this node,
and for the time between the shutdown of Tomcat and the balancer noticing it.

When Tomcat is up again, it takes a little time (at most the value of
WAIT_BEFORE_RECOVER) until the worker is chosen again, and because of the flag
it won't get _inf_ as its lb_value.

The patch was created with:
cvs diff -u jk_lb_worker.c

Bernd

Bernd Koecke wrote:
> Hi Costin,
> 
> Maybe I checked out the wrong repository. I checked out 
> jakarta-tomcat-connectors with 
> CVSROOT=:pserver:anoncvs@cvs.apache.org:/home/cvspublic
> 
> Now to the details, see below.
> 
> costinm@covalent.net wrote:
> 
>> On Thu, 2 May 2002, Bernd Koecke wrote:
>>
>>
>>> misunderstood it. After you said that my patch is included, I had a 
>>> closer look at mod_jk. I can't see anything of my code, but I found 
>>> the special meaning of the zero lb_factor/lb_value. It seems that I 
>>> didn't understand it right the first time. This could solve my 
>>> problem, but after a closer look and some testing I found another 
>>> problem. When you set the lb_value in workers.properties to 1 for the 
>>> local tomcat and 0 for the others, you get the desired behavior. But 
>>> if you switch off the local tomcat for a short time, you run into 
>>> trouble. The problem is the 0 for the other workers. The calculation 
>>> in lb_worker transforms the 0 into _inf_, because 1/0 for a double is 
>>> _inf_. This is greater than any 
>>
>>
>>
>> I think there is a piece that checks for 0 and sets it to 
>> DEFAULT_VALUE (==1 ) before doing 1/lb. 
> 
> 
> No, I think not :). I checked it yesterday. With some additional log 
> statements in the validate function of jk_lb_worker.c you get the value 
> _inf_ for the lb_factor and lb_value (lines 434-444). If it were set to 
> 1, my config wouldn't have worked, because I set the local worker to 1 
> and the others to 0.
> 
>>
>> While looking at the code - I'm not very sure all this float math is needed;
>> I'll try to find a way to simplify it and use ints ( maybe 0..100 with 
>> some 'special' values for NEVER and ALWAYS, or some additional flags ).
>>
> 
> This is possible, but then you must add a check for a value of 0, 
> because without it you compute 1/0 with an int, and this will give you 
> an error.
> 
>> But the way it works ( or at least how I understand it ) is that if 
>> the main worker fails, then we look at all workers in error state and 
>> try the one with the oldest error. And the 'main' worker will be tried 
>> again when the timeout expires.
>>
> 
> That's not the whole story. It's right that you will check the main 
> worker when it's back again, but it is used only once, because when the 
> request was handled successfully rec->in_recovering is true (line 332 of 
> jk_lb_worker.c, service function). Then get_max_lb gets the value _inf_ 
> from one of the other workers. Then the things happen that I described 
> in my prior mail.
> 
>> I haven't tested this too much, I just applied the patches ( that I 
>> understand :-), I'll add some more debugging for this process and 
>> maybe we can find a better solution.
>>
>> But this functionality is essential for the JNI worker and very important
>> in general - so I really want to find the best solution. If you have any
>> patch idea, let me know.
>>
>> To avoid further confusion and complexity in the lb-factor/value, I 
>> think we should add one more flag ( 'local_worker' ? ) and use it 
>> explicitly. Again, patches are welcome - it's always good to have 
>> different ( and more ) eyes looking at the code.
> 
> 
> That is what I did in the patch I sent; the additional documentation 
> was sent a few days later. But my additions to the lb_worker were a 
> little bit too complex. You are right, we should get it working if we
> use the flag only on the main worker and change the behavior after a
> failure of this worker. But we need the trick with 0/inf for the other
> workers, because only with it do we get the situation that the other
> workers aren't asked when there is no session and the main worker is up.
> 
> I will try to build another patch and send it. I think it could be 
> possible without an additional flag.
> 
> Another thought about this:
> When you use doubles and we fix the handling after an error, the main 
> worker would never reach _inf_, because the lb_factor is < 1 if lb_value 
> wasn't 0. After choosing the worker, this value is added to the lb_value. 
> But with a high lb_value, the difference between two representable 
> doubles is greater than the lb_factor. But this is only interesting in 
> theory. I think in the real world we will restart apache before this 
> happens :).
> 
> 
> Bernd
> 
>> ( that can go in jk1 as well, but I can't see a release of jk2 without 
>> this functionality )
>>
>> Costin
>>
>>
>>
>>> other lb_value and greater than the lb_value of the local tomcat. But 
>>> after a failure of the local tomcat, it is in error_state. After some 
>>> time it's set to recovering, and if the local tomcat is back again the 
>>> function jk(2)_get_max_lb gets the highest lb_value. This is _inf_ 
>>> from one of the other workers. Adding a value to _inf_ is 
>>> meaningless: you end up with an lb_value of _inf_ for the local 
>>> worker. If this worker isn't the first in the worker list, it will 
>>> never be chosen again, because its lb_value will never be less than 
>>> another lb_value, since all the other workers have _inf_ as their 
>>> lb_values. So every request without a session will be routed to the 
>>> first of the other tomcats.
>>>
>>> The only way out is a restart of the local apache after tomcat is up 
>>> and running. But I don't know when tomcat has finished loading all its 
>>> contexts and started the connectors.
>>>
>>> I didn't look very deeply into jk2, but I found the same 
>>> get_most_suitable_worker and get_max_lb functions. The jk2_get_max_lb 
>>> function will always return _inf_. In your answer to some other mails 
>>> you said that workers could be removed. Do I understand it right that 
>>> if my local tomcat goes down, its worker is removed from the list and, 
>>> when it comes up again, added back to the worker list with a reset 
>>> lb_value (only for mod_jk2)?
>>>
>>> In the next few days I will look at the docs and code of jk2 and give 
>>> it a try. Maybe all my problems go away with the new module :).
>>>
>>> Sorry if I ask stupid questions, but I want to make it work for 
>>> our new cluster.
>>>
>>> Thanks
>>>
>>> Bernd
>>>
>>>
>>>> This is essential for jk2's JNI worker, which fits this case perfectly
>>>> ( you don't want to send via TCP when you have a tomcat instance in 
>>>> the same process ).
>>>>
>>>>
>>>>
>>>>
>>>>> (2) Tomcat instances in standby or "soft shutdown" mode where they
>>>>> serve requests bound by established sessions, and requests without a
>>>>> session only if all non-standby instances have failed.
>>>>
>>>>
>>>>
>>>> That's what the SHM scoreboard is going to do ( among other things ).
>>>> You can register tomcat instances ( which will be added automatically ),
>>>> or unregister them - in which case no new requests ( except those for
>>>> old sessions ) will go to the unregistered tomcat.
>>>>
>>>>
>>>> Costin
>>>>
>>>>
>>>>


[...]



-- 
Dipl.-Inform. Bernd Koecke
UNIX-Entwicklung
Schlund+Partner AG
Fon: +49-721-91374-0
E-Mail: bk@schlund.de

Re: [PATCH] added handling of a main worker in jk_lb_worker, Re: PROPOSAL: mod_jk2: Group/Instance

Posted by co...@covalent.net.

Done. 

BTW, you may have an older version - you should update from head
and then test.

Thanks.

Costin

On Fri, 3 May 2002, Bernd Koecke wrote:

> Hi Costin,
> 
> It wasn't difficult, so here is the new patch. The new (old) behavior is:
> the main worker is defined by an lb_value of 0, which is never changed in 
> jk_lb_worker. The other workers can get a value greater than 0. If the value 
> from the config file is less than 0, it is multiplied by -1.
> 
> You are right, this is a better solution. We can switch from doubles to ints,
> and we get the other workers balanced if the main worker is down.
> 
> Bernd
> 
> [...]



[PATCH] added handling of a main worker in jk_lb_worker, Re: PROPOSAL: mod_jk2: Group/Instance

Posted by Bernd Koecke <bk...@schlund.de>.

Hi Costin,

It wasn't difficult, so here is the new patch. The new (old) behavior is:
the main worker is defined by an lb_value of 0, which is never changed in 
jk_lb_worker. The other workers can get a value greater than 0. If the value 
from the config file is less than 0, it is multiplied by -1.

You are right, this is a better solution. We can switch from doubles to ints,
and we get the other workers balanced if the main worker is down.

Bernd

Bernd Koecke wrote:
> [...]




Re: PROPOSAL: mod_jk2: Group/Instance

Posted by Bernd Koecke <bk...@schlund.de>.

Hi Costin,

costinm@covalent.net wrote:
> Hi Bernd,
> 
> First, many thanks for your help :-)
> 
> 

You're welcome, it's a lot of fun :)

>>No, I think not :). I checked it yesterday. With some additional log statements 
>>in the validate function of jk_lb_worker.c you get the value _inf_ for the 
>>lb_factor and lb_value (lines 434-444). If it were set to 1, my config 
>>wouldn't have worked, because I set the local worker to 1 and the others to 0.
> 
> 
> I'll check again, and fix it if necessary.
> 
> I wrote some code in jk2 that seems to solve the problem, and I can
> backport this to jk1 if it is correct.
> 
> Probably this is my mistake - I remember the discussion and the patch
> that was sent for this problem, and most likely I did something
> wrong committing it ( i.e. I made a few changes trying to simplify it, and it
> seems I 'simplified' too much ). But my memory still has the patch's logic
> which seemed fine :-)
> 
> 
>>This is possible, but then you must add a check for a value of 0, because 
>>without it you compute 1/0 with an int, and this will give you an error.
> 
> 
> Yes, of course. 0 will continue to mean 'default worker'.
>

see below

> I'm not very comfortable with float calculations in the critical
> path ( and in an area that is executed concurrently !). The only problem
> is what happens on overflows - the lb_value may become 0 ( or a small 
> value ) and then the worker will take all the load. 
> 
> 
> 
>>That's not the whole story. It's right that you will check the main worker when 
>>it's back again, but it is used only once, because when the request was handled 
>>successfully rec->in_recovering is true (line 332 of jk_lb_worker.c, service 
>>function). Then get_max_lb gets the value _inf_ from one of the other workers. 
>>Then the things happen that I described in my prior mail.
> 
> 
>>That is what I did in the patch I sent; the additional documentation was sent a 
>>few days later. But my additions to the lb_worker were a little bit too complex. 
>>You are right, we should get it working if we use the flag only on the main worker 
>>and change the behavior after a failure of this worker. But we need the trick with 
>>0/inf for the other workers, because only with it do we get the situation that 
>>the other workers aren't asked when there is no session and the main worker 
>>is up.
> 
> 
> 
> Ok, can you send the patch again :-) ? 
> 
> For going back to the main worker - if we leave it at lb_value=0 all the
> time ( i.e. we don't alter it at any time ), and only in_error_state 
> is set on failure - then I believe the thing will work fine.
> 
>

That's the inverse of the current situation. My patch from a few hours earlier 
today depends on the other workers getting an lb_value of 0 in the config 
file. This is converted to _inf_, and the main worker gets 1, which will be 
the minimal lb_value of the balanced workers. If we want to be able to switch 
to ints, I could send a new patch that handles 0 as a special value for the 
main worker.

Should I?

Bernd


> 
>>I will try to build another patch and send it. I think it could be possible 
>>without an additional flag.
> 
> 
> Great !
> 
> 
> 
>>Another thought about this:
>>When you use doubles and we fix the handling after an error, the main worker 
>>would never reach _inf_, because the lb_factor is < 1 if lb_value wasn't 0. 
>>After choosing the worker, this value is added to the lb_value. But with a high 
>>lb_value, the difference between two representable doubles is greater 
>>than the lb_factor. But this is only interesting in theory. I think in the real 
>>world we will restart apache before this happens :).
> 
>  
> That may become a problem if we use ints.
> 
> Costin
> 
> 
> 
[...]



Re: PROPOSAL: mod_jk2: Group/Instance

Posted by co...@covalent.net.

Hi Bernd,

First, many thanks for your help :-)

> No, I think not :). I checked it yesterday. With some additional log statements 
> in the validate function of jk_lb_worker.c you get the value _inf_ for the 
> lb_factor and lb_value (lines 434-444). If it were set to 1, my config 
> wouldn't have worked, because I set the local worker to 1 and the others to 0.

I'll check again, and fix it if necessary.

I wrote some code in jk2 that seems to solve the problem, and I can
backport this to jk1 if it is correct.

Probably this is my mistake - I remember the discussion and the patch
that was sent for this problem, and most likely I did something
wrong committing it ( i.e. I made a few changes trying to simplify it, and it
seems I 'simplified' too much ). But my memory still has the patch's logic
which seemed fine :-)

> This is possible, but then you must add a check for a value of 0, because 
> without it you compute 1/0 with an int, and this will give you an error.

Yes, of course. 0 will continue to mean 'default worker'.

I'm not very comfortable with float calculations in the critical
path ( and in an area that is executed concurrently !). The only problem
is what happens on overflows - the lb_value may become 0 ( or a small 
value ) and then the worker will take all the load. 


> That's not the whole story. It's right that you will check the main worker when 
> it's back again, but it is used only once, because when the request was handled 
> successfully rec->in_recovering is true (line 332 of jk_lb_worker.c, service 
> function). Then get_max_lb gets the value _inf_ from one of the other workers. 
> Then the things happen that I described in my prior mail.

> That is what I did in the patch I sent; the additional documentation was sent a 
> few days later. But my additions to the lb_worker were a little bit too complex. 
> You are right, we should get it working if we use the flag only on the main worker 
> and change the behavior after a failure of this worker. But we need the trick with 
> 0/inf for the other workers, because only with it do we get the situation that 
> the other workers aren't asked when there is no session and the main worker 
> is up.


Ok, can you send the patch again :-) ? 

For going back to the main worker - if we leave it at lb_value=0 all the
time ( i.e. we don't alter it at any time ), and only in_error_state 
is set on failure - then I believe the thing will work fine.


> I will try to build another patch and send it. I think it could be possible 
> without an additional flag.

Great !


> Another thought about this:
> When you use doubles and we fix the handling after an error, the main worker 
> would never reach _inf_, because the lb_factor is < 1 if lb_value wasn't 0. 
> After choosing the worker, this value is added to the lb_value. But with a high 
> lb_value, the difference between two representable doubles is greater 
> than the lb_factor. But this is only interesting in theory. I think in the real 
> world we will restart apache before this happens :).
 
That may become a problem if we use ints.

Costin


> 
> Bernd
> 
> > ( that can go in both jk1, but I can't see a release of jk2 without this 
> > functionality )
> > 
> > Costin
> > 
> > 
> > 
> >>other lb_value and greater than the lb_value of the local tomcat. But after a 
> >>failure of the local tomcat he is in error_state. After some time its set to 
> >>recovering and if the local tomcat is back again the function jk(2)_get_max_lb 
> >>gets the highest lb_value. This is _inf_ from one of the other workers. The 
> >>addition of a value to _inf_ is meaningless. You end up with an lb_value of 
> >>_inf_ for the local worker. If this worker isn't the first in the worker list, 
> >>it will never be choosen again. Because his lb_value will never be less than 
> >>another lb_value, because all the other workers have _inf_ as theire lb_values. 
> >>So every request without a session will be routed to the first of the other 
> >>tomcats.
> >>
> >>The only way out is a restart of the local apache after tomcat is up and 
> >>running. But I don't know when tomcat is finished with all his contexts and 
> >>started the connectors.
> >>
> >>I didn't looked very deep into jk2, but I found the same 
> >>get_most_suitable_worker and get_max_lb functions. The jk2_get_max_lb function 
> >>will always return _inf_. In your answer to some other mails you said, that 
> >>workers could be removed. Do I understand it right, that if my local tomcat goes 
> >>down his worker is removed from the list and after he is comming up again added 
> >>to the worker list with reseted lb_value (only for mod_jk2)?
> >>
> >>The next days I will look in the docu and code of jk2 and give it a try. May be 
> >>all my problems gone away with the new module :).
> >>
> >>Sorry if I ask stupid questions, but I want to make it working for our new cluster.
> >>
> >>Thanks
> >>
> >>Bernd
> >>

Re: PROPOSAL: mod_jk2: Group/Instance

Posted by Bernd Koecke <bk...@schlund.de>.

Hi Costin,

Maybe I checked out the wrong repository. I checked out 
jakarta-tomcat-connectors with 
CVSROOT=:pserver:anoncvs@cvs.apache.org:/home/cvspublic

Now to the details, see below.

costinm@covalent.net wrote:
> On Thu, 2 May 2002, Bernd Koecke wrote:
> 
> 
>>misunderstood it. After you said that my patch is included a had a closer look 
>>at mod_jk. I can't see anything of my code but I found the special meaning of 
>>the zero lb_factor/lb_value. It seems that I didn't understand it right at the 
>>first time. This could solve my problem but after a closer look and some testing 
>>I found another problem. When you set the lb_value in workers.properties to 1 
>>for the local tomcat and 0 for the others, you get the desired behavior. But if 
>>you switch off the local tomcat for a short time you come into trouble. The 
>>problem is the 0 for the other workers. The calculation of lb_worker transforms 
>>the 0 to _inf_. Because 1/0 for a double is _inf_. This is greater than any 
> 
> 
> I think there is a piece that checks for 0 and sets it to DEFAULT_VALUE 
> (==1 ) before doing 1/lb. 

No, I don't think so :). I checked it yesterday. With some additional log statements 
in the validate function of jk_lb_worker.c you get the value _inf_ for the 
lb_factor and lb_value (lines 434-444). If the value were set to 1, my config 
would not have worked, because I set the local worker to 1 and the others to 0.
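A minimal C sketch of the arithmetic described here, assuming IEEE 754 doubles (C Annex F) and a simplified model of the 1/lbfactor step in validate(). The function names are hypothetical, not the real jk_lb_worker.c code. The guarded variant models the zero check Costin remembered and shows why, had it existed, configuring 1 and 0 would have produced identical factors.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical model of the validate() step: the configured lbfactor is
 * inverted into a per-request increment. There is no zero check, matching
 * what the extra log statements showed. Assumes IEEE 754 doubles (C Annex F),
 * where 1.0 / 0.0 yields +inf instead of trapping. */
double model_invert_lb_factor(double configured)
{
    assert(configured >= 0.0);
    return 1.0 / configured;
}

/* The same step with the guard Costin remembered (0 -> default of 1).
 * With it, "1 for local, 0 for the others" would give every worker the
 * same factor, so the local-first routing could not have worked. */
double model_invert_lb_factor_guarded(double configured)
{
    const double default_factor = 1.0;
    assert(configured >= 0.0);
    if (configured == 0.0)
        configured = default_factor;
    return 1.0 / configured;
}
```

Under these assumptions model_invert_lb_factor(0.0) is +inf, and adding any finite value to it still yields +inf.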

> 
> While looking at the code - I'm not very sure this whole float is needed,
> I'll try to find a way to simplify it and use ints ( maybe 0..100 with 
> some 'special' values for NEVER and ALLWAYS, or some additional flags ).
> 

This is possible, but then you must add a check for 0, because without it 
you compute 1/0 with an int, and that gives you an error.
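A hypothetical sketch of the integer case: in C, integer division by zero is undefined behaviour (commonly a SIGFPE trap), so the zero must be caught before dividing. The scale constant and the -1 return convention are illustrative assumptions, not part of mod_jk.

```c
#include <assert.h>

/* Assumed fixed-point scale so that 1/lbfactor does not truncate to 0 for
 * lbfactor > 1; illustrative only. */
#define MODEL_FACTOR_SCALE 100

/* Hypothetical helper: per-request increment for an integer lbfactor.
 * Integer division by zero is undefined behaviour in C, so a 0 is reported
 * as -1 and the caller has to handle it as a special case. */
int model_int_increment(int lbfactor)
{
    assert(lbfactor >= 0);
    if (lbfactor == 0)
        return -1;
    return MODEL_FACTOR_SCALE / lbfactor;
}
```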

> But the way it works ( or at least how I understand it ) is that if the 
> main worker fails, then we look at all workers in error state and try the 
> one with the oldest error. And the 'main' worker will be tried again when 
> the timeout expires.
> 

That's not the whole story. It's true that the main worker is checked again when it's 
back, but it is used only once: when that request is handled successfully, 
rec->in_recovering is true (line 332 of jk_lb_worker.c, service function), and 
get_max_lb then gets the value _inf_ from one of the other workers. After that, 
what I described in my previous mail happens.
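To make the recovery problem concrete, here is a simplified C model of the selection loop and the recovery step as described in this thread. It is not the actual mod_jk source: all names are hypothetical, and recovery is assumed to set the worker's lb_value to the current maximum plus its own factor.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Simplified, hypothetical model of the lb_worker state discussed here;
 * the names only loosely echo jk_lb_worker.c. */
typedef struct {
    double lb_factor; /* 1 / configured value; +inf when the config said 0 */
    double lb_value;  /* running load counter, starts at lb_factor */
    int in_error;     /* 1 while the worker is marked as failed */
} model_worker;

/* Model of get_max_lb: the highest lb_value over all workers. */
double model_get_max_lb(const model_worker *w, size_t n)
{
    double max = -INFINITY;
    for (size_t i = 0; i < n; i++)
        if (w[i].lb_value > max)
            max = w[i].lb_value;
    return max;
}

/* Model of the choice for a request without a session: the non-error
 * worker with the smallest lb_value wins, ties go to the earlier entry,
 * and the winner's lb_value grows by its lb_factor. */
size_t model_pick(model_worker *w, size_t n)
{
    size_t best = n;
    for (size_t i = 0; i < n; i++) {
        if (w[i].in_error)
            continue;
        if (best == n || w[i].lb_value < w[best].lb_value)
            best = i;
    }
    assert(best < n); /* at least one worker must be usable */
    w[best].lb_value += w[best].lb_factor;
    return best;
}

/* Assumed recovery step: the worker re-enters at the current maximum plus
 * its own factor. If another worker holds +inf, the result is +inf too. */
void model_recover(model_worker *w, size_t n, size_t idx)
{
    w[idx].in_error = 0;
    w[idx].lb_value = model_get_max_lb(w, n) + w[idx].lb_factor;
}
```

With the list {backup, local, backup} configured as 0, 1, 0, the local worker is picked first. After it fails and recovers, its lb_value is +inf, every comparison ties, and each session-less request goes to index 0.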

> I haven't tested this too much, I just applied the patches ( that I 
> understand :-), I'll add some more debugging for this process and maybe 
> we can find a better solution.
> 
> But this functionality is essential for the JNI worker and very important
> in general - so I really want to find the best solution. If you have any
> patch idea, let me know.
> 
> To avoid further confusion and complexity in the lb-factor/value, I 
> think we should add one more flag ( 'local_worker' ? ) and use it 
> explicitely. Again, patches are wellcome - it's allways good to have 
>  different ( and more ) eyes looking at the code. 
> 

That is what my patch did; the additional documentation was sent a 
few days later. But my additions to the lb_worker were a little too complex. 
You are right, we should get this behavior by using the flag only on the main 
worker and changing the behavior after a failure of this worker. But we still 
need the 0/_inf_ trick for the other workers, because only with it do we get the 
situation where the other workers aren't asked when there is no session and the 
main worker is up.

I will try to build another patch and send it. I think it could be possible 
without an additional flag.

Another thought about this:
If we keep using doubles and fix the handling after an error, the main worker 
would never reach _inf_, because its lb_factor is < 1 if the lb_value wasn't 0. 
After the worker is chosen, this value is added to its lb_value. But with a high 
enough lb_value, the difference between two adjacent representable doubles is 
greater than the lb_factor, so the addition no longer changes anything. This is 
only interesting in theory. I think in the real world we will restart Apache 
before this happens :).
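This last point can be checked in isolation. The helper below is a hypothetical illustration, not mod_jk code, and assumes IEEE 754 binary64 doubles: near 2^60 adjacent doubles are 256 apart, so adding an increment below 1 leaves the value unchanged. The volatile store forces rounding to double even on FPUs that evaluate in extended precision.

```c
#include <assert.h>

/* Returns 1 if adding `increment` to `lb_value` leaves it unchanged once
 * the sum is rounded to double. Illustrative helper, not mod_jk code. */
int model_increment_is_absorbed(double lb_value, double increment)
{
    assert(increment > 0.0);
    volatile double sum = lb_value + increment; /* force rounding to double */
    return sum == lb_value;
}
```

For example, 2^60 (1152921504606846976.0) plus 0.5 is absorbed, while 1000.0 plus 0.5 is not.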


Bernd

> ( that can go in both jk1, but I can't see a release of jk2 without this 
> functionality )
> 
> Costin
> 
> 
> 
>>other lb_value and greater than the lb_value of the local tomcat. But after a 
>>failure of the local tomcat he is in error_state. After some time its set to 
>>recovering and if the local tomcat is back again the function jk(2)_get_max_lb 
>>gets the highest lb_value. This is _inf_ from one of the other workers. The 
>>addition of a value to _inf_ is meaningless. You end up with an lb_value of 
>>_inf_ for the local worker. If this worker isn't the first in the worker list, 
>>it will never be choosen again. Because his lb_value will never be less than 
>>another lb_value, because all the other workers have _inf_ as theire lb_values. 
>>So every request without a session will be routed to the first of the other 
>>tomcats.
>>
>>The only way out is a restart of the local apache after tomcat is up and 
>>running. But I don't know when tomcat is finished with all his contexts and 
>>started the connectors.
>>
>>I didn't looked very deep into jk2, but I found the same 
>>get_most_suitable_worker and get_max_lb functions. The jk2_get_max_lb function 
>>will always return _inf_. In your answer to some other mails you said, that 
>>workers could be removed. Do I understand it right, that if my local tomcat goes 
>>down his worker is removed from the list and after he is comming up again added 
>>to the worker list with reseted lb_value (only for mod_jk2)?
>>
>>The next days I will look in the docu and code of jk2 and give it a try. May be 
>>all my problems gone away with the new module :).
>>
>>Sorry if I ask stupid questions, but I want to make it working for our new cluster.
>>
>>Thanks
>>
>>Bernd



-- 
Dipl.-Inform. Bernd Koecke
UNIX Development
Schlund+Partner AG
Phone: +49-721-91374-0
E-Mail: bk@schlund.de



AW: PROPOSAL: mod_jk2: Group/Instance

Posted by Hans Schmid <Ha...@einsurance.de>.


> -----Original Message-----
> From: costinm@covalent.net [mailto:costinm@covalent.net]
> Sent: Thursday, May 2, 2002 19:35
> To: Tomcat Developers List
> Subject: Re: PROPOSAL: mod_jk2: Group/Instance
>
>
> On Thu, 2 May 2002, Bernd Koecke wrote:
>
> > misunderstood it. After you said that my patch is included a had a closer look
> > at mod_jk. I can't see anything of my code but I found the special meaning of
> > the zero lb_factor/lb_value. It seems that I didn't understand it right at the
> > first time. This could solve my problem but after a closer look and some testing
> > I found another problem. When you set the lb_value in workers.properties to 1
> > for the local tomcat and 0 for the others, you get the desired behavior. But if
> > you switch off the local tomcat for a short time you come into trouble. The
> > problem is the 0 for the other workers. The calculation of lb_worker transforms
> > the 0 to _inf_. Because 1/0 for a double is _inf_. This is greater than any


That's why we use values like 0.000001 in this situation in mod_jk1 and live
with the few lost sessions.
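A simplified C model of why the tiny-factor workaround avoids the lock-out. It is not mod_jk code: the names are hypothetical, and the inverted increment of 0.000001 is written directly as 1e6. Because the backups' lb_value stays finite, a recovered local worker re-entering at the maximum falls behind again after a few requests and then takes almost all session-less traffic.

```c
#include <assert.h>
#include <stddef.h>

#define N_WORKERS 3

/* Hypothetical worker record: inverted factor plus running lb_value. */
typedef struct {
    double inc;   /* 1e6 stands in for 1 / 0.000001 on backups; 1.0 locally */
    double value; /* running lb_value */
} tiny_worker;

/* Smallest lb_value wins (earliest entry on ties); winner grows by inc. */
static size_t tiny_pick(tiny_worker *w)
{
    size_t best = 0;
    for (size_t i = 1; i < N_WORKERS; i++)
        if (w[i].value < w[best].value)
            best = i;
    assert(w[best].inc > 0.0);
    w[best].value += w[best].inc;
    return best;
}

/* Runs `requests` selections and counts how many went to worker `idx`. */
size_t tiny_count(tiny_worker *w, size_t requests, size_t idx)
{
    size_t hits = 0;
    for (size_t r = 0; r < requests; r++)
        if (tiny_pick(w) == idx)
            hits++;
    return hits;
}
```

Starting from backups at lb_value 2e6 and 1e6 and the local worker re-entering at 2e6 + 1, the next 1000 requests split 3 to the backups and 997 to the local worker: a small leak to the backups in exchange for never locking the local worker out.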

Somewhere in the tomcat-dev archives I once saw a patch introducing an
'active' flag for the lbfactor, but I never had the time to apply this patch
and try it out.

It was something like

worker.ajp13-01.lbfactor=1
worker.ajp13-01.active=0

in the workers.properties

(we already discussed this one in 2001 :)
http://marc.theaimsgroup.com/?l=tomcat-dev&m=100719342027584&w=4

>
> I think there is a piece that checks for 0 and sets it to DEFAULT_VALUE
> (==1 ) before doing 1/lb.
>
> While looking at the code - I'm not very sure this whole float is needed,
> I'll try to find a way to simplify it and use ints ( maybe 0..100 with
> some 'special' values for NEVER and ALLWAYS, or some additional flags ).
>
> But the way it works ( or at least how I understand it ) is that if the
> main worker fails, then we look at all workers in error state and try the
> one with the oldest error. And the 'main' worker will be tried again when
> the timeout expires.
>
>
> I haven't tested this too much, I just applied the patches ( that I
> understand :-), I'll add some more debugging for this process and maybe
> we can find a better solution.
>
> But this functionality is essential for the JNI worker and very important
> in general - so I really want to find the best solution. If you have any
> patch idea, let me know.
>
> To avoid further confusion and complexity in the lb-factor/value, I
> think we should add one more flag ( 'local_worker' ? ) and use it
> explicitely. Again, patches are wellcome - it's allways good to have
>  different ( and more ) eyes looking at the code.
>
> ( that can go in both jk1, but I can't see a release of jk2 without this
> functionality )
>
> Costin
>
>
> > other lb_value and greater than the lb_value of the local tomcat. But after a
> > failure of the local tomcat he is in error_state. After some time its set to
> > recovering and if the local tomcat is back again the function jk(2)_get_max_lb
> > gets the highest lb_value. This is _inf_ from one of the other workers. The
> > addition of a value to _inf_ is meaningless. You end up with an lb_value of
> > _inf_ for the local worker. If this worker isn't the first in the worker list,
> > it will never be choosen again. Because his lb_value will never be less than
> > another lb_value, because all the other workers have _inf_ as theire lb_values.
> > So every request without a session will be routed to the first of the other
> > tomcats.
> >
> > The only way out is a restart of the local apache after tomcat is up and
> > running. But I don't know when tomcat is finished with all his contexts and
> > started the connectors.
> >
> > I didn't looked very deep into jk2, but I found the same
> > get_most_suitable_worker and get_max_lb functions. The jk2_get_max_lb function
> > will always return _inf_. In your answer to some other mails you said, that
> > workers could be removed. Do I understand it right, that if my local tomcat goes
> > down his worker is removed from the list and after he is comming up again added
> > to the worker list with reseted lb_value (only for mod_jk2)?
> >
> > The next days I will look in the docu and code of jk2 and give it a try. May be
> > all my problems gone away with the new module :).
> >
> > Sorry if I ask stupid questions, but I want to make it working for our new cluster.
> >
> > Thanks
> >
> > Bernd

Re: PROPOSAL: mod_jk2: new lb values

Posted by Bernd Koecke <bk...@schlund.de>.

costinm@covalent.net wrote:
> Based on the previous discussion:
> 
> - change lbfactor from float to int ( maybe rename it to avoid confusion)
> 

It would be less magic, but you would have to check whether the value is 0.

> - The value will be from 1 to MAX ( 100 ? ).
> 
> - Smaller values will mean more work. The value '0' ( or a special 
> flag ? ) will mean the worker will be used allways ( as long as it is not 
> in_error state ). We can make sure the '0' is the first in the list, 
> and avoid looking for others.
> 
> -  A factor 2 will take 2 times fewer requests than factor 1, 3 will be 
> 1/3, etc. ( each worker uses a counter, and the counter is incremented on 
> each request with the factor value - that's how it works today to 
> implement the round roubin ).
> 
> -  When a worker reaches MAX, all workers will be reset to their
> original values and error state reset. ( that means we'll reset the
> error state based on number of requests, not time ) ( is this a good idea ?) 
> 
> - A value of MAX ( Or flag ? ) will mean the worker will take no 
> request, except those with a previous session id. That's the gracefull
> shutdown.
> 
> In addition, I'm in process of moving the lb properties to channel, 
> since that's what the user should configure in jk2. 
> 
> Costin

I think this is a much better approach than the magic zero lb_value :). I could 
check it for jk1, and I hope I get time to look deeper into jk2 to test it there too.

Bernd
-- 
Dipl.-Inform. Bernd Koecke
UNIX Development
Schlund+Partner AG
Phone: +49-721-91374-0
E-Mail: bk@schlund.de



AW: PROPOSAL: mod_jk2: new lb values

Posted by Hans Schmid <Ha...@einsurance.de>.

> -----Original Message-----
> From: costinm@covalent.net [mailto:costinm@covalent.net]
> Sent: Thursday, May 2, 2002 20:15
> To: Tomcat Developers List
> Subject: PROPOSAL: mod_jk2: new lb values
>
>
> Based on the previous discussion:
>
> - change lbfactor from float to int ( maybe rename it to avoid confusion)

Int is definitely the way to go for lbfactor.

>
> - The value will be from 1 to MAX ( 100 ? ).
>
> - Smaller values will mean more work. The value '0' ( or a special
> flag ? ) will mean the worker will be used allways ( as long as it is not
> in_error state ). We can make sure the '0' is the first in the list,
> and avoid looking for others.

OK, I would do it the other way around (greater value = more work),
but this is a matter of taste.

>
> -  A factor 2 will take 2 times fewer requests than factor 1, 3 will be
> 1/3, etc. ( each worker uses a counter, and the counter is incremented on
> each request with the factor value - that's how it works today to
> implement the round roubin ).
>

OK

> -  When a worker reaches MAX, all workers will be reset to their
> original values and error state reset. ( that means we'll reset the
> error state based on number of requests, not time ) ( is this a
> good idea ?)
>
> - A value of MAX ( Or flag ? ) will mean the worker will take no
> request, except those with a previous session id. That's the gracefull
> shutdown.

I would separate the active/inactive state (the manually set inactivity, not
the one resulting from error states) from the lbfactor value
(see my other reply about the 'active' flag).

So only apply the round-robin logic to channels with active=1.

Note: this active flag would be configured at runtime.
-> Marking one Tomcat for graceful shutdown means setting active=0
-> Starting up Tomcat would set active=1 and use the assigned lbfactor

This way you know what to do with a worker that has reached MAX,
either
- its counter has to be reset if active = 1
or
- it must stay at MAX, because it was deactivated (set to active=0)
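The 'active' flag idea can be sketched in C as follows, assuming integer counters where a higher counter means more work already done, a ceiling of 100, and counters reset to 0. hans_worker, hans_pick and MODEL_LB_MAX are hypothetical names, not jk2 APIs. Requests that already carry a session would still be routed to their own worker; that path is not modelled here.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_LB_MAX 100 /* assumed counter ceiling, as in the MAX proposal */

typedef struct {
    int lbfactor; /* counter increment per request, >= 1 */
    int counter;  /* running counter */
    int active;   /* runtime flag: 1 = normal, 0 = graceful shutdown */
} hans_worker;

/* When an active worker hits MAX, reset the counters of active workers to 0
 * and park deactivated ones at MAX, as suggested above. */
static void hans_reset_if_needed(hans_worker *w, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (w[i].active && w[i].counter >= MODEL_LB_MAX) {
            for (size_t j = 0; j < n; j++)
                w[j].counter = w[j].active ? 0 : MODEL_LB_MAX;
            return;
        }
    }
}

/* Round robin over active workers only, for requests without a session.
 * Returns the chosen index, or -1 if every worker is deactivated. */
int hans_pick(hans_worker *w, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (!w[i].active)
            continue;
        if (best < 0 || w[i].counter < w[best].counter)
            best = (int)i;
    }
    if (best < 0)
        return -1;
    assert(w[best].lbfactor >= 1);
    w[best].counter += w[best].lbfactor;
    hans_reset_if_needed(w, n);
    return best;
}
```

Keeping the flag separate means a deactivated worker never has to be told apart from a busy one by its counter value alone.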



Hans

>
> In addition, I'm in process of moving the lb properties to channel,
> since that's what the user should configure in jk2.
>
> Costin
>

PROPOSAL: mod_jk2: new lb values

Posted by co...@covalent.net.

Based on the previous discussion:

- change lbfactor from float to int ( maybe rename it to avoid confusion)

- The value will be from 1 to MAX ( 100 ? ).

- Smaller values will mean more work. The value '0' ( or a special 
flag ? ) will mean the worker will always be used ( as long as it is not 
in_error state ). We can make sure the '0' is the first in the list, 
and avoid looking for others.

-  A factor of 2 will take half as many requests as factor 1, 3 will take 
1/3, etc. ( each worker uses a counter, and the counter is incremented on 
each request by the factor value - that's how round robin is 
implemented today ).

-  When a worker reaches MAX, all workers will be reset to their
original values and their error state reset. ( That means we'll reset the
error state based on number of requests, not time. ) ( Is this a good idea? ) 

- A value of MAX ( or a flag ? ) will mean the worker will take no 
requests, except those with a previous session id. That's the graceful
shutdown.
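A rough C sketch of the integer scheme proposed above, under stated assumptions: MAX is 100, "original values" means a counter of 0, and the reset also clears error flags as the proposal suggests. All names are hypothetical, not jk2 code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed upper bound for lbfactor and counters ("MAX ( 100 ? )"). */
#define MODEL_MAX 100

/* Hypothetical worker record for the proposed integer scheme. */
typedef struct {
    int lbfactor; /* 0 = always, 1..MODEL_MAX-1 = weight, MODEL_MAX = drain */
    int counter;  /* incremented by lbfactor on each request it takes */
    int in_error; /* 1 while the worker is failed */
} prop_worker;

/* Reset every worker to its assumed original state (counter 0, no error):
 * the error state is cleared by request count, not by time. */
static void prop_reset(prop_worker *w, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        w[i].counter = 0;
        w[i].in_error = 0;
    }
}

/* Pick a worker for a request WITHOUT a session; returns its index, or -1
 * if nothing is available. Requests with a session id are assumed to be
 * routed elsewhere and never reach this function. */
int prop_pick(prop_worker *w, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        assert(w[i].lbfactor >= 0 && w[i].lbfactor <= MODEL_MAX);
        if (w[i].in_error || w[i].lbfactor == MODEL_MAX)
            continue;              /* failed or gracefully shutting down */
        if (w[i].lbfactor == 0)
            return (int)i;         /* 'always' worker: no counting needed */
        if (best < 0 || w[i].counter < w[best].counter)
            best = (int)i;         /* lowest counter wins, earliest on ties */
    }
    if (best < 0)
        return -1;
    w[best].counter += w[best].lbfactor;
    if (w[best].counter >= MODEL_MAX)
        prop_reset(w, n);
    return best;
}
```

With factors 1 and 2, nine session-less requests split 6 to 3, so the factor-2 worker takes half as many.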

In addition, I'm in process of moving the lb properties to channel, 
since that's what the user should configure in jk2. 

Costin




Re: PROPOSAL: mod_jk2: Group/Instance

Posted by co...@covalent.net.

On Thu, 2 May 2002, Bernd Koecke wrote:

> misunderstood it. After you said that my patch is included a had a closer look 
> at mod_jk. I can't see anything of my code but I found the special meaning of 
> the zero lb_factor/lb_value. It seems that I didn't understand it right at the 
> first time. This could solve my problem but after a closer look and some testing 
> I found another problem. When you set the lb_value in workers.properties to 1 
> for the local tomcat and 0 for the others, you get the desired behavior. But if 
> you switch off the local tomcat for a short time you come into trouble. The 
> problem is the 0 for the other workers. The calculation of lb_worker transforms 
> the 0 to _inf_. Because 1/0 for a double is _inf_. This is greater than any 

I think there is a piece that checks for 0 and sets it to DEFAULT_VALUE 
(==1 ) before doing 1/lb. 

While looking at the code - I'm not very sure this whole float is needed,
I'll try to find a way to simplify it and use ints ( maybe 0..100 with 
some 'special' values for NEVER and ALWAYS, or some additional flags ).

But the way it works ( or at least how I understand it ) is that if the 
main worker fails, then we look at all workers in error state and try the 
one with the oldest error. And the 'main' worker will be tried again when 
the timeout expires.


I haven't tested this too much, I just applied the patches ( that I 
understand :-), I'll add some more debugging for this process and maybe 
we can find a better solution.

But this functionality is essential for the JNI worker and very important
in general - so I really want to find the best solution. If you have any
patch idea, let me know.

To avoid further confusion and complexity in the lb-factor/value, I 
think we should add one more flag ( 'local_worker' ? ) and use it 
explicitly. Again, patches are welcome - it's always good to have 
different ( and more ) eyes looking at the code. 

( that can go in jk1 as well, but I can't see a release of jk2 without this 
functionality )

Costin


> other lb_value and greater than the lb_value of the local tomcat. But after a 
> failure of the local tomcat he is in error_state. After some time its set to 
> recovering and if the local tomcat is back again the function jk(2)_get_max_lb 
> gets the highest lb_value. This is _inf_ from one of the other workers. The 
> addition of a value to _inf_ is meaningless. You end up with an lb_value of 
> _inf_ for the local worker. If this worker isn't the first in the worker list, 
> it will never be choosen again. Because his lb_value will never be less than 
> another lb_value, because all the other workers have _inf_ as theire lb_values. 
> So every request without a session will be routed to the first of the other 
> tomcats.
> 
> The only way out is a restart of the local apache after tomcat is up and 
> running. But I don't know when tomcat is finished with all his contexts and 
> started the connectors.
> 
> I didn't looked very deep into jk2, but I found the same 
> get_most_suitable_worker and get_max_lb functions. The jk2_get_max_lb function 
> will always return _inf_. In your answer to some other mails you said, that 
> workers could be removed. Do I understand it right, that if my local tomcat goes 
> down his worker is removed from the list and after he is comming up again added 
> to the worker list with reseted lb_value (only for mod_jk2)?
> 
> The next days I will look in the docu and code of jk2 and give it a try. May be 
> all my problems gone away with the new module :).
> 
> Sorry if I ask stupid questions, but I want to make it working for our new cluster.
> 
> Thanks
> 
> Bernd
> 
> > This is essential for jk2's JNI worker, which fits perfectly this case
> > ( you don't want to send via TCP when you have a tomcat instance in the 
> > same process ).
> > 
> > 
> > 
> >>(2) Tomcat instances in standby or "soft shutdown" mode where they serve
> >>requests bound by established sessions, and requests without a session only
> >>if all non-standby instances have failed.
> > 
> > 
> > That's what the SHM scoreboard is going to do ( among other things ). 
> > You can register tomcat instances ( which will be added automatically ),
> > or unregister - in which case no new requests ( except the old sessions )
> > will go to the unregistered tomcat.
> > 
> > 
> > Costin
> > 
> > 
> >>costinm@covalent.net wrote:
> >>
> >>
> >>>On Tue, 30 Apr 2002, Bernd Koecke wrote:
> >>>
> >>>
> >>>>some weeks ago I send a patch for mod_jk for an only routing lb_worker. A
> >>>
> >>few 
> >>
> >>>>days later I sent the docu. Henry Gomez said, that it should be commited.
> >>>
> >>But it 
> >>
> >>>>I think it isn't in the repository. But its the same  with me here, to
> >>>
> >>mutch 
> >>
> >>>>work for to less time :).
> >>>
> >>>I think it is in mod_jk, I remember seeing the commit. 
> >>>
> >>>And I think I commited it in jk2 as well ( after some modifications ).
> >>>
> >>>
> >>>>I need sticky sessions but no loadbalancing in the module. If a request
> >>>
> >>without 
> >>
> >>>>a session comes in, it should be routed to the _local_ tomcat.
> >>>
> >>>Well, there is another use-case with the exact same behavior - Apache2 
> >>>with tomcat in JNI mode. All requests without session should be routed to 
> >>>the _jni_ channel ( i.e. in-process, minimal overhead ).
> >>>
> >>>It's exacly the same - so be sure I do my best to handle this case :-)
> >>>
> >>>Apache2 acts like a 'natural' load-balancer/fail-over, with the parent
> >>>process monitoring for crashes and it starts/stop childs based on 
> >>>load.
> >>>
> >>>
> >>>
> >>>>I think this could be possible with the associated instance of a channel
> >>>
> >>(item 
> >>
> >>>>7). Then I have to configure all four nodes for the same group. Because
> >>>
> >>all 
> >>
> >>>>nodes will serve the same webapps and associate the channel with this
> >>>
> >>group. But 
> >>
> >>>>for this I need a non balancing group. I don't see if the default
> >>>
> >>behavior of a 
> >>
> >>>>group is balancing and if this can be switched off. Is this right or do I
> >>>
> >>miss 
> >>
> >>>>something?
> >>>
> >>>The default is balancing, but you can tune this using weithgs ( and I 
> >>>think we use your code for making one instance 'top priority').
> >>>
> >>>Please check the code, take a look and send additional comments/patches.
> >>>
> >>>It's not yet completely done, of course.
> >>>
> >>>
> >>>Thanks,
> >>>Costin 
> >>
> >>
> >>--
> >>To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
> >>For additional commands, e-mail: <ma...@jakarta.apache.org>
> >>
> >>
> >>
> > 
> > 
> > 
> 
> 
> 
> 



Re: PROPOSAL: mod_jk2: Group/Instance

Posted by Bernd Koecke <bk...@schlund.de>.

costinm@covalent.net wrote:
> On Thu, 2 May 2002, Amund Elstad wrote:
> 
> 
>>(1) all requests without a session are routed to a specific tomcat instance
>>(if that instance is working). 
> 
> 
> That has been added, and it should work in both jk1 and jk2 ( I don't 
> remember who sent the patch, but I remember adding it ). If it doesn't 
> work yet, it is easy to fix.
> 

Sorry, maybe I'm being stupid, but how does it work? I looked at jk and jk2.
My understanding is the following:

The worker that uses the jvm route is the lb_worker; the others don't use the
SessionId extension. You could tweak this worker, but it's buggy or I
misunderstood it. After you said that my patch was included, I had a closer
look at mod_jk. I can't see any of my code, but I found the special meaning of
a zero lb_factor/lb_value. It seems I didn't understand it correctly the first
time. This could solve my problem, but after a closer look and some testing I
found another problem. When you set the lbfactor in workers.properties to 1
for the local tomcat and 0 for the others, you get the desired behavior. But
if you switch off the local tomcat for a short time, you run into trouble. The
problem is the 0 for the other workers: the lb_worker calculation transforms
the 0 into _inf_, because 1/0 for a double is _inf_. This is greater than any
other lb_value, including the lb_value of the local tomcat. But after a
failure the local tomcat is in error_state. After some time it is set to
recovering, and when the local tomcat is back, jk(2)_get_max_lb returns the
highest lb_value, which is the _inf_ of one of the other workers. Adding a
value to _inf_ is meaningless, so you end up with an lb_value of _inf_ for the
local worker. If this worker isn't the first in the worker list, it will never
be chosen again: its lb_value can never be less than another lb_value, because
all the other workers also have _inf_ as their lb_values. So every request
without a session will be routed to the first of the other tomcats.
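Bernd's failure sequence can be reproduced with a small, simplified model of the lb_worker selection. It is a sketch under stated assumptions, not the mod_jk source: the configured lbfactor is inverted into an internal factor, a sessionless request goes to the healthy worker with the lowest lb_value (ties go to the earlier worker), and a recovered worker gets get_max_lb() plus 1.0 (the 1.0 is an assumed constant). Python raises on division by zero where C yields +inf, so `c_double_div` is a hypothetical helper that mimics the C result; the function names follow the ones Bernd mentions, but the bodies are illustrative.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


def c_double_div(a: float, b: float) -> float:
    """Mimic C's IEEE 754 result of a / b for a > 0 (Python raises on b == 0)."""
    return math.inf if b == 0 else a / b


@dataclass
class Worker:
    name: str
    lb_factor: float          # internal factor: 1 / configured lbfactor
    lb_value: float
    in_error: bool = False


def make_worker(name: str, configured_lbfactor: float) -> Worker:
    factor = c_double_div(1.0, configured_lbfactor)  # lbfactor 0 becomes inf
    return Worker(name, factor, factor)


def get_most_suitable_worker(workers: List[Worker]) -> Optional[Worker]:
    """Sessionless request: lowest lb_value wins; strict '<' keeps the earlier worker on ties."""
    best = None
    for w in workers:
        if w.in_error:
            continue
        if best is None or w.lb_value < best.lb_value:
            best = w
    if best is not None:
        best.lb_value += best.lb_factor  # inf + anything is still inf
    return best


def get_max_lb(workers: List[Worker]) -> float:
    return max(w.lb_value for w in workers)


def recover(worker: Worker, workers: List[Worker]) -> None:
    """Put a recovered worker behind the others (assumed: max lb_value + 1.0)."""
    worker.in_error = False
    worker.lb_value = get_max_lb(workers) + 1.0  # inf + 1.0 == inf


if __name__ == "__main__":
    workers = [make_worker("other1", 0), make_worker("local", 1), make_worker("other2", 0)]
    print(get_most_suitable_worker(workers).name)  # local
    workers[1].in_error = True                     # local tomcat goes down
    print(get_most_suitable_worker(workers).name)  # other1
    recover(workers[1], workers)                   # local tomcat comes back
    print(workers[1].lb_value)                     # inf
    print(get_most_suitable_worker(workers).name)  # other1, and on every later request
```

Putting "local" first in the list makes it win the all-_inf_ ties again, which matches the remark that only a worker that is not first in the list gets stuck.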

The only way out is to restart the local Apache after tomcat is up and
running. But I don't know when tomcat has finished loading all its contexts
and has started the connectors.

I didn't look very deeply into jk2, but I found the same
get_most_suitable_worker and get_max_lb functions. In this configuration the
jk2_get_max_lb function will always return _inf_. In your answers to some
other mails you said that workers could be removed. Do I understand it
correctly that if my local tomcat goes down, its worker is removed from the
list, and when it comes up again it is added back to the worker list with a
reset lb_value (only for mod_jk2)?

Over the next few days I will look at the docs and code of jk2 and give it a
try. Maybe all my problems will go away with the new module :).

Sorry if I ask stupid questions, but I want to make it work for our new cluster.

Thanks

Bernd

> This is essential for jk2's JNI worker, which fits this case perfectly
> ( you don't want to send via TCP when you have a tomcat instance in the 
> same process ).
> 
> 
> 
>>(2) Tomcat instances in standby or "soft shutdown" mode where they serve
>>requests bound by established sessions, and requests without a session only
>>if all non-standby instances have failed.
> 
> 
> That's what the SHM scoreboard is going to do ( among other things ).
> You can register tomcat instances ( which will be added automatically ),
> or unregister them, in which case no new requests ( except those for
> existing sessions ) will go to the unregistered tomcat.
> 
> 
> Costin
> 
> 

-- 
Dipl.-Inform. Bernd Koecke
UNIX-Entwicklung
Schlund+Partner AG
Fon: +49-721-91374-0
E-Mail: bk@schlund.de

--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>

RE: PROPOSAL: mod_jk2: Group/Instance

Posted by co...@covalent.net.

On Thu, 2 May 2002, Amund Elstad wrote:

> (1) all requests without a session are routed to a specific tomcat instance
> (if that instance is working). 

That has been added, and it should work in both jk1 and jk2 ( I don't 
remember who sent the patch, but I remember adding it ). If it doesn't 
work yet, it is easy to fix.

This is essential for jk2's JNI worker, which fits this case perfectly
( you don't want to send via TCP when you have a tomcat instance in the 
same process ).
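The routing discussed in this thread (sticky sessions, otherwise the local or in-process JNI instance, otherwise fail over) can be sketched as a plain preference order with no balancing. This is an illustration, not jk or jk2 code: it assumes Tomcat's convention of appending the jvmRoute to the session id after a dot (e.g. `A1B2.tomcat3`), and `choose_worker`, the worker names and the health map are hypothetical.

```python
from typing import Dict, Optional


def session_route(session_id: Optional[str]) -> Optional[str]:
    """Return the route suffix of 'ID.route', or None when there is none."""
    if not session_id or "." not in session_id:
        return None
    return session_id.rsplit(".", 1)[1]


def choose_worker(healthy: Dict[str, bool], preferred: str,
                  session_id: Optional[str]) -> Optional[str]:
    # 1. Sticky session: keep the request on the instance named in the session.
    route = session_route(session_id)
    if route is not None and healthy.get(route, False):
        return route
    # 2. No usable session: the preferred local/JNI instance, if it is up.
    if healthy.get(preferred, False):
        return preferred
    # 3. Fail over to the first healthy remaining instance (no balancing here).
    for name, ok in healthy.items():
        if ok:
            return name
    return None


if __name__ == "__main__":
    healthy = {"jni": True, "tomcat2": True, "tomcat3": True}
    print(choose_worker(healthy, "jni", None))            # jni
    print(choose_worker(healthy, "jni", "A1B2.tomcat3"))  # tomcat3
    healthy["jni"] = False
    print(choose_worker(healthy, "jni", None))            # tomcat2
```

Because there is no lb_value arithmetic at all, a returning local instance is preferred again as soon as it is marked healthy, which avoids the _inf_ problem by construction.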


> (2) Tomcat instances in standby or "soft shutdown" mode where they serve
> requests bound by established sessions, and requests without a session only
> if all non-standby instances have failed.

That's what the SHM scoreboard is going to do ( among other things ).
You can register tomcat instances ( which will be added automatically ),
or unregister them, in which case no new requests ( except those for
existing sessions ) will go to the unregistered tomcat.
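A minimal sketch of these register/unregister semantics, assuming a simple in-memory list instead of the SHM scoreboard (whose interface is not described here): requests for existing sessions still reach an unregistered ("soft shutdown") instance, and new requests go only to registered ones. The optional `standby_fallback` flag adds Amund's requirement that standby instances take new work only when every registered instance has failed. `Instance` and `select` are hypothetical names, and for brevity the first healthy registered instance is picked instead of balancing.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    name: str
    registered: bool = True   # False: unregistered / "soft shutdown"
    healthy: bool = True


def select(instances: List[Instance], route: Optional[str] = None,
           standby_fallback: bool = True) -> Optional[Instance]:
    # 1. Requests for an existing session stay on their instance, registered or not.
    if route is not None:
        for inst in instances:
            if inst.name == route and inst.healthy:
                return inst
    # 2. New requests only go to registered instances (first healthy one, no balancing).
    for inst in instances:
        if inst.registered and inst.healthy:
            return inst
    # 3. Optional: standby instances take new requests only if all registered ones failed.
    if standby_fallback:
        for inst in instances:
            if inst.healthy:
                return inst
    return None


if __name__ == "__main__":
    tc1, tc2 = Instance("tc1", registered=False), Instance("tc2")
    cluster = [tc1, tc2]
    print(select(cluster, "tc1").name)  # tc1: old session still served
    print(select(cluster).name)         # tc2: new requests avoid tc1
    tc2.healthy = False
    print(select(cluster).name)         # tc1: standby fallback after tc2 failed
```

With `standby_fallback=False` the sketch matches the stricter wording above, where an unregistered instance receives no new requests at all.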


Costin


