Posted to dev@geronimo.apache.org by Alex Blewitt <Al...@ioshq.com> on 2003/08/20 17:12:30 UTC

Re: [clustering] -- HttpSession and EJB Session

So far, the focus has been purely on the implementation of clustering 
WRT WebApps and therefore the HttpSession. However, equal consideration 
should be given to the way that EJB SessionBeans' state is 
held/clustered/scaled.

It's no good having a web application fail over to another 
node if that web app was using SessionBeans and loses its session 
state.

The approach taken by WebSphere is to serialise the SessionBean in 
addition to the HttpSession out to an external system, and then load it 
when required. Not only is this a simple solution, but the same 
techniques can be applied for both HttpSession and SessionBean 
clustering.

I fear that the HttpSession will work based on very intimate details of 
how the HttpSession works (which is in Geronimo's control), whereas 
this will not be possible with SessionBeans to the same extent.

Have you considered representing the HttpSession as a SessionBean and 
then working on only one solution to the clustering problem from the 
SessionBean end? Then you would have a solution that fits both, and 
whilst the in-memory replication process/buddying system works well for 
HttpSession, I feel that this approach won't be suitable for 
SessionBeans.

On the other hand, a solution for SessionBeans will implicitly work for 
HttpSession as well.

Regards,

Alex.

On Wednesday, Aug 20, 2003, at 13:09 Europe/London, Jules Gosnell wrote:

> I'm going to pick up this thread again :-)
>
> we have to deal with both dumb and integrated load-balancers...
>
> DUMB LB:
>
> (A) undivided cluster, simple approach:
>
> every node buddies for every other node
> no 'locate/migrate' required since every session is on every node
> replication needs to be synchronous, in order to guarantee that the node 
> on which the next request falls will be up-to-date
>
> problem: unscalable
>
> (B) subdivided cluster, more complex approach:
>
> cluster subdivided into buddy teams (possibly only of pairs).
> 'locate/migrate' required since request may fall on node that does not 
> have session to hand
> primary could use async and secondary sync replication, provided that 
> 'locate' always talked to primary
>
> problem: given a cluster of n nodes divided into teams of t nodes: 
> only t/n of requests will be able to avoid the 'locate/migrate' step - in 
> a large cluster with small teams, this is not much more efficient than 
> a shared store solution.
>
> SMART LB (we're assuming it can do pretty much whatever we want it to).
>
> (A)
>
> assuming affinity, we can use async replication, because the request will 
> always fall on the most up-to-date node.
> if this node fails, the lb MUST pick one to failover to and continue 
> to use that one (or else we have to fall back to sync and assume dumb 
> lb)
> if original node comes back up, it doesn't matter whether lb goes back 
> to it, or remains stuck to fail-over node.
>
> (B)
>
> if we can arrange for the LB to use affinity, with failover limited to our 
> buddy-group, and always stick to the failover node as well, we can lose 
> 'locate/migrate' and replicate async. If we can't get 'always stick to 
> failover node', we replicate sync after failover.
>
> if we can only arrange affinity, but not failover within group, we can 
> replicate async and need 'locate/migrate'. If we can't have 
> lb-remains-stuck-to-failover-node, we are in trouble, because as soon 
> as primary node fails we go back to the situation outlined above where 
> we do a lot of locate/migrate and are not much better off than a 
> shared store.
>
>
> The lb-sticks-to-failover-node is not as simple as it sounds - mod_jk 
> doesn't do it.
>
> it implies
>
> either :
>
> you have the ability to change the routing info carried on the session 
> id client side (I've considered this and don't think it practical - I 
> may be wrong ...)
>
> or :
>
> the session id needs to carry not just a single piece of routing info 
> (like a mod_jk worker name) but a failover list 
> (worker1,worker2,worker3 etc.), in effect your buddy-team,
>
> or:
>
> the lb needs to maintain state, remembering where each session was 
> last serviced and always sticking requests for that session to that 
> node. in a large deployment this requires lbs to replicate this state 
> between them so that they can balance over the same nodes in a 
> coordinated fashion. I think F5 Big-IP is capable of this, but 
> effectively you just shift the state problem from yourself to someone 
> else.
>
> Note that if your lb can understand extended routing info involving 
> the whole buddy team, then you know that it will always balance 
> requests to members of this team anyway, in which case you can 
> dispense with 'locate/migrate' again.
>
> Finally - you still need a migrate operation as sessions will need to 
> migrate from buddy-group to buddy-group as buddy-groups are created 
> and destroyed...
>
>
> in summary - I think that you can optimise away 'locate' and a lot of 
> 'migrate'-ion - Jetty's current impl has no locate and you can build 
> subdivided clusters with it and mod_jk.... but I don't do automatic 
> repartitioning yet....
>
>
> If you are still reading here, then you are doing well :-)
>
>
>
> Jules
>
>
> Jeremy Boynes wrote:
>
>>> I figure that we are talking about two different and orthogonal 
>>> types of
>>> partition here.
>>>
>> Agreed.
>>
>>
>>> I'm happy to call the way that nodes are linked into buddy-groups
>>> (groups of peers that store replicated state for each other) 
>>> something
>>> other than 'partition', if we want to reserve that term for some 
>>> sort of
>>> cluster management concept, but you do agree that these structures
>>> exist, do you not ? regardless of what they are called, otherwise 
>>> you do
>>> not scale, as we have all agreed.
>>>
>>> As for loadbalancer configuration I think this will draw upon both
>>> 'jeremy-partition' and 'jules-buddy-group' status as :
>>>
>>> - you only want to balance requests for a webapp to nodes on which 
>>> it is
>>> deployed
>>>
>> Yes
>>
>>
>>> - you only want to fail-over requests to other nodes in the same
>>> buddy-group as the failed node
>>>
>> Ideally, yes but this is not essential. See below.
>>
>>
>>> if you can do the latter you can avoid cluster-wide logic for finding 
>>> and
>>> migrating sessions from remote nodes to the one receiving the 
>>> request,
>>> because you can guarantee that the session is already there.
>>>
>>
>> The price to pay for this is that you always need to replicate state 
>> to any
>> node to which the request may be directed. If you allow for a locate 
>> phase,
>> then you can minimise the set of nodes to which data is replicated 
>> (the
>> buddy-group) because any node can find it later. In a high-affinity
>> configuration this reduces the overall load.
>>
>> Consider a four node partition A,B,C,D. In the 'replicate-everywhere' 
>> model,
>> A's state is replicated to three other nodes after every request, 
>> incurring
>> the processing cost on three nodes (assuming network multicast). If A 
>> dies,
>> any node can instantly pick up the work. The issue is we have a lot of
>> overhead to reduce the latency in the event of node death (which we 
>> hope is
>> infrequent).
>>
>> The other alternative is that every session has one and only one 
>> buddy. This
>> would result in 1/3 of A's sessions being replicated to B, 1/3 to C 
>> and 1/3
>> to D. Each session is replicated to just one node, allowing unicast 
>> to be
>> used (which has a lower overhead than multicast) and only incurring 
>> the
>> ongoing processing cost on one node.
>>
>> If A dies, then B,C,D pick new buddies for A's sessions and do bulk 
>> state
>> transfer to redistribute, ensuring that the state is always stored on 
>> two
>> nodes. Say B transfers to C, C to D and D to B. Again, unicast 
>> transfer. You
>> can avoid this if you are willing to lose a session if another node 
>> dies
>> (double failure scenario).
>>
>> An A request is now directed to a random node; if this node has the 
>> state,
>> then it becomes the primary and starts replicating to its buddy. If 
>> it does
>> not, then it sends a multicast inquiry to the partition, locates the 
>> state,
>> does a second transfer and starts replicating again.
>>
>> The trade off is lower overhead whilst running but a larger state 
>> transfer
>> in the event of node death. I tend to prefer the latter on the basis 
>> that
>> node deaths are infrequent.
>>
>>
>>
>>> Are we getting closer ?
>>>
>>>
>> :-)
>>
>> --
>> Jeremy
>>
>>
>
>
> -- 
> /**********************************
> * Jules Gosnell
> * Partner
> * Core Developers Network (Europe)
> **********************************/
>
>
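
Jules's extended-routing-info option quoted above (a session id carrying a 
whole failover list like worker1,worker2,worker3 rather than a single mod_jk 
worker name) can be sketched roughly as below. This is an illustrative sketch 
only - the id format, class and method names are assumptions, not an existing 
Geronimo or mod_jk API:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Sketch of the "failover list" routing idea: instead of a single
// mod_jk-style route suffix ("<id>.worker1"), the session id carries the
// whole buddy-team, so the LB can fail over within the team without
// maintaining any session-to-node state of its own.
public class BuddyRoute {

    // Split "<id>.<worker1>,<worker2>,..." into an ordered failover list.
    static List<String> failoverList(String sessionId) {
        int dot = sessionId.lastIndexOf('.');
        if (dot < 0) return List.of();          // no routing info at all
        return Arrays.asList(sessionId.substring(dot + 1).split(","));
    }

    // Pick the first live worker from the list; this is the LB's whole job.
    static String route(String sessionId, Set<String> liveWorkers) {
        for (String w : failoverList(sessionId)) {
            if (liveWorkers.contains(w)) return w;
        }
        return null;                            // session lost: whole team down
    }

    public static void main(String[] args) {
        String id = "0A1B2C3D.worker1,worker2,worker3";
        System.out.println(route(id, Set.of("worker1", "worker2", "worker3")));
        System.out.println(route(id, Set.of("worker2", "worker3")));
    }
}
```

The point of carrying the whole buddy-team in the id is that requests always 
land inside the team, which is what lets 'locate/migrate' be dispensed with.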


Re: [clustering] -- HttpSession and EJB Session

Posted by Jules Gosnell <ju...@coredevelopers.net>.
Alex Blewitt wrote:

> On Thursday, Aug 21, 2003, at 10:11 Europe/London, Jules Gosnell wrote:
>
>> The reason we are cracking the HttpSession nut first is because 
>> that's the bit I know about :-) and IMHO, it is by far the most 
>> tricky of the two - why ?
>>
>> 1. Session EJBs were designed from the ground up with distribution 
>> and all that that entails in mind - HttpSessions were not.
>> 2. With EJB clustering we have total control of the LB. It will 
>> probably be a smart client-side proxy - with HttpSessions we don't. 
>> It will be a 3rd party black box with varying functionality.
>> 3. People expect to take a hit in performance when they move data from 
>> a POJO to an EJB; with an HttpSession they don't
>
>
> Possibly true to all of these. However, I was pointing out that if the 
> SessionBean nut was cracked, then it would be trivial to implement 
> HttpSessions on top of SessionBeans. [It probably wouldn't be the most 
> efficient way, for sure :-)]

Alex,

I am not looking for the path of least resistance to a distributable 
httpsession impl - I already have one :-) I'm trying to improve it....

As soon as the Jetty integration settles down and I can find the time, I 
will either port or rewrite (probably a bit of both) my current impl. If 
I'm going to be doing major surgery on it, then I want to carefully 
reconsider everything that I learnt from the last iteration, or can 
squeeze out of the great minds on this list and put that in too ...

If you like the idea of a SessionBean based impl - have a look at 
Jetty's org.mortbay.j2ee.session.AbstractStore and CMPStore and figure 
out how you could write an e.g. SFSBStore... I'm sure I would be happy 
to accept it into the fold.

>
>
>> I always like to break the back of the hardest part of a problem 
>> first, rather than invest a lot of time travelling down a path, 
>> turning a corner and finding an insurmountable brick wall there... :-)
>
>
> True. Always good to know what's possible, rather than walking down a 
> blind alley. Right up until you start writing JavaMail ...
>
>> Perhaps you should start an EJB clustering thread ?
>
>
> Possibly a good idea, but I'm taking a look back at clustering in 
> general rather than being involved with EJBs specifically at this 
> point :-)
>
>>> It's no good having a web application fail over to 
>>> another node if that web app was using SessionBeans and loses 
>>> its session state.
>>
>>
>> it won't-
>>
>> you put the bean into your httpsession (it will store a distributable 
>> handle).
>> you take it out on another node (the handle is turned back into a 
>> remote proxy)
>> you talk through it to your remote resource...
>
>
> Sorry, I should have been a bit more descriptive :-) I was assuming 
> that the SessionBean stayed in memory (remember, SessionBeans aren't 
> backed by a DB in the same way that EntityBeans are) and was in the 
> machine that crashed.
>
> Secondly, if you do get a handle to the SessionBean (and the handle 
> has been replicated to another machine) then the SessionBean will 
> still refer back to the crashed machine anyway ... (at least, until 
> EJB SessionBean clustering has been solved ;-)
>
>> as above, but the session bean is also replicated to the node you 
>> fail over to and the containers agree an optimisation underneath the 
>> covers so that when conversation is reprised on the failover node, it 
>> is all done locally....
>
>
> Yes, that is possible, but with the nature of transactions as well 
> that may become more complex. 

I'm sure someone has distributed XAs on their todo list :-)

>
>
>>> The approach taken by WebSphere is to serialise the SessionBean in 
>>> addition to the HttpSession out to an external system, and then load 
>>> it when required. Not only is this a simple solution, but the same 
>>> techniques can be applied for both HttpSession and SessionBean 
>>> clustering.
>>
>>
>> We've already done this model - it's 'shared-store'
>
>
> Yup, it's been talked about. Was it decided that this wouldn't be 
> implemented, or that it might be implemented later? Most of the recent 
> conversations on clustering seem to be discussing the buddy(ies) system.

I already have two impls. Personally I think 'in-memory-replication' is 
the way to go.

>
>>> I fear that the HttpSession will work based on very intimate details 
>>> of how the HttpSession works (which is in Geronimo's control), 
>>> whereas this will not be possible with SessionBeans to the same extent.
>>
>>
>> the behaviour of HttpSession is defined in the servlet spec and that 
>> of session beans in the EJB spec - neither are under our control, we 
>> just have to come up with impls for both which share as much code 
>> between themselves as is useful.
>
>
> Yes, it's true that they don't have to use the same system. However, I 
> wanted to call attention to the problem to see if there was a common 
> way forward that would make sense. For example, SessionBeans can 
> pretty much only use serialisation to get between machines, whereas 
> with the HttpSession stuff you can get at the modifications at a much 
> finer granularity (access of individual parameters, for example). 

exactly - my replication is done on a per-attribute basis
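
A rough sketch of what per-attribute replication means in practice 
(illustrative only - this is not Jetty's actual implementation, and all the 
names below are invented): track which attributes a request touched and 
replicate just that delta, rather than serialising the whole session:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy per-attribute replication: only attributes written during a request
// are shipped to the buddy node, instead of the whole serialised session.
public class DirtyTrackingSession {
    private final Map<String, Object> attributes = new HashMap<>();
    private final Set<String> dirty = new HashSet<>();

    public void setAttribute(String name, Object value) {
        attributes.put(name, value);
        dirty.add(name);                 // remember the change for replication
    }

    public Object getAttribute(String name) {
        return attributes.get(name);
    }

    // At end of request: the delta to replicate to the buddy node(s).
    public Map<String, Object> drainDelta() {
        Map<String, Object> delta = new HashMap<>();
        for (String name : dirty) delta.put(name, attributes.get(name));
        dirty.clear();
        return delta;
    }

    // On the buddy node: apply an incoming delta.
    public void applyDelta(Map<String, Object> delta) {
        attributes.putAll(delta);
    }
}
```

A SessionBean, by contrast, can really only offer whole-state serialisation, 
which is the asymmetry Alex points out above.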

>
>
>>> Have you considered representing the HttpSession as a SessionBean 
>>> and then working on only one solution to the clustering problem from 
>>> the SessionBean end? Then you would have a solution that fits both, 
>>> and whilst the in-memory replication process/buddying system works 
>>> well for HttpSession, I feel that this approach won't be suitable 
>>> for SessionBeans.
>>
>>
>> see above.
>>
>> BTW, JBoss currently uses replication for clustered stateful session 
>> beans. As, I think, WebLogic does (can anyone confirm this ?)...
>
>
> IMHO the term 'replication' is still somewhat vague -- it defines a 
> family of processes, rather than the in-memory replication which it is 
> currently being used as. I think it would be a good idea to call it 
> something specific (in-memory replication, or something better that I 
> can't think of right now :-) so that when JavaDocs/User guides etc. 
> are written the terms are unambiguous. 

OK - we'll call it in-memory-replication

>
>
>>> On the other hand, a solution for SessionBeans will implicitly work 
>>> for HttpSession as well.
>>
>>
>> if you want to hobble your use of HttpSessions with all the 
>> additional aspects of session bean use, like remote semantics, single 
>> threadedness etc, yes. The HttpSession's local semantics make life 
>> really awkward.
>
>
> I'm not sure that the remote semantics are 100% necessary when working 
> with EJBs, though they can do. I'm also unconvinced that the 
> single-threadedness needs to be mandated either; if the EJB is running 
> locally then there are all sorts of optimisations that can be done on 
> an EJB. If the SessionBean doesn't use transactions, then there 
> wouldn't be any issues with multi-threading IMNSHO.
>
> But in any case, I wasn't (necessarily) thinking of suggesting 
> HttpSession -> SessionBean -> session service, more like HttpSession -> 
> session service <- SessionBean, so there was a common area in between. 
> Something like serialisation would work for both; it would just be a 
> case of serialising when the user has finished with it (i.e. at the end 
> of the web/ejb request as appropriate). 

as much code as is practicable will be shared between the impls. I think 
session EJBs have not been mentioned because they are a solved problem. 
Jeremy is probably making mental notes of everything from this thread 
that can be used in common, as I hope other readers are also doing...

Jules

>
>
> Alex.
>


-- 
/*************************************
 * Jules Gosnell
 * Partner
 * Core Developers Network (Europe)
 * http://www.coredevelopers.net
 *************************************/



Re: [clustering] -- HttpSession and EJB Session

Posted by Alex Blewitt <Al...@ioshq.com>.
On Thursday, Aug 21, 2003, at 10:11 Europe/London, Jules Gosnell wrote:

> The reason we are cracking the HttpSession nut first is because 
> that's the bit I know about :-) and IMHO, it is by far the most tricky 
> of the two - why ?
>
> 1. Session EJBs were designed from the ground up with distribution and 
> all that that entails in mind - HttpSessions were not.
> 2. With EJB clustering we have total control of the LB. It will 
> probably be a smart client-side proxy - with HttpSessions we don't. It 
> will be a 3rd party black box with varying functionality.
> 3. People expect to take a hit in performance when they move data from 
> a POJO to an EJB; with an HttpSession they don't

Possibly true to all of these. However, I was pointing out that if the 
SessionBean nut was cracked, then it would be trivial to implement 
HttpSessions on top of SessionBeans. [It probably wouldn't be the most 
efficient way, for sure :-)]

> I always like to break the back of the hardest part of a problem 
> first, rather than invest a lot of time travelling down a path, 
> turning a corner and finding an insurmountable brick wall there... :-)

True. Always good to know what's possible, rather than walking down a 
blind alley. Right up until you start writing JavaMail ...

> Perhaps you should start an EJB clustering thread ?

Possibly a good idea, but I'm taking a look back at clustering in 
general rather than being involved with EJBs specifically at this point 
:-)

>> It's no good having a web application fail over to 
>> another node if that web app was using SessionBeans and loses its 
>> session state.
>
> it won't-
>
> you put the bean into your httpsession (it will store a distributable 
> handle).
> you take it out on another node (the handle is turned back into a 
> remote proxy)
> you talk through it to your remote resource...

Sorry, I should have been a bit more descriptive :-) I was assuming 
that the SessionBean stayed in memory (remember, SessionBeans aren't 
backed by a DB in the same way that EntityBeans are) and was in the 
machine that crashed.

Secondly, if you do get a handle to the SessionBean (and the handle has 
been replicated to another machine) then the SessionBean will still 
refer back to the crashed machine anyway ... (at least, until EJB 
SessionBean clustering has been solved ;-)

> as above, but the session bean is also replicated to the node you fail 
> over to and the containers agree an optimisation underneath the covers 
> so that when conversation is reprised on the failover node, it is all 
> done locally....

Yes, that is possible, but with the nature of transactions as well that 
may become more complex.

>> The approach taken by WebSphere is to serialise the SessionBean in 
>> addition to the HttpSession out to an external system, and then load 
>> it when required. Not only is this a simple solution, but the same 
>> techniques can be applied for both HttpSession and SessionBean 
>> clustering.
>
> We've already done this model - it's 'shared-store'

Yup, it's been talked about. Was it decided that this wouldn't be 
implemented, or that it might be implemented later? Most of the recent 
conversations on clustering seem to be discussing the buddy(ies) 
system.

>> I fear that the HttpSession will work based on very intimate details 
>> of how the HttpSession works (which is in Geronimo's control), 
>> whereas this will not be possible with SessionBeans to the same 
>> extent.
>
> the behaviour of HttpSession is defined in the servlet spec and that 
> of session beans in the EJB spec - neither are under our control, we 
> just have to come up with impls for both which share as much code 
> between themselves as is useful.

Yes, it's true that they don't have to use the same system. However, I 
wanted to call attention to the problem to see if there was a common 
way forward that would make sense. For example, SessionBeans can pretty 
much only use serialisation to get between machines, whereas with the 
HttpSession stuff you can get at the modifications at a much finer 
granularity (access of individual parameters, for example).

>> Have you considered representing the HttpSession as a SessionBean and 
>> then working on only one solution to the clustering problem from the 
>> SessionBean end? Then you would have a solution that fits both, and 
>> whilst the in-memory replication process/buddying system works well 
>> for HttpSession, I feel that this approach won't be suitable for 
>> SessionBeans.
>
> see above.
>
> BTW, JBoss currently uses replication for clustered stateful session 
> beans. As, I think, WebLogic does (can anyone confirm this ?)...

IMHO the term 'replication' is still somewhat vague -- it defines a 
family of processes, rather than the in-memory replication which it is 
currently being used as. I think it would be a good idea to call it 
something specific (in-memory replication, or something better that I 
can't think of right now :-) so that when JavaDocs/User guides etc. are 
written the terms are unambiguous.

>> On the other hand, a solution for SessionBeans will implicitly work 
>> for HttpSession as well.
>
> if you want to hobble your use of HttpSessions with all the additional 
> aspects of session bean use, like remote semantics, single 
> threadedness etc, yes. The HttpSession's local semantics make life 
> really awkward.

I'm not sure that the remote semantics are 100% necessary when working 
with EJBs, though they can do. I'm also unconvinced that the 
single-threadedness needs to be mandated either; if the EJB is running 
locally then there are all sorts of optimisations that can be done on 
an EJB. If the SessionBean doesn't use transactions, then there 
wouldn't be any issues with multi-threading IMNSHO.

But in any case, I wasn't (necessarily) thinking of suggesting 
HttpSession -> SessionBean -> session service, more like HttpSession -> 
session service <- SessionBean, so there was a common area in between. 
Something like serialisation would work for both; it would just be a 
case of serialising when the user has finished with it (i.e. at the end 
of the web/ejb request as appropriate).

Alex.


Re: [clustering] -- HttpSession and EJB Session

Posted by Jules Gosnell <ju...@coredevelopers.net>.
Alex,

The reason we are cracking the HttpSession nut first is because that's 
the bit I know about :-) and IMHO, it is by far the most tricky of the 
two - why ?

1. Session EJBs were designed from the ground up with distribution and 
all that that entails in mind - HttpSessions were not.
2. With EJB clustering we have total control of the LB. It will probably 
be a smart client-side proxy - with HttpSessions we don't. It will be a 
3rd party black box with varying functionality.
3. People expect to take a hit in performance when they move data from a 
POJO to an EJB; with an HttpSession they don't

I always like to break the back of the hardest part of a problem first, 
rather than invest a lot of time travelling down a path, turning a 
corner and finding an insurmountable brick wall there... :-)

Jetty's current distributable session impl could easily represent an 
HttpSession as a Session EJB, but I haven't bothered because the 
replicated backend on JavaGroups was much more performant. However I 
would hope that whatever solution arises from this discussion will be 
sufficiently abstract to allow multiple impls including replicated, 
shared-store (maybe via EJB) and future impls which are still to be 
thought up.
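
A hedged sketch of the kind of abstraction this implies - the interface and 
class names below are invented for illustration, and deliberately say nothing 
about Jetty's actual org.mortbay.j2ee.session API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// One narrow store contract that replicated, shared-store and
// future impls could all sit behind (hypothetical names).
interface SessionStore {
    void passivate(String id, byte[] state);   // push state out (replicate/persist)
    byte[] activate(String id);                // pull state back on another node
    void destroy(String id);
}

// Simplest possible impl: an in-memory map standing in for a shared store.
class InMemoryStore implements SessionStore {
    private final Map<String, byte[]> store = new ConcurrentHashMap<>();
    public void passivate(String id, byte[] state) { store.put(id, state); }
    public byte[] activate(String id) { return store.get(id); }
    public void destroy(String id) { store.remove(id); }
}
```

A replicated impl, a JDBC-backed impl or an EJB-backed impl would each just 
be another implementation of the same contract.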

Perhaps you should start an EJB clustering thread ?

more inline....

Alex Blewitt wrote:

> So far, the focus has been purely on the implementation of clustering 
> WRT WebApps and therefore the HttpSession. However, equal consideration 
> should be given to the way that EJB SessionBeans' state is 
> held/clustered/scaled.
>
> It's no good having a web application fail over to 
> another node if that web app was using SessionBeans and loses its 
> session state. 

it won't-

you put the bean into your httpsession (it will store a distributable 
handle).
you take it out on another node (the handle is turned back into a remote 
proxy)
you talk through it to your remote resource...

or

as above, but the session bean is also replicated to the node you fail 
over to and the containers agree an optimisation underneath the covers 
so that when conversation is reprised on the failover node, it is all 
done locally....
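
The handle mechanism described above can be illustrated with the standard 
writeReplace/readResolve serialisation hooks. The classes here are toy 
stand-ins, not a real EJB container's proxy and Handle, but the shape is the 
same: the live proxy is replaced by a small serialisable handle on the wire, 
and the handle is turned back into a proxy on whichever node deserialises it:

```java
import java.io.*;

public class HandleDemo {

    // The "live" proxy: swaps itself for a Handle during serialisation.
    static class RemoteProxy implements Serializable {
        final String beanId;
        RemoteProxy(String beanId) { this.beanId = beanId; }
        Object writeReplace() { return new Handle(beanId); }
    }

    // The small, location-independent token that actually travels.
    static class Handle implements Serializable {
        final String beanId;
        Handle(String beanId) { this.beanId = beanId; }
        Object readResolve() { return new RemoteProxy(beanId); } // rehydrate on arrival
    }

    // Serialise on one node, deserialise "on another" - as replication would.
    static RemoteProxy replicate(RemoteProxy p) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);                // actually writes the Handle
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (RemoteProxy) in.readObject(); // Handle resolves back to a proxy
        }
    }

    public static void main(String[] args) throws Exception {
        RemoteProxy onOtherNode = replicate(new RemoteProxy("ShoppingCart#42"));
        System.out.println(onOtherNode.beanId);
    }
}
```

This is why putting the bean in the HttpSession survives failover: only the 
handle is replicated, and the resolved proxy still points at the remote bean.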

>
>
> The approach taken by WebSphere is to serialise the SessionBean in 
> addition to the HttpSession out to an external system, and then load 
> it when required. Not only is this a simple solution, but the same 
> techniques can be applied for both HttpSession and SessionBean 
> clustering. 

we've already done this model - it's 'shared-store'

>
>
> I fear that the HttpSession will work based on very intimate details 
> of how the HttpSession works (which is in Geronimo's control), whereas 
> this will not be possible with SessionBeans to the same extent. 

the behaviour of HttpSession is defined in the servlet spec and that of 
session beans in the EJB spec - neither are under our control, we just 
have to come up with impls for both which share as much code between 
themselves as is useful.

>
>
> Have you considered representing the HttpSession as a SessionBean and 
> then working on only one solution to the clustering problem from the 
> SessionBean end? Then you would have a solution that fits both, and 
> whilst the in-memory replication process/buddying system works well 
> for HttpSession, I feel that this approach won't be suitable for 
> SessionBeans. 

see above.

BTW, JBoss currently uses replication for clustered stateful session 
beans. As, I think, WebLogic does (can anyone confirm this ?)...

>
>
> On the other hand, a solution for SessionBeans will implicitly work 
> for HttpSession as well. 

if you want to hobble your use of HttpSessions with all the additional 
aspects of session bean use, like remote semantics, single threadedness 
etc, yes. The HttpSession's local semantics make life really awkward.


Jules

>
>
> Regards,
>
> Alex.
>
> On Wednesday, Aug 20, 2003, at 13:09 Europe/London, Jules Gosnell wrote:
>
>> I'm going to pick up this thread again :-)
>>
>> we have to deal with both dumb and integrated load-balancers...
>>
>> DUMB LB:
>>
>> (A) undivided cluster, simple approach:
>>
>> every node buddies for every other node
>> no 'locate/migrate' required since every session is on every node
>> replication needs to be synchronous, in order to guarantee that the node 
>> on which the next request falls will be up-to-date
>>
>> problem: unscalable
>>
>> (B) subdivided cluster, more complex approach:
>>
>> cluster subdivided into buddy teams (possibly only of pairs).
>> 'locate/migrate' required since request may fall on node that does 
>> not have session to hand
>> primary could use async and secondary sync replication, provided that 
>> 'locate' always talked to primary
>>
>> problem: given a cluster of n nodes divided into teams of t nodes: 
>> only t/n of requests will be able to avoid the 'locate/migrate' step - 
>> in a large cluster with small teams, this is not much more efficient 
>> than a shared store solution.
>>
>> SMART LB (we're assuming it can do pretty much whatever we want it to).
>>
>> (A)
>>
>> assuming affinity, we can use async replication, because the request will 
>> always fall on the most up-to-date node.
>> if this node fails, the lb MUST pick one to failover to and continue 
>> to use that one (or else we have to fall back to sync and assume dumb 
>> lb)
>> if original node comes back up, it doesn't matter whether lb goes 
>> back to it, or remains stuck to fail-over node.
>>
>> (B)
>>
>> if we can arrange for the LB to use affinity, with failover limited to our 
>> buddy-group, and always stick to the failover node as well, we can 
>> lose 'locate/migrate' and replicate async. If we can't get 'always 
>> stick to failover node', we replicate sync after failover.
>>
>> if we can only arrange affinity, but not failover within group, we 
>> can replicate async and need 'locate/migrate'. If we can't have 
>> lb-remains-stuck-to-failover-node, we are in trouble, because as soon 
>> as primary node fails we go back to the situation outlined above 
>> where we do a lot of locate/migrate and are not much better off than 
>> a shared store.
>>
>>
>> The lb-sticks-to-failover-node is not as simple as it sounds - mod_jk 
>> doesn't do it.
>>
>> it implies
>>
>> either :
>>
>> you have the ability to change the routing info carried on the 
>> session id client side (I've considered this and don't think it 
>> practical - I may be wrong ...)
>>
>> or :
>>
>> the session id needs to carry not just a single piece of routing info 
>> (like a mod_jk worker name) but a failover list 
>> (worker1,worker2,worker3 etc.), in effect your buddy-team,
>>
>> or:
>>
>> the lb needs to maintain state, remembering where each session was 
>> last serviced and always sticking requests for that session to that 
>> node. in a large deployment this requires lbs to replicate this state 
>> between them so that they can balance over the same nodes in a 
>> coordinated fashion. I think F5 Big-IP is capable of this, but 
>> effectively you just shift the state problem from yourself to someone 
>> else.
>>
>> Note that if your lb can understand extended routing info involving 
>> the whole buddy team, then you know that it will always balance 
>> requests to members of this team anyway, in which case you can 
>> dispense with 'locate/migrate' again.
>>
>> Finally - you still need a migrate operation as sessions will need to 
>> migrate from buddy-group to buddy-group as buddy-groups are created 
>> and destroyed...
>>
>>
>> in summary - I think that you can optimise away 'locate' and a lot of 
>> 'migrate'-ion - Jetty's current impl has no locate and you can build 
>> subdivided clusters with it and mod_jk.... but I don't do automatic 
>> repartitioning yet....
>>
>>
>> If you are still reading here, then you are doing well :-)
>>
>>
>>
>> Jules
>>
>>
>> Jeremy Boynes wrote:
>>
>>>> I figure that we are talking about two different and orthogonal 
>>>> types of
>>>> partition here.
>>>>
>>> Agreed.
>>>
>>>
>>>> I'm happy to call the way that nodes are linked into buddy-groups
>>>> (groups of peers that store replicated state for each other) something
>>>> other than 'partition', if we want to reserve that term for some 
>>>> sort of
>>>> cluster management concept, but you do agree that these structures
>>>> exist, do you not ? regardless of what they are called, otherwise 
>>>> you do
>>>> not scale, as we have all agreed.
>>>>
>>>> As for loadbalancer configuration I think this will draw upon both
>>>> 'jeremy-partition' and 'jules-buddy-group' status as :
>>>>
>>>> - you only want to balance requests for a webapp to nodes on which 
>>>> it is
>>>> deployed
>>>>
>>> Yes
>>>
>>>
>>>> - you only want to fail-over requests to other nodes in the same
>>>> buddy-group as the failed node
>>>>
>>> Ideally, yes but this is not essential. See below.
>>>
>>>
>>>> if you can do the latter you can avoid cluster-wide logic for finding 
>>>> and
>>>> migrating sessions from remote nodes to the one receiving the request,
>>>> because you can guarantee that the session is already there.
>>>>
>>>
>>> The price to pay for this is that you always need to replicate state 
>>> to any
>>> node to which the request may be directed. If you allow for a locate 
>>> phase,
>>> then you can minimise the set of nodes to which data is replicated (the
>>> buddy-group) because any node can find it later. In a high-affinity
>>> configuration this reduces the overall load.
>>>
>>> Consider a four node partition A,B,C,D. In the 
>>> 'replicate-everywhere' model,
>>> A's state is replicated to three other nodes after every request, 
>>> incurring
>>> the processing cost on three nodes (assuming network multicast). If 
>>> A dies,
>>> any node can instantly pick up the work. The issue is we have a lot of
>>> overhead to reduce the latency in the event of node death (which we 
>>> hope is
>>> infrequent).
>>>
>>> The other alternative is that every session has one and only one 
>>> buddy. This
>>> would result in 1/3 of A's sessions being replicated to B, 1/3 to C 
>>> and 1/3
>>> to D. Each session is replicated to just one node, allowing unicast 
>>> to be
>>> used (which has a lower overhead than multicast) and only incurring the
>>> ongoing processing cost on one node.
>>>
>>> If A dies, then B,C,D pick new buddies for A's sessions and do bulk 
>>> state
>>> transfer to redistribute, ensuring that the state is always stored 
>>> on two
>>> nodes. Say B transfers to C, C to D and D to B. Again, unicast 
>>> transfer. You
>>> can avoid this if you are willing to lose a session if another node 
>>> dies
>>> (double failure scenario).
>>>
>>> An A request is now directed to a random node; if this node has the 
>>> state,
>>> then it becomes the primary and starts replicating to its buddy. If 
>>> it does
>>> not, then it sends a multicast inquiry to the partition, locates the 
>>> state,
>>> does a second transfer and starts replicating again.
>>>
>>> The trade off is lower overhead whilst running but a larger state 
>>> transfer
>>> in the event of node death. I tend to prefer the latter on the basis 
>>> that
>>> node deaths are infrequent.
>>>
>>>
>>>
>>>> Are we getting closer ?
>>>>
>>>>
>>> :-)
>>>
>>> -- 
>>> Jeremy
>>>
>>>
>>
>>
>> -- 
>> /**********************************
>> * Jules Gosnell
>> * Partner
>> * Core Developers Network (Europe)
>> **********************************/
>>
>>
>
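
The single-buddy scheme Jeremy describes in the quoted thread (each session 
replicated to exactly one other node, with bulk re-replication when a node 
dies) can be sketched as a simple ring. This is illustrative only, not a 
proposed implementation - in particular, real schemes would spread a node's 
sessions over several buddies rather than using one successor per node:

```java
import java.util.ArrayList;
import java.util.List;

// Toy ring: each node's sessions are backed up on the next live node, so
// a node death just means "re-replicate to the new next node", unicast only.
public class BuddyRing {
    private final List<String> nodes = new ArrayList<>();

    public BuddyRing(List<String> initial) { nodes.addAll(initial); }

    // The buddy holding the backup copies for sessions whose primary is 'node'.
    public String buddyOf(String node) {
        int i = nodes.indexOf(node);
        if (i < 0 || nodes.size() < 2) return null;  // unknown node, or no buddy left
        return nodes.get((i + 1) % nodes.size());
    }

    // Node death: drop it from the ring. Each survivor whose buddy changed
    // would now bulk-transfer the affected sessions to its new buddy.
    public void fail(String node) { nodes.remove(node); }
}
```

The trade-off is exactly as Jeremy states it: unicast and low steady-state 
overhead, paid for with a larger state transfer at node death.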


-- 
/**********************************
 * Jules Gosnell
 * Partner
 * Core Developers Network (Europe)
 **********************************/