
Posted to dev@tomcat.apache.org by Filip Hanik - Dev Lists <de...@hanik.com> on 2006/03/03 17:44:21 UTC

[feedback request] session replication

I wrote up a short proposal (also emailed to geronimo-dev for 
feedback) on how the next generation of session replication should be done.
Today's replication is all-to-all, and even within that model it's 
pretty poor. It creates way too much network traffic, as each request 
results in X network transmits (where X is the number of nodes 
in the cluster/domain).

The suggested solution offers two advantages:
1. Each request with a dirty session should result in only one network 
send (except on session create, session delete, or failover).
2. The session Manager (StandardManager, DeltaManager, etc.) should not 
have to know about replication; today's DeltaManager is too 
complicated and too involved in the replication logic.

I propose a very simple yet very efficient replication mechanism that 
is so straightforward that with a simple extension of 
StandardSession (or even built into it) you can have session replication. 
It will even support future replication efforts where AOP monitors the 
sessions themselves for data that has changed without 
setAttribute/removeAttribute, and the implementation wouldn't change much.

In my opinion, the Session API should not have to know about clustering 
or session replication, nor should it need to worry about location.
The clustering API should take care of all of that.

The solution that we plan to implement for Tomcat is fairly 
straightforward. Let me try to show why the API shouldn't need 
to worry. It's a little lengthy, but it shows that the Session and the 
SessionManager need to know nothing about clustering or session locations. 
(This is only one solution; other solutions should demonstrate the 
same point: the Session API needs to know nothing about clustering or 
session locations.)

1. Requirements to be implemented by the Session.java API
  bool isDirty - (has the session changed in this request)
  bool isDiffable - is the session able to provide a diff
  byte[] getSessionData() - returns the whole session
  byte[] getSessionDiff() - optional, see isDiffable, resets the diff data
  void setSessionDiff(byte[] diff) - optional, see isDiffable, apply 
changes from another node

2. Requirements to be implemented by the SessionManager.java API
  void setSessionMap(HashMap map) - makes the map implementation pluggable

3. The key to this is that we will have an implementation of a 
LazyReplicatedHashMap.
  The key object in this map is the session id.
  The map entry object is an object that looks like this:
  ReplicatedEntry {
     string id;//sessionid
     bool isPrimary; //does this node hold the data
     bool isBackup; //does this node hold backup data
     Session session; //not null values for primary and backup nodes
     Member primary; //information about the primary node
     Member backup; //information about the backup node
  }

  The LazyReplicatedHashMap overrides get(key) and put(id,session)
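The Session requirements in item 1 could look roughly like the following Java sketch. It is a minimal illustration, not Tomcat code: the interface name `ReplicableSession`, the class `SimpleReplicableSession`, and the use of plain Java serialization for the byte[] payloads are all assumptions. A diff is modeled as a map of changed attributes, with a null value marking a removal.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical interface mirroring item 1 of the proposal (not the real Catalina Session API). */
interface ReplicableSession {
    boolean isDirty();                  // has the session changed in this request?
    boolean isDiffable();               // can this session produce a diff?
    byte[] getSessionData();            // the whole session
    byte[] getSessionDiff();            // changes since the last call; resets the diff
    void setSessionDiff(byte[] diff);   // apply changes received from another node
}

/** Minimal sketch: tracks attributes changed since the last getSessionDiff(). */
class SimpleReplicableSession implements ReplicableSession {
    private final Map<String, Serializable> attributes = new HashMap<>();
    private final Map<String, Serializable> pending = new HashMap<>(); // null value = removed

    public void setAttribute(String name, Serializable value) {
        attributes.put(name, value);
        pending.put(name, value);
    }

    public void removeAttribute(String name) {
        attributes.remove(name);
        pending.put(name, null);
    }

    public Object getAttribute(String name) { return attributes.get(name); }

    public boolean isDirty() { return !pending.isEmpty(); }

    public boolean isDiffable() { return true; }

    public byte[] getSessionData() { return serialize(new HashMap<>(attributes)); }

    public byte[] getSessionDiff() {
        byte[] diff = serialize(new HashMap<>(pending));
        pending.clear(); // "resets the diff data"
        return diff;
    }

    @SuppressWarnings("unchecked")
    public void setSessionDiff(byte[] diff) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(diff))) {
            Map<String, Serializable> changes = (Map<String, Serializable>) in.readObject();
            // Applied directly to the attributes, so the receiving copy does not become dirty.
            for (Map.Entry<String, Serializable> change : changes.entrySet()) {
                if (change.getValue() == null) attributes.remove(change.getKey());
                else attributes.put(change.getKey(), change.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    private static byte[] serialize(HashMap<String, Serializable> map) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(map);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

Because the payload is opaque bytes, the transport never needs to know how the diff was produced, which is the point argued later in the thread about byte[].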

So all nodes will have sessionId/ReplicatedEntry pairs 
in their session map, but only two nodes will have the actual data.
This solution is for a sticky LB only, but when failover happens, the LB 
can pick any node, as each node knows where to get the data.
The newly selected node will keep the backup node or select a new one, 
and publish the new locations to the entire cluster.
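A rough sketch of the lazy lookup described above, assuming a hypothetical `SessionFetcher` in place of the real Tribes RPC call and reducing `Member` to a node name. The names `LazyReplicatedMap`, `putBackup`, `putLocation`, and `localCopy` are illustrative only; sending data to the backup and the cluster-wide publish are left as comments.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified ReplicatedEntry from the proposal; Member is reduced to a node name. */
class ReplicatedEntry<S> {
    final String id;
    boolean isPrimary;   // does this node hold the data?
    boolean isBackup;    // does this node hold backup data?
    S session;           // non-null only on the primary and backup nodes
    String primary;      // name of the primary node
    String backup;       // name of the backup node

    ReplicatedEntry(String id, String primary, String backup) {
        this.id = id;
        this.primary = primary;
        this.backup = backup;
    }
}

/** Hypothetical transport call: ask a remote node for its copy of a session. */
interface SessionFetcher<S> {
    S fetch(String node, String sessionId);
}

/** Sketch of the lazy lookup; a real map would also replicate and publish locations. */
class LazyReplicatedMap<S> {
    private final String localNode;
    private final SessionFetcher<S> fetcher;
    private final Map<String, ReplicatedEntry<S>> entries = new HashMap<>();

    LazyReplicatedMap(String localNode, SessionFetcher<S> fetcher) {
        this.localNode = localNode;
        this.fetcher = fetcher;
    }

    /** This node creates the session and becomes its primary. */
    void put(String id, S session, String backupNode) {
        ReplicatedEntry<S> e = new ReplicatedEntry<>(id, localNode, backupNode);
        e.isPrimary = true;
        e.session = session;
        entries.put(id, e);
    }

    /** Another node chose this node as backup and sent a copy. */
    void putBackup(String id, S session, String primaryNode) {
        ReplicatedEntry<S> e = new ReplicatedEntry<>(id, primaryNode, localNode);
        e.isBackup = true;
        e.session = session;
        entries.put(id, e);
    }

    /** Location-only entry, as received from the cluster-wide publish. */
    void putLocation(String id, String primaryNode, String backupNode) {
        entries.put(id, new ReplicatedEntry<>(id, primaryNode, backupNode));
    }

    /** Local data only, without taking ownership (what a remote fetch would read). */
    S localCopy(String id) {
        ReplicatedEntry<S> e = entries.get(id);
        return e == null ? null : e.session;
    }

    /** A request arrived here, so this node takes over as primary, pulling data if needed. */
    S get(String id) {
        ReplicatedEntry<S> e = entries.get(id);
        if (e == null) return null;
        if (e.isPrimary) return e.session;
        if (e.isBackup) {
            // The former backup already has the data; it still needs a new backup,
            // which a real implementation would choose here.
            e.isBackup = false;
            e.backup = null;
        } else {
            e.session = fetcher.fetch(e.backup, id); // lazy: load only on first access
        }
        e.isPrimary = true;
        e.primary = localNode;
        // A real implementation would now publish the new location to the cluster.
        return e.session;
    }
}
```

In this sketch a node that only holds location metadata pulls the data from the backup on first access, while a former backup simply promotes its own copy.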

As you can see, all-to-all communication only happens when a session is 
created, destroyed, or fails over. Other than that, it is primary-to-backup 
communication only, in the form of diffs or entire sessions 
(using isDirty or getSessionDiff). This is triggered either by an interceptor 
at the end of each request, or by a batch process that causes less network 
jitter but gives less accurate (though adequate) failover.
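The end-of-request path can be sketched as a hook that sends at most one message, and only to the backup. `DiffSource`, `Sender`, and `BackupReplicationInterceptor` are hypothetical names, not Tribes or Catalina APIs; `AllToAll` counts one send per other node purely for comparison.

```java
/** Minimal view of the proposed Session additions (hypothetical names). */
interface DiffSource {
    boolean isDirty();
    boolean isDiffable();
    byte[] getSessionData();
    byte[] getSessionDiff();
}

/** Hypothetical point-to-point transport to a single cluster member. */
interface Sender {
    void send(String member, byte[] payload);
}

/** Sketch of an end-of-request hook: at most one send, and only to the backup. */
class BackupReplicationInterceptor {
    private final Sender sender;

    BackupReplicationInterceptor(Sender sender) {
        this.sender = sender;
    }

    /** Returns the number of network sends made for this request (0 or 1). */
    int afterRequest(DiffSource session, String backupMember) {
        if (!session.isDirty()) {
            return 0; // untouched session: nothing goes on the wire
        }
        byte[] payload = session.isDiffable()
                ? session.getSessionDiff()   // only what changed
                : session.getSessionData();  // fall back to the whole session
        sender.send(backupMember, payload);
        return 1;
    }
}

/** For comparison: all-to-all sends the change to every other node. */
class AllToAll {
    static int sendsPerDirtyRequest(int clusterSize) {
        return clusterSize - 1;
    }
}
```

A batch variant would call the same logic from a timer instead of per request, trading failover accuracy for fewer, larger sends.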

What makes this possible is that Tribes will support true state 
replication and true RPC calls into the cluster.

Positive thoughts, criticism, and bashing are all welcome :)
(remember that I work with the KISS principle)
http://people.apache.org/~fhanik/kiss.html

Filip


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org

Re: [feedback request] session replication

Posted by Remy Maucherat <re...@apache.org>.

Filip Hanik - Dev Lists wrote:
>> This byte based solution doesn't seem useful to me: for example in 
>> JBoss there is support for finer grain replication, and it doesn't use 
>> byte arrays.
> I don't think data transfers get more fine grained than a single byte :) 

Of course, but using byte[] as a type anywhere is bad, I think (not 
recyclable, needs byte copying, etc). Using an indirection like 
ByteBuffer or ByteChunk is the minimum.

> (don't think you can write a bit to a stream)
> If you think of AOP and just transfer data diffs, they still get 
> transferred as bytes. using byte[] arrays lets the container control the 
> serialization. this way, you can have as fine or coarse grained as you 
> want. The API shouldn't impose what serialization mechanism is being 
> used, that is why it is byte[]

Yes, I was thinking AOP or similar mechanism. Of course, at some point, 
it will get to bytes to be transferred to another node, but it's a 
feature and format of the transport since the StandardSession will have 
no idea how to apply it. If (as I understand it) it's just a callback 
which is meant to be implemented by classes which extend 
StandardSession, then IMO it is too strict to impose dirty/diff logic. I 
still prefer using classes which extend StandardSession and StandardManager.

>> I'd like to have more explanation why these new APIs are a great idea, 
>> and how they will be used.
> of course, very large size clusters, with linear increase in network 
> chatter, not exponential like today.
> more to come...

I have trouble seeing the relation between the proposed API additions to 
org.apache.catalina.Session and the capability of having a large cluster ;)

Rémy


Re: [feedback request] session replication

Posted by Filip Hanik - Dev Lists <de...@hanik.com>.

Remy Maucherat wrote:
> Filip Hanik - Dev Lists wrote:
>> 1. Requirements to be implemented by the Session.java API
>>  bool isDirty - (has the session changed in this request)
>>  bool isDiffable - is the session able to provide a diff
>>  byte[] getSessionData() - returns the whole session
>>  byte[] getSessionDiff() - optional, see isDiffable, resets the diff 
>> data
>>  void setSessionDiff(byte[] diff) - optional, see isDiffable, apply 
>> changes from another node
>>
>> 2. Requirements to be implemented by the SessionManager.java API
>>  void setSessionMap(HashMap map) - makes the map implementation 
>> pluggable
>
> This byte based solution doesn't seem useful to me: for example in 
> JBoss there is support for finer grain replication, and it doesn't use 
> byte arrays.
I don't think data transfers get more fine-grained than a single byte :) 
(I don't think you can write a bit to a stream.)
If you think of AOP and just transferring data diffs, they still get 
transferred as bytes. Using byte[] arrays lets the container control the 
serialization. This way, replication can be as fine- or coarse-grained as 
you want. The API shouldn't impose which serialization mechanism is 
used; that is why it is byte[].

>
> I'd like to have more explanation why these new APIs are a great idea, 
> and how they will be used.
Of course: very large clusters, with a linear increase in network 
chatter, not exponential like today.
More to come...

Filip


Re: [feedback request] session replication

Posted by Remy Maucherat <re...@apache.org>.

Filip Hanik - Dev Lists wrote:
> 1. Requirements to be implemented by the Session.java API
>  bool isDirty - (has the session changed in this request)
>  bool isDiffable - is the session able to provide a diff
>  byte[] getSessionData() - returns the whole session
>  byte[] getSessionDiff() - optional, see isDiffable, resets the diff data
>  void setSessionDiff(byte[] diff) - optional, see isDiffable, apply 
> changes from another node
> 
> 2. Requirements to be implemented by the SessionManager.java API
>  void setSessionMap(HashMap map) - makes the map implementation pluggable

This byte-based solution doesn't seem useful to me: for example, JBoss 
supports finer-grained replication, and it doesn't use byte 
arrays.

I'd like more explanation of why these new APIs are a great idea, 
and how they will be used.

Rémy


Re: [feedback request] session replication

Posted by Filip Hanik - Dev Lists <de...@hanik.com>.

Thanks for the comments; replies inline.


> You may be able to simply judge isDiffable for yourself without that
> method by checking that getSessionDiff() != null.
>   
Yes, that is true. I put in isDiffable because the semantics are a little 
easier to understand than using a null return value for logic.

>> 2. Requirements to be implemented by the SessionManager.java API
>>   void setSessionMap(HashMap map) - makes the map implementation pluggable
>>     
>
> The argument there should be of type Map, not HashMap, to allow other
> Map implementations.
>   
Yes, good catch!!

Filip




Re: [feedback request] session replication

Posted by Yoav Shapira <yo...@apache.org>.

Hola,
A couple of small comments:


> 1. Requirements to be implemented by the Session.java API
>   bool isDirty - (has the session changed in this request)
>   bool isDiffable - is the session able to provide a diff
>   byte[] getSessionData() - returns the whole session
>   byte[] getSessionDiff() - optional, see isDiffable, resets the diff data
>   void setSessionDiff(byte[] diff) - optional, see isDiffable, apply
> changes from another node

You may be able to simply judge isDiffable for yourself without that
method by checking that getSessionDiff() != null.

> 2. Requirements to be implemented by the SessionManager.java API
>   void setSessionMap(HashMap map) - makes the map implementation pluggable

The argument there should be of type Map, not HashMap, to allow other
Map implementations.
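With the argument typed as `Map`, the manager needs nothing but the map to find sessions, so any implementation (including a replicated one) can be plugged in. A minimal sketch, assuming a hypothetical `PluggableSessionManager` that is not the actual Catalina `Manager` interface:

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical manager: the Map is its only knowledge of where sessions live. */
class PluggableSessionManager<S> {
    // Default storage; a replicated map can be swapped in via setSessionMap.
    private Map<String, S> sessions = new ConcurrentHashMap<>();

    /** Accepts any Map implementation, not just HashMap. */
    public void setSessionMap(Map<String, S> map) {
        this.sessions = Objects.requireNonNull(map, "map");
    }

    public void add(String id, S session) { sessions.put(id, session); }

    public S findSession(String id) { return sessions.get(id); }

    public void remove(String id) { sessions.remove(id); }

    public int getActiveSessions() { return sessions.size(); }
}
```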

Yoav


Re: [feedback request] session replication

Posted by Filip Hanik - Dev Lists <de...@hanik.com>.

Andy Piper wrote:
> Hi Filip
>
> At 04:44 PM 3/3/2006, Filip Hanik - Dev Lists wrote:
>> 3. And the key to this, is that we will have an implementation of a 
>> LazyReplicatedHashMap
>>  The key object in this map is the session Id.
>>  The map entry object is an object that looks like this
>>  ReplicatedEntry {
>>     string id;//sessionid
>>     bool isPrimary; //does this node hold the data
>>     bool isBackup; //does this node hold backup data
>>     Session session; //not null values for primary and backup nodes
>>     Member primary; //information about the primary node
>>     Member backup; //information about the backup node
>>  }
>
> Burning a primary-secondary scheme into the API seems a little less 
> general. I think you should assume there can be N backups, where N is 
> usually 1. How do you handle locking with this API?
We already have an all-to-all implementation; this is just another 
implementation. But it's true, maybe there should be more than one 
backup. I'll start with one, and move to more than one if needed. In 
terms of locking, none is planned; it's too much overhead.
Correctness can be achieved within the session itself, i.e. when it is 
being serialized, it would need to lock itself and then reset its 
state for the next diff. Sessions today have access counters, which 
means that we can also periodically pick up sessions that are not being 
accessed at the time, and not worry about locks at all.
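A minimal sketch of the lock-free approach described above, assuming hypothetical `CountedSession` and `BatchReplicator` classes: the session takes its own lock only while snapshotting and resetting its diff, and the batch pass skips sessions whose access counter shows a request in flight.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical session that guards its own diff state instead of using cluster locks. */
class CountedSession {
    private final Map<String, Object> changes = new HashMap<>();
    private final AtomicInteger activeRequests = new AtomicInteger();

    void access()    { activeRequests.incrementAndGet(); }
    void endAccess() { activeRequests.decrementAndGet(); }
    boolean isIdle() { return activeRequests.get() == 0; }

    synchronized void setAttribute(String name, Object value) {
        changes.put(name, value);
    }

    /** Snapshot and reset under the session's own lock, so no change is lost or sent twice. */
    synchronized Map<String, Object> drainDiff() {
        Map<String, Object> diff = new HashMap<>(changes);
        changes.clear();
        return diff;
    }
}

/** Periodic pass that only replicates sessions with no request in flight. */
class BatchReplicator {
    static List<Map<String, Object>> replicateIdle(Collection<CountedSession> sessions) {
        List<Map<String, Object>> sent = new ArrayList<>();
        for (CountedSession s : sessions) {
            if (!s.isIdle()) {
                continue; // busy: pick it up on a later pass
            }
            Map<String, Object> diff = s.drainDiff();
            if (!diff.isEmpty()) {
                sent.add(diff); // a real implementation would send this to the backup
            }
        }
        return sent;
    }
}
```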
>
>>  The LazyReplicatedHashMap overrides get(key) and put(id,session)
>>
>> So all the nodes will have a sessionId,ReplicatedEntry 
>> combinations in their session map. But only two nodes will have the 
>> actual data.
>> This solution is for sticky LB only, but when failover happens, the 
>> LB can pick any node as each node knows where to get the data.
>> The newly selected node, will keep the backup node or select a new 
>> one, and do a publish to the entire cluster of the locations.
>
> Doesn't this mean you have to publish to the whole cluster at session 
> creation? This will eventually limit scalability IMO.
As mentioned earlier, for very large clusters we can use a cookie to 
store the backup location; this addresses your concern about 
broadcasting the map each time, although other challenges arise with 
it. I plan on supporting both solutions.
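For the cookie variant, one possible encoding is sketched below. The cookie name `BACKUP_NODE` and the `<sessionId>|<backupNode>` value layout are assumptions made for illustration only; checking the session id guards against a cookie left over from an older session.

```java
import java.util.Optional;

/** Hypothetical cookie value layout "<sessionId>|<backupNode>"; the format is an assumption. */
final class BackupLocationCookie {
    static final String NAME = "BACKUP_NODE"; // hypothetical cookie name
    private static final char SEP = '|';

    private BackupLocationCookie() {}

    /** Value the primary would set when it picks (or changes) the backup. */
    static String encode(String sessionId, String backupNode) {
        return sessionId + SEP + backupNode;
    }

    /** Backup node for this session, or empty if the cookie is missing or stale. */
    static Optional<String> backupFor(String sessionId, String cookieValue) {
        if (cookieValue == null) {
            return Optional.empty();
        }
        int sep = cookieValue.lastIndexOf(SEP);
        if (sep < 0 || !cookieValue.substring(0, sep).equals(sessionId)) {
            return Optional.empty(); // belongs to another (e.g. expired) session
        }
        return Optional.of(cookieValue.substring(sep + 1));
    }
}
```

The node that receives a failed-over request reads the backup location from the request itself, so no cluster-wide location publish is needed at session creation.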

Filip


Re: [feedback request] session replication

Posted by Andy Piper <an...@bea.com>.

Hi Filip

At 04:44 PM 3/3/2006, Filip Hanik - Dev Lists wrote:
>3. And the key to this, is that we will have an implementation of a 
>LazyReplicatedHashMap
>  The key object in this map is the session Id.
>  The map entry object is an object that looks like this
>  ReplicatedEntry {
>     string id;//sessionid
>     bool isPrimary; //does this node hold the data
>     bool isBackup; //does this node hold backup data
>     Session session; //not null values for primary and backup nodes
>     Member primary; //information about the primary node
>     Member backup; //information about the backup node
>  }

Burning a primary-secondary scheme into the API seems a little less 
general. I think you should assume there can be N backups, where N is 
usually 1. How do you handle locking with this API?

>  The LazyReplicatedHashMap overrides get(key) and put(id,session)
>
>So all the nodes will have a sessionId,ReplicatedEntry 
>combinations in their session map. But only two nodes will have the 
>actual data.
>This solution is for sticky LB only, but when failover happens, the 
>LB can pick any node as each node knows where to get the data.
>The newly selected node, will keep the backup node or select a new 
>one, and do a publish to the entire cluster of the locations.

Doesn't this mean you have to publish to the whole cluster at session 
creation? This will eventually limit scalability IMO.

andy 


Re: [feedback request] session replication

Posted by Filip Hanik - Dev Lists <de...@hanik.com>.

You can use a stupid one here too; just make sure it is sticky.

Tino Schwarze wrote:
> Hi Filip,
>
> On Fri, Mar 03, 2006 at 10:44:21AM -0600, Filip Hanik - Dev Lists wrote:
>   
>> I wrote together a little idea (also emailed to geronimo-dev for 
>> feedback) on how the next generation of session replication should be done.
>> Today's replication is an all-to-all replication, and within that realm, 
>> it's pretty poor. It creates way too much network traffic as each request 
>> results in X number of network transmits (where X is the number of nodes 
>> in the cluster/domain).
>>     
> [...]
>
> The drawback of your approach is that you still need an intelligent
> loadbalancer in front of the cluster. When using all-to-all replication
> you can use a stupid one, which simply schedules requests round-robin.
>
> Bye,
>
> Tino.
>



Re: [feedback request] session replication

Posted by Tino Schwarze <ti...@tisc.de>.

Hi Filip,

On Fri, Mar 03, 2006 at 10:44:21AM -0600, Filip Hanik - Dev Lists wrote:
> I wrote together a little idea (also emailed to geronimo-dev for 
> feedback) on how the next generation of session replication should be done.
> Today's replication is an all-to-all replication, and within that realm, 
> it's pretty poor. It creates way too much network traffic as each request 
> results in X number of network transmits (where X is the number of nodes 
> in the cluster/domain).
[...]

The drawback of your approach is that you still need an intelligent
load balancer in front of the cluster. With all-to-all replication
you can use a stupid one, which simply schedules requests round-robin.

Bye,

Tino.



Re: [feedback request] session replication

Posted by Leon Rosenberg <ro...@googlemail.com>.

Oops:

> If we would store the result-set (list of already created beans) in
> the session,
read:
If we would _NOT_ store the result-set (list of already created beans) in
 the session,

Sorry.
Leon


Re: [feedback request] session replication

Posted by Filip Hanik - Dev Lists <de...@hanik.com>.

For very large clusters, you use the same mechanism, except that instead 
of distributing the entire session map, the backup node info is stored 
in a cookie.

Filip


> If we would store the result-set (list of already created beans) in
> the session, we'd have to store them twice, once in the "cache" and
> once in request for presentation. However, a pluggable diff object
> would be great!
>   
How you store it, once or twice, is up to you and how you code your app. 
If you are storing it twice, that is a flaw in your programming, not 
in Tomcat or in the session replication.

> Btw, another point: The object/handler/whatever which decides whether
> a session create event should be distributed at all should be
> configurable/replaceable too.
> Background: most or at least many hardware load balancers use urls for
> service health monitoring. They do not send any cookies back, so in
> fact each heartbeat creates a new session. Our lb + failover lb are
> sending heartbeats each 8 seconds each. With session timeout of 30
> minutes we always have 450 active lb sessions on each server.
> Distributing those sessions should be considered spam and waste of
> network resources :-)
>   
As mentioned, you can choose the cookie or the distributed map. When you 
use the cookie logic, you still need to cancel out the primary cookie in 
case the LB reports a false positive.

> In case primary node knows, that it will go down in near future and
> should send all his users away, it could stop accepting new requests
> and redirect old users directly to the backup node(s). That way the
> performance risk of getting sessions cross over the network could be
> reduced.
>
> What do you think about it?
>   
This would be implementing code that is not needed. Solving a problem 
that is already solved doesn't help anyone.

>   
>> http://people.apache.org/~fhanik/kiss.html ;)
>>     
>
> I fully agree with the KISS principle, and follow it in my job and all
> projects, that's why we never use anything we don't need, like
> app-servers, or-mappers and such, until one proves, that using the
> thing make the live really easier and not complicated.
>
> Therefore I understand that implementing support for everything
> everyone need and keeping the code simple are contrarian goals, but
> making the code fine-grained and elements exchangeable wouldn't violate
> KISS, would it?
>   
It would only violate it if the features are never used and 
everyone sticks with the default; then you have code that serves no purpose.
I usually don't write the code until there is an expressed need for it, 
and I can't get that expressed need until people say they don't like the 
default impl.



Re: [feedback request] session replication

Posted by Leon Rosenberg <ro...@googlemail.com>.

comments inside

On 3/4/06, Filip Hanik - Dev Lists <de...@hanik.com> wrote:
> Leon Rosenberg wrote:
> > Hello Filip,
> >
> > very interesting proposal indeed. May I point you to some important
> > use-cases and ask whether your proposal has solutions for it.
> >
> > 1. Providing an easy way to customize diffs.
> >
> > As far as I understand customizable diffs are possible by a) patching
> > the StandardSession or b) configuring alternative SessionManager which
> > would create a wrapper to the StandardSession. Still I'd prefer an
> > easier way, for example a DiffCreator object inside the session which
> > can be replaced upon session creating.
> > The background of the question is following:
> > we have a large webfarm with huge traffic on it. Our sessions are
> > partly heavyweight, since we are using them for caching (we have much too
> > much memory in the tomcat and want to use it). For example we are
> > caching search results (very heavyweight) which are provided by a
> > third-party search engine. In case a user just switches the view or
> > navigate in cached parts of the search result, they are replied from
> > cache reducing load on the third-party system. In case of a server
> > crash and the failover to another server we would accept losing the
> > cached version in favour of reducing the traffic. Therefore a
> > customization would be very useful.
> >
> This scenario sounds like you shouldn't use the session as cache,
> implement a cache object that does the same thing.
> but point taken, you want a way to customize the diffs, my guess is that
> you could have a pluggable diff object that attaches to the session.
> This object can be configured through server.xml (global) or context.xml
> (per webapp)
>

If we would store the result-set (list of already created beans) in
the session, we'd have to store them twice, once in the "cache" and
once in request for presentation. However, a pluggable diff object
would be great!

Btw, another point: the object/handler/whatever that decides whether
a session create event should be distributed at all should be
configurable/replaceable too.
Background: most, or at least many, hardware load balancers use URLs for
service health monitoring. They do not send any cookies back, so in
fact each heartbeat creates a new session. Our LB and failover LB each
send a heartbeat every 8 seconds. With a session timeout of 30
minutes, we always have 450 active LB sessions on each server.
Distributing those sessions should be considered spam and a waste of
network resources :-)
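The 450 figure follows from simple arithmetic: a 30 minute timeout is 1800 s, each balancer probes every 8 s, so 1800 / 8 = 225 lingering sessions per balancer and 450 for two. The sketch below reproduces that and adds a hypothetical pluggable rule that skips replicating sessions created by a configured health-check path; `HealthCheckSessions` and `replicateCreate` are illustrative names, not an existing Tomcat hook.

```java
import java.util.function.Predicate;

/** Illustrative arithmetic and a hypothetical replication rule for health-check sessions. */
final class HealthCheckSessions {
    private HealthCheckSessions() {}

    /** Each probe creates a session that lingers until timeout, so count probes per timeout window. */
    static long steadyState(int balancers, int probeIntervalSeconds, int timeoutMinutes) {
        long probesPerWindow = (timeoutMinutes * 60L) / probeIntervalSeconds;
        return probesPerWindow * balancers;
    }

    /** Hypothetical pluggable rule: replicate a new session unless it came from the health-check path. */
    static Predicate<String> replicateCreate(String healthCheckPath) {
        return requestUri -> !requestUri.startsWith(healthCheckPath);
    }
}
```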

>
>
> > 2. Session sharing between different webapps.
> > Following use-case: As soon as the user submits personal information
> > it's sent over https instead of http to secure it from the
> > man-in-the-middle. Our https is handled by the loadbalancer
> > (hardware), so we aren't able to provide https for every user
> > operation. The application which is handling personal data contains
> > all the same classes as the primary application, but another
> > configuration, so it can be considered a different webapps. For
> > tracking purposes we need data from users primary session, which we
> > can't access in the https application. It would be very cool (and
> > actually a task in my bugzilla account at work) to provide a service
> > which could maintain a part of the session centralized and allow other
> > servers/webapps to get this sessions data.
> >
> easier way to do this would be to create a centralized cache, and put it
> in <tomcat>/shared/lib/
> This might be out of scope for Tomcat, and out of scope for replication
> for sure.

Forgot to mention that the webapps are running on different instances
in different service pools :-) What I had in mind was a kind of second
session cookie, which is set per domain (configurable), read out by
any webapp in the same domain and synchronized with a "central session
holder". But you're right, this is probably out of tomcat scope and
could be solved with a central service instance available over the
network and filters in webapps (or whatever). Point taken :-)

>
> > 3. Controlled failover:
> > In order to make soft-releases and maintenance (without logging out
> > the user) it would be very cool to transfer a session from one server
> > to another and back.
> > Use case:
> > The webfarm consists of two servers, A and B. Admin issues a command
> > to server A not to accept new sessions anymore. Server A (customizable
> > code of course) rewrites the loadbalancer cookie to point to server B.
> > User makes next request and comes to server B which then gets the
> > session from A. After all A sessions expire (or a timeout) server A
> > goes down for maintenance. After Server A is back up again and the
> > game continues with B.
> >
> This is automatic. It will happen exactly the way you describe. The way
> the LazyReplicatedMap works is as follows:
> 1. Backup node fails -> primary node chooses a new backup node
> 2. Primary node fails -> since Tomcat doesn't know which node the user
> will come to their
>    next http request, nothing is done.
>    When the user makes a request, and the session manager says
> LazyMap.getSession(id) and that session is not yet on the server,
>    the lazymap will request the session from the backup server, load it
> up, set this node as primary.
>    that is why it is called lazy, cause it wont load the session until
> it is actually needed, and because it doesn't know what node will become
> primary, this is decided by the load balancer.

Understood... that means that all Tomcats communicate with each other,
sending at least discovery requests on session create/destroy, which
you mentioned in the previous post.
I'm still not quite sure this can work efficiently without a central
place for session management, but you sound very confident.

One problem that I still see with your approach: in large clusters
(say 20 servers or more), the chances of a user landing on the backup
node are nil (well, 5.26%, which is pretty close to nil in a production
environment). This means that immediately after a primary node fails, a
lot of traffic between the backup node and the other nodes will take
place. In our use case, where we want to take down 10 of 20 servers for
release purposes, it can mean VERY much unnecessary traffic.
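The 5.26% figure is 1/19: with 20 nodes and the primary gone, 19 remain, and if the load balancer picks among them uniformly (an assumption) only one of them holds the backup. A small sketch of that arithmetic, with hypothetical method names:

```java
/** Illustrative odds, assuming the LB picks uniformly among the surviving nodes. */
final class FailoverOdds {
    private FailoverOdds() {}

    /** Probability the user lands on the one node that already holds the backup. */
    static double landsOnBackup(int clusterSize) {
        return 1.0 / (clusterSize - 1);
    }

    /** Expected number of sessions that must be fetched over the network after one primary fails. */
    static double expectedRemoteFetches(int sessionsOnFailedNode, int clusterSize) {
        return sessionsOnFailedNode * (1.0 - landsOnBackup(clusterSize));
    }
}
```

So for most sessions on the failed node, the new primary has to fetch from the backup; in the lazy scheme each such fetch happens on that session's next request rather than all at once.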

If the primary node knows that it will go down in the near future and
should send all its users away, it could stop accepting new requests
and redirect existing users directly to the backup node(s). That way, the
performance cost of pulling sessions across the network could be
reduced.

What do you think about it?

> http://people.apache.org/~fhanik/kiss.html ;)

I fully agree with the KISS principle and follow it in my job and all my
projects; that's why we never use anything we don't need, like
app servers, O/R mappers and such, until someone proves that using the
thing really makes life easier rather than more complicated.

Therefore I understand that supporting everything everyone needs
and keeping the code simple are conflicting goals, but
making the code fine-grained and its elements exchangeable wouldn't
violate KISS, would it?

>
> Filip
>

Leon


Re: [feedback request] session replication

Posted by Filip Hanik - Dev Lists <de...@hanik.com>.

Leon Rosenberg wrote:
> Hello Filip,
>
> very interesting proposal indeed. May I point you to some important
> use-cases and ask whether your proposal has solutions for it.
>
> 1. Providing an easy way to customize diffs.
>
> As far as I understand customizable diffs are possible by a) patching
> the StandardSession or b) configuring alternative SessionManager which
> would create a wrapper to the StandardSession. Still I'd prefer an
> easier way, for example a DiffCreator object inside the session which
> can be replaced upon session creating.
> The background of the question is following:
> we have a large webfarm with huge traffic on it. Our sessions are
> partly heavyweight, since we are using them for caching (we have much too
> much memory in the tomcat and want to use it). For example we are
> caching search results (very heavyweight) which are provided by a
> third-party search engine. In case a user just switches the view or
> navigate in cached parts of the search result, they are replied from
> cache reducing load on the third-party system. In case of a server
> crash and the failover to another server we would accept losing the
> cached version in favour of reducing the traffic. Therefore a
> customization would be very useful.
>   
This scenario sounds like you shouldn't use the session as a cache; 
implement a cache object that does the same thing instead.
But point taken: you want a way to customize the diffs. My guess is that 
you could have a pluggable diff object that attaches to the session.
This object could be configured through server.xml (global) or context.xml 
(per webapp).



> 2. Session sharing between different webapps.
> Following use-case: As soon as the user submits personal information
> it's sent over https instead of http to secure it from the
> man-in-the-middle. Our https is handled by the loadbalancer
> (hardware), so we aren't able to provide https for every user
> operation. The application which is handling personal data contains
> all the same classes as the primary application, but another
> configuration, so it can be considered a different webapps. For
> tracking purposes we need data from users primary session, which we
> can't access in the https application. It would be very cool (and
> actually a task in my bugzilla account at work) to provide a service
> which could maintain a part of the session centralized and allow other
> servers/webapps to get this sessions data.
>   
An easier way to do this would be to create a centralized cache and put it
in <tomcat>/shared/lib/.
This might be out of scope for Tomcat, and it is certainly out of scope for
replication.
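A rough sketch of such a centralized cache, assuming a hypothetical `SharedTrackingCache` class packaged in a jar under <tomcat>/shared/lib/. Because that jar is loaded once by the shared class loader, the static map is the same instance for every webapp in the same Tomcat JVM (it does not share anything across servers). Only JDK types are stored: a class that each webapp loads through its own class loader would be a different type in each webapp and cause ClassCastExceptions.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical cache meant to live in a jar under <tomcat>/shared/lib/.
// Loaded once by the shared class loader, so all webapps see the same DATA map.
final class SharedTrackingCache {
    // Keyed by a tracking id both webapps know (for example a cookie value).
    // Only JDK types (String) are stored, so no webapp-specific class crosses loaders.
    private static final Map<String, Map<String, String>> DATA = new ConcurrentHashMap<>();

    private SharedTrackingCache() {
    }

    static void put(String trackingId, String key, String value) {
        DATA.computeIfAbsent(trackingId, id -> new ConcurrentHashMap<>()).put(key, value);
    }

    static Optional<String> get(String trackingId, String key) {
        Map<String, String> entries = DATA.get(trackingId);
        return entries == null ? Optional.empty() : Optional.ofNullable(entries.get(key));
    }

    // Call when the primary session ends, so entries do not accumulate forever.
    static void remove(String trackingId) {
        DATA.remove(trackingId);
    }
}
```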

> 3. Controlled failover:
> In order to do soft releases and maintenance (without logging out
> the user) it would be very cool to transfer a session from one server
> to another and back.
> Use case:
> The webfarm consists of two servers, A and B. Admin issues a command
> to server A not to accept new sessions anymore. Server A (customizable
> code of course) rewrites the loadbalancer cookie to point to server B.
> The user makes the next request and comes to server B, which then gets the
> session from A. After all of A's sessions expire (or after a timeout), server A
> goes down for maintenance. After server A is back up again, the
> game continues with B.
>   
This is automatic. It will happen exactly the way you describe. The way
the LazyReplicatedMap works is as follows:
1. Backup node fails -> the primary node chooses a new backup node.
2. Primary node fails -> since Tomcat doesn't know which node the user
   will come to on their next HTTP request, nothing is done.
   When the user makes a request, the session manager calls
   LazyMap.getSession(id); if that session is not yet on this server,
   the lazy map requests the session from the backup server, loads it,
   and sets this node as primary.
   That is why it is called lazy: it won't load the session until it is
   actually needed, and it doesn't know in advance which node will become
   primary, because that is decided by the load balancer.
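A single-JVM simulation of the failover path just described, in Java. It is not the real LazyReplicatedMap: `Location`, `Node`, the shared `locations` table and the `liveNodes` map (standing in for cluster membership and Tribes messaging) are all hypothetical. A lookup on the primary touches nothing else; a lookup on any other node copies the data from the backup if needed and makes that node the primary. If the node taking over was itself the backup, it copies the session to another live member as the new backup; that rule is an assumption based on "keep the backup node or select a new one" in the original proposal.

```java
import java.util.HashMap;
import java.util.Map;

// Where a session lives; loosely modelled on ReplicatedEntry from the proposal.
class Location {
    String primary; // node holding the live copy
    String backup;  // node holding the backup copy

    Location(String primary, String backup) {
        this.primary = primary;
        this.backup = backup;
    }
}

// Hypothetical in-process cluster member. The shared "locations" map stands in
// for the location table every node keeps; "liveNodes" stands in for membership
// and for the network calls that Tribes would make.
class Node {
    final String name;
    final Map<String, String> localData = new HashMap<>(); // session id -> data held here
    private final Map<String, Location> locations;
    private final Map<String, Node> liveNodes;

    Node(String name, Map<String, Location> locations, Map<String, Node> liveNodes) {
        this.name = name;
        this.locations = locations;
        this.liveNodes = liveNodes;
        liveNodes.put(name, this);
    }

    // Lazy lookup: session data only moves when a request for it arrives here.
    String get(String id) {
        Location loc = locations.get(id);
        if (loc == null) {
            return null; // unknown session
        }
        if (loc.primary.equals(name)) {
            return localData.get(id); // normal sticky case: no network traffic
        }
        // Failover: the load balancer sent the request here, so this node takes over.
        String data = localData.get(id); // non-null if this node was the backup
        if (data == null) {
            Node backup = liveNodes.get(loc.backup);
            if (backup == null || backup.localData.get(id) == null) {
                return null; // primary and backup copies are both gone
            }
            data = backup.localData.get(id);
            localData.put(id, data);
        }
        loc.primary = name;
        if (loc.backup.equals(name)) {
            // This node was the backup; copy the session to another live member.
            // (If no other member is alive, the session stays without a separate backup.)
            for (Node other : liveNodes.values()) {
                if (!other.name.equals(name)) {
                    other.localData.put(id, data);
                    loc.backup = other.name;
                    break;
                }
            }
        }
        return data;
    }
}
```

In a real cluster the new location would also be published to all members; this sketch only updates the shared table.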
> best regards
> Leon
>
> P.S. One more question... If I understood you correctly, you plan to
> share the session with exactly one other server. I haven't found
> anything about a central replication service in your mail, so how does the
> server decide which node is its replication partner? Some kind of random
> algorithm or a broadcast request? Please enlighten me :-)
>
>   
The algorithm is pluggable: a Tomcat server A can decide to back up all
its sessions to the same node, Tomcat B, or Tomcat A can distribute its
sessions round robin to B, C, D, ...

Or a more sophisticated and complicated mechanism could be developed, but
it's probably not needed, and it wouldn't adhere to
http://people.apache.org/~fhanik/kiss.html ;)
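The two strategies mentioned (one fixed backup node, or round robin over the other members) can be sketched behind a single hypothetical `BackupSelector` interface; this is not an existing Tomcat API. The fallback in `FixedBackupSelector` when the preferred node has left the cluster is an assumption added for illustration.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical pluggable strategy: which member backs up a newly created session.
interface BackupSelector {
    String selectBackup(String sessionId, List<String> otherMembers);
}

// Tomcat A backs up every session to the same node, e.g. Tomcat B.
class FixedBackupSelector implements BackupSelector {
    private final String preferred;

    FixedBackupSelector(String preferred) {
        this.preferred = preferred;
    }

    @Override
    public String selectBackup(String sessionId, List<String> otherMembers) {
        if (otherMembers.contains(preferred)) {
            return preferred;
        }
        // Assumed fallback: use the first other member if the preferred one is gone.
        return otherMembers.isEmpty() ? null : otherMembers.get(0);
    }
}

// Tomcat A spreads its sessions over B, C, D, ... in turn.
class RoundRobinBackupSelector implements BackupSelector {
    private final AtomicInteger counter = new AtomicInteger();

    @Override
    public String selectBackup(String sessionId, List<String> otherMembers) {
        if (otherMembers.isEmpty()) {
            return null; // no one to back up to
        }
        int index = Math.floorMod(counter.getAndIncrement(), otherMembers.size());
        return otherMembers.get(index);
    }
}
```

For four consecutive new sessions with members B, C and D, the round-robin selector picks B, C, D, B; keeping the choice behind one interface is what makes the strategy swappable through configuration.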


Filip


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org

Re: [feedback request] session replication

Posted by Leon Rosenberg <ro...@googlemail.com>.

Hello Filip,

Very interesting proposal indeed. May I point you to some important
use cases and ask whether your proposal has solutions for them?

1. Providing an easy way to customize diffs.

As far as I understand, customizable diffs are possible by a) patching
the StandardSession or b) configuring an alternative SessionManager which
would create a wrapper around the StandardSession. Still, I'd prefer an
easier way, for example a DiffCreator object inside the session which
can be replaced upon session creation.
The background of the question is as follows:
we have a large web farm with huge traffic on it. Our sessions are
partly heavyweight, since we use them for caching (we have much too
much memory in Tomcat and want to use it). For example, we cache
search results (very heavyweight) which are provided by a
third-party search engine. If a user just switches the view or
navigates within cached parts of the search result, the pages are served
from the cache, reducing load on the third-party system. In case of a
server crash and failover to another server, we would accept losing the
cached version in favour of reducing the traffic. Therefore such a
customization would be very useful.

2. Session sharing between different webapps.
Following use-case: As soon as the user submits personal information
it's sent over https instead of http to secure it from the
man-in-the-middle. Our https is handled by the loadbalancer
(hardware), so we aren't able to provide https for every user
operation. The application which is handling personal data contains
all the same classes as the primary application, but another
configuration, so it can be considered a different webapps. For
tracking purposes we need data from users primary session, which we
can't access in the https application. It would be very cool (and
actually a task in my bugzilla account at work) to provide a service
which could maintain a part of the session centralized and allow other
servers/webapps to get this sessions data.

3. Controlled failover:
In order to do soft releases and maintenance (without logging out
the user) it would be very cool to transfer a session from one server
to another and back.
Use case:
The webfarm consists of two servers, A and B. Admin issues a command
to server A not to accept new sessions anymore. Server A (customizable
code of course) rewrites the loadbalancer cookie to point to server B.
The user makes the next request and comes to server B, which then gets the
session from A. After all of A's sessions expire (or after a timeout), server A
goes down for maintenance. After server A is back up again, the
game continues with B.

These are the three main use cases we would be very interested in. I
understand that none of them is solvable without internal webapp
knowledge, but they could be solved in a way that the webapp only needs
to provide/configure one or two custom handlers with a few lines of code
and doesn't have to reimplement everything each time.

Since we have to develop these three use cases either way, if the
Tomcat team is interested in supporting them too, we would gladly
participate in the development and do it "the Tomcat-conformant"
way, so it can be contributed.

best regards
Leon

P.S. One more question... If I understood you correctly, you plan to
share the session with exactly one other server. I haven't found
anything about a central replication service in your mail, so how does the
server decide which node is its replication partner? Some kind of random
algorithm or a broadcast request? Please enlighten me :-)

On 3/3/06, Filip Hanik - Dev Lists <de...@hanik.com> wrote:
> I wrote up a little idea (also emailed to geronimo-dev for
> feedback) on how the next generation of session replication should be done.
> Today's replication is an all-to-all replication, and within that realm,
> it's pretty poor. It creates way too much network traffic, as each request
> results in X network transmits (where X is the number of nodes
> in the cluster/domain).
>
> The suggested solution offers two advantages:
> 1. Each request with a dirty session should only result in 1 network
> send (unless session create, session delete, or failover)
> 2. The session Manager (StandardManager,DeltaManager,etc) should not
> have to know about replication, the DeltaManager today is too
> complicated and too involved in the replication logic.
>
> I propose a very simple, yet very efficient replication mechanism that
> is so straightforward that with a simple extension of the
> StandardSession (if not even built in) you can have session replication.
> It will even support future replication efforts, when we can have AOP
> monitor the sessions themselves for data that has changed without
> setAttribute/removeAttribute, and the implementation wouldn't change much.
>
> In my opinion, the Session API should not have to know about clustering
> or session replication, nor should it need to worry about location.
> The clustering API should take care of all of that.
>
> The solution that we plan to implement for Tomcat is fairly
> straightforward. Let me see if I can give an idea of why the API shouldn't need
> to worry; it's a little lengthy, but it shows that the Session and the
> SessionManager need to know nothing about clustering or session locations.
> (This is only one solution, and other solutions should demonstrate the
> same point: the Session API needs to know nothing about clustering or session
> locations.)
>
> 1. Requirements to be implemented by the Session.java API
>   boolean isDirty() - has the session changed in this request
>   boolean isDiffable() - is the session able to provide a diff
>   byte[] getSessionData() - returns the whole session
>   byte[] getSessionDiff() - optional, see isDiffable; resets the diff data
>   void setSessionDiff(byte[] diff) - optional, see isDiffable; applies
> changes from another node
>
> 2. Requirements to be implemented by the SessionManager.java API
>   void setSessionMap(HashMap map) - makes the map implementation pluggable
>
> 3. And the key to this is that we will have an implementation of a
> LazyReplicatedHashMap.
>   The key object in this map is the session id.
>   The map entry object is an object that looks like this:
>   class ReplicatedEntry {
>      String id;         // session id
>      boolean isPrimary; // does this node hold the data
>      boolean isBackup;  // does this node hold backup data
>      Session session;   // non-null on primary and backup nodes
>      Member primary;    // information about the primary node
>      Member backup;     // information about the backup node
>   }
>
>   The LazyReplicatedHashMap overrides get(key) and put(id, session).
>
> So all the nodes will have sessionId/ReplicatedEntry pairs
> in their session map, but only two nodes will have the actual data.
> This solution is for sticky LB only, but when failover happens, the LB
> can pick any node, as each node knows where to get the data.
> The newly selected node will keep the backup node or select a new one,
> and publish the new locations to the entire cluster.
>
> As you can see, all-to-all communication only happens when a session is
> (created|destroyed|failed over). Other than that, it is primary-to-backup
> communication only, and this can be in terms of diffs or entire sessions,
> using isDirty or getSessionDiff. This is triggered either by an interceptor
> at the end of each request or by a batch process for less network jitter
> but less accurate (though adequate) failover.
>
> What makes this possible is that Tribes will have true state
> replication as well as true RPC calls into the cluster.
>
> Positive thoughts, criticism and bashing are all welcome :)
> (remember that I work with the KISS principle)
> http://people.apache.org/~fhanik/kiss.html
>
> Filip
>

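The proposal's central claim, that a request touching a dirty session costs one network send to the backup node rather than one per cluster member, can be sketched as an end-of-request hook built only on the Session methods listed in the quoted proposal (isDirty, isDiffable, getSessionData, getSessionDiff). `ReplicableSession`, `Sender`, `ReplicationHook` and `ToySession` are hypothetical names for illustration; in Tomcat this logic would sit behind a valve or interceptor and use Tribes for the actual send.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// The Session requirements from the proposal as a Java interface (hypothetical name);
// setSessionDiff(byte[]) is omitted because only the sending side is shown.
interface ReplicableSession {
    String getId();
    boolean isDirty();        // has the session changed in this request
    boolean isDiffable();     // can the session provide a diff
    byte[] getSessionData();  // the whole session
    byte[] getSessionDiff();  // changes since the last call; resets the diff data
}

// Hypothetical transport callback standing in for a point-to-point cluster send.
interface Sender {
    void send(String member, String sessionId, boolean isDiff, byte[] payload);
}

// Called at the end of a request (or by a periodic batch job).
class ReplicationHook {
    private final Sender sender;

    ReplicationHook(Sender sender) {
        this.sender = sender;
    }

    // At most one send per request, and only to the session's backup node.
    void afterRequest(ReplicableSession session, String backupMember) {
        if (backupMember == null || !session.isDirty()) {
            return; // nothing changed, or no backup assigned
        }
        if (session.isDiffable()) {
            sender.send(backupMember, session.getId(), true, session.getSessionDiff());
        } else {
            sender.send(backupMember, session.getId(), false, session.getSessionData());
        }
    }
}

// Toy session with string attributes, just enough to drive the hook.
class ToySession implements ReplicableSession {
    private final String id;
    private final Map<String, String> attributes = new LinkedHashMap<>();
    private final Map<String, String> changed = new LinkedHashMap<>();

    ToySession(String id) {
        this.id = id;
    }

    void setAttribute(String name, String value) {
        attributes.put(name, value);
        changed.put(name, value);
    }

    @Override public String getId() { return id; }
    @Override public boolean isDirty() { return !changed.isEmpty(); }
    @Override public boolean isDiffable() { return true; }
    @Override public byte[] getSessionData() { return encode(attributes); }

    @Override
    public byte[] getSessionDiff() {
        byte[] diff = encode(changed);
        changed.clear(); // resetting the diff also clears the dirty flag
        return diff;
    }

    // Toy encoding for illustration; a real session would be serialized properly.
    private static byte[] encode(Map<String, String> map) {
        return map.toString().getBytes(StandardCharsets.UTF_8);
    }
}
```

A clean session produces no traffic at all, and a dirty one produces a single send addressed to its backup member, regardless of how many nodes are in the cluster.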