Posted to wadi-commits@incubator.apache.org by bd...@apache.org on 2005/12/14 23:36:16 UTC
svn commit: r356933 [30/35] - in /incubator/wadi/trunk: ./ etc/ modules/
modules/assembly/ modules/assembly/src/ modules/assembly/src/bin/
modules/assembly/src/conf/ modules/assembly/src/main/
modules/assembly/src/main/assembly/ modules/core/ modules/c...
Added: incubator/wadi/trunk/wadi-site/src/old_docs/distributable.2.html
URL: http://svn.apache.org/viewcvs/incubator/wadi/trunk/wadi-site/src/old_docs/distributable.2.html?rev=356933&view=auto
==============================================================================
--- incubator/wadi/trunk/wadi-site/src/old_docs/distributable.2.html (added)
+++ incubator/wadi/trunk/wadi-site/src/old_docs/distributable.2.html Wed Dec 14 15:32:56 2005
@@ -0,0 +1,1026 @@
+<html>
+ <head>
+ </head>
+ <body>
+ <h1>
+ Distributable J2EE Web Applications
+ <br/>
+ A Container Provider's View of the current Servlet Specification.
+ </h1>
+ The
+ <a href="http://java.sun.com/products/servlet/download.html#specs">
+ 'Java(tm) Servlet Specification, Version 2.4'
+ </a>
+ makes a number of references to 'distributable' web applications
+ and HttpSession 'migration'. It states that compliant deployments
+ "...can ensure scalability and quality of service features like
+ load-balancing and failover..." (SRV.7.7.2). In today's demanding
+ enterprise environments, such features are increasingly
+ required. This paper sets out to distil and understand the
+ relevant contents of the specification, construct a model of the
+ functionality that this seems to support, assess this
+ functionality with regard to feasibility and popular requirements
+ and finally make suggestions as to how a compliant implementation
+ might be architected.
+ <h2>
+ Prerequisites.
+ </h2>
+ TODO - A good understanding of what an HttpSession is, what it is
+ used for and how it behaves will be necessary for a full
+ understanding of this content. A comprehensive grasp of the
+ requirements driving architectures towards clustering and of
+ common cluster components (such as load-balancers) will also be
+ highly beneficial.
+ <h2>
+ The Servlet Specification - distilled:
+ </h2>
+ When a webapp declares itself <code>&lt;distributable/&gt;</code> it enters into a
+ contract with its container. The Servlet Specification includes a bare-bones
+ description of this contract, which we will distil from it and
+ flesh out in this paper.
+ <p/>
+ For a successful outcome the implementors of both Container and
+ Containee must agree on exactly what behaviour each expects of
+ the other. For a really deep understanding of the contract they will
+ need to know why it is as it is (TODO - This paper will provide such a
+ view, from both sides).
+ <p/>
+ The Specification mandates the following behaviour for distributable
+ Servlets:
+ <p/>
+
+ <h3>
+ Non-Distributable Servlets
+ </h3>
+ Only Servlets deployed within a webapp may be distributable. (TODO -
+ Ed.: is there any other standard way to deploy a Servlet? Perhaps
+ through the InvokerServlet?) (SRV.3.2) TODO - WHY?
+
+ <h3>
+ Single Threaded Servlets
+ </h3>
+ <code>SingleThreadModel</code> Servlets, whilst discouraged (since it is
+ generally more efficient for the Servlet writer, who understands the
+ problem domain, to deal with application synchronisation issues), are
+ limited to a single instance pool per JVM (SRV.2.3.3.1).
+
+ <h3>
+ Multi-Threaded Servlets
+ </h3>
+ Multithreaded HttpServlets are restricted to one Servlet
+ instance per JVM, thus delegating all application
+ synchronisation issues to a single point where the Servlet's
+ writer may resolve them with application-level
+ knowledge (SRV.2.2).
+
+ <h3>
+ Distributable State
+ </h3>
+ The only state to be distributed will be the HttpSession. Thus all
+ application state that requires distribution must be housed in an
+ HttpSession or alternative distributed resource (e.g. EJB, DB,
+ etc.). The contents of the ServletContext are NOT distributed.
+ (SRV.3.2, SRV.3.4.1, SRV.14.2.8)
+
+ <h3>
+ HttpSession Migration
+ </h3>
+
+ Moving HttpSessions across process boundaries (i.e. from JVM to
+ JVM, or JVM to store) is termed 'migration'. In order that the
+ container should know how to migrate application-space Objects
+ stored in an HttpSession, they must be of a mutually agreed type.
+
+ <p/>
+
+ In a J2EE (Version 1.4) environment (e.g. in a web container
+ embedded in an application server), the set of supported types
+ for HttpSession attributes is as follows (J2EE.6.4). Web
+ containers are free to extend this set, but note that using an
+ extended type would impact your webapp's portability.
+
+ <p/>
+
+ <ul>
+ <li><code>java.io.Serializable</code></li>
+ <li><code>javax.ejb.EJBObject</code></li>
+ <li><code>javax.ejb.EJBHome</code></li>
+ <li><code>javax.ejb.EJBLocalObject</code></li>
+ <li><code>javax.ejb.EJBLocalHome</code></li>
+ <li><code>javax.transaction.UserTransaction</code> (TODO ??)</li>
+ <li>"a <code>javax.naming.Context</code> object for the java:comp/env context" (TODO)</li>
+ </ul>
+ <p/>
+
+ Breaking this contract through use of an unagreed type may
+ result in the container throwing an
+ <code>IllegalArgumentException</code> upon its introduction to
+ the HttpSession, since the container must maintain the
+ migratability of this resource (SRV.7.7.2).
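+
+ <p/>
+
+ The check described above might be sketched as follows. This is
+ an illustration only: <code>CheckedSession</code> is a hypothetical
+ stand-in for a container's session class (not the real
+ <code>HttpSession</code>), and it accepts only
+ <code>java.io.Serializable</code> values, ignoring the EJB and JNDI
+ types listed above.
+

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a container's session; not the real HttpSession. */
class CheckedSession {
    private final Map<String, Object> attributes = new HashMap<>();

    /** Rejects any value that this sketch would not know how to migrate. */
    void setAttribute(String name, Object value) {
        if (value == null) {
            attributes.remove(name); // setting null is treated as removal
            return;
        }
        if (!(value instanceof Serializable)) {
            throw new IllegalArgumentException(
                "attribute '" + name + "' is not of a migratable type: "
                + value.getClass().getName());
        }
        attributes.put(name, value);
    }

    Object getAttribute(String name) {
        return attributes.get(name);
    }
}

class AttributeTypeCheckDemo {
    /** Returns true if a fresh session refused to store the value. */
    static boolean rejects(Object value) {
        try {
            new CheckedSession().setAttribute("x", value);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejects("a string"));   // String is Serializable
        System.out.println(rejects(new Object())); // a plain Object is not
    }
}
```

+
+ Note that an <code>instanceof</code> test only inspects the
+ attribute itself. It cannot tell whether everything reachable from
+ the attribute will serialise, so a failure may still surface later,
+ at migration time.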
+
+
+
+ <h3>
+ Migration Implementation
+ </h3>
+ How migration is actually implemented is undefined and left up to
+ the container provider (SRV.7.7.2). The application is not even
+ guaranteed that the container will use <code>readObject()</code>
+ and <code>writeObject()</code> (TODO explain) methods if they are
+ present on an attribute. The only guarantee given by the
+ specification is that their "serializable closure" will be
+ "preserved" (SRV.7.7.2). This is to allow the container provider
+ maximum flexibility in this area.
+
+ <h3>
+ HttpSessionActivationListener
+ </h3>
+ The specification describes an
+ <code>HttpSessionActivationListener</code> interface. Attributes
+ requiring notification before or after migration can implement
+ this. The container will call their <code>sessionWillPassivate()</code>
+ method just before passivation, thus giving them the chance to
+ e.g. release non-serialisable resources. Immediately after
+ activation the container will call their
+ <code>sessionDidActivate()</code> method, giving them the chance to
+ e.g. reacquire such resources (SRV.7.7.2, SRV.10.2.1, SRV.15.1.7,
+ SRV.15.1.8). Support for a number of other such listeners is
+ required in a compliant implementation, but these are not directly
+ related to session migration.
+
+ <h3>
+ HttpSession Affinity
+ </h3>
+ Given that:
+ <ul>
+ <li>
+ Multiple instances of a distributable webapp will be running
+ in multiple different JVMs within our proposed cluster
+ </li>
+ <li>
+ A client browser may throw multiple concurrent requests
+ for the same session at this cluster
+ </li>
+ <li>
+ The spirit of the specification and performance
+ requirements call for such a grouping of requests to be
+ processed concurrently, rather than serially,
+ </li>
+ </ul>
+ we can see that any implementation must resolve these
+ apparently contradictory issues satisfactorily.
+
+ <p/>
+
+ The Servlet Specification states:
+
+ <p/>
+
+ "All requests that are part of a session must be handled by
+ one Java Virtual Machine (JVM) at a time." (SRV.7.7.2).
+
+ <p/>
+
+ The intention of this statement is to resolve such
+ concurrency issues. It prunes the tree of possible
+ implementations substantially, insisting that all concurrent
+ requests for a particular session are delivered to the same
+ node.
+
+ <p/>
+
+ Delivering requests for the same session to the same node is
+ known variously as 'session affinity', 'sticky sessions',
+ 'persistent sessions' etc., depending on your container's
+ vendor. The specification is trading complexity in the
+ web-container tier for complexity in the load-balancer
+ tier. This added requirement will impact the latency of this
+ tier, in that the load-balancer will generally need to parse the
+ URI or headers of each HTTP request travelling through it (in a
+ non-encrypted form) in order to extract the target session
+ id. However, the reduction of potentially awkward concurrency
+ issues/race conditions in the web-container tier is a gain
+ considered worth this sacrifice.
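+
+ <p/>
+
+ To make the load-balancer's extra work concrete, here is a
+ minimal sketch of the parsing involved. The class and method names
+ are hypothetical. The <code>jsessionid</code> path parameter and
+ <code>JSESSIONID</code> cookie names come from the Servlet
+ Specification; the idea of routing on a "<code>.node</code>" suffix
+ is an assumption, modelled on the convention used with mod_jk.
+

```java
import java.util.Optional;

/** Hypothetical helper for a session-aware load-balancer. */
class SessionIdExtractor {
    /** Looks for a session id in the URI path parameter, then in the Cookie header. */
    static Optional<String> sessionId(String uri, String cookieHeader) {
        String marker = ";jsessionid=";
        int start = uri.indexOf(marker);
        if (start >= 0) {
            int from = start + marker.length();
            int end = from;
            // The id ends at the query string, fragment, next parameter or next segment.
            while (end < uri.length() && "?#;/".indexOf(uri.charAt(end)) < 0) {
                end++;
            }
            return Optional.of(uri.substring(from, end));
        }
        if (cookieHeader != null) {
            for (String pair : cookieHeader.split(";")) {
                String[] kv = pair.trim().split("=", 2);
                if (kv.length == 2 && kv[0].equals("JSESSIONID")) {
                    return Optional.of(kv[1]);
                }
            }
        }
        return Optional.empty();
    }

    /** Assumes ids of the form "<random>.<node>"; returns the node part if present. */
    static Optional<String> targetNode(String sessionId) {
        int dot = sessionId.lastIndexOf('.');
        return dot >= 0 && dot < sessionId.length() - 1
            ? Optional.of(sessionId.substring(dot + 1))
            : Optional.empty();
    }
}
```

+
+ A request with no recognisable session id yields an empty result,
+ in which case the load-balancer is free to pick any node.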
+
+ <p/>
+
+ It is worth noting that, since we have now introduced a
+ requirement for the load-balancer tier to have knowledge of
+ the location of HttpSessions within the web-container tier,
+ the ability to 'migrate' these objects may require a certain
+ amount of coordination between the two tiers.
+
+ <h3>
+ Background Threads
+ </h3>
+ <p/>
+
+ The previous requirement reduces our problem from race
+ conditions between distributed objects in different JVMs, to a
+ situation where we simply have to manage coordination between
+ multiple threads in the same JVM. The purpose of this
+ coordination is to ensure that access to container managed
+ resources that are available to multiple concurrent application
+ space threads is properly synchronised.
+
+ <p/>
+
+ Whilst the container has implicit knowledge about any thread
+ executing application code for whose lifecycle it is
+ responsible (i.e. request threads), it has no control over any
+ thread that is entirely managed by application code: a 'background
+ thread'. Such threads might execute across request boundaries,
+ accessing otherwise predictably dormant resources that might
+ otherwise be passivated or migrated elsewhere.
+
+ <p/>
+
+ Fortunately, the specification also recommends that references
+ to container-managed objects should not be given to threads that
+ have been created by an application (SRV.2.3.3.3, SRV.S.17) and
+ whose lifecycle is not entirely bounded by that of a request
+ thread. The container is encouraged to generate warnings if this
+ should occur. Application developers should understand that
+ recommendations such as this become all the more important when
+ working in a distributed environment.
+
+ <p/>
+
+ This concept of "container-managed objects" needs more careful
+ discussion and we shall look at it more closely later.
+
+ <h3>
+ HttpSession Events
+ </h3>
+
+ <p/>
+
+ Finally, given that HttpSessions are the only type to be
+ distributed and that they should only ever be in one JVM at one
+ time, it should come as no surprise that ServletContext and
+ HttpSession events are not propagated outside the JVM in which
+ they were raised (SRV.10.7) as this would result in container
+ owned objects becoming active in a JVM through which no relevant
+ request thread was passing.
+
+ <h2>
+ Is this adequate?
+ </h2>
+
+ Armed now with a deeper understanding of exactly what the
+ specification says about distributable webapps, we can begin to
+ speculate on what a compliant implementation might look like.
+
+ <p/>
+
+ The specification has done a reasonably good job of outlining our area
+ of interest. Before implementing a container, however, there are a
+ number of issues that we still need to address.
+
+ <h3>
+ Catastrophic failure
+ </h3>
+
+ <p/>
+
+ TODO -
+ Looking at what this specification actually says about
+ distributable webapps, it can be seen immediately that it seems
+ to reliably outline a mechanism for the controlled shutdown of a
+ node and the attendant migration of its sessions to [an]other
+ node[s], or persistent storage.
+
+ <p/>
+
+ The ability to migrate sessions on controlled shutdown is useful
+ functionality (maintenance will be one of the main reasons
+ behind the occurrence of session migration), but it does not go
+ far enough for many enterprise-level users, who require a
+ solution capable of transparent recovery, without data loss,
+ even in the case of a node's catastrophic failure. If a node is
+ simply switched off, thus having no chance to perform a shutdown
+ sequence, then volatile state will simply be lost. It is too
+ late to call <code>HttpSessionActivationListener.sessionWillPassivate()</code> where
+ necessary and serialise all user state to a safe place!
+ Container implementors must ask themselves the question - 'What,
+ within the bounds of the current specification, can we do to
+ mitigate this event?'.
+
+ <p/>
+
+ Before moving into more detailed discussion about session
+ migration we need to discuss the synchronisation of session
+ attributes and to introduce the concepts of 'Reference vs. Value
+ Based Semantics' and 'Object Identity'.
+
+ <h3>
+ Session Attribute Synchronisation
+ </h3>
+
+ <p/>
+
+ We have shown that there are many times at which a container may
+ wish to take a backup copy, via serialisation, of a session or
+ session attribute. In a multi-threaded environment the container
+ needs to be able to ensure a consistent view of the object that
+ it is backing up, i.e. the object must remain unchanged
+ throughout the process of serialisation, otherwise the backup
+ copy cannot be guaranteed valid.
+
+ <p/>
+
+ If we classify session attributes as "container-managed
+ objects", then we can see that the specification 'recommends'
+ their references not being given to any application thread
+ running beyond the scope of a request. This means that, provided
+ that no request threads for this session are running in the
+ container, we can be assured of thread-safe access to its
+ attributes and thus a consistent snapshot of the session's
+ state.
+
+ <p/>
+
+ If we classify sessions but not session attributes as
+ "container-managed objects", then this assumption breaks down.
+
+ <p/>
+
+ Even given this assumption, backing up of sessions when a
+ relevant request or background thread is running (e.g. 'When'
+ policies 'Immediate' and 'Request') becomes problematic. This is
+ unfortunate, because inability to implement these policies
+ impacts on the guarantees that the container can make and thus
+ the quality of service that it can offer.
+
+ <p/>
+
+ These issues are not isolated to the management of HttpSessions;
+ they are present throughout distributed software
+ architectures. Aside from an explicit synchronisation protocol, a
+ common and practical solution is to alter the semantics of object equality.
+
+ <p/>
+
+ Because the design of HttpSessions did not originally encompass their distributability:
+
+ <ul>
+ <li>
+ explicit session attribute synchronisation protocol between
+ application and container code.
+ </li>
+ <li>
+ shift from reference to value based semantics
+ </li>
+ </ul>
+
+ Object Identity is also an issue.
+
+
+
+
+ <h3>
+ Reference vs Value Based Semantics - TODO - needs refactoring.
+ </h3>
+
+ Given the following Servlet code snippet:
+
+ <pre>
+ Foo foo1=new Foo();
+ session.setAttribute("foo", foo1);
+ Foo foo2=(Foo)session.getAttribute("foo");
+ </pre>
+
+ Which of these assertions (assuming that <code>Foo.equals()</code>
+ is well implemented) would you expect to be true?
+
+ <ul>
+ <li>
+ <pre>
+ foo1==foo2;
+ </pre>
+ </li>
+ <li>
+ <pre>
+ foo1.equals(foo2);
+ </pre>
+ </li>
+ </ul>
+
+ <p/>
+
+ If you expect <code>foo1==foo2</code> then you are expecting
+ reference-based semantics.
+
+ <p/>
+
+ If you are expecting reference-based semantics you might well
+ write code such as this in order to avoid unnecessary
+ lookups in, and re-insertions into, the session's attribute table:
+
+ <pre>
+ Point p=new Point(0,0);
+ session.setAttribute("point", p);
+ p.setX(100);
+ p.setY(100);
+ </pre>
+
+ and then might expect that :
+
+ <pre>
+ ((Point)session.getAttribute("point")).getX()==100;
+ </pre>
+
+ <p/>
+
+ Using value-based semantics, out of these three (TODO)
+ assertions, only the second of the equality tests would succeed.
+
+ <p/>
+
+ Every parameter passed to and from a value based API must be
+ assumed to be copied from an original, since it may have come
+ across the wire from another address space.
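+
+ <p/>
+
+ The difference can be demonstrated with a small sketch.
+ <code>ValueSession</code> below is a hypothetical session that
+ copies every attribute on the way in and on the way out, using
+ standard Java serialisation, as if each value had crossed the
+ wire. <code>SessionPoint</code> is an illustrative attribute class
+ invented for this example.
+

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Illustrative attribute type with a well-behaved equals(). */
class SessionPoint implements Serializable {
    private static final long serialVersionUID = 1L;
    private int x;
    private int y;

    SessionPoint(int x, int y) { this.x = x; this.y = y; }
    int getX() { return x; }
    void setX(int x) { this.x = x; }

    @Override public boolean equals(Object o) {
        return o instanceof SessionPoint
            && ((SessionPoint) o).x == x && ((SessionPoint) o).y == y;
    }
    @Override public int hashCode() { return Objects.hash(x, y); }
}

/** Hypothetical session with value-based semantics: it stores and returns copies. */
class ValueSession {
    private final Map<String, byte[]> store = new HashMap<>();

    void setAttribute(String name, Serializable value) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
            out.flush();
            store.put(name, bytes.toByteArray());
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    Object getAttribute(String name) {
        byte[] data = store.get(name);
        if (data == null) {
            return null;
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

+
+ With such a session, an attribute fetched back is equal to, but
+ never the same object as, the one stored, and a change made to the
+ original after <code>setAttribute()</code> is not visible through
+ the session until the attribute is set again.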
+
+ <p/>
+
+ For this reason, when you start dealing with (possibly) remote
+ objects in a distributed scenario, you generally shift your
+ semantics from reference to value. (c.f. Remote EJB APIs)
+
+ <p/>
+
+ Unfortunately, the Servlet Specification, whilst clearly
+ mandating that every session attribute must be of a type that
+ the container knows how to move from VM to VM, omits to mention
+ that a possible impact of doing this is an important shift in
+ semantics. This is exacerbated by the fact that, unlike EJBs,
+ which have been designed specifically for distributed use, the
+ HttpSession API does not change (c.f. Local/Remote) according to
+ the semantic that is required, which is simply a single
+ deployment option. This encourages developers to believe that
+ they can make a webapp that has been written for Local use into
+ a fully functional distributed component, simply by adding the
+ relevant tag to the web.xml. All attendant problems are
+ delegated, by spec and developer, to the unfortunate container
+ provider.
+
+ <p/>
+
+ Thus the container provider must make a choice here:
+
+ <ul>
+ <li>
+ continue to support reference-based semantics in which case
+ migration may only occur when there are no active threads for
+ a session and there is an implicit contract between container
+ and containee that objects deriving from a session will not
+ have their references compared to objects deriving from
+ elsewhere, whose lifecycles may span across such periods.
+ </li>
+ <li>
+ make an explicit new contract that states that all interaction
+ with session attributes is by value and comparisons should
+ only be made in this way. The full ramifications of this
+ choice should become apparent as we progress further in this
+ paper.
+ </li>
+ </ul>
+
+ <h3>
+ Object Identity, Object Streams and Synchronisation
+ </h3>
+
+ <p/>
+
+ TODO - I guess Object Identity can only be preserved within a
+ single Object tree? If so, attribute-based distribution will not
+ recognise the same object shared between different attributes.
+ <p/>
+
+ How can we guarantee, unless we know that no other threads are
+ running, the synchronisation of values as we stream them out of
+ the container?
+
+ <h3>
+ Session Backup - When
+ </h3>
+
+ <p/>
+
+ The answer to the concern of lost data is to frequently ship
+ backup copies off-node, so that in the case of its catastrophic
+ failure, we have a fallback position. The freshness of our
+ backup data depends directly on the frequency of this
+ process. This frequency is bounded by resource concerns and the
+ contract between container and containee, as discussed above.
+
+ <p/>
+ Let us examine some of the possibilities:
+
+ <ul>
+ <li>
+ Immediate - As soon as a change is made to a session, it is
+ backed up.
+ <ul>
+ <li>
+ Most Accurate - This policy constrains our window of
+ data-loss as much as is reasonably possible.
+ </li>
+ <li>
+ Most Expensive - This accuracy has a cost. Every write to
+ a session object will result in expensive back up code
+ being triggered.
+ </li>
+ <li>
+ TODO - Without some agreement on value-based semantics or
+ attribute synchronisation, the container cannot guarantee
+ thread-safety as it serialises its backup.
+ </li>
+ </ul>
+ </li>
+ <li>
+ Request - All changes to a session are backed up at the end of
+ each relevant request.
+ <ul>
+ <li>
+ Less Accurate - This is less accurate than the 'Immediate'
+ policy described above, since a failure halfway through a
+ request thread would result in the loss of all changes
+ that it had made to its session.
+ </li>
+ <li>
+ Less Expensive - Since backups are only done at the end of
+ each request, they will be fewer than 'Immediate' mode,
+ resulting in much less expense.
+ </li>
+ <li>
+ Inconsistency - Assuming that the session is somehow in a
+ 'consistent' state at the end of each request is
+ misguided, since multiple requests may overlap. With
+ sufficient knowledge of the application, however, it may
+ be that this problem can be safely discounted.
+ </li>
+ <li>
+ Since there may be more application threads than just this
+ request running in the container, this policy suffers the
+ same synchronisation/semantic issues as 'Immediate',
+ although to a lesser extent.
+ </li>
+ </ul>
+ </li>
+ <li>
+ Request Group - All changes to a session are backed up as soon
+ as all associated active requests have been processed.
+ <ul>
+ <li>
+ Less Accurate - This policy will be less accurate still,
+ if requests for the same session overlap within the
+ container.
+ </li>
+ <li>
+ Less Expensive - For the above reason this will lead to it
+ being still less expensive.
+ </li>
+ <li>
+ Consistency - This policy guarantees that what it backs up
+ is a consistent view of the session's contents. No request
+ is only half processed.
+ </li>
+ <li>
+ Thread-Safe - The real win for this policy is that, with
+ no contract between containee and container regarding
+ attribute synchronisation/semantics, other than that
+ described in the Servlet Specification, the container may
+ access session attributes for backup, in the knowledge
+ that no other application code is concurrently modifying
+ them.
+ </li>
+ </ul>
+ </li>
+ <li>
+ WebApplication - Backup occurs in line with web application
+ lifecycle, i.e. not until the web application is
+ <code>stop()</code>-ed by its container.
+ <ul>
+ <li>
+ This policy gives no protection against catastrophic
+ failure, but is fine for maintenance-only scenarios.
+ </li>
+ <li>
+ There is no associated runtime overhead.
+ </li>
+ <li>
+ As with the 'Request Group' policy, there are no
+ consistency or synchronisation issues. All request and
+ background threads will have terminated before the session
+ is backed up.
+ </li>
+ </ul>
+ </li>
+ <li>
+ Timed - 'dirty' sessions are backed up at regular intervals,
+ orthogonal to the lifecycles of requests, web applications
+ etc...
+ <ul>
+ <li>
+ This might conceivably be useful to overlay on top of
+ e.g. the 'Request' policy if your request threads took a
+ long time to run, or the 'Request Group' policy if you
+ expected long periods without backing up because of many
+ overlapping requests for the same session.
+ </li>
+ <li>
+ A backup policy that worked in this manner would be of
+ conveniently tunable accuracy and impact.
+ </li>
+ <li>
+ However such a policy would suffer from all the
+ synchronisation and semantic issues common to 'Immediate'
+ and 'Request' approaches.
+ </li>
+ </ul>
+ </li>
+ </ul>
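+
+ <p/>
+
+ As an illustration of the 'Request Group' policy described above,
+ the following sketch counts the requests currently active for
+ each session and triggers a backup only when that count returns
+ to zero. <code>RequestGroupTracker</code> is a hypothetical class;
+ its <code>backUp()</code> method merely records the session id
+ where a real container would serialise and ship the session.
+

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical tracker: backs a session up when its last overlapping request ends. */
class RequestGroupTracker {
    private final Map<String, Integer> active = new HashMap<>();
    private final List<String> backedUp = new ArrayList<>();

    /** Called by the container as a request for the session enters. */
    synchronized void requestStarted(String sessionId) {
        active.merge(sessionId, 1, Integer::sum);
    }

    /** Called by the container as a request for the session leaves. */
    synchronized void requestFinished(String sessionId) {
        Integer count = active.get(sessionId);
        if (count == null) {
            throw new IllegalStateException("no active request for " + sessionId);
        }
        if (count > 1) {
            active.put(sessionId, count - 1); // group still open
        } else {
            active.remove(sessionId);
            backUp(sessionId); // no request thread is using the session now
        }
    }

    /** Placeholder: records the id instead of serialising the session. */
    private void backUp(String sessionId) {
        backedUp.add(sessionId);
    }

    synchronized List<String> backups() {
        return new ArrayList<>(backedUp);
    }
}
```

+
+ Because the backup runs while the tracker's lock is held, a new
+ request for any session waits until it completes. A real
+ implementation would probably want a lock per session; this sketch
+ uses a single lock for brevity. It also cannot see background
+ threads, which remain the application's responsibility.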
+
+ <p/>
+ TODO - NEEDS CONCLUDING
+ <p/>
+ <h3>
+ Session Backup - What?
+ </h3>
+
+ <p/>
+
+ Once we have decided when to back up, we must think about what to
+ back up. Candidates include the following:
+
+ <ul>
+ <li>
+ Session - We back up the whole session every time it changes.
+ <ul>
+ <li>
+ Simple - This is the simple, brute-force solution.
+ </li>
+ <li>
+ Expensive - This will be an expensive approach if there
+ are e.g. many attributes in your session and you regularly
+ touch one of them.
+ </li>
+ <li>
+ Robust - Even if a backup message went astray, perhaps due
+ to a failure in your transport, the next change to the
+ session would bring the off-node backup fully up to date.
+ </li>
+ <li>
+ Object identity may be maintained across the scope of the
+ whole session. If the same object appears multiple times
+ in your session, then this may be an important feature and
+ worth the price.
+ </li>
+ </ul>
+ </li>
+ <li>
+ Delta - Every time some change occurs to the session,
+ encapsulate this change in an object and back that up. Deltas
+ may be batched until the 'When' policy flushes this onto the
+ distribution layer.
+ <ul>
+ <li>
+ This is a more complex solution, requiring infrastructure
+ capable of detecting changes and classes to encapsulate
+ these.
+ </li>
+ <li>
+ This will be a more lightweight solution in the case of
+ sessions with many attributes and few changes but the
+ additional complexity might outweigh this benefit in the
+ case of a very small session.
+ </li>
+ <li>
+ This solution depends more upon the reliability of the
+ transport used to distribute the backups, since if one is
+ lost, the backup may become invalid. To be guaranteed
+ valid, it must contain the complete set of changes that
+ have occurred to the session, correctly ordered (TODO:
+ excepting some optimisation which we will discuss later).
+ </li>
+ <li>
+ Object identity may only be scoped within the delta. If
+ the same object appears multiple times in the same session,
+ both inside and outside the delta, object identity will
+ not be preserved and reference-based semantics will break
+ down.
+ </li>
+ </ul>
+ </li>
+ <li>
+ All Sessions - This, of course, can only be done if you select
+ 'WebApplication' as your 'When' policy.
+ <ul>
+ <li>
+ A major win in this strategy is that Object Identity may
+ be scoped across all sessions. i.e. if you have common
+ objects referenced by many different sessions,
+ reference-based semantics will hold for them, since all
+ sessions will be serialised together in a single
+ synchronised block.
+ </li>
+ </ul>
+ </li>
+ </ul>
+
+ <p/>
+
+ <h3>
+ HttpSessionActivationListener:
+ </h3>
+
+ <p/>
+
+ The Servlet Specification has one final curve ball to throw at
+ the Container Provider here. We have already seen how
+ <code>HttpSessionActivationListener</code>s are notified around
+ passivation/activation. Assuming that they require this in order
+ to prepare themselves for serialisation, or recover from
+ deserialisation, it is likely that when the container calls their
+ <code>sessionWillPassivate()</code> method they will move to a
+ new state that, whilst valid for serialisation, is invalid for
+ normal runtime operation. They might e.g. release a resource
+ that would be too expensive or awkward to passivate, knowing
+ that they can reacquire a replacement upon re-activation.
+
+ <p/>
+
+ Imagine now that, rather than migrating a session from one
+ node to another, we are simply taking a backup of it at the end
+ of a request group, as a guard against the node's
+ catastrophic failure. If we simply call
+ <code>sessionWillPassivate()</code> and then serialise a copy into our
+ backup store, we will have the backup that we required, but will
+ have left the attribute in a state which may be invalid
+ for normal operations.
+
+ <p/>
+
+ The solution is to call <code>sessionDidActivate()</code> immediately
+ after taking the copy, thus restoring the attribute to its
+ previous valid state. In effect the backup procedure may be
+ thought of as a mini-migration off a node and then straight back
+ onto it again, leaving a spare copy off-node.
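+
+ <p/>
+
+ This 'mini-migration' might be sketched as below. To keep the
+ example free of servlet dependencies, <code>ActivationAware</code>
+ is a simplified, hypothetical stand-in for
+ <code>HttpSessionActivationListener</code> whose methods take no
+ event argument, and <code>ResourceHolder</code> is an invented
+ attribute that owns a non-serialisable resource.
+

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Simplified, hypothetical stand-in for HttpSessionActivationListener. */
interface ActivationAware {
    void sessionWillPassivate();
    void sessionDidActivate();
}

/** Invented attribute that holds a resource which cannot be serialised. */
class ResourceHolder implements Serializable, ActivationAware {
    private static final long serialVersionUID = 1L;
    private transient Object resource = new Object(); // stands in for e.g. a socket
    int passivations;
    int activations;

    public void sessionWillPassivate() { resource = null; passivations++; }
    public void sessionDidActivate() { resource = new Object(); activations++; }
    boolean usable() { return resource != null; }
}

class SessionBackup {
    /** Takes a serialised copy, then restores the attribute to its runtime state. */
    static byte[] backUp(Serializable attribute) {
        ActivationAware aware =
            attribute instanceof ActivationAware ? (ActivationAware) attribute : null;
        if (aware != null) {
            aware.sessionWillPassivate(); // "off the node"
        }
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(attribute);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        } finally {
            if (aware != null) {
                aware.sessionDidActivate(); // "and straight back on again"
            }
        }
    }
}
```

+
+ Placing the re-activation in a <code>finally</code> block is a
+ design choice of this sketch: it ensures the live attribute is
+ usable again even if serialisation fails part-way through.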
+
+ <p/>
+
+ This has interesting ramifications for the whole 'Session'
+ backup policy, which may end up doing this to many attributes
+ which have not actually been added or altered since the last
+ backup was taken. If this involves an expensive release and
+ reacquisition of resources, the impact may be substantial. The
+ 'Delta' policy will not suffer from this inefficiency, since it
+ will only concern itself with attributes that have changed.
+
+ <h3>
+ Optimisations
+ </h3>
+
+ <p/>
+
+ TODO - These need to be discussed here so that we can draw upon
+ them when discussing different impls.
+
+ <ul>
+ <li>
+ lastAccessedTime - don't distribute, ignore until the last
+ minute? <p/> If a node dies, all sessions upon it must be considered
+ just touched, since a request thread for them may have caused
+ the crash. The time of node death should be noted and associated
+ with these sessions. Upon rehydration this is the lastAccessedTime
+ value that they should adopt.
+
+ <p/>
+
+ TODO - perhaps move this into previous section - a naive
+ impl may..., but: maybe not - think about it...
+
+ Unfortunately, the specification requires that every session
+ object carries a 'LastAccessedTime' value, which is updated
+ every time the session is retrieved by an application thread
+ for reading or writing. Thus any request requiring stateful
+ interaction within the webapp will have the side effect of
+ writing a change to the session. Taken literally, these
+ changes can be very expensive in a distributable scenario, as
+ a naive implementation will require each such change to be
+ exported to another VM in case of catastrophic node failure.
+ </li>
+
+ <li>
+ Batching of deltas - compression of e.g. rem(X);set(X) or
+ set(X);set(X) etc..
+ </li>
+ <li>
+
+ TODO - lowering contention on session table...
+
+ DefaultServlet is stateless (session will never be fetched) so
+ can be excluded from the equation. This means that requests for
+ images etc. can be excluded from concurrent request groups,
+ reducing them substantially. A smart load-balancer would know about
+ this and could relax affinity as well (although a caching tier might
+ be serving this content before you hit the load-balancer - this tier
+ also needs coordination with the web-container tier so it can be
+ selectively flushed upon webapp redeployment).
+ </li>
+ </ul>
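+
+ <p/>
+
+ The batching optimisation listed above can be sketched as a map
+ in which a later change to an attribute simply supersedes any
+ earlier pending change to the same attribute. <code>DeltaBatch</code>
+ is a hypothetical class invented for this example, and it assumes
+ that deltas are whole-attribute sets and removals.
+

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical batch of attribute deltas awaiting the next 'When' flush. */
class DeltaBatch {
    // A present value marks a set; Optional.empty() marks a removal.
    private final Map<String, Optional<Object>> pending = new LinkedHashMap<>();

    /** Records set(name); replaces any earlier pending change to the same name. */
    void set(String name, Object value) {
        if (value == null) {
            remove(name); // setting null is treated as removal
            return;
        }
        pending.put(name, Optional.of(value));
    }

    /** Records rem(name); replaces any earlier pending change to the same name. */
    void remove(String name) {
        pending.put(name, Optional.empty());
    }

    int size() {
        return pending.size();
    }

    /** Hands the compressed batch to the caller and starts a new, empty one. */
    Map<String, Optional<Object>> flush() {
        Map<String, Optional<Object>> batch = new LinkedHashMap<>(pending);
        pending.clear();
        return batch;
    }
}
```

+
+ Here rem(X);set(X) collapses to a single set(X), and set(X);set(X)
+ to the last set(X). Note that set(X);rem(X) still leaves a removal
+ in the batch, since the off-node copy may hold an older X.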
+
+ <h3>
+ Conclusions
+ </h3>
+
+ <p/>
+
+ In conclusion, we have been able to show that, whilst the spec
+ does not explicitly cater for recovery from catastrophic
+ failure, it does provide the Container Provider with enough
+ structure to be able to implement various solutions to this
+ problem. Unfortunately, because of its implicit reference-based
+ semantics and failure to impose a mandatory protocol for the
+ synchronisation of distributable session attributes, it does not
+ go quite far enough to allow such implementations sufficient room
+ to manoeuvre that they can deliver optimum results.
+
+ <p/>
+
+ Given the current state of the specification, therefore, the
+ solution space is an area of trade-off and compromise between
+ not only accuracy and economy, as might well be expected, but also
+ the semantics of reference, value and identity, which are really
+ areas that should not be open to interpretation. Any application
+ developer involving themselves in this would therefore be well
+ advised to acquaint themselves fully with these concepts so as
+ to be prepared for the unexpected side-effects that they are
+ likely to cause.
+
+ <p/>
+
+ No single set of the above policies is likely to implement the
+ desired "silver bullet". However, with a solid understanding of
+ the issues involved and the route that various implementations
+ take through this maze, the application architect has a much
+ improved chance of a successful outcome.
+
+ <p/>
+
+ With this in mind, we may now survey existing implementations in
+ the open source arena, with particular respect to the solutions
+ that they have chosen to overcome the problems that we have
+ identified.
+
+ <h2>
+ Current Open Source Implementations
+ </h2>
+
+ <h3>
+ <a href="http://www.sf.net/projects/jetty">Jetty</a>
+ </h3>
+
+ The Jetty distribution contains a pluggable distributable session
+ manager, written by the author, which relies on value-based
+ semantics to implement an immediate (by default), delta-based
+ replication strategy over JGroups (TODO - link), although other
+ distribution policies, notably by CMP EJB and JBoss(tm) clustering
+ (see below), are also available.
+
+ With mod_jk and session affinity, via a pluggable session id
+ generator, backups may be done asynchronously, or if deployment
+ happens under a dumb load-balancer, backups may be taken
+ synchronously to ensure the consistency of the session no matter
+ where in the cluster a request lands.
+
+ <h3>
+ Tomcat 4.x - Filip Hanik
+ </h3>
+
+ <h3>
+ Tomcat 5.x - ???
+ </h3>
+
+ <h3>
+ <a href="http://www.sf.net/projects/jboss">JBoss(tm)</a>
+ </h3>
+
+ JBoss(tm) contains a ClusteredHttpSession service (TODO - check
+ name) which backs onto the JBoss clustering layer, which is
+ implemented through replication using JGroups (TODO -
+ link). Replication is done according to the whole 'Session'
+ policy. The regularity of the backing up depends on the Web
+ Container making use of the service.
+
+ <p/>
+
+ The Jetty integration provides a pluggable 'Store' component which
+ allows it to make use of this medium. Backups are taken
+ immediately any change is made to a session. Jetty's other
+ distribution policies may also be used.
+
+ <p/>
+
+ The Tomcat integration relies on this service for its
+ transport. (TODO - finish). CONFIRM.
+
+ <h3>
+ Apache Geronimo (TODO - URL)
+ </h3>
+
+ The Geronimo implementation is currently being undertaken by the
+ author of this paper, and therefore takes into account all points
+ raised herein.
+
+ <h2>
+ Further Reading:
+ </h2>
+
+ <ul>
+ <li>
+ SRV.7 Sessions
+ </li>
+ <li>
+ SRV.7.6 Last Accessed Times
+ </li>
+ <li>
+ SRV.15.1.7 HttpSession
+ </li>
+ <li>
+ SRV.15.1.8 HttpSessionActivationListener
+ </li>
+ <li>
+ SRV.15.1.9 HttpSessionAttributeListener
+ </li>
+ <li>
+ SRV.15.1.10 HttpSessionBindingEvent
+ </li>
+ <li>
+ SRV.15.1.11 HttpSessionBindingListener
+ </li>
+ <li>
+ SRV.15.1.13 HttpSessionEvent
+ </li>
+ <li>
+ SRV.15.1.14 HttpSessionListener
+ </li>
+ </ul>
+
+
+ TODO - more readings needed.
+
+ <ul>
+ <li>
+ TODO: URL FOR J2EE spec
+ </li>
+ </ul>
+
+
+ <h3>
+ Further Issues
+ </h3>
+ <ul>
+ <li>
+ ClassLoading - ?? where does this impact ?
+ </li>
+ </ul>
+
+ <h3>
+ Further Notes
+ </h3>
+ TODO - Look into Geronimo impl... SRV.10.6 Listener Exceptions
+
+ TODO - we can use a SecurityManager to prevent background threads
+ being created. We can prevent access from such a thread to a
+ container managed object, but we can't prevent such a reference
+ being held by such a thread...
+
+ <p/>
+
+ Does anything other than the session need to be distributed ?
+
+ <ul>
+ <li>security info</li>
+ <li>application level data (as opposed to user level)</li>
+ <li>etc</li>
+ </ul>
+
+ <p/>
+
+ TODO - replication is faster than shared-store because
+ 'getAttribute' is not a remote call. Effectively, with
+ replication, each replicant IS a shared store which processes
+ requests locally.
+
+ <p/>
+
+ have we mentioned that migration is bad because cache hits go
+ down ?
+
+ <p/>
+
+ Q for Niall - if the Object Identity table is scoped within a single
+ ObjectOutputStream, then how come contention on this table is
+ meant to be a VM-wide problem ? Since the table only appears to
+ scope the lifecycle of a single instance of this stream, how
+ could it make sense for it to be a long-lived global construct?
+ </body>
+</html>
+
Added: incubator/wadi/trunk/wadi-site/src/old_docs/distributable.html
URL: http://svn.apache.org/viewcvs/incubator/wadi/trunk/wadi-site/src/old_docs/distributable.html?rev=356933&view=auto
==============================================================================
--- incubator/wadi/trunk/wadi-site/src/old_docs/distributable.html (added)
+++ incubator/wadi/trunk/wadi-site/src/old_docs/distributable.html Wed Dec 14 15:32:56 2005
@@ -0,0 +1,1210 @@
+<html>
+ <head>
+ </head>
+ <body>
+ <h1>
+ Distributable J2EE Web Applications
+ <br/>
+ A Container Provider's View of the current Servlet Specification.
+ </h1>
+ The
+ <a href="http://java.sun.com/products/servlet/download.html#specs">
+ 'Java(tm) Servlet Specification, Version 2.4'
+ </a>
+ makes a number of references to 'distributable' web applications
+ and httpsession 'migration'. It states that compliant deployments
+ "...can ensure scalability and quality of service features like
+ load-balancing and failover..." (SRV.7.7.2). In today's demanding
+ enterprise environments, such features are increasingly
+ required. This paper sets out to distil and understand the
+ relevant contents of the specification, construct a model of the
+ functionality that this seems to support, assess this
+ functionality with regard to feasibility and popular requirements
+ and finally make suggestions as to how a compliant implementation
+ might be architected.
+ <h2>
+ Prerequisites.
+ </h2>
+ TODO - A good understanding of what an HttpSession is, what it is
+ used for and how it behaves will be necessary for a full
+ understanding of this content. A comprehensive grasp of the
+ requirements driving architectures towards clustering and of
+ common cluster components (such as load-balancers) will also be
+ highly beneficial.
+ <h2>
+ The Servlet Specification - distilled:
+ </h2>
+ When a webapp declares itself <code>&lt;distributable/&gt;</code> it enters into a
+ contract with its container. The Servlet Specification includes a dry
+ bones description of this contract which we will distil from it and
+ flesh out in this paper.
+ <p/>
+ For a successful outcome the implementors of both Container and
+ Containee need to be agreed on exactly what behaviour is expected of
+ each other. For a really deep understanding of the contract they will
+ need to know why it is as it is (TODO - This paper will provide such a
+ view, from both sides).
+ <p/>
+ The Specification mandates the following behaviour for distributable
+ Servlets:
+ <p/>
+
+ <h3>
+ Non-Distributable Servlets
+ </h3>
+ Only Servlets deployed within a webapp may be distributable. (TODO -
+ Ed.: is there any other standard way to deploy a Servlet? Perhaps
+ through the InvokerServlet?) (SRV.3.2) TODO - WHY?
+
+ <h3>
+ Single Threaded Servlets
+ </h3>
+ <code>SingleThreadModel</code> Servlets, whilst discouraged (since it is
+ generally more efficient for the Servlet writer, who understands the
+ problem domain, to deal with application synchronisation issues), are
+ limited to a single instance pool per JVM (SRV.2.3.3.1).
+
+ <h3>
+ Multi-Threaded Servlets
+ </h3>
+ Multithreaded HttpServlets are restricted to one Servlet
+ instance per JVM, thus delegating all application
+ synchronisation issues to a single point where the Servlet's
+ writer may resolve them with application-level
+ knowledge (SRV.2.2).
+
+ <h3>
+ Distributable State
+ </h3>
+ The only state to be distributed will be the HttpSession. Thus all
+ application state that requires distribution must be housed in an
+ HttpSession or alternative distributed resource (e.g. EJB, DB,
+ etc.). The contents of the ServletContext are NOT distributed.
+ (SRV.3.2, SRV.3.4.1, SRV.14.2.8)
+
+ <h3>
+ HttpSession Migration
+ </h3>
+
+ Moving HttpSessions across process boundaries (i.e. from JVM to
+ JVM, or JVM to store) is termed 'migration'. In order that the
+ container should know how to migrate application-space Objects
+ stored in an HttpSession, they must be of a mutually agreed type.
+
+ <p/>
+
+ In a J2EE (Version 1.4) environment (e.g. in a web container
+ embedded in an application server), the set of supported types
+ for HttpSession attributes is as follows, although web
+ containers are free to extend this set (J2EE.6.4): (Note that
+ using an extended type would impact your webapp's portability).
+
+ <p/>
+
+ <ul>
+ <li><code>java.io.Serializable</code></li>
+ <li><code>javax.ejb.EJBObject</code></li>
+ <li><code>javax.ejb.EJBHome</code></li>
+ <li><code>javax.ejb.EJBLocalObject</code></li>
+ <li><code>javax.ejb.EJBLocalHome</code></li>
+ <li><code>javax.transaction.UserTransaction</code> (TODO ??)</li>
+ <li>"a <code>javax.naming.Context</code> object for the java:comp/env context" (TODO)</li>
+ </ul>
+ <p/>
+
+ Breaking this contract through use of an unagreed type will
+ result in the container throwing an
+ <code>IllegalArgumentException</code> upon its introduction to
+ the HttpSession, since the container must maintain the
+ migratability of this resource (SRV.7.7.2).
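The type check described above can be sketched in a few lines. This is a minimal illustration, not any container's actual code: the class name `MigratableCheck` and the method `requireMigratable` are hypothetical, and only `java.io.Serializable` is tested so that the sketch needs no J2EE classes on the classpath. A real container would also accept the EJB and JNDI types listed above.

```java
import java.io.Serializable;

/** Hypothetical helper: rejects attribute values the container could not migrate. */
final class MigratableCheck {
    private MigratableCheck() {}

    /**
     * Returns the value unchanged if it is of an agreed type, otherwise throws.
     * Assumption: only java.io.Serializable is checked; the EJB and JNDI types
     * from the list above are left out to keep the sketch dependency-free.
     */
    static Object requireMigratable(String name, Object value) {
        if (value != null && !(value instanceof Serializable)) {
            throw new IllegalArgumentException(
                "attribute '" + name + "' is not of a migratable type: "
                    + value.getClass().getName());
        }
        return value;
    }

    public static void main(String[] args) {
        requireMigratable("greeting", "a String is Serializable");
        try {
            requireMigratable("handle", new Object()); // plain Object is not Serializable
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A container following this approach would run the check inside its `setAttribute()` implementation, before the value ever becomes part of the session.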
+
+
+ <h3>
+ Migration Implementation
+ </h3>
+ How migration is actually implemented is undefined and left up to
+ the container provider (SRV.7.7.2). The application is not even
+ guaranteed that the container will use <code>readObject()</code>
+ and <code>writeObject()</code> (TODO explain) methods if they are
+ present on an attribute. The only guarantee given by the
+ specification is that their "serializable closure" will be
+ "preserved" (SRV.7.7.2). This is to allow the container provider
+ maximum flexibility in this area.
+
+ <h3>
+ HttpSessionActivationListener
+ </h3>
+ The specification describes an
+ <code>HttpSessionActivationListener</code> interface. Attributes
+ requiring notification before or after migration can implement
+ this. The container will call their
+ <code>sessionWillPassivate()</code> method just before passivation,
+ thus giving them the chance to e.g. release non-serialisable
+ resources. Immediately after activation the container will call
+ their <code>sessionDidActivate()</code> method, giving them the
+ chance to e.g. reacquire such resources (SRV.7.7.2, SRV.10.2.1,
+ SRV.15.1.7, SRV.15.1.8). Support for a number of other such
+ listeners is required in a compliant implementation, but these are
+ not directly related to session migration.
+
+ <h3>
+ HttpSession Affinity
+ </h3>
+ Given that:
+ <ul>
+ <li>
+ Multiple instances of a distributable webapp will be running
+ in multiple different JVMs within our proposed cluster
+ </li>
+ <li>
+ A client browser may throw multiple concurrent requests
+ for the same session at this cluster
+ </li>
+ <li>
+ The spirit of the specification and performance
+ requirements call for such a grouping of requests to be
+ processed concurrently, rather than serially,
+ </li>
+ </ul>
+ we can see that any implementation must resolve these
+ apparently contradictory issues satisfactorily.
+
+ <p/>
+
+ The Servlet Specification states:
+
+ <p/>
+
+ "All requests that are part of a session must be handled by
+ one Java Virtual Machine (JVM) at a time." (SRV.7.7.2).
+
+ <p/>
+
+ The intention of this statement is to resolve such
+ concurrency issues. It prunes the tree of possible
+ implementations substantially, insisting that all concurrent
+ requests for a particular session are delivered to the same
+ node.
+
+ <p/>
+
+ Delivering requests for the same session to the same node is
+ known variously as 'session affinity', 'sticky sessions',
+ 'persistent sessions' etc., depending on your container's
+ vendor. The specification is trading complexity in the
+ web-container tier for complexity in the load-balancer
+ tier. This added requirement will impact the latency of this
+ tier, in that the load-balancer will generally need to parse the
+ uri or headers of each http request travelling through it (in a
+ non-encrypted form) in order to extract the target session
+ id. However, the reduction of potentially awkward concurrency
+ issues/race conditions in the web-container tier is a gain
+ considered worth this sacrifice.
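The parsing work that affinity pushes onto the load-balancer tier might look roughly like the sketch below. The `;jsessionid=` path parameter and the `JSESSIONID` cookie are the names used by servlet containers for session tracking; everything else here is an assumption made for illustration. In particular the class `SessionIdExtractor` is hypothetical, and the hash-based `nodeFor` mapping is just the simplest stable choice, not something the specification or any particular load-balancer prescribes.

```java
/** Hypothetical load-balancer helper: finds the session id a request is bound to. */
final class SessionIdExtractor {
    private SessionIdExtractor() {}

    /** Extracts the id from a URL-rewritten path such as /cart;jsessionid=abc123?x=1, or returns null. */
    static String fromUri(String uri) {
        String marker = ";jsessionid=";
        int start = uri.indexOf(marker);
        if (start < 0) return null;
        start += marker.length();
        int end = start;
        // The id runs until the next path parameter, query string, fragment or path segment.
        while (end < uri.length() && ";?#/".indexOf(uri.charAt(end)) < 0) end++;
        return uri.substring(start, end);
    }

    /** Extracts the id from a Cookie header value such as "lang=en; JSESSIONID=abc123", or returns null. */
    static String fromCookieHeader(String header) {
        if (header == null) return null;
        String prefix = "JSESSIONID=";
        for (String pair : header.split(";")) {
            String p = pair.trim();
            if (p.startsWith(prefix)) return p.substring(prefix.length());
        }
        return null;
    }

    /** Assumed routing rule: the same id always maps to the same node index. */
    static int nodeFor(String sessionId, int nodeCount) {
        return Math.floorMod(sessionId.hashCode(), nodeCount);
    }

    public static void main(String[] args) {
        String id = fromUri("/shop/cart;jsessionid=abc123?item=7");
        System.out.println("session " + id + " goes to node " + nodeFor(id, 3));
    }
}
```

Note that both extraction methods need the request in clear text, which is why the paragraph above points out that the load-balancer must see the request in a non-encrypted form.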
+
+ <p/>
+
+ It is worth noting that, since we have now introduced a
+ requirement for the load-balancer tier to have knowledge of
+ the location of httpsessions within the web-container tier,
+ the ability to 'migrate' these objects may, therefore,
+ require a certain amount of coordination between the two
+ tiers.
+
+ <h3>
+ Background Threads
+ </h3>
+ <p/>
+
+ The previous requirement reduces our problem from race
+ conditions between distributed objects in different JVMs, to a
+ situation where we simply have to manage coordination between
+ multiple threads in the same JVM. The purpose of this
+ coordination is to ensure that access to container managed
+ resources that are available to multiple concurrent application
+ space threads is properly synchronised.
+
+ <p/>
+
+ Whilst the container has implicit knowledge of any thread
+ executing application code whose lifecycle it is responsible for
+ (i.e. request threads), it has no control over any thread that is
+ entirely managed by application code, i.e. a background
+ thread. Such threads might execute across request boundaries,
+ accessing resources that would otherwise be predictably dormant and
+ might be passivated or migrated elsewhere.
+
+ <p/>
+
+ Fortunately, the specification also recommends that references
+ to container-managed objects should not be given to threads that
+ have been created by an application (SRV.2.3.3.3, SRV.S.17) and
+ whose lifecycle is not entirely bounded by that of a request
+ thread. The container is encouraged to generate warnings if this
+ should occur. Application developers should understand that
+ recommendations such as this become all the more important when
+ working in a distributed environment.
+
+ <p/>
+
+ We shall take "container-managed objects" to include any object
+ that has been placed into an httpsession. By virtue of this
+ placement, its lifecycle is now the responsibility of the
+ encompassing session and ultimately therefore of the
+ container. This is a useful constraint, since the
+ container-provider may now prove that, provided that there are
+ no request threads active for a session within the container,
+ entirely thread-safe access may be made not only to an
+ httpsession but also to its attributes, even though the locking
+ scheme of these attributes, which are application-space
+ components, is completely unknown to the container-provider.
+
+ <h3>
+ HttpSession Events
+ </h3>
+
+ <p/>
+
+ Finally, given that HttpSessions are the only type to be
+ distributed and that they should only ever be in one JVM at one
+ time, it should come as no surprise that ServletContext and
+ HttpSession events are not propagated outside the JVM in which
+ they were raised (SRV.10.7) as this would result in container
+ owned objects becoming active in a JVM through which no relevant
+ request thread was passing.
+
+ <h2>
+ Is this adequate ?
+ </h2>
+
+ Armed now with a deeper understanding of exactly what the
+ specification says about distributable webapps, we can begin to
+ speculate on what a compliant implementation might look like.
+
+ <p/>
+
+ The specification has done a reasonably good job of outlining our area
+ of interest. Before implementing a container, however, there are a
+ number of issues that we still need to address.
+
+ <h3>
+ Catastrophic failure
+ </h3>
+
+ <p/>
+
+ TODO -
+ Looking at what the specification actually says about
+ distributable webapps, it can be seen immediately that it seems
+ to reliably outline a mechanism for the controlled shutdown of a
+ node and the attendant migration of its sessions to one or more
+ other nodes, or to persistent storage.
+
+ <p/>
+
+ The ability to migrate sessions on controlled shutdown is useful
+ functionality (maintenance will be one of the main reasons
+ behind the occurrence of session migration), but it does not go
+ far enough for many enterprise-level users, who require a
+ solution capable of transparent recovery, without data loss,
+ even in the case of a node's catastrophic failure. If a node is
+ simply switched off, thus having no chance to perform a shutdown
+ sequence, then volatile state will simply be lost. It is too
+ late to call <code>HttpSessionActivationListener.sessionWillPassivate()</code>
+ where necessary and serialise all user state to a safe place!
+ Container implementors must ask themselves the question - 'What,
+ within the bounds of the current specification, can we do to
+ mitigate this event?'.
+
+ <h3>
+ Session Backup - When
+ </h3>
+
+ <p/>
+
+ The answer to the concern of lost data is to frequently ship
+ backup copies off-node, so that in the case of its catastrophic
+ failure, we have a fallback position. The freshness of our
+ backup data depends directly on the frequency of this
+ process. This frequency is bounded by resource concerns and the
+ contract between container and containee, as discussed above.
+
+ <p/>
+ Let us examine some of the possibilities:
+
+ <ul>
+ <li>
+ Immediate - As soon as a change is made to a session, it is
+ backed up.
+ <ul>
+ <li>
+ Most Accurate - This policy constrains our window of
+ data-loss as much as is reasonably possible.
+ </li>
+ <li>
+ Most Expensive - This accuracy has a cost. Every write to
+ a session object will result in expensive back up code
+ being triggered.
+ </li>
+ <li>
+ Synchronisation Issues/Ref vs Val semantics - TODO
+ </li>
+ </ul>
+ </li>
+ <li>
+ Request - All changes to a session are backed up at the end of
+ each relevant request.
+ <ul>
+ <li>
+ Less Accurate - This is less accurate than the 'Immediate'
+ policy described above, since a failure halfway through a
+ request thread would result in the loss of all changes
+ that it had made to its session.
+ </li>
+ <li>
+ Less Expensive - Since backups are only done at the end of
+ each request, they will be fewer than 'Immediate' mode,
+ resulting in much less expense.
+ </li>
+ <li>
+ Inconsistency - Assuming that the session is somehow in a
+ 'consistent' state at the end of each request is
+ misguided, since multiple requests may overlap. Although,
+ it may be, with the benefit of knowledge of the
+ application at one's disposal, that this problem can be
+ safely discounted.
+ </li>
+ <li>
+ Synchronisation Issues/Ref vs Val semantics - TODO - This
+ policy suffers from the same issues, in this respect as
+ 'Immediate'.
+ </li>
+ </ul>
+ </li>
+ <li>
+ Request Group - All changes to a session are backed up as soon
+ as all associated active requests have been processed.
+ <ul>
+ <li>
+ Less Accurate - This policy will be less accurate still,
+ if requests for the same session overlap within the
+ container.
+ </li>
+ <li>
+ Less Expensive - For the same reason, this policy will be
+ less expensive still.
+ </li>
+ <li>
+ Consistency - This policy guarantees that what it backs up
+ is a consistent view of the session's contents. No request
+ is only half processed.
+ </li>
+ <li>
+ Thread-Safe - The real win for this policy is that, with
+ no contract between containee and container, other than
+ that described in the Servlet Specification, the container
+ may access session attributes for backup, in the knowledge
+ that no other application code is concurrently modifying
+ them.
+ </li>
+ </ul>
+ </li>
+ <li>
+ WebApplication - Backup occurs in line with the web application
+ lifecycle, i.e. not until the web application is
+ <code>stop()</code>-ed by its container.
+ <ul>
+ <li>
+ This policy gives no protection against catastrophic
+ failure, but is fine for maintenance-only scenarios.
+ </li>
+ <li>
+ There is no associated runtime overhead.
+ </li>
+ <li>
+ There are no consistency or synchronisation issues. All
+ request and background threads will have terminated before
+ the session is backed up.
+ </li>
+ </ul>
+ </li>
+ <li>
+ Timed - 'dirty' sessions are backed up at regular intervals,
+ orthogonal to the lifecycles of requests, web applications
+ etc...
+ <ul>
+ <li>
+ This might conceivably be useful to overlay on top of
+ e.g. the 'Request' policy if your request threads took a
+ long time to run, or the 'RequestGroup' policy if you
+ expected long periods without backing up because of many
+ overlapping requests for the same session.
+ </li>
+ <li>
+ A backup policy that worked in this manner would be of
+ conveniently tunable accuracy and impact.
+ </li>
+ <li>
+ However such a policy would suffer from all the
+ consistency and synchronisation issues common to
+ 'Immediate' and 'Request' approaches.
+ </li>
+ </ul>
+ </li>
+ </ul>
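Of the policies listed above, 'Request Group' is the one whose trigger condition is least obvious, so a small sketch may help. It assumes the container can call a hook when a request for a session enters and leaves the webapp. The class `RequestGroupTracker` and its method names are hypothetical, and the backup itself is reduced to appending a label to a list where a real container would serialise the session and ship it off-node.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-session tracker for the 'Request Group' backup policy. */
final class RequestGroupTracker {
    private int activeRequests;   // request threads currently inside this session
    private boolean dirty;        // has anything changed since the last backup?
    private final List<String> backups = new ArrayList<>(); // stands in for the off-node copy

    /** Called by the container when a request for this session enters the webapp. */
    synchronized void requestStarted() {
        activeRequests++;
    }

    /** Called whenever application code changes the session. */
    synchronized void markDirty() {
        dirty = true;
    }

    /** Called when a request leaves; backs up only once the whole group has drained. */
    synchronized void requestFinished(String snapshotLabel) {
        if (activeRequests == 0) throw new IllegalStateException("no active request");
        activeRequests--;
        if (activeRequests == 0 && dirty) {
            backups.add(snapshotLabel); // a real container would serialise the session here
            dirty = false;
        }
    }

    synchronized int backupCount() {
        return backups.size();
    }

    public static void main(String[] args) {
        RequestGroupTracker t = new RequestGroupTracker();
        t.requestStarted();           // request A
        t.requestStarted();           // request B overlaps with A
        t.markDirty();
        t.requestFinished("after A"); // B still active: no backup yet
        t.requestFinished("after B"); // group drained: one backup
        System.out.println("backups taken: " + t.backupCount());
    }
}
```

The 'Request' policy would differ only in dropping the `activeRequests == 0` condition, which is exactly where it loses the consistency and thread-safety properties claimed for 'Request Group'.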
+
+ <p/>
+ REFACTORING HAS GOT TO HERE...
+ <p/>
+
+ <h3>
+ Session Backup - At what granularity ?
+ </h3>
+
+ <ul>
+ <li>
+ whole session
+ <ul>
+ <li>
+ synchronisation issues - must lock more objects (TODO)
+ </li>
+ </ul>
+ </li>
+ <li>
+ single attribute
+ <ul>
+ <li>
+ more complex
+ </li>
+ <li>
+ less contention
+ </li>
+ <li>
+ Object Identity issues - TODO - same object in different attributes not preserved ?
+ </li>
+ </ul>
+ </li>
+ <li>
+ batched attributes
+ <ul>
+ <li>
+ even more complex
+ </li>
+ <li>
+ multiple changes to same attribute may be collapsed
+ </li>
+ <li>
+ Object Identity - as above - TODO
+ </li>
+ </ul>
+ </li>
+ </ul>
+
+ <p/>
+
+ (TODO - requests do not have transactional semantics)
+
+ <p/>
+
+ (TODO - if a single request reset an attribute a number of times,
+ immediate xfer would be expensive, batching would also be expensive
+ since each reset would involve a serialisation of which only the last
+ would be useful (or can we leave this til the last moment?))
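One way to "leave this til the last moment" is to record changes by name and serialise each attribute only when the batch is flushed. The sketch below is a hypothetical illustration of that idea (`AttributeBatch` is not a class from any container). It ignores attribute removal and assumes all values are `Serializable`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical batch of attribute changes that defers serialisation until flush. */
final class AttributeBatch {
    private final Map<String, Serializable> pending = new LinkedHashMap<>();
    private int serialisations; // how often the cost of serialisation was actually paid

    /** Records a change; a later change to the same name replaces the earlier one. */
    void set(String name, Serializable value) {
        pending.put(name, value);
    }

    /** Serialises each pending attribute exactly once and returns the bytes keyed by name. */
    Map<String, byte[]> flush() {
        Map<String, byte[]> out = new LinkedHashMap<>();
        for (Map.Entry<String, Serializable> e : pending.entrySet()) {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
                oos.writeObject(e.getValue());
            } catch (IOException ex) {
                throw new UncheckedIOException(ex);
            }
            serialisations++;
            out.put(e.getKey(), bytes.toByteArray());
        }
        pending.clear();
        return out;
    }

    int serialisations() {
        return serialisations;
    }

    public static void main(String[] args) {
        AttributeBatch batch = new AttributeBatch();
        batch.set("step", 1);
        batch.set("step", 2);
        batch.set("step", 3); // only this value will be serialised
        batch.set("user", "alice");
        Map<String, byte[]> delta = batch.flush();
        System.out.println(delta.size() + " attributes, "
            + batch.serialisations() + " serialisations");
    }
}
```

The price of deferring is that the batch holds a reference, not a copy: if the application mutates a value between `set` and `flush`, it is the mutated state that gets shipped.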
+
+ <p/>
+
+ What can we do?
+
+ <ul>
+ <li>
+ whatever we do it must not break spec compliance/portability (so we
+ cannot e.g. extend APIs).
+ </li>
+
+ <li>
+ insist on the no background thread rule - unless distribution
+ is only done on webapp.stop()
+ </li>
+ <li>
+ agree an explicit session attribute synchronisation contract -
+ <code>synchronized(Object){...}</code> - why implicit rule is
+ not enough.
+ </li>
+ </ul>
+
+ <h3>
+ Reference vs Value Based Semantics
+ </h3>
+
+ <p/>
+
+ It is useful to introduce the distinction between reference and
+ value based semantics at this point.
+
+ <p/>
+
+ Given the following Servlet code snippet:
+
+ <pre>
+ Foo foo1=new Foo();
+ session.setAttribute("foo", foo1);
+ Foo foo2=(Foo)session.getAttribute("foo");
+ </pre>
+
+ Which of these assertions (assuming that <code>Foo.equals()</code>
+ is well implemented) would you expect to be true?
+
+ <ul>
+ <li>
+ <pre>
+ foo1==foo2;
+ </pre>
+ </li>
+ <li>
+ <pre>
+ foo1.equals(foo2);
+ </pre>
+ </li>
+ </ul>
+
+ <p/>
+
+ If you expect <code>foo1==foo2</code> then you are expecting
+ reference-based semantics.
+
+ <p/>
+
+ If you are expecting reference-based semantics you might well
+ write code such as this in order to avoid unnecessary
+ de/rehashes:
+
+ <pre>
+ Point p=new Point(0,0);
+ session.setAttribute("point", p);
+ p.setX(100);
+ p.setY(100);
+ </pre>
+
+ and then might expect that :
+
+ <pre>
+ ((Point)session.getAttribute("point")).getX()==100;
+ </pre>
+
+ <p/>
+
+ Using value-based semantics, out of these three (TODO)
+ assertions, only the second, <code>foo1.equals(foo2)</code>, would succeed.
+
+ <p/>
+
+ Every parameter passed to and from a value based API must be
+ assumed to be copied from an original, since it may have come
+ across the wire from another address space.
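The difference can be made concrete with a toy session that copies every value through Java serialisation on the way in and on the way out. This is a sketch under stated assumptions: `ValueSession` is a hypothetical class, not the `HttpSession` API, and a `java.util.ArrayList` stands in for the `Foo` and `Point` objects of the snippets above so that the example needs nothing beyond the standard library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical session with value-based semantics: set and get both copy via serialisation. */
final class ValueSession {
    private final Map<String, byte[]> attributes = new HashMap<>();

    /** Stores a serialised copy of the value as it is right now. */
    void setAttribute(String name, Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(value);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        attributes.put(name, bytes.toByteArray());
    }

    /** Returns a fresh copy on every call, or null if the attribute is absent. */
    Object getAttribute(String name) {
        byte[] bytes = attributes.get(name);
        if (bytes == null) return null;
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ValueSession session = new ValueSession();
        ArrayList<String> foo1 = new ArrayList<>(List.of("a"));
        session.setAttribute("foo", foo1);
        Object foo2 = session.getAttribute("foo");
        System.out.println("foo1 == foo2      : " + (foo1 == foo2));
        System.out.println("foo1.equals(foo2) : " + foo1.equals(foo2));
        foo1.add("b"); // mutation after setAttribute, as in the Point example
        System.out.println("session sees the mutation: "
            + foo1.equals(session.getAttribute("foo")));
    }
}
```

With a reference-based session (a plain map of objects) the first and third checks would report true instead.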
+
+ <p/>
+
+ For this reason, when you start dealing with (possibly) remote
+ objects in a distributed scenario, you generally shift your
+ semantics from reference to value. (c.f. Remote EJB APIs)
+
+ <p/>
+
+ Unfortunately, the Servlet Specification, whilst clearly
+ mandating that every session attribute must be of a type that
+ the container knows how to move from VM to VM, omits to mention
+ that a possible impact of doing this is an important shift in
+ semantics. This is exacerbated by the fact that, unlike EJBs,
+ which have been designed specifically for distributed use, the
+ httpsession API does not change (c.f. Local/Remote) according to
+ the semantics required; distribution is simply a single
+ deployment option. This encourages developers to believe that
+ they can turn a webapp that has been written for Local use into
+ a fully functional distributed component simply by adding the
+ relevant tag to the web.xml. All attendant problems are
+ delegated, by spec and developer, to the unfortunate container
+ provider.
+
+ <p/>
+
+ Thus the container provider must make a choice here:
+
+ <ul>
+ <li>
+ continue to support reference-based semantics in which case
+ migration may only occur when there are no active threads for
+ a session and there is an implicit contract between container
+ and containee that objects deriving from a session will not
+ have their references compared to objects deriving from
+ elsewhere, whose lifecycles may span across such periods.
+ </li>
+ <li>
+ make an explicit new contract that states that all interaction
+ with session attributes is by value and comparisons should
+ only be made in this way. The full ramifications of this
+ choice should become apparent as we progress further in this
+ paper.
+ </li>
+ </ul>
+
+ <h3>
+ Object Identity, Object Streams and Synchronisation
+ </h3>
+ TODO - I guess Object Identity can only be preserved within a
+ single Object tree ? so attribute-based distribution will not
+ recognise the same object shared between different attributes
+
+ How can we guarantee, unless we know that no other threads are
+ running, the synchronisation of values as we stream them out of
+ the container ?
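The first point can be checked directly with standard Java serialisation: an `ObjectOutputStream` remembers the objects it has already written, so identity survives only among objects written through the same stream. The sketch below is illustrative. `SharedIdentityDemo` and its methods are hypothetical names, and "one stream per attribute" is an assumption about how attribute-based distribution would be implemented.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;

/** Hypothetical demo: identity is preserved within one object stream, not across streams. */
final class SharedIdentityDemo {
    private SharedIdentityDemo() {}

    /** Writes all values through ONE ObjectOutputStream and reads them back in order. */
    static Object[] roundTripTogether(Serializable... values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
                for (Serializable v : values) oos.writeObject(v);
            }
            Object[] out = new Object[values.length];
            try (ObjectInputStream ois =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                for (int i = 0; i < out.length; i++) out[i] = ois.readObject();
            }
            return out;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Writes each value through its OWN stream, as per-attribute distribution might. */
    static Object[] roundTripSeparately(Serializable... values) {
        Object[] out = new Object[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = roundTripTogether(values[i])[0];
        }
        return out;
    }

    public static void main(String[] args) {
        ArrayList<String> cart = new ArrayList<>();
        cart.add("book");
        // The same object is stored under two attribute names.
        Object[] whole = roundTripTogether(cart, cart);
        Object[] perAttribute = roundTripSeparately(cart, cart);
        System.out.println("whole session, same object: " + (whole[0] == whole[1]));
        System.out.println("per attribute, same object: " + (perAttribute[0] == perAttribute[1]));
    }
}
```

After per-attribute distribution the two attributes hold equal but distinct copies, so a change made through one is no longer seen through the other.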
+
+<!--
+ Are there any problems with this approach?
+
+ <p/>
+
+ We will outline below a number of shortcomings in the current
+ specification with regards to distributable webapps and proposed
+ solutions. As far as an implementation is concerned, it could
+ simply extend the specification in a proprietary direction,
+ adding features and corresponding API extensions to provide
+ extra functionality. However this breaks portability and creates
+ vendor lock-in. The real challenge is to fulfil our users
+ requirements whilst staying absolutely within the bounds of the
+ specification.
+
+ <p/>
+ Why ?
+ <p/>
+ Inconsistency:
+ <p/>
+ Migration:
+ <p/>
+
+ The API and language used in the spec, seem to imply that the
+ expected usage scenario is that a running distributable webapp
+ will be cleanly shut down and its sessions 'migrated' to
+ another container or store.
+
+ <p/>
+
+ In a perfect world, this would be fine. The administrator would
+ stop the webapp. The container would cease running user requests
+ through it (they would all be load balanced to other nodes in
+ the cluster), all concurrency issues in the webapp would cease
+ and the single remaining control thread would serialise all
+ existing sessions into a shared store whence they could be
+ loaded by peer nodes if and when needed, or directly to one or
+ more peers.
+
+ <p/>
+
+ In the real world, whilst managed shutdown is a very common
+ issue, enterprise users require protection from not just
+ intentional, but also unintentional service outages. The
+ catastrophic failure of a node will stop a JVM instantly. All
+ control is lost, therefore if data (i.e. sessions) has not
+ already been backed up, it is lost. If this data was 10s of
+ 1000s of long, complex, half-filled out purchase orders, then
+ you have just lost a lot of business.
+
+ <p/>
+
+ Catastrophic failure, therefore, is not a scenario realistically
+ addressed by the spec, but the functionality that it does
+ mandate provides the container provider with just enough rope to
+ have a go at hanging himself...
+
+ <p/>
+
+ So, how might the container provider protect a business against
+ catastrophic failure ? Simply by ensuring that off-node backups
+ of each session exist and are as fresh as the business requires
+ and can be achieved within technical constraints.
+
+ <p/>
+
+ The container provider now has to make a couple of major decisions:
+
+ <ul>
+ <li>
+ When to ship session content off-node ?
+ </li>
+ <li>
+ What data to ship ?
+ </li>
+ </ul>
+
+ <h3>
+ When ?
+ </h3>
+
+ <p/>
+
+ If you are going to burn expensive resources, in terms of CPU
+ and bandwidth, serialising and shifting data backwards and
+ forwards, then you will want to ensure that that data is
+ captured in a meaningful state, otherwise you are wasting your
+ time. So we must ask ourselves an apparently simple question:
+ "When is an httpsession's content consistent?". The most likely
+ answer to this appears to be - "at the end of each request".
+
+ <p/>
+
+ Unfortunately the spec requires WebContainers to support
+ multiple concurrent request threads. So simply waiting until the
+ end of each request before snapshotting the session is an
+ inadequate solution as an overlapping request may also be
+ writing to the same session. (c.f. EJBs, where the container
+ enforces a single threaded model, thus avoiding this
+ issue). (TODO - confirm.)
+
+ <p/>
+
+ Perhaps we could break the spec and prevent concurrent requests
+ (damaging performance) or only snapshot our session during
+ 'idle-time' when it has no extant requests.
+
+ <p/>
+
+ Unfortunately the WebContainer is far more relaxed about
+ resource management than e.g. an EJB container and does not
+ forbid the running of background threads by its containees. A
+ common pattern that takes advantage of this is for an initial
+ request to kick off a longterm job on a background thread to
+ which it hands a reference to its session. The client then
+ makes requests that poll the session at regular intervals,
+ enquiring about the state of the outstanding task, until the
+ background thread finishes its job and writes the result back
+ into the session.
+
+ <p/>
+
+ Thus it can be seen that deciding when it is best to snapshot
+ your session is not as simple as it first appeared. In fact, it
+ seems that the only consistent point in a session's lifecycle is
+ immediately after any mutation that may occur i.e. calls to
+ 'set' and 'remove' attributes. That is, we must increase the
+ granularity at which we measure change from 'request' to
+ 'attribute'.
+
+ <p/>
+ (TODO - expand ?)
+ <p/>
+
+ <h3>
+ What ?
+ </h3>
+ <p/>
+
+ So now that we have decided when to ship data off-node we need
+ to decide what to ship. The two obvious choices are:
+
+ <ul>
+ <li>
+ The whole session.
+ </li>
+
+ <li>
+ A delta describing the change that has just occurred.
+ </li>
+ </ul>
+
+ <p/>
+
+ Implementations that snapshot the session at request boundaries
+ etc tend to capture the whole session as this is probably
+ cheaper than figuring out the delta.
+
+ <p/>
+
+ Triggering distribution after each session mutation makes it
+ simple to capture a delta containing the addition, alteration or
+ removal of a single attribute. If shipping these deltas off-node
+ immediately is too expensive, they may be batched and sent at
+ request boundaries anyway.
+
+ <p/>
+
+ One potential weakness of sending immediate deltas is the fact
+ that they rely on value based semantics as described
+ above. Putting an object into a session will result in its
+ immediate backing up. Subsequent mutations of that object will
+ not be backed up unless e.g. setAttribute() is again called on
+ the session, which has backed up the value of the object that it
+ was given at the time it was given it, not some sort of magic
+ reference. (TODO - explain better).
+
+ <p/>
+
+ So, perhaps we are finally coming up with a workable design. We
+ insist that developers assume value-based semantics. We don't
+ extend the session API in any way. We ship deltas off node as
+ soon as practical and cross our fingers...
+
+-->
+ <p/>
+
+ Unfortunately, the specification requires that every session
+ object carries a 'LastAccessedTime' value, which is updated
+ every time the session is retrieved by an application thread for
+ reading or writing. Thus any request requiring stateful
+ interaction within the webapp will have the side effect of
+ writing a change to the session. Taken literally, these changes
+ can be very expensive in a distributable scenario, as a naive
+ implementation will require each such change to be exported to
+ another VM in case of catastrophic node failure.
+
+ <p/>
+ DISCUSS
+
+ <p/>
+ GC GRANULARITY
+ ETC.
+ <p/>
+
+
+ The spec has one last curve ball for us to face.
+ <p/>
+
+ <h3>
+ HttpSessionActivationListener:
+ </h3>
+
+ <p/>
+
+ If a session attribute implements the
+ HttpSessionActivationListener interface, the spec requires the
+ container to call its sessionWillPassivate() method just before and
+ its sessionDidActivate() method just after 'migration'. Since we are
+ no longer implementing straightforward migration from one node
+ to another, but rather some form of multi-copy synchronisation
+ protocol, it is, at first, hard to see how these two different
+ paradigms can be mapped to a single model that resolves the
+ problem of when notification should take place.
+
+ <p/>
+
+ How a container provider plays this ball really depends upon the
+ perceived intention of these notifications. The spec implies
+ that they are there so that an attribute which contains
+ expensive non-serializable resources has a chance to release and
+ reacquire them at opportune moments - basically a lifecycle for
+ such attributes. (TODO - confirm)
+
+ <p/>
+
+ The implication of this is that sessionWillPassivate() must always be
+ called before the attribute is serialised (i.e. shipped
+ off-node). However, this leaves said attribute in a
+ passivated state, unsuitable for continued use within the
+ session, so before control flow is returned to the webapp,
+ sessionDidActivate() must also be called to put this attribute back
+ into normal service.
+
+ <p/>
+
+ In effect, each mutation can be seen as the mini-migration of a
+ single attribute off and back onto the same node - whilst the
+ container retains a copy of its serialised form to ship off-node
+ as an emergency backup.
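+
+ <p/>
+
+ The 'mini-migration' described above might be sketched as follows.
+ This is only an illustration under stated assumptions: ActivationAware
+ is a hypothetical stand-in for HttpSessionActivationListener (whose
+ real callbacks take an HttpSessionEvent argument), and MiniMigration
+ and ConnectionHolder are invented names, not part of any container.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for javax.servlet.http.HttpSessionActivationListener,
// reduced to no-argument methods so the sketch compiles without the servlet API.
interface ActivationAware {
    void sessionWillPassivate();
    void sessionDidActivate();
}

// Hypothetical container-side helper.
final class MiniMigration {
    private MiniMigration() {}

    // Passivate, serialise, reactivate. The attribute is back in normal
    // service before control returns to the caller, and the returned bytes
    // are the copy that could be shipped off-node as a backup.
    static byte[] snapshot(Serializable attribute) {
        ActivationAware listener =
            attribute instanceof ActivationAware ? (ActivationAware) attribute : null;
        if (listener != null) {
            listener.sessionWillPassivate();
        }
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(attribute);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            // Reactivate even if serialisation failed.
            if (listener != null) {
                listener.sessionDidActivate();
            }
        }
    }
}

// Example attribute holding a non-serializable resource (here just a
// StringBuilder marked transient to play that role).
final class ConnectionHolder implements Serializable, ActivationAware {
    private static final long serialVersionUID = 1L;
    private transient StringBuilder resource = new StringBuilder("open");
    int passivations;
    int activations;

    public void sessionWillPassivate() { resource = null; passivations++; }
    public void sessionDidActivate() { resource = new StringBuilder("open"); activations++; }
    boolean isUsable() { return resource != null; }
}
```

+ Reactivating in a finally block is one possible design choice: it
+ keeps the attribute usable by the webapp even when serialisation
+ fails, at the cost of a backup that was never taken.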
+
+ <h3>
+ Concurrency
+ </h3>
+
+ <p/>
+
+ Concurrency is a major issue. It can be divided into two smaller
+ problems:
+
+ <p/>
+
+ Concurrency between threads in different processes.
+
+ <p/>
+
+ Concurrency between threads in the same process.
+
+ <p/>
+
+ If you take the decision that your design will allow concurrent
+ requests within the same session in different vms, you will need a
+ strategy for ensuring that all vms have a consistent view of each
+ session.
+
+ <p/>
+
+ This can be achieved by making the session a single remote object,
+ which all nodes make use of. Since there is only one copy of the
+ session there are no consistency issues, provided that it has
+ sufficient synchronisation within itself. However, since the session
+ is a remote object it will have value-based semantics. If you get the
+ same attribute value from it twice, unless you have a caching layer,
+ the two results will be 'equal' but not '=='. If you have a caching layer you will
+ then have to concern yourself with the complexities of ensuring that
+ items in your cache are invalidated on time, which just brings you
+ full circle back to the consistency problem. I call this solution
+ 'shared store'. Finally, since all interactions are with a remote
+ object, this solution tends to be slow.
+
+ <p/>
+
+ Alternatively, your design might choose to have multiple objects (one
+ in each address space) all representing the same session. As one is
+ changed it notifies the others of the change, so that they can apply
+ it to themselves, so that all these objects maintain a state
+ consistent with each other. This is generally known as '[in-vm]
+ replication'. Implementors of this design need to consider the
+ following issues.
+
+ <p/>
+
+ 1. Race conditions between concurrent changes to the same session
+ occurring on different nodes. (TODO - spec avoids this issue).
+
+ <p/>
+
+ CONSIDER THAT WHEN SPEC SAYS 'AT THE SAME TIME' IT IS TALKING IN TERMS
+ OF REQUEST LIFETIME. - I.E. AFFINITY (AT LEAST FOR OVERLAPPING REQUEST
+ GROUPS) IS MANDATORY
+
+ <p/>
+
+ 2. If, upon change, you replicate more than just that change (i.e. each
+ instance of a session has its state completely replaced, rather than
+ just the attribute which changed on some remote node), you will find
+ that your session objects have inconsistent semantics. Sometimes
+ when you get an attribute from a session its reference will be the
+ same as the last time, sometimes it won't, although your application
+ may never actually change this attribute - since a change to another
+ attribute may have caused the entire session to have been replaced
+ with a fresh copy from across the wire.
+
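+ <p/>
+
+ The difference between replicating a delta and replacing the whole
+ session can be illustrated with a small sketch. SessionReplica is a
+ hypothetical class invented for this example; it assumes the
+ incoming values have already been deserialised from the wire.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical replica of one session's attribute map on one node.
final class SessionReplica {
    private final Map<String, Object> attributes = new HashMap<>();

    Object getAttribute(String name) {
        return attributes.get(name);
    }

    // Delta replication: only the attribute that changed is replaced,
    // so references to every other attribute stay the same.
    void applyDelta(String name, Object deserialisedValue) {
        attributes.put(name, deserialisedValue);
    }

    // Whole-session replication: every attribute is replaced by a fresh
    // copy, so previously returned references no longer match.
    void applyFullState(Map<String, Object> deserialisedState) {
        attributes.clear();
        attributes.putAll(deserialisedState);
    }
}
```

+ With applyDelta() a change to one attribute leaves the references of
+ all other attributes untouched. With applyFullState() an attribute
+ the application never modified comes back as a different object.
+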
+ <p/>
+
+ The first issue can be entirely avoided through the use of 'affinity'
+ or 'sticky' or 'persistent' sessions. Nomenclature depends on your
+ vendor, but all amount to basically the same thing. A load-balancer that
+ supports this feature will ensure, preferably by tracking the presence
+ of the JSESSIONID cookie or jsessionid path parameter, that all
+ requests pertaining to the same session will be delivered to the same
+ host:port combination. Thus, we can see that there never will be
+ concurrent threads contending for the same session in different
+ JVMs/WebContainers since all relevant requests will be routed to the
+ same one. Affinity has one further important benefit. Many webapps may
+ be deployed in complex environments where a lot of transparent caching
+ occurs below them. Without affinity, requests will be delivered to a
+ number of different nodes, all of which will have to populate such
+ caches with objects that will only be reused if another request for
+ the same webapp/session is directed to them. With affinity, requests
+ will always be processed on the same node, so the cache is only
+ populated once and then subsequently reused. This detail will increase
+ cache hits and may have a dramatic effect on performance and resource
+ consumption.
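+
+ <p/>
+
+ As a rough illustration of affinity, the sketch below extracts the
+ session id from either the JSESSIONID cookie or the jsessionid path
+ parameter and maps it to a backend. AffinityRouter is a hypothetical
+ class, and choosing the backend by hashing the id is an assumption
+ made for brevity; real load-balancers often remember the node that
+ created the session or read a route suffix from the id instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sticky-session router.
final class AffinityRouter {
    private static final Pattern COOKIE =
        Pattern.compile("(?:^|;\\s*)JSESSIONID=([^;]+)");
    private static final Pattern PATH_PARAM =
        Pattern.compile(";jsessionid=([^;?#/]+)");

    private final List<String> backends;

    AffinityRouter(List<String> backends) {
        this.backends = new ArrayList<>(backends);
    }

    // Returns the session id carried by the request, or null if none.
    static String sessionId(String cookieHeader, String requestUri) {
        if (cookieHeader != null) {
            Matcher m = COOKIE.matcher(cookieHeader);
            if (m.find()) {
                return m.group(1);
            }
        }
        if (requestUri != null) {
            Matcher m = PATH_PARAM.matcher(requestUri);
            if (m.find()) {
                return m.group(1);
            }
        }
        return null;
    }

    // Requests carrying the same session id always reach the same
    // host:port as long as the backend list does not change.
    String route(String cookieHeader, String requestUri) {
        String id = sessionId(cookieHeader, requestUri);
        if (id == null) {
            return backends.get(0); // no session yet: any policy will do
        }
        return backends.get(Math.floorMod(id.hashCode(), backends.size()));
    }
}
```

+ Note that a pure hash scheme reshuffles sessions whenever a node
+ joins or leaves, which is one reason real products tend to encode
+ the route explicitly.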
+
+ <p/>
+
+ TODO - WHAT IS HAPPENING HERE ??
+
+ <p/>
+
+ The second issue can be resolved with standard Java(tm)
+ synchronisation primitives or libraries, provided that all code
+ involved is in container-space. Problems arise when webapp code calls
+ container code and vice versa. This is exacerbated by the spec's
+ insistence that an HttpSession should allow access by multiple
+ concurrent threads and the fact that the webapp may still be holding
+ and modifying references to attribute values. Ultimately the container
+ provider will have to specify some contract which they expect the
+ webapp to abide by. A sensible one might be that any thread mutating
+ an attribute should take its Object-level lock for the duration, so
+ that behind-the-scenes reads, such as those involved in serialisation
+ etc., can synchronise on the same lock and be assured of a consistent
+ view of the object. (CHECK TO SEE WHETHER default read/writeObject are
+ synchronised). This, however, may be seen as burdensome by the webapp
+ developer who may have their own locking strategy for an attribute
+ type which is compromised by this contract...
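+
+ <p/>
+
+ A minimal sketch of such a contract follows. Basket stands in for a
+ webapp attribute type and LockingSerialiser for container code; both
+ names are hypothetical. The only point illustrated is that writer
+ and serialiser agree on the attribute's own monitor.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Webapp side: every mutation holds the attribute's own monitor.
final class Basket implements Serializable {
    private static final long serialVersionUID = 1L;
    private final List<String> items = new ArrayList<>();

    void add(String item) {
        synchronized (this) {
            items.add(item);
        }
    }

    int size() {
        synchronized (this) {
            return items.size();
        }
    }
}

// Container side: serialisation takes the same monitor, so it never
// observes the attribute halfway through a mutation.
final class LockingSerialiser {
    private LockingSerialiser() {}

    static byte[] serialise(Serializable attribute) {
        synchronized (attribute) {
            try {
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                    out.writeObject(attribute);
                }
                return bytes.toByteArray();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}
```

+ The sketch only covers a single attribute object. State shared
+ between several attributes, or guarded by a different lock of the
+ developer's choosing, is not protected by it.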
+
+ <p/>
+
+ DISCUSS PROS AND CONS OF SHARED-STORE VS REPLICATION
+
+ <p/>
+
+ other items...
+
+ <p/>
+
+ does anything else other than session need to be distributed ?
+
+ <p/>
+
+ - security info
+
+ <p/>
+
+ - application level data (as opposed to user level)
+
+ <p/>
+
+ - etc
+
+ <p/>
+
+ store and replication mechanisms... - going too far.
+
+ <p/>
+
+ other thoughts ?
+
+ <p/>
+
+ reference semantics sacrificed if session is temporarily passivated -
+ although hopefully no-one (what about background thread?) is holding a
+ reference to us...
+
+ <p/>
+
+ replication with affinity and change-by-delta best solution because it
+ preserves reference-semantics as far as possible - consider...
+
+ <p/>
+
+ replication is faster than shared-store because 'getAttribute' is not
+ a remote call. Effectively, with replication, each replicant IS a
+ shared store which processes requests locally.
+
+ <h2>
+ TODO - Survey existing impls:
+ </h2>
+
+ <ul>
+ <li>
+ TC 4.x and 5.x
+ </li>
+ <li>
+ Jetty
+ </li>
+ <li>
+ JBoss
+ </li>
+ <li>
+ Apache/mod_jk
+ </li>
+ <li>
+ simple TC recommended 'C' lb
+ </li>
+ <li>
+ mod_proxy solution
+ </li>
+ <li>
+ mod_backhand
+ </li>
+ </ul>
+
+ TODO additional requirements when operating in a J2ee environment
+
+ <h2>
+ Further Reading:
+ </h2>
+
+ <ul>
+ <li>
+ SRV.7 Sessions
+ </li>
+ <li>
+ SRV.7.6 Last Accessed Times
+ </li>
+ <li>
+ SRV.15.1.7 HttpSession
+ </li>
+ <li>
+ SRV.15.1.8 HttpSessionActivationListener
+ </li>
+ <li>
+ SRV.15.1.9 HttpSessionAttributeListener
+ </li>
+ <li>
+ SRV.15.1.10 HttpSessionBindingEvent
+ </li>
+ <li>
+ SRV.15.1.11 HttpSessionBindingListener
+ </li>
+ <li>
+ SRV.15.1.13 HttpSessionEvent
+ </li>
+ <li>
+ SRV.15.1.14 HttpSessionListener
+ </li>
+ </ul>
+
+ <ul>
+ <li>
+ TODO: provide URL for latest servlet spec. <http://java.sun.com/products/servlet/download.html#specs>
+ </li>
+ <li>
+ TODO: AND J2EE spec
+ </li>
+ </ul>
+
+
+ <h3>
+ Optimisations
+ </h3>
+
+ <ul>
+ <li>
+ DefaultServlet is stateless (session will never be fetched) so
+ can be excluded from the equation. This means that requests for
+ images etc can be excluded from concurrent request groups,
+ reducing them substantially. A smart lb would know about this
+ and could relax affinity as well (although a caching tier might
+ be serving this content before you hit the lb - this tier also
+ needs coordination with web-container tier so it can be
+ selectively flushed upon webapp redeployment).
+ </li>
+
+ <li>
+ lastAccessedTime - don't distribute, ignore until last
+ minute ? <p/> If a node dies, all sessions upon it must be
+ considered just touched, since a request thread for them may
+ have caused the crash. Time of node death should be noted
+ and associated with these sessions. Upon rehydration this is
+ the lastAccessed value that they should adopt.
+ </li>
+
+ <li>
+ Batching of deltas
+ </li>
+
+ </ul>
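+
+ <p/>
+
+ The lastAccessedTime idea above could look roughly like this.
+ RehydratedSession is a hypothetical class; it assumes times are held
+ in milliseconds, the inactive interval in seconds, and that a
+ negative interval means the session never times out.

```java
// Hypothetical bookkeeping for a session recovered from a failed node.
final class RehydratedSession {
    final String id;
    final long lastAccessedTime;        // milliseconds since the epoch
    final int maxInactiveIntervalSecs;  // negative: never time out

    // The session adopts the time of node death as its last accessed
    // time, since a request for it may have been in flight at the crash.
    RehydratedSession(String id, long nodeDeathTime, int maxInactiveIntervalSecs) {
        this.id = id;
        this.lastAccessedTime = nodeDeathTime;
        this.maxInactiveIntervalSecs = maxInactiveIntervalSecs;
    }

    boolean isExpired(long now) {
        if (maxInactiveIntervalSecs < 0) {
            return false;
        }
        return now - lastAccessedTime > maxInactiveIntervalSecs * 1000L;
    }
}
```

+ The effect is that no session on the failed node is timed out
+ earlier than it would have been had its last request completed.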
+
+ <h3>
+ Further Issues
+ </h3>
+ <ul>
+ <li>
+ HttpSessionActivationListener - TODO - WHAT ?
+ </li>
+
+ <li>
+ ClassLoading
+ </li>
+ </ul>
+
+ <h3>
+ Further Notes
+ </h3>
+ TODO - Look into Geronimo impl... SRV.10.6 Listener Exceptions
+ TODO - can all my SRV refs be links ?
+
+ TODO - we can use a SecurityManager to prevent background threads
+ being created. We can prevent access from such a thread to a
+ container managed object, but we can't prevent such a reference
+ being held by such a thread...
+
+ NB
+
+ </body>
+</html>
+