
Posted to java-dev@axis.apache.org by Rajith Attapattu <ra...@gmail.com> on 2006/09/22 05:19:01 UTC

[axis2] Clustering

Hi All,

Chathura and Chamikara have posted the following proposal on the wiki.
http://wiki.apache.org/ws/FrontPage/Axis2/clustering_proposal

Next step is for us to figure out and document the demarcation points where
we would want to cluster.
Then we can start on an implementation.

Regards,

Rajith

Re: [axis2] Clustering

Posted by Steve Loughran <st...@apache.org>.

Rajith Attapattu wrote:
> Hi All,
> 
> Chathura and Chamikara have posted the following proposal on the wiki.
> http://wiki.apache.org/ws/FrontPage/Axis2/clustering_proposal
> 
> Next step is for us to figure out and document the demarcation points where
> we would want to cluster.
> Then we can start on an implementation.
> 
> Regards,
> 
> Rajith
> 

Some observations

1. Try to use other people's work if you can. Getting clustering to
work properly in the face of network failures is hard. We use a
partition-aware tuple space for such things: Anubis [1,2].

2. Are you planning on saving state to the servlet context? It's usually
the simplest way, as the app server vendor will have solved some of the
problems.

3. Servlet state can be brittle against hot redeploy on a single machine 
if you update the implementations, and very brittle if you do a rolling 
cold redeploy across a cluster. Unless you can go offline, you need to 
do choreographed redeploys in which you partition the cluster and have 
the load balancer serve the old nodes until the new nodes are up and 
have it switched over.

4. Round robin hurts performance if requests in the same session
aren't biased towards the previous machine, if only for cache reasons
(including HDD and DB caches).

5. Round robin needs to use happyaxis.jsp or equivalent to decide where
to route requests. You cannot rely on the mere presence of a machine as a
liveness cue; you need to monitor the health of the operations.

6. Are your clusters going to be on the same site/network? What are the
minimum network requirements, with WLAN at one end and InfiniBand at
the other?

7. How are you going to stop system management scaling at O(nodes) or
worse? It can be worse unless your diagnostics are good at tracking down
which machine has a problem, believe me.

8. Testing all of this gets hard indeed. Sometimes we have to resort to 
mathematical proofs of correctness.
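
Points 4 and 5 can be combined in one routing rule. The following is only a
minimal sketch, not anything from the proposal: the class name StickyRouter,
the reportHealth method and the node names are all hypothetical, and it
assumes some separate probe (for example, one that fetches a health page such
as happyaxis.jsp) reports whether each node is actually serving requests. A
session hashes to a preferred node, and the router only moves on to the next
node when the preferred one has been reported unhealthy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: session-affine routing that skips unhealthy nodes. */
class StickyRouter {
    private final List<String> nodes;
    private final Map<String, Boolean> healthy = new ConcurrentHashMap<>();

    StickyRouter(List<String> nodes) {
        this.nodes = new ArrayList<>(nodes);
        for (String node : nodes) {
            healthy.put(node, true); // assume healthy until a probe says otherwise
        }
    }

    /** Called by an application-level health probe (assumed to exist elsewhere). */
    void reportHealth(String node, boolean ok) {
        healthy.put(node, ok);
    }

    /** Returns the preferred node for the session, or the next healthy one. */
    String route(String sessionId) {
        int n = nodes.size();
        if (n == 0) {
            throw new IllegalStateException("no nodes configured");
        }
        // floorMod keeps the index non-negative even for negative hash codes
        int start = Math.floorMod(sessionId.hashCode(), n);
        for (int i = 0; i < n; i++) {
            String candidate = nodes.get((start + i) % n);
            if (healthy.getOrDefault(candidate, false)) {
                return candidate;
            }
        }
        throw new IllegalStateException("no healthy nodes");
    }
}
```

The same session keeps landing on the same node while that node is healthy,
which preserves its caches, and it returns there once the node recovers. This
sketch does nothing about session state lost on failover; that is the part
clustering has to solve.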

Overall, you need to decide on your goals. Is it scalability or 
availability? Both can be done with clustering but you need good 
awareness of the problems before you can get it right. A High 
Availability system will be robust against transient network failures, 
and may or may not support rolling redeployment. More to the point, an 
underlying design that is not robust against network outages is very 
hard to fix, and stops you doing fun things like downsizing or upsizing 
the nodes based on demand, rerouting to different machines based on WS-A 
internals (*) and session info (i.e. per-customer and geographic selection).

The other thing is that achieving consistency of behaviour in your
distributed system is hard. Whoever implements it needs to be able to
argue about Lamport's papers on Byzantine generals, or Gray's
experiences, otherwise they haven't got the background needed to get it
right. My own skills in the area are limited, which is why I delegate.
But I do know why it's hard.

1. Anubis is OSS, on our sourceforge project, so you could use it, but
it is LGPL. While we are happy with you calling it from Apache code, I'm
not sure that Apache is. If we can come up with an
implementation-neutral API, we may be able to implement it and so you
could use it as your way of sharing state across a single-site cluster,
preferably one with a decent Ethernet behind it.

2. I would think that a back-end neutral SOAP/HTTP load balancer with
awareness of back end availability and able to route on WS-A information
is broadly useful to other SOAP stacks, including Axis1.x and XFire.
Maybe it should be a separate project with a JMX management API for live
configuration. And before you do it, look at what exists in terms of
HTTP load balancing in the rest of Apache. There's mod_proxy in Apache
HTTPD, and there's Tomcat's own rule-based load-balancer [3].

-Steve

[1] http://www.hpl.hp.com/techreports/2005/HPL-2005-72.html
[2] http://www.smartfrog.org/releasedocs/smartfrogdoc/anubis/AnubisUserGuide.pdf
[3] http://tomcat.apache.org/tomcat-5.5-doc/balancer-howto.html#Using%20the%20balancer%20webapp

(*) Load balancing is one reason I don't like WS-A; you need to parse the
doc to find the URL, unless the URL is the only thing you redirect on.
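
To make the cost in (*) concrete, here is a rough sketch of what a balancer
would have to do per request to route on the WS-A destination. It is an
illustration under assumptions, not part of any existing balancer: the class
WsaToExtractor and the method findTo are hypothetical names, and it assumes
the WS-Addressing 1.0 namespace (older drafts use different namespace URIs).
It uses the standard StAX API to stream through the envelope and stops at the
SOAP Body, so at least the payload is never parsed.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical sketch: pull the wsa:To address out of a SOAP envelope. */
class WsaToExtractor {
    // Assumption: WS-Addressing 1.0; earlier submissions use other namespaces.
    static final String WSA_NS = "http://www.w3.org/2005/08/addressing";
    static final String SOAP11_NS = "http://schemas.xmlsoap.org/soap/envelope/";
    static final String SOAP12_NS = "http://www.w3.org/2003/05/soap-envelope";

    /** Returns the wsa:To value, or null if no such header precedes the Body. */
    static String findTo(String soapXml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // no DTDs in SOAP
        try {
            XMLStreamReader reader =
                    factory.createXMLStreamReader(new StringReader(soapXml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                        continue;
                    }
                    String ns = reader.getNamespaceURI();
                    String name = reader.getLocalName();
                    if ("To".equals(name) && WSA_NS.equals(ns)) {
                        return reader.getElementText().trim();
                    }
                    // Headers come before the Body, so stop reading here.
                    if ("Body".equals(name)
                            && (SOAP11_NS.equals(ns) || SOAP12_NS.equals(ns))) {
                        return null;
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }
}
```

Even in this trimmed-down form the balancer has to read and tokenise the
request body before it can pick a back end, whereas routing on the HTTP
request URL needs only the request line.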
