Posted to dev@geronimo.apache.org by Freddi Gyara <fg...@gmail.com> on 2004/07/20 11:55:43 UTC

Re: sandbox/messaging - your feedbacks are welcome

A couple of suggestions:
a) Do we really need to have a specific topology? Since we are using
multicast, ALL the servers in the cluster would get the multicast
packets anyway.
The servers can then work out their "neighbours" through a simple
(pluggable?) sort algorithm.

b) The clustering code should be delineated from the rest of Geronimo
through a clear set of APIs.
Useful functions would include:
publishToNeighbours(obj): Publishes an object to the server's
neighbours in the cluster. The identification of neighbours,
marshalling and transport is left to the clustering layer to decide.
This would be useful, for example, to cluster HttpSessions.

publishGlobally(obj): Publishes an object to the entire cluster, e.g. a
clustered JNDI context.
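
As a rough sketch (ClusterPublisher and its methods are only
illustrative names, not an existing Geronimo API), the contract could
be as small as:

    import java.io.Serializable;

    /**
     * Hypothetical facade between Geronimo services and the clustering
     * layer; neighbour identification, marshalling and transport are
     * hidden behind it.
     */
    public interface ClusterPublisher {

        /** Publishes an object to this server's direct neighbours. */
        void publishToNeighbours(Serializable obj);

        /** Publishes an object to every server in the cluster. */
        void publishGlobally(Serializable obj);
    }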



On Tue, 20 Jul 2004 14:07:02 +1000, Gianny Damour
<gi...@optusnet.com.au> wrote:
> Hello,
> 
> I am working on a prototype, sandbox/messaging, focused on providing the
> infrastructure for the implementation of clustered applications. This
> prototype has reached a stage which is, in my opinion, "good enough" for
> review.
> 
> I will try to describe here the main features of this infrastructure;
> hence, this memo will be a little bit long.
> 
> Its core ideas are:
> - to provide a mechanism to cluster/inter-connect N Geronimo servers.
> The way these servers are inter-connected should be at the same time
> manageable (e.g. I want this server to be connected to this one) and to
> some extent automatic (e.g. when a new server is detected, it should be
> added automatically to the cluster); and
> - to provide a set of base services built on top of the above
> infrastructure to simplify the implementation of clustered applications
> (e.g. creation of proxies for services running on a remote Geronimo server).
> 
> Let's talk in more detail about the way Geronimo servers are clustered.
> The implementation achieves this goal by organizing servers in a known
> and configurable topology, e.g. star, ring, hyper-cube, where the edges
> of the associated graph represent connections. At the very beginning, a
> server and two heartbeat services, namely a heartbeat sender and a
> heartbeat monitor, are started. The heartbeat sender periodically sends
> a heartbeat consisting of the meta-data (IP address, port and name) of
> its associated server to a multicast group. The heartbeat monitor
> monitors these heartbeats and detects the availability or failure of
> servers. When a new server is available or a failure is detected, a new
> topology is computed and cascaded to the servers of the current topology.
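> 
> To give an idea of the mechanism, the heartbeat sender boils down to
> something like the following sketch (the group, port and meta-data
> format are illustrative, not the exact proto code):
> 
>     import java.net.DatagramPacket;
>     import java.net.InetAddress;
>     import java.net.MulticastSocket;
> 
>     // Simplified heartbeat sender: periodically multicasts the
>     // meta-data (here an "ip:port:name" string) of its server.
>     public class HeartbeatSender implements Runnable {
>         private static final String GROUP = "230.0.0.1"; // illustrative
>         private static final int PORT = 8765;            // illustrative
> 
>         private final byte[] metaData;
> 
>         public HeartbeatSender(String ip, int port, String name) {
>             metaData = (ip + ":" + port + ":" + name).getBytes();
>         }
> 
>         public void run() {
>             try {
>                 MulticastSocket socket = new MulticastSocket();
>                 InetAddress group = InetAddress.getByName(GROUP);
>                 DatagramPacket packet = new DatagramPacket(
>                     metaData, metaData.length, group, PORT);
>                 while (true) {
>                     socket.send(packet);
>                     Thread.sleep(1000); // heartbeat period
>                 }
>             } catch (Exception e) {
>                 e.printStackTrace();
>             }
>         }
>     }
> 
> The heartbeat monitor is the mirror image: it joins the same group,
> receives these packets and flags a server as failed when its
> heartbeats stop arriving.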
> 
> Let's consider the following scenario:
> Geronimo servers are organized in a ring topology; four servers are
> started and one server is killed.
> 
> 1. starts the first server, namely LeaderNode. As it is the first
> server, it is in a stand-alone mode;
> 2. starts the second server, namely Node1. This server is detected by
> LeaderNode, which triggers a reconfiguration. The topology is LeaderNode
> -- Node1 -- LeaderNode;
> 3. starts the third server, namely Node2. LeaderNode inserts Node2
> between itself and Node1. The topology is LeaderNode -- Node1 -- Node2
> -- LeaderNode;
> 4. starts a fourth server, namely Node3. LeaderNode detects it and
> inserts Node3 between itself and Node2. The topology is LeaderNode --
> Node1 -- Node2 -- Node3 -- LeaderNode; and
> 5. stops Node2. LeaderNode drops it from the ring. The topology is
> LeaderNode -- Node1 -- Node3 -- LeaderNode.
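> 
> The reconfiguration itself is simple list manipulation. A rough sketch
> of what LeaderNode does (illustrative code, not the actual proto
> classes):
> 
>     import java.util.ArrayList;
>     import java.util.List;
> 
>     // Simplified leader-driven reconfiguration; entry 0 is LeaderNode
>     // itself and the list order defines the ring.
>     public class RingReconfigurer {
>         private final List ring = new ArrayList();
> 
>         // Appending places the new node between the last added node
>         // and the leader, exactly as in steps 2 to 4 above.
>         public void onNodeDetected(Object nodeInfo) {
>             ring.add(nodeInfo);
>             cascade();
>         }
> 
>         // Removing a node reconnects its former neighbours (step 5).
>         public void onNodeFailed(Object nodeInfo) {
>             ring.remove(nodeInfo);
>             cascade();
>         }
> 
>         private void cascade() {
>             // Compute the new neighbour pair of each node from the
>             // list order and push it to that node.
>         }
>     }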
> 
> As the proto supports the ring topology, it is possible to try this
> scenario out:
> cd sandbox/messaging
> maven (ClusterHBTest may fail, so ignore the test failures if required)
> maven -patch
> cd ../..
> java -jar target/bin/server.jar org/apache/geronimo/LeaderCluster
> java -jar target/bin/server-1101.jar org/apache/geronimo/Cluster8091
> java -jar target/bin/server-1102.jar org/apache/geronimo/Cluster8092
> java -jar target/bin/server-1103.jar org/apache/geronimo/Cluster8093
> kill <the process java -jar target/bin/server-1102.jar
> org/apache/geronimo/Cluster8092>
> 
> As a conclusion, this prototype tries to federate Geronimo servers in
> specific topologies. As an aside, it is rather simple to support other
> kinds of topologies without significant effort. For instance, one of
> the JUnit tests (NodeImplTest) uses a bus topology.
> 
> Based on the knowledge of the enforced topology, it should be possible
> to implement "efficient" clustered applications. For instance, the
> replication of Web sessions could work as follows: replicate the
> sessions created on this server to all of its direct neighbours
> (neighbours can be easily retrieved via the topology). This way the
> load is evenly distributed as long as sessions are evenly created
> across the cluster.
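> 
> In code, this replication strategy could be as simple as the following
> sketch (Topology and its getNeighbours method are a hypothetical
> minimal interface here, not the actual proto classes):
> 
>     import java.io.Serializable;
>     import java.util.Iterator;
>     import java.util.Set;
> 
>     // Sketch: replicates a session to the direct neighbours of the
>     // local node, as provided by the topology.
>     public class NeighbourReplicater {
> 
>         // Hypothetical minimal view of the enforced topology.
>         public interface Topology {
>             Set getNeighbours(Object localNode); // node meta-data
>         }
> 
>         private final Topology topology;
>         private final Object localNode; // meta-data of this server
> 
>         public NeighbourReplicater(Topology topology, Object localNode) {
>             this.topology = topology;
>             this.localNode = localNode;
>         }
> 
>         public void replicate(Serializable session) {
>             Set neighbours = topology.getNeighbours(localNode);
>             for (Iterator it = neighbours.iterator(); it.hasNext();) {
>                 Object neighbour = it.next();
>                 // Marshal the session and push it to neighbour over
>                 // the TCP connection backing this topology edge.
>             }
>         }
>     }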
> 
> On top of this infrastructure, the proto implements a set of basic
> services, which could simplify the implementation of such clustered
> applications. These services are:
> - customization of the marshalling/unmarshalling of Objects to be
> sent/received to/from a remote server: it is possible to replace
> specific objects;
> - InputStreams can be passed between servers: by leveraging the previous
> feature, InputStreams are replaced by a proxy which can be used to pull
> the content of an InputStream hosted on a remote server. This can be
> useful when dumping the content of a server to another server in order
> to initialize its state;
> - primitive reference layer: Objects implementing a specific interface
> can be passed around even if not serializable. For instance, the current
> implementation can pass around an MBeanServer (this is a bad example as
> JSR 160 is intended for that). If you have a look at
> MBeanServerEndPointImpl, you will see that this is actually the ability
> to return an object by reference to the remote caller. As this caller
> can also provide parameters which implement this specific interface,
> one can achieve pass-by-reference for both the parameters and the
> result between two servers;
> - proxy creation: it is the ability to acquire a proxy for a service
> running on a remote server:
> // Defines the proxy meta-data.
> EndPointProxyInfo proxyInfo = new EndPointProxyInfo(
>     NodeEndPointView.NODE_ID, NodeEndPointView.class, nodeInfo);
> // Builds the proxy.
> NodeEndPointView topologyEndPoint =
>     (NodeEndPointView) endPointProxyFactory.factory(proxyInfo);
> // Transforms the Msgs which will be sent by this proxy.
> ((EndPointProxy) topologyEndPoint).setTransformer(
>     new MsgTransformer() {...});
> // This call actually invokes the service on the server nodeInfo.
> topologyEndPoint.prepareTopology(aTopology);
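> 
> To make the first two features more concrete, here is roughly how such
> an object replacement can be done with the standard java.io hooks (a
> sketch; the stream registration and InputStreamProxy are stand-ins,
> not the actual proto classes):
> 
>     import java.io.IOException;
>     import java.io.InputStream;
>     import java.io.ObjectOutputStream;
>     import java.io.OutputStream;
>     import java.io.Serializable;
> 
>     // Sketch: swaps specific objects during marshalling. Here any
>     // InputStream is replaced by a serializable proxy which the
>     // remote server can use to pull the stream content back.
>     public class ReplacerObjectOutputStream extends ObjectOutputStream {
> 
>         public ReplacerObjectOutputStream(OutputStream out)
>                 throws IOException {
>             super(out);
>             enableReplaceObject(true); // activates replaceObject()
>         }
> 
>         protected Object replaceObject(Object obj) throws IOException {
>             if (obj instanceof InputStream) {
>                 return new InputStreamProxy(register((InputStream) obj));
>             }
>             return obj;
>         }
> 
>         // Stand-in: registers the stream locally and returns an id
>         // the remote side can use to pull its content.
>         private int register(InputStream in) {
>             return 0; // placeholder
>         }
> 
>         // Stand-in serializable proxy sent instead of the stream.
>         public static class InputStreamProxy implements Serializable {
>             private final int streamId;
>             public InputStreamProxy(int streamId) {
>                 this.streamId = streamId;
>             }
>         }
>     }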
> 
> As an aside, whatever the number of services communicating with other
> remote services, the number of connections stays low: it is the number
> of edges defined by the current topology.
> 
> This proto has some bugs (e.g. a memory leak in the reference layer) and
> some enhancements are required (e.g. a classloading strategy is to be
> added). Nevertheless, I would like to have your input on the general
> concept and the current state of the implementation before progressing
> any further.
> 
> Cheers,
> Gianny

Re: sandbox/messaging - your feedbacks are welcome

Posted by Gianny Damour <gi...@optusnet.com.au>.
On 20/07/2004 11:29 PM, Freddi Gyara wrote:

>a) Users should have the choice of not having to specify one (default
>topology). Otherwise the complexity of setting up the cluster could be
>too onerous.
Actually, I have implemented the ring topology to avoid having to create
a custom topology. Whatever the number of servers, the implementation
is autonomous and will always build a ring. BTW, it is possible to
implement algorithms to build topologies that humans would have a lot
of pain building themselves. For instance, building a hypercube of
dimension 4 and providing the shortest paths between two nodes of such
a graph is rather time-consuming and error-prone, at least for a human :).

>A list of "preferred neighbours" - that may be a simpler configuration
>option vs. the definition of a topology.
I will make this happen.

Thanks,
Gianny

Re: sandbox/messaging - your feedbacks are welcome

Posted by Freddi Gyara <fg...@gmail.com>.
<snip>
>> As a matter of fact, this proto uses multicast (UDP) only for its
>> heartbeat mechanism and unicast (TCP) for all the other activities.
</snip>

I think I was a bit vague. No doubt, TCP is required for all p2p
communications. However, when a server has a (sorted) list of live
servers (LeaderNode-Node1-Node2 as in your example), it automatically
knows its neighbours (Node1's neighbours are LeaderNode and Node2) and
can communicate with them over TCP.
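
Something along these lines (illustrative only):

    import java.util.Collections;
    import java.util.List;

    // Sketch: each server derives its own ring neighbours from the
    // sorted list of live servers, with no central coordinator.
    public class NeighbourResolver {
        public static String[] neighboursOf(String self, List liveServers) {
            Collections.sort(liveServers); // same order on every server
            int i = liveServers.indexOf(self);
            int n = liveServers.size();
            return new String[] {
                (String) liveServers.get((i - 1 + n) % n), // left
                (String) liveServers.get((i + 1) % n)      // right
            };
        }
    }

As long as every server sees the same membership list, every server
computes the same ring without exchanging any extra messages.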

Your idea of being able to create a bespoke topology is also good, but:
a) Users should have the choice of not having to specify one (default
topology). Otherwise the complexity of setting up the cluster could be
too onerous.

b) Untrained users may end up creating a topology that is not
resilient to failures,
e.g. consider a star with LeaderNode as the center and nodes N1, N2, N3
as spokes. (If LeaderNode fails, which node becomes the center? Who
decides? Is this configurable? etc.) Ideally the system should be able
to identify topologies that have single points of failure.
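
This is a textbook check, by the way: a node is a single point of
failure exactly when it is an articulation point of the topology graph,
which a depth-first search finds in linear time. A sketch, assuming an
adjacency-matrix view of a connected topology:

    // Sketch: flags nodes whose failure disconnects the topology,
    // i.e. the articulation points of the graph.
    public class SpofDetector {
        private final boolean[][] adj;  // adjacency matrix
        private final int[] depth, low;
        private final boolean[] visited, spof;
        private int time;

        public SpofDetector(boolean[][] adj) {
            int n = adj.length;
            this.adj = adj;
            depth = new int[n];
            low = new int[n];
            visited = new boolean[n];
            spof = new boolean[n];
        }

        public boolean[] findSpofs() {
            dfs(0, -1);
            return spof;
        }

        private void dfs(int u, int parent) {
            visited[u] = true;
            depth[u] = low[u] = time++;
            int children = 0;
            for (int v = 0; v < adj.length; v++) {
                if (!adj[u][v] || v == parent) continue;
                if (visited[v]) {
                    low[u] = Math.min(low[u], depth[v]);
                } else {
                    children++;
                    dfs(v, u);
                    low[u] = Math.min(low[u], low[v]);
                    if (parent != -1 && low[v] >= depth[u]) spof[u] = true;
                }
            }
            if (parent == -1 && children > 1) spof[u] = true;
        }
    }

For the star above, only the center gets flagged; for a ring, no node
does.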



> This is a great idea. Actually, I had this thought a while back: for
> each server, one configures an ordered list of servers. When a server
> is started, the first N available servers are considered as its
> "neighbours".
A list of "preferred neighbours" - that may be a simpler configuration
option vs. the definition of a topology.

Re: sandbox/messaging - your feedbacks are welcome

Posted by Gianny Damour <gi...@optusnet.com.au>.
On 20/07/2004 7:55 PM, Freddi Gyara wrote:

>A couple of suggestions:
>a) Do we really need to have a specific topology? Since we are using
>multicast, ALL the servers in the cluster would get the multicast
>packets anyway.
You are right: if information is broadcast to all the servers, then
one does not need a topology. However, why would we want to send the
information to all the servers if we only want to send it to a given
set of servers?
For instance, in the case of a replication service, the state could be
replicated to all the servers or only to a specific set. In the latter
case, as far as I understand, either these packets are unnecessarily
received by a couple of servers, or one sends these packets to a
specific multicast group used only by the servers that we would like
to communicate with.
As a matter of fact, this proto uses multicast (UDP) only for its 
heartbeat mechanism and unicast (TCP) for all the other activities.
I think that multicast is useful only for highly redundant data
distribution. I am not convinced that the replication of HTTP sessions
needs to be highly redundant. Having said that, I am convinced that
multicast is the way to go in order to maintain a cluster-wide cache.

So, this proto uses the unicast model for peer-to-peer communications.
It also imposes a specific topology on the servers in order to give full
control over the way servers are inter-connected. If one wants this
server to be a direct neighbour (in other words a "preferred" remote
server) of this other server, then one can achieve that.

>The servers can then work out their "neighbours" through a simple
>(pluggable?) sort algorithm.
This is a great idea. Actually, I had this thought a while back: for
each server, one configures an ordered list of servers. When a server
is started, the first N available servers are considered as its
"neighbours".

>b) The clustering code should be delineated from the rest of Geronimo
>through a clear set of APIs.
>Useful functions would include:
>publishToNeighbours(obj): Publishes an object to the server's
>neighbours in the cluster. The identification of neighbours,
>marshalling and transport is left to the clustering layer to decide.
>This would be useful, for example, to cluster HttpSessions.
>
>publishGlobally(obj): Publishes an object to the entire cluster, e.g. a
>clustered JNDI context.
I agree. The proto allows more or less the same kind of operations from
a semantic point of view: it supports the distribution of objects to a
specified set of servers. For instance, it is possible to create a proxy
for a service running on N remote servers. When an operation is called
on this proxy, the invocation is sent to these N remote servers and
executed. The N results are returned to the proxy, which consolidates
them and returns an "aggregated" result to the caller. You can have a
look at NodeImplTest.testMulticast for such an example.
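
In essence, the proxy performs a scatter-gather like this (a simplified
illustration of the idea, not the actual proxy code):

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;

    // Sketch: send the invocation to N servers, collect the N results
    // and let an aggregator consolidate them into a single result.
    public class MultiTargetInvoker {

        // Placeholder for the consolidation strategy of the proxy.
        public interface Aggregator {
            Object aggregate(List results);
        }

        public static Object invokeOnAll(List targets, Object invocation,
                Aggregator aggregator) {
            List results = new ArrayList();
            for (Iterator it = targets.iterator(); it.hasNext();) {
                Object target = it.next();
                // send() stands for the unicast call performed over
                // the relevant topology edge.
                results.add(send(target, invocation));
            }
            return aggregator.aggregate(results);
        }

        private static Object send(Object target, Object invocation) {
            return null; // placeholder for the actual remote call
        }
    }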

Thanks,
Gianny