Posted to dev@geronimo.apache.org by Gianny Damour <gi...@optusnet.com.au> on 2004/07/20 06:07:02 UTC

sandbox/messaging - your feedbacks are welcome

Hello,


I am working on a prototype, sandbox/messaging, focused on providing the 
infrastructure for the implementation of clustered applications. This 
prototype has reached a stage which is, in my view, "good enough" for 
judgment.

I will try to describe here the main features of this infrastructure; 
hence, this memo will be a little bit long.

Its core ideas are:
- to provide a mechanism to cluster/inter-connect N Geronimo servers. 
The way these servers are inter-connected should be at the same time 
manageable (e.g. I want this server to be connected to this one) and to 
some extent automatic (e.g. when a new server is detected, it should be 
added automatically to the cluster); and
- to provide a set of base services built on top of the above 
infrastructure to simplify the implementation of clustered applications 
(e.g. creation of proxies for services running on remote Geronimo servers).

Let's talk in more detail about the way Geronimo servers are clustered. 
The implementation achieves this goal by organizing servers in a known 
and configurable topology, e.g. star, ring, hyper-cube, where the edges 
of the associated graph represent connections. At the very beginning, a 
server and two heartbeat services, namely a heartbeat sender and a 
heartbeat monitor, are started. The heartbeat sender periodically sends 
a heartbeat consisting of the meta-data (IP address, port and name) of 
its associated server to a multicast group. The heartbeat monitor 
monitors these heartbeats and detects the availability or failure of 
servers. When a new server is available or a failure is detected, a new 
topology is computed and cascaded to the servers of the current topology.
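
For illustration, here is a minimal sketch of what such a heartbeat 
sender could look like. The class is hypothetical (not the proto's 
actual code) and the payload encoding and period are assumptions:

    import java.net.DatagramPacket;
    import java.net.InetAddress;
    import java.net.MulticastSocket;

    // Hypothetical sketch: periodically multicasts the meta-data
    // (IP address, port and name) of its associated server.
    public class HeartbeatSenderSketch implements Runnable {
        private final InetAddress group;
        private final int port;
        private final byte[] metaData; // serialized server meta-data

        public HeartbeatSenderSketch(InetAddress group, int port, byte[] metaData) {
            this.group = group;
            this.port = port;
            this.metaData = metaData;
        }

        public void run() {
            try {
                MulticastSocket socket = new MulticastSocket();
                while (!Thread.currentThread().isInterrupted()) {
                    socket.send(new DatagramPacket(metaData, metaData.length, group, port));
                    Thread.sleep(2000); // heartbeat period is an assumption
                }
            } catch (Exception e) {
                // a real sender would log and keep trying
            }
        }
    }

The monitor side would join the same multicast group, track the last 
heartbeat seen per server, and declare a failure when a server stays 
silent for longer than a few periods.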

Let's consider the following scenario:
Geronimo servers are organized in a ring topology; four servers are 
started and one server is killed.

1. Start the first server, namely LeaderNode. As it is the first 
server, it is in stand-alone mode;
2. Start the second server, namely Node1. This server is detected by 
LeaderNode, which triggers a reconfiguration. The topology is LeaderNode 
-- Node1 -- LeaderNode;
3. Start the third server, namely Node2. LeaderNode inserts Node2 
between itself and Node1. The topology is LeaderNode -- Node1 -- Node2 
-- LeaderNode;
4. Start a fourth server, namely Node3. It is detected by LeaderNode, 
which inserts Node3 between itself and Node2. The topology is LeaderNode 
-- Node1 -- Node2 -- Node3 -- LeaderNode; and
5. Stop Node2. LeaderNode drops it from the ring. The topology is 
LeaderNode -- Node1 -- Node3 -- LeaderNode.
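
For illustration only (the proto's actual topology classes may differ), 
the ring maintenance performed by LeaderNode in the scenario above 
boils down to simple list manipulation:

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch: the members list represents the ring starting
    // at the leader; joins go between the last member and the leader,
    // which reproduces the insertions of the scenario above.
    public class RingTopologySketch {
        private final List members = new ArrayList();

        public RingTopologySketch(String leader) {
            members.add(leader);
        }

        public void join(String node) {
            members.add(node); // [LeaderNode, Node1] + Node2 -> LeaderNode -- Node1 -- Node2 -- LeaderNode
        }

        public void leave(String node) {
            members.remove(node); // the neighbours of the dropped node close the gap
        }

        // the direct neighbours of a node are its predecessor and successor
        public String[] neighbours(String node) {
            int i = members.indexOf(node);
            int n = members.size();
            return new String[] { (String) members.get((i + n - 1) % n),
                                  (String) members.get((i + 1) % n) };
        }
    }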

As the proto supports the ring topology, it is possible to trial this 
scenario:
cd sandbox/messaging
maven (ClusterHBTest may fail, so ignore the test failures if required)
maven -patch
cd ../..
java -jar target/bin/server.jar org/apache/geronimo/LeaderCluster
java -jar target/bin/server-1101.jar org/apache/geronimo/Cluster8091
java -jar target/bin/server-1102.jar org/apache/geronimo/Cluster8092
java -jar target/bin/server-1103.jar org/apache/geronimo/Cluster8093
kill <the process java -jar target/bin/server-1102.jar 
org/apache/geronimo/Cluster8092>

As a conclusion, this prototype tries to federate Geronimo servers in 
specific topologies. As an aside, it is rather simple to support other 
kinds of topologies without significant effort. For instance, one of 
the JUnit tests (NodeImplTest) uses a bus topology.

Based on the knowledge of the enforced topology, it should be possible 
to implement "efficient" clustered applications. For instance, the 
replication of Web sessions could work as follows: replicate the sessions 
created on this server to all of its direct neighbours (neighbours can 
be easily retrieved via a topology). This way the load is evenly 
distributed as long as sessions are evenly created in the cluster.
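
A minimal sketch of that replication strategy, assuming a topology 
object able to answer the neighbour query (the names are hypothetical 
and the transport is left out):

    import java.util.List;

    // Hypothetical sketch: push a newly created session to the direct
    // neighbours the topology reports for the local node.
    public class NeighbourReplicationSketch {
        public interface Topology {
            List getNeighbours(String node); // list of neighbour node names
        }

        public void replicate(Topology topology, String localNode, byte[] session) {
            List neighbours = topology.getNeighbours(localNode);
            for (int i = 0; i < neighbours.size(); i++) {
                send((String) neighbours.get(i), session);
            }
        }

        private void send(String node, byte[] session) {
            // transport omitted: the proto would use the unicast (TCP)
            // connection represented by the topology edge to this neighbour
        }
    }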

On top of this infrastructure, the proto implements a set of basic 
services which could simplify the implementation of such clustered 
applications. These services are:
- customization of the marshalling/unmarshalling of Objects to be 
sent/received to/from a remote server: it is possible to replace 
specific objects (a sketch of this hook follows the proxy example below);
- InputStreams can be passed between servers: by leveraging the previous 
feature, InputStreams are replaced by a proxy which can be used to pull 
the content of an InputStream hosted on a remote server. This can be 
useful when dumping the content of a server to another server in order 
to initialize its state;
- a primitive reference layer: Objects implementing a specific interface 
can be passed around even if not serializable. For instance, the current 
implementation can pass around an MBeanServer (this is a bad example as 
JSR 160 is intended for that). If you have a look at 
MBeanServerEndPointImpl, you will see that this is actually the ability 
to return an object by reference to the remote caller. As this caller 
can also provide parameters which implement this specific interface, 
one can achieve pass-by-reference for both the parameters and the 
result between two servers;
- proxy creation: the ability to acquire a proxy for a service running 
on a remote server:
    // Defines the proxy meta-data.
    EndPointProxyInfo proxyInfo = new EndPointProxyInfo(
        NodeEndPointView.NODE_ID, NodeEndPointView.class, nodeInfo);
    // Builds the proxy.
    NodeEndPointView topologyEndPoint =
        (NodeEndPointView) endPointProxyFactory.factory(proxyInfo);
    // Transforms the Msgs which will be sent by this proxy.
    ((EndPointProxy) topologyEndPoint).setTransformer(new MsgTransformer() {...});
    // This call will actually invoke the service on the server nodeInfo.
    topologyEndPoint.prepareTopology(aTopology);
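
To make the object-replacement idea above concrete, here is a hedged 
sketch using the standard java.io hook; the proto's actual mechanism 
and the RemoteStreamRef descriptor are assumptions of this sketch:

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.ObjectOutputStream;
    import java.io.OutputStream;
    import java.io.Serializable;

    // Hypothetical sketch: during marshalling, InputStreams are swapped
    // for a serializable reference that the remote side can turn into a
    // proxy which pulls the stream's content on demand.
    public class ReplacingObjectOutputStream extends ObjectOutputStream {
        public ReplacingObjectOutputStream(OutputStream out) throws IOException {
            super(out);
            enableReplaceObject(true); // allow replaceObject to be consulted
        }

        protected Object replaceObject(Object obj) throws IOException {
            if (obj instanceof InputStream) {
                return new RemoteStreamRef(register((InputStream) obj));
            }
            return obj;
        }

        private int register(InputStream in) {
            return System.identityHashCode(in); // placeholder registry
        }

        private static class RemoteStreamRef implements Serializable {
            final int id;
            RemoteStreamRef(int id) { this.id = id; }
        }
    }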

As an aside, whatever the number of services communicating with other 
remote services, the number of connections stays low: it is the number of 
edges defined by the current topology.

This proto has some bugs (e.g. a memory leak in the reference layer) and 
some enhancements are required (e.g. a classloading strategy is to be 
added). Nevertheless, I would like to have your input on the general 
concept and the current state of the implementation before progressing 
any further.

Cheers,
Gianny


Re: sandbox/messaging - your feedbacks are welcome

Posted by Jules Gosnell <ju...@coredevelopers.net>.
Gianny,

Your pluggable topology stuff is duplicating work that is going on in 
WADI (wadi.codehaus.org). WADI is being built top-down from 
Tomcat/Jetty, and we already have a full, albeit not yet replicating, 
session manager for both platforms. Discussion has also been taking 
place between WADI and OpenEJB as to how best to share code for common 
session management requirements.

Is there some way that we could put our heads together and come up with 
a workable solution that satisfies both our criteria and allows us to 
work together, rather than apart? I would welcome input on WADI's 
topology management as I am sure James would on ActiveCluster...

Briefly :

WADI sits on top of ActiveCluster, which sits on top of ActiveMQ.

ActiveCluster is responsible for raising join/leave notifications 
concerning cluster membership.

WADI uses these notifications to recalculate the division of the set of 
peers into cells (subsets of peers) of configurable size.

I currently have Ring and NChooseK schemes up and running.

Each peer will carry a stack of weighting factors, which will be 
combined in each cell.

Each client requiring the storage of state will address a facade for the 
cells, which will divide the state amongst them according to weighting 
factors. Various algorithms for the movement of state on the occurrence 
of client or cell-member death have been figured out. We are looking at 
solutions for surviving network splits.
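
A rough sketch of the cell idea, not WADI's actual code:

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch: divide the current set of peers into cells
    // (subsets of peers) of a configurable size.
    public class CellPartitionSketch {
        public static List partition(List peers, int cellSize) {
            List cells = new ArrayList();
            for (int i = 0; i < peers.size(); i += cellSize) {
                cells.add(new ArrayList(peers.subList(i, Math.min(i + cellSize, peers.size()))));
            }
            return cells;
        }
    }

On a join/leave notification the partition would simply be recalculated 
from the new peer set.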

There is a thread on TSS at the moment that may be of interest 
(http://www.theserverside.com/news/thread.tss?thread_id=26705) and 
various documentation at the WADI site and in CVS (Could be better).

WADI is an external project, since it is already useful to standalone 
Jetty and Tomcat users, but now that we have agreed to work with OpenEJB 
to share common code, it will probably be pushed back into the Geronimo 
repository in order to underpin both projects.

Please take a look at WADI, ActiveCluster, ActiveMQ etc and let us know 
what you think...

Cheers,


Jules



jastrachan@mac.com wrote:

> <snip: full message quoted verbatim; it appears in full later in this
> thread>


-- 
/*************************************
 * Jules Gosnell
 * Partner
 * Core Developers Network (Europe)
 * http://www.coredevelopers.net
 *************************************/



Re: sandbox/messaging - your feedbacks are welcome

Posted by ja...@mac.com.
On 21 Jul 2004, at 12:22, Gianny Damour wrote:
> On 21/07/2004 5:28 PM, jastrachan@mac.com wrote:
>
>> For clustering, we've been working quite heavily for some time on 
>> this abstraction...
>>
>>     http://activecluster.codehaus.org/
>>
>> (Note that ActiveCluster is not Geronimo specific and so can be used 
>> to build clusters or anything).
>>
>> The current implementation works on top of any JMS provider, such as 
>> ActiveMQ, which can work over UDP, multicast, TCP, SSL, g-network, 
>> JGroups, JXTA etc.
>
> As far as I know, ActiveCluster does not provide high-availability on 
> top of any JMS server. JMS servers are not designed equally from a 
> high-availability point of view. Some of them are highly available and 
> others aren't. For instance, if the JMS server used by ActiveCluster 
> fails, then it is the cluster as a whole which is down.

Surely it's up to the JMS provider to provide HA - not the clustering 
abstraction you're using to write cluster-based protocols?

FWIW we're using ActiveCluster with ActiveMQ to make HA JMS brokers. 
ActiveCluster doesn't need to be HA - it's a tool for building HA :)



> So, ActiveCluster does not turn any JMS provider into a 
> highly-available JMS provider, and if the JMS provider fails, then the 
> cluster as a whole fails.

Yes. You've gotta have a single point of failure somewhere and the 
ActiveCluster *provider*, which may or may not be a JMS provider, is 
meant to handle this.

But you typically don't need HA for ActiveCluster as it's mostly a 
discovery & lightweight state-replication protocol. It's there to 
discover nodes, not be a JMS server. HA protocols are developed behind 
different APIs (JMS, RMI, JNDI, JDBC etc).


>> Jules has been working hard on distributed session state and handling 
>> fail-over gracefully and cluster wide topology organisation protocols 
>> such as for arranging buddies over subnets / DR zones and the like 
>> using WADI
>>
>>     http://wadi.codehaus.org/
>>
>> which is using ActiveCluster and Jules is starting to put together 
>> various algorithms for choosing buddies, pairs, sub-nets, controllers 
>> and the like.
>>
>> Notice the simpler API for ActiveCluster which just reuses a few 
>> interfaces from JMS.
>>
>> It seems your new messaging.cluster API is pretty similar to 
>> ActiveCluster. Any ideas why you didn't just use ActiveCluster? 
>> (Especially as I mentioned it to you quite a while ago :)
>
> I know. Actually, I was so close to reaching a presentable state of 
> g-messaging that I have done the last step: at this very moment, N 
> servers can be started and the proto will auto-discover and syndicate 
> them. There is not a single point of failure. Actually this is wrong, 
> as a "migratable" service is executed by a server and not migrated to 
> another node upon failure. This is a standard feature that should be 
> provided out-of-the-box: the ability to run a service only once in a 
> cluster and migrate it on demand/failure to another node.

And for some time ActiveCluster has been implemented using UDP / 
multicast, g-messaging or jgroups with no single point of failure 
too...



>> Also, as I said to you a while ago, I don't see why the messaging 
>> package doesn't use the JMS API for things like Msg / MsgBody / 
>> MsgConsumer / MsgProducer and so forth. Not only would this mean your 
>> API would become more J2EE standard, it'd mean you could reuse heaps 
>> of open source and commercial implementations.
>
> At the very beginning, I was really seduced by this idea. On second 
> thought, I prefer this simplified API.

Msg / MsgBody / MsgHeader is simpler than javax.jms.Message?


> No cumbersome JMSException at each and every simple call on a JMS 
> Message. One can see the various MsgX as wrappers around the JMS API. 
> Actually, any open source or vendor-provided JMS implementation could 
> be re-used: the org.apache.geronimo.messaging.remotenode package is 
> designed to hook in other transports such as JMS. As an aside, we are 
> talking here about only 5 classes in a code base counting a little 
> more than 100 classes.

Hmm. A whole new API to avoid a checked exception? Given that we're 
talking about messaging/clustering code here, where exceptions can be 
thrown at any time, I'm not sure hiding exceptions is such a great 
idea. (Sure, at the POJO layer this can be handy, but this is lower-level 
communications code that needs to be aware that lots of things can 
fail.)

> Having said all that, I am still having a serious look at 
> ActiveCluster. However, I do not see how its current implementation 
> avoids a cluster-wide failure in case of a JMS server failure.

ActiveCluster doesn't need to use a JMS server - indeed we recommend 
you don't use a JMS server to implement it. We use it today using UDP / 
multicast / g-messaging / jgroups without a JMS server or single point 
of failure. If you had an HA JMS provider you could use that too. Or you 
could write a native ActiveCluster implementation without using a JMS 
provider at all.

James
-------
http://radio.weblogs.com/0112098/


Re: sandbox/messaging - your feedbacks are welcome

Posted by Gianny Damour <gi...@optusnet.com.au>.
On 21/07/2004 5:28 PM, jastrachan@mac.com wrote:

> For clustering, we've been working quite heavily for some time on this 
> abstraction...
>
>     http://activecluster.codehaus.org/
>
> (Note that ActiveCluster is not Geronimo specific and so can be used 
> to build clusters or anything).
>
> The current implementation works on top of any JMS provider, such as 
> ActiveMQ, which can work over UDP, multicast, TCP, SSL, g-network, 
> JGroups, JXTA etc.

As far as I know, ActiveCluster does not provide high-availability on 
top of any JMS server. JMS servers are not designed equally from a 
high-availability point of view. Some of them are highly available and 
others aren't. For instance, if the JMS server used by ActiveCluster 
fails, then it is the cluster as a whole which is down.
If ActiveCluster shields clients from this problem by using a 
ConnectionFactory which, under the covers, uses N ConnectionFactories to 
N distinct JMS servers, or if it has the ability to detect the failure 
of a JMS server and migrate it to another node, then yes, it supports 
high-availability.
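
A minimal sketch of the facade I have in mind, assuming JMS 1.1 and a 
hypothetical class name:

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.JMSException;

    // Hypothetical sketch: try N factories, each pointing at a distinct
    // JMS server, until one of them yields a connection.
    public class FailoverConnectionFactorySketch implements ConnectionFactory {
        private final ConnectionFactory[] delegates;

        public FailoverConnectionFactorySketch(ConnectionFactory[] delegates) {
            this.delegates = delegates;
        }

        public Connection createConnection() throws JMSException {
            return createConnection(null, null);
        }

        public Connection createConnection(String user, String password) throws JMSException {
            JMSException lastFailure = new JMSException("no JMS server available");
            for (int i = 0; i < delegates.length; i++) {
                try {
                    return user == null ? delegates[i].createConnection()
                                        : delegates[i].createConnection(user, password);
                } catch (JMSException e) {
                    lastFailure = e; // this server is down; try the next one
                }
            }
            throw lastFailure;
        }
    }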

So, ActiveCluster does not turn any JMS provider into a 
highly-available JMS provider, and if the JMS provider fails, then the 
cluster as a whole fails.

>
> Jules has been working hard on distributed session state and handling 
> fail-over gracefully and cluster wide topology organisation protocols 
> such as for arranging buddies over subnets / DR zones and the like 
> using WADI
>
>     http://wadi.codehaus.org/
>
> which is using ActiveCluster and Jules is starting to put together 
> various algorithms for choosing buddies, pairs, sub-nets, controllers 
> and the like.
>
> Notice the simpler API for ActiveCluster which just reuses a few 
> interfaces from JMS.
>
> It seems your new messaging.cluster API is pretty similar to 
> ActiveCluster. Any ideas why you didn't just use ActiveCluster? 
> (Especially as I mentioned it to you quite a while ago :)

I know. Actually, I was so close to reaching a presentable state of 
g-messaging that I have done the last step: at this very moment, N 
servers can be started and the proto will auto-discover and syndicate 
them. There is not a single point of failure. Actually this is wrong, as 
a "migratable" service is executed by a server and not migrated to 
another node upon failure. This is a standard feature that should be 
provided out-of-the-box: the ability to run a service only once in a 
cluster and migrate it on demand/failure to another node.

>
> Also, as I said to you a while ago, I don't see why the messaging 
> package doesn't use the JMS API for things like Msg / MsgBody / 
> MsgConsumer / MsgProducer and so forth. Not only would this mean your 
> API would become more J2EE standard, it'd mean you could reuse heaps 
> of open source and commercial implementations.

At the very beginning, I was really seduced by this idea. On second 
thought, I prefer this simplified API. No cumbersome JMSException at 
each and every simple call on a JMS Message. One can see the various 
MsgX as wrappers around the JMS API. Actually, any open source or 
vendor-provided JMS implementation could be re-used: the 
org.apache.geronimo.messaging.remotenode package is designed to hook in 
other transports such as JMS. As an aside, we are talking here about 
only 5 classes in a code base counting a little more than 100 classes.

Having said all that, I am still having a serious look at ActiveCluster. 
However, I do not see how its current implementation avoids a 
cluster-wide failure in case of a JMS server failure.
 
Cheers,
Gianny

Re: sandbox/messaging - your feedbacks are welcome

Posted by ja...@mac.com.
For clustering, we've been working quite heavily for some time on this 
abstraction...

     http://activecluster.codehaus.org/

(Note that ActiveCluster is not Geronimo specific and so can be used to 
build clusters or anything).

The current implementation works on top of any JMS provider, such as 
ActiveMQ, which can work over UDP, multicast, TCP, SSL, g-network, 
JGroups, JXTA etc.

Jules has been working hard on distributed session state and handling 
fail-over gracefully and cluster wide topology organisation protocols 
such as for arranging buddies over subnets / DR zones and the like 
using WADI

     http://wadi.codehaus.org/

which is using ActiveCluster and Jules is starting to put together 
various algorithms for choosing buddies, pairs, sub-nets, controllers 
and the like.

Notice the simpler API for ActiveCluster which just reuses a few 
interfaces from JMS.

It seems your new messaging.cluster API is pretty similar to 
ActiveCluster. Any ideas why you didn't just use ActiveCluster? 
(Especially as I mentioned it to you quite a while ago :)

Also, as I said to you a while ago, I don't see why the messaging 
package doesn't use the JMS API for things like Msg / MsgBody / 
MsgConsumer / MsgProducer and so forth. Not only would this mean your 
API would become more J2EE standard, it'd mean you could reuse heaps of 
open source and commercial implementations.



On 20 Jul 2004, at 05:07, Gianny Damour wrote:

> <snip: original message quoted in full at the top of this page>

James
-------
http://radio.weblogs.com/0112098/


Re: sandbox/messaging - your feedbacks are welcome

Posted by Gianny Damour <gi...@optusnet.com.au>.
On 20/07/2004 11:29 PM, Freddi Gyara wrote:

>a) Users should have the choice of not having to specify one (default
>topology). Otherwise the complexity of setting up the cluster could be
>too onerous.
>  
>
Actually, I have implemented the Ring topology to avoid having to create 
a custom topology. Whatever the number of servers, the implementation 
is autonomous and will always build a ring. BTW, it is possible to 
implement algorithms to build topologies that a human would have a lot 
of pain building by hand. For instance, to build a hypercube of 
dimension 4 and provide the shortest paths between two nodes of such a 
graph is rather time-consuming and error-prone, at least for a human :).
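
A small sketch of what I mean, with the graph encoded in node ids 
(hypothetical code, not part of the proto):

    // Hypothetical sketch: in a hypercube of dimension d, the neighbours
    // of node i are the ids obtained by flipping one bit, and a shortest
    // path simply fixes the differing bits one at a time.
    public class HypercubeSketch {
        public static int[] neighbours(int node, int dimension) {
            int[] result = new int[dimension];
            for (int k = 0; k < dimension; k++) {
                result[k] = node ^ (1 << k); // flip bit k
            }
            return result;
        }

        public static void printShortestPath(int from, int to, int dimension) {
            int current = from;
            System.out.print(current);
            for (int k = 0; k < dimension; k++) {
                if (((from ^ to) & (1 << k)) != 0) {
                    current ^= 1 << k; // fix one differing bit
                    System.out.print(" -- " + current);
                }
            }
            System.out.println();
        }
    }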

>A list of "preferred neighbours" - that may be a simpler configuration
>option vs. the definition of a topology
>  
>
I will make this happen.

Thanks,
Gianny

Re: sandbox/messaging - your feedbacks are welcome

Posted by Freddi Gyara <fg...@gmail.com>.
<snip>
>> As a matter of fact, this proto uses multicast (UDP) only for its
>> heartbeat mechanism and unicast (TCP) for all the other activities.
</snip>

I think I was a bit vague. No doubt, TCP is required for all p2p 
communications. However, when a server has a (sorted) list of live 
servers (LeaderNode-Node1-Node2 as in your example), it automatically 
knows its neighbours (Node1's neighbours are LeaderNode and Node2) and 
can communicate with them over TCP.
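
To illustrate (hypothetical code, assuming self is in the list and node 
names sort the same way on every server):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    // Hypothetical sketch: every server sorts the live-server list the
    // same way and can therefore derive its own ring neighbours locally.
    public class SortedNeighboursSketch {
        public static String[] neighbours(List liveServers, String self) {
            List sorted = new ArrayList(liveServers);
            Collections.sort(sorted);
            int i = sorted.indexOf(self);
            int n = sorted.size();
            return new String[] { (String) sorted.get((i + n - 1) % n), // predecessor
                                  (String) sorted.get((i + 1) % n) };   // successor
        }
    }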

Your idea of being able to create a bespoke topology is also good, but:
a) Users should have the choice of not having to specify one (a default 
topology). Otherwise the complexity of setting up the cluster could be 
too onerous.

b) Un-trained users may end up creating a topology that is not 
resilient to failures (e.g. consider a star with LeaderNode as the 
center and nodes N1, N2, N3 as spokes: if LeaderNode fails, which node 
becomes the center? Who decides? Is this configurable? etc.). Ideally 
the system should be able to identify topologies that have single 
points of failure.



> This is a great idea. Actually, I had this thought a while back: for
> each server, one configures an ordered list of servers. When a server
> is started, the first N available servers are considered its
> "neighbours".
A list of "preferred neighbours" - that may be a simpler configuration
option vs. the definition of a topology

Re: sandbox/messaging - your feedbacks are welcome

Posted by Gianny Damour <gi...@optusnet.com.au>.
On 20/07/2004 7:55 PM, Freddi Gyara wrote:

>A couple of suggestions: 
>a) Do we really need to have a specific topology. Since we are using
>Multicast, ALL the servers in the cluster would get the multicast
>packets anyway.
>  
>
You are right: if information is broadcast to all the servers, then 
one does not need a topology. However, why would we want to send the 
information to all the servers if one only wants to send it to a given 
set of servers?
For instance, in the case of a replication service, the state could be 
replicated to all the servers or only to a specific set. In this latter 
case, and as far as I understand, either these packets are unnecessarily 
received by a couple of servers, or one sends these packets to a 
specific multicast group only used by the servers that we would like to 
communicate with.
As a matter of fact, this proto uses multicast (UDP) only for its 
heartbeat mechanism and unicast (TCP) for all the other activities.
I think that multicast is useful only for highly redundant data 
distribution. I am not convinced that the replication of HTTP sessions 
needs to be highly redundant. Having said that, I am convinced that 
multicast is the way to go in order to maintain a cluster-wide cache.

So, this proto uses the unicast model for peer-to-peer communications. 
It also imposes a specific topology on the servers in order to give full 
control over the way servers are inter-connected. If one wants this 
server to be a direct neighbour (in other words a "preferred" remote 
server) of this other server, then one can achieve that.

>The servers can then work out their "neighbours" through a simple
>(pluggable ?) sort algorithm.
>  
>
This is a great idea. Actually, I had this thought a while back: for 
each server, one configures an ordered list of servers. When a server 
is started, the first N available servers are considered its 
"neighbours".

>b) The clustering code should be delineated from the rest of Geronimo
>through a clear set of API.
>Useful functions would include:
>publishToNeighbours(obj): Publishes an object to the server's
>neighbours in the cluster. The identification of neighbours,
>marshalling and transport is left to the clustering layer to decide.
>This would be useful for example, to cluster HttpSessions
>
>publishGlobally(obj): Publishes an object to the entire cluster. eg
>clustered JNDI context.
>  
>
I agree. The proto allows more or less the same kind of operations from 
a semantic point of view: it supports the distribution of objects to a 
specified set of servers. For instance, it is possible to create a proxy 
for a service running on N remote servers. When an operation is called 
on this proxy, the invocation is sent to these N remote servers and 
executed. The N results are returned to the proxy, which consolidates 
them and returns an "aggregated" result to the caller. You can have a 
look at NodeImplTest.testMulticast for such an example.
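
A hedged sketch of that fan-out and consolidation, using the standard 
JDK dynamic proxy rather than the proto's actual EndPointProxyFactory:

    import java.lang.reflect.InvocationHandler;
    import java.lang.reflect.Method;
    import java.lang.reflect.Proxy;

    // Hypothetical sketch: one call on the proxy is executed against N
    // targets and the N results are consolidated for the caller.
    public class MulticastProxySketch implements InvocationHandler {
        private final Object[] targets; // stand-ins for the N remote services

        public MulticastProxySketch(Object[] targets) {
            this.targets = targets;
        }

        public static Object newProxy(Class view, Object[] targets) {
            return Proxy.newProxyInstance(view.getClassLoader(),
                new Class[] { view }, new MulticastProxySketch(targets));
        }

        public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
            Object[] results = new Object[targets.length];
            for (int i = 0; i < targets.length; i++) {
                results[i] = method.invoke(targets[i], args); // fan-out
            }
            return consolidate(results);
        }

        private Object consolidate(Object[] results) {
            return results[0]; // a real aggregation policy goes here
        }
    }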

Thanks,
Gianny

Re: sandbox/messaging - your feedbacks are welcome

Posted by Freddi Gyara <fg...@gmail.com>.
A couple of suggestions:
a) Do we really need to have a specific topology? Since we are using 
multicast, ALL the servers in the cluster would get the multicast 
packets anyway.
The servers can then work out their "neighbours" through a simple 
(pluggable?) sort algorithm.

b) The clustering code should be delineated from the rest of Geronimo 
through a clear set of APIs.
Useful functions would include:
publishToNeighbours(obj): Publishes an object to the server's 
neighbours in the cluster. The identification of neighbours, 
marshalling and transport is left to the clustering layer to decide. 
This would be useful, for example, to cluster HttpSessions.

publishGlobally(obj): Publishes an object to the entire cluster, e.g. a 
clustered JNDI context.
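
Something along these lines, perhaps (a sketch only; the names follow 
the suggestions above, everything else is an assumption):

    import java.io.Serializable;

    // Hypothetical sketch of the suggested clustering API.
    public interface ClusterPublisher {
        // Publishes an object to this server's neighbours; identification
        // of neighbours, marshalling and transport belong to this layer.
        void publishToNeighbours(Serializable obj);

        // Publishes an object to the entire cluster, e.g. a clustered
        // JNDI binding.
        void publishGlobally(Serializable obj);
    }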



On Tue, 20 Jul 2004 14:07:02 +1000, Gianny Damour
<gi...@optusnet.com.au> wrote:
> <snip: original message quoted in full at the top of this page>