Posted to dev@tomee.apache.org by David Blevins <da...@visi.com> on 2008/09/12 21:45:52 UTC
Re: Client/Server Multicast Discovery and Failover
Think I got all this functionality buttoned up. Here's a high-level
view of it in its current form. As usual, nothing is set in stone and
any and all things can be changed based on group feedback.
DISCOVERY
What we have going on from a tech perspective is that each server
sends and receives a multicast heartbeat. Each multicast packet
contains a single URI that advertises a service, its group, and its
location, for example "cluster1:ejb:ejbd://thehost:4201". We can
definitely explore the SLP format as Alan suggests.
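For illustration, here's one way such an advertisement could be
pulled apart into its three pieces. The group:service:location layout
is taken from the example above; the parsing code itself is only a
sketch, not the actual OpenEJB implementation:

import java.net.URI;

public class AdvertisementParseSketch {
    public static void main(String[] args) {
        URI advert = URI.create("cluster1:ejb:ejbd://thehost:4201");
        String group = advert.getScheme();                   // "cluster1"
        URI rest = URI.create(advert.getSchemeSpecificPart());
        String service = rest.getScheme();                   // "ejb"
        URI location = URI.create(rest.getSchemeSpecificPart()); // "ejbd://thehost:4201"
        System.out.println(group + " / " + service + " / " + location);
    }
}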
There are other advantages to the simple, unchanging URI style. The
URI is essentially stateless: there is no "I'm alive" URI or "I'm
dead" URI, there is simply a URI for each service a server offers,
and its presence on the network indicates the service is available
while its absence indicates it is no longer available. In this way
the issues with UDP being unordered and unreliable melt away, as
state is no longer a concern and packet sizes are always small.
Complicated libraries that ride atop UDP and attempt to offer
reliability (retransmission) and ordering can be avoided.
UDP/Multicast is only used for discovery; from there on out, critical
information is transmitted over TCP/IP, which is obviously going to
do a better job of ensuring reliability and ordering.
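To make that concrete, a heartbeat sender in this style could be as
small as the sketch below. The group address and port match the
client example later in this mail; the interval and the class name
are assumptions, not the actual OpenEJB code:

import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;
import java.nio.charset.StandardCharsets;

public class HeartbeatSenderSketch {
    public static void main(String[] args) throws Exception {
        InetAddress group = InetAddress.getByName("239.255.2.3");
        byte[] uri = "cluster1:ejb:ejbd://thehost:4201"
                .getBytes(StandardCharsets.UTF_8);
        try (MulticastSocket socket = new MulticastSocket()) {
            while (true) {
                // The packet itself is the whole message: its presence
                // on the network means the service is available, its
                // absence means it is gone. No ordering or
                // retransmission is needed.
                socket.send(new DatagramPacket(uri, uri.length, group, 6142));
                Thread.sleep(500); // heartbeat interval is an assumption
            }
        }
    }
}

A receiver simply tracks which URIs it has heard recently and treats
a service as gone once its URI stops arriving.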
On the client side of things, a special "multicast://" URL can be used
in the InitialContext properties to signify that multicast should be
used to seed the connection process. Such as:
import java.util.Properties;
import javax.naming.Context;
import javax.naming.InitialContext;

// Discover a server over multicast instead of naming a fixed host:
Properties properties = new Properties();
properties.setProperty(Context.INITIAL_CONTEXT_FACTORY,
    "org.apache.openejb.client.RemoteInitialContextFactory");
properties.setProperty(Context.PROVIDER_URL,
    "multicast://239.255.2.3:6142");
InitialContext remoteContext = new InitialContext(properties);
The URL has optional query parameters such as "schemes", "group",
and "timeout" which allow you to zero in on a particular type of
service in a particular cluster group, as well as set how long you
are willing to wait in the discovery process before finally giving
up. The first matching service the client sees "flowing" around on
the UDP stream is the one it picks and sticks to for that and
subsequent requests, ensuring UDP is only used when there are no
other servers to talk to.
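For example, to accept only ejbd services from the "cluster1" group
and give up after five seconds of discovery, the URL might look like
the following. The parameter names are the ones listed above; the
exact value syntax here is illustrative:

properties.setProperty(Context.PROVIDER_URL,
    "multicast://239.255.2.3:6142?group=cluster1&schemes=ejbd&timeout=5000");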
FAILOVER
On each request to the server, the client will send the version
number associated with the list of servers in the cluster it is
aware of. Initially this version will be zero and the list will be
empty. Only when the server sees the client has an old list will the
server send the updated list. This is an important distinction as
the list (ClusterMetaData) is not transmitted back and forth on
every request, only on change. If the membership of the cluster is
stable there is essentially no clustering overhead to the protocol
-- 8 bytes of overhead on each request and 1 byte on each response
-- so response times will *not* degrade as more members are added to
the cluster. This new list takes effect for all proxies that share
the same ServerMetaData; internally we key the ClusterMetaData by
ServerMetaData. I originally had the version be a simple "increment
by one" strategy, but eventually went with the value of
System.currentTimeMillis(), for two reasons. First, it's possible
more than one server is reachable via the same ServerMetaData (i.e.
multicast://) and each server has its own list and version number,
so counters from different servers would not be comparable. Second,
if a server is restarted, an incrementing version would go back to
zero and the client could be stuck thinking it has a more current
list than the server.
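As a sketch, the server side of that version check could look like
the following, with the caveat that everything beyond the
ClusterMetaData and ServerMetaData names is a guess rather than the
real API:

import java.net.URI;

public class ClusterVersionSketch {

    // Rough shape of the ClusterMetaData described above.
    static class ClusterMetaData {
        final long version;    // System.currentTimeMillis() at last change
        final URI[] locations;

        ClusterMetaData(long version, URI... locations) {
            this.version = version;
            this.locations = locations;
        }
    }

    private volatile ClusterMetaData current = new ClusterMetaData(
            System.currentTimeMillis(), URI.create("ejbd://thehost:4201"));

    // Once per request: send the full list only when the client's
    // cached version is stale; null stands in for the 1-byte
    // "unchanged" marker on the wire.
    ClusterMetaData onRequest(long clientVersion) {
        ClusterMetaData snapshot = current;
        return clientVersion == snapshot.version ? null : snapshot;
    }

    // Membership change: rebuild the list and stamp it with the clock.
    void membershipChanged(URI... locations) {
        current = new ClusterMetaData(System.currentTimeMillis(), locations);
    }
}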
When a server shuts down, new connections are refused, existing
connections not in mid-request are closed, any remaining connections
are closed immediately after completion of the request in progress,
and clients can fail over gracefully to the next server in the list.
If a server crashes, requests are retried on the next server in the
list. This failover pattern is followed until there are no more
servers in the list, at which point the client attempts a final
multicast search (if it was created with a multicast PROVIDER_URL)
before abandoning the request and throwing an exception to the
caller. Currently, the failover is ordered but could very easily be
made random. The multicast discovery aspect of the client adds a
nice randomness to the selection of the first server that is perhaps
somewhat "just": theoretically, servers that are under more load
will send out fewer heartbeats than servers with no load. This may
not happen exactly as theory dictates, but as we get more EJB
statistic data wired into the server we can pursue deliberate
heartbeat throttling techniques that might make that theory really
sing in practice.
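Boiled down, that failover loop looks something like this sketch;
the method names and the byte[] request shape are stand-ins, not the
actual client classes:

import java.io.IOException;
import java.net.URI;
import java.util.Deque;

public class FailoverSketch {

    // Walk the (currently ordered) server list, retrying on the next
    // member whenever a request fails, and fall back to one last
    // multicast search before giving up.
    byte[] invoke(byte[] request, Deque<URI> servers) throws IOException {
        while (!servers.isEmpty()) {
            URI server = servers.removeFirst();
            try {
                return send(server, request); // normal TCP request path
            } catch (IOException serverGone) {
                // crashed or shutting down: fall through to the next
            }
        }
        URI found = multicastSearch(); // only if created with multicast://
        if (found != null) {
            return send(found, request);
        }
        throw new IOException("no servers available");
    }

    byte[] send(URI server, byte[] request) throws IOException {
        return request; // TCP I/O elided in this sketch
    }

    URI multicastSearch() {
        return null; // UDP discovery elided in this sketch
    }
}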
-David
Re: Client/Server Multicast Discovery and Failover
Posted by David Blevins <da...@visi.com>.
Finally created a doc for this functionality and worked in the new
"Multipoint" (i.e. TCP-based discovery) stuff I've been working on.
If people could review the doc and give some feedback, that would be
great!
https://cwiki.apache.org/confluence/display/OPENEJBx30/Failover
-David
Re: Client/Server Multicast Discovery and Failover
Posted by Dain Sundstrom <da...@iq80.com>.
On Sep 12, 2008, at 12:45 PM, David Blevins wrote:
> I originally had the version be a simple "increment by one"
> strategy, but eventually went with the value of
> System.currentTimeMillis(), for two reasons. First, it's possible
> more than one server is reachable via the same ServerMetaData (i.e.
> multicast://) and each server has its own list and version number,
> so counters from different servers would not be comparable. Second,
> if a server is restarted, an incrementing version would go back to
> zero and the client could be stuck thinking it has a more current
> list than the server.
Time sometimes moves backwards on servers connected to a time
server. How about something slightly more unique, like a 16-bit
random value plus the most significant 48 bits of the system time?
48 bits of milliseconds is about 9,000 years.
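For what it's worth, that scheme in code could be the following
sketch, which assumes the clock goes in the top 48 bits of the long
and the 16 random bits fill the rest:

import java.security.SecureRandom;

public class VersionStampSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    // currentTimeMillis() stays under 2^48 for roughly the next 9,000
    // years, so the shift cannot overflow a long.
    static long nextVersion() {
        long millis = System.currentTimeMillis() & 0xFFFFFFFFFFFFL;
        int rand16 = RANDOM.nextInt(1 << 16);
        return (millis << 16) | rand16;
    }
}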
> When a server shuts down, new connections are refused, existing
> connections not in mid-request are closed, any remaining connections
> are closed immediately after completion of the request in progress,
> and clients can fail over gracefully to the next server in the
> list. If a server crashes, requests are retried on the next server
> in the list. This failover pattern is followed until there are no
> more servers in the list, at which point the client attempts a
> final multicast search (if it was created with a multicast
> PROVIDER_URL) before abandoning the request and throwing an
> exception to the caller. Currently, the failover is ordered but
> could very easily be made random. The multicast discovery aspect of
> the client adds a nice randomness to the selection of the first
> server that is perhaps somewhat "just": theoretically, servers that
> are under more load will send out fewer heartbeats than servers
> with no load. This may not happen exactly as theory dictates, but
> as we get more EJB statistic data wired into the server we can
> pursue deliberate heartbeat throttling techniques that might make
> that theory really sing in practice.
Very cool.
-dain