Posted to dev@geronimo.apache.org by David Jencks <da...@yahoo.com> on 2006/09/13 16:45:48 UTC

Re: gcache implementation ideas[long]

I'm a complete beginner in clustering.... but I have some questions.

On Sep 12, 2006, at 12:19 PM, Jeff Genender wrote:

> I wanted to go over a high level design on a gcache cache component  
> and
> get some feedback, input and invite folks who are interested to  
> join in.
> ..so here it goes...
>
> The gcache will be one of several cache/clustering offerings...but
> starting off with the first one...
>
> The first pass I want to go with the master/slave full replication
> implementation.  What this means is a centralized caching server which
> runs a cache implementation (likely will use ehcache underneath), and
> this server is known as a master.  My interest in ehcache is it  
> provides
> the ability to persist session state from a configuration if full
> failure recovery is needed (no need to reinvent the wheel on a great
> cache).  The master will communicate with N number of slave servers,
> also running a gcache implementation.
>
>    +--------+   +---------+  +---------+
>    |        |   |         |  |         |
>    | MASTER |   | SLAVE 1 |  | SLAVE 2 | ... n-slaves
>    |        |   |         |  |         |
>    +--------+   +---------+  +---------+
>       |   |            |           |
>       |   |            |           |
>       |   |____________|           |
>       |                            |
>       |____________________________|
>
>

I can't tell if the master communicates with all slaves or only slave  
1, then slave 1 communicates with slave2, etc.  My questions are  
biased a bit towards a linked list/chain of slaves, which might well  
not be what you intend.

Can you describe what happens if:
- the master goes down
- the former master comes back up
- slave 1 (or any intermediate slave) goes down.
- slave 1 (or any intermediate slave) comes back up
- slave n (last slave) goes down.

>
> We then have client component(s) that "plugs in" and communicates with
> the server.  The configuration for the client should be very light  
> where
> it will only really be concerned with the master/slave/slave/nth- 
> slave.
>  In other words, it communicates only with the master.  The master is
> responsible for "pushing" anything it receives to its slaves and other
> nodes in the cluster.  The slaves basically look like clients to  
> the master.
>

Are you saying that every client maintains the entire shared state?   
If not, how does a client determine if it needs to fetch some state  
and how much state?

>    +--------+   +---------+  +---------+
>    |        |   |         |  |         |
>    | MASTER |---| SLAVE 1 |  | SLAVE 2 |
>    |        |   |         |  |         |
>    +--------+   +---------+  +---------+
>        |  |                       |
>        |  +-----------------------+
>        |
>    ,-------.
>   ( CLIENT  )
>    `-------'
>
> In the event the master goes down, the client notes the timeout and  
> then
> automatically communicates with slave #1 as the new master.  Since  
> slave
> #1 is also a client of the MASTER, it can determine either by  
> itself, or
> by the first request that comes in asking for data, that it is the new
> master.
>
>    +--------+   +---------+  +---------+
>    |  OLD   |   |NEW MSTER|  |         |
>    | MASTER |   |   WAS   |--| SLAVE 2 |
>    |        |   | SLAVE 1 |  |         |
>    +--------+   +---------+  +---------+
>        |           _,'
>        X         ,'
>        |      ,-'
>    ,-------.<'
>   ( CLIENT  )
>    `-------'

It looks to me as if this relies on every client maintaining an  
identical list of master/slave servers.  How do they do that?  (This  
relates to my first set of questions I think)

What happens if:
- Client 1 maintains communication with the master and all slaves
throughout, and all servers remain running at all times, but
- Client 2 loses connectivity to the master long enough to decide that the
master is dead, and gets it back in time to communicate with slave 1,
thus telling slave 1 it is the new master?

>
> I think this is a fairly simple implementation, yet fairly robust.
> Since we are not doing the heart beat and mcast, we cut down on a  
> lot of
> network traffic.

Can you describe briefly what they are for?

>
> Communication will be done over TCP/IP sockets, and we would probably
> like to use NIO.
>
> I would like to see this component be able to run on its own...i.e. no
> Geronimo needed.  We can build a Geronimo gbean and deployer around  
> it,
> but I would like to see this component usable in many other areas,
> including outside of Geronimo.  Open source needs more "free"  
> clustering
> implementations.  I would like this component to be broken down into 2
> major categories...server and client.
>

What is the difference in functionality between a server and a client?

> After a successful implementation of master/slave, I would like to  
> make
> pluggable strategies, so we can provide for more of a distributed  
> cache,
> partitioning, and other types of joins, such as mcast/heart beat for
> those who want it.
>
> Thoughts and additional ideas?
>

Well, it made me think of lots of questions!

Are there any good references on "principles of clustering" for  
newbies like me?

thanks
david jencks

> Thanks,
>
> Jeff


Re: gcache implementation ideas[long]

Posted by Jeff Genender <jg...@apache.org>.
Hi Gianny,

Thanks for the comments.  I am not so sure I want to inject election or
quorum into the mix as I want the first pass to be as simple as
possible.  For the sake of simplicity, already knowing the order will
help.

As I thought about this more, if a slave comes back, it can notify the
other slaves that it's alive and then it's back in its proper order.
Keep in mind, an intermediate slave only really needs to notify the
master and other slaves that it's back.

Jeff

Gianny Damour wrote:
> 
> On 14/09/2006, at 10:58 PM, Jacek Laskowski wrote:
> 
>> On 9/14/06, Jeff Genender <jg...@apache.org> wrote:
>>> David Jencks wrote:
>>
>>> > - slave 1 (or any intermediate slave) goes down.
>>> > - slave 1 (or any intermediate slave) comes back up
>>> > - slave n (last slave) goes down.
>>>
>>> All great questions.  I would like feedback here.
>>
>> See above. The next slave will act as a master until all are gone and
>> the cluster is deemed to have failed. The way 'the next' is computed
>> depends on the magical strategy that's in use (it could be the next in
>> the sense of a list concept or computed randomly).
> I agree. I think that the way this could be achieved is by means of an
> ActiveCluster or WADI-like API. Basically, each node executes an election
> strategy locally when a failure is detected. If the election strategy
> implementation has a deterministic outcome, i.e. each node uses the same
> list of nodes and each of them elects the same one, then you have your
> next master. For instance, the WADI implementation is to use the oldest
> node of the cluster.
> 
> I think that if each node picks a node at random, then each one will
> have to broadcast its selection to the other nodes. I believe that if
> one of them has a quorum, then each node now knows which of them is the
> master.
> 
> Thanks,
> Gianny
> 
> 
>>
>> Jacek
>>
>> --Jacek Laskowski
>> http://www.laskowski.net.pl

Re: gcache implementation ideas[long]

Posted by Gianny Damour <gi...@optusnet.com.au>.
On 14/09/2006, at 10:58 PM, Jacek Laskowski wrote:

> On 9/14/06, Jeff Genender <jg...@apache.org> wrote:
>> David Jencks wrote:
>
>> > - slave 1 (or any intermediate slave) goes down.
>> > - slave 1 (or any intermediate slave) comes back up
>> > - slave n (last slave) goes down.
>>
>> All great questions.  I would like feedback here.
>
> See above. The next slave will act as a master until all are gone and
> the cluster is deemed to have failed. The way 'the next' is computed
> depends on the magical strategy that's in use (it could be the next in
> the sense of a list concept or computed randomly).
I agree. I think that the way this could be achieved is by means of an
ActiveCluster or WADI-like API. Basically, each node executes an election
strategy locally when a failure is detected. If the election strategy
implementation has a deterministic outcome, i.e. each node uses the same
list of nodes and each of them elects the same one, then you have your
next master. For instance, the WADI implementation is to use the oldest
node of the cluster.
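
A minimal sketch of that deterministic election, with made-up names (not
an actual WADI or ActiveCluster API): every node scans the same membership
list and picks the oldest surviving member, so they all agree on the new
master without exchanging any messages.

import java.util.List;

// Hypothetical node descriptor; "joinTime" stands in for whatever ordering
// the cluster agrees on (here: oldest member wins).
class Node {
    final String id;
    final long joinTime;
    final boolean alive;

    Node(String id, long joinTime, boolean alive) {
        this.id = id;
        this.joinTime = joinTime;
        this.alive = alive;
    }
}

class OldestNodeElection {
    // Each node runs this locally over the same membership list, so each
    // node arrives at the same answer.
    static Node electMaster(List<Node> members) {
        Node oldest = null;
        for (Node node : members) {
            if (node.alive && (oldest == null || node.joinTime < oldest.joinTime)) {
                oldest = node;
            }
        }
        return oldest; // null means the whole cluster is gone
    }
}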

I think that if each node picks a node at random, then each one will
have to broadcast its selection to the other nodes. I believe that if
one of them has a quorum, then each node now knows which of them is the
master.

Thanks,
Gianny


>
> Jacek
>
> -- 
> Jacek Laskowski
> http://www.laskowski.net.pl


Re: gcache implementation ideas[long]

Posted by Jacek Laskowski <ja...@laskowski.net.pl>.
On 9/14/06, Jeff Genender <jg...@apache.org> wrote:
> David Jencks wrote:
...
> > - the former master comes back up
>
> Good question.  Do we A) have it return as a master? or B) have it come
> in as a slave-n?  Need some input here.

First off, I'm a complete clustering beginner so "don't be cruel to a
heart that's true" ;-)

I'd say it depends on the strategy the cluster is configured with. On the
other hand, does it really matter who's the master if all are equal?
When a TCP/IP connection is made to slave-n it would become the master,
and when the former master gets up and running again it could come back
as slave-n (since the state would have changed while it was down), only
becoming the master again once the other slaves are gone. I think there's
no need to promote a just-crashed-and-rebooted server to master when
there's already a master and it's doing well.

> > - slave 1 (or any intermediate slave) goes down.
> > - slave 1 (or any intermediate slave) comes back up
> > - slave n (last slave) goes down.
>
> All great questions.  I would like feedback here.

See above. The next slave will act as a master until all are gone and
the cluster is deemed to have failed. The way 'the next' is computed
depends on the magical strategy that's in use (it could be the next in
the sense of a list concept or computed randomly).

Jacek

-- 
Jacek Laskowski
http://www.laskowski.net.pl

Re: gcache implementation ideas[long]

Posted by Jeff Genender <jg...@apache.org>.

David Jencks wrote:
> I'm a complete beginner in clustering.... but I have some questions.
> 
> 
> I can't tell if the master communicates with all slaves or only slave 1,
> then slave 1 communicates with slave2, etc.  My questions are biased a
> bit towards a linked list/chain of slaves, which might well not be what
> you intend.
> 
> Can you describe what happens if:
> - the master goes down

If the master goes down, the first slave in order will become the new master.
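
A rough sketch of the client side of that rule, assuming invented names
(this is not an actual gcache API): the client tries the servers in their
configured order and treats the first one that answers as the acting
master.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.List;

class MasterLocator {
    // Servers in configured order: master first, then slave 1, slave 2, ...
    private final List<InetSocketAddress> servers;
    private final int connectTimeoutMillis;

    MasterLocator(List<InetSocketAddress> servers, int connectTimeoutMillis) {
        this.servers = servers;
        this.connectTimeoutMillis = connectTimeoutMillis;
    }

    // The first server that answers is treated as the acting master.
    InetSocketAddress currentMaster() throws IOException {
        for (InetSocketAddress address : servers) {
            Socket probe = new Socket();
            try {
                probe.connect(address, connectTimeoutMillis);
                return address;
            } catch (IOException dead) {
                // timed out or refused: fall through to the next server in order
            } finally {
                try { probe.close(); } catch (IOException ignored) { }
            }
        }
        throw new IOException("no cache server reachable; cluster deemed failed");
    }
}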

> - the former master comes back up

Good question.  Do we A) have it return as a master? or B) have it come
in as a slave-n?  Need some input here.

> - slave 1 (or any intermediate slave) goes down.
> - slave 1 (or any intermediate slave) comes back up
> - slave n (last slave) goes down.

All great questions.  I would like feedback here.

> 
>>
>> We then have client component(s) that "plugs in" and communicates with
>> the server.  The configuration for the client should be very light where
>> it will only really be concerned with the master/slave/slave/nth-slave.
>>  In other words, it communicates only with the master.  The master is
>> responsible for "pushing" anything it receives to its slaves and other
>> nodes in the cluster.  The slaves basically look like clients to the
>> master.
>>
> 
> Are you saying that every client maintains the entire shared state?  If
> not, how does a client determine if it needs to fetch some state and how
> much state?

This one I can answer.  This first pass is full replication (not
distributed)...so under this scenario, it will maintain full state.  We
can handle this two ways.  Either the client gets the full tomato when it
joins the cluster...or it retrieves a session's full state the first time
it is asked for a session it does not have (i.e. a server does not get a
copy of a session unless an end user asks that server for it).  I think
the latter would probably be less overhead and perform a bit better.
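
A minimal sketch of that second option, with placeholder names (the
MasterLookup interface and fetch logic are mine, not part of gcache): the
session is only pulled over the wire the first time this node is asked
for it.

import java.util.HashMap;
import java.util.Map;

// Placeholder for "ask the master (or acting master) for this session".
interface MasterLookup {
    Object fetch(String sessionId);
}

class LazyReplicatedCache {
    private final Map<String, Object> local = new HashMap<String, Object>();
    private final MasterLookup master;

    LazyReplicatedCache(MasterLookup master) {
        this.master = master;
    }

    // Serve locally if we already have the session; otherwise fetch the full
    // state once from the master and keep it from then on.
    synchronized Object getSession(String sessionId) {
        Object session = local.get(sessionId);
        if (session == null) {
            session = master.fetch(sessionId);
            local.put(sessionId, session);
        }
        return session;
    }
}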

> It looks to me as if this relies on every client maintaining an
> identical list of master/slave servers.  How do they do that?  (This
> relates to my first set of questions I think)

The startup configuration on the client machines would contain that
information.  The clients need to be aware of who the master is and who
the slaves are.
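
For example, the client-side configuration might be nothing more than an
ordered server list; the property names and port below are made up purely
for illustration:

gcache.servers=master.example.com:4447,slave1.example.com:4447,slave2.example.com:4447
gcache.connect.timeout.millis=2000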

> 
> What happens if:
> - Client 1 maintains communication with the master and all slaves
> throughout, and all servers remain running at all times, but
> - Client 2 loses connectivity to the master long enough to decide that
> the master is dead, and gets it back in time to communicate with slave 1,
> thus telling slave 1 it is the new master?
> 

Great question once again.

A possible solution is that when the slave is asked for data as if it were
the master, it can first check with the configured master to see if it's
still alive.
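
A sketch of that check, again with invented names: before a slave accepts
a request as the master, it probes the configured master and only promotes
itself if the master is genuinely unreachable, which should cover the case
where only Client 2's link to the master was flaky.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PromotionGuard {
    private final InetSocketAddress configuredMaster;
    private final int timeoutMillis;

    PromotionGuard(InetSocketAddress configuredMaster, int timeoutMillis) {
        this.configuredMaster = configuredMaster;
        this.timeoutMillis = timeoutMillis;
    }

    // Called when a client starts treating this slave as the master.  If the
    // real master still answers, refuse the promotion so the client can be
    // pointed back at it; only take over if the master is truly unreachable.
    boolean shouldPromote() {
        Socket probe = new Socket();
        try {
            probe.connect(configuredMaster, timeoutMillis);
            return false; // master is alive; the client just had a flaky link
        } catch (IOException unreachable) {
            return true;  // master really is down; this slave takes over
        } finally {
            try { probe.close(); } catch (IOException ignored) { }
        }
    }
}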

>>
>> I think this is a fairly simple implementation, yet fairly robust.
>> Since we are not doing the heart beat and mcast, we cut down on a lot of
>> network traffic.
> 
> Can you describe briefly what they are for?
> 

Sure.  A heartbeat implementation is used with multicast (mcast).  Each
server that wants to join a cluster sends a ping every second or few
seconds that tells everyone else it is in the cluster.  So the servers
discover each other and open channels between one another, and each
server in the cluster ends up with n-1 connections to the others.  As you
can see, in a short amount of time you can have a very chatty network.
This type of clustering solution is great for a small number of nodes.

In master/slave, I have a connection between the master and each client
(including the slaves as clients), all without the need for a heartbeat.
This cuts down on the network clutter significantly.
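
As a rough illustration: with 10 nodes, a full mesh maintains 10*9/2 = 45
connections plus the periodic multicast pings, while the master/slave
layout needs only the 9 connections from the master to the other nodes.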

>>
>> Communication will be done over TCP/IP sockets, and we would probably
>> like to use NIO.
>>
>> I would like to see this component be able to run on its own...i.e. no
>> Geronimo needed.  We can build a Geronimo gbean and deployer around it,
>> but I would like to see this component usable in many other areas,
>> including outside of Geronimo.  Open source needs more "free" clustering
>> implementations.  I would like this component to be broken down into 2
>> major categories...server and client.
>>
> 
> What is the difference in functionality between a server and a client?

The server is something that runs the central cache and is literally a
socket server.  This would be your master or slave machines.  The client
is a simple SPI that communicates with the server.  I say SPI because the
API itself is contractual in nature, thus allowing us to plug in other
strategies (e.g. heartbeat, localized cache, distributed, etc.).  The
idea here is that the client does not care about implementation, only
about using the simple API.
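
A rough sketch of what such an SPI contract might look like; the interface
and method names are illustrative only, not a committed design:

import java.io.Serializable;

// Client-facing contract; concrete strategies (master/slave, mcast/heartbeat,
// distributed, local-only cache) would all sit behind this interface.
public interface CacheClient {
    void put(String region, String key, Serializable value);
    Serializable get(String region, String key);
    void remove(String region, String key);
}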

> 
>> After a successful implementation of master/slave, I would like to make
>> pluggable strategies, so we can provide for more of a distributed cache,
>> partitioning, and other types of joins, such as mcast/heart beat for
>> those who want it.
>>
>> Thoughts and additional ideas?
>>
> 
> Well, it made me think of lots of questions!
> 
> Are there any good references on "principles of clustering" for newbies
> like me?

Oh yes...how about the best of them all, Tangosol.  Cameron and friends
did about the best clustering writeup of any doc I have ever seen:

http://wiki.tangosol.com/display/COH32UG/Coherence+3.2+Home

> 
> thanks
> david jencks
> 
>> Thanks,
>>
>> Jeff