Posted to dev@geronimo.apache.org by Andy Piper <an...@bea.com> on 2006/02/02 13:41:28 UTC
Re: Replication using totem protocol
At 09:25 AM 1/18/2006, Jules Gosnell wrote:
>I haven't been able to convince myself to take the quorum approach because...
>
>shared-something approach:
>- the shared something is a Single Point of Failure (SPoF) -
>although you could use an HA something.
That's how WAS and WLS do it. Use an HA database, SAN or dual-ported
SCSI. The latter is cheap. The former are probably already available
to customers if they really care about availability.
>- If the node holding the lock 'goes crazy', but does not die, the
>rest of the
This is generally why you use leases. Then your craziness is only
believed for a fixed amount of time.
>cluster becomes a fragment - so it becomes an SPoF as well.
>- used in isolation, it does not take into account that the lock may
>be held by the smallest cluster fragment
You generally solve this again with leases, i.e. a lock that is valid
for some period.
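(A minimal sketch of the lease idea Andy describes: the lock grant expires after a fixed period, so a 'crazy' or partitioned holder is only believed until its lease lapses. The class and method names here are invented for illustration, not taken from any real WLS/WAS API.)

```java
// Sketch of a lease-based lock: a grant is only valid until it expires,
// so a misbehaving holder cannot freeze the cluster forever.
// All names here are hypothetical.
public class LeaseLock {
    private String holder;
    private long expiresAt; // millis since epoch

    /** Grant the lock if it is free or the current lease has expired. */
    public synchronized boolean tryAcquire(String node, long leaseMillis, long now) {
        if (holder == null || now >= expiresAt) {
            holder = node;
            expiresAt = now + leaseMillis;
            return true;
        }
        return node.equals(holder); // the current holder may renew
    }

    public static void main(String[] args) {
        LeaseLock lock = new LeaseLock();
        System.out.println(lock.tryAcquire("A", 1000, 0));    // true: A gets a 1s lease
        System.out.println(lock.tryAcquire("B", 1000, 500));  // false: A's lease still valid
        System.out.println(lock.tryAcquire("B", 1000, 1500)); // true: lease expired, B takes over
    }
}
```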
>shared-nothing approach:
Nice in theory but tricky to implement well. Consensus works well here.
>- I prefer this approach, but, as you have stated, if the two halves
>are equally sized...
>- What if there are two concurrent fractures (does this happen?)
>- ActiveCluster notifies you of one membership change at a time - so
>you would have to decide on an algorithm for 'chunking' node loss,
>so that you could decide when a fragmentation had occurred...
If you really want to do this reliably you have to assume that AC
will send you bogus notifications. Ideally you want to achieve a
consensus on membership to avoid this. It sounds like totem solves
some of these issues.
andy
Re: Replication using totem protocol
Posted by Jules Gosnell <ju...@coredevelopers.net>.
Andy Piper wrote:
> At 09:25 AM 1/18/2006, Jules Gosnell wrote:
>
>> I haven't been able to convince myself to take the quorum approach
>> because...
>>
>> shared-something approach:
>> - the shared something is a Single Point of Failure (SPoF) - although
>> you could use an HA something.
>
>
> That's how WAS and WLS do it. Use an HA database, SAN or dual-ported
> SCSI. The latter is cheap. The former are probably already available
> to customers if they really care about availability.
Well, I guess we will have to consider making something along these
lines available... it sounds like we need a pluggable QuorumStrategy.
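(As an aside: the pluggable QuorumStrategy mentioned above might look roughly like the following. The interface name and signature are invented for illustration; they are not from Geronimo or ActiveCluster.)

```java
import java.util.Set;

// Hypothetical sketch of a pluggable quorum strategy: given the last
// agreed membership and the members this fragment can still see,
// decide whether the fragment may keep running.
interface QuorumStrategy {
    boolean hasQuorum(Set<String> lastAgreed, Set<String> visible);
}

// Shared-nothing flavour: continue only with a strict majority, so an
// equal split freezes both halves rather than risking split-brain.
class MajorityStrategy implements QuorumStrategy {
    public boolean hasQuorum(Set<String> lastAgreed, Set<String> visible) {
        return visible.size() * 2 > lastAgreed.size();
    }
}

public class QuorumDemo {
    public static void main(String[] args) {
        QuorumStrategy s = new MajorityStrategy();
        Set<String> all = Set.of("a", "b", "c", "d");
        System.out.println(s.hasQuorum(all, Set.of("a", "b", "c"))); // true: 3 of 4
        System.out.println(s.hasQuorum(all, Set.of("a", "b")));      // false: equal split
    }
}
```

A shared-something flavour could implement the same interface by consulting the HA database or lease lock instead of counting nodes.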
>
>> - If the node holding the lock 'goes crazy', but does not die, the
>> rest of the
>
>
> This is generally why you use leases. Then your craziness is only
> believed for a fixed amount of time.
Understood.
>
>> cluster becomes a fragment - so it becomes an SPoF as well.
>> - used in isolation, it does not take into account that the lock may
>> be held by the smallest cluster fragment
>
>
> You generally solve this again with leases. i.e. a lock that is valid
> for some period.
I don't follow you here - but we have lost quite a bit of context. I
think that I was saying that if the fragment that owned the
shared-something was the smaller of the two, then 'freezing' the larger
fragment would not be optimal - but, I guess you could use the
shared-something to negotiate between the two fragments and decide which
to freeze and which to allow to continue...
I don't see leases helping here - but maybe I have mistaken the context?
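(The negotiation suggested above could be sketched like this: each fragment writes its size into the shared store, and the store arbitrates so the larger fragment continues even if the smaller one happens to hold the lock. All names are invented for illustration.)

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical arbitration via the shared store: fragments report
// their sizes, and only the strictly largest one may continue.
public class FragmentArbiter {
    private final Map<String, Integer> sizes = new ConcurrentHashMap<>();

    /** Each fragment reports its node count to the shared store. */
    public void report(String fragmentId, int size) {
        sizes.put(fragmentId, size);
    }

    /** A fragment continues only if every other reported fragment is
     *  strictly smaller; ties freeze everyone, as with an equal split. */
    public boolean mayContinue(String fragmentId) {
        int mine = sizes.getOrDefault(fragmentId, 0);
        for (Map.Entry<String, Integer> e : sizes.entrySet()) {
            if (!e.getKey().equals(fragmentId) && e.getValue() >= mine) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        FragmentArbiter arbiter = new FragmentArbiter();
        arbiter.report("holds-lock", 2);  // the smaller fragment owns the lock
        arbiter.report("no-lock", 5);     // the larger fragment does not
        System.out.println(arbiter.mayContinue("no-lock"));    // true
        System.out.println(arbiter.mayContinue("holds-lock")); // false
    }
}
```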
>
>> shared-nothing approach:
>
>
> Nice in theory but tricky to implement well. Consensus works well here.
>
>> - I prefer this approach, but, as you have stated, if the two halves
>> are equally sized...
>> - What if there are two concurrent fractures (does this happen?)
>> - ActiveCluster notifies you of one membership change at a time - so
>> you would have to decide on an algorithm for 'chunking' node loss, so
>> that you could decide when a fragmentation had occurred...
>
>
> If you really want to do this reliably you have to assume that AC will
> send you bogus notifications. Ideally you want to achieve a consensus
> on membership to avoid this. It sounds like totem solves some of these
> issues.
Totem does seem to have some advanced consensus stuff, which, I am
?assuming?, relies on its virtual synchrony. This stuff would probably
be very useful under ActiveCluster to manage membership change and
partition notifications, as it would, I understand, guarantee that every
node received a consistent view of what was going on.
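(The 'chunking' of one-at-a-time membership-loss notifications mentioned earlier in the thread could look roughly like this: losses arriving within a quiet window are grouped, and a large group is taken as evidence of a partition rather than independent node deaths. The class name, window, and thresholds are all invented for illustration.)

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical chunker for one-at-a-time loss notifications: group
// losses that fall within a time window, so the caller can distinguish
// a single node death from a probable fragmentation.
public class LossChunker {
    private final long windowMillis;
    private final List<Long> recent = new ArrayList<>();

    public LossChunker(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Record a node loss at time 'now' and return the size of the
     *  current chunk (losses seen within the window). */
    public int onNodeLost(long now) {
        recent.removeIf(t -> now - t > windowMillis);
        recent.add(now);
        return recent.size();
    }

    public static void main(String[] args) {
        LossChunker c = new LossChunker(100);
        System.out.println(c.onNodeLost(0));   // 1: a single loss
        System.out.println(c.onNodeLost(50));  // 2: same chunk
        System.out.println(c.onNodeLost(60));  // 3: looks like a partition
        System.out.println(c.onNodeLost(500)); // 1: window elapsed, new chunk
    }
}
```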
For the peer->peer messaging aspect of AC (1->1 and 1->all), I don't
think VS is required. In fact it might be an unwelcome overhead. I don't
know enough about the internals of AC and Totem to know if it would be
possible to reuse Totem's VS/consensus stuff on-top-of/along-side AMQ's
(e.g. peer://) protocol stack and underneath AC's membership notification
API, but it seems to me that ultimately the best solution would be a
hybrid that uses these approaches where needed and not where not...
Have I got the right end of the stick? Perhaps you can choose which
messages are virtually synchronous and which are not in Totem? I am
pretty sure, though, that it was using multicast, so it is not the best
solution for 1->1 messaging....
Jules
>
> andy
--
"Open Source is a self-assembling organism. You dangle a piece of
string into a super-saturated solution and a whole operating-system
crystallises out around it."
/**********************************
* Jules Gosnell
* Partner
* Core Developers Network (Europe)
*
* www.coredevelopers.net
*
* Open Source Training & Support.
**********************************/