Posted to dev@qpid.apache.org by Sandy Pratt <pr...@adobe.com> on 2009/12/11 21:59:20 UTC

RE: [c++ cluster] User doc for persistent clusters.

> -----Original Message-----
> From: Alan Conway [mailto:aconway@redhat.com]
> Sent: Tuesday, November 24, 2009 7:56 AM
> To: Jonathan Robie; qpid-dev-apache
> Subject: [c++ cluster] User doc for persistent clusters.
> 
> I put up a user view of the persistent cluster changes, coming soon.
> Would appreciate any feedback on the doc or the feature it describes.
> 
> http://cwiki.apache.org/confluence/display/qpid/Persistent+Cluster+Restart+Design+Note


Hi Alan,

Looks like a great step forward for clustering.  Any hints on what's involved in the manual intervention to enable restart from a full cluster crash?  I'm eager to kick the tires.

Thanks,

Sandy

---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:dev-subscribe@qpid.apache.org


Re: [c++ cluster] User doc for persistent clusters.

Posted by Alan Conway <ac...@redhat.com>.
On 12/11/2009 04:58 PM, Sandy Pratt wrote:
>
>
>> -----Original Message-----
>> From: Alan Conway [mailto:aconway@redhat.com]
>> Sent: Friday, December 11, 2009 1:35 PM
>> To: dev@qpid.apache.org
>> Cc: qpid-dev-apache
>> Subject: Re: [c++ cluster] User doc for persistent clusters.
>>
>> On 12/11/2009 03:59 PM, Sandy Pratt wrote:
>>>
>>>> -----Original Message-----
>>>> From: Alan Conway [mailto:aconway@redhat.com]
>>>> Sent: Tuesday, November 24, 2009 7:56 AM
>>>> To: Jonathan Robie; qpid-dev-apache
>>>> Subject: [c++ cluster] User doc for persistent clusters.
>>>>
>>>> I put up a user view of the persistent cluster changes, coming soon.
>>>> Would appreciate any feedback on the doc or the feature it describes.
>>>>
>>>>
>>>> http://cwiki.apache.org/confluence/display/qpid/Persistent+Cluster+Restart+Design+Note
>>>
>>>
>>> Hi Alan,
>>>
>>> Looks like a great step forward for clustering.  Any hints on what's
>> involved in the manual intervention to enable restart from a full
>> cluster crash?  I'm eager to kick the tires.
>>>
>>
>> Basically it amounts to picking the "best" store and marking it clean
>> by putting
>> a UUID in<data_dir>/cluster/shutdown.uuid, i.e. pretend it was shut
>> down cleanly.
>>
>
> Simple enough, thanks!
>
>> I'm working on providing some help to identify the "best" store and
>> ultimately
>> hope to provide a tool for doing all this a bit more automatically. It
>> will
>> probably mean running the tool on the data-directory of each cluster
>> member
>> initially which is a pain - assumes remote logins or shared file
>> systems.
>>
>> I'd like to find a way to do this from one location without assuming
>> shared
>> filesystems or remote logins. There was a suggestion that if there's a
>> total
>> failure the brokers come up in "admin mode" where they don't accept any
>> clients
>> except an admin tool. The brokers would collect the info needed to pick
>> the
>> clean store and mark it clean driven remotely by the admin tool. Does
>> this sound
>> like a good direction, or do you have any other suggestions on how to
>> approach this?
>
> That does sound like a good approach to me.
>
> Suppose the cluster has crashed because of a power or network failure (single brokers crashing due to OS or hardware problems isn't the issue here, if I understand correctly).
>
> Then the common case is that they all power back up without issue, and can unanimously pick the correct journal while having access to all candidate journals (hand waving a bit here, maybe*).

In the event of a total failure like this I'm a little uneasy about the cluster 
picking the best store automatically. I had been considering this to be a manual 
intervention case, with some tools that make it fairly trivial but still 
requiring human intervention.

>
> The uncommon case is that they all die for whatever reason, then some member of the cluster fails to come back up.  At this point, full information from all candidate journals is not available, and a unanimous decision cannot be reached.  Manual intervention sounds fine here.
>
> *I noticed in some of the JIRA notes that the message store changes are labeled with a monotonically increasing sequence number.  Is this derived from the Lamport clock implemented by openais (in which case all the events are conveniently serialized by the CPG)?  I could be misunderstanding the way the CPG works, but if not it sounds like an excellent way to get the cluster back in sync.
>
The sequence number is the frame counter maintained by qpidd, but the cluster 
join protocol is designed so that all members have the same frame count 
for the same frame regardless of when they join the cluster. It's currently not 
persisted, so it can't be used in recovery, for two reasons:

  - the cost of persisting it for every frame is significant compared to the 
cost of storing a message.
  - a frame counter persisted separately from the message store does not 
necessarily reflect the state of the store.

My current thinking is to use a config-change counter (counting membership 
changes) in conjunction with a counter, called the Persistence ID or PID, that 
the store records in the same transaction as each message.
The PID is not the same across the cluster, but we can resolve that:
  - record the config-change counter and current PID with every config change.
  - on recovery, compare each store's actual PID value relative to the PID 
value recorded for the same config change.
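The comparison described above could be sketched roughly as follows. This is a hypothetical illustration only: the dict keys, the `best_store` function, and the input format are invented for the example and are not part of qpidd or its store.

```python
# Hypothetical sketch of the recovery comparison described above.
# Each store records a local Persistence ID (PID) in the same transaction
# as each message, plus the (config-change counter, PID) pair at every
# membership change. None of these names are real qpidd APIs.

def best_store(stores):
    """Pick the store that is furthest ahead.

    `stores` is a list of dicts like:
      {"name": ..., "config_change": N,
       "pid_at_config_change": P, "current_pid": Q}

    Raw PIDs are not comparable across members, but the progress made
    since a common config change (Q - P) is.
    """
    # Only stores that saw the most recent membership change are candidates.
    latest = max(s["config_change"] for s in stores)
    candidates = [s for s in stores if s["config_change"] == latest]
    # Among those, the store that advanced its PID the most since that
    # config change holds the most complete journal.
    return max(candidates,
               key=lambda s: s["current_pid"] - s["pid_at_config_change"])
```

A store that missed the last config change is excluded outright, which matches the intuition that it cannot hold the most recent messages.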

I'll be working on this in the coming week or two so any input you have is very 
timely.
Cheers,
Alan.



RE: [c++ cluster] User doc for persistent clusters.

Posted by Sandy Pratt <pr...@adobe.com>.

> -----Original Message-----
> From: Alan Conway [mailto:aconway@redhat.com]
> Sent: Friday, December 11, 2009 1:35 PM
> To: dev@qpid.apache.org
> Cc: qpid-dev-apache
> Subject: Re: [c++ cluster] User doc for persistent clusters.
> 
> On 12/11/2009 03:59 PM, Sandy Pratt wrote:
> >
> >> -----Original Message-----
> >> From: Alan Conway [mailto:aconway@redhat.com]
> >> Sent: Tuesday, November 24, 2009 7:56 AM
> >> To: Jonathan Robie; qpid-dev-apache
> >> Subject: [c++ cluster] User doc for persistent clusters.
> >>
> >> I put up a user view of the persistent cluster changes, coming soon.
> >> Would appreciate any feedback on the doc or the feature it describes.
> >>
> >>
> >> http://cwiki.apache.org/confluence/display/qpid/Persistent+Cluster+Restart+Design+Note
> >
> >
> > Hi Alan,
> >
> > Looks like a great step forward for clustering.  Any hints on what's
> involved in the manual intervention to enable restart from a full
> cluster crash?  I'm eager to kick the tires.
> >
> 
> Basically it amounts to picking the "best" store and marking it clean
> by putting
> a UUID in <data_dir>/cluster/shutdown.uuid, i.e. pretend it was shut
> down cleanly.
> 

Simple enough, thanks!

> I'm working on providing some help to identify the "best" store and
> ultimately
> hope to provide a tool for doing all this a bit more automatically. It
> will
> probably mean running the tool on the data-directory of each cluster
> member
> initially which is a pain - assumes remote logins or shared file
> systems.
> 
> I'd like to find a way to do this from one location without assuming
> shared
> filesystems or remote logins. There was a suggestion that if there's a
> total
> failure the brokers come up in "admin mode" where they don't accept any
> clients
> except an admin tool. The brokers would collect the info needed to pick
> the
> clean store and mark it clean driven remotely by the admin tool. Does
> this sound
> like a good direction, or do you have any other suggestions on how to
> approach this?

That does sound like a good approach to me.

Suppose the cluster has crashed because of a power or network failure (single brokers crashing due to OS or hardware problems isn't the issue here, if I understand correctly).

Then the common case is that they all power back up without issue, and can unanimously pick the correct journal while having access to all candidate journals (hand waving a bit here, maybe*).

The uncommon case is that they all die for whatever reason, then some member of the cluster fails to come back up.  At this point, full information from all candidate journals is not available, and a unanimous decision cannot be reached.  Manual intervention sounds fine here.

*I noticed in some of the JIRA notes that the message store changes are labeled with a monotonically increasing sequence number.  Is this derived from the Lamport clock implemented by openais (in which case all the events are conveniently serialized by the CPG)?  I could be misunderstanding the way the CPG works, but if not it sounds like an excellent way to get the cluster back in sync.

Sandy




Re: [c++ cluster] User doc for persistent clusters.

Posted by Alan Conway <ac...@redhat.com>.
On 12/11/2009 03:59 PM, Sandy Pratt wrote:
>
>> -----Original Message-----
>> From: Alan Conway [mailto:aconway@redhat.com]
>> Sent: Tuesday, November 24, 2009 7:56 AM
>> To: Jonathan Robie; qpid-dev-apache
>> Subject: [c++ cluster] User doc for persistent clusters.
>>
>> I put up a user view of the persistent cluster changes, coming soon.
>> Would appreciate any feedback on the doc or the feature it describes.
>>
>> http://cwiki.apache.org/confluence/display/qpid/Persistent+Cluster+Restart+Design+Note
>
>
> Hi Alan,
>
> Looks like a great step forward for clustering.  Any hints on what's involved in the manual intervention to enable restart from a full cluster crash?  I'm eager to kick the tires.
>

Basically it amounts to picking the "best" store and marking it clean by putting 
a UUID in <data_dir>/cluster/shutdown.uuid, i.e. pretend it was shut down cleanly.
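That manual step might look something like the sketch below. This is an illustration under stated assumptions, not the documented procedure: it assumes the `<data_dir>/cluster/shutdown.uuid` layout mentioned above, and it assumes an arbitrary fresh UUID is acceptable; a real cluster restart may require a specific UUID shared by all members, so verify against the broker sources first.

```python
# Sketch of manually marking a store "clean" so the broker treats it as
# having shut down cleanly. Assumes the <data_dir>/cluster/shutdown.uuid
# layout; whether an arbitrary UUID is accepted is an assumption here.
import os
import uuid

def mark_store_clean(data_dir):
    """Write a shutdown UUID into data_dir/cluster/shutdown.uuid."""
    cluster_dir = os.path.join(data_dir, "cluster")
    os.makedirs(cluster_dir, exist_ok=True)
    path = os.path.join(cluster_dir, "shutdown.uuid")
    with open(path, "w") as f:
        f.write(str(uuid.uuid4()) + "\n")
    return path
```

Run against the data directory of the member whose store you judged "best"; the other members would then pick up their state from it on restart.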

I'm working on providing some help to identify the "best" store and ultimately 
hope to provide a tool for doing all this a bit more automatically. It will 
probably mean running the tool on the data-directory of each cluster member 
initially which is a pain - assumes remote logins or shared file systems.

I'd like to find a way to do this from one location without assuming shared 
filesystems or remote logins. There was a suggestion that if there's a total 
failure the brokers come up in "admin mode" where they don't accept any clients 
except an admin tool. The brokers would collect the info needed to pick the 
clean store and mark it clean driven remotely by the admin tool. Does this sound 
like a good direction, or do you have any other suggestions on how to approach this?
