Posted to derby-dev@db.apache.org by "Bergquist, Brett" <BB...@canoga.com> on 2014/01/10 22:45:22 UTC

Question on recovering after replication break because of a system failure

The reason I am posting to the dev list is that I might want to look into improving Derby in this area.

Just so that I understand correctly, the steps for replication are:


*         Make a copy of the database to the slave

*         Start replication on the slave and on the master
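For reference, with a hypothetical database name (salesdb) and slave host name, those two steps look like this in ij; the copy itself is an ordinary file-system copy taken while the database is shut down, and startSlave/startMaster are Derby's documented replication connection attributes:

```sql
-- On the slave machine, after copying the database directory over,
-- boot the copy in slave mode so it waits for log records from the master:
CONNECT 'jdbc:derby:salesdb;startSlave=true;slaveHost=slavehost';

-- On the master machine, start shipping transaction log records to the slave:
CONNECT 'jdbc:derby:salesdb;startMaster=true;slaveHost=slavehost';
```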

Now assume that this has been working right along and all is well, and then the system with the master fails.   So replication is broken, and the slave can be restarted in non-replication mode.   Time goes along and changes are made to the non-replicated database on the slave.   Finally the master machine is brought back on line.
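For context, converting the slave into an ordinary writable database after the master fails is done with Derby's documented failover connection attribute (database name hypothetical):

```sql
-- On the slave, once the master is known to be gone, promote the
-- slave copy to a normal read-write database:
CONNECT 'jdbc:derby:salesdb;failover=true';
```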

So to get replication going we need to:

*         Copy the database from the slave to the master

*         Start replication on the slave and on the master

This assumes that we have an affinity for having the master remain the master, but even if this is not the case and the old slave is going to become the new master, we need to copy the database from the slave to the master before starting replication again.

Given a database that is fairly large (say on the order of 200 GB) and no gigabit connection between the master and slave, the transfer could take a fairly long time.   Unfortunately, during this transfer time, neither database can be used.    So while replication allows quick failover on an initial failure, re-establishing replication once the failure has been resolved can cause substantial downtime.

So my question: is there any way that this downtime can be reduced?   Could something be done by restoring a backup database, using the logs, and then enabling replication?     Something like:

*         Make a file system level backup of the slave (using something like freeze and ZFS snapshot, this can take only a couple of seconds) and then allow the slave to continue

o   Assuming that the database logs are being used so that they can be replayed later

*         Transfer the database to the master

*         Transfer the logs

o   Replay each log on the master somehow to get the master to catch up to the slave as close as possible

*         Stop the slave so that it becomes consistent

*         Transfer the last log to the master and replay the master log

*         Enable replication on the master and the slave

Basically, this limits the downtime while the database transfer and log file transfer are taking place, leaving only a small window of downtime where the databases need to be brought in sync before replication can be started again.
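The freeze step above can be sketched with Derby's documented freeze/unfreeze procedures wrapped around an external snapshot (the snapshot command and paths belong to the operating system and are not shown):

```sql
-- Block writes to the database files so they are in a consistent state:
CALL SYSCS_UTIL.SYSCS_FREEZE_DATABASE();

-- ... take the ZFS (or similar) snapshot of the database directory
--     from the operating system while the database is frozen ...

-- Let the slave continue running; the snapshot is now a consistent copy:
CALL SYSCS_UTIL.SYSCS_UNFREEZE_DATABASE();
```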

Any thoughts on this?   Is this an approach that is worth looking at?

RE: Question on recovering after replication break because of a system failure

Posted by "Bergquist, Brett" <BB...@canoga.com>.
Thanks Knut.  At some point this spring I will probably take a look at implementing this.

-----Original Message-----
From: Knut Anders Hatlen [mailto:knut.hatlen@oracle.com] 
Sent: Tuesday, January 21, 2014 10:37 AM
To: derby-dev@db.apache.org
Subject: Re: Question on recovering after replication break because of a system failure

Rick Hillegas <ri...@oracle.com> writes:

> On 1/15/14 8:19 AM, Bergquist, Brett wrote:
>>> Any thoughts on this?   Is this an approach that is worth looking at?
>>>
>>
> Hi Brett,
>
> I haven't studied the internals of Derby's online backup, but from a 
> high level this looks like a promising approach.

I agree. Jørgen, one of the developers who implemented the replication feature, posted some similar ideas here:

https://issues.apache.org/jira/browse/DERBY-4197

--
Knut Anders

Re: Question on recoverying after replication break because of a system failure

Posted by Knut Anders Hatlen <kn...@oracle.com>.
Rick Hillegas <ri...@oracle.com> writes:

> On 1/15/14 8:19 AM, Bergquist, Brett wrote:
>>> Any thoughts on this?   Is this an approach that is worth looking at?
>>>
>>
> Hi Brett,
>
> I haven't studied the internals of Derby's online backup, but from a
> high level this looks like a promising approach.

I agree. Jørgen, one of the developers who implemented the replication
feature, posted some similar ideas here:

https://issues.apache.org/jira/browse/DERBY-4197

-- 
Knut Anders

Re: Question on recovering after replication break because of a system failure

Posted by Rick Hillegas <ri...@oracle.com>.
On 1/15/14 8:19 AM, Bergquist, Brett wrote:
> I took a look at PostgreSQL and its capability of restoring its WAL files and then switching to stream mode for replication once those are complete is almost what I desire.
>
> > From its manual:
>
> http://www.postgresql.org/docs/9.3/static/warm-standby.html
>
> " At startup, the standby begins by restoring all WAL available in the archive location, calling restore_command. Once it reaches the end of WAL available there and restore_command fails, it tries to restore any WAL available in the pg_xlog directory. If that fails, and streaming replication has been configured, the standby tries to connect to the primary server and start streaming WAL from the last valid record found in archive or pg_xlog. If that fails or streaming replication is not configured, or if the connection is later disconnected, the standby goes back to step 1 and tries to restore the file from the archive again. This loop of retries from the archive, pg_xlog, and via streaming replication goes on until the server is stopped or failover is triggered by a trigger file."
>
> Just thinking outside of the box and not knowing Derby's internals yet, I could see something like:
>
> - a slave connects to the master and issues the equivalent of "start replication"
> - the master uses something similar to the online backup but instead of writing the backup to a file, it writes the backup to a stream which is transported to the slave.  Simultaneously it also starts the replication stream writing the log entries to the slave.  This is done at the same time because as far as I can tell, when an online database backup is started, the backed up data is consistent as it was when the backup beings (ie. continuing changes to the database while the online backup is occurring are not present in the backup) so if both the backup and starting the shipping of the replication logs are performed at the same time, then once the backup is complete, it plus any replication log entries represent at consistent state of the database
> - the slave would process the backup stream, creating the database until this is complete.  Simultaneously it would be receiving the replication logs and persisting those.  Once the backup is completely received, it would process the persisted replication logs and then continue to process any new replication logs as they arrive.
>
> I understand that this might take a while to complete and would require storage at the slave to persist the replication logs while processing the backup from the master.  I also understand that the equivalent of the online backup might slow down the master while this is occurring but I think having the ability to bring up a slave without having down time on the master would be a great feature.
>
> I think internally, Derby has most of what is needed to accomplish this already:
>
>     - It already has the ability to perform an online backup.  What would need to be added would be the ability to write the backup data over network connection instead of to a filesystem storage
>     - It already has the ability to perform asynchronous replication using the recovery log.  What would need to be added would be on the slave side to buffer and not process this until a consistent backup were received.
>
> Any thoughts on this?
Hi Brett,

I haven't studied the internals of Derby's online backup, but from a 
high level this looks like a promising approach.

Thanks,
-Rick

RE: Question on recovering after replication break because of a system failure

Posted by "Bergquist, Brett" <BB...@canoga.com>.
I took a look at PostgreSQL, and its capability of restoring its WAL files and then switching to streaming mode for replication once those are complete is almost what I desire.

From its manual:

http://www.postgresql.org/docs/9.3/static/warm-standby.html

" At startup, the standby begins by restoring all WAL available in the archive location, calling restore_command. Once it reaches the end of WAL available there and restore_command fails, it tries to restore any WAL available in the pg_xlog directory. If that fails, and streaming replication has been configured, the standby tries to connect to the primary server and start streaming WAL from the last valid record found in archive or pg_xlog. If that fails or streaming replication is not configured, or if the connection is later disconnected, the standby goes back to step 1 and tries to restore the file from the archive again. This loop of retries from the archive, pg_xlog, and via streaming replication goes on until the server is stopped or failover is triggered by a trigger file."

Just thinking outside of the box and not knowing Derby's internals yet, I could see something like:

- a slave connects to the master and issues the equivalent of "start replication"
- the master uses something similar to the online backup but instead of writing the backup to a file, it writes the backup to a stream which is transported to the slave.  Simultaneously it also starts the replication stream, writing the log entries to the slave.  This is done at the same time because, as far as I can tell, when an online database backup is started, the backed-up data is consistent as it was when the backup begins (i.e. continuing changes to the database while the online backup is occurring are not present in the backup), so if both the backup and the shipping of the replication logs are started at the same time, then once the backup is complete, it plus any replication log entries represent a consistent state of the database
- the slave would process the backup stream, creating the database until this is complete.  Simultaneously it would be receiving the replication logs and persisting those.  Once the backup is completely received, it would process the persisted replication logs and then continue to process any new replication logs as they arrive.

I understand that this might take a while to complete and would require storage at the slave to persist the replication logs while processing the backup from the master.  I also understand that the equivalent of the online backup might slow down the master while this is occurring, but I think having the ability to bring up a slave without having downtime on the master would be a great feature.

I think internally, Derby has most of what is needed to accomplish this already:

   - It already has the ability to perform an online backup.  What would need to be added would be the ability to write the backup data over a network connection instead of to filesystem storage
   - It already has the ability to perform asynchronous replication using the recovery log.  What would need to be added, on the slave side, would be the ability to buffer the replication log and not process it until a consistent backup has been received.
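The first building block is already exposed as a documented system procedure; the sketch below (backup path illustrative) takes an online backup while enabling log archive mode, so changes made after the backup begins are kept for later replay. What would be new is streaming this output over the network instead of writing it to a directory:

```sql
-- Online backup to a directory, enabling log archive mode so that
-- log files covering changes made after the backup begins are retained.
-- The second argument (1) deletes archived logs from earlier backups.
CALL SYSCS_UTIL.SYSCS_BACKUP_DATABASE_AND_ENABLE_LOG_ARCHIVE_MODE(
    '/backup/salesdb', 1);
```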

Any thoughts on this?

-----Original Message-----
From: Bergquist, Brett [mailto:BBergquist@canoga.com] 
Sent: Tuesday, January 14, 2014 12:49 PM
To: derby-dev@db.apache.org
Subject: RE: Question on recovering after replication break because of a system failure

Actually the expensive part is having the "master" system down to ensure a completely accurate copy of the database is being made on the "slave".  Note that my "master" here could actually be the (original) slave system when the (original) master system is repaired.

Derby's replication once the systems are in sync and running seems to be okay.   It is the initial setup time to get the "slave" database to be the same as the "master" database that is expensive, because currently (unless I am wrong; correct me here if so) the master cannot be modified while this is occurring.  Then again, restoring the replication state once a failed system is repaired is again expensive.

I guess I will look at how other databases handle this case.  I can't imagine that adding a "replication slave" requires that the master database be down and quiescent.  I would imagine that it is possible to add a "replication slave" while the "replication master" is hot and running.   This is what I would like Derby to be able to do (note that I am not asking for someone else to do it, as it could very well be a contribution from me).

An analogy would be replacing a failed disk in a RAID array.  The RAID array continues to operate with the failed disk installed.  Now the disk is removed and a new one is installed.   Access to the RAID array is not blocked while the RAID rebuilds the data on the missing disk.  

It would be really useful for Derby to operate similarly, whereby the replication database can be rebuilt in the background.  Note that while this is being done the replication is degraded (not operating, of course, with the current one-to-one replication), just as a RAID array is while a disk is being resilvered, but once this process is done, the replication is up and running.

-----Original Message-----
From: Rick Hillegas [mailto:rick.hillegas@oracle.com]
Sent: Tuesday, January 14, 2014 11:40 AM
To: derby-dev@db.apache.org
Subject: Re: Question on recovering after replication break because of a system failure

Hi Brett,

I'm afraid that I'm not following your proposal. Some comments inline...

On 1/10/14 1:45 PM, Bergquist, Brett wrote:
>
> The reason I am posting to the dev list is that I might want to look 
> into improving Derby in this area.
>
> Just so that I am understand correctly, the steps for replication are:
>
> *Make a copy of the database to the slave
>
This seems to be the expensive step which results in long downtime.
>
> *Start replication on the slave and on the master
>
> Now assume that this is working right along and all is well and then 
> the system with the master fails.   So replication is broke and then 
> the slave can be restarted in non-replication mode.   Time goes along 
> and changes are made to the non-replicated database on the slave.   
> Finally the master machine is brought back on line.
>
> So to get replication going we need to:
>
> *Copy the database from the slave to the master
>
> *Start replication on the slave and on the master
>
> This assumes that we have an affinity for having the master being the 
> master but even if this is not the case and the old slave is going to 
> become the new master, we need to copy the database from the slave to 
> the master before starting replication again.
>
> Given a database that is fairly large (say on the order of 200Gb) and 
> not a Gig connection between the master and slave, this could be a
> fairly long time for the transfer to occur.   Unfortunately during 
> this transfer time, neither database can be used.    So while 
> replication allows quick fail over in an initial failure, 
> re-establishing the replication when the failure has been resolved can 
> cause a substantial long downtime.
>
> So my question, is there any way that this downtime can be reduced?   
> Could something be done with restoring a backup database and use the 
> logs and then enable replication.     Something like:
>
> *Make a file system level backup of the slave (using something like 
> freeze and ZFS snapshot, this can take only a couple of seconds) and 
> then allow the slave to continue
>
> oAssuming that the database logs are being used so that they can be 
> replayed later
>
> *Transfer the database to the master
>
I don't understand how this step is different from the expensive step you want to eliminate.

Thanks,
-Rick
>
> *Transfer the logs
>
> oReplay each log on the master somehow to get the master to catch up 
> to the slave as close as possible
>
> *Stop the slave so that it becomes consistent
>
> *Transfer the last log to the master and replay the master log
>
> *Enable replication on the master and the slave
>
> Basically limiting the downtime while the database transfer and log 
> file transfer is taking place and then to have a small window of down 
> time where they databases need to become in sync and then replication 
> can be started again.
>
> Any thoughts on this?   Is this an approach that is worth looking at?
>


RE: Question on recovering after replication break because of a system failure

Posted by "Bergquist, Brett" <BB...@canoga.com>.
Actually the expensive part is having the "master" system down to ensure a completely accurate copy of the database is being made on the "slave".  Note that my "master" here could actually be the (original) slave system when the (original) master system is repaired.

Derby's replication once the systems are in sync and running seems to be okay.   It is the initial setup time to get the "slave" database to be the same as the "master" database that is expensive, because currently (unless I am wrong; correct me here if so) the master cannot be modified while this is occurring.  Then again, restoring the replication state once a failed system is repaired is again expensive.

I guess I will look at how other databases handle this case.  I can't imagine that adding a "replication slave" requires that the master database be down and quiescent.  I would imagine that it is possible to add a "replication slave" while the "replication master" is hot and running.   This is what I would like Derby to be able to do (note that I am not asking for someone else to do it, as it could very well be a contribution from me).

An analogy would be replacing a failed disk in a RAID array.  The RAID array continues to operate with the failed disk installed.  Now the disk is removed and a new one is installed.   Access to the RAID array is not blocked while the RAID rebuilds the data on the missing disk.  

It would be really useful for Derby to operate similarly, whereby the replication database can be rebuilt in the background.  Note that while this is being done the replication is degraded (not operating, of course, with the current one-to-one replication), just as a RAID array is while a disk is being resilvered, but once this process is done, the replication is up and running.


Re: Question on recovering after replication break because of a system failure

Posted by Rick Hillegas <ri...@oracle.com>.
Hi Brett,

I'm afraid that I'm not following your proposal. Some comments inline...

On 1/10/14 1:45 PM, Bergquist, Brett wrote:
>
> The reason I am posting to the dev list is that I might want to look 
> into improving Derby in this area.
>
> Just so that I am understand correctly, the steps for replication are:
>
> ·Make a copy of the database to the slave
>
This seems to be the expensive step which results in long downtime.
>
> ·Start replication on the slave and on the master
>
> Now assume that this is working right along and all is well and then 
> the system with the master fails.   So replication is broke and then 
> the slave can be restarted in non-replication mode.   Time goes along 
> and changes are made to the non-replicated database on the slave.   
> Finally the master machine is brought back on line.
>
> So to get replication going we need to:
>
> ·Copy the database from the slave to the master
>
> ·Start replication on the slave and on the master
>
> This assumes that we have an affinity for having the master being the 
> master but even if this is not the case and the old slave is going to 
> become the new master, we need to copy the database from the slave to 
> the master before starting replication again.
>
> Given a database that is fairly large (say on the order of 200Gb) and 
> not a Gig connection between the master and slave, this could be a 
> fairly long time for the transfer to occur.   Unfortunately during 
> this transfer time, neither database can be used.    So while 
> replication allows quick fail over in an initial failure, 
> re-establishing the replication when the failure has been resolved can 
> cause a substantial long downtime.
>
> So my question, is there any way that this downtime can be reduced?   
> Could something be done with restoring a backup database and use the 
> logs and then enable replication.     Something like:
>
> ·Make a file system level backup of the slave (using something like 
> freeze and ZFS snapshot, this can take only a couple of seconds) and 
> then allow the slave to continue
>
> oAssuming that the database logs are being used so that they can be 
> replayed later
>
> ·Transfer the database to the master
>
I don't understand how this step is different from the expensive step 
you want to eliminate.

Thanks,
-Rick
>
> ·Transfer the logs
>
> oReplay each log on the master somehow to get the master to catch up 
> to the slave as close as possible
>
> ·Stop the slave so that it becomes consistent
>
> ·Transfer the last log to the master and replay the master log
>
> ·Enable replication on the master and the slave
>
> Basically limiting the downtime while the database transfer and log 
> file transfer is taking place and then to have a small window of down 
> time where they databases need to become in sync and then replication 
> can be started again.
>
> Any thoughts on this?   Is this an approach that is worth looking at?
>


RE: Question on recovering after replication break because of a system failure

Posted by "Bergquist, Brett" <BB...@canoga.com>.
Are there no comments on this?  Just looking for some feedback to see if this might be an avenue worth pursuing.

Or maybe another approach that might be better.   Conceptually:

*         Create a new procedure to allow Derby to "prepare for replication" that would be executed on the slave.   This would accept the output of an online backup and any changes since that occurred (in the form of processing the logs, I guess) and would switch to replication mode when instructed

*         Create a new procedure to allow Derby to "initiate replication" that would be executed on the master.   This would perform the equivalent of an online backup with log archive mode enabled (to keep track of changes to the database since the backup was started) and ship the backup and logs to the slave, where they would be processed to get the slave database in sync with the master before switching to replication mode.

What this would try to achieve would be to get the slave up to date with the master and then proceed in replication mode, while not requiring downtime on the master.   The master would continue to run just as it does during an online backup, and then once the slave has a copy of the database up to the point where it is consistent with the master, replication would be performed.
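Neither procedure exists in Derby today; purely as a sketch of the proposed interface, the procedure names and arguments below are invented:

```sql
-- Hypothetical: on the slave, accept a streamed backup plus archived logs
-- from the named master, then switch to replication mode once caught up.
-- CALL SYSCS_UTIL.SYSCS_PREPARE_FOR_REPLICATION('masterhost');

-- Hypothetical: on the master, stream an online backup and the archived
-- logs to the slave without taking the master down.
-- CALL SYSCS_UTIL.SYSCS_INITIATE_REPLICATION('slavehost');
```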

Any thoughts on this?
