Posted to dev@geode.apache.org by Nick Reich <nr...@pivotal.io> on 2017/08/10 17:31:58 UTC

[DISCUSS] Improvements to backups

There is a desire to improve backup creation and restore. The suggested
improvements are listed below and I am seeking feedback from the community:

1) Allow saving of backups to different locations/systems: currently,
backups are saved to a directory on each member. Users can move those
backups elsewhere manually or through scripting, but it would be
advantageous to allow direct backups to cloud storage providers (Amazon,
Google, Azure, etc.) and possibly other systems. To make this possible, it
is proposed to refactor backups into a service-style architecture with
backup location plugins that specify the target location. This would allow
additional backup strategies to be created as demand is determined, and
allow users to create their own plugins for their own special use cases.

2) Changing backup restore procedure: backups create a restore script per
member, which must be run on each member that the backup is restored to.
The generated script depends on the OS of the machine the backup is created
on (it mainly moves files to the correct directories). A more flexible
system would instead create a metadata file (xml, yaml, etc.) containing
information on the files in the backup. This would allow the logic for
moving files and other activities in the backup restore process to be
maintained in our codebase in an operating-system-agnostic way. Because the
existing script is not dependent on geode code, old backups would not be
affected by this change, though the process for restoring new backups would
be (likely using gfsh instead of sh or bat scripts).

3) Improved incremental backups: incremental backup allows for significant
space savings and is much quicker to run. However, it suffers from the
problem that you can only restore to the latest time the incremental backup
was run, as we overwrite user files, cache xml and properties, among other
files in the backup directory. By saving this information to timestamped
directories, restoring to a specific point in time would be as simple as
choosing the most recent timestamped directory to include in the restore.
Using timestamped directories for normal backups as well would prevent
successive backups from overwriting each other.
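
To make proposals 1 and 2 more concrete, here is a rough sketch of how a
backup location plugin and a metadata file could fit together. It is
written in Python only for brevity and is not Geode code: the names
BackupDestination, DirectoryDestination, backup, restore and the JSON
manifest layout are all hypothetical, chosen for illustration.

```python
import json
import shutil
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class BackupDestination(ABC):
    """Hypothetical plugin interface: decides where backup files live."""

    @abstractmethod
    def store(self, source: Path, stored_name: str) -> None:
        """Copy a local file into the backup location."""

    @abstractmethod
    def fetch(self, stored_name: str, target: Path) -> None:
        """Copy a file from the backup location back to a local path."""


class DirectoryDestination(BackupDestination):
    """Plugin that mirrors the current behaviour: a plain directory."""

    def __init__(self, root: Path):
        self.root = root

    def store(self, source: Path, stored_name: str) -> None:
        dest = self.root / stored_name
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, dest)

    def fetch(self, stored_name: str, target: Path) -> None:
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(self.root / stored_name, target)


def backup(files, destination: BackupDestination, manifest_path: Path) -> None:
    """Store each file and record where it came from in a manifest."""
    entries = []
    for index, original in enumerate(files):
        stored_name = f"{index}_{original.name}"  # assumed naming scheme
        destination.store(original, stored_name)
        entries.append({"stored_as": stored_name, "restore_to": str(original)})
    manifest_path.write_text(json.dumps({"files": entries}, indent=2))


def restore(destination: BackupDestination, manifest_path: Path) -> None:
    """Replay the manifest; no generated sh or bat script is involved."""
    manifest = json.loads(manifest_path.read_text())
    for entry in manifest["files"]:
        destination.fetch(entry["stored_as"], Path(entry["restore_to"]))


def demo() -> str:
    """Back up one file, delete it, restore it, and return its content."""
    tmp = Path(tempfile.mkdtemp())
    cache_xml = tmp / "member1" / "cache.xml"
    cache_xml.parent.mkdir()
    cache_xml.write_text("<cache/>")
    destination = DirectoryDestination(tmp / "backup")
    manifest = tmp / "manifest.json"
    backup([cache_xml], destination, manifest)
    cache_xml.unlink()
    restore(destination, manifest)
    return cache_xml.read_text()
```

A cloud storage plugin would be another BackupDestination subclass, and the
restore logic would not need to change. The manifest here stores absolute
paths as strings, which a real design would probably want to replace with
member-relative paths.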

Re: [DISCUSS] Improvements to backups

Posted by Nick Reich <nr...@pivotal.io>.

Dan, you are correct on #3: there is one location where this appears not to
be the case, but it is unused, so timestamped directories are already
implemented and overwrites should not be possible. This also covers
incremental backups and negates the need for change #3. However, it means
that incremental backups need to know the timestamped directory of the last
backup. This suggests a different potential optimization: keep the
(timestamped) incremental backup dirs in a base directory, and use either a
metadata file or the timestamps from the directory names to determine the
last incremental backup and automatically use it as the baseline for the
current backup (instead of having to know manually what that directory was
from the previous backup in order to pass it to the current backup command).
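
As a rough illustration of the directory-name variant, the baseline could
be picked like this. The sketch is in Python for brevity and is not Geode
code. The timestamp format and the function names latest_timestamped and
find_baseline are assumptions made for the example; names that do not
parse as timestamps are simply ignored.

```python
from datetime import datetime
from pathlib import Path
from typing import Iterable, Optional

# Assumed directory naming scheme, e.g. "2017-08-10-17-31-58".
TIMESTAMP_FORMAT = "%Y-%m-%d-%H-%M-%S"


def latest_timestamped(names: Iterable[str]) -> Optional[str]:
    """Return the name carrying the newest timestamp, or None if none parse."""
    newest_name = None
    newest_stamp = None
    for name in names:
        try:
            stamp = datetime.strptime(name, TIMESTAMP_FORMAT)
        except ValueError:
            continue  # not a backup directory name, skip it
        if newest_stamp is None or stamp > newest_stamp:
            newest_name, newest_stamp = name, stamp
    return newest_name


def find_baseline(base_dir: Path) -> Optional[Path]:
    """Pick the most recent timestamped subdirectory of base_dir."""
    subdirs = [child.name for child in base_dir.iterdir() if child.is_dir()]
    name = latest_timestamped(subdirs)
    return base_dir / name if name is not None else None
```

If find_baseline returns None, there is no earlier backup to build on and
the backup would presumably have to run as a full backup.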

On Thu, Aug 10, 2017 at 10:37 AM, Dan Smith <ds...@pivotal.io> wrote:

> +1 this all looks good to me. I think #2 in particular would probably
> simplify the incremental backup code.
>
> For #3, I could have sworn the backups were already going into timestamped
> directories and nothing got overwritten in an existing backup. If that is
> not already happening that definitely should change!
>
> -Dan

Re: [DISCUSS] Improvements to backups

Posted by Dan Smith <ds...@pivotal.io>.

+1 this all looks good to me. I think #2 in particular would probably
simplify the incremental backup code.

For #3, I could have sworn the backups were already going into timestamped
directories and nothing got overwritten in an existing backup. If that is
not already happening that definitely should change!

-Dan
