Posted to issues@airavata.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2023/06/02 20:29:00 UTC

[jira] [Commented] (AIRAVATA-3694) User data archive management commands

    [ https://issues.apache.org/jira/browse/AIRAVATA-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728863#comment-17728863 ] 

ASF subversion and git services commented on AIRAVATA-3694:
-----------------------------------------------------------

Commit 09e6aaf4350cf35ed92555a81e27aeb33d952843 in airavata's branch refs/heads/master from Marcus Christie
[ https://gitbox.apache.org/repos/asf?p=airavata.git;h=09e6aaf435 ]

AIRAVATA-3694 Ansible: configure data archive max ages for scigap hosted gateways


> User data archive management commands
> -------------------------------------
>
>                 Key: AIRAVATA-3694
>                 URL: https://issues.apache.org/jira/browse/AIRAVATA-3694
>             Project: Airavata
>          Issue Type: New Feature
>          Components: Django Portal
>            Reporter: Marcus Christie
>            Assignee: Marcus Christie
>            Priority: Major
>
> Create management commands to manage archiving user data. The use case is that a gateway admin wants to archive older user data and then delete it to free up disk space.
> The management commands will handle creating archives (as tarballs) and deleting the archived data from the user data directory. There will also be an unarchive command. There are settings for the max age of files to be archived and for the directory into which archives should be written.
> How the archive files are themselves backed up is left to the gateway admin. It's expected that the gateway admin would periodically (perhaps by cron) copy the archive files from the web server to some other file server.
> h3. Description
> archive_user_data creates a tarball archive of all user data files and directories that are older than a configured number of days. Alongside the tarball, it writes a text file that lists all of the files and directories that were archived. The tarball and text file can be periodically pushed to tape backup or any other backup location.
> The configuration settings are:
> - GATEWAY_USER_DATA_ARCHIVE_MAX_AGE_DAYS
> - GATEWAY_USER_DATA_ARCHIVE_DIRECTORY
> -- this is the directory in which to place the archive files and is also the place where temporary files are generated. Since the archive files can be large, it's important that there be enough free disk space on the partition where this directory lives.
> - GATEWAY_USER_DATA_ARCHIVE_MINIMUM_ARCHIVE_SIZE_GB
> -- defaults to 1 GB. This can be used to prevent creating a lot of small archives since tape archives often want a few large files instead of many small files.
> h4. Running archive_user_data
> All commands should be run as the gateway server user (pga).
> {code}
> python manage.py archive_user_data --dry-run
> {code}
> This only prints the files and directories that would be archived, then exits. It is useful for checking that the configuration is correct.
> {code}
> python manage.py archive_user_data
> {code}
> This will actually create an archive and then delete from user data the files that were archived.
> {code}
> python manage.py archive_user_data --max-age MAX_AGE
> {code}
> The --max-age flag overrides the GATEWAY_USER_DATA_ARCHIVE_MAX_AGE_DAYS setting for a single run. This is useful for creating the first few archives when introducing user data archiving to an existing gateway.
> h4. Running unarchive_user_data 
> unarchive_user_data requires an archive tarball as input. The main use case for this command is that the gateway administrator wants to restore some particular user data. First, the right archive must be found. The experiment details view in Experiment Statistics will display the name of the archive file for an experiment data directory that has been archived. Use this to then retrieve the tarball from backup. Then run unarchive_user_data on the file.
> {code}
> python manage.py unarchive_user_data /path/to/archive_seagrid_older_than_2023-04-17-22-15-34.tgz
> {code}
> The timestamps will be restored from the archive, including the last modified timestamps. This means that the next time archive_user_data runs, all of the unarchived files will be re-archived, since they are still older than the max age. Sometimes that is desired, but if you want to reset the last modified times, use the {{--reset-modification}} option:
> {code}
> python manage.py unarchive_user_data --reset-modification /path/to/archive_seagrid_older_than_2023-04-17-22-15-34.tgz
> {code}
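
The behaviour described in the issue above (select files by age, write a tarball plus a text listing, restore with or without the original modification times) can be illustrated with a small standalone sketch. This is not the Django Portal's implementation: the function names (find_old_files, create_archive, unarchive, demo) and the file layout are hypothetical, and it uses only the Python standard library. It also ignores the minimum archive size setting and the deletion and safety checks a real command would need.

```python
import os
import tarfile
import tempfile
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def find_old_files(root, max_age_days):
    """Return files under root whose last-modified time is older than max_age_days."""
    cutoff = time.time() - max_age_days * SECONDS_PER_DAY
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.stat().st_mtime < cutoff
    )


def create_archive(root, files, archive_dir, name):
    """Write <name>.tgz plus a <name>.txt listing of the archived paths."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    tarball = archive_dir / (name + ".tgz")
    listing = archive_dir / (name + ".txt")
    relative = [p.relative_to(root).as_posix() for p in files]
    with tarfile.open(tarball, "w:gz") as tar:
        for path, arcname in zip(files, relative):
            tar.add(path, arcname=arcname)  # tar records each file's mtime
    listing.write_text("\n".join(relative) + "\n")
    return tarball, listing


def unarchive(tarball, root, reset_modification=False):
    """Extract tarball into root; optionally set restored files' mtime to now."""
    with tarfile.open(tarball, "r:gz") as tar:
        names = [m.name for m in tar.getmembers() if m.isfile()]
        # Sketch only: a real tool should validate member paths before extracting.
        tar.extractall(root)
    if reset_modification:
        for name in names:
            os.utime(root / name)  # no times given: atime and mtime become "now"
    return [root / name for name in names]


def demo():
    """Archive one 100-day-old file, then restore it both ways."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "user_data"        # hypothetical user data directory
        archive_dir = Path(tmp) / "archives"  # hypothetical archive directory
        (root / "exp1").mkdir(parents=True)
        old_file = root / "exp1" / "old.txt"
        old_file.write_text("old")
        (root / "new.txt").write_text("new")
        aged = time.time() - 100 * SECONDS_PER_DAY
        os.utime(old_file, (aged, aged))  # pretend the file is 100 days old

        selected = find_old_files(root, max_age_days=90)
        tarball, _listing = create_archive(root, selected, archive_dir, "archive_demo")
        for path in selected:
            path.unlink()  # free the space, as the archive command would

        unarchive(tarball, root)  # mtimes restored, so still "old"
        still_old = find_old_files(root, max_age_days=90)
        unarchive(tarball, root, reset_modification=True)
        after_reset = find_old_files(root, max_age_days=90)
        return (
            [p.name for p in selected],
            [p.name for p in still_old],
            [p.name for p in after_reset],
        )


if __name__ == "__main__":
    print(demo())
```

In this sketch a plain restore leaves the file eligible for archiving on the next run, because tar preserves its modification time, while the reset variant touches each restored file so that it falls outside the max age window again.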



--
This message was sent by Atlassian Jira
(v8.20.10#820010)