Posted to dev@lucene.apache.org by "Cao Manh Dat (JIRA)" <ji...@apache.org> on 2019/07/05 02:58:00 UTC

[jira] [Updated] (SOLR-13608) Incremental backup for Solr

     [ https://issues.apache.org/jira/browse/SOLR-13608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cao Manh Dat updated SOLR-13608:
--------------------------------
    Description: 
Currently, every call to the backup API backs up the whole index under a different backupName. This is very costly and nearly useless for large, frequently changing indexes.

Since Lucene index files are written only once and also contain checksum information, we can rely on these properties to support incremental backup: only upload files that are not already present in the repository.

The design for this issue is as follows:
* Add another parameter named {{incremental}} to the backup API.
* Add new methods to {{BackupRepository}}, such as computing a checksum and deleting files (existing custom implementations can simply throw {{UnsupportedOperationException}} from the new methods).
* {{SnapShooter}} will skip uploading a local file if the copy in the repository matches it in checksum and length.
* Segments_N will be copied last, to guarantee that even if the backup process is interrupted midway, the old backup can still be used.
* We only keep the last {{IndexCommit}}; therefore, after Segments_N has been uploaded successfully, any file not needed by the last {{IndexCommit}} will be deleted. We will try to improve this in another issue.
* Any files in ZK will be re-uploaded:
** The ZK files corresponding to the first backup will be stored in the same location as today (to maintain backward compatibility).
** On subsequent backups, ZK files will be stored in folder {{gen-ith}}.
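
The file-selection steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual {{SnapShooter}} or {{BackupRepository}} code: the names IncrementalBackupSketch, FileMeta, Plan and plan are hypothetical, and file metadata is passed in as plain maps instead of being read from a Lucene directory or a real repository.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of incremental-backup planning; not Solr's actual API. */
final class IncrementalBackupSketch {

    /** Length and checksum of one index file (illustrative type). */
    static final class FileMeta {
        final long length;
        final long checksum;

        FileMeta(long length, long checksum) {
            this.length = length;
            this.checksum = checksum;
        }

        /** True when the other copy exists and agrees in both length and checksum. */
        boolean matches(FileMeta other) {
            return other != null && length == other.length && checksum == other.checksum;
        }
    }

    /** Outcome of comparing the latest local commit with the repository. */
    static final class Plan {
        final List<String> upload = new ArrayList<>();            // in copy order
        final List<String> skip = new ArrayList<>();              // already in the repository
        final List<String> deleteAfterCommit = new ArrayList<>(); // stale repository files
    }

    /**
     * local: files of the last commit on the node; repo: files already backed up.
     * Both maps are iterated in their own order, so pass ordered maps for stable output.
     */
    static Plan plan(Map<String, FileMeta> local, Map<String, FileMeta> repo) {
        Plan plan = new Plan();
        List<String> segmentsFiles = new ArrayList<>();
        for (Map.Entry<String, FileMeta> e : local.entrySet()) {
            String name = e.getKey();
            if (e.getValue().matches(repo.get(name))) {
                plan.skip.add(name);          // same checksum and length: no upload
            } else if (name.startsWith("segments_")) {
                segmentsFiles.add(name);      // deferred so it is copied last
            } else {
                plan.upload.add(name);
            }
        }
        plan.upload.addAll(segmentsFiles);
        // Repository files the last commit no longer references are removed
        // only after the new segments file has been uploaded.
        for (String name : repo.keySet()) {
            if (!local.containsKey(name)) {
                plan.deleteAfterCommit.add(name);
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        Map<String, FileMeta> local = new LinkedHashMap<>();
        local.put("segments_2", new FileMeta(10, 33));
        local.put("_0.cfs", new FileMeta(100, 11));
        local.put("_1.cfs", new FileMeta(50, 22));
        Map<String, FileMeta> repo = new LinkedHashMap<>();
        repo.put("segments_1", new FileMeta(10, 55));
        repo.put("_0.cfs", new FileMeta(100, 11));
        Plan p = plan(local, repo);
        System.out.println("upload=" + p.upload + " skip=" + p.skip
                + " delete=" + p.deleteAfterCommit);
    }
}
```

Deferring the segments file means an interrupted run leaves the previous segments file, and every file it references, untouched in the repository, which is why deletion is planned only for after the commit point has been copied.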



  was:
Currently, every call to the backup API backs up the whole index under a different backupName. This is very costly and nearly useless for large, frequently changing indexes.

Since Lucene index files are written only once and also contain checksum information, we can rely on these properties to support incremental backup: only upload files that are not already present in the repository.

The design for this issue is as follows:
* Add another parameter named {{incremental}} to the backup API.
* Add new methods to {{BackupRepository}}. Unfortunately, branch_8_x still uses Java 8, therefore we can't use {{default method}}. So anyone who already has a customised version of BackupRepository needs to recompile their code (simply by throwing {{UnsupportedOperationException}} from the new methods).
* {{SnapShooter}} will skip uploading a local file if the copy in the repository matches it in checksum and length.
* Segments_N will be copied last, to guarantee that even if the backup process is interrupted midway, the old backup can still be used.
* We only keep the last {{IndexCommit}}; therefore, after Segments_N has been uploaded successfully, any file not needed by the last {{IndexCommit}} will be deleted. We will try to improve this in another issue.
* Any files in ZK will be re-uploaded:
** The ZK files corresponding to the first backup will be stored in the same location as today (to maintain backward compatibility).
** On subsequent backups, ZK files will be stored in folder {{gen-ith}}.




> Incremental backup for Solr
> ---------------------------
>
>                 Key: SOLR-13608
>                 URL: https://issues.apache.org/jira/browse/SOLR-13608
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Cao Manh Dat
>            Assignee: Cao Manh Dat
>            Priority: Major


