Posted to commits@jackrabbit.apache.org by Apache Wiki <wi...@apache.org> on 2006/05/31 22:07:23 UTC
[Jackrabbit Wiki] Update of "BackupTool" by Nico

The following page has been changed by Nico:
http://wiki.apache.org/jackrabbit/BackupTool

New page:
After several iterations on the mailing list (ML) and some discussion with Jukka, here is what we propose to build this summer.

Please feel free to comment, either on the ML or through this [http://www.contact-us.info/contact.php?id=18 contact form] to contact Nicolas Toper directly.

== Architecture ==

We want to make the backup tool easy to install while keeping as much flexibility as possible when performing a backup.

Besides, we need to support all three Jackrabbit deployment models: as a shared J2EE resource, as a repository server, and as a webapp bundle. Since we cannot assume any communication layer in the last case, there is only one place to put the backup tool: a local backup method (and possibly a local restore method) in the standard org.apache.jackrabbit.api.JackrabbitRepository interface. Jukka, which class should implement this interface? RepositoryImpl in the core package?

The backup operation is quite flexible: it can be triggered from anywhere (an application, a sysadmin, a remote client, …).

The restore operation would be triggered from a separate application, which would be the only one using the repository. This places many constraints on restoring, but since restore is a rare operation, their impact is limited.


Jackrabbit is still a young project and its feature set will probably evolve. The backup tool needs to evolve easily so it can keep up with Jackrabbit. Jukka and I propose to use an XML configuration file loaded at each save operation. The configuration file would define which resources are to be saved (and therefore restored) and how (by pointing to a class). This makes it easy to add new kinds of resources (and share the code with the community) and to create your own backup plan. If the API changes a little, only one class would need to be updated. (The configuration file is backed up with the repository so that the restore operation knows what to restore.)

For instance, the configuration file could look like this (this is not a proposal yet, just an example; I will propose a real format later):


<rabbitHole>
  <user>
    <param name="login" value="***"/>
    <param name="password" value="***"/>
  </user>
  <repository>
    <resource name="custom node type" savingClass="FQN backup class"/>
  </repository>
  <workspaces type="selected|all">
    <workspace name="wsp1"/>
    <workspace name="wsp2"/>
    <workspace name="wsp3"/>
  </workspaces>
</rabbitHole>
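To make the format concrete, here is a minimal sketch of how a well-formed version of the example could be read with the JDK's DOM parser. It assumes the element and attribute names shown above (which are explicitly not final); the class BackupConfig and its members are hypothetical names. It collects each resource's savingClass and decides which workspaces to save depending on type="all" or type="selected".

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical reader for the example rabbitHole format (the format is not final). */
class BackupConfig {
    /** Resource name -> fully qualified name of the class that saves and restores it. */
    final Map<String, String> resources = new LinkedHashMap<>();
    /** True for type="all"; otherwise only the listed workspaces are saved. */
    boolean allWorkspaces;
    final List<String> listedWorkspaces = new ArrayList<>();

    static BackupConfig parse(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Malformed backup configuration", e);
        }
        BackupConfig config = new BackupConfig();
        NodeList resources = doc.getElementsByTagName("resource");
        for (int i = 0; i < resources.getLength(); i++) {
            Element r = (Element) resources.item(i);
            config.resources.put(r.getAttribute("name"), r.getAttribute("savingClass"));
        }
        // item(0) returns null when there is no <workspaces> element.
        Element workspaces = (Element) doc.getElementsByTagName("workspaces").item(0);
        if (workspaces != null) {
            config.allWorkspaces = "all".equals(workspaces.getAttribute("type"));
            NodeList names = workspaces.getElementsByTagName("workspace");
            for (int i = 0; i < names.getLength(); i++) {
                config.listedWorkspaces.add(((Element) names.item(i)).getAttribute("name"));
            }
        }
        return config;
    }

    /** Workspaces to save, given the workspace names that actually exist in the repository. */
    List<String> selectWorkspaces(List<String> existing) {
        if (allWorkspaces) {
            return new ArrayList<>(existing);
        }
        List<String> selected = new ArrayList<>();
        for (String name : listedWorkspaces) {
            if (existing.contains(name)) {
                selected.add(name);
            }
        }
        return selected;
    }
}
```

In a JCR setting, the list of existing workspaces could come from javax.jcr.Workspace.getAccessibleWorkspaceNames(). Listed workspaces that do not exist are silently dropped here; whether that should instead be an error is an open design choice.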


As you can see, we can back up either all workspaces or only selected ones, and the same class is used to save and restore a resource (it would implement a specific interface with two methods: backup and restore). The Javadoc will document the dependency of each save/restore class on the "main" one (using @link?).

External parameters would also be passed to the classes (for instance, the location of repository.xml).
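The plugin idea from the two paragraphs above could look roughly like the following sketch. BackupResource, RepositoryXmlBackup, ResourceLoader and the parameter key "repository.xml.path" are all hypothetical names, not existing Jackrabbit API. The loader instantiates whatever class a savingClass attribute names, and the example resource uses an external parameter to find repository.xml.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Map;

/** Hypothetical interface: one implementation per kind of resource in the configuration. */
interface BackupResource {
    /** Saves the resource into backupDir; params carries external settings from the caller. */
    void backup(Path backupDir, Map<String, String> params) throws IOException;

    /** Restores the resource previously saved into backupDir. */
    void restore(Path backupDir, Map<String, String> params) throws IOException;
}

/** Example resource: repository.xml, whose location arrives as an external parameter. */
class RepositoryXmlBackup implements BackupResource {
    // Hypothetical parameter name; the configuration format is not fixed yet.
    static final String REPOSITORY_XML = "repository.xml.path";

    @Override
    public void backup(Path backupDir, Map<String, String> params) throws IOException {
        Path source = Paths.get(params.get(REPOSITORY_XML));
        Files.copy(source, backupDir.resolve("repository.xml"),
                StandardCopyOption.REPLACE_EXISTING);
    }

    @Override
    public void restore(Path backupDir, Map<String, String> params) throws IOException {
        Path target = Paths.get(params.get(REPOSITORY_XML));
        Files.copy(backupDir.resolve("repository.xml"), target,
                StandardCopyOption.REPLACE_EXISTING);
    }
}

class ResourceLoader {
    /** Instantiates the class named in a savingClass attribute via its no-argument constructor. */
    static BackupResource load(String className) {
        try {
            return Class.forName(className)
                    .asSubclass(BackupResource.class) // ClassCastException if it is not a BackupResource
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load backup class " + className, e);
        }
    }
}
```

Because the same object handles both directions, a new kind of resource only requires one new class plus one line in the configuration file, which matches the evolution argument made earlier.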

== Saved data ==
For now, here is the data we plan to back up. This is only a first step; other resources can easily be added.

'''Backup Configuration file''' 

 * As described above.


'''Jackrabbit Configuration File'''

 * repository.xml

 * workspace.xml

'''Data'''
 * All workspaces

 * Node version histories (contrary to what I wrote earlier, we will back up the workspace directly).

 * Custom Node Types

 * Namespaces

'''NB'''

We will not save the Lucene index for now.



== Save and restore algorithm ==
 * The configuration files are saved as plain files.

 * The workspaces (and node version histories) are transferred to a dedicated workspace (SavingWorkspace) using the ObjectPM or XmlPM persistence manager. We would then zip the directory, copy it, and destroy the workspace.

 * Other resources (custom node types and namespaces) are serialized using Jackrabbit's internal XML node type serialization format (for instance, NodeTypeWriter and NodeTypeReader).
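The file-level parts of the algorithm above (saving the configuration files as plain files, then zipping the SavingWorkspace directory) need only the JDK, as in this sketch. The transfer into SavingWorkspace through ObjectPM/XmlPM and the destruction of that workspace depend on Jackrabbit internals and are not shown. BackupArchive and its methods are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class BackupArchive {
    /** Copies each configuration file (repository.xml, workspace.xml, ...) into backupDir. */
    static void copyConfigFiles(List<Path> configFiles, Path backupDir) throws IOException {
        Files.createDirectories(backupDir);
        for (Path file : configFiles) {
            Files.copy(file, backupDir.resolve(file.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
        }
    }

    /** Zips every regular file under sourceDir (e.g. the SavingWorkspace directory). */
    static void zipDirectory(Path sourceDir, Path zipFile) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(sourceDir)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(zipFile))) {
            for (Path file : files) {
                // Entry names are relative to sourceDir and always use '/' as the separator.
                String name = sourceDir.relativize(file).toString().replace('\\', '/');
                zip.putNextEntry(new ZipEntry(name));
                Files.copy(file, zip);
                zip.closeEntry();
            }
        }
    }
}
```

The zip file should be written outside sourceDir; otherwise the walk could pick up the archive being written.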


== Locking strategy ==

For the backup operation, locking will be managed at the level of each backup class; there is no need to hold a global lock for now. We will put a JCR deep lock on the root node of the workspace currently being saved. If a lock is already held, we would raise an exception but not abort the backup: it would proceed without that workspace.
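The "lock, save, or skip" behaviour just described can be sketched independently of JCR. Everything here is hypothetical: the WorkspaceSaver interface stands in for the real per-workspace operations. In a JCR 1.0 implementation, lock() could call Node.lock(true, false) on the workspace root (a deep, open-scoped lock on a mix:lockable node), which throws javax.jcr.lock.LockException when a lock is already held. The exception is recorded rather than propagated, so the run continues with the next workspace.

```java
import java.util.ArrayList;
import java.util.List;

class WorkspaceBackupRun {
    /** Raised when a workspace's root node is already locked by someone else. */
    static class WorkspaceLockedException extends Exception {
        WorkspaceLockedException(String workspace) {
            super("Workspace already locked: " + workspace);
        }
    }

    /** Hypothetical per-workspace operations (lock the root, save, release the lock). */
    interface WorkspaceSaver {
        void lock(String workspace) throws WorkspaceLockedException;
        void save(String workspace);
        void unlock(String workspace);
    }

    /** Saves each workspace under its own deep lock; locked workspaces are skipped, not fatal. */
    static List<String> backupAll(List<String> workspaces, WorkspaceSaver saver,
                                  List<WorkspaceLockedException> skipped) {
        List<String> saved = new ArrayList<>();
        for (String name : workspaces) {
            try {
                saver.lock(name);
            } catch (WorkspaceLockedException e) {
                skipped.add(e); // report the problem and continue with the next workspace
                continue;
            }
            try {
                saver.save(name);
                saved.add(name);
            } finally {
                saver.unlock(name); // always release the lock, even if saving fails
            }
        }
        return saved;
    }
}
```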

The Jackrabbit configuration files are not modified by any process (not even workspace.xml?), so there is no issue there.

Locking of the other resources (custom node types and namespaces) is (is it?) already handled by the existing code.

For the restore operation, there is no issue, since the restore tool would be the only one using the repository. It is the application's or sysadmin's responsibility to enforce this.


== Next Steps ==

 * exact format of the XML configuration file
 * code design phase: UML class diagram
 * schedule
 * coding

Phases will be separated by an iteration on the ML to gather your feedback.



=== Evolution ===
Here are some ideas for evolutions after the first release:

 * Add a remote client using either a dedicated RMI connection or the JCR one.  
 * Support restoring while the repository is still running, by rewriting the local restore operation and its client.
 * Hot backup (see the post on the ML on this subject)
 * Incremental backup (using rsync?)
 * Back up the Lucene index (see the post on the Lucene ML about saving indexes)


'''Please do not hesitate to contact Nicolas Toper (through the ML or this [http://www.contact-us.info/contact.php?id=18 contact form]) with any question, suggestion or idea about this project.'''