Posted to users@jackrabbit.apache.org by Rakesh Vidyadharan <ra...@sptci.com> on 2014/06/20 17:34:20 UTC

Cleaning up version history

Hello all,

We are using Jackrabbit 2.4.3 through Magnolia CMS.  Our users used the documents workspace provided by Magnolia to upload large media files (movies, audio), some of which are over 100 MB.  We migrated all that content away from Magnolia/Jackrabbit onto a regular Apache web server, and I set up a DataStore Garbage Collector task to remove the underlying files that had been made unreachable by deleting the nodes.

It turns out that the documents workspace in Magnolia uses versioning, so after the GC run I did not see much of a difference in the repository disk usage.  I followed the steps outlined in http://stackoverflow.com/questions/3292719/how-do-you-restore-a-versioned-node-in-a-jackrabbit-2-1-repository to iterate over all the version histories for the workspace, and found the versions that were saved for the nodes that we had deleted.  I assume it is the version history that prevents the GC from removing the underlying files.

Unlike the code in that post, iterating the version history nodes does not return instances of javax.jcr.version.Version, but just regular javax.jcr.Node with primary node type nt:version.  I adapted my code accordingly to see the versions that are stored.  The question is how do I go about removing these versions from the workspace (we absolutely do not need those versions)?  The JCR API does not seem to give me any way to even restore these nodes if I wanted to, since the API seems to require the original node, from which I can then iterate over the saved versions and remove if needed.  In our case the original nodes have been removed, so there is no starting point to use to iterate over the version history.

For now, I have hacked together a solution in which I look up the file name for the jcr:data and then, using the directory structure Jackrabbit uses, delete the underlying files, so I have reclaimed the disk space.  This is obviously not ideal, and I would like to find out the proper way to remove old versions we do not need and reclaim the disk space.
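
To make the lookup above concrete, here is a rough sketch of the path calculation only (it deletes nothing). It assumes the file-based data store layout, in which a record is stored under three nested two-character directories taken from the start of its identifier; that layout is my assumption and is not confirmed anywhere in this thread. The class name, method name, and identifier in the usage note are hypothetical.

```java
import java.nio.file.Path;

// Sketch only: maps a data store record identifier to the file that is
// assumed to hold it. The xx/yy/zz/<identifier> layout is an assumption.
final class DataStorePaths {
    private DataStorePaths() {}

    static Path recordFile(Path dataStoreDir, String identifier) {
        if (identifier == null || identifier.length() < 6) {
            throw new IllegalArgumentException("identifier too short: " + identifier);
        }
        // The first three character pairs become nested directories;
        // the full identifier is the file name.
        return dataStoreDir
                .resolve(identifier.substring(0, 2))
                .resolve(identifier.substring(2, 4))
                .resolve(identifier.substring(4, 6))
                .resolve(identifier);
    }
}
```

For example, `DataStorePaths.recordFile(Paths.get("repo", "datastore"), "a1b2c3d4e5f6")` resolves to `repo/datastore/a1/b2/c3/a1b2c3d4e5f6`. Touching these files directly bypasses the repository, which is why I consider it a hack.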

Thanks
Rakesh

Re: Cleaning up version history

Posted by Rakesh Vidyadharan <ra...@sptci.com>.
On 26 Jun 2014, at 04:21, Julian Reschke <ju...@gmx.de> wrote:

> The version histories are regular nodes. Why don't you use the JCR API to delete them?
> 
> Best regards, Julian
> 

Hi Julian,

Aren’t all nodes under “jcr:system” protected by default?  I tried to delete the version node as you suggested and got an exception:

Removing version node: /jcr:system/jcr:versionStorage/3a/5f/27/3a5f2714-03de-4b2b-8bb8-1b2bd1a6cb2b
ConstraintViolationException: Unable to perform operation. Node is protected.
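
The path in that log line suggests how the version storage is laid out: the last segment looks like a UUID, and the three directories above it are its first six hex characters. A minimal sketch of that mapping follows, assuming the pattern holds for every entry. The class and method names are hypothetical, and the log line does not say whether the UUID identifies the deleted node or the history node itself.

```java
import java.util.UUID;

// Sketch only: rebuilds a version storage path from a UUID, following the
// pattern seen in the log line above. Not an official Jackrabbit API.
final class VersionStoragePaths {
    static final String ROOT = "/jcr:system/jcr:versionStorage";

    private VersionStoragePaths() {}

    static String historyPath(String uuid) {
        // UUID.fromString rejects strings that are not UUID-shaped;
        // toString yields the lower-case canonical form.
        String id = UUID.fromString(uuid).toString();
        return ROOT
                + "/" + id.substring(0, 2)
                + "/" + id.substring(2, 4)
                + "/" + id.substring(4, 6)
                + "/" + id;
    }
}
```

Calling `VersionStoragePaths.historyPath("3a5f2714-03de-4b2b-8bb8-1b2bd1a6cb2b")` reproduces the path logged above. This only locates a node; it does not get around the protection error.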

Rakesh

Re: Cleaning up version history

Posted by Julian Reschke <ju...@gmx.de>.

The version histories are regular nodes. Why don't you use the JCR API 
to delete them?

Best regards, Julian