Posted to users@jackrabbit.apache.org by Michael Yin <MY...@theladders.com> on 2010/02/08 21:26:29 UTC

Revision cleanup

Our Jackrabbit 1.4.x DB has 80,000 revisions. We don't care about
version history, but we do want to be able to add new 'cluster nodes'
at any point without waiting for Jackrabbit to process 80,000
revisions. Is there any way to reset the revision counter to speed that
up? Currently we tend to copy the local repo folder around, but that is
just asking for corruption.

 

I was thinking about exporting to XML then reimporting into a clean
repo, but there must be a better way than that. 

 

-mike

Re: Revision cleanup

Posted by Guo Du <mr...@gmail.com>.

On Mon, Feb 8, 2010 at 9:27 PM, Michael Yin <MY...@theladders.com> wrote:
> So, just to get specific,  I could delete the contents of the JOURNAL
> table and change REVISION_ID in GLOBAL_REVISION to 0?
Sorry, I wasn't aware it's already in a cluster. I have no idea how
that will affect replication. If a backup/restore doesn't take too
long, you could give it a try, or test it with a few records in a
fresh system.

-Guo

RE: Revision cleanup

Posted by Michael Yin <MY...@theladders.com>.

Ah, I see. Thanks for all the info. 

So there is no way to clean up cluster revisions? I currently don't have the time to create the backup scripts with consistency checking (but would like to do this in the future).

Will exporting and importing achieve what I want to do? Is everything exported properly into XML?

-mike

> -----Original Message-----
> From: Alexander Klimetschek [mailto:aklimets@day.com]
> Sent: Tuesday, February 09, 2010 5:45 AM
> To: users@jackrabbit.apache.org
> Subject: Re: Revision cleanup
> 
> Michael,
> 
> On Tue, Feb 9, 2010 at 11:38, Alexander Klimetschek <ak...@day.com> wrote:
> > What Guo meant is to delete the table that holds the (virtual) version
> > workspace, defined under <Versioning> in the repository.xml.  The name
> > of that db table depends on the config (mainly schemaObjectPrefix).
> > AFAIK that approach should work, however make sure you make backups
> > before trying that!
> 
> Reading your original post again, it sounds like you only mean cluster
> revisions. To repeat, those revisions are not related to JCR versions,
> i.e. JCR versions are stored just like any other nodes, only in a
> different persistence manager (a "virtual" workspace). The cluster
> journal works a layer deeper.
> 
> So Ian's description should hit the nail on the head for you.
> 
> Regards,
> Alex
> 
> --
> Alexander Klimetschek
> alexander.klimetschek@day.com

Re: Revision cleanup

Posted by Alexander Klimetschek <ak...@day.com>.

Michael,

On Tue, Feb 9, 2010 at 11:38, Alexander Klimetschek <ak...@day.com> wrote:
> What Guo meant is to delete the table that holds the (virtual) version
> workspace, defined under <Versioning> in the repository.xml.  The name
> of that db table depends on the config (mainly schemaObjectPrefix).
> AFAIK that approach should work, however make sure you make backups
> before trying that!

Reading your original post again, it sounds like you only mean cluster
revisions. To repeat, those revisions are not related to JCR versions,
i.e. JCR versions are stored just like any other nodes, only in a
different persistence manager (a "virtual" workspace). The cluster
journal works a layer deeper.

So Ian's description should hit the nail on the head for you.

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com

Re: Revision cleanup

Posted by Alexander Klimetschek <ak...@day.com>.

On Mon, Feb 8, 2010 at 22:27, Michael Yin <MY...@theladders.com> wrote:
> So, just to get specific,  I could delete the contents of the JOURNAL
> table and change REVISION_ID in GLOBAL_REVISION to 0?

Note that these revision IDs are revision numbers for the
journal/clustering and have nothing to do with JCR versions stored in
the repository. Each persistent save() on one of the cluster nodes
creates a new revision in the journal. You should not change the
journal table.

What Guo meant is to delete the table that holds the (virtual) version
workspace, defined under <Versioning> in the repository.xml.  The name
of that db table depends on the config (mainly schemaObjectPrefix).
AFAIK that approach should work, but make sure you make backups
before trying it!

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com

RE: Revision cleanup

Posted by Michael Yin <MY...@theladders.com>.

So, just to get specific,  I could delete the contents of the JOURNAL
table and change REVISION_ID in GLOBAL_REVISION to 0?

While making sure no one was connected to the DB.

-mike

> -----Original Message-----
> From: Guo Du [mailto:mrduguo@gmail.com]
> Sent: Monday, February 08, 2010 4:11 PM
> To: users@jackrabbit.apache.org
> Subject: Re: Revision cleanup
> 
> On Mon, Feb 8, 2010 at 8:26 PM, Michael Yin <MY...@theladders.com> wrote:
> > I was thinking about exporting to XML then reimporting into a clean
> > repo, but there must be a better way than that.
> 
> The version history is stored in a separate table, so I think you can
> simply run SQL to delete all records in the version table after you
> have stopped the repository instance.
> 
> Then delete the index before starting the repository.
> 
> I have never tried this, so take a safe backup before doing anything :)
> 
> -Guo

Re: Revision cleanup

Posted by Guo Du <mr...@gmail.com>.

On Mon, Feb 8, 2010 at 8:26 PM, Michael Yin <MY...@theladders.com> wrote:
> I was thinking about exporting to XML then reimporting into a clean
> repo, but there must be a better way than that.

The version history is stored in a separate table, so I think you can
simply run SQL to delete all records in the version table after you
have stopped the repository instance.

Then delete the index before starting the repository.

I have never tried this, so take a safe backup before doing anything :)

-Guo

Re: Revision cleanup

Posted by Ian Boston <ie...@tfd.co.uk>.

On 8 Feb 2010, at 20:26, Michael Yin wrote:

> Our jackrabbit 1.4.x db has 80000 revisions. If we don't care about
> version history, but also want to add new 'cluster nodes' at any point
> but don't want to sit waiting for jackrabbit to process 80,000
> revisions, is there any way to reset the revision counter to speed that
> up? Currently we tend to copy around local repo folder, but that is just
> asking for corruption.
> 

We have been running in production for about 18 months in an 8-node cluster with JR 1.4. Our app servers are hosted on Xen VMs, and we drop and recreate them to adjust for load. Here is what we do.

1. We rsync the local repo to a shared server, repeating the rsync until a complete pass finds no modified files on disk.
1a. Once we have a stable copy, we tar it up and send it to a central backup server as a "snapshot" of the local node.
2. To determine whether the snapshot is stable, we read the local revision file from the local repo and compare it to the state in the central DB. If they are the same, we know nothing was pending in the local state, so if there are no rsync changes the snapshot is stable and in sync with the DB.
3. We store the local revision numbers of all the snapshots in one place, and periodically clean the revision history in the DB up to the lowest revision number.
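
Steps 2 and 3 could be sketched roughly as follows. This is not the Perl we actually run, just a minimal Python illustration under stated assumptions: the function names are made up, the journal table is called JOURNAL with a REVISION_ID column (as mentioned elsewhere in this thread; the real name depends on schemaObjectPrefix), and an in-memory SQLite table stands in for the central DB. It deletes with `<` rather than `<=` so the row at the lowest snapshot revision is kept, which is the conservative choice.

```python
import sqlite3


def snapshot_is_stable(local_revision, global_revision, rsync_changed_files):
    """Step 2: stable only if the node has nothing pending (local revision
    equals the DB's revision) and the last rsync pass copied no files."""
    return local_revision == global_revision and rsync_changed_files == 0


def purge_journal(conn, snapshot_revisions, table="JOURNAL"):
    """Step 3: delete journal rows older than the lowest snapshot revision.

    `table` is an assumption; substitute the prefixed name from your config.
    Returns the number of rows deleted.
    """
    if not snapshot_revisions:
        return 0  # no snapshots recorded, so nothing is safe to purge
    floor = min(snapshot_revisions)
    cur = conn.execute(f"DELETE FROM {table} WHERE REVISION_ID < ?", (floor,))
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    # Stand-in for the central DB: ten journal rows, revisions 1..10.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE JOURNAL (REVISION_ID INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO JOURNAL VALUES (?)", [(i,) for i in range(1, 11)])
    print(snapshot_is_stable(10, 10, 0))
    print(purge_journal(conn, [7, 9, 10]))
```

Purging only below the lowest recorded snapshot revision means every stored snapshot can still be replayed forward from the journal.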

On creation of a VM to join the cluster:
1. We find the latest snapshot from any node.
2. Unpack the snapshot.
3. Modify local settings (server ID etc).
4. Bring the node up, at which point it catches up with the rest of the cluster, usually with a delay of < 1 min.
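
Picking the snapshot for a new node might look something like the sketch below. Again this is illustrative Python, not our scripts: `Snapshot`, `pick_snapshot`, `revision_lag` and the `purged_below` parameter are hypothetical names. The one rule it encodes is that a snapshot is only usable if the journal has not been cleaned past its revision, otherwise the node could not catch up.

```python
from typing import NamedTuple, Optional, Sequence


class Snapshot(NamedTuple):
    node_id: str   # hypothetical label for the node the snapshot came from
    revision: int  # local revision recorded when the snapshot was taken


def pick_snapshot(snapshots: Sequence[Snapshot], purged_below: int) -> Optional[Snapshot]:
    """Return the newest snapshot the journal can still bring up to date.

    `purged_below` is the revision below which journal rows were deleted;
    a snapshot at or above it still has every later row available.
    """
    usable = [s for s in snapshots if s.revision >= purged_below]
    return max(usable, key=lambda s: s.revision, default=None)


def revision_lag(snapshot: Snapshot, global_revision: int) -> int:
    """Gap between the snapshot and the cluster's current revision."""
    return max(0, global_revision - snapshot.revision)


if __name__ == "__main__":
    snaps = [Snapshot("a", 70000), Snapshot("b", 79990), Snapshot("c", 60000)]
    best = pick_snapshot(snaps, purged_below=65000)
    print(best, revision_lag(best, 80000))
```

The smaller the lag, the less the node has to replay when it is brought up, which is why the latest snapshot is preferred.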

This was all implemented as Perl scripts and, as I say, has been good for about 18 months. The nice part is that at any one time we have about 8 good snapshots, so if for any reason one is bad, there are 7 more to try.

The critical part is to get the snapshot stable before taking it. Unfortunately there is no way of pausing JR to allow this to happen, although we could have put something into the ClusterNode implementation to trigger a snapshot. I suspect that under really heavy load this would not work.

Ian

> 
> 
> I was thinking about exporting to XML then reimporting into a clean
> repo, but there must be a better way than that. 
> 
> 
> 
> -mike
>