Posted to users@jackrabbit.apache.org by Sven Schliesing <sc...@subshell.com> on 2008/06/27 15:00:27 UTC

Handling many versions, disposing/swapping out old ones

Hello group,

As our repository grows, our version storage is growing quickly, too.

Some nodes have a rather large history (30000 versions and more). This
slows down the system when creating a new version.

We are using Jackrabbit 1.3 and already applied some patches to improve 
the version-store performance: JCR-975, JCR-1209 and JCR-1421.

Now I'd like to know about your experiences with a large version store.
Is performance much better with Jackrabbit 1.4 or even 1.5? Is there a
tested approach to disposing of old versions or, even better, to
swapping them out to another store, just for the case where you'd want
to restore a version that is rather old?


Many thanks in advance for your efforts!


Sven

Re: Handling many versions, disposing/swapping out old ones

Posted by Alexander Klimetschek <ak...@day.com>.
Cross-posting to dev, because of the versioning storage discussion.

On Mon, Jun 30, 2008 at 8:26 AM, Sven Schliesing
<sc...@subshell.com> wrote:
> Hi Alex, hi list,
>
>> Wow, what use case do you have? 30000 sounds like the versions are
>> created by an automated system. Is it equally likely that any one of
>> those thousands of old revisions is restored? Or displayed in some
>> kind of version history?
>
> The versions are indeed created by humans. :) Each time the users publish
> new content, the system creates one version. This is needed for legal
> reasons. One use case is to get the version from a certain date.
>
> The customer creating this vast number of versions is an editorial team
> working 24/7. The system has been running for nearly 1.5 years now, so 50
> versions a day is quite normal.
>
> As already mentioned we can't/are not allowed to remove old versions.

Ok, I see, you really have a hard use case there ;-)

> So there would be only the option of swapping out old versions.
> Maybe we could create an additional workspace we use for archiving these
> versions?

The versioning storage is like a separate workspace. The problem is
that once a node has more than 10000 child nodes, Jackrabbit gets
slower, since it is not optimized for that use case. This is true for
nearly all persistence managers, especially the bundle PMs. Since
the version storage uses a flat hierarchy, too, you ran into the
problem.

I think you could remove old versions and store them in a different
workspace with a non-flat hierarchy, but you would have to work around
the versioning API to get access to the old versions in that case. Or
even make that visible in the UI, such as calling these versions
"archived" or similar.
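
As a rough illustration of what a non-flat archive layout could look
like, the sketch below derives a nested path from a version's creation
date, so that no single node collects tens of thousands of children.
It is only a sketch under assumptions: the class name, the "/archive"
root, the node identifier and the year/month/day scheme are all
hypothetical, and copying the version content into the archive
workspace through the JCR API is not shown.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Locale;

public class ArchivePaths {

    /**
     * Builds a date-based path for an archived version, for example
     * /archive/<nodeId>/2008/06/27/<versionName>.
     * All path segments are illustrative assumptions, not a Jackrabbit convention.
     */
    public static String archivePath(String archiveRoot, String nodeId,
                                     Instant created, String versionName) {
        // Use UTC so that the same version always maps to the same day node.
        LocalDate day = created.atZone(ZoneOffset.UTC).toLocalDate();
        return String.format(Locale.ROOT, "%s/%s/%04d/%02d/%02d/%s",
                archiveRoot, nodeId,
                day.getYear(), day.getMonthValue(), day.getDayOfMonth(),
                versionName);
    }

    public static void main(String[] args) {
        String path = archivePath("/archive", "article-42",
                Instant.parse("2008-06-27T15:00:27Z"), "1.30000");
        System.out.println(path); // prints /archive/article-42/2008/06/27/1.30000
    }
}
```

With roughly 50 versions a day, each day node would hold about 50
children, and looking up "the version from a certain date" becomes a
matter of walking to the matching year/month/day node.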

A solution in Jackrabbit could be to change the versioning storage
to use nested nodes instead of a flat hierarchy, although I cannot say
whether this is feasible (e.g. branches are already nested, I think). What
do other devs think? Is this a valid issue to think about?

Regards,
Alex


-- 
Alexander Klimetschek
alexander.klimetschek@day.com

Re: Handling many versions, disposing/swapping out old ones

Posted by Sven Schliesing <sc...@subshell.com>.
Hi Alex, hi list,

> Wow, what use case do you have? 30000 sounds like the versions are
> created by an automated system. Is it equally likely that any one of
> those thousands of old revisions is restored? Or displayed in some
> kind of version history?

The versions are indeed created by humans. :) Each time the users
publish new content, the system creates one version. This is needed for
legal reasons. One use case is to get the version from a certain date.

The customer creating this vast number of versions is an editorial team
working 24/7. The system has been running for nearly 1.5 years now, so
50 versions a day is quite normal.

As already mentioned, we are not allowed to remove old versions, so
the only option would be to swap them out.
Maybe we could create an additional workspace for archiving these
versions?


Many thanks in advance!


Sven

Re: Handling many versions, disposing/swapping out old ones

Posted by Alexander Klimetschek <ak...@day.com>.
Hi Sven!

On Fri, Jun 27, 2008 at 9:00 AM, Sven Schliesing
<sc...@subshell.com> wrote:
> Hello group,
>
> As our repository grows, our version storage is growing quickly, too.
>
> Some nodes have a rather large history (30000 versions and more). This
> slows down the system when creating a new version.

Wow, what use case do you have? 30000 sounds like the versions are
created by an automated system. Is it equally likely that any one of
those thousands of old revisions is restored? Or displayed in some
kind of version history?

JCR is generally targeted at user-generated content (think CMS), where
versions map more or less directly to user actions.

> We are using Jackrabbit 1.3 and already applied some patches to improve the
> version-store performance: JCR-975, JCR-1209 and JCR-1421.
>
> Now I'd like to know about your experiences with a large version store.

Not sure if anyone has ever done such a large store...

> Is performance much better with jackrabbit 1.4 or even 1.5?

Nothing so far in trunk, and AFAIK there haven't been any changes in 1.4
related to versioning performance (but I'm not sure; anyone else, maybe?).

> Is there a tested
> approach to disposing of old versions or, even better, to swapping them
> out to another store, just for the case where you'd want to restore a
> version that is rather old?

IMHO you should either reduce the number of versions that are stored
or remove unneeded old versions from time to time.
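
For the "remove from time to time" option, a minimal sketch of the
selection step might look like the following. It only decides which
version names fall outside a keep-the-newest-N window; the class name,
the method name and the retention rule are hypothetical. In a real
repository each selected name would then be passed to
VersionHistory.removeVersion(String) from the JCR API, which is not
shown here and which can fail for versions that are still referenced.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class VersionPruning {

    /**
     * Returns the version names to remove so that only the newest
     * keepLast versions remain. The input is assumed to be ordered
     * oldest first; that ordering is an assumption of this sketch.
     */
    public static List<String> versionsToRemove(List<String> namesOldestFirst,
                                                int keepLast) {
        if (keepLast < 0) {
            throw new IllegalArgumentException("keepLast must not be negative");
        }
        // Everything before the cut index is older than the retention window.
        int cut = Math.max(0, namesOldestFirst.size() - keepLast);
        return new ArrayList<>(namesOldestFirst.subList(0, cut));
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("1.0", "1.1", "1.2", "1.3", "1.4");
        System.out.println(versionsToRemove(names, 2)); // prints [1.0, 1.1, 1.2]
    }
}
```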

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com