Posted to oak-dev@jackrabbit.apache.org by Jukka Zitting <ju...@gmail.com> on 2013/10/11 16:28:01 UTC

Migration without an embedded Jackrabbit

Hi,

I've been thinking about the upgrade/migration code (oak-upgrade,
OAK-458) over the past few days, and trying to figure out how we could
achieve that without having to keep the full Jackrabbit 2.x codebase
as a dependency. The same question comes up for supporting
Jackrabbit 2.x datastores (OAK-805).

The key problem here is that the Jackrabbit 2.x codebase is already so
convoluted that it's practically impossible to just pick up, say, an
individual persistence manager or data store implementation and
access it directly without keeping the rest of the 2.x codebase
around. This is troublesome for many reasons. For example, using such
components requires lots of extra setup code (essentially a full
RepositoryImpl instance), and the required extra dependencies add up
to about a dozen megabytes.

Thus I'm inclined to instead just implement the equivalent
functionality directly in Oak. This requires some code duplication
(we'd for example need the same persistence managers in both Oak and
Jackrabbit), but the versions in Oak could be a lot simpler and more
streamlined as only a subset of the functionality is needed. To reduce
the amount of duplication we could push some of the shared utility
code (like NodePropBundle, etc.) to jackrabbit-jcr-commons or to a new
jackrabbit-shared component.
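
As a rough illustration of the "only a subset is needed" point: a read-only, upgrade-oriented view of a persistence manager could be as small as the sketch below. All names here (`BundleRecord`, `ReadOnlyBundleSource`, `InMemoryBundleSource`) are hypothetical, and `BundleRecord` is a heavily simplified placeholder, not the actual structure of Jackrabbit's `NodePropBundle`.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical, heavily simplified stand-in for a node bundle: one node's
// id, parent id, child ids and single-valued string properties.
record BundleRecord(String id, String parentId, List<String> childIds,
                    Map<String, String> properties) {
    BundleRecord {
        childIds = List.copyOf(childIds);
        properties = Map.copyOf(properties);
    }
}

// Hypothetical read-only subset of a persistence manager: an upgrade only
// needs to look bundles up and iterate over them, never to write them.
interface ReadOnlyBundleSource {
    Optional<BundleRecord> load(String id);
    Iterator<BundleRecord> all();
}

// In-memory implementation, standing in for code that would parse the
// storage format of an existing repository.
final class InMemoryBundleSource implements ReadOnlyBundleSource {
    private final Map<String, BundleRecord> bundles = new LinkedHashMap<>();

    InMemoryBundleSource(List<BundleRecord> records) {
        for (BundleRecord r : records) {
            bundles.put(r.id(), r);
        }
    }

    @Override
    public Optional<BundleRecord> load(String id) {
        return Optional.ofNullable(bundles.get(id));
    }

    @Override
    public Iterator<BundleRecord> all() {
        // Unmodifiable view: callers cannot remove bundles through the iterator.
        return Collections.unmodifiableCollection(bundles.values()).iterator();
    }
}
```

A real implementation would read an existing persistence manager's storage instead of holding records in memory; the point is only that lookup and iteration, without any write path, may be enough for an upgrade.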

WDYT?

BR,

Jukka Zitting

Re: Migration without an embedded Jackrabbit

Posted by Michael Marth <mm...@adobe.com>.
Hi Jukka,

I think the situation is slightly different between OAK-458 and OAK-805 (although I come to roughly the same conclusion in both cases).

OAK-805: JR users who cannot upgrade their existing data store will be stuck with it "eternally" after a migration (I think it is fair to assume, and a reasonable deployment case, that it may never be feasible to move a large amount of binaries). In turn, this means that we would be eternally stuck with all the mentioned JR dependencies in Oak unless we re-implement the existing functionality. However, it also means that this re-implementation needs to be "production quality" for read/write (as opposed to the second case). So, I think a re-implementation makes sense.

OAK-458: In this case the functionality that needs to be re-implemented is certainly a subset, as it requires only read access to the repo (and does not need to cover a lot of the edge cases that full-fledged read-write support needs to cover). So, I am inclined to also opt for a re-implementation if this subset is sufficiently easy to implement.
If not, we could pull in the whole enchilada of dependencies as we know that we can drop them later: IMO it would be fair to drop (or make optional) the ability to upgrade an existing JR repo to Oak at a certain point in the future and thus remove the additionally needed deps.

Michael


Re: Migration without an embedded Jackrabbit

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Tue, Nov 26, 2013 at 4:55 AM, Tobias Bocanegra <tr...@apache.org> wrote:
> I want to start working on OAK-1222 (migrating group memberships). Is
> there already code and/or test cases that show how migration would
> work?

The relevant code and test cases are in oak-upgrade. See also the (so
far undocumented) "upgrade" mode in oak-run, which allows you to run
the migration on the command line.

> Are there already extension points for migrators?

The idea has been to use a commit hook for any required content
transformations. See the copy() method, where the RegistrationEditor is
already used that way to handle node types. I think the group
membership structure should be updated similarly.
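
A minimal sketch of applying content transformations while the repository content is copied, assuming a hypothetical in-memory tree. `TreeNode`, `ContentTransform` and `UpgradeCopier` are illustrative names, not Oak's actual `CommitHook`/`Editor` API, and any concrete transform (such as a property rename) would only be a placeholder for the real group membership change.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal content tree; not Oak's NodeState/NodeBuilder API.
final class TreeNode {
    final Map<String, String> properties = new LinkedHashMap<>();
    final Map<String, TreeNode> children = new LinkedHashMap<>();

    TreeNode child(String name) {
        return children.computeIfAbsent(name, n -> new TreeNode());
    }
}

// Hypothetical hook: called once per node of the copied tree, with its path.
interface ContentTransform {
    void apply(String path, TreeNode node);
}

final class UpgradeCopier {
    private final List<ContentTransform> transforms;

    UpgradeCopier(List<ContentTransform> transforms) {
        this.transforms = List.copyOf(transforms);
    }

    // Deep-copies the source tree; the source itself is never modified.
    TreeNode copy(TreeNode source) {
        TreeNode target = new TreeNode();
        copyInto(source, target, "/");
        return target;
    }

    private void copyInto(TreeNode source, TreeNode target, String path) {
        target.properties.putAll(source.properties);
        for (Map.Entry<String, TreeNode> e : source.children.entrySet()) {
            String childPath = path.equals("/") ? "/" + e.getKey() : path + "/" + e.getKey();
            copyInto(e.getValue(), target.child(e.getKey()), childPath);
        }
        // Each transform rewrites the copied node after its children are copied.
        for (ContentTransform t : transforms) {
            t.apply(path, target);
        }
    }
}
```

Because transforms run after a node's children have been copied, a transform can inspect the already-copied subtree below the node it is rewriting.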

BR,

Jukka Zitting

Re: Migration without an embedded Jackrabbit

Posted by Tobias Bocanegra <tr...@apache.org>.
Hi Jukka,

I want to start working on OAK-1222 (migrating group memberships). Is
there already code and/or test cases that show how migration would
work?
Are there already extension points for migrators?

regards, toby

On Wed, Oct 16, 2013 at 7:26 AM, Jukka Zitting <ju...@gmail.com> wrote:
> Hi,
>
> On Mon, Oct 14, 2013 at 1:20 PM, Tobias Bocanegra <tr...@adobe.com> wrote:
>> I second Felix' comments and prefer a standalone upgrade tool. This does not mean that an upgrade
>> is always a manual step. The embedding application (e.g. Sling) can still contain the tool and
>> auto-upgrade if desired.
>
> After some more work on this, I'm actually inclined to go with a
> hybrid solution somewhat along those lines.
>
> I'd implement custom code for simple upgrades (default Jackrabbit
> configuration, etc.) that can reasonably be done transparently without
> extra user action or any of the extra Jackrabbit dependencies,
> but fall back to a separate upgrade tool with standard Jackrabbit
> components for more complex cases and situations where the user
> wants full control over the upgrade anyway. This should be reasonably
> straightforward to implement without much code duplication. For now
> I've added a simple "upgrade" mode to oak-run as an initial take on
> such a separate upgrade tool.
>
> I even think I have a solution that allows me to avoid having to embed
> all the Jackrabbit dependencies in an OSGi environment. Basically I'd
> define a few simplified extension interfaces in oak-upgrade for things
> like Jackrabbit persistence managers and other required internals, and
> Jackrabbit bundles installed in the same OSGi container can offer the
> required implementations for those interfaces.
>
>> I even think that a migration could be done purely on the JCR level, for example using vlt rcp
>> (which does not support copying versions, but this could be improved).
>
> Right. Something like that would ultimately be quite nice, as it would
> give us an implementation-independent way to back up and restore entire
> repositories, a bit like the dump features in many SQL databases.
> Though getting to that point will probably require quite a bit of work
> (including API extensions required for version support, etc.), and it
> will probably never reach similar performance as a direct lower-level
> upgrade can.
>
> BR,
>
> Jukka Zitting

Re: Migration without an embedded Jackrabbit

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Mon, Oct 14, 2013 at 1:20 PM, Tobias Bocanegra <tr...@adobe.com> wrote:
> I second Felix' comments and prefer a standalone upgrade tool. This does not mean that an upgrade
> is always a manual step. The embedding application (e.g. Sling) can still contain the tool and
> auto-upgrade if desired.

After some more work on this, I'm actually inclined to go with a
hybrid solution somewhat along those lines.

I'd implement custom code for simple upgrades (default Jackrabbit
configuration, etc.) that can reasonably be done transparently without
extra user action or any of the extra Jackrabbit dependencies,
but fall back to a separate upgrade tool with standard Jackrabbit
components for more complex cases and situations where the user
wants full control over the upgrade anyway. This should be reasonably
straightforward to implement without much code duplication. For now
I've added a simple "upgrade" mode to oak-run as an initial take on
such a separate upgrade tool.
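
The hybrid approach boils down to a decision between the transparent in-process path and the separate tool. A sketch of such a rule follows; the criteria (a default configuration, a persistence manager class from a supported set, and an explicit user request for full control) and all type names are assumptions for illustration only.

```java
import java.util.Set;

// Hypothetical summary of an existing repository's configuration.
record RepositorySummary(String persistenceManagerClass,
                         boolean defaultConfiguration,
                         boolean userWantsFullControl) {}

enum UpgradePath { TRANSPARENT_IN_PROCESS, SEPARATE_TOOL }

// Hypothetical decision rule: handle simple, default setups transparently
// with custom code, and send everything else to the separate tool.
final class UpgradePlanner {
    private final Set<String> supportedPersistenceManagers;

    UpgradePlanner(Set<String> supportedPersistenceManagers) {
        this.supportedPersistenceManagers = Set.copyOf(supportedPersistenceManagers);
    }

    UpgradePath choose(RepositorySummary repo) {
        if (repo.userWantsFullControl()) {
            return UpgradePath.SEPARATE_TOOL;          // explicit user choice wins
        }
        if (repo.defaultConfiguration()
                && supportedPersistenceManagers.contains(repo.persistenceManagerClass())) {
            return UpgradePath.TRANSPARENT_IN_PROCESS; // simple case, no extra deps
        }
        return UpgradePath.SEPARATE_TOOL;              // complex or unknown setup
    }
}
```

Keeping the supported set explicit means the transparent path only ever handles setups that the custom code was written for; anything else falls through to the tool.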

I even think I have a solution that allows me to avoid having to embed
all the Jackrabbit dependencies in an OSGi environment. Basically I'd
define a few simplified extension interfaces in oak-upgrade for things
like Jackrabbit persistence managers and other required internals, and
Jackrabbit bundles installed in the same OSGi container can offer the
required implementations for those interfaces.
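
A sketch of the extension-interface idea, with the OSGi service registry replaced by a plain map so the example stays self-contained (in OSGi, oak-upgrade would more likely track such services, e.g. with a `ServiceTracker`). `LegacyNodeReader`, `LegacyNodeReaderProvider` and `ProviderRegistry` are hypothetical names; the real interfaces would depend on which persistence manager internals the upgrade needs.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical simplified extension interface that oak-upgrade could define.
// A Jackrabbit bundle would implement it by wrapping its own persistence manager.
interface LegacyNodeReader {
    Optional<Map<String, String>> readProperties(String nodeId);
}

// Hypothetical factory interface, one per supported persistence manager type.
interface LegacyNodeReaderProvider {
    String persistenceManagerClass();              // which PM type it handles
    LegacyNodeReader open(Map<String, String> config);
}

// Stand-in for the OSGi service registry: providers register themselves,
// and the upgrade code looks them up by persistence manager class name.
final class ProviderRegistry {
    private final Map<String, LegacyNodeReaderProvider> providers = new ConcurrentHashMap<>();

    void register(LegacyNodeReaderProvider provider) {
        providers.put(provider.persistenceManagerClass(), provider);
    }

    void unregister(LegacyNodeReaderProvider provider) {
        // Only removes the entry if it still maps to this exact provider.
        providers.remove(provider.persistenceManagerClass(), provider);
    }

    Optional<LegacyNodeReaderProvider> find(String persistenceManagerClass) {
        return Optional.ofNullable(providers.get(persistenceManagerClass));
    }
}
```

The dependency direction is the point here: oak-upgrade only owns the small interfaces, while the Jackrabbit-side bundles supply implementations, so oak-upgrade itself never has to embed the Jackrabbit jars.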

> I even think that a migration could be done purely on the JCR level, for example using vlt rcp
> (which does not support copying versions, but this could be improved).

Right. Something like that would ultimately be quite nice, as it would
give us an implementation-independent way to back up and restore entire
repositories, a bit like the dump features in many SQL databases.
Though getting to that point will probably require quite a bit of work
(including API extensions required for version support, etc.), and it
will probably never reach similar performance as a direct lower-level
upgrade can.

BR,

Jukka Zitting

Re: Migration without an embedded Jackrabbit

Posted by Tobias Bocanegra <tr...@adobe.com>.
Hi,

I second Felix' comments and prefer a standalone upgrade tool. This does not mean that an upgrade is always a manual step. The embedding application (e.g. Sling) can still contain the tool and auto-upgrade if desired.

I even think that a migration could be done purely on the JCR level, for example using vlt rcp (which does not support copying versions, but this could be improved).
This might be especially useful if the migration also includes more complex reconfiguration, e.g. setting up a highly clustered environment or special datastores.

Regards, Toby





On Mon, Oct 14, 2013 at 6:46 AM, Jukka Zitting <ju...@gmail.com> wrote:
Hi,

On Mon, Oct 14, 2013 at 9:09 AM, Felix Meschberger <fm...@adobe.com> wrote:
> Thanks for the clarifications. Yet they confirm my first impression more than they resolve the concerns.
>
> Particularly your timing and smoothness assumptions. Migrating JR2 to Oak is much more complex
> than migrating from one version of JR2 to the next. I would even assume it is way more complex
> than migrating from JR1 to JR2.

Yes, that's definitely true.

As for the timing assumptions, based on rough estimates I'm expecting
that a fairly large (i.e. millions of nodes) repository could be
upgraded in minutes with custom upgrade code vs. hours with standard
Jackrabbit components. Definitely not as smooth or fast as past
Jackrabbit upgrades, but still reasonable and IMO worth the overhead.

> So, let's agree to disagree ;-)

Thanks for sharing your concerns! Unless others also weigh in on the
side of a separate upgrade tool, I think for now I'd still go ahead
with the proposed custom/duplicate code. But I'll keep an eye on the
amount of extra complexity and, as Michael noted, fall back to the
separate tool if the amount of duplication/overhead starts becoming
unmanageable. The persistence managers (PMs) in Jackrabbit are now at
15kLOC, so my warning bells would start ringing at around 2kLOC of
extra code for PM upgrades.

BR,

Jukka Zitting


Re: Migration without an embedded Jackrabbit

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Mon, Oct 14, 2013 at 9:09 AM, Felix Meschberger <fm...@adobe.com> wrote:
> Thanks for the clarifications. Yet they confirm my first impression more than they resolve the concerns.
>
> Particularly your timing and smoothness assumptions. Migrating JR2 to Oak is much more complex
> than migrating from one version of JR2 to the next. I would even assume it is way more complex
> than migrating from JR1 to JR2.

Yes, that's definitely true.

As for the timing assumptions, based on rough estimates I'm expecting
that a fairly large (i.e. millions of nodes) repository could be
upgraded in minutes with custom upgrade code vs. hours with standard
Jackrabbit components. Definitely not as smooth or fast as past
Jackrabbit upgrades, but still reasonable and IMO worth the overhead.

> So, let's agree to disagree ;-)

Thanks for sharing your concerns! Unless others also weigh in on the
side of a separate upgrade tool, I think for now I'd still go ahead
with the proposed custom/duplicate code. But I'll keep an eye on the
amount of extra complexity and, as Michael noted, fall back to the
separate tool if the amount of duplication/overhead starts becoming
unmanageable. The persistence managers (PMs) in Jackrabbit are now at
15kLOC, so my warning bells would start ringing at around 2kLOC of
extra code for PM upgrades.

BR,

Jukka Zitting

Re: Migration without an embedded Jackrabbit

Posted by Felix Meschberger <fm...@adobe.com>.
Hi

Thanks for the clarifications. Yet they confirm my first impression more than they resolve the concerns.

Particularly your timing and smoothness assumptions. Migrating JR2 to Oak is much more complex than migrating from one version of JR2 to the next. I would even assume it is way more complex than migrating from JR1 to JR2.

So, let's agree to disagree ;-)

Regards
Felix

On 14.10.2013, at 15:02, Jukka Zitting wrote:

> Hi,
> 
> On Mon, Oct 14, 2013 at 4:38 AM, Felix Meschberger <fm...@adobe.com> wrote:
>> IMNSHO migration of Jackrabbit 2 based repositories to Oak is a one-shot problem:
>> you apply this once to a repository and are done. So why load the application with
>> a host of unneeded pieces?
> 
> I'd like to make the upgrade as smooth and transparent as possible. In
> all Jackrabbit versions so far the upgrade has required just starting
> up a new version of the repository, and any required migration steps
> have been handled transparently under the hood.
> 
>> Rather, I suggest coming up with a standalone application, which can be a conglomerate
>> of original Jackrabbit and Oak libraries and which does the migration in one step.
> 
> That's a valid alternative, especially since the Oak upgrade is by far
> the most complex migration so far. And I agree with your concerns
> about code duplication. In this case though I believe the benefits
> outweigh the drawbacks, see below.
> 
>> This application can be optimized and fine-tuned to just this single use-case: migration.
> 
> Unfortunately it can't. While the Oak internals are designed with
> these kinds of bulk operations in mind, the Jackrabbit internals are
> not. There are some pretty major optimizations (like streaming bundles
> from a persistence manager, vs. loading them one-by-one) that we could
> do with custom upgrade-oriented code and that wouldn't be available
> with the standard Jackrabbit components.
> 
>> This way, both Jackrabbit 2 and Oak applications stay clean of such migration junk.
> 
> Note that with the approach I'm proposing, all the custom migration
> code would go to the oak-upgrade component that's independent of the
> rest of the stack. Once the upgrade is done, a deployment can safely
> drop that component.
> 
>> This also makes it clear that migrating storage from Jackrabbit 2 to Oak is not
>> something that can and will be done just by snapping your fingers, but is
>> a potentially long-running and complex operation.
> 
> My goal here is to make the upgrade *not* be a long-running and
> complex operation.
> 
> BR,
> 
> Jukka Zitting


Re: Migration without an embedded Jackrabbit

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Mon, Oct 14, 2013 at 4:38 AM, Felix Meschberger <fm...@adobe.com> wrote:
> IMNSHO migration of Jackrabbit 2 based repositories to Oak is a one-shot problem:
> you apply this once to a repository and are done. So why load the application with
> a host of unneeded pieces?

I'd like to make the upgrade as smooth and transparent as possible. In
all Jackrabbit versions so far the upgrade has required just starting
up a new version of the repository, and any required migration steps
have been handled transparently under the hood.

> Rather, I suggest coming up with a standalone application, which can be a conglomerate
> of original Jackrabbit and Oak libraries and which does the migration in one step.

That's a valid alternative, especially since the Oak upgrade is by far
the most complex migration so far. And I agree with your concerns
about code duplication. In this case though I believe the benefits
outweigh the drawbacks, see below.

> This application can be optimized and fine-tuned to just this single use-case: migration.

Unfortunately it can't. While the Oak internals are designed with
these kinds of bulk operations in mind, the Jackrabbit internals are
not. There are some pretty major optimizations (like streaming bundles
from a persistence manager, vs. loading them one-by-one) that we could
do with custom upgrade-oriented code and that wouldn't be available
with the standard Jackrabbit components.
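
To illustrate why streaming matters for bulk copying, the sketch below counts backend calls for the two access patterns against a hypothetical in-memory store. It only models round trips (one per node versus one sequential scan) and says nothing about actual Jackrabbit persistence manager performance; `CountingLegacyStore` and `BundleCopy` are illustrative names.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical legacy store that counts backend calls, to contrast per-node
// loads with a single sequential scan. Bundles are plain strings here.
final class CountingLegacyStore {
    private final Map<String, String> bundles = new LinkedHashMap<>();
    int backendCalls = 0;

    void put(String id, String serializedBundle) {
        bundles.put(id, serializedBundle);
    }

    // One backend call per node: loading bundles one by one.
    String load(String id) {
        backendCalls++;
        return bundles.get(id);
    }

    // One backend call for the whole store: a streaming scan, e.g. a single
    // sequential read over a table or file.
    Iterator<Map.Entry<String, String>> scan() {
        backendCalls++;
        return bundles.entrySet().iterator();
    }
}

final class BundleCopy {
    // Assumes the node ids are already known; listing them is not counted.
    static void copyOneByOne(CountingLegacyStore store, List<String> ids,
                             Consumer<String> sink) {
        for (String id : ids) {
            sink.accept(store.load(id));
        }
    }

    static void copyStreaming(CountingLegacyStore store, Consumer<String> sink) {
        Iterator<Map.Entry<String, String>> it = store.scan();
        while (it.hasNext()) {
            sink.accept(it.next().getValue());
        }
    }
}
```

For 1,000 bundles the one-by-one copy makes 1,000 backend calls while the streaming copy makes one, with identical output; with a real database or file-based store each of those calls would carry its own latency.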

> This way, both Jackrabbit 2 and Oak applications stay clean of such migration junk.

Note that with the approach I'm proposing, all the custom migration
code would go to the oak-upgrade component that's independent of the
rest of the stack. Once the upgrade is done, a deployment can safely
drop that component.

> This also makes it clear that migrating storage from Jackrabbit 2 to Oak is not
> something that can and will be done just by snapping your fingers, but is
> a potentially long-running and complex operation.

My goal here is to make the upgrade *not* be a long-running and
complex operation.

BR,

Jukka Zitting

Re: Migration without an embedded Jackrabbit

Posted by Felix Meschberger <fm...@adobe.com>.
Hi

I see the problem and I agree that this in fact *is* a problem.

But I still don't agree with an integrated, transparent solution to this upgrade problem, and I never will: such application bloat and even code duplication, along with the testing, maintenance, etc. requirements, just sound scary.

Also: Code duplication is one of the big evils in application development.

IMNSHO migration of Jackrabbit 2 based repositories to Oak is a one-shot problem: you apply this once to a repository and are done. So why load the application with a host of unneeded pieces?

Rather, I suggest coming up with a standalone application, which can be a conglomerate of original Jackrabbit and Oak libraries and which does the migration in one step. This application can be optimized and fine-tuned to just this single use-case: migration.

This way, both Jackrabbit 2 and Oak applications stay clean of such migration junk.

This also makes it clear that migrating storage from Jackrabbit 2 to Oak is not something that can and will be done just by snapping your fingers, but is a potentially long-running and complex operation.

Regards
Felix

Re: Migration without an embedded Jackrabbit

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Mon, Oct 14, 2013 at 4:27 AM, Bertrand Delacretaz
<bd...@apache.org> wrote:
> Would this be a problem if there was a way for this additional
> dependency to go away after startup, once the migration is done?

Yes, all the required code would go to the oak-upgrade bundle, which can be
dropped after the migration.

BR,

Jukka Zitting

Re: Migration without an embedded Jackrabbit

Posted by Bertrand Delacretaz <bd...@apache.org>.
Hi,

On Fri, Oct 11, 2013 at 4:28 PM, Jukka Zitting <ju...@gmail.com> wrote:
> ...I've been thinking about the upgrade/migration code (oak-upgrade,
> OAK-458) over the past few days, and trying to figure out how we could
> achieve that without having to keep the full Jackrabbit 2.x codebase
> as a dependency...

Would this be a problem if there was a way for this additional
dependency to go away after startup, once the migration is done?

A scenario like:

- Load an optional migration jar from a well-known location if Oak is
  started on an existing Jackrabbit repository

- Drop these extra classes once the migration is done (worst case
  requiring a restart)

would be acceptable IMO. If that makes the implementation easier, it
might be something to consider.
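
The scenario above can be sketched with a dedicated class loader and `java.util.ServiceLoader`. Assumptions: the optional jar registers an implementation of a hypothetical `RepositoryMigrator` interface under `META-INF/services`, and the presence of `repository.xml` in the repository home is used as a simplistic signal of an existing Jackrabbit repository. Neither the file names nor the interface come from oak-upgrade.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.ServiceLoader;

// Hypothetical service interface that the optional migration jar would
// implement (registered via META-INF/services in that jar).
interface RepositoryMigrator {
    void migrate(Path legacyRepositoryHome) throws IOException;
}

final class OptionalMigration {
    // Returns true if a migration was run. File names are hypothetical.
    static boolean runIfNeeded(Path repositoryHome, Path migrationJar) throws IOException {
        // Assumed heuristic for "started on an existing Jackrabbit repository".
        boolean legacyRepository = Files.exists(repositoryHome.resolve("repository.xml"));
        if (!legacyRepository || !Files.isRegularFile(migrationJar)) {
            return false;  // nothing to migrate, or the optional jar is not installed
        }
        URL[] urls = { migrationJar.toUri().toURL() };
        // A dedicated class loader, closed when done, so the migration classes
        // can be unloaded once nothing references them any more.
        try (URLClassLoader loader = new URLClassLoader(urls,
                OptionalMigration.class.getClassLoader())) {
            Optional<RepositoryMigrator> migrator =
                    ServiceLoader.load(RepositoryMigrator.class, loader).findFirst();
            if (migrator.isPresent()) {
                migrator.get().migrate(repositoryHome);
                return true;
            }
        }
        return false;  // jar present but provides no migrator
    }
}
```

Closing the class loader releases the jar file, but the classes themselves only go away once nothing references them, which is where the "worst case requiring a restart" caveat comes in.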

-Bertrand