Posted to oak-dev@jackrabbit.apache.org by Robert Munteanu <ro...@apache.org> on 2016/03/28 16:29:20 UTC

Extracting subpaths from a DocumentStore repo

Hi,

In the context of the Multiplexing DocumentStore work for Oak [1] I'm
going to work on a tool to extract a few subpaths from a DS repository,
which can then be plugged into a different repository.

The objective is to generate a 'private mount' which can be used
together with a different 'global repository'. For instance:

- create a repository (R1) , populate /foo and /bar with some content
- extract data for /foo and /bar from R1
- pre-populate a DS 'storage area' ( MongoDB collection or RDB table )
with the data extracted above
- configure a new repository (R2) to mount /foo and /bar with the data
from above

The main inconvenience is that commits which affect /foo and /bar
often have the commit root at '/', so the collections extracted
using something like oak-run.js' printMongoExportCommand will not work.

I have two possible ways of doing this, so before experimenting I'd
like to discuss with you whether these are valid ways of approaching
the problem or if there's something better:

1) Manually create a new commit for each sub-path ( e.g. 1 for /foo and
1 for /bar ) and re-write the commit references for each node document
so that they point to the new commits

2) For each sub-path, copy the nodes into a temporary staging area (
e.g. /foo -> /staging/foo, or even /:staging/foo ), export the data,
and then manually alter the references.

Approach 1) is probably going to get me in trouble with the
DocumentNodeStore caches, so the Oak instance might not be usable after
I perform these changes ( which can be fine, since I'm going to spin it
up just for that ).

Approach 2) might get me branch commits, which are always rooted at
'/', which invalidates the approach. Also, path find/replace sounds
error-prone.

Any ideas how to best approach this?

Thanks,

Robert

[1]: https://issues.apache.org/jira/browse/OAK-3401

Re: Extracting subpaths from a DocumentStore repo

Posted by Robert Munteanu <ro...@apache.org>.
On Mon, 2016-03-28 at 20:18 +0530, Vikas Saurabh wrote:
> > 1) Manually create a new commit for each sub-path ( e.g. 1 for /foo
> and
> > 1 for /bar ) and re-write the commit references for each node
> document
> > so that they point to the new commits
> 
> So, I'd like approach 1 except that we don't create new commits..
> just
> sew the current state to break commit root dependency

Makes sense, thank you.

Robert

Re: Extracting subpaths from a DocumentStore repo

Posted by Vikas Saurabh <vi...@gmail.com>.
Hi Robert,

> The main inconvenience is that commits which affect /foo and
> /bar often have the commit root at '/',

Commit roots are required only at multi-document commit time (to track
the 'lock' document of the two-phase commit logic). So the node state
for the following two repository states is equivalent (for a committed
revision R1):
1. "/node1"->_commitRoot.R1="0" AND "/"->_revisions.R1="c"
2. "/node1"->_revisions.R1="c"

Since you'd be working at the persistence level, you can always map
_commitRoot.RX=N to _revisions.RX="c", where RX is validated to be
already committed.

Although this rewrite doesn't change the node state, it does change
the document state. That should be OK for your case: the document
cache is an in-memory cache, while the node state cache is backed by
the persistent cache.
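
To make the mapping above concrete, here is a minimal sketch in Java. It
does not use the Oak API: a document is modelled as a plain map of field
name to (revision -> value), and the class and method names are made up
for illustration. The field names follow Oak's NodeDocument
(_commitRoot, _revisions). Deciding which revisions are committed is
assumed to have happened beforehand, and branch commits are not handled.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper, not part of Oak: operates on a simplified
// in-memory view of a document (field name -> revision -> value).
public class CommitRootRewriter {

    /**
     * For every _commitRoot entry whose revision is in committedRevisions,
     * adds a local _revisions entry with value "c" and drops the
     * _commitRoot entry. Other entries are left untouched.
     *
     * @return the number of entries that were rewritten
     */
    public static int inlineCommitRoots(Map<String, Map<String, String>> doc,
                                        Set<String> committedRevisions) {
        Map<String, String> commitRoot = doc.get("_commitRoot");
        if (commitRoot == null) {
            return 0;
        }
        int rewritten = 0;
        Iterator<Map.Entry<String, String>> it = commitRoot.entrySet().iterator();
        while (it.hasNext()) {
            String revision = it.next().getKey();
            if (committedRevisions.contains(revision)) {
                // The document now carries the commit marker itself.
                doc.computeIfAbsent("_revisions", k -> new TreeMap<>())
                   .put(revision, "c");
                it.remove();
                rewritten++;
            }
        }
        if (commitRoot.isEmpty()) {
            doc.remove("_commitRoot");
        }
        return rewritten;
    }
}
```

A real tool would apply the same change through the DocumentStore or
directly against the backing store, after checking each revision against
the commit root document it currently points to.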

> so the collections extracted
> using something like oak-run.js' printMongoExportCommand will not work.

'printMongoExportCommand' is currently really dumb, in the sense that
it doesn't care about revisions etc. For a given path, it simply gets
its path elements and prints a mongo export command for each of those
paths (i.e. the input path and its ancestors) as well as their split
doc counterparts. Since the output of such exports is usually used by
us for debugging purposes (and hence we'd know whether the multiplexing
doc store is in play or not), we might simply document the expected
behavior and not care much about the exactness of the export. On the
other hand, we might want to add some intelligence about
multiplexing, but I think that should be lower priority.
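
As an illustration of the "input path and its ancestors" part, here is a
small Java sketch that lists the document ids for a path. It is not the
oak-mongo.js code; the class and method names are hypothetical. It
assumes the usual "<depth>:<path>" id format for node documents and an
absolute path without a trailing slash. Long paths (which are stored
under a hashed id) and split documents are not covered.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Oak or oak-mongo.js.
public class AncestorIds {

    /** Returns "/" plus every ancestor of the path and the path itself, root first. */
    public static List<String> pathAndAncestors(String path) {
        List<String> result = new ArrayList<>();
        result.add("/");
        if (path.equals("/")) {
            return result;
        }
        int idx = 0;
        // Each further '/' marks the end of one ancestor path.
        while ((idx = path.indexOf('/', idx + 1)) != -1) {
            result.add(path.substring(0, idx));
        }
        result.add(path);
        return result;
    }

    /** Builds an id of the form "<depth>:<path>", with depth 0 for the root. */
    public static String idFor(String path) {
        int depth = 0;
        if (!path.equals("/")) {
            for (int i = 0; i < path.length(); i++) {
                if (path.charAt(i) == '/') {
                    depth++;
                }
            }
        }
        return depth + ":" + path;
    }

    /** Ids of the documents for the path and all of its ancestors. */
    public static List<String> idsFor(String path) {
        List<String> ids = new ArrayList<>();
        for (String p : pathAndAncestors(path)) {
            ids.add(idFor(p));
        }
        return ids;
    }
}
```

Every id returned for an ancestor outside the extracted subpaths is a
document that belongs to the global repository rather than to the
private mount, which is why a plain per-path export does not line up
with a mount boundary.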

> 1) Manually create a new commit for each sub-path ( e.g. 1 for /foo and
> 1 for /bar ) and re-write the commit references for each node document
> so that they point to the new commits

So I'd go with approach 1, except that we don't create new commits:
just sew up the current state to break the commit root dependency.

> Approach 1) is probably going to get me in trouble with the
> DocumentNodeStore caches, so the Oak instance might not be usable after
> I perform these changes ( which can be fine, since I'm going to spin it
> up just for that ).

See above (part 1 for cache)

Thanks,
Vikas

Re: Extracting subpaths from a DocumentStore repo

Posted by Robert Munteanu <ro...@apache.org>.
On Tue, 2016-03-29 at 08:16 +0000, Marcel Reutegger wrote:
> Hi,
> 
> as indicated already by Vikas, my recommendation is also
> to rewrite the documents. I'm doing something similar
> for OAK-3712. See e.g.:
> https://github.com/mreutegg/jackrabbit-oak/blob/OAK-3712/oak-core/src
> /main/java/org/apache/jackrabbit/oak/plugins/document/NodeDocumentSwe
> eper.java
> 
> The method committedBranch() turns a branch commit into
> a document local change. The commit root is on the document
> itself and self contained.

Looks useful as a starting point, thank you.

Robert

Re: Extracting subpaths from a DocumentStore repo

Posted by Marcel Reutegger <mr...@adobe.com>.
Hi,

as indicated already by Vikas, my recommendation is also
to rewrite the documents. I'm doing something similar
for OAK-3712. See e.g.:
https://github.com/mreutegg/jackrabbit-oak/blob/OAK-3712/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/NodeDocumentSweeper.java

The method committedBranch() turns a branch commit into
a document local change. The commit root is on the document
itself and self contained.

Regards
 Marcel


Re: Extracting subpaths from a DocumentStore repo

Posted by Robert Munteanu <ro...@apache.org>.
On Tue, 2016-03-29 at 15:35 +0530, Chetan Mehrotra wrote:
> Hi Robert,
> 
> On Mon, Mar 28, 2016 at 7:59 PM, Robert Munteanu <ro...@apache.org>
> wrote:
> > 
> > - create a repository (R1) , populate /foo and /bar with some
> > content
> > - extract data for /foo and /bar from R1
> > - pre-populate a DS 'storage area' ( MongoDB collection or RDB
> > table )
> > with the data extracted above
> > - configure a new repository (R2) to mount /foo and /bar with the
> > data
> > from above
> Instead of relying on DocumentStore API for "cloning" certain path it
> might be easier to use Repository Sidegrade [1] sort of logic which
> works at NodeState level. In that case you would not need to rely on
> Document details

I'll take a closer look at the sidegrade approach and see if it's
something I can use. I'll probably try 'mending' the commit roots first
though, it looks simpler.

Robert

Re: Extracting subpaths from a DocumentStore repo

Posted by Chetan Mehrotra <ch...@gmail.com>.
Hi Robert,

On Mon, Mar 28, 2016 at 7:59 PM, Robert Munteanu <ro...@apache.org> wrote:
> - create a repository (R1) , populate /foo and /bar with some content
> - extract data for /foo and /bar from R1
> - pre-populate a DS 'storage area' ( MongoDB collection or RDB table )
> with the data extracted above
> - configure a new repository (R2) to mount /foo and /bar with the data
> from above

Instead of relying on the DocumentStore API for "cloning" certain paths,
it might be easier to use Repository Sidegrade [1] style logic, which
works at the NodeState level. In that case you would not need to rely
on Document details.

Chetan Mehrotra
[1] https://jackrabbit.apache.org/oak/docs/migration.html