Posted to oak-dev@jackrabbit.apache.org by Robert Munteanu <ro...@apache.org> on 2015/07/08 15:27:29 UTC

A multiplexing implementation of the DocumentStore

Hi,

I am working on a prototype that multiplexes multiple DocumentStore
instances behind a single DocumentStore. The prototype is advanced
enough to start a discussion, and I also have some bugs to track down
that someone with more knowledge of Oak internals could probably
explain much more easily.

== Use case and high-level approach ==

The scenario for this multiplexing is the following:

- multiple Oak instances configured using a DocumentNodeStore
- all DocumentNodeStore instances connect to the same physical backend,
e.g. a mongod/mongos instance
- each Oak instance needs a private area that is not shared with the
other instances ( e.g. /tmp )

The concept is similar to Unix filesystem mounts managed in /etc/fstab.
A 'root' store manages the whole repository, while other sub-stores
take over at certain paths.

An example configuration can be:

/         <- root store
 /apps    <- sub-store 1
 /libs    <- sub-store 1
 /tmp     <- sub-store 2
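The fstab analogy boils down to a longest-prefix lookup: a path belongs to the deepest mount that is the path itself or one of its ancestors, and to the root store otherwise. The sketch below illustrates that idea with a hypothetical MountTable class (not part of Oak or of the prototype) using the example configuration above. Matching on a "/" boundary keeps /tmpfoo from being treated as part of the /tmp mount.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical fstab-style mount table mapping mount paths to store names. */
public class MountTable {
    private final String rootStore;
    private final Map<String, String> mounts = new TreeMap<>();

    public MountTable(String rootStore) {
        this.rootStore = rootStore;
    }

    public MountTable mount(String path, String store) {
        mounts.put(path, store);
        return this;
    }

    /** Returns the store of the deepest mount covering the path, else the root store. */
    public String storeFor(String path) {
        String best = null;
        for (String mountPath : mounts.keySet()) {
            // A mount covers the mount point itself and everything below it
            boolean covers = path.equals(mountPath) || path.startsWith(mountPath + "/");
            if (covers && (best == null || mountPath.length() > best.length())) {
                best = mountPath;
            }
        }
        return best == null ? rootStore : mounts.get(best);
    }

    public static void main(String[] args) {
        MountTable table = new MountTable("root store")
                .mount("/apps", "sub-store 1")
                .mount("/libs", "sub-store 1")
                .mount("/tmp", "sub-store 2");
        System.out.println(table.storeFor("/"));          // root store
        System.out.println(table.storeFor("/content/a")); // root store
        System.out.println(table.storeFor("/tmp/x/y"));   // sub-store 2
        System.out.println(table.storeFor("/tmpfoo"));    // root store
    }
}
```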

== What works ==

I have created a proof-of-concept implementation [1]. It's probably not
as fast as it could be, but it seems to work at the DocumentStore level.
The key pieces are:

- added a MultiplexingDocumentStore [2] which wraps two or more
DocumentStore instances
- allowed the MongoDocumentStore to prefix collection names [3]
- updated the DocumentMK.Builder [4] to allow configuring a MongoDB
backend with mounts
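To make the routing idea concrete, here is a minimal sketch of a multiplexing wrapper that picks a delegate from the path encoded in a document id. It is not the code behind [2]. SimpleDocumentStore is a hypothetical, heavily reduced stand-in for Oak's DocumentStore, and the id parsing assumes the simple "depth:path" form seen in the dumps below (e.g. "1:/tmp"). Hashed ids used for long paths are not handled. In the prototype the delegates would be MongoDocumentStore instances over prefixed collections such as private_nodes. Here they are in-memory maps.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class MultiplexingSketch {

    /** Hypothetical, simplified stand-in for a DocumentStore (not Oak's real API). */
    interface SimpleDocumentStore {
        Map<String, Object> find(String id);
        void create(String id, Map<String, Object> doc);
    }

    /** In-memory store standing in for one MongoDB collection. */
    static class InMemoryStore implements SimpleDocumentStore {
        final Map<String, Map<String, Object>> docs = new HashMap<>();

        public Map<String, Object> find(String id) { return docs.get(id); }

        public void create(String id, Map<String, Object> doc) { docs.put(id, doc); }
    }

    /** Routes every call to the delegate owning the path encoded in the id. */
    static class MultiplexingStore implements SimpleDocumentStore {
        private final Function<String, SimpleDocumentStore> storeForPath;

        MultiplexingStore(Function<String, SimpleDocumentStore> storeForPath) {
            this.storeForPath = storeForPath;
        }

        /** Assumes the "depth:path" id form, e.g. "1:/tmp" -> "/tmp". */
        static String pathFromId(String id) {
            int colon = id.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("unexpected id: " + id);
            }
            return id.substring(colon + 1);
        }

        public Map<String, Object> find(String id) {
            return storeForPath.apply(pathFromId(id)).find(id);
        }

        public void create(String id, Map<String, Object> doc) {
            storeForPath.apply(pathFromId(id)).create(id, doc);
        }
    }

    public static void main(String[] args) {
        InMemoryStore nodes = new InMemoryStore();
        InMemoryStore privateNodes = new InMemoryStore();
        MultiplexingStore store = new MultiplexingStore(path ->
                path.equals("/tmp") || path.startsWith("/tmp/") ? privateNodes : nodes);

        store.create("1:/content", new HashMap<>());
        store.create("1:/tmp", new HashMap<>());

        System.out.println(nodes.docs.keySet());        // [1:/content]
        System.out.println(privateNodes.docs.keySet()); // [1:/tmp]
    }
}
```

Single-document operations route cleanly this way. Operations over key ranges, which the real DocumentStore also offers, would need to consult and merge results from more than one delegate. This sketch leaves that out.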

This works fine at the DocumentStore level, as shown by the
MultiplexingDocumentStoreTest [5].

== What does not work ==

I seem to have missed something, as the implementation does not work as
expected at the DocumentNodeStore level. I have written a test case [6]
which creates and saves a Tree in a DocumentNodeStore backed by a
MultiplexingDocumentStore.

A sub-store is mounted at /tmp, and I create two trees in my test:

- one at /content
- one at /tmp

The write succeeds, but when I try to retrieve the trees, the one at
/content is found while the one at /tmp is not.

The MongoDB collections appear to contain the right data:

- nodes (corresponding to the root store) holds:

{
        "_id" : "0:/",
        "_revisions" : {
                "r14e6da2e8e0-0-1" : "c",
                "r14e6da2f6df-0-1" : "c"
        },
        "_modified" : NumberLong(1436358470),
        "_deleted" : {
                "r14e6da2e8e0-0-1" : "false"
        },
        "_modCount" : NumberLong(4),
        "_lastRev" : {
                "r0-0-1" : "r14e6da2e8e0-0-1"
        },
        "_children" : true,
        "_commitRoot" : {

        }
}
{
        "_id" : "1:/content",
        "_modified" : NumberLong(1436358470),
        "_commitRoot" : {
                "r14e6da2f6df-0-1" : "0"
        },
        "_deleted" : {
                "r14e6da2f6df-0-1" : "false"
        },
        "_modCount" : NumberLong(1)
}

- private_nodes (corresponding to the store mounted at /tmp) holds:

{
        "_id" : "1:/tmp",
        "_modified" : NumberLong(1436358470),
        "_commitRoot" : {
                "r14e6da2f6df-0-1" : "0"
        },
        "_deleted" : {
                "r14e6da2f6df-0-1" : "false"
        },
        "_modCount" : NumberLong(1)
}
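The `_commitRoot` entry on `1:/tmp` is what ties this private document back to the root store. The sketch below decodes that entry under two assumptions. First, Oak revision strings have the form `r<timestamp-hex>-<counter-hex>-<clusterId-hex>`; branch revisions are not handled. Second, a `_commitRoot` value is the depth of the commit root's path. The class and method names are hypothetical. For `r14e6da2f6df-0-1` with depth "0", the commit root is `0:/`, which lives in `nodes` (where `_revisions` marks that revision "c"), not in `private_nodes`. The decoded timestamp, 1436358473 s, is within a few seconds of the `_modified` value 1436358470.

```java
public class CommitRootSketch {

    /** Parsed revision string such as "r14e6da2f6df-0-1" (assumed format). */
    static final class Rev {
        final long timestamp; // milliseconds since the epoch
        final int counter;
        final int clusterId;

        Rev(long timestamp, int counter, int clusterId) {
            this.timestamp = timestamp;
            this.counter = counter;
            this.clusterId = clusterId;
        }

        static Rev parse(String s) {
            if (!s.startsWith("r")) {
                throw new IllegalArgumentException("unexpected revision: " + s);
            }
            String[] parts = s.substring(1).split("-");
            if (parts.length != 3) {
                throw new IllegalArgumentException("unexpected revision: " + s);
            }
            return new Rev(Long.parseLong(parts[0], 16),
                    Integer.parseInt(parts[1], 16),
                    Integer.parseInt(parts[2], 16));
        }
    }

    /** Ancestor of an absolute path at the given depth: 0 -> "/", 1 -> "/tmp", ... */
    static String ancestorAtDepth(String path, int depth) {
        if (depth == 0) {
            return "/";
        }
        String[] names = path.substring(1).split("/");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            sb.append('/').append(names[i]);
        }
        return sb.toString();
    }

    /** Id of the commit root document, in the simple "depth:path" form. */
    static String commitRootId(String docPath, int commitRootDepth) {
        return commitRootDepth + ":" + ancestorAtDepth(docPath, commitRootDepth);
    }

    public static void main(String[] args) {
        // Values taken from the "1:/tmp" document in private_nodes
        Rev rev = Rev.parse("r14e6da2f6df-0-1");
        int depth = Integer.parseInt("0"); // _commitRoot["r14e6da2f6df-0-1"]

        System.out.println(rev.timestamp / 1000);        // 1436358473
        System.out.println(rev.clusterId);               // 1
        System.out.println(commitRootId("/tmp", depth)); // 0:/
    }
}
```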

== What is not expected to work now ==

A number of Oak subsystems (ACLs, indexing, etc.) need to be adapted
for this to be fully usable. This is acknowledged, but it needs to be
handled separately, after I get the basic implementation right.

To wrap up the email, two questions:

1. What are your thoughts on the basic multiplexing implementation as
done in this prototype?
2. Do you have any hints on where I should start debugging the error
with the missing Tree in the DocumentNodeStore test [6]?

Thanks,

Robert


[1]: https://github.com/apache/jackrabbit-oak/compare/apache:trunk...rombert:features/docstore-multiplex?expand=1
[2]: https://github.com/apache/jackrabbit-oak/compare/apache:trunk...rombert:features/docstore-multiplex?expand=1#diff-2
[3]: https://github.com/apache/jackrabbit-oak/compare/apache:trunk...rombert:features/docstore-multiplex?expand=1#diff-3
[4]: https://github.com/apache/jackrabbit-oak/compare/apache:trunk...rombert:features/docstore-multiplex?expand=1#diff-1
[5]: https://github.com/apache/jackrabbit-oak/compare/apache:trunk...rombert:features/docstore-multiplex?expand=1#diff-5
[6]: https://github.com/apache/jackrabbit-oak/compare/apache:trunk...rombert:features/docstore-multiplex?expand=1#diff-6

Re: A multiplexing implementation of the DocumentStore

Posted by Robert Munteanu <ro...@apache.org>.
On Thu, 2015-07-09 at 11:36 +0200, Michael Dürig wrote:
> 
> On 9.7.15 9:05 , Marcel Reutegger wrote:
> > In short, the mounted trees must be entirely self-contained.
> 
> ... or have a consensus about shared stuff. This is probably the way to
> go for node types, since self-contained doesn't work there.
> 
> Overall these are the same constraints we came up with when we
> discussed multiplexing on top of the node store API. Is there a way we
> can enforce such restrictions? The last thing we want is to rely on the
> client to adhere to them or otherwise risk repository corruption.


We should, and probably can. The question is where to place those
constraints.

Ideally, the higher levels should not be aware of the constraints
of the lower layers.

At the same time, the lower layers have no knowledge of the
high-level concepts.

In the end we will probably enforce the restrictions at the higher
levels, although I'm curious whether a better solution exists.

Robert

Re: A multiplexing implementation of the DocumentStore

Posted by Michael Dürig <md...@apache.org>.

On 9.7.15 9:05 , Marcel Reutegger wrote:
> In short, the mounted trees must be entirely self-contained.

... or have a consensus about shared stuff. This is probably the way to
go for node types, since self-contained doesn't work there.

Overall these are the same constraints we came up with when we
discussed multiplexing on top of the node store API. Is there a way we
can enforce such restrictions? The last thing we want is to rely on the
client to adhere to them or otherwise risk repository corruption.

Michael


Re: A multiplexing implementation of the DocumentStore

Posted by Robert Munteanu <ro...@apache.org>.
Hi Marcel,

On Thu, 2015-07-09 at 07:05 +0000, Marcel Reutegger wrote:
> Hi Robert
> 
> 
> > 1. What are your thoughts on the basic multiplexing implementation
> > as done in this prototype?
> 
> It's an interesting approach and would allow storing data
> in backends optimized for certain usage patterns in subtrees.
> 
> As you already noted, the big challenge is how to deal with
> consistency rules imposed by commit hooks and other subsystems
> when part of the tree is shared and another part is local.
> 
> So far my view on this topic is: this is possible, but only with
> a number of limitations. Mounted trees must not contribute to
> indexes defined in the 'root' store. This implies that mounted trees
> must not contain referenceable nodes. This in turn implies that
> mounted trees must not contain versionable nodes, which are by
> definition referenceable. At the same time, this also avoids the
> problem of the global version store.
> 
> The mounted trees must not have access control entries, because
> those would need to be reflected in the global persistent store.
> 
> In short, the mounted trees must be entirely self-contained.

This is more or less fine for the investigation stage of
multiplexing that I'm in. I'm sure I will find other issues as I
progress.

> 
> > 2. Do you have any hints on where I should start debugging the
> > error with the missing Tree in the DocumentNodeStore test [6]?
> 
> I haven't looked at the code yet, but I suspect the problem may be
> caused by the DocumentStore reference in NodeDocument.
> A NodeDocument keeps a reference to the DocumentStore it was
> loaded from and uses this store to read other NodeDocuments,
> e.g. to find the commit status of changes in the current
> NodeDocument. This is probably what happens here: the NodeDocument
> created below /tmp references a commit root document in the
> other DocumentStore, which cannot be accessed through the local
> DocumentStore.

Yup, that was it. I ran a quick experiment to see whether overriding the
documentStore passed to a new document fixes the problem, and it does.

Thanks!

Robert

> 
> 
> Regards
>  Marcel


Re: A multiplexing implementation of the DocumentStore

Posted by Marcel Reutegger <mr...@adobe.com>.
Hi Robert


> 1. What are your thoughts on the basic multiplexing implementation
> as done in this prototype?

It's an interesting approach and would allow storing data
in backends optimized for certain usage patterns in subtrees.

As you already noted, the big challenge is how to deal with
consistency rules imposed by commit hooks and other subsystems
when part of the tree is shared and another part is local.

So far my view on this topic is: this is possible, but only with
a number of limitations. Mounted trees must not contribute to
indexes defined in the 'root' store. This implies that mounted trees
must not contain referenceable nodes. This in turn implies that
mounted trees must not contain versionable nodes, which are by
definition referenceable. At the same time, this also avoids the
problem of the global version store.

The mounted trees must not have access control entries, because
those would need to be reflected in the global persistent store.

In short, the mounted trees must be entirely self-contained.
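The constraints above could be checked mechanically at write time. The following is a hypothetical sketch, not an Oak API. It assumes a node is described by its path and its mixin names, and it treats `mix:referenceable`, `mix:versionable` and `rep:AccessControllable` mixins, plus `rep:policy` nodes, as the forbidden set inside mounted trees. A real check would resolve the full node type hierarchy rather than compare names. In Oak such a rule would more naturally live in a commit hook (for example a Validator), so clients cannot bypass it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class MountConstraintSketch {

    /** Assumed rule set; names are compared literally, without type inheritance. */
    static final Set<String> FORBIDDEN_MIXINS =
            Set.of("mix:referenceable", "mix:versionable", "rep:AccessControllable");
    static final String POLICY_NODE_NAME = "rep:policy";

    static boolean isMounted(String path, List<String> mountPaths) {
        for (String m : mountPaths) {
            if (path.equals(m) || path.startsWith(m + "/")) {
                return true;
            }
        }
        return false;
    }

    /** Returns the violations for a node about to be written; empty means allowed. */
    static List<String> check(String path, Set<String> mixins, List<String> mountPaths) {
        List<String> violations = new ArrayList<>();
        if (!isMounted(path, mountPaths)) {
            return violations; // nodes in the root store are not restricted here
        }
        for (String mixin : mixins) {
            if (FORBIDDEN_MIXINS.contains(mixin)) {
                violations.add(path + ": " + mixin + " not allowed in a mounted tree");
            }
        }
        if (path.endsWith("/" + POLICY_NODE_NAME)) {
            violations.add(path + ": access control entries not allowed in a mounted tree");
        }
        return violations;
    }

    public static void main(String[] args) {
        List<String> mounts = List.of("/tmp");
        System.out.println(check("/tmp/a", Set.of("mix:versionable"), mounts).size());     // 1
        System.out.println(check("/tmp/rep:policy", Set.of(), mounts).size());             // 1
        System.out.println(check("/content/a", Set.of("mix:versionable"), mounts).size()); // 0
    }
}
```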

> 2. Do you have any hints on where I should start debugging the
> error with the missing Tree in the DocumentNodeStore test [6]?

I haven't looked at the code yet, but I suspect the problem may be
caused by the DocumentStore reference in NodeDocument.
A NodeDocument keeps a reference to the DocumentStore it was
loaded from and uses this store to read other NodeDocuments,
e.g. to find the commit status of changes in the current
NodeDocument. This is probably what happens here: the NodeDocument
created below /tmp references a commit root document in the
other DocumentStore, which cannot be accessed through the local
DocumentStore.
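The failure mode described above can be reproduced in miniature. In this hypothetical model, Store, MapStore, Multiplexer and Doc are illustrative stand-ins, not Oak classes. A document decides whether its change is committed by looking up its commit root (`0:/`) through the store it holds a reference to. If it holds the private store, the lookup misses and the change looks uncommitted. If it holds the multiplexing store, the lookup is routed to the root collection. The second case mirrors the override of the store passed to new documents that Robert reports fixed the test.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class StoreReferenceSketch {

    /** Hypothetical minimal store; not Oak's DocumentStore API. */
    interface Store {
        Doc find(String id);
    }

    /** In-memory stand-in for one collection ("nodes" or "private_nodes"). */
    static class MapStore implements Store {
        final Map<String, Doc> docs = new HashMap<>();

        public Doc find(String id) {
            return docs.get(id);
        }
    }

    /** Routes a lookup using the path part of a "depth:path" id. */
    static class Multiplexer implements Store {
        final Function<String, Store> storeForPath;

        Multiplexer(Function<String, Store> storeForPath) {
            this.storeForPath = storeForPath;
        }

        public Doc find(String id) {
            String path = id.substring(id.indexOf(':') + 1);
            return storeForPath.apply(path).find(id);
        }
    }

    /** Stand-in for NodeDocument: keeps the store it uses to read related documents. */
    static class Doc {
        final Store store;
        final String commitRootId;   // null if this document is its own commit root
        final boolean committedHere; // stands in for a "c" entry in _revisions

        Doc(Store store, String commitRootId, boolean committedHere) {
            this.store = store;
            this.commitRootId = commitRootId;
            this.committedHere = committedHere;
        }

        /** A change is visible only if its commit root document marks it as committed. */
        boolean isCommitted() {
            if (commitRootId == null) {
                return committedHere;
            }
            Doc root = store.find(commitRootId);
            return root != null && root.committedHere;
        }
    }

    public static void main(String[] args) {
        MapStore nodes = new MapStore();
        MapStore privateNodes = new MapStore();
        Multiplexer mux = new Multiplexer(path ->
                path.equals("/tmp") || path.startsWith("/tmp/") ? privateNodes : nodes);

        // "0:/" lives in the root collection and carries the commit marker
        nodes.docs.put("0:/", new Doc(nodes, null, true));

        // Bound to the store it was loaded from: "0:/" is not visible there
        Doc tmpBoundToPrivate = new Doc(privateNodes, "0:/", false);
        // Bound to the multiplexing store: the lookup is routed to "nodes"
        Doc tmpBoundToMux = new Doc(mux, "0:/", false);

        System.out.println(tmpBoundToPrivate.isCommitted()); // false
        System.out.println(tmpBoundToMux.isCommitted());     // true
    }
}
```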


Regards
 Marcel

