
Posted to oak-dev@jackrabbit.apache.org by Michael Dürig <md...@apache.org> on 2012/04/01 17:12:06 UTC

Re: oak-api and move operations


On 30.3.12 14:45, Jukka Zitting wrote:
> Hi,
>
> On Fri, Mar 30, 2012 at 2:25 PM, Stefan Guggisberg
> <st...@gmail.com>  wrote:
>> On Fri, Mar 30, 2012 at 2:18 PM, Jukka Zitting<ju...@gmail.com>  wrote:
>>> Exactly. IMHO we should adjust the MK interface to support this. The
>>> solution should also address handling of large imports.
>>
>> sorry, i can't follow you here. could you please elaborate?
>
> The main trouble here is that oak-core currently needs to keep the
> full transient space in memory (or in some other custom non-MK
> storage) and then serialize it all to a single JSOP string during
> save().
>
> It would be much more convenient if the changes could be incrementally
> sent to the MicroKernel and tracked as a standard content tree just
> like any other content. This way we wouldn't need to come up with a
> separate storage mechanism in oak-core or use temporary subtrees in
> the main repository to work around this limitation.
>
> One way of dealing with this on the MK level would be to introduce the
> concept of "private branches" that are only visible to a single client
> until explicitly merged back to the main repository. A quick draft of
> what this could look like:
>
>      String addLotsOfData(MicroKernel mk) {
>          String baseRevision = mk.getHeadRevision();
>          String branchRevision = mk.branch(baseRevision);
>          for (int i = 0; i < 1000000; i++) {
>              branchRevision = mk.commit(
>                  "/", "+\"node" + i + "\":{}", branchRevision, null);
>          }
>          return mk.merge(branchRevision, baseRevision);
>      }
>

This seems like a nice approach to me. I think we should adopt it for 
the higher level API also. See revision 1308129.

Michael

> BR,
>
> Jukka Zitting
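A toy, in-memory model can make the private-branch idea from Jukka's draft concrete. This is a hypothetical sketch, not the MicroKernel API. The class `InMemoryKernel`, its flat set of node paths, and `addNode()` (standing in for a JSOP `commit()`) are assumptions for illustration. The sketch shows only the visibility rule: commits against a branch revision create new revisions that never become the head until `merge()` replays the branch's additions onto it.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of "private branches"; every revision is an immutable snapshot (hypothetical). */
class PrivateBranchDemo {

    static final class InMemoryKernel {
        // revision id -> immutable set of node paths in that revision
        private final Map<String, Set<String>> revisions = new HashMap<>();
        // revisions that belong to some private branch
        private final Set<String> branchRevisions = new HashSet<>();
        private String head;
        private int counter;

        InMemoryKernel() {
            head = newRevision(Collections.emptySet());
        }

        private String newRevision(Set<String> nodes) {
            String id = "r" + counter++;
            revisions.put(id, Collections.unmodifiableSet(new TreeSet<>(nodes)));
            return id;
        }

        String getHeadRevision() { return head; }

        Set<String> getNodes(String revision) { return revisions.get(revision); }

        /** Starts a private branch; its revisions never become the head on their own. */
        String branch(String baseRevision) {
            String id = newRevision(revisions.get(baseRevision));
            branchRevisions.add(id);
            return id;
        }

        /**
         * Adds a node. On a branch revision the result stays on the branch; otherwise
         * this toy assumes the caller passed the head and simply advances it.
         */
        String addNode(String path, String revision) {
            Set<String> next = new TreeSet<>(revisions.get(revision));
            next.add(path);
            String id = newRevision(next);
            if (branchRevisions.contains(revision)) {
                branchRevisions.add(id);
            } else {
                head = id;
            }
            return id;
        }

        /** Replays the nodes added on the branch (relative to its base) onto the current head. */
        String merge(String branchRevision, String baseRevision) {
            Set<String> added = new TreeSet<>(revisions.get(branchRevision));
            added.removeAll(revisions.get(baseRevision));
            Set<String> next = new TreeSet<>(revisions.get(head));
            next.addAll(added);
            head = newRevision(next);
            return head;
        }
    }

    public static void main(String[] args) {
        InMemoryKernel mk = new InMemoryKernel();
        String base = mk.getHeadRevision();
        String branch = mk.branch(base);
        for (int i = 0; i < 3; i++) {
            branch = mk.addNode("/node" + i, branch);
        }
        System.out.println(mk.getNodes(mk.getHeadRevision())); // [] : branch not visible yet
        mk.merge(branch, base);
        System.out.println(mk.getNodes(mk.getHeadRevision())); // [/node0, /node1, /node2]
    }
}
```

Because revisions are never mutated in this toy, the base revision stays readable after the merge. Abandoning a branch therefore needs no explicit rollback: the caller simply stops using the branch revision.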

Re: oak-api and move operations

Posted by Thomas Mueller <mu...@adobe.com>.

Hi,

>This is exactly the same operation as two MicroKernel cluster nodes
>will need to perform when syncing concurrent commits.

Yes, except that syncing concurrent commits can be implemented on a higher
layer (above indexing), so it's not quite the same.

>AFAICT it should be possible to do this without knowledge of higher
>level semantics 

I personally don't think that's the case.

>as long as possible merge conflicts are recorded and
>left to higher level components to resolve as they see fit.

That would complicate things quite a bit.

>this is something that we in any case need to do for
>clustering, so I don't see the explicit merge() method adding any
>extra work for us.

I would like to wait until we have a clearer picture of how clustering
will be implemented.

>>Also, some MicroKernel implementations (for example a MongoDB
>> MicroKernel implementation) might not be able to support branching
>> and merging.
>
>They need to if they're going to support concurrent writes on more
>than one cluster node.

It sounds like you already have some particular way to implement a
clustered MongoDB MicroKernel in mind. If yes, could you share your ideas?

Regards,
Thomas

Re: oak-api and move operations

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On Tue, Apr 3, 2012 at 12:02 PM, Thomas Mueller <mu...@adobe.com> wrote:
> This looks nice, but I'm not sure whether it would work. If merging is the
> responsibility of all MicroKernel implementations, then possibly quite a
> lot of business logic would have to be implemented within each MicroKernel
> implementation (separately).

This is exactly the same operation as two MicroKernel cluster nodes
will need to perform when syncing concurrent commits.

AFAICT it should be possible to do this without knowledge of higher
level semantics as long as possible merge conflicts are recorded and
left to higher level components to resolve as they see fit. Obviously
this is an area that we haven't yet looked at in any level of detail,
so there's probably a lot of work ahead of us on this. But as
mentioned above, this is something that we in any case need to do for
clustering, so I don't see the explicit merge() method adding any
extra work for us.

> Also, some MicroKernel implementations (for example a MongoDB
> MicroKernel implementation) might not be able to support branching
> and merging.

They need to if they're going to support concurrent writes on more
than one cluster node.

BR,

Jukka Zitting
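One way to read "merge conflicts are recorded and left to higher level components" is a three-way merge that takes non-overlapping changes and reports, rather than resolves, properties that were changed differently on both sides. This sketch assumes a flat map of property paths to values (a missing key meaning "absent") and a rule that keeps the head value on conflict. Both are illustrative assumptions, not how any MicroKernel implementation actually merges.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical three-way merge that records conflicts instead of resolving them. */
class ConflictRecordingMerge {

    record Result(Map<String, String> merged, List<String> conflicts) {}

    /**
     * base: state the branch started from; head: current trunk; branch: the private branch.
     * A missing key means the property is absent on that side.
     */
    static Result merge(Map<String, String> base, Map<String, String> head,
                        Map<String, String> branch) {
        Map<String, String> merged = new TreeMap<>(head);
        List<String> conflicts = new ArrayList<>();
        Set<String> keys = new TreeSet<>(base.keySet());
        keys.addAll(head.keySet());
        keys.addAll(branch.keySet());
        for (String key : keys) {
            String b = base.get(key);
            String h = head.get(key);
            String br = branch.get(key);
            if (Objects.equals(br, b) || Objects.equals(br, h)) {
                continue; // branch left it alone, or both sides agree: keep the head value
            }
            if (Objects.equals(h, b)) {
                // only the branch changed it: take the branch value (null = removed)
                if (br == null) {
                    merged.remove(key);
                } else {
                    merged.put(key, br);
                }
            } else {
                conflicts.add(key); // changed differently on both sides: keep head, report it
            }
        }
        return new Result(merged, conflicts);
    }

    public static void main(String[] args) {
        Map<String, String> base = Map.of("/a", "1", "/b", "1");
        Map<String, String> head = Map.of("/a", "2", "/b", "1");
        Map<String, String> branch = Map.of("/a", "3", "/b", "1", "/c", "1");
        Result result = merge(base, head, branch);
        System.out.println(result.merged());    // {/a=2, /b=1, /c=1}
        System.out.println(result.conflicts()); // [/a]
    }
}
```

The merge itself needs no knowledge of what `/a` means; deciding what to do about the reported conflict is left to the caller.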

Re: oak-api and move operations

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On Tue, Apr 3, 2012 at 12:08 PM, Angela Schreiber <an...@adobe.com> wrote:
> well... looking at the current oak-jcr i still see
> revision = microkernel.commit("", changeLog.toJsop(), revision, "");
>
> this means to me that we still don't have a clear separation
> of the different layers as we have discussed multiple times in the
> past.

Ah, I see your point and totally agree.

My intention is not to expose this directly to the oak-jcr level. It
needs to work through the Oak API as exposed by oak-core.

BR,

Jukka Zitting


Re: oak-api and move operations

Posted by Michael Dürig <md...@apache.org>.


On 3.4.12 11:08, Angela Schreiber wrote:
> hi jukka
>
>> On Tue, Apr 3, 2012 at 11:23 AM, Angela Schreiber<an...@adobe.com>
>> wrote:
>>> but please be aware that we need to make sure that we have
>>> a separate layer in place that enforces authorization
>>> and prevents direct write operations on the MK from higher
>>> levels... or the other way round: if we expose the MK to
>>> higher levels we have to move both the complete authentication and
>>> authorization process to the MK layer, which would look quite
>>> wrong to me.
>>
>> The "private branch" concept is just that, "private". Anything written
>> to such a branch is not made visible to any other clients, so there
>> should be no need to enforce access controls on it.
>
> well... looking at the current oak-jcr i still see
> revision = microkernel.commit("", changeLog.toJsop(), revision, "");
>
> this means to me that we still don't have a clear separation
> of the different layers as we have discussed multiple times in the
> past.

We are just not there yet. I'm pretty confident we can replace all
places that directly access the MicroKernel with the new API soon.

Michael

Re: oak-api and move operations

Posted by Angela Schreiber <an...@adobe.com>.

hi jukka

> On Tue, Apr 3, 2012 at 11:23 AM, Angela Schreiber<an...@adobe.com>  wrote:
>> but please be aware that we need to make sure that we have
>> a separate layer in place that enforces authorization
>> and prevents direct write operations on the MK from higher
>> levels... or the other way round: if we expose the MK to
>> higher levels we have to move both the complete authentication and
>> authorization process to the MK layer, which would look quite
>> wrong to me.
>
> The "private branch" concept is just that, "private". Anything written
> to such a branch is not made visible to any other clients, so there
> should be no need to enforce access controls on it.

well... looking at the current oak-jcr i still see
revision = microkernel.commit("", changeLog.toJsop(), revision, "");

this means to me that we still don't have a clear separation
of the different layers, as we have discussed multiple times in the
past.

if anyone other than the SPI layer has access to the MK, this is
a completely different setup that requires fundamental changes in
the way we envision the jr3 security concept.

but maybe i just got sidetracked by the current code in relation
to that discussion ... i don't mind having extra stuff on the mk-api
if it fits our needs

kind regards
angela

Re: oak-api and move operations

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On Tue, Apr 3, 2012 at 11:23 AM, Angela Schreiber <an...@adobe.com> wrote:
> but please be aware that we need to make sure that we have
> a separate layer in place that enforces authorization
> and prevents direct write operations on the MK from higher
> levels... or the other way round: if we expose the MK to
> higher levels we have to move both the complete authentication and
> authorization process to the MK layer, which would look quite
> wrong to me.

The "private branch" concept is just that, "private". Anything written
to such a branch is not made visible to any other clients, so there
should be no need to enforce access controls on it.

So far the only place where we do need to enforce write access controls is:

   * Before calling MicroKernel.commit()

With the branch concept as proposed, this would change to:

   * Before calling MicroKernel.commit() on a non-branch revision
   * Before calling MicroKernel.merge()

BR,

Jukka Zitting
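The two enforcement points listed above can be sketched as a thin wrapper that is the only thing higher layers get to see. Everything here is hypothetical. The `Kernel` interface is a minimal stand-in for the calls discussed in the thread, `isBranchRevision()` and `WriteChecker` are assumed helpers, and a real merge check would have to evaluate the branch's changes rather than just the operation name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: where write access checks sit once private branches exist. */
class BranchAccessControl {

    /** Minimal stand-in for the kernel calls discussed in the thread (assumed, not the real API). */
    interface Kernel {
        String commit(String path, String jsonDiff, String revisionId, String message);
        String branch(String revisionId);
        String merge(String branchRevisionId, String baseRevisionId);
        boolean isBranchRevision(String revisionId); // assumed helper
    }

    /** Assumed hook into the security layer; would throw if the write is not allowed. */
    interface WriteChecker {
        void checkWrite(String operation);
    }

    /** The only object higher layers would see; the raw Kernel stays hidden behind it. */
    static final class AccessControlledKernel {
        private final Kernel kernel;
        private final WriteChecker checker;

        AccessControlledKernel(Kernel kernel, WriteChecker checker) {
            this.kernel = kernel;
            this.checker = checker;
        }

        String branch(String revisionId) {
            return kernel.branch(revisionId); // nothing becomes visible, so no check
        }

        String commit(String path, String jsonDiff, String revisionId, String message) {
            if (!kernel.isBranchRevision(revisionId)) {
                checker.checkWrite("commit"); // visible to all clients: must be checked
            }
            return kernel.commit(path, jsonDiff, revisionId, message);
        }

        String merge(String branchRevisionId, String baseRevisionId) {
            // A real check would inspect the branch's changes relative to its base.
            checker.checkWrite("merge");
            return kernel.merge(branchRevisionId, baseRevisionId);
        }
    }

    /** Runs a short scenario against a fake kernel and returns the checks performed. */
    static List<String> demo() {
        List<String> checks = new ArrayList<>();
        Kernel fake = new Kernel() {
            private int n;
            public String commit(String path, String diff, String rev, String msg) {
                return (isBranchRevision(rev) ? "b" : "r") + (++n);
            }
            public String branch(String rev) { return "b" + (++n); }
            public String merge(String branchRev, String baseRev) { return "r" + (++n); }
            public boolean isBranchRevision(String rev) { return rev.startsWith("b"); }
        };
        AccessControlledKernel mk = new AccessControlledKernel(fake, checks::add);
        String branch = mk.branch("r0");
        branch = mk.commit("/", "+\"a\":{}", branch, null); // private: not checked
        mk.merge(branch, "r0");                              // checked
        mk.commit("/", "+\"b\":{}", "r0", null);             // trunk commit: checked
        return checks;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [merge, commit]
    }
}
```

Keeping the raw kernel behind such a wrapper is one way to avoid direct `commit()` calls from oak-jcr that bypass authorization.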

Re: oak-api and move operations

Posted by Thomas Mueller <mu...@adobe.com>.

Hi,

>Tom, I think you are still misreading things. This has nothing to do
>with transactions, 2 phase commit, isolation, etc. It is just a way for
>transient changes to be stored (written ahead) instead of being kept in memory.

As far as I know, we want to support MicroKernel implementations that
store data in a relational or in a NoSQL database (for example MongoDB). I
don't know of any relational or NoSQL database that natively supports
branch() and merge(). One way to support branch() on top of a relational
database is to use a (long running) transaction. It's true that 2 phase
commit is probably not required. For NoSQL databases, I don't know how it
could be implemented easily.

Regards,
Thomas

Re: oak-api and move operations

Posted by Michael Dürig <md...@apache.org>.


On 3.4.12 11:35, Thomas Mueller wrote:
> Hi,
>
>>>> The MicroKernel does not have to do any more merging
>>>> than it already does.
>>>
>>> Currently, commit calls can be easily synchronized in the MicroKernel
>>> implementation. If you want to achieve a similar isolation with branch()
>>> and merge(), then a branch() would block other commits (and branch calls)
>>> until the merge() call.
>>
>> I can't follow you here. Why would that be necessary?
>
> To achieve the same isolation as we have now.
>
>>> That would work I guess. One remaining problem is
>>> then the ability to "undo" (rollback) a branch. I guess there should be a
>>> method for that ("revert").
>>
>> Why do we need a rollback?
>
> To undo a branch, so that others can commit.
>
>>> Another remaining problem is that each
>>> MicroKernel implementation would need to support transactions that span
>>> multiple calls (similar to 2-phase commits or transaction savepoints).
>>
>> Again why?
>
> Branching might not be easy to support in the MicroKernel implementation.
> If branching is actually implemented as "start a transaction", then a
> commit (within a branch) could be implemented as a savepoint. A commit in
> a branch wouldn't result in a transaction commit.

Tom, I think you are still misreading things. This has nothing to do
with transactions, 2 phase commit, isolation, etc. It is just a way for
transient changes to be stored (written ahead) instead of being kept in memory.

Michael

Re: oak-api and move operations

Posted by Thomas Mueller <mu...@adobe.com>.

Hi,

>>> The MicroKernel does not have to do any more merging
>>> than it already does.
>>
>> Currently, commit calls can be easily synchronized in the MicroKernel
>> implementation. If you want to achieve a similar isolation with branch()
>> and merge(), then a branch() would block other commits (and branch calls)
>> until the merge() call.
>
>I can't follow you here. Why would that be necessary?

To achieve the same isolation as we have now.

>> That would work I guess. One remaining problem is
>> then the ability to "undo" (rollback) a branch. I guess there should be a
>> method for that ("revert").
>
>Why do we need a rollback?

To undo a branch, so that others can commit.

>> Another remaining problem is that each
>> MicroKernel implementation would need to support transactions that span
>> multiple calls (similar to 2-phase commits or transaction savepoints).
>
>Again why?

Branching might not be easy to support in the MicroKernel implementation.
If branching is actually implemented as "start a transaction", then a
commit (within a branch) could be implemented as a savepoint. A commit in
a branch wouldn't result in a transaction commit.

Regards,
Thomas
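Thomas's mapping (branch = start a transaction, commit within a branch = savepoint, merge = transaction commit, revert = rollback) can be sketched with plain JDBC. The `TransactionBranch` class is hypothetical; only the `java.sql.Connection` calls (`setAutoCommit`, `setSavepoint`, `rollback`, `commit`) are real API. To keep the sketch runnable without a database, the demo uses a `java.lang.reflect.Proxy` that merely records which `Connection` methods were called.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Savepoint;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of mapping branch operations onto one long-running JDBC transaction. */
class JdbcBranchSketch {

    static final class TransactionBranch {
        private final Connection con;

        /** branch(): start a transaction that stays open until merge() or revert(). */
        TransactionBranch(Connection con) throws SQLException {
            this.con = con;
            con.setAutoCommit(false);
        }

        /** commit() on the branch: run the writes, then mark a savepoint (no real commit). */
        Savepoint commitToBranch(String label) throws SQLException {
            // ... the INSERT/UPDATE statements for this change set would run here ...
            return con.setSavepoint(label);
        }

        /** Undo later branch commits while keeping the branch open. */
        void undoTo(Savepoint savepoint) throws SQLException {
            con.rollback(savepoint);
        }

        /** merge(): the single point where the changes become visible to others. */
        void merge() throws SQLException {
            con.commit();
            con.setAutoCommit(true);
        }

        /** revert(): discard the whole branch. */
        void revert() throws SQLException {
            con.rollback();
            con.setAutoCommit(true);
        }
    }

    /** Test double: a Connection whose methods only record their names and return null. */
    static Connection recordingConnection(List<String> calls) {
        return (Connection) Proxy.newProxyInstance(
                JdbcBranchSketch.class.getClassLoader(),
                new Class<?>[] {Connection.class},
                (proxy, method, args) -> {
                    calls.add(method.getName());
                    return null;
                });
    }

    static List<String> demo() {
        List<String> calls = new ArrayList<>();
        try {
            TransactionBranch branch = new TransactionBranch(recordingConnection(calls));
            branch.commitToBranch("c1");
            branch.commitToBranch("c2");
            branch.merge();
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(demo());
        // [setAutoCommit, setSavepoint, setSavepoint, commit, setAutoCommit]
    }
}
```

The catch is the one Thomas raises: the transaction stays open for the whole lifetime of the branch, and depending on the database it may hold locks that block other writers for that long.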

Re: oak-api and move operations

Posted by Michael Dürig <md...@apache.org>.


On 3.4.12 11:20, Thomas Mueller wrote:
> Hi,
>
>> I think there is some misunderstanding here. The MicroKernel does not
>> have to do any more merging than it already does. Currently transient
>> changes are kept in memory and passed to the MicroKernel in a single
>> commit call passing along a possibly prohibitively large JSON diff. With
>> the "private branch" approach, the transient changes are simply written
>> ahead into a private branch and only on merge are these applied to the
>> main branch. Think of it like a write ahead of the JSON diff to the
>> MicroKernel (although the implementation might differ).
>
> Currently, commit calls can be easily synchronized in the MicroKernel
> implementation. If you want to achieve a similar isolation with branch()
> and merge(), then a branch() would block other commits (and branch calls)
> until the merge() call.

I can't follow you here. Why would that be necessary?

> That would work I guess. One remaining problem is
> then the ability to "undo" (rollback) a branch. I guess there should be a
> method for that ("revert").

Why do we need a rollback? Access to the previous revision is still 
possible through the previous revision id.

> Another remaining problem is that each
> MicroKernel implementation would need to support transactions that span
> multiple calls (similar to 2-phase commits or transaction savepoints).

Again why?

Michael

Re: oak-api and move operations

Posted by Thomas Mueller <mu...@adobe.com>.

Hi,

>I think there is some misunderstanding here. The MicroKernel does not
>have to do any more merging than it already does. Currently transient
>changes are kept in memory and passed to the MicroKernel in a single
>commit call passing along a possibly prohibitively large JSON diff. With
>the "private branch" approach, the transient changes are simply written
>ahead into a private branch and only on merge are these applied to the
>main branch. Think of it like a write ahead of the JSON diff to the
>MicroKernel (although the implementation might differ).

Currently, commit calls can be easily synchronized in the MicroKernel
implementation. If you want to achieve a similar isolation with branch()
and merge(), then a branch() would block other commits (and branch calls)
until the merge() call. That would work I guess. One remaining problem is
then the ability to "undo" (rollback) a branch. I guess there should be a
method for that ("revert"). Another remaining problem is that each
MicroKernel implementation would need to support transactions that span
multiple calls (similar to 2-phase commits or transaction savepoints).

Regards,
Thomas
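The pessimistic variant Thomas describes, where `branch()` blocks other commits until `merge()` and therefore needs an explicit `revert()`, can be sketched with a single write permit. This is a hypothetical model that only counts nodes. `ExclusiveKernel` and its methods are illustrative names, and `tryCommitNode()` gives up instead of blocking so that the effect is visible without threads.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical sketch of the pessimistic variant: an open branch excludes every other writer. */
class ExclusiveBranchSketch {

    static final class ExclusiveKernel {
        // A Semaphore is not owner-bound, so branch() and merge() could run on different threads.
        private final Semaphore writePermit = new Semaphore(1);
        private int committedNodes;
        private int pendingNodes;
        private boolean branchOpen;

        /** branch(): take the single write permit, blocking while another branch is open. */
        void branch() {
            writePermit.acquireUninterruptibly();
            branchOpen = true;
            pendingNodes = 0;
        }

        void addNodeToBranch() {
            requireOpenBranch();
            pendingNodes++;
        }

        /** merge(): publish the branch, then let other writers in. */
        void merge() {
            requireOpenBranch();
            committedNodes += pendingNodes;
            close();
        }

        /** revert(): the explicit rollback this design needs to release the permit. */
        void revert() {
            requireOpenBranch();
            close();
        }

        /** Plain commit that gives up instead of waiting, so the demo needs no threads. */
        boolean tryCommitNode() {
            if (!writePermit.tryAcquire()) {
                return false; // a branch is open: the commit would have to wait
            }
            committedNodes++;
            writePermit.release();
            return true;
        }

        int committedNodes() { return committedNodes; }

        private void requireOpenBranch() {
            if (!branchOpen) throw new IllegalStateException("no open branch");
        }

        private void close() {
            pendingNodes = 0;
            branchOpen = false;
            writePermit.release();
        }
    }

    public static void main(String[] args) {
        ExclusiveKernel mk = new ExclusiveKernel();
        mk.branch();
        mk.addNodeToBranch();
        mk.addNodeToBranch();
        System.out.println(mk.tryCommitNode());  // false: blocked by the open branch
        mk.merge();
        System.out.println(mk.tryCommitNode());  // true
        System.out.println(mk.committedNodes()); // 3
    }
}
```

A `Semaphore` is used instead of a `ReentrantLock` because the permit is taken in `branch()` and released in `merge()` or `revert()`, which in a server need not run on the same thread.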

Re: oak-api and move operations

Posted by Michael Dürig <md...@apache.org>.


On 3.4.12 11:02, Thomas Mueller wrote:
> Hi,
>
>> String branchRevision = mk.branch(baseRevision);
>> mk.merge(branchRevision, baseRevision);
>
> This looks nice, but I'm not sure whether it would work. If merging is the
> responsibility of all MicroKernel implementations, then possibly quite a
> lot of business logic would have to be implemented within each MicroKernel
> implementation (separately).


I think there is some misunderstanding here. The MicroKernel does not
have to do any more merging than it already does. Currently transient
changes are kept in memory and passed to the MicroKernel in a single
commit call passing along a possibly prohibitively large JSON diff. With
the "private branch" approach, the transient changes are simply written
ahead into a private branch and only on merge are these applied to the
main branch. Think of it like a write ahead of the JSON diff to the
MicroKernel (although the implementation might differ).

Michael
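The write-ahead idea can be sketched as a session that buffers transient operations only up to a fixed limit and then commits them to its private branch, so memory use stays bounded however large the import is; `save()` flushes the rest and merges once. `BranchStore`, `WriteAheadSession`, the batch limit, and the one-operation-per-line diff format are assumptions for illustration, not the actual MicroKernel interface.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: transient changes written ahead to a private branch in bounded batches. */
class WriteAheadSketch {

    /** Assumed subset of a branch-capable kernel (not the actual MicroKernel interface). */
    interface BranchStore {
        String branch(String baseRevision);
        String commit(String jsonDiff, String branchRevision);
        String merge(String branchRevision, String baseRevision);
    }

    static final class WriteAheadSession {
        private final BranchStore store;
        private final String baseRevision;
        private final int maxBuffered;
        private final StringBuilder buffer = new StringBuilder();
        private int buffered;
        private String branchRevision;

        WriteAheadSession(BranchStore store, String baseRevision, int maxBuffered) {
            this.store = store;
            this.baseRevision = baseRevision;
            this.maxBuffered = maxBuffered;
            this.branchRevision = store.branch(baseRevision);
        }

        /** Records one "add node" operation; at most maxBuffered operations are held in memory. */
        void addNode(String name) {
            buffer.append("+\"").append(name).append("\":{}\n"); // one JSOP add per line (assumed format)
            if (++buffered >= maxBuffered) {
                flush();
            }
        }

        private void flush() {
            if (buffered == 0) {
                return;
            }
            branchRevision = store.commit(buffer.toString(), branchRevision);
            buffer.setLength(0);
            buffered = 0;
        }

        /** save(): write out the remainder, then publish everything with a single merge. */
        String save() {
            flush();
            return store.merge(branchRevision, baseRevision);
        }
    }

    /** Adds 2500 nodes with a batch limit of 1000 and returns the size of each batch. */
    static List<Integer> demo() {
        List<Integer> batchSizes = new ArrayList<>();
        BranchStore fake = new BranchStore() {
            private int n;
            public String branch(String base) { return "b" + (++n); }
            public String commit(String diff, String rev) {
                batchSizes.add(diff.split("\n").length);
                return "b" + (++n);
            }
            public String merge(String branchRev, String baseRev) { return "r" + (++n); }
        };
        WriteAheadSession session = new WriteAheadSession(fake, "r0", 1000);
        for (int i = 0; i < 2500; i++) {
            session.addNode("node" + i);
        }
        session.save();
        return batchSizes;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [1000, 1000, 500]
    }
}
```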

Re: oak-api and move operations

Posted by Thomas Mueller <mu...@adobe.com>.

Hi,

>String branchRevision = mk.branch(baseRevision);
>mk.merge(branchRevision, baseRevision);

This looks nice, but I'm not sure whether it would work. If merging is the
responsibility of all MicroKernel implementations, then possibly quite a
lot of business logic would have to be implemented within each MicroKernel
implementation (separately). For example, indexing: if two sessions add
new nodes to separate branches and then merge the changes, the index could
become corrupt if the added nodes contain the same index values. For
versioning, similar problems might arise (depending on how versioning is
implemented). Also, some MicroKernel implementations (for example a
MongoDB MicroKernel implementation) might not be able to support branching
and merging.

I'm against implementing this feature now without careful investigation of
all possible problems and consequences.

Relational databases, for example, don't support merging, even if they
support MVCC. For PostgreSQL, if you try to update the same row from
within two connections and transactions, then the second connection is
blocked until the first connection commits or rolls back the changes.
Other databases work in the same way. If merging changes were simple,
I'm sure relational databases would support it.

Regards,
Thomas
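The indexing concern can be illustrated under two assumptions: a unique index on a property value, and a merge that only detects conflicts when both sides touch the same path. `mergeByPath()` and `duplicateValues()` are hypothetical helpers. Each branch is consistent on its own, the merges succeed because the paths differ, and the combined state still violates the unique index.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical illustration: a path-level merge misses a unique-value clash. */
class UniqueIndexMergeSketch {

    /** Merge rule assumed here: changes conflict only if they touch the same path. */
    static Map<String, String> mergeByPath(Map<String, String> head, Map<String, String> added) {
        Map<String, String> result = new TreeMap<>(head);
        for (Map.Entry<String, String> e : added.entrySet()) {
            if (result.containsKey(e.getKey())) {
                throw new IllegalStateException("conflict at " + e.getKey());
            }
            result.put(e.getKey(), e.getValue());
        }
        return result;
    }

    /** Values held by more than one node, i.e. entries a unique index would reject. */
    static Set<String> duplicateValues(Map<String, String> nodes) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new TreeSet<>();
        for (String value : nodes.values()) {
            if (!seen.add(value)) {
                duplicates.add(value);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Two sessions branch from an empty head; each branch is valid on its own.
        Map<String, String> branchA = Map.of("/users/alice", "id-42");
        Map<String, String> branchB = Map.of("/users/bob", "id-42");
        Map<String, String> head = mergeByPath(Map.of(), branchA);
        head = mergeByPath(head, branchB); // different paths, so no conflict is reported
        System.out.println(duplicateValues(head)); // [id-42]
    }
}
```

Catching this would require the merge (or something run right after it) to know about the index, which is the kind of higher-level logic Thomas does not want duplicated in every MicroKernel implementation.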

Re: oak-api and move operations

Posted by Angela Schreiber <an...@adobe.com>.

hi

>>       String addLotsOfData(MicroKernel mk) {
>>           String baseRevision = mk.getHeadRevision();
>>           String branchRevision = mk.branch(baseRevision);
>>           for (int i = 0; i < 1000000; i++) {
>>               branchRevision = mk.commit(
>>                   "/", "+\"node" + i + "\":{}", branchRevision, null);
>>           }
>>           return mk.merge(branchRevision, baseRevision);
>>       }
>>
>
> This seems like a nice approach to me. I think we should adopt it for
> the higher level API also. See revision 1308129.

but please be aware that we need to make sure that we have
a separate layer in place that enforces authorization
and prevents direct write operations on the MK from higher
levels... or the other way round: if we expose the MK to
higher levels we have to move both the complete authentication and
authorization process to the MK layer, which would look quite
wrong to me.

angela