Posted to oak-dev@jackrabbit.apache.org by Jukka Zitting <ju...@gmail.com> on 2012/03/19 19:11:59 UTC

MicroKernel API vs. protocol

Hi,

To help clarify the MK API I think it would be useful for us to
distinguish between the API as such and a potential related network
protocol used for accessing a remote MK deployment:

    http://people.apache.org/~jukka/2012/oak-mk-protocol.png

The MicroKernel interface as currently defined has many features of a
network protocol. For example all argument and return values are
serialized and the filter parameter was introduced to reduce the
amount of information that needs to pass across the interface.

I think we need to question this design since dealing directly with a
"network protocol" -like API in oak-core will be quite cumbersome and
we'll in any case need to implement a separate wrapper layer on top of
it to hide most of the details (JSON formatting, blob streaming, etc.)
that aren't relevant to higher level functionality.

So I think it would make more sense to rather redefine the MicroKernel
interface in terms of higher level constructs that abstract away the
protocol-level details. And to put the protocol-level bits (formatting
of diffs, etc.) into an actual protocol definition instead of a Java
interface. That protocol can then be implemented directly by a remote
MK implementation and consumed by a simple protocol binding for the
Java interface.

A concrete example of what this could mean is the getNodes() method:

    String getNodes(String path, String revision, int depth,
                    long offset, int count, String filter)

The last four arguments of this method are only relevant in terms of
serialization. A more expressive version of the method could be:

    NodeState getNodeState(String path, String revision)

Or possibly even:

    NodeState getRootNodeState(String revision)
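
To make the contrast concrete, here is a minimal sketch of what a typed
read path could look like. The NodeState shape below (getProperty,
getChildNode, getChildNodeNames) and the in-memory implementation are
assumptions made for illustration only; they are not taken from any
existing code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: NodeState and its methods are assumptions for
// illustration, not an interface defined by the project.
final class NodeStateSketch {

    /** Immutable view of a node in one revision (hypothetical shape). */
    interface NodeState {
        String getProperty(String name);      // null if the property is absent
        NodeState getChildNode(String name);  // null if the child is absent
        Iterable<String> getChildNodeNames();
    }

    /** Simple in-memory implementation, used here only for the demo. */
    static final class MemoryNodeState implements NodeState {
        private final Map<String, String> properties = new LinkedHashMap<>();
        private final Map<String, MemoryNodeState> children = new LinkedHashMap<>();

        MemoryNodeState property(String name, String value) {
            properties.put(name, value);
            return this;
        }

        MemoryNodeState child(String name, MemoryNodeState node) {
            children.put(name, node);
            return this;
        }

        @Override public String getProperty(String name) { return properties.get(name); }
        @Override public NodeState getChildNode(String name) { return children.get(name); }
        @Override public Iterable<String> getChildNodeNames() { return children.keySet(); }
    }

    /** Resolves a slash-separated path by walking child nodes from the root. */
    static NodeState resolve(NodeState root, String path) {
        NodeState node = root;
        for (String name : path.split("/")) {
            if (name.isEmpty()) continue;   // skip leading or doubled slashes
            if (node == null) return null;  // an earlier segment was missing
            node = node.getChildNode(name);
        }
        return node;
    }

    public static void main(String[] args) {
        NodeState root = new MemoryNodeState()
            .child("a", new MemoryNodeState().property("title", "foo")
                .child("b", new MemoryNodeState().property("title", "bar")));
        // No depth, offset, count or filter arguments and no JSON parsing.
        System.out.println(resolve(root, "/a/b").getProperty("title"));  // prints "bar"
    }
}
```

The caller only walks the parts of the tree it needs; how those parts
are fetched or serialized stays behind the interface.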

WDYT?

BR,

Jukka Zitting

Re: MicroKernel API vs. protocol

Posted by Julian Reschke <ju...@gmx.de>.

On 2012-03-20 10:57, Stefan Guggisberg wrote:
> On Mon, Mar 19, 2012 at 7:11 PM, Jukka Zitting<ju...@gmail.com>  wrote:
>> Hi,
>>
>> To help clarify the MK API I think it would be useful for us to
>> distinguish between the API as such and a potential related network
>> protocol used for accessing a remote MK deployment:
>>
>>     http://people.apache.org/~jukka/2012/oak-mk-protocol.png
>>
>> The MicroKernel interface as currently defined has many features of a
>> network protocol. For example all argument and return values are
>> serialized and the filter parameter was introduced to reduce the
>> amount of information that needs to pass across the interface.
>>
>> I think we need to question this design since dealing directly with a
>> "network protocol" -like API in oak-core will be quite cumbersome and
>> we'll in any case need to implement a separate wrapper layer on top of
>> it to hide most of the details (JSON formatting, blob streaming, etc.)
>> that aren't relevant to higher level functionality.
>
> i agree that it might make sense to implement a wrapper in the
> mk api consumer side, as long as we keep the current low-level
> api.

So we have a wrapper that provides a slightly more strongly typed API, 
has it produce JSON/JSOP messages, and feeds those into an 
implementation that parses them again?

Can you elaborate on how exactly this is better than having it the other 
way around?

/me confused

>> So I think it would make more sense to rather redefine the MicroKernel
>> interface in terms of higher level constructs that abstract away the
>> protocol-level details. And to put the protocol-level bits (formatting
>> of diffs, etc.) into an actual protocol definition instead of a Java
>> interface. That protocol can then be implemented directly by a remote
>> MK implementation and consumed by a simple protocol binding for the
>> Java interface.
>
> i don't agree. IMO there's nothing wrong with the current api. it's
> intentionally
> low-level. just because it's very straight-forward to remote IMO that

IMHO it only looks low-level, because it hides all the complexity in the 
string format for commit(), which, btw, is not defined anywhere in the 
API yet.

> doesn't imply
> that it should be spec'ed as a protocol instead of an api. it's light-weight
> (very few methods) and relatively easy to implement. while i am certainly aware
> that it's controversial (non-oo, string-based etc) i've not seen convincing
> technical arguments so far. IMO it's rather a question of personal preferences.

I think the "unnecessary serialization/parsing" argument is a strong 
one, and not only a matter of personal preferences.

That being said, I'm actually *very* interested in an efficient 
protocol. But that shouldn't make the implementation of JCR on top of 
it unnecessarily complex.

If the data model we have is essentially a JavaScript object tree, what 
would be wrong with exchanging Java maps (containing strings, numbers, 
and nested maps)?
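
As a rough illustration of the map idea, the sketch below builds a small
node tree as nested java.util.Map instances and reads it back without
producing or parsing any JSON text. The class name, the helper methods
and the sample values are all made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a node tree exchanged as plain nested maps.
final class MapTreeSketch {

    /** Builds a sample tree as nested maps; no JSON text is produced or parsed. */
    static Map<String, Object> sampleTree() {
        Map<String, Object> b = new LinkedHashMap<>();
        b.put(":childCount", 0L);
        b.put("title", "bar");

        Map<String, Object> a = new LinkedHashMap<>();
        a.put(":childCount", 1L);
        a.put("title", "foo");
        a.put("b", b);

        Map<String, Object> root = new LinkedHashMap<>();
        root.put("a", a);
        return root;
    }

    /** Follows child names down the tree; returns null if a step is missing or not a map. */
    @SuppressWarnings("unchecked")
    static Map<String, Object> child(Map<String, Object> node, String... names) {
        Map<String, Object> current = node;
        for (String name : names) {
            Object value = current.get(name);
            if (!(value instanceof Map)) return null;
            current = (Map<String, Object>) value;
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> b = child(sampleTree(), "a", "b");
        System.out.println(b.get("title"));  // prints "bar"
    }
}
```

The unchecked cast shows the price of this approach: values are only
typed as Object, so it sits somewhere between strings and a dedicated
node interface.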

> i am absolutely sure that we should allow for different mk implementations.

Yes.

> it doesn't  make sense to have 3rd party implementations implement a
> protocol instead of a single straight-forward api.
>
> therefore, -1 for replacing the current string-based api.

I'm not sure how you come to that conclusion.

Third party implementations can implement the API, and re-use a protocol 
layer on top of it.

Or they could implement the protocol directly, in which case the caller 
would need to access them through the protocol, even when on the same 
machine.

Best regards, Julian

Re: MicroKernel API vs. protocol

Posted by Michael Dürig <md...@apache.org>.

On 20.3.12 10:57, Stefan Guggisberg wrote:
> On Mon, Mar 19, 2012 at 7:11 PM, Jukka Zitting<ju...@gmail.com>  wrote:
>> Hi,
>>
>> To help clarify the MK API I think it would be useful for us to
>> distinguish between the API as such and a potential related network
>> protocol used for accessing a remote MK deployment:
>>
>>     http://people.apache.org/~jukka/2012/oak-mk-protocol.png
>>
>> The MicroKernel interface as currently defined has many features of a
>> network protocol. For example all argument and return values are
>> serialized and the filter parameter was introduced to reduce the
>> amount of information that needs to pass across the interface.
>>
>> I think we need to question this design since dealing directly with a
>> "network protocol" -like API in oak-core will be quite cumbersome and
>> we'll in any case need to implement a separate wrapper layer on top of
>> it to hide most of the details (JSON formatting, blob streaming, etc.)
>> that aren't relevant to higher level functionality.
>
> i agree that it might make sense to implement a wrapper in the
> mk api consumer side, as long as we keep the current low-level
> api.

It would actually make sense the other way around: to have a strongly 
typed API with protocol bindings on top where needed.
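
A minimal sketch of that layering, under stated assumptions: TypedKernel
and JsonBinding are hypothetical names, and the JSON writer only handles
flat string properties with very basic escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a typed core API with a protocol binding on top.
final class BindingSketch {

    /** Typed core API (hypothetical): callers in the same JVM use this directly. */
    interface TypedKernel {
        Map<String, String> getProperties(String path);
    }

    /** Protocol binding layered on top: serializes only when a remote caller needs it. */
    static final class JsonBinding {
        private final TypedKernel kernel;

        JsonBinding(TypedKernel kernel) { this.kernel = kernel; }

        String getPropertiesAsJson(String path) {
            StringBuilder json = new StringBuilder("{");
            String separator = "";
            for (Map.Entry<String, String> e : kernel.getProperties(path).entrySet()) {
                json.append(separator)
                    .append(quote(e.getKey())).append(':').append(quote(e.getValue()));
                separator = ",";
            }
            return json.append('}').toString();
        }

        /** Minimal string quoting: escapes backslash and double quote only. */
        private static String quote(String s) {
            return '"' + s.replace("\\", "\\\\").replace("\"", "\\\"") + '"';
        }
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("title", "foo");
        TypedKernel kernel = path -> props;   // stand-in for a real implementation

        // Local caller: typed access, nothing is serialized.
        System.out.println(kernel.getProperties("/a").get("title"));
        // Remote caller: the binding serializes on demand.
        System.out.println(new JsonBinding(kernel).getPropertiesAsJson("/a"));
    }
}
```

With this arrangement the serialization cost is paid only on the remote
path, while local callers never see strings that need parsing.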

>
>>
>> So I think it would make more sense to rather redefine the MicroKernel
>> interface in terms of higher level constructs that abstract away the
>> protocol-level details. And to put the protocol-level bits (formatting
>> of diffs, etc.) into an actual protocol definition instead of a Java
>> interface. That protocol can then be implemented directly by a remote
>> MK implementation and consumed by a simple protocol binding for the
>> Java interface.
>
> i don't agree. IMO there's nothing wrong with the current api. it's
> intentionally

What is that intention? Is it still valid? I see quite a bit of risk 
of introducing a legacy here before we even start.

> low-level. just because it's very straight-forward to remote IMO that
> doesn't imply
> that it should be spec'ed as a protocol instead of an api. it's light-weight
> (very few methods) and relatively easy to implement. while i am certainly aware
> that it's controversial (non-oo, string-based etc) i've not seen convincing
> technical arguments so far. IMO it's rather a question of personal preferences.

I don't think so. By using strings instead of Java data types:
- we lose strong typing which will cause bugs which otherwise would be 
caught by the compiler
- we add complexity by the need to serialise/deserialise
- we mix the concerns of application logic and serialisation/deserialisation
- we cause headaches further down the line in maintaining the code later on
- we make the code harder to read which raises the bar for new developers
- we make refactoring harder and using data type based refactoring tools 
(like modern IDEs provide) impossible
- we add performance penalties caused by unnecessary 
serialisation/deserialisation
- we lose the ability of the compiler to optimize the code

Going down that route we are giving up on very well established 
engineering practices and we should have really good reasons for doing so.
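
The first point can be illustrated with a small sketch. Path and
Revision are hypothetical wrapper types invented for this example; the
idea is only that swapped arguments compile fine with plain strings but
would be rejected by the compiler with distinct types.

```java
// Hypothetical sketch: the wrapper types are assumptions for illustration.
final class TypedArgsSketch {

    static final class Path {
        final String value;
        Path(String value) { this.value = value; }
    }

    static final class Revision {
        final String value;
        Revision(String value) { this.value = value; }
    }

    /** String-based variant: swapped arguments compile and fail only at run time. */
    static String describe(String path, String revision) {
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("not a path: " + path);
        }
        return path + "@" + revision;
    }

    /** Typed variant: describe(revision, path) would be rejected by the compiler. */
    static String describe(Path path, Revision revision) {
        return describe(path.value, revision.value);
    }

    public static void main(String[] args) {
        System.out.println(describe(new Path("/a"), new Revision("r42")));  // prints "/a@r42"
        try {
            describe("r42", "/a");   // arguments swapped by mistake, still compiles
        } catch (IllegalArgumentException e) {
            System.out.println("caught at run time: " + e.getMessage());
        }
    }
}
```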

Michael

>
> i am absolutely sure that we should allow for different mk implementations.
> it doesn't  make sense to have 3rd party implementations implement a
> protocol instead of a single straight-forward api.
>
> therefore, -1 for replacing the current string-based api.
>
> cheers
> stefan
>
>>
>> As a concrete example of what this could mean is the getNodes() method:
>>
>>     String getNodes(String path, String revision, int depth, long
>> offset, int count, String filter)
>>
>> The last four arguments of this method are only relevant in terms of
>> serialization. A more expressive version of the method could be:
>>
>>     NodeState getNodeState(String path, String revision)
>>
>> Or possibly even:
>>
>>     NodeState getRootNodeState(String revision)
>>
>> WDYT?
>>
>> BR,
>>
>> Jukka Zitting

Re: MicroKernel API vs. protocol

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On Tue, Mar 20, 2012 at 10:57 AM, Stefan Guggisberg
<st...@gmail.com> wrote:
> i agree that it might make sense to implement a wrapper in the
> mk api consumer side, as long as we keep the current low-level
> api.

OK, let's go with that idea. See OAK-30 for my initial draft of how
this could work out based on the tree model we already discussed
earlier.

BR,

Jukka Zitting

Re: MicroKernel API vs. protocol

Posted by Stefan Guggisberg <st...@gmail.com>.

On Mon, Mar 19, 2012 at 7:11 PM, Jukka Zitting <ju...@gmail.com> wrote:
> Hi,
>
> To help clarify the MK API I think it would be useful for us to
> distinguish between the API as such and a potential related network
> protocol used for accessing a remote MK deployment:
>
>    http://people.apache.org/~jukka/2012/oak-mk-protocol.png
>
> The MicroKernel interface as currently defined has many features of a
> network protocol. For example all argument and return values are
> serialized and the filter parameter was introduced to reduce the
> amount of information that needs to pass across the interface.
>
> I think we need to question this design since dealing directly with a
> "network protocol" -like API in oak-core will be quite cumbersome and
> we'll in any case need to implement a separate wrapper layer on top of
> it to hide most of the details (JSON formatting, blob streaming, etc.)
> that aren't relevant to higher level functionality.

i agree that it might make sense to implement a wrapper in the
mk api consumer side, as long as we keep the current low-level
api.

>
> So I think it would make more sense to rather redefine the MicroKernel
> interface in terms of higher level constructs that abstract away the
> protocol-level details. And to put the protocol-level bits (formatting
> of diffs, etc.) into an actual protocol definition instead of a Java
> interface. That protocol can then be implemented directly by a remote
> MK implementation and consumed by a simple protocol binding for the
> Java interface.

i don't agree. IMO there's nothing wrong with the current api. it's
intentionally low-level. just because it's very straight-forward to
remote IMO that doesn't imply that it should be spec'ed as a protocol
instead of an api. it's light-weight (very few methods) and relatively
easy to implement. while i am certainly aware that it's controversial
(non-oo, string-based etc) i've not seen convincing technical arguments
so far. IMO it's rather a question of personal preferences.

i am absolutely sure that we should allow for different mk implementations.
it doesn't  make sense to have 3rd party implementations implement a
protocol instead of a single straight-forward api.

therefore, -1 for replacing the current string-based api.

cheers
stefan

>
> As a concrete example of what this could mean is the getNodes() method:
>
>    String getNodes(String path, String revision, int depth, long
> offset, int count, String filter)
>
> The last four arguments of this method are only relevant in terms of
> serialization. A more expressive version of the method could be:
>
>    NodeState getNodeState(String path, String revision)
>
> Or possibly even:
>
>    NodeState getRootNodeState(String revision)
>
> WDYT?
>
> BR,
>
> Jukka Zitting

Re: MicroKernel API vs. protocol

Posted by Julian Reschke <ju...@gmx.de>.

On 2012-03-19 23:34, Jukka Zitting wrote:
> Hi,
>
> On Mon, Mar 19, 2012 at 9:31 PM, Julian Reschke<ju...@gmx.de>  wrote:
>> There's also a concern that isn't directly about strings vs objects but
>> about "flat or not". Forcing getNodes to return things as a hierarchy, when
>> it also could be a list of objects, decorated with a path, will make it
>> harder than it needs to be to process efficiently.
>
> Do you have a particular use case or access pattern in mind?

Streaming the result of a huge collection. If you have a flat list of 
things keyed by the path, it's very easy to consume because you don't 
need to hold state about the nesting level.

>> I believe that in order to remote things efficiently, we still need to be
>> able to optimize the number of requests. This means asking for a set of
>> NodeStates, for a hierarchy, and also filtering the result set (or selecting
>> specific parts of the hierarchy).
>
> Note that one remote request doesn't necessarily need to map to just a
> single Java API call. Filtering a list of results can easily be done
> on top of the NodeState interface. Doing it below the interface is
> only useful in case we can expect a performance or other benefit from
> doing so. Do we?

If you make it hard to optimize the network layer, it, well, will be 
hard. We see this today in WebDAV-based remoting in SPI2DAV.

Also, selecting a subset of the information to be returned is not only 
about payload size but also about avoiding the cost of computing it.

>> I probably sound like a broken record but there's a reason why WebDAV's
>> PROPFIND/multistatus looks the way it does.
>
> Right. But as mentioned earlier, the constraints for WebDAV as a
> network protocol are quite different from those of a Java API that can
> leverage stuff like lazy loading. A method like the mentioned
> getRootNodeState(String revision) is perfectly fine for a Java API,
> whereas the equivalent WebDAV request would end up serializing the
> entire content tree.

I think we need an API that allows the caller to be very specific about 
what information is needed. If the API doesn't allow expressing this, 
we'll always fetch more information than we need at once, or end up 
doing many network requests instead of a single one.

Best regards, Julian

Re: MicroKernel API vs. protocol

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On Mon, Mar 19, 2012 at 9:31 PM, Julian Reschke <ju...@gmx.de> wrote:
> There's also a concern that isn't directly about strings vs objects but
> about "flat or not". Forcing getNodes to return things as a hierarchy, when
> it also could be a list of objects, decorated with a path, will make it
> harder than it needs to be to process efficiently.

Do you have a particular use case or access pattern in mind?

> I believe that in order to remote things efficiently, we still need to be
> able to optimize the number of requests. This means asking for a set of
> NodeStates, for a hierarchy, and also filtering the result set (or selecting
> specific parts of the hierarchy).

Note that one remote request doesn't necessarily need to map to just a
single Java API call. Filtering a list of results can easily be done
on top of the NodeState interface. Doing it below the interface is
only useful in case we can expect a performance or other benefit from
doing so. Do we?

> I probably sound like a broken record but there's a reason why WebDAV's
> PROPFIND/multistatus looks the way it does.

Right. But as mentioned earlier, the constraints for WebDAV as a
network protocol are quite different from those of a Java API that can
leverage stuff like lazy loading. A method like the mentioned
getRootNodeState(String revision) is perfectly fine for a Java API,
whereas the equivalent WebDAV request would end up serializing the
entire content tree.
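
A minimal sketch of both points, assuming a hypothetical LazyNode type:
children are loaded only on first access, and filtering is ordinary Java
code above the interface. The static counter merely simulates backend
reads so the laziness is visible.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical sketch: LazyNode is an assumption made for illustration.
final class LazyNodeSketch {

    /** Node whose children are fetched only when first accessed. */
    static final class LazyNode {
        private final Supplier<Map<String, LazyNode>> loader;
        private Map<String, LazyNode> children;   // null until first access

        LazyNode(Supplier<Map<String, LazyNode>> loader) { this.loader = loader; }

        Map<String, LazyNode> getChildren() {
            if (children == null) children = loader.get();
            return children;
        }

        /** Filtering done above the interface: plain Java over the child names. */
        List<String> childNames(Predicate<String> filter) {
            List<String> names = new ArrayList<>();
            for (String name : getChildren().keySet()) {
                if (filter.test(name)) names.add(name);
            }
            return names;
        }
    }

    static int loads = 0;   // counts simulated backend reads

    static LazyNode leaf() {
        return new LazyNode(() -> {
            loads++;
            return new LinkedHashMap<String, LazyNode>();
        });
    }

    /** Stand-in for getRootNodeState(revision): returns at once, loads nothing yet. */
    static LazyNode root() {
        return new LazyNode(() -> {
            loads++;
            Map<String, LazyNode> children = new LinkedHashMap<>();
            children.put("content", leaf());
            children.put("jcr:system", leaf());
            return children;
        });
    }

    public static void main(String[] args) {
        LazyNode root = root();
        System.out.println("loads after getting root: " + loads);
        System.out.println(root.childNames(name -> !name.startsWith("jcr:")));
        System.out.println("loads after listing children: " + loads);
    }
}
```

Only the root's child list is read here; the subtrees below "content"
and "jcr:system" are never touched.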

BR,

Jukka Zitting

Re: MicroKernel API vs. protocol

Posted by Julian Reschke <ju...@gmx.de>.

On 2012-03-20 09:07, Thomas Mueller wrote:
> ...
>> about "flat or not". Forcing getNodes to return things as a hierarchy,
>> when it also could be a list of objects, decorated with a path, will
>> make it harder than it needs to be to process efficiently.
>
> So far I didn't think this is a problem, but I'm not sure if I understand
> what you mean exactly. Could you give a example how an alternative
> representation would look like?
> ...

Right now we return a tree, such as:

"a" : { ":childCount" : 1, "title" : "foo",
	"b" : { ":childCount" : 0, "title" : "bar" }
}

The idea would be to return an array of flat node entries, each keyed by 
its path.

Such as

[ { "a" : { ":childCount" : 1, "title" : "foo" } },
  { "a/b" : { ":childCount" : 0, "title" : "bar" } } ]

etc...

The benefit is that it's slightly easier to process in a streaming way, 
because it's just an iterator over node infos, instead of a tree.
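
As a rough sketch of the difference, the code below flattens a nested
tree into (path, properties) pairs that a consumer can iterate without
tracking nesting depth. The nested-map representation and all names are
assumptions for illustration; a real streaming variant would emit the
entries lazily rather than collecting them into a list.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: flattening a nested node tree into path-keyed entries.
final class FlattenSketch {

    /** Flattens nested maps into (path, properties) pairs, parents before children. */
    static List<Map.Entry<String, Map<String, Object>>> flatten(Map<String, Object> tree) {
        List<Map.Entry<String, Map<String, Object>>> out = new ArrayList<>();
        collect("", tree, out);
        return out;
    }

    @SuppressWarnings("unchecked")
    private static void collect(String parentPath, Map<String, Object> node,
            List<Map.Entry<String, Map<String, Object>>> out) {
        for (Map.Entry<String, Object> e : node.entrySet()) {
            if (!(e.getValue() instanceof Map)) continue;   // plain property, not a child node
            String path = parentPath.isEmpty() ? e.getKey() : parentPath + "/" + e.getKey();
            Map<String, Object> child = (Map<String, Object>) e.getValue();

            // Copy only the child's own properties, leaving out nested nodes.
            Map<String, Object> properties = new LinkedHashMap<>();
            for (Map.Entry<String, Object> p : child.entrySet()) {
                if (!(p.getValue() instanceof Map)) properties.put(p.getKey(), p.getValue());
            }
            out.add(new AbstractMap.SimpleImmutableEntry<>(path, properties));
            collect(path, child, out);
        }
    }

    /** Sample tree with one node "a" that has one child "b". */
    static Map<String, Object> sampleTree() {
        Map<String, Object> b = new LinkedHashMap<>();
        b.put(":childCount", 0L);
        b.put("title", "bar");
        Map<String, Object> a = new LinkedHashMap<>();
        a.put(":childCount", 1L);
        a.put("title", "foo");
        a.put("b", b);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("a", a);
        return root;
    }

    public static void main(String[] args) {
        // The consumer is a plain loop: no stack, no nesting level to track.
        for (Map.Entry<String, Map<String, Object>> e : flatten(sampleTree())) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```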

Best regards, Julian

Re: MicroKernel API vs. protocol

Posted by Thomas Mueller <mu...@adobe.com>.

Hi,

>Serializing objects to strings just for the purpose of parsing
>them again makes little sense to me. ...
>doesn't need to be the only way to do it.

+1

>about "flat or not". Forcing getNodes to return things as a hierarchy,
>when it also could be a list of objects, decorated with a path, will
>make it harder than it needs to be to process efficiently.

So far I didn't think this was a problem, but I'm not sure if I
understand what you mean exactly. Could you give an example of what an
alternative representation would look like?

>in order to remote things efficiently, we still need to
>be able to optimize the number of requests. This means asking for a set
>of NodeStates, for a hierarchy, and also filtering the result set (or
>selecting specific parts of the hierarchy).

+1

Regards,
Thomas

Re: MicroKernel API vs. protocol

Posted by Julian Reschke <ju...@gmx.de>.

On 2012-03-19 19:11, Jukka Zitting wrote:
> Hi,
>
> To help clarify the MK API I think it would be useful for us to
> distinguish between the API as such and a potential related network
> protocol used for accessing a remote MK deployment:
>
>      http://people.apache.org/~jukka/2012/oak-mk-protocol.png
>
> The MicroKernel interface as currently defined has many features of a
> network protocol. For example all argument and return values are
> serialized and the filter parameter was introduced to reduce the

Indeed.

> amount of information that needs to pass across the interface.

...which I believe is something worth keeping even in a Java API.

> I think we need to question this design since dealing directly with a
> "network protocol" -like API in oak-core will be quite cumbersome and
> we'll in any case need to implement a separate wrapper layer on top of
> it to hide most of the details (JSON formatting, blob streaming, etc.)
> that aren't relevant to higher level functionality.

Right. Serializing objects to strings just for the purpose of parsing 
them again makes little sense to me. It's something that you indeed need 
to do on the wire, and yes, we should define this as well, but it 
doesn't need to be the only way to do it.

There's also a concern that isn't directly about strings vs objects but 
about "flat or not". Forcing getNodes to return things as a hierarchy, 
when it also could be a list of objects, decorated with a path, will 
make it harder than it needs to be to process efficiently.

> So I think it would make more sense to rather redefine the MicroKernel
> interface in terms of higher level constructs that abstract away the
> protocol-level details. And to put the protocol-level bits (formatting
> of diffs, etc.) into an actual protocol definition instead of a Java
> interface. That protocol can then be implemented directly by a remote
> MK implementation and consumed by a simple protocol binding for the
> Java interface.

+1

> As a concrete example of what this could mean is the getNodes() method:
>
>      String getNodes(String path, String revision, int depth, long
> offset, int count, String filter)
>
> The last four arguments of this method are only relevant in terms of
> serialization. A more expressive version of the method could be:
>
>      NodeState getNodeState(String path, String revision)
>
> Or possibly even:
>
>      NodeState getRootNodeState(String revision)
>
> WDYT?

I believe that in order to remote things efficiently, we still need to 
be able to optimize the number of requests. This means asking for a set 
of NodeStates, for a hierarchy, and also filtering the result set (or 
selecting specific parts of the hierarchy).

I probably sound like a broken record but there's a reason why WebDAV's 
PROPFIND/multistatus looks the way it does.

Best regards, Julian

Re: MicroKernel API vs. protocol

Posted by Thomas Mueller <mu...@adobe.com>.

Hi,

One advantage of the current MicroKernel interface is that it kind of
forces us to think about what nodes we really want to read, in order to
reduce the number of calls. That's an advantage if we want to use the
MicroKernel API for remoting (or JNI) in the future. Of course it's an
open question whether we will really ever use it in such a way :-)

>NodeState getRootNodeState(String revision)
>
>WDYT?

If we used such an interface, we wouldn't have to think about what
nodes we are actually interested in. That's an advantage and a
disadvantage of course.

Regards,
Thomas