Posted to oak-dev@jackrabbit.apache.org by Julian Reschke <ju...@gmx.de> on 2012/03/14 14:54:03 UTC

Semantics of MicroKernel.getNodes()

Hi there,

I'm looking at MicroKernel.getNodes(), and I believe it might perform 
suboptimally unless we get the property filtering right.

A typical use case is browsing a repository using a tree view.

A tree view usually needs, given a node N:

- all or a subset of all properties of node N

- the set of child node names, and for each of these child nodes, a 
predefined set of properties that will allow the caller to decorate the 
node properly (such as whether it's a container, and maybe the type).

Another problem is the String-fits-all return type: it makes it 
impossible to stream the result to the client, which makes the behavior 
for large collections suboptimal (the caller needs to wait for the 
complete JSON string to be ready before it can start forwarding 
information up the stack).


Best regards, Julian

Re: Semantics of MicroKernel.getNodes()

Posted by Stefan Guggisberg <st...@gmail.com>.

On Thu, Mar 15, 2012 at 9:57 AM, Bart van der Schans
<b....@onehippo.com> wrote:
> Hi,
>
> On Wed, Mar 14, 2012 at 2:54 PM, Julian Reschke <ju...@gmx.de> wrote:
>> Hi there,
>>
>> I'm looking at MicroKernel.getNodes(), and I believe the semantics might
>> perform non-optimal unless we get the property filtering right.
>>
>> A typical use case always is browsing a repository using a tree view.
>>
>> A tree view usually needs, given a node N:
>>
>> - all or a subset of all properties of node N
>>
>> - the set of child node names, and for each of these child nodes, a
>> predefined set of properties that will allow the caller to decorate the node
>> properly -- such as whether it's a container, and maybe the type).
>
> Not sure if it matters at all for this specific thread, but another
> very common use case with trees:
> - a check if node N has children or not

there's the system generated :childNodeCount property
and there's also the getChildNodeCount(String path, String revId) method.
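
To illustrate the property-based variant: a minimal sketch, assuming the
JSON returned by getNodes() for node N carries ':childNodeCount' as a plain
number. The class and method names below are made up for illustration and
are not part of the MicroKernel API; a real client would use a JSON parser
rather than a regular expression.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the MicroKernel API.
class ChildCountCheck {

    // Matches "":childNodeCount": <digits>" (assumed response format).
    private static final Pattern COUNT =
            Pattern.compile("\":childNodeCount\"\\s*:\\s*(\\d+)");

    // Returns the first child node count found in the JSON, or -1 if absent.
    static long childNodeCount(String json) {
        Matcher m = COUNT.matcher(json);
        return m.find() ? Long.parseLong(m.group(1)) : -1;
    }

    // True only if a count is present and greater than zero.
    static boolean hasChildren(String json) {
        return childNodeCount(json) > 0;
    }
}
```

Note that the regex picks up the first occurrence only, so this is only
meaningful for a response that does not include nested child nodes before
the top-level count.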

cheers
stefan

>
> That check should/could be a lot more lightweight than fetching a list
> of child node names.
>
> Regards,
> Bart

Re: Semantics of MicroKernel.getNodes()

Posted by Bart van der Schans <b....@onehippo.com>.

Hi,

On Wed, Mar 14, 2012 at 2:54 PM, Julian Reschke <ju...@gmx.de> wrote:
> Hi there,
>
> I'm looking at MicroKernel.getNodes(), and I believe the semantics might
> perform non-optimal unless we get the property filtering right.
>
> A typical use case always is browsing a repository using a tree view.
>
> A tree view usually needs, given a node N:
>
> - all or a subset of all properties of node N
>
> - the set of child node names, and for each of these child nodes, a
> predefined set of properties that will allow the caller to decorate the node
> properly -- such as whether it's a container, and maybe the type).

Not sure if it matters at all for this specific thread, but another
very common use case with trees:
- a check if node N has children or not

That check should/could be a lot more lightweight than fetching a list
of child node names.

Regards,
Bart

Re: Semantics of MicroKernel.getNodes()

Posted by Julian Reschke <ju...@gmx.de>.

On 2012-03-15 09:44, Stefan Guggisberg wrote:
> On Wed, Mar 14, 2012 at 4:47 PM, Michael Dürig<md...@apache.org>  wrote:
>>
>>
>> On 14.3.12 14:54, Julian Reschke wrote:
>>>
>>> Hi there,
>>>
>>> I'm looking at MicroKernel.getNodes(), and I believe the semantics might
>>> perform non-optimal unless we get the property filtering right.
>>>
>>> A typical use case always is browsing a repository using a tree view.
>>>
>>> A tree view usually needs, given a node N:
>>>
>>> - all or a subset of all properties of node N
>>>
>>> - the set of child node names, and for each of these child nodes, a
>>> predefined set of properties that will allow the caller to decorate the
>>> node properly -- such as whether it's a container, and maybe the type).
>>
>>
>> Couldn't we use the filter parameter for such cases. AFAIK the parameter is
>> currently in the API only and its semantics is not defined yet. So if we
>> come up with the right semantics for it, wouldn't that work?
>
> i added the filter parameter for specifying the properties to be
> included in the json response but the format/syntax is still TBD.
>
> IMO there should be an implicit default filter (e.g all user-defined properties
> +  the system-defined ':childNodeCount' property).
>
> the ':hash' property should only be included on demand, i.e. when it is
> specified in the filter.
>
> cheers
> stefan
> ...

That sounds familiar; WebDAV PROPFIND works in a similar way: the 
default is "allprop", and it excludes system properties that may be 
irrelevant, expensive to compute, or both. See 
<http://greenbytes.de/tech/webdav/rfc4918.html#rfc.section.9.1>

Best regards, Julian

Re: Semantics of MicroKernel.getNodes()

Posted by Stefan Guggisberg <st...@gmail.com>.

On Wed, Mar 14, 2012 at 4:47 PM, Michael Dürig <md...@apache.org> wrote:
>
>
> On 14.3.12 14:54, Julian Reschke wrote:
>>
>> Hi there,
>>
>> I'm looking at MicroKernel.getNodes(), and I believe the semantics might
>> perform non-optimal unless we get the property filtering right.
>>
>> A typical use case always is browsing a repository using a tree view.
>>
>> A tree view usually needs, given a node N:
>>
>> - all or a subset of all properties of node N
>>
>> - the set of child node names, and for each of these child nodes, a
>> predefined set of properties that will allow the caller to decorate the
>> node properly -- such as whether it's a container, and maybe the type).
>
>
> Couldn't we use the filter parameter for such cases. AFAIK the parameter is
> currently in the API only and its semantics is not defined yet. So if we
> come up with the right semantics for it, wouldn't that work?

i added the filter parameter for specifying the properties to be
included in the json response but the format/syntax is still TBD.

IMO there should be an implicit default filter (e.g all user-defined properties
+  the system-defined ':childNodeCount' property).

the ':hash' property should only be included on demand, i.e. when it is
specified in the filter.
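
a rough sketch of that default rule, since the filter syntax is still TBD.
everything here is hypothetical: the class and method names are invented,
system properties are assumed to be those whose names start with ':', and
listing a name in the filter is assumed to add it to the default set rather
than replace the default.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the default-filter rule; not MicroKernel code.
class DefaultPropertyFilter {

    // Assumption: system-generated property names start with ':'.
    static boolean isSystemProperty(String name) {
        return name.startsWith(":");
    }

    // Decides whether a property is included in the response.
    // 'requested' holds the names explicitly listed in the filter
    // (empty when the caller passed no filter).
    static boolean include(String name, Set<String> requested) {
        if (requested.contains(name)) {
            return true;                       // asked for on demand, e.g. ":hash"
        }
        if (!isSystemProperty(name)) {
            return true;                       // user-defined: included by default
        }
        return name.equals(":childNodeCount"); // the only implicit system property
    }

    // Returns a copy of 'props' containing only the included properties.
    static Map<String, Object> apply(Map<String, Object> props, Set<String> requested) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : props.entrySet()) {
            if (include(e.getKey(), requested)) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }
}
```

with an empty 'requested' set this keeps user-defined properties plus
':childNodeCount' and drops ':hash'; passing ':hash' in 'requested' brings
it back.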

cheers
stefan


>
>
>> Another problem is the String-fits-all return type; it would make it
>> impossible to implement streaming of the result to the client; which
>> will make the behavior for large collections non-optimal (the caller
>> needs to wait for the complete JSON string to be ready before it can
>> start forwarding information up the stack).
>
>
> Right. It was decided very early in the process to make the API string based
> in order to be language agnostic (i.e. leave the option for a C
> implementation). However we should really re-evaluate this requirement and
> its consequences taking into account the current situation.
>
> Michael
>
>>
>>
>> Best regards, Julian

Re: Semantics of MicroKernel.getNodes()

Posted by Julian Reschke <ju...@gmx.de>.

On 2012-03-14 16:47, Michael Dürig wrote:
>
>
> On 14.3.12 14:54, Julian Reschke wrote:
>> Hi there,
>>
>> I'm looking at MicroKernel.getNodes(), and I believe the semantics might
>> perform non-optimal unless we get the property filtering right.
>>
>> A typical use case always is browsing a repository using a tree view.
>>
>> A tree view usually needs, given a node N:
>>
>> - all or a subset of all properties of node N
>>
>> - the set of child node names, and for each of these child nodes, a
>> predefined set of properties that will allow the caller to decorate the
>> node properly -- such as whether it's a container, and maybe the type).
>
> Couldn't we use the filter parameter for such cases. AFAIK the parameter
> is currently in the API only and its semantics is not defined yet. So if
> we come up with the right semantics for it, wouldn't that work?

It probably will, and we should keep this use case in mind when we 
define it.

>> Another problem is the String-fits-all return type; it would make it
>> impossible to implement streaming of the result to the client; which
>> will make the behavior for large collections non-optimal (the caller
>> needs to wait for the complete JSON string to be ready before it can
>> start forwarding information up the stack).
>
> Right. It was decided very early in the process to make the API string
> based in order to be language agnostic (i.e. leave the option for a C
> implementation). However we should really re-evaluate this requirement
> and its consequences taking into account the current situation.

String-based doesn't necessarily imply it needs to use Java String 
objects (and yes, Thomas, I saw your comment and will have a look at that).

In general, if Java talks to Java of course it's a waste of time to 
serialize things into a String that need to be parsed again next. The 
JSON data model may be the right thing, and using application/json on 
the *wire* may make a lot of sense, but that doesn't necessarily mean it 
needs to be the only thing a Java API to the MicroKernel accepts...

Best regards, Julian

Re: Semantics of MicroKernel.getNodes()

Posted by Michael Dürig <md...@apache.org>.

On 14.3.12 14:54, Julian Reschke wrote:
> Hi there,
>
> I'm looking at MicroKernel.getNodes(), and I believe the semantics might
> perform non-optimal unless we get the property filtering right.
>
> A typical use case always is browsing a repository using a tree view.
>
> A tree view usually needs, given a node N:
>
> - all or a subset of all properties of node N
>
> - the set of child node names, and for each of these child nodes, a
> predefined set of properties that will allow the caller to decorate the
> node properly -- such as whether it's a container, and maybe the type).

Couldn't we use the filter parameter for such cases? AFAIK the parameter 
is currently in the API only and its semantics are not defined yet. So if 
we come up with the right semantics for it, wouldn't that work?

> Another problem is the String-fits-all return type; it would make it
> impossible to implement streaming of the result to the client; which
> will make the behavior for large collections non-optimal (the caller
> needs to wait for the complete JSON string to be ready before it can
> start forwarding information up the stack).

Right. It was decided very early in the process to make the API 
string-based in order to be language agnostic (i.e. leave the option for 
a C implementation). However, we should really re-evaluate this 
requirement and its consequences, taking into account the current situation.

Michael

>
>
> Best regards, Julian

To String or not To String (was: Semantics of MicroKernel.getNodes())

Posted by Felix Meschberger <fm...@adobe.com>.

Hi,

On 15.03.2012 at 09:49, Stefan Guggisberg wrote:

> On Wed, Mar 14, 2012 at 2:54 PM, Julian Reschke <ju...@gmx.de> wrote:
>> Another problem is the String-fits-all return type; it would make it
>> impossible to implement streaming of the result to the client; which will
>> make the behavior for large collections non-optimal (the caller needs to
>> wait for the complete JSON string to be ready before it can start forwarding
>> information up the stack).
> 
> good point. how about returning a character stream (Reader/Readable)
> instead of a string?

I would like to start discussing whether it really makes sense to have String as the argument type of the methods, with the String values actually being serialized JSON data.

Almost all interaction will involve serialization and deserialization, which costs resources (time, CPU, memory).

It has been said that this is done to allow for non-Java implementations. I think this argument is not really valid. All programming languages in use today allow for structured data (even plain old C). So there is no need for a String API.

It has been said that for writing to the storage or for remoting a String is more helpful. I hold that the use of the data is an implementation detail. And if this detail were important enough to make it into the API, it would really have to be byte[].

It has been said that there are wrappers on top of the API to provide a data-structure-oriented API. It has also been said that if we have such wrappers, something sounds wrong. I tend to agree with this.

So I think that we should really replace the String-typed arguments and results, which are said to be serialized data structures, with real data structures we can document and fill with semantics.

Regards
Felix

Re: Semantics of MicroKernel.getNodes()

Posted by Julian Reschke <ju...@gmx.de>.

On 2012-03-15 09:49, Stefan Guggisberg wrote:
> On Wed, Mar 14, 2012 at 2:54 PM, Julian Reschke<ju...@gmx.de>  wrote:
>> Hi there,
>>
>> I'm looking at MicroKernel.getNodes(), and I believe the semantics might
>> perform non-optimal unless we get the property filtering right.
>>
>> A typical use case always is browsing a repository using a tree view.
>>
>> A tree view usually needs, given a node N:
>>
>> - all or a subset of all properties of node N
>>
>> - the set of child node names, and for each of these child nodes, a
>> predefined set of properties that will allow the caller to decorate the node
>> properly -- such as whether it's a container, and maybe the type).
>>
>> Another problem is the String-fits-all return type; it would make it
>> impossible to implement streaming of the result to the client; which will
>> make the behavior for large collections non-optimal (the caller needs to
>> wait for the complete JSON string to be ready before it can start forwarding
>> information up the stack).
>
> good point. how about returning a character stream (Reader/Readable)
> instead of a string?
> ...

I think it would help.

A more general issue is the reliance on a JSON-type tree; this looks 
simple, but it means that returning information about lots of nested 
child nodes makes it harder to process the result, unless you're willing 
to wait for the end.

I always wondered why WebDAV multistatus messages contain a flat list of 
response elements (with the path being part of the response) instead of 
nesting. I believe I now know why: it makes streaming *much* easier (and 
before somebody claims that's academic: I have implemented that in a 
server in the past).
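
To make the flat-list point concrete, here is a minimal sketch. It is not
WebDAV and not MicroKernel code: TreeNode and FlatEmitter are invented
names, and the one-line-per-node output format is only an assumption chosen
for illustration. Each node is written as a self-contained record carrying
its own path, so a consumer can act on a record as soon as it arrives
instead of waiting for the enclosing object to be closed.

```java
import java.io.PrintWriter;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory node; stands in for whatever the store provides.
class TreeNode {
    final Map<String, String> properties = new LinkedHashMap<>();
    final Map<String, TreeNode> children = new LinkedHashMap<>();
}

class FlatEmitter {

    // Writes one record per node, depth-first: "<path> <properties>\n".
    // Every record is complete on its own and is flushed immediately,
    // so it can be forwarded before the traversal has finished.
    static void emit(String path, TreeNode node, PrintWriter out) {
        out.print(path + " " + node.properties + "\n");
        out.flush();
        for (Map.Entry<String, TreeNode> child : node.children.entrySet()) {
            String childPath = path.equals("/")
                    ? "/" + child.getKey()
                    : path + "/" + child.getKey();
            emit(childPath, child.getValue(), out);
        }
    }
}
```

With a nested representation, the record for the root would only be
complete after the last descendant had been written.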

Best regards, Julian

Re: Semantics of MicroKernel.getNodes()

Posted by Stefan Guggisberg <st...@gmail.com>.

On Wed, Mar 14, 2012 at 2:54 PM, Julian Reschke <ju...@gmx.de> wrote:
> Hi there,
>
> I'm looking at MicroKernel.getNodes(), and I believe the semantics might
> perform non-optimal unless we get the property filtering right.
>
> A typical use case always is browsing a repository using a tree view.
>
> A tree view usually needs, given a node N:
>
> - all or a subset of all properties of node N
>
> - the set of child node names, and for each of these child nodes, a
> predefined set of properties that will allow the caller to decorate the node
> properly -- such as whether it's a container, and maybe the type).
>
> Another problem is the String-fits-all return type; it would make it
> impossible to implement streaming of the result to the client; which will
> make the behavior for large collections non-optimal (the caller needs to
> wait for the complete JSON string to be ready before it can start forwarding
> information up the stack).

good point. how about returning a character stream (Reader/Readable)
instead of a string?
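
a small sketch of what a caller could do with a Reader-based result. the
class name is made up and no such getNodes() variant exists yet; the sketch
only assumes the response is available as a java.io.Reader. the caller
forwards each chunk as soon as it has been read instead of waiting for the
complete string.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical consumer of a Reader-returning getNodes() variant.
class StreamingForwarder {

    // Copies characters from 'in' to 'out' in small chunks, so the first
    // part of the response can be forwarded before the last part exists.
    // Returns the number of characters copied.
    static long forward(Reader in, Writer out) {
        char[] buf = new char[1024];
        long total = 0;
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                total += n;
            }
            out.flush();
        } catch (IOException e) {
            // Rethrown unchecked only to keep the sketch short.
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```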

cheers
stefan

>
>
> Best regards, Julian

Re: Semantics of MicroKernel.getNodes()

Posted by Thomas Mueller <mu...@adobe.com>.

Hi,

Somewhat related is that in getJournal, the json diff is "wrapped" as a
json string value. That means the json diff is escaped in the MicroKernel,
and has to be de-escaped on the client.

To avoid this, getJournal could return a json diff that is not wrapped,
but instead matches the json diff passed in the commit method. Simplified
example:

@{"msg": "commit message", "revision": "12345"}
+ "/test": "hello" {}
@{"msg": "commit message", "revision": "45678"}
+ "/test/hello": "world" {}

The @{} contains the commit metadata, which could also be supported in the
commit method. That way, commit would mirror getJournal.
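
To show what the wrapping described at the top of this message costs, here
is a minimal sketch. JournalEscaping is an invented name and this is not
the MicroKernel's actual code; it only handles quote, backslash and
newline, which is enough to show how a diff line changes once it is
embedded as a JSON string value.

```java
// Hypothetical helper illustrating the escaping overhead; not MicroKernel code.
class JournalEscaping {

    // Encodes 'diff' as a JSON string value. Only quote, backslash and
    // newline are handled, which is sufficient for this illustration.
    static String wrap(String diff) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : diff.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                default:   sb.append(c);
            }
        }
        return sb.append('"').toString();
    }
}
```

Every quote in the diff gains a backslash on the way out, and the client
has to undo that before it can parse the diff.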

Regards,
Thomas

On 3/14/12 4:01 PM, "Thomas Mueller" <mu...@adobe.com> wrote:

>Hi,
>
>>Another problem is the String-fits-all return type; it would make it
>>impossible to implement streaming of the result to the client; which
>>will make the behavior for large collections non-optimal (the caller
>>needs to wait for the complete JSON string to be ready before it can
>>start forwarding information up the stack).
>
>To avoid Strings, I wrote the interface
>org.apache.jackrabbit.mk.wrapper.Wrapper (extends MicroKernel) and the
>abstract class WrapperBase, where Strings are replaced with a JsopReader.
>
>Regards,
>Thomas
>
>
>

Re: Semantics of MicroKernel.getNodes()

Posted by Thomas Mueller <mu...@adobe.com>.

Hi,

>Another problem is the String-fits-all return type; it would make it
>impossible to implement streaming of the result to the client; which
>will make the behavior for large collections non-optimal (the caller
>needs to wait for the complete JSON string to be ready before it can
>start forwarding information up the stack).

To avoid Strings, I wrote the interface
org.apache.jackrabbit.mk.wrapper.Wrapper (extends MicroKernel) and the
abstract class WrapperBase, where Strings are replaced with a JsopReader.

Regards,
Thomas