Posted to dev@jackrabbit.apache.org by Stefan Guggisberg <st...@gmail.com> on 2011/05/11 18:31:18 UTC

[j3] Repository MicroKernel API draft

hi,

as some of you may have noticed i've started work on my own
MicroKernel proposal a while ago in the jackrabbit sandbox.

although the project is in a very early stage i wanted to share my work
with you.

my idea was to first come up with an abstraction of a bare-bone
MVCC-based repository storage engine and then test the feasibility
with a primitive prototype.

the source code is located here:
http://svn.apache.org/repos/asf/jackrabbit/sandbox/microkernel/

and here's some (still very basic) documentation:
http://wiki.apache.org/jackrabbit/RepositoryMicroKernel

as always, questions & feedback are welcome.

cheers
stefan

Re: [j3] Repository MicroKernel API draft

Posted by David Buchmann <da...@liip.ch>.

hi,

with the jackalope project, we implement a php binding for jcr that
talks over the spi connection to the jackrabbit backend.

talking to the microkernel over REST would be the natural evolution of
things - the closer to the current spi the better :-)

cheers,david


Am 10.06.2011 16:54, schrieb Thomas Mueller:
> Hi,
> 
>> Sure. I'm just questioning whether the benefits of a custom
>> implementation really are worth the time spent reinventing the wheel
>> and fixing all the inevitable bugs.
> 
> Yes. In fact, I believe it is actually much better to re-write the few
> JSON specific methods we need than trying to cobble together existing
> libraries with new code (trying to combine an existing JSON parser /
> builder with some self-written JSOP parser / generator), because that will
> inevitably lead to problems. It actually already does (see "TODO ugly
> hack", plus it's currently not possible to store the JSON as is without
> de-escaping).
> 
>> My bigger concern here is that the JSON handling seems to be happening
>> at a way too low level.
> 
> Yes, I'm working on this problem now. Please note it will still be
> necessary to parse the keys and some of the values. It is a low level to
> do that, and possibly we will end up replacing JSON with a binary format
> (such as BSON) if we find out it's really worth it (for performance or
> other reasons). But one advantage of using JSON (if nothing else) is that
> it's actually very easy to debug the code, which is very valuable at this
> stage.
> 
>> both MicroKernelImpl
>> classes in o.a.j.mk and o.a.j.mk.mem duplicating pretty much the same
>> parsing and serialization logic. Can we refactor that code into a
>> single class/location?
> 
> Sure, the plan is to merge that. I didn't want to overwrite Stefan's code
> (at least not before I really understand it), that's why I didn't even try
> to merge things currently.
> 
> Regards,
> Thomas
> 

-- 
Liip AG // Agile Web Development // T +41 26 422 25 11
CH-1700 Fribourg // PGP 0xA581808B // www.liip.ch

Re: [j3] Repository MicroKernel API draft

Posted by Thomas Mueller <mu...@adobe.com>.

Hi,

>Sure. I'm just questioning whether the benefits of a custom
>implementation really are worth the time spent reinventing the wheel
>and fixing all the inevitable bugs.

Yes. In fact, I believe it is actually much better to re-write the few
JSON specific methods we need than trying to cobble together existing
libraries with new code (trying to combine an existing JSON parser /
builder with some self-written JSOP parser / generator), because that will
inevitably lead to problems. It actually already does (see "TODO ugly
hack", plus it's currently not possible to store the JSON as is without
de-escaping).

>My bigger concern here is that the JSON handling seems to be happening
>at a way too low level.

Yes, I'm working on this problem now. Please note it will still be
necessary to parse the keys and some of the values. It is a low level to
do that, and possibly we will end up replacing JSON with a binary format
(such as BSON) if we find out it's really worth it (for performance or
other reasons). But one advantage of using JSON (if nothing else) is that
it's actually very easy to debug the code, which is very valuable at this
stage.

>both MicroKernelImpl
>classes in o.a.j.mk and o.a.j.mk.mem duplicating pretty much the same
>parsing and serialization logic. Can we refactor that code into a
>single class/location?

Sure, the plan is to merge that. I didn't want to overwrite Stefan's code
(at least not before I really understand it), that's why I didn't even try
to merge things currently.

Regards,
Thomas

Re: [j3] Repository MicroKernel API draft

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On Fri, Jun 10, 2011 at 3:13 PM, Thomas Mueller <mu...@adobe.com> wrote:
> This is similar to 'why do you build your own SQL-2 parser and don't use
> Lex/Flex/Yacc/Javacc/ANTLR/other parser tool', or 'why do you build your
> own cache and don't use Ehcache/other cache library', or 'why do you build
> your own (connection) pooling'. There are multiple reasons:

Sure. I'm just questioning whether the benefits of a custom
implementation really are worth the time spent reinventing the wheel
and fixing all the inevitable bugs. Anyway, this is more of an
implementation detail, so I don't really care that much how it's
really done.

My bigger concern here is that the JSON handling seems to be happening
at a way too low level. For example, we now have both MicroKernelImpl
classes in o.a.j.mk and o.a.j.mk.mem duplicating pretty much the same
parsing and serialization logic. Can we refactor that code into a
single class/location?

BR,

Jukka Zitting

Re: [j3] Repository MicroKernel API draft

Posted by Michael Dürig <md...@apache.org>.

> What I do worry about though, and why I did bring this up in the first
> place, is whether the API being defined ends up being easy and
> straightforward to use for key clients, as that's a major part in
> deciding whether the API is successful or not. Personally, with the
> assumption that I'll be writing notable parts of the higher-level
> components on top of the microkernel, I'd much prefer a type-safe API
> that doesn't require major string processing.

And like I said earlier people will start to write their own wrappers to 
cope with the unwieldiness of the original API. And the windows will be 
broken...

Michael

Re: [j3] Repository MicroKernel API draft

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On Tue, Jun 21, 2011 at 11:45 AM, Thomas Mueller <mu...@adobe.com> wrote:
> Otherwise the discussion is essentially hot air against hot air.

Agreed on the performance part. More notably, I'd say that the
performance or memory overhead of JSON vs. noJSON is pretty much
irrelevant when compared to the disk or network delays that we in any
case need to worry about.

What I do worry about though, and why I did bring this up in the first
place, is whether the API being defined ends up being easy and
straightforward to use for key clients, as that's a major part in
deciding whether the API is successful or not. Personally, with the
assumption that I'll be writing notable parts of the higher-level
components on top of the microkernel, I'd much prefer a type-safe API
that doesn't require major string processing.

BR,

Jukka Zitting

Re: [j3] Repository MicroKernel API draft

Posted by Thomas Mueller <mu...@adobe.com>.

Hi,

>Sounds like a theoretical discussion doesn't help here, only
>performance/resource usage comparisons can prove those differences.

Yes, that's why we need to build the prototype :-) Which we can then
compare against other implementations (my J3 prototype, Jackrabbit).

Otherwise the discussion is essentially hot air against hot air.

Regards,
Thomas

Re: [j3] Repository MicroKernel API draft

Posted by Alexander Klimetschek <ak...@adobe.com>.

On 21.06.11 09:13, "Thomas Mueller" <mu...@adobe.com> wrote:
>>What about using the
>>json library in-memory representations for the microkernel to avoid
>>serialization/parsing if not necessary?
>
>How would that make parsing and serialization _not_ necessary?

That very much depends on what is happening on the client & the
implementation of the microkernel. I was thinking that it might be typical
that a client has to convert from the JCR API (and an internal
representation for it) to the JSON string (serialization) and the
microkernel implementation would then have to parse it and convert into
its own persistence format.

Of course this was just an assumption - if the plan is to leverage the
JSON string form 1:1 in the persistence, then this might already work. But
I can imagine that sooner or later the kernel will do things like
syncing/merging of cluster nodes and might have to actually parse and
understand the strings.

>Currently, the MicroKernel doesn't fully parse the Json - it just splits
>the Json diff into tokens that are substrings of the Json diff. Tokenizing
>is faster than parsing because strings don't have to be de-escaped, and
>numbers don't have to be parsed. Also, generating the Json is fast because
>numbers don't have to be converted to string, and strings don't have to be
>escaped. So that's faster than using a regular Json parser / writer as
>well.

Makes sense, if this will stay this way.

> 
>And because in Java, String.substring creates a string that _shares_ the
>character array of the original string, the memory overhead of the
>MicroKernel is low (especially if there are a lot of string values, which
>seems to be the case).

Have you measured it? I only remember the old saying, that Java is only
slow because most Java programs do too much string processing, if people
compared it to C :-).

Sounds like a theoretical discussion doesn't help here, only
performance/resource usage comparisons can prove those differences.

Regards,
Alex

-- 
Alexander Klimetschek
Developer // Adobe (Day) // Berlin - Basel

Re: [j3] Repository MicroKernel API draft

Posted by Thomas Mueller <mu...@adobe.com>.

Hi,

>What about using the
>json library in-memory representations for the microkernel to avoid
>serialization/parsing if not necessary?

How would that make parsing and serialization _not_ necessary?

Currently, the MicroKernel doesn't fully parse the Json - it just splits
the Json diff into tokens that are substrings of the Json diff. Tokenizing
is faster than parsing because strings don't have to be de-escaped, and
numbers don't have to be parsed. Also, generating the Json is fast because
numbers don't have to be converted to string, and strings don't have to be
escaped. So that's faster than using a regular Json parser / writer as
well. 

And because in Java, String.substring creates a string that _shares_ the
character array of the original string, the memory overhead of the
MicroKernel is low (especially if there are a lot of string values, which
seems to be the case).

>The microkernel should not validate properties or do any fancy stuff with
>it, so a generic string -> value map is all there is, right?

It's a string -> string map. The value is the raw Json value (including
escape quotes and escape characters).
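
A minimal sketch of the tokenizing approach described above, not the actual MicroKernel code. The class name RawJsonTokenizer and the method split are hypothetical, and the sketch assumes a flat JSON object (no nested objects or arrays) with only loose checking of malformed input. It walks the input once and stores each value as the raw substring, so strings are not de-escaped and numbers are not parsed:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class RawJsonTokenizer {

    /** Splits a flat JSON object into key -> raw JSON value substrings. */
    static Map<String, String> split(String json) {
        Map<String, String> props = new LinkedHashMap<>();
        int pos = skipWhitespace(json, 0);
        if (pos >= json.length() || json.charAt(pos) != '{') {
            throw new IllegalArgumentException("expected '{'");
        }
        pos = skipWhitespace(json, pos + 1);
        if (json.charAt(pos) == '}') {
            return props; // empty object
        }
        while (true) {
            int keyEnd = endOfString(json, pos);
            // Surrounding quotes are dropped, escape sequences are kept as-is.
            String key = json.substring(pos + 1, keyEnd - 1);
            pos = skipWhitespace(json, keyEnd);
            if (json.charAt(pos) != ':') {
                throw new IllegalArgumentException("expected ':' at " + pos);
            }
            pos = skipWhitespace(json, pos + 1);
            int valueEnd = json.charAt(pos) == '"'
                    ? endOfString(json, pos)
                    : endOfLiteral(json, pos);
            // The value is stored raw: quotes and escapes are untouched.
            props.put(key, json.substring(pos, valueEnd));
            pos = skipWhitespace(json, valueEnd);
            char c = json.charAt(pos);
            if (c == '}') {
                return props;
            }
            if (c != ',') {
                throw new IllegalArgumentException("expected ',' at " + pos);
            }
            pos = skipWhitespace(json, pos + 1);
        }
    }

    /** Returns the index just past the closing quote of the string at start. */
    private static int endOfString(String s, int start) {
        if (s.charAt(start) != '"') {
            throw new IllegalArgumentException("expected '\"' at " + start);
        }
        for (int i = start + 1; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character without decoding it
            } else if (c == '"') {
                return i + 1;
            }
        }
        throw new IllegalArgumentException("unterminated string");
    }

    /** Returns the end index of a number, true, false or null literal. */
    private static int endOfLiteral(String s, int start) {
        int i = start;
        while (i < s.length() && ",} \t\r\n".indexOf(s.charAt(i)) < 0) {
            i++;
        }
        if (i == start) {
            throw new IllegalArgumentException("expected value at " + start);
        }
        return i;
    }

    private static int skipWhitespace(String s, int pos) {
        while (pos < s.length() && Character.isWhitespace(s.charAt(pos))) {
            pos++;
        }
        return pos;
    }
}
```

Whether the substrings share memory with the input depends on the JVM: since Java 7 update 6, String.substring copies the characters instead of sharing the original character array.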

Regards,
Thomas

Re: [j3] Repository MicroKernel API draft

Posted by Alexander Klimetschek <ak...@adobe.com>.

On 20.06.11 15:22, "Thomas Mueller" <mu...@adobe.com> wrote:
>>This is actually the way I'd prefer to go.
>
>Me too - otherwise, I wouldn't have implemented those classes :-)
>Val.toString() returns Json by the way.

If json provides the right amount of (standardized) unstructuredness I
guess it's useful (and can be "natively" remoted). What about using the
json library in-memory representations for the microkernel to avoid
serialization/parsing if not necessary?

>When implementing my jackrabbit-j3 prototype, I found that it makes sense
>to re-use the value implementation, but it might not make sense for the
>node implementation. And about property implementation: I think it's just
>not needed at all. There is no need for a property class, except on the
>JCR API level.

The microkernel should not validate properties or do any fancy stuff with
it, so a generic string -> value map is all there is, right?

Regards,
Alex

-- 
Alexander Klimetschek
Developer // Adobe (Day) // Berlin - Basel

Re: [j3] Repository MicroKernel API draft

Posted by Thomas Mueller <mu...@adobe.com>.

Hi,

>This is actually the way I'd prefer to go.

Me too - otherwise, I wouldn't have implemented those classes :-)
Val.toString() returns Json by the way.

>Define representations of
>commonly used entities (Value, Node, Property...) which can be easily
>used across the stack.

When implementing my jackrabbit-j3 prototype, I found that it makes sense
to re-use the value implementation, but it might not make sense for the
node implementation. And about property implementation: I think it's just
not needed at all. There is no need for a property class, except on the
JCR API level.

> If we'd carefully design these there would be no
>need to rewrap or repack at each level of the stack.

I think this 'no need to re-wrap' is a potential optimization, but we
don't _need_ this optimization at the moment. We could optimize this part
later on.

Regards,
Thomas

Re: [j3] Repository MicroKernel API draft

Posted by Michael Dürig <mi...@gmail.com>.

> I did implement low level 'value' and 'bundle' classes, which could be
> used by both the client and the MicroKernel:
> org.apache.jackrabbit.j3.mc.Val
> org.apache.jackrabbit.j3.mc.Bundle
>

This is actually the way I'd prefer to go. Define representations of
commonly used entities (Value, Node, Property...) which can be easily 
used across the stack. If we'd carefully design these there would be no 
need to rewrap or repack at each level of the stack.

Michael

Re: [j3] Repository MicroKernel API draft

Posted by Thomas Mueller <mu...@adobe.com>.

Hi,

>The MicroKernel implementation has to parse and reserialize JSON
>strings whenever it needs to process a diff or is given a nonstandard
>set of depth, offset and count values in a getNodes() call.

Currently, the MicroKernel doesn't really parse the Json in this case. It
only tokenizes the diff. It _is_ a loop over all bytes / characters, which
could be avoided when using a different serialization. This would be a
potential optimization for the future.

What it does is create the Json result using the in-memory Json-snippets.
This is basically three or four StringBuilder.append(..) calls per
property.
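
As a rough illustration of the "three or four append calls per property" point, here is a sketch under stated assumptions rather than the prototype's code. RawJsonWriter and write are hypothetical names; keys are assumed to be held in their already-escaped form and values as raw JSON snippets, so nothing is escaped or converted while writing:

```java
import java.util.Map;

// Hypothetical helper, for illustration only.
final class RawJsonWriter {

    /** Assembles a JSON object from key -> raw JSON value snippets. */
    static String write(Map<String, String> rawProps) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : rawProps.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            // Four appends per property: quote, key, quote+colon, raw value.
            sb.append('"').append(e.getKey()).append("\":").append(e.getValue());
        }
        return sb.append('}').toString();
    }
}
```

For a map holding count -> 42 and title -> "x" (with the quotes being part of the stored value), write returns {"count":42,"title":"x"}.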

>This parsing and reserialization is exactly the same stuff that any
>client needs to perform to use the API, which is duplication that
>could and IMHO should be avoided.

No matter what the MicroKernel API, the client will in many cases need a
different in-memory representation.

I did implement low level 'value' and 'bundle' classes, which could be
used by both the client and the MicroKernel:
org.apache.jackrabbit.j3.mc.Val
org.apache.jackrabbit.j3.mc.Bundle


Regards,
Thomas

Re: [j3] Repository MicroKernel API draft

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On Mon, Jun 20, 2011 at 12:50 PM, Thomas Mueller <mu...@adobe.com> wrote:
> Only if the implementation needs to do anything with the values (except
> for storing and retrieving). The MicroKernel doesn't care about most of
> the values - they are just stored "as is".

The MicroKernel implementation has to parse and reserialize JSON
strings whenever it needs to process a diff or is given a nonstandard
set of depth, offset and count values in a getNodes() call.

This parsing and reserialization is exactly the same stuff that any
client needs to perform to use the API, which is duplication that
could and IMHO should be avoided.

BR,

Jukka Zitting

Re: [j3] Repository MicroKernel API draft

Posted by Thomas Mueller <mu...@adobe.com>.

Hi,

>>why JSON strings?
>> - easy portability
>> - remoting-friendly
>> - leads to very compact API
>> - JSON parsing/generating overhead is IMO minimal
>>    and probably more efficient than creating (and collecting) tons of
>>    small java objects on the heap
>>
>
>I doubt this. In fact I think the situation with the small Java objects
>is even worse in the face of a JSON based API: typically a
>(de)serialization layer will create an additional set of intermediate
>objects which have to be consumed/translated into the domain objects of
>the respective implementation.

Only if the implementation needs to do anything with the values (except
for storing and retrieving). The MicroKernel doesn't care about most of
the values - they are just stored "as is". The only object that is
currently created for each value is a string that *shares the content*
with the Json diff string (which is part of the API). So the in-memory
overhead is low (on the MicroKernel side).

The MicroKernel *client* on the other hand anyway needs a more complex
object model. It doesn't make sense if the MicroKernel knows anything
about that.

I'm not saying the Json based API is the 'fastest possible' or the one
that uses the 'least' memory. But I also don't think the situation is very
bad.

Regards,
Thomas

Re: [j3] Repository MicroKernel API draft

Posted by Michael Dürig <md...@apache.org>.


On 20.6.11 12:36, Stefan Guggisberg wrote:
> On Mon, Jun 20, 2011 at 12:21 PM, Michael Dürig<md...@apache.org>  wrote:
>>
>>> why JSON strings?
>>> - easy portability
>>> - remoting-friendly
>>> - leads to very compact API
>>> - JSON parsing/generating overhead is IMO minimal
>>>    and probably more efficient than creating (and collecting) tons of
>>>    small java objects on the heap
>>>
>>
>> I doubt this. In fact I think the situation with the small Java objects is
>> even worse in the face of a JSON based API: typically a (de)serialization
>> layer will create an additional set of intermediate objects which have to be
>> consumed/translated into the domain objects of the respective
>> implementation.
>
> that's IMO implementation specific. the JSON response doesn't need to
> be parsed fully, the JSON string might just as well be kept instead and
> domain objects would be generated on demand from the underlying JSON data.

I very much doubt that this will work out. And it is completely opposite 
to what the current prototype looks like. Both, above and below the 
microkernel api.

Michael

>
>>
>> Michael
>>

Re: [j3] Repository MicroKernel API draft

Posted by Stefan Guggisberg <st...@gmail.com>.

On Mon, Jun 20, 2011 at 12:21 PM, Michael Dürig <md...@apache.org> wrote:
>
>> why JSON strings?
>> - easy portability
>> - remoting-friendly
>> - leads to very compact API
>> - JSON parsing/generating overhead is IMO minimal
>>   and probably more efficient than creating (and collecting) tons of
>>   small java objects on the heap
>>
>
> I doubt this. In fact I think the situation with the small Java objects is
> even worse in the face of a JSON based API: typically a (de)serialization
> layer will create an additional set of intermediate objects which have to be
> consumed/translated into the domain objects of the respective
> implementation.

that's IMO implementation specific. the JSON response doesn't need to
be parsed fully, the JSON string might just as well be kept instead and
domain objects would be generated on demand from the underlying JSON data.

>
> Michael
>

Re: [j3] Repository MicroKernel API draft

Posted by Michael Dürig <md...@apache.org>.

> why JSON strings?
> - easy portability
> - remoting-friendly
> - leads to very compact API
> - JSON parsing/generating overhead is IMO minimal
>    and probably more efficient than creating (and collecting) tons of
>    small java objects on the heap
>

I doubt this. In fact I think the situation with the small Java objects 
is even worse in the face of a JSON based API: typically a 
(de)serialization layer will create an additional set of intermediate 
objects which have to be consumed/translated into the domain objects of 
the respective implementation.

Michael

Re: [j3] Repository MicroKernel API draft

Posted by Stefan Guggisberg <st...@gmail.com>.

On Mon, Jun 20, 2011 at 12:43 PM, Michael Dürig <md...@apache.org> wrote:
>
>> agreed. since the audience of the MicroKernel API is pretty small
>> programmer-friendliness has admittedly not been a top priority ;)
>
> I think programmer-friendliness should have a higher priority then. Where
> did these priorities come from btw?

those priorities reflect my personal judgement, based on various coffee break
discussions, mailing-list discussions and 10 years working on jackrabbit core.

i've started the sandbox project to share it with interested parties
in the community
and hoping to be able to test the feasibility of this approach.

cheers
stefan

>
> Programmer-unfriendly APIs lead to programmers designing ad-hoc convenience
> wrappers which lead to fragmentation and broken windows. This is already now
> apparent from the jr3 prototype code base! Which is pretty alarming to me.
>
>> OTOH portability, remotability, stateless nature and compactness
>> have been. having those goals in mind, JSON is IMO a perfect fit.
>
> Again, where did these goals come from? What are the rationales?
>
> While I think having a stateless api is a good thing, I think we should not
> let this preclude other features and functionality. For example having the
> commit method contain all the transient changes in a single json string
> will limit the size of transient modifications.
>
> We could as well allow transient changes to be written to the micro kernel
> and then committed later on:
>
> changes.add(mk.writeTransient("+/foo/bar, {}"))
> changes.add(mk.writeTransient("+/foo2/bar2, {}"))
> ...
> mk.commit(changes)
>
> IMO this is as stateless as the other approach.
>
> Michael
>
>
>
>>
>> cheers
>> stefan
>>
>>>
>>> BR,
>>>
>>> Jukka Zitting
>>>
>

Re: [j3] Repository MicroKernel API draft

Posted by Michael Dürig <md...@apache.org>.

> agreed. since the audience of the MicroKernel API is pretty small
> programmer-friendliness has admittedly not been a top priority ;)

I think programmer-friendliness should have a higher priority then. 
Where did these priorities come from btw?

Programmer-unfriendly APIs lead to programmers designing ad-hoc
convenience wrappers which lead to fragmentation and broken windows. 
This is already now apparent from the jr3 prototype code base! Which is 
pretty alarming to me.

> OTOH portability, remotability, stateless nature and compactness
> have been. having those goals in mind, JSON is IMO a perfect fit.
Again, where did these goals come from? What are the rationales?

While I think having a stateless api is a good thing, I think we should
not let this preclude other features and functionality. For example
having the commit method contain all the transient changes in a
single json string will limit the size of transient modifications.

We could as well allow transient changes to be written to the micro 
kernel and then committed later on:

changes.add(mk.writeTransient("+/foo/bar, {}"))
changes.add(mk.writeTransient("+/foo2/bar2, {}"))
...
mk.commit(changes)

IMO this is as stateless as the other approach.
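
One way to picture the staged-write idea above is the following toy sketch. Everything in it is an assumption made for illustration: the class StagingKernel, the handle format ("t0", "t1", ...) and the revision format ("r1", "r2", ...) do not come from the MicroKernel draft, and the diff fragments are treated as opaque strings that are only recorded, not interpreted:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy kernel, for illustration only.
final class StagingKernel {
    private final Map<String, String> staged = new HashMap<>(); // handle -> diff
    private final List<String> journal = new ArrayList<>();     // committed diffs
    private int nextHandle;

    /** Stores one diff fragment and returns an opaque handle for it. */
    String writeTransient(String jsonDiff) {
        String handle = "t" + (nextHandle++);
        staged.put(handle, jsonDiff);
        return handle;
    }

    /** Records the staged fragments, in order, as one commit. */
    String commit(List<String> handles) {
        StringBuilder combined = new StringBuilder();
        // Validate every handle first so a bad one leaves nothing half-applied.
        for (String handle : handles) {
            String diff = staged.get(handle);
            if (diff == null) {
                throw new IllegalArgumentException("unknown handle: " + handle);
            }
            combined.append(diff).append('\n');
        }
        for (String handle : handles) {
            staged.remove(handle);
        }
        journal.add(combined.toString());
        return "r" + journal.size(); // new revision id
    }
}
```

In this sketch the caller keeps only the list of handles, and no single string has to hold every transient change; the fragments themselves are held by the kernel until commit is called.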

Michael



>
> cheers
> stefan
>
>>
>> BR,
>>
>> Jukka Zitting
>>

Re: [j3] Repository MicroKernel API draft

Posted by Stefan Guggisberg <st...@gmail.com>.

On Sun, Jun 19, 2011 at 3:15 PM, Jukka Zitting <ju...@gmail.com> wrote:
> Hi,
>
> On Fri, Jun 17, 2011 at 6:37 PM, Stefan Guggisberg
> <st...@gmail.com> wrote:
>> one of the design goals of the MicroKernel API was "easy portability".
>> typically it would be used in-proc by some higher level code, comparable
>> to the current SPI.
>
> To me it seems like the JSON parts of the API would be better suited
> for a higher-level integration layer.
>
> The problem here is that since the MicroKernel is an intentionally
> low-level API, we'll need a lot of higher level code to implement
> features like versioning and search.

correct.

> My assumption is that these
> components would still reside in the same JVM as the MicroKernel and
> thus access it through the defined Java interface.

typically yes, but not necessarily.

> Are we expecting
> such code to have to parse and generate JSON strings whenever it wants
> to access the underlying content?

yes, and i don't think that it's a problem.


>
> Consider, for example, a simple task of updating a counter. The JCR
> API for that is something like this:
>
>    Property count = session.getProperty("/counter/count");
>    count.setValue(count.getLong() + 1);
>    session.save();
>
> The equivalent MicroKernel code, as far as I understand the API, would
> be something like this:
>
>    String revision = microkernel.getHeadRevision();
>    String counter = microkernel.getNodes("/counter", revision, 0, 0, 0);
>    JSONObject json = new JSONObject(counter);
>    long count = json.getLong("count") + 1;
>    revision = microkernel.commit("/counter", "^count:" + count, revision);
>
> This doesn't strike me as a particularly programmer-friendly API.

agreed. since the audience of the MicroKernel API is pretty small
programmer-friendliness has admittedly not been a top priority ;)

OTOH portability, remotability, stateless nature and compactness
have been. having those goals in mind, JSON is IMO a perfect fit.

cheers
stefan

>
> BR,
>
> Jukka Zitting
>

Re: [j3] Repository MicroKernel API draft

Posted by Michael Dürig <md...@apache.org>.

> 2) The complexity of the API: this is *a lot* simpler than the SPI.

You are mixing things up here. The SPI interfaces cope with versioning, 
access rights, search, workspaces, name spaces, node types, import, 
locking and observation. All of which is not present in the micro 
kernel. That's where most of the additional complexity comes from.


> 3) No API change required if we add features. This is similar to using SQL
> as the API (as in ODBC, JDBC): the API is simple ("execute(String sql)"),
> so both clients and servers can evolve without having to change the API a
> lot if there is a new feature. Actually, JDBC is worse because the data
> types are part of the API.

public static void main(String[] args) {
     Jackrabbit3.create().execute(args[0]);
}

No more API changes. Ever!

Honestly, I think we better live with the changes, face them and manage 
them instead of hiding them behind a string based api with ad-hoc 
semantics.

Michael

>
>> The problem here is that since the MicroKernel is an intentionally
>> low-level API, we'll need a lot of higher level code to implement
>> features like versioning and search.
>
> Versioning: In the current implementation, versioned content is like
> regular content, except for a different path and additional properties. I
> would probably use a similar mechanism for Jackrabbit 3. I don't see how
> using a different API would help here.
>
> Search: We didn't start implementing search yet, and we didn't discuss
> this yet. I think it would be too early to define an API at that point, or
> even the search architecture. A few options are:
>
> - Use Lucene as we do now.
> - Use some other indexing mechanism; don't store data in the MicroKernel.
> - Use some other indexing mechanism; store data in the MicroKernel.
> - The MicroKernel could provide low-level indexing features
>    (to be defined).
>
>> Consider, for example, a simple task of updating a counter. The JCR
>> API for that is something like this:
>>
>>     Property count = session.getProperty("/counter/count");
>>     count.setValue(count.getLong() + 1);
>>     session.save();
>>
>> The equivalent MicroKernel code, as far as I understand the API, would
>> be something like this:
>>
>>     String revision = microkernel.getHeadRevision();
>>     String counter = microkernel.getNodes("/counter", revision, 0, 0, 0);
>>     JSONObject json = new JSONObject(counter);
>>     long count = json.getLong("count") + 1;
>>     revision = microkernel.commit("/counter", "^count:" + count,
>> revision);
>>
>> This doesn't strike me as a particularly programmer-friendly API.
>
> It's actually quite programmer-friendly in that you can easily debug
> (having everything as strings). I think this actually doesn't look too bad
> (for a low-level, internal API). It's a bit simpler/different though:
>
> ...
> String counter = microkernel.getNodes("/counter", revision);
> ...
> revision = microkernel.commit("/counter", "^ \"count\":" + count,
> revision);
>
> If it turns out that "incrementing" is very important, we could also add a
> new feature for it (without having to change the API):
>
> microkernel.commit("/counter", "+= \"count\": 1", revision);
>
> Regards,
> Thomas
>

Re: [j3] Repository MicroKernel API draft

Posted by Thomas Mueller <mu...@adobe.com>.

Hi,

>To me it seems like the JSON parts of the API would be better suited
>for a higher-level integration layer.

What other API do you suggest, and why/how would it be better than JSON? A
few advantages of using JSON:

1) Loose coupling: the MicroKernel doesn't have to know all the details
about the higher level (data types,...).

2) The complexity of the API: this is *a lot* simpler than the SPI.

3) No API change required if we add features. This is similar to using SQL
as the API (as in ODBC, JDBC): the API is simple ("execute(String sql)"),
so both clients and servers can evolve without having to change the API a
lot if there is a new feature. Actually, JDBC is worse because the data
types are part of the API.

>The problem here is that since the MicroKernel is an intentionally
>low-level API, we'll need a lot of higher level code to implement
>features like versioning and search.

Versioning: In the current implementation, versioned content is like
regular content, except for a different path and additional properties. I
would probably use a similar mechanism for Jackrabbit 3. I don't see how
using a different API would help here.

Search: We didn't start implementing search yet, and we didn't discuss
this yet. I think it would be too early to define an API at that point, or
even the search architecture. A few options are:

- Use Lucene as we do now.
- Use some other indexing mechanism; don't store data in the MicroKernel.
- Use some other indexing mechanism; store data in the MicroKernel.
- The MicroKernel could provide low-level indexing features
  (to be defined).

>Consider, for example, a simple task of updating a counter. The JCR
>API for that is something like this:
>
>    Property count = session.getProperty("/counter/count");
>    count.setValue(count.getLong() + 1);
>    session.save();
>
>The equivalent MicroKernel code, as far as I understand the API, would
>be something like this:
>
>    String revision = microkernel.getHeadRevision();
>    String counter = microkernel.getNodes("/counter", revision, 0, 0, 0);
>    JSONObject json = new JSONObject(counter);
>    long count = json.getLong("count") + 1;
>    revision = microkernel.commit("/counter", "^count:" + count,
>revision);
>
>This doesn't strike me as a particularly programmer-friendly API.

It's actually quite programmer-friendly in that you can easily debug
(having everything as strings). I think this actually doesn't look too bad
(for a low-level, internal API). It's a bit simpler/different though:

...
String counter = microkernel.getNodes("/counter", revision);
...
revision = microkernel.commit("/counter", "^ \"count\":" + count,
revision);

If it turns out that "incrementing" is very important, we could also add a
new feature for it (without having to change the API):

microkernel.commit("/counter", "+= \"count\": 1", revision);
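
The read-modify-write sequence discussed in this thread can be sketched end to end as follows. This is a self-contained toy under stated assumptions, not the draft API: the Kernel interface is a hypothetical three-method subset modelled on the calls quoted above, ToyKernel keeps a single property per path, ignores the revision when reading and understands only the ^ "name":value diff form, and the count is extracted with plain string handling instead of a JSON library:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, for illustration only.
final class CounterSketch {

    /** Assumed subset of the draft API, modelled on the calls in this thread. */
    interface Kernel {
        String getHeadRevision();
        String getNodes(String path, String revision);
        String commit(String path, String jsonDiff, String revision);
    }

    /** Toy in-memory kernel: one raw JSON object per path. */
    static final class ToyKernel implements Kernel {
        private final Map<String, String> nodes = new HashMap<>();
        private int head;

        public String getHeadRevision() {
            return "r" + head;
        }

        public String getNodes(String path, String revision) {
            // The toy always serves the latest state, whatever the revision.
            return nodes.getOrDefault(path, "{}");
        }

        public String commit(String path, String jsonDiff, String revision) {
            if (!jsonDiff.startsWith("^")) {
                throw new IllegalArgumentException("unsupported diff: " + jsonDiff);
            }
            // The text after '^' is stored as the node's only property.
            nodes.put(path, "{" + jsonDiff.substring(1).trim() + "}");
            return "r" + (++head);
        }
    }

    /** Increments the numeric "count" property and returns the new revision. */
    static String increment(Kernel mk, String path) {
        String revision = mk.getHeadRevision();
        String json = mk.getNodes(path, revision); // e.g. {"count":41}
        String key = "\"count\":";
        int at = json.indexOf(key);
        long count = 0; // a missing property is treated as zero
        if (at >= 0) {
            int start = at + key.length();
            int end = start;
            while (end < json.length()
                    && (Character.isDigit(json.charAt(end)) || json.charAt(end) == '-')) {
                end++;
            }
            count = Long.parseLong(json.substring(start, end));
        }
        return mk.commit(path, "^ \"count\":" + (count + 1), revision);
    }
}
```

A real client would more likely hand the getNodes result to a JSON parser, as in the JSONObject example quoted earlier; the manual scan here only keeps the sketch free of external dependencies.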

Regards,
Thomas

Re: [j3] Repository MicroKernel API draft

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On Fri, Jun 17, 2011 at 6:37 PM, Stefan Guggisberg
<st...@gmail.com> wrote:
> one of the design goals of the MicroKernel API was "easy portability".
> typically it would be used in-proc by some higher level code, comparable
> to the current SPI.

To me it seems like the JSON parts of the API would be better suited
for a higher-level integration layer.

The problem here is that since the MicroKernel is an intentionally
low-level API, we'll need a lot of higher level code to implement
features like versioning and search. My assumption is that these
components would still reside in the same JVM as the MicroKernel and
thus access it through the defined Java interface. Are we expecting
such code to have to parse and generate JSON strings whenever it wants
to access the underlying content?

Consider, for example, a simple task of updating a counter. The JCR
API for that is something like this:

    Property count = session.getProperty("/counter/count");
    count.setValue(count.getLong() + 1);
    session.save();

The equivalent MicroKernel code, as far as I understand the API, would
be something like this:

    String revision = microkernel.getHeadRevision();
    String counter = microkernel.getNodes("/counter", revision, 0, 0, 0);
    JSONObject json = new JSONObject(counter);
    long count = json.getLong("count") + 1;
    revision = microkernel.commit("/counter", "^count:" + count, revision);

This doesn't strike me as a particularly programmer-friendly API.

BR,

Jukka Zitting

Re: [j3] Repository MicroKernel API draft

Posted by Stefan Guggisberg <st...@gmail.com>.

sorry for being late...

On Fri, Jun 10, 2011 at 1:18 PM, Jukka Zitting <ju...@gmail.com> wrote:
> Hi,
>
> On Wed, May 11, 2011 at 6:31 PM, Stefan Guggisberg
> <st...@gmail.com> wrote:
>> as some of you may have noticed i've started work on my own
>> MicroKernel proposal a while ago in the jackrabbit sandbox.
>>
>> although the project is in a very early stage i wanted to share my work
>> with you.
>
> Good stuff! I see people are already starting to collaborate on this.
>
>> as always, questions & feedback are welcome.
>
> The MicroKernel interface [1] reads more like a REST than a Java API.
> I guess that's the intention,

no, it just ended up like this after several redesign cycles ;)

> but I question why one would ever want
> to pass serialized JSON strings around in a Java application. A
> Java client would just parse the string again, leading to unnecessary
> serialize/parse rounds whenever an API call is made. So, assuming we
> are designing a REST API (which seems like a good idea), instead of
> defining the API as a Java interface, wouldn't it make more sense to
> start directly with an HTTP binding or alternatively a more abstract API
> definition?

one of the design goals of the MicroKernel API was "easy portability".
typically it would be used in-proc by some higher level code, comparable
to the current SPI.

why JSON strings?
- easy portability
- remoting-friendly
- leads to a very compact API
- JSON parsing/generating overhead is IMO minimal
  and probably more efficient than creating (and collecting) tons of
  small java objects on the heap

cheers
stefan

>
> Another question: Why would we ever want to build our own JSON parsing
> and serialization code? Just use one of the existing libraries out
> there.
>
> [1] http://svn.apache.org/repos/asf/jackrabbit/sandbox/microkernel/src/main/java/org/apache/jackrabbit/mk/api/MicroKernel.java
>
> BR,
>
> Jukka Zitting
>

Re: [j3] Repository MicroKernel API draft

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On Fri, Jun 10, 2011 at 1:41 PM, Michael Dürig <mi...@gmail.com> wrote:
>> Another question: Why would we ever want to build our own JSON parsing
>> and serialization code? Just use one of the existing libraries out
>> there.
>
> One of the reasons is that we are not strictly JSON, i.e. the order of
> the items matters.

Sling already has the org.apache.sling.commons.json component that
implements JSON handling using code forked from the external JSON Java
library [1].

[1] http://www.json.org/java/

BR,

Jukka Zitting

Re: [j3] Repository MicroKernel API draft

Posted by Michael Dürig <mi...@gmail.com>.

Hi,

> Another question: Why would we ever want to build our own JSON parsing
> and serialization code? Just use one of the existing libraries out
> there.

One of the reasons is that we are not strictly JSON, i.e. the order of
the items matters.
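
A small sketch of why ordering is a concern: a library that models a JSON object as a hash map gives no guarantee about member order, while an insertion-ordered map keeps it. OrderedJson and toJson are hypothetical names used only for this illustration, and the sketch assumes member names need no escaping.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class OrderedJson {
    // Serializes members in the map's iteration order.
    // Assumes names need no escaping, to keep the sketch short.
    static String toJson(Map<String, Long> members) {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, Long> e : members.entrySet()) {
            if (sb.length() > 1) sb.append(',');
            sb.append('"').append(e.getKey()).append("\":").append(e.getValue());
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        Map<String, Long> ordered = new LinkedHashMap<>();
        ordered.put("zebra", 1L);
        ordered.put("apple", 2L);
        ordered.put("mango", 3L);
        // Insertion order is kept by LinkedHashMap.
        System.out.println(toJson(ordered));
        // A HashMap with the same entries may iterate in a different order.
        System.out.println(toJson(new HashMap<>(ordered)));
    }
}
```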

Michael

Re: [j3] Repository MicroKernel API draft

Posted by Thomas Mueller <mu...@adobe.com>.

Hi,

>Why would we ever want to build our own JSON parsing and serialization
>code? Just use one of the existing libraries out there.

This is similar to 'why do you build your own SQL-2 parser and don't use
Lex/Flex/Yacc/Javacc/ANTLR/other parser tool', or 'why do you build your
own cache and don't use Ehcache/other cache library', or 'why do you build
your own (connection) pooling'. There are multiple reasons:

a) Part of it is JSOP (JSON DIFF) and not JSON. There is no 'standard'
JSOP tokenizer or parser yet, except the one Angela made (within
Jackrabbit - it seems you didn't notice that).

b) A large part of the JSON doesn't need to be fully parsed. The plan is
to store and return the 'raw' JSON, similar to how other modern systems do
it:
http://stackoverflow.com/questions/853265/databases-using-json-as-storage-transport-format
- values don't need to be de-escaped for storage; instead, the text is
stored 'as is'. Using a full blown parser will unnecessarily slow down
processing.

c) One problem is how to preserve the property type of a value. JSON only
supports very few data types. See also
http://en.wikipedia.org/wiki/JSON#Unsupported_native_data_types - There is
a relatively simple solution which _requires_ that the MicroKernel doesn't
re-format the JSON text: http://en.wikipedia.org/wiki/JSON#cite_note-8

d) Because JSON is so simple, there is simply very little code needed for
tokenizing, parsing, and especially generating JSON.

e) The existing JSON parsers I found are simply a pain to use. I want a
tokenizer, not a parser. Not a DOM-style parser that unnecessarily creates
a huge number of little objects. And not a callback/event/handler style
parser where you have to remember the state in some really ugly way. Part
of the current MicroKernel uses org.json.simple.parser.JSONParser, and I
actually find that part of the code painful and ugly.
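
To illustrate points b), d) and e), here is a minimal sketch of a pull-style tokenizer that hands out raw token text: no object tree, no callbacks, and string values are returned exactly as written, without de-escaping. RawJsonTokenizer is a hypothetical name; this is not the tokenizer in the sandbox or the one in Jackrabbit, and it does no validation beyond detecting unterminated strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pull-style tokenizer, for illustration only.
class RawJsonTokenizer {
    private static final String DELIMITERS = "{}[]:,";

    private final String text;
    private int pos = 0;

    RawJsonTokenizer(String text) {
        this.text = text;
    }

    // Returns the next raw token, or null at end of input.
    String next() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        if (pos >= text.length()) {
            return null;
        }
        int start = pos;
        char c = text.charAt(pos);
        if (DELIMITERS.indexOf(c) >= 0) {
            pos++;
        } else if (c == '"') {
            pos++; // opening quote
            while (pos < text.length() && text.charAt(pos) != '"') {
                if (text.charAt(pos) == '\\') {
                    pos++; // skip the backslash; the escaped char is skipped below
                }
                pos++;
            }
            if (pos >= text.length()) {
                throw new IllegalArgumentException("unterminated string at " + start);
            }
            pos++; // closing quote
        } else {
            // Number, literal (true, false, null) or operator: read up to the
            // next delimiter, quote or whitespace.
            while (pos < text.length()
                    && DELIMITERS.indexOf(text.charAt(pos)) < 0
                    && text.charAt(pos) != '"'
                    && !Character.isWhitespace(text.charAt(pos))) {
                pos++;
            }
        }
        return text.substring(start, pos);
    }

    // Convenience helper: collects all tokens of the input.
    static List<String> tokens(String json) {
        RawJsonTokenizer t = new RawJsonTokenizer(json);
        List<String> result = new ArrayList<>();
        for (String tok = t.next(); tok != null; tok = t.next()) {
            result.add(tok);
        }
        return result;
    }

    public static void main(String[] args) {
        String json = "{\"title\":\"a \\\"quoted\\\" word\",\"count\":42}";
        for (String tok : tokens(json)) {
            System.out.println(tok);
        }
    }
}
```

Because the string token keeps its quotes and escapes, a caller could store it and write it back unchanged, which is the 'as is' storage described in b).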

Regards,
Thomas

Re: [j3] Repository MicroKernel API draft

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On Wed, May 11, 2011 at 6:31 PM, Stefan Guggisberg
<st...@gmail.com> wrote:
> as some of you may have noticed i've started work on my own
> MicroKernel proposal a while ago in the jackrabbit sandbox.
>
> although the project is in a very early stage i wanted to share my work
> with you.

Good stuff! I see people are already starting to collaborate on this.

> as always, questions & feedback are welcome.

The MicroKernel interface [1] reads more like a REST than a Java API.
I guess that's the intention, but I question why one would ever want
to pass serialized JSON strings around in a Java application. A
Java client would just parse the string again, leading to unnecessary
serialize/parse rounds whenever an API call is made. So, assuming we
are designing a REST API (which seems like a good idea), instead of
defining the API as a Java interface, wouldn't it make more sense to
start directly with an HTTP binding or alternatively a more abstract API
definition?

Another question: Why would we ever want to build our own JSON parsing
and serialization code? Just use one of the existing libraries out
there.

[1] http://svn.apache.org/repos/asf/jackrabbit/sandbox/microkernel/src/main/java/org/apache/jackrabbit/mk/api/MicroKernel.java

BR,

Jukka Zitting

Re: [j3] Repository MicroKernel API draft

Posted by Thomas Mueller <mu...@adobe.com>.

Hi,

I started working on an in-memory implementation of the API. This should
be useful for testing. I copied some of the code from my jackrabbit-j3
prototype.

Regards,
Thomas
