Posted to dev@jackrabbit.apache.org by Thomas Müller <th...@day.com> on 2010/03/15 13:00:35 UTC

[jr3] MicroKernel prototype

Hi,

I committed a prototype for the Jackrabbit 3 architecture. See:

Source code:
http://svn.apache.org/repos/asf/jackrabbit/sandbox/jackrabbit-j3/

Documentation:
http://wiki.apache.org/jackrabbit/MicroKernelPrototype

Feedback is welcome.

Regards,
Thomas

Re: [jr3] MicroKernel prototype

Posted by Guo Du <mr...@gmail.com>.

On Thu, Mar 18, 2010 at 10:24 AM, Thomas Müller <th...@day.com> wrote:
>> - security
>> - locking
>> - scalability (number of concurrent sessions and repository size)
>> - transactions
> OK, I will then try to implement (prototype) those features now.
Hi Thomas, no pressure to implement those features right away. Take your
time and prioritize features according to your plan.

Thanks!

-Guo

Re: [jr3] MicroKernel prototype

Posted by Thomas Müller <th...@day.com>.

Hi,

> it's too early IMO to judge whether a caching hierarchy manager is
> needed or not...
> IMO the only statement that can be made based on your comparison
> is that if the prototype with very limited functionality were slower than
> jackrabbit with a fully implemented feature set, the prototype's architecture
> would probably need to be reconsidered ;)

I agree.

> - security
> - locking
> - scalability (number of concurrent sessions and repository size)
> - transactions

OK, I will then try to implement (prototype) those features now.

> very flat hierarchies

Yes. We do want to solve that. It will affect the architecture, and we
don't have much experience yet with how best to solve it. So I guess it's
also one of the features that should be implemented early.

Regards,
Thomas

Re: [jr3] MicroKernel prototype

Posted by Stefan Guggisberg <st...@gmail.com>.

On Wed, Mar 17, 2010 at 7:42 PM, Thomas Müller <th...@day.com> wrote:
> Hi,
>
>> i doubt that the results of this comparison are in any way significant.
>
> It was not supposed to be a fair comparison :-) Of course the
> prototype doesn't implement all features. For example, paths are parsed
> in a very simplistic way. I don't think the end result will be as fast
> as the prototype. Still, I do hope that the missing features will not
> slow down the code significantly if they are not used. And if they are
> used, the penalty shouldn't be too high.
>
> What is significant is: the prototype is not slower than the "full"
> Jackrabbit, even without the CachingHierarchyManager.

some of the 'missing' features are path-based (e.g. locking).
it's too early IMO to judge whether a caching hierarchy manager is
needed or not...

IMO the only statement that can be made based on your comparison
is that if the prototype with very limited functionality were slower than
jackrabbit with a fully implemented feature set, the prototype's architecture
would probably need to be reconsidered ;)

> For me that's
> relatively important because it would simplify the architecture. More
> tests are required to check if the current architecture works well
> even if there are millions of nodes and many concurrent sessions. And
> it's important to add more features of course.
>
> I'm wondering which are the *most* problematic features to verify the
> architecture:
>
> - security
> - orderable child nodes
> - same name siblings
> - locking
> - transactions
> - clustering
> - observation
> - workspaces
> - node types
> - large number of child nodes
> - search
> - correct path parsing and lookup
> - multiple sessions

in my experience the most critical features performance-wise
are

- security
- locking
- scalability (number of concurrent sessions and repository size)
- transactions

very flat hierarchies are performance-critical as well.
however, since i consider them rather an edge case
i wouldn't focus on optimizing the architecture for this
very specific use case. i agree that we should support
it but performance for reasonably distributed hierarchies
(up to say 10k child entries per node) should be optimized
since this is probably the most common use case.

cheers
stefan

>
>> cut some features to gain performance improvement.
>
> I'm not sure. What features could be cut?
>
> Regards,
> Thomas
>

Re: [jr3] MicroKernel prototype

Posted by Alexander Klimetschek <ak...@day.com>.

Hi Thomas,

results look good! I think it's clear that this is not a final and
fair comparison, but it's good to be able to see which of the missing
features actually make it more complex and/or slower as they are
implemented.

On Wed, Mar 17, 2010 at 19:42, Thomas Müller <th...@day.com> wrote:
> I'm wondering which are the *most* problematic features to verify the
> architecture:

I have no real answer, just a prioritization of what should be fast
and what need not be (I reordered the bullet points accordingly):

> - orderable child nodes
> - transactions
> - clustering
> - observation
> - correct path parsing and lookup
> - multiple sessions
> - large number of child nodes (as discussed for jr3)

Core features IMO, must be as fast as possible.

> - same name siblings
> - node types
> - locking

I think the above features should be "optional" and only add
additional computations if used (eg. if a non-nt:unstructured node
type is used, etc.).
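
A minimal Java sketch of what "optional" could mean here, assuming a
plug-in style hook list. All names (WriteCheck, SaveSketch) are
hypothetical and are not part of Jackrabbit or the prototype. With no
checks enabled, the save path does no extra work; each enabled feature
only adds its own check.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical hook for an optional feature (node type validation, locking, ...). */
interface WriteCheck {
    void beforeSave(String path, String nodeType);
}

/** Hypothetical save path; not the Jackrabbit or prototype API. */
class SaveSketch {
    private final List<WriteCheck> checks = new ArrayList<>();
    private int saved;

    /** Plugs in an optional feature. */
    void enable(WriteCheck check) {
        checks.add(check);
    }

    void save(String path, String nodeType) {
        // With no feature enabled the list is empty and this loop does nothing.
        for (WriteCheck check : checks) {
            check.beforeSave(path, nodeType); // may throw to reject the write
        }
        saved++; // stands in for the actual persistence call
    }

    int savedCount() {
        return saved;
    }

    public static void main(String[] args) {
        SaveSketch repo = new SaveSketch();
        repo.save("/a", "my:type"); // no checks enabled: accepted as-is

        // Enable a (very simplistic) node type check.
        repo.enable((path, type) -> {
            if (!"nt:unstructured".equals(type)) {
                throw new IllegalArgumentException("unknown node type: " + type);
            }
        });
        repo.save("/b", "nt:unstructured");
        System.out.println("saved: " + repo.savedCount());
    }
}
```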

> - security

Since this typically has to work on each node/property read, it has a
major impact. However, I guess that it is already quite optimized in
Jackrabbit 2.0.

> - workspaces

Shouldn't impact performance - just like a larger number of total
nodes in a single workspace should not affect performance.

> - search

Lies in-between... the problem with search is the (required ?)
transactionality on writes.

>> cut some features to gain performance improvement.
>
> I'm not sure. What features could be cut?

None, it should still be fully jcr 2.0 compliant. But as we discussed,
it can deliberately prefer certain use cases.

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com

Re: [jr3] MicroKernel prototype

Posted by Guo Du <mr...@gmail.com>.

On Wed, Mar 17, 2010 at 6:42 PM, Thomas Müller <th...@day.com> wrote:
> I'm wondering which are the *most* problematic features to verify the
> architecture:
> - security
> - orderable child nodes
> - same name siblings
> - locking
> - transactions
> - clustering
> - observation
> - workspaces
> - node types
> - large number of child nodes
> - search
> - correct path parsing and lookup
> - multiple sessions
>> cut some features to gain performance improvement.
> I'm not sure. What features could be cut?
Your list of *most* problematic features. Those features will be
implemented on top of the prototype microkernel. If they are plugged in as
optional features, users can remove the ones they don't need. Of
course, users are then responsible for the consequences :)

-Guo

Re: [jr3] MicroKernel prototype

Posted by Thomas Müller <th...@day.com>.

Hi,

> i doubt that the results of this comparison are in any way significant.

It was not supposed to be a fair comparison :-) Of course the
prototype doesn't implement all features. For example, paths are parsed
in a very simplistic way. I don't think the end result will be as fast
as the prototype. Still, I do hope that the missing features will not
slow down the code significantly if they are not used. And if they are
used, the penalty shouldn't be too high.

What is significant is: the prototype is not slower than the "full"
Jackrabbit, even without the CachingHierarchyManager. For me that's
relatively important because it would simplify the architecture. More
tests are required to check if the current architecture works well
even if there are millions of nodes and many concurrent sessions. And
it's important to add more features of course.

I'm wondering which are the *most* problematic features to verify the
architecture:

- security
- orderable child nodes
- same name siblings
- locking
- transactions
- clustering
- observation
- workspaces
- node types
- large number of child nodes
- search
- correct path parsing and lookup
- multiple sessions

> cut some features to gain performance improvement.

I'm not sure. What features could be cut?

Regards,
Thomas

Re: [jr3] MicroKernel prototype

Posted by Guo Du <mr...@gmail.com>.

On Wed, Mar 17, 2010 at 4:45 PM, Stefan Guggisberg
<st...@gmail.com> wrote:
> jackrabbit is the reference implementation of JCR 1.0/2.0 and therefore has
> to fully support all the spec'ed features (node types, same name siblings,
> locking, access control, etc etc).
I agree that the default distribution should comply with the jcr spec.
But I hope jr3 will be architected to be able to cut some features to gain
performance improvement.

-Guo

Re: [jr3] MicroKernel prototype

Posted by Stefan Guggisberg <st...@gmail.com>.

On Tue, Mar 16, 2010 at 7:46 PM, Thomas Müller <th...@day.com> wrote:
> Hi,
>
> I have some early performance test results: There is a test with 3
> levels of child nodes (each node 20 children)
> (TestSimple.createReadNodes).
>
> With the JDBC storage and the H2 database, this is about 14 times
> faster than the Jackrabbit 2.0 trunk (0.2 seconds versus 2.9 seconds
> for Jackrabbit 2.0). This is after 3 test runs. The storage space
> usage is about 1/3 (2.8 MB for the prototype versus 9.5 MB for
> Jackrabbit 2.0).

sorry, but i doubt that the results of this comparison are in any way significant.
jackrabbit is the reference implementation of JCR 1.0/2.0 and therefore has
to fully support all the spec'ed features (node types, same name siblings,
locking, access control, etc etc).

the prototype in its current state deliberately omits most of those 'features'
which makes the comparison somewhat unfair.

i don't doubt that the prototype is on the right track and that the resulting
implementation will indeed provide significant performance gains.
i just think that making such claims at this early stage is a bit premature.

cheers
stefan

>
> Regards,
> Thomas
>

Re: [jr3] MicroKernel prototype

Posted by Felix Meschberger <fm...@gmail.com>.

Hi,

On 17.03.2010 10:34, Thomas Müller wrote:
> Hi,
> 
>> In your opinion, which part contributes most to the performance gain?
> 
> It's hard to say. I would rather spend my time to work on the
> prototype than to find out.

Agreed. And at the end of the day, it is probably not very interesting
given that the new architecture should start from scratch and not try to
enhance the existing.

> To keep the prototype fast, it's important
> to always check performance.

This is a dangerous sentence as it gives way to premature optimization ;-)

> I guess we didn't always do that for
> Jackrabbit unfortunately. By the way, if you use a slower persistence
> layer then the gain will be much less.

I think the approach to Jackrabbit 3 must be first and foremost to come
up with a clean and easy to understand architecture. This is much more
important than raw performance.

Once you have an implementation based on a clear architecture, you can
analyze where you can lose weight.

(and yes, I know it is not black and white and performance must be kept
in mind; it just must not be the first goal)

Regards
Felix

> 
>> Does the prototype have a similar cache size?
> 
> Both Jackrabbit and the prototype use about 20 MB at the end of the
> run. The prototype uses a weak reference map (so does the current
> Jackrabbit as far as I know) and doesn't have to read data from the
> persistence layer. The current Jackrabbit does read, not sure why. If
> I disable this cache in the prototype, the test takes 0.24 seconds
> instead of 0.2 seconds.
> 
> Regards,
> Thomas
>

Re: [jr3] MicroKernel prototype

Posted by Guo Du <mr...@gmail.com>.

On Wed, Mar 17, 2010 at 9:34 AM, Thomas Müller <th...@day.com> wrote:
>> In your opinion, which part contributes most to the performance gain?
> It's hard to say. I would rather spend my time to work on the
> prototype than to find out. To keep the prototype fast, it's important
I understand it will take effort to identify the hot spots. Performance
is not the top priority, especially for a prototype. But if we know
where and why our implementations are slow or fast, it could help the new
code base.

-Guo

Re: [jr3] MicroKernel prototype

Posted by Thomas Müller <th...@day.com>.

Hi,

> In your opinion, which part contributes most to the performance gain?

It's hard to say. I would rather spend my time to work on the
prototype than to find out. To keep the prototype fast, it's important
to always check performance. I guess we didn't always do that for
Jackrabbit unfortunately. By the way, if you use a slower persistence
layer then the gain will be much less.

> Does the prototype have a similar cache size?

Both Jackrabbit and the prototype use about 20 MB at the end of the
run. The prototype uses a weak reference map (so does the current
Jackrabbit as far as I know) and doesn't have to read data from the
persistence layer. The current Jackrabbit does read, not sure why. If
I disable this cache in the prototype, the test takes 0.24 seconds
instead of 0.2 seconds.
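
For readers unfamiliar with the idea, a weak reference map cache can be
sketched in Java roughly as follows. This is only an illustration, not
the prototype's code: the class name WeakValueCache and the loader
function (standing in for a read from the persistence layer) are
assumptions. An entry stays usable only while something else still
holds a strong reference to the value; once the garbage collector
clears it, the next access loads the value again.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical cache with weakly referenced values; not the prototype's class. */
class WeakValueCache<K, V> {
    private final Map<K, WeakReference<V>> entries = new HashMap<>();
    private final Function<K, V> loader; // stands in for the persistence layer
    private int loads;

    WeakValueCache(Function<K, V> loader) {
        this.loader = loader;
    }

    synchronized V get(K key) {
        WeakReference<V> ref = entries.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            // Never cached, or already cleared by the garbage collector.
            value = loader.apply(key);
            loads++;
            entries.put(key, new WeakReference<>(value));
        }
        return value;
    }

    /** Number of times the loader had to be called. */
    synchronized int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        WeakValueCache<String, StringBuilder> cache =
                new WeakValueCache<>(path -> new StringBuilder("state of " + path));
        StringBuilder first = cache.get("/a");
        StringBuilder again = cache.get("/a"); // served from the cache
        System.out.println((first == again) + ", loads=" + cache.loadCount());
    }
}
```

This sketch never removes map entries whose values were collected; a
real implementation would probably purge them, for example with a
ReferenceQueue.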

Regards,
Thomas

Re: [jr3] MicroKernel prototype

Posted by Guo Du <mr...@gmail.com>.

On Tue, Mar 16, 2010 at 6:46 PM, Thomas Müller <th...@day.com> wrote:
> With the JDBC storage and the H2 database, this is about 14 times
> faster than the Jackrabbit 2.0 trunk (0.2 seconds versus 2.9 seconds
Thanks for the exciting result.

In your opinion, which part contributes most to the performance gain?
Does the prototype have a similar cache size?

-Guo

Re: [jr3] MicroKernel prototype

Posted by Felix Meschberger <fm...@gmail.com>.

Hi,

On 17.03.2010 12:19, Thomas Müller wrote:
> Hi,
> 
>> premature optimization
> 
> Sure, premature optimization should be avoided. But sometimes you need
> to validate that a certain architecture / algorithm doesn't result in
> very slow or unmaintainable code. I mean, that's the reason to write a
> prototype: so you can actually test it.
> 
>> clean and easy to understand architecture
> 
> Yes, make it as easy as possible. For example, I didn't implement a
> CachingHierarchyManager. I was a bit afraid it would be a problem, but
> it looks like it's not. That's good.
> 
>> lose weight
> 
> You probably mean "speed up".

Yeah, took the liberty of using a metaphor ;-)

On the other hand, removing unneeded, duplicate code is also something
worthwhile...

> Yes, it's almost always possible to
> improve performance a bit by tweaking the implementation. But it's
> usually very hard to "remove code" after it has been added and used.

Depends. If you use proper encapsulation separating public and private
parts, you are completely free to restructure the private part, as long
as you stick with the general contract ...

This is why I am so keen on separating internal and external parts! It's
expensive to begin with but gives far more freedom afterwards.
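
A minimal Java sketch of that separation, with all names hypothetical
(PathResolver, WalkingResolver and CachingResolver are not Jackrabbit
classes). Callers only see the small public contract; the internal part
can start simple and, if measurements ever call for it, gain a cache
later without callers noticing.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical public contract: resolve an absolute path to a node id, or null. */
interface PathResolver {
    String resolve(String path);
}

/** Internal variant 1: walks a parent id -> (child name -> child id) table on every call. */
final class WalkingResolver implements PathResolver {
    private final Map<String, Map<String, String>> childrenById;

    WalkingResolver(Map<String, Map<String, String>> childrenById) {
        this.childrenById = childrenById;
    }

    @Override
    public String resolve(String path) {
        String id = "root";
        for (String name : path.split("/")) {
            if (name.isEmpty()) {
                continue; // leading slash
            }
            Map<String, String> children = childrenById.get(id);
            id = (children == null) ? null : children.get(name);
            if (id == null) {
                return null; // no such node
            }
        }
        return id;
    }
}

/** Internal variant 2: same contract, adds a cache in front of another resolver. */
final class CachingResolver implements PathResolver {
    private final PathResolver delegate;
    private final Map<String, String> cache = new HashMap<>();

    CachingResolver(PathResolver delegate) {
        this.delegate = delegate;
    }

    @Override
    public String resolve(String path) {
        // Misses (null results) are not cached by computeIfAbsent.
        return cache.computeIfAbsent(path, delegate::resolve);
    }
}

class ResolverSketch {
    /** Tiny sample tree: /a has id "id1", /a/b has id "id2". */
    static Map<String, Map<String, String>> sampleTable() {
        Map<String, Map<String, String>> table = new HashMap<>();
        Map<String, String> rootChildren = new HashMap<>();
        rootChildren.put("a", "id1");
        table.put("root", rootChildren);
        Map<String, String> aChildren = new HashMap<>();
        aChildren.put("b", "id2");
        table.put("id1", aChildren);
        return table;
    }

    public static void main(String[] args) {
        PathResolver plain = new WalkingResolver(sampleTable());
        PathResolver cached = new CachingResolver(plain);
        System.out.println(plain.resolve("/a/b") + " " + cached.resolve("/a/b"));
    }
}
```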

> 
>> the prototype currently probably misses tasks that Jackrabbit currently does
> 
> Sure. But I don't know any that would account for big performance or
> architectural problems. Shareable nodes was one such case (without
> shareable nodes the prototype wouldn't need NodeState).
> 
>> data validation against the node types
> 
> Nobody uses node types. Just joking. The test case doesn't use node
> types so I don't think this is the reason why the prototype is so much
> faster. But there might be other things I forgot.

Well, ehrm, unless Jackrabbit handles "nt:unstructured" specially, you
*always* use node types ;-)

Regards
Felix

Re: [jr3] MicroKernel prototype

Posted by Thomas Müller <th...@day.com>.

Hi,

> premature optimization

Sure, premature optimization should be avoided. But sometimes you need
to validate that a certain architecture / algorithm doesn't result in
very slow or unmaintainable code. I mean, that's the reason to write a
prototype: so you can actually test it.

> clean and easy to understand architecture

Yes, make it as easy as possible. For example, I didn't implement a
CachingHierarchyManager. I was a bit afraid it would be a problem, but
it looks like it's not. That's good.

> lose weight

You probably mean "speed up". Yes, it's almost always possible to
improve performance a bit by tweaking the implementation. But it's
usually very hard to "remove code" after it has been added and used.

> the prototype currently probably misses tasks that Jackrabbit currently does

Sure. But I don't know any that would account for big performance or
architectural problems. Shareable nodes was one such case (without
shareable nodes the prototype wouldn't need NodeState).

> data validation against the node types

Nobody uses node types. Just joking. The test case doesn't use node
types so I don't think this is the reason why the prototype is so much
faster. But there might be other things I forgot.

Regards,
Thomas

Re: [jr3] MicroKernel prototype

Posted by Felix Meschberger <fm...@gmail.com>.

Hi

Taken as such, this looks promising, even though the prototype
currently probably misses tasks that Jackrabbit currently does, right?
I am thinking of data validation against the node types when storing.

Regards
Felix

On 16.03.2010 19:46, Thomas Müller wrote:
> Hi,
> 
> I have some early performance test results: There is a test with 3
> levels of child nodes (each node 20 children)
> (TestSimple.createReadNodes).
> 
> With the JDBC storage and the H2 database, this is about 14 times
> faster than the Jackrabbit 2.0 trunk (0.2 seconds versus 2.9 seconds
> for Jackrabbit 2.0). This is after 3 test runs. The storage space
> usage is about 1/3 (2.8 MB for the prototype versus 9.5 MB for
> Jackrabbit 2.0).
> 
> Regards,
> Thomas
>

Re: [jr3] MicroKernel prototype

Posted by Thomas Müller <th...@day.com>.

Hi,

I have some early performance test results: There is a test with 3
levels of child nodes (each node 20 children)
(TestSimple.createReadNodes).

With the JDBC storage and the H2 database, this is about 14 times
faster than the Jackrabbit 2.0 trunk (0.2 seconds versus 2.9 seconds
for Jackrabbit 2.0). This is after 3 test runs. The storage space
usage is about 1/3 (2.8 MB for the prototype versus 9.5 MB for
Jackrabbit 2.0).
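
To give an idea of the size of that test, here is a small stand-alone
Java sketch. It is not TestSimple.createReadNodes itself and uses no
repository API; SketchNode and TreeShapeSketch are hypothetical names,
and it assumes "3 levels, 20 children each" means a full tree below the
root, that is 20 + 400 + 8000 = 8420 nodes. For the numbers above,
2.9 / 0.2 = 14.5 (the "about 14 times") and 2.8 / 9.5 is roughly 0.29
(the "about 1/3").

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory stand-in for a repository node. */
class SketchNode {
    final String name;
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String name) {
        this.name = name;
    }

    SketchNode addChild(String childName) {
        SketchNode child = new SketchNode(childName);
        children.add(child);
        return child;
    }
}

class TreeShapeSketch {
    /** Creates `levels` levels below `parent` with `fanout` children per node; returns nodes created. */
    static int create(SketchNode parent, int levels, int fanout) {
        if (levels == 0) {
            return 0;
        }
        int created = 0;
        for (int i = 0; i < fanout; i++) {
            SketchNode child = parent.addChild("n" + i);
            created += 1 + create(child, levels - 1, fanout);
        }
        return created;
    }

    /** Reads the tree back and returns the number of nodes below `node`. */
    static int countDescendants(SketchNode node) {
        int count = 0;
        for (SketchNode child : node.children) {
            count += 1 + countDescendants(child);
        }
        return count;
    }

    public static void main(String[] args) {
        SketchNode root = new SketchNode("root");
        int created = create(root, 3, 20);
        int read = countDescendants(root);
        System.out.println(created + " nodes created, " + read + " nodes read");
    }
}
```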

Regards,
Thomas